Atlas Example — Whitespace Trap – Atlas WordPress Test Environment

Attack 01 · ASCII art preservation

Five-line ASCII frame inside <pre>. Pipes and pluses must align column-for-column or it collapses to a smear.

+-----------+
|  N E G    |
|   A T I   |
|    V E    |
+-----------+
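
One way to check the frame after conversion is to compare line widths. The sketch below is a minimal check, not the converter's own logic: `frame_is_aligned` and `naive_collapse` are hypothetical names, and the `\s+` regex is an assumed stand-in for whatever collapse the parser applies.

```python
import re

FRAME = [
    "+-----------+",
    "|  N E G    |",
    "|   A T I   |",
    "|    V E    |",
    "+-----------+",
]

def frame_is_aligned(lines):
    """Return True when every line has exactly the same width."""
    return len({len(line) for line in lines}) == 1

def naive_collapse(line):
    """Assumed collapse behaviour: squeeze each whitespace run to one space."""
    return re.sub(r"\s+", " ", line)

# The collapsed copy keeps its borders but loses the interior padding.
collapsed = [naive_collapse(line) for line in FRAME]
```

Comparing `frame_is_aligned(FRAME)` with `frame_is_aligned(collapsed)` shows whether the right-hand pipes still sit in the same column as the border.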

Attack 02 · Python 4-space indentation

Indentation is the language. If the parser collapses leading runs, the function body merges into the def line and the meaning is destroyed.

def negative_space(text):
    if not text:
        return None
    for ch in text:
        if ch.isspace():
            yield ch
    return text.strip()
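
The effect of losing the leading runs can be demonstrated with the standard `ast` module. This is a sketch under the assumption that the collapse strips all leading whitespace from each line; `parses` and `strip_leading_runs` are hypothetical helper names.

```python
import ast

SOURCE = "\n".join([
    "def negative_space(text):",
    "    if not text:",
    "        return None",
    "    for ch in text:",
    "        if ch.isspace():",
    "            yield ch",
    "    return text.strip()",
]) + "\n"

def parses(source):
    """Return True when the text is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True

def strip_leading_runs(source):
    """Assumed failure mode: every line loses its leading whitespace."""
    return "\n".join(line.lstrip() for line in source.splitlines()) + "\n"
```

The original text parses; the stripped copy does not, because the `def` line is no longer followed by an indented block.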

Attack 03 · C with TAB indentation

Tab characters (U+0009) used for indentation. Tabs are \s-class, so the collapse regex eats them along with spaces.

#include <stdio.h>

int main(int argc, char **argv) {
	for (int i = 0; i < argc; i++) {
		printf("%s\n", argv[i]);
	}
	return 0;
}
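
A short sketch of why tabs are at risk. It assumes the collapse is a `\s+` substitution, which is a guess at the parser's behaviour, and contrasts it with a narrower pattern that touches only runs of ASCII spaces. All function names are hypothetical.

```python
import re

# One line of the C sample above, with its two leading tab characters.
LINE = '\t\tprintf("%s\\n", argv[i]);'

def collapse_any(text):
    """Assumed collapse: any whitespace run, tabs included, becomes one space."""
    return re.sub(r"\s+", " ", text)

def collapse_spaces_only(text):
    """Narrower alternative: only runs of two or more U+0020 are squeezed."""
    return re.sub(r" {2,}", " ", text)

def tab_depth(line):
    """Count the leading U+0009 characters on a line."""
    return len(line) - len(line.lstrip("\t"))
```

With these definitions, `tab_depth(LINE)` is 2 before the collapse and 0 after `collapse_any`, while `collapse_spaces_only` leaves the line unchanged.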

Attack 04 · Shell prompt (pre + code)

Progressive indentation; if collapsed, every line begins flush-left.

$ npx tsx convert-to-atlas-blocks.ts
  reading index.html...
    parsing JSDOM tree...
      collapsing whitespace...
        loss detected.
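
The staircase can be verified numerically by measuring the leading spaces on each line. A minimal sketch, with `indent_widths` and `flatten` as hypothetical names; `flatten` models the assumed failure, not the converter's actual code.

```python
TRANSCRIPT = "\n".join([
    "$ npx tsx convert-to-atlas-blocks.ts",
    "  reading index.html...",
    "    parsing JSDOM tree...",
    "      collapsing whitespace...",
    "        loss detected.",
])

def indent_widths(text):
    """Return the number of leading ASCII spaces on each line."""
    return [len(line) - len(line.lstrip(" ")) for line in text.splitlines()]

def flatten(text):
    """Assumed failure mode: leading and trailing whitespace removed per line."""
    return "\n".join(line.strip() for line in text.splitlines())
```

An intact transcript yields widths that grow by two per line; a flattened one yields all zeros.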

Attack 05 · Inline code with significant edge spaces

Compare leading-spaces against trailing-spaces and both-sides . Each token sits inside a <code> element, and its leading and trailing runs of spaces carry meaning.

Attack 06 · Textarea with embedded newlines

HTML textareas preserve their literal contents. The form below is a fictional submission box; the placeholder text contains line breaks.

Attack 07 · NBSP soup

The phrase below contains five non-breaking spaces between the two words. They are not ASCII space (U+0020); they are U+00A0 and must survive: word word.
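
In Python, `\s` in a `str` pattern matches U+00A0 by default, so a Unicode-aware collapse removes the non-breaking spaces. The sketch below contrasts that with the `re.ASCII` flag, which restricts `\s` to ASCII whitespace. The function names are hypothetical, and whether the parser under test behaves like either variant is an assumption.

```python
import re

NBSP = "\u00a0"
PHRASE = "word" + NBSP * 5 + "word"

def collapse_unicode(text):
    """Unicode-aware collapse: U+00A0 counts as whitespace and is squeezed."""
    return re.sub(r"\s+", " ", text)

def collapse_ascii(text):
    """ASCII-only collapse: U+00A0 is left untouched."""
    return re.sub(r"\s+", " ", text, flags=re.ASCII)
```

`collapse_ascii` still squeezes ordinary spaces and tabs, so it is a candidate when only U+0020-class runs should be normalised.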

Attack 08 · Mixed line endings (CRLF / CR / LF)

The block below was authored on three different operating systems and concatenated.

Line ending marker: CRLF
Line ending marker: CR-only
Line ending marker: LF
Final line no terminator
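
A sketch for labelling each line by its terminator. `BLOCK` reconstructs the four lines above under the assumption that each line ends with the terminator it names; `classify_endings` is a hypothetical helper.

```python
BLOCK = (
    "Line ending marker: CRLF\r\n"
    "Line ending marker: CR-only\r"
    "Line ending marker: LF\n"
    "Final line no terminator"
)

def classify_endings(text):
    """Label each line as CRLF, CR, LF, or none."""
    labels = []
    for line in text.splitlines(keepends=True):
        if line.endswith("\r\n"):
            labels.append("CRLF")
        elif line.endswith("\r"):
            labels.append("CR")
        elif line.endswith("\n"):
            labels.append("LF")
        else:
            # No CR or LF terminator. str.splitlines also breaks on a few
            # other separators (for example U+2028); those land here too.
            labels.append("none")
    return labels
```

Running the same function on the converter's output shows whether the endings were normalised to a single style.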

Attack 09 · Invisible characters

Zero-width space splits a word: splitwordagain. Word-joiner glues a number: 1⁠000⁠000. Soft-hyphen permits hyphenation: extraordinarilylongword.

Hair-thin-em spaces in sequence: hair[ ]thin[ ]em[ ].

Tabs mid-paragraph: before after double-tab-after.
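
Invisible format characters can be located by their Unicode general category. The sketch below reports every character in category Cf, which covers the zero-width space, word joiner, and soft hyphen. `find_format_chars` is a hypothetical name, and the position of the soft hyphen in `SAMPLE` is an assumption made for illustration.

```python
import unicodedata

def find_format_chars(text):
    """Return (index, code point, name) for each category-Cf character."""
    hits = []
    for index, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "UNNAMED")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits

# Escapes are used so the invisible characters are visible in the source.
SAMPLE = "split\u200bword 1\u2060000 extra\u00adordinarily"
```

The hair, thin, and em spaces belong to category Zs rather than Cf, so they would need a separate check.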

Attack 10 · Combining diacritics

Decomposed forms (NFD): café señor vieṭnamese. Lonely combining mark at start: ́lonely-mark-at-start.
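
Whether the decomposed forms survive can be checked by comparing lengths before and after normalisation. A minimal sketch using the standard `unicodedata` module; the helper names are hypothetical.

```python
import unicodedata

# "café" in NFD: the base letter e followed by U+0301 COMBINING ACUTE ACCENT.
DECOMPOSED = "cafe\u0301"

def to_nfc(text):
    """Compose base letters with their combining marks where possible."""
    return unicodedata.normalize("NFC", text)

def starts_with_combining_mark(text):
    """Return True when the first character has a non-zero combining class."""
    return bool(text) and unicodedata.combining(text[0]) != 0
```

A converter that silently normalises to NFC shortens `DECOMPOSED` from five code points to four. A leading mark has no base letter to compose with, so it stays as a separate code point either way.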

Attack 11 · Bidirectional controls

RTL embedding: english ‫RTL-CONTENT-HERE‬ more english.
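
Bidirectional controls can be found through their bidi class. The sketch assumes the sample uses U+202B (right-to-left embedding) to open and U+202C (pop directional formatting) to close; `bidi_controls` is a hypothetical helper.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls.
# The implicit marks U+200E and U+200F have classes L and R and are not
# reported by this check.
BIDI_CONTROL_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI",
}

def bidi_controls(text):
    """Return (index, bidi class) for each explicit directional control."""
    found = []
    for index, ch in enumerate(text):
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in BIDI_CONTROL_CLASSES:
            found.append((index, bidi_class))
    return found

SAMPLE = "english \u202bRTL-CONTENT-HERE\u202c more english."
```

An opening control without its matching close would change the direction of everything after it, so both should survive conversion.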

Attack 12 · ZWJ emoji families & ZWNJ

A four-person family glyph is built from four codepoints joined by ZWJ: 👨‍👩‍👧‍👦. Without the joiners, you would see four separate figures.

Zero-width non-joiner: letter‌joined.
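
The family sequence can be built and taken apart explicitly, which makes the joiners countable. A minimal sketch; the helper names are hypothetical.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Man, woman, girl, boy, joined into a single family sequence.
FAMILY = ZWJ.join(["\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466"])

def split_zwj_sequence(text):
    """Return the components that the joiners hold together."""
    return text.split(ZWJ)

def strip_joiners(text):
    """Model a lossy conversion that drops every ZWJ."""
    return text.replace(ZWJ, "")
```

The intact sequence is seven code points long: four emoji plus three joiners. After `strip_joiners` only the four emoji remain, and they render as separate figures.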

Attack 13 · CSS white-space property on a div

This is a plain <div> with white-space: pre applied via the stylesheet. In a real browser the leading spaces and newlines render verbatim; the parser does not honour the CSS rule.

    Four leading spaces.
        Eight leading spaces.
            Twelve leading spaces.
                Sixteen.

And here, white-space: pre-wrap:

pre-wrap preserves runs but wraps on overflow.

And white-space: pre-line:

pre-line collapses spaces but preserves line breaks.
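
The three values differ only in which whitespace they keep. The sketch below is a deliberately simplified model of that behaviour, not the CSS algorithm: it ignores wrapping, handles only space, tab, and newline, and `apply_white_space` is a hypothetical name.

```python
import re

def apply_white_space(text, mode):
    """Approximate how each white-space value treats spaces and newlines."""
    if mode in ("pre", "pre-wrap"):
        # Runs of spaces and line breaks are both preserved.
        return text
    if mode == "pre-line":
        # Spaces and tabs collapse and are trimmed at line edges,
        # while line breaks are preserved.
        lines = [
            re.sub(r"[ \t]+", " ", line).strip(" ")
            for line in text.split("\n")
        ]
        return "\n".join(lines)
    if mode == "normal":
        # Spaces, tabs, and line breaks all collapse to single spaces.
        return re.sub(r"[ \t\n]+", " ", text).strip(" ")
    raise ValueError(f"unsupported white-space value: {mode}")

SAMPLE = "    four\n        eight"
```

A converter that reads only the markup and never the stylesheet effectively treats every such div as `normal`.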

Attack 14 · Significant whitespace between inline elements

Between the two emphasised tokens there are exactly two ASCII spaces, not one: word x  y word. A naive collapse merges them into a single space.

Between bold and italic with a tab character: bold italic.

Attack 15 · Empty / whitespace-only text nodes

Three siblings follow this paragraph: a paragraph, a whitespace-only text node containing only spaces and a tab, and another paragraph. The whitespace-only node should be ignored unless it sits inside a <pre>.

First sibling.

Last sibling.
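
The rule for whitespace-only nodes can be written as a small predicate. This is a sketch of the stated expectation, not the converter's code: `keep_text_node` is a hypothetical name, and treating only ASCII whitespace as ignorable is an assumption.

```python
# Space, tab, line feed, form feed, carriage return.
HTML_WHITESPACE = " \t\n\f\r"

def keep_text_node(text, inside_pre=False):
    """Return True when a text node should survive conversion."""
    if inside_pre:
        # Inside <pre>, even a whitespace-only node is significant.
        return True
    return text.strip(HTML_WHITESPACE) != ""

nodes = ["First sibling.", "   \t ", "Last sibling."]
kept = [node for node in nodes if keep_text_node(node)]
```

Because the predicate strips only ASCII whitespace, a node made entirely of U+00A0 is kept.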

Attack 16 · Pre containing inline children

A <pre> whose immediate children include a <code> element AND a <span>. The structural layout matters: the syntax-highlighter would normally wrap each token in a span, and the parser must preserve all the inter-token whitespace.

function	trap(x) {
	return x.trim();
}
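
One way to confirm that inter-token whitespace survives is to concatenate every text chunk found inside the <pre>, regardless of which child element holds it. The sketch uses Python's standard `html.parser`; the markup in `MARKUP` is an assumed reconstruction of the block above, and `PreTextCollector` and `pre_text` are hypothetical names.

```python
from html.parser import HTMLParser

class PreTextCollector(HTMLParser):
    """Collect text inside <pre> verbatim, across any inline children."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <pre> elements are currently open
        self.chunks = []  # text pieces in document order

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)  # no stripping, no collapsing

def pre_text(markup):
    parser = PreTextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)

MARKUP = (
    "<p>intro</p>"
    "<pre><code>function</code>\t<span>trap</span>(x) {\n"
    "\treturn x.trim();\n}</pre>"
)
```

The tab between `</code>` and `<span>` is a text node of its own, which is exactly the kind of node a whitespace filter is tempted to drop.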