Negative Space
A typography editorial. Every gap, tab, and zero-width character below is deliberate.
Attack 01 · ASCII art preservation
Five-line ASCII frame inside <pre>. Pipes and pluses must align column-for-column or it collapses to a smear.
+-----------+
|   N E G   |
|   A T I   |
|   V E     |
+-----------+
Attack 02 · Python 4-space indentation
Indentation is the language. If the parser collapses leading runs, the function body merges into the def line and the meaning is destroyed.
def negative_space(text):
    if not text:
        return None
    for ch in text:
        if ch.isspace():
            yield ch
    return text.strip()
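To make the failure concrete, the sketch below parses a shortened version of the function twice: once intact and once with every leading run stripped. The helper names `collapse_leading` and `parses` are illustrative assumptions, not part of any converter.

```python
import ast
import re

# A shortened stand-in for the function above, with its 4-space indentation intact.
SOURCE = (
    "def negative_space(text):\n"
    "    if not text:\n"
    "        return None\n"
    "    return text.strip()\n"
)


def collapse_leading(src: str) -> str:
    """Strip leading spaces and tabs from every line (the destructive step)."""
    return re.sub(r"(?m)^[ \t]+", "", src)


def parses(src: str) -> bool:
    """Return True if Python accepts the source, False on a syntax error."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
```

With the indentation intact `parses(SOURCE)` succeeds; after `collapse_leading` the body sits flush-left under the `def` line and the parser rejects it.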
Attack 03 · C with TAB indentation
Tab characters (U+0009) are used for indentation. Tabs match the regex class \s, so a collapse regex built on \s eats them along with spaces.
int main(int argc, char **argv) {
	for (int i = 0; i < argc; i++) {
		printf("%s\n", argv[i]);
	}
	return 0;
}
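The point about tabs and \s can be checked in a few lines. This is a minimal sketch in Python; the two function names are hypothetical, and the second is only one possible gentler rule, not a recommendation from the original page.

```python
import re


def collapse(text: str) -> str:
    """Naive collapse: every run of whitespace (tabs included) becomes one space."""
    return re.sub(r"\s+", " ", text)


def collapse_spaces_only(text: str) -> str:
    """Gentler variant: only runs of two or more U+0020 are collapsed."""
    return re.sub(r" {2,}", " ", text)


# The printf line from the C sample, indented with two tab characters.
line = "\t\tprintf(\"%s\\n\", argv[i]);"
```

`collapse(line)` replaces the two tabs with a single space, while `collapse_spaces_only(line)` returns the line unchanged.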
Attack 04 · Shell prompt (pre + code)
Progressive indentation; if collapsed every line begins flush-left.
$ npx tsx convert-to-atlas-blocks.ts
  reading index.html...
    parsing JSDOM tree...
      collapsing whitespace...
        loss detected.
Attack 05 · Inline code with significant edge spaces
Compare leading-spaces against trailing-spaces and both-sides . The leading and trailing runs carry meaning (they appear inside <code> elements).
Attack 06 · Textarea with embedded newlines
HTML textareas preserve their literal contents. The form below is a fictional submission box; the placeholder text contains line breaks.
Attack 07 · NBSP soup
The phrase below contains five non-breaking spaces between the two words. They are not ASCII space (U+0020); they are U+00A0 and must survive: word word.
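A short Python sketch of why NBSP is at risk: for `str` patterns, `\s` matches Unicode whitespace, which includes U+00A0. Restricting the character class to ASCII whitespace leaves the five characters alone. Both function names are illustrative assumptions.

```python
import re

NBSP = "\u00a0"
phrase = "word" + NBSP * 5 + "word"


def collapse_unicode(text: str) -> str:
    """Collapse with \\s, which also matches U+00A0 in Python str patterns."""
    return re.sub(r"\s+", " ", text)


def collapse_ascii(text: str) -> str:
    """Collapse only ASCII whitespace, so NBSP runs survive untouched."""
    return re.sub(r"[ \t\n\r\f\v]+", " ", text)
```

Here `collapse_unicode(phrase)` yields a single ASCII space between the words, while `collapse_ascii(phrase)` keeps all five NBSPs.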
Attack 08 · Mixed line endings (CRLF / CR / LF)
The block below was authored on three different operating systems and concatenated.
Line ending marker: CRLF
Line ending marker: CR-only
Line ending marker: LF
Final line no terminator
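One way to audit a block like this is to split it while keeping the terminators and label each one. The sketch below assumes the four lines end in CRLF, CR, LF, and nothing, as their markers claim; `ending_of` and `classify` are hypothetical helper names.

```python
def ending_of(line: str) -> str:
    """Name the terminator at the end of a single line."""
    if line.endswith("\r\n"):
        return "CRLF"
    if line.endswith("\r"):
        return "CR"
    if line.endswith("\n"):
        return "LF"
    return "none"


def classify(text: str) -> list[str]:
    """Split on line boundaries, keeping terminators, and label each line."""
    return [ending_of(line) for line in text.splitlines(keepends=True)]


# Assumed reconstruction of the block, one terminator style per line.
sample = (
    "Line ending marker: CRLF\r\n"
    "Line ending marker: CR-only\r"
    "Line ending marker: LF\n"
    "Final line no terminator"
)
```

Note that `str.splitlines` also breaks on a few other separators (form feed, U+2028, and so on), so this is a diagnostic aid rather than a strict CR/LF tokenizer.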
Attack 09 · Invisible characters
Zero-width space splits a word: splitwordagain. Word-joiner glues a number: 1000000. Soft-hyphen permits hyphenation: extraordinarilylongword.
Hair-thin-em spaces in sequence: hair[ ]thin[ ]em[ ].
Tabs mid-paragraph: before	after		double-tab-after.
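Because these characters do not show up on screen, a small inspection helper is useful when checking whether they survived a conversion. The sketch below reports format characters (category Cf) and non-ASCII space separators (category Zs); the function name `reveal` is an assumption made for illustration.

```python
import unicodedata


def reveal(text: str) -> list[tuple[str, str]]:
    """List (code point, name) for invisible format characters and exotic spaces."""
    found = []
    for ch in text:
        category = unicodedata.category(ch)
        # Cf covers ZWSP, word joiner, soft hyphen; Zs covers hair/thin/em spaces.
        if category == "Cf" or (category == "Zs" and ch != " "):
            found.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")))
    return found
```

For example, `reveal("split\u200bword")` reports the zero-width space hidden in the middle of the word, and an ordinary sentence returns an empty list.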
Attack 10 · Combining diacritics
Decomposed forms (NFD): café señor vieṭnamese. Lonely combining mark at start: ́lonely-mark-at-start.
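The difference between decomposed and composed forms is visible only at the code-point level. A minimal sketch with Python's `unicodedata` follows; the helper `starts_with_combining_mark` is a hypothetical name, and whether a converter should normalize at all is left open here.

```python
import unicodedata

# "café" in decomposed form: plain "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "cafe\u0301"
composed = unicodedata.normalize("NFC", decomposed)


def starts_with_combining_mark(text: str) -> bool:
    """True if the first code point is a combining mark with no base before it."""
    return bool(text) and unicodedata.combining(text[0]) != 0
```

The two strings render the same but differ in length (five code points against four), so a pipeline that silently normalizes changes the text even though nothing looks different.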
Attack 11 · Bidirectional controls
RTL embedding: english RTL-CONTENT-HERE more english.
Attack 12 · ZWJ emoji families & ZWNJ
A four-person family glyph is built from four codepoints joined by ZWJ: 👨👩👧👦. Without the joiners, you would see four separate figures.
Zero-width non-joiner: letterjoined.
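The family sequence can be built and taken apart explicitly, which shows what a filter that strips zero-width characters would do to it. This is a small sketch; the variable names are illustrative.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Man, woman, girl, boy: four emoji code points joined by three ZWJs.
people = ["\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466"]
family = ZWJ.join(people)

# What remains if a cleanup pass removes every joiner.
stripped = family.replace(ZWJ, "")
```

`family` holds seven code points and can render as one glyph where the font supports it; `stripped` holds four and renders as four separate figures.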
Attack 13 · CSS white-space property on a div
This is a plain <div> with white-space: pre applied via the stylesheet. In a real browser the leading spaces and newlines render verbatim; the parser does not honour the CSS rule.
And here, white-space: pre-wrap:
And white-space: pre-line:
Attack 14 · Significant whitespace between inline elements
Between the two emphasised tokens there are exactly two ASCII spaces, not one: word x  y word. A naive collapse merges them into a single space.
Between bold and italic with a tab character: bold	italic.
Attack 15 · Empty / whitespace-only text nodes
Three siblings follow this paragraph: a paragraph, a whitespace-only text node containing only spaces and a tab, and another paragraph. The whitespace-only node should be ignored unless wrapped in a pre.
First sibling.
Last sibling.
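The rule stated above (drop whitespace-only text nodes unless they sit inside a pre) can be sketched with Python's standard `html.parser`. This is an illustration under that stated rule, not the converter's actual logic; the class and function names are assumptions.

```python
from html.parser import HTMLParser

ASCII_WS = " \t\r\n\f"


class TextNodes(HTMLParser):
    """Collect text nodes, skipping whitespace-only ones outside <pre>."""

    def __init__(self):
        super().__init__()
        self.pre_depth = 0
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth:
            self.pre_depth -= 1

    def handle_data(self, data):
        # Keep the node if it has non-whitespace content or lives inside <pre>.
        # Only ASCII whitespace counts as blank, so an NBSP-only node is kept.
        if data.strip(ASCII_WS) or self.pre_depth:
            self.nodes.append(data)


def text_nodes(html: str) -> list[str]:
    parser = TextNodes()
    parser.feed(html)
    parser.close()
    return parser.nodes
```

For the three siblings described above, `text_nodes("<p>First sibling.</p>  \t <p>Last sibling.</p>")` keeps only the two paragraphs, while the same whitespace wrapped in `<pre>` is kept verbatim.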
Attack 16 · Pre containing inline children
A <pre> whose immediate children include a <code> element AND a <span>. The structural layout matters: the syntax-highlighter would normally wrap each token in a span, and the parser must preserve all the inter-token whitespace.
function trap(x) {
  return x.trim();
}