

5.1.9 Input Format

Organize input to GNU troff into lines separated by the Unix newline character (U+000A), and encode it in ISO Latin-1 (ISO 8859-1), the character encoding GNU troff recognizes. A document encoded in ISO 646:1991 IRV (US-ASCII), or, equivalently, one that uses only code points from the “C0 Controls” and “Basic Latin” parts of the Unicode character set, is also a valid ISO Latin-1 document; the standards are interchangeable in their first 128 code points.(34)
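To illustrate the compatibility claim above, here is a minimal Python sketch. It is not part of GNU troff; the function names `is_ascii`, `same_under_both`, and `latin1_code_points` are hypothetical, and the sketch relies only on Python's built-in `ascii` and `latin-1` codecs. It checks that an ASCII-only byte string decodes to the same text either way, and that Latin-1 maps each byte to the code point of equal value.

```python
# Sketch only: compares Python's built-in "ascii" and "latin-1" codecs.

def is_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit US-ASCII range 0x00-0x7F."""
    return all(b < 0x80 for b in data)

def same_under_both(data: bytes) -> bool:
    """True if the bytes are ASCII-only and decode identically both ways."""
    # is_ascii() short-circuits first, so the "ascii" decode cannot fail.
    return is_ascii(data) and data.decode("ascii") == data.decode("latin-1")

def latin1_code_points(data: bytes) -> list[int]:
    """Latin-1 maps each byte 0x00-0xFF to the code point of equal value."""
    return [ord(ch) for ch in data.decode("latin-1")]

print(same_under_both(b"Hello, world.\n"))   # True
print(same_under_both(b"caf\xe9"))           # False: 0xE9 is not ASCII
print(latin1_code_points(b"caf\xe9"))        # [99, 97, 102, 233]
```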

Some control characters (from the sets “C0 Controls” and “C1 Controls”, as Unicode describes them) are invalid as input characters; GNU troff discards them upon reading.(35) For example, it processes the character sequence “foo”, followed by an invalid character and then “bar”, as “foobar”.
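The discard rule can be modeled with a short Python sketch. It is an illustration only, not GNU troff's implementation: the `INVALID` set transcribes the code points this section lists as invalid, `read_input` is a hypothetical name, and the sketch does not model any diagnostics GNU troff may issue. Dropping an invalid byte simply makes its neighbors adjacent, which reproduces the “foobar” example.

```python
# Illustrative model of the discard rule; not GNU troff's actual code.

# Invalid code points as this section lists them:
# 0x00, 0x0B, 0x0D through 0x1F, and 0x80 through 0x9F.
INVALID = frozenset([0x00, 0x0B, *range(0x0D, 0x20), *range(0x80, 0xA0)])

def read_input(data: bytes) -> str:
    """Decode Latin-1 input after dropping every invalid byte."""
    return bytes(b for b in data if b not in INVALID).decode("latin-1")

print(read_input(b"foo\x00bar"))   # foobar
print(read_input(b"foo\x85bar"))   # foobar (0x85 is a C1 control)
print(read_input(b"caf\xe9"))      # café (0xE9 is a valid Latin-1 byte)
```

Tab (0x09), newline (0x0A), and form feed (0x0C) are absent from the set, so the sketch passes them through unchanged.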

Invalid input characters are 0x00, 0x0B, 0x0D through 0x1F, and 0x80 through 0x9F.(36) GNU troff uses some of these code points for internal purposes, which makes it non-trivial to extend the program to accept UTF-8 or other encodings that use characters in these ranges.
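The following Python sketch suggests why UTF-8 collides with these ranges. It encodes characters as UTF-8 and reports any resulting bytes that fall in the invalid set listed above, transcribed into a hypothetical `INVALID` constant; `colliding_bytes` is likewise a hypothetical name. It says nothing about how GNU troff itself handles such input.

```python
# Sketch: which characters' UTF-8 bytes land in the invalid Latin-1 ranges?

# Invalid code points as listed above (0x00, 0x0B, 0x0D-0x1F, 0x80-0x9F).
INVALID = frozenset([0x00, 0x0B, *range(0x0D, 0x20), *range(0x80, 0xA0)])

def colliding_bytes(text: str) -> dict[str, list[str]]:
    """Map each character to the invalid bytes in its UTF-8 encoding."""
    result: dict[str, list[str]] = {}
    for ch in text:
        bad = [f"0x{b:02X}" for b in ch.encode("utf-8") if b in INVALID]
        if bad:
            result[ch] = bad
    return result

print(colliding_bytes("éÀ…€"))
# {'À': ['0x80'], '…': ['0x80'], '€': ['0x82']}

# Printable Latin-1 Supplement characters (U+00A0 to U+00FF) affected:
print(sum(1 for cp in range(0xA0, 0x100) if colliding_bytes(chr(cp))))  # 32
```

UTF-8 continuation bytes lie in 0x80 through 0xBF, so any character whose encoding uses a continuation byte from 0x80 through 0x9F (here U+00C0 through U+00DF, plus characters such as “…” and “€”) would lose bytes under the discard rule. “é” (0xC3 0xA9) keeps both bytes, although a Latin-1 reader would still see them as two separate characters.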