Input Encoding

Without exception, a well-formed mdoc document consists only of ASCII printable characters, the space character, the newline character, and in some cases the tab character. Most modern formatters accept CR+LF newlines (\r\n), but this is not portable. Modern formatters also accommodate unlimited line length; this is not necessarily the case for legacy formatters.

The backslash \ is always interpreted as the beginning of an escape sequence. If a backslash immediately precedes a newline, it escapes that newline, so the following line is treated as a continuation of the current one:

.Em This is considered one \
line of input.

Macro Line

Formally speaking, a macro line is one beginning with a control character. In mdoc, this is traditionally the . character, although historical documents may also use the ' character. This notation extends back to the historical RUNOFF utility.

Control Words:

Input generally consists of English text, 360 or fewer characters to a line. Control words must begin a new line, and begin with a period so that they may be distinguished from other text. RUNOFF does not print the control words.

A line with only a control character followed by zero or more whitespace characters is stripped from input.
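
As a small illustration (the macro content here is only a hypothetical example), a lone control character can be used to visually separate parts of the source without affecting output:

.Sh NAME
.
.Nm progname
.Nd a hypothetical utility

The line consisting only of . is discarded, so the formatter should treat this input as if that line were absent.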

A macro line may, in some circumstances, contain more macros. The first macro — the one following the control character — may then be distinguished as the line macro.
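
A minimal sketch of such a line, using the common Op, Fl, and Ar macros as an illustration:

.Op Fl v Ar file

Here Op follows the control character and is therefore the line macro; Fl and Ar are further macros appearing on the same line. Formatters typically render this as [-v file].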

On macro lines, the following non-alphanumeric characters are syntactically meaningful. They are collectively called reserved characters.

! punctuation
" control character (quotation)
( punctuation
) punctuation
, punctuation
- control character (macro argument)
. punctuation
: punctuation
; punctuation
? punctuation
[ punctuation
\ control character (escape sequence)
] punctuation
| punctuation

To pass these characters along as literal text, you must either escape or quote them.
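
A rough sketch of both approaches follows. It assumes the common convention of escaping a character by placing the zero-width escape \& in front of it; exact handling may differ between formatters:

.Em Hello \&, World
.Em "Hello, World"

In the first line the comma is escaped, so it should be passed along as ordinary text rather than treated as punctuation. In the second, the quotation marks group the whole phrase into a single argument.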

If an unescaped space character is encountered on a macro line, it is used to delimit macros, macro arguments, and flags. Multiple consecutive space characters have no effect on output.

.Em Hello,     World
.Em Hello, World

In both cases the spaces between Hello, and World only delimit arguments, so both lines produce the same output, Hello, World, without extra spaces.

Text Line

A text line is any line not beginning with a control character. Text lines are never parsed for macros and may consist of any printable ASCII characters. Text lines are concatenated together when forming output, so, except in certain circumstances, newlines are stripped from input. Using a blank text line as a vertical separator is not portable.
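
To illustrate the concatenation, consider this short sketch (the wording of the text is arbitrary):

This sentence begins on one line
and ends on the next.
.Pp
This sentence starts a new paragraph.

The first two lines are joined into a single run of output text. The Pp macro, rather than a blank line, provides the vertical separation before the last sentence.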

If a space character is encountered on a text line, it is reproduced verbatim in the output.

Hello,     World
Hello, World

The spaces between Hello, and World will be reproduced as-is in both cases. However, using spaces on a text line to shape output is considered non-portable: HTML, for example, collapses whitespace by default. Also consider whether controlled spacing within otherwise free-form text is appropriate at all. In most cases where spacing must be retained, such as source code examples, you're better off with a literal display mode such as the one covered at the end of this section.

Do not use the space-retaining feature to create double-spaces following a sentential period! See Sentential Punctuation for how to do this properly.

If the first character of a text line is a space character, the output line shall be preceded by a newline. This creates the effect of an implicit literal display.

Hello, World.
  The newline, leading spaces, and in-line    spacing are retained.
This is free-form text.

The portability of this behaviour is unknown. For greater portability (and semantic annotation), open a literal display mode instead, for example with Bd -literal:

Hello, World.
.Bd -literal -compact
  The newline and leading spaces are retained.
.Ed
While this is not.

In this example, the -compact flag prevents leading vertical space. To effect a vertical space following the literal display, use a Pp.

Consider the following example:
.Bd -literal
int a_function(int *foo, int bar) {
    *foo += bar;
    return *foo;
}
.Ed
.Pp
This is subsequent text.

Last edited by $Author: kristaps $ on $Date: 2011/12/25 14:44:21 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.