Preface

A utility without a manual is of no utility at all.

This is a guide for writing UNIX manuals in the mdoc language. If you're new to writing UNIX manuals, or you want to learn about best practices for high-quality manuals, this book may benefit your work.

To those unfamiliar with UNIX, mdoc is a language for documenting utilities, programming functions, file and wire formats, hardware device interfaces, and so on. By a language I mean a structured, machine-readable document format such as HTML, the primary language of web pages; or RTF, used by word processors. man is the utility for querying documents in mdoc and other languages, collectively called man pages.

The following, for example, is a fragment of man output for the cat command.

NAME

cat — concatenate and print files

SYNOPSIS

cat [-benstuv] [file ...]

DESCRIPTION

The cat utility reads files sequentially, writing them to the standard output. The file operands are processed in command-line order. If file is a single dash (‘-’) or absent, cat reads from the standard input.

Why mdoc? After all, there are plenty of other UNIX manual languages out there, from the historical man to DocBook. In short, mdoc is:

portable, as any modern UNIX system can format it without needing clumsy toolchains;
expressive, capturing the semantic content of manpages instead of just presentation cues;
concise, making line-based source control painless; and
well-documented, well-supported, and actively maintained by a community of knowledgeable developers.

No other format can boast all of these points at once.

In fact, although I've mentioned UNIX several times already, mdoc isn't exclusively tied to UNIX. Although UNIX and mdoc are historically linked, open source mdoc tools exist for any operating system. Furthermore, the documentation capabilities of mdoc apply to computing systems in general — not just UNIX.

In this book, however, I'll assume you are casually familiar with man and its output. This will allow us to focus on manuals with the same formatted output in mind. Thus, if you're unfamiliar with the man utility, this is a good time to read an introductory text on the subject (such as a UNIX beginner's guide), or at the very least, read the output of man man (the manual page of the man command).

This is not a canonical reference! The mdoc language is not standardised. For official reference, consult the manual distributed with your target computer system with man mdoc. This work primarily addresses the elements of mdoc common to any UNIX deployment, noting common pitfalls in portability.

Tutorial Introduction

Let's begin with practical examples of mdoc.

The intended audience of this part is somebody who has never written an mdoc manual. Although you may be tempted to jump to the chapter relevant to your manual type (for example, a command or function library), it's best to read the chapters in order. I'll explain mdoc syntax as we go.

If you've already written a few manuals, you may want to read this part anyway: beyond explaining technical mdoc language concepts by example, I'll also introduce some best practices and discuss portability between various mdoc environments.

I'll frequently refer to the screen output of mdoc documents as displayed with the UNIX man utility. Furthermore, I'll refer to command invocation in the traditional UNIX way — on the command line. In short, a bit of UNIX knowledge will help to avoid confusion. But I'll briefly introduce invocation syntaxes as the need arises.

Commands

Commands are the way in which a user operates her computer. Already I've noted the man command: if you've interacted with a UNIX system, you've probably run at least man intro or man man to learn about your system.

In this chapter, I'll discuss how to document these commands with mdoc.

This may be unfamiliar if you're accustomed to graphical interfaces: all of our examples will refer to command-line, text-based commands. If your target environment isn't a UNIX system, it's a good idea to read these examples anyway, as they will expose the rudimentary structure of the mdoc language. As mentioned before, reading an introductory text on UNIX will help avoid confusion.

Let's begin by making a mental checklist for the criteria that make a good manual for a command. This checklist arises by inverting what a manual reader expects in opening a manual: what does the command do and how do I operate it?

Do I describe the calling syntax of the command?
Do I describe each flag and argument of the command?
Do I describe the command's operation?
Do I describe the command's exit status?
Do I describe referenced files and environment variables?

Above all, the best litmus test is whether a colleague or friend can read your manual and be able to use your command without any assistance on your part. Don't be discouraged by how this can take several tries to get right!

I'll begin with a simple command, hi, which prints hello, world to the screen. I'll then add some command-line arguments to this command. By the time you finish this chapter, you should have a grasp of mdoc syntax and some of its more widely-used macros.

In this text, I'll refer to the invocation of commands as cmd [-flag farg] arg. Here, cmd refers to the command invocation name, flag is a flag (or switch) to that command, farg is an argument to a flag (not all flags have arguments), and arg is an argument to the command.

The dash in front of flag indicates a flag, while the square brackets around flag farg indicate an optional part of the invocation. Since arg is not bracketed, it is a mandatory part of the invocation.

This convention is formalised by the POSIX.1-2008 standard (Base Definitions, sec. 12.1), so you can expect to see it often in the UNIX world.

Simple Command

Consider a simple UNIX command hi that prints hello, world and exits. Let's create a manual page hi.1 documenting this command. In this example, I'll begin with the full manual. In later examples, we'll build up the manual piece by piece.

.Dd May 30, 2011
.Dt HI 1
.Os
.Sh NAME
.Nm hi
.Nd print \(dqhello, world\(dq
.Sh SYNOPSIS
.Nm
.Sh DESCRIPTION
Print
.Qq hello, world
and exit.

How to display this manual page depends on the system you're using.

Traditionally, the command for formatting UNIX manuals for a terminal is nroff. For now, let's stick with that.

To display output, you must invoke nroff as nroff -mandoc file. The mandoc flag indicates that input is in either mdoc or man format. Hereafter, I'll refer to nroff simply as the formatter to avoid confusion, as there are many available mdoc formatters.

NAME

hi — print "hello, world"

SYNOPSIS

hi

DESCRIPTION

Print “hello, world” and exit.

Let's start by studying the input and output. We can see most of the input text translated into output. For instance, the capitalised NAME input is left-justified and in bold text. The same holds for SYNOPSIS and DESCRIPTION, although the .Sh text before these terms is missing. We can even see the output sentence Print "hello, world" and exit spread over lines 10–12:

Print
.Qq hello, world
and exit.

Let's take a closer look at this fragment.

The .Qq is part of mdoc's instruction syntax. Input lines beginning with a dot are instructions to the formatter called mdoc macros, or just macros for short. The macro name is a terse two or three-character word following the dot, for example, Qq. The name of a macro tersely hints at its function. The words following the Qq to the end of line are arguments in the scope of the macro.

Scope, a technical term in the field of programming languages, refers to the body of input within the context of an instruction or variable. In mdoc, a macro's scope is the block of text and instructions in the formatting context of that macro. Looking at the input and output, we can infer the scope of Qq by seeing what's surrounded by quotes (the formatting, in this case).

.Qq hello, world

Print “hello, world” and exit.

As we explore more and more macros in this book, we'll see that each macro follows one of a handful of scope rules. It's already clear that Qq is limited in scope to its invocation line. But notice that the formatter recognised the content between Sh macros as requiring indentation. So it's clear that mdoc also has a concept of multi-line scope. In fact, Sh has both line arguments, for the name of the section; and multi-line arguments, for section content.

.Sh SECTION 1
Section text.
.Sh SECTION 2
New section text.

Furthermore, the existence of Qq within the Sh scope means that scopes may be nested. In the next section we'll see how multiple macros may even be specified on a single line.

.Sh SECTION 1
Section text.
.Sh SECTION 2
.Qq Section text nested in a quote.

We can visualise this scoping as follows, with an outer scope and inner scope:

.Sh SECTION 2
.Qq Section text nested in a quote.

Now let's return to hi.1 with this new knowledge of macros and scopes.

We see seven macros in total: Dd, Dt, Os, Sh, Nm, Nd, and Qq. We now know that Qq encloses its arguments in double quotes and that Sh begins a named section with indented multi-line arguments.

Of the remaining macros, Dd accepts the last modification date of the manual in month day, year format. Dt refers to the manual's title, HI, and its category, 1. Numbered manual categories are UNIX conventions, but applicable to any operating system. We'll explore more standard categories throughout this book. Note that HI is uppercase: by convention, Dt should always accept a capitalised document title. We'll talk more about titles and sections in later chapters of this book. For now, let's assume that a category number identifies the topic of the manual, where 1 refers to utilities.

Next, Os indicates the operating system of the system running the formatter. If left unspecified, the formatter will return the current operating system (e.g., OpenBSD 4.9, Linux 2.6.32-5, or Microsoft Windows XP).

.Dd May 30, 2011
.Dt HI 1
.Os \" Current operating system.
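Os also accepts an explicit argument. As a minimal sketch, assuming the manual targets one particular system (the OpenBSD 4.9 name is only borrowed from the examples above), the prologue might name that system directly rather than relying on the formatter's default:

```roff
.Dd May 30, 2011
.Dt HI 1
.Os OpenBSD 4.9
```

Leaving Os without arguments, as in hi.1, is likely the more portable choice, since the manual then follows whichever system formats it.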

Note that text following the \" marker is an mdoc comment, which has the following syntax:

Text. \" Comment to end of the line.
.\" Extending across the full line.

Comments are line-scoped, like Qq:

.\" .Sh NAME

Moving along, Nm accepts the manual's name. This differs from the title, Dt, in that a single manual may document multiple components. We'll see examples of this in later chapters. Finally, Nd accepts a brief, one-line description of the command.

.Sh NAME
.Nm hi
.Nd print \(dqhello, world\(dq

You can see that we re-invoke Nm in the SYNOPSIS, only without arguments. The formatter is smart enough to fill in the name supplied to the first Nm invocation, in this case hi.

Since our simple command has no command-line arguments, its invocation is simply the command name.

.Sh SYNOPSIS
.Nm

Piecing this all together, we now have the following.

.Dd May 30, 2011
.Dt HI 1
.Os
.Sh NAME
.Nm hi
.Nd print \(dqhello, world\(dq
.Sh SYNOPSIS
.Nm
.Sh DESCRIPTION
Print
.Qq hello, world
and exit.

In this example, you may have noticed that \(dqhello, world\(dq has much the same effect as the Qq invocation. In mdoc, quotation marks on a macro line signify literal strings. Thus, we used the escape sequence \(dq to render a double quote (").

You may ask why not just use Qq, such as

.Nd print
.Qq hello, world

For the time being, assume that Nd must have its scope on the invocation line. Strictly speaking, we could have written

.Nd print "hello, world"

but this encourages dangerous behaviour in assuming that quoted arguments may not affect output. This isn't always the case! We'll see later how quoted terms on macro lines change the grouping of arguments — at times non-intuitively.

Before moving on to the next section, let's look quickly over our checklist for a well-formed manual.

Did I describe the calling syntax of the command?: Yes. It was only the name of the command (no arguments or flags).
Did I describe each flag and argument of the command?: There were none, so yes.
Did I describe the command's operation?: Yes, it prints hello, world and exits.
Did I describe the command's exit status?: No, we only mentioned that it exits.
Did I describe referenced files and environment variables?: This is not applicable.

To the effect of the exit status, let's modify the DESCRIPTION slightly for clarity.

.Sh DESCRIPTION
Print
.Qq hello, world
and exit 0.

Of course, our command must actually do so! For simplicity's sake, let's assume that this is the case.

With our simple, well-documented example in mind, let's move on to a more realistic UNIX command.

Elaborate Command

Most UNIX commands have flags, arguments, return values, environment variables, and so on. So let's expand upon our example to include arguments for writing to an output file and a flag for outputting in uppercase letters. Furthermore, we'll accept an optional prefix string on the command-line, and return non-zero on failure.

This changes two parts of our manual: the SYNOPSIS section, where we'll record the invocation syntax of our command; and the DESCRIPTION, where we'll describe the command-line options. We'll also add a new section, EXIT STATUS, to describe the non-zero exit on failure.

Let's start by documenting our command-line options in the SYNOPSIS section:

.Sh SYNOPSIS
.Nm
.Op Fl C
.Op Fl o Ar output
.Op Ar prefix

The output renders as follows:

SYNOPSIS

hello [-C] [-o output] [prefix]

Already, we begin to see the output take shape with the C and o characters, and the prefix. It's also clear that the Op macro surrounds its arguments in square brackets, just as Qq surrounded its line in double-quotes.

But how did the formatter know to prefix the C and o with a dash, or underline the arguments output and prefix?

It's obvious this has something to do with Fl and Ar.

Macro lines may in fact consist of multiple macros, sometimes nesting further macros and sometimes closing prior scopes to begin one anew. The Fl and Ar words are macros nested within the scope of Op. However, while Op contains both of these child scopes, the Ar macro closes out the Fl scope and begins its own.

.Op Fl C

.Op Fl o Ar output

.Op Ar prefix

Outer parts are an outer scope, while inner parts are an inner scope. Now it's easy to see how Fl prefixes only the C with a dash and not the arguments following: its scope is closed out by Ar.

Note that to document a flag Ar, we would need to quote its arguments as Fl "Ar" (we'll later learn how to escape arguments with zero-width spaces to accomplish the same). As there are many mdoc macros, a popular novice mistake is to unknowingly invoke a macro when expecting to print text.
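As a brief sketch of both approaches, assume a hypothetical command that really does have a flag named Ar. The first macro line relies on quoting, as described above; the second uses the \& zero-width space escape, which roff-based formatters generally treat as preventing macro interpretation of the word it precedes:

```roff
.\" Quoted: Ar is passed to Fl as text, not invoked as a macro.
.Op Fl "Ar"
.\" Escaped: a leading zero-width space has the same intent.
.Op Fl \&Ar
```

Both lines are intended to render a bracketed -Ar flag rather than handing control to the Ar macro.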

With our command syntax documented, let's document the arguments themselves. To do so, we detail the meaning of flags and arguments in the DESCRIPTION section.

The
.Nm
function prints
.Qq hello, world
and returns.
.Pp
Its arguments are as follows:
.Bl -tag -width Ds
.It Fl C
Print only uppercase letters.
.It Fl o Ar output
Write to file
.Ar output .
.It Ar prefix
Prefix the output with
.Ar prefix .
.El

Immediately, we see the introduction of several new macros: Pp, Bl, It, and El. More interestingly, we notice the text on the Bl begins with a dash, just as when passing arguments on a command line. This is the first instance of a macro that accepts flags.

The rendered output of this fragment is as follows.

Its arguments are as follows:

It should be clear that the Pp macro, which always stands alone, introduces a vertical paragraph break.

Earlier, I introduced the concept of a multi-line scope for Sh, which was closed and re-opened by subsequent invocations of Sh. In this fragment, the Bl macro (for begin list) is explicitly closed out by the El macro (end list). This is an example of explicit scope closure, versus the implicit scope closure of Sh sequences.

Predictably, the Bl and El enclosure consists of list items, begun by the multi-line It macro lines. Like Sh, the It macro has its scope closed by subsequent invocations of It. As expected, its scope also closes when the surrounding list is closed with El.

Until now, we've discussed only macros and macro arguments. But a handful of macros, Bl included, also accept flags which themselves may have arguments. In our example, the tag flag to Bl stipulates a tagged list. A tagged list entry consists of two parts: a tag and data, similar to the <DL> description lists in HTML consisting of a key and data. Bl accepts a second flag, width, which accepts the argument Ds. This instructs the formatter that the tag portion of the list has width Ds, which is shorthand for default spacing.
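Ds is not the only argument that width accepts. As a sketch that goes beyond the example above, and assuming your formatter accepts an explicit width in en units (as groff and mandoc are generally understood to do), the list could reserve a wider tag column:

```roff
.\" Assumption: 12n reserves roughly twelve character cells for the tag.
.Bl -tag -width 12n
.It Fl o Ar output
Write to file
.Ar output .
.El
```

A wider column leaves room for longer tags such as -o output; whether this reads better than the default is a matter of taste.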

Next, let's look closer at the input line

.Ar prefix .

Note that it's correctly rendered with the period flushed up against the text, whereas the period is space-separated in the input. (The period itself isn't font-decorated, although this is difficult to see in the medium you're reading.)

By making the punctuation a separate argument, we distinguish it from the term prefix, and thus it is not underlined. The formatter is smart enough to distinguish standalone punctuation. When writing an mdoc manual, punctuation should always be separated from macro arguments unless it's part of the argument itself. This allows the formatter to correctly intuit end-of-line spacing.

If we hadn't done so, the formatter wouldn't distinguish period from word. This is more intuitive when re-using the familiar Qq.

.Qq first .

.Qq second.

We can now see the difference in the placement of punctuation:

“first”. “second.”

Let's piece this all together. You'll recognise the Dd, Dt, and Os macros from the last section, although the Dt argument has changed with our command name.

.Dd May 30, 2011
.Dt HELLO 1
.Os
.Sh NAME
.Nm hello
.Nd print \(dqhello, world\(dq
.Sh SYNOPSIS
.Nm
.Op Fl C
.Op Fl o Ar output
.Op Ar prefix
.Sh DESCRIPTION
The
.Nm
function prints
.Qq hello, world
and returns.
.Pp
Its arguments are as follows:
.Bl -tag -width Ds
.It Fl C
Print only uppercase letters.
.It Fl o Ar output
Write to file
.Ar output .
.It Ar prefix
Prefix the output with
.Ar prefix .
.El

Notice that we don't repeat the Op macros in the DESCRIPTION, although we stipulate them in the SYNOPSIS. This is because we document the flags and arguments themselves in the DESCRIPTION, not the calling syntax of the command.

Finally, let's account for command errors by stipulating the exit status of the command. To do this, we add a new section to the end of the manual, EXIT STATUS, consisting of a single macro. We didn't add this to hi.1 because we didn't stipulate any exit state; however, it's good practice to always include this section, even if your command only exits in one way.

.Sh EXIT STATUS
.Ex -std

The Ex macro is special in that it always accepts a flag, std. This is by convention. Although you can specify an argument to Ex, without one it works like Nm without arguments: it reproduces the name of the document as given to Nm. It prints a standardised message about the exit status of the command.

EXIT STATUS

The hello utility exits 0 on success, and >0 if an error occurs.

With our manual complete, let's go over our checklist.

Did I describe the calling syntax of the command?: Yes, including flags and arguments.
Did I describe each flag and argument of the command?: Yes for all flags and arguments.
Did I describe the command's operation?: Yes, that it prints hello, world.
Did I describe the command's exit status?: Yes, that it returns a non-zero exit code on failure.
Did I describe referenced files and environment variables?: This is not applicable to this manual.

Of course, most real manuals have many other useful bits of information, such as author names, referenced standards, files, and so on. I'll describe these in detail in later chapters of this book.

Case Study

I now introduce a case study of a real-world manual, in particular the echo utility from OpenBSD. The original file may be viewed on-line at src/bin/echo/echo.1, file version 1.20. I choose this mainly because of its simplicity.

.\" $OpenBSD: echo.1,v 1.20 2010/09/03 09:53:20 jmc Exp $
.\" $NetBSD: echo.1,v 1.7 1995/03/21 09:04:26 cgd Exp $

These initial comments are automatically created by the source-control system cvs, which fills in information about the last editor. I'll talk about revision control and those funny dollar-sign enclosures in Part 3. These particular comments indicate that the file was initially imported from NetBSD in 1995, where it was last edited by cgd (a system name, not the user's real name). It was last edited in OpenBSD, its current form, by jmc in 2010.

If you're keeping your manual under source control, it's usually a good idea to begin your file with a similar line.

.\" $Id$

A tab character separates the comment marker from the text. Again, this will be covered later in this book — don't worry if it looks strange.

.\" Copyright (c) 1990, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.

This long comment is the license and copyright of the source file. Of course, our use of this source file is compatible with the license, as may be read from the text itself!

.\" @(#)echo.1 8.1 (Berkeley) 7/22/93

This comment is of historical note. The @(#) sequence was inserted by the sccs utility (Source Code Control System). Although this utility is part of POSIX.1-2008, it has mostly been replaced by cvs. You'll probably never encounter this string in your own manuals, so it's safe to disregard.

At this point the manual content itself begins.

.Dd $Mdocdate: September 3 2010 $
.Dt ECHO 1
.Os

This indicates that the manual's title is ECHO in category 1 (utilities) for the currently installed operating system. The $Mdocdate$ enclosure is similar to the $OpenBSD$ enclosure at the top of the file.

.Sh NAME
.Nm echo
.Nd write arguments to the standard output

This documents a single command, the echo command, which does as mentioned.

.Sh SYNOPSIS
.Nm echo
.Op Fl n
.Op Ar string ...

The command accepts a single optional flag, n, and an arbitrary number of optional arguments string. Note that re-stating the command name for the Nm is superfluous in this case.

.Sh DESCRIPTION
The
.Nm
utility writes any specified operands, separated by single blank
.Pq Sq \ \&
characters and followed by a newline
.Pq Sq \en
character, to the standard
output.
When no operands are given, only the newline is written.

The DESCRIPTION opens with a brief explanation of the utility and its output. The strange set of backslash-escaped characters \ \& is required to make the doubly-nested macros Pq and Sq (parenthesise and single-quote, respectively) correctly enclose a single space.

.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl n
Do not print the trailing newline character.
.El

This follows the standard form of documenting flags and arguments as a term/definition list. Each one — in this case only one — is documented in alphabetical order.

.Sh EXIT STATUS
.Ex -std echo

Notes the standard exit sequence. Note that the argument to Ex is superfluous, as only one command is listed for the manual.

.Sh SEE ALSO
.Xr csh 1 ,
.Xr ksh 1 ,
.Xr printf 1

Although these weren't cited in other sections of the manual, the author felt it necessary to link to them. This is probably because both csh and ksh include internal implementations of a function by the same name.

.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
.Pp
The flag
.Op Fl n
is an extension to that specification.
.Pp
.Nm
also exists as a built-in to
.Xr csh 1
and
.Xr ksh 1 ,
though with a different syntax.

This last section fully describes the utility's conformance to the POSIX standard, which is very important to those writing portable utilities. The St macro expands into the relevant standard's full name, IEEE Std 1003.1-2008 (“POSIX.1”). For a full list of standards, consult your local documentation for the macro.

Functions

Programming functions are a significant part of the UNIX canon, from the system call interface to the C library. If you're a C or C++ developer, chances are you've at least glanced through the manuals of functions such as socket, printf, or memmove.

In general, the mdoc macros used for documenting programming functions are the same as those used for Commands; however, there are some domain-specific bits to annotate the various parts of function versus command invocation. You'll see that each command invocation macro, such as Fl for a flag, has an analogue for programming functions, such as the Fa, for a function argument.

The mdoc format is used primarily for the C language and Fortran, but it works with C++, Perl, Tcl, and other imperative languages. In fact, most any language with functions (or subroutines) and variables will work, typed or not. In this book, I focus exclusively on the C language. This is due to the overwhelming presence of C libraries and functions documented with mdoc.

Before beginning, we need to change our mental checklist for a complete manual. Since function manuals can document more than just function parts, our manual must grow to account for complexity.

Do I describe the preprocessing and linking information?
Do I describe the calling syntax of each function and variable?
Do I describe the type of each function and variable?
Do I describe each argument in each calling syntax?
Do I describe each function's operation?
Do I describe each function's side effects?

A function is any callable instruction, be it a C function, routine, or macro. A variable may also be a C variable or macro. I'll consistently use the function and variable terminology in this book.

In general, you don't have to be knowledgeable of C to understand this section, but it helps to have a grasp of basic programming structure, such as functions, function prototypes, and header files. In any event, I'll refer to a header file as a text file consisting of function prototypes. Header files for the C language, such as in our examples, end with the .h suffix. A C function prototype indicates the calling syntax of a function, such as the following.

int
isspace(int c);

In this, the C function isspace, notationally referred to as isspace(), has a single argument int c (an integer named c) and returns a value of type int (another integer). Multiple arguments are comma-separated.

Simple Function

Let's study a simple C function, hi, which prints hello, world just like in prior sections. We begin with the familiar first macros.

.Dd May 30, 2011
.Dt HI 3
.Os
.Sh NAME
.Nm hi
.Nd print \(dqhello, world\(dq

All that's changed is the manual category, from 1 to 3. We'll discuss manual categories later in this book. Suffice to say that programming functions and libraries (not system calls!) are in category 3.

The calling syntax of our function is documented in the SYNOPSIS section. Assume that our function prototype is within the header file hi.h as void hi(void), which, in non-programming terms, declares that a function hi accepts no arguments and does not return a value.

.Sh SYNOPSIS
.In hi.h
.Ft void
.Fn hi

This introduces three unfamiliar macros. The In macro specifies an include file that interfacing programmes will need to include. The Ft and Fn macros collectively document the function (return) type and function name. Since not all languages have return types, the Ft macro is optional in this context.

SYNOPSIS

#include <hi.h>

void
hi();

By now it comes as no surprise that Ft is scoped to the end of its line, as is Fn in this example. In fact, both of these macros are syntactically similar to the Ar and Fl found in the first chapter: their scopes are closed by subsequent macros on the same line.

Since our function has no arguments or return values, all we need to do is add some bits in the DESCRIPTION section to complete this manual.

.Dd May 30, 2011
.Dt HI 3
.Os
.Sh NAME
.Nm hi
.Nd print \(dqhello, world\(dq
.Sh SYNOPSIS
.In hi.h
.Ft void
.Fn hi
.Sh DESCRIPTION
The
.Fn hi
function prints
.Qq hello, world
and returns.
.Pp
It has no arguments.

Here, you'll notice a difference between a function and command manual: we refer back to our function using Fn instead of Nm. Why is this? The Nm macro, when used in the body of a manual, refers to the command name, not the manual name, just as we used Nm to annotate the utility name in the SYNOPSIS. For functions, we use the Fn macro. The difference is that Fn won't repeat the manual name if used without arguments. This complexity is simply the result of poor planning in the mdoc language.

Let's visualise the output so far:

NAME

hi — print "hello, world"

SYNOPSIS

#include <hi.h>

void
hi();

DESCRIPTION

The hi() function prints “hello, world” and returns.

It has no arguments.

Lastly, let's stipulate the function return value in its own section, RETURN VALUES. This mirrors the EXIT STATUS introduced for hello.1. Although we don't have a return value, it's good practice to include this section anyway.

.Sh RETURN VALUES
The
.Fn hi
function does not return a value.

Although this example is instructive, it's quite simple. Let's review our checklist before moving on.

Did I describe the preprocessing and linking information?: Yes, a header file. There is no linking information.
Did I describe the calling syntax of each function and variable?: Yes, the hi function.
Did I describe the type of each function and variable?: Yes, as hi has neither type nor value.
Did I describe each argument in each calling syntax?: This does not apply, as it has none.
Did I describe each function's operation?: Yes, in that it prints hello, world.
Did I describe each function's side effects?: This does not apply, as it has none.

Very few real-world functions are so simple. In the next section, we introduce a more practical function with types and arguments.

Elaborate Function

Let's also study a function form of the elaborate command example. Again, I'll use C as my example. Since this is a bit more involved, you may feel a little lost if you're not familiar with C programming. I'll keep the technical jargon to a minimum.

Let's re-write hi as hello, accepting a Boolean (zero or one) integer of whether to capitalise, and an optional character string (a word) prefix. Let's also stipulate an integer return value.

.Sh SYNOPSIS
.In hello.h
.Ft int
.Fo hello
.Fa "int C"
.Fa "const char *prefix"
.Fc

If you're not familiar with C, the const char * and int parts are part of the C language. Note that the C and prefix names haven't changed.

The include file (In) and function return type (Ft) macros are used as before, though the header is now hello.h and the return type is int instead of void. I've added an explicit-scope macro pair, Fo and Fc, syntactically like Bl and El, that encloses the function's arguments.

This renders as follows. Note that the formatter is smart enough to comma-separate the Fa macros.

SYNOPSIS

#include <hello.h>

int
hello(int C, const char *prefix);

It's clear that the Fo macro accepts the function name (as Fn did for the simple example), but it also opens a function prototype scope. This scope is closed by Fc. The contained Fa macros are for function arguments.

If you're wondering why I didn't use Fn as in the last section, it's a matter of readability. Using Fn puts everything on one long line, such as the following.

.Sh SYNOPSIS
.In hello.h
.Ft int
.Fn hello "int C" "const char *prefix"

This works with two arguments, but can quickly run into long lines. In general, your mdoc manual shouldn't exceed 78 characters per line. Shorter lines are useful when managing manuals in cvs or other version management systems — we'll discuss this in later sections of this book.

The quoted arguments to Fa may seem superfluous, but for the C language each argument to Fa consists of a type and a variable name. Since one may specify several arguments to a single Fa, the quotes are necessary to mark each type and name as a single argument. The following, for example, passes both function arguments to one Fa.

.Sh SYNOPSIS
.In hello.h
.Ft int
.Fo hello
.Fa "int C" "const char *prefix"
.Fc

This renders as follows, the same as the form with one Fa per argument.

SYNOPSIS

#include <hello.h>

int
hello(int C, const char *prefix);

In the C language, function prototypes don't necessarily need named function arguments. However, it's good practise to name arguments when documenting them in the SYNOPSIS so that we can consistently refer to them later on in the manual. Let's refer to them now in the DESCRIPTION, where we document our arguments.

Note that there are no set conventions for documenting function arguments in the DESCRIPTION body. Sometimes this is done within the flow of a regular sentence. Other times, as below, we'll introduce each argument as part of a list.

.Sh DESCRIPTION
The
.Fn hello
function prints
.Qq hello, world .
.Pp
It accepts the following arguments:
.Bl -tag -width Ds
.It Fa "int C"
Non-zero if the output should be uppercase.
.It Fa "const char *prefix"
A prefix to precede the output or NULL for no prefix.
.El

Here, we see the familiar Bl and El list enclosure. Notice how I re-use the Fa macro to document individual arguments, just like I re-used Fl and Ar when documenting command-line flags and arguments. In the last section, I mentioned why we use Fn instead of Nm for re-stating the name.

This renders as follows.

DESCRIPTION

The hello() function prints “hello, world”.

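It accepts the following arguments:

int C
Non-zero if the output should be uppercase.

const char *prefix
A prefix to precede the output or NULL for no prefix.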

Finally, let's add a section documenting the return value of this function. This will differ from the simple example in that we actually return a value.

.Sh RETURN VALUES
The
.Fn hello
function returns 1 on success, 0 on failure.

Piecing this example together, we have the following respectable C function manual.

.Dd May 30, 2011
.Dt HELLO 3
.Os
.Sh NAME
.Nm hello
.Nd print \(dqhello, world\(dq
.Sh SYNOPSIS
.In hello.h
.Ft int
.Fo hello
.Fa "int C" "const char *prefix"
.Fc
.Sh DESCRIPTION
The
.Fn hello
function prints
.Qq hello, world .
.Pp
It accepts the following arguments:
.Bl -tag -width Ds
.It Fa "int C"
Non-zero if the output should be uppercase.
.It Fa "const char *prefix"
A prefix to precede the output or NULL for no prefix.
.El
.Sh RETURN VALUES
The
.Fn hello
function returns 1 on success, 0 on failure.

Running through our checklist, we see that we've described preprocessor information with the header file macro In; function calling syntax and types in the SYNOPSIS; and arguments in the DESCRIPTION along with function operation. This contains all we need to know about the function.

Case Study

I now introduce a case study of a real-world function manual, in particular the manual for the strtonum function, which is an extension to the C Standard Library found in OpenBSD. The original file may be viewed on-line at src/lib/libc/stdlib/strtonum.3, file version 1.14.

In this case study, I've chosen a manual with some bad behaviour — not broken, but bad. This is intentional to show how real-world manuals deviate from recommended forms. I'll explicitly note each instance of bad behaviour as we explore the manual's contents.

.\" $OpenBSD: strtonum.3,v 1.14 2007/05/31 19:19:31 jmc Exp $
.\"
.\" Copyright (c) 2004 Ted Unangst
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

This is the standard comment header to manual files in OpenBSD. Indeed, most distributed manuals begin with a copyright notice, then a license. The $OpenBSD$ line is automatically updated by the revision control system, cvs, whenever an update to the file is committed. The line following is the copyright message, and following that is the text form of the ISC license.

.Dd $Mdocdate: May 31 2007 $
.Dt STRTONUM 3
.Os

These three standard macros establish the last modified date, manual title (same as the single documented function but capitalised), manual category 3 (functions), and the default operating system. The $Mdocdate$ line, like the $OpenBSD$ line, is automatically updated by cvs whenever the document is committed to the source repository.

.Sh NAME
.Nm strtonum
.Nd "reliably convert string value to an integer"

This declares a single documented function, strtonum, and its purpose. The quotation marks within the Nd macro are superfluous: like the Qq macro studied earlier, Nd accepts an arbitrary number of arguments to format. Quoting groups them into one argument, which is useful only for preserving special whitespace, and there is none here.
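
As a small sketch of the unquoted form (my own rewrite, not the text of the OpenBSD file), the same NAME section could be written as follows and should format identically.

```mdoc
.\" Sketch: Nd takes the rest of the line, so no quotes are needed.
.Sh NAME
.Nm strtonum
.Nd reliably convert string value to an integer
```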

.Sh SYNOPSIS
.Fd #include <stdlib.h>
.Ft long long
.Fo strtonum
.Fa "const char *nptr"
.Fa "long long minval"
.Fa "long long maxval"
.Fa "const char **errstr"
.Fc

This declares the function prototype and calling syntax. First, let's examine the new Fd macro. The use of this macro for a header inclusion is obsolete: new manuals should always use In. This makes it easier for parsers to understand a header file — and possibly link to it — instead of being a generic preprocessor statement. The re-written form would begin as follows:

.Sh SYNOPSIS
.In stdlib.h

Moving along, we see that each function argument includes its name (e.g., nptr and minval). While not common in header file prototypes, this allows later references of function invocation in the manual to refer back to the prototype for type and context information. In the previous section, we discussed the relevance of quotation with Fa: the same is done here.

While we could have used Fn, it would have created an overly long input line. Choosing between Fn and Fo is purely a matter of convenience and has no effect on parsing or formatting.
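
To see why, here is a sketch of the same prototype written with Fn (my own rewrite for illustration, not taken from the OpenBSD file). The Fn line alone runs past 78 characters.

```mdoc
.\" Sketch only: the strtonum prototype squeezed into a single Fn line.
.\" It should format the same as the Fo/Fa/Fc form used in the manual.
.Ft long long
.Fn strtonum "const char *nptr" "long long minval" "long long maxval" "const char **errstr"
```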

.Sh DESCRIPTION
The
.Fn strtonum
function converts the string in
.Fa nptr
to a
.Li long long
value.

In the SYNOPSIS, the Fa included the full type information. Here, however, we use Fa with just its name, nptr. We could have done the same in the SYNOPSIS, but the C language includes all type information in its prototypes.

The Li macro here isn't good practise: since long long refers to a type, it should be marked up with Vt. This behaviour (using a presentation macro instead of a semantic one) is a holdover from legacy manual forms that are purely presentational. If you find yourself applying a style, think twice about whether it's a good idea!
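
A sketch of the semantic form might look like the following. This is my own rewrite of the opening sentence, not the text of any OpenBSD revision.

```mdoc
.\" Sketch: Vt marks up the type name in place of the presentational Li.
The
.Fn strtonum
function converts the string in
.Fa nptr
to a
.Vt long long
value.
```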

The
.Fn strtonum
function was designed to facilitate safe, robust programming
and overcome the shortcomings of the
.Xr atoi 3
and
.Xr strtol 3
family of interfaces.
.Pp
The string may begin with an arbitrary amount of whitespace
(as determined by
.Xr isspace 3 )
followed by a single optional
.Ql +
or
.Ql -
sign.
.Pp
The remainder of the string is converted to a
.Li long long
value according to base 10.
.Pp
The value obtained is then checked against the provided
.Fa minval
and
.Fa maxval
bounds.
If
.Fa errstr
is non-null,
.Fn strtonum
stores an error string in
.Fa *errstr
indicating the failure.

The remainder of the DESCRIPTION section completely captures the calling syntax and behaviour of the function. The Ql macro is used simply to set non-alphanumeric characters apart from the regular stream of text.

.Sh RETURN VALUES
The
.Fn strtonum
function returns the result of the conversion,
unless the value would exceed the provided bounds or is invalid.
On error, 0 is returned,
.Va errno
is set, and
.Fa errstr
will point to an error message.
.Fa *errstr
will be set to
.Dv NULL
on success;
this fact can be used to differentiate
a successful return of 0 from an error.

Since this function returns a rather tricky error message, it's necessary to describe the effects of both the return value and the passed-in arguments.

.Sh EXAMPLES
Using
.Fn strtonum
correctly is meant to be simpler than the alternative functions.
.Bd -literal -offset indent
int iterations;
const char *errstr;

iterations = strtonum(optarg, 1, 64, &errstr);
if (errstr)
	errx(1, "number of iterations is %s: %s", errstr, optarg);
.Ed
.Pp
The above example will guarantee that the value of iterations is between
1 and 64 (inclusive).

Many manual readers jump directly to the EXAMPLES section to gain an understanding of your function. Thus, not only must the example compile and run, it must also demonstrate as many parts of the function as possible. In the case of strtonum, an error condition and a non-error condition are documented. However, the header file inclusion(s) are missing, which may mislead readers. In particular, the non-standard errx function requires the err.h header file.
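
One way to address this, sketched below, is to open the example with the headers it depends on. This is my own rewrite, not the text of any OpenBSD revision; I've assumed unistd.h as the header declaring optarg, and written the errstr test as an explicit comparison.

```mdoc
.\" Sketch only: the #include lines are additions, so the reader can
.\" see which headers declare errx, strtonum, and optarg.
.Sh EXAMPLES
.Bd -literal -offset indent
#include <err.h>
#include <stdlib.h>
#include <unistd.h>

int iterations;
const char *errstr;

iterations = strtonum(optarg, 1, 64, &errstr);
if (errstr != NULL)
	errx(1, "number of iterations is %s: %s", errstr, optarg);
.Ed
```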

.Sh ERRORS
.Bl -tag -width Er
.It Bq Er ERANGE
The given string was out of range.
.It Bq Er EINVAL
The given string did not consist solely of digit characters.
.It Bq Er EINVAL
.Ar minval
was larger than
.Ar maxval .
.El
.Pp
If an error occurs,
.Fa errstr
will be set to one of the following strings:
.Pp
.Bl -tag -width "too largeXX" -compact
.It too large
The result was larger than the provided maximum value.
.It too small
The result was smaller than the provided minimum value.
.It invalid
The string did not consist solely of digit characters.
.El

The ERRORS section will be rigorously covered in the section on System Calls. In brief, since the errno global error variable is set, each possible value must be documented in a list using the Er macro. By convention, each Er is enclosed within Bq.

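Furthermore, the error string in errstr must also be documented. Note that minval and maxval are marked up here with Ar, which is meant for command-line arguments; Fa, as used in the rest of this manual, would be the consistent choice.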

.Sh SEE ALSO
.Xr atof 3 ,
.Xr atoi 3 ,
.Xr atol 3 ,
.Xr atoll 3 ,
.Xr sscanf 3 ,
.Xr strtod 3 ,
.Xr strtol 3 ,
.Xr strtoul 3

This section collects all references to other manuals made elsewhere in this manual, then adds more for completeness. Note that the entries are alphabetically sorted.

.Sh STANDARDS
.Fn strtonum
is an
.Ox
extension.
The existing alternatives, such as
.Xr atoi 3
and
.Xr strtol 3 ,
are either impossible or difficult to use safely.
.Sh HISTORY
The
.Fn strtonum
function first appeared in
.Ox 3.6 .

Since this function is included in OpenBSD's C Standard Library, the fact that the function is not standard must absolutely be documented. Here, the Ox macro indicates the OpenBSD operating system (each BSD UNIX operating system has its own macro).

Function Library

I've mentioned several times that the name provided to Nm doesn't necessarily refer to the title of the manual in Dt. Let's study a simple function library, using both hi and hello, which demonstrates this concept.

A function library is a collection of object files, which consist mainly of programming functions, within a single file called a library. On most UNIX systems, you can find libraries installed in /usr/lib and/or /usr/local/lib, ending in .a, .la, .dylib, or .so (followed by version numbers).

This example applies to any number of functions belonging to the same library, not necessarily all functions in the library. In fact, one commonly finds large libraries spread over many manuals, each of which contains several similar functions.

In general, it's not a good idea to document your library in one manpage: the best form is to have one manpage per function (or associated functions). Larger libraries will often have an index-style manpage, but that's different. It's not a good idea because it requires the reader to grep through the page to find the function that interests them. It might be easier for you to maintain one file when your API changes, but you might alienate users by requiring them to jump through your manpage to find their function. Keep the trade-off in mind!

For simplicity's sake, I'll call this C function library libgreeting, implying that the installed library is called libgreeting.a or libgreeting.so. It will consist of two header files, hi.h and hello.h, containing the function prototypes for hi and hello, respectively.

Let's begin with the first few macros, which are also called the manual prologue.

.Dd May 30, 2011
.Dt GREETING 3
.Os

Note that I've changed the document title to be GREETING instead of choosing between function names. This is because the manual documents the entire function library, not just one particular function. In general, the title of a function library manual should omit the leading lib.

It's a good rule of thumb that the Dt title of your document matches its filename.

Next, I'll list the names of the functions being documented. I also change the description of the manual to accommodate for the functionality of both documented functions.

.Sh NAME
.Nm hello ,
.Nm hi
.Nd print greeting messages

Here I've used Nm twice to indicate that the manual documents two functions. In doing so, I'll have to be careful when invoking Nm in later parts of the manual, as it will produce hello if I don't specify a name, and this is probably not desired (nor should it be depended upon, as I may re-order the names).

If we were only documenting a single function in a library, we would only assign Nm and Nd to the relevant function and not that of the library.

It's easiest to alphabetise the function names in the NAME section, but there are many ways to order the names: order of importance, most often called, etc. We must also be sure to comma-separate each name, leaving the last invocation without a comma. Let's look at the output so far.

NAME

hello, hi — print greeting messages

Some operating systems, for example FreeBSD and NetBSD, require a LIBRARY section for base system libraries, even though it is hard to maintain and not very useful. For portable libraries, do not include such a section.

.Sh LIBRARY
.Lb greeting

This uses the macro Lb, which accepts the name of the library omitting the starting lib. This macro is not portable because the list of known library names is system dependent, so it will produce different output on different systems, which is not desirable for a manual page. With the LIBRARY section included, the output so far is as follows.

NAME

hello, hi — print greeting messages

LIBRARY

library “greeting”

The SYNOPSIS section will simply be a collection of the calling syntaxes for both functions, which we've already studied. If we were only documenting one function, we would list only that function here.

.Sh SYNOPSIS
.In hello.h
.In hi.h
.Ft int
.Fo hello
.Fa "int C" "const char *prefix"
.Fc
.Ft void
.Fn hi

Note that I've listed both include files prior to the function prototypes. This is familiar to C programmers, where functions may have multiple include files that need a specific order. The functions are listed in the same order as their Nm listing.

Let's examine the output so far.

NAME

hello, hi — print greeting messages

LIBRARY

library “greeting”

SYNOPSIS

#include <hello.h>
#include <hi.h>

int
hello(int C, const char *prefix);

void
hi();

Already, a manual reader has lots of pertinent information: the name of the library, its header files, and the function calling syntax. Let's continue by documenting the functions and their arguments, but this time, we'll do so in a different style than before.

Instead of using lists, we describe each function as a free-form stream of text. We depend on the SYNOPSIS to hint the reader as to the function argument types; there's no need to re-state them.

.Sh DESCRIPTION
The
.Fn hi
and
.Fn hello
functions print out greeting messages.
.Pp
The
.Fn hi
function accepts no arguments and prints out
.Qq hello, world .
.Pp
The
.Fn hello
function accepts a value
.Fa C ,
which if non-zero indicates output should be uppercase; and
.Fa prefix ,
which, if non-NULL, shall be prefixed to the output.
The
.Fa prefix
argument may be NULL.

Notice how each sentence in this fragment ends on its own line, for example,

which, if non-NULL, shall be prefixed to the output.
The
.Fa prefix

By doing so, the formatter is able to recognise the end of a sentence and correctly handle sentential spacing. In most cases, this means adding two spaces between the period and subsequent text. From this follows a rule of thumb: new sentence, new line.
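
As a minimal sketch of the rule (my own fragment, reusing the hi function from earlier), each sentence below begins on its own input line, even where a macro line falls in the middle of a sentence.

```mdoc
.\" Sketch of "new sentence, new line".
.\" The first sentence ends at "arguments." and the second
.\" starts on a fresh input line with "It prints".
The
.Fn hi
function accepts no arguments.
It prints
.Qq hello, world
and returns.
```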

In this DESCRIPTION we've captured what each function does and what its arguments are. What remains are return values.

.Sh RETURN VALUES
The
.Fn hello
function returns 1 on success, 0 on failure.

If you're writing functions that return -1 on failure (setting errno) and 0 on success (such as a system call), then you can also use the Rv macro.

Let's collect these fragments into a single document and see if it's enough to use as a programming reference.

NAME

hello, hi — print greeting messages

LIBRARY

library “greeting”

SYNOPSIS

#include <hello.h>
#include <hi.h>

int
hello(int C, const char *prefix);

void
hi();

DESCRIPTION

The hi() and hello() functions print out greeting messages.

The hi() function accepts no arguments and prints out “hello, world”.

The hello() function accepts a value C, which if non-zero indicates output should be uppercase; and prefix, which, if non-NULL, shall be prefixed to the output. The prefix argument may be NULL.

RETURN VALUES

The hello() function returns 1 on success, 0 on failure.

We'll use our mental checklist as a guide. First we stipulated linking information with the Lb macro. Then we introduced the calling syntax of each function, naming their arguments. We also stipulated the necessary header files in the order they'd be included in source files. In the DESCRIPTION, we described each function and its arguments in full. Lastly, we documented return values in the RETURN VALUES section. Since hi does not return a value, it isn't mentioned there.

From this information, a programmer should be able to interface with our library.

Case Study

I now introduce a case study of a real-world function library manual, in particular the manual for the getc, fgetc, getw, and getchar functions from OpenBSD. The original file may be viewed on-line at src/lib/libc/stdio/getc.3, file version 1.12. This is not the manual for the full function library, but only a handful of similar functions.

.\" $OpenBSD: getc.3,v 1.12 2007/05/31 19:19:31 jmc Exp $
.\"
.\" Copyright (c) 1990, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Chris Torek and the American National Standards Committee X3,
.\" on Information Processing Systems.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.

This is the standard comment header to manual files in OpenBSD. The $OpenBSD$ line is automatically updated by the revision control system, cvs, whenever an update to the file is committed. The line following is the copyright message, and following that is the text form of the BSD license.

.Dd $Mdocdate: May 31 2007 $
.Dt GETC 3
.Os

This classifies our manual in category 3 as a function or function library. The title of the manual, GETC, is chosen as the most general of those functions listed below in the NAME section.

.Sh NAME
.Nm fgetc ,
.Nm getc ,
.Nm getchar ,
.Nm getw
.Nd get next character or word from input stream

Lists (alphabetically) all the functions that will be documented, and some general notes about their collective function. We next jump down into the SYNOPSIS; since this set of functions is part of the C Standard Library, it needs no special linking information.

.Sh SYNOPSIS
.Fd #include <stdio.h>
.Ft int
.Fn fgetc "FILE *stream"
.Ft int
.Fn getc "FILE *stream"
.Ft int
.Fn getchar "void"
.Ft int
.Fn getw "FILE *stream"

This documents the calling syntax of all functions. Note that the Fd macro is used instead of the In macro. This invocation is historically relevant, but new manuals should always use In. The re-written include line would be:

.In stdio.h

Next, each function and its arguments are explained in a free-flowing paragraph. This style was probably chosen instead of a list item for each argument (with Bl) due to the small number of arguments.

.Sh DESCRIPTION
The
.Fn fgetc
function obtains the next input character (if present) from the stream
pointed at by
.Fa stream ,
or the next character pushed back on the stream via
.Xr ungetc 3 .
.Pp
The
.Fn getc
function acts essentially identically to
.Fn fgetc ,
but is a macro that expands in-line.
.Pp
The
.Fn getchar
function is equivalent to
.Fn getc
with the argument
.Em stdin .
.Pp
The
.Fn getw
function obtains the next
.Li int
(if present)
from the stream pointed at by
.Fa stream .

The usage of the Em macro is not correct: the Va or Dv macro would have been more appropriate. The same applies to Li, which marks up a type and so should be Vt. The mdoc language is semantic, so using presentation macros such as Li and Em is discouraged.
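
A sketch of the semantic form of these two paragraphs follows. It is my own rewrite, not the text of any OpenBSD revision, and choosing Dv rather than Va for stdin is an assumption on my part.

```mdoc
.\" Sketch: Dv for the stdin stream and Vt for the int type.
.\" Dv is assumed here; Va would serve as well.
The
.Fn getchar
function is equivalent to
.Fn getc
with the argument
.Dv stdin .
.Pp
The
.Fn getw
function obtains the next
.Vt int
(if present)
from the stream pointed at by
.Fa stream .
```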

.Sh RETURN VALUES
If successful, these routines return the next requested object from the
.Fa stream .
If the stream is at end-of-file or a read error occurs, the routines return
.Dv EOF .
The routines
.Xr feof 3 and
.Xr ferror 3
must be used to distinguish between end-of-file and error.
If an error occurs, the global variable
.Va errno
is set to indicate the error.
The end-of-file condition is remembered, even on a terminal, and all
subsequent attempts to read will return
.Dv EOF
until the condition is cleared with
.Xr clearerr 3 .
.Sh SEE ALSO
.Xr ferror 3 ,
.Xr fopen 3 ,
.Xr fread 3 ,
.Xr putc 3 ,
.Xr ungetc 3

All possible return values are correctly documented in the RETURN VALUES section and relevant functions cross-linked in the SEE ALSO section. Note that the cross-linked manuals are also alphabetically sorted.

.Sh STANDARDS
The
.Fn fgetc ,
.Fn getc ,
and
.Fn getchar
functions conform to
.St -ansiC .

Noting standards conformance is extremely important: it allows programmers and administrators to depend on your component in a cross-platform fashion. These functions are part of the C Standard Library.

.Sh BUGS
Since
.Dv EOF
is a valid integer value,
.Xr feof 3
and
.Xr ferror 3
must be used to check for failure after calling
.Fn getw .
.Pp
Since the size and byte order of an
.Vt int
may vary from one machine to another,
.Fn getw
is not recommended for portable applications.

The BUGS section should be used very carefully — bugs preferably should be fixed. In this section, design bugs have been documented. Whether the CAVEATS section would be more appropriate is up to the manual author.

We found several inconsistent uses of mdoc in this manual. In general, if you find unusual or erroneous macros or styles in UNIX manuals, notify the authors! A bug in a manual is just as important as a bug in the code.

System Call

A system call differs from a user-land function in that it triggers the operating system kernel to perform some operation. This usually applies to I/O, such as reading from files or sockets with read. Other than that, system calls are no different from regular functions: they're invoked, have return values, and so on.

In mdoc, however, a system call is a special kind of function whose manual contains at least one section not found in ordinary function manuals.

The first difference between ordinary functions and system calls is the manual category. Let's study a function khello, kernel hello, which is similar to the hello function described earlier.

.Dd May 30, 2011
.Dt KHELLO 2
.Os

All system calls are in category 2. Furthermore, except in special circumstances, system calls are each accorded their own manual.

I'll use the same descriptive text as in the hello example. Note that for system calls, the hello.h header file should be in the compiler's standard include path. This is usually /usr/include on UNIX systems.

.Sh NAME
.Nm hello
.Nd print greeting messages
.Sh SYNOPSIS
.In hello.h
.Ft int
.Fo hello
.Fa "int C" "const char *prefix"
.Fc
.Sh DESCRIPTION
The
.Nm
function prints out a greeting message.
.Pp
It accepts a value
.Fa C ,
which if non-zero indicates output should be uppercase; and
.Fa prefix ,
which, if not
.Dv NULL ,
shall be prefixed to the output.
The
.Fa prefix
argument, if not
.Dv NULL ,
must be nil-terminated.

You'll notice I've omitted the LIBRARY section in this example, as system calls by definition aren't a part of a library. Furthermore, I've used the Dv macro to annotate the term NULL as a defined constant.

Let's examine the output so far.

NAME

hello — print greeting messages

SYNOPSIS

#include <hello.h>

int
hello(int C, const char *prefix);

DESCRIPTION

The hello function prints out a greeting message.

It accepts a value C, which if non-zero indicates output should be uppercase; and prefix, which, if not NULL, shall be prefixed to the output. The prefix argument, if not NULL, must be nil-terminated.

In the hello example, I included a section RETURN VALUES detailing the return value of the function. System calls, however, usually return a standard value and have a side effect of setting the C library errno variable when invoked within a C language context. This is documented with a special macro Rv.

.Sh RETURN VALUES
.Rv -std

The std flag is by convention always specified. This macro will produce standard text regarding the errno value and that the function returns -1 on failure and 0 on success.

If you have multiple functions specified in your manual, you must list them individually as arguments to Rv.
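
As a sketch, assuming a manual that documents two hypothetical system calls, khello and kbye, both following the 0 and -1 convention, the section might read as follows. The formatter should then produce one standard sentence covering both names.

```mdoc
.\" Sketch: khello and kbye are hypothetical system calls,
.\" each returning 0 on success and -1 on failure.
.Sh RETURN VALUES
.Rv -std khello kbye
```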

Next, the possible values of errno must be specified in the ERRORS section as a list. Let's assume that EFAULT may be set if the pointer is invalid.

.Sh ERRORS
.Bl -tag -width Er
.It Er EFAULT
.Fa prefix
points outside the allocated address space.
.El

The syntax of this list differs from lists we've already encountered. Earlier we used the special term Ds as an argument to width to specify a generic width. Here, we used Er, which is also specified at the start of each list tag (lines beginning with It).

The macro Er specifies a possible value of errno. There are many standard variable names for errno values, such as EFAULT used in our example. When we stipulate this as the argument of width, the formatter is able to translate this into a generic width of most Er macro contents.

You should avoid using this construct unless it's in a conventional way, as it is here.

If your system call is part of an operating system, it's common to add some lines as to when it was added. Let's assume you're adding the function to a fictional Foo OS. Most modern UNIX operating systems have their own macros, such as Bx for BSD UNIX. Be sure to note the version of the operating system.

.Sh HISTORY
The
.Nm
function call appeared in Foo OS version 1.0.
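
For a real operating system, the plain text would give way to that system's macro and version. The following is only a sketch; 4.4BSD is picked purely for illustration.

```mdoc
.\" Sketch: the Bx macro with a version argument, in place of
.\" the literal "Foo OS version 1.0" text.
.Sh HISTORY
The
.Nm
function call first appeared in
.Bx 4.4 .
```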

Let's put all of these sections together and preview the output.

NAME

hello — print greeting messages

SYNOPSIS

#include <hello.h>

int
hello(int C, const char *prefix);

DESCRIPTION

The hello function prints out a greeting message.

It accepts a value C, which if non-zero indicates output should be uppercase; and prefix, which, if not NULL, shall be prefixed to the output. The prefix argument, if not NULL, must be nil-terminated.

RETURN VALUES

The hello() function returns the value 0 if successful; otherwise the value -1 is returned and the global variable errno is set to indicate the error.

ERRORS

EFAULT
prefix points outside the allocated address space.

HISTORY

The hello function call appeared in Foo OS version 1.0.

We can make sure the manual is complete by reviewing the checklist for function documentation.

First we implied linking information by using category two (which does not need to be specially linked). Then we introduced the calling syntax of the function, naming its arguments. We also stipulated the necessary header files. In the DESCRIPTION, we described the function and its arguments in full. Lastly, we documented return values in the RETURN VALUES section and the errors set in ERRORS.

We also added a HISTORY section, which isn't mentioned as part of our checklist but is considered good practise for system calls. In general, a note on historical information is useful to put your component in the general context of related machinery.

Case Study

I now introduce a case study of a real-world system call manual, in particular the manual for the fsync function from OpenBSD. The original file may be viewed on-line at src/lib/libc/sys/fsync.2, file version 1.9.

.\" $OpenBSD: fsync.2,v 1.9 2011/04/29 07:12:44 jmc Exp $
.\" $NetBSD: fsync.2,v 1.4 1995/02/27 12:32:38 cgd Exp $
.\"
.\" Copyright (c) 1983, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)fsync.2 8.1 (Berkeley) 6/4/93

The cvs identifiers (both from the current system, OpenBSD, and the import source system, NetBSD), copyright, license, and sccs identifier (from the original system) are presented in the usual way: the $OpenBSD$ and $NetBSD$ lines are automatically updated by the revision control system, cvs, whenever an update to the file is committed. The line following is the copyright message, and following that is the text form of the BSD license.

.Dd $Mdocdate: April 29 2011 $
.Dt FSYNC 2
.Os

The manual's last-modified date is maintained with the automatically-updated $Mdocdate$ sequence. Its title is the function name in capitals, and its category is 2, for system calls. The Os macro has no arguments, so the manual applies to the current operating system.

.Sh NAME
.Nm fsync
.Nd "synchronize a file's in-core state with that on disk"

The Nd macro's arguments are superfluously quoted again.

.Sh SYNOPSIS
.Fd #include <unistd.h>
.Ft int
.Fn fsync "int fd"

Again, in historical manuals, Fd is sometimes used instead of the modern In macro. Note also the inclusion of the function argument's name, fd, where regular C prototypes would usually only include the type.
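
As a sketch of the modern form, the same SYNOPSIS can be written with In, which takes the header name without the #include directive or the angle brackets:

```mdoc
.Sh SYNOPSIS
.In unistd.h
.Ft int
.Fn fsync "int fd"
```

Formatters are expected to render the In line as an #include <unistd.h> directive, so the output should match that of the Fd version.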

.Sh DESCRIPTION
.Fn fsync
causes all modified data and attributes of
.Fa fd
to be moved to a permanent storage device.
This normally results in all in-core modified copies
of buffers for the associated file to be written to a disk.
.Pp
.Fn fsync
should be used by programs that require a file to be in a known state,
for example, in building a simple transaction facility.

Since fsync is a simple function, its description is fairly straightforward. The single function argument fd is fully described as well.

.Sh RETURN VALUES
A 0 value is returned on success.
A \-1 value indicates an error.

This is not correct, as it omits information on the errno global error being set. The Rv macro should be used instead.
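
A minimal sketch of the suggested fix, assuming the formatter supports the -std flag of Rv:

```mdoc
.Sh RETURN VALUES
.Rv -std fsync
```

This should produce the standard sentence seen earlier for hello: a return value of 0 on success, and -1 with errno set on failure.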

.Sh ERRORS
The
.Fn fsync
fails if:
.Bl -tag -width Er
.It Bq Er EBADF
.Fa fd
is not a valid descriptor.
.It Bq Er EINVAL
.Fa fd
refers to a socket, not to a file.
.It Bq Er EIO
An I/O error occurred while reading from or writing to the file system.
.El

Most (if not all) system calls set the errno global error upon failure. This, erroneously, was not mentioned in the RETURN VALUES section, but is documented here.

.Sh SEE ALSO
.Xr sync 2 ,
.Xr sync 8
.Sh HISTORY
The
.Fn fsync
function call appeared in
.Bx 4.2 .

Note that the cross-references in SEE ALSO are ordered first by section, then alphabetically. The Bx is referenced as the origin of the system call. The STANDARDS section is sorely missing, as fsync is a function specified by the POSIX.1-2008 standard.
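
One way to fill that gap is sketched below. The wording is only a suggestion, and it assumes the formatter recognises the -p1003.1-2008 argument to St. By the conventional section order it would sit between SEE ALSO and HISTORY.

```mdoc
.Sh STANDARDS
The
.Fn fsync
function conforms to
.St -p1003.1-2008 .
```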

We again found several inconsistent uses of mdoc in this case study. Let this serve as a reminder that if you find bad or unusual mdoc in your manuals, notify the authors! A bug in a manual is just as important as a bug in the code.

Manual Syntax and Structure

In the last part, I introduced some mdoc language syntax by way of example. We covered Commands and Functions. In this part, I'll study the structure of the UNIX manual itself.

Historically, the syntax and structure of mdoc derive from roff, a text processing language predating even UNIX. mdoc was in fact a bundle of macros expanded by a formatter into roff, not a separate language. Only recently has mdoc been mature enough to be considered a standalone language.

The general syntax of roff (and thus mdoc) can be traced to the RUNOFF command from the mid-sixties! The conventions of section names and manual categories were formalised later, in the early seventies, with the Version 1 AT&T UNIX Programmer's Manual.

Although the focus of this book is obviously on mdoc, many of its idiosyncrasies derive from roff, so we'll spend some time discussing seemingly-unnecessary complexity in the context of general text processing.

I reiterate that this is not a canonical mdoc reference: mdoc is not a standard, and varies in subtle ways across formatters and operating systems. In this part, I'll discuss only the portable parts of mdoc.

Syntax

Before studying the structure of mdoc manuals, let's review the language we've seen so far. Foremost, we've noticed that mdoc documents consist only of printable ASCII characters. We noted that a period at the beginning of a line indicates an mdoc macro:

.Qq hello, world

It's safe to say, in this case, that mdoc is line-oriented in that program flow is in part governed by position on a line. In the case of Qq, we saw how the macro extends to the end of the line. This is also the first notion of scope, specifically scoping to the end of line. We then saw examples where scope covers multiple lines and accommodates nested macros as well as text.

.Sh DESCRIPTION
The
.Nm
utility...

We were briefly introduced to the concept of macros accepting flags and flag arguments.

.Bl -tag -width Ds
.It List key.
List value.
.El

Finally, we noted that double-quotes have special semantic significance, which led to the topic of escaped terms such as \(dq for a double-quote character. We also saw how punctuation is treated in special ways when lying at line boundaries.

End of sentence, end of line.
Same goes with
.Em macros .

In this chapter, we'll formalise these concepts. I'll draw my terminology from the literature of formal languages and grammar, but it's not necessary to be familiar with the terms beforehand.

Input Encoding

Without exception, a well-formed mdoc document consists only of ASCII printable characters, the space character, the newline character, and in some cases the tab character. Most modern formatters allow for CR+LF newlines (\r\n), but this is not portable. Modern formatters also accommodate unlimited line length; this is not necessarily the case for legacy formatters.

The backslash \ is always interpreted as the beginning of an escape sequence. If a backslash immediately precedes a newline, it escapes the newline, so the next line continues the current one:

.Em This is considered one \
line of input.

Macro Line

Formally speaking, a macro line is one beginning with a control character. In mdoc, this is traditionally the . character, although historical documents may also use the ' character. This notation extends back to the historical RUNOFF utility.

Control Words:

Input generally consists of English text, 360 or fewer characters to a line. Control words must begin a new line, and begin with a period so that they may be distinguished from other text. RUNOFF does not print the control words.

A line with only a control character followed by zero or more whitespace characters is stripped from input.

A macro line may, in some circumstances, contain more macros. The first macro — the one following the control character — may then be distinguished as the line macro.

On macro lines, the following non-alphanumeric characters are syntactically meaningful. These characters are collectively called reserved characters.

!	punctuation
"	control character (quotation)
(	punctuation
)	punctuation
,	punctuation
-	control character (macro argument)
.	punctuation
:	punctuation
;	punctuation
?	punctuation
[	punctuation
\	control character (escape sequence)
]	punctuation
|	punctuation

To pass these characters along as literal text, you must either escape or quote them.

If an unescaped space character is encountered on a macro line, it is used to delimit macros, macro arguments, and flags. Multiple consecutive space characters have no effect on output.

.Em Hello, World
.Em Hello,      World

The spaces between Hello, and World delimit arguments in this case, and both lines produce the same output, Hello, World, without extra spaces.

Text Line

A text line is any line not beginning with a control character. Text lines are never parsed for macros and may consist of any printable ASCII characters. Text lines are concatenated together when forming output, so, except in certain circumstances, newlines are stripped from input. Using a blank text line as a vertical separator is not portable.

If a space character is encountered on a text line, it is reproduced verbatim in the output.

Hello, World
Hello,      World

The spaces between Hello, and World will be reproduced in both cases as-is. However, it is considered non-portable to use spaces on a text line to shape output: HTML, for example, by default collapses whitespace. Secondly, consider whether controlled spacing between text in an otherwise free-form text sequence is appropriate. In most space-retaining cases, such as in source code examples, you're better off with a literal display mode, as covered at the end of this section.

Do not use the space-retaining feature to create double-spaces following a sentential period! See Sentential Punctuation for how to do this properly.

If the first character of a text line is a space character, the output line shall be preceded by a newline. This creates the effect of an implicit literal display.

Hello, World.
    The newline, leading spaces, and   in-line spacing are retained.
This is free-form text.

The portability of this behaviour is unknown. For greater portability (and semantic annotation), a literal display mode should be opened instead with, for example, Bd -literal:

Hello, World.
.Bd -literal -compact
    The newline and leading spaces are retained.
.Ed
While this is not.

In this example, the compact flag prevents leading vertical space. To effect a vertical space following the literal display, use a Pp.

Consider the following example:
.Bd -literal
int a_function(int *foo, int bar) {
    *foo += bar;
    return *foo;
}
.Ed
.Pp
This is subsequent text.

Escape Sequences

An escape sequence is any grouping of characters following a backslash \. This may happen anywhere in input. What follows the escape sequence syntactically depends upon the first letter. The following sections describe common escape sequences. The use of any other sequence is strongly discouraged for portable manuals; in fact, the use of any escape beyond \& should be strongly avoided: it makes manuals in different output formats inconsistent depending on their methods of glyph rendering.

Special Characters

Special characters allow the encoding of non-ASCII characters and, in macro lines, the use of reserved characters. Special characters may be invoked anywhere in input.

There are three forms of special character, distinguished by the number of letters in the sequence.

\n	one-letter
\(nn	two-letter
\[N]	n-letter

The n-letter form may be used to express any of the others. For example, \& (a zero-width space) is equivalent to \[&]. The most common escape sequence is in fact \&, a non-printing, zero-width space. When preceding a word, it automatically causes it to be rendered as regular text:

The following flags are also macros:
.Fl \&Ar

If the Ar were not preceded with an escape, it would have been interpreted as the Ar macro instead of the flag Ar. An alternative to this is to quote the argument (see Quotation). The zero-width escape is found more readily in literal contexts beginning with a period, such as:

.Bd -literal
\&.Fl Ar
.Ed

Predefined Strings

An alternative form of special character is the predefined string. These are legacy roff constructs of an escape sequence that may be programmatically set or unset. The syntax for predefined strings follows:

\*n	one-letter
\*(nn	two-letter
\*[N]	n-letter

The use of predefined strings is discouraged in portable manuals, as available strings may differ between implementations and formatters.

Comments

Comments — words in an mdoc document not interpreted by the formatter — are indicated by the special character \".

Regular text. \" In a comment.
.Em A macro . \" Another comment.

The comment extends from the special character to the end of the line. If the newline is escaped, the comment only applies to the current line. In other words, the newline escape is commented.

Not in a comment, \" in a comment \
Not in a comment.

A comment may span an entire line if it's specified as a pseudo-macro, that is, following the control character (.).

.\" This is a full-line comment.

Macros

So far we've considered several different types of macros. A macro is usually a terse, two or three-character sequence specified on a macro line. In this section I formalise the various types of macros, categorised by their scope rules. As with many other languages, macro instructions are either scoped to one line (following a single instruction), which I call in-line; or to multiple lines (bracketed between instructions), which I call blocked. Block macros are usually invoked on a line of their own, as with Bd, but may also be invoked within a line.

Generally speaking, a macro is syntactically defined as having a macro name and, optionally, flags with optional flag arguments. The arguments to a macro depend on its scope rules.

Name Flag Arg

The hyphen - indicates a macro flag only when the preceding macro accepts arguments.

In-line Macro

An in-line macro's arguments are scoped to the current line. Its scope may also be closed out by subsequent macros: an in-line macro can never contain a nested macro. For a complete reference, see In-Line macros in the mdoc reference.

Name Flag Flag Arg Arg...

Not all in-line macros accept arguments, and some in-line macros accept a fixed number of arguments.

For example, the regular way of structuring command-line arguments, as described initially in the Elaborate Function guide, is a command flag, followed by flag arguments, followed by regular arguments. We can put most invocation forms on one line as follows.

.Op Fl W Ns Ar level

In this example, Ar, Fl, and Ns are in-line macros. The Op is a block partial implicit macro. The Fl macro opens within the Op and is closed by the Ns, which accepts no arguments at all. This suppresses the space between the flag and its arguments (this alternative style is used at times, but discouraged). The arguments are opened by Ar and close at the end of the line.

The following is an example of macros with a fixed number of arguments:

.Xr mandoc 1 Ap s

The Xr macro accepts the mandoc and 1 arguments, then reverts to accepting text. The Ap accepts no arguments, so it immediately reverts to the trailing text.

Finally, an example of an in-line macro accepting flags follows:

.St -ansiC

The -ansiC flag to St specifies the standard to be printed.

Block Partial Implicit

A block partial implicit macro is similar to an in-line macro in that its scope is restricted to the current line. It is implicit in that it is closed by the end of line (as opposed to block partial explicit macros) and partial in that it only extends to the current line (as opposed to block full implicit macros). Unlike an in-line macro, it accepts nested macros (hence block macro). For a complete reference, see Block Partial Implicit macros in the mdoc reference.

Name Flag Flag Arg Arg|Macro...

The scope of a block partial implicit macro is always closed by the end of the line; any macros between it and the end of line are interpreted as nested macros. We began this book with the block partial implicit macro Qq. The nested qualities of this macro category may be seen by nesting Qq within Pq:

.Pq Qq Parenthesised quotation .

Be warned. If you open but do not close a block partial explicit macro before the end of the line, behaviour is not always well-defined as the scope is broken.

Block Full Implicit

A macro seen early on, the Sh macro, is block full implicit. Unlike block partial implicit macros, these consist of multiple lines (they are blocks) and treat the line arguments and multi-line arguments differently (full). For a complete reference, see Block Full Implicit macros in the mdoc reference.

.Begin Flag Flag Arg Arg...
Arg...

The scope of Begin is closed out implicitly — by one of several possible macros or the end of file. The notion of a full macro is obvious when considering Sh:

.Sh SECTION 1
Sectional text.

.Sh SECTION 2
Sectional text.

In this, the macro must separately decorate its line arguments and multi-line arguments. In this case, the line arguments must be bolded while the multi-line arguments must be indented. The Sh macro is closed out by a subsequent Sh or the end of file. Compare this to Ss, which closes out with a subsequent Sh, Ss, or end of file.
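
These closing rules can be sketched with a short fragment. The subsection titles and the text are invented for illustration.

```mdoc
.Sh DESCRIPTION
Text belonging to the section.
.Ss First Subsection
Text belonging to the first subsection.
.Ss Second Subsection
Opening this Ss closed the first subsection.
.Sh CAVEATS
Opening this Sh closed both the second subsection and DESCRIPTION.
```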

Block Partial Explicit

The simplest multi-line macro is the block partial explicit, which is opened and closed by two separate (explicit) macros. It is partial because it does not differentiate between arguments on the current line or subsequent lines, as opposed to block full explicit macros. The pair of macros involved in an explicit block macro are called the beginning and ending macros. For a complete reference, see Block Partial Explicit macros in the mdoc reference.

.Begin Flag Flag Arg Arg...
Arg...
.End

One must be careful, in explicit block macros, not to break the scope of other block macros, or behaviour is undefined.

We have not yet considered a block partial explicit macro pair in this book. But we can do so by considering Oo and Oc. This pair of macros, for option open and option close, extend the behaviour of Op to multiple lines.

.Fl W Oo
warn|error|fatal
.Oc

Block Full Explicit

The block full explicit macros are full in the sense that arguments on the macro line and arguments following are treated differently (like block full implicit macros). The earliest example of this is the Bl. These macros are explicitly closed by a closing macro and may contain nested macros. For a complete reference, see Block Full Explicit macros in the mdoc reference.

Consider the Bd macro, which does not accept line arguments (most block full explicit macros do not accept line arguments). It is manually closed by Ed.

.Bd -ragged -offset indent

Display text.
More display text.

.Ed

In this example, the text between the Bd and Ed is treated specially.

Punctuation

The mdoc language, in descending from the type-setting language roff, has significant type-setting capabilities. Punctuation is treated specially in all mdoc documents, both in terms of macro and text lines.

The following characters are considered punctuation:

!	ending sentence
"	ending enclosure
(	opening enclosure
)	ending enclosure
,	ending
.	ending sentence
:	ending
;	ending
?	ending sentence
[	opening enclosure
]	ending enclosure
|	intervening

These are treated specially by the formatter when used in macro lines and at the end of text lines.

Sentential Punctuation

End of Sentence, End of Line.

The end of a sentence should always be at the end of a line. This way, the formatter can recognise a sentence by the punctuation used and insert the correct amount of spaces. If supported by the output media (HTML, for example, does not), all modern mdoc formatters use English spacing to mark sentence boundaries. The ending sentence punctuation in the punctuation table marks an end of sentence.

In text lines, sentence punctuation should always occur at the end of the line.

End of sentence.
End of line.
("Even with nested sentences.")

Note, in the last sentence, that the formatter will recognise sentence punctuation even when followed by ending enclosure punctuation as noted in the punctuation table.

However, take care that non-sentence punctuation, such as for abbreviations, does not happen to fall at the line boundary.

Paging Dr.
Freud.

In this case, the formatter will interpret Dr. as ending a sentence. In this event, you can either restructure your line or add a zero-width escape following the period.

Paging Dr.\&
Freud.

Macro lines are slightly more complicated. The same rules apply, but punctuation marks must be separated by spaces. The formatter will understand the role of the punctuation and remove the spaces accordingly, or reorder sentence and closing punctuation.

Text (parenthesised
.Em text ) .
.Qq Properly period-closed quotation .

The punctuation may be escaped by either a trailing escape, as in the text case, or a preceding escape. In this case it is not considered punctuation, but regular text. Note that this will also cause an intervening space to be printed.

.Em End of sentence .
.Em Not end of sentence \&.
.Em Not end of sentence .\&

Regular Punctuation

Non-sentential text line punctuation (commas, parentheses, quotes, and so on) is a matter of literal printing.

Some text (punctuation), another "clause".

The rules for macro lines are the same but for in-line macros, which might decorate individual terms with text. In this case, punctuation given as a standalone argument is treated specially: it is not decorated, and whitespace is removed according to the punctuation type (opening, closing).

.Em ( Nicely spaced and decorated . )
.Em (All text decorated, no end-of-sentence.)
.Em ( Text alright , excepting the period \&. )

In the second example, (All and end-of-sentence.) are considered arguments, and thus not accommodated for in terms of punctuation. In the third, the period is escaped and thus considered regular text.

Quotation

Several times I've mentioned how to interpret macro arguments as text — instead of, say, other macros — by quotation. In this section, I formalise the notion of quoting arguments. The issue of quotation is fairly complex owing to mdoc's predecessor, roff.

In short, quoting arguments to macros passes the enclosed text verbatim as a single argument. An obvious case follows:

.Fl "Ar"

By quoting Ar, it is passed verbatim to Fl. If not, it would be interpreted as the macro Ar and open a new macro scope. What's worse is that the syntax is entirely legal! This illustrates a minor short-coming of mdoc: beginners may unwittingly invoke macros (such as Ar in our example). Printing a warning would cause more harm than good with well-formed manuals; thus, it's the responsibility of the document author to double-check that macro instructions are properly treated.

This condition could have been avoided by beginning the argument Ar with a zero-width escape, such as \&Ar. The need for quotation is more obvious with the Fn macro:

.Fn int foo int bar

The syntax of Fn is that it first accepts an optional function type, then a function name, then arguments to the function. These arguments usually include a type followed by a name. In our example, int refers to the function type, foo to the name, and both int and bar as separate arguments.

Our intention, however, was to have int bar considered a single argument. To do so, we would need to quote.

.Fn int foo "int bar"

The int bar argument is now passed intact to the macro.

To include quotation marks in quoted text, use two quotation marks in a row.

.Li """    "

This artificial invocation passes a quotation mark followed by four spaces to the Li macro. It is, however, unwise to use this language component: it's jarring to those expecting symmetric quotes, and easy to mis-type, leaving runaway quotes. It's safer to use an escape, such as \(dq, instead of pair-wise quotations.
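
For comparison, a sketch of the same invocation using the \(dq escape in place of the doubled quotation mark:

```mdoc
.Li "\(dq    "
```

Here the only literal quotation marks are the outer pair delimiting the argument, so an unbalanced quote is easier to spot.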

Structure

An mdoc manual is divided into two logical parts: the prologue and the document body.

.\" Prologue follows:
.Dd May 26 2011
.Dt MDOC 7
.Os
.\" Document body follows:
.Sh NAME
.Nm mdoc
.Nd mdoc language reference
.Sh DESCRIPTION
The
.Nm mdoc
language is used to format
.Bx
.Ux
manuals.

The prologue specifies information regarding the manual's classification. For the most part, this information does not change over the course of development. It specifies the manual's title (which may encompass multiple documented components) and category, the date of last editing, and other information.

.Dd May 26 2011
.Dt MDOC 7
.Os

The document body consists of the documentation content. This material changes over the course of development, and is the bulk of the manual page. It minimally consists of the component name, invocation syntax (if applicable), and a description of operation.

.Sh NAME
.Nm mdoc
.Nd mdoc language reference
.Sh DESCRIPTION
The
.Nm mdoc
language is used to format
.Bx
.Ux
manuals.

Prologue

The prologue consists at most of the Dd, Dt, and Os macros. These always occur at the beginning of a manual.

.Dd May 26 2011
.Dt MDOC 7
.Os

The only firm requirement of the mdoc prologue is that the Dd macro comes first: many formatting systems will read up to the first macro to determine the formatting language. If Dd is not encountered first, the mdoc format may not be recognised.

Following the Dd, the prologue is conventionally ordered as first Dt and then Os. The Os macro is usually left without arguments, meaning that the manual applies to the current system.

After parsing the document prologue, the following is known:

The date of last modification.
The canonical title of the manual.
The manual category (manual section).
Whether the manual relates to a particular hardware architecture.
The relevant operating system.

Date

The date is specified by the Dd macro.

.Dd date

While no particular date format is required, it's best to use the month day, year format, where month is the month in English; day is the day of month; and year is the four-digit year. Arbitrary white-space may separate the tokens, which may also be quoted.

Example of canonical form:

.Dd June 03, 1991

Example of the non-zero-padded form:

.Dd June 3, 1991

Example of quoted-string form:

.Dd "June 3, 1991"

All of the above examples will normalise to the third of June, 1991. It's especially important that the month be in English, as not all operating systems support localisation.

Some formatters also support a special date format as follows:

.Dd $Mdocdate: October 10 2021 $

This is usually used in conjunction with source-code control systems that automatically change the date. Consult your formatter's manual for whether it supports this feature.

Title

A manual's title identifies the entire manual document. It is always specified in uppercase as the first argument of the Dt macro, which conventionally follows the initial Dd macro.

.Dt TITLE category architecture

The title usually corresponds to the file-name of the document, but this is not necessarily the case.

In the case of a single-component manual, such as the manual for a single UNIX command or programming function, the title corresponds to the manual name as specified with the SYNOPSIS Nm macro argument.

In the event of multiple components, such as a programming library, the title usually corresponds to the library name. If multiple commands are specified, such as with aliased names, the canonical form should be used.

Example of a title for the ls utility:

.Dt LS 1

Example of a title for the libgreeting function library, consisting of the hi and hello functions:

.Dt GREETING 3

If the title is left unspecified by omitting the Dt macro, behaviour is undefined. Usually a formatter will default to an empty string or LOCAL. In general, however, a manual without Dt may be considered incomplete.

Architecture

Some manuals, especially those in category 4 or 9, relate only to a particular hardware architecture. The architecture is a useful specifier for the machine-dependent manuals of category 9.

These use the optional third argument of the Dt macro.

.Dt TITLE category architecture

For a list of possible architectures, consult your local documentation. Safe examples are i386, for 32-bit x86-based systems, and amd64, for 64-bit x86-based systems.
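
A sketch of a title line carrying an architecture follows. The device name exampledev is hypothetical; only the position of the i386 argument matters here.

```mdoc
.Dt EXAMPLEDEV 4 i386
```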

A device manual that applies only to a particular architecture uses this argument to note it explicitly. In normal manuals, this should not be used.

Operating System

Similar to architecture, some manuals only pertain to a particular operating system. This system may be specified to the Os macro of the prologue.

.Os system

If system is unspecified, the formatter assumes the current operating system.

This form is useful when multiple operating systems have access to local-network administrative manuals, such as in a networked file-system environment. Otherwise, it is rarely used.
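
As an illustration, a prologue naming its system explicitly might look like the following sketch, which borrows the fictional Foo OS from the earlier hello example.

```mdoc
.Dd May 26, 2011
.Dt HELLO 2
.Os Foo OS 1.0
```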

Document Body

The document body begins with the first macro not in the prologue set (Dd, Dt, and Os). The document body consists of the manual content itself, and varies significantly between categories and, of course, the material itself.

.Sh NAME
.Nm mdoc
.Nd mdoc language reference
.Sh DESCRIPTION
The
.Nm mdoc
language is used to format
.Bx
.Ux
manuals.

The content of the document body is divided into sections. Sections are indicated by the Sh macro.

.Sh SECTION NAME
Text within the section...

As described in the introduction, a section consists of its line arguments and all subsequent lines until the end of file or another Sh macro.

By convention, Sh arguments are capitalised. I'll describe conventional sections at length in the next chapter, as they for the most part follow long-standing document conventions.

In general, the document body requires at least the NAME and DESCRIPTION sections, and usually the SYNOPSIS section as well. The first section must be NAME, optionally followed by SYNOPSIS. The DESCRIPTION section must follow either the NAME or SYNOPSIS.

Layout

An mdoc document body is divided into sections. The names and ordering of these sections are dictated by convention extending back to the Version 1 AT&T UNIX Programmer's Manual.

The name section repeats the entry name and gives a very short description of its purpose.

The synopsis summarizes the use of the program being described. A few conventions are used, particularly in the Commands section. Underlined words are considered literals, and are typed just as they appear. Square brackets ([]) around an argument indicate that the argument is optional. When an argument is given as name, it always refers to a file name. Ellipses ... are used to show that the previous argument-prototype may be repeated. A final convention is used by the commands themselves. An argument beginning with a minus sign - is often taken to mean some sort of flag argument even if it appears in a position where a file name could appear. Therefore, it is unwise to have files whose names begin with -.

The description section discusses in detail the subject at hand.

The files section gives the names of files which are built into the program.

A see also section gives pointers to related information.

A diagnostics section discusses the diagnostics that may be produced. This section tends to be as terse as the diagnostics themselves.

The bugs section gives known bugs and sometimes deficiencies. Occasionally also the suggested fix is described.

The owner section gives the name of the person or persons to be consulted in case of difficulty. The rule has been that the last one to modify something owns it, so the owner is not necessarily the author.

These conventional sections haven't changed much over the years, although more sections have been added and several have changed with evolving UNIX operating system conventions. The full set of modern sections, and their order, is as follows.

NAME: Name of all documented components and a collective description.
SYNOPSIS: Calling syntax of the components.
DESCRIPTION: Description of all components. This constitutes the bulk of the manual.
IMPLEMENTATION NOTES: Specific notes on the implementation of a generic (e.g., standardised) component.
RETURN VALUES: Return values, if the components are functions.
ENVIRONMENT: Environment variables affecting the components' operation.
FILES: Files affecting the components' operation.
EXIT STATUS: Exit status, if the components are commands.
EXAMPLES: Brief examples of invocation.
DIAGNOSTICS: Error conditions, if a command or device driver.
ERRORS: Error conditions, if a function or library.
SEE ALSO: Links to other relevant manuals or references.
STANDARDS: Implemented or referenced standards.
HISTORY: A brief history of the components.
AUTHORS: The authors of the components.
CAVEATS: Caveats regarding the components' operation.
BUGS: Known bugs in the components.
SECURITY CONSIDERATIONS: Security precautions beyond the scope of the components.

Only the NAME and DESCRIPTION sections are required in the document body, although a SYNOPSIS should appear for most manuals as well.

Other sections may be necessary depending on the category. For example, RETURN VALUES is found in most category 2 and 3 manuals, while EXIT STATUS is found in most category 1, 6, and 8 manuals.

Required Sections

As discussed previously, a section is begun by the Sh macro and continues until the end of file or another section.

.Sh SECTION NAME
Text and macros within the section.

What follows is a description of each required section: if your manual does not have a section documented here, it should not be considered well-formed. Do note, however, that some types of manuals lack the SYNOPSIS section.

NAME

The NAME section immediately follows the document prologue and is thus usually the first macro of the document body. It specifies the name of each documented component, and provides a brief description of the components as a whole.

The following is an example of the NAME section for a single utility, foo.

.Sh NAME
.Nm foo
.Nd print a simple greeting

The Nd macro should consist of a single line without trailing punctuation or leading capitalisation. As a rule of thumb, this description should be a sentence clause in the imperative mood for commands and functions, or simply a noun phrase for file formats, devices, and miscellanea.

Example imperative:

.Nm foo
.Nd print a simple greeting

Example noun phrase:

.Nm mdoc
.Nd mdoc language reference

In the event of multiple named components, such as a function library or aliased commands, follow each name except the last with a comma. It's common to order this listing alphabetically.

.Nm hello ,
.Nm hi
.Nd print greetings

Note that the punctuation should be separate from the macro argument. This allows the formatter to distinguish between the name and trailing punctuation.

SYNOPSIS

The SYNOPSIS section, if specified, follows the NAME section. It specifies the calling syntax of a component; thus, it is necessary for functions and commands. The SYNOPSIS section has a layout dictated by convention, and depends upon the category.

Commands

For command manuals in category 1, 6, and 8, each command must have its full syntax stipulated.

.Nm hello
.Op Fl a
.Op Fl o Ar output
.Op Ar prefix

This defines three optional arguments for the hello command: a flag, a flag with an argument, and an argument. Flags should be ordered alphabetically, without regard to whether a flag takes an argument. Arguments should also be alphabetised.

Note that if your manual only documents one component, it's unnecessary to repeat the command name for Nm. If omitted, the name given to the first Nm in the NAME section will be used.
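
A minimal sketch of this shorthand for the single-command case, reusing the hello example: the Nm in SYNOPSIS takes no argument and falls back to the name given in NAME.

```mdoc
.Sh NAME
.Nm hello
.Nd print a simple greeting
.Sh SYNOPSIS
.Nm
.Op Fl a
.Op Fl o Ar output
.Op Ar prefix
```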

Multiple commands are specified in the order they appear within the NAME section.

.Nm hello
.Op Fl a
.Op Fl o Ar output
.Op Ar prefix
.Nm hi

Since there are multiple Nm macros in the NAME section, it's necessary that we specify the name of each command. In this example, an additional command hi is specified, which has neither flags nor arguments.

Functions

Function libraries are more complicated, as they involve more diverse content. A function library SYNOPSIS section consists of all documented material, including header files, functions, variables, macros, and so on.

A minimum function manual consists of a single function call and the header file of its prototype (if in a language requiring header files, such as C):

.In greeting.h
.Ft int
.Fo hello
.Fa "int C"
.Fa "const char *output"
.Fc

The header file comes before those functions it describes. If more than one header file is required, list them in the order of inclusion in source files.

.In sys/types.h
.In greeting.h
.Ft int
.Fo hello
.Fa "u_int C"
.Fa "const char *output"
.Fc

If multiple functions are documented, list them in the order they appear in the NAME section.

.In sys/types.h
.In greeting.h
.Ft int
.Fo hello
.Fa "u_int C"
.Fa "const char *output"
.Fc
.Ft void
.Fn hi

List any global variables with the Vt and/or Va macro following function prototypes.

.In sys/types.h
.In greeting.h
.Ft int
.Fo hello
.Fa "u_int C"
.Fa "const char *output"
.Fc
.Ft void
.Fn hi
.Vt extern const char * const * greetings;

Macro definitions, however, should come before the function prototypes. These use the Fd macro and must include the preprocessor directive for the macro.

.In sys/types.h
.In greeting.h
.Fd #define GREETING
.Ft int
.Fo hello
.Fa "u_int C"
.Fa "const char *output"
.Fc
.Ft void
.Fn hi
.Vt extern const char * const * greetings;

Some manuals define a range of functions with differing header dependencies. In general it's not a good idea to group these within the same manual. However, if necessary, arrange the functions and variables underneath the In macros of their header files. These need not necessarily match the NAME section ordering, but should be as close as possible.
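
As a sketch of such a grouping, assume a second, hypothetical header farewell.h declaring a hypothetical function bye; neither appears elsewhere in this book. Each function sits beneath the header that declares it.

```mdoc
.Sh SYNOPSIS
.In greeting.h
.Ft int
.Fo hello
.Fa "int C"
.Fa "const char *output"
.Fc
.In farewell.h
.Ft void
.Fn bye void
```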

DESCRIPTION

This section documents the component itself, and is usually the longest. For commands, each command is described in detail along with its arguments. For functions, each function must be described along with its types and arguments.

Commands

A command or set of commands is documented in DESCRIPTION with a brief explanation of behaviour, default usage, then a list of arguments. Some utilities state default usage following the argument list; however, manpages beginning with these statements are more readable and economical.

The
.Nm
command prints a mixed-case greeting to standard output.
.Pp
The arguments are as follows:
.Bl -tag -width Ds
.It Fl a
Whether to uppercase the output.
.It Fl o Ar output
A file into which output should be written.
.It Ar prefix
A string prefixed to the output.
.El

If multiple commands are included, they should be described in the order they appear in NAME and SYNOPSIS. In this case, remember to name the command being documented whenever invoking the Nm macro. Command exit statuses are documented in EXIT STATUS.
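
A short sketch of a DESCRIPTION covering both hello and hi, with every Nm naming its command explicitly. The wording describing hi is an assumption made for illustration.

```mdoc
.Sh DESCRIPTION
The
.Nm hello
command prints a mixed-case greeting to standard output.
.Pp
The
.Nm hi
command prints a shorter greeting and accepts no arguments.
```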

Functions

Functions do not share the "The arguments are as follows" convention that commands enjoy. Most often, a function is described in paragraph form.

The
.Fn hello
function prints a greeting to standard out.
If
.Fa C
is non-zero, output is upper-cased.
If
.Fa prefix
is non-NULL, it is prefixed to the output.

A function with many arguments, or complicated arguments, may be better documented with the same listed-argument notation as commands.

The
.Fn hello
function prints a greeting to standard out.
The arguments are as follows:
.Bl -tag -width Ds
.It Fa "C"
If non-zero, output is upper-cased.
.It Fa "prefix"
If non-NULL,
.Fa prefix
is prefixed to the output.
.El

Above all, you must be careful to document each argument to each function. Function return values are usually documented in RETURN VALUES.

Optional Sections

As discussed previously, a section is begun by the Sh macro and continues until the end of file or another section.

.Sh SECTION NAME
Text and macros within the section.

What follows is a description of each optional section. An optional section is not required for a well-formed manual, but may be necessary for a manual of a given type. For example, the EXIT STATUS section is not required, but is necessary for utilities.

IMPLEMENTATION NOTES

For components describing an algorithm, or implementing a generic interface, it's at times useful to document the implementation itself. As this is not relevant to the calling syntax or description of a component, it is relegated to the IMPLEMENTATION NOTES section.

Consider a simple sorting function, mysort.

.Sh SYNOPSIS
.In mysort.h
.Ft void
.Fn mysort "int *input"
.Sh DESCRIPTION
The
.Fn mysort
function in-place sorts an integer array
.Fa input .
.Sh IMPLEMENTATION NOTES
The
.Fn mysort
function uses a bubble-sort algorithm for sorting and thus operates in
O(n^2) time with respect to input size.
Since swapping is in-place, a constant number of allocations occur.

In general, IMPLEMENTATION NOTES is not used, and is thus rarely found in UNIX manuals.

RETURN VALUES

Manuals describing functions (categories 2, 3, and 9) must use the RETURN VALUES section to document each function's return value. If a manual documents functions in a language without return values, or functions do not return a value, they need not use this section.

System calls (category 2) usually employ the Rv macro to stipulate a standard return value statement.

.Sh RETURN VALUES
.Rv -std

Note that the -std flag is a required argument to Rv, for historical reasons.

For non-system functions, be as brief as possible.

.Sh RETURN VALUES
The
.Fn hello
function returns zero on success and non-zero on failure.

ENVIRONMENT

Both commands and functions may be affected by UNIX environment variables. These must be documented in the ENVIRONMENT section of the manual. Each variable should be listed with the Ev macro along with its effect on the component.

.Sh ENVIRONMENT
.Bl -tag -width Ds
.It Ev TZ
The time zone used when displaying dates.
.El

Some historical manuals use ENVIRONMENT VARIABLES instead of ENVIRONMENT.

FILES

Both commands and functions may also be affected by files, although this is mainly the purview of commands. These files should be listed in the FILES section in a tagged list.

.Sh FILES
.Bl -tag -width Ds
.It Pa ~/.profile
User's login profile.
.El

EXIT STATUS

This section is the dual of RETURN VALUES for commands in categories 1, 6, and 8. It documents the exit status of commands.

If your utility exits with zero on success and greater than zero on failure (the standard for most utilities), use the Ex macro.

.Sh EXIT STATUS
.Ex -std

More complex commands should document all possible exit statuses.
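
As an illustration only, a command with several distinct exit statuses might list them in a compact tagged list. The values and their meanings below are invented for the hello example.

```mdoc
.Sh EXIT STATUS
The
.Nm
utility exits with one of the following values:
.Pp
.Bl -tag -width Ds -compact
.It 0
The greeting was printed.
.It 1
The output file could not be written.
.It 2
The command line was malformed.
.El
```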

EXAMPLES

In many situations of casual use, the EXAMPLES section is the first visited in a manual. It should consist of concise, documented examples of the most common uses of your component.

For commands, the EXAMPLES section should contain a handful of common use cases. In general, these should consist of standalone invocations and, if the input and output correspond to other utilities, invocations as part of a pipeline.

Although the hello example is almost too trivial for documentation, consider if it were used to greet new users to a UNIX system. Thus, a common example would be the following:

.Sh EXAMPLES
The following sends a greeting message to the new user
.Qq joe .
.Pp
.Dl $ hello \(dqDear Joe, \(dq \(ba mail -s \(dqHi!\(dq joe

The Dl macro, used for one-line literal displays, is common in the EXAMPLES section. For multi-line displays, use the Bd -literal environment, usually with the default indentation given by -offset indent.

.Sh EXAMPLES
The following sends a greeting message to the new user
.Qq joe .
.Bd -literal -offset indent
$ hello \(dqDear Joe, \(dq \(ba \e
mail -s \(dqHi!\(dq joe
.Ed

For functions and function libraries, it's more common to include a single, thorough source example than individual examples for each function. These always use the Bd -literal display environment and an indentation.

.Sh EXAMPLES
The following is a simple utility interfacing with the
.Nm
library:
.Bd -literal -offset indent
#include <stdlib.h>
#include "hello.h"

int
main(int argc, char *argv[])
{
	hello(0, argc > 1 ? argv[1] : NULL);
	return(EXIT_SUCCESS);
}
.Ed

Use terse syntax for your example, without error checking for functions not being documented, e.g., open or scanf.

Some manuals will use the vS and vE macros around source code. These are not mdoc and should be avoided in portable manuals.

DIAGNOSTICS

If your component emits regular debug, status, error, or warning messages, document the syntax of these messages in DIAGNOSTICS.

Some historic manuals document function return values in this section, but modern practise is to do so in RETURN VALUES or, if setting the error constant of the C library, ERRORS.

Bl -diag lists are most often used for documenting emitted messages.

.Sh DIAGNOSTICS
.Bl -diag
.It "%file:%line:%col: %msg[: %extra]"
An error occurred when processing
.Cm %file
at line and column
.Cm %line
and
.Cm %col ,
respectively.
The error message
.Cm %msg
may be followed by additional debugging information in
.Cm %extra .
.El

ERRORS

This section is used exclusively by functions that set or return a regular error code. The most common use is for system calls (category 2) setting error constants in the C library. In either case, this section should consist of a single list documenting all possible error codes. In the latter case, each error should be labelled with the Er macro.
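
A sketch of such a list for the hello function follows. The error conditions are assumptions chosen for illustration; the bracketed Er tags follow a common idiom in system call manuals.

```mdoc
.Sh ERRORS
The
.Fn hello
function fails if:
.Bl -tag -width Er
.It Bq Er EIO
An I/O error occurred while writing the greeting.
.It Bq Er ENOMEM
Insufficient memory was available.
.El
```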

STANDARDS

If your component references any standards literature, it should be mentioned here. Most standards (e.g., POSIX, ANSI, etc.) may be semantically noted with the St macro. When implementing standardised wire protocols, references to RFCs and other literature should also be mentioned here. These differ from referenced standards in that they are implemented rather than merely referred to.

.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -xpg4
specification.

If your component deviates from a given standard, the deviations should be mentioned in this section as well. Some historic manuals use a special COMPATIBILITY section for this, but this is discouraged except when discussing compatibility with non-standard but common utilities.
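
One way to note a deviation, sketched under the assumption that the o flag of the hello example is not part of the standard:

```mdoc
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -xpg4
specification.
.Pp
The flag
.Op Fl o
is an extension to that specification.
```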

This section has also been referred to as CONFORMING TO on some GNU/Linux manuals.

HISTORY

Some components have a historical basis: this should be included here. Keep this information terse and, above all, correct.

If your manual mentions prior implementations of your component, for example, it's common to include the dates and developers of those prior implementations.
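
A terse HISTORY section might look like the following sketch. The release, date, and developer are placeholders and do not describe any real utility.

```mdoc
.Sh HISTORY
The
.Nm
utility first appeared in
.Bx 4.4 .
It was rewritten in 2011 by
.An Joe Foo .
```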

AUTHORS

Another standard section for UNIX manuals is the AUTHORS section, which should always mention the current contact for the utility. The traditional text for this section is as follows.

.Sh AUTHORS
The
.Nm
reference was written by
.An Joe Foo Aq joe@example.com .

However, inasmuch as e-mail addresses are a ubiquitous form of contact, it's considered good practise to use the correct semantic notation.

.Sh AUTHORS
The
.Nm
reference was written by
.An Joe Foo ,
.Mt joe@example.com .

The term reference in this fragment should reflect the content of the manual.

CAVEATS

The CAVEATS section is not often used. It consists of text relevant to unexpected (but unchangeable) behaviour of the component.
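
For instance, a caveat for the hello example might read as follows. The behaviour described is assumed purely for illustration.

```mdoc
.Sh CAVEATS
The
.Fl o
flag truncates an existing
.Ar output
file without asking for confirmation.
```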

BUGS

If the component has known bugs, they should be listed here. In some historic manuals, authors used this section to state that no bugs were present; however, this text can be misleading for machine-readers of manuals and should be avoided in new manuals.
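
A brief sketch of a BUGS entry, using an invented defect in the hello example:

```mdoc
.Sh BUGS
The
.Nm
utility does not handle multibyte characters in
.Ar prefix .
```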

SECURITY CONSIDERATIONS

The SECURITY CONSIDERATIONS section is reserved for components whose deployment may be sensitive to security conditions, such as network daemons. It should include suggestions on security measures beyond the scope of the component.

This section was historically called SECURITY.
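
As a sketch, assume the documented component is a hypothetical network daemon; the advice given is illustrative rather than prescriptive.

```mdoc
.Sh SECURITY CONSIDERATIONS
The
.Nm
daemon accepts connections from any host.
Restrict access with a packet filter if it must not be reachable from
untrusted networks.
```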

Tools

By now we've discussed the mdoc language by way of example and a non-authoritative reference. I've even referenced some formatters, such as mandoc and nroff. In this part, I'll focus on the environment of mdoc: formatters, project integration, and so on.

After reading this part, you'll have a much better idea of how to read, write, and format mdoc on your operating system. The bias of this part, however, will be toward UNIX systems.

Formatters

The most important part of the mdoc tools is the formatter, which compiles an mdoc document into an output format.

NAME

cat — concatenate and print files

SYNOPSIS

cat	[-benstuv] [file ...]

DESCRIPTION

The cat utility reads files sequentially, writing them to the standard output. The file operands are processed in command-line order. If file is a single dash (‘-’) or absent, cat reads from the standard input.

All formatters must adhere to the general conventions set forth in the Version 1 AT&T UNIX Programmer's Manual, which details the terms that are in bold and those that are italicised (rendered with underlines in terminals).

Most formatters also support printer-friendly output, usually to PS or PDF. Some also include HTML or XHTML for web publication.

In this book, I'll focus only on contemporary formatters. Originally, mdoc, as a macro set for the roff language, was exclusively formatted by the troff and nroff utilities as distributed with BSD UNIX. Historically, troff was tailored for printers and graphical output, while nroff focussed on terminal output.

Most modern utilities, however, encompass both of these capabilities.

groff (GNU troff)

The GNU project wrote the groff utility as a reimplementation of ditroff, which encompassed the functionality of the historical nroff and troff utilities. The first version was released in 1990, and it is still actively maintained. groff is significant in that it is the predominant implementation of nroff and troff on contemporary UNIX operating systems.

On systems with groff installed, both troff and nroff invoke the underlying groff utility. It is able to produce the classical terminal and PS output, along with more recent support for XHTML, HTML, and PDF. It has strong support for non-ASCII output on supporting media. Consult your local groff manual for all possible outputs available via the -T flag.

The mdoc implementation in groff was entirely re-written in version 1.17. Prior to this, input documents had some severe restrictions. Most notably, macro lines were limited to 9 arguments, Bl column macros had a restricted syntax, and displays such as Bd could not be nested.

The groff utility is supported on both UNIX and non-UNIX operating systems.

Examples

Paging a manual to a UNIX terminal:

groff -Tascii -mandoc file.1 | less

To strip the escape-character encoding of output to create clean, printable ASCII output:

groff -Tascii -mandoc file.1 | col -b >file.1.txt

Generating PS output:

groff -Tps -mandoc file.1 >file.1.ps

mandoc (mdocml)

The mandoc utility is a specialised mdoc formatter: although it also supports some other UNIX manual formats, it does not accept general-purpose roff input. Development began in 2008 to replace groff with an ISC licensed, high-speed reimplementation.

mandoc may be invoked as troff or nroff as its command-line arguments overlap. It supports the classical terminal and PS forms, and has very strong support for HTML and XHTML. PDF output is supported as well.

By considering mdoc as a special language, mandoc compiles its input into a representation of semantic content. This diverges from troff and its descendants, which compile mdoc into its basis form, roff, then into a presentational representation. As such, mandoc is also used for semantically querying manual content and for the rigorous validation of manuals.

The mandoc utility is supported on both UNIX and non-UNIX operating systems.

Examples

To validate a manual:

mandoc -Tlint file.1

To page a manual in the current locale (if supported) so that non-ASCII special characters render as proper glyphs:

mandoc -Tlocale file.1 | less

Produce HTML with a style-sheet:

mandoc -Thtml -Ostyle=file.css file.1 >file.1.html

Project Integration

mdoc documents fit perfectly into a UNIX development environment. In general, this is defined by a group of source files that produce executables as compiled and linked by make, called a build system. Sources are usually version-controlled using cvs, called revision control.

In this section, I discuss methods for integrating mdoc documents into a source-controlled build environment. I'll focus on mandoc as a formatter, but provide stubs for using nroff.

Our examples will consider a project building a utility foo from its single source file foo.c.

Build System

On modern UNIX systems, the method for build management is overwhelmingly the make utility. Although there are two disjoint make implementations in use (namely those for GNU and BSD UNIX systems), I examine the syntax common to both.

In this section, I'll assume the file Makefile already exists, and is used to build a system where one wishes to incorporate mdoc files.

all: foo

clean:
    rm -f foo foo.o

install: all
    install -m 0755 foo /usr/local/bin

foo: foo.o
    cc -o foo foo.o

A rigorous analysis of this syntax is beyond the scope of this book (do consult your system's documentation for the make command with man make). It defines the targets all, clean, and install: build, clean up, and install the utility, respectively.

File Extension

First, it's important to settle upon an input and output file extension, as make tracks file status by way of comparing the time-stamp of a file's input (which may be multiple files) and output (called the target). In short, if the target is older than any of the input files, it is rebuilt. The input files are created and maintained by the developer; the output files are built by make.

For simplicity, I use the standard .1, .2, and so on convention for the target (the output). I then use .in.1 and so on for input. Thus, it is necessary to notify the make utility of these new extensions before all other rules:

.SUFFIXES: .in.1 .1

If more categories are built, these would need to be added (e.g., .in.3 .3, etc.).

Build Rules

A build rule is required to translate input to output. Let's begin with a general rule to establish that the mdoc syntax is correct. We'll add this to the target building the main system: this way, all changes to the mdoc input file will be syntax-checked when make is invoked. We'll use mandoc to syntax-check the document.

.in.1.1:
    mandoc -Tlint $<
    #nroff -mandoc $< >/dev/null
    cp -f $< $@

We also need to build the target and clean it. Assume that foo.1 is the output file and foo.in.1, the input.

all: foo foo.1

clean:
    rm -f foo foo.o foo.1

Result

Putting these together, the new Makefile is as follows:

.SUFFIXES: .in.1 .1

all: foo foo.1

clean:
    rm -f foo foo.o foo.1

install: all
    install -m 0755 foo /usr/local/bin
    install -m 0644 foo.1 /usr/local/share/man/man1

foo: foo.o
    cc -o foo foo.o

.in.1.1:
    mandoc -Tlint $<
    #nroff -mandoc $< >/dev/null
    cp -f $< $@

Formatted Output

Let's build an HTML manual with the make www rule. For simplicity, we won't install this file; it's merely for instruction. This rule will translate the built manual foo.1 into an HTML file foo.1.html.

.SUFFIXES: .in.1 .1 .1.html

Let's let our rule include a CSS file. Note that the traditional nroff utility doesn't include HTML output.

foo.1.html: style.css

.1.1.html:
    mandoc -Thtml -Ostyle=style.css $< >$@

The target rule is simply as follows:

www: foo.1.html

The reason for building from foo.1 instead of foo.in.1 is that we may wish to postprocess the foo.1 file after it has been created. However, this is entirely the decision of the programmer.

Revision Control

Several examples in this book have covered the topic of integrating mdoc documents into revision control systems. In this section, I cover the few steps required to integrate these documents with cvs.

Assume a file foo.in.1 consists of our mdoc source. I assume, for simplicity, that it is licensed under the ISC license and copyright-protected, both of which lead the document as a series of comments.

.\" Copyright (c) 2011 Kristaps Dzonsons <kristaps@bsd.lv>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd August 18, 2011
.Dt FOO 1
.Os

The first step is to add a comment recording the version of the file to the top of the file. This is standard practise in revision-controlled files.

Make sure that the first line has a tab character between the leading comment marker and $Id$. This sequence is filled in with the file's last editor, revision, and checkin date.
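
As a sketch, the first line of foo.in.1 might read as follows, with a literal tab character after the comment marker. The keyword is written unexpanded and is filled in on check-in.

```mdoc
.\"	$Id$
```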

Some cvs servers (e.g., those in NetBSD and OpenBSD) support the Mdocdate sequence. This is filled in by cvs with the check-in date.

.Dd $Mdocdate$
.Dt FOO 1
.Os

In performing these two steps, the file's last-modified date and source identifier will be properly filled in by the cvs server. If your server does not support Mdocdate, you will have to maintain the date by hand, or possibly override the build rule for your file.

Composition

Since the UNIX manual has such a rich history of development, it would be strange were it not to have a significant body of supporting tools for composition. In fact, Version 7 AT&T UNIX was bundled with a set of tools, the UNIX Writer's Workbench, specifically for composing documents — spell-checking, grammar-checking, and so forth. Even then, sophisticated editors had long since been a pivotal part of UNIX.

Although the ed, ex, and vi editors are stipulated by POSIX.1-2008, the Writer's Workbench has long since been discontinued. Even spell-checkers are not standard across modern UNIX systems, although many high-quality editors and composition utilities may be downloaded.

In short, the situation is messy: composing mdoc documents (and in fact any roff document) is tricky to do without downloading special software.

However, one of the best parts of mdoc is that none of these specific tools are necessary: an mdoc document is just a text file and may be composed in any editor, and its text contents analysed by any utility smart enough to ignore mdoc syntax.

Editing

Since mdoc is an ASCII-clean text format, it may be edited in any text editor. In this section, I introduce a variety of editors available on most UNIX systems. Since this topic is exhaustively covered in most any introductory UNIX book, I only introduce portable editors.

Ed

The ed utility is a line editor standardised by POSIX.1-2008. The concept of a line editor may be familiar to those who have used a typewriter or teleprinter, where only the current line of input may be edited (or viewed, in some cases) at a time.

Its inclusion is largely for historical reasons, as using ed can be a frustrating experience for those accustomed to visual editors. I don't recommend using this utility for mdoc, although its function as a line editor makes it perfect for the task.

Ex, Vi

The vi and ex editors were powerful additions to the UNIX system: they allowed visual editing of files (versus line editing as with ed). The vi editor has inspired a raft of clones and, being standardised, some form of the utility is available on all UNIX systems. Furthermore, the vim clone of vi comes bundled with mdoc syntax highlighting.

Spell-checking

The right or wrong spelling of terms in technical documents is very important. Thus, it's always important to carefully spell-check your manuals, making sure that both technical and general terms are correctly spelt.

Unfortunately, spell-checking an mdoc document is fairly difficult, as the spell-checker must have some knowledge of the language structure to discern text from macros. Consider spell-checking the following snippet:

.Fl Alu Ar input

By now we understand that Fl and Ar are macros. But it's unreasonable to expect a spell-checker to do so. Thus, spell-checking manuals often raises many false positives.

spell

The spell utility is distributed with many BSD UNIX operating systems as a simplistic spell-checker. In fact, it was first distributed with Version 6 AT&T UNIX. spell preprocesses its input with deroff, another historic utility with some functionality of stripping roff instructions from files.

To print a list of all unknown words, you can explicitly invoke deroff and spell as follows:

deroff -w file.1 | spell

A utility distributed with mandoc, demandoc, is significantly stronger than deroff. If available, it should be used instead. It has the same calling syntax as deroff.

demandoc -w file.1 | spell

You can also maintain a per-manual list of technical terms by using additional word lists. In the case of file.1, consider a sorted list of words file.1.words we're maintaining with special words (such as names). We could then augment a make rule to automatically make sure additions are spell-checked.

file.1: file.1.words

.in.1.1:
    mandoc -Tlint $<
    test -z `demandoc -w $< | spell -b +$@.words`
    cp -f $< $@

This snippet first makes the build of file.1 depend upon its local word file, file.1.words, a sorted list of words to ignore. When file.in.1 or file.1.words is updated, the rule is executed. It first makes sure that file.in.1 is well-formed, then spell-checks it against the ignored-words file.

The same can be accomplished on systems without mandoc.

file.1: file.1.words

.in.1.1:
    test -z `deroff -w $< | spell -b +$@.words`
    cp -f $< $@

ispell, aspell

Another common spell-checker is ispell and its GNU replacement aspell. I do not suggest using these utilities because of their poor internal support for mdoc. It's possible, however, to send stripped files for checking in a manner similar to spell:

demandoc -w file.1 | ispell -l

Or with deroff:

deroff -w file.1 | ispell -l

Both ispell and aspell also have a pipe mode for more meaningful output:

demandoc -w file.1 | ispell -a

Style & Grammar

The utilities bundled with the historical UNIX Writer's Workbench also allow for grammar and style-checking of mdoc documents — indeed, any document.

As with spelling, these utilities cannot handle mdoc constructions. Unlike spelling, grammar depends on the correct flow of terms. To wit, one must fully format input mdoc documents before passing them to such checks.

diction

The diction utility is rarely distributed with default UNIX operating systems, although it may be separately downloaded. The input to diction is best when it consists of well-formed sentences, which only appear when manuals are post-formatted.

mandoc file.1 | col -b | tail -n+2 | diction

Alternatively, with nroff:

nroff -mandoc file.1 | col -b | tail -n+2 | diction

This first strips the text decoration (underlined and bold text) from nroff or mandoc with col. The header is then stripped with tail. Finally, the formatted output is fed to the diction utility, which analyses text for readability.

Appendices

This part consists of appendices to the book. These link heavily to external resources, although care is taken to provide enough information to make off-line reading meaningful.

Appendix: Glossary

ASCII: American Standard Code for Information Interchange. The predominant computer encoding for the English alphabet.
BSD License: A permissive open source license used for the BSD UNIX operating systems (among many, many other tools). May refer to either the two-part or three-part (deprecated) form.
BSD UNIX: A class of free implementations of the UNIX operating system, originally deriving from the Version 6 AT&T UNIX. This class consists primarily of FreeBSD, NetBSD, and OpenBSD. These operating systems are licensed either under the BSD license or the ISC license.
C: A programming language developed in the early 1970s by Dennis Ritchie at AT&T Bell Labs. This is the language of choice for UNIX development. A compiler for C first appeared in Version 6 AT&T UNIX, and one is stipulated now by POSIX.1-2008.
Cat Pages: Manual pages pre-formatted and installed with an operating system. Historically, the nroff utility was quite slow: pre-formatted pages in a cache reduced the wait time for man to display a manual. This has since become the convention for most UNIX operating systems.
CDDL: The Common Development and Distribution License, a free software license.
Command Line: The text environment for operating UNIX systems. Often replaced by graphical windowing systems such as the X Window System.
CSS: Cascading Style Sheets, used primarily to style documents in HTML or XHTML. The language is a standard maintained by the World Wide Web Consortium.
DocBook: A documentation system maintained by OASIS and developed at docbook.org.
DOS Prompt: Text (command line) interface to the historical Disk Operating System, usually Microsoft's Disk Operating System (MS-DOS).
English Spacing: The practise of using two spaces between sentences as punctuated by ., !, or ?. This applies even in the event that a sentence is quoted or parenthesised, where the spaces follow the final sentence enclosure.
GNU: The GNU project is a UNIX-like operating system licensed under the General Public License.
GPL: The General Public License. This is the license of choice for the GNU project.
ISC License: A permissive free software license issued by ISC, the Internet Systems Consortium. This is the license of choice for the OpenBSD UNIX implementation.
HTML: HyperText Markup Language. A structured mark-up language standardised by the W3C. This is the predominant language for formatting World Wide Web content.
libc: The C Standard Library. A set of functions (including system calls) in the C programming language. Standardised by POSIX, among other standards bodies.
Locale: A set of parameters defining a locality-specific user interface, such as special characters (glyphs), numerical representations, and so on.
Man Pages: Short form of UNIX manual pages. System documentation for UNIX systems. Usually viewed using the man utility, which pages formatted manual documents to the screen. Man pages are formatted by a utility such as nroff or mandoc.
NetBSD: A free BSD UNIX operating system.
OpenBSD: A free BSD UNIX operating system.
PDF: The Portable Document Format language used to format documents, usually for printing.
POSIX: Portable Operating System Interface for Unix. Most recently released as POSIX.1-2008, IEEE Std 1003.1-2008. Informally called UNIX08. Standards document for all UNIX implementations.
PS: The PostScript language, usually used as a page description language (e.g., printing).
Roff: A document language written for the original UNIX implementation in 1970. This language was used for text processing.
RUNOFF: A simple text processing utility for the CTSS operating system, usually paired as TYPSET and RUNOFF, developed before 1965.
RTF: Rich Text Format. Proprietary document file format used in some popular word processors.
System Call: A machine instruction that triggers the operating system to perform a privileged operation on behalf of the user. A typical example is to write a region of memory to a file. In the C standard library, these instructions are encoded as function calls such as write.
Terminal: The command line environment on a computer. This can either refer to a terminal utility run within a graphical environment or the computer screen itself in text mode.
UNIX: Computer system originally developed by AT&T Bell Labs in 1969. Modern open-source derivations include GNU/Linux, NetBSD, OpenBSD, etc.
UNIX Programmer's Manual: A historical manual for programming and operating the UNIX operating system. The First Edition, 1971, is preserved for reading.
WWB: The Writer's Workbench. This was a set of writing utilities first distributed in the Seventh Edition of the UNIX operating system.
XHTML: Extensible HyperText Markup Language. XML-based form of the popular HTML format. Standardised by the W3C.

Appendix: Macros

This table consists of brief descriptions of mdoc macros referenced in this book (meaning: this is not a complete list!), then links to full descriptions according to the mdoc reference on http://mdocml.bsd.lv. Disclaimer: I'm the principal author of this system.

Ap: Insert an apostrophe. Reference: Ap.
Ar: Argument to a command or flag. Reference: Ar.
Bd: Begin a display. Reference: Bd.
Bl: Begin a list. Reference: Bl.
Bq: Square-bracket arguments. Reference: Bq.
Bx: BSD UNIX text. Reference: Bx.
Dd: Date of manual's last edit. Reference: Dd.
Dl: Display a line of literal text. Reference: Dl.
Dt: Title and category of a manual. Reference: Dt.
Dv: Constant-value variables. Reference: Dv.
Ed: End a display. Reference: Ed.
El: End a list. Reference: El.
Em: Text to be emphasised (presentational). Reference: Em.
Er: Error constant. Reference: Er.
Ev: Environment variable. Reference: Ev.
Ex: Exit code of a command. Reference: Ex.
Fa: Function argument. Reference: Fa.
Fc: Close multi-line function prototype. Reference: Fc.
Fd: A preprocessor directive. Reference: Fd.
Fl: Command flag (switch). Reference: Fl.
Fn: Function name. Reference: Fn.
Fo: Open multi-line function prototype. Reference: Fo.
Ft: Function type. Reference: Ft.
In: Include file (header file). Reference: In.
Lb: Library name. Reference: Lb.
Li: Literal text (presentational). Reference: Li.
Nd: A one-line description of the material. Reference: Nd.
Nm: Set or get name of documented component. Reference: Nm.
Ns: Suppress the following space and reset formatting. Reference: Ns.
Os: Operating system applying to manual. Reference: Os.
It: List item. Reference: It.
Oc: Close an optional part block. Reference: Oc.
Oo: Open an optional part block. Reference: Oo.
Op: Optional part of a command invocation. Reference: Op.
Ox: Format the OpenBSD operating system name. Reference: Ox.
Pp: Separate paragraphs with vertical space. Reference: Pp.
Pq: Parenthesise arguments. Reference: Pq.
Ql: Enclose literal argument in single quotes. Reference: Ql.
Qq: Enclose arguments in regular double quotes. Reference: Qq.
Rs: Begin a reference (bibliographic) block. Reference: Rs.
Rv: Function return value. Reference: Rv.
Sh: Begin a manual section. Reference: Sh.
Sq: Single-quote arguments. Reference: Sq.
Ss: Begin a manual subsection. Reference: Ss.
St: Print a standard name. Reference: St.
Va: Variable name. Reference: Va.
Vt: Variable type. Reference: Vt.
Xr: Manual cross-reference. Reference: Xr.
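
To see several of these macros working together, the following sketch writes out a minimal manual for an invented utility. Everything about the utility (its name hello, its -n flag, and its name operand) is hypothetical, as are the date and the function name write_example_manual; only macros from the table above are used.

```shell
# write_example_manual FILE: write a minimal mdoc manual for a
# hypothetical "hello" utility to FILE.
write_example_manual() {
    cat > "$1" <<'EOF'
.Dd January 1, 2024
.Dt HELLO 1
.Os
.Sh NAME
.Nm hello
.Nd print a greeting
.Sh SYNOPSIS
.Nm
.Op Fl n
.Op Ar name ...
.Sh DESCRIPTION
The
.Nm
utility prints a greeting to each
.Ar name .
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl n
Suppress the trailing newline.
.El
.Sh EXIT STATUS
.Ex -std
EOF
}
```

After running write_example_manual hello.1, the result may be checked with mandoc -Tlint hello.1 and viewed with mandoc hello.1.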

Appendix: Commands

This is a list of all commands mentioned in this book and how to find them on-line. All referenced utilities are open-source.

Aspell: A replacement of ispell with greater word-suggestion power and support for non-ASCII input. Available under the GPL.
Cat: Output a file directly to the terminal. This utility is standardised by POSIX.1-2008.
Col: Filters line-feeds, usually used to strip backspaces from encoded documents. Originally in Version 6 AT&T UNIX, standardised in POSIX.1-2008.
Cvs: A revision management system (the concurrent version system). Originally a client-server extension to prior revision systems. Non-standard. Available under the GPL at savannah.nongnu.org/projects/cvs or the ISC license at www.opencvs.org.
Echo: Echo input arguments back to the terminal. This utility is standardised by POSIX.1-2008.
Demandoc: Remove all mdoc (and other UNIX manual format) control statements from a file by formally parsing input (via mandoc). Built to replace deroff for mdoc UNIX manuals. Non-standard. Available under the ISC license at mdocml.bsd.lv.
Deroff: Remove [most] roff control statements from a file. This utility uses heuristics instead of properly parsing its input. Originally in Version 7 AT&T UNIX (if not before?) and used heavily by the Writer's Workbench. Non-standard. Available under the Lucent Public License at heirloom.sf.net/tools.html and under the GPL at marmaro.de/prog/deroff.
Diction: Checks the diction of a (German or English) document. Originally in the Version 6 AT&T UNIX Writer's Workbench. Non-standard. A GPL version is available at www.gnu.org/s/diction.
Ditroff: A historical utility merging the code-bases of the original nroff and troff into a device-independent utility. All modern nroff and troff utilities are implemented by a ditroff implementation such as groff. A version of the pre-groff utility is available under the Lucent Public License at heirloom.sf.net/doctools.html.
Dump: Backup a file-system. This utility isn't standardised, but appears on most modern UNIX implementations regardless. Dumped file-systems may be restored with the dual restore utility.
Ed: Edit a file line by line on a terminal. This utility is standardised by POSIX.1-2008. It is the first editor to be bundled with UNIX, dating back to Version 1 AT&T UNIX.
Ex: Extends ed to operate in visual (screen) mode. This utility is standardised by POSIX.1-2008. It first appeared in Version 8 AT&T UNIX, and is usually invoked as a special mode of vi.
Fsck: A file-system checker. Although not standardised, this utility is present on most UNIX systems.
Groff: The GNU re-implementation of ditroff, thus providing troff and nroff utilities. Non-standard. Available under the GPL at www.gnu.org/s/groff.
Ispell: A re-write of spell for international dictionaries. Non-standard. Available at www.cs.hmc.edu/~geoff/ispell.html under its own license.
Ls: List the contents of a file-system directory. This utility is standardised by POSIX.1-2008.
Make: A build system that uses a graph of dependencies (by a file's last modified date) to determine when a target needs to be rebuilt. This utility is standardised by POSIX.1-2008 and was first released in Version 7 AT&T UNIX. It has two somewhat incompatible implementations, as the standardised syntax is fairly limited: under the GPL at www.gnu.org/software/make, referred to as GNU make; or distributed with BSD UNIX systems as BSD make.
Man: A POSIX.1-2008 standardised utility for viewing UNIX manpages. The standard document only specifies that it accepts a name and returns output: no more. man usually looks up the manual to display in a set of directories reserved for manuals, then either pages pre-formatted manuals to the screen (cat pages) or formats them on the spot with nroff or mandoc.
Mandoc: A specialised formatter for UNIX manuals designed to replace groff for UNIX manual input. Non-standard. ISC licensed. Available at mdocml.bsd.lv.
Nroff: A re-write of the original formatter for the roff language (the name deriving from new roff). Built to accommodate terminals. Modern uses of this utility are actually through a re-write, ditroff.
Restore: The dual to dump: restores a dumped file-system. This utility is not standardised, but found on most UNIX operating systems anyway.
Sccs: Historically the dominant revision control system for UNIX systems, and standardised by POSIX.1-2008. Despite being standardised, few UNIX systems include this utility. It has largely been replaced by cvs.
Spell: The original (English-only) UNIX spell-checker distributed with Version 6 AT&T UNIX. Non-standard. BSD licensed and available at code.google.com/p/unix-spell. Also may be emulated by ispell and aspell.
Tail: A standard POSIX.1-2008 utility for outputting parts of a file.
Troff: Traditional name of a formatter for the roff language. First released in Version 6 AT&T UNIX as a printer and graphical-device version of nroff (the name deriving from typesetter roff). Modern uses of this utility are actually through a re-write, ditroff.
Vi: Extends ed to operate in fully visual (screen) mode, extending ex in its display handling capability. This utility is standardised by POSIX.1-2008. It first appeared in Version 8 AT&T UNIX.
Vim: A popular implementation of the vi editor. It is distributed under a custom GPL-like license at www.vim.org.

Preface

NAME

SYNOPSIS

DESCRIPTION

Tutorial Introduction

Commands

Simple Command

NAME

SYNOPSIS

DESCRIPTION

Elaborate Command

SYNOPSIS

EXIT STATUS

Case Study

Functions

Simple Function

SYNOPSIS

NAME

SYNOPSIS

DESCRIPTION

Elaborate Function

SYNOPSIS

SYNOPSIS

DESCRIPTION

Case Study

Function Library

NAME

NAME

LIBRARY

NAME

LIBRARY

SYNOPSIS

NAME

LIBRARY

SYNOPSIS

DESCRIPTION

RETURN VALUES

Case Study

System Call

NAME

SYNOPSIS

DESCRIPTION

NAME

SYNOPSIS

DESCRIPTION

RETURN VALUES

ERRORS

HISTORY

Case Study

Manual Syntax and Structure

Syntax

Input Encoding

Macro Line

Text Line

Escape Sequences

Special Characters

Predefined Strings

Comments

Macros

In-line Macro

Block Partial Implicit

Block Full Implicit

Block Partial Explicit

Block Full Explicit

Punctuation

Sentential Punctuation

Regular Punctuation

Quotation

Structure

Prologue

Date

Title

Category

Architecture

Operating System

Document Body

Layout

Required Sections

NAME

SYNOPSIS