A utility without a manual is of no utility at all.
This is a guide for writing UNIX manuals in the mdoc language. If you're new to writing UNIX manuals, or you want to learn about best practices for high-quality manuals, this book may benefit your work.
To those unfamiliar with UNIX, mdoc is a language for documenting utilities, programming functions, file and wire formats, hardware device interfaces, and so on. By a language I mean a structured, machine-readable document format such as HTML, the primary language of web pages; or RTF, used by word processors. man is the utility for querying documents in mdoc and other languages, collectively called man pages.
The following, for example, is a fragment of man output for the cat command.
cat [-benstuv] [file ...]
Why mdoc? After all, there are plenty of other UNIX manual languages out there, from the historical man to DocBook. In short, mdoc is:
No other format can boast all of these points at once.
In fact, although I've mentioned UNIX several times already, mdoc isn't exclusively tied to UNIX. Although UNIX and mdoc are historically linked, open source mdoc tools exist for any operating system. Furthermore, the documentation capabilities of mdoc apply to computing systems in general — not just UNIX.
In this book, however, I'll assume you are casually familiar with man and its output. This will allow us to focus on manuals with the same formatted output in mind. Thus, if you're unfamiliar with the man utility, this is a good time to read an introductory text on the subject (such as a UNIX beginner's guide), or at the very least, read the output of man man (the manual page of the man command).
This is not a canonical reference! The mdoc language is not standardised. For official reference, consult the manual distributed with your target computer system with man mdoc. This work primarily addresses the elements of mdoc common to any UNIX deployment, noting common pitfalls in portability.
Last edited by $Author: kristaps $ on $Date: 2011/11/04 01:06:28 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
Let's begin with practical examples of mdoc.
The intended audience of this part is somebody who has never written an mdoc manual. Although you may be tempted to jump to the chapter relevant to your manual type (for example, a command or function library), it's best to read the chapters in order. I'll explain mdoc syntax as we go.
If you've already written a few manuals, you may want to read this part anyway: beyond explaining technical mdoc language concepts by example, I'll also introduce some best practices and discuss portability between various mdoc environments.
I'll frequently refer to the screen output of mdoc documents as displayed with the UNIX man utility. Furthermore, I'll refer to command invocation in the traditional UNIX way — on the command line. In short, a bit of UNIX knowledge will help to avoid confusion. But I'll briefly introduce invocation syntaxes as the need arises.
Commands are the way in which a user operates her computer. Already I've noted the man command: if you've interacted with a UNIX system, you've probably run at least man intro or man man to learn about your system.
In this chapter, I'll discuss how to document these commands with mdoc.
This may be unfamiliar if you're accustomed to graphical interfaces: all of our examples will refer to command-line, text-based commands. If your target environment isn't a UNIX system, it's a good idea to read these examples anyway, as they will expose the rudimentary structure of the mdoc language. As mentioned before, reading an introductory text on UNIX will help avoid confusion.
Let's begin by making a mental checklist for the criteria that make a good manual for a command. This checklist arises by inverting what a manual reader expects in opening a manual: what does the command do and how do I operate it?
Above all, the best litmus test is whether a colleague or friend can read your manual and be able to use your command without any assistance on your part. Don't be discouraged by how this can take several tries to get right!
I'll begin with a simple command, hi, which prints hello, world to the screen. I'll then add some command-line arguments to this command. By the time you finish this chapter, you should have a grasp of mdoc syntax and some of its more widely-used macros.
In this text, I'll refer to the invocation of commands as cmd [-flag farg] arg. Here, cmd refers to the command invocation name, flag is a flag (or switch) to that command, farg is an argument to a flag (not all flags have arguments), and arg is an argument to the command.
The dash in front of flag indicates a flag, while the square brackets around flag farg indicate an optional part of the invocation. Since arg is not bracketed, it is a mandatory part of the invocation.
This convention is formalised by the POSIX.1-2008 standard (Base Definitions, sec. 12.1), so you can expect to see it often in the UNIX world.
Consider a simple UNIX command hi that prints hello, world and exits. Let's create a manual page hi.1 documenting this command. In this example, I'll begin with the full manual. In later examples, we'll build up the manual piece by piece.
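A minimal sketch of hi.1, consistent with the walkthrough that follows (seven macros, one trailing comment, and the DESCRIPTION sentence on input lines 10 to 12). The date is an arbitrary placeholder and the exact wording is an assumption, not a canonical listing.

```roff
.Dd May 30, 2011
.Dt HI 1
.Os \" No argument: the formatter supplies the current operating system.
.Sh NAME
.Nm hi
.Nd print \(dqhello, world\(dq
.Sh SYNOPSIS
.Nm
.Sh DESCRIPTION
Print
.Qq hello, world
and exit.
```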
How to display this manual page depends on the system you're using.
Traditionally, the command for formatting UNIX manuals for a terminal is nroff. For now, let's stick with that.
To display output, you must invoke nroff as nroff -mandoc file. The mandoc flag indicates that input is in mdoc. Hereafter, I'll refer to nroff simply as the formatter to avoid confusion, as there are many available mdoc formatters.
hi
Let's start by studying the input and output. Most of the text is carried over into the output: for instance, the capitalised NAME input is left-justified and in bold text. The same goes for SYNOPSIS and DESCRIPTION, although the .Sh text before these terms is missing. We can even see the output sentence Print "hello, world" and exit spread over lines 10–12:
Let's take a closer look at this fragment.
The .Qq is part of mdoc's instruction syntax. Input lines beginning with a dot are instructions to the formatter called mdoc macros, or just macros for short. The macro name is a terse two or three-character word following the dot, for example, Qq. The name of a macro tersely hints at its function. The words following the Qq to the end of line are arguments in the scope of the macro.
Scope, a technical term in the field of programming languages, refers to the body of input within the context of an instruction or variable. In mdoc, a macro's scope is the block of text and instructions in the formatting context of that macro. Looking at the input and output, we can infer the scope of Qq by seeing what's surrounded by quotes (the formatting, in this case).
As we explore more and more macros in this book, we'll see that each macro follows one of a handful of scope rules. It's already clear that Qq is limited in scope to its invocation line. But notice that the formatter recognised the content between Sh macros as requiring indentation. So it's clear that mdoc also has a concept of multi-line scope. In fact, Sh has both line arguments, for the name of the section; and multi-line arguments, for section content.
Furthermore, the existence of Qq within the Sh scope means that scopes may be nested. In the next section we'll see how multiple macros may even be specified on a single line.
We can visualise this scoping as follows, with an outer scope and inner scope:
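As a rough sketch of that nesting, the DESCRIPTION fragment can be annotated with comments marking where each scope begins and ends. The fragment's wording is assumed from the output sentence discussed above.

```roff
.\" Outer scope: Sh takes DESCRIPTION as its line argument, then holds
.\" every following line until the next Sh (or the end of the file).
.Sh DESCRIPTION
Print
.\" Inner scope: Qq holds only the words on its own line.
.Qq hello, world
and exit.
```

The comment lines are dropped by the formatter, so the annotated fragment renders exactly like the plain one.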
Now let's return to hi.1 with this new knowledge of macros and scopes.
We see seven macros in total: Dd, Dt, Os, Sh, Nm, Nd, and Qq. We know now that Qq encloses its arguments in double-quotes, and that Sh begins a named section with indented multi-line arguments.
Of the remaining macros, Dd accepts the last modification date of the manual in month day, year format. Dt refers to the manual's title, HI, and its category, 1.
Numbered manual categories are UNIX conventions, but applicable to any operating system. We'll explore more standard categories throughout this book. Note that HI is uppercase: by convention, Dt should always accept a capitalised document title. We'll talk more about titles and sections in later chapters of this book. For now, let's assume that a category number identifies the topic of the manual, where 1 refers to utilities.
Next, Os indicates the operating system of the system running the formatter. If left unspecified, the formatter will return the current operating system (e.g., OpenBSD 4.9, Linux 2.6.32-5, or Microsoft Windows XP).
Note that text following the \" marker is an mdoc comment, which has the following syntax:
Comments are line-scoped, like Qq:
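A short sketch of the two placements a comment commonly takes: on a line of its own, or trailing a macro line. The comment text is, of course, made up for illustration.

```roff
.\" A comment on a line of its own; nothing here reaches the output.
.Os \" A comment trailing a macro; Os still sees no arguments.
```

In both cases the comment runs from the \" marker to the end of that line and no further.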
Moving along, Nm accepts the manual's name. This differs from the title, Dt, in that a single manual may document multiple components. We'll see examples of this in later chapters. Finally, Nd accepts a brief, one-line description of the command.
You can see that we re-invoke Nm in the SYNOPSIS, only without arguments. The formatter is smart enough to fill in its argument with the last supplied argument, in this case being hi.
Since our simple command has no command-line arguments, its invocation is simply the command name.
Piecing this all together, we now have the following.
In this example, you'll have noticed that \(dqhello, world\(dq behaves the same as the Qq invocation. In mdoc, quotation marks signify literal strings. Thus, we used an escape sequence \(dq to render ".
You may ask why not just use Qq, such as
For the time being, assume that Nd must have its scope on the invocation line. Strictly speaking, we could have written
but this encourages dangerous behaviour in assuming that quoted arguments may not affect output. This isn't always the case! We'll see later how quoted terms on macro lines change the grouping of arguments — at times non-intuitively.
Before moving on to the next section, let's look quickly over our checklist for a well-formed manual.
To the effect of the exit status, let's modify the DESCRIPTION slightly for clarity.
Of course, our command must actually do so! For simplicity's sake, let's assume that this is the case.
With our simple, well-documented example in mind, let's move on to a more realistic UNIX command.
Most UNIX commands have flags, arguments, return values, environmental variables, and so on. So let's expand upon our example to include arguments for writing to an output file and a flag for outputting in uppercase letters. Furthermore, we'll accept an optional prefix string on the command-line, and return non-zero on failure.
This changes two parts of our manual: the SYNOPSIS section, where we'll record the invocation syntax of our command; and the DESCRIPTION, where we'll describe the command-line options. We'll also add a new section, EXIT STATUS, to describe the non-zero exit on failure.
Let's start by documenting our command-line options in the SYNOPSIS section:
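A sketch of a SYNOPSIS section that produces the rendering discussed next. It uses only the Nm, Op, Fl, and Ar macros named in this section; spelling out the name after Nm is a choice made here so the fragment stands alone.

```roff
.Sh SYNOPSIS
.Nm hello
.Op Fl C
.Op Fl o Ar output
.Op Ar prefix
```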
The output renders as follows:
hello [-C] [-o output] [prefix]
Already, we begin to see the output take shape with the C and o characters, and the prefix. It's also clear that the Op macro surrounds its arguments in square brackets, just as Qq surrounded its line in double-quotes.
But how did the formatter know to prefix the C and o with a dash, or underline the arguments output and prefix?
It's obvious this has something to do with Fl and Ar.
Macro lines may in fact consist of multiple macros, sometimes nesting further macros, sometimes closing prior scopes to begin one anew. The Fl and Ar words are macros nested within the scope of Op. However, while Op contains both of these child scopes, the Ar macro closes out the Fl scope and begins its own.
Outer parts are an outer scope, while inner parts are an inner scope. Now it's easy to see how Fl prefixes only the C with a dash and not the arguments following: its scope is closed out by Ar.
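To make the nesting concrete, here is the line for the o flag with each scope spelled out in comments. This is only an annotated restatement of the line as described above.

```roff
.\" Op: outer scope, the whole line, wrapped in square brackets.
.\" Fl: inner scope holding "o", printed with a leading dash.
.\" Ar: closes the Fl scope and opens its own, holding "output".
.Op Fl o Ar output
```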
Note that to document a flag Ar, we would need to quote its arguments as Fl "Ar" (we'll later learn how to escape arguments with zero-width spaces to accomplish the same). As there are many mdoc macros, a popular novice mistake is to unknowingly invoke a macro when expecting to print text.
With our command syntax documented, let's document the arguments themselves. To do so, we detail the meaning of flags and arguments in the DESCRIPTION section.
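A sketch of what such a DESCRIPTION might look like, using the macros discussed below (Pp, Bl, It, El) and ending with the space-separated period examined later. The descriptive sentences are assumptions made for illustration.

```roff
.Sh DESCRIPTION
Print
.Qq hello, world
and exit.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl C
Print in uppercase letters.
.It Fl o Ar output
Write to the file
.Ar output
instead of the screen.
.El
.Pp
If given, the greeting is preceded by the string
.Ar prefix .
```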
Immediately, we see the introduction of several new macros: Pp, Bl, It, and El. More interestingly, we notice the text on the Bl begins with a dash, just as when passing arguments on a command line. This is the first instance of a macro that accepts flags.
The rendered output of this fragment is as follows.
It should be clear that the Pp macro, which always stands alone, introduces a vertical paragraph break.
Earlier, I introduced the concept of a multi-line scope for Sh, which was closed and re-opened by subsequent invocations of Sh. In this fragment, the Bl macro (for begin list) is explicitly closed out by the El macro (end list). This is an example of explicit scope closure, versus the implicit scope closure of Sh sequences.
Predictably, the Bl and El enclosure consists of list items, begun by the multi-line It macro lines. Like Sh, the It macro has its scope closed by subsequent invocations of It. As expected, its scope also closes when the surrounding list is closed with El.
Until now, we've discussed only macros and macro arguments. But a handful of macros — Bl included — also accept flags which themselves may have arguments. In our example, the tag flag to Bl stipulates a tagged list. A tagged list entry consists of two parts: a tag and data, similar to the <DL> descriptive lists in HTML consisting of a key and data. Bl accepts a second flag, width, which accepts the argument Ds. This instructs the formatter that the tag portion of the list has width Ds, which is shorthand for default spacing.
Next, let's look closer at the input line
Note that it's correctly rendered with the period flushed up against the text, whereas the period is space-separated in the input. (The period itself isn't font-decorated, although this is difficult to see in the medium you're reading.)
By making the punctuation a separate argument, we distinguish it from the term prefix, and thus it is not underlined. The formatter is smart enough to distinguish standalone punctuation. When writing an mdoc manual, punctuation should always be separated from macro arguments unless it's part of the argument itself. This allows the formatter to correctly intuit end-of-line spacing.
If we hadn't done so, the formatter wouldn't distinguish period from word. This is more intuitive when re-using the familiar Qq.
We can now see the difference in the placement of punctuation:
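The two input forms can be set side by side as a small sketch. The comments describe where the period lands relative to the closing quote in each case.

```roff
.\" Period as its own argument: it is placed after the closing quote.
.Qq hello, world .
.\" Period attached to the word: it ends up inside the quotes.
.Qq hello, world.
```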
Let's piece this all together. You'll recognise the Dd, Dt, and Os macros from the last section, although the Dt argument has changed with our command name.
Notice that we don't repeat the Op macros in the DESCRIPTION, although we stipulate them in the SYNOPSIS. This is because we document the flags and arguments themselves in the DESCRIPTION, not the calling syntax of the command.
Finally, let's accommodate command errors by stipulating the exit status of the command. To do this, we add a new section to the end of the manual, EXIT STATUS, consisting of a single macro. We didn't add this to hi.1 because we didn't stipulate any exit state; however, it's good practice to always include this section, even if your command only exits in one way.
The Ex macro is special in that it always accepts a flag, std. This is by convention. Although you can specify an argument to Ex, it works like Nm without arguments in that it reproduces the name of the document as last invoked with Nm. It prints a standardised message about the exit status of the command.
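Putting the pieces of this section together, a complete hello.1 might look like the following sketch. The date and the descriptive sentences are placeholders chosen for illustration; the macros are those introduced above.

```roff
.Dd May 30, 2011
.Dt HELLO 1
.Os
.Sh NAME
.Nm hello
.Nd print \(dqhello, world\(dq
.Sh SYNOPSIS
.Nm
.Op Fl C
.Op Fl o Ar output
.Op Ar prefix
.Sh DESCRIPTION
Print
.Qq hello, world
and exit.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl C
Print in uppercase letters.
.It Fl o Ar output
Write to the file
.Ar output
instead of the screen.
.El
.Pp
If given, the greeting is preceded by the string
.Ar prefix .
.Sh EXIT STATUS
.\" No name given: Ex reuses the name last supplied to Nm.
.Ex -std
```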
With our manual complete, let's go over our checklist.
Of course, most real manuals have many other useful bits of information, such as author names, referenced standards, files, and so on. I'll describe these in detail in later chapters of this book.
I now introduce a case study of a real-world manual, in particular the echo utility from OpenBSD. The original file may be viewed on-line at src/bin/echo/echo.1, file version 1.20. I choose this mainly because of its simplicity.
These initial comments are automatically created by the source-control system cvs, which fills in information about the last editor. I'll talk about revision control and those funny dollar-sign enclosures in Part 3. These particular comments indicate that the file was initially imported from NetBSD in 1995, where it was last edited by cgd (a system name, not the user's real name). It was last edited in OpenBSD, its current form, by jmc in 2010.
If you're keeping your manual under source control, it's usually a good idea to begin your file with a similar line.
A tab character separates the comment marker from the text. Again, this will be covered later in this book — don't worry if it looks strange.
This long comment is the license and copyright of the source file. Of course, our use of this source file is compatible with the license, as may be read from the text itself!
This comment is of historical note. The @(#) sequence was inserted by the sccs utility (Source Code Control System). Although this utility is part of POSIX.1-2008, it has mostly been replaced by cvs. You'll probably never encounter this string in your own manuals, so it's safe to disregard.
At this point the manual content itself begins.
This indicates that the manual's title is ECHO in category 1 (utilities) for the currently installed operating system. The $Mdocdate$ enclosure is similar to the $OpenBSD$ enclosure at the top of the file.
This documents a single command, the echo command, which does as mentioned.
The command accepts a single optional flag, n, and an arbitrary number of optional arguments string. Note that re-stating the command name for the Nm is superfluous in this case.
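Based on that description alone, the SYNOPSIS would look roughly like the sketch below. It is reconstructed from the text here rather than copied from the OpenBSD source, so treat the exact layout as an assumption.

```roff
.Sh SYNOPSIS
.\" The name after Nm is the superfluous re-statement noted above.
.Nm echo
.Op Fl n
.Op Ar string ...
```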
The DESCRIPTION opens with a brief explanation of the utility and its output. The strange set of backslash-escaped characters \ \& is required to make the doubly-nested macros Pq and Sq (parenthesise and single-quote, respectively) correctly enclose a single space.
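As a sketch of that construct in context: only the Pq Sq line is taken from the discussion above, while the surrounding sentence is paraphrased for illustration.

```roff
The
.Nm
utility writes its operands, separated by single blank
.\" "\ " is an escaped space; "\&" is zero-width and keeps it from
.\" being discarded as trailing whitespace.
.Pq Sq \ \&
characters, to the standard output.
```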
This follows the standard form of documenting flags and arguments as a term/definition list. Each one — in this case only one — is documented in alphabetical order.
Notes the standard exit sequence. Note that the argument to Ex is superfluous, as only one command is listed for the manual.
Although these weren't cited in other sections of the manual, the author felt it necessary to link to them. This is probably because both csh and ksh include internal implementations of a function by the same name.
This last section fully describes the utility's conformance to the POSIX standard, which is very important to those writing portable utilities. The St macro expands into the relevant standard's full name, IEEE Std 1003.1-2008 (“POSIX.1”). For a full list of standards, consult your local documentation for the macro.
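A minimal sketch of such a section, assuming the St argument -p1003.1-2008 (which corresponds to the expansion quoted above). The sentence around the macro is illustrative wording, not the manual's own.

```roff
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
```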
Programming functions are a significant part of the UNIX canon, from the system call interface to the C library. If you're a C or C++ developer, chances are you've at least glanced through the manuals of functions such as socket, printf, or memmove.
In general, the mdoc macros used for documenting programming functions are the same as those used for Commands; however, there are some domain-specific bits to annotate the various parts of function versus command invocation. You'll see that each command invocation macro, such as Fl for a flag, has an analogue for programming functions, such as the Fa, for a function argument.
The mdoc format is used primarily for the C language and Fortran, but it works with C++, Perl, Tcl, and other imperative languages. In fact, most any language with functions (or subroutines) and variables will work, typed or not. In this book, I focus exclusively on the C language. This is due to the overwhelming presence of C libraries and functions documented with mdoc.
Before beginning, we need to change our mental checklist for a complete manual. Since function manuals can document more than just function parts, our manual must grow to account for complexity.
A function is any callable instruction, be it a C function, routine, or macro. A variable may also be a C variable or macro. I'll consistently use the function and variable terminology in this book.
In general, you don't have to be knowledgeable of C to understand this section, but it helps to have a grasp of basic programming structure, such as functions, function prototypes, and header files. In any event, I'll refer to a header file as a text file consisting of function prototypes. Header files for the C language, such as in our examples, end with the .h suffix. A C function prototype indicates the calling syntax of a function, such as the following.
int isspace(int c);
In this, the C function isspace, notationally referred to as isspace, has a single argument int c (an integer named c) and returns a value of type int (another integer). Multiple arguments are comma-separated.
Let's study a simple C function, hi, which prints hello, world just like in prior sections. We begin with the familiar first macros.
All that's changed is the manual category, from 1 to 3. We'll discuss manual categories later in this book. Suffice to say that programming functions and libraries (not system calls!) are in category 3.
The calling syntax of our function is documented in the SYNOPSIS section. Assume that our function prototype is within the header file hi.h as void hi(void), which, in non-programming terms, declares that a function hi accepts no arguments and does not return a value.
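A sketch of the corresponding SYNOPSIS, assuming the header hi.h and the prototype void hi(void) given above.

```roff
.Sh SYNOPSIS
.In hi.h
.Ft void
.Fn hi void
```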
This introduces three unfamiliar macros. The In macro specifies an include file that interfacing programmes will need to include. The Ft and Fn macros collectively document the function (return) type and function name. Since not all languages have return types, the Ft macro is optional in this context.
By now it comes as no surprise that Ft is scoped to the end of its line, as is Fn in this example. In fact, both of these macros are syntactically similar to the Ar and Fl found in the first chapter: their scopes are closed by subsequent macros on the same line.
Since our function has no arguments or return values, all we need to do is add some bits in the DESCRIPTION section to complete this manual.
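One possible DESCRIPTION for this function is sketched below. The sentence is an assumption; the point is only that the function is named with Fn.

```roff
.Sh DESCRIPTION
The
.Fn hi
function prints
.Qq hello, world
to the screen.
```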
Here, you'll notice a difference between a function and command manual. We refer back to our function using Fn instead of Nm. Why is this? The Nm macro, when used in the body of a manual, refers to the command name, not the manual name, just as we used Nm to annotate the utility name in the SYNOPSIS. For functions, we use the Fn macro. The difference is that Fn won't repeat the manual name if used without arguments. This complexity is simply the result of poor planning in the mdoc language.
Let's visualise the output so far:
Lastly, let's stipulate the function return value in its own section, RETURN VALUES. This mirrors the EXIT STATUS introduced for hello.1. Although we don't have a return value, it's good practice to include this section anyway.
Although this example is instructive, it's quite simple. Let's review our checklist before moving on.
Very few real-world functions are so simple. In the next section, we introduce a more practical function with types and arguments.
Let's also study a function form of the elaborate command example. Again, I'll use C as my example. Since this is a bit more involved, you may feel a little lost if you're not familiar with C programming. I'll keep the technical jargon to a minimum.
Let's re-write hi as hello, accepting a Boolean (zero or one) integer of whether to capitalise, and an optional character string (a word) prefix. Let's also stipulate an integer return value.
If you're not familiar with C, the const char * and int parts are part of the C language. Note that the C and prefix names haven't changed.
The include file (In) and function return type (Ft) are unchanged but for the type (int instead of void). I've added an explicit-scope macro pair Fo and Fc, syntactically like Bl and El, that encloses the function's arguments.
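A sketch of the SYNOPSIS just described, assuming the header is named hello.h and the arguments are int C and const char *prefix.

```roff
.Sh SYNOPSIS
.In hello.h
.Ft int
.\" Fo opens the prototype scope; Fc closes it.
.Fo hello
.Fa "int C"
.Fa "const char *prefix"
.Fc
```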
This renders as follows. Note that the formatter is smart enough to comma-separate the Fa macros.
It's clear that the Fo macro accepts the function name (as Fn did for the simple example), but it also opens a function prototype scope. This scope is closed by Fc. The contained Fa macros are for function arguments.
If you're wondering why I didn't use Fn as in the last section, it's a matter of readability. Using Fn puts everything on one long line, such as the following.
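For comparison, the single-line form with Fn would look like this sketch, using the same assumed argument names.

```roff
.Ft int
.Fn hello "int C" "const char *prefix"
```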
This works with two arguments, but can quickly run into long lines. In general, your mdoc manual shouldn't exceed 78 characters per line. Shorter lines are useful when managing manuals in cvs or other version management systems — we'll discuss this in later sections of this book.
The quoted arguments to Fa may seem superfluous, but each argument to the Fa, for the C language, refers to a type and variable name. Since one may specify several arguments to a single Fa, the quotes are necessary for signifying a single argument type and name.
This renders as follows, with the Fa scope highlighted.
In the C language, function prototypes don't necessarily need named function arguments. However, it's good practice to name arguments when documenting them in the SYNOPSIS so that we can consistently refer to them later on in the manual. Let's refer to them now in the DESCRIPTION, where we document our arguments.
Note that there are no set conventions for documenting function arguments in the DESCRIPTION body. Sometimes this is done within the flow of a regular sentence. Other times, as below, we'll introduce each argument as part of a list.
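A sketch of the list form, with each argument introduced by Fa. The explanatory sentences, and the use of Dv for the NULL constant, are assumptions made for illustration.

```roff
.Sh DESCRIPTION
The
.Fn hello
function prints
.Qq hello, world .
Its arguments are as follows:
.Bl -tag -width Ds
.It Fa C
If non-zero, print in uppercase letters.
.It Fa prefix
If not
.Dv NULL ,
a string printed before the greeting.
.El
```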
Here, we see the familiar Bl and El list enclosure. Notice how I re-use the Fa macro to document individual arguments, just like I re-used Fl and Ar when documenting command-line flags and arguments. In the last section, I mentioned why we use Fn instead of Nm for re-stating the name.
This renders as follows.
Finally, let's add a section documenting the return value of this function. This will differ from the simple example in that we actually return a value.
Piecing this example together, we have the following respectable C function manual.
Running through our checklist, we see that we've described preprocessor information with the header file macro In; function calling syntax and types in the SYNOPSIS; and arguments in the DESCRIPTION along with function operation. This contains all we need to know about the function.
I now introduce a case study of a real-world function manual, in particular the manual for the strtonum function, which is an extension to the C Standard Library found in OpenBSD. The original file may be viewed on-line at src/lib/libc/stdlib/strtonum.3, file version 1.14.
In this case study, I've chosen a manual with some bad behaviour — not broken, but bad. This is intentional to show how real-world manuals deviate from recommended forms. I'll explicitly note each instance of bad behaviour as we explore the manual's contents.
This is the standard comment header to manual files in OpenBSD. Indeed, most distributed manuals begin with a copyright notice, then a license. The $OpenBSD$ line is automatically updated by the revision control system, cvs, whenever an update to the file is committed. The line following is the copyright message, and following that is the text form of the ISC license.
These three standard macros establish the last modified date, manual title (same as the single documented function but capitalised), manual category 3 (functions), and the default operating system. The $Mdocdate$ line, like the $OpenBSD$ line, is automatically updated by cvs whenever the document is committed to the source repository.
Declares a single documented function, strtonum, and its purpose. The quotations within the Nd macro are superfluous: like the Qq macro studied earlier, Nd accepts an arbitrary number of arguments to format. Quotation, in grouping these as one argument, serves little but to pass in whitespace (there is no special whitespace to pass in).
This declares the function prototype and calling syntax. First, let's examine the new Fd macro. The use of this macro for a header inclusion is obsolete: new manuals should always use In. This makes it easier for parsers to understand a header file — and possibly link to it — instead of being a generic preprocessor statement. The re-written form would begin as follows:
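A sketch of how the opening of the re-written SYNOPSIS might read, with the obsolete Fd form kept as a comment for comparison. The header stdlib.h is where OpenBSD declares strtonum; the exact Fd line in the original file is an assumption.

```roff
.Sh SYNOPSIS
.\" Obsolete form: .Fd #include <stdlib.h>
.In stdlib.h
```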
Moving along, we see that each function argument includes its name (e.g., nptr and minval). While not common in header file prototypes, this allows later references of function invocation in the manual to refer back to the prototype for type and context information. In the previous section, we discussed the relevance of quotation with Fa: the same is done here.
While we could have used Fn, it would have created an overly long input line. Using Fn instead of Fo is purely a matter of convenience and has no effect on parsing or formatting.
In the SYNOPSIS, the Fa included the full type information. Here, however, we use Fa with just its name, nptr. We could have done the same in the SYNOPSIS, but the C language includes all type information in its prototypes.
The Li macro here isn't good practice: since the long long refers to a type, it should be of type Vt. This behaviour (using a presentation macro instead of a semantic one) is a holdover from legacy manual forms that are purely presentational. If you find yourself applying a style, think twice whether it's a good idea!
The remainder of the DESCRIPTION section has completely captured the calling syntax and behaviour of the function. The use of the Ql macro is simply to set aside non-alphanumeric characters from the regular stream of text.
Since this function returns a rather tricky error message, it's necessary to describe the effects of both the return value and the passed-in arguments.
Many manual readers jump directly to the EXAMPLES section to gain an understanding of your function. Thus, not only must the example compile and run, it must also demonstrate as many parts of the function as possible. In the case of strtonum, an error condition and a non-error condition are documented. However, the header file inclusion(s) are missing, which may mislead readers. In particular, the non-standard errx function requires the err.h header file.
The ERRORS section will be rigorously covered in the section on System Calls. In brief, since the errno global error variable is set, each possible value must be documented in a list using the Er macro. These are always enclosed within Bq.
Furthermore, the error string in errstr must also be documented.
This section collects all references to other manuals made elsewhere in this manual, then adds more for completion. Note that the entries are alphabetically sorted.
Since this function is included in OpenBSD's C Standard Library, the fact that the function is not standard must absolutely be documented. In this, the Ox macro indicates the OpenBSD operating system (each BSD UNIX operating system has its own macro).
Last edited by $Author: kristaps $ on $Date: 2011/11/05 16:50:11 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
I've mentioned several times that the name provided to Nm doesn't necessarily refer to the title of the manual in Dt. Let's study a simple function library, using both hi and hello, which demonstrates this concept.
A function library is a collection of object files, which consist mainly of programming functions, within a single file called a library. On most UNIX systems, you can find libraries installed in /usr/local, ending in .a or .so.
This example applies to any number of functions belonging to the same library, not necessarily all functions in the library. In fact, one commonly finds large libraries spread over many manuals, each of which contains several similar functions.
For simplicity's sake, I'll call this C function library libgreeting, implying that the installed library is called libgreeting.a or libgreeting.so. It will consist of two header files, hi.h and hello.h, containing the function prototypes for hi and hello, respectively.
Let's begin with the first few macros, which are also called the manual prologue.
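A prologue for such a manual might look like the following sketch. The date is a placeholder of my choosing; the title and category follow the libgreeting example.

```roff
.Dd November 4, 2011
.Dt GREETING 3
.Os
```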
Note that I've changed the document title to be GREETING instead of choosing between function names. This is because the manual documents the entire function library, not just one particular function. In general, a function library should have its name not include the leading "lib".
It's a good rule of thumb that the Dt title of your document matches its filename.
Next, I'll list the names of the functions being documented. I also change the description of the manual to be more generic, just in case I want to add new functions later.
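A sketch of such a NAME section follows, with hi listed first as the surrounding text assumes; the one-line description is a hypothetical one. Sorting the names alphabetically would instead put hello first.

```roff
.Sh NAME
.Nm hi ,
.Nm hello
.Nd functions for greeting the user
```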
Here I've used Nm twice to indicate that the manual documents two functions. In doing so, I'll have to be careful when invoking Nm in later parts of the manual, as it will produce hi if I don't specify a name, and this is probably not desired (nor should it be depended upon, as I may re-order the names).
If we were only documenting a single function in a library, we would only assign Nm and Nd to the relevant function and not that of the library.
It's good practise to alphabetise the function names in the NAME section. We must also be sure to comma-separate each name, leaving the last invocation without a comma. Let's look at the output so far.
Even though that is hard to maintain and not very useful, some operating systems, for example FreeBSD and NetBSD, require a LIBRARY section for base system libraries. For portable libraries, do not include such a section.
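On such systems, the section might look like this sketch. The library name libgreeting is our running example and will not be in any system's list of known libraries.

```roff
.Sh LIBRARY
.Lb libgreeting
```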
This uses the macro Lb, which accepts the name of the library starting with "lib". This macro is not portable because the list of known library names is system dependent, so it will produce different output on different systems, which is not desirable for a manual page.
The SYNOPSIS section will simply be a collection of the calling syntaxes for both functions, which we've already studied. If we were only documenting one function, we would list only that function here.
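A sketch of such a SYNOPSIS follows. The return types and the name argument are assumptions made for illustration; only the header and function names come from the example library.

```roff
.Sh SYNOPSIS
.In hi.h
.In hello.h
.Ft int
.Fn hi "const char *name"
.Ft int
.Fn hello "const char *name"
```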
Note that I've listed both include files prior to the function prototypes. This is familiar to C programmers, where functions may have multiple include files that need a specific order. The functions are listed in the same order as their Nm listing.
Let's examine the output so far.
Already, a manual reader has lots of pertinent information: the name of the library, its header file, and the function calling syntax. Let's continue in documenting the functions and their arguments, but this time, we'll do so in a different style than before.
Instead of using lists, we describe each function as a free-form stream of text. We depend on the SYNOPSIS to hint the reader as to the function argument types; there's no need to re-state them.
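A free-form DESCRIPTION in this style could be sketched as follows. The behaviour described is invented for the example, and name is a hypothetical argument.

```roff
.Sh DESCRIPTION
The
.Fn hi
function prints a short greeting addressed to
.Fa name .
The
.Fn hello
function prints a longer, more formal greeting.
```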
Notice how each sentence in this fragment ends on its own line. By doing so, the formatter is able to recognise the end of sentence and correctly handle sentential spacing. In most cases, this means adding two spaces between the period and subsequent text. From this follows a rule of thumb: new sentence, new line.
In this DESCRIPTION we've captured what each function does and what its arguments are. What remains are return values.
Let's collect these fragments into a single document and see if it's enough to use as a programming reference.
We'll use our mental checklist as a guide. First we stipulated linking information with the Lb macro. Then we introduced the calling syntax of each function, naming their arguments. We also stipulated the necessary header files in the order they'd be included in source files. In the DESCRIPTION, we described each function and its arguments in full. Lastly, we documented return values in the RETURN VALUES section.
From this information, a programmer should be able to interface with our library.
Last edited by $Author: schwarze $ on $Date: 2016/03/22 14:28:44 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
I now introduce a case study of a real-world function library manual, in particular the manual for the getc, fgetc, getw, and getchar functions from OpenBSD. The original file may be viewed on-line at src/lib/libc/stdio/getc.3, file version 1.12. This is not the manual for the full function library, but only a handful of similar functions.
This is the standard comment header to manual files in OpenBSD. The $OpenBSD$ line is automatically updated by the revision control system, cvs, whenever an update to the file is committed. The line following is the copyright message, and following that is the text form of the BSD license.
This classifies our manual in category 3 as a function or function library. The title of the manual, GETC, is chosen as the most general of those functions listed below in the NAME section.
Lists (alphabetically) all the functions that will be documented, and some general notes about their collective function. We next jump down into the SYNOPSIS; since this set of functions is part of the C Standard Library, it needs no special linking information.
This documents the calling syntax of all functions. Note that the Fd macro is used instead of the In macro. This invocation is historically relevant, but new manuals should always use In.
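The two forms can be compared side by side. This sketch assumes the header in question is stdio.h, which is where the C Standard Library declares these functions.

```roff
.\" Historical form: the whole preprocessor line is spelled out.
.Fd #include <stdio.h>
.\" Modern form: only the file name is given.
.In stdio.h
```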
Next, each function and its arguments are explained as a free-flowing paragraph. This was probably chosen instead of using a list item for each argument (with Bl) due to the small number of arguments.
The usage of the Em macro is not correct: the Va or Dv macro would have been more appropriate. The same applies to the Li. The mdoc language is semantic, so using presentation macros such as Li and Em is discouraged.
All possible return values are correctly documented in the RETURN VALUES section and relevant functions cross-linked in the SEE ALSO section. Note that the cross-linked manuals are also alphabetically sorted.
Noting standards conformance is extremely important: it allows programmers and administrators to depend on your component in a cross-platform fashion. These functions are part of the C Standard Library.
The BUGS section should be used very carefully — bugs preferably should be fixed. In this section, design bugs have been documented. Whether the CAVEATS section would be more appropriate is up to the manual author.
We found several inconsistent uses of mdoc in this manual. In general, if you find unusual or erroneous macros or styles in UNIX manuals, notify the authors! A bug in a manual is just as important as a bug in the code.
Last edited by $Author: kristaps $ on $Date: 2011/11/04 01:06:28 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
A system call differs from a user-land function in that it triggers the operating system kernel to perform some operation. This usually applies to I/O, such as reading from files or sockets with read. Other than that, system calls are no different from regular functions: they're invoked, have return values, and so on.
In mdoc, however, a system call is a special function consisting of at least one section not found in ordinary function manuals.
The first difference between ordinary functions and system calls is the manual category. Let's study a function khello, "kernel hello", which is similar to the hello function described earlier.
All system calls are in category 2. Furthermore, except in special circumstances, system calls are each accorded their own manual.
I'll use the same descriptive text as in the hello example. Note that for system calls, the hello.h header file should be in the compiler's standard include path. This is usually /usr/include on UNIX systems.
You'll notice I've omitted the LIBRARY section in this example, as system calls by definition aren't a part of a library. Furthermore, I've used the Dv macro to annotate the term NULL as a constant variable.
Let's examine the output so far.
In the hello example, I included a section RETURN VALUES detailing the return value of the function. System calls, however, usually return a standard value and have a side effect of setting the C library errno variable when invoked within a C language context. This is documented with a special macro Rv.
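For the khello example, a minimal sketch of the section is:

```roff
.Sh RETURN VALUES
.Rv -std khello
```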
The std flag is by convention always specified. This macro will produce standard text regarding the errno value and that the function returns -1 on failure and 0 on success.
If you have multiple functions specified in your manual, you must list them individually as arguments to Rv.
Next, the possible values of errno must be specified in the ERRORS section as a list. Let's assume that EFAULT may be set if the pointer is invalid.
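Such a list might be sketched as follows. The wording of the error description is my own, and name is a hypothetical argument.

```roff
.Sh ERRORS
.Bl -tag -width Er
.It Bq Er EFAULT
.Fa name
points outside the process's allocated address space.
.El
```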
The syntax of this list differs from lists we've already encountered. Earlier we used the special term Ds as an argument to width to specify a generic width. Here, we used Er, which is also specified at the start of each list tag (lines beginning with It).
The macro Er specifies a possible value of errno. There are many standard variable names for errno values, such as EFAULT used in our example. When we stipulate this as the argument of width, the formatter is able to translate this into a generic width of most Er macro contents.
You should avoid using this construct unless it's in a conventional way, as it is here.
If your system call is part of an operating system, it's common to add some lines as to when it was added. Let's assume you're adding the function to a fictional Foo OS. Most modern UNIX operating systems have their own macros, such as Bx for BSD UNIX. Be sure to note the version of the operating system.
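Since the fictional Foo OS has no macro of its own, a sketch of the section spells the name out as text; the version number is invented. For a real system you would use its macro instead, for example Ox followed by a version for OpenBSD.

```roff
.Sh HISTORY
The
.Fn khello
system call first appeared in Foo OS 1.0.
```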
Let's put all of these sections together and preview the output.
We can make sure the manual is complete by reviewing the checklist for function documentation.
First we implied linking information by using category two (which does not need to be specially linked). Then we introduced the calling syntax of the function, naming its arguments. We also stipulated the necessary header files. In the DESCRIPTION, we described the function and its arguments in full. Lastly, we documented return values in the RETURN VALUES section and the errors set in ERRORS.
We also added a HISTORY section, which isn't mentioned as part of our checklist but is considered good practise for system calls. In general, a note on historical information is useful to put your component in the general context of related machinery.
Last edited by $Author: kristaps $ on $Date: 2014/04/07 21:27:38 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
I now introduce a case study of a real-world system call manual, in particular the manual for the fsync function from OpenBSD. The original file may be viewed on-line at src/lib/libc/sys/fsync.2, file version 1.9.
The cvs identifiers (both from the current system, OpenBSD, and the import source system, NetBSD), copyright, license, and sccs identifier (from the original system) are presented in the usual way: the $OpenBSD$ and $NetBSD$ lines are automatically updated by the revision control system, cvs, whenever an update to the file is committed. The line following is the copyright message, and following that is the text form of the BSD license.
The manual's last-modified date is maintained with the automatically-updated $Mdocdate$ sequence. Its title is set to the single function's capitalised form, category 2 for system calls under the current operating system.
The Nd macro's arguments are superfluously quoted again.
Again, in historical manuals, Fd is sometimes used instead of the modern In macro. Note also the inclusion of the function argument's name, fd, where regular C prototypes would usually only include the type.
Since fsync is a simple function, its description is fairly straightforward. The single function argument fd is fully described as well.
This is not correct, as it omits information on the errno global error being set. The Rv macro should be used instead.
Most (if not all) system calls set the errno global error upon failure. This, erroneously, was not mentioned in the RETURN VALUES section, but is documented here.
Note that the cross-references in SEE ALSO are ordered first by section, then alphabetically. The Bx is referenced as the origin of the system call. The STANDARDS section is sorely missing, as fsync is a function specified by the POSIX.1-2008 standard.
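A sketch of the missing section, using the St macro with its argument for POSIX.1-2008, could read as follows. The wording is mine, not that of any shipped manual.

```roff
.Sh STANDARDS
The
.Fn fsync
function conforms to
.St -p1003.1-2008 .
```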
We again found several inconsistent uses of mdoc in this case study. Let this serve as a reminder that if you find bad or unusual mdoc in your manuals, notify the authors! A bug in a manual is just as important as a bug in the code.
Last edited by $Author: kristaps $ on $Date: 2011/11/04 01:06:28 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
In the last part, I introduced some mdoc language syntax by way of example. We covered Commands and Functions. In this part, I'll study the structure of the UNIX manual itself.
Historically, the syntax and structure of mdoc derive from roff, a text processing language predating even UNIX. mdoc was in fact a bundle of macros expanded by a formatter into roff — not a separate language. Only recently has mdoc been mature enough to consider as a standalone language.
The general syntax of roff (and thus mdoc) can be traced to the RUNOFF command from the mid-sixties! The conventions of section names and manual categories were formalised later, in the early seventies, with the Version 1 AT&T UNIX Programmer's Manual.
Although the focus of this book is obviously on mdoc, a great deal of its idiosyncrasies derive from roff, so we'll spend some time discussing seemingly-unnecessary complexity in the context of general text processing.
I reiterate that this is not a canonical mdoc reference: mdoc is not a standard, and varies in subtle ways across formatters and operating systems. In this part, I'll discuss only the portable parts of mdoc.
Last edited by $Author: kristaps $ on $Date: 2011/11/04 01:06:28 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
Before studying the structure of mdoc manuals, let's review the language we've seen so far. Foremost, we've noticed that mdoc documents consist only of printable ASCII characters. We noted that a period at the beginning of a line indicates an mdoc macro.
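As a reminder, a macro line looks something like this sketch, which assumes the Qq macro and a greeting as its text:

```roff
.Qq Hello, world
```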
It's safe to say, in this case, that mdoc is line-oriented in that programme flow is in part governed by position on a line. In the case of Qq, we saw how the macro extends to the end of the line. This is also the first notion of scope, specifically scoping to the end of line. We then saw examples where scope covers multiple lines and accommodates for nested macros as well as text.
We were briefly introduced to the concept of macros accepting flags and flag arguments.
Finally, we noted that double-quotes have special semantic significance, which led to the topic of escaped terms such as \(dq for a double-quote character. We also saw how punctuation is treated in special ways when lying at line boundaries.
In this chapter, we'll formalise these concepts. I'll draw my terminology from the literature of formal languages and grammar, but it's not necessary to be familiar with the terms beforehand.
Last edited by $Author: kristaps $ on $Date: 2011/11/04 01:06:28 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
Without exception, a well-formed mdoc document consists only of ASCII printable characters, the space character, the newline character, and in some cases the tab character. Most modern formatters allow for CR+LF newlines \r\n, but this is not portable. Modern formatters also accommodate unlimited line length; this is not necessarily the case for legacy formatters.
Unilaterally, the backslash \ is always interpreted as the beginning of an escape sequence. If a backslash immediately precedes a newline, it escapes that newline, and the current line continues onto the next.
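A minimal sketch of an escaped newline, assuming a hypothetical one-line description too long for one input line:

```roff
.Nd print a greeting that is too long \
to fit comfortably on one input line
```

The formatter should treat both input lines as a single macro line.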
Formally speaking, a macro line is one beginning with a control character. In mdoc, this is traditionally the . character, although historical documents may also use the ' character. This notation extends back to the historical RUNOFF utility.
Control Words:
Input generally consists of English text, 360 or fewer characters to a line. Control words must begin a new line, and begin with a period so that they may be distinguished from other text. RUNOFF does not print the control words.
A line with only a control character followed by zero or more whitespace characters is stripped from input.
A macro line may, in some circumstances, contain more macros. The first macro — the one following the control character — may then be distinguished as the line macro.
On macro lines, the following non-alphanumeric characters are syntactically meaningful. These characters are collectively called reserved characters.
! | punctuation |
" | control character (quotation) |
( | punctuation |
) | punctuation |
, | punctuation |
- | control character (macro argument) |
. | punctuation |
: | punctuation |
; | punctuation |
? | punctuation |
[ | punctuation |
\ | control character (escape sequence) |
] | punctuation |
| | punctuation |
To pass these characters along as literal text, you must either escape or quote them.
If an unescaped space character is encountered on a macro line, it is used to delimit macros, macro arguments, and flags. Multiple consecutive space characters have no effect on output.
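For instance, assuming the Qq macro from earlier, these two hypothetical lines should format identically:

```roff
.Qq Hello, world
.Qq Hello,     world
```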
The spaces between Hello, and world delimit arguments in this case, and produce the same output of Hello, world without extra spaces.
A text line is any line not beginning with a control character. Text lines are never parsed for macros and may consist of any printable ASCII characters. Text lines are concatenated together when forming output, so, except in certain circumstances, newlines are stripped from input. Using a blank text line as a vertical separator is not portable.
If a space character is encountered on a text line, it is reproduced verbatim in the output.
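A sketch of the same greeting on two text lines, once with a single space and once with several:

```roff
Hello, world
Hello,     world
```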
The spaces between Hello, and world will be reproduced in both cases as-is. However, it is considered non-portable to use spaces on a text-line to shape output: HTML, for example, by default collapses whitespace. Secondly, consider whether controlled spacing between text in an otherwise free-form text sequence is appropriate. In most space-retaining cases, such as in source code examples, you're better off with a literal display mode such as covered at the end of this section.
Do not use the space-retaining feature to create double-spaces following a sentential period! See Sentential Punctuation for how to do this properly.
If the first character of a text line is a space character, the output line shall be preceded by a newline. This creates the effect of an implicit literal display.
The portability of this behaviour is unknown. For greater portability (and semantic annotation), a literal display mode should be opened instead with, for example, the Bd literal:
In this example, the compact flag prevents leading vertical space. To effect a vertical space following the literal display, use a Pp.
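A sketch of such a display follows; the C fragment inside it is an arbitrary placeholder.

```roff
.Bd -literal -compact
int
main(void)
{
	return 0;
}
.Ed
.Pp
Text following the display, separated by vertical space.
```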
Last edited by $Author: kristaps $ on $Date: 2011/12/25 14:44:21 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
An escape sequence is any grouping of characters following a backslash \. This may happen anywhere in input. What follows the escape sequence syntactically depends upon the first letter. The following sections describe common escape sequences. The use of any other sequence is strongly discouraged for portable manuals; in fact, the use of any escape beyond \& should be strongly avoided: it makes manuals in different output formats inconsistent depending on their methods of glyph rendering.
Special characters allow the encoding of non-ASCII characters and, in macro lines, the use of reserved characters. Special characters may be invoked anywhere in input.
There are three forms of special character, distinguished by the number of letters in the sequence.
\n | one-letter |
\(nn | two-letter |
\[N] | n-letter |
The n-letter form may be used to express any of the others. For example, \& (a zero-width space) is equivalent to \[&]. The most common escape sequence is in fact \&, a non-printing, zero-width space. When preceding a word, it automatically causes it to be rendered as regular text:
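A minimal sketch of this, assuming a command-line flag that happens to be spelled Ar:

```roff
.Fl \&Ar
```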
If the Ar were not preceded with an escape, it would have been interpreted as the Ar macro instead of the flag Ar. An alternative to this is to quote the argument (see Quotation). The zero-width escape is found more readily in literal contexts, on lines beginning with a period.
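In a literal display, for instance, a line of example mdoc source would otherwise be taken for a macro line. A sketch, assuming we want to show a section header verbatim:

```roff
.Bd -literal
\&.Sh NAME
.Ed
```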
An alternative form of special character is the predefined string. These are legacy roff constructs of an escape sequence that may be programmatically set or unset. The syntax for predefined strings follows:
\*n | one-letter |
\*(nn | two-letter |
\*[N] | n-letter |
The use of predefined strings is discouraged in portable manuals, as available strings may differ between implementations and formatters.
Last edited by $Author: kristaps $ on $Date: 2011/12/25 14:52:52 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
Comments — words in an mdoc document not interpreted by the formatter — are indicated by the special character \".
The comment extends from the special character to the end of the line. If the newline is escaped, the comment only applies to the current line. In other words, the newline escape is commented.
A comment may span an entire line if it's specified as a pseudo-macro, that is, following the control character ..
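Both forms are sketched below; the Nm line is a hypothetical one.

```roff
.\" This entire line is a comment.
.Nm hello \" Everything from the escape to the end of line is a comment.
```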
Last edited by $Author: kristaps $ on $Date: 2011/11/04 01:06:28 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
So far we've considered several different types of macros. A macro is usually a terse, two- or three-character sequence specified on a macro line. In this section I formalise the various types of macros, categorised by their scope rules. As with many other languages, macro instructions are either scoped to one line (following a single instruction), which I call in-line; or to multiple lines (bracketed between instructions), which I call blocked. Block macros are usually invoked on a line of their own, as with Bd, but may also be invoked within a line.
Generally speaking, a macro is syntactically defined as having a macro name and, optionally, flags with optional flag arguments. The arguments to a macro depend on its scope rules.
The hyphen - indicates a macro flag only when the preceding macro accepts arguments.
An in-line macro's arguments are scoped to the current line. Its scope may also be closed out by subsequent macros: an in-line macro can never contain a nested macro. For a complete reference, see In-Line macros in the mdoc reference.
Not all in-line macros accept arguments, and some in-line macros accept a fixed number of arguments.
For example, the regular way of structuring command-line arguments, as described initially in the Elaborate Function guide, is a command flag, followed by flag arguments, followed by regular arguments. We can put most invocation forms on one line as follows.
In this example, Ar, Fl, and Ns are in-line macros. The Op is a block partial-implicit. The Fl macro opens within the Op and is closed by the Ns, which accepts no arguments at all. This suppresses the space between the flag and its arguments (this alternative style is used at times, but discouraged). The arguments are opened by Ar and close at the end of the line.
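A line matching this description might look like the following sketch, in which the flag letter and argument name are placeholders:

```roff
.Op Fl o Ns Ar file
```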
The following is an example of macros with a fixed number of arguments:
The Xr macro accepts the mandoc and 1 arguments, then reverts to accepting text. The Ap accepts no arguments, so it immediately reverts to the trailing text.
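A line consistent with this description might be sketched as follows; the trailing words are my own.

```roff
.Xr mandoc 1 Ap s output
```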
Finally, an example of an in-line macro accepting flags follows:
The argument to St specifies the standard to be printed.
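For example, this sketch asks for the POSIX.1-2008 standard by its flag:

```roff
.St -p1003.1-2008
```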
A block partial implicit macro is similar to an in-line macro in that its scope is restricted to the current line: it is implicit in that it is closed by the end of line (as opposed to block partial explicit macros) and partial in that it only extends to the current line (as opposed to block full implicit macros). Unlike an in-line macro, it accepts nested macros (hence block macro). For a complete reference, see Block Partial Implicit macros in the mdoc reference.
The scope of a partial block macro is always closed by the end of the line; any macros between it and the end of line are interpreted as nested macros. We began this book with the block partial implicit macro Qq. The nested qualities of this macro category may be seen by nesting Qq and Pq on one line.
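A sketch of such nesting, with placeholder text, follows. Both scopes are closed by the end of the line, so the parenthesised words fall inside the quotation.

```roff
.Qq quoted text Pq with a parenthetical aside
```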
Be warned. If you open but do not close a block partial explicit macro before the end of the line, behaviour is not always well-defined as the scope is broken.
A macro seen early on, the Sh macro, is block full implicit. Unlike block partial implicit macros, these consist of multiple lines (they are blocks) and treat the line arguments and multi-line arguments differently (full). For a complete reference, see Block Full Implicit macros in the mdoc reference.
The scope of a block full implicit macro is closed out implicitly, by one of several possible macros or the end of file. The notion of a full macro is obvious when considering Sh:
In this, the macro must separately decorate its line arguments and multi-line arguments. In this case, the line arguments must be bolded while the multi-line arguments must be indented. The Sh macro is closed out by a subsequent Sh or the end of file. Compare this to Ss, which closes out with a subsequent Sh, Ss, or end of file.
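In the following sketch, with placeholder body text, the first Sh takes DESCRIPTION as its line argument, the three lines after it as its multi-line arguments, and is closed out by the second Sh:

```roff
.Sh DESCRIPTION
The
.Nm
utility prints a greeting.
.Sh EXIT STATUS
```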
The simplest multi-line macro is the block partial explicit, which is opened and closed by two separate (explicit) macros. It is partial because it does not differentiate between arguments on the current line or subsequent lines, as opposed to block full explicit macros. The pair of macros involved in a full block macro are called the beginning and ending macros. For a complete reference, see Block Partial Explicit macros in the mdoc reference.
One must be careful, in full block macros, not to break the scope of other block macros, or behaviour is undefined.
We have not yet considered a block partial explicit macro pair in this book. But we can do so by considering Oo and Oc. This pair of macros, for option open and option close, extends the behaviour of Op to multiple lines.
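A sketch of the pair in use, with a placeholder flag and argument:

```roff
.Oo
.Fl v
.Ar file ...
.Oc
```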
The block full explicit macros are full in the sense that arguments on the macro line and arguments following are treated differently (like block full implicit macros). The earliest example of this is the Bl. These macros are explicitly closed by a closing macro and may contain nested macros. For a complete reference, see Block Full Explicit macros in the mdoc reference.
Consider the Bd macro, which does not accept line arguments (most block full explicit macros do not accept line arguments). It is manually closed by Ed.
In this example, the text between the Bd and Ed is treated specially.
Last edited by $Author: kristaps $ on $Date: 2011/12/25 15:03:57 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
The mdoc language, in descending from the type-setting language roff, has significant type-setting capabilities. Punctuation is treated specially in all mdoc documents, both in terms of macro and text lines.
The following characters are considered punctuation:
! | ending sentence |
" | ending enclosure |
( | opening enclosure |
) | ending enclosure |
, | ending |
. | ending sentence |
: | ending |
; | ending |
? | ending sentence |
[ | opening enclosure |
] | ending enclosure |
| | intervening |
These are treated specially by the formatter when used in macro lines and at the end of text lines.
End of Sentence, End of Line.
The end of a sentence should always be at the end of a line. This way, the formatter can recognise a sentence by the punctuation used and insert the correct number of spaces. If supported by the output media (HTML, for example, does not), all modern mdoc formatters use English spacing to mark sentence boundaries.
The "ending sentence" punctuation in the punctuation table marks an end of sentence.
In text lines, sentence punctuation should always occur at the end of the line.
Note, in the last sentence, that the formatter will recognise sentence punctuation even when followed by "ending enclosure" punctuation as noted in the punctuation table.
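A sketch of two such text lines, the second of which ends with a period followed by a closing parenthesis:

```roff
The end of a sentence should be at the end of a line.
(This also holds when a closing parenthesis follows the period.)
```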
However, take care that non-sentence punctuation, such as for abbreviations, does not happen to fall at the line boundary.
In this case, the formatter will interpret Dr. as ending a sentence. In this event, you can either restructure your line or add a zero-width escape following the period.
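The second remedy can be sketched as follows, with an invented sentence:

```roff
The patient was examined by Dr.\&
Smith on Tuesday.
```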
Macro lines are slightly more complicated. The same rules apply, but punctuation marks must be separated by spaces. The formatter will understand the role of the punctuation and remove the spaces accordingly, or reorder sentence and closing punctuation.
The punctuation may be escaped by either a trailing escape, as in the text case, or a preceding escape. In this case it is not considered punctuation, but regular text. Note that this will also cause an intervening space to be printed.
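Both cases are sketched below with a placeholder argument. In the first line the period is treated as sentence punctuation; in the second it is escaped and handled as an ordinary argument to Ar.

```roff
.Ar file .
.Ar file \&.
```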
Non-sentential text line punctuation (commas, parentheses, quotes, etc.) is a matter of literal printing.
The rules for macro lines are the same but for in-line macros, which might decorate individual terms with text. In this case, punctuation as a standalone argument is specially treated in that it is not decorated, and whitespace removed according to the punctuation type (opening, closing).
In the second example, (All and end-of-sentence.) are considered arguments, and thus not accommodated for in terms of punctuation. In the third, the period is escaped and thus considered regular text.
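Three invocations matching this description might be sketched as follows. The choice of Em and the wording between the parentheses are assumptions on my part.

```roff
.Em ( All punctuation is handled at end-of-sentence . )
.Em (All punctuation is handled at end-of-sentence.)
.Em ( All punctuation is handled at end-of-sentence \&. )
```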
Last edited by $Author: kristaps $ on $Date: 2011/12/25 14:44:21 $. Copyright © 2011, Kristaps Dzonsons. CC BY-SA.
Several times I've mentioned how to interpret macro arguments as text — instead of, say, other macros — by quotation. In this section, I formalise the notion of quoting arguments. The issue of quotation is fairly complex owing to mdoc's predecessor, roff.
In short, quoting arguments to macros passes the enclosed text verbatim as a single argument. An obvious case follows:
By quoting Ar, it is passed verbatim to Fl. If not, it would be interpreted as the macro Ar and open a new macro scope. What's worse is that the syntax is entirely legal! This illustrates a minor short-coming of mdoc: beginners may unwittingly invoke macros (such as Ar in our example). Printing a warning would cause more harm than good with well-formed manuals; thus, it's the responsibility of the document author to double-check that macro instructions are properly treated.
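The quoted invocation can be sketched as:

```roff
.Fl "Ar"
```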
This condition could have been avoided by beginning the argument Ar with a zero-width escape, such as \&Ar. The need for quotation is more obvious with the Fn macro:
The syntax of Fn is that it first accepts an optional function type, then a function name, then arguments to the function. These arguments usually include a type followed by a name. In our example, int refers to the function type, foo to the name, and both int and bar as separate arguments.
Our intention, however, was to have int bar considered a single argument. To do so, we would need to quote.
The int bar argument is now passed intact to the macro.
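The difference can be sketched as follows. For brevity this sketch omits the optional function type, so foo is the function name in both lines; the names foo and bar are only placeholders.

```roff
.\" Unquoted: int and bar are passed as two separate arguments.
.Fn foo int bar
.\" Quoted: int bar is passed as a single argument.
.Fn foo "int bar"
```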
To include quotation marks in quoted text, use two quotation marks in a row.
This artificial invocation passes a quotation mark followed by four spaces to the Li macro. It is, however, unwise to use this language component: it's jarring to those expecting symmetric quotes, and easy to mistype, leaving runaway quotes. It's safer to use an escape, such as \(dq, instead of pair-wise quotations.
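A sketch of both spellings of such an argument, a quotation mark followed by four spaces, might look like this. The first line uses pair-wise quotes; the second uses the \(dq escape recommended above.

```roff
.\" Pair-wise quotes inside a quoted argument.
.Li """    "
.\" The same argument with an escaped quotation mark.
.Li "\(dq    "
```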
An mdoc manual is divided into two logical parts: the prologue and the document body.
The prologue specifies information regarding the manual's classification. For the most part, this information does not change over the course of development. It specifies the manual's title (which may encompass multiple documented components) and category, the date of last editing, and other information.
The document body consists of the documentation content. This material changes over the course of development, and is the bulk of the manual page. It minimally consists of the component name, invocation syntax (if applicable), and a description of operation.
The prologue consists at most of the Dd, Dt, and Os macros. These always occur at the beginning of a manual.
The only firm requirement of the mdoc prologue is that the Dd macro comes first: many formatting systems will read up to the first macro to determine the formatting language. If Dd is not encountered first, the mdoc format may not be recognised.
Following the Dd, the prologue is conventionally ordered as first Dt and then Os. The Os macro is usually left without arguments, meaning that the manual applies to the current system.
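A conventional prologue, in the order just described, could be sketched as below. The title FOO, the category 1, and the date are placeholders chosen for illustration.

```roff
.Dd June 3, 1991
.Dt FOO 1
.Os
```

Leaving Os without arguments follows the usual practice mentioned above.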
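After parsing the document prologue, the following is known: the date of last editing, the manual's title, its category (the manual section), and, optionally, its architecture and operating system.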
The date is specified by the Dd macro.
While no particular date format is required, it's best to use the month day, year format, where month is the month in English; day is the day of month; and year is the four-digit year. Arbitrary white-space may separate the tokens, which may also be quoted.
Example of canonical form:
Example of not zero-padded digit form:
Example of quoted-string form:
All of the above examples will normalise to the third of June, 1991. It's especially important that the month be in English, as not all operating systems support localisation.
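The variations permitted by the rule above can be sketched as three alternative Dd lines, all denoting the same date. A real manual contains only one of them.

```roff
.\" The month day, year form.
.Dd June 3, 1991
.\" Extra white-space between the tokens.
.Dd June   3,   1991
.\" The same date as a single quoted string.
.Dd "June 3, 1991"
```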
Some formatters also support a special date format as follows:
This is usually used in conjunction with source-code control systems that automatically change the date. Consult your formatter's manual for whether it supports this feature.
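On systems that support it, this special form is the Mdocdate keyword. The sketch below shows the line as an author might write it and, as an assumption about the expanded layout, how it may look once the revision control system has filled in the date.

```roff
.\" As written by the author, before check-in.
.Dd $Mdocdate$
.\" As it might appear after expansion.
.Dd $Mdocdate: June 3 1991 $
```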
A manual's title identifies the entire manual document. It is always specified in uppercase as the first argument of the Dt macro, which conventionally follows the initial Dd macro.
The title usually corresponds to the file-name of the document, but this is not necessarily the case.
In the case of a single-component manual, such as the manual for a single UNIX command or programming function, the title corresponds to the manual name as specified with the SYNOPSIS Nm macro argument.
In the event of multiple components, such as a programming library, the title usually corresponds to the library name. If multiple commands are specified, such as with aliased names, the canonical form should be used.
Example of a title for the ls utility:
Example of a title for the libgreeting function library, consisting of the hi and hello functions:
If the title is left unspecified by omitting the Dt macro, behaviour is undefined. Usually a formatter will default to an empty string or LOCAL. In general, however, a manual without Dt may be considered incomplete.
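The two titles mentioned above might be written as follows. The second argument is the manual's category; the library title GREETING is an assumption about how libgreeting would be named.

```roff
.\" Title for the ls utility, in category 1.
.Dt LS 1
.\" Title for the libgreeting function library, in category 3.
.Dt GREETING 3
```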
The category of a manual, sometimes called the manual section, specifies the type of component a manual describes. It is specified in the second argument of the Dt macro.
These categories are dictated by convention dating back to the Version 1 AT&T UNIX Programmer's Manual, which describes them as follows.
This manual is divided into seven sections:
- Commands
- System calls
- Subroutines
- Special files
- File formats
- User-maintained programs
- Miscellaneous
Commands are programs intended to be invoked directly by the user, in contradistinction to subroutines, which are intended to be called by the user's programs. Commands generally reside in directory bin (for binary programs).
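These sections have been expanded and formalised in the intervening years, amounting to the following modern conventions.
- 1: general commands
- 2: system calls
- 3: library functions
- 4: device drivers and special files
- 5: file formats
- 6: games
- 7: miscellaneous information
- 8: system administration commands
- 9: kernel functions (used for kernel development)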
There are several refinements to the numerical category convention. Perl, Fortran, and Tcl libraries are often grouped under categories 3p, 3f, and 3tcl, respectively. Perl modules may also fall under 3pm. Tcl libraries are also found in the n category.
Although some common libraries are traditionally referred to with a custom suffix, such as 3ssl for the OpenSSL library, this notation is heavily discouraged.
Manuals for the X Window System, traditionally bundled with UNIX systems, are categorised under X11. Manuals for the popular X11R6 distribution of the X Window System may also be listed under X11R6.
The paper category historically consisted of longer papers, the draft category consists of draft manuals, unass consists of uncategorised manuals, and local consists of local system documentation. These categories are rarely used and should be avoided for portable, readable manuals.
Some manuals, especially those in category 4 or 9, relate only to a particular hardware architecture. Naming the architecture is a useful specifier for the machine-dependent manuals in these categories.
The architecture is given as the optional third argument of the Dt macro.
For a list of possible architectures, consult your local documentation. Safe examples are i386, for 32-bit x86-based systems, and amd64, for 64-bit x86-based systems.
A device manual referring to a particular architecture uses this argument to explicitly note its relevant architecture. In normal manuals, it should not be used.
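As a small sketch, a manual for a hypothetical device driver foo that exists only on i386 might use the third Dt argument like so.

```roff
.Dt FOO 4 i386
```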
Similar to architecture, some manuals only pertain to a particular operating system. This system may be specified to the Os macro of the prologue.
If system is unspecified, the manual is assumed to apply to any operating system.
This form is useful when multiple operating systems have access to local-network administrative manuals, such as in a networked file-system environment. Otherwise, it is rarely used.
The document body begins with the first macro not in the prologue set (Dd, Dt, and Os). The document body consists of the manual content itself, and varies significantly between categories and, of course, the material itself.
The content of the document body is divided into sections. Sections are indicated by the Sh macro.
As described in the introduction, a section consists of its line arguments and all subsequent lines until the end of file or another Sh macro.
By convention, Sh arguments are capitalised. I'll describe conventional sections at length in the next chapter, as they for the most part follow long-standing document conventions.
In general, the document body requires at least the NAME and DESCRIPTION sections, and usually the SYNOPSIS section as well. The first section must be NAME, optionally followed by SYNOPSIS. The DESCRIPTION section must follow either the NAME or SYNOPSIS.
An mdoc document body is divided into sections. The names and ordering of these sections are dictated by convention dating back to the Version 1 AT&T UNIX Programmer's Manual.
- The name section repeats the entry name and gives a very short description of its purpose.
- The synopsis summarizes the use of the program being described. A few conventions are used, particularly in the Commands section. Underlined words are considered literals, and are typed just as they appear. Square brackets ([]) around an argument indicate that the argument is optional. When an argument is given as name, it always refers to a file name. Ellipses ... are used to show that the previous argument-prototype may be repeated. A final convention is used by the commands themselves. An argument beginning with a minus sign - is often taken to mean some sort of flag argument even if it appears in a position where a file name could appear. Therefore, it is unwise to have files whose names begin with -.
- The description section discusses in detail the subject at hand.
- The files section gives the names of files which are built into the program.
- A see also section gives pointers to related information.
- A diagnostics section discusses the diagnostics that may be produced. This section tends to be as terse as the diagnostics themselves.
- The bugs section gives known bugs and sometimes deficiencies. Occasionally also the suggested fix is described.
- The owner section gives the name of the person or persons to be consulted in case of difficulty. The rule has been that the last one to modify something owns it, so the owner is not necessarily the author.
These conventional sections haven't changed much over the years, although more sections have been added and several have changed with evolving UNIX operating system conventions. The full set of modern sections, and their order, is as follows.
Only the NAME and DESCRIPTION sections are required in the document body, although a SYNOPSIS should appear for most manuals as well.
Other sections may be necessary depending on the category. For example, RETURN VALUES is found for most category 3 and 2 manuals; while EXIT STATUS is found for most category 1, 6, and 8 manuals.
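As a rough skeleton, the sections discussed in this book can be laid out in their conventional order as below. This is only a sketch: a real manual fills in each section it uses and omits the optional ones it does not need, and some systems define further sections (LIBRARY, for instance) that are not shown here.

```roff
.\" Required sections (SYNOPSIS for most manuals).
.Sh NAME
.Sh SYNOPSIS
.Sh DESCRIPTION
.\" Optional sections, as the category demands.
.Sh IMPLEMENTATION NOTES
.Sh RETURN VALUES
.Sh ENVIRONMENT
.Sh FILES
.Sh EXIT STATUS
.Sh EXAMPLES
.Sh DIAGNOSTICS
.Sh ERRORS
.Sh SEE ALSO
.Sh STANDARDS
.Sh HISTORY
.Sh AUTHORS
.Sh CAVEATS
.Sh BUGS
.Sh SECURITY CONSIDERATIONS
```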
As discussed previously, a section is begun by the Sh macro and continues until the end of file or another section.
What follows is a description of each required section: if your manual does not have the documented section, it should not be considered well-formed. Do note, however, that some types of manuals lack the SYNOPSIS section.
The NAME section immediately follows the document prologue and is thus usually the first macro of the document body. It specifies the name of each documented component, and provides a brief description of the components as a whole.
The following is an example of the NAME section for a single utility, hi.
The Nd macro should consist of a single line without trailing punctuation or leading capitalisation. As a rule of thumb, this description should be a sentence clause in the imperative mood for commands and functions, or simply a noun phrase for file formats, devices, and miscellanea.
Example imperative:
Example noun phrase:
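To make the two moods concrete, here is a sketch of two alternative NAME sections. The first describes the hi utility in the imperative; the second describes a hypothetical file format called greeting with a noun phrase. The description texts are invented for illustration.

```roff
.\" Imperative mood, for a command.
.Sh NAME
.Nm hi
.Nd greet the user
.\" Noun phrase, for a file format.
.Sh NAME
.Nm greeting
.Nd greeting message file format
```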
In the event of multiple named components, such as a function library or aliased commands, comma-separate each command but for the last. It's common to alphabetically order this listing.
Note that the punctuation should be separate from the macro argument. This allows the formatter to distinguish between the name and trailing punctuation.
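A NAME section for two aliased commands, hello and hi, might be sketched like this. Note the space before the comma, which keeps the punctuation separate from the name.

```roff
.Sh NAME
.Nm hello ,
.Nm hi
.Nd greet the user
```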
The SYNOPSIS section, if specified, follows the NAME section. It specifies the calling syntax of a component; thus, it is necessary for functions and commands. The SYNOPSIS section has a layout dictated by convention, and depends upon the category.
For command manuals in categories 1, 6, and 8, each command must have its full syntax stipulated.
This defines three optional arguments for the hi command: a flag, a flag with an argument, and an argument. Flags should be ordered purely alphabetically, without regard to whether a flag takes an argument. Arguments should also be alphabetised.
Note that if your manual only documents one component, it's unnecessary to re-write the command name for Nm. If omitted, the last specified Nm in the NAME will be used.
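A command SYNOPSIS of the shape described above (a flag, a flag with an argument, and an argument, all optional) could be sketched as follows. The flag letters and argument names are hypothetical.

```roff
.Sh SYNOPSIS
.Nm hi
.Op Fl a
.Op Fl b Ar name
.Op Ar file
```

In a single-component manual, the first line could be written as a bare Nm, relying on the name given in the NAME section.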
Multiple commands are specified in the order they appear within the NAME section.
Since there are multiple Nm macros in the NAME section, it's necessary that we specify the name of each command. In this example, an additional command hi is specified, which has neither flags nor arguments.
Function libraries are more complicated, as they involve more diverse content. A function library SYNOPSIS section consists of all documented material, including header files, functions, variables, macros, and so on.
A minimum function manual consists of a single function call and the header file of its prototype (if in a language requiring header files, such as C):
The header file comes before those functions it describes. If one or more header files are required, list them in the order of inclusion in source files.
If multiple functions are documented, list them in the order they appear in the NAME section.
List any global variables with the Vt and/or Va macro following function prototypes.
Macro definitions, however, should come before the function prototypes. These use the Fd macro and must include the preprocessor directive for the macro.
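Putting the ordering rules above together, a function library SYNOPSIS might be sketched as below: header file first, then macro definitions, then function prototypes, then global variables. The header greeting.h, the macro GREETING_MAX, the prototypes, and the variable greeting_count are all invented for illustration.

```roff
.Sh SYNOPSIS
.In greeting.h
.Fd #define GREETING_MAX 64
.Ft int
.Fn hello "const char *name"
.Ft int
.Fn hi "const char *name"
.Vt extern int greeting_count ;
```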
Some manuals define a range of functions with differing header dependencies. In general it's not a good idea to group these within the same manual. However, if necessary, arrange the functions and variables underneath their header file In macros. These need not necessarily match the NAME section ordering, but should be as close as possible.
The DESCRIPTION section documents the component itself, and is usually the longest. For commands, each command is described in detail along with its arguments. For functions, each function must be described along with its types and arguments.
A command or set of commands is documented in DESCRIPTION with a brief explanation of behaviour, default usage, then a list of arguments. Some utilities state default usage following the argument list; however, manpages beginning with these statements are more readable and economical.
If multiple commands are included, they should be listed in the order they appear in NAME and SYNOPSIS. In this case, remember to name the documented command explicitly whenever invoking the Nm macro. Command exit statuses are documented in EXIT STATUS.
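A command DESCRIPTION in this style (brief explanation first, then the argument list) might be sketched as follows. The utility's behaviour and its flags are hypothetical.

```roff
.Sh DESCRIPTION
The
.Nm
utility prints a greeting to the standard output.
Its arguments are as follows:
.Bl -tag -width Ds
.It Fl a
Greet all users currently logged in.
.It Fl b Ar name
Greet the user
.Ar name .
.El
```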
Functions do not share the "The arguments are as follows" convention that commands enjoy. Most often, a function is described in paragraph form.
A manual for a function with many or complicated arguments may choose the same listed-argument notation as commands.
Above all, you must be careful to document each argument to each function. Function return values are usually documented in RETURN VALUES.
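For a function, the paragraph form might look like the sketch below, which documents the single argument of a hypothetical hello function.

```roff
.Sh DESCRIPTION
The
.Fn hello
function prints a greeting for the user named by
.Fa name .
```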
As discussed previously, a section is begun by the Sh macro and continues until the end of file or another section.
What follows is a description of each optional section. An optional section is not required for a well-formed manual, but may be necessary for a manual of a given type. For example, the EXIT STATUS section is not required, but is necessary for utilities.
For components describing an algorithm, or implementing a generic interface, it's at times useful to document the implementation itself. As this is not relevant to the calling syntax or description of a component, it is relegated to the IMPLEMENTATION NOTES section.
Consider a simple sorting function, mysort.
In general, IMPLEMENTATION NOTES is not used, and is thus rarely found in UNIX manuals.
Manuals describing functions (categories 2, 3, and 9) must use the RETURN VALUES section to document each function's return value. If a manual documents functions in a language without return values, or functions do not return a value, they need not use this section.
System calls (category 2) usually employ the Rv macro to stipulate a standard return value statement.
Note that the std flag is a required argument to Rv, for historical reasons.
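For a system call, the standard statement can be produced with a single line. The function name hello is a placeholder.

```roff
.Sh RETURN VALUES
.Rv -std hello
```

The formatter expands this into the conventional sentence about returning 0 on success and -1 with errno set on failure.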
For non-system functions, be as brief as possible.
Both commands and functions may be affected by UNIX environment variables. These must be documented in the ENVIRONMENT section of the manual. Each variable should be listed with the Ev macro along with its effect on the component.
Some historical manuals use ENVIRONMENT VARIABLES instead of ENVIRONMENT.
Both commands and functions may also be affected by files, although this is mainly the purview of commands. These files should be listed in the FILES section in a tagged list.
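Both sections are commonly written as tagged lists. The following is a sketch only; the variable's effect and the configuration file path are invented for illustration.

```roff
.Sh ENVIRONMENT
.Bl -tag -width Ds
.It Ev USER
The name of the user to greet if none is given.
.El
.Sh FILES
.Bl -tag -width Ds
.It Pa /etc/hello.conf
System-wide greeting configuration.
.El
```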
The EXIT STATUS section is the dual of RETURN VALUES for commands in categories 1, 6, and 8. It documents the exit status of commands.
If your utility exits with zero on success and greater than zero on failure (the standard for most utilities), use the Ex macro.
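For a utility following this standard, the whole section can be as short as the sketch below, where hello is a placeholder name.

```roff
.Sh EXIT STATUS
.Ex -std hello
```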
More complex commands should document all possible exit statuses.
In many situations of casual use, the EXAMPLES section is the first visited in a manual. It should consist of concise, documented examples of the most common uses of your component.
For commands, the EXAMPLES section should contain a handful of common use cases. In general, these should consist of standalone invocations and, if the input and output correspond to other utilities, invocations as part of a pipeline.
Although the hello example is almost too trivial for documentation, consider if it were used to greet new users to a Unix system. Thus, a common example would be the following:
The Dl macro, used for one-line literal displays, is common in the EXAMPLES section. For multi-line displays, use the Bd -literal environment, usually with a default indentation given by -offset indent.
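Both display styles might be sketched as follows for the hello utility. The invocations and user names are illustrative only.

```roff
.Sh EXAMPLES
Greet the current user:
.Pp
.Dl $ hello
.Pp
Greet two named users in turn:
.Bd -literal -offset indent
$ hello alice
$ hello bob
.Ed
```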
For functions and function libraries, it's more common to include a single, thorough source example than individual examples for each function. These always use the Bd literal display environment and an indentation.
Use terse syntax for your example, without error checking for functions not being documented, e.g., open or scanf.
Some manuals will use the vS and vE macros around source code. These are not mdoc and should be avoided in portable manuals.
If your component emits regular debug, status, error, or warning messages, document the syntax of these messages in DIAGNOSTICS.
Some historic manuals document function return values in this section, but modern practise is to do so in RETURN VALUES or, if setting the error constant of the C library, ERRORS.
The Bl -diag lists are most often used for documenting emitted messages.
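A diagnostic list could be sketched like this. The message text is hypothetical; a real manual would reproduce the message exactly as the component emits it.

```roff
.Sh DIAGNOSTICS
.Bl -diag
.It "unknown user"
The named user does not exist on the system.
.El
```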
The ERRORS section is used exclusively by functions that set or return a regular error code. The most common use is for system calls (category 2) setting error constants in the C library. In either case, this section should consist of a single list documenting all possible error codes. When error constants of the C library are set, each error should be labelled with the Er macro.
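A sketch of such a list for a hypothetical function follows. The choice of error constants, and the conditions attached to them, are assumptions made for illustration.

```roff
.Sh ERRORS
.Bl -tag -width Er
.It Bq Er EINVAL
The
.Fa name
argument is empty.
.It Bq Er ENOMEM
Insufficient memory is available.
.El
```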
The SEE ALSO section consists of cross-references to other manuals or literature. It is a standard section in most UNIX manuals. Any components referenced in your manual should be duplicated here along with any other bibliographic texts.
For UNIX manual cross-references, use the Xr macro. These should be specified in a list ordered first by section, then by name. Non-terminal references should be comma-separated.
If your manual references other documents or literature, you may include them in this section within a bibliographic reference as well. The Rs block is used for bibliographic references. These should be specified after any UNIX manual cross-references.
Bibliographic references should be ordered by document title. Advanced references will be covered later in this book.
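Putting these rules together, a SEE ALSO section might be sketched as below: cross-references ordered by section and then by name, followed by one bibliographic reference. The particular manuals cited are arbitrary, and the reference is to the programmer's manual mentioned earlier in this book.

```roff
.Sh SEE ALSO
.Xr cat 1 ,
.Xr printf 3 ,
.Xr mdoc 7
.Rs
.%A K. Thompson
.%A D. M. Ritchie
.%B UNIX Programmer's Manual
.%D 1971
.Re
```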
If your component references any standards literature, it should be mentioned in the STANDARDS section. Most standards (e.g., POSIX and ANSI) may be semantically noted with the St macro. When implementing standardised wire protocols, references to RFC and other literature should also be mentioned here. These differ from referenced standards in that they are implemented rather than merely referred to.
If your component deviates from a given standard, the deviations should be mentioned in this section as well. Some historic manuals use a special COMPATIBILITY section for this, but this is discouraged except when discussing compatibility with non-standard but common utilities.
This section has also been referred to as CONFORMING TO on some GNU/Linux manuals.
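A short standards statement using St might be sketched as follows. The claim of conformance is, of course, hypothetical; consult your local mdoc manual for the list of standards that St accepts.

```roff
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
```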
Some components have a historical basis: this should be documented in the HISTORY section. Keep this information terse and, above all, correct.
If your component supersedes prior implementations, for example, it's common to include the dates and developers of those prior implementations.
Another standard section for UNIX manuals is the AUTHORS section, which should always mention the current contact for the utility. The traditional text for this section is as follows.
However, as e-mail addresses are a ubiquitous form of contact, it's considered good practise to use the correct semantic notation.
The term reference in this fragment should reflect the content of the manual.
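One way to sketch an AUTHORS section with semantic mark-up for the name and e-mail address is shown below. The author and address are placeholders, and the Mt macro is assumed to be supported by your formatter.

```roff
.Sh AUTHORS
The
.Nm
reference was written by
.An Jane Doe Aq Mt jane@example.org .
```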
The CAVEATS section is not often used. It consists of text relevant to unexpected (but unchangeable) behaviour of the component.
If the component has known bugs, they should be listed in the BUGS section. In some historic manuals, authors used this section to state that no bugs were present; however, this text can be misleading for machine-readers of manuals and should be avoided in new manuals.
The SECURITY CONSIDERATIONS section is reserved for components whose deployment may be sensitive to security conditions, such as network daemons. It should include suggestions on security measures beyond the scope of the component.
This section was historically called SECURITY.
By now we've discussed the mdoc language by way of example and a non-authoritative reference. I've even referenced some formatters, such as mandoc and nroff. In this part, I'll focus on the environment of mdoc: formatters, project integration, and so on.
After reading this part, you'll have a much better idea of how to read, write, and format mdoc on your operating system. The bias of this part, however, will be toward UNIX systems.
The most important part of the mdoc tools is the formatter, which compiles an mdoc document into an output format.
cat [-benstuv] [file ...]
All formatters must adhere to the general conventions set forth in the Version 1 AT&T UNIX Programmer's Manual, which details the terms that are in bold and those that are italicised (rendered with underlines in terminals).
Most formatters also support printer-friendly output, usually to PS or PDF. Some also include HTML or XHTML for web publication.
In this book, I'll focus only on contemporary formatters. Originally, mdoc, as a macro set for the roff language, was exclusively formatted by the troff and nroff utilities as distributed with BSD UNIX. Historically, troff was tailored for printers and graphical output, while nroff focussed on terminal output.
Most modern utilities, however, encompass both of these capabilities.
The GNU project wrote the groff utility as a reimplementation of ditroff, which encompassed the functionality of the historical nroff and troff utilities. The first version was released in 1990, and it is still actively maintained. groff is significant in that it is the predominant implementation of nroff and troff on contemporary UNIX operating systems.
On systems with groff installed, both troff and nroff invoke the underlying groff utility. It is able to produce the classical terminal and PS output, along with more recent support for XHTML, HTML, and PDF. It has strong support for non-ASCII output on supporting media. Consult your local groff manual for all possible outputs via the -T flag.
The mdoc implementation in groff was entirely re-written in version 1.17. Prior to this, input documents had some severe restrictions. Most notably, macro lines were limited to 9 arguments, Bl column macros had a restricted syntax, and displays such as Bd could not be nested.
The groff utility is supported on both UNIX and non-UNIX operating systems.
Paging a manual to a UNIX terminal:
To strip the escape-character encoding of output to create clean, printable ASCII output:
Generating PS output:
The mandoc utility is a specialised mdoc formatter: although it also supports some other UNIX manual formats, it does not accept general-purpose roff input. Development began in 2008 to replace groff with an ISC licensed, high-speed reimplementation.
mandoc may be invoked as troff or nroff as its command-line arguments overlap. It supports the classical terminal and PS forms, and has very strong support for HTML and XHTML. PDF output is supported as well.
By considering mdoc as a special language, mandoc compiles its input into a representation of semantic content. This diverges from troff and its descendants, which compile mdoc into its basis form, roff, then into a presentational representation. As such, mandoc is also used for semantically querying manual content and for the rigorous validation of manuals.
The mandoc utility is supported on both UNIX and non-UNIX operating systems.
To validate a manual:
To page a manual in the current locale (if supported) so that non-ASCII special characters render as proper glyphs:
Produce HTML with a style-sheet:
mdoc documents fit perfectly into a UNIX development environment. In general, this is defined by a group of source files that produce executables as compiled and linked by make, called a build system. Sources are usually kept under revision control, for instance with cvs.
In this section, I discuss methods for integrating mdoc documents into a source-controlled build environment. I'll focus on mandoc as a formatter, but provide stubs for using nroff.
Our examples will consider a project building a utility foo from its single source file foo.c.
On modern UNIX systems, the method for build management is overwhelmingly the make utility. Although there are two disjoint make implementations in use (namely those for GNU and BSD UNIX systems), I examine the syntax common to both.
In this section, I'll assume the file Makefile already exists, and is used to build a system where one wishes to incorporate mdoc files.
A rigorous analysis of this syntax is beyond the scope of this book (do consult your system's documentation for the make command with man make). It defines the targets all, clean, and install: build, clean up, and install the utility, respectively.
First, it's important to settle upon an input and output file extension, as make tracks file status by way of comparing the time-stamp of a file's input (which may be multiple files) and output (called the target). In short, if the target is older than any of the input files, it is rebuilt. The input files are created and maintained by the developer; the output files are built by make.
For simplicity, I use the standard .1, .2, and so on convention for the target (the output). I then use .in.1 and so on for input. Thus, it is necessary to notify the make utility of these new extensions before all other rules:
If more categories are built, these would need to be added (e.g., .in.3 .3, etc.).
A build rule is required to translate input to output. Let's begin with a general rule to establish that the mdoc syntax is correct. We'll add this to the target building the main system: this way, all changes to the mdoc input file will be syntax-checked when make is invoked. We'll use mandoc to syntax-check the document.
We also need to build the target and clean it. Assume that foo.1 is the output file and foo.in.1, the input.
Putting these together, the new Makefile is as follows:
Let's build an HTML manual with the make www rule. For simplicity, we won't install this file; it's merely for instruction. This rule will translate the built manual foo.1 into an HTML file foo.1.html.
Let's let our rule include a CSS file. Note that the traditional nroff utility doesn't include HTML output.
The target rule is simply as follows:
The reason for building from foo.1 instead of foo.in.1 is that we may wish to postprocess the foo.1 file after it has been created. However, this is entirely the decision of the programmer.
Several examples in this book have covered the topic of integrating mdoc documents into revision control systems. In this section, I cover the few steps required to integrate these documents with cvs.
Assume a file foo.in.1 consists of our mdoc source. I assume, for simplicity, that it is licensed under the ISC license and copyright-protected, both of which lead the document as a series of comments.
The first step is to add a useful message to the top of the file as the version of the file. This is standard practise in revision controlled files.
Make sure that the first line has a tab character between the leading comment marker and $Id$. This sequence is filled in with the file's last editor, revision, and checkin date.
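As a sketch, the first line of foo.in.1 would then be an mdoc comment holding the keyword, with a tab character (not spaces) after the comment marker.

```roff
.\"	$Id$
```

The license and copyright comments, and then the prologue, follow this line.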
Some cvs servers (e.g., those in NetBSD and OpenBSD) support the Mdocdate sequence. This is filled in by cvs with the check-in date.
In performing these two steps, the file's last-modified date and source identifier will be properly filled in by the cvs server. If your server does not support Mdocdate, you will have to maintain the date by hand, or possibly override the build rule for your file.
Since the UNIX manual has such a rich history of development, it would be strange were it not to have a significant body of supporting tools for composition. In fact, Version 7 AT&T UNIX was bundled with a set of tools, the UNIX Writer's Workbench, specifically for composing documents — spell-checking, grammar-checking, and so forth. Even then, sophisticated editors had long since been a pivotal part of UNIX.
Although the ed, ex, and vi editors are stipulated by POSIX.1-2008, the Writer's Workbench has long since been discontinued. Even spell-checkers are not standard across modern UNIX systems, although many high-quality editors and composition utilities may be downloaded.
In short, the situation is messy: composing mdoc documents (and in fact any roff document) is tricky to do without downloading special software.
However, one of the best parts of mdoc is that none of these specific tools are necessary: an mdoc document is just a text file and may be composed in any editor, and its text contents analysed by any utility smart enough to ignore mdoc syntax.
Since mdoc is an ASCII-clean text format, it may be edited in any text editor. In this section, I introduce a variety of editors available on most UNIX systems. Since this topic is exhaustively covered in almost any introductory UNIX book, I only introduce portable editors.
The ed utility is a line editor standardised by POSIX.1-2008. The concept of a line editor may be familiar to those who have used a typewriter or teleprinter, where only the current line of input may be edited (or viewed, in some cases) at a time.
Its inclusion is largely for historical reasons, as using ed can be a frustrating experience for those accustomed to visual editors. I don't recommend using this utility for mdoc, although, as a line editor, it is well suited to the task.
The vi and ex editors were powerful additions to the UNIX system: they allowed visual editing of files (versus line editing as with ed). vi has inspired a raft of clones and, being standardised, is available in some form on all UNIX systems. Furthermore, the vim clone of vi comes bundled with mdoc syntax highlighting.
Correct spelling matters a great deal in technical documents. Always spell-check your manuals carefully, making sure that both technical and general terms are correctly spelt.
Unfortunately, spell-checking an mdoc document is fairly difficult, as the spell-checker must have some knowledge of the language structure to discern text from macros. Consider spell-checking a line of text that invokes the Fl and Ar macros.
By now we understand that Fl and Ar are macros. But it's unreasonable to expect a spell-checker to do so. Thus, spell-checking manuals often raises many false positives.
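To see where the false positives come from, the sketch below treats a small snippet the way a naive checker would, splitting it into candidate words with no knowledge of mdoc. The snippet text is invented for illustration.

```shell
# A hypothetical fragment of mdoc source using the Fl and Ar macros.
snippet='The
.Fl b
flag numbers the lines of each
.Ar file .'

# Split on anything that is not a letter, as a naive checker might.
candidates=$(printf '%s\n' "$snippet" | tr -cs 'A-Za-z' '\n' | LC_ALL=C sort -u)
printf '%s\n' "$candidates"
```

Fl and Ar turn up in the list alongside the real words, so a checker that looks each candidate up in a dictionary would report both as misspelt.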
The spell utility is distributed with many BSD UNIX operating systems as a simplistic spell-checker. In fact, it was first distributed with Version 6 AT&T UNIX. spell preprocesses its input with deroff, another historic utility that has some ability to strip roff instructions from files.
To print a list of all unknown words, you can explicitly invoke deroff and pipe its output into spell.
A utility distributed with mandoc, demandoc, is significantly more capable than deroff. If available, it should be used instead. It has the same calling syntax as deroff.
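A minimal sketch of that pipeline follows, preferring demandoc where it is installed. The function names are hypothetical, and the -w flag (word-list output, one word per line) is assumed to be accepted by whichever stripper is found.

```shell
# Pick the stronger stripper if present, falling back to deroff.
pick_stripper() {
    if command -v demandoc >/dev/null 2>&1; then
        echo demandoc
    elif command -v deroff >/dev/null 2>&1; then
        echo deroff
    else
        return 1
    fi
}

# Print the words in manual $1 that spell does not know.
unknown_words() {
    stripper=$(pick_stripper) || {
        echo "neither demandoc nor deroff found" >&2
        return 1
    }
    command -v spell >/dev/null 2>&1 || {
        echo "spell not found" >&2
        return 1
    }
    "$stripper" -w "$1" | spell
}

# Usage, for a manual named file.1:
#   unknown_words file.1
```

The guards only make the sketch fail politely on systems lacking the tools; the substance is the single line that pipes the stripper into spell.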
You can also maintain a per-manual list of technical terms by using additional word lists. In the case of file.1, consider file.1.words, a sorted list of special words (such as names) that we maintain alongside the manual. We could then augment a make rule to make sure additions are automatically spell-checked.
Such a rule first makes the build of file.1 depend upon its local word file, file.1.words, a sorted list of words to ignore. When file.in.1 or file.1.words is updated, the rule is executed. It first makes sure that file.in.1 is well-formed, then spell-checks it against the ignored-words file.
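The word list only helps if it stays sorted. A minimal sketch of a helper for adding terms follows; add_words is a hypothetical name, the example terms are arbitrary, and byte-order (C locale) sorting is an assumption that may need adjusting to whatever order your spell expects.

```shell
# add_words LIST WORD...: merge WORDs into LIST, keeping it sorted and unique.
add_words() {
    list=$1
    shift
    [ "$#" -gt 0 ] || return 0
    touch "$list"
    printf '%s\n' "$@" | LC_ALL=C sort -u -o "$list" - "$list"
}

# Example with a scratch stand-in for file.1.words.
words=$(mktemp)
add_words "$words" mdoc Dzonsons
add_words "$words" cvs mdoc
cat "$words"
```

The spell-checking step of the rule could then pass the list along, for instance as demandoc -w file.in.1 | spell +file.1.words, assuming a BSD-style spell that accepts a +list argument.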
The same can be accomplished on systems without mandoc.
Other common spell-checkers are ispell and its GNU replacement, aspell. I do not suggest using these utilities because of their poor internal support for mdoc. It's possible, however, to send them files stripped by demandoc or deroff, in a manner similar to spell. Both ispell and aspell also have a pipe mode for more meaningful output.
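A minimal sketch of both modes with aspell follows, assuming demandoc and aspell are installed; the function names are hypothetical. With ispell, -l and -a should play the same roles as aspell's list and -a, and deroff may stand in for demandoc.

```shell
# List mode: print only the words aspell does not recognise.
misspelt_words() {
    demandoc "$1" | aspell list
}

# Pipe mode: results in the ispell -a protocol, with suggested
# corrections for words that are not recognised.
pipe_check() {
    demandoc "$1" | aspell -a
}

# Usage, for a manual named file.1:
#   misspelt_words file.1
#   pipe_check file.1
```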
The utilities bundled with the historical UNIX Writer's Workbench also allow for grammar and style-checking of mdoc documents — indeed, any document.
As with spelling, these utilities cannot handle mdoc constructions. Unlike spelling, grammar depends on the correct flow of words. Thus, one must fully process input mdoc documents before passing them to such checks.
The diction utility is rarely part of a default UNIX installation, although it may be separately downloaded. The input to diction is best when it consists of well-formed sentences, which only appear once a manual has been formatted.
A suitable pipeline formats the manual with mandoc or, alternatively, nroff. It then strips the text decoration (underlined and bold text) from the formatter's output with col, and strips the header with tail. Finally, the formatted output is fed to the diction utility, which analyses text for readability.
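The col and tail stages can be tried without a formatter. The sketch below feeds them simulated output, assuming a one-line header followed by a blank line and bold text rendered by overstriking; the page content is invented, and the commented pipelines at the end are untested suggestions that also assume diction is installed.

```shell
# Simulated formatter output: a one-line header, a blank line, then a
# heading emboldened by overstriking (letter, backspace, same letter).
formatted=$(printf 'FOO(1)   General Commands Manual   FOO(1)\n\nN\bNA\bAM\bME\bE\n     foo - a placeholder utility\n')

# Strip the decoration with col, then the two header lines with tail.
plain=$(printf '%s\n' "$formatted" | col -b | tail -n +3)
printf '%s\n' "$plain"

# With a real manual, the whole pipeline might read:
#   mandoc file.1 | col -b | tail -n +3 | diction
#   nroff -mandoc file.1 | col -b | tail -n +3 | diction
```

The page footer is left in place here; it is a single line and unlikely to trouble a style checker.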
This part consists of appendices to the book. It links heavily to external resources, although care is taken to provide enough information to make off-line reading meaningful.
UNIX manual pages. System documentation for UNIX systems. Usually viewed using the man utility, which pages formatted manual documents to the screen. Man pages are formatted by a utility such as nroff or mandoc.
This table consists of brief descriptions of the mdoc macros referenced in this book (meaning: this is not a complete list!), with links to full descriptions in the mdoc reference on http://mdocml.bsd.lv. Disclaimer: I'm the principal author of this system.
This is a list of all commands mentioned in this book and how to find them on-line. All referenced utilities are open-source.
Automatically generated from Practical UNIX Manuals: mdoc, Kristaps Dzonsons, CC BY-SA.