Where do UNIX manpages come from? Who introduced the section-based layout of NAME, SYNOPSIS, and so on? And for manpage authors: where were those economical two- and three-letter instructions developed?
The many accounts available on the Internet lack citations and are at times inconsistent. In this article, I reconstruct the history of the UNIX manpage based on source code, manuals, and first-hand accounts.
Special thanks to Paul Pierce for his CTSS source archive; Bernard Nivelet for the Multics Internet Server; the UNIX Heritage Society for their research UNIX source reconstruction; Gunnar Ritter for the Heirloom Project sources; Alcatel-Lucent Bell Labs for the Plan 9 sources; BitSavers for their historical archive; and last but not least, Rudd Canaday, James Clark, Brian Kernighan, Douglas McIlroy, Nils-Peter Nelson, Jerome Saltzer, Henry Spencer, Ken Thompson, and Tom Van Vleck for their valuable contributions.
Please see the Copyright section if you plan on reproducing parts of this work.
Saltzer wrote the RUNOFF utility for MIT's IBM 7094 CTSS operating system in the MAD computer language. Its legacy is considerable: not only do contemporary manpages inherit from RUNOFF, many of them, in fact, use instructions identical to those specified in the original RUNOFF manual.
Input generally consists of English text, 360 or fewer characters to a line. Control words must begin a new line, and they begin with a period so that they may be distinguished from other text. RUNOFF does not print the control words.
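The control-word convention described in the manual excerpt above can be illustrated with a short Python sketch. This is not RUNOFF's own algorithm: the function name split_runoff_input is hypothetical, and the only rule modelled is that a line beginning with a period is a control word and is not printed.

```python
def split_runoff_input(lines):
    """Separate control words from printable text.

    Illustrative only: models just the rule that a control word
    begins a new line and starts with a period.
    """
    controls, text = [], []
    for line in lines:
        if line.startswith("."):
            # Control word: recorded here, never printed.
            controls.append(line[1:].strip())
        else:
            text.append(line)
    return controls, text


sample = [
    ".br",
    "Input generally consists of English text.",
    ".br",
    "Control words are not printed.",
]
controls, text = split_runoff_input(sample)
print(controls)  # ['br', 'br']
print(text)
```

A real formatter would act on each control word as it is read (breaking lines, changing margins, and so on); the sketch only shows how the leading period distinguishes the two kinds of input.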
Of the many abbreviated RUNOFF control words, macros such as br are still commonplace. According to Saltzer and the source literature, the syntax of RUNOFF inherits loosely from the prior DITTO, MEMO, and MODIFY utilities by M. J. Leslie Lowry, Fernando J. Corbató, and J. Richard Steinberg, 1963. The original purpose of RUNOFF was to format Saltzer's doctoral thesis.
While working at AT&T Bell Labs' Whippany centre, Canaday led a porting effort of the CTSS at MIT to the GE-635 (in 635 assembly). The RUNOFF utility is suspected to be part of this port. The ported CTSS was originally intended as a prototype (to be replaced by Nike hardware), but ended up being used for five more years. No sources could be located for this CTSS port.
Little is known about this speculated port of RUNOFF, only that it was called roff and probably ran on Bell Labs' GCOS-II GE-635. Ritchie is commonly cited as participating in this port, and this is not disputed. Canaday is also mentioned as an author, though this is not the case by his own account. Both McIlroy and Saltzer speculate that this version, if written, was likely in BCPL, much like runoff.
In 1967, Madnick ported the RUNOFF code to the IBM CP67/CMS at IBM as SCRIPT. The documentation of SCRIPT explicitly mentions the backspace-encoding convention used to this day by manpage formatters on UNIX terminals (of course, this was common practice on mechanical typewriters before then):
Thus the backspace key allows underscoring and overprinting at the terminal for SCRIPT files. The logical backspace character prints only when entered and does not take up a column in the record; it logically backspaces one column...
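As a rough illustration of the backspace convention, the following Python sketch decodes overstruck text into characters and styles. It assumes the conventions commonly seen in terminal manpage output (underscore, backspace, character for underlining; a character struck over itself for bold); the function name decode_overstrike is hypothetical and the code is not drawn from SCRIPT or any roff source.

```python
BS = "\b"  # the backspace character


def decode_overstrike(line):
    """Turn backspace-encoded text into (character, style) pairs.

    Assumed conventions:
      underscore, backspace, X  -> X underlined
      X, backspace, X           -> X overprinted (reported as bold)
    Any other overstrike keeps the last character typed.
    """
    cells = []  # one (char, style) pair per printed column
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == BS:
            i += 1
            if i < len(line) and cells:
                # Back up one column and overstrike it.
                prev, _ = cells.pop()
                new = line[i]
                if prev == "_":
                    cells.append((new, "underline"))
                elif prev == new:
                    cells.append((new, "bold"))
                else:
                    cells.append((new, "plain"))
                i += 1
            continue
        cells.append((ch, "plain"))
        i += 1
    return cells


print(decode_overstrike("_" + BS + "N" + "A" + BS + "A"))
# [('N', 'underline'), ('A', 'bold')]
```

As in the quoted description, the backspace itself occupies no column: each pair in the result corresponds to exactly one printed position.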
Source code for the original re-write of RUNOFF could not be located, although a considerable amount of documentation exists for this utility. The dates of Madnick's porting derive from his publication and independent accounts.
In 1969, McIlroy released an influential BCPL port of RUNOFF to extend the runoff model to the GECOS GE-645 computer at AT&T Bell Labs, Murray Hill. He did not refer to the CTSS RUNOFF source code in writing runoff, nor to any other speculated derivatives of Saltzer's utility. The progress of this utility after 1969 is recorded in the Multics BCPL source as ported by R. F. Mabee:
The first ROFF for Multics was written in March, 1969, by Doug McIlroy of Bell Labs. Art Evans made extensive modifications to it in May and June, 1969, adding many comments and making various changes. Footnoting added by Dennis Capps in 1970. Maintained by Harwell Thrasher in 1971. Many new features added and bugs fixed by R Mabee in 1971-1972. RUNOFF and BCPL were brought over to the 6180 Multics (from 645) in May of 1973 by R F Mabee.
McIlroy's port is at times referred to as runoff, at times as roff. The reason for renaming is not entirely clear:
I see that I called it runoff in 1969. By 1971 it was roff. Now I'm not so sure I got the name from Morris. Conceivably it came from Thompson, who was big on shortening names. My 1971 description came in January; the first ediion [sic] Unix manual is dated November 1971.
The compose utility, né runoff, was a port of RUNOFF to PL/1 by Capps for the Multics Honeywell 6180. It was later tuned and improved by Ed Wallman. Tom Van Vleck, Wallman's manager in mid-1978, recounts the porting effort to produce photo-typeset manuals and eliminate dependence on BCPL. (Note: the primary source erroneously refers to Ossanna as writing the BCPL runoff; it was in fact McIlroy.)
After being exposed to RUNOFF while at MIT in summer 1966, Kernighan wrote a port in Fortran (for an IBM System/360) while working on his doctoral thesis at Princeton (to format the thesis). By his account, the system was used for five more years by the student agency. The punch-card source has long since been lost.
Thompson wrote a PDP-7 port, either of the BCPL runoff or directly of the CTSS RUNOFF. This fork was an evolutionary dead-end, replaced by the PDP-11 roff(1), which was written at around the same time. The source code for this port has long since been lost.
Most of the programming team for Multics continued working with UNIX at AT&T Bell Labs, Murray Hill, so it's no surprise that Multics runoff was incorporated as UNIX roff(1) in Version 1 AT&T UNIX, 1971. This was a PDP-11 assembly-language port of the BCPL runoff. According to McIlroy, Ossanna convinced the AT&T patent department to use UNIX and roff(1) for formatting patent applications, and according to Thompson, that usage was the justification for the PDP-11 purchase.
There are many differing accounts of who wrote this version, but most settle on Ossanna, Ritchie, and Thompson. However, the first-hand accounts of McIlroy, Kernighan, and Thompson settle on Ritchie as the primary author. This is corroborated by Ritchie's expertise in BCPL, as roff(1) was based upon McIlroy's BCPL runoff.
Unfortunately, the first roff(1) version is lost. Some sources from Version 1 AT&T UNIX have been reconstructed from old tapes, but roff(1) was not among them. All that remains from this time are manual entries and first-hand accounts. However, the PDP-11 source for Version 5 AT&T UNIX still exists, and by first-hand accounts, is a modified form of the original.
Version 1 AT&T UNIX roff(1) is also notable, regarding manpages, for its release of the First Edition UNIX Programmer's Manual, which defines the manpage structure and layout enjoyed in the present day. Thompson conceived of this convention, inspired by the Multics MSPM, itself inspired by the CTSS manuals.
The source code of Version 1 and 2 AT&T UNIX manual pages may not have survived; only typeset versions are known.
Ossanna took over the PDP-11 roff(1) and built nroff(1), which focussed on outputting text onto terminals, for Version 2 AT&T UNIX. The exact motivations for this are unknown, but are generally agreed to be the evolution of the utility and language to support more advanced formatting.
Unfortunately, a source record of the original effort is entirely lost; however, later versions of the PDP-11 source (for Version 6 AT&T UNIX) and manual page (Version 3 AT&T UNIX) are available.
Thompson mentions that nroff(1) introduced the notion of programmable macros. This is corroborated by the Version 6 AT&T UNIX source for nroff(1), where files under the prefix /usr/lib/tmac. are parsed for macros (see nroff.8 in the Version 6 AT&T archive). This same behaviour is not reproduced in the roff(1) sources from the same time, so it makes sense that this is the first appearance of macros.
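The macro-file lookup can be sketched in a few lines of Python. The /usr/lib/tmac. prefix comes from the text above; the -mNAME option form is the convention documented for later nroff(1) versions and is assumed here, and the function name macro_file is hypothetical rather than taken from the Version 6 source.

```python
TMAC_PREFIX = "/usr/lib/tmac."  # prefix named in the text above


def macro_file(option):
    """Map an option such as "-man" to a macro file path, or None.

    Assumption: a macro package NAME is requested as "-mNAME".
    """
    if option.startswith("-m") and len(option) > 2:
        return TMAC_PREFIX + option[2:]
    return None


print(macro_file("-man"))  # /usr/lib/tmac.an
print(macro_file("-ms"))   # /usr/lib/tmac.s
```

Under this convention the familiar invocation "nroff -man" names the macro package "an", which is why the manual macros live in a file called tmac.an.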
On the other hand, the manual pages for Version 3 AT&T UNIX (Feb. 1973) were still written in raw roff without using any macro set.
Ossanna started writing troff(1) from his PDP-11 nroff(1) sources for Version 4 AT&T UNIX. It's widely asserted that the driving motivation was to create output for the CAT phototypesetter.
Original sources for this version of AT&T UNIX are also lost; however, the manual still exists, and records of the PDP-11 source exist in raw tape data for Version 6 AT&T UNIX (which apparently couldn't be reconstructed).
From Version 4 AT&T UNIX (Nov. 1973) to Version 6 AT&T UNIX (May 1975), a simple macro set was used for writing manual pages, but the macros were completely incompatible with macro sets used today.
Around 1975, Ossanna re-wrote troff(1) in the C language. However, in 1977 this work was discontinued, and the sources were untouched for nearly two years until work was resumed by Kernighan:
After Joe Ossanna died in late 1977, troff was static for probably close to two years, since no one had the time and courage to touch it. It was entirely in C at that point, somewhat over 9000 lines as I remember. I simply modified it, gradually getting around some of the limitations on fonts, making use of dynamic memory, and generating "device-independent" output for devices whose properties were specified in dynamically loaded files.
The result is the first intact UNIX troff(1) source. By Version 7 AT&T UNIX, both nroff(1) and troff(1) were built from the same C sources, separated by preprocessor conditionals, while roff(1) was still built from its PDP-11 assembly source. The origin of the ditroff(1) name is unknown. Kernighan writes,
I'm pretty sure that I only talked about a "device independent troff"; the name "ditroff" came from somewhere else, and I've never been fond of it.
Kernighan's device-independent troff(1) was repackaged in commercial AT&T (and derivative) UNIX systems for years to come. Circa 1978, the DWB featured troff(1) as its mainstay; later, the WWB bolted on many additional word-processing utilities. These applications were repackaged in 1989 by USL, a subsidiary of AT&T Bell Laboratories. The DWB tools were then bought by SoftQuad circa 1978, which rebranded troff(1) as sqtroff(1).
When Douglas McIlroy edited volume 1 of the manual pages for Version 7 AT&T UNIX, he revised the manual page macros substantially, first designing and implementing most of the macros that are still used in the man(7) language today.
The Plan 9 operating system, initially released in 1992 from AT&T Bell Labs, Murray Hill, is a research system extending the UNIX model. It included ditroff(1), which was steadily improved by Kernighan to include features such as UTF encoding. The troff(1) for Plan 9 was not free software until the Third Edition in June 2000, when sources were licensed under the Plan 9 license. It was again re-licenced in 2002 under the Lucent Public Licence 1.02.
In 2005, Sun Microsystems published a CDDL-licenced variant of their Solaris operating system called OpenSolaris. This included a re-licenced descendant of troff(1) as imported into AT&T UNIX System V in 1983. Ritter incorporated this software into the Heirloom project in August 2005.
GNU troff is popularly considered to be the most widespread implementation in modern UNIX installations. It's bundled by default on most GNU/Linux operating systems. This port, renamed groff, was written by James Clark in 1989 specifically for the GNU project. At this time, no open-source implementation existed.
The groff port was based purely on troff documentation as shipped with SunOS 4.1.4, and was written in C++ on a Sun 4/110 (initial implementations of the GNU C++ compiler were on Sun, making it an ideal starting point).
In 1991, Spencer sent a terse mail to USENET comp.sources.unix indicating that he'd written an interpreter in the AWK language for the man and ms macro packages documented in Version 7 AT&T UNIX nroff(1) and ditroff(1).
This is awf, the Amazingly Workable Formatter – a "nroff -man" or (subset) "nroff -ms" clone written entirely in (old) awk.
According to Spencer, awf was necessary for the portability of C News, a USENET news server written by Spencer and Geoff Collyer in 1987. The project wished to distribute manpage sources, but since ditroff(1) sources were still license-encumbered, needed to guarantee portability to un-licensed UNIX systems.
Abell ported awf(1) to C in 1991 while at Purdue. The motivation of this port was to produce a C language version that would run on small systems, particularly MS-DOS ones.
Like awf, mandoc(1) primarily reads ditroff(1) macro files (not general ditroff(1) input), although it has some capability for generalised input. It is the first fully semantic parser for manpages, exposing the annotated content of parsed documents. groff(1) manuals were the predominant basis for this port.
Disclaimer: the author of this document originally wrote mandoc(1).
May have ported RUNOFF to the GE-635 while at AT&T Bell Labs, Whippany.
Wrote compose while at Honeywell.
Possibly ported RUNOFF to a GECOS GE-635.
The contents of this page are available under Creative Commons' Attribution Share-Alike: deriving works must attribute the author(s) and use a similar license. However, cited communications (and copies of source materials), available here, are copyright the sender/author, and not covered by this license.