4. Document Type Definitions
(Note: to keep the explanation simple, most of this
section is going to tell some lies, mainly by omitting a lot of
history. Truthfulness will be fully restored in a
following section.)
DocBook is a structural-level markup language. Specifically, it
is a dialect of XML. A DocBook document is a hunk of XML that uses
XML tags for structural markup.
In order for a document formatter to apply a stylesheet to your
document and make it look good, it needs to know things about the
overall structure of your document. For example, it needs to know
that a book manuscript normally consists of front matter, a sequence
of chapters, and back matter in order to physically format chapter
headers properly. In order for it to know this sort of thing, you
need to give it a Document Type Definition, or DTD. The DTD tells
your formatter what sorts of elements can be in the document
structure, and in what order they can appear.
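To make the idea concrete, here is a minimal Python sketch of the kind of rule a DTD expresses. The element names (`frontmatter`, `chapter`, `backmatter`) and the `BOOK_MODEL` pattern are made up for illustration and are not real DocBook declarations; a real DTD states the same sort of rule in its own syntax, roughly as shown in the comment.

```python
import re
import xml.etree.ElementTree as ET

# A toy content model, NOT the real DocBook DTD. In DTD syntax the same
# rule would look something like:
#   <!ELEMENT book (frontmatter, chapter+, backmatter)>
BOOK_MODEL = re.compile(r"frontmatter (chapter )+backmatter ")

def children_match_model(xml_text):
    """Return True if the root's children appear in an order the model allows."""
    root = ET.fromstring(xml_text)
    # Flatten the child tag names into one string so a regex can check order.
    sequence = "".join(child.tag + " " for child in root)
    return BOOK_MODEL.fullmatch(sequence) is not None

good = "<book><frontmatter/><chapter/><chapter/><backmatter/></book>"
bad = "<book><chapter/><frontmatter/><backmatter/></book>"

print(children_match_model(good))  # True
print(children_match_model(bad))   # False
```

Both sample documents are well-formed XML; only the first one obeys the structural rule. That difference is exactly what the DTD lets your formatter detect.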
What we mean by calling DocBook an `application' of XML is
actually that DocBook is a DTD — a rather large DTD, with
somewhere around 400 tags in it.
Lurking behind DocBook is a kind of program called a
validating parser. When you format a DocBook document, the
first step is to pass it through a validating parser (the front end of
the DocBook formatter). This program checks your document against the
DocBook DTD to make sure you aren't breaking any of the DTD's
structural rules (otherwise the back end of the formatter, the part
that applies your stylesheet, might become quite confused).
The validating parser will either bomb out, giving you error
messages about places where the document structure is broken, or translate
the document into a stream of formatting events
which the formatter's back end combines with the information in your
stylesheet to produce formatted output.
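The two-stage flow described above can be sketched in a few lines of Python. This is a toy under loud assumptions, not how any real DocBook toolchain is written: `ALLOWED_CHILDREN` stands in for the DTD (and only checks which elements may nest, not their order), `STYLESHEET` stands in for the stylesheet, and the names `front_end`, `back_end`, `format_document` and `StructureError` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy stand-ins for the two hidden inputs; neither is real DocBook.
ALLOWED_CHILDREN = {
    "book": {"title", "chapter"},
    "chapter": {"title", "para"},
    "title": set(),
    "para": set(),
}
STYLESHEET = {"title": str.upper, "para": lambda text: "  " + text}

class StructureError(Exception):
    """Raised by the front end when the document breaks a structural rule."""

def front_end(element, parent=None):
    """Validate the tree and yield (tag, text) formatting events."""
    if element.tag not in ALLOWED_CHILDREN:
        raise StructureError(f"unknown element <{element.tag}>")
    if parent is not None and element.tag not in ALLOWED_CHILDREN[parent]:
        raise StructureError(f"<{element.tag}> not allowed inside <{parent}>")
    if element.text and element.text.strip():
        yield element.tag, element.text.strip()
    for child in element:
        yield from front_end(child, element.tag)

def back_end(events, stylesheet):
    """Combine the event stream with the stylesheet to produce output lines."""
    return [stylesheet.get(tag, str)(text) for tag, text in events]

def format_document(xml_text):
    root = ET.fromstring(xml_text)
    events = list(front_end(root))  # validation finishes before any formatting
    return back_end(events, STYLESHEET)

doc = "<book><title>Demo</title><chapter><para>Hello.</para></chapter></book>"
print(format_document(doc))  # ['DEMO', '  Hello.']
```

Feeding it a document such as `<book><para>x</para></book>` raises `StructureError` before the back end ever runs, which is the "bomb out" case; a valid document flows through as events and comes out styled.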
Here is a diagram of the whole process:
The part of the diagram inside the dotted box is your formatting
software, or toolchain. Besides the obvious and
visible input to the formatter (the document source) you'll need to
keep the two `hidden' inputs of the formatter (DTD and stylesheet) in
mind to understand what follows.