Revisiting DocBook

Published on 8 September 2022 by Andrew Owen (5 minutes)

The Darwin Information Typing Architecture (DITA) and DocBook are two XML-based authoring frameworks. I strongly prefer DocBook. Today’s article is an update of one that I originally wrote on the subject for the Spring 2011 issue of Communicator. I may have been on to something, because in 2014 a group of DITA specialists gave up consulting to create their own component content management system (CCMS) based on DocBook.

As a tech writer, I was never much of a fan of XML-based authoring until I discovered XMLmind’s XML Editor. It gives you a WYSIWYG view of your text, has keyboard shortcuts, and you rarely need to worry about tags. It’s written in Java and is free for personal and open source use. It can be used for DITA or DocBook. But there’s a time and resources cost associated with using DITA; DocBook, on the other hand, gives almost all of its benefits without the hassle.

An introduction to DITA

DITA is now over 20 years old. It was created by IBM to replace its own Standard Generalized Markup Language (SGML)-based document markup language. The Organization for the Advancement of Structured Information Standards (OASIS) defines DITA as both a set of document types for authoring and organizing topic‑oriented information and a set of mechanisms for extending and constraining document types.

The Darwin part, a reference to the naturalist, represents the idea of inheritance: specialized elements inherit the properties of base elements. This concept will be familiar to developers, and DITA could be seen as object‑oriented authoring. The Information Typing part refers to topic-based authoring. You get three topic types to start with: concept, task and reference. And that’s about it.
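
To make the inheritance idea concrete, here is a minimal Python sketch. It relies on the DITA convention that each element has a class attribute listing its ancestry as module/element pairs, most general first. That attribute is normally supplied by the DTD or schema rather than typed by the writer, so writing it out by hand, along with the sample task and the function names, is an assumption made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical task topic. The class values are written out by hand here;
# in a real DITA setup the DTD or schema would normally default them.
SAMPLE = """
<task class="- topic/topic task/task ">
  <title class="- topic/title ">Install the widget</title>
  <taskbody class="- topic/body task/taskbody ">
    <steps class="- topic/ol task/steps ">
      <step class="- topic/li task/step "><cmd class="- topic/ph task/cmd ">Unpack it.</cmd></step>
    </steps>
  </taskbody>
</task>
"""


def base_element(element):
    """Return the most general element name recorded in the class attribute."""
    tokens = element.get("class", "").split()
    # Skip the leading "-" or "+" marker; keep only module/element pairs.
    ancestry = [token for token in tokens if "/" in token]
    return ancestry[0].split("/")[1] if ancestry else element.tag


def generalize(xml_text):
    """Pair every element in document order with the base element it inherits from."""
    root = ET.fromstring(xml_text)
    return [(el.tag, base_element(el)) for el in root.iter()]


pairs = generalize(SAMPLE)
for specialized, base in pairs:
    print(f"{specialized} inherits from {base}")
```

A processor that knows nothing about tasks could still treat steps as an ordered list and step as a list item, which is roughly what inheritance buys you in practice.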

DITA is a framework that enables you to build a solution, and if you go the DITA route, you should expect to spend many months building a system before you can start using it to write documents. For a company with tens of thousands of employees and hundreds of writers that needs to impose a common structure on its documents, perhaps this makes sense.

About DocBook

DocBook is now over 30 years old. It originated at O’Reilly & Associates, publishers of the ubiquitous computer books with covers featuring animal woodcuts. It was designed for documenting computer systems, but it’s flexible enough to deal with any subject. For example, a prominent British book publisher uses a DocBook-based system to deliver electronic versions of its most popular print titles, including everything from cookbooks to guidebooks.

OASIS also manages the DocBook standard and has the following to say about its purpose:

Almost all computer hardware and software developed around the world needs some documentation. For the most part, this documentation has a similar structure and a large core of common idioms. The community benefits from having a standard, open, interchangeable vocabulary in which to write this documentation. DocBook has been, and will continue to be, designed to satisfy this requirement.

From SGML to XML

SGML was introduced to solve a number of documentation problems, but its main feature is the separation of content and style. It does this by using tags, descriptive terms in angle brackets, that surround plain text. These tags are interpreted by an application that creates a formatted version of the document. XML and HTML are both derived, at least in part, from a simplified version of SGML. But unlike HTML, XML must be well-formed; that is, opening tags must have matching closing tags in the right place. XHTML was an attempt to make HTML well-formed, but it lost the popularity contest to HTML5.
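
As a rough illustration of what well-formed means in practice, the following Python sketch feeds two fragments to the standard library’s XML parser. The fragments use DocBook element names, but the strings and the is_well_formed helper are made up for this example. Note that this only checks well-formedness; validating against the DocBook schema is a separate step.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments: the second closes its tags in the wrong order.
GOOD = "<para>Press <keycap>Enter</keycap> to continue.</para>"
BAD = "<para>Press <keycap>Enter</para></keycap>"

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

An XML parser must reject the second fragment outright, whereas a browser handed the same mistake in HTML would typically try to recover and render something.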

Style sheets and output

The neat thing about XML is that you can perform transforms on it using XSLT (a topic I’ve covered before) to produce other XML docs, HTML, plain text or Extensible Stylesheet Language Formatting Objects (XSL-FO). Formatting objects processors such as Apache FOP or RenderX XEP can create a PDF from an XSL-FO file. I use Asciidoctor FOP.

If you’re using DocBook to create a printed document, then you’ll need a good XML style sheet. Fortunately, DocBook provides a library of style sheets. Editing style sheets used to be a real pain point, but there are now graphical style sheet editors such as Altova StyleVision and Arbortext Styler. Cloud-based DocBook solutions typically include a style sheet editor.

XPath, XPointer and XInclude

Without getting into an in-depth discussion of XML, I do want to mention a few tools that can be helpful for single-sourcing and localization:

  • XPath enables you to navigate the tree of an XML document and select nodes that match a given set of criteria.
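  • XPointer builds on XPath to address a fragment of an XML document from outside it, for example a single element that you want to pull into another document.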
  • XInclude enables you to assemble an XML document from other XML documents, including translations or boilerplate text.
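
The following Python sketch shows how two of these fit together, using only the standard library. It is a simplified example rather than a real tool chain: the file names, the in-memory FILES dictionary, the role attribute value and the helper functions are all hypothetical, and the DocBook 5 namespace is left out to keep the XPath expressions short. The standard library only implements a subset of XPath, which is enough here.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical boilerplate files, held in memory so the sketch needs no disk access.
FILES = {
    "legal-en.xml": "<para role='legal' xml:lang='en'>All rights reserved.</para>",
    "legal-fr.xml": "<para role='legal' xml:lang='fr'>Tous droits réservés.</para>",
}

# Hypothetical main document; {lang} is filled in before parsing.
BOOK = """
<article xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Widget guide</title>
  <section><title>Setup</title><para>Unpack the widget.</para></section>
  <xi:include href="legal-{lang}.xml"/>
</article>
"""


def load(href, parse, encoding=None):
    """Loader for ElementInclude: resolve href against the in-memory FILES."""
    return ET.fromstring(FILES[href])


def assemble(lang):
    """Build the document for one language by resolving its XInclude elements."""
    root = ET.fromstring(BOOK.format(lang=lang))
    ElementInclude.include(root, loader=load)
    return root


doc = assemble("fr")

# XPath: the title of every section, wherever it sits in the tree.
section_titles = [title.text for title in doc.findall(".//section/title")]

# XPath with a predicate: the paragraph that was pulled in by XInclude.
legal = doc.find(".//para[@role='legal']")

print(section_titles)
print(legal.get(XML_LANG), legal.text)
```

The point of the design is that the main document and every translation of the boilerplate live side by side, and the language is chosen at assembly time rather than by maintaining a separate copy of the whole document.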

These can be used to support the key features of a modern documentation system (internationalization, localization and collaborative authoring) with the source contained in a single repository. In my experience with DITA, I had to clone the main document repository for every translation. The other reason DITA was favored over DocBook was topic-based authoring, but that’s been supported in DocBook 5 for over a decade.

Conclusion

If your current documentation tool set meets your needs, stick with it. You can use DITA for single-sourcing, but first you’d have to re-implement the rich tag set of DocBook, and then create the style sheets from a much more rudimentary starting point. If anyone ever suggests adopting DITA, you should seriously consider using DocBook instead. Unless you’re looking at creating a custom system from the outset, DocBook will enable you to achieve the same outcomes as DITA over a shorter timescale, at a significantly lower cost.

Update

I recently saw a blog article from the product marketing manager of a company that sells an XHTML-based document editor that I’ve used on occasion, and that has recently acquired a DITA CCMS. For legal purposes I’m going to assume you know which company I’m talking about, but it’s not really important. My honestly held opinion is that the article is a complete hatchet job on DocBook, attempting to pass itself off as an objective comparison. It made me question whether I’d been fair to DITA in this article, but I think I have. I’ve used DITA-based products from a variety of vendors in a number of companies. And every time, I would rather have been using DocBook.