Creating a hybrid document management system

Published on 18 January 2024 by Andrew Owen (6 minutes)

This week, I listened to a podcast where the host opened with a sincere apology and the guest made some interesting points about the current social media landscape. Volume is king. Trying to do journalism without the backing of a large publication (particularly its fact-checkers and legal team) is both hard and risky. Journalists and content creators alike have to consider how what they do reflects on their personal brand. And there is a problem with the conflation of opinion and facts. Alternative facts are a thing (a bad thing).

This gave me pause for thought. I don’t explicitly state my credentials on the blog, but you can read my profile. When I write about writing, it’s from a lifetime’s experience. With technical documentation, it’s fifteen years as a technical communicator. And with software development, it’s the same fifteen years in the industry, a computer science master’s degree and being embedded in research and development departments. But a lot of what I write is not pure fact. It’s informed opinion. There are other perspectives, and the solutions I present are by no means the only way of doing things. With that said, these are my thoughts on how I would build a clean-sheet document management system if I was doing it today.

In previous articles, I’ve covered migrating from Markdown to structured authoring, writing in XML with DocBook and why I think code tools aren’t always a good choice for writers. If you have a well-staffed technical communications department, your writers can keep up with the demands of continuous integration and deployment, and you have the budget, then you probably should go with a dedicated publishing platform. But according to enterprise Linux vendor SUSE, many documentation projects are moving in the other direction, specifically from DocBook (structured XML) to AsciiDoc (a markup language derived from the DocBook schema):

With products and solutions becoming increasingly complex, documentation needs to rely more and more on external contributors. The hurdle to contribute is significantly lower when using AsciiDoc as it is with DocBook. Such a move not only requires converting the DocBook sources to AsciiDoc, but also changing the project setup, the toolchain and writing new stylesheets.

And for that reason, in 2018 SUSE added AsciiDoc support to its open source DocBook Authoring and Publishing Suite (DAPS). It includes:

  • Stylesheets for automated multichannel publishing (HTML, PDF, EPUB and so on).
  • Conditional text, customer-specific builds and translation support (using profiling).
  • Validation, including XML schema, spelling and broken link checking.
  • Automatic image resizing based on output target.
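
To make the idea of profiling concrete, here is a minimal Python sketch of how conditional text can be filtered by a DocBook profiling attribute such as os. It is not how the suite itself is implemented. The function name, the sample document and the assumption that multiple values are separated by semicolons are mine, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def profile(root, attribute, wanted):
    """Remove every element whose profiling attribute excludes `wanted`.

    Elements without the attribute are treated as unconditional and kept.
    Assumption: multiple values are separated by semicolons.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            values = child.get(attribute)
            if values is None:
                continue
            if wanted not in [v.strip() for v in values.split(";")]:
                parent.remove(child)
    return root


# Hypothetical sample content with two conditional paragraphs.
SAMPLE = """<section>
  <para>Install the package.</para>
  <para os="linux">Use your distribution's package manager.</para>
  <para os="windows;macos">Run the graphical installer.</para>
</section>"""

linux_build = profile(ET.fromstring(SAMPLE), "os", "linux")
linux_texts = [p.text for p in linux_build.iter("para")]
print(linux_texts)
```

Running the same function with a different value (for example "macos") gives a different build from the same source, which is the basis of customer-specific output.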

Because AsciiDoc is semantically a subset of DocBook, AsciiDoc processors such as Asciidoctor can output DocBook. You can convert DocBook to AsciiDoc with DocBookRx, although you may find you get better results with Pandoc. However, in my experience, whichever tool you use, metadata tends not to survive the round-trip process. With traditional publishing platforms, the content is in XML format and contributors have to learn a new tool set. But when contributors outnumber writers, wouldn’t it make more sense to work the other way around?
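
As a rough sketch of the Pandoc route, the Python below builds the command line for a DocBook to AsciiDoc conversion and lists the metadata elements in a DocBook 5 info block, so you know what to look for in the converted file. The file names, function names and sample document are hypothetical, and the command is only built here, not run.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"


def pandoc_docbook_to_asciidoc(source, target):
    """Return the Pandoc command for converting DocBook to AsciiDoc."""
    return ["pandoc", "-f", "docbook", "-t", "asciidoc", "-o", target, source]


def info_fields(docbook_xml):
    """List the element names found directly inside the root <info> block."""
    root = ET.fromstring(docbook_xml)
    info = root.find(f"{{{DOCBOOK_NS}}}info")
    if info is None:
        return []
    # Strip the namespace prefix from each tag, for example "{ns}title".
    return [child.tag.split("}", 1)[-1] for child in info]


# Hypothetical sample document.
SAMPLE = """<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <info>
    <title>Install guide</title>
    <author><personname>Jane Doe</personname></author>
    <date>2024-01-18</date>
  </info>
  <para>Hello.</para>
</article>"""

command = pandoc_docbook_to_asciidoc("guide.xml", "guide.adoc")
fields = info_fields(SAMPLE)
print(" ".join(command))
print(fields)
```

You could pass the command to subprocess.run and then check by hand (or with a script) whether each listed field still appears in the AsciiDoc header.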

If your writers are comfortable using developer tools like VS Code, then that’s great. But if not, then a headless CMS could be the answer. The catch is that I’m not aware of any that support AsciiDoc, although TinaCMS is open source, so if you have a team of developers with time on their hands, you could add it. There are browser and IDE plugins that provide a live preview, but writers still have to work in plain text. But there is a web-based editor that enables writers to work in DocBook without ever having to see an XML tag. It’s a JavaScript version of XMLmind XML Editor that runs on your local network. So as long as one of your writers is comfortable with IDEs, the rest of them can write in a friendly editor, and you can convert the DocBook to AsciiDoc.

AsciiDoc is more widely supported than I first realized. GitHub has built-in preview support. Indeed, when it comes to automating deployment, if your docs are already in GitHub then you can use Actions to trigger deploys on code commits (or any other criterion). It’s also trivial to add support to static sites generated with Hugo and hosted on Netlify (like this one). There’s also a dedicated AsciiDoc multi-repository documentation site generator called Antora. It has a number of features more commonly associated with traditional documentation management systems:

  • Integrated version management and branch support.
  • Cross-references decoupled from file systems, environments and URLs.
  • Metadata and taxonomy support.

My one concern with Antora is that it doesn’t appear to have localization support. Even if you have no intention of ever translating your content, it’s better to have a system that can handle it and not need it than the other way around.

If I was putting together a new document management system where the majority of contributors are developers, here’s a summary of what it might look like:

  • A central documentation repository in GitHub with automated deployments using Actions.
  • DAPS for conditional text, translation, customer builds and PDF output.
  • Hugo with Asciidoctor for HTML5 sites.
  • LanguageTool for spelling, grammar and style checking.
  • VS Code with the AsciiDoc and LanguageTool plugins for developers.
  • XMLmind XML Editor Web Edition with the LanguageTool browser plugin for non-developers.
  • Mermaid for diagrams.
  • Pandoc for converting DocBook to Word.
  • Affinity Suite for marketing content (imports Word).
  • A Lucene derivative for search.
  • Matomo for analytics.
  • DeepL for machine translation.
  • API scripts to publish to wikis (such as Confluence) and knowledge bases (such as Zendesk).
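
To give a flavor of what one of those API scripts might involve, here is a minimal Python sketch that builds the JSON body for creating a Confluence page. It assumes the content-creation request shape of Confluence’s REST API as I understand it, so check it against Atlassian’s documentation for your version. The function name, space key and page content are hypothetical, and the sketch deliberately stops short of sending the request, since the endpoint and authentication depend on your installation.

```python
import json


def confluence_page_payload(space_key, title, storage_html):
    """Build the JSON body for creating a Confluence page (not sent here).

    Assumption: the page body is supplied in Confluence's "storage"
    representation, which is an XHTML-based format.
    """
    return {
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "body": {
            "storage": {
                "value": storage_html,
                "representation": "storage",
            }
        },
    }


# Hypothetical values for illustration only.
payload = confluence_page_payload(
    "DOCS", "Install guide", "<p>Install the package.</p>"
)
print(json.dumps(payload, indent=2))
```

In a real pipeline, the HTML would come from the site build rather than being written by hand.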

You’ll have noticed Affinity Suite on the list and may be wondering about it. This is an alternative to Adobe Creative Suite that doesn’t use a subscription pricing model. Ideally, you want to maximize content reuse across an organization. But marketing folks have different needs than technical writers. In this case, converting AsciiDoc content to Word makes it easy to import into Affinity Publisher. And illustrators will be much happier with Affinity Designer and Publisher than open source alternatives like Krita and Inkscape.
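
One way to get from AsciiDoc to Word is in two steps: Asciidoctor converts the AsciiDoc to DocBook, and Pandoc converts the DocBook to a .docx file. The Python sketch below only builds the two command lines. The function name and file name are hypothetical, and it assumes both tools are installed and on the path.

```python
from pathlib import Path


def asciidoc_to_word_commands(source):
    """Return the two commands that take an AsciiDoc file to Word.

    Step 1: Asciidoctor writes DocBook 5 XML next to the source file.
    Step 2: Pandoc reads that XML and writes a .docx file.
    """
    src = Path(source)
    docbook = src.with_suffix(".xml")
    docx = src.with_suffix(".docx")
    return [
        ["asciidoctor", "-b", "docbook5", "-o", str(docbook), str(src)],
        ["pandoc", "-f", "docbook", "-t", "docx", "-o", str(docx), str(docbook)],
    ]


# Hypothetical file name for illustration only.
commands = asciidoc_to_word_commands("guide.adoc")
for command in commands:
    print(" ".join(command))
```

Each command list can be passed to subprocess.run(command, check=True) in a build script.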

In conclusion, while this may sound like a very custom setup, DocBook is a well-established standard and the tools are mature. Although initial setup may require some specialist knowledge, it should be well within the capabilities of any DevOps engineer to maintain such a system. And if somewhere down the line you do suddenly find yourself with a well-staffed technical communications department, there is an easy migration path to off-the-shelf document management solutions.