Exporting a Confluence page to Word XML

Published on 25 May 2023 by Andrew Owen (2 minutes)

I’ve written previously about exporting release notes from Jira in XML format. That was relatively trivial. This week, I needed to export a Confluence page in Word XML (.docx) format, which turned out to be much more involved. On any page in Confluence, if you click More Actions and then select Export > Export to Word, you’ll get a document with a .doc extension that Word can open. But it’s not what most conversion tools would recognize as a standard Word doc. If you need it in XML format, you still have to open it in Word and resave it, and you have to interact with the web page to get the file in the first place. I wanted a better solution.

The caveat is that this will only work on macOS or Windows, and you will need Word installed on the machine where you run the script. You’ll also need to have Python 3, cURL, pip and doc2docx installed. As always, I recommend doing this with a package manager (Homebrew on macOS or Scoop on Windows). I’ll give the examples for macOS, but on Windows you can simply replace brew with scoop.

Install the tools

  1. Install cURL: brew install curl.
  2. Install Python: brew install python.
  3. Install pip, if your Python didn’t come with it: curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && python3 get-pip.py.
  4. Install doc2docx: pip install doc2docx.
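
Before moving on, it can help to confirm that everything ended up on your PATH. This is only a rough sketch for a POSIX-style shell: missing_tools is a name I made up for illustration, and the command names in the usage line (python3, pip3) are assumptions that may differ on your system.

```shell
# Hypothetical helper: print the name of each tool that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage (prints nothing if all four are found):
#   missing_tools curl python3 pip3 doc2docx
```

If the function prints a name, install or re-link that tool before continuing.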

Get an API key

  1. In Confluence, click your avatar and select Settings.
  2. In the Your settings section, click Password.
  3. In the API token section, click Create and manage API tokens.
  4. Click Create API token.
  5. Enter a Label and click Create.
  6. Click Copy and store the token somewhere secure. After you click Close, you won’t be able to access the token again.

If you ever need to revoke a token, you can do it from this page.

Get the page ID

Navigate to the page in Confluence. The URL should look something like this:

https://<instance>.atlassian.net/wiki/spaces/<space-id>/pages/<page-id>/<page-name>

Take a note of the page-id.
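
If you’d rather not copy the ID by hand, something like the following could pull it out of the URL. It’s a minimal sketch that assumes the ID is the run of digits straight after /pages/, as in the pattern above. The function name extract_page_id and the example.atlassian.net instance are placeholders of mine.

```shell
# Hypothetical helper: print the numeric ID that follows /pages/ in a Confluence URL.
# Prints nothing if the URL has no /pages/<digits> segment.
extract_page_id() {
  printf '%s\n' "$1" | sed -n 's|.*/pages/\([0-9][0-9]*\).*|\1|p'
}

# Usage (placeholder instance, space and page name):
#   page_id=$(extract_page_id "https://example.atlassian.net/wiki/spaces/DOCS/pages/123456/Release+Notes")
```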

Create the script

Now you can put it all together in a script.

curl -u "$email_address:$api_token" -H "Content-Type: application/msword" \
"https://<instance>.atlassian.net/wiki/exportword?pageId=$page_id" -o word.doc
doc2docx word.doc word.docx
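
The script relies on the shell variables email_address, api_token and page_id already being set. One way to make that explicit is to wrap the two commands in a function. This is a sketch under my own assumptions, not a tested tool: the function names export_url and export_page are hypothetical, and I’ve added curl’s --fail option so that an HTTP error stops the conversion step from running on an error page.

```shell
# Build the export URL used in the script above.
# $1: instance name (the part before .atlassian.net), $2: page ID.
export_url() {
  printf 'https://%s.atlassian.net/wiki/exportword?pageId=%s\n' "$1" "$2"
}

# Download one page as .doc, then convert it to .docx.
# Expects email_address and api_token to be set in the environment.
# $1: instance name, $2: page ID, $3: output file name without extension.
export_page() {
  curl --fail --silent --show-error \
    -u "${email_address}:${api_token}" \
    -o "$3.doc" "$(export_url "$1" "$2")" &&
    doc2docx "$3.doc" "$3.docx"
}

# Usage (placeholder values):
#   email_address="you@example.com"
#   api_token="<your-token>"
#   export_page example 123456 release-notes
```

Keeping the token in a variable rather than typing it into the command keeps it out of the script file itself, though it will still be visible to anything that can read your shell environment.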

On Windows, doc2docx uses win32com, while on macOS it uses JXA (JavaScript for Automation). It automatically opens the document in Word and then saves a copy in docx format. You could extend the script to loop over a set of known page IDs. But because of the dependency on having Word installed on the system, there’s no easy way to automate the script on Linux, although you might be able to get it to work using Wine.
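
As a rough idea of what looping over several pages might look like, here is a sketch. The function name export_pages, the instance variable and the page-<id> file naming are all my own assumptions. It keeps going if one page fails and reports the failure at the end through its exit status.

```shell
# Hypothetical batch wrapper: export every page ID passed as an argument.
# Expects email_address, api_token and instance to be set beforehand.
# Returns non-zero if any page failed to download or convert.
export_pages() {
  failed=0
  for page_id in "$@"; do
    if curl --fail --silent --show-error \
         -u "${email_address}:${api_token}" \
         -o "page-${page_id}.doc" \
         "https://${instance}.atlassian.net/wiki/exportword?pageId=${page_id}" &&
       doc2docx "page-${page_id}.doc" "page-${page_id}.docx"; then
      echo "exported page-${page_id}.docx"
    else
      echo "failed to export page ${page_id}" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage (placeholder IDs):
#   instance=example
#   export_pages 123456 234567 345678
```

Because each conversion opens Word, a long list of pages will take a while to run.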