Creating an information architecture with GPT-5

Published on 27 August 2025 by Andrew Owen (6 minutes)

Earlier this month, OpenAI launched GPT-5, its latest and most advanced AI model, for all ChatGPT users (including free users). The company claims the model is smarter, faster and more useful, particularly across domains including writing, coding and health care. It also claims that GPT-5 hallucinates (fabricates answers) less often. This is the first time free tier users have been given access to a reasoning model (if they hit their usage cap, they’ll be given access to something called GPT-5 mini). GPT-5 support is also included in Microsoft Copilot.

The response has been mixed. Some have noticed an incremental improvement, while for others it’s been a sea change. I fall into the second group. Full disclosure: I work for a company that has bet the farm on AI. I still dislike the term AI: machine reasoning is not the same as human intelligence. I still view large language model generative AI as basically a really good natural language processor hooked up to a really good predictive text generator. I have concerns about the energy consumption of using AI to perform trivial tasks. But I can’t deny the results.

In my role as a technical documentation lead, I need to wrangle a vast amount of information. I’ve long been an advocate for adopting formal information architectures or taxonomies. But the process is usually extremely time-consuming, typically taking three to nine months, depending on the complexity of the information. This includes a lot of user research, often using a process called card sorting with up to 10 participants. The steps include preparation, recruitment, making the cards, writing the instructions, data collection and analysis. The aim is to identify user mental models, create intuitive information hierarchies, consider labeling and nomenclature, validate existing structures and resolve navigation and discoverability concerns.

Because I don’t have the time or resources to do that, I’ve been attempting to short-cut the process with AI. If there’s one thing large language models should be good at, it’s sorting words into a taxonomy. But my previous attempts with a number of different models, including GPT-4, were sketchy at best. After doing some initial groundwork, I was able to get GPT-5 to generate a three-tier taxonomy in a matter of minutes. While the result was not perfect, it would have been good enough without any further work. Although I’ve worked with information architects, I’m not formally trained in the discipline, so I also asked ChatGPT for some tips along the way. And I’ve come to the conclusion that a technical writer with a bit of tech savvy can generate a good information architecture in about a week if certain conditions are met. So here’s the process I used.

Information architectures often work best when you have hierarchical tags. But the software systems I have to use have a flat tagging system. If I throw a thousand tags into them and then try to restrict which tags users can use, it will cause information overload. So I had to impose a hierarchy. The tags had to be based on the most restrictive software requirements. That meant lowercase letters and no spaces. So I decided to use underscores to separate levels. Based on my experience creating indexes, and also not wanting the tags to become unreasonably long, I decided to limit the tags to three levels. So a third-level tag would look something like: top_mid_leaf. After reading the literature, I decided to go with around 10 top-level tags, each with up to seven mid-level tags and an unlimited number of leaf tags. This should prevent cognitive overload for users during tag selection and search.
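As a minimal sketch, the tag rules above could be enforced with a few lines of Python. The function names make_tag and split_tag are hypothetical, and the sketch assumes each level is a single word made of lowercase ASCII letters, so that the underscore is only ever a level separator.

```python
import re

# Assumption: each level is one word of lowercase ASCII letters only,
# which keeps the underscore free to act as the level separator.
SEGMENT = re.compile(r"[a-z]+")
MAX_LEVELS = 3  # top, mid and leaf


def make_tag(*levels: str) -> str:
    """Join one to three level names into a flat tag such as top_mid_leaf."""
    if not 1 <= len(levels) <= MAX_LEVELS:
        raise ValueError(f"expected 1 to {MAX_LEVELS} levels, got {len(levels)}")
    for level in levels:
        if not SEGMENT.fullmatch(level):
            raise ValueError(f"invalid level name: {level!r}")
    return "_".join(levels)


def split_tag(tag: str) -> list[str]:
    """Recover the hierarchy from a flat tag, rejecting malformed tags."""
    levels = tag.split("_")
    make_tag(*levels)  # re-validate; raises ValueError if malformed
    return levels
```

Because the hierarchy is encoded in the tag itself, a flat tagging system can still be filtered by prefix: every tag that starts with a given top-level name followed by an underscore belongs to that category.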

The next step was to get a complete dictionary of terms to throw at the AI. I was lucky that the bug tracking system was set up to be used by every part of the business and included an API. From this I was able to extract a complete list of components. I broke this list down into individual words, sorted it and removed the duplicates using non-AI tools. This gave me over a thousand words. I normalized the list so that the singular, plural and verb forms of a term would all be replaced by the plural. For example, log, logs and logging would all be represented by logs. I then had GPT-5 create a set of 10 top-level categories based on the word list and a description of the nature of the business. Because I was using Copilot, GPT-5 was also able to use internal documentation on Microsoft systems as an input.
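The article does not say which non-AI tools were used for this step, so the following is only a sketch of one way to do it in Python. The component names are hypothetical stand-ins for an API export, and the PREFERRED map is an assumption: a hand-maintained table from variant forms to the chosen plural, rather than an automatic stemmer.

```python
import re

# Hypothetical component names; in practice these would come from the
# bug tracker's API export.
components = ["Audit Logging", "Log viewer", "Server logs", "User-Accounts"]

# Assumption: a hand-maintained map from variant forms to the preferred
# plural. A stemmer could propose candidates, but the choice is editorial.
PREFERRED = {"log": "logs", "logging": "logs", "account": "accounts"}


def build_word_list(names, preferred):
    """Split names into lowercase words, normalize, dedupe and sort."""
    words = set()
    for name in names:
        # Split on any run of characters that is not a letter.
        for word in re.split(r"[^A-Za-z]+", name):
            if word:
                word = word.lower()
                words.add(preferred.get(word, word))
    return sorted(words)


print(build_word_list(components, PREFERRED))
# prints ['accounts', 'audit', 'logs', 'server', 'user', 'viewer']
```

Keeping the normalization table explicit makes the decisions easy to review: anyone can see that log and logging were folded into logs, and can change that choice without touching the code.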

The result was a set of phrases that would need to be shortened to single words for the purposes of tagging. But the categories encapsulated the dataset and the business model fairly well. Because of the three-level limit, I later had to add some other top-level categories, taking me over the 10-tag limit, but not by much. These included languages and regions (for example, languages_english_canadian and regions_uk_wales). Then for each of the top-level categories, I had GPT-5 create up to seven mid-level categories. Again, the result was phrases. I then had it assign all the words in the list to a mid-level tag and generate a CSV file with the complete mapping.
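The layout of the CSV file is not described, so this sketch assumes three columns named top, mid and word. Under that assumption, a short Python script can turn the mapping into flat tags and flag any top-level category that has more than seven mid-level categories. The sample rows reuse the two example tags mentioned above; everything else is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of the mapping; the column names are assumptions.
SAMPLE = """top,mid,word
regions,uk,wales
regions,uk,scotland
languages,english,canadian
"""

MAX_MID_PER_TOP = 7  # up to seven mid-level categories per top-level one


def load_tags(csv_text):
    """Return the flat tags and the top-level categories that are too wide."""
    tags = []
    mids = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        tags.append("_".join((row["top"], row["mid"], row["word"])))
        mids[row["top"]].add(row["mid"])
    too_wide = sorted(top for top, m in mids.items() if len(m) > MAX_MID_PER_TOP)
    return tags, too_wide


tags, too_wide = load_tags(SAMPLE)
print(tags)      # prints ['regions_uk_wales', 'regions_uk_scotland', 'languages_english_canadian']
print(too_wide)  # prints []
```

A check like this is useful after each round of edits, because moving words between categories by hand makes it easy to exceed the mid-level limit without noticing.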

At this stage, I could have continued to use GPT-5 to refine the list. I’m sure it could have suggested single words to replace the phrases. And given additional input, it could have improved the assignment of words to mid-level tags. However, at this point I’ve been a writer for about four decades (I’m counting the Cat in the Hat musical I started writing during my childhood), so I was happy to take over from the AI. Opening the CSV file in a spreadsheet, I was able to bulk replace phrases with single words and adjust the placement of terms. The result was a comprehensive taxonomy for the software product and the business that can now be rolled out to the bug tracking system, the document management system and internal and external documentation. The total time taken before rollout was one business week.
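The bulk replacement was done by hand in a spreadsheet, but the same step could be scripted if the mapping needs to be regenerated. The sketch below is an alternative, not the method used here. It assumes a CSV with top, mid and word columns, and the phrases in RENAMES are hypothetical examples.

```python
import csv
import io

# Hypothetical phrase-to-word replacements; the real choices were editorial.
RENAMES = {"Security and Compliance": "security", "User Management": "users"}


def rename_labels(csv_text, renames, columns=("top", "mid")):
    """Replace phrase labels in the given columns and return new CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in columns:
            # Labels without an entry in the map are left unchanged.
            row[column] = renames.get(row[column], row[column])
        writer.writerow(row)
    return out.getvalue()
```

Only the category columns are rewritten, so the word list itself stays untouched and the replacement table doubles as a record of every naming decision.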

The caveat is that if you don’t have a bug tracking system that has a comprehensive component list that covers every aspect of your products and business activities, you’ll need to compile that list first. But if you have Copilot and numerous documents stored in Microsoft systems, you could get GPT-5 to provide you with the list of the 1,000 most commonly used words in your business documentation as a starting point. Arthur C. Clarke famously said: “Any sufficiently advanced technology is indistinguishable from magic.” AI is not magic, or sentient. But when applied to the right problems, it’s an incredibly useful tool. The temptation to use AI for everything is only going to grow over time, but this has to be balanced with an ethical and environmentally conscious approach. Perhaps one of the most useful skills in future will be knowing when not to use AI.