Which natural languages should you support?


Published on 12 October 2023 by Andrew Owen (4 minutes)

I have long been an advocate for localization (even if I am somewhat behind with translating the older content on this site into French). It’s a fair assumption that, most of the time, readers would prefer to access content in their own language. But you have limited resources, so which natural (not computer) languages should you support?

A 2021 study by Forrester Research for Phrase TMS found that the 10 most common languages on the web at the time were:

  1. English
  2. German
  3. French
  4. Spanish
  5. Japanese
  6. Chinese
  7. Portuguese
  8. Italian
  9. Korean
  10. Swedish

The study noted:

“Worldwide, four out of five people do not speak English. Many of them are business to business buyers. Organizations that let them operate in their own language will have a competitive and even first-mover advantage in emerging markets.”

Even if you don’t live and breathe localization, you’re probably aware that the top three languages by most metrics are Chinese, Spanish and English, in that order. What makes the web different is that it represents purchasing power. English is the main language in three of the G7 nations. German is the main language in Germany and Austria and is also spoken in parts of Belgium, Brazil, Italy, Liechtenstein, Luxembourg and Switzerland. Besides France, French is also an official language in Belgium, Canada, Luxembourg and Switzerland. The United States is the second-largest Spanish-speaking country in the world. Japan is a high-tech economy. China is the second most populous country in the world after India (which has 23 official languages). Outside Portugal, Portuguese is the main language in Brazil (a major emerging market). Besides Italy, Italian is an official language in San Marino and Switzerland. South Korea is a high-tech economy. And Sweden, home of Spotify, is secretly the Silicon Valley of Europe, possibly due to a late-1990s government policy to put a computer in every home.

The top 10 languages by native speakers are:

  1. Chinese
  2. Spanish
  3. English
  4. Arabic
  5. Hindi
  6. Bengali
  7. Portuguese
  8. Russian
  9. Japanese
  10. Lahnda

But the top 10 languages by total speakers are:

  1. English
  2. Chinese
  3. Hindi
  4. Spanish
  5. Arabic
  6. French
  7. Bengali
  8. Russian
  9. Portuguese
  10. Urdu

So back to the original question. The answer depends to some extent on your audience. Many non-native English speakers would prefer to read developer documentation in English rather than in their own language. This is particularly true for German developers. Conversely, French-speaking developers would often prefer documentation in French. When it comes to developer relations, the simple answer is to ask developers what they want. But consumers would prefer text in their own language, and where user interface text is concerned, developers are consumers too. However, there are hundreds of languages and translation is expensive. If you have any plans to use machine translation, you should use English as your source language. You should then translate that into the first language of the majority of your users, whatever that may be. I would recommend using US English as your source, even if your main market uses a different English dialect. After English, the other languages I think you should translate to, in order of priority, are:

  1. German
  2. Japanese
  3. Portuguese
  4. Chinese
  5. Spanish
  6. French
  7. Italian

My reasoning is based on purchasing power, opportunity in emerging markets and my own workplace experience of which languages are prioritized by multinational corporations. Germany and Japan are huge markets. Brazil and China are huge emerging markets. So is India, but English is widely used there as a second language. Spanish is the most widely spoken language across the Americas. French may be a legal requirement for some applications in Canada and is widely used in developed markets in Europe (as is Italian).

As previously mentioned, translation is incredibly expensive. This often leads to the use of machine translation. In my experience, DeepL is the best overall. But you should still get the translations reviewed by a native speaker. I don’t have that luxury, so I run my translated text through LanguageTool. But another alternative is to crowd-source your translations with a platform like Weblate. Click the translation tag for more on that.

My penultimate tip is to replace user interface text with images that don’t need to be translated. However, you should still translate the alt-text to keep the site accessible.

Ultimately, there is a trade-off between the cost of translation and the potential increase in audience share. But my final tip is that even if you have no plans to translate right now, you should design your solutions with localization built in, because retrofitting localization is a painful, time-consuming and often expensive exercise.
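To make that last tip a little more concrete, here is a minimal Python sketch of what building localization in from the start might look like. It is an illustration under my own assumptions rather than a recommended implementation: the catalog contents, the keys and the names `CATALOGS`, `SOURCE_LOCALE`, `fallback_chain` and `translate` are all hypothetical. The idea is simply that user interface text is looked up by key instead of being hard-coded, and that a missing translation falls back to the US English source.

```python
# Hypothetical message catalogs. In a real project these would more likely
# live in per-locale resource files than in the source code.
CATALOGS = {
    "en-US": {"greeting": "Hello, {name}!", "save": "Save"},
    "de": {"greeting": "Hallo, {name}!"},  # "save" is not translated yet
}

# Assumption: US English is the source language, as suggested in the article.
SOURCE_LOCALE = "en-US"


def fallback_chain(locale):
    """Return the locales to try, most specific first, ending with the source."""
    chain = [locale]
    if "-" in locale:
        # Try the bare language ("de") after the regional variant ("de-AT").
        chain.append(locale.split("-")[0])
    if SOURCE_LOCALE not in chain:
        chain.append(SOURCE_LOCALE)
    return chain


def translate(key, locale, **params):
    """Look up a UI string by key, falling back towards the source locale."""
    for candidate in fallback_chain(locale):
        text = CATALOGS.get(candidate, {}).get(key)
        if text is not None:
            return text.format(**params)
    # Nothing found anywhere: return the key so the gap is visible in testing.
    return key


print(translate("greeting", "de-AT", name="Ana"))
print(translate("save", "de"))
```

With this shape in place, adding a language later means adding a catalog rather than hunting for strings scattered through the code. The same lookup could serve translated alt-text for images.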