Published on 26 October 2023 by Andrew Owen (3 minutes)
Machine translation has come a long way since researchers figured out that it was better to translate phrases than individual words. It works best when there are many texts in both the source and destination languages. So if you’re translating between two languages that each have only a small number of digitized texts, it’s likely that the translation will use English as a middle step, which can reduce its accuracy. And I wouldn’t trust even the best machine translation without some level of review. In my case, I use DeepL in conjunction with LanguageTool (see the links at the bottom of the page).
But something that’s easy to overlook is the security aspect. If you’re using free tools, you’re most likely contributing your text to the data set. Now in the case of articles I write for this blog, that’s fine, because the source and destination text are both public (and can easily be scraped by internet-derived large language models). But before you put personally identifiable information, customer data or commercial secrets on a remote server, you should check the policy of the translation tool you’re using. For example, here’s an extract from DeepL’s free use privacy policy:
“When using our translation service, please only enter texts that you wish to transfer to our servers. The transmission of these texts is necessary in order for us to provide the translation and offer you our service. We process your texts, the documents you upload and their translations for a limited period of time to train and improve our neural networks and translation algorithms. This also applies to corrections you make to our translation suggestions. The corrections are forwarded to our servers to check them for accuracy and, if necessary, to update the translated text according to your changes.”
DeepL also makes it clear that on the free plan you may not translate any content that would be covered by privacy legislation such as the General Data Protection Regulation (GDPR). But maybe you need to translate something as a one-off, or at least sufficiently infrequently that you don’t want to sign up for a paid subscription. In that case, you need offline translation. And if you have a Mac running macOS 12 or later, you already have that ability. It’s available in any app with text by right-clicking the text and selecting Translate. Supported languages include:
Before you get started, open System Settings, search for “translation” and select Translation Languages. Download whichever languages you require. You must also select the On-Device Mode check box to prevent the use of online translation, where Apple may retain your text for up to two years. Even with this check box selected, Safari and Siri will still use online translation.
When you right-click (or control-click) some text and select the translation option, a dialog prompts you to select From and To languages. Even if you’ve already downloaded languages, you still have to click Download Languages to do the translation. You then have the option to Replace with Translation or Copy Translation. There’s a limit of 10,000 characters in a single translation. But it’s probably better to translate related chunks one at a time.
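If you have a long document, one way to stay under the limit while keeping related text together is to split it at paragraph boundaries before pasting each piece into the translation dialog. Here’s a minimal sketch in Python. The function name `chunk_text` and the blank-line paragraph separator are my own assumptions for illustration; the 10,000 default just mirrors the limit mentioned above.

```python
def chunk_text(text: str, limit: int = 10_000) -> list[str]:
    """Group paragraphs into chunks of at most `limit` characters.

    Paragraphs are assumed to be separated by a blank line. A single
    paragraph longer than `limit` is cut into fixed-size pieces.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Hard-split any paragraph that can never fit in one chunk.
        while len(paragraph) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:limit])
            paragraph = paragraph[limit:]
        # Try to add the paragraph to the chunk being built.
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each item in the returned list can then be translated on its own, and the results joined back together with blank lines.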
I’m not aware of any equivalent functionality in Linux or Windows. However, I did find the Argos Translate offline translation tool. It uses OpenNMT for translations and can be used as a Python library, a command-line tool, or a GUI application. That said, the only release is for macOS.
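For the Python library route, a rough sketch might look like the following. It assumes the `argostranslate` package is installed in your Python environment, and it follows the usage pattern shown in the project’s README, which may change between versions. The helper names `install_language_pair` and `translate_offline` and the language codes in the usage note are my own placeholders. Downloading a language package needs a network connection once; the translation itself then runs locally.

```python
def install_language_pair(from_code: str, to_code: str) -> None:
    """One-time setup: download and install a language package (needs network)."""
    # Imported lazily so this file loads even if argostranslate is not installed.
    import argostranslate.package

    argostranslate.package.update_package_index()
    available = argostranslate.package.get_available_packages()
    # Find the package that translates from_code -> to_code, if one exists.
    match = next(
        (p for p in available if p.from_code == from_code and p.to_code == to_code),
        None,
    )
    if match is None:
        raise ValueError(f"No package found for {from_code} -> {to_code}")
    argostranslate.package.install_from_path(match.download())


def translate_offline(text: str, from_code: str, to_code: str) -> str:
    """Translate text locally using an already installed language package."""
    import argostranslate.translate

    return argostranslate.translate.translate(text, from_code, to_code)
```

You would call `install_language_pair("en", "es")` once, and after that `translate_offline("Hello World", "en", "es")` should work without sending your text to a remote server.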