In this short blog post I will explain why the Chatlayer.ai Language Independent NLP outperforms Google Translate combined with regular NLP engines.
I often get asked whether we use machine translation to achieve language-independent NLP. When I say no, people are surprised and ask why we took a different approach and made it hard on ourselves.
Chatlayer always keeps up with the state of the art in NLP, so it is no surprise that we first thoroughly investigated the literature on multilingual models. These are models that can perform NLP tasks in many different languages at the same time.
In 2018, the effectiveness of machine translation for multilingual NLP was evaluated [1]. At that point in time, the machine-translation baselines slightly outperformed multilingual models. As research has progressed, however, multilingual models have been getting better and better. The current state-of-the-art multilingual models significantly outperform machine translation baselines on text classification and even text generation tasks. The recently introduced XLM [2] and XNLG [3] models are examples of this.
English to French
Imagine you are proficient in English and want to build a bot that also understands French. At first you'll probably start building your bot in English. You create intents with a broad range of expressions, like we discussed here.
To make sure your bot also understands French, the easiest option is to take all the expressions you have created in English and translate them to French. If you use tools like Google Translate or DeepL to do this, the first thing you'll notice is that you end up with significantly fewer unique expressions in French: many of the sentences you created in English are translated to one and the same French sentence. In our experience you often lose 30 to 40% of your expressions this way.
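A toy sketch of this collapse effect. The translations here are a hard-coded stand-in for a real MT API such as Google Translate, and the sentences are illustrative, but the dedup arithmetic is the point:

```python
# Several distinct English training expressions can collapse into the
# same French sentence after machine translation, shrinking your
# training set for the French bot.
english = [
    "Where is my order?",
    "Where's my order?",
    "Where is my package?",
]

# Stand-in for a real MT call; these translations are illustrative only.
mock_translate = {
    "Where is my order?": "Où est ma commande ?",
    "Where's my order?": "Où est ma commande ?",  # collapses with the first
    "Where is my package?": "Où est mon colis ?",
}

french = [mock_translate[s] for s in english]
unique_french = set(french)
loss = 1 - len(unique_french) / len(english)
print(f"Unique French expressions: {len(unique_french)}/{len(english)} ({loss:.0%} lost)")
```

On real intent data the effect is the same, just at scale: paraphrases that gave the English model useful variety map onto a single canonical translation.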
French to English
What if you don’t know beforehand which language your user will speak? Then you’ll have to work the other way around: every incoming sentence needs to be translated into English. To do this, you first have to detect which language it is before you can translate it.
Language detection and translation both introduce additional errors. Language detection for chat messages is only 80 to 95% accurate depending on the language and machine translation introduces another 20% to 60% error.
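A rough back-of-the-envelope calculation shows how these two error sources compound. The function below is a sketch that assumes the errors are independent (a simplification), using the accuracy ranges quoted above:

```python
# Compounding language detection accuracy with machine translation error.
# Figures come from the ranges in the text; independence of the two
# error sources is an assumption made for this rough estimate.
def pipeline_accuracy(detection_acc: float, translation_err: float) -> float:
    """Fraction of messages that survive detect-then-translate intact."""
    return detection_acc * (1 - translation_err)

best = pipeline_accuracy(0.95, 0.20)   # optimistic end of both ranges, ~0.76
worst = pipeline_accuracy(0.80, 0.60)  # pessimistic end, ~0.32
print(f"Pipeline accuracy: {worst:.0%} to {best:.0%}")
```

In other words, even before the NLP model sees the sentence, somewhere between roughly a quarter and two thirds of messages can already be degraded by the detect-and-translate front end.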
Language Independent NLP
With our Language Independent NLP we took a different approach. Instead of separating language detection, machine translation and intent recognition into different steps, we combined them all in one model. It goes straight from a written sentence to the corresponding intent, irrespective of the language.
In theory this would mean no performance loss when you switch from one language to the next. In practice there is a small loss, specifically for language-specific sayings and expressions, but the difference is far smaller than with machine translation: we observe only a 4 to 8% performance reduction, instead of an aggregated performance loss of around 15 to 70% in the machine translation case.
Even though machine translation can reduce your workload by translating expressions to other languages, it comes at a great cost in performance: depending on the language, you lose between 15 and 70%. To compensate for this gap, a lot of manual work is required to enrich the other languages with additional expressions.
With Chatlayer’s Language Independent NLP there is nearly no performance difference going from one language to another. This means you can launch a bot in any number of languages. If you trained your model in only one language, you only need to enrich it with some very language-specific expressions.
[1] Conneau, Alexis, et al. “XNLI: Evaluating Cross-lingual Sentence Representations.” Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018.
[2] Conneau, Alexis, and Guillaume Lample. “Cross-lingual Language Model Pretraining.” Advances in Neural Information Processing Systems. 2019.
[3] Chi, Zewen, et al. “Cross-Lingual Natural Language Generation via Pre-Training.” arXiv preprint arXiv:1909.10481 (2019).