University of Tartu developing translation programme with Mozilla Firefox
Recently, news of Mozilla's new translation programme Bergamot has spread through international technology news portals, but few readers may know that the team also includes language technologists from the University of Tartu.
ERR's science portal Novaator spoke with the head of the Tartu team, Mark Fišel, professor of natural language processing at the Institute of Computer Science, about the collaboration and the work going on behind the scenes.
The project also involves Charles University in Prague and Sheffield and Edinburgh universities in the UK.
Mark Fišel, please tell us what this project is about?
It all began with language technologists from four universities wanting to do a European Commission-funded research project together on machine translation. One idea was to fit machine translation into a web browser. Thanks to a contact person at the University of Edinburgh, we asked Mozilla to be our partner, and in January 2019 the project kicked off. This is a research project, which means that most of our activity is exploratory: we are studying how we could modify the best existing machine translation methods to make them even better.
What exactly is machine translation?
The principle of machine translation is easy to explain: a computer must automatically translate a text from one language into another. It is one of the oldest language processing tasks and has been actively worked on since the beginning of the 1950s. Despite this long history, ideal machine translation is yet to be developed. In practice, however, its quality is good enough to be useful. Machine-translated text is mainly used in post-editing, where the automatically translated text is manually corrected. For many topics, the average time needed for post-editing is less than what is required for translating from scratch.
What needs to be done to make the quality of machine translation better? What does your daily work entail?
Our main role in this project is to make machine translation engines flexible and adaptable to the content and style of the text. For example, in a text about nature, the machine should translate the Estonian word 'aas' as 'meadow', but in a text on knitting, it should translate 'aas' as 'loop'. Likewise, when translating a formal English text, the Estonian translation should use the form 'teie' (formal), not 'sina' (informal). In the end, the programme should be able to make these decisions automatically.
We are also participating in other stages of the project: for example, we are working on the automatic estimation of translation quality. Its purpose is to decide, after the translation has been generated, whether it was successful or not. This is necessary to warn the user of a low-quality translation.
What will the final product be if everything goes according to plan?
A large proportion of the project is research and experiments, but a working prototype will also be made. At the moment we plan to make the new technology available in the Firefox browser.
What is the main difference between this and Google's current automatic translation?
The main difference from Google's automatic translation and its machine translation plugin for Chrome is that Google Translate is cloud-based, which means that all text input is sent to Google's servers for translating. Bergamot machine translation will work on the client's computer and not in the cloud, which ensures the privacy of the texts.
The second difference is that existing translation engines, including Google's and the University of Tartu's, translate single sentences without looking at the context. The contribution by the University of Tartu scientists should ensure that the translation engine adapts to the context and style of the entire web page and takes additional information into account to improve translation quality.
What is the 'shift to client-side translation' that has received a lot of attention in the English-language media?
Our partners at Charles University in Prague are working on the so-called client-side translation. The idea is to make it possible to improve translation quality for users who are not fluent in the target language. The purpose of the machine translation system in this case would be to identify parts of the input that are either too complicated or too ambiguous to translate successfully, and to ask the user to rephrase them.
In conclusion, it may be said that the researchers at the University of Tartu Institute of Computer Science are working on applications that most readers of this article probably use regularly. It is important to note that all the results of this research, when finished, will be freely released under permissive licences. This project involves translations from English into Estonian, Polish, Czech, German, French and Spanish, and vice versa.
The translation of this article from Estonian Public Broadcasting's science news portal Novaator was funded by the European Regional Development Fund through the Estonian Research Council.
Editor: Helen Wright