AI language models may give inaccurate answers in Estonian

Estonian researchers have developed an artificial intelligence (AI) barometer that lets users compare responses from various language models and assess their Estonian proficiency. The team aims to gather at least 50,000 comparisons by the end of June.
In an appearance on ETV's "Terevisioon" on Wednesday, Kairit Sirts, an associate professor in natural language processing at the University of Tartu's Institute of Computer Science, noted that several language models exist, although ChatGPT is likely the most familiar to Estonians.
Current text-based AI tools, however, work best in English, and according to Sirts, it's difficult to evaluate how well they understand Estonian or Estonian culture from the outside.
"One possibility is to compare them — let people use them and evaluate which they think is best," she said. "That way, it's possible to get assessments for a wide variety of questions that researchers themselves may not even think of."
Sirts explained that when a language model doesn't know the answer, it "hallucinates" — because it generally has to generate some kind of response. That's why users sometimes receive bizarre answers to certain questions.
"When you consider how language models and AI work, they simply generate one word after another based on certain knowledge, which can result in something that isn't actually true," she emphasized.
Models also often struggle with math questions. For instance, if asked how much shorter the Tall Hermann Tower is than Tallinn's TV Tower, various models may provide rather strange responses.
"The models don't perform the kind of calculation process we do in our heads — where we take two numbers, place them side by side and do a math operation between them," the professor explained. "But if they've been sufficiently trained in certain types of calculations, the best models can give the impression that they know how to do math."
According to Sirts, the AI Barometer will help raise awareness of the varying quality of language models, and encourage the use of text-based AI in users' native languages.
"On the site, you can enter a prompt, to which two anonymous language models respond," she described. "The user can then choose the better answer, after which the names of the models are revealed."
Based on these ratings, a leaderboard emerges that is continuously updated through this collaborative effort, she added.
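For readers curious how a leaderboard can emerge from such head-to-head votes, the sketch below shows one common approach, an Elo-style rating update. The article does not say how the AI Barometer actually computes its rankings, so the method, the starting rating of 1000, the K-factor of 32 and the placeholder model names are all assumptions made for illustration.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings, winner, loser, k=32.0):
    """Shift ratings after one vote; k (assumed value) controls the step size."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


def leaderboard(votes, start=1000.0):
    """Build a ranking from (winner, loser) pairs, best model first."""
    ratings = defaultdict(lambda: start)  # every model starts at the same rating
    for winner, loser in votes:
        update_ratings(ratings, winner, loser)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes: each pair records which anonymous answer the user preferred.
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]

for name, rating in leaderboard(votes):
    print(f"{name}: {rating:.1f}")
```

Because each vote moves the winner up by exactly as much as it moves the loser down, the ranking can be refreshed after every new comparison, which matches the continuously updated leaderboard Sirts describes.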
Sirts noted that the selection of language models being compared is continually updated as well, allowing users to track how new models perform in Estonian compared to older ones.
Language models currently featured in the Estonian AI Barometer include GPT, Gemini, Claude, Llama and Mistral.
The barometer is open to anyone interested in AI development and the future of the Estonian language.
Editor: Sandra Saar, Aili Vahtla