Ministry mulling amendment to give AI access to public data in Estonian

The Ministry of Justice and Digital Affairs is drafting a legislative amendment that would allow public language data to be shared with artificial intelligence for research and development purposes.
The Ministry of Justice and Digital Affairs is preparing a legislative intent document to modernize the processing of personal data for scientific and historical research as well as for national statistics.
The document notes that the use of personal data in research should be interpreted broadly. In other words, if personal data is used in the development of artificial intelligence, the same rules that apply to scientific research could be used as a basis. Additionally, data processing may also rely on other legal grounds, such as the individual's consent.
Minister of Justice and Digital Affairs Liisa Pakosta (Eesti 200) told Delfi that the proposed change would allow artificial intelligence to read all publicly available texts that do not carry an opt-out mark. "In this context, AI 'thinking' is considered equivalent to the way researchers think," she said.
Pakosta acknowledged that the core issue is how to clarify the legal framework for using data that is currently freely accessible to everyone.
"Looking back, it must be admitted that when the Estonian Language Institute, under the jurisdiction of the Ministry of Education and Research, began developing an Estonian language model around 13 years ago, it would have been beneficial if the legal foundations had been more clearly established," Pakosta said.
On Monday, the minister, representatives of various ministries and researchers met with the Riigikogu Cultural Affairs Committee to discuss the role language corpora play in AI development and what steps should be taken to ensure that AI speaks better Estonian in the future.
Liina Kersna, chair of the Cultural Affairs Committee, told ERR that Estonia has a constitutional obligation to ensure that AI speaks Estonian as well as possible.
"This is extremely important for the survival of our language," she said, adding that widely used closed AI language models are significantly more accurate in English than in Estonian today. According to Kersna, several European countries share the same concern and are seeking ways to preserve and protect their languages.
Kersna said there are two paths on the table: first, developing a language corpus that AI developers could use under agreed conditions; and second, either creating a domestic AI platform or collaborating within Europe to establish one, enabling the state to use its data more effectively to build future services.
The better the quality of the texts used for training, the better the AI will communicate, which is why the input material must be of high quality, Kersna emphasized.
"It is in everyone's interest that AI speaks good Estonian, and the more high-quality Estonian we can input into it by mutual agreement, the better it will be for all of us," said Kersna.
According to her, legal experts have not found any provision in current law that would prohibit AI from using various publicly available texts. "However, the Ministry of Justice has submitted for coordination a legislative intent document that reiterates rules generally used across Europe, which state that public language data may be used by AI under a research and development clause," Kersna explained.
It is still unclear when the proposed legislative amendment will reach the government or the Riigikogu.
Editor: Barbara Oja, Marcus Turovski