Estonian computer scientists surprised by capabilities of China's DeepSeek

Estonian artificial intelligence researchers say China's new DeepSeek-R1 language model is comparable to the best available models, including ChatGPT. It also appears the Chinese achieved their results with significantly lower financial and time costs, all while operating under U.S. chip restrictions.
A Chinese-made artificial intelligence model called DeepSeek has shot to the top of the Apple App Store's download charts, stunning investors and sinking some tech stocks, the BBC reported this week.
"Its latest version was released on 20 January, quickly impressing AI experts before it got the attention of the entire tech industry - and the world," it said.
ERR's science portal Novaator asked Estonian experts for their view.
"When discussing the capabilities of the R1 model, it is indeed surprising that, based on my own tests, it is quite comparable to the ChatGPT o1 model, which is currently the strongest of all publicly accessible models," said Meelis Kull, professor of artificial intelligence at the University of Tartu. He added that OpenAI has also trained the o3 model, but it is not widely available to the public.
At the same time, Kull pointed out an interesting difference: while a large part of the ChatGPT o1 model's reasoning process remains hidden, the R1 model is significantly more transparent. "Of course, this does not mean that we understand much more precisely how the model thinks. With artificial neural networks, it is difficult to explain the internal basis on which they make decisions," the researcher noted.
Kull was also surprised that such a powerful model was made fully public right from the start. "It can be downloaded and run independently. Of course, performing calculations requires a powerful server with multiple graphics processors. A regular laptop or desktop computer cannot handle it."
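For readers who want to try this themselves, below is a minimal sketch of loading one of the small distilled R1 variants locally with the Hugging Face transformers library. The model ID shown is one of the published distillations; the full R1 model is far larger and, as Kull notes, needs a multi-GPU server rather than a laptop.

```python
# A minimal sketch: running a small distilled DeepSeek-R1 variant locally.
# The full R1 model requires a multi-GPU server; this distilled variant is
# shown only to illustrate that the weights can be downloaded and run offline.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # small distilled variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain in one sentence what a language model is."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Once the weights are cached locally, the script runs without any connection to outside servers, which is the scenario Kull describes below.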
Until now, Meta has been the primary company offering open-source models, but in Kull's view, its Llama (Large Language Model Meta AI) models have so far been less capable than the paid versions of ChatGPT.
"Since DeepSeek's model has been released as open-source, we can see its architecture. In the case of ChatGPT's newer models, their architecture is unknown. It is also unclear how exactly DeepSeek was trained, but they have still made an unexpectedly large amount of information public," he said.
Kull added that the model released by the Chinese is comparable to the paid versions of ChatGPT. "Perhaps an exception is the ChatGPT o1 Pro version, which costs €200 per month to use. That model might surpass DeepSeek," Kull speculated.
Chinese achieved more with less
Kull said the data published by the Chinese indicates that training the R1 model took 2.788 million GPU hours of processing time, which works out to more than 300 GPU years. "On one hand, this seems like an enormous amount of time. On the other hand, it is probably significantly less than what was required to train the ChatGPT models," the professor said.
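The conversion is simple arithmetic, using the figure the article cites:

```python
# Converting the reported training compute from GPU hours to GPU years.
gpu_hours = 2_788_000       # figure cited in the article
hours_per_year = 24 * 365   # 8,760 hours in a year

gpu_years = gpu_hours / hours_per_year
print(f"{gpu_years:.0f} GPU years")  # -> 318 GPU years
```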
Since OpenAI does not disclose similar figures, it is difficult to directly compare the training times of the two models. However, Kull is fairly certain it took less time to train R1.
"Their training methodology is clearly world-class. How much genuine innovation is involved, I cannot say at this point. One technique they emphasize is the mixture of experts. Simply put, the model consists of multiple branches whose opinions are merged together," the researcher explained.
Despite being open-source and relatively transparent, Kull advises extreme caution when using DeepSeek. All input text is processed through servers located in China, posing a significant security risk. "Under no circumstances should any sensitive information be entered there," he stressed.
If the open model is downloaded and run on a private server, the risks are lower, Kull pointed out.
"In that case, it is possible to run the computer without internet access, ensuring that no information leaks. However, one should keep in mind that the model has been trained to reflect the positions of the Chinese government. This is not necessarily a direct danger, but it will state, for example, that Taiwan is part of China, and so on," he warned.
DeepSeek is China's show of force against the U.S.
Tanel Tammet, professor at Tallinn University of Technology's Institute of Software Science, said the R1 model's capabilities surprised him as well, although there is nothing fundamentally shocking about them.
"China's artificial intelligence development has been at a very high level for quite some time. If we look at the percentage of scientific articles written by the Chinese in leading forums, it is remarkably high," he observed.
Tammet believes there is more experimentation with artificial intelligence in China than in the United States. Since China lacks financial resources comparable to those of U.S. tech giants and is also subject to chip restrictions, these limitations push Chinese researchers to act more cleverly. In this light, restrictions may actually function as a kind of innovation engine.
"In China, many different groups, both in companies and major universities, are working on language models. They are constantly experimenting with better ways to train models. Many different approaches are being tested. It is entirely clear that when a wide variety of methods are explored, some discoveries will eventually be made. I believe that is exactly what happened with the Chinese," Tammet said.
Leading U.S. corporations in artificial intelligence development, such as OpenAI, Meta, and Google, may be too locked into their past successes, making them less inclined to explore entirely different approaches, Tammet suggested.
"Of course, they are continuously striving forward, but they likely have certain assumptions about how things should be done and do not see much point in trying a completely different approach," the professor speculated.
Meta, Google, and OpenAI have not made any major breakthroughs for quite some time, Tammet said. There are minor advances, but nothing rapid. "It's no longer just a matter of adding more chips, training for longer, and expecting the model to improve. I think brute-force scaling no longer adds much, and we are already quite close to a plateau," he explained.
"Essentially, what happened is that the Chinese trained their model slightly differently. Not fundamentally differently, but they have implemented several new and interesting ideas. Experts have already read about these ideas in scientific articles and say that yes, this approach could indeed work," Tammet added.
Tammet has a theory regarding the timing of DeepSeek-R1's release.
"The United States recently tightened chip restrictions on China. I would not rule out the possibility that the Chinese had already achieved these results some time ago. However, they may have decided that right now was the most politically impactful moment to make it public. The timing was remarkably well planned," the professor concluded.
Editor: Helen Wright