China’s DeepSeek has caused unrest in the AI industry with a model whose capabilities rival those of Google and OpenAI, but questions are mounting over whether its bold claims stand up to scrutiny.
The Hangzhou-based startup’s claim to have developed R1 for a fraction of the cost of Silicon Valley’s latest models immediately called into question assumptions about the United States’ dominance of AI and the sky-high market valuations of its top tech companies.
Some sceptics, however, have challenged DeepSeek’s account of working on a shoestring budget, suggesting that the firm likely had access to more advanced chips and more funding than it has acknowledged.
“It’s very much an open question whether DeepSeek’s claims can be taken at face value. The AI community will be digging into them and we’ll find out”, Pedro Domingos, professor emeritus of computer science and engineering at the University of Washington, told Al Jazeera.
“It’s plausible to me that they can train a model with $6m”, Domingos added.
“But it’s also quite possible that that’s just the cost of fine-tuning and post-processing models that cost more, that DeepSeek couldn’t have done it without building on more expensive models by others”.
In a research paper released last week, the DeepSeek development team said it had used 2,000 Nvidia H800 GPUs – a less advanced chip originally designed to comply with US export restrictions – and spent $5.6m to train R1’s foundational model, V3.
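As a rough sanity check of that figure, the paper’s own accounting prices the run at about 2.8 million H800 GPU-hours at an assumed rental rate of roughly $2 per GPU-hour – arithmetic that comes to approximately $5.6m (2.8 million hours x $2 per hour) – while noting that the total excludes the cost of earlier research and experiments.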
By comparison, OpenAI CEO Sam Altman has said GPT-4 cost more than $100m to train, while analysts estimate that the model used as many as 25,000 of Nvidia’s more advanced H100 GPUs.
The announcement from DeepSeek, which was founded in 2023 by serial entrepreneur Liang Wenfeng, upended the widely held assumption that companies seeking to lead the field in AI must invest vast sums in data centres and huge quantities of expensive high-end chips.
It also raised questions about the effectiveness of Washington’s efforts to constrain China’s AI sector by blocking exports of the most advanced chips.
Shares of California-based Nvidia, which holds a near-monopoly on the supply of GPUs that power generative AI, on Monday plunged 17 percent, wiping nearly $593bn off the chip giant’s market value – a figure comparable with the gross domestic product (GDP) of Sweden.
While there is widespread agreement that DeepSeek’s release of R1 at least represents a significant achievement, some prominent observers have cautioned against taking its claims at face value.
Palmer Luckey, the founder of virtual reality company Oculus VR, on Wednesday labelled DeepSeek’s claimed budget as “bogus” and accused too many “useful idiots” of falling for “Chinese propaganda”.
“It is pushed by a Chinese hedge fund to hide sanction evasion”, Luckey wrote in a post on X. “America is a fertile bed for psyops like this because our media apparatus despises our technology companies and wants to see President Trump fail.”
In an interview with CNBC last week, Alexandr Wang, CEO of Scale AI, also cast doubt on DeepSeek’s account, saying it was his “understanding” that the startup had access to 50,000 of the more advanced H100 chips that it could not talk about due to US export controls.
Wang did not provide evidence for his claim.
Tech billionaire Elon Musk, one of US President Donald Trump’s closest confidants, backed DeepSeek’s sceptics, writing “Obviously” on X under a post about Wang’s claim.
DeepSeek did not respond to requests for comment.
But Zihan Wang, a PhD candidate who worked on an earlier DeepSeek model, hit back at the startup’s critics, saying, “Talk is cheap”.
“It’s easy to criticize”, Wang said on X in response to Al Jazeera’s questions about suggestions that DeepSeek’s claims should not be taken at face value.
Quoting an English translation of a Chinese idiom about people who engage in idle talk, Wang added that it “would be better if they spent more time writing the code and reproducing the DeepSeek idea themselves”.
He did not immediately respond to a question about whether he thought DeepSeek had trained R1’s foundational model using less sophisticated chips and spent less than $6 million.
In a 2023 interview with Chinese media outlet Waves, Liang said his company had stockpiled 10,000 of Nvidia’s A100 chips – which are older than the H800 – before the administration of then-US President Joe Biden banned their export.
Users of R1 have also pointed to the limitations that come with the model’s Chinese origins, noting that it censors topics such as the 1989 Tiananmen Square massacre and the status of Taiwan.
In a sign that the initial panic about DeepSeek’s potential impact on the US tech sector had begun to recede, Nvidia’s stock price on Tuesday recovered nearly 9 percent.
The tech-heavy Nasdaq 100 rose 1.59 percent after dropping more than 3 percent the previous day.
Tim Miller, a professor specialising in AI at the University of Queensland, said it was difficult to say how much stock should be put in DeepSeek’s claims.
The model itself gives away a few details of how it works, Miller told Al Jazeera, but the costs of the main changes that DeepSeek claims to have made don’t “show up” in the model itself so much.
Miller said he had not seen any “alarm bells”, but that there are reasonable arguments both for and against trusting the research paper.
“The breakthrough is incredible, almost in a ‘too good to be true’ fashion. The breakdown of costs is unclear”, Miller said.
On the other hand, he said, breakthroughs do happen occasionally in computer science.
“These massive-scale models are a very recent phenomenon, so efficiencies are bound to be found”, Miller said.
“Given they knew that this would be reasonably straightforward for others to reproduce, they would have known that they would look stupid if they were b*********** everyone”, he added. “A team has already committed to attempting to reproduce the work.”
Falling costs
Lucas Hansen, co-founder of the nonprofit CivAI, said that while it was difficult to know whether DeepSeek circumvented US export controls, the startup’s claimed training budget referred to V3, which is roughly equivalent to OpenAI’s GPT-4, not R1 itself.
“GPT-4 finished training late 2022. Since then, numerous algorithmic and hardware improvements have been made, lowering the cost of training a GPT-4 class model. A similar thing happened with GPT-2. At the time it was a serious undertaking to train, but now you can train it for $20 in 90 minutes”, Hansen told Al Jazeera.
“DeepSeek made R1 by taking a base model – in this case, V3 – and applying some clever methods to teach that base model to think more carefully”, Hansen added.
Source: Al Jazeera