By Vincent Chow
Chinese artificial intelligence start-up DeepSeek has launched an update to its foundation model, V3, improving its agentic capabilities and addressing bugs based on user feedback, as the company sharpens its focus on AI agents.
Released on Monday, DeepSeek-V3.1-Terminus comes just two months after the previous version, V3.1, which was recognised by leading AI benchmark firm Artificial Analysis as the start-up’s most advanced model to date.
DeepSeek described V3.1 as the company’s “first step towards the agent era”, laying the groundwork to support software that helps users automate specific tasks.
The Hangzhou-based start-up said the updated model featured improved coding and search capabilities, as well as enhanced language consistency.
Before the update, users had shared screenshots showing instances where DeepSeek’s namesake chatbot produced responses containing illegible symbols and occasionally switched between Chinese and English without being prompted.
According to DeepSeek’s self-reported scores, V3.1-Terminus showed slight improvements on several popular benchmarks.
Those include Humanity’s Last Exam – a rigorous set of academic questions designed to test the limits of AI systems – as well as various coding benchmarks. Many AI experts consider strong coding abilities crucial for developing AI systems with broad, general capabilities.
The updated model also showed improvements on the OpenAI-backed BrowseComp benchmark, which assesses the ability to retrieve hard-to-find information from the internet. On the Chinese-language version of the test, however, the model’s score fell from 49.2 per cent to 45 per cent.
BrowseComp’s Chinese benchmark appeared to be particularly challenging for DeepSeek models, said Zhou Peilin, a lead author of BrowseComp-ZH and an AI researcher at the Hong Kong University of Science and Technology.
Zhou pointed out that DeepSeek’s R1 reasoning model performed worse on that benchmark when connected to the internet, a phenomenon not seen in other leading models.
“Only by looking at a full technical report can we understand why [DeepSeek-V3.1-Terminus] performs worse than the previous version,” he said.
DeepSeek has faced increasing competition in the rapidly evolving domestic market for foundational models, with rivals like Alibaba Group Holding’s Qwen family and ByteDance’s Doubao gaining traction among both corporate and everyday users. Alibaba owns the South China Morning Post.
According to the Chinese cloud computing platform PPIO, DeepSeek accounted for more than 99 per cent of open-source AI model usage on its platform in the first quarter, but that dominance had significantly diminished by May, amid a surge in popularity for Qwen models.
Still, DeepSeek’s models continue to attract substantial global interest. The start-up was poised to become the first organisation to surpass 100,000 followers on Hugging Face, according to a social media post on Monday from Clément Delangue, CEO of the open-source development platform.
DeepSeek has released the weights for DeepSeek-V3.1-Terminus on Hugging Face and other open-source platforms, such as Alibaba-backed ModelScope, enabling global developers to download and build upon the model.
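For developers, pulling the open weights typically takes only a few lines of Python using the huggingface_hub library. The sketch below is a minimal, hedged example; the repository ID shown is an assumption based on DeepSeek's usual naming on Hugging Face and should be checked against the organisation's model page before use.

    # Minimal sketch: download the open-weight checkpoint from Hugging Face.
    # The repo_id below is assumed from DeepSeek's naming convention and may differ.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id="deepseek-ai/DeepSeek-V3.1-Terminus",  # assumed repository name
        local_dir="./deepseek-v3.1-terminus",          # where to store the files locally
    )
    print(f"Model weights downloaded to {local_path}")

From there, developers can load the checkpoint with their preferred inference framework or fine-tune it for their own applications, which is what "building upon" an open-weight release usually means in practice.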