A new AI chatbot from the Chinese startup DeepSeek has made waves in the technology sector, surpassing OpenAI’s ChatGPT as the most-downloaded free iOS app in the United States. The chatbot’s release also triggered a roughly $600 billion drop in Nvidia’s market value, the largest single-day loss for any company in US stock market history.
The buzz surrounding DeepSeek’s chatbot stems from its underlying “large language model” (LLM), R1, which reportedly matches the reasoning capabilities of leading US models at a fraction of the cost. According to DeepSeek, R1 is significantly more efficient, requiring far less computational power and memory to operate than its Western counterparts, and the base model it builds on took just 2.788 million GPU hours to train, at a cost of under $6 million. OpenAI, by comparison, reportedly spent over $100 million training its GPT-4 model.
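The sub-$6 million figure follows directly from the reported GPU-hour count. A quick back-of-the-envelope check, assuming the roughly $2-per-GPU-hour rental rate DeepSeek uses in its own cost accounting (an assumed rate, not a figure from this article), reproduces it:

```python
gpu_hours = 2_788_000      # GPU hours of training reported by DeepSeek
rate_per_hour = 2.0        # assumed cloud rental rate, USD per GPU hour

cost = gpu_hours * rate_per_hour
print(f"${cost:,.0f}")     # prints $5,576,000 -- under the $6 million figure
```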
DeepSeek’s approach involves several technical innovations that have reduced the cost of training AI models. The company used around 2,000 Nvidia H800 GPUs (reduced-capability variants of the H100 built to comply with US export rules) to train R1, chips that were likely stockpiled before those export restrictions tightened in October 2023. Despite Nvidia’s market setback, DeepSeek’s strategy could pave the way for future AI systems that require fewer resources to develop and run.
One of the key advantages of this breakthrough is its potential environmental benefit. AI systems consume massive amounts of electricity and water, with significant carbon emissions tied to their operation. For example, estimates place ChatGPT’s monthly carbon footprint at over 260 tonnes of CO2, equivalent to 260 flights between London and New York. By improving computational efficiency, DeepSeek’s model could reduce the environmental impact of AI, though the real-world energy savings remain to be seen.
Founded in 2023 by Liang Wenfeng, DeepSeek has risen swiftly, and the company has been hailed as a leader in creating competitive AI models with limited resources. The DeepSeek model is built using a “mixture of experts” technique: several smaller expert networks, each specialized in a particular domain, are combined, and a gating network routes each input to the most relevant experts. Because only a fraction of the model’s parameters are active for any given query, the system gains efficiency without sacrificing capability.
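The routing idea behind mixture of experts can be sketched in a few lines. The following is a minimal illustrative toy, not DeepSeek’s actual architecture: the expert count, top-k value, and random weights are all hypothetical, and real MoE layers are trained neural networks rather than fixed matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: a gating network scores every expert,
# but only the top-k experts actually process the input (sparse activation).
NUM_EXPERTS, TOP_K, DIM = 4, 2, 8

# Each "expert" here is just a small linear map with random toy weights.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
gate = rng.standard_normal((DIM, NUM_EXPERTS))  # gating network weights

def moe_forward(x):
    scores = x @ gate                     # one relevance score per expert
    top = np.argsort(scores)[-TOP_K:]     # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over the selected experts only
    # Only the chosen experts run, so compute scales with k, not NUM_EXPERTS.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(DIM)
y = moe_forward(x)
print(y.shape)  # prints (8,)
```

The design point is the sparsity: total parameter count grows with the number of experts, but per-query compute grows only with k, which is the efficiency lever the article describes.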
Unlike other major AI companies, DeepSeek has released its model’s “weights” and a technical paper outlining its development process, making it more transparent and accessible to researchers worldwide. While some aspects of the model, such as its training datasets, remain undisclosed, the company’s openness contrasts with the secrecy surrounding other AI systems like those of OpenAI.
DeepSeek’s success raises important questions about the future of the AI industry. As smaller companies figure out ways to reduce the cost and resource requirements of developing sophisticated AI models, the landscape may shift away from dominance by large tech firms. As DeepSeek continues to innovate, it may spark broader changes in how AI is developed, deployed, and used in the future.
While the long-term impact of DeepSeek’s model on the AI market remains uncertain, its emergence signals that powerful, cost-effective AI tools could become more accessible, potentially revolutionizing industries and expanding adoption.