Behind the Scenes of DeepSeek-R1: A Landmark in AI Published in Nature
On September 17, 2025, the DeepSeek-R1 paper was published as a cover article in Nature, making it the first large language model (LLM) to pass rigorous peer review and appear in a top-tier scientific journal. The milestone reflects not only DeepSeek's technical achievement but also a broader shift in how AI research is evaluated and recognized within the scientific community.
Key Highlights of the Publication
Cover Recognition
The DeepSeek-R1 study appeared on the cover of Nature, with the striking tagline “Self-Help: Reinforcement learning teaches AI model to improve itself.” This signals the importance the scientific community attaches to the work, particularly in the area of AI reasoning and reinforcement learning (RL).
A Model for Reasoning Tasks
R1 is designed specifically for reasoning-intensive tasks such as mathematics and programming, prioritizing multi-step logical inference over plain text prediction. Nature highlighted it as a cost-effective rival to expensive US-developed AI tools, with the added advantage of being an open-weight model freely available for download. On Hugging Face, R1 has already surpassed 10.9 million downloads, making it the most popular reasoning-focused open-weight LLM to date.
Training Cost and Infrastructure
The supplementary materials of the paper revealed for the first time the training cost of R1:
- Training R1 directly (the RL stage): ~$294,000
- Investment in the base LLM it builds on: ~$6 million
- Comparison: still far below the tens of millions typically spent by competitors.
Training ran primarily on NVIDIA H800 GPUs, which have been subject to US export restrictions since 2023 and can no longer be sold to China. Despite this constraint, DeepSeek achieved competitive performance at a fraction of the cost.
Peer Review and Revisions
Did They Have to Revise the Paper?
Yes. Despite being a landmark achievement, DeepSeek-R1 still underwent the standard peer-review process.
- Reviewers asked for anthropomorphic language to be removed and for more technical detail, especially about the data used and the safety measures in place.
- According to Ohio State University researcher Huan Sun, the process strengthened the validity and reliability of the results.
- Hugging Face engineer Lewis Tunstall called it a “very welcome precedent”, stressing that peer review is critical for transparency and risk evaluation in LLM research.
This shows that even groundbreaking AI work cannot bypass the established standards of scientific rigor.
Innovation: Pure Reinforcement Learning
The core innovation of DeepSeek-R1 is its reliance on pure reinforcement learning (RL) rather than human-labeled reasoning datasets.
- The model learns by receiving rewards for correct answers, enabling it to develop self-verification strategies without explicit human guidance.
- Efficiency comes from Group Relative Policy Optimization (GRPO), which scores each sampled output relative to the rest of its group, removing the need for a separate critic (value) model (see the sketch below).
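For intuition, here is a minimal Python sketch of the group-relative advantage calculation at the heart of GRPO, paired with a toy rule-based reward (exact-match answer checking). Everything here (the function names, the sampled answers) is an illustrative assumption rather than DeepSeek's actual code; the paper's full setup also includes format rewards and a clipped, KL-regularized policy objective.

```python
import statistics

def check_answer(model_answer: str, reference: str) -> float:
    """Rule-based reward: 1.0 if the final answer matches the reference, else 0.0."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages, the core idea of GRPO:
        A_i = (r_i - mean(r)) / std(r)
    Each reward is normalized against its own group of samples, so no
    separate critic (value) model is needed to provide a baseline.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in rewards]

# Toy example: the policy samples G = 4 candidate answers to one math prompt.
reference = "42"
group = ["41", "42", "42", "7"]  # hypothetical sampled outputs
rewards = [check_answer(a, reference) for a in group]

print(rewards)                   # [0.0, 1.0, 1.0, 0.0]
print(grpo_advantages(rewards))  # [-1.0, 1.0, 1.0, -1.0]
```

Correct answers receive positive advantages and incorrect ones negative, and these values then weight the policy-gradient update. The key design choice is that the baseline comes from the sampled group itself rather than from a learned value network, which is what makes the approach cheaper to run.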
As a result, R1 has become a major inspiration for subsequent RL research in AI throughout 2025, shaping how reasoning-focused models are trained.
Invitation or Self-Submission?
One of the main questions was whether this paper was invited by Nature or self-submitted. While no official confirmation exists, analysts strongly suspect it was invited:
- The preprint, released in January 2025, had already accumulated 3,598 citations and fueled an AI boom in China, including a market windfall for High-Flyer, the quantitative hedge fund behind DeepSeek.
- Nature has a history of chasing high-impact, hot-topic papers.
- DeepSeek itself had little incentive to self-submit, given its prior success.
Thus, the balance of evidence suggests that Nature invited the paper.
Broader Impact
DeepSeek-R1’s publication signifies more than academic prestige:
- It sets a precedent for peer-reviewed AI models, ensuring transparency and scientific credibility.
- It demonstrates that cost-efficient AI development is possible, even under geopolitical constraints.
- It shows how open-source models can drive global adoption and innovation.
Conclusion
DeepSeek-R1’s appearance in Nature is a defining moment for AI research. It bridges the gap between industrial innovation and scientific recognition, proving that large language models can meet the highest academic standards. The work also highlights the growing importance of reasoning, reinforcement learning, and cost-efficient AI in shaping the next generation of intelligent systems.
Full paper: “DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning,” Nature (2025).