When Acceptance Isn’t Enough: NeurIPS 2025 rejects 400 accepted papers due to venue crisis?

Category: Artificial intelligence & Machine Learning
Tags: neurips2025, aaai2026, sac, area chair, program chair, rejection, venue constraints
3 Posts · 2 Posters · 9.5k Views
    #1 · Sylvia (Super Users)

    [Image: neurips-extra-rejection-early-info-teaser.jpg]

    🔥 A Shock to the AI Research Community

    NeurIPS 2025, the 39th edition of the world’s premier AI conference, finds itself embroiled in controversy. Despite preparing two venues, one in San Diego (Dec 2–7) and another in Mexico City (Nov 30–Dec 5), the conference has reportedly (according to leaked internal information) rejected approximately 400 already-accepted papers due to physical venue limitations.

    A Reddit thread surfaced titled:

    "[D] NeurIPS is pushing to SACs to reject already accepted papers due to venue constraints."

    A single SAC anonymously disclosed:

    "About 400 papers were outright rejected, despite passing both AC and reviewer approval. It's unfair. If venue space is insufficient, they should expand further — not randomly discard quality papers."

    This blunt revelation marks a turning point for the community’s trust in the fairness of peer review.


    📉 Rejection Despite Positive Reviews

    Zeynep Akata, a SAC, tweeted in disbelief:

    "This is wrong! If the review process cannot handle so many papers, the conference needs to split instead of arbitrarily rejecting 400 papers."

    "We estimate that 300–400 papers recommended for acceptance by ACs will need to be rejected."

    Equally shocking is this comment by Xin Eric Wang:

    "I heard some NeurIPS ACs are rejecting papers with all positive reviews (5444) just to control acceptance rates, which is wrong."

    This raises the ethical dilemma: What does "peer-reviewed acceptance" mean if physical space can override it?

    The official NeurIPS 2025 review scoring scale (shown in the table below) gives clear categories from Strong Accept (6) down to Strong Reject (1), yet these boundaries were rendered meaningless in the face of logistical constraints.

    Score  Meaning
    6      Strong Accept
    5      Accept
    4      Borderline Accept
    3      Borderline Reject
    2      Reject
    1      Strong Reject
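
    To make the scale concrete, here is a minimal, purely illustrative Python sketch that encodes the table above and interprets a score profile such as the "5444" quoted earlier; the helper names are hypothetical and not part of any official NeurIPS tooling.

    ```python
    # Purely illustrative: encodes the NeurIPS 2025 scoring scale from the table
    # above. Function names are hypothetical, not official NeurIPS tooling.
    SCALE = {
        6: "Strong Accept",
        5: "Accept",
        4: "Borderline Accept",
        3: "Borderline Reject",
        2: "Reject",
        1: "Strong Reject",
    }

    def label(score: int) -> str:
        """Map a numeric review score to its official meaning."""
        return SCALE[score]

    def all_positive(scores: list[int]) -> bool:
        """True if every review is at or above Borderline Accept (4)."""
        return all(s >= 4 for s in scores)

    reviews = [5, 4, 4, 4]  # the "5444" profile quoted above
    print([label(s) for s in reviews])
    # ['Accept', 'Borderline Accept', 'Borderline Accept', 'Borderline Accept']
    print(all_positive(reviews))  # True
    ```

    Under the published scale, a 5444 paper sits entirely on the positive side of the boundary, which is exactly why its rejection feels arbitrary.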

    📊 Submission Tsunami: A Victim of Its Own Success

    Submission IDs reportedly reached 23,000 as early as July, with projections nearing 30,000.

    Jiaxuan You exclaimed:

    “NeurIPS 2025 might break records... One of our submission IDs is already ~23,000 — final count could hit 30,000. Absolute madness.”

    This is nearly 50% more than in previous years, setting off what the original article calls:

    “A warehouse explosion crisis is unfolding.”


    😡 Community Backlash: “This Makes No Sense!”

    Several top researchers called out this chaos:

    • Subbarao Kambhampati:

      "Rejecting papers in AI Conferences because of ‘resource constraints’ is shooting ourselves in the foot as a community."

    • Ramchalam K:

      “How can all reviewers accept but be rejected in the end. Makes no sense.”

    • Mark Schmidt:

      “Getting ready to have your well-reviewed NeurIPS rejected for perplexing reasons.”
      “We are doing this to ourselves!”


    😭 Borderline Papers: The Collateral Damage

    For authors whose papers scored Borderline Accept (4) or Accept (5), this situation is especially heart-wrenching. A few reviewers reportedly used non-existent citations or LLM-generated hallucinations to justify “Strong Reject” after rebuttal — a phenomenon noted by Mehdi Ataei:

    “LLM reviews citing non-existing works. It killed any prestige publishing had for me.”

    Such behavior compounds the emotional toll on PhD students, early-career researchers, and underrepresented voices — often working on high-risk, high-reward topics that land right on the review borderline.


    🧠 Oversupply of PhDs and Academic Arms Race

    The article quotes a Redditor:

    “These papers are being written by students trying to graduate. These conferences should be for sharing ideas, not gatekeeping degrees.”

    In elite institutions, publishing at NeurIPS, ICLR, or ICML has become a non-negotiable credential. With >30,000 submissions at NeurIPS and >29,000 at AAAI 2026, it's a brutally competitive system funneling researchers into a narrow bottleneck.


    💥 The Real Issue: Scaling Without Strategy

    While the double-venue approach (San Diego + Mexico City) seemed innovative, it wasn't enough. Suggestions from the community include:

    • Splitting NeurIPS by discipline (e.g., NLP, RL, CV)
    • Introducing a Findings track (as ACL and EMNLP do) for solid-but-borderline papers
    • Better AI-assisted reviewing pipelines, like AAAI 2026, which now includes:
      • AI-generated reviews
      • AI-generated summaries
      • Reviewer classification (human-written vs. AI-generated)

    The AAAI guideline states:

    “Only the human-written reviews and summaries will be used to decide whether the paper proceeds to Phase 2.”

    But how long can human bandwidth keep up?


    🔚 Tragic Efficiency: A Broken Cycle

    The author summarizes with brutal honesty:

    “Rejected papers don’t disappear; they are just resubmitted to another top conference. But the reviewers? Still the same people.”

    “All of this (the time, energy, and soul invested) simply feeds a vicious loop of overburdened reviewers and undersupplied capacity.”


    🧭 Call to Action: Break the Bottleneck, Rethink the System

    If you're a researcher whose paper was borderline-accepted and then rejected, especially in a hot sub-domain, you're not alone. Many thoughtful works have been lost in this tragedy of logistics.

    Here’s something you can do now:

    Try submitting your paper to https://review.cspaper.org/ — an experimental review platform that simulates conference-specific review insights for your paper. All surfaced related work is validated. It may help you:

    • Understand how your paper might be reviewed in a less-constrained environment
    • Compare how close or divergent its judgment is versus NeurIPS reviews
    • Improve your submission for future resubmission cycles

    💬 Got Ideas?

    Do you have suggestions for how CSPaper or other systems can help decentralize, democratize, or de-stress the academic review process?

    Drop your ideas in the comments or create a follow-up post. The system won’t change itself — we must build better ones, together.

      #2 · Roman

      Thanks for the concise and insightful summary of this.

      It seems to me that the most actionable problem in the review process is the increased volume (whether this is due to LLMs or simply an artefact of the pace of ML doesn't matter much). The submission-to-review-duty ratio is not sustainable once the number of submissions blows past five digits; the number of hours required is simply too large.
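
      As a rough back-of-envelope sketch (every number below is an assumption picked for illustration, not an official figure, apart from the ~30,000 projection cited above):

      ```python
      # Back-of-envelope reviewer-load estimate. All inputs are assumptions chosen
      # for illustration; only the 30,000 submission projection comes from the post.
      submissions = 30_000       # projected final submission count (cited above)
      reviews_per_paper = 4      # assumed number of reviews per submission
      hours_per_review = 4       # assumed hours for one careful review
      reviewer_pool = 15_000     # assumed number of available reviewers

      total_reviews = submissions * reviews_per_paper            # 120,000
      load_per_reviewer = total_reviews / reviewer_pool          # 8.0 reviews each
      hours_per_reviewer = load_per_reviewer * hours_per_review  # 32 hours each

      print(f"{total_reviews:,} reviews, "
            f"{load_per_reviewer:.0f} per reviewer, "
            f"~{hours_per_reviewer:.0f} hours per reviewer")
      ```

      Even with generous assumptions, the per-reviewer hour count lands in the range of a second unpaid job during the review window, which is the ratio problem I mean.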

      Splitting conferences into sub-venues is a nice idea, but I think it fails to address the above issue in that the ratio of submissions/reviewers stays the same (if not increases, as some disciplines likely are underrepresented in the submission count). We might instead consider ways to lower the work-hour density of reviewers during review periods by 1) capping the number of submissions per first author, and 2) extending the review period specifically with the goal of engaging in dialogue.

      Regarding (1), it might also be viable to scale reviewer obligations with the number of papers submitted, but this would incentivize poor quality or LLM-generated reviews, most likely. A good solution would disincentivize these.

      I'm also curious how venue-provided AI reviews will impact the process. AAAI seems to be playing out all right so far, assuming the AI reviews can provide correct critiques. At the end of the day, however, I think the backbone of peer review is trust, and I don't really see how a system can be trustworthy unless human reviewers with domain knowledge are reading papers themselves.

        #3 · Sylvia (Super Users)

        @Roman Thanks for your thoughtful perspective. I think you’ve highlighted some of the most pressing tensions here. I agree with you that sheer volume is the root driver of this breakdown. As you say, splitting into sub-venues or dual-location conferences doesn’t really shift the ratio of submissions to reviewers; it just redistributes the same workload.

        On your point (1), capping the number of submissions per first author: while it could help, I do worry it risks disproportionately affecting early-career researchers, who often experiment with multiple directions. Perhaps a more nuanced policy could be considered, for example scaling expectations differently for student authors versus senior authors.

        On (2), extending the review period with more dialogue ... I think this would be hugely valuable. Rebuttals often feel compressed, and genuine discussion could help both sides. Of course, that requires balancing timelines with the demands of conference planning, but it seems like one of the most constructive levers we could realistically pull.

        As for AI-assisted reviews, you’re right that trust is the backbone. I see them less as a replacement and more as scaffolding: useful for flagging inconsistencies, summarizing discussions, or spotting hallucinated citations — but never substituting for a human’s final judgment. AAAI’s experiment will be interesting to watch, though the challenge will be transparency: authors should know which parts of the review were machine-augmented.
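
        For instance, here is a tiny sketch of what the "spotting hallucinated citations" piece could look like, assuming the public Crossref REST API and the `requests` library (purely illustrative; no venue is known to use exactly this check):

        ```python
        # Illustrative only: check whether a cited title has a close match on Crossref.
        # Assumes the public Crossref REST API (api.crossref.org) and `requests`.
        import requests

        def citation_seems_real(title: str, rows: int = 5) -> bool:
            resp = requests.get(
                "https://api.crossref.org/works",
                params={"query.bibliographic": title, "rows": rows},
                timeout=10,
            )
            resp.raise_for_status()
            items = resp.json()["message"]["items"]
            query_words = set(title.lower().split())
            for item in items:
                found = " ".join(item.get("title", [])).lower()
                # Crude overlap test: most query words appear in a returned title.
                if sum(w in found for w in query_words) >= 0.8 * len(query_words):
                    return True
            return False

        # Example with a well-known (real) title:
        # print(citation_seems_real("Attention Is All You Need"))
        ```

        Anything flagged this way would still need a human to confirm it, which loops straight back to your trust point.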

        Ultimately, maybe the problem isn’t just reviewer workload but also our collective reliance on a few “gatekeeper” venues. Until we diversify both where and how impactful work is recognized, these cycles of overload may keep repeating.

        Curious what you think ... do we need systemic alternatives beyond just fixing NeurIPS/AAAI, or is the first step still making the existing model more sustainable?
