AAAI 2026’s Two-Phase Turbulence: When Reviewer 2 Meets Reviewer GPT
23,680 submissions. 4,167 accepts. 17.6% acceptance. And a comment storm that won’t quit.
The final decisions for AAAI 2026 are out, and they’ve left the community dissecting not just which papers got in, but how, and whether the process rewarded rigor or roulette. Below I synthesize what changed this year, what authors and reviewers report from the trenches, and which failure modes we should fix before the next cycle.
What actually changed in 2026
A two-phase process.
- Phase 1: Two reviewers per paper; if both lean the same way, the paper can be filtered early.
- Phase 2: Only papers with disagreement or “borderline” status advance; a fresh discussion and an Area Chair (AC) or chair decision completes the call.
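A minimal sketch of how this triage could work, assuming a 1–10 score scale with illustrative cutoffs; AAAI has not published the filter at this level of detail, so every threshold and function name below is an assumption:

```python
# Hypothetical sketch of the two-phase triage described above.
# Score scale, cutoffs, and data shapes are assumptions, not AAAI's actual rules.

def phase1_outcome(r1: int, r2: int, low: int = 4, high: int = 7) -> str:
    """Phase 1: two reviewers per paper. If both lean the same way the paper
    can be settled early; disagreement or borderline scores go to Phase 2."""
    if r1 <= low and r2 <= low:
        return "early_reject"      # both clearly negative
    if r1 >= high and r2 >= high:
        return "early_positive"    # both clearly positive
    return "phase2"                # split or borderline -> fresh discussion

def phase2_outcome(discussion_scores, ac_call=None) -> str:
    """Phase 2: fresh discussion; an AC/chair call completes the decision."""
    if ac_call in ("accept", "reject"):
        return ac_call             # the written AC call is decisive
    mean = sum(discussion_scores) / len(discussion_scores)
    return "accept" if mean >= 6 else "reject"

# Example: a 3/7 split in Phase 1 escalates; the AC then makes the call.
print(phase1_outcome(3, 7))                         # -> "phase2"
print(phase2_outcome([5, 6, 7], ac_call="accept"))  # -> "accept"
```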
AI-assisted peer review pilot.
- Tools assisted with reviewer assignment, rebuttal summarization, and chair/AC briefing; the official line is “assistive, not decisive,” but even assistive summaries can shape outcomes.
Scale & pressure.
- Submissions jumped to 23,680, with 4,167 accepts (17.6%); a chart in one of the linked recaps visualizes this year’s acceptance and score distribution.
- Rumors of ~29k submissions, and debates about geographic concentration, fueled the sense that “this doesn’t scale.”

What the community is saying (in their own words)
- “This is the weirdest reviewing process I’ve ever experienced.” (PC member)
- “Two lines of strength … then gave the score 10.” (on a Phase-1 review)
- “Collusion isn’t the bug, it’s the acceptance criterion.” (community quip)
- “If this paper is accepted, I’ll be very disappointed and will never submit or review [for] AAAI.” (frustrated reviewer)
These short excerpts paraphrase posts and screenshots circulating in the community; the quotes are kept brief, but they capture the tone across the threads.
Six failure modes that surfaced (with concrete examples)
1) Phase-2 Drift: “better” papers out, “weaker” papers in
Multiple accounts describe Phase-1 stacks where carefully argued, mid-to-positive reviews ended in rejection — yet in Phase-2, thinly justified enthusiasm pushed other papers forward. One recap highlights a case: “good papers got brushed off; weak ones were upgraded.”
Why it matters: When the tie-break round inverts Phase-1 signal, authors perceive arbitrary override, not consensus refinement.
2) The “10 after two lines” phenomenon
A viral anecdote: “One reviewer wrote two lines of strength, no weaknesses, then gave a 10.” Chairs may say final calls aren’t purely score-based, but this example epitomizes review depth imbalance.
Why it matters: If thin praise can outweigh detailed critique, the process rewards confidence, not evidence.
3) Anti-rebuttal whiplash
Authors reported cases where, after reading rebuttals, other reviewers lowered scores — “almost like they’re ganging up to get the papers rejected.”
Why it matters: Rebuttal should clarify misunderstandings, not trigger pile-ons. Without a norm against score-lowering post-rebuttal, authors see responses as risk, not remedy.
4) Personal-connection suspicion
A PC member wrote: “It feels like one reviewer is personally connected to a paper.” Even the appearance of conflict erodes trust when decisions concentrate in Phase-2.
Why it matters: With fewer voices in Phase-2, disclosure and recusal policies must be stricter, or the venue inherits the look of favoritism.
5) Topic monocultures and “same-lab datasets”
Commenters complained that, in narrow areas, “papers are from the same lab, using the same data and table, sidestepping the bold claims.”
Why it matters: If novelty narrows to a single pipeline + dataset, we get leaderboard drift rather than field progress.
6) Opaque chair power, amplified by AI summaries
The pilot tools summarize reviews and rebuttals for ACs/chairs. Officially, they don’t make decisions, but summaries can steer them — especially under time pressure.
Why it matters: If the summary layer becomes the decisive layer, we need auditability: What did the model emphasize or omit? Which evidence did the chair actually read?
A few bright spots (yes, there were some)
- Selective, but still diverse accepts. Teams publicly celebrated oral/poster outcomes across multiple subareas, indicating that compelling work did land — despite the noise. (Several examples of multi-paper acceptances, including orals, are cataloged in the linked threads.)
- Process intent. The design — fast triage in Phase-1, deeper scrutiny in Phase-2, and AI to reduce clerical load — addresses real scaling pain points.
But intention without instrumentation is not enough.
What to fix before AAAI 2027 (actionable proposals)
Publish the weighting of scores, summaries, and chair discretion.
- A simple decision card per paper: inputs considered, their weight, and the final rationale (2–3 lines).
- Require chairs to confirm they read all full reviews (not just summaries). Log it.
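To make the decision-card idea concrete, here is a minimal sketch of the record it could contain; the field names, weights, and example values are illustrative assumptions, not an AAAI format.

```python
# Hypothetical "decision card": one small, auditable record per paper.
# Every field name and example value here is an illustrative assumption.
from dataclasses import dataclass

@dataclass
class DecisionCard:
    paper_id: str
    inputs_considered: dict        # e.g. {"reviews": 4, "rebuttal": True, "ai_summary": True}
    input_weights: dict            # declared weighting of scores, discussion, summary
    chair_read_full_reviews: bool  # logged confirmation, not an honor-system checkbox
    final_decision: str            # "accept" or "reject"
    rationale: str                 # the 2-3 line written justification

card = DecisionCard(
    paper_id="0000",
    inputs_considered={"reviews": 4, "rebuttal": True, "ai_summary": True},
    input_weights={"reviews": 0.6, "discussion": 0.3, "summary": 0.1},
    chair_read_full_reviews=True,
    final_decision="accept",
    rationale="Phase-2 discussion resolved the main methodological concern; "
              "all reviewers converged after the rebuttal.",
)
print(card.final_decision)
```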
Guardrails for rebuttal dynamics.
- Allow score increases post-rebuttal; permit decreases only with a short, evidence-linked justification.
- Auto-flag “large post-rebuttal score drops” for AC scrutiny.
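A minimal sketch of the auto-flag, assuming the system stores pre- and post-rebuttal scores per reviewer; the drop threshold is an arbitrary knob to tune, not a proposed policy number.

```python
# Hypothetical guardrail: surface large post-rebuttal score drops for AC scrutiny.
# The threshold and data shapes are assumptions to be tuned per venue.

def flag_post_rebuttal_drops(before: dict, after: dict, max_drop: int = 2) -> list:
    """Return reviewer-level drops of `max_drop` or more, which would then
    require a short, evidence-linked justification."""
    flags = []
    for reviewer, pre in before.items():
        post = after.get(reviewer, pre)
        if pre - post >= max_drop:
            flags.append({"reviewer": reviewer, "before": pre, "after": post})
    return flags

# Example: R2 drops from 6 to 3 after the rebuttal -> flagged for the AC.
print(flag_post_rebuttal_drops({"R1": 7, "R2": 6}, {"R1": 7, "R2": 3}))
```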
Minimum review depth for extreme scores.
- A 9/10 or 1/2 must include specific experimental checks, ablations, or error analyses. Thin reviews can’t carry extreme recommendations.
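One crude way to enforce this at submission time is a length-and-evidence heuristic; the score scale, word threshold, and keyword list below are assumptions, and a real gate would still be reviewed by humans.

```python
# Hypothetical heuristic: extreme scores must be backed by substantive reviews.
# Score scale, word threshold, and evidence keywords are illustrative assumptions.

EXTREME_SCORES = {1, 2, 9, 10}
EVIDENCE_HINTS = ("ablation", "experiment", "baseline", "error analysis", "dataset")

def too_thin_for_extreme_score(score: int, review_text: str, min_words: int = 150) -> bool:
    """Return True if an extreme recommendation arrives without enough substance."""
    if score not in EXTREME_SCORES:
        return False
    thin = len(review_text.split()) < min_words
    no_evidence = not any(hint in review_text.lower() for hint in EVIDENCE_HINTS)
    return thin or no_evidence

# "Two lines of strength, then a 10" would bounce back to the reviewer.
print(too_thin_for_extreme_score(10, "Great paper. Strong results."))  # -> True
```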
Conflict-of-interest pressure test.
- Expand COI beyond coauthorship: same dataset/lab lineage, shared grants, or mentoring relationships within X years.
- Random audits of Phase-2 paper–reviewer ties.
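If the venue collected the extra metadata (lab lineage, shared grants, recent mentoring), the expanded check could look like the sketch below; the relation labels and data shapes are assumptions, and the “within X years” window would be an additional filter on relation dates.

```python
# Hypothetical expanded COI check beyond coauthorship.
# Relation labels and data shapes are assumptions; adding timestamps to the
# relations would let the venue apply the "within X years" window.

FLAGGED_KINDS = {"coauthor", "same_lab_lineage", "shared_grant", "mentoring"}

def has_expanded_coi(reviewer: str, authors: set, relations: set) -> bool:
    """relations: tuples (person_a, person_b, kind)."""
    for a, b, kind in relations:
        if kind not in FLAGGED_KINDS:
            continue
        pair = {a, b}
        if reviewer in pair and (pair - {reviewer}) & authors:
            return True
    return False

# Example: the reviewer shares a grant with one of the authors -> conflict.
rels = {("rev7", "author2", "shared_grant")}
print(has_expanded_coi("rev7", {"author1", "author2"}, rels))  # -> True
```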
AI summary audits.
- Store summary diffs: what points from reviews/rebuttals were included, collapsed, or omitted by the tool.
- Let authors request the summary artifact post-decision to check for gross omissions.
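A minimal sketch of the audit artifact, assuming review and rebuttal points are available as short claims and matching is done by naive string containment (a real system would need something sturdier):

```python
# Hypothetical summary-diff audit: which points from reviews/rebuttals the
# AI summary included, and which it dropped. Naive containment matching.

def summary_audit(points: list, summary_text: str) -> dict:
    summary_lower = summary_text.lower()
    included = [p for p in points if p.lower() in summary_lower]
    omitted = [p for p in points if p.lower() not in summary_lower]
    return {"included": included, "omitted": omitted,
            "coverage": len(included) / max(len(points), 1)}

points = ["missing baseline on dataset B",
          "strong ablation study",
          "rebuttal added a significance test"]
summary = ("Reviewers praised the strong ablation study; "
           "one noted a missing baseline on dataset B.")
audit = summary_audit(points, summary)
print(audit["omitted"])              # -> ['rebuttal added a significance test']
print(round(audit["coverage"], 2))   # -> 0.67
```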
Counter-monoculture incentives.
- Reserve a slice of accepts for out-of-cluster submissions that expand datasets, tasks, or methods beyond the mainline.
- Encourage replication + stress tests with principled novelty, not just incremental leaderboard bumps.
Transparent statistics, not just headlines.
- Publish per-area acceptance rates and score–decision scatter.
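These numbers are cheap to compute from the decision database; a sketch assuming a per-paper export with area, mean-score, and decision columns (the column names and toy data are assumptions):

```python
# Hypothetical transparency report from a per-paper decisions export.
# Column names and the toy data are assumptions about what the venue stores.
import pandas as pd

decisions = pd.DataFrame({
    "area":       ["CV", "CV", "NLP", "NLP", "ML Theory"],
    "mean_score": [6.3, 4.7, 7.1, 5.0, 6.8],
    "decision":   ["accept", "reject", "accept", "reject", "accept"],
})

# Per-area acceptance rates.
per_area = (decisions.assign(accepted=decisions["decision"].eq("accept"))
                     .groupby("area")["accepted"].mean()
                     .rename("acceptance_rate"))
print(per_area)

# Score vs. decision: a full release would publish the raw scatter,
# but even these aggregates expose score-decision inversions.
print(decisions.groupby("decision")["mean_score"].agg(["mean", "min", "max"]))
```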
Concrete vignettes to learn from
- “SPC said ‘accept’, final: reject.” One account describes an SPC-endorsed paper turned down at the end — fueling the belief that finals can nullify expert consensus without written rationale.
- “Rebuttal helped? Scores went down.” Multiple reports say rebuttals triggered score reductions, not clarifications. This suggests reviewers used rebuttal to coordinate or defend priors rather than test claims.
- “Same-lab treadmill.” In narrow subfields, authors perceive that novelty ≈ the next tweak from the same pipeline. This is where cross-area reviewers and external datasets can diversify signal.
Why this moment matters
A selective venue can survive a tough year of admits; it cannot survive a downward trust curve. When authors feel that (a) thin reviews outrank deep analyses, (b) summaries outrank evidence, or (c) relationships outrank rules, they exit — or they game the system. The result is fewer risky ideas, more monocultures, and louder meta-drama.
AAAI’s move to two phases and AI assistance could scale peer review. But scale without governance produces exactly what we saw: hot takes eclipsing handbooks. The fixes above are lightweight and testable in one cycle. We should try them.
Before you submit again: wind-tunnel your paper
Want an early read on whether your paper survives Phase-1 thin-review + Phase-2 scrutiny? Try a simulation pass. Tools like CSPaper.org let you upload and receive structured community feedback quickly.
CSPaper implements a simple three-step flow: go to site → upload → get reviews. Use it to pressure-test ablations, claims, and clarity before the real thing.
Sources
- https://www.reddit.com/r/MachineLearning/comments/1oaf1v0/d_on_aaai_2026_discussion/
- https://aaai.org/conference/aaai/aaai-26/review-process/
- https://aaai.org/wp-content/uploads/2025/08/FAQ-for-the-AI-Assisted-Peer-Review-Process-Pilot-Program.pdf
- https://openaccept.org/c/ai/aaai/
- https://x.com/lyson_ober/status/1986939786163011775
- https://papercopilot.com/statistics/aaai-statistics/aaai-2026-statistics/
- https://mp.weixin.qq.com/s/0Vbdd0ve1isnJ_e4_aOOVg