CSPaper Forum

CSPaper: peer review sidekick

AAAI 2026’s Two-Phase Turbulence: When Reviewer 2 Meets Reviewer GPT

Tags: aaai2026, peer review, decision, accept, reject, debate, submission, rebuttal, reviewer2
  root (#1) wrote:

    23,680 submissions. 4,167 accepts. 17.6% acceptance. And a comment storm that won’t quit.

    The final decisions for AAAI 2026 are out, and they’ve left the community dissecting not just which papers got in, but how, and whether the process rewarded rigor or roulette. Below I synthesize what changed this year, what authors and reviewers report from the trenches, and which failure modes we should fix before the next cycle.


    What actually changed in 2026

    A two-phase process (sketched in code below).

    • Phase 1: Two reviewers per paper; if both lean the same way, the paper can be filtered early.
    • Phase 2: Only papers with disagreement or “borderline” status advance; a fresh discussion and an Area Chair (AC) or chair decision completes the call.
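
    A minimal sketch of that triage flow, assuming illustrative score thresholds and labels (the actual cutoffs and decision rules are not published):

    ```python
    # Toy model of the two-phase flow described above; score thresholds,
    # labels, and tie-break behavior are assumptions, not AAAI's actual rules.

    def phase1_triage(scores, accept_cut=6, reject_cut=4):
        """Phase 1: two reviewers per paper; agreement can end the process early."""
        assert len(scores) == 2, "Phase 1 assigns two reviewers per paper"
        if all(s >= accept_cut for s in scores):
            return "early positive signal"
        if all(s <= reject_cut for s in scores):
            return "filtered early (reject)"
        return "advance to Phase 2"  # disagreement or borderline

    def phase2_decision(scores, ac_call=None):
        """Phase 2: fresh discussion; an AC/chair call completes the decision."""
        if ac_call is not None:              # chair discretion can override scores
            return ac_call
        return "accept" if sum(scores) / len(scores) >= 6 else "reject"

    print(phase1_triage([7, 3]))                      # -> advance to Phase 2
    print(phase2_decision([7, 3], ac_call="reject"))  # -> reject
    ```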

    AI-assisted peer review pilot.

    • Tools assisted with reviewer assignment, rebuttal summarization, and chair/AC briefing; the official line is “assistive, not decisive,” but even assistive summaries can shape outcomes.

    Scale & pressure.

    • Submissions jumped to 23,680, with 4,167 accepts (17.6%); a page-2 chart in one article visualizes the acceptance and score distribution this year.

    • Rumors of ~29k submissions, and debates about geographic concentration, fueled the sense that “this doesn’t scale.”


    What the community is saying (in their own words)

    • “This is the weirdest reviewing process I’ve ever experienced.” (PC member)

    • “Two lines of strength … then gave the score 10.” (on a Phase-1 review)

    • “Collusion isn’t the bug, it’s the acceptance criterion.” (community quip)

    • “If this paper is accepted, I’ll be very disappointed and will never submit or review [for] AAAI.” (frustrated reviewer)

    These are short excerpts paraphrasing posts/screenshots circulated in the community; they capture the tone across threads while we keep quotes brief.


    Six failure modes that surfaced (with concrete examples)

    1) Phase-2 Drift: “better” papers out, “weaker” papers in

    Multiple accounts describe Phase-1 stacks where carefully argued, mid-to-positive reviews ended in rejection — yet in Phase-2, thinly justified enthusiasm pushed other papers forward. One recap highlights a case: “good papers got brushed off; weak ones were upgraded.”

    Why it matters: When the tie-break round inverts Phase-1 signal, authors perceive arbitrary override, not consensus refinement.

    2) The “10 after two lines” phenomenon

    A viral anecdote: “One reviewer wrote two lines of strength, no weaknesses, then gave a 10.” Chairs may say final calls aren’t purely score-based, but this example epitomizes review depth imbalance.

    Why it matters: If thin praise can outweigh detailed critique, the process rewards confidence, not evidence.

    3) Anti-rebuttal whiplash

    Authors reported cases where, after reading rebuttals, other reviewers lowered scores — “almost like they’re ganging up to get the papers rejected.”

    Why it matters: Rebuttal should clarify misunderstandings, not trigger pile-ons. Without a norm against score-lowering post-rebuttal, authors see responses as risk, not remedy.

    4) Personal-connection suspicion

    A PC member wrote: “It feels like one reviewer is personally connected to a paper.” Even the appearance of conflict erodes trust when decisions concentrate in Phase-2.

    Why it matters: With fewer voices in Phase-2, disclosure and recusal policies must be stricter, or the venue inherits the look of favoritism.

    5) Topic monocultures and “same-lab datasets”

    Commenters complained that, in narrow areas, “papers are from the same lab, using the same data and table, sidestepping the bold claims.”

    Why it matters: If novelty narrows to a single pipeline + dataset, we get leaderboard drift rather than field progress.

    6) Opaque chair power, amplified by AI summaries

    The pilot tools summarize reviews and rebuttals for ACs/chairs. Officially, they don’t make decisions, but summaries can steer them — especially under time pressure.

    Why it matters: If the summary layer becomes the decisive layer, we need auditability: What did the model emphasize or omit? Which evidence did the chair actually read?


    A few bright spots (yes, there were some)

    • Selective, but still diverse accepts. Teams publicly celebrated oral/poster outcomes across multiple subareas, indicating that compelling work did land — despite the noise. (Several examples of multi-paper acceptances are cataloged, including orals.)

    • Process intent. The design intent — fast triage in Phase-1, deeper scrutiny in Phase-2, and AI to reduce clerical load — addresses real scaling pain points.

    But intention without instrumentation is not enough.


    What to fix before AAAI 2027 (actionable proposals)

    1. Publish the weighting of scores, summaries, and chair discretion.

      • A simple decision card per paper: inputs considered, their weight, and the final rationale (2–3 lines).

      • Require chairs to confirm they read all full reviews (not just summaries). Log it.

    2. Guardrails for rebuttal dynamics.

      • Allow score increases post-rebuttal; permit decreases only with a short, evidence-linked justification.

      • Auto-flag “large post-rebuttal score drops” for AC scrutiny (see the sketch after this list).

    3. Minimum review depth for extreme scores.

      • A 9/10 or 1/2 must include specific experimental checks, ablations, or error analyses. Thin reviews can’t carry extreme recommendations.

    4. Conflict-of-interest pressure test.

      • Expand COI beyond coauthorship: same dataset/lab lineage, shared grants, or mentoring relationships within X years.

      • Random audits of Phase-2 paper–reviewer ties.

    5. AI summary audits.

      • Store summary diffs: what points from reviews/rebuttals were included, collapsed, or omitted by the tool.

      • Let authors request the summary artifact post-decision to check for gross omissions.

    6. Counter-monoculture incentives.

      • Reserve a slice of accepts for out-of-cluster submissions that expand datasets, tasks, or methods beyond the mainline.

      • Encourage replication + stress tests with principled novelty, not just incremental leaderboard bumps.

    7. Transparent statistics, not just headlines.

      • Publish per-area acceptance rates and score–decision scatter (a computation sketch follows below).
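
    To make proposal 2 concrete, here is a minimal sketch of the post-rebuttal drop flag; the threshold, record format, and function name are hypothetical, not existing conference tooling:

    ```python
    # Hypothetical guardrail from proposal 2: surface large post-rebuttal score
    # drops for AC scrutiny. The threshold and data shapes are assumptions.

    def flag_post_rebuttal_drops(pre_scores, post_scores, max_drop=2):
        """Return reviewer IDs whose score fell by more than `max_drop` points
        after the rebuttal; each flagged drop would require an evidence-linked
        justification before it counts."""
        flagged = []
        for reviewer, before in pre_scores.items():
            after = post_scores.get(reviewer, before)
            if before - after > max_drop:
                flagged.append(reviewer)
        return flagged

    # Example: R2 drops from 7 to 3 after the rebuttal -> flagged for the AC.
    print(flag_post_rebuttal_drops({"R1": 6, "R2": 7}, {"R1": 6, "R2": 3}))  # ['R2']
    ```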

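    And for proposal 7, a sketch of the per-area statistics, assuming a hypothetical decisions.csv export with columns paper_id, area, avg_score, decision; any venue could publish the output alongside the headline rate:

    ```python
    # Per-area acceptance rates and a score-vs-decision summary, computed from a
    # hypothetical decisions.csv (paper_id, area, avg_score, decision).
    import pandas as pd

    df = pd.read_csv("decisions.csv")
    df["accepted"] = df["decision"].eq("accept")

    # Acceptance rate by area, highest first.
    per_area = df.groupby("area")["accepted"].mean().sort_values(ascending=False)
    print(per_area)

    # How closely do final decisions track average reviewer scores?
    print(df.groupby("decision")["avg_score"].describe())
    ```
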
    Concrete vignettes to learn from

    • “SPC said ‘accept’, final: reject.” One account describes an SPC-endorsed paper turned down at the end — fueling the belief that finals can nullify expert consensus without written rationale.

    • “Rebuttal helped? Scores went down.” Multiple reports say rebuttals triggered score reductions, not clarifications. This suggests reviewers used rebuttal to coordinate or defend priors rather than test claims.

    • “Same-lab treadmill.” In narrow subfields, authors perceive that novelty ≈ the next tweak from the same pipeline. This is where cross-area reviewers and external datasets can diversify signal.


    Why this moment matters

    A selective venue can survive a tough year of admits; it cannot survive a downward trust curve. When authors feel that (a) thin reviews outrank deep analyses, (b) summaries outrank evidence, or (c) relationships outrank rules, they exit — or they game the system. The result is fewer risky ideas, more monocultures, and louder meta-drama.

    AAAI’s move to two phases and AI assistance could scale peer review. But scale without governance produces exactly what we saw: hot takes eclipsing handbooks. The fixes above are lightweight and testable in one cycle. We should try them.


    Before you submit again: wind-tunnel your paper

    Want an early read on whether your paper survives Phase-1 thin-review + Phase-2 scrutiny? Try a simulation pass. Tools like CSPaper.org let you upload and receive structured community feedback quickly.

    CSPaper implements a simple three-step flow: go to site → upload → get reviews. Use it to pressure-test ablations, claims, and clarity before the real thing.


    Sources

    • https://www.reddit.com/r/MachineLearning/comments/1oaf1v0/d_on_aaai_2026_discussion/
    • https://aaai.org/conference/aaai/aaai-26/review-process/?utm_source=chatgpt.com
    • https://aaai.org/wp-content/uploads/2025/08/FAQ-for-the-AI-Assisted-Peer-Review-Process-Pilot-Program.pdf?utm_source=chatgpt.com
    • https://openaccept.org/c/ai/aaai/
    • https://x.com/lyson_ober/status/1986939786163011775
    • https://papercopilot.com/statistics/aaai-statistics/aaai-2026-statistics/
    • https://mp.weixin.qq.com/s/0Vbdd0ve1isnJ_e4_aOOVg