🔊 Release Note (2025-11-20): Gemini-3.0-Pro, GPT-5.1 and Benchmark page

Tags: cspaper, release note, gemini, gpt, llm, benchmark, benchmarking result, performance
root wrote:

    Dear CSPaper Review Users,

    We're excited to announce two major enhancements to CSPaper Review! 🎉

    Support for Gemini-3.0-Pro and GPT-5.1

    You can now select the latest models, Gemini-3.0-Pro and GPT-5.1, as the primary engines powering your agent workflows. As always, we also provide full benchmarking results for these new models.
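
    If you script against the tool, a call with one of the new engines might look like the sketch below. Everything here is hypothetical and purely illustrative: CSPaper Review does not document a public API, so the endpoint, field names, and response shape are assumptions, and model selection is normally done in the web UI.

        import requests

        # Hypothetical endpoint, for illustration only; treat this as a sketch
        # of "model as a workflow parameter", not a documented CSPaper API.
        API_URL = "https://cspaper.org/api/reviews"

        form = {
            "model": "gemini-3.0-pro",  # or "gpt-5.1" -- the newly added engines
            "venue": "NeurIPS 2025",    # hypothetical field selecting the venue workflow
        }

        # Upload the paper and request a review from the chosen engine.
        with open("paper.pdf", "rb") as pdf:
            resp = requests.post(API_URL, data=form, files={"paper": pdf}, timeout=300)

        resp.raise_for_status()
        print(resp.json())  # hypothetical response shape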


    These models are available as a free trial for now. They will eventually become part of our premium offering once we finalize their long-term integration.

    A fun anecdote: many users previously told us that GPT-5 tended to be "meaner" than Gemini-2.5-Pro. We're curious: does this still hold for GPT-5.1 and Gemini-3.0-Pro? Let us know!

    New Benchmark Dashboard

    We've launched a dedicated benchmark page:
    https://cspaper.org/benchmark

    This page provides an up-to-date, comprehensive overview of performance across LLMs and venues (conference + track).


    You'll also find detailed explanations of our metrics and what they mean in practice. From the latest results, we observe that:

    • LLM agents behave differently depending on the venue-specific review workflows (see the sketch after this list for one way to explore this in the data).
    • Top performers include GPT-5, Gemini-2.5-Pro, Gemini-3.0-Pro, and GPT-5.1.
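
    For readers who want to dig into that first observation themselves, the sketch below shows one way to surface per-venue differences, assuming you have copied the benchmark table into a CSV. The file name and column names (model, venue, score) are hypothetical, not the actual schema of https://cspaper.org/benchmark.

        import pandas as pd

        # Hypothetical CSV export of the benchmark table; columns are illustrative.
        df = pd.read_csv("cspaper_benchmark.csv")  # columns: model, venue, score

        # One row per model, one column per venue (conference + track).
        by_venue = df.pivot_table(index="model", columns="venue", values="score")

        # A large spread across a row suggests the agent's behavior shifts with
        # the venue-specific workflow; a flat row suggests venue-robust behavior.
        by_venue["venue_spread"] = by_venue.max(axis=1) - by_venue.min(axis=1)

        print(by_venue.sort_values("venue_spread", ascending=False))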

    We will continue improving the robustness and reliability of our benchmarks by expanding datasets and refining evaluation metrics.

    We Welcome Your Feedback!

    We're already planning the next wave of features, but your voice guides our direction.
    Please share your suggestions, feature requests, or bug reports. You can reply below or reach us anytime at support@cspaper.org.

    Cheers,
    The CSPaper Team ❤️
