🔊 Release Note (2025-11-06): LLM selection, benchmarking as a guide, and GenAI text analysis

    Posted by root

    Dear CSPaper Review Users,

    After extensive preparation and foundational work, we’re thrilled to announce the release of CSPaper Review v1.2.0! 🎉

    Screenshot 2025-11-07 at 12.55.20.png


    🌟 Choose Your Favorite LLM

    We’ve introduced a new step after paper upload that allows you to select your preferred LLM for generating reviews.
    Currently supported models include:

    GPT-5, O3, O4-mini, GPT-4.1, Gemini-2.5-pro, and Gemini-2.5-flash.

    We plan to expand this list based on community feedback and our benchmarking capabilities across supported venues (conference + track).


    📊 Guided by Benchmarking Results

    When selecting a model, CSPaper now displays recommended LLMs together with benchmarking results for the selected conference and track.
    We visualize comparative performance (measured by NMAE) along with standard deviation (STD) values to help you make an informed model choice.

    Normalized Mean Absolute Error (NMAE) measures how closely the predicted paper ratings align with ground-truth ratings, normalized to account for each venue’s rating scale. Lower NMAE values indicate better accuracy.
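As a rough illustration of the metric described above, here is a minimal Python sketch. It is not CSPaper's actual implementation: the post does not give the exact normalization or say how STD is computed, so this sketch assumes each absolute error is divided by the width of the venue's rating scale (`scale_max - scale_min`) and that STD is the spread of those per-paper errors. The function name `nmae` and its parameters are hypothetical.

```python
from statistics import mean, pstdev


def nmae(predicted, ground_truth, scale_min, scale_max):
    """Return (NMAE, STD) for predicted vs. ground-truth paper ratings.

    Assumption: errors are normalized by the width of the rating scale,
    and STD is the population standard deviation of the per-paper
    normalized errors. Neither detail is confirmed by the release note.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("rating lists must be non-empty and of equal length")
    span = scale_max - scale_min
    if span <= 0:
        raise ValueError("scale_max must be greater than scale_min")

    # Per-paper absolute error, scaled into [0, 1] for in-range ratings.
    errors = [abs(p - g) / span for p, g in zip(predicted, ground_truth)]
    return mean(errors), pstdev(errors)


# Hypothetical ratings on a 1-10 scale (illustrative numbers, not benchmark data).
score, spread = nmae([6, 4, 8], [5, 4, 6], scale_min=1, scale_max=10)
print(f"NMAE={score:.4f} STD={spread:.4f}")
```

Under this assumption, dividing by the scale width is what makes scores comparable across venues with different rating ranges: a one-point miss counts for less on a 1-10 scale than on a 1-5 scale.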

    Screenshot 2025-11-07 at 22.01.16.png

    🔹 Note: Benchmark results are venue-specific and continuously updated as:

    1. Our benchmark dataset (currently 150 annotated papers) expands.
    2. The review agent’s prompts and templates are refined, which changes how each LLM performs.
    3. LLM providers release updated sub-versions of their models.

    🤖 GenAI Text Analysis (Pilot)

    As part of our roadmap, we’re piloting a new feature — GenAI Content Analysis — for selected venues:

    TheWebConf 2025, KDD 2025, and CVPR 2025.

    Each review may now include a section titled “GenAI Content Analysis”, offering a qualitative assessment of how likely the paper is to contain AI-assisted writing:

    • None / Minimal
    • Partial / Moderate
    • Extensive / Intensive

    If “Partial / Moderate” or “Extensive / Intensive” is detected, the agent will provide concise justifications with direct evidence, referencing specific sections, pages, paragraphs, or sentences.

    Screenshot 2025-11-07 at 21.47.49.png


    🔮 A Glimpse Into the Future

    We have fully refactored our agent architecture to enable the next generation of review intelligence:

    1. High-fidelity score calibration — ensuring review text and scores are more coherently aligned.
    2. Cross-review ranking — compare how your reviews rank among all generated reviews for the same venue.
    3. Custom review agents — create your own venue-specific agents with tailored review logic.

    Thank you for supporting CSPaper.org and contributing to our journey of building transparent, reliable, and intelligent academic review systems.

    — The CSPaper.org Team
