ScholarEval: Research Idea Evaluation Grounded in Literature
Invited, AI2 (Allen Institute for Artificial Intelligence), Seattle, Washington, United States
The growing capabilities of large language models have led to their increased adoption across the scientific lifecycle, from idea conception to experiment execution, manuscript writing, and peer review. Recent work on AI-driven scientific hypothesis generation has shown promising results, with some studies demonstrating that AI-generated ideas can score higher than human-generated ones in terms of novelty and excitement. However, many hypotheses generated by LLMs yield poor results when executed, wasting resources, particularly in fields that require considerable computation or wet-lab experiments. A critical missing component of the AI-assisted scientific lifecycle is rigorous idea evaluation to prioritize the most promising ideas for execution. To address this gap, we present our ongoing work on ScholarEval, a multi-disciplinary research idea evaluation system grounded in the literature. ScholarEval evaluates research ideas along two key dimensions, soundness and contribution, and generates comprehensive idea reviews accompanied by citations and scores. We aim to release ScholarEval as an open tool that scientists can use to evaluate both human- and AI-generated research ideas against the current literature, thereby improving research idea refinement and resource allocation.