About me
I am a PhD student at The Ohio State University in the Department of Computer Science and Engineering, studying Natural Language Processing. My broad goal is to apply my background in Neuroscience to strengthen the scientific process in AI research (see AI has a Science Problem), and to bridge the gap between Language Model advances and practical benefits for scientists (see Science has an AI Problem).
I work with Sachin Kumar and Andrew Perrault.
AI has a Science Problem
A lack of scientific rigor calls into question the validity of several fundamental methods and metrics. For instance, interpretability methods for LLMs (e.g., Logit Lens, activation patching, mechanistic interpretability) enjoy widespread hype but are seldom tested for reliability. At ACL 2025, I had the honor of being one of 25 panelists, selected from over 3,000 accepted papers, to present my work on this topic: Steering off Course: Reliability Challenges in Steering Language Models (Da Silva et al., ACL 2025).
My current work examines the relationships among thinking and output diversity, midtraining, Reinforcement Learning, and generalization.
Science has an AI Problem
As AI tools become increasingly common for research ideation, robust evaluation is critical to ensure the validity and usefulness of generated ideas. In our work ScholarEval: Research Idea Evaluation Grounded in Literature (Da Silva & Moussa et al., 2025), we build a literature-grounded research idea evaluator and rigorously test it with a new benchmark for idea evaluation (117 ideas across four disciplines) and a blind, randomized expert user study (18 PhD-level participants providing 46 evaluations in total). We release this tool for free to scientists of any discipline.
My current work investigates how idea evaluators can provide dense rewards for refining AI-generated research ideas with Reinforcement Learning.