About Me
I am a Research Scientist with 8+ years of experience building NLP and generative AI systems — from foundational research to production deployment in healthcare. I hold a Ph.D. in Computer Science from the University of Massachusetts Amherst, advised by Prof. Andrew McCallum.
My work centers on trustworthy and safe AI — building systems that are factually grounded, clinically reliable, and rigorously evaluable. I have a strong track record of taking research from formulation to production, including two deployed clinical AI systems at Ensemble Health Partners and Mendel AI.
I have 15+ publications at EMNLP, NAACL, KDD, and ICML, hold 2 US patents, and serve as a reviewer for NeurIPS, ICLR, SIGIR, and ARR.
Technical Skills
Experience
- Virtual Utilization Review (VUR): Designed and built a novel clinical AI system from scratch for real-time utilization review, a task with no public benchmark. Defined the task formulation, evaluation criteria, and modeling approach end-to-end. Deployed to production via Azure ML; improved pipeline F1 by 9.54% over the internal baseline through iterative error analysis.
- Insurance Appeal Generation: Built LLM pipelines combining RAG, self-refinement, and DPO to generate appeal letters grounded in clinical evidence, ICD/CPT coding, and payer-specific justification requirements; applied hallucination mitigation to reduce factual errors in generated clinical content.
- Hallucination Detection & Mitigation: Built a detection framework outperforming prior SOTA by 2.3–4.8%; used detection signals to drive LLM self-refinement and preference learning (DPO), achieving end-to-end mitigation of factual errors in clinical summarization.
- Clinical Text Summarization: Developed a semi-parametric memory mechanism allowing LLMs to reason across longitudinal patient records beyond context-window limits, targeting reconciliation of conflicting medical events over time.
- Clinical Trial Matching (ACR benchmark, BioKDD 2024): Co-developed a neuro-symbolic hybrid pipeline for large-scale cohort retrieval; outperformed pure LLM baselines (including GPT-4) by 10.1–26.7% F1 on 1,400 patients across 113 complex oncology queries.
- Ran continued pre-training of Llama 3 (8B and 70B) on proprietary clinical corpora; fine-tuned the resulting models for downstream medical summarization and clinical event extraction.
- Case-Based Reasoning for NLP: Introduced CBR-iKB, the first non-parametric CBR framework for knowledge-base QA — surpassed prior SOTA by 22.3% on WebQSP. Extended to unstructured text (CBR-MRC, EMNLP 2023), outperforming baselines by +11.5 EM on NaturalQuestions and +8.4 EM on NewsQA.
- Tabular Representation Learning (TABBIE, NAACL 2021): Co-developed a dual-Transformer model for tabular data using an ELECTRA-inspired corrupt-cell detection objective; achieved SOTA on column population (MAP 37.9 vs. TaBERT's 33.1) while requiring 10× less compute than TaBERT.
- Internships: Adobe Research (2017, 2018) — tabular QA and document understanding; IBM Research (2020, 2021) — knowledge-base QA and semantic parsing.
Independent Projects
An open-source agentic toolkit for generating clinically grounded insurance appeal letters, built as a submission to the Kaggle MedGemma Impact Challenge. It implements a multi-step agentic workflow for evidence retrieval, clinical reasoning, and structured letter generation, and includes a HealthBench-compatible rubric for benchmarking accuracy, grounding, and safety on MIMIC-IV claims data.
Selected Publications & Patents
Full list available on Google Scholar.
Education
Recognition & Service
Contact
Feel free to reach out about research, collaborations, or opportunities.