Research Engineer · IBM Research, India
LLM Post-Training, Evaluation & Statistical Measurement
Research engineer at IBM Research, India. ~10 years split between research and engineering, currently focused on LLM post-training. I've been working on a particular failure mode in GRPO: groups where every rollout succeeds or every rollout fails get zero advantage, so there's no signal for the policy to learn from. The fix I'm exploring is a calibrated measurement model, fit alongside training, that tracks per-prompt difficulty and policy ability. Plugging its prediction in as the baseline produces a signed advantage even in those uniform groups, and the same fit picks out which prompts to train on next.
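A minimal sketch of the failure mode and the fix, assuming a Rasch-style (1PL) success model; the function names and the specific parameterization here are illustrative, not the exact method from the paper:

```python
import numpy as np

def grpo_advantages(rewards):
    """Standard GRPO-style group-normalized advantage.
    If every rollout in the group gets the same reward, the centered
    term is identically zero and the policy gets no gradient signal."""
    r = np.asarray(rewards, dtype=float)
    centered = r - r.mean()
    std = r.std()
    return centered / std if std > 0 else centered  # all-zero on uniform groups

def calibrated_baseline_advantages(rewards, difficulty, ability):
    """Hypothetical calibrated baseline: a 1PL model predicts the policy's
    success probability on this prompt, and that prediction replaces the
    group mean. Uniform groups now yield a signed, nonzero advantage."""
    p = 1.0 / (1.0 + np.exp(-(ability - difficulty)))  # P(success | prompt)
    return np.asarray(rewards, dtype=float) - p

# A group where every rollout succeeds:
print(grpo_advantages([1, 1, 1, 1]))  # [0. 0. 0. 0.] -- no signal
print(calibrated_baseline_advantages([1, 1, 1, 1], difficulty=1.2, ability=0.5))
```

With the calibrated baseline, an all-success group on a hard prompt (difficulty above ability) gets a large positive advantage, and an all-failure group on an easy prompt gets a large negative one, which is exactly the signal the group mean throws away.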
I also train foundation-model systems for structured data, such as a structure-aware encoder fused with an 8B LLM decoder for table understanding (AISTATS 2026), and have built LLM agents for enterprise pipeline diagnosis that are deployed at IBM (SIGMOD 2026 Demo). Earlier work was on systematic AI testing, fairness verification, explainability, AutoML pipeline configuration, and probabilistic program synthesis (CVPR, UAI, AAAI, PLDI).
Email is the easiest way to reach me if you want to talk about any of this.
Item response theory as a measurement layer for evaluation and training. We used IRT to rank 91 vision models from 10 calibrated ImageNet images (Kendall τ = 0.85 vs. the full benchmark). The same machinery now drives our GRPO-based RLVR work.
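To make the measurement layer concrete, here is a toy sketch of fitting a 1PL (Rasch) model to a binary model-by-item response matrix by joint maximum likelihood; the gradient-ascent recipe and the synthetic data are my assumptions, not the calibration procedure from the paper:

```python
import numpy as np

def fit_rasch(responses, n_iter=500, lr=0.1):
    """Joint MLE for a 1PL (Rasch) model via gradient ascent.
    responses: binary matrix, rows = models, cols = benchmark items.
    Returns per-model ability and per-item difficulty estimates."""
    n_models, n_items = responses.shape
    ability = np.zeros(n_models)
    difficulty = np.zeros(n_items)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(ability[:, None] - difficulty[None, :])))
        resid = responses - p                  # gradient of the log-likelihood
        ability += lr * resid.sum(axis=1) / n_items
        difficulty -= lr * resid.sum(axis=0) / n_models
        difficulty -= difficulty.mean()        # pin down the scale's location
    return ability, difficulty

# Synthetic check: 20 "models" of known ability, 10 "items".
rng = np.random.default_rng(0)
true_ability = np.linspace(-2, 2, 20)
true_difficulty = rng.normal(size=10)
p = 1 / (1 + np.exp(-(true_ability[:, None] - true_difficulty[None, :])))
responses = (rng.random(p.shape) < p).astype(float)
ability, _ = fit_rasch(responses)
# The fitted abilities should roughly recover the true model ordering.
```

Ranking models by the fitted `ability` is what lets a handful of well-calibrated items stand in for the full benchmark.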
Tables are a modality, not flat text. We train structure-aware encoders and fuse them with LLMs. Trainable slots and a mixture of denoisers teach the model where to focus across rows, columns, and text.
Fairness verification, concept-based explainability, adversarial robustness. Making sure models do what we think they're doing.
We synthesize probabilistic programs from data and automate ML pipeline configuration with constrained optimization.