New: Our demo paper "Root-Cause Diagnosis with Remediation Recommendations for ETL Pipelines via LLM-Based Reasoning" has been accepted at ACM SIGMOD 2026 (Demo Track). · Our paper "From Cells to Sentences" on robust table understanding has been accepted at AISTATS 2026.

Deepak Vijaykeerthy

Research Engineer · IBM Research, India

LLM Post-Training, Evaluation & Statistical Measurement

Research engineer at IBM Research, India. About 10 years spanning research and engineering, currently on LLM post-training. I've been working on a particular failure mode in GRPO: groups of rollouts where every one succeeds or every one fails give zero advantage, so there's no signal for the policy to learn from. The fix I'm exploring is a calibrated measurement model, fit alongside training, that tracks per-prompt difficulty and policy ability. Plugging that in as the baseline produces a signed advantage even in those cases, and the same fit picks out which prompts to train on next.

I also train foundation-model systems for structured data, such as a structure-aware encoder fused with an 8B LLM decoder for table understanding (AISTATS 2026), and build deployed LLM agents for enterprise pipeline diagnosis at IBM (SIGMOD 2026 Demo). Earlier work was on systematic AI testing, fairness verification, explainability, AutoML pipeline configuration, and probabilistic program synthesis (CVPR, UAI, AAAI, PLDI).

Email is the easiest way to reach me if you want to talk about any of this.

Research Themes

Psychometrics for Evaluation & Post-Training

Item response theory as a measurement layer for evaluation and training. We used IRT to rank 91 vision models from 10 calibrated ImageNet images (Kendall τ = 0.85 vs. the full benchmark). The same machinery now drives our GRPO-based RLVR work.
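For readers new to IRT, a minimal Python sketch of the idea: each item has a difficulty, each model has an ability, and ability is estimated from right/wrong responses on a few calibrated items. The one-parameter (Rasch) form, the item difficulties, the response patterns, and the names `p_correct` and `estimate_ability` are all illustrative assumptions; the model and item selection used in the paper may differ.

```python
import math

def p_correct(theta, b):
    """Rasch (1PL) probability that ability theta solves an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, difficulties, steps=50):
    """Maximum-likelihood ability estimate via Newton's method on the log-likelihood."""
    theta = 0.0
    for _ in range(steps):
        ps = [p_correct(theta, b) for b in difficulties]
        grad = sum(x - p for x, p in zip(responses, ps))   # d logL / d theta
        info = sum(p * (1.0 - p) for p in ps)              # Fisher information
        theta += grad / info
        theta = max(-6.0, min(6.0, theta))  # keep all-correct / all-wrong cases finite
    return theta

# Hypothetical calibrated item difficulties and per-model correctness (1 = correct).
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
responses = {
    "A": [1, 1, 1, 0, 0],
    "B": [1, 1, 0, 0, 0],
    "C": [1, 1, 1, 1, 0],
}

abilities = {m: estimate_ability(r, difficulties) for m, r in responses.items()}
ranking = sorted(abilities, key=abilities.get, reverse=True)
print(ranking)  # ['C', 'A', 'B']
```

With well-calibrated items, a handful of responses is enough to order models by estimated ability, which is what makes a small item set usable as a stand-in for a full benchmark.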

Foundation Models for Structured Data

Tables are a modality, not flat text. We train structure-aware encoders and fuse them with LLMs. Trainable slots and a mixture of denoisers teach the model where to focus across rows, columns, and text.

Trustworthy AI & Robustness

Fairness verification, concept-based explainability, adversarial robustness. Making sure models do what we think they're doing.

Probabilistic Programming & AutoML

We synthesize probabilistic programs from data and automate ML pipeline configuration with constrained optimization.

Selected Publications

Current project

Measurement-Driven RLVR

Ongoing

A calibrated measurement model fit alongside GRPO-style RLVR on math reasoning. It tracks per-prompt difficulty and policy ability as training proceeds. Most of the hand-tuned knobs in current pipelines (curriculum order, KL coefficient, replay weighting) are quantities the fit produces directly.

RLVR · GRPO · Measurement Layer
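The failure mode that motivates this project is easy to reproduce. When every rollout in a group receives the same reward, the group-normalized advantage is zero for all of them, while a baseline taken from a measurement model stays signed. The sketch below is a minimal illustration, not the training code: the Rasch-style success probability, the ability and difficulty values, and the names `grpo_advantages` and `measurement_advantages` are assumptions made for the example.

```python
import math
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def measurement_advantages(rewards, ability, difficulty):
    """Advantage against a predicted success probability (assumed Rasch form)."""
    p_success = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    return [r - p_success for r in rewards]

all_fail = [0.0, 0.0, 0.0, 0.0]
all_pass = [1.0, 1.0, 1.0, 1.0]

# Degenerate groups: the group baseline cancels every reward.
print(grpo_advantages(all_fail))
print(grpo_advantages(all_pass))

# Hypothetical fit: the prompt is somewhat harder than the policy's current ability.
ability, difficulty = 0.0, 1.0
print(measurement_advantages(all_fail, ability, difficulty))  # all negative
print(measurement_advantages(all_pass, ability, difficulty))  # all positive
```

In this toy setting, passing a prompt the fit considers hard yields a larger positive advantage than passing an easy one, and failing an easy prompt is penalized more than failing a hard one.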

Featured · AISTATS 2026

From Cells to Sentences: An End-to-End Framework for Table Understanding

Deepak Vijaykeerthy, Arvind Agarwal

AISTATS 2026 (Poster)

Serializing a table to a string throws away its row-column structure, the type of each column, and the relationships between cells. We instead treat tables as a modality and fuse a structure-aware encoder with an 8B LLM decoder, the way VLMs fuse vision encoders with language models. The fusion challenge is norm mismatch: dense table features are an order of magnitude larger in norm than LLM embeddings. To stabilize end-to-end training without freezing the LLM, we use four mechanisms: adaptor layers that project into LLM-compatible space, trainable slots that compress evidence into fixed-size summaries, moment-matching on the adaptor output to align distributions, and a mixture of denoisers as regularizers (corruption-aware in pretraining; task-specific in fine-tuning). We get the best results on 5 of 8 benchmarks across QA, fact verification, and text generation, and under schema corruption our accuracy drops by less than 2 points where baselines lose 6 to 22 points.

Tables · Multimodal LLM · Foundation Models · Pretraining
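To give a feel for the norm-mismatch problem and what moment matching does about it, here is a toy Python sketch. It is one possible reading of "moment-matching", written as a penalty on the gap in mean and standard deviation between two activation sets; the paper's actual loss may be defined differently. The synthetic activations, the 10x scale gap, and the names `flatten` and `moment_matching_penalty` are assumptions made for the example.

```python
import random
from statistics import mean, pstdev

def flatten(vectors):
    """Collapse a list of vectors into one list of scalar activations."""
    return [x for v in vectors for x in v]

def moment_matching_penalty(adaptor_out, llm_embed):
    """Squared gap between the first two moments of two activation sets."""
    a, e = flatten(adaptor_out), flatten(llm_embed)
    return (mean(a) - mean(e)) ** 2 + (pstdev(a) - pstdev(e)) ** 2

random.seed(0)
# Hypothetical activations: table features at roughly 10x the scale of LLM embeddings.
llm_embed = [[random.gauss(0.0, 0.02) for _ in range(16)] for _ in range(32)]
table_feats = [[random.gauss(0.0, 0.2) for _ in range(16)] for _ in range(32)]

# A closed-form rescale stands in for what a trained adaptor would learn.
scale = pstdev(flatten(llm_embed)) / pstdev(flatten(table_feats))
rescaled = [[x * scale for x in v] for v in table_feats]

before = moment_matching_penalty(table_feats, llm_embed)
after = moment_matching_penalty(rescaled, llm_embed)
print(before, after)
```

In training, a penalty of this kind would be minimized through the adaptor's parameters rather than by a fixed rescale; the sketch only shows that aligning the moments removes most of the scale gap.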

Foundation models, post-training, and agents

2026

Root-Cause Diagnosis with Remediation Recommendations for ETL Pipelines via LLM-Based Reasoning

Deepak Vijaykeerthy, Rajmohan C, Arvind Agarwal, Avirup Saha, Sameep Mehta, Emmanuel Adhoute, Nadav Vahav, Timur Porokhnia, Adrian Dąbrowski, Bat El Aharon Asher, Aviya Goldfarb

ACM SIGMOD 2026 (Demo)

2026

DataBench: A Benchmark Dataset to Evaluate Agents for Data Pipeline Remediation

Ritwik Chaudhuri, Deepak Vijaykeerthy, Avirup Saha, Arvind Agarwal, Kushal Mukherjee, Sameep Mehta

Under review at KDD 2026

Evaluation, safety, and trustworthy AI

2024

On Evaluation of Vision Datasets and Models using Human Competency Frameworks

Rahul Ramachandran, Tejal Kulkarni, Charchit Sharma, Deepak Vijaykeerthy, Vineeth N. Balasubramanian

DMLR Workshop, ICML 2024

2022

A Framework for Learning Ante-hoc Explainable Models via Concepts

Anirban Sarkar, Deepak Vijaykeerthy, Anindya Sarkar, Vineeth N. Balasubramanian

CVPR 2022

2021

Automated Testing of AI Models

Swagatam Haldar, Deepak Vijaykeerthy, Diptikalyan Saha

arXiv:2110.03320, 2021

2020

Verifying Individual Fairness in Machine Learning Models

Philips George John, Deepak Vijaykeerthy, Diptikalyan Saha

UAI 2020

2019

Hardening Deep Neural Networks via Adversarial Model Cascades

Deepak Vijaykeerthy, Anshuman Suri, Sameep Mehta, Ponnurangam Kumaraguru

IJCNN 2019

Probabilistic inference and optimization

2020

An ADMM Based Framework for AutoML Pipeline Configuration

Sijia Liu, Parikshit Ram, Deepak Vijaykeerthy, Djallel Bouneffouf, Gregory Bramble, Horst Samulowitz, Dakuo Wang, Andrew Conn, Alexander G. Gray

AAAI 2020

2015

Efficient Synthesis of Probabilistic Programs

Aditya V. Nori, Sherjil Ozair, Sriram K. Rajamani, Deepak Vijaykeerthy

PLDI 2015
