New: "Root-Cause Diagnosis with Remediation Recommendations for ETL Pipelines via LLM-Based Reasoning" accepted at ACM SIGMOD 2026 (Demo Track), with the system now deployed in IBM watsonx.data integration on production DataStage pipelines · "From Cells to Sentences: An End-to-End Framework for Table Understanding" accepted at AISTATS 2026.
Deepak Vijaykeerthy

Research Engineer · IBM Research, India

RL Post-Training · Agentic Reasoning · LLM Evaluation · Foundation Models for Structured Data

Research Engineer at IBM Research, India, with about a decade of work across applied research and engineering. I am currently exploring sample efficiency in GRPO-style RL fine-tuning for reasoning models. Generation dominates the compute budget of these methods, and a large fraction of the groups they produce are degenerate: either every completion in the group succeeds or every one fails, so each reward equals the empirical group-mean baseline and every advantage is exactly zero. The question, then, is what a training procedure should estimate, per prompt and per current policy, so that its rollouts yield groups whose advantages are non-zero and informative. A working paper on this is in preparation.
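The degenerate-group effect is easy to see in a few lines. The sketch below is illustrative only and is not taken from the working paper: the function names, the standard-deviation normalisation, and the `eps` constant are assumptions, following the commonly used group-relative form (reward minus group mean, divided by group standard deviation).

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r - group mean) / (group std + eps).

    Hypothetical helper; eps and the std normalisation are assumptions.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_degenerate(rewards):
    """True when every completion in the group earned the same reward."""
    return max(rewards) == min(rewards)


# Made-up binary rewards for three groups of four rollouts each.
groups = {
    "all_fail": [0.0, 0.0, 0.0, 0.0],
    "all_pass": [1.0, 1.0, 1.0, 1.0],
    "mixed": [1.0, 0.0, 0.0, 1.0],
}

for name, rewards in groups.items():
    print(name, is_degenerate(rewards), group_advantages(rewards))
```

Both uniform groups yield all-zero advantages and therefore no learning signal, even though generating them cost as much as the mixed group.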

I also work on foundation models for structured (tabular) data. My most recent paper treats tables as a modality and pulls apart two problems that table-understanding work tends to bundle together: what the model has to learn about a table itself, despite header drift and cell-level noise, and how it should attend to the small subset of cells, and the surrounding text, that a given question actually depends on. The system is a structure-aware encoder fused end to end with an 8B-parameter decoder-only LLM, pretrained with a mixture of corruption-aware denoisers and aligned in fine-tuning to passages of linked external text (AISTATS 2026).

A separate strand is on LLM-based agents for root-cause diagnosis of ETL pipeline failures, now deployed in IBM's data integration product (SIGMOD 2026 Demo). A companion benchmark for evaluating this class of agents, DataBench, is under review at KDD 2026. Earlier work covered formal verification of individual fairness, concept-based ante-hoc explainability, AutoML pipeline configuration as constrained optimisation, MCMC-based synthesis of probabilistic programs, and automated test-input generation for ML systems.

Email is the best way to reach me.

Research Themes

Psychometrics for Evaluation & Post-Training

Psychometric measurement for LLM evaluation and training. Past work used 2PL IRT to rank 91 vision models from 10 calibrated ImageNet items, with Kendall τ = 0.85 against the full-benchmark ranking (ICML DMLR 2024). I am currently extending this measurement framework to GRPO-style RL post-training, with a working paper in preparation.
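For readers unfamiliar with the terms, here is a minimal sketch of the two ingredients named above: the 2PL item response function and Kendall's τ between two rankings. It is not the paper's code. The function names and all numbers are hypothetical, and the τ shown is the simple tau-a variant, which assumes no ties.

```python
import math
from itertools import combinations


def p_correct(theta, a, b):
    """2PL item response function: sigmoid(a * (theta - b)).

    theta: ability of the model, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def kendall_tau(x, y):
    """Kendall tau-a between two score lists, assuming no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# A model whose ability equals the item difficulty answers correctly
# with probability 0.5, whatever the discrimination.
print(p_correct(theta=0.0, a=1.5, b=0.0))

# Made-up scores for four models: full benchmark vs. a small item subset.
full_benchmark = [0.91, 0.85, 0.78, 0.60]
small_subset = [0.90, 0.80, 0.55, 0.70]
print(kendall_tau(full_benchmark, small_subset))
```

A τ close to 1 means the small item set orders the models almost as the full benchmark does.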

Foundation Models for Structured Data

The premise is that tables are a modality in their own right. The most recent paper separates two problems: teaching the model what a table is despite header drift and cell-level noise (via a mixture of corruption-aware denoisers over a structure-aware encoder), and getting it to attend to the cells and surrounding text that a question depends on (via end-to-end fusion with an 8B-parameter decoder-only LLM). AISTATS 2026.

Trustworthy AI & Robustness

Verifying and explaining what ML models actually do. Formal verification of individual fairness for tabular classifiers (UAI 2020), concept-based ante-hoc explanations (CVPR 2022), adversarial robustness via cascaded defenses (IJCNN 2019), and automated test-input generation for ML systems.

Probabilistic Programming & AutoML

MCMC-based synthesis of probabilistic programs from observation traces (PLDI 2015). ADMM-based formulations of the Combined Algorithm Selection and Hyperparameter optimisation (CASH) problem for AutoML pipeline configuration (AAAI 2020).

Code

I maintain two open-source projects. rl-experiments is a small PyTorch sandbox that compares RL post-training update rules on bandit and sequence tasks. It studies what those update rules do to a policy when reward is sparse, noisy, delayed, or vector-valued: which samples and tokens an update should weight, how stale reused off-policy data can be, and what drives entropy collapse. Companion setups cover on-policy distillation, multi-objective optimization under vector-valued rewards, and GRPO for tool-using LMs that recursively call themselves, where credit must propagate over a rollout tree rather than a flat sequence.

minilab runs the full training pipeline (pretraining, SFT, preference optimization, RLVR) end to end on a single consumer GPU. At small scale, SFT and preference tuning shift response format faster than task accuracy, since re-weighting cannot add capability the base never learned, and GRPO produces no gradient on zero-variance groups, where every rollout earns the same reward. minilab also includes a masked-diffusion track that repeats the same four stages with diffusion-native objectives, including diffusion analogues of DPO and GRPO.

Selected Publications

Foundation models, post-training, and agents
Evaluation, safety, and trustworthy AI
Probabilistic inference and optimization

Full publication list on Google Scholar and DBLP.