Research
My current research focuses on advancing large language models (LLMs), particularly through innovative mid-training strategies. I explore how to efficiently adapt and enhance LLMs for new capabilities, drawing on principles of continual learning, transfer learning, and modularity.
My doctoral thesis focused on designing efficient lifelong learning systems that overcome catastrophic forgetting and enable continual learning of new tasks. Drawing inspiration from biological learning and advances in deep learning, the thesis injects inductive biases into three core machine learning components: the model (architecture & initialization), the training procedure (objective & optimization), and the data (leveraging limited labeled & unlabeled data).
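To make the continual learning setting concrete, here is a minimal, illustrative sketch (not code from the thesis or any paper below): a model is trained on a stream of tasks while a small reservoir-sampled replay buffer rehearses past examples to mitigate catastrophic forgetting. The synthetic tasks, buffer capacity, and two-layer model are hypothetical placeholders.

```python
# Minimal, illustrative sketch of rehearsal-based continual learning
# (hypothetical example; not taken from any paper on this page).
import random

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class ReplayBuffer:
    """Reservoir-sampled memory of past (input, label) pairs."""

    def __init__(self, capacity=512):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, xs, ys):
        for x, y in zip(xs, ys):
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append((x, y))
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = (x, y)

    def sample(self, k):
        batch = random.sample(self.data, min(k, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)


def synthetic_task(n=256, shift=0.0):
    """Toy task whose input distribution drifts with `shift`."""
    x = torch.randn(n, 32) + shift
    y = torch.randint(0, 10, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)


model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
buffer = ReplayBuffer()

# Train sequentially on a stream of tasks, rehearsing stored examples
# alongside each current-task mini-batch to reduce forgetting.
for t in range(3):
    for x, y in synthetic_task(shift=float(t)):
        loss = F.cross_entropy(model(x), y)
        if buffer.data:
            rx, ry = buffer.sample(16)
            loss = loss + F.cross_entropy(model(rx), ry)
        opt.zero_grad()
        loss.backward()
        opt.step()
        buffer.add(x, y)
```

Reservoir sampling keeps the buffer an approximately uniform sample of everything seen so far, which is why it is a common baseline memory policy in this setting.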
Latest News
- [May 15, 2025] BIG-Bench Extra Hard (BBEH) paper accepted at ACL 2025! TL;DR!
- [Feb. 9, 2024] My doctoral thesis is available online on CMU KiltHub!
- [Dec. 11, 2023] I'm attending NeurIPS'23 to present our JMLR and Making Scalable Meta Learning Practical papers!
- [Dec. 4, 2023] I'm attending EMNLP'23 to present our DSI++ paper!
- [Nov. 30, 2023] I successfully defended my doctoral thesis! (Thesis committee: Emma Strubell, William Cohen, Aditi Raghunathan, Dani Yogatama)
- [Oct. 7, 2023] DSI++ paper accepted at EMNLP 2023! TL;DR!
- [Sept. 21, 2023] Making Scalable Meta Learning Practical paper accepted at NeurIPS 2023! TL;DR!
- [Jul. 4, 2023] An Empirical Investigation of the Role of Pre-training in Lifelong Learning paper accepted at JMLR 2023 and will be presented at the NeurIPS 2023 Journal-to-Conference Track! TL;DR!
- [Jan. 23, 2023] I proposed my doctoral thesis! (Thesis committee: Emma Strubell, William Cohen, Aditi Raghunathan, Dani Yogatama)
- [Oct. 6, 2022] Train Flat, Then Compress paper accepted at Findings of EMNLP 2022! TL;DR!
- [Jul. 10, 2022] An Introduction to Lifelong Supervised Learning primer is out!
- [May 23, 2022] Excited to start a summer internship with Yi Tay and Jai Gupta at Google Research!
- [Feb. 24, 2022] Compositional Generalization for Data-to-Text Generation paper accepted at ACL 2022!
- [Jan. 20, 2022] ExT5: Extreme Multi-Task Scaling paper accepted at ICLR 2022!

BIG-Bench Extra Hard
Mehran Kazemi, Bahare Fatemi, Hritik Bansal, John Palowitch, Chrysovalantis Anastasiou, Sanket Vaibhav Mehta, Lalit K. Jain, Virginia Aglietti, Disha Jindal, Peter Chen, Nishanth Dikkala, Gladys Tyen, Xin Liu, Uri Shalit, Silvia Chiappa, Kate Olszewska, Yi Tay, Vinh Q. Tran, Quoc V. Le, Orhan Firat
ACL, 2025
bibtex / code / tweet

Efficient Lifelong Learning in Deep Neural Networks: Optimizing Architecture, Training, and Data
Sanket Vaibhav Mehta
PhD Thesis, Carnegie Mellon University, 2023
bibtex / tweet

DSI++: Updating Transformer Memory with New Documents
Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Jinfeng Rao, Marc Najork, Emma Strubell, Donald Metzler
EMNLP, 2023
bibtex / tweet

An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell
Journal of Machine Learning Research, 2023
bibtex / code / tweet

Making Scalable Meta Learning Practical
Sang Keun Choe, Sanket Vaibhav Mehta, Hwijeen Ahn, Willie Neiswanger, Pengtao Xie, Emma Strubell, Eric Xing
NeurIPS, 2023
bibtex / code / tweet

Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
Clara Na, Sanket Vaibhav Mehta, Emma Strubell
EMNLP Findings, 2022
bibtex / code / tweet

An Introduction to Lifelong Supervised Learning
Shagun Sodhani, Mojtaba Faramarzi, Sanket Vaibhav Mehta, Pranshu Malviya, Mohamed Abdelsalam, Janarthanan Janarthanan, Sarath Chandar
arXiv, 2022
bibtex / tweet

Improving Compositional Generalization with Self-Training for Data-to-Text Generation
Sanket Vaibhav Mehta, Jinfeng Rao, Yi Tay, Mihir Kale, Ankur Parikh, Emma Strubell
ACL, 2022
bibtex / code / poster

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler
ICLR, 2022
bibtex / press / tweet

An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell
ICML Theory and Foundation of Continual Learning Workshop, 2021 (Spotlight)
bibtex / code

Efficient Meta Lifelong-Learning with Limited Memory
Sanket Vaibhav Mehta*, Zirui Wang*, Barnabás Póczos, Jaime Carbonell
EMNLP, 2020
bibtex

Learning Rhyming Constraints using Structured Adversaries
Harsh Jhamtani, Sanket Vaibhav Mehta, Jaime Carbonell, Taylor Berg-Kirkpatrick
EMNLP, 2019
bibtex / code / poster

Gradient-Based Inference for Networks with Output Constraints
Jay-Yoon Lee, Sanket Vaibhav Mehta, Michael Wick, Jean-Baptiste Tristan, Jaime Carbonell
AAAI, 2019
bibtex / code

Towards Semi-Supervised Learning for Deep Semantic Role Labeling
Sanket Vaibhav Mehta*, Jay-Yoon Lee*, Jaime Carbonell
EMNLP, 2018
bibtex / code / poster

An LSTM Based System for Prediction of Human Activities with Durations
Kundan Krishna, Deepali Jain, Sanket Vaibhav Mehta, Sunav Choudhary
IMWUT, 2017
bibtex

Preventing Inadvertent Information Disclosures via Automatic Security Policies
Tanya Goyal, Sanket Vaibhav Mehta, Balaji Vasan Srinivasan
PAKDD, 2017
bibtex

Issued Patents

1. Generating data-driven geo-fences (US 9,838,843)
2. Propagation of changes in master content to variant content (US 10,102,191)
3. Digital document update (US 10,489,498)
4. Tagging documents with security policies (US 10,783,262)
5. Digital document update using static and transient tags (US 10,846,466)
6. Tenant-side detection, classification, and mitigation of noisy-neighbor-induced performance degradation (US 11,086,646)
7. Intelligent customer journey mining and mapping (US 11,756,058)

Based on Jon Barron's website.