Research
My current research focuses on advancing large language models (LLMs), particularly through innovative mid-training strategies. I explore how to efficiently adapt and enhance LLMs for new capabilities, drawing on principles of continual learning, transfer learning, and modularity.
My doctoral thesis focused on designing efficient lifelong learning systems that overcome catastrophic forgetting and enable continual learning of new tasks. Drawing inspiration from biological learning and advances in deep learning, the thesis injects inductive biases into three core machine learning components: the model (architecture & initialization), the training procedure (objective & optimization), and the data (leveraging limited labeled & unlabeled data).
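To make the continual learning setting concrete, here is a minimal, illustrative sketch (not code from the thesis or any paper below): a model is trained on a stream of tasks while a small reservoir-sampled replay buffer rehearses past examples to mitigate catastrophic forgetting. The synthetic tasks, buffer capacity, and two-layer model are hypothetical placeholders.

```python
# Minimal, illustrative sketch of rehearsal-based continual learning
# (hypothetical example; not taken from any paper on this page).
import random

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class ReplayBuffer:
    """Reservoir-sampled memory of past (input, label) pairs."""

    def __init__(self, capacity=512):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, xs, ys):
        for x, y in zip(xs, ys):
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append((x, y))
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = (x, y)

    def sample(self, k):
        batch = random.sample(self.data, min(k, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)


def synthetic_task(n=256, shift=0.0):
    """Toy task whose input distribution drifts with `shift`."""
    x = torch.randn(n, 32) + shift
    y = torch.randint(0, 10, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)


model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
buffer = ReplayBuffer()

# Train sequentially on a stream of tasks, rehearsing stored examples
# alongside each current-task mini-batch to reduce forgetting.
for t in range(3):
    for x, y in synthetic_task(shift=float(t)):
        loss = F.cross_entropy(model(x), y)
        if buffer.data:
            rx, ry = buffer.sample(16)
            loss = loss + F.cross_entropy(model(rx), ry)
        opt.zero_grad()
        loss.backward()
        opt.step()
        buffer.add(x, y)
```

Reservoir sampling keeps the buffer an approximately uniform sample of everything seen so far, which is why it is a common baseline memory policy in this setting.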
Latest News
- [May 15, 2025] BIG-Bench Extra Hard (BBEH) paper accepted at ACL 2025! TL;DR!
- [Feb. 9, 2024] My doctoral thesis is available online on CMU KiltHub!
- [Dec. 11, 2023] I'm attending NeurIPS'23 to present our JMLR and Making Scalable Meta Learning Practical papers!
- [Dec. 4, 2023] I'm attending EMNLP'23 to present our DSI++ paper!
- [Nov. 30, 2023] I successfully defended my doctoral thesis! (Thesis committee: Emma Strubell, William Cohen, Aditi Raghunathan, Dani Yogatama)
- [Oct. 7, 2023] DSI++ paper accepted at EMNLP 2023! TL;DR!
- [Sept. 21, 2023] Making Scalable Meta Learning Practical paper accepted at NeurIPS 2023! TL;DR!
- [Jul. 4, 2023] An Empirical Investigation of the Role of Pre-training in Lifelong Learning paper accepted at JMLR 2023 and will be presented at the NeurIPS 2023 Journal-to-Conference Track! TL;DR!
- [Jan. 23, 2023] I proposed my doctoral thesis! (Thesis committee: Emma Strubell, William Cohen, Aditi Raghunathan, Dani Yogatama)
- [Oct. 6, 2022] Train Flat, Then Compress paper accepted at Findings of EMNLP 2022! TL;DR!
- [Jul. 10, 2022] An Introduction to Lifelong Supervised Learning primer is out!
- [May 23, 2022] Excited to start a summer internship with Yi Tay and Jai Gupta at Google Research!
- [Feb. 24, 2022] Compositional Generalization for Data-to-Text Generation paper accepted at ACL 2022!
- [Jan. 20, 2022] ExT5: Extreme Multi-Task Scaling paper accepted at ICLR 2022!

BIG-Bench Extra Hard
Mehran Kazemi, Bahare Fatemi, Hritik Bansal, John Palowitch, Chrysovalantis Anastasiou, Sanket Vaibhav Mehta, Lalit K. Jain, Virginia Aglietti, Disha Jindal, Peter Chen, Nishanth Dikkala, Gladys Tyen, Xin Liu, Uri Shalit, Silvia Chiappa, Kate Olszewska, Yi Tay, Vinh Q. Tran, Quoc V. Le, Orhan Firat
ACL, 2025
bibtex / code / tweet

Efficient Lifelong Learning in Deep Neural Networks: Optimizing Architecture, Training, and Data
Sanket Vaibhav Mehta
PhD Thesis, Carnegie Mellon University, 2023
bibtex / tweet

DSI++: Updating Transformer Memory with New Documents
Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Jinfeng Rao, Marc Najork, Emma Strubell, Donald Metzler
EMNLP, 2023
bibtex / tweet

An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell
Journal of Machine Learning Research, 2023
bibtex / code / tweet

Making Scalable Meta Learning Practical
Sang Keun Choe, Sanket Vaibhav Mehta, Hwijeen Ahn, Willie Neiswanger, Pengtao Xie, Emma Strubell, Eric Xing
NeurIPS, 2023
bibtex / code / tweet

Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
Clara Na, Sanket Vaibhav Mehta, Emma Strubell
EMNLP Findings, 2022
bibtex / code / tweet

An Introduction to Lifelong Supervised Learning
Shagun Sodhani, Mojtaba Faramarzi, Sanket Vaibhav Mehta, Pranshu Malviya, Mohamed Abdelsalam, Janarthanan Janarthanan, Sarath Chandar
arXiv, 2022
bibtex / tweet

Improving Compositional Generalization with Self-Training for Data-to-Text Generation
Sanket Vaibhav Mehta, Jinfeng Rao, Yi Tay, Mihir Kale, Ankur Parikh, Emma Strubell
ACL, 2022
bibtex / code / poster

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler
ICLR, 2022
bibtex / press / tweet

An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell
ICML Theory and Foundation of Continual Learning Workshop, 2021 (Spotlight)
bibtex / code

Efficient Meta Lifelong-Learning with Limited Memory
Sanket Vaibhav Mehta*, Zirui Wang*, Barnabás Póczos, Jaime Carbonell
EMNLP, 2020
bibtex

Learning Rhyming Constraints using Structured Adversaries
Harsh Jhamtani, Sanket Vaibhav Mehta, Jaime Carbonell, Taylor Berg-Kirkpatrick
EMNLP, 2019
bibtex / code / poster

Gradient-Based Inference for Networks with Output Constraints
Jay-Yoon Lee, Sanket Vaibhav Mehta, Michael Wick, Jean-Baptiste Tristan, Jaime Carbonell
AAAI, 2019
bibtex / code

Towards Semi-Supervised Learning for Deep Semantic Role Labeling
Sanket Vaibhav Mehta*, Jay-Yoon Lee*, Jaime Carbonell
EMNLP, 2018
bibtex / code / poster

An LSTM Based System for Prediction of Human Activities with Durations
Kundan Krishna, Deepali Jain, Sanket Vaibhav Mehta, Sunav Choudhary
IMWUT, 2017
bibtex

Preventing Inadvertent Information Disclosures via Automatic Security Policies
Tanya Goyal, Sanket Vaibhav Mehta, Balaji Vasan Srinivasan
PAKDD, 2017
bibtex

Issued Patents

1. Generating data-driven geo-fences (US 9,838,843)
2. Propagation of changes in master content to variant content (US 10,102,191)
3. Digital document update (US 10,489,498)
4. Tagging documents with security policies (US 10,783,262)
5. Digital document update using static and transient tags (US 10,846,466)
6. Tenant-side detection, classification, and mitigation of noisy-neighbor-induced performance degradation (US 11,086,646)
7. Intelligent customer journey mining and mapping (US 11,756,058)

Based on Jon Barron's website.