Research
I'm interested in machine learning, natural language processing, and optimization, with a specific focus on learning from limited labeled data, multiple tasks, and non-stationary data distributions (Continual/Lifelong Learning, Transfer Learning, Meta Learning, Multi-Task Learning, Modular Learning).
My doctoral thesis focuses on designing efficient lifelong learning systems that alleviate catastrophic forgetting of previously learned knowledge and facilitate continual learning of new tasks. Inspired by biological learning
processes and progress in deep learning, my work injects appropriate inductive biases into the three main components of data-driven machine learning:
model (architecture & initialization),
training (objective & optimization), and
data (limited labeled & unlabeled).
|
Latest News
- [Feb. 9, 2024]
My doctoral thesis is available online on CMU KiltHub!
- [Dec. 11, 2023]
I'm attending NeurIPS'23 to present our JMLR and Making Scalable Meta Learning Practical papers!
- [Dec. 4, 2023]
I'm attending EMNLP'23 to present our DSI++ paper!
- [Nov. 2023]
Excited to join Google Research as a full-time researcher.
- [Nov. 30, 2023]
I successfully defended my doctoral thesis! (Thesis committee: Emma Strubell, William Cohen, Aditi Raghunathan, Dani Yogatama)
- [Oct. 7, 2023]
DSI++ paper accepted at EMNLP 2023!
- [Sep. 21, 2023]
Making Scalable Meta Learning Practical paper accepted at NeurIPS 2023! TL;DR!
- [Jul. 4, 2023]
An Empirical Investigation of the Role of Pre-training in Lifelong Learning paper accepted at JMLR 2023; it will be presented at the NeurIPS 2023 Journal-to-Conference Track! TL;DR!
- [Jun. 23, 2023]
Conditional Diffusion Replay for Continual Learning in Medical Settings paper accepted at Workshop on Challenges in Deployable Generative AI, ICML 2023!
- [Jun. 19, 2023]
Adapting to Gradual Distribution Shifts with Continual Weight Averaging paper accepted at Workshop on High-dimensional Learning Dynamics, ICML 2023!
- [Jan. 23, 2023]
I proposed my doctoral thesis! (Thesis committee: Emma Strubell, William Cohen, Aditi Raghunathan, Dani Yogatama)
- [Dec. 19, 2022]
DSI++ paper is out! TL;DR!
- [Nov. 28, 2022]
I'm attending NeurIPS'22!
- [Oct. 6, 2022]
Train Flat, Then Compress paper accepted at Findings of EMNLP 2022!
- [Jul. 10, 2022]
An Introduction to Lifelong Supervised Learning primer is out!
- [May 23, 2022]
Excited to start a summer internship with Yi Tay and Jai Gupta at Google Research!
- [Feb. 24, 2022]
Compositional Generalization for Data-to-Text Generation paper accepted at ACL 2022!
- [Jan. 20, 2022]
ExT5: Extreme Multi-Task Scaling paper accepted at ICLR 2022!
|
|
Efficient Lifelong Learning in Deep Neural Networks: Optimizing Architecture, Training, and Data
Sanket Vaibhav Mehta
PhD thesis, Carnegie Mellon University, 2023
bibtex /
tweet
|
|
DSI++: Updating Transformer Memory with New Documents
Sanket Vaibhav Mehta,
Jai Gupta,
Yi Tay,
Mostafa Dehghani,
Vinh Q. Tran,
Jinfeng Rao,
Marc Najork,
Emma Strubell,
Donald Metzler
EMNLP, 2023
bibtex /
tweet
|
|
An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta,
Darshan Patil,
Sarath Chandar,
Emma Strubell
Journal of Machine Learning Research, 2023
bibtex /
code /
tweet
|
|
Making Scalable Meta Learning Practical
Sang Keun Choe,
Sanket Vaibhav Mehta,
Hwijeen Ahn,
Willie Neiswanger,
Pengtao Xie,
Emma Strubell,
Eric Xing
NeurIPS, 2023
bibtex /
code /
tweet
|
|
Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
Clara Na,
Sanket Vaibhav Mehta,
Emma Strubell
EMNLP Findings, 2022
bibtex /
code /
tweet
|
|
An Introduction to Lifelong Supervised Learning
Shagun Sodhani,
Mojtaba Faramarzi,
Sanket Vaibhav Mehta,
Pranshu Malviya,
Mohamed Abdelsalam,
Janarthanan Janarthanan,
Sarath Chandar
arXiv, 2022
bibtex /
tweet
|
|
Improving Compositional Generalization with Self-Training for Data-to-Text Generation
Sanket Vaibhav Mehta,
Jinfeng Rao,
Yi Tay,
Mihir Kale,
Ankur Parikh,
Emma Strubell
ACL, 2022
bibtex /
code /
poster
|
|
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta,
Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler
ICLR, 2022
bibtex /
press /
tweet
|
|
An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta,
Darshan Patil,
Sarath Chandar,
Emma Strubell
ICML Theory and Foundation of Continual Learning Workshop, 2021 (Spotlight)
bibtex /
code
|
|
Efficient Meta Lifelong-Learning with Limited Memory
Sanket Vaibhav Mehta*,
Zirui Wang*,
Barnabás Póczos,
Jaime Carbonell
EMNLP, 2020
bibtex
|
|
Learning Rhyming Constraints using Structured Adversaries
Harsh Jhamtani,
Sanket Vaibhav Mehta,
Jaime Carbonell,
Taylor Berg-Kirkpatrick
EMNLP, 2019
bibtex /
code /
poster
|
|
Gradient-Based Inference for Networks with Output Constraints
Jay-Yoon Lee,
Sanket Vaibhav Mehta,
Michael Wick,
Jean-Baptiste Tristan,
Jaime Carbonell
AAAI, 2019
bibtex /
code
|
|
Towards Semi-Supervised Learning for Deep Semantic Role Labeling
Sanket Vaibhav Mehta*,
Jay-Yoon Lee*,
Jaime Carbonell
EMNLP, 2018
bibtex /
code /
poster
|
|
An LSTM Based System for Prediction of Human Activities with Durations
Kundan Krishna,
Deepali Jain,
Sanket Vaibhav Mehta,
Sunav Choudhary
IMWUT, 2017
bibtex
|
|
Preventing Inadvertent Information Disclosures via Automatic Security Policies
Tanya Goyal,
Sanket Vaibhav Mehta,
Balaji Vasan Srinivasan
PAKDD, 2017
bibtex
|
Issued Patents
|
1. Generating data-driven geo-fences (US 9,838,843)
|
2. Propagation of changes in master content to variant content (US 10,102,191)
|
3. Digital document update (US 10,489,498)
|
4. Tagging documents with security policies (US 10,783,262)
|
5. Digital document update using static and transient tags (US 10,846,466)
|
6. Tenant-side detection, classification, and mitigation of noisy-neighbor-induced performance degradation (US 11,086,646)
|
7. Intelligent customer journey mining and mapping (US 11,756,058)
|
Based on Jon Barron's website.
|