Datamodels: Predicting Predictions with Training Data
Presenter
March 7, 2022
Keywords:
- machine learning
- robustness
- influence functions
MSC:
- 68T01
Abstract
Machine learning models tend to rely on an abundance of training data. Yet, understanding the underlying structure of this data---and models' exact dependence on it---remains a challenge.
In this talk, we will present a new framework---called datamodeling---for directly modeling predictions as functions of training data. Given a dataset and a learning algorithm, this framework pinpoints---at varying levels of granularity---the relationships between train and test point pairs through the lens of the corresponding model class. Even in their most basic form, datamodels enable many applications, including discovering subpopulations, quantifying model brittleness via counterfactuals, and identifying train-test leakage.
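To make "modeling predictions as functions of training data" concrete, here is a minimal sketch of one possible basic instantiation. It rests on assumptions not stated in the abstract: the datamodel is a linear surrogate fitted by ordinary least squares, the learning algorithm is a toy ridge regression, and the names `train_and_predict` and `fit_datamodel` are hypothetical. The idea is to retrain on many random training subsets, record the model's output on one test point each time, and then regress those outputs on the subset-membership indicators.

```python
import numpy as np

def train_and_predict(X, y, mask, x_test, ridge=1e-3):
    """Hypothetical learning algorithm: ridge regression fit on the
    training points selected by the boolean `mask`, evaluated at x_test."""
    Xs, ys = X[mask], y[mask]
    d = X.shape[1]
    w = np.linalg.solve(Xs.T @ Xs + ridge * np.eye(d), Xs.T @ ys)
    return float(x_test @ w)

def fit_datamodel(X, y, x_test, n_subsets=2000, alpha=0.5, seed=0):
    """Fit a linear surrogate: output ~ mask @ theta + bias.

    Each row of `masks` marks which training points were included in one
    retraining run; theta[i] estimates how including training point i
    shifts the model's output on x_test.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each training point is included independently with probability alpha.
    masks = rng.random((n_subsets, n)) < alpha
    outputs = np.array([train_and_predict(X, y, m, x_test) for m in masks])
    # Append a column of ones so the regression also learns a bias term.
    design = np.hstack([masks.astype(float), np.ones((n_subsets, 1))])
    coef, *_ = np.linalg.lstsq(design, outputs, rcond=None)
    return coef[:-1], coef[-1]
```

In this sketch, training points with large-magnitude entries in `theta` are the ones whose inclusion most changes the prediction on `x_test`, which is the kind of train-test relationship the applications above rely on. In practice the toy learner would be replaced by the actual learning algorithm of interest, and a regularized regression may be preferable when the number of subsets is small relative to the training set size.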