Abstract
Standard machine learning produces models that are highly accurate on average but that degrade dramatically when the test distribution deviates from the training distribution. While one can train robust models, doing so often comes at the expense of standard accuracy (on the training distribution). We study this tradeoff in two settings, adversarial examples and minority groups, creating simple examples that highlight generalization issues as a major source of the tradeoff. For adversarial examples, we show that even augmenting the training set with correctly labeled data can produce worse models, but we develop a simple method, robust self-training, that mitigates this tradeoff using unlabeled data. For minority groups, we show that overparameterizing models can also hurt accuracy. These results suggest that the "more data" and "bigger models" strategy that works well for improving standard accuracy need not work in out-of-distribution settings, even under favorable conditions.
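To make the robust self-training idea concrete, the sketch below shows one common instantiation on synthetic data: fit a standard model on the labeled data, use it to pseudo-label the unlabeled data, then run adversarial training on the union of labeled and pseudo-labeled points. This is a minimal illustration only; the toy two-Gaussian data, linear model, PGD attack, and all hyperparameters are assumptions for exposition and not the setup studied in the paper.

```python
# Hypothetical minimal sketch of robust self-training (RST) on synthetic data.
# Everything here (data, model, attack, hyperparameters) is illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def make_data(n, d=20):
    """Toy two-Gaussian binary classification task (stand-in for real data)."""
    y = torch.randint(0, 2, (n,))
    x = torch.randn(n, d) + (2.0 * y.float() - 1.0).unsqueeze(1)
    return x, y

def pgd_attack(model, x, y, eps=0.3, alpha=0.1, steps=5):
    """L_inf PGD: find a perturbation within eps that increases the loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

def train(model, x, y, epochs=50, adversarial=False):
    """Standard or adversarial training with full-batch Adam updates."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        inputs = pgd_attack(model, x, y) if adversarial else x
        loss = F.cross_entropy(model(inputs), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Labeled data plus unlabeled data (whose labels are hidden from training).
x_lab, y_lab = make_data(200)
x_unl, _ = make_data(2000)

# Step 1: standard training on the labeled data only.
standard_model = nn.Linear(20, 2)
train(standard_model, x_lab, y_lab)

# Step 2: pseudo-label the unlabeled data with the standard model.
with torch.no_grad():
    y_pseudo = standard_model(x_unl).argmax(dim=1)

# Step 3: robust (adversarial) training on labeled + pseudo-labeled data.
robust_model = nn.Linear(20, 2)
x_all = torch.cat([x_lab, x_unl])
y_all = torch.cat([y_lab, y_pseudo])
train(robust_model, x_all, y_all, adversarial=True)
```

The extra unlabeled data enters only through the pseudo-labels, which is what allows the robust training step to use far more points than were ever labeled.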