Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation

Abstract

This paper analyzes the regularization effect that data augmentation induces on supervised regression methods in the proportional regime, where the number of covariates grows proportionally to the number of samples. We provide a tight characterization of the test error, measured as mean squared error, solely in terms of population quantities of the true data and the first- and second-order statistics of the augmentation scheme.

Our results hold under misspecified feature maps and for any network architecture in which only the last readout layer is trained while the rest of the network is either frozen or randomly initialized. We specialize our results to Gaussian data and show that our asymptotic characterization is tight in this setting.
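
To make the setting concrete, the following is a minimal numerical sketch of random feature regression with data augmentation, not the paper's own experiment. Everything specific in it is an illustrative assumption: the function name `random_feature_test_mse`, the linear teacher, the `tanh` activation, the additive Gaussian noise augmentation, and all default sizes. It draws Gaussian covariates, passes them through a frozen random first layer, fits only the readout layer by ridge regression on the augmented samples, and reports the empirical test mean squared error.

```python
import numpy as np


def random_feature_test_mse(n=200, d=100, p=150, n_aug=4,
                            aug_std=0.5, ridge=1e-6, seed=0):
    """Empirical test MSE of a readout trained on augmented random features.

    All modelling choices here are assumptions made for illustration.
    """
    rng = np.random.default_rng(seed)

    # Assumed linear teacher generating the labels from Gaussian covariates.
    beta = rng.standard_normal(d) / np.sqrt(d)
    X = rng.standard_normal((n, d))
    y = X @ beta + 0.1 * rng.standard_normal(n)

    # Frozen, randomly initialized first layer (never trained).
    W = rng.standard_normal((d, p)) / np.sqrt(d)

    def features(Z):
        return np.tanh(Z @ W)

    # Augmentation: n_aug noisy copies of each sample, labels left unchanged.
    X_aug = np.repeat(X, n_aug, axis=0)
    X_aug = X_aug + aug_std * rng.standard_normal(X_aug.shape)
    y_aug = np.repeat(y, n_aug)

    # Train only the readout layer by ridge regression on augmented features.
    Phi = features(X_aug)
    readout = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(p), Phi.T @ y_aug)

    # Estimate the test error on fresh, unaugmented Gaussian data.
    X_test = rng.standard_normal((2000, d))
    y_test = X_test @ beta
    return float(np.mean((features(X_test) @ readout - y_test) ** 2))


if __name__ == "__main__":
    for std in (0.0, 0.5, 1.0):
        print(std, random_feature_test_mse(aug_std=std))
```

Sweeping `aug_std` while holding the ratios `d / n` and `p / n` fixed gives a simple empirical curve against which an asymptotic prediction of the test error could be compared.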
