On the Stability and Generalization of First-order Bilevel Minimax Optimization

Abstract

Bilevel optimization and bilevel minimax optimization have recently emerged as unifying frameworks for a range of machine-learning tasks, including hyperparameter optimization and reinforcement learning. The existing literature focuses on empirical efficiency and convergence guarantees, leaving a critical theoretical gap in understanding how well these algorithms generalize.

To bridge this gap, we provide the first systematic generalization analysis for first-order gradient-based bilevel minimax solvers with lower-level minimax problems. Specifically, by leveraging algorithmic stability arguments, we derive fine-grained generalization bounds for three representative algorithms, including single-timescale stochastic gradient descent-ascent, and two variants of two-timescale stochastic gradient descent-ascent.

Our results reveal a precise trade-off among algorithmic stability, generalization gaps, and practical settings. Furthermore, extensive empirical evaluations corroborate our theoretical insights on realistic optimization tasks with bilevel minimax structures.

On the Stability and Generalization of First-order Bilevel Minimax Optimization

Authors

Abstract

Resources

Stay in the loop

Pages

Tools

Details