Electronic Transactions on Numerical Analysis (ETNA), Volume 54, pp. 610-619, 2021.
On the regularization effect of stochastic gradient descent applied to least-squares
Stefan Steinerberger
Abstract
We study the behavior of the stochastic gradient descent method
applied to $\|Ax - b\|_2^2 \to \min$ for invertible matrices $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that
$$\mathbb{E}\,\|Ax_{k+1} - b\|_2^2 \le \left(1 + \frac{c_A}{\|A\|_F^2}\right)\|Ax_k - b\|_2^2 - \frac{2}{\|A\|_F^2}\,\|A^T A(x_k - x)\|_2^2,$$
where $x$ denotes the solution of $Ax = b$ and $x_k$ the $k$-th iterate.
This is a curious inequality since the last term involves one additional matrix multiplication applied to the error $x_k - x$ compared to the remaining terms: if the projection of $x_k - x$ onto the subspace of singular vectors corresponding to large singular values is large, then the stochastic gradient descent method leads to a fast regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
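The residual decay described by the inequality can be observed numerically. Below is a minimal sketch, assuming the Kaczmarz-type SGD variant suggested by the keywords: row $i$ of $A$ is sampled with probability $\|a_i\|_2^2/\|A\|_F^2$ and the step size is $1/\|a_i\|_2^2$ (the Strohmer-Vershynin scheme); the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def kaczmarz_sgd(A, b, x0, num_steps, rng):
    """SGD for ||Ax - b||^2 via randomized Kaczmarz: sample row i with
    probability ||a_i||^2 / ||A||_F^2, then project onto <a_i, x> = b_i.
    Illustrative sketch, not the paper's code."""
    row_norms_sq = np.sum(A**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()  # ||a_i||^2 / ||A||_F^2
    x = x0.copy()
    residuals = []
    for _ in range(num_steps):
        i = rng.choice(len(b), p=probs)
        a_i = A[i]
        # SGD step with step size 1/||a_i||^2: projection onto the
        # hyperplane <a_i, x> = b_i.
        x -= ((a_i @ x - b[i]) / row_norms_sq[i]) * a_i
        residuals.append(np.linalg.norm(A @ x - b))
    return x, residuals

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))      # invertible with probability 1
x_true = rng.standard_normal(n)
b = A @ x_true
x, res = kaczmarz_sgd(A, b, np.zeros(n), 2000, rng)
print(res[0], res[-1])  # residual ||Ax_k - b|| shrinks in expectation
```

Tracking the error components in the singular vector basis of $A$ (via `np.linalg.svd`) makes the cascade visible: components along large singular values decay first, consistent with the inequality above.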
DOI: 10.1553/etna_vol54s610
Key words
stochastic gradient descent, Kaczmarz method, least-squares, regularization
AMS subject classifications
65F10, 65K10, 65K15, 90C06, 93E24