Volume 54, pp. 610-619, 2021.

On the regularization effect of stochastic gradient descent applied to least-squares

Stefan Steinerberger

Abstract

We study the behavior of the stochastic gradient descent method applied to $\|Ax - b\|_2^2 \to \min$ for invertible matrices $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that
$$\mathbb{E}\,\|Ax_{k+1} - b\|_2^2 \le \left(1 + \frac{c_A}{\|A\|_F^2}\right)\|Ax_k - b\|_2^2 - \frac{2}{\|A\|_F^2}\,\|A^{T}A(x_k - x)\|_2^2.$$
This is a curious inequality since the last term involves one additional matrix multiplication applied to the error $x_k - x$ compared to the remaining terms: if the projection of $x_k - x$ onto the subspace of singular vectors corresponding to large singular values is large, then the stochastic gradient descent method leads to a fast regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
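The following is a minimal numerical sketch, not part of the article itself, of the kind of iteration the abstract refers to: stochastic gradient descent on the least-squares functional with rows sampled proportionally to their squared norms (the randomized Kaczmarz step size), tracking the two quantities $\|Ax_k - b\|_2^2$ and $\|A^{T}A(x_k - x)\|_2^2$ that appear in the stated inequality. All dimensions, seeds, and variable names are illustrative assumptions, not taken from the paper.

```python
# Sketch of SGD / randomized Kaczmarz for ||Ax - b||_2^2 -> min (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

n = 50
A = rng.standard_normal((n, n))      # generic square matrix, invertible with probability 1
x_true = rng.standard_normal(n)
b = A @ x_true

row_norms_sq = np.sum(A**2, axis=1)
probs = row_norms_sq / row_norms_sq.sum()   # sample row i with probability ||a_i||^2 / ||A||_F^2

x = np.zeros(n)
for k in range(2001):
    i = rng.choice(n, p=probs)
    a_i = A[i]
    # SGD / Kaczmarz step: project the iterate onto the hyperplane <a_i, x> = b_i
    x = x - (a_i @ x - b[i]) / row_norms_sq[i] * a_i
    if k % 500 == 0:
        residual = np.linalg.norm(A @ x - b) ** 2                    # ||A x_k - b||_2^2
        smoothed = np.linalg.norm(A.T @ (A @ (x - x_true))) ** 2     # ||A^T A (x_k - x)||_2^2
        print(f"step {k:5d}:  ||Ax-b||^2 = {residual:.3e}   ||A^TA(x-x*)||^2 = {smoothed:.3e}")
```

Monitoring both quantities along the iteration makes the mechanism behind the inequality visible: while the component of the error aligned with large singular values is still substantial, the second term on the right-hand side dominates and the residual drops quickly, which is the energy cascade the abstract describes.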

DOI: 10.1553/etna_vol54s610

Key words

stochastic gradient descent, Kaczmarz method, least-squares, regularization

AMS subject classifications

65F10, 65K10, 65K15, 90C06, 93E24