Wide Neural Networks are Gaussian Processes, Even When They Have Infinitely Many Layers With Shared Weights.
Neural networks with wide layers have gained increasing attention, as recent works have shown that they are equivalent to Gaussian processes. This equivalence helps explain how overparameterized neural networks can fit training data perfectly while maintaining good performance on unseen data, a phenomenon known as benign overfitting. However, these results hold only for shallow or finite-depth neural networks. As infinite-depth neural networks such as deep equilibrium models and neural ODEs gain popularity, it becomes essential to analyze wide neural networks with infinitely many layers.
In this work, we focus on analyzing the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers. Our analysis shows that the DEQ tends to a Gaussian process as the width of its layers approaches infinity. Moreover, the same limit is obtained when the limits of depth and width are interchanged, which is not the case for common infinite-depth neural networks. Additionally, we demonstrate that the corresponding Gaussian vector is non-degenerate for any distinct training data; in other words, the smallest eigenvalue of the corresponding kernel matrix is always strictly positive. We then apply these results to the training and generalization of neural networks, showing that a simple first-order method such as gradient descent converges to a global minimum as long as the network is sufficiently overparameterized, while achieving arbitrarily small generalization error.
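For readers unfamiliar with the model being analyzed, the sketch below illustrates the defining feature of a DEQ: instead of stacking distinct layers, it applies one weight-tied layer until a fixed point (equilibrium) is reached, which corresponds to an infinite-depth network. This is a minimal illustration assuming a single tanh layer solved by naive fixed-point iteration; the function and variable names are hypothetical, and practical DEQ implementations use faster root-finding solvers.

```python
import numpy as np

def deq_fixed_point(x, W, U, b, tol=1e-8, max_iter=1000):
    """Solve the equilibrium equation z = tanh(W @ z + U @ x + b)
    by fixed-point iteration (a toy stand-in for a DEQ forward pass)."""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
width, input_dim = 8, 4
# Scale W down so the layer map is a contraction and the iteration converges.
W = 0.2 * rng.standard_normal((width, width)) / np.sqrt(width)
U = rng.standard_normal((width, input_dim)) / np.sqrt(input_dim)
b = rng.standard_normal(width)
x = rng.standard_normal(input_dim)

z_star = deq_fixed_point(x, W, U, b)
# z_star satisfies the equilibrium equation up to the tolerance.
residual = np.linalg.norm(z_star - np.tanh(W @ z_star + U @ x + b))
```

The Gaussian-process limit described above concerns the distribution of such equilibrium outputs as the width of the shared layer grows.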
Committee: Hongyang Gao (co-major professor), Hailiang Liu (co-major professor), Kevin Liu (co-major professor), Chris Quinn, and Namrata Vaswani
Join on Zoom: https://iastate.zoom.us/j/92533473482