Type
Master's thesis / Bachelor's thesis / Guided research
Prerequisites
- Sound knowledge of machine learning
- Solid knowledge of probability theory
- Proficiency in Python
- (Preferred) Proficiency in deep learning frameworks (TensorFlow or PyTorch)
Description
Deep neural networks (DNNs) have shown great success in various domains (e.g., image classification, speech recognition, and game playing) over the last decade. However, the theoretical properties of DNNs, particularly their remarkable generalization abilities, remain largely unclear to the scientific community. DNNs used in practice are usually heavily over-parameterized, i.e., they have many more parameters than training samples. According to classical statistical learning theory, such models are prone to overfitting, i.e., they should perform much worse on test data than on training data. Surprisingly, DNNs often generalize well to test data in practice, directly challenging classical statistical learning theory. We therefore need a new theory of DNN generalization, as well as more thorough experiments to guide and precede such a theory.
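The random-label experiment from the first reference below gives a concrete way to observe this gap. The following is a minimal sketch of such an experiment, not part of the topic description itself; the dataset (scikit-learn's bundled digits), the MLP architecture, and all hyperparameters are illustrative assumptions. An over-parameterized network fits the training set and still generalizes on true labels, yet the same network also memorizes randomly permuted labels.

```python
# Minimal sketch: an over-parameterized MLP (~4.3M parameters, ~1.4k training
# samples) interpolates true labels and still generalizes, but also memorizes
# random labels. Dataset, architecture, and hyperparameters are illustrative
# assumptions in the spirit of Zhang et al. (2017), see references below.
import torch
import torch.nn as nn
from sklearn.datasets import load_digits

torch.manual_seed(0)


def make_mlp(in_dim=64, width=2048, n_classes=10):
    # ~4.3M parameters vs. ~1.4k training samples: heavily over-parameterized.
    return nn.Sequential(
        nn.Linear(in_dim, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, n_classes),
    )


def train(model, x, y, epochs=500, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):  # full-batch updates for simplicity
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model


def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()


# Load and split the bundled digits dataset (1797 samples, 64 features).
digits = load_digits()
x = torch.tensor(digits.data, dtype=torch.float32) / 16.0  # pixels scaled to [0, 1]
y = torch.tensor(digits.target)
perm = torch.randperm(len(y))
n_train = int(0.8 * len(y))
x_tr, y_tr = x[perm[:n_train]], y[perm[:n_train]]
x_te, y_te = x[perm[n_train:]], y[perm[n_train:]]

# True labels: the over-parameterized model fits the training set (near-zero
# training error) yet still generalizes, contrary to the classical intuition.
model = train(make_mlp(), x_tr, y_tr)
print(f"true labels:   train acc {accuracy(model, x_tr, y_tr):.2f}, "
      f"test acc {accuracy(model, x_te, y_te):.2f}")

# Random labels: the same architecture can memorize pure noise (training
# accuracy approaching 1.0, test accuracy near chance), so model capacity
# alone cannot explain why DNNs generalize on real data.
y_rand = y_tr[torch.randperm(len(y_tr))]
model_rand = train(make_mlp(), x_tr, y_rand)
print(f"random labels: train acc {accuracy(model_rand, x_tr, y_rand):.2f}, "
      f"test acc {accuracy(model_rand, x_te, y_te):.2f}")
```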
References
- Generalization theory for deep learning requires new approaches: Understanding deep learning requires rethinking generalization, https://arxiv.org/pdf/1611.03530
- Theory of double descent for random features models: Two models of double descent for weak features, https://arxiv.org/pdf/1903.07571
- Example of a generalization bound based on the neural tangent kernel (NTK): Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, http://proceedings.mlr.press/v97/arora19a/arora19a-supp.pdf