Type
Master's thesis / Bachelor's thesis
Prerequisites
- Knowledge of Python and deep learning frameworks (TensorFlow or PyTorch)
- Preferably a background in functional analysis
Description
Deep learning models have achieved tremendous success in many application areas. This success raises interesting theoretical questions, such as: Why can deep learning models adapt to widely different tasks? Can we describe their expressive power? A longstanding result in this direction is the Universal Approximation Theorem, which states that neural networks with a single hidden layer can, under mild conditions, approximate any continuous function on a compact set to arbitrary precision. However, this is only a starting point for the questions above. A common goal is to relate the approximation accuracy of neural networks to their complexity: for many classical function spaces, one can derive upper and lower complexity bounds in terms of the number of weights, neurons, and layers. Further lines of research try to explain the benefits of deep neural networks over shallow ones and study how architectural design choices influence expressivity. These are just some of the many interesting open research questions in the field of neural network expressivity.
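Formally, one common version of the theorem reads: for every continuous function f on a compact set K \subset \mathbb{R}^d, every \varepsilon > 0, and every continuous, non-polynomial activation \sigma, there exist a width N, weights w_i \in \mathbb{R}^d, and scalars a_i, b_i such that

\[
  \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} a_i \, \sigma(w_i^\top x + b_i) \Big| < \varepsilon.
\]

As a minimal hands-on sketch (the target sin(x), the hidden width of 64, and the optimizer settings are illustrative choices, not part of the topic), a single-hidden-layer ReLU network can be fit to a continuous function on a compact interval in PyTorch:

import torch

# Fit a single-hidden-layer ReLU network to sin(x) on a compact interval.
# Width, learning rate, and step count are arbitrary illustrative choices.
torch.manual_seed(0)
x = torch.linspace(-torch.pi, torch.pi, 256).unsqueeze(1)  # compact domain
y = torch.sin(x)                                           # continuous target

model = torch.nn.Sequential(
    torch.nn.Linear(1, 64),  # hidden layer with 64 neurons
    torch.nn.ReLU(),
    torch.nn.Linear(64, 1),  # linear readout
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    optimizer.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)  # mean squared approximation error
    loss.backward()
    optimizer.step()

print(f"sup-norm error on the grid: {(model(x) - y).abs().max().item():.4f}")

Repeating the fit with smaller or larger hidden widths gives a first feel for the accuracy-versus-complexity trade-off described above.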
References
- Function approximation with deep (ReLU) neural networks
  - Deep Neural Network Approximation Theory. https://arxiv.org/pdf/1901.02220
- Expressivity of the transformer architecture
  - Sumformer: Universal Approximation for Efficient Transformers. https://proceedings.mlr.press/v221/alberti23a/alberti23a.pdf
  - Are Transformers universal approximators of sequence-to-sequence functions? https://arxiv.org/pdf/1912.10077
  - Representational Strengths and Limitations of Transformers. https://arxiv.org/pdf/2306.02896
- Expressivity of neural networks in terms of the number of linear regions (see the sketch after this list)
  - On the Number of Linear Regions of Deep Neural Networks. https://arxiv.org/pdf/1402.1869
  - Bounding and Counting Linear Regions of Deep Neural Networks. https://arxiv.org/abs/1711.02114
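To make the linear-region perspective of the last two references concrete, here is a hedged sketch (the architecture, input range, and grid resolution are arbitrary illustrative choices): it estimates how many linear regions a small random ReLU network induces along a 1-D input line by counting the distinct activation patterns that occur on a fine grid.

import torch

# Estimate the number of linear regions a random ReLU network cuts a 1-D
# input line into. Depth, width, and the grid are illustrative choices.
torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 16), torch.nn.ReLU(),
    torch.nn.Linear(16, 16), torch.nn.ReLU(),
    torch.nn.Linear(16, 1),
)

x = torch.linspace(-3.0, 3.0, 100_000).unsqueeze(1)
patterns = []
with torch.no_grad():
    h = x
    for layer in net:
        h = layer(h)
        if isinstance(layer, torch.nn.ReLU):
            patterns.append((h > 0).to(torch.int8))  # active/inactive neurons

# Inputs sharing one activation pattern lie in a common linear piece, so the
# number of distinct rows lower-bounds the regions the grid passes through.
codes = torch.cat(patterns, dim=1)
print("distinct activation patterns:", torch.unique(codes, dim=0).shape[0])

Counting activation patterns works because a ReLU network is affine on each set of inputs with a fixed pattern; comparing deeper versus wider networks with the same neuron budget reproduces, in miniature, the depth-versus-width comparisons studied in the references above.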