Research
Our research interests and projects
Research interests
Our research group is concerned with the mathematical foundations of machine learning and signal processing. We develop mathematical theory for existing methods in these areas as well as new or refined methods. By mathematical theory, we primarily mean guarantees or conditions for the success of such methods.
Our research focuses on the convergence behavior of learning algorithms, in particular (stochastic) gradient descent methods for deep learning, and on generalization bounds for trained neural networks. We also work on compressive sensing, i.e., the reconstruction of signals from a minimal number of measurements.
Our mathematical research uses tools from (high-dimensional) probability theory and random matrix theory, convex and non-convex optimization, and various subfields of analysis, such as harmonic analysis.
Theory of Learning Algorithms
Learning neural networks and other machine learning models from training data usually leads to the problem of computing an (approximate) minimizer of a non-convex loss function of the model parameters. Variants of (stochastic) gradient descent (SGD) are commonly used for the challenging task of minimizing such a function over a large number of parameters. Despite their remarkable performance in practice, the convergence properties of these methods are not yet well understood. We aim to advance the corresponding convergence theory.
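As a toy illustration of this setting, the following sketch runs plain mini-batch SGD on the non-convex loss of a small two-layer ReLU network fitted to synthetic data. All sizes, the step size, and the batch size are arbitrary choices for illustration and are not taken from our projects.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 512, 10, 32                     # samples, input dimension, hidden width
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))    # synthetic non-linear regression target

W1 = rng.standard_normal((d, h)) / np.sqrt(d)   # network parameters to be learned
W2 = rng.standard_normal((h, 1)) / np.sqrt(h)

lr, batch = 1e-2, 32
for epoch in range(200):
    perm = rng.permutation(n)
    for i in range(0, n, batch):
        idx = perm[i:i + batch]
        Xb, yb = X[idx], y[idx][:, None]
        Z = Xb @ W1                       # forward pass
        A = np.maximum(Z, 0.0)            # ReLU activation
        pred = A @ W2
        err = (pred - yb) / len(idx)      # gradient of the (halved) mean squared loss
        gW2 = A.T @ err                   # backward pass through the two layers
        gZ = (err @ W2.T) * (Z > 0.0)
        gW1 = Xb.T @ gZ
        W2 -= lr * gW2                    # stochastic gradient step
        W1 -= lr * gW1

print(np.mean((np.maximum(X @ W1, 0.0) @ W2 - y[:, None]) ** 2))  # final training loss
```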
One direction of our research studies convergence for simplified network models, in particular linear neural networks, where the activation function is the identity. While such networks only model linear functions, which are not expressive enough for most applications, studying their training is still non-trivial due to non-convexity. For instance, we were able to establish convergence to global minimizers for fully connected linear neural networks as well as convergence properties for convolutional linear neural networks.
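The following sketch (again with arbitrary, hypothetical sizes and hyperparameters, not the precise setting of our results) illustrates this point: gradient descent on the non-convex loss of a product W2 @ W1 of two weight matrices still drives the loss down to that of the best single linear layer.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_in, d_hid, d_out = 200, 8, 8, 3
X = rng.standard_normal((n, d_in))
Y = X @ rng.standard_normal((d_in, d_out)) + 0.1 * rng.standard_normal((n, d_out))

W1 = 0.1 * rng.standard_normal((d_hid, d_in))   # two-layer linear network W2 @ W1
W2 = 0.1 * rng.standard_normal((d_out, d_hid))

lr = 1e-2 / n
for _ in range(20000):
    R = W2 @ W1 @ X.T - Y.T               # residual of the squared loss
    gW2 = R @ (W1 @ X.T).T                # gradient with respect to W2
    gW1 = W2.T @ R @ X                    # gradient with respect to W1
    W2 -= lr * gW2
    W1 -= lr * gW1

# Compare with the best single linear layer (a convex least-squares problem):
# the two loss values should essentially coincide.
W_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.linalg.norm(W2 @ W1 @ X.T - Y.T) ** 2, np.linalg.norm(X @ W_ls - Y) ** 2)
```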
In modern deep learning scenarios it is common to have more network parameters than training data points, leading to loss functions with many global minimizers, all corresponding to networks that interpolate the data. In this scenario, the optimization algorithm employed (as well as its parameters, such as the initialization) has a significant influence on the computed solution; this effect is termed implicit bias or implicit regularization. Surprisingly, the computed solutions often generalize well to unseen data. Understanding the implicit bias is hence crucial for both theory and practice. We study this phenomenon for several simplified models, including linear (diagonal) networks and ReLU networks.
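The simplest instance of implicit bias can be seen directly in underdetermined linear regression (the sketch below uses arbitrary sizes and is not the diagonal or ReLU setting mentioned above): gradient descent started at zero converges to the interpolating solution of minimum Euclidean norm, since the iterates never leave the row space of the data matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 20, 100                            # fewer data points than parameters
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                           # initialization at zero
lr = 1.0 / np.linalg.norm(X, 2) ** 2      # step size below 2 / ||X||^2
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y)           # gradient of 0.5 * ||X w - y||^2

w_min_norm = np.linalg.pinv(X) @ y        # minimum-norm interpolating solution
print(np.linalg.norm(w - w_min_norm))     # close to zero: the same interpolator is found
```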
Compressive Sensing
Compressive sensing is concerned with signal and image reconstruction problems where the amount of available information is smaller than the signal length. Mathematically, this leads to an underdetermined system of equations, which usually has infinitely many solutions. To make reconstruction feasible, additional assumptions on the signal/vector to be recovered have to be made, most commonly that the signal is well-approximated by a sparse one, having only a few non-zero coefficients (in an appropriate basis). Under suitable assumptions, accurate reconstruction from few measurements via efficient algorithms is then possible. Algorithms in use include convex optimization approaches (ℓ1-minimization), greedy algorithms, and other iterative schemes. Remarkably, provably optimal measurement schemes are modelled by random matrices. Indeed, an s-sparse signal of length n can provably be reconstructed, with high probability, from m random measurements provided that m ≥ c s log(n/s) for a suitable constant c.
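The following sketch illustrates this recovery statement with arbitrarily chosen parameters; it solves the ℓ1-minimization (basis pursuit) problem using the cvxpy package, which is an implementation choice made purely for illustration.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n, s, m = 1000, 10, 120                   # signal length, sparsity, number of measurements

x0 = np.zeros(n)                          # s-sparse ground truth
x0[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian measurement matrix
y = A @ x0                                      # noiseless measurements

x = cp.Variable(n)                              # basis pursuit: min ||x||_1 subject to Ax = y
problem = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y])
problem.solve()

print(np.linalg.norm(x.value - x0) / np.linalg.norm(x0))  # tiny, up to solver tolerance
```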
Our current research is concerned with several variations and extensions of compressive sensing:
- Structured random matrices: In practice, physical measurement processes do not allow randomness to be injected in arbitrary ways, leading to structured random matrices such as random partial Fourier matrices, subsampled random convolutions, and more. Obtaining recovery guarantees for such types of matrices is significantly more challenging than for unstructured Gaussian matrices.
- Recovery of low rank matrices and tensors: For certain applications, one needs to replace the sparsity assumption by a low rank assumption on the matrix or higher-order tensor to be recovered. Obtaining theoretical guarantees on the number of measurements required for low rank tensor recovery via efficient algorithms is particularly challenging. We develop theory in this direction; a small illustration for the matrix case is sketched after this list.
- Phase retrieval: In applications including crystallography, ptychography, and antenna measurements, only intensity (magnitude) information can be measured, while the phase is lost. The resulting signal recovery problem is called phase retrieval and is closely connected to low rank recovery. We work on theoretical bounds on the required number of measurements and develop practical recovery methods.
- Connections to deep learning: Sparse and low rank recovery problems provide useful model problems for studying the training process in deep learning. Our research provides interesting theoretical insights into the phenomenon of implicit regularization of gradient descent methods.
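The sketch announced in the item on low rank recovery above: a rank-r matrix is recovered from random Gaussian linear measurements by projected gradient descent onto the set of rank-r matrices, i.e., iterative hard thresholding of the singular values. The sizes, oversampling factor, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, r = 30, 30, 2
m = 600                                    # a few times r * (n1 + n2) measurements

X0 = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))  # rank-r ground truth
Amat = rng.standard_normal((m, n1 * n2))                          # Gaussian measurement operator
y = Amat @ X0.ravel()

def project_rank(M, rank):
    """Best low-rank approximation of the given rank via a truncated SVD."""
    U, sv, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * sv[:rank]) @ Vt[:rank]

X = np.zeros((n1, n2))
for _ in range(300):
    grad = (Amat.T @ (Amat @ X.ravel() - y)).reshape(n1, n2) / m  # gradient of the scaled loss
    X = project_rank(X - grad, r)                                 # project back onto rank-r matrices

print(np.linalg.norm(X - X0) / np.linalg.norm(X0))  # small relative error
```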