Type
Master's thesis / Bachelor's thesis / supervised research
Prerequisites
- Knowledge of deep learning with image, natural language, or graph data
- Proficiency in Python and a deep learning framework (PyTorch or TensorFlow)
Description
Over the last decade, deep learning methods have been deployed in numerous real-world, often safety-critical, applications. A major and growing concern, however, is the explainability of neural network decisions. A neural network operates as a black box: a priori, one can only observe the input and the output of a decision, not the reasoning that leads to it. The field of explainable AI (XAI) develops explanation methods that “open the black box” and shed light on the reasoning behind neural network decisions.
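To make this concrete, the snippet below is a minimal sketch of one of the simplest post-hoc explanation methods: a vanilla-gradient saliency map for an image classifier. The choice of a torchvision ResNet-18 and the random placeholder input are assumptions made only so the example is self-contained; they are not part of the project description.

```python
import torch
from torchvision import models

# Pretrained classifier (placeholder choice; any image classifier would do).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# `image` stands in for a preprocessed input of shape (1, 3, 224, 224);
# a random tensor is used here only so the snippet runs on its own.
image = torch.rand(1, 3, 224, 224, requires_grad=True)

# Forward pass and pick the predicted class.
logits = model(image)
predicted_class = logits.argmax(dim=1).item()

# Backpropagate the predicted-class score to the input pixels.
logits[0, predicted_class].backward()

# The saliency map is the maximum absolute gradient over colour channels:
# pixels with large values influenced the prediction most strongly.
saliency = image.grad.abs().max(dim=1).values  # shape (1, 224, 224)
```

Methods such as Grad-CAM or SHAP (see the references below) refine this basic idea with class-discriminative localization or game-theoretic attributions.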
References
- Comprehensive book covering many XAI methods: Interpretable Machine Learning, https://christophm.github.io/interpretable-ml-book/
- Classic work on using Shapley values for XAI: A Unified Approach to Interpreting Model Predictions, https://arxiv.org/abs/1705.07874
- One of the first works on concept-based interpretability: Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors, https://arxiv.org/abs/1711.11279
- A popular post-hoc method for image classifiers: Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, https://arxiv.org/abs/1610.02391
- Popular tree- and prototype-based method for explainability: Neural Prototype Trees for Interpretable Fine-Grained Image Recognition, https://openaccess.thecvf.com/content/CVPR2021/papers/Nauta_Neural_Prototype_Trees_for_Interpretable_Fine-Grained_Image_Recognition_CVPR_2021_paper.pdf
- Popular concept-based method that is interpretable by design (see the sketch after this list): Concept Bottleneck Models, https://arxiv.org/abs/2007.04612
- Another interpretable-by-design method that is relevant to current XAI research at the chair: Variational Information Pursuit for Interpretable Predictions, https://arxiv.org/abs/2302.02876
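As a concrete illustration of interpretability by design, the following is a minimal sketch of a Concept Bottleneck Model in the spirit of Koh et al. (2020): the network first predicts human-interpretable concepts and then makes its final prediction from those concepts alone. The layer sizes, number of concepts, and loss weighting are illustrative assumptions, not values taken from the references above.

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, num_features: int, num_concepts: int, num_classes: int):
        super().__init__()
        # g: input features -> concept predictions (e.g. "has wing bars").
        self.concept_predictor = nn.Sequential(
            nn.Linear(num_features, 128),
            nn.ReLU(),
            nn.Linear(128, num_concepts),
        )
        # f: concepts -> class label; the label depends on the input only
        # through the concept bottleneck, which makes the decision inspectable.
        self.label_predictor = nn.Linear(num_concepts, num_classes)

    def forward(self, x):
        concept_logits = self.concept_predictor(x)
        label_logits = self.label_predictor(torch.sigmoid(concept_logits))
        return concept_logits, label_logits

# Joint training optimises a concept loss plus the usual label loss.
model = ConceptBottleneckModel(num_features=512, num_concepts=112, num_classes=200)
x = torch.rand(4, 512)  # e.g. features from an image backbone (illustrative)
concept_logits, label_logits = model(x)
concept_loss = nn.BCEWithLogitsLoss()(concept_logits, torch.rand(4, 112).round())
label_loss = nn.CrossEntropyLoss()(label_logits, torch.randint(0, 200, (4,)))
total_loss = label_loss + 0.5 * concept_loss  # 0.5 is an illustrative weight
```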