This repository provides the official implementation of the paper Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings by J. Macdonald, M. Besançon and S. Pokutta (2021).
TL;DR: We use a constrained optimization formulation of the Rate-Distortion Explanations (RDE) method (Macdonald et al., 2019) for relevance attribution and solve it with Frank-Wolfe algorithms to obtain interpretable neural network predictions.
This repository contains subfolders with code for two independent experimental scenarios.
- `mnist`: Sparse relevance maps (relevance attribution) and relevance orderings for a relatively small LeNet-inspired neural network classifier on the MNIST dataset of greyscale images of handwritten digits.
- `stl10`: Sparse relevance maps (relevance attribution) for a larger VGG-16 based neural network classifier on the STL-10 dataset of color images.
The package versions we used are specified in `Project.toml`, `Manifest.toml`, and `setup.jl`.
To reproduce our computational environment, run:

```
julia setup.jl
```
To test the installation, run:

```
julia test_installation.jl
```
This should print all the installed Julia and Python packages.
The script `rde.jl` can be used to obtain sparse relevance maps.
The script `rde_birkhoff.jl` can be used to obtain relevance orderings with deterministic Frank-Wolfe algorithms.
The script `rde_birkhoff_stochastic.jl` can be used to obtain relevance orderings with stochastic Frank-Wolfe algorithms.
This repository is MIT licensed, as found in the LICENSE file.