This archive contains implementations of the methods from the master's thesis "Visual Spectral Probing".
It enables probing the Vision Transformer with unfiltered, manually filtered, and automatically filtered representations, using the following filter definitions:
- Filter Bypass Mode: No filtering is performed, allowing an assessment of the original contextual representations.
- Auto-Filter Mode: The filter dynamically adjusts its weights based on the existing training objective, jointly learning which frequencies to amplify or filter out. This mode aims to optimize the model's performance on the classification task.
- Band-Filter Mode: Indicated frequency bands are selectively removed from the contextual representations. This mode provides insights into how specific frequency ranges contribute to the overall understanding of the image (a minimal sketch follows this list).
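As an illustration of the band-filter mode, here is a minimal sketch, not the thesis implementation: it assumes token embeddings of shape (batch, seq_len, dim), filters along the sequence dimension with PyTorch's FFT, and the function name and band convention are hypothetical.

```python
import torch

def band_filter(embeddings: torch.Tensor, low: float, high: float) -> torch.Tensor:
    """Zero out the frequency band [low, high] (given as fractions of the
    spectrum) in token embeddings of shape (batch, seq_len, dim).

    Illustrative sketch only; the thesis code may filter along a different
    axis or use a different transform."""
    seq_len = embeddings.size(1)
    spectrum = torch.fft.rfft(embeddings, dim=1)   # (batch, n_freq, dim), complex
    n_freq = spectrum.size(1)
    lo, hi = int(low * n_freq), int(high * n_freq)
    spectrum[:, lo:hi, :] = 0                      # suppress the indicated band
    return torch.fft.irfft(spectrum, n=seq_len, dim=1)
```

Bypass mode corresponds to returning the embeddings unchanged, while auto-filter mode would replace the hard zero mask with learnable per-frequency weights (sketched near the end of this README).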
The toolkit is relatively flexible and can be applied to any model size or dataset available from the Hugging Face transformers library.
After installing the required packages and downloading the external datasets, the experiments can be re-run using the train.py script. Please see the instructions below for details.
- coco_backgrounds: contains the dataset script that automatically downloads the COCO dataset and executes filtering based on the specified threshold
- notebooks: contains some examples from the datasets used in the experiments
- src: contains the model, callback, and support functions
- train.py: the training script
This repository uses Python 3.11.6 and the packages listed in requirements.txt (a virtual environment is recommended):
(venv) $ pip install -r requirements.txt
To explore the role of low-frequency components in contextual representations for image classification, a specialized dataset, COCO Backgrounds, is curated. Derived from the COCO-Stuff dataset, it augments COCO 2017 images with pixel-wise annotations for 91 stuff classes. Dataset preparation follows a three-step approach that emphasizes meaningful labeling, yielding a dataset tailored to studying low-frequency components. Experiments with different thresholds demonstrate the dataset's suitability for probing.
For image examples, refer to the corresponding notebook.
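As a rough picture of the threshold-based selection, here is a hypothetical sketch; the actual criterion lives in the coco_backgrounds dataset script, and the function name and coverage rule below are assumptions.

```python
import numpy as np

def keep_image(stuff_mask: np.ndarray, threshold: float = 0.5) -> bool:
    """Hypothetical selection rule: keep an image for COCO Backgrounds only if
    pixels annotated with one of the 91 stuff classes cover at least
    `threshold` of its area."""
    coverage = stuff_mask.mean()  # fraction of stuff-annotated pixels
    return bool(coverage >= threshold)
```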
To complement the low-frequency analysis, the Food 101 dataset is employed. With 101 food categories and 101,000 images, it aligns with the goal of understanding scale-specific information in image embeddings. Selected for structured patterns and minimal background variability, the dataset aims to evaluate how disentangling high-frequency components influences the model's classification of food items. The focus is on uncovering insights into the impact of scale-specific information on classifying items with distinct visual structures.
For image examples, refer to the corresponding notebook.
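Since Food 101 is hosted on the Hugging Face Hub, it can be loaded directly with the datasets library; the snippet below follows the public food101 dataset card (splits train/validation, columns image/label).

```python
from datasets import load_dataset

# Food 101 from the Hugging Face Hub: 75,750 training and 25,250 validation images.
food = load_dataset("food101", split="train")
example = food[0]
print(example["label"], example["image"].size)  # class index and PIL image size
```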
Running an experiment involves training a classification head together with a specified filter (see train.py
for details):
(venv) $ python train.py
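For intuition on what such a run optimizes, the following is a hypothetical sketch of the auto-filter mode: a per-frequency weight vector applied to the token spectrum is trained jointly with a linear classification head. The class name and pooling choice are assumptions; train.py and src contain the actual implementation.

```python
import torch
from torch import nn

class AutoFilterClassifier(nn.Module):
    """Hypothetical sketch: learnable per-frequency weights on ViT token
    embeddings, trained jointly with a linear classification head."""

    def __init__(self, seq_len: int, dim: int, num_classes: int):
        super().__init__()
        n_freq = seq_len // 2 + 1                     # length of the rfft spectrum
        self.freq_weights = nn.Parameter(torch.ones(n_freq, 1))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, dim) contextual representations from the ViT
        spectrum = torch.fft.rfft(embeddings, dim=1)  # (batch, n_freq, dim), complex
        filtered = torch.fft.irfft(self.freq_weights * spectrum,
                                   n=embeddings.size(1), dim=1)
        return self.head(filtered.mean(dim=1))        # pool over tokens, classify
```

Because the classification loss backpropagates through both the head and freq_weights, the model jointly learns which frequencies to amplify or suppress, matching the auto-filter behavior described above.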
Experiment runs for different configurations can be viewed at the following links: