GitHub - weihaox/UMBRAE: [ECCV 2024] UMBRAE: Unified Multimodal Brain Decoding | Unveiling the 'Dark Side' of Brain Modality

UMBRAE: Unified Multimodal Brain Decoding (ECCV 2024)

Weihao Xia¹ Raoul de Charette² Cengiz Öztireli³ Jing-Hao Xue¹

¹University College London ²Inria ³University of Cambridge

UMBRAE decodes multimodal explanations from brain signals. (1) We introduce a universal brain encoder for multimodal-brain alignment and recover conceptual and spatial details using multimodal large language models (MLLMs). (2) We introduce cross-subject training to handle the distinct brain patterns of different individuals, so that brain signals from multiple subjects can be trained within a single model. (3) Our method supports weakly-supervised subject adaptation, which makes it possible to train a model for a new subject in a data-efficient manner. (4) For evaluation, we introduce BrainHub, a brain understanding benchmark based on NSD and COCO.

News 🚩

[2024/07/01] UMBRAE is accepted to ECCV 2024.
[2024/05/18] Update v1.4 checkpoint and Leaderboard.
[2024/04/16] Provide a Colab Demo for inference.
[2024/04/13] Update scripts for single-subject, cross-subject training, and new subject adaptation.
[2024/04/12] Inference code and pretrained model are available. Training code is coming soon.
[2024/04/11] BrainHub is available.
[2024/03/15] The project page and arXiv paper are available.

Method

Overview of UMBRAE. Our brain encoder includes subject-specific tokenizers and a universal perceive encoder. Brain signals from multiple subjects are mapped into a common feature space, enabling cross-subject training and weakly-supervised subject adaptation. The brain encoder learns to align neural signals with image features. During inference, the learned encoder interacts with MLLMs and performs brain understanding tasks according to given prompts.
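The routing idea in this overview can be illustrated with a dependency-free toy. This is only a sketch under stated assumptions, not the repository's implementation: the voxel counts, dimensions, and the `ToyBrainEncoder` class are hypothetical, and plain linear maps stand in for both the tokenizers and the universal perceive encoder. What it shows is that inputs of different sizes (one size per subject) pass through a subject-specific layer first and then through one shared module.

```python
import random

# Hypothetical sizes for illustration only; real voxel counts and feature
# dimensions come from the data and the released configuration.
VOXELS_PER_SUBJECT = {1: 12, 2: 10, 5: 9, 7: 8}
SHARED_DIM = 4
OUTPUT_DIM = 3


def make_linear(in_dim, out_dim, rng):
    """Return a random (out_dim x in_dim) weight matrix as nested lists."""
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Multiply a weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ToyBrainEncoder:
    """Subject-specific tokenizers feeding one shared encoder (toy version)."""

    def __init__(self, voxels_per_subject, shared_dim, output_dim, seed=0):
        rng = random.Random(seed)
        # One tokenizer per subject: maps that subject's voxels to the shared space.
        self.tokenizers = {
            subj: make_linear(n_voxels, shared_dim, rng)
            for subj, n_voxels in voxels_per_subject.items()
        }
        # A single encoder whose weights are reused for every subject.
        self.shared = make_linear(shared_dim, output_dim, rng)

    def encode(self, subj, voxels):
        tokenizer = self.tokenizers[subj]
        if len(voxels) != len(tokenizer[0]):
            raise ValueError(f"subject {subj} expects {len(tokenizer[0])} voxels")
        tokens = apply_linear(tokenizer, voxels)
        return apply_linear(self.shared, tokens)


encoder = ToyBrainEncoder(VOXELS_PER_SUBJECT, SHARED_DIM, OUTPUT_DIM)
for subj, n_voxels in VOXELS_PER_SUBJECT.items():
    features = encoder.encode(subj, [0.5] * n_voxels)
    print(subj, len(features))  # every subject ends up with OUTPUT_DIM features
```

In this toy, adapting to a new subject would amount to adding one more entry to `tokenizers` while leaving `shared` untouched, which mirrors the motivation for subject adaptation described above.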

Installation

Environment

conda create -n brainx python=3.10
conda activate brainx
pip install -r requirements.txt

Download Data and Checkpoints

The training and inference scripts automatically download the dataset if the designated path is empty, but this can be quite slow. To avoid the wait, you can download all data in advance with the following script. Before downloading, please fill out the NSD Data Access form and agree to the Terms and Conditions.

Download Checkpoints from Hugging Face.

bash download_data.sh
bash download_checkpoint.sh
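If you want to know in advance whether a run would trigger the automatic download, a small check like the one below may help. It is a sketch that assumes "empty" means the directory is missing or has no entries; the helper name `needs_download` is hypothetical and is not part of the repository.

```python
import tempfile
from pathlib import Path


def needs_download(data_path):
    """Return True if data_path is missing or contains no entries."""
    path = Path(data_path)
    return not path.is_dir() or not any(path.iterdir())


# Demonstration on a throwaway directory rather than the real 'nsd_data'.
with tempfile.TemporaryDirectory() as tmp:
    print(needs_download(tmp))  # True: the directory is empty
    (Path(tmp) / "placeholder.txt").write_text("x")
    print(needs_download(tmp))  # False: the directory now holds a file
```

Calling `needs_download('nsd_data')` before training tells you whether to run `bash download_data.sh` first.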

Inference

Our method inherits the multimodal understanding capabilities of MLLMs, so you can switch between tasks by changing the prompt. You can either use the prompts listed in our paper or write customised instructions to suit your needs. Set exp to the checkpoint you want to use: brainx-v-1-4 or brainx.

exp='brainx-v-1-4' # 'brainx'

prompt_caption='Describe this image <image> as simply as possible.'

for sub in 1 2 5 7
do
    python inference.py --data_path 'nsd_data' --fmri_encoder 'brainx' --subj $sub \
        --prompt "$prompt_caption" --brainx_path "train_logs/${exp}/last.pth" \
        --save_path "evaluation/eval_caption/${exp}"
done
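When experimenting with several prompts, it can be convenient to assemble the same command from Python. The sketch below uses only the flags that appear in the loop above; the function name `build_inference_command` is hypothetical, and the commands are printed rather than executed.

```python
import shlex

PROMPT_CAPTION = "Describe this image <image> as simply as possible."


def build_inference_command(exp, sub, prompt, data_path="nsd_data"):
    """Assemble the argv list for one inference.py run."""
    return [
        "python", "inference.py",
        "--data_path", data_path,
        "--fmri_encoder", "brainx",
        "--subj", str(sub),
        "--prompt", prompt,
        "--brainx_path", f"train_logs/{exp}/last.pth",
        "--save_path", f"evaluation/eval_caption/{exp}",
    ]


for sub in (1, 2, 5, 7):
    command = build_inference_command("brainx-v-1-4", sub, PROMPT_CAPTION)
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(command))
    # To execute instead of printing: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when a custom prompt contains quotes or angle brackets.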

Because identified classes might be named differently, or be absent from the ground-truth labels altogether, we evaluate bounding boxes through referring expression comprehension (REC). We use the prompt "Locate <expr> in <image> and provide its coordinates, please", but others such as "Can you point out <expr> in the image and provide the bounding boxes of its location?" should also work.

for sub in 1 2 5 7
do
    python inference_rec.py --data_path 'nsd_data' --fmri_encoder 'brainx' \
      --subj $sub --brainx_path "train_logs/${exp}/last.pth" \
      --save_path "evaluation/eval_bbox_rec/${exp}/sub0${sub}_dim1024"
done
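To make the REC prompt concrete, the sketch below fills the `<expr>` placeholder and pulls box coordinates out of a text response. Two things here are assumptions rather than documented behaviour: how `inference_rec.py` supplies expressions internally, and that a response embeds each box as four comma-separated numbers in square brackets. The helper names are hypothetical.

```python
import re

REC_TEMPLATE = "Locate <expr> in <image> and provide its coordinates, please"

# Assumed format: four comma-separated numbers in square brackets,
# for example [0.1,0.2,0.6,0.9].
BOX_PATTERN = re.compile(
    r"\[\s*(\d*\.?\d+)\s*,\s*(\d*\.?\d+)\s*,\s*(\d*\.?\d+)\s*,\s*(\d*\.?\d+)\s*\]"
)


def make_rec_prompt(expression, template=REC_TEMPLATE):
    """Substitute the referring expression for the <expr> placeholder."""
    return template.replace("<expr>", expression)


def parse_boxes(response):
    """Extract every bracketed 4-number box from a text response."""
    return [tuple(float(v) for v in match) for match in BOX_PATTERN.findall(response)]


print(make_rec_prompt("the dog"))
print(parse_boxes("The dog is at [0.12,0.30,0.58,0.91]."))
```

The `<image>` placeholder is deliberately left untouched, since it marks where the brain-derived features are inserted in place of an image.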

Training

Single-Subject Training

accelerate launch --num_processes=1 --num_machines=1 --gpu_ids='0' train.py \
    --data_path 'nsd_data' --fmri_encoder 'brainxs' --subj 1 \
    --model_save_path 'train_logs/demo_single_subject/sub01_dim1024'

Cross-Subject Training

accelerate launch --num_processes=1 --num_machines=1 --gpu_ids='0' train_brainx.py \
    --data_path 'nsd_data' --fmri_encoder 'brainx' --batch_size 128 --num_epochs 300 \
    --model_save_path 'train_logs/demo_cross_subject' --subj 1 2 5 7

Weakly-Supervised Subject Adaptation

To adapt to a new subject, for example S7, first train a model on the other available subjects (S1, S2, S5) using the cross-subject training command above. Then train on the new subject with the following command, where --encoder_path points to the checkpoint saved by that cross-subject run.

sub=7
data_ratio=1.0
accelerate launch --num_processes=1 --num_machines=1 --gpu_ids='0' train_brainx_adaptation.py \
    --data_path 'nsd_data' --fmri_encoder 'brainxc' --batch_size 128 --num_epochs 240 \
    --subj $sub --data_ratio $data_ratio \
    --encoder_path 'train_logs/demo_cross_subject/brainx_adaptation_125/last.pth' \
    --model_save_path "train_logs/demo_weak_adaptation/brainx_adaptation_${sub}_${data_ratio}"
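One way to picture what a `data_ratio` below 1.0 means is a reproducible subsample of the new subject's training samples. The sketch below is an assumption about the general idea only; it does not reproduce how `train_brainx_adaptation.py` selects data, and `select_subset` and the placeholder identifiers are hypothetical.

```python
import random


def select_subset(sample_ids, data_ratio, seed=0):
    """Pick a reproducible fraction of the sample identifiers."""
    if not 0.0 < data_ratio <= 1.0:
        raise ValueError("data_ratio must be in (0, 1]")
    ids = list(sample_ids)
    # Keep at least one sample, but never more than are available.
    count = min(len(ids), max(1, round(len(ids) * data_ratio)))
    rng = random.Random(seed)
    return sorted(rng.sample(ids, count))


all_ids = list(range(1000))  # placeholder identifiers, not real NSD trials
for ratio in (0.1, 0.5, 1.0):
    subset = select_subset(all_ids, ratio)
    # The directory name follows the --model_save_path pattern used above.
    print(ratio, len(subset), f"brainx_adaptation_7_{ratio}")
```

Fixing the seed keeps the subset identical across runs, so results at different ratios remain comparable.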

Evaluation

The benchmark, including ground-truth data, evaluation scripts, and baseline results, is available in BrainHub.

Download BrainHub to the root path: git clone https://github.com/weihaox/BrainHub
Process the ground-truth test images: python processing/decode_images.py
Run evaluation for brain captioning and grounding:

cd BrainHub
for sub in 1 2 5 7
do
    python eval_caption.py ../umbrae/evaluation/eval_caption/${exp}/sub0${sub}_dim1024/fmricap.json \
        caption/images --references_json caption/fmri_cococap.json
    python eval_bbox_rec.py --path_out "../umbrae/evaluation/eval_bbox_rec/${exp}/sub0${sub}_dim1024"
done
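For readers unfamiliar with how grounding results are typically scored, REC is commonly evaluated by intersection over union (IoU) between the predicted and ground-truth boxes, with a prediction counted as correct at IoU of 0.5 or higher. The sketch below shows that general metric; it is not taken from `eval_bbox_rec.py`, whose exact computation is not described here, and the function names are hypothetical.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def accuracy_at_iou(predictions, references, threshold=0.5):
    """Fraction of predictions whose IoU with the reference meets the threshold."""
    if not predictions:
        return 0.0
    hits = sum(box_iou(p, r) >= threshold for p, r in zip(predictions, references))
    return hits / len(predictions)


# Toy boxes in normalised coordinates: one exact match, one complete miss.
preds = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 0.5, 0.5)]
refs = [(0.0, 0.0, 1.0, 1.0), (0.5, 0.5, 1.0, 1.0)]
print(accuracy_at_iou(preds, refs))  # 0.5
```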

We also provide baseline results associated with BrainHub, including the captioning results from SDRecon, BrainCap, and OneLLM, as well as the captioning and grounding results from UMBRAE.

TODO

Release inference scripts and pretrained checkpoints.
Update training scripts.
Provide online demo.
Train on all 8 subjects in NSD.
Support other MLLMs such as NExT-Chat, CogVLM, and Genixer.

Acknowledgements

We thank the authors of SDRecon, BrainCap, and OneLLM for providing their code or results. We are also grateful for NSD and COCO, which were used to construct BrainHub. The training script is based on MindEye. We use the pretrained models Shikra and LLaVA as the MLLMs. Thanks to all of them for their excellent research.

Citation

@inproceedings{xia2024umbrae,
  author    = {Xia, Weihao and de Charette, Raoul and Öztireli, Cengiz and Xue, Jing-Hao},
  title     = {UMBRAE: Unified Multimodal Brain Decoding},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2024},
}
