This work proposes ReAtCo, a new text-guided video editing framework for controllable video generation and editing, with particular emphasis on controlling the spatial locations of multiple foreground objects.
The main idea of ReAtCo is to refocus the cross-attention activation responses between the edited text prompt and the target video during the denoising stage, yielding an edited video that is spatially location-aligned and semantically high-fidelity. More details can be found in our paper.
Below, we describe how to run our code and obtain the desired, controllably edited target videos.
We use the classic Tune-A-Video as the pretrained base video editing model, so the requirements follow Tune-A-Video's publicly available code.
Note: Because the latest xformers requires PyTorch 2.5.1, we have tested our code on this latest version with a V100 GPU; the full environment is reported in `environment.txt`.
Before fine-tuning the Tune-A-Video editing model, you need to download the pretrained Stable Diffusion v1-4 model and place it in `./checkpoints`.
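If it helps, the checkpoint can also be fetched programmatically with huggingface_hub; the target directory below is an assumption and should match the checkpoint path expected by the configs:

```python
from huggingface_hub import snapshot_download

# Download Stable Diffusion v1-4 into ./checkpoints (directory name assumed;
# point local_dir at wherever your config expects the checkpoint).
snapshot_download(
    repo_id="CompVis/stable-diffusion-v1-4",
    local_dir="./checkpoints/stable-diffusion-v1-4",
)
```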
Then run the following command:
```bash
accelerate launch train_tuneavideo.py --config=configs/dolphins-swimming.yaml
```
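For reference, a Tune-A-Video-style config specifies the source video, a source prompt, and training hyperparameters. The sketch below mirrors the schema of the public Tune-A-Video configs; the exact keys used by this repo, the video path, and the prompt are assumptions:

```python
from omegaconf import OmegaConf

# Sketch of a Tune-A-Video-style config (schema and values are assumptions).
config = OmegaConf.create({
    "pretrained_model_path": "./checkpoints/stable-diffusion-v1-4",
    "output_dir": "./tune_a_video_model/dolphins-swimming",
    "train_data": {
        "video_path": "data/dolphins-swimming.mp4",        # hypothetical path
        "prompt": "two dolphins are swimming in the sea",  # hypothetical prompt
        "n_sample_frames": 24,
        "width": 512,
        "height": 512,
    },
    "max_train_steps": 500,
})
print(OmegaConf.to_yaml(config))
```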
The fine-tuned video editing models are saved in `./tune_a_video_model`.
Generate the video latents with the following command:
```bash
python generation_video_latents.py
```
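As a rough illustration of what this step produces, the sketch below encodes source frames into the Stable Diffusion latent space with the VAE (the actual script may additionally perform DDIM inversion; the function name and frame preprocessing here are assumptions):

```python
import torch
from diffusers import AutoencoderKL

# Load the VAE from the downloaded Stable Diffusion v1-4 checkpoint.
vae = AutoencoderKL.from_pretrained(
    "./checkpoints/stable-diffusion-v1-4", subfolder="vae"
).to("cuda")

@torch.no_grad()
def encode_frames(frames):
    """frames: (f, 3, 512, 512) float tensor scaled to [-1, 1]."""
    # Map each RGB frame into the SD latent space (scaling factor 0.18215).
    latents = vae.encode(frames.to("cuda")).latent_dist.sample()
    return latents * 0.18215  # (f, 4, 64, 64)
```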
Edit the videos with the following command:
```bash
python reatco_editing_dolphins-swimming.py
```
The edited videos are saved in `./edited_videos`.
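For intuition, here is a conceptual sketch of the re-attentional refocusing idea: the cross-attention responses of an edited object's text token are amplified inside a user-specified spatial region and suppressed outside it. All function and variable names are hypothetical, and this is a simplification of the formulation in the paper:

```python
import torch

def refocus_cross_attention(attn_probs, token_idx, region_mask, boost=2.0):
    """Conceptual sketch (not the exact ReAtCo formulation).

    attn_probs:  (batch*heads, pixels, tokens) softmaxed cross-attention maps
    token_idx:   index of the edited object's token in the text prompt
    region_mask: (pixels,) binary float mask, 1 inside the desired region
    """
    probs = attn_probs.clone()
    # Amplify the token's responses inside the region, suppress them outside.
    scale = region_mask * boost + (1.0 - region_mask) / boost
    probs[..., token_idx] = probs[..., token_idx] * scale
    # Renormalize so each pixel's attention over tokens still sums to one.
    return probs / probs.sum(dim=-1, keepdim=True)
```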
Note: In the scripts above, the default setting is the resource-friendly ReAtCo paradigm, which allows ReAtCo to edit videos on a consumer-grade GPU (e.g., RTX 4090/3090); more details can be found in the Appendix of our paper. In particular, we set `window_size=4` by default, which is compatible with an RTX 4090/3090 GPU. If you have sufficient GPU resources and do not want the resource-friendly paradigm, set `window_size=video_length`.
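For clarity, the resource-friendly paradigm processes frames in windows rather than all at once. Below is a minimal sketch of the chunking, assuming non-overlapping windows, which may differ from the exact strategy in the paper:

```python
def iter_windows(video_length, window_size=4):
    """Yield frame-index windows of at most window_size frames."""
    for start in range(0, video_length, window_size):
        yield list(range(start, min(start + window_size, video_length)))

# e.g., video_length=10, window_size=4 -> [0, 1, 2, 3], [4, 5, 6, 7], [8, 9]
print(list(iter_windows(10, 4)))
```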
If you find the code helpful in your research or work, please cite the following paper:
```bibtex
@article{ReAtCo,
  title={Re-Attentional Controllable Video Diffusion Editing},
  author={Wang, Yuanzhi and Li, Yong and Liu, Mengyi and Zhang, Xiaoya and Liu, Xin and Cui, Zhen and Chan, Antoni B.},
  journal={arXiv preprint arXiv:2412.11710},
  year={2024}
}
```