ReAtCo

The official PyTorch implementation of "Re-Attentional Controllable Video Diffusion Editing" (AAAI 2025).

This work proposes a new text-guided video editing framework for controllable video generation and editing, with a particular emphasis on controlling the spatial locations of multiple foreground objects.

Video Demos

[Source Video]: "Two dolphins are swimming in the blue ocean." "A jellyfish and a goldfish are swimming in the blue ocean, with the jellyfish is to the left of the goldfish." "A turtle and a goldfish are swimming in the blue ocean, with the turtle is to the left of the goldfish." "A jellyfish and a octopus are swimming in the blue ocean, with the jellyfish is to the left of the octopus."
[Source Video]: "Two hares are grazing in the grass." "A swan and a hare are grazing in the grass, with the swan is to the left of the hare." "A cat and a swan are grazing in the grass, with the cat is to the left of the swan." "A cat and a swan are grazing in the yellow meadow, with the cat is to the left of the swan."

Overview Framework of ReAtCo

The main idea of ReAtCo is to refocus the cross-attention activation responses between the edited text prompt and the target video during the denoising stage, which yields an edited video that is aligned with the desired spatial locations and semantically faithful to the edited prompt. More details can be found in our paper.
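For a concrete picture, a highly simplified sketch of the re-attention idea is given below: cross-attention probabilities for the target object's tokens are amplified inside a user-specified region mask and suppressed outside it. The function name, arguments, scaling factors, and renormalization are illustrative assumptions, not the repo's actual implementation:

```python
# Conceptual sketch of refocusing cross-attention (illustrative assumptions; not the repo's implementation).
import torch

def refocus_cross_attention(attn_probs, region_mask, token_ids, boost=2.0, suppress=0.5):
    """
    attn_probs:  [batch, num_pixels, num_text_tokens] cross-attention probabilities.
    region_mask: [batch, num_pixels] binary mask of the target object's desired region (1 = inside).
    token_ids:   indices of the edited object's tokens in the text prompt.
    """
    refocused = attn_probs.clone()
    inside = region_mask.bool()                      # [batch, num_pixels]
    for t in token_ids:
        col = refocused[:, :, t]
        # Amplify the object token's responses inside its target region, damp them outside.
        refocused[:, :, t] = torch.where(inside, col * boost, col * suppress)
    # Renormalize so each pixel's attention over the text tokens still sums to one.
    return refocused / refocused.sum(dim=-1, keepdim=True).clamp_min(1e-8)
```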

Usage

Below we describe how to run our code and edit videos with the desired controllability.

1. Requirements

We use the classic Tune-A-Video as the pretrained base video editing model, so the requirements follow Tune-A-Video's publicly available code. Note: because the latest xformers requires PyTorch 2.5.1, we have tested our code on this latest version with a V100 GPU; the full environment is listed in environment.txt.
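As a quick sanity check (not part of the official scripts), the following snippet verifies that PyTorch, CUDA, and xformers are available in your environment:

```python
# Quick environment sanity check (illustrative; not part of the official scripts).
import torch

print("PyTorch version:", torch.__version__)          # expected to be around 2.5.1 per the note above
print("CUDA available:", torch.cuda.is_available())   # a V100 (or RTX 3090/4090) GPU is assumed

try:
    import xformers
    print("xformers version:", xformers.__version__)
except ImportError:
    print("xformers is not installed; memory-efficient attention will be unavailable.")
```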

2. Pretrained Video Editing Model

Before training the Tune-A-Video editing model, you need to download the pretrained Stable Diffusion v1-4 weights and place them in ./checkpoints.
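If you do not already have the weights locally, one option (not part of the official scripts) is to fetch them with huggingface_hub; the local directory name below is an assumption and should match the path expected by the training config:

```python
# Illustrative download of Stable Diffusion v1-4 into ./checkpoints (directory name is an assumption).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="CompVis/stable-diffusion-v1-4",
    local_dir="./checkpoints/stable-diffusion-v1-4",
)
```

With the checkpoint in place, fine-tune the base editing model with the following command: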

accelerate launch train_tuneavideo.py --config=configs/dolphins-swimming.yaml

The pretrained video editing model is saved in ./tune_a_video_model.

3. ReAtCo Video Editing

Generate the source video latents with the following command:

python generation_video_latents.py
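For intuition only: generating the video latents typically amounts to encoding each source frame into the Stable Diffusion VAE latent space (and then inverting those latents with DDIM); the script above handles this for you. A minimal sketch of the per-frame encoding step, with an assumed checkpoint path and frames given as a float tensor in [0, 1], is:

```python
# Illustrative only: encode source frames into SD VAE latents (the actual script may differ).
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "./checkpoints/stable-diffusion-v1-4", subfolder="vae"   # path is an assumption
).to("cuda", dtype=torch.float16)

@torch.no_grad()
def encode_frames(frames):
    """frames: [num_frames, 3, H, W] float tensor in [0, 1] -> latents: [num_frames, 4, H/8, W/8]."""
    x = frames.to("cuda", dtype=torch.float16) * 2.0 - 1.0   # scale to [-1, 1] as the VAE expects
    return vae.encode(x).latent_dist.sample() * 0.18215      # SD v1 latent scaling factor
```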

Edit the video with the following command:

python reatco_editing_dolphins-swimming.py

The edited videos are saved in ./edited_videos.

Note: In the script above, the default setting is the Resource-friendly ReAtCo Paradigm, which allows ReAtCo to edit videos on a consumer-grade GPU (e.g., an RTX 4090/3090). More details can be found in the Appendix of our paper. In particular, we set window_size=4 by default, which is compatible with RTX 4090/3090 GPUs; if you have sufficient GPU resources and do not want to use the resource-friendly paradigm, set window_size=video_length. A rough sketch of the windowing idea follows below.
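For intuition, the resource-friendly paradigm processes the frame sequence window by window instead of handling all frames at once; a rough sketch of how window_size partitions the temporal axis (not the repo's actual implementation) is:

```python
# Rough sketch of splitting the temporal axis into windows (not the repo's actual implementation).
def iter_temporal_windows(video_length, window_size):
    """Yield (start, end) frame-index pairs covering the whole video in chunks of window_size."""
    for start in range(0, video_length, window_size):
        yield start, min(start + window_size, video_length)

# Example: a 12-frame video with window_size=4 is processed in three 4-frame chunks;
# window_size=video_length recovers the full (non-windowed) paradigm.
for start, end in iter_temporal_windows(video_length=12, window_size=4):
    print(f"process frames [{start}:{end}]")
```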

Citation

If you find the code helpful in your research or work, please cite the following paper:

@article{ReAtCo,
  title={Re-Attentional Controllable Video Diffusion Editing},
  author={Wang, Yuanzhi and Li, Yong and Liu, Mengyi and Zhang, Xiaoya and Liu, Xin and Cui, Zhen and Chan, Antoni B.},
  journal={arXiv preprint arXiv:2412.11710},
  year={2024}
}
