
Video inpainting using a sequence of masks #40

Open
AHHHZ975 opened this issue Dec 25, 2023 · 3 comments

AHHHZ975 commented Dec 25, 2023

Hi Shiwei, @Steven-SWZhang

Thank you for making this great work publicly available.
I have been trying to reproduce the results for the task "video inpainting using a sequence of masks". Specifically, I have a video consisting of 10 frames and 10 masks, one per frame. I would like to feed the video, the sequence of masks, and a text prompt to the model, and I expect a temporally consistent output video that adheres to both the mask sequence and the text prompt.

However, I could not find any argument for an input mask. Going through the code, it appears that the code itself generates a random mask over the input video. The snippet below (from inference_single.py) illustrates this:

[Screenshot: inference_single.py, the mask-conditioning branch around lines 562-564]

where the function make_masked_images is defined as:

[Screenshot: the make_masked_images function, reproduced as code in a comment below]

As far as I can tell, the mask variable in line 564 of the first snapshot is unpacked from the batch variable (which comes from the dataloader), as shown below:

[Screenshot: inference_single.py, unpacking mask from the batch returned by the dataloader]

When I then went through dataset.py, I found that the mask is randomly generated, as follows:

[Screenshot: dataset.py, the random mask generation]
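In essence, the generated mask looks something like this (my own illustrative sketch, not the repo's actual code; the rectangular-hole strategy and the [f, 1, h, w] layout are my assumptions):

import torch

def random_rectangle_mask(f, h, w):
    # Illustrative only: one random rectangular hole shared across all f frames.
    # Returns a float tensor of shape [f, 1, h, w] with 1 inside the hole.
    mask = torch.zeros(f, 1, h, w)
    top = torch.randint(0, h // 2, (1,)).item()
    left = torch.randint(0, w // 2, (1,)).item()
    height = torch.randint(h // 4, h // 2 + 1, (1,)).item()
    width = torch.randint(w // 4, w // 2 + 1, (1,)).item()
    mask[:, :, top:top + height, left:left + width] = 1.0
    return mask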

My understanding, then, is that the code only conditions the model on this randomly generated mask. If that is correct, does it mean that we cannot feed an external sequence of masks to the model? If it is not correct, I would appreciate it if you could explain how to feed a sequence of masks to the model, as I could not find anything about it in the code.
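For reference, here is roughly what I am hoping to do (a hypothetical sketch; load_mask_sequence, the file paths, and the [f, 1, h, w] mask layout are my assumptions, not part of the repo):

import torch
from PIL import Image
from torchvision import transforms

def load_mask_sequence(paths, size):
    # Hypothetical helper: load per-frame binary masks (white = region to inpaint)
    # and stack them into an [f, 1, h, w] float tensor with values in {0, 1}.
    to_tensor = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
    ])
    masks = [to_tensor(Image.open(p).convert('L')) for p in paths]  # each [1, h, w]
    return (torch.stack(masks, dim=0) > 0.5).float()                # [f, 1, h, w]

# Then, instead of the randomly generated mask from the dataloader, one could
# override the mask variable before the make_masked_images call (batch layout assumed):
# mask = load_mask_sequence([f'masks/{i:02d}.png' for i in range(10)], (256, 256))
# mask = mask.unsqueeze(0).to(device)  # [1, f, 1, h, w]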

Thank you in advance for taking the time to look into this.

Kind Regards,
Amir

AHHHZ975 commented Jan 7, 2024

Hello,
Could someone please help me with this issue?
Best,
Amir

@Zeldalina

I have the same question and would appreciate any guidance.

@InkosiZhong

I have tried to support a customized mask sequence by modifying the implementation of __getitem__ in VideoDataset, roughly as sketched below.
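(A minimal, self-contained illustration of the idea, not the repo's actual class; the class name, constructor arguments, and tensor layouts are my assumptions.)

import torch
from torch.utils.data import Dataset

class MaskedVideoDataset(Dataset):
    # Illustrative stand-in for VideoDataset: it returns a user-provided
    # mask sequence instead of a randomly generated one.
    def __init__(self, videos, mask_sequences):
        self.videos = videos                  # list of [f, c, h, w] tensors
        self.mask_sequences = mask_sequences  # list of [f, 1, h, w] tensors

    def __len__(self):
        return len(self.videos)

    def __getitem__(self, index):
        video = self.videos[index]
        mask = self.mask_sequences[index]     # external masks replace the random one
        return video, mask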
However, the behavior of make_masked_images strikes me as somewhat odd:

def make_masked_images(imgs, masks):
    masked_imgs = []
    for i, mask in enumerate(masks):
        # zero out the masked pixels, then append the inverted mask as an extra channel
        masked_imgs.append(torch.cat([imgs[i] * (1 - mask), (1 - mask)], dim=1))
    return torch.stack(masked_imgs, dim=0)

# inference_single.py, lines 562-564
if 'mask' in cfg.video_compositions:
    masked_video = make_masked_images(misc_data.sub(0.5).div_(0.5), mask)
    masked_video = rearrange(masked_video, 'b f c h w -> b c f h w')

It first normalizes the video sequence to $[-1,1]$, and only then uses make_masked_images, so the masked pixels end up at $0$ (mid-gray in the normalized range) rather than at $-1$.
Normally, shouldn't we multiply by the mask first and then normalize, so that masked pixels map to $-1$ (black)? Is this a design choice or a bug?
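To make the difference concrete (assuming pixel values in $[0,1]$ before normalization):

import torch

pixel = torch.tensor([0.8])  # a pixel value in [0, 1]
mask = torch.tensor([1.0])   # 1 = masked

# current order: normalize to [-1, 1] first, then zero out -> masked pixel becomes 0 (mid-gray)
current = (pixel - 0.5) / 0.5 * (1 - mask)

# alternative order: zero out first, then normalize -> masked pixel becomes -1 (black)
alternative = (pixel * (1 - mask) - 0.5) / 0.5

print(current.item(), alternative.item())  # 0.0 -1.0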
