Which layer of DINOv2 do you align with? #11

Open
Luciennnnnnn opened this issue Jan 13, 2025 · 7 comments

Comments

@Luciennnnnnn

Hi, I'm trying to reproduce the training of VA-VAE. Which layer of DINOv2 do you align with?

@JingfengYao
Member

@Luciennnnnnn We simply use the last layer.

@Luciennnnnnn
Author

@JingfengYao Is this right?

features = self.foundation_model.forward_features(rescale_inputs)["x_norm_patchtokens"]

@Luciennnnnnn
Author

By the way, how do you align the resolution of the latent with that of the DINOv2 features?

@JingfengYao
Member

JingfengYao commented Jan 13, 2025

Here is my implementation:

import timm
import torch.nn as nn

def get_dinov2_encoder():
    """
    Load the DINOv2 pretrained ViT-L encoder from the timm library.
    """
    model = timm.create_model("hf-hub:timm/vit_large_patch14_dinov2.lvd142m", pretrained=True, dynamic_img_size=True)
    # Freeze all encoder parameters.
    model.requires_grad_(False)
    return model

def forward_dinov2(self, x):
    b, c, h, w = x.shape
    # Resize 256x256 inputs to 224x224 so the 14-pixel patches form a 16x16 grid.
    if h == 256 and w == 256:
        x = nn.functional.interpolate(x, size=(224, 224), mode='bilinear', align_corners=False)
    # Drop the class token, then reshape the patch tokens to (b, channels, h//16, w//16).
    return self.model.forward_features(x)[:, 1:].reshape(b, h//16, w//16, -1).permute(0, 3, 1, 2)
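
For readers following along, here is a minimal sketch of the shape bookkeeping in the reshape above. It is not the maintainers' code. It assumes a random tensor as a stand-in for the output of `forward_features` (one class token followed by the patch tokens, with the ViT-L width of 1024), and `tokens_to_map` is a hypothetical helper name. Note that the reshape works for 256x256 inputs because 256 // 16 equals 224 // 14, so the DINOv2 patch grid is 16x16.

```python
import torch
import torch.nn.functional as F

PATCH = 14  # patch size of DINOv2 ViT-L/14


def tokens_to_map(tokens: torch.Tensor, grid: int) -> torch.Tensor:
    """Turn (B, 1 + grid*grid, C) tokens with a leading class token into (B, C, grid, grid)."""
    b, n, c = tokens.shape
    assert n == 1 + grid * grid, "unexpected number of tokens for this grid size"
    patches = tokens[:, 1:]  # drop the class token
    return patches.reshape(b, grid, grid, c).permute(0, 3, 1, 2)


x = torch.randn(2, 3, 256, 256)
x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
grid = x.shape[-1] // PATCH  # 224 // 14 = 16

# Assumption: random stand-in for model.forward_features(x), to avoid downloading weights.
fake_tokens = torch.randn(2, 1 + grid * grid, 1024)
feat = tokens_to_map(fake_tokens, grid)
print(feat.shape)  # torch.Size([2, 1024, 16, 16])
```

The resulting 16x16 map has the same spatial size as the latent of a 16x-downsampling VAE on a 256x256 image.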

@Luciennnnnnn
Author

Thanks! I want to use a VAE with 8x downsampling. What's your opinion on aligning the resolution in that case?

@JingfengYao
Member

@Luciennnnnnn DINOv2 should support resolutions between 224 and 518. In my case, I would probably start by feeding a 448-sized image directly into DINOv2. That said, this configuration is untested, so I can't say how well it works.
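
The arithmetic behind the 448 suggestion can be sketched as follows. This is only an illustration under stated assumptions: a 256x256 training image, a DINOv2 patch size of 14, and the goal that the DINOv2 patch grid match the latent grid one to one. `dino_input_size` is a hypothetical helper, not part of the repository.

```python
def dino_input_size(image_size: int, vae_downsample: int, patch_size: int = 14) -> int:
    """Return the DINOv2 input side length whose patch grid matches the VAE latent grid."""
    if image_size % vae_downsample != 0:
        raise ValueError("image_size must be divisible by vae_downsample")
    latent_grid = image_size // vae_downsample  # e.g. 256 // 8 = 32
    return latent_grid * patch_size             # e.g. 32 * 14 = 448


size_f16 = dino_input_size(256, 16)  # 16x16 latent grid
size_f8 = dino_input_size(256, 8)    # 32x32 latent grid
print(size_f16, size_f8)  # 224 448
```

With 16x downsampling this gives the 224 resize used in the implementation above; with 8x downsampling it gives 448, which falls inside the 224 to 518 range mentioned here.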

@Luciennnnnnn
Author

Luciennnnnnn commented Jan 13, 2025

That sounds reasonable. I see REPA uses the same strategy.
