Unable to find *.pt files for region_4096_pretraining #13

Open
ramprs21 opened this issue Aug 1, 2022 · 5 comments

Comments

@ramprs21

ramprs21 commented Aug 1, 2022

Am I right in expecting the patch-level feature *.pt files (433779 files, each containing a 256 x 384 tensor) used for pretraining the second stage of HIPT to be present in the HIPT/3-Self-Supervised-Eval/embeddings_patch_lib/ directory?

Currently, I only see the following pickle files in that directory.

25M     bcss_train_resnet50_trunc.pkl
9.3M    bcss_train_vits_tcga_brca_dino.pkl
4.5M    bcss_val_resnet50_tcga_brca_simclr.pkl
2.3M    bcss_val_resnet50_trunc.pkl
868K    bcss_val_vits_tcga_brca_dino.pkl
19M     breastpathq_train_resnet50_tcga_brca_simclr.pkl
9.4M    breastpathq_train_resnet50_trunc.pkl
3.6M    breastpathq_train_vits_tcga_brca_dino.pkl
1.5M    breastpathq_val_resnet50_tcga_brca_simclr.pkl
744K    breastpathq_val_resnet50_trunc.pkl
280K    breastpathq_val_vits_tcga_brca_dino.pkl
783M    crc100knonorm_train_resnet50_tcga_brca_simclr.pkl
393M    crc100knonorm_train_resnet50_trunc.pkl
149M    crc100knonorm_train_vits_tcga_brca_dino.pkl
57M     crc100knonorm_val_resnet50_tcga_brca_simclr.pkl
29M     crc100knonorm_val_resnet50_trunc.pkl
11M     crc100knonorm_val_vits_tcga_brca_dino.pkl
783M    crc100k_train_resnet50_tcga_brca_simclr.pkl
393M    crc100k_train_resnet50_trunc.pkl
149M    crc100k_train_vits_tcga_brca_dino.pkl
57M     crc100k_val_resnet50_tcga_brca_simclr.pkl
29M     crc100k_val_resnet50_trunc.pkl
11M     crc100k_val_vits_tcga_brca_dino.pkl

Thanks in advance.

@Richarizardd
Collaborator

@ramprs21
Author

ramprs21 commented Aug 1, 2022

Hi @Richarizardd, the *.pt files in 3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings do not seem to have the right dimensions (192 instead of 384), so I believe they were computed from the outputs of the 2nd stage. See below:

Python 3.9.10 | packaged by conda-forge | (main, Feb  1 2022, 21:24:11) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> data = torch.load('3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings/TCGA-BA-6869-01Z-00-DX1.6e58648e-3309-47bb-b2c7-b71bcd9dc69b.pt')
>>> data.shape
torch.Size([52, 192])

What I am looking for are the inputs to the 2nd stage pre-training, which I believe are a list of *.pt files, each containing a tensor of dimension (256 x 384).
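For anyone checking their own files, the distinction above can be sketched as a small shape test. This is only an illustration: the helper name `classify_embedding` and the two constants are hypothetical, and the shapes are the ones quoted in this thread rather than anything read from the HIPT code.

```python
import torch

# Shapes quoted in this thread; the constant names are hypothetical.
PATCH_LEVEL_SHAPE = (256, 384)  # expected input to 2nd stage pretraining, one file per region
REGION_LEVEL_DIM = 192          # feature size observed in the slide-level *.pt files

def classify_embedding(t: torch.Tensor) -> str:
    """Guess which kind of embedding a loaded tensor is, based on shape alone."""
    if tuple(t.shape) == PATCH_LEVEL_SHAPE:
        return "patch-level"
    if t.ndim == 2 and t.shape[1] == REGION_LEVEL_DIM:
        return "region-level"
    return "unknown"

# Stand-ins for tensors returned by torch.load(path, map_location="cpu").
print(classify_embedding(torch.zeros(52, 192)))   # region-level
print(classify_embedding(torch.zeros(256, 384)))  # patch-level
```

A shape check like this cannot prove which model produced a file; it only flags files that cannot be 2nd stage inputs.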

@Richarizardd
Collaborator

Hi @ramprs21 - apologies for the confusion. The previous link refers to the already pre-extracted "region-level" feature embeddings for each slide in TCGA. Regarding the *.pt files for hierarchical pretraining, it is logistically difficult at the moment to make available all [M x 256 x 384] "patch-level" feature embeddings, where M is the number of regions. I am looking into ways to make these more readily available!
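To give a rough sense of the scale involved, here is a back-of-the-envelope estimate using the file count from the original question (433779 regions). It assumes the features are stored as float32 (4 bytes per value), which is not confirmed anywhere in this thread, and it ignores per-file serialization overhead.

```python
# Rough storage estimate for the [M x 256 x 384] patch-level embeddings.
NUM_REGIONS = 433_779      # M, the file count quoted in the original question
TOKENS_PER_REGION = 256    # patch tokens per region
EMBED_DIM = 384            # feature dimension per token
BYTES_PER_VALUE = 4        # assumption: float32 storage

bytes_per_file = TOKENS_PER_REGION * EMBED_DIM * BYTES_PER_VALUE
total_bytes = NUM_REGIONS * bytes_per_file

print(f"per file: {bytes_per_file / 1024:.0f} KiB")  # per file: 384 KiB
print(f"total:    {total_bytes / 1e9:.1f} GB")       # total:    170.6 GB
```

Under that assumption the full set comes to roughly 170 GB, compared with the few gigabytes of pickle files currently in the repository.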

@ramprs21
Author

ramprs21 commented Aug 4, 2022

Thank you @Richarizardd. Could you please update here whenever you make the 1st stage features available? Thank you :)

@bryanwong17

Hi @ramprs21 @Richarizardd, for the hierarchical pretraining (2nd stage), will training be much faster than in the 1st stage, since each region is now represented as a [256, 384] tensor which is reshaped into [1, 384, 16, 16] during training?
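For reference, the reshape mentioned here can be sketched as follows. This is a minimal illustration, not the repository's code: `tokens_to_grid` is a hypothetical helper, and it assumes the 256 tokens are stored in row-major order over a 16 x 16 grid.

```python
import torch

def tokens_to_grid(tokens: torch.Tensor, side: int = 16) -> torch.Tensor:
    """Turn [side*side, dim] patch tokens into a [1, dim, side, side] feature map.

    Assumes tokens are ordered row by row over the grid (an assumption,
    not something confirmed in this thread).
    """
    n, dim = tokens.shape
    if n != side * side:
        raise ValueError(f"expected {side * side} tokens, got {n}")
    grid = tokens.reshape(side, side, dim)     # [16, 16, 384]
    return grid.permute(2, 0, 1).unsqueeze(0)  # [1, 384, 16, 16]

tokens = torch.randn(256, 384)  # stand-in for one region's *.pt file
grid = tokens_to_grid(tokens)
print(grid.shape)  # torch.Size([1, 384, 16, 16])
```

Note that calling `tokens.reshape(1, 384, 16, 16)` directly would produce the same shape but mix values from different tokens into each channel, so the feature dimension has to be moved in front of the grid axes with a permute.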
