
How to use the dataset wrappers #98

Closed
Jio0728 opened this issue Jan 3, 2023 · 9 comments
Labels
question Further information is requested

Comments

@Jio0728

Jio0728 commented Jan 3, 2023

Hi, I installed pytorch-adapt with pip, but when I tried

```python
from pytorch_adapt.datasets import (
    CombinedSourceAndTargetDataset,
    SourceDataset,
    TargetDataset,
)
```

I got "No module named 'pytorch_adapt'".

My Python version is 3.9.5.

Thank you.

@KevinMusgrave
Owner

KevinMusgrave commented Jan 3, 2023

Are you sure you're in the same Python environment that you installed pytorch-adapt in?

If you're using conda, you can run `conda list` to see the installed packages.

Maybe there were errors when you ran `pip install pytorch-adapt`?
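
A quick way to check from within Python (a minimal sanity-check sketch; I'm assuming the package exposes `__version__`, as most packages do):

```python
import sys

# Which interpreter is running? It should live inside the environment
# where `pip install pytorch-adapt` was run.
print(sys.executable)

# If this import succeeds, the package is visible to this interpreter.
import pytorch_adapt
print(pytorch_adapt.__version__)
```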

@Jio0728
Author

Jio0728 commented Jan 4, 2023

It was my mistake! Thank you for your reply.

@Jio0728
Author

Jio0728 commented Jan 4, 2023

May I ask you another question?

I tried to run MCD with my own dataset, but I am stuck on an error.
I followed the MCD instructions in the colab paper-implementation file.

Could you please tell me what I am doing wrong?

I constructed the datasets as follows:

```python
source_train_dataset = SourceDataset(source_train_dataset)
source_val_dataset = SourceDataset(source_val_dataset)
target_train_dataset = TargetDataset(target_train_dataset)
target_val_dataset = TargetDataset(target_val_dataset)

train_names = ["source_train_dataset", "target_train_dataset"]
val_names = ["source_val_dataset", "target_val_dataset"]
dc = DataloaderCreator(batch_size=32, num_workers=8, train_names=train_names, val_names=val_names)
dataloaders = dc(
    source_train_dataset=source_train_dataset,
    target_train_dataset=target_train_dataset,
    source_val_dataset=source_val_dataset,
    target_val_dataset=target_val_dataset,
)
```

Here, `__getitem__` in source_train_dataset and source_val_dataset returns an image and a corresponding label, while `__getitem__` in target_train_dataset and target_val_dataset returns only an image.
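
Concretely, my underlying datasets look roughly like this (an illustrative sketch, not my exact code):

```python
from torch.utils.data import Dataset

class MySourceImages(Dataset):
    # Sketch: the source datasets return (image, label) pairs.
    def __init__(self, images, labels):
        self.images, self.labels = images, labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

class MyTargetImages(Dataset):
    # Sketch: the target datasets return only an image.
    def __init__(self, images):
        self.images = images

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx]
```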

And I constructed the hook as follows:

```python
device = torch.device("cuda")

generator = timm.create_model("tf_efficientnet_b4", num_classes=0)
clf1 = ResClassifier().to(device)
clf_models = MultipleModels(clf1, c_f.reinit(copy.deepcopy(clf1)).to(device))

G_opt = torch.optim.Adam(generator.parameters())
C_opt = torch.optim.Adam(clf_models.parameters())

hook = MCDHook(g_opts=[G_opt], c_opts=[C_opt])

models = {"G": generator, "C": clf_models}

_, losses = hook({**models, **dataloaders})
```

When I ran the last line, the following error appeared:

```
KeyError                                  Traceback (most recent call last)
in <module>
----> 1 _, losses = hook({**models, **dataloaders})

/opt/conda/envs/skn_dev/lib/python3.9/site-packages/pytorch_adapt/hooks/base.py in __call__(self, inputs, losses)
     50     try:
     51         inputs = c_f.map_keys(inputs, self.key_map)
---> 52         x = self.call(inputs, losses)
     53         if isinstance(x, (bool, np.bool_)):
     54             self.logger.reset()

/opt/conda/envs/skn_dev/lib/python3.9/site-packages/pytorch_adapt/hooks/base.py in __call__(self, *args, **kwargs)
    192     def __call__(self, *args, **kwargs):
    193         """"""
--> 194         return self.hook(*args, **kwargs)
    195
    196     def _loss_keys(self):

/opt/conda/envs/skn_dev/lib/python3.9/site-packages/pytorch_adapt/hooks/base.py in __call__(self, inputs, losses)
     50     try:
     51         inputs = c_f.map_keys(inputs, self.key_map)
---> 52         x = self.call(inputs, losses)
     53         if isinstance(x, (bool, np.bool_)):
     54             self.logger.reset()
...
in FeaturesHook: call
FeaturesHook: Getting src
FeaturesHook: Getting output: ['src_imgs_features']
FeaturesHook: Using model G with inputs: src_imgs

KeyError: src_imgs
```

I would be so grateful if you could help me. Thank you.

@Jio0728 Jio0728 closed this as completed Jan 4, 2023
@Jio0728 Jio0728 reopened this Jan 4, 2023
@KevinMusgrave
Owner

In your code you are passing `**dataloaders` to the hook, but you actually need to pass in one batch of data at a time:

```python
from tqdm import tqdm
from pytorch_adapt.utils.common_functions import batch_to_device

for data in tqdm(dataloaders["train"]):
    data = batch_to_device(data, device)
    _, loss = hook({**models, **data})
```

@KevinMusgrave
Owner

KevinMusgrave commented Jan 4, 2023

Also, you need to use `CombinedSourceAndTargetDataset` for the train dataset.

```python
from pytorch_adapt.datasets import CombinedSourceAndTargetDataset

# I'm assuming the original "unwrapped" datasets are
# source_train_dataset, source_val_dataset, target_train_dataset, target_val_dataset.
# The derived datasets are src_train, src_val, etc.

src_train = SourceDataset(source_train_dataset)
src_val = SourceDataset(source_val_dataset)
target_train = TargetDataset(target_train_dataset)
target_val = TargetDataset(target_val_dataset)
train_dataset = CombinedSourceAndTargetDataset(src_train, target_train)

dc = DataloaderCreator(batch_size=32, num_workers=8)
dataloaders = dc(
    train=train_dataset,
    src_train=src_train,
    target_train=target_train,
    src_val=src_val,
    target_val=target_val,
)
```

This way `dataloaders["train"]` returns both source and target data at each iteration, and it's randomly sampled.

The other keys are just for validation, and they will return the specific datasets with random sampling turned off. For example, `dataloaders["src_train"]` is the src_train dataset.
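
You can peek at one batch to see which keys the wrappers produce (a quick sketch; the exact key names come from the wrappers, and your traceback shows the source images end up under "src_imgs"):

```python
# Grab a single training batch and inspect the dict the wrappers produce.
batch = next(iter(dataloaders["train"]))
print(batch.keys())  # source keys like "src_imgs" plus target keys like "target_imgs"
```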

Maybe during validation you just want to compute accuracy on the target validation set. In that case, you don't need to pass in so many datasets:

```python
src_train = SourceDataset(source_train_dataset)
target_train = TargetDataset(target_train_dataset)
train_dataset = CombinedSourceAndTargetDataset(src_train, target_train)

# During validation you typically don't want randomness in your dataset,
# so the input dataset here should use a non-random transform
# (e.g. no random cropping or random flipping).
# (I've made the variable name indicate that.)
# The supervised=True argument means that target_val will return tuples of (data, label).
# This only works if the original target_val_dataset also returns tuples of (data, label).
target_val = TargetDataset(target_val_dataset_without_random_transform, supervised=True)
dc = DataloaderCreator(batch_size=32, num_workers=8)
dataloaders = dc(train=train_dataset, target_val=target_val)

# During training, use "train"
for data in tqdm(dataloaders["train"]):
    data = batch_to_device(data, device)
    _, loss = hook({**models, **data})

# During validation, use "target_val"
for data in tqdm(dataloaders["target_val"]):
    ...  # validation code here
```
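
For example, the validation body could compute accuracy like this (just a sketch: it assumes supervised=True so each batch contains "target_imgs" and "target_labels", and that calling the C model returns a list of logits, one per MCD classifier):

```python
import torch

# Sketch of a validation loop: compute accuracy on the target validation set.
correct = total = 0
generator.eval()
clf_models.eval()
with torch.no_grad():
    for data in tqdm(dataloaders["target_val"]):
        data = batch_to_device(data, device)
        # Use the first classifier's logits; MCD trains two classifiers.
        logits = clf_models(generator(data["target_imgs"]))[0]
        preds = logits.argmax(dim=1)
        correct += (preds == data["target_labels"]).sum().item()
        total += len(preds)
print(f"target val accuracy: {correct / total:.4f}")
```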

@Jio0728
Author

Jio0728 commented Jan 4, 2023

Could you give me a few more days to work through your code? I want to close this issue after I understand it :D

@KevinMusgrave
Owner

Sure, I'll leave this open

@KevinMusgrave KevinMusgrave changed the title from "pytorch_adapt error" to "How to use the dataset wrappers" Jan 11, 2023
@Jio0728
Author

Jio0728 commented Jan 11, 2023

Hello! Thanks to your help, I was able to understand how I can utilize the dataset wrappers.
I have a follow-up question, and I would be so grateful if you could help me.

I want to train and evaluate MCD, and I want to plot the x_loss, y_loss, and z_loss during both training and evaluation.
But when I looked at your example colab notebooks, it seems like you only recorded accuracy scores, not losses.

I guess the code below will let me evaluate losses during evaluation without updating model parameters, but I am not sure whether it is right, so it would be extremely helpful if you could confirm.
```python
src_val = SourceDataset(source_val_dataset)
target_val = TargetDataset(target_val_dataset)
val_dataset = CombinedSourceAndTargetDataset(src_val, target_val)

dc = DataloaderCreator(batch_size=BATCH_SIZE, num_workers=NUM_WORKERS)
dataloaders = dc(train=train_dataset, src_val=src_val, target_val=target_val, val_dataset=val_dataset)

# validation code here
generator.eval()
clf1.eval()
clf2.eval()
for data in tqdm(dataloaders["val_dataset"]):
    _, loss = hook({**models, **data})
```

Thank you for your help!

@KevinMusgrave
Owner

KevinMusgrave commented Jan 11, 2023

Unfortunately there's no easy way to use a hook with optimization turned off. The best you can do right now is create a separate identical hook for eval, using no-op optimizers:

```python
from pytorch_adapt.layers import DoNothingOptimizer

eval_hook = MCDHook(g_opts=[DoNothingOptimizer()], c_opts=[DoNothingOptimizer()])

generator.eval()
clf1.eval()
clf2.eval()
outputs, losses = eval_hook({**models, **data})
```
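
Then, for plotting, you can accumulate whatever the eval hook returns at each step (a rough sketch; `val_loader` stands for whichever dataloader yields both source and target data, e.g. your combined val_dataset loader, and the exact structure of the losses dict depends on the hook, so you may need to flatten nested entries):

```python
from collections import defaultdict

# Rough sketch: collect each named loss over the validation set so it can
# be plotted later.
loss_history = defaultdict(list)

for data in tqdm(val_loader):
    data = batch_to_device(data, device)
    _, losses = eval_hook({**models, **data})
    for name, value in losses.items():
        loss_history[name].append(value)
```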

@KevinMusgrave KevinMusgrave added the question Further information is requested label Jan 17, 2023
@KevinMusgrave KevinMusgrave pinned this issue Jan 17, 2023