
How to use the dataset wrappers #98

Closed
Jio0728 opened this issue Jan 3, 2023 · 9 comments
Labels
question Further information is requested

Comments

@Jio0728

Jio0728 commented Jan 3, 2023

Hi, I installed pytorch-adapt with pip, but when I tried

```python
from pytorch_adapt.datasets import (
    CombinedSourceAndTargetDataset,
    SourceDataset,
    TargetDataset,
)
```

I got "No module named 'pytorch_adapt'".

My Python version is 3.9.5.

Thank you.

@KevinMusgrave
Owner

KevinMusgrave commented Jan 3, 2023

Are you sure you're in the same Python environment that you installed pytorch-adapt in?

If you're using conda, you can run `conda list` to see the installed packages.

Maybe there were errors when you ran `pip install pytorch-adapt`?
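
A quick way to check from within Python (a minimal sanity-check sketch; I'm assuming the package exposes `__version__`, as most packages do):

```python
import sys

# Which interpreter is running? It should live inside the environment
# where `pip install pytorch-adapt` was run.
print(sys.executable)

# If this import succeeds, the package is visible to this interpreter.
import pytorch_adapt
print(pytorch_adapt.__version__)
```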

@Jio0728
Author

Jio0728 commented Jan 4, 2023

It was my mistake! Thank you for your reply.

@Jio0728
Author

Jio0728 commented Jan 4, 2023

May I ask you another question?

I tried to run MCD with my own dataset, but I am stuck on an error.
I followed the MCD instructions in the colab paper-implementation file.

Could you please tell me what I am doing wrong?

I constructed the datasets as follows:

```python
source_train_dataset = SourceDataset(source_train_dataset)
source_val_dataset = SourceDataset(source_val_dataset)
target_train_dataset = TargetDataset(target_train_dataset)
target_val_dataset = TargetDataset(target_val_dataset)

train_names = ["source_train_dataset", "target_train_dataset"]
val_names = ["source_val_dataset", "target_val_dataset"]
dc = DataloaderCreator(batch_size=32, num_workers=8, train_names=train_names, val_names=val_names)
dataloaders = dc(
    source_train_dataset=source_train_dataset,
    target_train_dataset=target_train_dataset,
    source_val_dataset=source_val_dataset,
    target_val_dataset=target_val_dataset,
)
```

Here, `__getitem__` in source_train_dataset and source_val_dataset returns an image and a corresponding label, while `__getitem__` in target_train_dataset and target_val_dataset returns only an image.
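
Concretely, my underlying datasets look roughly like this (an illustrative sketch, not my exact code):

```python
from torch.utils.data import Dataset

class MySourceImages(Dataset):
    # Sketch: the source datasets return (image, label) pairs.
    def __init__(self, images, labels):
        self.images, self.labels = images, labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

class MyTargetImages(Dataset):
    # Sketch: the target datasets return only an image.
    def __init__(self, images):
        self.images = images

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx]
```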

And I constructed the hook as follows:

```python
device = torch.device("cuda")

generator = timm.create_model("tf_efficientnet_b4", num_classes=0)
clf1 = ResClassifier().to(device)
clf_models = MultipleModels(clf1, c_f.reinit(copy.deepcopy(clf1)).to(device))

G_opt = torch.optim.Adam(generator.parameters())
C_opt = torch.optim.Adam(clf_models.parameters())

hook = MCDHook(g_opts=[G_opt], c_opts=[C_opt])

models = {"G": generator, "C": clf_models}

_, losses = hook({**models, **dataloaders})
```

When I ran the last line, the following error appeared:

```
KeyError                                  Traceback (most recent call last)
in <module>
----> 1 _, losses = hook({**models, **dataloaders})

/opt/conda/envs/skn_dev/lib/python3.9/site-packages/pytorch_adapt/hooks/base.py in __call__(self, inputs, losses)
     50     try:
     51         inputs = c_f.map_keys(inputs, self.key_map)
---> 52         x = self.call(inputs, losses)
     53         if isinstance(x, (bool, np.bool_)):
     54             self.logger.reset()

/opt/conda/envs/skn_dev/lib/python3.9/site-packages/pytorch_adapt/hooks/base.py in __call__(self, *args, **kwargs)
    192     def __call__(self, *args, **kwargs):
    193         """"""
--> 194         return self.hook(*args, **kwargs)
    195
    196     def _loss_keys(self):

/opt/conda/envs/skn_dev/lib/python3.9/site-packages/pytorch_adapt/hooks/base.py in __call__(self, inputs, losses)
     50     try:
     51         inputs = c_f.map_keys(inputs, self.key_map)
---> 52         x = self.call(inputs, losses)
     53         if isinstance(x, (bool, np.bool_)):
     54             self.logger.reset()
...
in FeaturesHook: call
FeaturesHook: Getting src
FeaturesHook: Getting output: ['src_imgs_features']
FeaturesHook: Using model G with inputs: src_imgs

KeyError: src_imgs
```

I would be so grateful if you could help me. Thank you.

@Jio0728 Jio0728 closed this as completed Jan 4, 2023
@Jio0728 Jio0728 reopened this Jan 4, 2023
@KevinMusgrave
Owner

In your code you are passing `**dataloaders` to the hook, but you actually need to pass in one batch of data at a time:

```python
from tqdm import tqdm
from pytorch_adapt.utils.common_functions import batch_to_device

for data in tqdm(dataloaders["train"]):
    data = batch_to_device(data, device)
    _, loss = hook({**models, **data})
```

@KevinMusgrave
Owner

KevinMusgrave commented Jan 4, 2023

Also, you need to use `CombinedSourceAndTargetDataset` for the train dataset.

```python
from pytorch_adapt.datasets import CombinedSourceAndTargetDataset

# I'm assuming the original "unwrapped" datasets are
# source_train_dataset, source_val_dataset, target_train_dataset, target_val_dataset.
# The derived datasets are src_train, src_val, etc.

src_train = SourceDataset(source_train_dataset)
src_val = SourceDataset(source_val_dataset)
target_train = TargetDataset(target_train_dataset)
target_val = TargetDataset(target_val_dataset)
train_dataset = CombinedSourceAndTargetDataset(src_train, target_train)

dc = DataloaderCreator(batch_size=32, num_workers=8)
dataloaders = dc(
    train=train_dataset,
    src_train=src_train,
    target_train=target_train,
    src_val=src_val,
    target_val=target_val,
)
```

This way `dataloaders["train"]` returns both source and target data at each iteration, and it's randomly sampled.

The other keys are just for validation, and they will return the specific datasets with random sampling turned off. For example, `dataloaders["src_train"]` is the src_train dataset.
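
You can peek at one batch to see which keys the wrappers produce (a quick sketch; the exact key names come from the wrappers, and your traceback shows the source images end up under "src_imgs"):

```python
# Grab a single training batch and inspect the dict the wrappers produce.
batch = next(iter(dataloaders["train"]))
print(batch.keys())  # source keys like "src_imgs" plus target keys like "target_imgs"
```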

Maybe during validation you just want to compute accuracy on the target validation set. In that case, you don't need to pass in so many datasets:

```python
src_train = SourceDataset(source_train_dataset)
target_train = TargetDataset(target_train_dataset)
train_dataset = CombinedSourceAndTargetDataset(src_train, target_train)

# During validation you typically don't want randomness in your dataset,
# so the input dataset here should use a non-random transform
# (e.g. no random cropping or random flipping).
# (I've made the variable name indicate that.)
# The supervised=True argument means that target_val will return tuples of (data, label).
# This only works if the original target_val_dataset also returns tuples of (data, label).
target_val = TargetDataset(target_val_dataset_without_random_transform, supervised=True)
dc = DataloaderCreator(batch_size=32, num_workers=8)
dataloaders = dc(train=train_dataset, target_val=target_val)

# During training, use "train"
for data in tqdm(dataloaders["train"]):
    data = batch_to_device(data, device)
    _, loss = hook({**models, **data})

# During validation, use "target_val"
for data in tqdm(dataloaders["target_val"]):
    ...  # validation code here
```
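
For example, the validation body could compute accuracy like this (just a sketch: it assumes supervised=True so each batch contains "target_imgs" and "target_labels", and that calling the C model returns a list of logits, one per MCD classifier):

```python
import torch

# Sketch of a validation loop: compute accuracy on the target validation set.
correct = total = 0
generator.eval()
clf_models.eval()
with torch.no_grad():
    for data in tqdm(dataloaders["target_val"]):
        data = batch_to_device(data, device)
        # Use the first classifier's logits; MCD trains two classifiers.
        logits = clf_models(generator(data["target_imgs"]))[0]
        preds = logits.argmax(dim=1)
        correct += (preds == data["target_labels"]).sum().item()
        total += len(preds)
print(f"target val accuracy: {correct / total:.4f}")
```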

@Jio0728
Author

Jio0728 commented Jan 4, 2023

Could you give me a few more days to work through your code? I want to close this issue after I understand it :D

@KevinMusgrave
Owner

Sure, I'll leave this open

@KevinMusgrave KevinMusgrave changed the title from "pytorch_adapt error" to "How to use the dataset wrappers" Jan 11, 2023
@Jio0728
Author

Jio0728 commented Jan 11, 2023

Hello! Thanks to your help, I was able to understand how I can utilize the dataset wrappers.
I have a follow-up question, and I would be so grateful if you could help me.

I want to train and evaluate MCD, and I want to plot the x_loss, y_loss, and z_loss during both training and evaluation.
But when I looked at your example colab notebooks, it seems like you only recorded accuracy scores, not losses.

I guess the code below will let me evaluate losses during evaluation without updating model parameters, but I am not sure whether it is right, so it would be extremely helpful if you could confirm.
```python
src_val = SourceDataset(source_val_dataset)
target_val = TargetDataset(target_val_dataset)
val_dataset = CombinedSourceAndTargetDataset(src_val, target_val)

dc = DataloaderCreator(batch_size=BATCH_SIZE, num_workers=NUM_WORKERS)
dataloaders = dc(train=train_dataset, src_val=src_val, target_val=target_val, val_dataset=val_dataset)

# validation code here
generator.eval()
clf1.eval()
clf2.eval()
for data in tqdm(dataloaders["val_dataset"]):
    _, loss = hook({**models, **data})
```

Thank you for your help!

@KevinMusgrave
Owner

KevinMusgrave commented Jan 11, 2023

Unfortunately there's no easy way to use a hook with optimization turned off. The best you can do right now is create a separate identical hook for eval, using no-op optimizers:

```python
from pytorch_adapt.layers import DoNothingOptimizer

eval_hook = MCDHook(g_opts=[DoNothingOptimizer()], c_opts=[DoNothingOptimizer()])

generator.eval()
clf1.eval()
clf2.eval()
outputs, losses = eval_hook({**models, **data})
```
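
Then, for plotting, you can accumulate whatever the eval hook returns at each step (a rough sketch; `val_loader` stands for whichever dataloader yields both source and target data, e.g. your combined val_dataset loader, and the exact structure of the losses dict depends on the hook, so you may need to flatten nested entries):

```python
from collections import defaultdict

# Rough sketch: collect each named loss over the validation set so it can
# be plotted later.
loss_history = defaultdict(list)

for data in tqdm(val_loader):
    data = batch_to_device(data, device)
    _, losses = eval_hook({**models, **data})
    for name, value in losses.items():
        loss_history[name].append(value)
```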

@KevinMusgrave KevinMusgrave added the question Further information is requested label Jan 17, 2023
@KevinMusgrave KevinMusgrave pinned this issue Jan 17, 2023