Naive CL Strategy on Permuted MNIST #412
Hello! I am applying the naive CL strategy to the Permuted MNIST scenario, and it seems the naive method doesn't demonstrate catastrophic forgetting. After training sequentially on each new experience (3 experiences in total), the evaluation accuracies for all previous experiences are surprisingly high (~90% and above with only 1 training epoch). Are these results to be expected with the Permuted MNIST dataset? Below is my code and output. Appreciate any and all feedback :)
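For context (this is not the poster's code, which isn't reproduced here), the core of the Permuted MNIST construction is that each experience applies one fixed random pixel permutation to every image. A minimal stdlib-only sketch, with hypothetical helper names:

```python
import random

def make_permutation(n_pixels: int, seed: int) -> list[int]:
    # One fixed pixel permutation per experience, derived from a seed
    # so the same permutation is used for train and test images.
    rng = random.Random(seed)
    perm = list(range(n_pixels))
    rng.shuffle(perm)
    return perm

def apply_permutation(flat_image: list[float], perm: list[int]) -> list[float]:
    # Reorder the flattened pixels according to the experience's permutation.
    return [flat_image[i] for i in perm]

# Example: 3 experiences over a tiny 4-pixel "image".
image = [0.1, 0.2, 0.3, 0.4]
perms = [make_permutation(len(image), seed) for seed in range(3)]
permuted_views = [apply_permutation(image, p) for p in perms]
```

Since every permutation is just a relabeling of the input dimensions, the tasks share a lot of structure, which is part of why PMNIST tends to show milder forgetting than class-incremental scenarios.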
Replies: 1 comment 1 reply
Hi @christam96 ! Thank you for reaching out. Yes, I think your results are correct. PMNIST is a setting that shows relatively low forgetting. Moreover, with only one epoch the model is able to retain most of its original performance. Forgetting should increase if you increase the number of training epochs (e.g. from 1 to 5-10), since you focus more and more on the current data, at the expense of previous experiences. You can also try to use more experiences, since in the literature PMNIST is usually run with 10 or more experiences. You should see larger forgetting on the first experience as you add more experiences.
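To make "larger forgetting on the first experience" concrete, here is a sketch of the standard forgetting measure, assuming the common definition (best accuracy previously achieved on an experience, minus its current accuracy):

```python
def forgetting(acc_history: list[float]) -> float:
    # acc_history: accuracies on ONE experience, measured after training
    # on each successive experience, in order. Forgetting compares the
    # best past accuracy with the most recent one.
    if len(acc_history) < 2:
        return 0.0
    return max(acc_history[:-1]) - acc_history[-1]

# Hypothetical numbers: accuracy on experience 0 after each of 3 stages.
history = [0.95, 0.92, 0.90]
print(round(forgetting(history), 2))  # → 0.05
```

With more experiences (or more epochs per experience), the last entry of that history typically drops further, so the metric grows; a negative value would indicate backward transfer instead of forgetting.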