- By Ruizhi Liao, Junhai Zhai.
- This repo is the PyTorch implementation of [Optimization model based on attention for Few-shot Learning].
- Make sure Mini-Imagenet is split properly. For example:
  - data/
    - miniImagenet/
      - train/
        - n01532829/
          - n0153282900000005.jpg
          - ...
        - n01558993/
          - ...
      - val/
        - n01855672/
          - ...
      - test/
        - ...
  - main.py
  - ...
- The layout is already set up this way if you download and extract Mini-ImageNet from the link above (a quick sanity check is sketched below).
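A minimal layout-check sketch, assuming the relative path `data/miniImagenet` from the example above; adjust it if your `--data-root` differs:

```python
import os

# Path assumed from the example layout above; use your own --data-root if it differs.
data_root = "data/miniImagenet"

for split in ("train", "val", "test"):
    split_dir = os.path.join(data_root, split)
    assert os.path.isdir(split_dir), f"missing split directory: {split_dir}"
    # Each class is a WordNet-id folder (e.g. n01532829) holding its images.
    classes = [d for d in os.listdir(split_dir)
               if os.path.isdir(os.path.join(split_dir, d))]
    print(f"{split}: {len(classes)} classes")
```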
- Check out `scripts/train_5s_5c.sh` and make sure `--data-root` is set properly.
- For 5-shot, 5-class training, run `bash scripts/train_5s_5c.sh`. Hyper-parameters follow the author's repo.
- For 5-shot, 5-class evaluation, run `bash scripts/eval_5s_5c.sh` (remember to change the `--resume` and `--seed` arguments).
- Training with the default settings takes ~2.5 hours on a single Titan Xp while occupying ~2GB GPU memory.
- The implementation replicates two learners, similar to the author's repo (a minimal sketch follows this list):
  - `learner_w_grad` functions as a regular model; its gradients and loss are the inputs to the meta-learner.
  - `learner_wo_grad` constructs the graph for the meta-learner:
    - All the parameters in `learner_wo_grad` are replaced by `cI`, the output of the meta-learner.
    - `nn.Parameters` in this model are cast to `torch.Tensor` to connect the graph to the meta-learner.
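A hedged sketch of that cast from `nn.Parameter` to plain tensors; the function name `replace_params_with_cI` is made up for illustration and may not match the code in this repo:

```python
import torch
import torch.nn as nn

def replace_params_with_cI(learner_wo_grad: nn.Module, cI: torch.Tensor) -> None:
    """Sketch: swap every nn.Parameter for a slice of cI (a plain torch.Tensor).

    Assumes cI is a flat tensor whose entries follow the same order as
    learner_wo_grad.parameters(). Because each slice keeps cI's grad_fn, the
    learner's forward pass becomes part of the meta-learner's graph, so the
    meta-loss backpropagates into the meta-learner that produced cI.
    """
    offset = 0
    for module in learner_wo_grad.modules():
        for name, p in list(module.named_parameters(recurse=False)):
            n = p.numel()
            delattr(module, name)  # drop the nn.Parameter registration
            # Re-attach a plain tensor view of cI under the same attribute name.
            setattr(module, name, cI[offset:offset + n].view_as(p))
            offset += n
```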
- There are several ways to copy parameters from the meta-learner to the learner, depending on the scenario (see the sketch after this list):
  - `copy_flat_params`: we only need the parameter values and keep the original `grad_fn`.
  - `transfer_params`: we want the values as well as the `grad_fn` (from `cI` to `learner_wo_grad`).
    - `.data.copy_` vs. `clone()`: the latter retains all the properties of a tensor, including `grad_fn`.
  - To maintain the batch statistics, `load_state_dict` is used (from `learner_w_grad` to `learner_wo_grad`).
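A hedged sketch of the value-only copy and the batch-statistics transfer described above; the function names are illustrative and may not match this repo's signatures (`transfer_params` would instead re-attach slices of `cI` as plain tensors, as sketched earlier):

```python
import torch
import torch.nn as nn

def copy_flat_params_sketch(learner: nn.Module, cI: torch.Tensor) -> None:
    """Value-only copy: .data.copy_ writes in place, so each parameter keeps its
    identity and whatever grad_fn it already had (unlike clone(), which would
    also carry cI's grad_fn along)."""
    offset = 0
    for p in learner.parameters():
        n = p.numel()
        p.data.copy_(cI[offset:offset + n].view_as(p))
        offset += n

def sync_batch_stats_sketch(learner_w_grad: nn.Module,
                            learner_wo_grad: nn.Module) -> None:
    """Carry BatchNorm buffers (running_mean / running_var) from learner_w_grad
    to learner_wo_grad via load_state_dict. strict=False because the parameter
    entries of learner_wo_grad may have been replaced by plain tensors."""
    learner_wo_grad.load_state_dict(learner_w_grad.state_dict(), strict=False)
```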
- This code borrows heavily from the meta-learning-lstm framework.