---
tags:
  - programming/machine-learning/meta-learning
  - ruff-note
  - research/paper-notes
---

Meta-learning is commonly understood as learning to learn, which refers to the process of improving a learning algorithm over multiple learning episodes. In contrast, conventional ML improves model predictions over multiple data instances.

During base learning, an inner (or lower/base) learning algorithm solves a task such as image classification, defined by a dataset and objective. During meta-learning, an outer (or upper/meta) algorithm updates the inner learning algorithm such that the model it learns improves an outer objective.

The essential characteristic of contemporary neural-network meta-learning is an explicitly defined meta-level objective, and end-to-end optimization of the inner algorithm with respect to this objective.

## Formalization

### Conventional Machine Learning

We are given a training dataset $\mathcal D=\{(x_1,y_1),\dots,(x_n,y_n)\}$, on which we can train a predictive model $\hat{y} = f_\theta(x)$ parameterized by $\theta$ by solving: $$\theta^*=\arg\min_\theta\mathcal L(\mathcal D;\theta,\omega)$$ where $\mathcal L$ is a loss function that measures the error between true labels and those predicted by $f_\theta(\cdot)$. The conditioning on $\omega$ denotes the dependence of this solution on assumptions about 'how to learn', such as the choice of optimizer for $\theta$. Generalization is then measured by evaluating a number of test points with known labels.
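As a concrete illustration, here is a minimal sketch of this setting, assuming a linear model and squared-error loss; the dataset, optimizer, and step size are illustrative choices, not from the note:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # n = 100 samples, 5 features
true_theta = rng.normal(size=5)
y = X @ true_theta + 0.1 * rng.normal(size=100)

theta = np.zeros(5)
lr = 0.1                                     # part of omega: 'how to learn'
for _ in range(200):                         # gradient descent on L(D; theta, omega)
    grad = 2 * X.T @ (X @ theta - y) / len(y)
    theta -= lr * grad                       # approaches theta* = argmin_theta L
```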

### Task-Distributed View

We can evaluate the performance of $\omega$ over a distribution of tasks $p(\mathcal T)$. Here we loosely define a task to be a dataset and a loss function $\mathcal T=\{\mathcal D,\mathcal L\}$. Learning how to learn thus becomes $$\min_\omega\underset{\mathcal T\sim p(\mathcal T)}{\mathbb E}\mathcal L(\mathcal D;\omega)$$ where $\mathcal L(\mathcal D;\omega)$ measures the performance of a model trained using $\omega$ on dataset $\mathcal D$. 'How to learn', i.e. $\omega$, is often referred to as across-task knowledge.
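A hedged sketch of this view, reducing $\omega$ to a single choice (the learning rate) and each task to a synthetic linear-regression dataset; all names and constants below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Draw one task T = {D, L}: a synthetic linear-regression dataset."""
    w = rng.normal(size=5)
    X = rng.normal(size=(20, 5))
    return X, X @ w + 0.1 * rng.normal(size=20)

def expected_loss(lr, n_tasks=50, steps=20):
    """Estimate E_{T ~ p(T)} L(D; omega) for omega = lr."""
    losses = []
    for _ in range(n_tasks):
        X, y = sample_task()
        theta = np.zeros(5)
        for _ in range(steps):
            theta -= lr * 2 * X.T @ (X @ theta - y) / len(y)
        losses.append(np.mean((X @ theta - y) ** 2))
    return float(np.mean(losses))

best_omega = min([0.01, 0.05, 0.1], key=expected_loss)  # min over omega
```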

### Meta-Training

We denote the set of $M$ source tasks used in the meta-training stage as $\mathfrak D_{\text{source}}=\{(\mathcal D_{\text{source}}^{\text{train}},\mathcal D_{\text{source}}^{\text{val}})^{(i)}\}^M_{i=1}$, where each task has both training and validation data. Often, the source train and validation data are respectively referred to as support and query sets. The meta-training step of learning how to learn can be written as: $$\omega^*=\arg\max_\omega\log p(\omega|\mathfrak D_\text{source})$$

### Meta-Testing

Now we denote the set of $Q$ target tasks used in the meta-testing stage as $\mathfrak D_{\text{target}}=\{(\mathcal D_{\text{target}}^{\text{train}},\mathcal D_{\text{target}}^{\text{test}})^{(i)}\}^Q_{i=1}$, where each task has both training and test data. In the meta-testing stage we use the learned meta-knowledge $\omega^*$ to train the base model on each previously unseen target task $i$: $$\theta^{*\ (i)}=\arg\max_\theta\log p(\theta|\omega^*,\mathcal D_\text{target}^{\text{train}\ (i)})$$
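One way to make both stages concrete is a Reptile-style sketch in which $\omega$ is a shared parameter initialization; this is just one instantiation of the objectives above, with all task and optimizer details assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    w = rng.normal(size=5)
    X = rng.normal(size=(20, 5))
    return X, X @ w + 0.1 * rng.normal(size=20)

def sgd(theta, X, y, steps=10, lr=0.05):
    for _ in range(steps):
        theta = theta - lr * 2 * X.T @ (X @ theta - y) / len(y)
    return theta

# Meta-training: nudge omega toward each source task's adapted parameters.
omega = np.zeros(5)
for _ in range(100):                               # M = 100 source tasks
    X, y = sample_task()
    omega += 0.1 * (sgd(omega, X, y) - omega)

# Meta-testing: adapt from omega* on a previously unseen target task.
X_t, y_t = sample_task()
theta_target = sgd(omega, X_t, y_t)                # theta*(i) given omega*
```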

### Meta Under-/Overfitting

Meta-overfitting is an issue whereby the meta-knowledge learned on the source tasks does not generalize to the target tasks. It is relatively common, especially when only a small number of source tasks are available. It can be seen as learning an inductive bias $\omega$ that constrains the hypothesis space of $\theta$ too tightly around solutions to the source tasks.

### Bilevel Optimization View

The task-distributed view does not specify how to solve the [[Meta Learning#Meta-Training|meta-training]] step. This is commonly done by casting it as a [[Bilevel Optimization|bilevel optimization]] problem: $$\begin{align} \omega^* &=\underset{\omega}{\arg\min}\sum^M_{i=1}\mathcal L^\text{meta}(\theta^{*\ (i)}(\omega),\omega,\mathcal D^{\text{val}\ (i)}_\text{source})\\ \text{s.t.}\quad \theta^{*\ (i)}(\omega) &= \underset{\theta}{\arg\min}\ \mathcal L^\text{task}(\theta,\omega,\mathcal D^{\text{train}\ (i)}_\text{source}) \end{align}$$ where $\mathcal L^\text{meta}$ and $\mathcal L^\text{task}$ refer to the outer and inner objectives respectively.
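A first-order MAML-style sketch of this bilevel loop (the second-order terms of the exact meta-gradient are dropped for brevity; the tasks and step sizes are illustrative assumptions): the inner step minimizes $\mathcal L^\text{task}$ on $\mathcal D^\text{train}$, and the outer step updates $\omega$ with the gradient of $\mathcal L^\text{meta}$ on $\mathcal D^\text{val}$ at the adapted parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """One task: (D_train, D_val) drawn from a random linear model."""
    w = rng.normal(size=5)
    def draw(n):
        X = rng.normal(size=(n, 5))
        return X, X @ w + 0.1 * rng.normal(size=n)
    return draw(10), draw(10)

def grad(theta, X, y):
    return 2 * X.T @ (X @ theta - y) / len(y)       # gradient of squared error

omega = np.zeros(5)
for _ in range(200):
    (Xtr, ytr), (Xva, yva) = sample_task()
    theta_i = omega - 0.05 * grad(omega, Xtr, ytr)  # inner: L^task on D_train
    omega = omega - 0.01 * grad(theta_i, Xva, yva)  # outer: L^meta on D_val
```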

### Feed-Forward Model View

To understand this family of approaches, it is instructive to instantiate the abstract objective of the [[Meta Learning#Task-Distributed View|task-distributed view]] as a toy linear-regression meta-training example: $$\min_\omega\underset{\underset{(\mathcal D^\text{tr},\mathcal D^\text{val})\in \mathcal T}{\mathcal T\sim p(\mathcal T)}}{\mathbb E}\sum_{(x,y)\in \mathcal D^\text{val}}\left[(x^Tg_\omega(\mathcal D^\text{tr})-y)^2\right]$$ Here we meta-train by optimizing over a distribution of tasks. For each task a train and a validation set are drawn. The train set $\mathcal D^\text{tr}$ is embedded into a vector $g_\omega(\mathcal D^\text{tr})$ that directly defines the linear regression weights, so predictions on the validation set are made in a single feed-forward pass with no inner optimization.
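A minimal sketch of that amortized forward pass, assuming a deliberately simple $g_\omega$ that pools per-example statistics of $\mathcal D^\text{tr}$ through a learned matrix (an illustrative choice; real instantiations use richer set encoders):

```python
import numpy as np

rng = np.random.default_rng(0)
Omega = rng.normal(size=(5, 5)) * 0.1           # omega: parameters of g_omega

def g_omega(Xtr, ytr):
    """Embed the whole train set into one vector of regression weights."""
    return Omega @ (Xtr * ytr[:, None]).mean(axis=0)

w = rng.normal(size=5)                          # ground-truth task weights
Xtr = rng.normal(size=(10, 5)); ytr = Xtr @ w
Xva = rng.normal(size=(10, 5)); yva = Xva @ w

weights = g_omega(Xtr, ytr)                     # no inner optimization at all
val_loss = np.mean((Xva @ weights - yva) ** 2)  # one term of the meta objective
```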

![[Pasted image 20230711112510.png]]

## Metric-Based Approaches

The core idea of metric-based approaches is to compare two samples in a latent (metric) space: in this space, samples of the same class are supposed to be close to each other, while two samples from different classes are supposed to have a large distance.
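A prototypical-network-style sketch of this idea, in which class prototypes are mean embeddings and a query is assigned to its nearest prototype; the identity `embed` is a stand-in for a meta-learned embedding network:

```python
import numpy as np

def embed(x):
    """Placeholder for a meta-learned embedding network f_omega."""
    return x

def classify(query, support_x, support_y):
    classes = np.unique(support_y)
    protos = np.stack([embed(support_x[support_y == c]).mean(axis=0)
                       for c in classes])        # one prototype per class
    dists = np.linalg.norm(protos - embed(query), axis=1)
    return classes[np.argmin(dists)]             # nearest prototype wins

# Example: two classes separated along the first feature.
rng = np.random.default_rng(0)
sx = rng.normal(size=(10, 3)); sx[5:, 0] += 5.0
sy = np.array([0] * 5 + [1] * 5)
label = classify(np.array([4.8, 0.0, 0.0]), sx, sy)   # -> 1
```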

## Model-Based Approaches

Model-based approaches are neural architectures that are deliberately designed for fast adaptation to new tasks without an inclination to overfit.
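A minimal sketch of this idea, where an RNN-like cell ingests the support pairs so its hidden state acts as fast task memory; the random weights are placeholders for what meta-training would learn:

```python
import numpy as np

rng = np.random.default_rng(0)
Wh = rng.normal(size=(8, 8)) * 0.1    # recurrent weights (stand-in for meta-trained)
Wx = rng.normal(size=(8, 6)) * 0.1    # input weights for an (x, y) pair, x in R^5
Wo = rng.normal(size=(1, 13)) * 0.1   # readout over [hidden state, query]

def predict(support, query):
    h = np.zeros(8)
    for x, y in support:                       # adaptation is just a forward pass
        h = np.tanh(Wh @ h + Wx @ np.append(x, y))
    return Wo @ np.concatenate([h, query])     # condition output on task memory

support = [(rng.normal(size=5), 1.0) for _ in range(4)]
y_hat = predict(support, rng.normal(size=5))
```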

## Optimization-Based Approaches

Optimization-based approaches treat the inner task as an optimization problem and focus on learning the meta-knowledge $\omega$ needed to improve that optimization, e.g. a parameter initialization, an update rule, or a learning rate.


## Reference