DARPA_L2M_PI_2022.html

<!DOCTYPE html>
<html>


<head>
  <title>Learning</title>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <link rel="stylesheet" href="fonts/quadon/quadon.css">
  <link rel="stylesheet" href="fonts/gentona/gentona.css">
  <!-- <link rel="stylesheet" href="slides_style_v2.css"> -->
  <link rel="stylesheet" href="slides_style_i.css">
  <script type="text/javascript" src="assets/plotly/plotly-latest.min.js"></script>

</head>

<body>
  <textarea id="source">


# Lifelong Learning:<br>What is it anyway?
<br>

PI: Joshua T. Vogelstein, [JHU](https://www.jhu.edu/) <br>
Co-PI: Vova Braverman, [JHU](https://www.jhu.edu/) <br>

Jayanta Dey, Ali Geisa, Will LeVine, Hayden Helm, Ronak Mehta, Haoran Li, Aditya Krishnan, Rahul Ramesh, Pratik Chaudhari, Ashwin De Silva,   Jingfeng Wu,
Carey E. Priebe
<!-- | Joshua T. Vogelstein <br> -->
<!-- [Microsoft Research](https://www.microsoft.com/en-us/research/): Weiwei Yang | Jonathan Larson | Bryan Tower | Chris White -->


<img src="images/neurodata_blue.png" width="20%" style="vertical-align: top">
<img src="images/jhu.png" width="8%" style="vertical-align: top">

---

<img src="images/L2M_hava.png" width="1000">


---

## Formalizing Learning

- Assume  $(X\_i,Y\_i) \sim^{iid} P, \quad  i \in [n]$ 
- Let $f((X_i,Y_i)^n) \rightarrow h_n$  
- Let R (risk) be expected loss
- Let $\delta, \epsilon > 0$
- a learner  .ye[weakly] learns if $\, \exists N$ s.t. $\forall n > N$

$$Pr[ R(h\_n) < R\_{chance} ] \geq 1 - \delta$$

- a learner .ye[strongly] learns if $\, \exists N$ s.t. $\forall n > N$

$$Pr[ R(h\_n) - R^* <  \epsilon ] \geq 1 - \delta$$

- strong learner thm: if $f$ weakly learns, it also strongly learns
- implication: if your AI is doing ok, just get more data!

---

## So learning is solved?

- No. 
- In lifelong learning, the samples are not all from the same distribution. 
- Consider the simplest possible generalization:
    - we have 2 datasets from two different distributions
- We want to know whether we learn about one from the other
- This is called '.ye[out of distribution]' learning 
- Lifelong learning is a special (sequential) case

---

## Formalizing OOD Learning 

- Assume  $(X\_i,Y\_i) \sim^{iid} P, \quad  i \in [n]$ 
- Assume  $(X\_i,Y\_i) \sim^{iid} Q, \quad  i \in ${$n+1, \ldots, n+m$} 
- Let $f((X\_i,Y\_i)^{n+m}) \rightarrow h\_{n,m}$  
- a learner  .ye[weakly] learns about $P$ from $Q$  if  $\, \exists M$ s.t. $\forall m > M$

$$Pr[ R(h\_{n,m}) < R(h\_n) ] \geq 1 - \delta$$

- a learner .ye[strongly] learns about $P$ from $Q$ if $\, \exists M$ s.t. $\forall m > M$

$$Pr[ R(h\_{n,m}) - R^* <  \epsilon ] \geq 1 - \delta$$

- strong OOD learner thm: if $f$ weakly learns, it does not necessarily strongly learn
- implications: more data does not necessarily help!

.footnote[.small[Geisa, et al., Towards a theory of out-of-distribution learning. 2021]]

---

### A generalized definition of learning 

- Let $S = (X\_i,Y\_i) \sim^{iid} P, \, \forall i \in [n]$
- Let $S'  = (X\_i,Y\_i) \sim^{iid} Q, \, \forall i \in ${$n+1,\ldots, n+m$} 
- Let $\mathcal{E}^t_f(S)$ := expected risk for task $t$ of learner $f$ using data $S$ 
- Let .ye[Learning Efficiency] be defined:

$$LE^t_f(S,S') = \frac{ \mathcal{E}^t_f(S)}{\mathcal{E}^t_f(S')}$$


- $f$ .ye[learns] about $P$ from $Q$ whenever $\log LE^t_f(S,S') >0$
- $f$ .ye[forgets] about $P$ from $Q$ whenever $\log LE^t_f(S,S') < 0$
- .ye[catastrophic forgetting]: $LE \ll 0$
- forward transfer,  backward transfer, performance relative to STE, etc. are special cases

.footnote[.small[Vogelstein, et al. arxiv, 2020]]

---

### Fundamental Question of OOD Learning 

- Assume  $S = (X\_i,Y\_i) \sim^{iid} P, \quad  i \in [n]$ 
- Assume  $T = (X\_i,Y\_i) \sim^{iid} Q, \quad  i \in ${$n+1, \ldots, n+m$} 
- Assume $S \perp T$ 
 <!-- the $n$ samples from $P$ and the $m$ samples from $Q$ are independent of each other  -->
- What are the conditions on $P, Q, n, m$ such that $f$ can learn about $P$ from $Q$ (i.e., achieve $\log LE > 0$)?


We hope that for a given $P, Q, n$, there exists an $M$ such that if $m > M$, then data from $Q$ makes $f$ learn about (rather than forget about) $P$

---

## The simplest example 

- $P$ is two gaussians
- $Q$ is two gaussians shifted by $\Delta$

<img src="https://github.com/neurodata/ood-tl/blob/main/reports/figures/gausstask_fig.png?raw=true" width="700">

<!-- ![:scale 100%](https://github.com/neurodata/ood-tl/blob/main/reports/figures/gausstask_fig.png?raw=true) -->

---

## A terrible thing 

<img src="https://github.com/neurodata/ood-tl/blob/main/reports/figures/gaussian_task_analytical_plot.svg?raw=true" width="1000">

- Sometimes, for a fixed $\Delta$, error is non-monotonic wrt $m$
- implications: just because a little OOD data helps, does not mean that more will help!


.footnote[.small[de Silva, et al. Non-monotonic Out-of-Distribution Risk, in prep]]

---

## Not just in simulated data 

- CIFAR 10
- Task 1: Bird vs. Cat 
- Task 2: Deer vs. Dog

<img src="https://github.com/neurodata/ood-tl/blob/main/reports/figures/bridcat_deerdog.svg?raw=true" width="1000">


---

## Summary so far 

- OOD learning is different in kind from in-distribution learning 
- More data is not necessarily adequate to get arbitrary performance
- More data can actually hurt 
- Time to think carefully about how much data of different kinds to use, and how to combine datasets, rather than throwing everything in a bucket


---

### 

<img src="images/S1.png" width="1200">
---
### 

<img src="images/S2.png" width="1200">

---
### 

<img src="images/S3.png" width="1200">


---

.small[
### Publications


1. A. Geisa et al. [Towards a theory of out-of-distribution learning](https://arxiv.org/abs/2109.14501), arXiv, 2021.
1. J. T. Vogelstein et al. [Representation Ensembling for Synergistic Lifelong Learning with Quasilinear Complexity](https://arxiv.org/abs/2004.12908), arXiv, 2022.
1. H. Xu et al. [Simplest Streaming Trees](https://arxiv.org/abs/2110.08483), arXiv, 2022.
1. J. Dey et al. [Out-of-distribution and in-distribution posterior calibration using Kernel Density Polytopes](https://arxiv.org/abs/2201.13001), arXiv, 2022.
1. C. E. Priebe et al. [Modern Machine Learning: Partition and Vote](https://doi.org/10.1101/2020.04.29.068460), 2020.
1. R. Guo, et al. [Estimating Information-Theoretic Quantities with Uncertainty Forests](https://arxiv.org/abs/1907.00325). arXiv, 2019.
1. R. Perry, et al. [Manifold Forests: Closing the Gap on Neural Networks](https://openreview.net/forum?id=B1xewR4KvH). arXiv, 2019.
1. C. Shen and J. T. Vogelstein. [Decision Forests Induce Characteristic Kernels](https://arxiv.org/abs/1812.00029). arXiv, 2019.
1. M. Madhya, et al. [Geodesic Learning via Unsupervised Decision Forests](https://arxiv.org/abs/1907.02844). arXiv, 2019.
1. M. Madhya, et al. [PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree Ensemble Deployment](https://arxiv.org/abs/2011.05383). arXiv, 2020.
<!-- ### Conferences -->
1. J. T. Vogelstein et al. A biological implementation of lifelong learning in the pursuit of artificial general intelligence.  NAISys, 2020.
2. B. Pedigo et al.  A quantitative comparison of a complete connectome to artificial intelligence architectures. NAISys, 2020.
3. L. Haoran, et al. [Lifelong Learning with Sketched Structural Regularization](https://arxiv.org/abs/2104.08604). ACML, 2021.

]


---
#### Acknowledgements


<!-- <div class="small-container">
  <img src="faces/ebridge.jpg"/>
  <div class="centered">Eric Bridgeford</div>
</div>

<div class="small-container">
  <img src="faces/pedigo.jpg"/>
  <div class="centered">Ben Pedigo</div>
</div>

<div class="small-container">
  <img src="faces/jaewon.jpg"/>
  <div class="centered">Jaewon Chung</div>
</div> -->


<div class="small-container">
  <img src="faces/yummy.jpg"/>
  <div class="centered">yummy</div>
</div>

<div class="small-container">
  <img src="faces/lion.jpg"/>
  <div class="centered">lion</div>
</div>

<div class="small-container">
  <img src="faces/violet.jpg"/>
  <div class="centered">baby girl</div>
</div>

<div class="small-container">
  <img src="faces/family.jpg"/>
  <div class="centered">family</div>
</div>

<div class="small-container">
  <img src="faces/earth.jpg"/>
  <div class="centered">earth</div>
</div>


<div class="small-container">
  <img src="faces/milkyway.jpg"/>
  <div class="centered">milkyway</div>
</div>


##### JHU

<div class="small-container">
  <img src="faces/cep.png"/>
  <div class="centered">Carey Priebe</div>
</div>

<div class="small-container">
  <img src="faces/vova_lab/Vova.JPG"/>
  <div class="centered">Vova Braverman</div>
</div>

<!-- <div class="small-container">
  <img src="faces/randal.jpg"/>
  <div class="centered">Randal Burns</div>
</div> -->


<!-- <div class="small-container">
  <img src="faces/cshen.jpg"/>
  <div class="centered">Cencheng Shen</div>
</div> -->


<!-- <div class="small-container">
  <img src="faces/bruce_rosen.jpg"/>
  <div class="centered">Bruce Rosen</div>
</div>


<div class="small-container">
  <img src="faces/kent.jpg"/>
  <div class="centered">Kent Kiehl</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/mim.jpg"/>
  <div class="centered">Michael Miller</div>
</div>

<div class="small-container">
  <img src="faces/dtward.jpg"/>
  <div class="centered">Daniel Tward</div>
</div> -->


<!-- <div class="small-container">
  <img src="faces/vikram.jpg"/>
  <div class="centered">Vikram Chandrashekhar</div>
</div>


<div class="small-container">
  <img src="faces/drishti.jpg"/>
  <div class="centered">Drishti Mannan</div>
</div> -->

<div class="small-container">
  <img src="faces/jesse.jpg"/>
  <div class="centered">Jesse Patsolic</div>
</div>

<!-- <div class="small-container">
  <img src="faces/falk_ben.jpg"/>
  <div class="centered">Benjamin Falk</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/kwame.jpg"/>
  <div class="centered">Kwame Kutten</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/perlman.jpg"/>
  <div class="centered">Eric Perlman</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/loftus.jpg"/>
  <div class="centered">Alex Loftus</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/bcaffo.jpg"/>
  <div class="centered">Brian Caffo</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/minh.jpg"/>
  <div class="centered">Minh Tang</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/avanti.jpg"/>
  <div class="centered">Avanti Athreya</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/vince.jpg"/>
  <div class="centered">Vince Lyzinski</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/dpmcsuss.jpg"/>
  <div class="centered">Daniel Sussman</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/youngser.jpg"/>
  <div class="centered">Youngser Park</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/shangsi.jpg"/>
  <div class="centered">Shangsi Wang</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/tyler.jpg"/>
  <div class="centered">Tyler Tomita</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/james.jpg"/>
  <div class="centered">James Brown</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/disa.jpg"/>
  <div class="centered">Disa Mhembere</div>
</div> -->

<!-- <div class="small-container">
  <img src="faces/gkiar.jpg"/>
  <div class="centered">Greg Kiar</div>
</div> -->


<!-- <div class="small-container">
  <img src="faces/jeremias.png"/>
  <div class="centered">Jeremias Sulam</div>
</div> -->


<div class="small-container">
  <img src="faces/meghana.png"/>
  <div class="centered">Meghana Madhya</div>
</div>


<!-- <div class="small-container">
  <img src="faces/percy.png"/>
  <div class="centered">Percy Li</div>
</div>
-->

<div class="small-container">
  <img src="faces/hayden.png"/>
  <div class="centered">Hayden Helm</div>
</div>


<div class="small-container">
  <img src="faces/rguo.jpg"/>
  <div class="centered">Richard Gou</div>
</div>

<div class="small-container">
  <img src="faces/ronak.jpg"/>
  <div class="centered">Ronak Mehta</div>
</div>

<div class="small-container">
  <img src="faces/jayanta.jpg"/>
  <div class="centered">Jayanta Dey</div>
</div>

<div class="small-container">
  <img src="faces/will.jpg"/>
  <div class="centered">Will LeVine</div>
</div>

<div class="small-container">
  <img src="faces/hao.jpg"/>
  <div class="centered">Haoyin Xu</div>
</div>

<div class="small-container">
  <img src="faces/nhahn.jpg"/>
  <div class="centered">Nick Hahn</div>
</div>

<div class="small-container">
  <img src="faces/vova_lab/Haoran.JPG"/>
  <div class="centered">Haoran Li</div>
</div>

<div class="small-container">
  <img src="faces/vova_lab/Aditya.jpeg"/>
  <div class="centered">Aditya Krishnan</div>
</div>

<div class="small-container">
  <img src="faces/vova_lab/Jingfeng.jpg"/>
  <div class="centered">Jingfeng Wu</div>
</div>

##### HRL & Microsoft Research
<div class="small-container">
  <img src="faces/vova_lab/Soheil.PNG"/>
  <div class="centered">Soheil Kolouri</div>
</div>

<div class="small-container">
  <img src="faces/vova_lab/Praveen.JPG"/>
  <div class="centered">Praveen K. Pilly</div>
</div>

<!-- ##### Microsoft Research -->

<div class="small-container">
  <img src="faces/chwh-180x180.jpg"/>
  <div class="centered">Chris White</div>
</div>


<div class="small-container">
  <img src="faces/weiwei.jpg"/>
  <div class="centered">Weiwei Yang</div>
</div>

<div class="small-container">
  <img src="faces/jolarso150px.png"/>
  <div class="centered">Jonathan Larson</div>
</div>

<div class="small-container">
  <img src="faces/brtower-180x180.jpg"/>
  <div class="centered">Bryan Tower</div>
</div>


##### DARPA L2M
<!-- Hava, Ben, Robert, Jennifer, Ted. -->

<!-- {[BME](https://www.bme.jhu.edu/),[CIS](http://cis.jhu.edu/), [ICM](https://icm.jhu.edu/), [KNDI](http://kavlijhu.org/)}@[JHU](https://www.jhu.edu/) | [neurodata](https://neurodata.io) -->
<!-- <br> -->
<!-- [jovo&#0064;jhu.edu](mailto:j1c@jhu.edu) | <http://neurodata.io/talks> | [@neuro_data](https://twitter.com/neuro_data) -->


</div>
<!-- <img src="images/funding/nsf_fpo.png" STYLE="HEIGHT:95px;"/> -->
<!-- <img src="images/funding/nih_fpo.png" STYLE="HEIGHT:95px;"/> -->
<!-- <img src="images/funding/darpa_fpo.png" STYLE=" HEIGHT:95px;"/> -->
<!-- <img src="images/funding/iarpa_fpo.jpg" STYLE="HEIGHT:95px;"/> -->
<!-- <img src="images/funding/KAVLI.jpg" STYLE="HEIGHT:95px;"/> -->
<!-- <img src="images/funding/schmidt.jpg" STYLE="HEIGHT:95px;"/> -->

---
background-image: url(images/l_and_v.jpeg)

.footnote[Questions?]

---
class: middle
# .center[Appendix]


---
### Biological learning is on top

<img src="images/learning-table.png" width="1000">

---
### Spoken Digit dataset
.pull-left[
- *Spoken Digit* contains recording from 6 different speakers.
- Each digit has 50 recordings (3000 total recordings).
- For each recording spectrogram was extracted using using Hanning windows of duration 16 ms with an overlap of 4 ms.
-  The spectrograms were resized down to 28×28.
]
.pull-right[
<img src="images/spectrogram.png" style="left:500px; width:400px;"/>
]
---
### Synergistic Algorithms on Spoken Digit Task
<img src="images/spoken_digit.png" width="1000">
<!-- ![:scale 105%]() -->

</textarea>
  <!-- <script src="https://gnab.github.io/remark/downloads/remark-latest.min.js"></script> -->
  <!-- <script src="remark-latest.min.js"></script> -->
  <script src="remark-latest.min.js"></script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.5.1/katex.min.js"></script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.5.1/contrib/auto-render.min.js"></script>
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.5.1/katex.min.css">
  <script type="text/javascript">

    var options = {};
    var renderMath = function () {
      renderMathInElement(document.body);
      // or if you want to use $...$ for math,
      renderMathInElement(document.body, {
        delimiters: [ // mind the order of delimiters(!?)
          { left: "$$", right: "$$", display: true },
          { left: "$", right: "$", display: false },
          { left: "\\[", right: "\\]", display: true },
          { left: "\\(", right: "\\)", display: false },
        ]
      });
    }

    // remark.macros.scale = function (percentage) {
    //   var url = this;
    //   return '<img src="' + url + '" style="width: ' + percentage + '" />';
    // };

    // var slideshow = remark.create({
    // Set the slideshow display ratio
    // Default: '4:3'
    // Alternatives: '16:9', ...
    // {
    // ratio: '16:9',
    // });

    var slideshow = remark.create(options, renderMath);


  </script>
</body>

</html>