Adds experiment notebooks and the tooling they require. The next set of experiments will need another PR.

# experiments notebooks

## `direct_logit_attribution.ipynb`

- multiple well-defined tasks in `LOGIT_ATTRIB_TASKS`
- viewing the distribution of logits on correct/incorrect tokens via `plot_logits()`
- comparison of several ways of computing logit differences via `logits_diff_multi()` (this may still be broken)
- logit lens, showing the logit diff & attribution at each layer, via `plot_logit_lens()`
- direct logit attribution for heads and weights via `plot_direct_logit_attribution()` (a sketch of the underlying computation is in the appendix at the end of this message)
- inspection of weights via `plot_important_neurons()`
- plotting of attention in multiple ways, both across tokens and across the actual maze, via `plot_attention_final_token()`

## `residual_stream_decoding.ipynb`

- interactive PCA of embeddings via `compute_pca()` and `plot_pca_colored()`
- `compute_distances_and_correlation()`
- `plot_distances_matrix()` for the raw matrix of token distances
- `compute_grid_distances()` and `plot_distance_grid()` for a plot showing which tokens a coordinate is similar to on the actual grid
- `plot_distance_correlation()` for distance between embeddings as a function of distance between coordinates (see the sketch in the appendix at the end of this message)

# Other fixes and edits

- notebooks:
  - removed the hallway notebook, added that config to the regular training notebook
  - various smaller edits to notebooks
- deps/versioning:
  - changed the `checks.yml` action to handle multiple python/pytorch versions
  - added `seaborn`, `scipy`, and `scikit-learn` to deps, since the new experiment notebooks use them
  - `maze-dataset` is now installed from PyPI rather than from git
- experiments:
  - added zanj files of an example dataset and the hallway model
  - added the `mechinterp` module, which holds most of the logic used by the experiment notebooks
- housekeeping:
  - shortcut properties (`d_model`, `d_head`, `n_layers`, `n_heads`) on `ConfigHolder` that read from the underlying `BaseGPTConfig`
  - added `maze_transformer.utils.dict_shapes` for quick info about the tensor shapes in a model's state dict (or an activation cache) -- a rough sketch is in the appendix at the end of this message
  - fixed `ZanjHookedTransformer` not inheriting correctly from `ConfiguredModel`

# raw commit history

* small todo
* WIP notebooks
* data and trained model for hallway
* removing duplicate notebook
* update dataset demo nb
* update maze-dataset dep to be from pypi
* fixed up dataset cfg refs
* fixed gen_dfs kwarg name change
* wip direct logit attribution refactor
* plotting logits
* logit plots
* moved plot logits out
* wip
* direct logit attribution logit diff wip: i have no idea why, but the scaling is wrong. The relationship is clearly linear, but it's different depending on whether it's the diff to all other tokens or to some random set of tokens
* wip
* direct logit attribution
* split up notebook
* minor fixes
* format, manually fix some imports
* added a few different tasks to logit attribution
* moved plot_attention.py
* updated path start notebook, still has activation patching
* ???
* git add .git add .!
* wip
* bumped maze-dataset dep
* format
* fixed paths in eval_model notebook
* run eval_model, rename wandb conversion nb
* wip
* better maze plotting
* dla report automation, more flexible functions
* fixed bug in maze attention plots, improved logit plots
* wip
* wip
* wip???
* dla for neurons
* viewing weights
* wip
* run dla notebook
* wip
* exporting embeddings vis
* PCA of embeddings
* wip
* add pca result options
* Exported embeddings PCA functionality
* log scale for PCA option
* fixing up some tests -- they should all pass
* wip distance correlations
* wip residual stream structure
* exporting functions wip
* refactoring, added correlation info
* wip
* distance plot
* wip
* embedding similarity refactor
* more refactor, cleanup
* format
* update maze-dataset to 0.4.1 (out soon)
* update deps
* added seaborn to deps
* workflow for multiple python/torch versions
* workflow for multiple python/torch versions
* error in github actions
* oops
* update poetry and torch_cpu source
* im dumb
* wip fixing test
* fixed tokenizer padding issue in tests
* added scipy dep, specified python<3.10
* added scikit-learn dep
* format
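# appendix: sketches referenced above

The logit-lens and direct-logit-attribution plots boil down to projecting intermediate residual-stream states onto the unembedding and reading off a logit difference. Below is a minimal sketch of that computation using the generic `transformer_lens` API; the model (`gpt2`), prompt, and answer tokens are purely illustrative stand-ins, not the maze model or the notebook's actual `plot_logit_lens()` / `plot_direct_logit_attribution()` interfaces.

```python
import torch
from transformer_lens import HookedTransformer

torch.set_grad_enabled(False)

# illustrative model/prompt; the real notebooks use a maze-solving ZanjHookedTransformer
model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The maze solver turned")
logits, cache = model.run_with_cache(tokens)

# accumulated residual stream after each layer, at the final position,
# with the final layer norm applied so it can be read off against W_U
resid_stack, labels = cache.accumulated_resid(
    layer=-1, apply_ln=True, pos_slice=-1, return_labels=True
)  # shape: (n_layers + 1, batch, d_model)

correct_token = model.to_single_token(" left")   # hypothetical "correct" answer token
wrong_token = model.to_single_token(" right")    # hypothetical comparison token

# logit difference contributed by the residual stream at each layer ("logit lens")
logit_dir = model.W_U[:, correct_token] - model.W_U[:, wrong_token]  # (d_model,)
per_layer_logit_diff = resid_stack[:, 0] @ logit_dir                 # (n_layers + 1,)

for label, value in zip(labels, per_layer_logit_diff.tolist()):
    print(f"{label}: {value:+.3f}")
```

The same idea, applied to per-head or per-neuron output stacks instead of the accumulated residual stream, gives direct logit attribution.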
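The PCA and distance-correlation plots in `residual_stream_decoding.ipynb` rest on standard `scikit-learn`/`scipy` primitives (hence the new deps). A rough sketch of the two quantities involved, using synthetic stand-in data; the actual `compute_pca()` and `plot_distance_correlation()` signatures live in the `mechinterp` module and may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

# embeddings: (n_coord_tokens, d_model) rows of the embedding matrix for coordinate tokens
# coords:     (n_coord_tokens, 2) grid position (row, col) of each coordinate token
# both are synthetic stand-ins for what the notebook extracts from the trained model
rng = np.random.default_rng(0)
coords = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
embeddings = rng.normal(size=(len(coords), 32))
embeddings[:, :2] += 0.5 * coords  # inject some coordinate structure

# 2-component PCA of the coordinate-token embeddings (roughly what compute_pca() does)
pca = PCA(n_components=2)
projected = pca.fit_transform(embeddings)
print("explained variance ratios:", pca.explained_variance_ratio_)

# correlation between embedding-space distance and grid distance
# (the quantity plot_distance_correlation() visualises)
emb_dist = pdist(embeddings)   # condensed pairwise distances between embeddings
grid_dist = pdist(coords)      # condensed pairwise distances between grid coordinates
r, p = pearsonr(emb_dist, grid_dist)
print(f"pearson r = {r:.3f} (p = {p:.2g})")
```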
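The `dict_shapes` utility is just a convenience for eyeballing the tensor shapes in a state dict or activation cache; the sketch below shows the general shape of such a helper, but the real implementation in `maze_transformer.utils.dict_shapes` may differ.

```python
from typing import Mapping

import torch


def dict_shapes(d: Mapping[str, torch.Tensor]) -> dict[str, tuple[int, ...]]:
    """map each key of a state dict (or activation cache) to the shape of its tensor"""
    return {key: tuple(value.shape) for key, value in d.items()}


# example usage on a tiny model's state dict
model = torch.nn.Linear(4, 2)
for name, shape in dict_shapes(model.state_dict()).items():
    print(f"{name}: {shape}")
```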