Adds experiment notebooks and the tooling they require. The next set of experiments will need another PR.

# experiments notebooks

## `direct_logit_attribution.ipynb`

- multiple well-defined tasks in `LOGIT_ATTRIB_TASKS`
- viewing the distribution of logits on correct/incorrect tokens via `plot_logits()`
- comparison of several ways of computing logit differences via `logits_diff_multi()` (this may still be broken)
- logit lens, showing the logit diff & attribution at each layer, via `plot_logit_lens()`
- direct logit attribution for heads and weights via `plot_direct_logit_attribution()` (a sketch of the underlying computation is in the appendix at the end of this message)
- inspection of weights via `plot_important_neurons()`
- plotting of attention in multiple ways, both across tokens and across the actual maze, via `plot_attention_final_token()`

## `residual_stream_decoding.ipynb`

- interactive PCA of embeddings via `compute_pca()` and `plot_pca_colored()`
- `compute_distances_and_correlation()`
- `plot_distances_matrix()` for the raw matrix of token distances
- `compute_grid_distances()` and `plot_distance_grid()` for a plot showing which tokens a coordinate is similar to on the actual grid
- `plot_distance_correlation()` for distance between embeddings as a function of distance between coordinates (see the sketch in the appendix at the end of this message)

# Other fixes and edits

- notebooks:
  - removed the hallway notebook, added that config to the regular training notebook
  - various smaller edits to notebooks
- deps/versioning:
  - changed the `checks.yml` action to handle multiple python/pytorch versions
  - added `seaborn`, `scipy`, and `scikit-learn` to deps, since the new experiment notebooks use them
  - `maze-dataset` is now installed from PyPI rather than from git
- experiments:
  - added zanj files of an example dataset and the hallway model
  - added the `mechinterp` module, which holds most of the logic used by the experiment notebooks
- housekeeping:
  - shortcut properties (`d_model`, `d_head`, `n_layers`, `n_heads`) on `ConfigHolder` that read from the underlying `BaseGPTConfig`
  - added `maze_transformer.utils.dict_shapes` for quick info about the tensor shapes in a model's state dict (or an activation cache) -- a rough sketch is in the appendix at the end of this message
  - fixed `ZanjHookedTransformer` not inheriting correctly from `ConfiguredModel`

# raw commit history

* small todo
* WIP notebooks
* data and trained model for hallway
* removing duplicate notebook
* update dataset demo nb
* update maze-dataset dep to be from pypi
* fixed up dataset cfg refs
* fixed gen_dfs kwarg name change
* wip direct logit attribution refactor
* plotting logits
* logit plots
* moved plot logits out
* wip
* direct logit attribution logit diff wip: i have no idea why, but the scaling is wrong. The relationship is clearly linear, but it's different depending on whether it's the diff to all other tokens or to some random set of tokens
* wip
* direct logit attribution
* split up notebook
* minor fixes
* format, manually fix some imports
* added a few different tasks to logit attribution
* moved plot_attention.py
* updated path start notebook, still has activation patching
* ???
* git add .git add .!
* wip
* bumped maze-dataset dep
* format
* fixed paths in eval_model notebook
* run eval_model, rename wandb conversion nb
* wip
* better maze plotting
* dla report automation, more flexible functions
* fixed bug in maze attention plots, improved logit plots
* wip
* wip
* wip???
* dla for neurons
* viewing weights
* wip
* run dla notebook
* wip
* exporting embeddings vis
* PCA of embeddings
* wip
* add pca result options
* Exported embeddings PCA functionality
* log scale for PCA option
* fixing up some tests -- they should all pass
* wip distance correlations
* wip residual stream structure
* exporting functions wip
* refactoring, added correlation info
* wip
* distance plot
* wip
* embedding similarity refactor
* more refactor, cleanup
* format
* update maze-dataset to 0.4.1 (out soon)
* update deps
* added seaborn to deps
* workflow for multiple python/torch versions
* workflow for multiple python/torch versions
* error in github actions
* oops
* update poetry and torch_cpu source
* im dumb
* wip fixing test
* fixed tokenizer padding issue in tests
* added scipy dep, specified python<3.10
* added scikit-learn dep
* format
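# appendix: sketches referenced above

The logit-lens and direct-logit-attribution plots boil down to projecting intermediate residual-stream states onto the unembedding and reading off a logit difference. Below is a minimal sketch of that computation using the generic `transformer_lens` API; the model (`gpt2`), prompt, and answer tokens are purely illustrative stand-ins, not the maze model or the notebook's actual `plot_logit_lens()` / `plot_direct_logit_attribution()` interfaces.

```python
import torch
from transformer_lens import HookedTransformer

torch.set_grad_enabled(False)

# illustrative model/prompt; the real notebooks use a maze-solving ZanjHookedTransformer
model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The maze solver turned")
logits, cache = model.run_with_cache(tokens)

# accumulated residual stream after each layer, at the final position,
# with the final layer norm applied so it can be read off against W_U
resid_stack, labels = cache.accumulated_resid(
    layer=-1, apply_ln=True, pos_slice=-1, return_labels=True
)  # shape: (n_layers + 1, batch, d_model)

correct_token = model.to_single_token(" left")   # hypothetical "correct" answer token
wrong_token = model.to_single_token(" right")    # hypothetical comparison token

# logit difference contributed by the residual stream at each layer ("logit lens")
logit_dir = model.W_U[:, correct_token] - model.W_U[:, wrong_token]  # (d_model,)
per_layer_logit_diff = resid_stack[:, 0] @ logit_dir                 # (n_layers + 1,)

for label, value in zip(labels, per_layer_logit_diff.tolist()):
    print(f"{label}: {value:+.3f}")
```

The same idea, applied to per-head or per-neuron output stacks instead of the accumulated residual stream, gives direct logit attribution.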
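The PCA and distance-correlation plots in `residual_stream_decoding.ipynb` rest on standard `scikit-learn`/`scipy` primitives (hence the new deps). A rough sketch of the two quantities involved, using synthetic stand-in data; the actual `compute_pca()` and `plot_distance_correlation()` signatures live in the `mechinterp` module and may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

# embeddings: (n_coord_tokens, d_model) rows of the embedding matrix for coordinate tokens
# coords:     (n_coord_tokens, 2) grid position (row, col) of each coordinate token
# both are synthetic stand-ins for what the notebook extracts from the trained model
rng = np.random.default_rng(0)
coords = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
embeddings = rng.normal(size=(len(coords), 32))
embeddings[:, :2] += 0.5 * coords  # inject some coordinate structure

# 2-component PCA of the coordinate-token embeddings (roughly what compute_pca() does)
pca = PCA(n_components=2)
projected = pca.fit_transform(embeddings)
print("explained variance ratios:", pca.explained_variance_ratio_)

# correlation between embedding-space distance and grid distance
# (the quantity plot_distance_correlation() visualises)
emb_dist = pdist(embeddings)   # condensed pairwise distances between embeddings
grid_dist = pdist(coords)      # condensed pairwise distances between grid coordinates
r, p = pearsonr(emb_dist, grid_dist)
print(f"pearson r = {r:.3f} (p = {p:.2g})")
```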
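The `dict_shapes` utility is just a convenience for eyeballing the tensor shapes in a state dict or activation cache; the sketch below shows the general shape of such a helper, but the real implementation in `maze_transformer.utils.dict_shapes` may differ.

```python
from typing import Mapping

import torch


def dict_shapes(d: Mapping[str, torch.Tensor]) -> dict[str, tuple[int, ...]]:
    """map each key of a state dict (or activation cache) to the shape of its tensor"""
    return {key: tuple(value.shape) for key, value in d.items()}


# example usage on a tiny model's state dict
model = torch.nn.Linear(4, 2)
for name, shape in dict_shapes(model.state_dict()).items():
    print(f"{name}: {shape}")
```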