diff --git a/README.md b/README.md
index 079d1229..d3590abd 100644
--- a/README.md
+++ b/README.md
@@ -4,17 +4,17 @@
[![Main](https://travis-ci.com/AlexImmer/Laplace.svg?token=rpuRxEjQS6cCZi7ptL9y&branch=main)](https://travis-ci.com/AlexImmer/Laplace)
-The laplace package facilitates the application of Laplace approximations for entire neural networks or just their last layer.
+The laplace package facilitates the application of Laplace approximations for entire neural networks, subnetworks of neural networks, or just their last layer.
The package enables posterior approximations, marginal-likelihood estimation, and various posterior predictive computations.
The library documentation is available at [https://aleximmer.github.io/Laplace](https://aleximmer.github.io/Laplace).
There is also a corresponding paper, [*Laplace Redux — Effortless Bayesian Deep Learning*](https://arxiv.org/abs/2106.14806), which introduces the library, provides an introduction to the Laplace approximation, reviews its use in deep learning, and empirically demonstrates its versatility and competitiveness. Please consider referring to the paper when using our library:
```bibtex
-@article{daxberger2021laplace,
- title={Laplace Redux--Effortless Bayesian Deep Learning},
- author={Daxberger, Erik and Kristiadi, Agustinus and Immer, Alexander
- and Eschenhagen, Runa and Bauer, Matthias and Hennig, Philipp},
- journal={arXiv preprint arXiv:2106.14806},
+@inproceedings{laplace2021,
+ title={Laplace Redux--Effortless {B}ayesian Deep Learning},
+ author={Erik Daxberger and Agustinus Kristiadi and Alexander Immer
+ and Runa Eschenhagen and Matthias Bauer and Philipp Hennig},
+ booktitle={{N}eur{IPS}},
year={2021}
}
```
@@ -39,18 +39,24 @@ pytest tests/
## Structure
The laplace package consists of two main components:
-1. The subclasses of [`laplace.BaseLaplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/baselaplace.py) that implement different sparsity structures: different subsets of weights (`'all'` and `'last_layer'`) and different structures of the Hessian approximation (`'full'`, `'kron'`, and `'diag'`). This results in six currently available options: `laplace.FullLaplace`, `laplace.KronLaplace`, `laplace.DiagLaplace`, and the corresponding last-layer variations `laplace.FullLLLaplace`, `laplace.KronLLLaplace`, and `laplace.DiagLLLaplace`, which are all subclasses of [`laplace.LLLaplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/lllaplace.py). All of these can be conveniently accessed via the [`laplace.Laplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/laplace.py) function.
+1. The subclasses of [`laplace.BaseLaplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/baselaplace.py) that implement different sparsity structures: different subsets of weights (`'all'`, `'subnetwork'`, and `'last_layer'`) and different structures of the Hessian approximation (`'full'`, `'kron'`, `'lowrank'`, and `'diag'`). This results in _eight_ currently available options: `laplace.FullLaplace`, `laplace.KronLaplace`, `laplace.DiagLaplace`, the corresponding last-layer variations `laplace.FullLLLaplace`, `laplace.KronLLLaplace`, and `laplace.DiagLLLaplace` (which are all subclasses of [`laplace.LLLaplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/lllaplace.py)), [`laplace.SubnetLaplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/subnetlaplace.py) (which only supports a `'full'` Hessian approximation), and `laplace.LowRankLaplace` (which only supports inference over `'all'` weights). All of these can be conveniently accessed via the [`laplace.Laplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/laplace.py) function (see the sketch after this list).
2. The backends in [`laplace.curvature`](https://github.com/AlexImmer/Laplace/blob/main/laplace/curvature/) which provide access to Hessian approximations of
the corresponding sparsity structures, for example, the diagonal GGN.
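
A minimal sketch of how the `subset_of_weights` and `hessian_structure` arguments of `laplace.Laplace` map onto these classes (assuming a pre-trained `model`; the classification likelihood is used purely for illustration):

```python
from laplace import Laplace

# 'all' weights + 'full' Hessian -> laplace.FullLaplace
la_full = Laplace(model, 'classification',
                  subset_of_weights='all', hessian_structure='full')

# 'last_layer' weights + 'kron' Hessian -> laplace.KronLLLaplace (the library default)
la_kron_ll = Laplace(model, 'classification',
                     subset_of_weights='last_layer', hessian_structure='kron')

# 'all' weights + 'lowrank' Hessian -> laplace.LowRankLaplace
la_lowrank = Laplace(model, 'classification',
                     subset_of_weights='all', hessian_structure='lowrank')
```
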
Additionally, the package provides utilities for
-decomposing a neural network into feature extractor and last layer for `LLLaplace` subclasses ([`laplace.feature_extractor`](https://github.com/AlexImmer/Laplace/blob/main/laplace/feature_extractor.py))
+decomposing a neural network into feature extractor and last layer for `LLLaplace` subclasses ([`laplace.utils.feature_extractor`](https://github.com/AlexImmer/Laplace/blob/main/laplace/utils/feature_extractor.py))
and
-effectively dealing with Kronecker factors ([`laplace.matrix`](https://github.com/AlexImmer/Laplace/blob/main/laplace/matrix.py)).
+effectively dealing with Kronecker factors ([`laplace.utils.matrix`](https://github.com/AlexImmer/Laplace/blob/main/laplace/utils/matrix.py)).
+
+Finally, the package implements several options to select/specify a subnetwork for `SubnetLaplace` (as subclasses of [`laplace.utils.subnetmask.SubnetMask`](https://github.com/AlexImmer/Laplace/blob/main/laplace/utils/subnetmask.py)).
+Automatic subnetwork selection strategies include: uniformly at random (`laplace.utils.subnetmask.RandomSubnetMask`), by largest parameter magnitudes (`LargestMagnitudeSubnetMask`), and by largest marginal parameter variances (`LargestVarianceDiagLaplaceSubnetMask` and `LargestVarianceSWAGSubnetMask`).
+In addition, subnetworks can also be specified manually by listing the names of either the model parameters (`ParamNameSubnetMask`) or modules (`ModuleNameSubnetMask`) to perform Laplace inference over.
## Extendability
To extend the laplace package, new `BaseLaplace` subclasses can be designed, for example,
-a block-diagonal structure or subset-of-weights Laplace.
+Laplace with a block-diagonal Hessian structure.
+One can also implement custom subnetwork selection strategies as new subclasses of `SubnetMask`.
+
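A hedged sketch of such a custom strategy (assuming `SubnetMask` is importable from `laplace.utils` and that the base class stores the wrapped network as `self.model`); it only needs to return a binary mask over the vectorized model parameters from `get_subnet_mask`:

```python
from torch.nn.utils import parameters_to_vector
from laplace.utils import SubnetMask  # assumed import path

class LargeWeightSubnetMask(SubnetMask):
    """Illustrative mask: select all parameters whose magnitude exceeds a threshold."""
    def __init__(self, model, threshold=0.1):
        super().__init__(model)
        self.threshold = threshold

    def get_subnet_mask(self, train_loader):
        # Binary vector of size (n_params); 1s mark the subnetwork parameters
        # within torch.nn.utils.parameters_to_vector(model.parameters()).
        param_vector = parameters_to_vector(self.model.parameters()).detach()
        return (param_vector.abs() > self.threshold).float()
```
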
Alternatively, extending or integrating backends (subclasses of [`curvature.curvature`](https://github.com/AlexImmer/Laplace/blob/main/laplace/curvature/curvature.py)) allows to provide different Hessian
approximations to the Laplace approximations.
For example, currently the [`curvature.BackPackInterface`](https://github.com/AlexImmer/Laplace/blob/main/laplace/curvature/backpack.py) based on [BackPACK](https://github.com/f-dangel/backpack/) and [`curvature.AsdlInterface`](https://github.com/AlexImmer/Laplace/blob/main/laplace/curvature/asdl.py) based on [ASDL](https://github.com/kazukiosawa/asdfghjkl) are available.
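As a sketch of how a backend is selected (the `backend` keyword and the concrete class name `AsdlGGN` are assumptions on top of the interfaces listed above):

```python
from laplace import Laplace
from laplace.curvature import AsdlGGN  # assumed concrete GGN backend built on ASDL

la = Laplace(model, 'classification',
             subset_of_weights='last_layer', hessian_structure='kron',
             backend=AsdlGGN)
la.fit(train_loader)
```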
@@ -60,10 +66,11 @@ for a regression (MSELoss) loss function.
## Example usage
-### *Post-hoc* prior precision tuning of last-layer LA
+### *Post-hoc* prior precision tuning of diagonal LA
In the following example, a pre-trained model is loaded,
-then the Laplace approximation is fit to the training data,
+then the Laplace approximation is fit to the training data
+(using a diagonal Hessian approximation over all parameters),
and the prior precision is optimized with cross-validation `'CV'`.
After that, the resulting LA is used for prediction with
the `'probit'` predictive for classification.
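A self-contained sketch of the full flow just described (the prior-precision tuning call and its keyword names are assumptions):

```python
from laplace import Laplace

model = load_map_model()  # pre-trained model, as in the snippet below

la = Laplace(model, 'classification',
             subset_of_weights='all', hessian_structure='diag')
la.fit(train_loader)
la.optimize_prior_precision(method='CV', val_loader=val_loader)  # assumed API

pred = la(x, link_approx='probit')
```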
@@ -71,7 +78,7 @@ the `'probit'` predictive for classification.
```python
from laplace import Laplace
-# pre-trained model
+# Pre-trained model
model = load_map_model()
# User-specified LA flavor
@@ -87,7 +94,7 @@ pred = la(x, link_approx='probit')
### Differentiating the log marginal likelihood w.r.t. hyperparameters
-The marginal likelihood can be used for model selection and is differentiable
+The marginal likelihood can be used for model selection [10] and is differentiable
for continuous hyperparameters like the prior precision or observation noise.
Here, we fit the library default, KFAC last-layer LA and differentiate
the log marginal likelihood.
@@ -107,6 +114,45 @@ ml = la.log_marginal_likelihood(prior_prec, obs_noise)
ml.backward()
```
+### Applying the LA over only a subset of the model parameters
+
+This example shows how to fit the Laplace approximation over only
+a subnetwork within a neural network (while keeping all other parameters
+fixed at their MAP estimates), as proposed in [11]. It also exemplifies
+different ways to specify the subnetwork to perform inference over.
+
+```python
+from laplace import Laplace
+
+# Pre-trained model
+model = load_model()
+
+# Examples of different ways to specify the subnetwork
+# via indices of the vectorized model parameters
+#
+# Example 1: select the 128 parameters with the largest magnitude
+from laplace.utils import LargestMagnitudeSubnetMask
+subnetwork_mask = LargestMagnitudeSubnetMask(model, n_params_subnet=128)
+subnetwork_indices = subnetwork_mask.select()
+
+# Example 2: specify the layers that define the subnetwork
+from laplace.utils import ModuleNameSubnetMask
+subnetwork_mask = ModuleNameSubnetMask(model, module_names=['layer.1', 'layer.3'])
+subnetwork_mask.select()
+subnetwork_indices = subnetwork_mask.indices
+
+# Example 3: manually define the subnetwork via custom subnetwork indices
+import torch
+subnetwork_indices = torch.tensor([0, 4, 11, 42, 123, 2021])
+
+# Define and fit subnetwork LA using the specified subnetwork indices
+la = Laplace(model, 'classification',
+ subset_of_weights='subnetwork',
+ hessian_structure='full',
+ subnetwork_indices=subnetwork_indices)
+la.fit(train_loader)
+```
+
## Documentation
The documentation is available [here](https://aleximmer.github.io/Laplace) or can be generated and/or viewed locally:
@@ -122,7 +168,7 @@ pdoc --http 0.0.0.0:8080 laplace --template-dir template
## References
-This package relies on various improvements to the Laplace approximation for neural networks, which was originally due to MacKay [1].
+This package relies on various improvements to the Laplace approximation for neural networks, which was originally due to MacKay [1]. Please consider citing the respective papers if you use any of their proposed methods via our laplace library.
- [1] MacKay, DJC. [*A Practical Bayesian Framework for Backpropagation Networks*](https://authors.library.caltech.edu/13793/). Neural Computation 1992.
- [2] Gibbs, M. N. [*Bayesian Gaussian Processes for Regression and Classification*](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.147.1130&rep=rep1&type=pdf). PhD Thesis 1997.
@@ -132,4 +178,6 @@ This package relies on various improvements to the Laplace approximation for neu
- [6] Khan, M. E., Immer, A., Abedi, E., Korzepa, M. [*Approximate Inference Turns Deep Networks into Gaussian Processes*](https://arxiv.org/abs/1906.01930). NeurIPS 2019.
- [7] Kristiadi, A., Hein, M., Hennig, P. [*Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks*](https://arxiv.org/abs/2002.10118). ICML 2020.
- [8] Immer, A., Korzepa, M., Bauer, M. [*Improving predictions of Bayesian neural nets via local linearization*](https://arxiv.org/abs/2008.08400). AISTATS 2021.
-- [9] Immer, A., Bauer, M., Fortuin, V., Rätsch, G., Khan, EM. [*Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning*](https://arxiv.org/abs/2104.04975). ICML 2021.
+- [9] Sharma, A., Azizan, N., Pavone, M. [*Sketching Curvature for Efficient Out-of-Distribution Detection for Deep Neural Networks*](https://arxiv.org/abs/2102.12567). UAI 2021.
+- [10] Immer, A., Bauer, M., Fortuin, V., Rätsch, G., Khan, EM. [*Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning*](https://arxiv.org/abs/2104.04975). ICML 2021.
+- [11] Daxberger, E., Nalisnick, E., Allingham, JU., Antorán, J., Hernández-Lobato, JM. [*Bayesian Deep Learning via Subnetwork Inference*](https://arxiv.org/abs/2010.14689). ICML 2021.
\ No newline at end of file
diff --git a/docs/baselaplace.html b/docs/baselaplace.html
index fbaa0a07..ea13c62b 100644
--- a/docs/baselaplace.html
+++ b/docs/baselaplace.html
@@ -172,6 +172,253 @@
Subclasses need to specify how the Hessian approximation is initialized,
+how to add up curvature over training data, how to sample from the
+Laplace approximation, and how to compute the functional variance.
+
A Laplace approximation is represented by a MAP which is given by the
+model parameter and a posterior precision or covariance specifying
+a Gaussian distribution \mathcal{N}(\theta_{MAP}, P^{-1}).
+The goal of this class is to compute the posterior precision P
+which sums as
+
+P = \sum_{n=1}^N \nabla^2_\theta \log p(\mathcal{D}_n \mid \theta)
+\vert_{\theta_{MAP}} + \nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}}.
+
+Every subclass implements different approximations to the log likelihood Hessians,
+for example, a diagonal one. The prior is assumed to be Gaussian and therefore we have
+a simple form for \nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}} = P_0 .
+In particular, we assume a scalar, layer-wise, or diagonal prior precision so that in
+all cases P_0 = \textrm{diag}(p_0) and the structure of p_0 can be varied.
Computes the scatter, a term of the log marginal likelihood that
+corresponds to L-2 regularization:
+scatter = (\theta_{MAP} - \mu_0)^{T} P_0 (\theta_{MAP} - \mu_0) .
+
Returns
+
scatter : torch.Tensor
+
+
var log_det_prior_precision
+
+
Compute log determinant of the prior precision
+\log \det P_0
+
Returns
+
+
log_det : torch.Tensor
+
+
+
+
var log_det_posterior_precision
+
+
Compute log determinant of the posterior precision
+\log \det P which depends on the subclasses structure
+used for the Hessian approximation.
+
Returns
+
+
log_det : torch.Tensor
+
+
+
+
var log_det_ratio
+
+
Compute the log determinant ratio, a part of the log marginal likelihood.
+
+\log \frac{\det P}{\det P_0} = \log \det P - \log \det P_0
+
+
Returns
+
+
log_det_ratio : torch.Tensor
+
+
+
+
var posterior_precision
+
+
Compute or return the posterior precision P.
+
Returns
+
+
posterior_prec : torch.Tensor
+
+
+
+
+
Methods
+
+
+def fit(self, train_loader, override=True)
+
+
+
Fit the local Laplace approximation at the parameters of the model.
+
Parameters
+
+
train_loader : torch.data.utils.DataLoader
+
each iterate is a training batch (X, y);
+train_loader.dataset needs to be set to access N, size of the data set
+
override : bool, default=True
+
whether to initialize H, loss, and n_data again; setting to False is useful for
+online learning settings to accumulate a sequential posterior approximation.
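For instance, a minimal sketch of the online setting described here (the two data loaders and the fitted object la are assumptions):

```python
# Accumulate a sequential posterior approximation over two data chunks
la.fit(train_loader_part1)                  # initializes H, loss, and n_data
la.fit(train_loader_part2, override=False)  # adds curvature instead of re-initializing
```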
+
+
+
+def square_norm(self, value)
+
+
+
Compute the square norm under the posterior precision with value-self.mean as 𝛥:
+
+\Delta^\top P \Delta
+
+Returns
+
+
+
square_form
+
+
+
+
+def log_prob(self, value, normalized=True)
+
+
+
Compute the log probability under the (current) Laplace approximation.
+
Parameters
+
+
normalized : bool, default=True
+
whether to return log of a properly normalized Gaussian or just the
+terms that depend on value.
Compute the Laplace approximation to the log marginal likelihood subject
+to specific Hessian approximations that subclasses implement.
+Requires that the Laplace approximation has been fit before.
+The resulting torch.Tensor is differentiable in prior_precision and
+sigma_noise if these have gradients enabled.
+By passing prior_precision or sigma_noise, the current value is
+overwritten. This is useful for iterating on the log marginal likelihood.
+
Parameters
+
+
prior_precision : torch.Tensor, optional
+
prior precision, if it should be changed from the current prior_precision value
+
sigma_noise : torch.Tensor or float, optional
+
observation noise standard deviation, if it should be changed
Sample from the posterior predictive on input data x.
+Can be used, for example, for Thompson sampling.
+
Parameters
+
+
x : torch.Tensor
+
input data (batch_size, input_shape)
+
pred_type : {'glm', 'nn'}, default='glm'
+
type of posterior predictive, linearized GLM predictive or neural
+network sampling predictive. The GLM predictive is consistent with
+the curvature approximations used here.
+
n_samples : int
+
number of samples
+
+
Returns
+
+
samples : torch.Tensor
+
samples (n_samples, batch_size, output_shape)
+
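A hedged usage sketch (the method name predictive_samples and the input shape are assumptions; la is a fitted Laplace object):

```python
import torch

x = torch.randn(8, 20)  # a batch of 8 inputs with 20 features (shapes assumed)
samples = la.predictive_samples(x, pred_type='glm', n_samples=100)
# samples has shape (n_samples, batch_size, output_shape), here (100, 8, n_outputs)
```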
+
+
+def functional_variance(self, Jacs)
+
+
+
Compute functional variance for the 'glm' predictive:
+f_var[i] = Jacs[i] @ P.inv() @ Jacs[i].T, which is a output x output
+predictive covariance matrix.
+Mathematically, we have for a single Jacobian
+\mathcal{J} = \nabla_\theta f(x;\theta)\vert_{\theta_{MAP}}
+the output covariance matrix
+ \mathcal{J} P^{-1} \mathcal{J}^T .
+
Parameters
+
+
Jacs : torch.Tensor
+
Jacobians of model output wrt parameters
+(batch, outputs, parameters)
+
+
Returns
+
+
f_var : torch.Tensor
+
output covariance (batch, outputs, outputs)
+
+
+
+def sample(self, n_samples=100)
+
+
+
Sample from the Laplace posterior approximation, i.e.,
+ \theta \sim \mathcal{N}(\theta_{MAP}, P^{-1}).
Mathematically, we have for each parameter group, e.g., torch.nn.Module,
that \P\approx Q \otimes H.
See BaseLaplace for the full interface and see
-Kron and KronDecomposed for the structure of
+Kron and KronDecomposed for the structure of
the Kronecker factors. Kron is used to aggregate factors by summing up and
KronDecomposed is used to add the prior, a Hessian factor (e.g. temperature),
and computing posterior covariances, marginal likelihood, etc.
@@ -273,7 +523,7 @@
Subclasses need to specify how the Hessian approximation is initialized,
-how to add up curvature over training data, how to sample from the
-Laplace approximation, and how to compute the functional variance.
-
A Laplace approximation is represented by a MAP which is given by the
-model parameter and a posterior precision or covariance specifying
-a Gaussian distribution \mathcal{N}(\theta_{MAP}, P^{-1}).
-The goal of this class is to compute the posterior precision P
-which sums as
-
-P = \sum_{n=1}^N \nabla^2_\theta \log p(\mathcal{D}_n \mid \theta)
-\vert_{\theta_{MAP}} + \nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}}.
-
-Every subclass implements different approximations to the log likelihood Hessians,
-for example, a diagonal one. The prior is assumed to be Gaussian and therefore we have
-a simple form for \nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}} = P_0 .
-In particular, we assume a scalar, layer-wise, or diagonal prior precision so that in
-all cases P_0 = \textrm{diag}(p_0) and the structure of p_0 can be varied.
+
Laplace approximation with low-rank log likelihood Hessian (approximation).
+The low-rank matrix is represented by an eigendecomposition (vecs, values).
+Based on the chosen backend, either a true Hessian or, for example, GGN
+approximation could be used.
+The posterior precision is computed as
+ P = V diag(l) V^T + P_0.
+To sample, compute the functional variance, and compute the log determinant, algebraic tricks
+are used to reduce the cost of inversion to that of a K \times K matrix
+if we have a rank of K.
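A short sketch of constructing this variant via the factory function, consistent with the README's option list (model and train_loader are assumptions):

```python
from laplace import Laplace

la = Laplace(model, 'classification',
             subset_of_weights='all',       # LowRankLaplace only supports 'all' weights
             hessian_structure='lowrank')
la.fit(train_loader)
```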
Computes the scatter, a term of the log marginal likelihood that
-corresponds to L-2 regularization:
-scatter = (\theta_{MAP} - \mu_0)^{T} P_0 (\theta_{MAP} - \mu_0) .
-
Returns
-
[type]
-[description]
-
-
var log_det_prior_precision
-
-
Compute log determinant of the prior precision
-\log \det P_0
-
Returns
-
-
log_det : torch.Tensor
-
-
-
-
var log_det_posterior_precision
+
var V
-
Compute log determinant of the posterior precision
-\log \det P which depends on the subclasses structure
-used for the Hessian approximation.
-
Returns
-
-
log_det : torch.Tensor
-
-
-
-
var log_det_ratio
-
-
Compute the log determinant ratio, a part of the log marginal likelihood.
-
-\log \frac{\det P}{\det P_0} = \log \det P - \log \det P_0
-
-
Returns
-
-
log_det_ratio : torch.Tensor
-
-
-
-
var posterior_precision
-
-
Compute or return the posterior precision P.
-
Returns
-
-
posterior_prec : torch.Tensor
-
-
-
-
-
Methods
-
-
-def fit(self, train_loader)
-
-
-
Fit the local Laplace approximation at the parameters of the model.
-
Parameters
-
-
train_loader : torch.data.utils.DataLoader
-
each iterate is a training batch (X, y);
-train_loader.dataset needs to be set to access N, size of the data set
Compute the Laplace approximation to the log marginal likelihood subject
-to specific Hessian approximations that subclasses implement.
-Requires that the Laplace approximation has been fit before.
-The resulting torch.Tensor is differentiable in prior_precision and
-sigma_noise if these have gradients enabled.
-By passing prior_precision or sigma_noise, the current value is
-overwritten. This is useful for iterating on the log marginal likelihood.
-
Parameters
-
-
prior_precision : torch.Tensor, optional
-
prior precision if should be changed from current prior_precision value
-
sigma_noise : [type], optional
-
observation noise standard deviation if should be changed
Sample from the posterior predictive on input data x.
-Can be used, for example, for Thompson sampling.
-
Parameters
-
-
x : torch.Tensor
-
input data (batch_size, input_shape)
-
pred_type : {'glm', 'nn'}, default='glm'
-
type of posterior predictive, linearized GLM predictive or neural
-network sampling predictive. The GLM predictive is consistent with
-the curvature approximations used here.
-
n_samples : int
-
number of samples
-
-
Returns
-
-
samples : torch.Tensor
-
samples (n_samples, batch_size, output_shape)
-
+
-
-def functional_variance(self, Jacs)
-
+
var posterior_precision
-
Compute functional variance for the 'glm' predictive:
-f_var[i] = Jacs[i] @ P.inv() @ Jacs[i].T, which is a output x output
-predictive covariance matrix.
-Mathematically, we have for a single Jacobian
-\mathcal{J} = \nabla_\theta f(x;\theta)\vert_{\theta_{MAP}}
-the output covariance matrix
- \mathcal{J} P^{-1} \mathcal{J}^T .
-
Parameters
-
-
Jacs : torch.Tensor
-
Jacobians of model output wrt parameters
-(batch, outputs, parameters)
-
+
Return correctly scaled posterior precision that would be constructed
+as H[0] @ diag(H[1]) @ H[0].T + self.prior_precision_diag.
Returns
-
f_var : torch.Tensor
-
output covariance (batch, outputs, outputs)
-
-
-
-def sample(self, n_samples=100)
-
-
-
Sample from the Laplace posterior approximation, i.e.,
- \theta \sim \mathcal{N}(\theta_{MAP}, P^{-1}).
-
Parameters
-
-
n_samples : int, default=100
-
number of samples
+
H : tuple(eigenvectors, eigenvalues)
+
scaled self.H with temperature and loss factors.
+
prior_precision_diag : torch.Tensor
+
diagonal prior precision shape parameters to be added to H.
The laplace package facilitates the application of Laplace approximations for entire neural networks or just their last layer.
+
The laplace package facilitates the application of Laplace approximations for entire neural networks, subnetworks of neural networks, or just their last layer.
The package enables posterior approximations, marginal-likelihood estimation, and various posterior predictive computations.
The library documentation is available at https://aleximmer.github.io/Laplace.
There is also a corresponding paper, Laplace Redux — Effortless Bayesian Deep Learning, which introduces the library, provides an introduction to the Laplace approximation, reviews its use in deep learning, and empirically demonstrates its versatility and competitiveness. Please consider referring to the paper when using our library:
-
@article{daxberger2021laplace,
- title={Laplace Redux--Effortless Bayesian Deep Learning},
- author={Daxberger, Erik and Kristiadi, Agustinus and Immer, Alexander
- and Eschenhagen, Runa and Bauer, Matthias and Hennig, Philipp},
- journal={arXiv preprint arXiv:2106.14806},
+
@inproceedings{laplace2021,
+ title={Laplace Redux--Effortless {B}ayesian Deep Learning},
+ author={Erik Daxberger and Agustinus Kristiadi and Alexander Immer
+ and Runa Eschenhagen and Matthias Bauer and Philipp Hennig},
+ booktitle={{N}eur{IPS}},
year={2021}
}
@@ -56,34 +56,39 @@
Setup
Structure
The laplace package consists of two main components:
-
The subclasses of laplace.BaseLaplace that implement different sparsity structures: different subsets of weights ('all' and 'last_layer') and different structures of the Hessian approximation ('full', 'kron', and 'diag'). This results in six currently available options: FullLaplace, KronLaplace, DiagLaplace, and the corresponding last-layer variations FullLLLaplace, KronLLLaplace,
-and DiagLLLaplace, which are all subclasses of laplace.LLLaplace. All of these can be conveniently accessed via the laplace.Laplace function.
+
The subclasses of laplace.BaseLaplace that implement different sparsity structures: different subsets of weights ('all', 'subnetwork' and 'last_layer') and different structures of the Hessian approximation ('full', 'kron', 'lowrank' and 'diag'). This results in eight currently available options: FullLaplace, KronLaplace, DiagLaplace, the corresponding last-layer variations FullLLLaplace, KronLLLaplace,
+and DiagLLLaplace (which are all subclasses of laplace.LLLaplace), laplace.SubnetLaplace (which only supports a 'full' Hessian approximation) and LowRankLaplace (which only supports inference over 'all' weights). All of these can be conveniently accessed via the laplace.Laplace function.
The backends in laplace.curvature which provide access to Hessian approximations of
the corresponding sparsity structures, for example, the diagonal GGN.
Additionally, the package provides utilities for
-decomposing a neural network into feature extractor and last layer for LLLaplace subclasses (laplace.feature_extractor)
+decomposing a neural network into feature extractor and last layer for LLLaplace subclasses (laplace.utils.feature_extractor)
and
-effectively dealing with Kronecker factors (laplace.matrix).
Finally, the package implements several options to select/specify a subnetwork for SubnetLaplace (as subclasses of laplace.utils.subnetmask.SubnetMask).
+Automatic subnetwork selection strategies include: uniformly at random (RandomSubnetMask), by largest parameter magnitudes (LargestMagnitudeSubnetMask), and by largest marginal parameter variances (LargestVarianceDiagLaplaceSubnetMask and LargestVarianceSWAGSubnetMask).
+In addition to that, subnetworks can also be specified manually, by listing the names of either the model parameters (ParamNameSubnetMask) or modules (ModuleNameSubnetMask) to perform Laplace inference over.
Extendability
To extend the laplace package, new BaseLaplace subclasses can be designed, for example,
-a block-diagonal structure or subset-of-weights Laplace.
-Alternatively, extending or integrating backends (subclasses of curvature.curvature) allows to provide different Hessian
+Laplace with a block-diagonal Hessian structure.
+One can also implement custom subnetwork selection strategies as new subclasses of SubnetMask.
+
Alternatively, extending or integrating backends (subclasses of curvature.curvature) allows to provide different Hessian
approximations to the Laplace approximations.
For example, currently the curvature.BackPackInterface based on BackPACK and curvature.AsdlInterface based on ASDL are available.
The AsdlInterface provides a Kronecker factored empirical Fisher while the BackPackInterface
does not, and only the BackPackInterface provides access to Hessian approximations
for a regression (MSELoss) loss function.
Example usage
-
Post-hoc prior precision tuning of last-layer LA
+
Post-hoc prior precision tuning of diagonal LA
In the following example, a pre-trained model is loaded,
-then the Laplace approximation is fit to the training data,
+then the Laplace approximation is fit to the training data
+(using a diagonal Hessian approximation over all parameters),
and the prior precision is optimized with cross-validation 'CV'.
After that, the resulting LA is used for prediction with
the 'probit' predictive for classification.
from laplace import Laplace
-# pre-trained model
+# Pre-trained model
model = load_map_model()
# User-specified LA flavor
@@ -97,7 +102,7 @@
Post-hoc prio
pred = la(x, link_approx='probit')
Differentiating the log marginal likelihood w.r.t. hyperparameters
-
The marginal likelihood can be used for model selection and is differentiable
+
The marginal likelihood can be used for model selection [10] and is differentiable
for continuous hyperparameters like the prior precision or observation noise.
Here, we fit the library default, KFAC last-layer LA and differentiate
the log marginal likelihood.
@@ -114,6 +119,41 @@
Differe
ml = la.log_marginal_likelihood(prior_prec, obs_noise)
ml.backward()
+
Applying the LA over only a subset of the model parameters
+
This example shows how to fit the Laplace approximation over only
+a subnetwork within a neural network (while keeping all other parameters
+fixed at their MAP estimates), as proposed in [11]. It also exemplifies
+different ways to specify the subnetwork to perform inference over.
+
from laplace import Laplace
+
+# Pre-trained model
+model = load_model()
+
+# Examples of different ways to specify the subnetwork
+# via indices of the vectorized model parameters
+#
+# Example 1: select the 128 parameters with the largest magnitude
+from laplace.utils import LargestMagnitudeSubnetMask
+subnetwork_mask = LargestMagnitudeSubnetMask(model, n_params_subnet=128)
+subnetwork_indices = subnetwork_mask.select()
+
+# Example 2: specify the layers that define the subnetwork
+from laplace.utils import ModuleNameSubnetMask
+subnetwork_mask = ModuleNameSubnetMask(model, module_names=['layer.1', 'layer.3'])
+subnetwork_mask.select()
+subnetwork_indices = subnetwork_mask.indices
+
+# Example 3: manually define the subnetwork via custom subnetwork indices
+import torch
+subnetwork_indices = torch.tensor([0, 4, 11, 42, 123, 2021])
+
+# Define and fit subnetwork LA using the specified subnetwork indices
+la = Laplace(model, 'classification',
+ subset_of_weights='subnetwork',
+ hessian_structure='full',
+ subnetwork_indices=subnetwork_indices)
+la.fit(train_loader)
+
Documentation
The documentation is available here or can be generated and/or viewed locally:
# assuming the repository was cloned
@@ -124,7 +164,7 @@
This package relies on various improvements to the Laplace approximation for neural networks, which was originally due to MacKay [1].
+
This package relies on various improvements to the Laplace approximation for neural networks, which was originally due to MacKay [1]. Please consider citing the respective papers if you use any of their proposed methods via our laplace library.
Fit the local Laplace approximation at the parameters of the model.
@@ -706,6 +745,45 @@
Parameters
train_loader : torch.data.utils.DataLoader
each iterate is a training batch (X, y);
train_loader.dataset needs to be set to access N, size of the data set
+
override : bool, default=True
+
whether to initialize H, loss, and n_data again; setting to False is useful for
+online learning settings to accumulate a sequential posterior approximation.
+
+
+
+def square_norm(self, value)
+
+
+
Compute the square norm under the posterior precision with value-self.mean as 𝛥:
+
+\Delta^\top P \Delta
+
+Returns
+
+
+
square_form
+
+
+
+
+def log_prob(self, value, normalized=True)
+
+
+
Compute the log probability under the (current) Laplace approximation.
+
Parameters
+
+
normalized : bool, default=True
+
whether to return log of a properly normalized Gaussian or just the
+terms that depend on value.
Mathematically, we have for each parameter group, e.g., torch.nn.Module,
that \P\approx Q \otimes H.
See BaseLaplace for the full interface and see
-Kron and KronDecomposed for the structure of
+Kron and KronDecomposed for the structure of
the Kronecker factors. Kron is used to aggregate factors by summing up and
KronDecomposed is used to add the prior, a Hessian factor (e.g. temperature),
and computing posterior covariances, marginal likelihood, etc.
@@ -909,7 +990,7 @@
Laplace approximation with low-rank log likelihood Hessian (approximation).
+The low-rank matrix is represented by an eigendecomposition (vecs, values).
+Based on the chosen backend, either a true Hessian or, for example, GGN
+approximation could be used.
+The posterior precision is computed as
+ P = V diag(l) V^T + P_0.
+To sample, compute the functional variance, and compute the log determinant, algebraic tricks
+are used to reduce the cost of inversion to that of a K \times K matrix
+if we have a rank of K.
Mathematically, we have for the last parameter group, i.e., torch.nn.Linear,
that \P\approx Q \otimes H.
See KronLaplace, LLLaplace, and BaseLaplace for the full interface and see
-Kron and KronDecomposed for the structure of
+Kron and KronDecomposed for the structure of
the Kronecker factors. Kron is used to aggregate factors by summing up and
KronDecomposed is used to add the prior, a Hessian factor (e.g. temperature),
and computing posterior covariances, marginal likelihood, etc.
Use of damping is possible by initializing or setting damping=True.
Class for subnetwork Laplace, which computes the Laplace approximation over
+just a subset of the model parameters (i.e. a subnetwork within the neural network),
+as proposed in [1]. Subnetwork Laplace only supports a full Hessian approximation; other
+approximations could be used in theory, but would not make as much sense conceptually.
+
A Laplace approximation is represented by a MAP which is given by the
+model parameter and a posterior precision or covariance specifying
+a Gaussian distribution \mathcal{N}(\theta_{MAP}, P^{-1}).
+Here, only a subset of the model parameters (i.e. a subnetwork of the
+neural network) are treated probabilistically.
+The goal of this class is to compute the posterior precision P
+which sums as
+
+P = \sum_{n=1}^N \nabla^2_\theta \log p(\mathcal{D}_n \mid \theta)
+\vert_{\theta_{MAP}} + \nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}}.
+
+The prior is assumed to be Gaussian and therefore we have a simple form for
+\nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}} = P_0 .
+In particular, we assume a scalar or diagonal prior precision so that in
+all cases P_0 = \textrm{diag}(p_0) and the structure of p_0 can be varied.
+
The subnetwork Laplace approximation only supports a full, i.e., dense, log likelihood
+Hessian approximation and hence posterior precision.
+Based on the chosen backend
+parameter, the full approximation can be, for example, a generalized Gauss-Newton
+matrix.
+Mathematically, we have P \in \mathbb{R}^{P \times P}.
+See FullLaplace and BaseLaplace for the full interface.
determines the log likelihood Hessian approximation
+
subnetwork_indices : torch.LongTensor
+
indices of the vectorized model parameters
+(i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
+that define the subnetwork to apply the Laplace approximation over
+
sigma_noise : torch.Tensor or float, default=1
+
observation noise for the regression setting; must be 1 for classification
+
prior_precision : torch.Tensor or float, default=1
+
prior precision of a Gaussian prior (= weight decay);
+can be scalar, per-layer, or diagonal in the most general case
+
prior_mean : torch.Tensor or float, default=0
+
prior mean of a Gaussian prior, useful for continual learning
+
temperature : float, default=1
+
temperature of the likelihood; lower temperature leads to more
+concentrated posterior and vice versa.
Baseclass for all last-layer Laplace approximations in this library.
+Subclasses specify the structure of the Hessian approximation.
+See BaseLaplace for the full interface.
+
A Laplace approximation is represented by a MAP which is given by the
+model parameter and a posterior precision or covariance specifying
+a Gaussian distribution \mathcal{N}(\theta_{MAP}, P^{-1}).
+Here, only the parameters of the last layer of the neural network
+are treated probabilistically.
+The goal of this class is to compute the posterior precision P
+which sums as
+
+P = \sum_{n=1}^N \nabla^2_\theta \log p(\mathcal{D}_n \mid \theta)
+\vert_{\theta_{MAP}} + \nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}}.
+
+Every subclass implements different approximations to the log likelihood Hessians,
+for example, a diagonal one. The prior is assumed to be Gaussian and therefore we have
+a simple form for \nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}} = P_0 .
+In particular, we assume a scalar or diagonal prior precision so that in
+all cases P_0 = \textrm{diag}(p_0) and the structure of p_0 can be varied.
and hence posterior precision. Based on the chosen backend parameter, the full
approximation can be, for example, a generalized Gauss-Newton matrix.
Mathematically, we have P \in \mathbb{R}^{P \times P}.
-See FullLaplace, LLLaplace, and BaseLaplace for the full interface.
+See FullLaplace, LLLaplace, and BaseLaplace for the full interface.
and hence posterior precision.
Mathematically, we have for the last parameter group, i.e., torch.nn.Linear,
that \P\approx Q \otimes H.
-See KronLaplace, LLLaplace, and BaseLaplace for the full interface and see
-Kron and KronDecomposed for the structure of
+See KronLaplace, LLLaplace, and BaseLaplace for the full interface and see
+Kron and KronDecomposed for the structure of
the Kronecker factors. Kron is used to aggregate factors by summing up and
KronDecomposed is used to add the prior, a Hessian factor (e.g. temperature),
and computing posterior covariances, marginal likelihood, etc.
Use of damping is possible by initializing or setting damping=True.
Last-layer Laplace approximation with diagonal log likelihood Hessian approximation
and hence posterior precision.
Mathematically, we have P \approx \textrm{diag}(P).
-See DiagLaplace, LLLaplace, and BaseLaplace for the full interface.
+See DiagLaplace, LLLaplace, and BaseLaplace for the full interface.
Class for subnetwork Laplace, which computes the Laplace approximation over
+just a subset of the model parameters (i.e. a subnetwork within the neural network),
+as proposed in [1]. Subnetwork Laplace only supports a full Hessian approximation; other
+approximations could be used in theory, but would not make as much sense conceptually.
+
A Laplace approximation is represented by a MAP which is given by the
+model parameter and a posterior precision or covariance specifying
+a Gaussian distribution \mathcal{N}(\theta_{MAP}, P^{-1}).
+Here, only a subset of the model parameters (i.e. a subnetwork of the
+neural network) are treated probabilistically.
+The goal of this class is to compute the posterior precision P
+which sums as
+
+P = \sum_{n=1}^N \nabla^2_\theta \log p(\mathcal{D}_n \mid \theta)
+\vert_{\theta_{MAP}} + \nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}}.
+
+The prior is assumed to be Gaussian and therefore we have a simple form for
+\nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}} = P_0 .
+In particular, we assume a scalar or diagonal prior precision so that in
+all cases P_0 = \textrm{diag}(p_0) and the structure of p_0 can be varied.
+
The subnetwork Laplace approximation only supports a full, i.e., dense, log likelihood
+Hessian approximation and hence posterior precision.
+Based on the chosen backend
+parameter, the full approximation can be, for example, a generalized Gauss-Newton
+matrix.
+Mathematically, we have P \in \mathbb{R}^{P \times P}.
+See FullLaplace and BaseLaplace for the full interface.
determines the log likelihood Hessian approximation
+
subnetwork_indices : torch.LongTensor
+
indices of the vectorized model parameters
+(i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
+that define the subnetwork to apply the Laplace approximation over
+
sigma_noise : torch.Tensor or float, default=1
+
observation noise for the regression setting; must be 1 for classification
+
prior_precision : torch.Tensor or float, default=1
+
prior precision of a Gaussian prior (= weight decay);
+can be scalar, per-layer, or diagonal in the most general case
+
prior_mean : torch.Tensor or float, default=0
+
prior mean of a Gaussian prior, useful for continual learning
+
temperature : float, default=1
+
temperature of the likelihood; lower temperature leads to more
+concentrated posterior and vice versa.
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/docs/feature_extractor.html b/docs/utils/feature_extractor.html
similarity index 84%
rename from docs/feature_extractor.html
rename to docs/utils/feature_extractor.html
index 1c7ef071..9599128b 100644
--- a/docs/feature_extractor.html
+++ b/docs/utils/feature_extractor.html
@@ -4,7 +4,7 @@
-laplace.feature_extractor API documentation
+laplace.utils.feature_extractor API documentation
@@ -20,7 +20,7 @@
-
Module laplace.feature_extractor
+
Module laplace.utils.feature_extractor
@@ -33,7 +33,7 @@
Module laplace.feature_extractor
Classes
-
+
class FeatureExtractor(model: torch.nn.modules.module.Module, last_layer_name: Optional[str] = None)
Fit diagonal SWAG [1], which estimates marginal variances of model parameters by
+computing the first and second moment of SGD iterates with a large learning rate.
Feature extractor for a PyTorch neural network.
+A wrapper which can return the output of the penultimate layer in addition to
+the output of the last layer for each forward pass. If the name of the last
+layer is not known, it can determine it automatically. It assumes that the
+last layer is linear and that for every forward pass the last layer is the same.
+If the name of the last layer is known, it can be passed as a parameter at
+initialization; this is the safest way to use this class.
+Based on https://gist.github.com/fkodom/27ed045c9051a39102e8bcf4ce31df76.
+
Parameters
+
+
model : torch.nn.Module
+
PyTorch model
+
last_layer_name : str, default=None
+
if the name of the last layer is already known, otherwise it will
+be determined automatically.
+
+
Initializes internal Module state, shared by both nn.Module and ScriptModule.
Forward pass which returns the output of the penultimate layer along
+with the output of the last layer. If the last layer is not known yet,
+it will be determined when this function is called for the first time.
+
Parameters
+
+
x : torch.Tensor
+
one batch of data to use as input for the forward pass
Automatically determines the last layer of the model with one
+forward pass. It assumes that the last layer is the same for every
+forward pass and that it is an instance of torch.nn.Linear.
+Might not work with every architecture, but is tested with all PyTorch
+torchvision classification models (besides SqueezeNet, which has no
+linear last layer).
+
Parameters
+
+
x : torch.Tensor
+
one batch of data to use as input for the forward pass
+
+
+
+
+
+class Kron
+(kfacs)
+
+
+
Kronecker factored approximate curvature representation for a corresponding
+neural network.
+Each element in kfacs is either a tuple or single matrix.
+A tuple represents two Kronecker factors Q, and H and a single element
+is just a full block Hessian approximation.
+
Parameters
+
+
kfacs : list[Tuple]
+
each element in the list is a Tuple of two Kronecker factors Q, H
+or a single matrix approximating the Hessian (in case of bias, for example)
+
+
Static methods
+
+
+def init_from_model(model, device)
+
+
+
Initialize Kronecker factors based on a model's architecture.
Batched matrix multiplication with the Kronecker factors.
+If Kron is H, we compute H @ W.
+This is useful for computing the predictive or a regularization
+based on Kronecker factors as in continual learning.
+
Parameters
+
+
W : torch.Tensor
+
matrix (batch, classes, params)
+
exponent : float, default=1
+
only can be 1 for Kron, requires KronDecomposed for other
+exponent values of the Kronecker factors.
+
+
Returns
+
+
SW : torch.Tensor
+
result (batch, classes, params)
+
+
+
+def logdet(self) ‑> torch.Tensor
+
+
+
Compute log determinant of the Kronecker factors and sums them up.
+This corresponds to the log determinant of the entire Hessian approximation.
+
Returns
+
+
logdet : torch.Tensor
+
+
+
+
+def diag(self) ‑> torch.Tensor
+
+
+
Extract diagonal of the entire Kronecker factorization.
+
Returns
+
+
diag : torch.Tensor
+
+
+
+
+def to_matrix(self) ‑> torch.Tensor
+
+
+
Make the Kronecker factorization dense by computing the kronecker product.
+Warning: this should only be used for testing purposes as it will allocate
+large amounts of memory for big architectures.
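A small illustrative sketch of the container itself, with hand-built factors (the import path and the exact kfacs layout are assumptions):

```python
import torch
from laplace.utils import Kron  # assumed re-export after the utils refactor

# One parameter group with a (Q, H) Kronecker pair and one plain block (e.g. for a bias)
Q = 2.0 * torch.eye(3)
H = 0.5 * torch.eye(4)
bias_block = torch.eye(4)
kron = Kron([[Q, H], [bias_block]])

dense = kron.to_matrix()  # (3*4 + 4) x (3*4 + 4) block-diagonal matrix
logdet = kron.logdet()    # sum of per-block log-determinants
diag = kron.diag()        # diagonal of the full factorization
```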
Decomposed Kronecker factored approximate curvature representation
+for a corresponding neural network.
+Each matrix in Kron is decomposed to obtain KronDecomposed.
+Front-loading decomposition allows cheap repeated computation
+of inverses and log determinants.
+In contrast to Kron, we can add scalar or layerwise scalars but
+we cannot add other Kron or KronDecomposed anymore.
+
Parameters
+
+
eigenvectors : list[Tuple[torch.Tensor]]
+
eigenvectors corresponding to matrices in a corresponding Kron
+
eigenvalues : list[Tuple[torch.Tensor]]
+
eigenvalues corresponding to matrices in a corresponding Kron
+
deltas : torch.Tensor
+
addend for each group of Kronecker factors representing, for example,
+a prior precision
+
dampen : bool, default=False
+
use dampen approximation mixing prior and Kron partially multiplicatively
+
+
Methods
+
+
+def detach(self)
+
+
+
+
+
+def logdet(self) ‑> torch.Tensor
+
+
+
Compute log determinant of the Kronecker factors and sums them up.
+This corresponds to the log determinant of the entire Hessian approximation.
+In contrast to Kron.logdet(), additive deltas corresponding to prior
+precisions are added.
Batched matrix multiplication with the decomposed Kronecker factors.
+This is useful for computing the predictive or a regularization loss.
+Compared to Kron.bmm(), a prior can be added here in form of deltas
+and the exponent can be other than just 1.
+Computes H^{exponent} W.
Make the Kronecker factorization dense by computing the kronecker product.
+Warning: this should only be used for testing purposes as it will allocate
+large amounts of memory for big architectures.
+
Returns
+
+
block_diag : torch.Tensor
+
+
+
+
+
+
+class SubnetMask
+(model)
+
+
+
Baseclass for all subnetwork masks in this library (for subnetwork Laplace).
Converts a subnetwork mask into subnetwork indices.
+
Parameters
+
+
subnet_mask : torch.Tensor
+
a binary vector of size (n_params) where 1s locate the subnetwork parameters
+within the vectorized model parameters
+(i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
+
+
Returns
+
+
subnet_mask_indices : torch.LongTensor
+
a vector of indices of the vectorized model parameters
+(i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
+that define the subnetwork
each iterate is a training batch (X, y);
+train_loader.dataset needs to be set to access N, size of the data set
+
+
Returns
+
+
subnet_mask_indices : torch.LongTensor
+
a vector of indices of the vectorized model parameters
+(i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
+that define the subnetwork
+
+
+
+def get_subnet_mask(self, train_loader)
+
+
+
Get the subnetwork mask.
+
Parameters
+
+
train_loader : torch.data.utils.DataLoader
+
each iterate is a training batch (X, y);
+train_loader.dataset needs to be set to access N, size of the data set
+
+
Returns
+
+
subnet_mask : torch.Tensor
+
a binary vector of size (n_params) where 1s locate the subnetwork parameters
+within the vectorized model parameters
+(i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
Subnetwork mask identifying the parameters with the largest marginal variances
+(estimated using a diagonal Laplace approximation over all model parameters).
+
Parameters
+
+
model : torch.nn.Module
+
+
n_params_subnet : int
+
number of parameters in the subnetwork (i.e. number of top-scoring parameters to select)
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/docs/matrix.html b/docs/utils/matrix.html
similarity index 75%
rename from docs/matrix.html
rename to docs/utils/matrix.html
index 6323f7cf..caa3e688 100644
--- a/docs/matrix.html
+++ b/docs/utils/matrix.html
@@ -4,7 +4,7 @@
-laplace.matrix API documentation
+laplace.utils.matrix API documentation
@@ -20,7 +20,7 @@
-
only can be 1 for Kron, requires KronDecomposed for other
+
only can be 1 for Kron, requires KronDecomposed for other
exponent values of the Kronecker factors.
Returns
@@ -111,7 +111,7 @@
Returns
result (batch, classes, params)
-
+
def logdet(self) ‑> torch.Tensor
@@ -123,7 +123,7 @@
Returns
-
+
def diag(self) ‑> torch.Tensor
@@ -134,7 +134,7 @@
Returns
-
+
def to_matrix(self) ‑> torch.Tensor
@@ -149,24 +149,24 @@
Returns
-
+
class KronDecomposed(eigenvectors, eigenvalues, deltas=None, damping=False)
Decomposed Kronecker factored approximate curvature representation
for a corresponding neural network.
-Each matrix in Kron is decomposed to obtain KronDecomposed.
+Each matrix in Kron is decomposed to obtain KronDecomposed.
Front-loading decomposition allows cheap repeated computation
of inverses and log determinants.
-In contrast to Kron, we can add scalar or layerwise scalars but
-we cannot add other Kron or KronDecomposed anymore.
+In contrast to Kron, we can add scalar or layerwise scalars but
+we cannot add other Kron or KronDecomposed anymore.
Parameters
eigenvectors : list[Tuple[torch.Tensor]]
-
eigenvectors corresponding to matrices in a corresponding Kron
+
eigenvectors corresponding to matrices in a corresponding Kron
eigenvalues : list[Tuple[torch.Tensor]]
-
eigenvalues corresponding to matrices in a corresponding Kron
+
eigenvalues corresponding to matrices in a corresponding Kron
deltas : torch.Tensor
addend for each group of Kronecker factors representing, for example,
a prior precision
@@ -175,19 +175,19 @@
Parameters
Methods
-
+
def detach(self)
-
+
def logdet(self) ‑> torch.Tensor
Compute log determinant of the Kronecker factors and sums them up.
This corresponds to the log determinant of the entire Hessian approximation.
-In contrast to Kron.logdet(), additive deltas corresponding to prior
+In contrast to Kron.logdet(), additive deltas corresponding to prior
precisions are added.
Batched matrix multiplication with the decomposed Kronecker factors.
This is useful for computing the predictive or a regularization loss.
-Compared to Kron.bmm(), a prior can be added here in form of deltas
+Compared to Kron.bmm(), a prior can be added here in form of deltas
and the exponent can be other than just 1.
Computes H^{exponent} W.
Converts a subnetwork mask into subnetwork indices.
+
Parameters
+
+
subnet_mask : torch.Tensor
+
a binary vector of size (n_params) where 1s locate the subnetwork parameters
+within the vectorized model parameters
+(i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
+
+
Returns
+
+
subnet_mask_indices : torch.LongTensor
+
a vector of indices of the vectorized model parameters
+(i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
+that define the subnetwork
each iterate is a training batch (X, y);
+train_loader.dataset needs to be set to access N, size of the data set
+
+
Returns
+
+
subnet_mask_indices : torch.LongTensor
+
a vector of indices of the vectorized model parameters
+(i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
+that define the subnetwork
+
+
+
+def get_subnet_mask(self, train_loader)
+
+
+
Get the subnetwork mask.
+
Parameters
+
+
train_loader : torch.data.utils.DataLoader
+
each iterate is a training batch (X, y);
+train_loader.dataset needs to be set to access N, size of the data set
+
+
Returns
+
+
subnet_mask : torch.Tensor
+
a binary vector of size (n_params) where 1s locate the subnetwork parameters
+within the vectorized model parameters
+(i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
Subnetwork mask identifying the parameters with the largest marginal variances
+(estimated using a diagonal Laplace approximation over all model parameters).
+
Parameters
+
+
model : torch.nn.Module
+
+
n_params_subnet : int
+
number of parameters in the subnetwork (i.e. number of top-scoring parameters to select)
Fit diagonal SWAG [1], which estimates marginal variances of model parameters by
+computing the first and second moment of SGD iterates with a large learning rate.