
WIP: Implement bootstrap #107

Merged 85 commits, Apr 4, 2023
Changes from all commits

85 commits
bbd3272
Initial version of bootstrap class for empirical model
yonatank93 Jan 9, 2023
14fc880
`get_compute_arguments` can return a flat or nested list
yonatank93 Jan 9, 2023
f2aacc1
Add `has_opt_params_bounds` method to `_WrapperCalculator`
Jan 9, 2023
d7f7893
Initial updates on the neural network calculator
Jan 10, 2023
f534814
Initial working script for bootstrap neural network
Jan 10, 2023
290aa11
MOdify the default bootstrap cas generator for empirical model
Jan 12, 2023
f99bef1
Cache the initial parameter guess for empirical model
Jan 12, 2023
09f8184
Add an option to input initial guess in each step
Jan 12, 2023
facfd6d
Reset the parameters for NN and empirical model
Jan 13, 2023
22f01e0
Clean up
Jan 13, 2023
3180f70
Fix the list of residual function when using _WrapperCalculator
yonatank93 Jan 13, 2023
ff39f77
Documentation for bootstrap sampler clas for empirical models
Mar 14, 2023
cb176d3
Initial draft of bootstrap example for empirical model
Mar 14, 2023
26f783d
Test for bootstrap empirical model
Mar 16, 2023
54b844d
BUG: default generator function for empirical model when using multip…
Mar 16, 2023
c6e9c85
Documentation for bootstrap NN class
Mar 16, 2023
10d98a5
Test for bootstrap neural network model
Mar 16, 2023
2c2da19
Clean up and update documentation
Mar 16, 2023
8c01c5d
Work out the compatibility with CalculatorTorchSeparateSpecies
Mar 17, 2023
be23b7e
Finallyze the draft of bootstrap example
Mar 17, 2023
e0b2879
Refactoring
Mar 17, 2023
bb29000
Run pre-commit
Mar 17, 2023
cd5da02
Add callback function to bootstrap empirical model
Mar 20, 2023
97f8c85
Apply changes based on minor feedback
Mar 21, 2023
920b844
DOC: convert to google style
Mar 24, 2023
3e2e234
Add additional argument for default bootstrap dataset generator
Mar 24, 2023
9486269
BUG: Fix get parameter and update parameter
yonatank93 Mar 24, 2023
c3a442e
DOC: Convert to googledoc style
yonatank93 Mar 26, 2023
efd67e4
Fix example for the documentation page
yonatank93 Mar 26, 2023
466d1d3
Add an option to specify callback function for NN case
yonatank93 Mar 26, 2023
6d89512
Also import each bootstrap class for empirical and NN models
yonatank93 Mar 26, 2023
01275df
DOC: Add page about bootstrapping
yonatank93 Mar 26, 2023
4e740b9
Revert back the change to use `torch.Tensor`
yonatank93 Mar 27, 2023
08bb6bc
TST: Tests for retrieving and updating NN model parameters
yonatank93 Mar 27, 2023
4074bb6
DOC: Add the shape of numpy array.
yonatank93 Mar 28, 2023
0b885f7
Remove commands that are not needed
yonatank93 Mar 28, 2023
a0693d2
DOC: Fix typos
yonatank93 Mar 28, 2023
3d0f10c
Add a default random seed and use a local random seed generator
yonatank93 Apr 4, 2023
f5f71c6
Remove pretraining before running bootstrap and added notes about
yonatank93 Apr 4, 2023
1334451
Update how to compute sigma in MagnitudeInverseWeight
yonatank93 Apr 4, 2023
7f2ace3
Update due to `DeprecationWarning`
yonatank93 Apr 4, 2023
73791a8
Initial version of bootstrap class for empirical model
yonatank93 Jan 9, 2023
d14ccdc
`get_compute_arguments` can return a flat or nested list
yonatank93 Jan 9, 2023
2ba05a9
Add `has_opt_params_bounds` method to `_WrapperCalculator`
Jan 9, 2023
59c9b72
Initial updates on the neural network calculator
Jan 10, 2023
1f19d4b
Initial working script for bootstrap neural network
Jan 10, 2023
0f33f41
MOdify the default bootstrap cas generator for empirical model
Jan 12, 2023
fbc38cf
Cache the initial parameter guess for empirical model
Jan 12, 2023
78831f2
Add an option to input initial guess in each step
Jan 12, 2023
6912b93
Reset the parameters for NN and empirical model
Jan 13, 2023
26a7c1c
Clean up
Jan 13, 2023
7609580
Fix the list of residual function when using _WrapperCalculator
yonatank93 Jan 13, 2023
5a38a60
Documentation for bootstrap sampler clas for empirical models
Mar 14, 2023
270e687
Initial draft of bootstrap example for empirical model
Mar 14, 2023
a61f435
Test for bootstrap empirical model
Mar 16, 2023
bdb2406
BUG: default generator function for empirical model when using multip…
Mar 16, 2023
d6655a0
Documentation for bootstrap NN class
Mar 16, 2023
f50f935
Test for bootstrap neural network model
Mar 16, 2023
145cd2c
Clean up and update documentation
Mar 16, 2023
34fc791
Work out the compatibility with CalculatorTorchSeparateSpecies
Mar 17, 2023
65e56b6
Finallyze the draft of bootstrap example
Mar 17, 2023
44c26d9
Refactoring
Mar 17, 2023
2b983ff
Run pre-commit
Mar 17, 2023
7672b6d
Add callback function to bootstrap empirical model
Mar 20, 2023
248e670
Apply changes based on minor feedback
Mar 21, 2023
bc30a4e
DOC: convert to google style
Mar 24, 2023
3ebdab9
Add additional argument for default bootstrap dataset generator
Mar 24, 2023
35ef9e4
BUG: Fix get parameter and update parameter
yonatank93 Mar 24, 2023
d08e432
DOC: Convert to googledoc style
yonatank93 Mar 26, 2023
6d1f464
Fix example for the documentation page
yonatank93 Mar 26, 2023
2322919
Add an option to specify callback function for NN case
yonatank93 Mar 26, 2023
ad9aa34
Also import each bootstrap class for empirical and NN models
yonatank93 Mar 26, 2023
c13382d
DOC: Add page about bootstrapping
yonatank93 Mar 26, 2023
4c96b20
Revert back the change to use `torch.Tensor`
yonatank93 Mar 27, 2023
f59bac8
TST: Tests for retrieving and updating NN model parameters
yonatank93 Mar 27, 2023
e623e38
DOC: Add the shape of numpy array.
yonatank93 Mar 28, 2023
9de3928
Remove commands that are not needed
yonatank93 Mar 28, 2023
f872874
DOC: Fix typos
yonatank93 Mar 28, 2023
9a40764
Add a default random seed and use a local random seed generator
yonatank93 Apr 4, 2023
ad62939
Remove pretraining before running bootstrap and added notes about
yonatank93 Apr 4, 2023
cb84f2c
Update how to compute sigma in MagnitudeInverseWeight
yonatank93 Apr 4, 2023
c7b014c
Update due to `DeprecationWarning`
yonatank93 Apr 4, 2023
4e14f66
Merge branch 'implement_bootstrap' of https://github.com/yonatank93/k…
yonatank93 Apr 4, 2023
0978449
Apply pre-commit
Apr 4, 2023
ad844e6
Build documentation
yonatank93 Apr 4, 2023
3 changes: 3 additions & 0 deletions .gitignore
@@ -27,6 +27,9 @@ tests/echo*
tests/fingerprints/
tmp_*
*_kliff_trained/
tests/uq/*.pkl
tests/uq/*.json
tests/uq/kliff_saved_model

# dataset
Si_training_set_4_configs
Binary file modified docs/source/auto_examples/auto_examples_jupyter.zip
Binary file not shown.
Binary file modified docs/source/auto_examples/auto_examples_python.zip
Binary file not shown.
158 changes: 158 additions & 0 deletions docs/source/auto_examples/example_uq_bootstrap.ipynb
@@ -0,0 +1,158 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n\n# Bootstrapping\n\nIn this example, we demonstrate how to perform uncertainty quantification (UQ) using\nthe bootstrap method. We use a Stillinger-Weber (SW) potential for silicon that is archived\nin OpenKIM_.\n\nFor simplicity, we only set the energy-scaling parameters, i.e., ``A`` and ``lambda``, as\nthe tunable parameters. These parameters will be calibrated to the energies and forces of a\nsmall dataset consisting of 4 compressed and stretched configurations of the diamond silicon\nstructure.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To start, let's first install the SW model::\n\n $ kim-api-collections-management install user SW_StillingerWeber_1985_Si__MO_405512056662_006\n\n.. seealso::\n This installs the model and its driver into the ``User Collection``. See\n `install_model` for more information about installing KIM models.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\nimport numpy as np\n\nfrom kliff.calculators import Calculator\nfrom kliff.dataset import Dataset\nfrom kliff.loss import Loss\nfrom kliff.models import KIMModel\nfrom kliff.uq.bootstrap import BootstrapEmpiricalModel\nfrom kliff.utils import download_dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running bootstrap, we need to define a loss function and train the model. More\ndetailed information about this step can be found in `tut_kim_sw` and\n`tut_params_transform`.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Create the model\nmodel = KIMModel(model_name=\"SW_StillingerWeber_1985_Si__MO_405512056662_006\")\n\n# Set the tunable parameters and the initial guess\nopt_params = {\"A\": [[\"default\"]], \"lambda\": [[\"default\"]]}\n\nmodel.set_opt_params(**opt_params)\nmodel.echo_opt_params()\n\n# Get the dataset\ndataset_path = download_dataset(dataset_name=\"Si_training_set_4_configs\")\n# Read the dataset\ntset = Dataset(dataset_path)\nconfigs = tset.get_configs()\n\n# Create calculator\ncalc = Calculator(model)\n# Only use the forces data\nca = calc.create(configs, use_energy=False, use_forces=True)\n\n# Instantiate the loss function\nresidual_data = {\"normalize_by_natoms\": False}\nloss = Loss(calc, residual_data=residual_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To perform UQ by bootstrapping, the general workflow starts by instantiating\n:class:`~kliff.uq.bootstrap.BootstrapEmpiricalModel`, or\n:class:`~kliff.uq.bootstrap.BootstrapNeuralNetworkModel` if using a neural network\npotential.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Instantiate bootstrap class object\nBS = BootstrapEmpiricalModel(loss, seed=1717)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we generate some bootstrap compute arguments. This is equivalent to generating\nbootstrap data. Typically, we just need to specify how many bootstrap data samples to\ngenerate. Additionally, if we call ``generate_bootstrap_compute_arguments`` multiple\ntimes, the newly generated data samples will be appended to the previously generated\nsamples. This is also the behavior if we read the data samples from a previously\nexported file.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Generate bootstrap compute arguments\nBS.generate_bootstrap_compute_arguments(100)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we will iterate over these bootstrap data samples and train the potential\nusing each data sample. The resulting optimal parameters from each data sample give a\nsingle sample of parameters. By iterating over all data samples, we will obtain an\nensemble of parameters.\n\nNote that the mapping from a bootstrap dataset to the parameters involves an optimization.\nWe suggest using the same mapping, i.e., the same optimizer settings, in each iteration.\nThis includes using the same initial parameter guess. When the loss\nfunction has multiple local minima, we don't want the parameter ensemble to be biased\nby the results of the other optimizations. For a neural network model, we need to reset\nthe initial parameter values, which is done internally.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Run bootstrap\nmin_kwargs = dict(method=\"lm\") # Optimizer setting\ninitial_guess = calc.get_opt_params() # Initial guess in the optimization\nBS.run(min_kwargs=min_kwargs, initial_guess=initial_guess)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The resulting parameter ensemble can be accessed via `BS.samples` as a `np.ndarray`.\nThen we can, for example, plot the distribution of the parameters or propagate the\nerror to the target quantities we want to study.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Plot the distribution of the parameters\nplt.figure()\nplt.plot(*(BS.samples.T), \".\", alpha=0.5)\nparam_names = list(opt_params.keys())\nplt.xlabel(param_names[0])\nplt.ylabel(param_names[1])\nplt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
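The bootstrap data generation described above can be sketched conceptually: each bootstrap "dataset" is formed by drawing N configurations with replacement from the original N-configuration dataset. A minimal, hypothetical illustration with NumPy (this is not the KLIFF API; `generate_bootstrap_compute_arguments` does the analogous resampling over compute arguments internally):

```python
import numpy as np

# Conceptual sketch of bootstrap resampling (not the KLIFF API).
rng = np.random.default_rng(1717)

n_configs = 4    # the Si training set in this example has 4 configurations
n_samples = 100  # number of bootstrap datasets to generate

# Each row holds the configuration indices making up one bootstrap sample;
# repeated indices are expected and are the essence of bootstrapping.
bootstrap_indices = rng.integers(0, n_configs, size=(n_samples, n_configs))
print(bootstrap_indices.shape)  # (100, 4)
```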
123 changes: 123 additions & 0 deletions docs/source/auto_examples/example_uq_bootstrap.py
@@ -0,0 +1,123 @@
"""
.. _tut_bootstrap:

Bootstrapping
=============

In this example, we demonstrate how to perform uncertainty quantification (UQ) using
the bootstrap method. We use a Stillinger-Weber (SW) potential for silicon that is archived
in OpenKIM_.

For simplicity, we only set the energy-scaling parameters, i.e., ``A`` and ``lambda``, as
the tunable parameters. These parameters will be calibrated to the energies and forces of a
small dataset consisting of 4 compressed and stretched configurations of the diamond silicon
structure.
"""


##########################################################################################
# To start, let's first install the SW model::
#
# $ kim-api-collections-management install user SW_StillingerWeber_1985_Si__MO_405512056662_006
#
# .. seealso::
# This installs the model and its driver into the ``User Collection``. See
# :ref:`install_model` for more information about installing KIM models.


import matplotlib.pyplot as plt
import numpy as np

from kliff.calculators import Calculator
from kliff.dataset import Dataset
from kliff.loss import Loss
from kliff.models import KIMModel
from kliff.uq.bootstrap import BootstrapEmpiricalModel
from kliff.utils import download_dataset

##########################################################################################
# Before running bootstrap, we need to define a loss function and train the model. More
# detailed information about this step can be found in :ref:`tut_kim_sw` and
# :ref:`tut_params_transform`.

# Create the model
model = KIMModel(model_name="SW_StillingerWeber_1985_Si__MO_405512056662_006")

# Set the tunable parameters and the initial guess
opt_params = {"A": [["default"]], "lambda": [["default"]]}

model.set_opt_params(**opt_params)
model.echo_opt_params()

# Get the dataset
dataset_path = download_dataset(dataset_name="Si_training_set_4_configs")
# Read the dataset
tset = Dataset(dataset_path)
configs = tset.get_configs()

# Create calculator
calc = Calculator(model)
# Only use the forces data
ca = calc.create(configs, use_energy=False, use_forces=True)

# Instantiate the loss function
residual_data = {"normalize_by_natoms": False}
loss = Loss(calc, residual_data=residual_data)

##########################################################################################
# To perform UQ by bootstrapping, the general workflow starts by instantiating
# :class:`~kliff.uq.bootstrap.BootstrapEmpiricalModel`, or
# :class:`~kliff.uq.bootstrap.BootstrapNeuralNetworkModel` if using a neural network
# potential.


# Instantiate bootstrap class object
BS = BootstrapEmpiricalModel(loss, seed=1717)

##########################################################################################
# Then, we generate some bootstrap compute arguments. This is equivalent to generating
# bootstrap data. Typically, we just need to specify how many bootstrap data samples to
# generate. Additionally, if we call ``generate_bootstrap_compute_arguments`` multiple
# times, the newly generated data samples will be appended to the previously generated
# samples. This is also the behavior if we read the data samples from a previously
# exported file.


# Generate bootstrap compute arguments
BS.generate_bootstrap_compute_arguments(100)

##########################################################################################
# Finally, we will iterate over these bootstrap data samples and train the potential
# using each data sample. The resulting optimal parameters from each data sample give a
# single sample of parameters. By iterating over all data samples, we will obtain an
# ensemble of parameters.
#
# Note that the mapping from a bootstrap dataset to the parameters involves an
# optimization. We suggest using the same mapping, i.e., the same optimizer settings, in
# each iteration. This includes using the same initial parameter guess. When the loss
# function has multiple local minima, we don't want the parameter ensemble to be biased
# by the results of the other optimizations. For a neural network model, we need to
# reset the initial parameter values, which is done internally.


# Run bootstrap
min_kwargs = dict(method="lm") # Optimizer setting
initial_guess = calc.get_opt_params() # Initial guess in the optimization
BS.run(min_kwargs=min_kwargs, initial_guess=initial_guess)

##########################################################################################
# The resulting parameter ensemble can be accessed via ``BS.samples`` as a ``np.ndarray``.
# Then we can, for example, plot the distribution of the parameters or propagate the
# error to the target quantities we want to study.


# Plot the distribution of the parameters
plt.figure()
plt.plot(*(BS.samples.T), ".", alpha=0.5)
param_names = list(opt_params.keys())
plt.xlabel(param_names[0])
plt.ylabel(param_names[1])
plt.show()

##########################################################################################
# .. _OpenKIM: https://openkim.org
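Beyond plotting, the ensemble can be summarized and propagated to derived quantities. A minimal sketch, using a fabricated stand-in array in place of the real ``BS.samples`` (actual fitted values depend on the optimization, so the numbers and the derived quantity below are purely illustrative):

```python
import numpy as np

# Stand-in for the (n_samples, n_params) array BS.samples would hold after
# BS.run(); these numbers are fabricated for illustration only.
rng = np.random.default_rng(0)
samples = rng.normal(loc=[15.3, 45.5], scale=[0.5, 1.0], size=(100, 2))

# Ensemble mean and standard deviation of each parameter
means = samples.mean(axis=0)
stds = samples.std(axis=0, ddof=1)

# Propagate to a derived quantity by evaluating it on every ensemble member,
# e.g. the (hypothetical) product of the two parameters, then summarizing
# the spread of the resulting values.
derived = samples[:, 0] * samples[:, 1]
print("mean:", derived.mean(), "std:", derived.std(ddof=1))
```

This member-by-member evaluation avoids linearization: any function of the parameters inherits its full bootstrap distribution.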
1 change: 1 addition & 0 deletions docs/source/auto_examples/example_uq_bootstrap.py.md5
@@ -0,0 +1 @@
d16579e397f3c9e5d2537a623bd65313