Commit

final
juanitorduz committed Oct 6, 2024
1 parent aee428d commit e97d0bc
Showing 2 changed files with 76 additions and 46 deletions.
20 changes: 10 additions & 10 deletions Python/electricity_forecast.ipynb
@@ -951,9 +951,18 @@
"The ELBO loss is decreasing as expected."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Posterior Predictive Checks\n",
"\n",
"We now generate samples for the training and test data. We are interested in both the likelihood (demand) and the posterior distribution of the temperature effect."
]
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -978,15 +987,6 @@
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Posterior Predictive Checks\n",
"\n",
"We now generate samples for the training and test data. We are interested in both the likelihood (demand) and the posterior distribution of the temperature effect."
]
},
{
"cell_type": "code",
"execution_count": 18,
102 changes: 66 additions & 36 deletions Python/electricity_forecast_with_priors.ipynb
@@ -4,9 +4,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Electricity Demand Forecast: Dynamic Time-Series Model\n",
"# Electricity Demand Forecast: Dynamic Time-Series Model with Prior Calibration\n",
"\n",
"We work out a classical electricity demand forecasting model form the case study [Structural Time Series Modeling Case Studies: Atmospheric CO2 and Electricity Demand](https://www.tensorflow.org/probability/examples/Structural_Time_Series_Modeling_Case_Studies_Atmospheric_CO2_and_Electricity_Demand) from the TensorFlow Probability documentation. The idea of this example is to use temperature as a linear covariate to model the electricity demand. In this example, we show how to use a (Hilbert Space Approximation) Gaussian process to model the non-linear relationship between temperature and electricity demand (for an introduction to the topic see [A Conceptual and Practical Introduction to Hilbert Space GPs Approximation Methods](https://juanitorduz.github.io/hsgp_intro/)). This technique improves the simple linear model in and out of sample predictions as we aro not using the Gaussian process to extrapolate over time, but rather to model the non-linear relationship between temperature and electricity demand, similarly as how it has done in the example [Time-Varying Regression Coefficients via Hilbert Space Gaussian Process Approximation](https://juanitorduz.github.io/bikes_gp/). "
"The model we present here is a dynamic forecasting time-series model that incorporates a prior calibration process to estimate the temperature effect on electricity demand. The model is based on the previous example [Electricity Demand Forecast: Dynamic Time-Series Model](https://juanitorduz.github.io/electricity_forecast/). In this second iteration, we borrow the ideas from the Pyro great example [Forecasting with Dynamic Linear Model (DLM)](https://pyro.ai/examples/forecasting_dlm.html) where they use a prior calibration process on a local level forecasting model. In our case, we use the same technique with a Hilbert Space Gaussian Process latent component model.\n",
"\n",
"We encourage the reader to check the previous example to have a better understanding of the model. Here we focus on the calibration procedure.\n"
]
},
{
@@ -58,8 +60,7 @@
"source": [
"## Load Data\n",
"\n",
"We load the data explicitly as in the Tensorflow Probability example. We reference the original comment:\n",
"> *\"Victoria electricity demand dataset, as presented at https://otexts.com/fpp2/scatterplots.html and downloaded from https://github.com/robjhyndman/fpp2-package/blob/master/data/elecdaily.rda . This series contains the first eight weeks (starting Jan 1). The original dataset was half-hourly data; here we've downsampled to hourly data by taking every other timestep.\"*"
"We load the data as in the previous example. "
]
},
{
@@ -136,14 +137,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We clearly see an overall positive correlation between temperature and electricity demand. This can be particularly seen when we plot both demand and temperature in the same (twin) axis."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course there are strong seasonal effects hidden in these plots. Therefore we want to use a model that can capture thee temperature effect while controlling for other factors. "
"As a reminder, we are interested in a forecasting model where use temperature as a regressor. In addition, we want to understand the non-linear effect of temperature on demand."
]
},
{
@@ -152,7 +146,7 @@
"source": [
"## Training and Test Data\n",
"\n",
"We split the data as in the original example. In addition, we create a `day_of_week` feature to include in the model later.\n"
"To prepare the data for the model, we do a simple train-test split and create some seasonal features."
]
},
{
@@ -247,14 +241,21 @@
"source": [
"## Model Specification\n",
"\n",
"Here is a description of the modeling strategy:\n",
"Here is a reminder of the base model components:\n",
"\n",
"- We use a linear model to predict demand as a function of temperature and two seasonal effects: hour of day and day of week. We use Zero-Sum Normal distributions to model these seasonal effects.\n",
"- We use a Matérn 5/2 kernel to model the temperature effect on demand using the Hilbert Space Gaussian Process (HSGP) approximation from NumPyro (see [here](https://num.pyro.ai/en/stable/contrib.html#hilbert-space-gaussian-processes-approximation)).\n",
"- The noise scale will vary with the temperature.\n",
"- We use a Student-t distribution to model the residual error."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For one of the seasonal components we need the next handy utility function:"
]
},
{
"cell_type": "code",
"execution_count": 6,
@@ -303,22 +304,37 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are ready to specify the NumPyro model.\n"
"### Domain Knowledge Priors\n",
"\n",
"Let us assume that we know from domain knowledge (science or a natural experiment, for example when the electricity system broke in a similar area and we could estimate lifts) that the effect of temperature on demand over $32$°C is somehow stable at around a value of $0.13$. We believe this is useful information and in our previous baseline model the effect at this regime was increasing linearly from $0.11$ to $0.15$. The baseline still oscillates around the expected value of $0.13$, but we want to make sure our effect is very conservative at this regime.\n",
"\n",
"In order to do so, the idea presented in the Pyro example [Forecasting with Dynamic Linear Model (DLM)](https://pyro.ai/examples/forecasting_dlm.html) is to add an additional an likelihood for the temperature range of interest where we set the latent effect variable as `observed`.\n",
"\n",
"Let's look into the details of this approach. The baseline model is exactly the same as in our example before [Electricity Demand Forecast: Dynamic Time-Series Model](https://juanitorduz.github.io/electricity_forecast/).\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"# Set the temperature threshold for the prior\n",
"temperature_threshold_prior = 32.0\n",
"\n",
"# Get the indices of the temperature data that is over the threshold\n",
"temperature_prior_idx = jnp.where(\n",
" temperature_training_data > temperature_threshold_prior\n",
")[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are ready to specify the NumPyro model with the additional likelihood for the temperature prior.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
@@ -383,7 +399,7 @@
" # effect on demand for very hot days\n",
" if temperature_prior_idx is not None:\n",
" temperature_prior_effect_mean = 0.13\n",
" temperature_prior_effect_scale = 0.01\n",
" temperature_prior_effect_scale = 0.01 # <- Uncertainty of the prior\n",
" numpyro.sample(\n",
" \"temperature_prior\",\n",
" dist.Normal(\n",
@@ -406,6 +422,13 @@
" numpyro.sample(\"obs\", dist.StudentT(df=nu, loc=mu, scale=scale), obs=demand)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the model is exactly the same as in the previous example, we go directly to inference (even though we should *always* run the prior predictive checks)."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -473,9 +496,18 @@
"The ELBO loss is decreasing as expected."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Posterior Predictive Checks\n",
"\n",
"We now collect the posterior samples for the in and out of sample data."
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -501,15 +533,6 @@
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Posterior Predictive Checks\n",
"\n",
"We now generate samples for the training and test data. We are interested in both the likelihood (demand) and the posterior distribution of the temperature effect."
]
},
{
"cell_type": "code",
"execution_count": 11,
@@ -713,7 +736,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The predictions look very good! They are actually look much better that the basic linear model used in the TensorFlow Probability tutorial (which is fine as they are focusing on the core API description)."
"The forecasting metrics are pretty much the same as in the previous uncalibrated model (a really minor difference)."
]
},
{
@@ -722,14 +745,7 @@
"source": [
"### Temperature Effect on Demand\n",
"\n",
"Being happy about the forecast performance, we can dig deeper into the temperature effect. First we simply plot the predictions and the raw values."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The non-linearity is clearly visible! Nevertheless, we are interested in the laten unseen variable between temperature and demand. In therms of the model, we are interested in the posterior distribution of the Gaussian Process component."
"finally, let's plot the temperature effect on demand."
]
},
{
@@ -808,6 +824,20 @@
" ylabel=\"Effect on Demand\",\n",
");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The overall shape of the curve resembles the one in the previous baseline uncalibrated model. Still, we see that the effect of the priors is making the estimate for temperatures over $32$°C to be more conservative around the expected value of $0.13$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Observe that this opens great opportunities to calibrate forecasting models with domain knowledge, possible extracted from experimental or observational data."
]
}
],
"metadata": {
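The calibration idea in `electricity_forecast_with_priors.ipynb` (an extra Normal likelihood with mean $0.13$ and scale $0.01$ on the latent temperature effect for observations above $32$°C) can be illustrated with a minimal, dependency-free sketch. This is not the notebook's NumPyro code: the toy temperatures, the two effect curves, and the helper names `normal_logpdf` and `calibration_log_density` are hypothetical, and the sketch only shows how such a term favors a latent effect that stays close to the prior mean.

```python
import math

# Hypothetical toy temperatures (not taken from the notebook's data).
temperature = [18.0, 25.0, 33.0, 35.5, 38.0]
threshold = 32.0  # same value as `temperature_threshold_prior` in the diff

# Positions of the observations above the threshold
# (the notebook obtains these with `jnp.where(...)[0]`).
prior_idx = [i for i, t in enumerate(temperature) if t > threshold]


def normal_logpdf(x: float, loc: float, scale: float) -> float:
    """Log-density of a Normal(loc, scale) distribution evaluated at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def calibration_log_density(effect, idx, loc=0.13, scale=0.01):
    """Sum of Normal log-densities of the latent effect at the selected positions.

    This mimics what an extra likelihood term contributes to the joint
    log-density when the latent effect is passed as the observed value.
    """
    return sum(normal_logpdf(effect[i], loc, scale) for i in idx)


# Two hypothetical latent effect curves evaluated at the five temperatures.
drifting_effect = [0.02, 0.06, 0.11, 0.13, 0.15]  # keeps increasing above 32 °C
flat_effect = [0.02, 0.06, 0.13, 0.13, 0.13]  # stable above 32 °C

drifting_term = calibration_log_density(drifting_effect, prior_idx)
flat_term = calibration_log_density(flat_effect, prior_idx)

print(f"selected positions: {prior_idx}")
print(f"flat - drifting log-density: {flat_term - drifting_term:.3f}")
```

Under these assumed values the flat curve scores higher, because two of the three selected points of the drifting curve sit two prior standard deviations away from $0.13$. A smaller scale would penalize such deviations more strongly, which is why the scale acts as the uncertainty of the prior.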
