Model data as beliefs (at a certain time) about events (at a certain time).
The timely-beliefs package provides a convenient data model for numerical time series that is both simple enough for humans to understand and sufficiently rich for forecasting and machine learning.
The data model is an extended pandas DataFrame that assigns properties and index levels to describe:
- What the data is about
- Who (or what) created the data
- When the data was created
- How certain they were
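As a rough mental model (a sketch only, not the package's internal representation), each row of a BeliefsDataFrame can be thought of as one record answering those four questions, plus the value itself. The Belief class below is hypothetical; its field names simply mirror the index levels described further down:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Belief:
    """Hypothetical record mirroring one row of a BeliefsDataFrame."""

    event_start: datetime          # what the data is about (the event)
    source: str                    # who (or what) created the data
    belief_time: datetime          # when the data was created
    cumulative_probability: float  # how certain the source was
    event_value: float             # the believed value


utc = timezone.utc
belief = Belief(
    event_start=datetime(2000, 3, 5, 11, tzinfo=utc),
    source="Thermometer",
    belief_time=datetime(2000, 3, 5, 11, tzinfo=utc),
    cumulative_probability=0.5,  # the value shown for the single belief in the getting-started example
    event_value=21,
)
print(belief.source, belief.event_value)
```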
Getting started (or try one of the other ways to create a BeliefsDataFrame):
>>> import timely_beliefs as tb
>>> df = tb.BeliefsDataFrame([tb.TimedBelief(tb.Sensor("Indoor temperature", "°C"), tb.BeliefSource("Thermometer"), 21, event_time="2000-03-05 11:00Z", belief_horizon="0H")])
>>> print(df)
                                                                                        event_value
event_start               belief_time               source      cumulative_probability
2000-03-05 11:00:00+00:00 2000-03-05 11:00:00+00:00 Thermometer 0.5                              21
sensor: <Sensor: Indoor temperature>, event_resolution: 0:00:00
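In the printout above, belief_time equals event_start because the belief was stated with a zero horizon. A minimal sketch of that relationship, assuming (not confirmed here) that belief_horizon is measured back from the event's knowledge time and that knowledge time defaults to the event end (event_start + event_resolution). The function belief_time_from_horizon is hypothetical:

```python
from datetime import datetime, timedelta, timezone


def belief_time_from_horizon(
    event_start: datetime,
    event_resolution: timedelta,
    belief_horizon: timedelta,
) -> datetime:
    """Hypothetical helper: belief_time = knowledge_time - belief_horizon.

    Assumes knowledge_time is the end of the event.
    """
    knowledge_time = event_start + event_resolution
    return knowledge_time - belief_horizon


start = datetime(2000, 3, 5, 11, tzinfo=timezone.utc)
# Zero resolution and zero horizon, as in the example above: belief_time == event_start.
print(belief_time_from_horizon(start, timedelta(0), timedelta(0)))
# A 15-minute event believed 2 hours ahead of its end.
print(belief_time_from_horizon(start, timedelta(minutes=15), timedelta(hours=2)))
```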
The package contains the following functionality:
- A model for time series data, suitable for a notebook or a database-backed program (using sqlalchemy)
- Selecting/querying beliefs, e.g. those held at a certain moment in time
- Computing accuracy, e.g. against after-the-fact knowledge (this also works for probabilistic forecasts)
- Resampling time series with uncertainty (experimental)
- Visualising time series and accuracy metrics (experimental)
Some use cases of the package:
- Clearly distinguish forecasts from rolling forecasts.
- Analyse your predictive power by showing forecast accuracy as you approach an event.
- Learn when someone is a bad predictor.
- Evaluate the risk of being wrong about an event.
Check out our interactive demonstration comparing forecasting models for renewable energy production. These visuals are created simply by calling the plot method on our BeliefsDataFrame, using the visualisation library Altair.
The BeliefsDataFrame is the basic data model that represents data as probabilistic beliefs about events. It is an extended pandas DataFrame with the following index levels:
- event_start: keeping track of the time of whatever it is that the data point describes (an event)
- belief_time: keeping track of the time at which the data point was created (a belief)
- source: keeping track of who or what created the data point (a source)
- cumulative_probability: keeping track of the confidence in the data point (a probability)
Together these index levels describe data points as probabilistic beliefs. Because pandas sparsifies the display of index levels by default (its display.multi_sparse option), repeated labels are left blank and the data looks clean, as this printout of the example BeliefsDataFrame from our examples module shows:
>>> import timely_beliefs as tb
>>> df = tb.examples.get_example_df()
>>> df.head(8)
                                                                                     event_value
event_start               belief_time               source   cumulative_probability
2000-01-03 09:00:00+00:00 2000-01-01 00:00:00+00:00 Source A 0.1587                           90
                                                             0.5000                          100
                                                             0.8413                          110
                                                    Source B 0.5000                            0
                                                             1.0000                          100
                          2000-01-01 01:00:00+00:00 Source A 0.1587                           99
                                                             0.5000                          100
                                                             0.8413                          101
sensor: <Sensor: weight>, event_resolution: 0:15:00
The first 8 entries of this BeliefsDataFrame show beliefs about a single event. Beliefs were formed by two distinct sources (A and B), with Source A updating its beliefs at a later time. Source A first thought the value of this event would be 100 ± 10 (the probabilities suggest a normal distribution), and then became more certain, lowering the standard deviation to 1. Source B thought the value would be equally likely to be 0 or 100.
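The cumulative probabilities 0.1587 and 0.8413 are what a normal distribution gives one standard deviation below and above its mean. A quick check with Python's standard library (statistics.NormalDist), using the means and standard deviations read off the printout:

```python
from statistics import NormalDist

# Source A's first belief: mean 100, standard deviation 10.
first = NormalDist(mu=100, sigma=10)
first_cdfs = [round(first.cdf(x), 4) for x in (90, 100, 110)]

# Source A's updated belief: mean 100, standard deviation 1.
updated = NormalDist(mu=100, sigma=1)
updated_cdfs = [round(updated.cdf(x), 4) for x in (99, 100, 101)]

print(first_cdfs, updated_cdfs)
```

Both lists match the cumulative_probability levels in the printout, which is why the values 90/100/110 and 99/100/101 read as a mean of 100 with a standard deviation of 10 and 1, respectively.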
More information about what actually constitutes an event is stored as metadata in the BeliefsDataFrame, which is printed out just below the frame. The sensor property keeps track of invariable information such as the unit of the data and the resolution of events.
>>> df.sensor.unit
'kg'
Currently, a BeliefsDataFrame contains data about a single sensor only. For a future release we are considering adding the sensor as another index level, to offer out-of-the-box support for aggregating over multiple sensors.
- Read more about how to create a BeliefsDataFrame.
- Read more about how the DataFrame is keeping track of time.
- Read more about how the DataFrame is keeping track of confidence.
- Discover convenient slicing methods (e.g. to show a rolling horizon forecast).
- Serve your data fast by resampling (while taking into account auto-correlation).
- Track where your data comes from, by following its lineage.
All of the above can be done with TimedBelief objects in a BeliefsDataFrame.
However, if you are dealing with a lot of data and need performance, you'll want to persist your belief data in a database.
Read more about how timely-beliefs supports this.
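The package's database support is built on SQLAlchemy. Purely to illustrate the idea, the sketch below uses Python's built-in sqlite3 module with a hypothetical timed_beliefs table (not the package's actual schema), keyed on the four index levels, plus a query for the beliefs each source held at a given moment. The rows reuse a few median values from the example printout:

```python
import sqlite3

# Hypothetical table: one row per belief, keyed like the BeliefsDataFrame index.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE timed_beliefs (
        event_start TEXT NOT NULL,
        belief_time TEXT NOT NULL,
        source TEXT NOT NULL,
        cumulative_probability REAL NOT NULL,
        event_value REAL NOT NULL,
        PRIMARY KEY (event_start, belief_time, source, cumulative_probability)
    )
    """
)
rows = [
    ("2000-01-03T09:00:00+00:00", "2000-01-01T00:00:00+00:00", "Source A", 0.5, 100.0),
    ("2000-01-03T09:00:00+00:00", "2000-01-01T01:00:00+00:00", "Source A", 0.5, 100.0),
    ("2000-01-03T09:00:00+00:00", "2000-01-01T00:00:00+00:00", "Source B", 0.5, 0.0),
]
conn.executemany("INSERT INTO timed_beliefs VALUES (?, ?, ?, ?, ?)", rows)

# Beliefs held at a moment: per event and source, the latest belief_time not after it.
# All timestamps share one ISO 8601 format and UTC offset, so text comparison is chronological.
query = """
SELECT b.source, b.belief_time, b.event_value
FROM timed_beliefs AS b
WHERE b.belief_time = (
    SELECT MAX(b2.belief_time) FROM timed_beliefs AS b2
    WHERE b2.event_start = b.event_start
      AND b2.source = b.source
      AND b2.belief_time <= ?
)
ORDER BY b.source
"""
held = conn.execute(query, ("2000-01-01T00:30:00+00:00",)).fetchall()
print(held)
```

A composite primary key over the index levels keeps each (event, belief time, source, probability) combination unique, just as each row of the BeliefsDataFrame is identified by its index.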
The accuracy of a belief is defined with respect to some reference. The default reference is the most recent belief held by the same source, but it is possible to set beliefs held by a specific source at a specific time to serve as the reference instead.
There are two common use cases for wanting to know the accuracy of beliefs, each with a different viewpoint.
With a rolling viewpoint, you get the accuracy of beliefs at a certain belief_horizon before (or after) knowledge_time, for example, some days before each event ends.
>>> from datetime import timedelta
>>> df.rolling_viewpoint_accuracy(timedelta(days=2, hours=9), reference_source=df.lineage.sources[0])
                 mae      mape      wape
source
Source A    1.482075  0.014821  0.005928
Source B  125.853250  0.503413  0.503413
sensor: <Sensor: weight>, event_resolution: 0:15:00
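Conceptually, a rolling viewpoint needs, for every event, the most recent belief formed at least belief_horizon before that event's knowledge time. A stdlib sketch of that selection step only (no accuracy computation), under two assumptions: this is how beliefs are selected, and knowledge time equals the event end. The function rolling_viewpoint, the tuple layout and the belief values are hypothetical and illustrative:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

# Hypothetical layout: (event_start, belief_time, value), all from one source.
BeliefRow = Tuple[datetime, datetime, float]


def rolling_viewpoint(
    beliefs: List[BeliefRow],
    event_resolution: timedelta,
    belief_horizon: timedelta,
) -> Dict[datetime, float]:
    """For each event, keep the latest belief formed at least belief_horizon before the event end."""
    selected: Dict[datetime, Tuple[datetime, float]] = {}
    for event_start, belief_time, value in beliefs:
        knowledge_time = event_start + event_resolution  # assumption: knowledge at event end
        if knowledge_time - belief_time < belief_horizon:
            continue  # formed too close to (or after) the knowledge time for this horizon
        best = selected.get(event_start)
        if best is None or belief_time > best[0]:
            selected[event_start] = (belief_time, value)
    return {event: value for event, (_, value) in selected.items()}


utc = timezone.utc
event = datetime(2000, 1, 3, 9, tzinfo=utc)
beliefs = [
    (event, datetime(2000, 1, 1, 0, tzinfo=utc), 100.0),  # illustrative values
    (event, datetime(2000, 1, 1, 1, tzinfo=utc), 101.0),
]
print(rolling_viewpoint(beliefs, timedelta(minutes=15), timedelta(days=2, hours=9)))
```

With a horizon of 2 days and 9 hours, only the belief formed at midnight is early enough (it precedes the assumed knowledge time of 09:15 by 2 days, 9 hours and 15 minutes); with a zero horizon, the later belief would be selected instead.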
With a fixed viewpoint, you get the accuracy of beliefs held at a certain belief_time.
>>> from datetime import datetime
>>> import pytz
>>> accuracy_df = df.fixed_viewpoint_accuracy(datetime(2000, 1, 2, tzinfo=pytz.utc), reference_source=df.lineage.sources[0])
>>> accuracy_df
                mae      mape      wape
source
Source A    0.00000  0.000000  0.000000
Source B  125.85325  0.503413  0.503413
sensor: <Sensor: weight>, event_resolution: 0:15:00
For an intuitive representation of accuracy that works in many cases, we suggest using:
>>> accuracy_df["accuracy"] = 1 - accuracy_df["wape"]
A more detailed discussion of accuracy and error metrics can be found here.
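For reference, the three columns are commonly defined as follows, with f the believed values and r the reference values: MAE is the mean of |f - r|, MAPE is the mean of |f - r| / |r|, and WAPE is the sum of |f - r| divided by the sum of |r|. This is a minimal stdlib sketch of those standard formulas for deterministic values only; it is not necessarily how timely-beliefs computes them (in particular for probabilistic beliefs), and the function error_metrics is hypothetical:

```python
from typing import Dict, Sequence


def error_metrics(forecasts: Sequence[float], references: Sequence[float]) -> Dict[str, float]:
    """Standard MAE, MAPE and WAPE for paired deterministic values."""
    if len(forecasts) != len(references) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    abs_errors = [abs(f - r) for f, r in zip(forecasts, references)]
    n = len(abs_errors)
    mae = sum(abs_errors) / n
    # MAPE divides each error by its own reference value (undefined for zero references).
    mape = sum(e / abs(r) for e, r in zip(abs_errors, references)) / n
    # WAPE divides the total error by the total magnitude of the references.
    wape = sum(abs_errors) / sum(abs(r) for r in references)
    return {"mae": mae, "mape": mape, "wape": wape}


metrics = error_metrics([90.0, 105.0], [100.0, 100.0])
print(metrics)
print(1 - metrics["wape"])  # the "accuracy" representation suggested above
```

When all reference values have the same magnitude, MAPE and WAPE coincide; WAPE is less sensitive to individual small reference values, which is one reason 1 - WAPE makes a reasonably robust accuracy figure.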
To enable forecast support, use pip install timely-beliefs[forecast] to install the required dependencies.
New forecasts can be generated from a given BeliefsDataFrame by passing an sktime forecaster to the form_beliefs method. This method takes a belief_time and an event_start (for a single forecast) or event_time_window (for a number of forecasts from a fixed viewpoint). The source defines how the forecast should be attributed in the resulting BeliefsDataFrame.
This feature currently only supports BeliefsDataFrames containing a single deterministic belief per event, by a single source.
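Before calling form_beliefs, it may be worth checking this precondition yourself. Below is a hypothetical stdlib helper (not part of timely-beliefs) that works on plain index tuples rather than a real BeliefsDataFrame, and interprets "a single deterministic belief per event, by a single source" as exactly one source and exactly one row per event_start:

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Tuple

# Hypothetical index layout: (event_start, belief_time, source, cumulative_probability).
IndexRow = Tuple[datetime, datetime, str, float]


def supports_forecasting(index_rows: Iterable[IndexRow]) -> bool:
    """Return True if there is one source and exactly one belief (row) per event."""
    rows = list(index_rows)
    if not rows:
        return False
    sources = {source for _, _, source, _ in rows}
    beliefs_per_event = Counter(event_start for event_start, _, _, _ in rows)
    return len(sources) == 1 and all(count == 1 for count in beliefs_per_event.values())


utc = timezone.utc
t0 = datetime(2000, 1, 3, 9, tzinfo=utc)
t1 = datetime(2000, 1, 3, 9, 15, tzinfo=utc)
b = datetime(2000, 1, 1, tzinfo=utc)
deterministic = [(t0, b, "Source A", 0.5), (t1, b, "Source A", 0.5)]
probabilistic = [(t0, b, "Source A", 0.1587), (t0, b, "Source A", 0.5)]
print(supports_forecasting(deterministic), supports_forecasting(probabilistic))
```

Counting rows per event catches both probabilistic beliefs (several cumulative probabilities per event) and repeated beliefs formed at different belief times.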
Create interactive charts using Altair and view them in your browser.
>>> chart = df.plot(reference_source=df.lineage.sources[0], show_accuracy=True)
>>> chart.serve()
This will create an interactive Vega-Lite chart like the one in the screenshot at the top of this Readme.
Read more about built-in visualisation such as ridgeline plots.
The timely_beliefs package runs on pandas>=1.1.5.
Contact us if you need support for older versions.
We welcome other contributions to timely_beliefs.