Default threshold_outlier value for EWMA monitoring class #4

loicdtx · 2023-01-28T00:46:04Z

While experimenting with simulated time-series, I found out that the default value for the threshold_outlier argument should be significantly higher (10 could be a more reasonable value, though that would depend on the range of the monitored values too I guess). When threshold_outlier is too low, all observations following a disturbance are considered extreme outliers and the process value is no longer updated, resulting in the break never being detected.

default sensitivity also appears to be a bit low (meaning too sensitive). I obtain best performances with a sensitivity of 4; though the synthetic data may not be realistic enough to be fully conclusive on that aspect.

import datetime

import numpy as np
import xarray as xr
from nrt import data
from nrt.monitor.ewma import EWMA
import matplotlib.pyplot as plt

# Create synthetic time-series
dates = np.arange('2018-01-01', '2022-06-15', dtype='datetime64[W]')
params_ds = data.make_cube_parameters(shape=(50,50),
                                      n_outliers_interval=(0,5),
                                      n_nan_interval=(0,7),
                                      break_idx_interval=(105,dates.size - 20))
cube = data.make_cube(dates=dates, params_ds=params_ds)

# Split stable history and monitoring period
cube_history = cube.sel(time=slice('2018-01-01','2019-12-31'))
cube_monitor = cube.sel(time=slice('2020-01-01', '2022-12-31'))


# INstantiation and fitting 
EwmaMonitor = EWMA(trend=False, harmonic_order=1, lambda_=0.3, sensitivity=4, threshold_outlier=10)
EwmaMonitor.fit(dataarray=cube_history)

# Function to 
def get_monitoring_state(nrtInstance, date, array, pixel_idx):
    """Retrieve monitoring paramters of a single pixel
    """
    process = nrtInstance.process[pixel_idx]
    boundary = nrtInstance.boundary[pixel_idx]
    model = nrtInstance.predict(date)[pixel_idx]
    value = array[pixel_idx]
    mask = nrtInstance.mask[pixel_idx]
    return value, model, process, boundary, mask

process_list = []
boundary_list = []
model_list = []
value_list = []
date_list = []
mask_list = []


# Emulate monitoring
for array, date in zip(cube_monitor.values,
                       cube_monitor.time.values.astype('M8[s]').astype(datetime.datetime)):
    EwmaMonitor.monitor(array=array, date=date)
    # Collect monitoring state for visualization and inspection
    value, model, process, boundary, mask = get_monitoring_state(EwmaMonitor, date, array, (5,5))
    process_list.append(process)
    boundary_list.append(boundary)
    model_list.append(model)
    value_list.append(value)
    date_list.append(date)
    mask_list.append(mask)

fig, [ax0, ax1, ax2] = plt.subplots(3,1)
ax0.plot(date_list, model_list)
ax0.plot(date_list, value_list, marker='.', color='green')
ax1.plot(date_list, process_list, color='magenta')
ax1.plot(date_list, boundary_list, color='black')
ax2.plot(date_list, mask_list)
plt.show()

# Compute precision and recall of the monitoring simulation
def accuracy(nrtInstance, params_ds, dates, delta=180):
    """Compute accuracy metrics (precision, recall) of a nrt simulation on synthetic data

    Args:
        nrtInstance: Instance of a nrt monitoring class used for monitoring
        params_ds: Time-series generation paramaters
        dates: Array of numpy.datetime64 dates used for synthetic time-series generation
        delta: Number of days after for which a detection is still considered valid
    """
    detection_date = nrtInstance._report(layers=['detection_date'], dtype=np.uint16)
    dates_true = np.where(params_ds.break_idx != -1,
                          dates[params_ds.break_idx.values],
                          np.datetime64('NaT'))
    dates_true_bound = dates_true + np.timedelta64(delta)
    dates_pred = np.datetime64('1970-01-01') + np.timedelta64(1) * detection_date
    dates_pred[dates_pred == np.datetime64('1970-01-01')] = np.datetime64('NaT')
    # COmputes arrays of TP, FP, FN (they should be mutually exclusive
    TP = np.where(np.logical_and(dates_pred >= dates_true, dates_pred <= dates_true_bound),
                  1, 0)
    FP = np.where(np.logical_and(TP == 0, ~np.isnat(dates_pred)), 1, 0)
    FN = np.where(np.logical_and(np.isnat(dates_pred), ~np.isnat(dates_true)), 1, 0)
    precision = TP.sum() / (TP.sum() + FP.sum())
    recall = TP.sum() / (TP.sum() + FN.sum())
    return precision, recall

print(accuracy(EwmaMonitor, params_ds, dates))

The text was updated successfully, but these errors were encountered:

loicdtx · 2023-01-28T00:47:52Z

Example with threshold of 2:

First panel: obs + fit
Second panel: process and boundary
Third panel: mask value

- Small bug fix in report method Fixes #2 and #4

loicdtx added a commit that referenced this issue Mar 16, 2023

- Change default threshold_outlier value for EWMA

4a7c078

- Small bug fix in report method Fixes #2 and #4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default threshold_outlier value for EWMA monitoring class #4

Default threshold_outlier value for EWMA monitoring class #4

loicdtx commented Jan 28, 2023 •

edited

Loading

loicdtx commented Jan 28, 2023

Default threshold_outlier value for EWMA monitoring class #4

Default threshold_outlier value for EWMA monitoring class #4

Comments

loicdtx commented Jan 28, 2023 • edited Loading

loicdtx commented Jan 28, 2023

loicdtx commented Jan 28, 2023 •

edited

Loading