-
Here is a Julia function:

```julia
function f(tree, dataset::Dataset{T,L}, options) where {T,L}
    X, completed = eval_tree_array(tree, dataset.X, options)
    if !completed
        return L(Inf)
    end
    Y = dataset.y
    # DTW algorithm with a Sakoe-Chiba band
    window_size = 9
    n = length(X)
    max_size = max(length(X), length(Y))
    # Initialize the DTW rows with infinite values
    dtw_current = fill(Inf, max_size + 2 * window_size)
    dtw_previous = fill(Inf, max_size + 2 * window_size)
    dtw_previous[1] = 0.0  # Boundary condition
    for i in 1:n
        # Reset the whole current row, so stale values left over from
        # two iterations back (via the row swap) cannot leak into the band
        fill!(dtw_current, Inf)
        # Determine the iteration bounds considering the window size
        start = max(1, i - window_size)
        finish = min(n, i + window_size)
        for j in start:finish
            cost = abs(X[i] - Y[j])
            # Compute minimum cost considering the DTW constraint
            min_cost = min(
                dtw_previous[j],
                dtw_current[j],
                j + 1 <= length(Y) ? dtw_previous[j + 1] : Inf,
            )
            dtw_current[j + 1] = cost + min_cost
        end
        # Swap the rows for the next iteration
        dtw_previous, dtw_current = dtw_current, dtw_previous
    end
    return L(dtw_previous[n + 1])
end
```

I have been evaluating it today and note that it has some nice properties in accommodating corrections to a found solution. For example, using RMSE, I found the … A question I have is about the possibility of passing in the window size as a parameter.
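For reference, the same banded recurrence (a Sakoe-Chiba window) can be sketched in plain Python. This is only an illustrative re-implementation, not a drop-in for the Julia loss above:

```python
import math

def dtw_banded(x, y, window_size=9):
    """Windowed DTW distance between sequences x and y.

    Cells outside the |i - j| <= window_size band are treated as
    unreachable (infinite cost), i.e. a Sakoe-Chiba band.
    """
    n, m = len(x), len(y)
    # Row index 0 is the boundary column; entries 1..m hold the band.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # boundary condition: empty prefixes align at zero cost
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)  # reset the whole row each step
        lo = max(1, i - window_size)
        hi = min(m, i + window_size)
        for j in range(lo, hi + 1):
            cost = abs(x[i - 1] - y[j - 1])
            curr[j] = cost + min(prev[j - 1],   # match (diagonal)
                                 curr[j - 1],   # insertion
                                 prev[j])       # deletion
        prev = curr
    return prev[m]
```

Identical sequences give a distance of 0, and the warping lets a single sample absorb a repeated neighbor, which is what makes the metric forgiving of small timing shifts.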
-
Very cool! I'm happy to hear you have had good results with this. I will try it out too when I get a chance.
You can customize the backend; see https://astroautomata.com/PySR/backend/ for a walkthrough of how to do this. If you want something simple and quick from within PySR, another option is to simply do a search-and-replace on the string:

```python
loss_function_base = """
function f(tree, dataset::Dataset{T,L}, options) where {T,L}
    X, completed = eval_tree_array(tree, dataset.X, options)
    if !completed
        return L(Inf)
    end
    Y = dataset.y
    # DTW algorithm with a Sakoe-Chiba band
    window_size = WINDOW_SIZE
    n = length(X)
    max_size = max(length(X), length(Y))
    # Initialize the DTW rows with infinite values
    dtw_current = fill(Inf, max_size + 2 * window_size)
    dtw_previous = fill(Inf, max_size + 2 * window_size)
    dtw_previous[1] = 0.0  # Boundary condition
    for i in 1:n
        fill!(dtw_current, Inf)  # Reset the whole current row
        # Determine the iteration bounds considering the window size
        start = max(1, i - window_size)
        finish = min(n, i + window_size)
        for j in start:finish
            cost = abs(X[i] - Y[j])
            # Compute minimum cost considering the DTW constraint
            min_cost = min(
                dtw_previous[j],
                dtw_current[j],
                j + 1 <= length(Y) ? dtw_previous[j + 1] : Inf,
            )
            dtw_current[j + 1] = cost + min_cost
        end
        # Swap the rows for the next iteration
        dtw_previous, dtw_current = dtw_current, dtw_previous
    end
    return L(dtw_previous[n + 1])
end
"""
```

followed by

```python
loss_function = loss_function_base.replace("WINDOW_SIZE", "9")
```

to pass in your parameter (and then pass that to PySR).
-
I've been using a home-grown optimizer for non-linear modeling of time-series data. I discovered that a loss function built on a Dynamic Time Warping metric is very helpful for guarding against over-fitting; see my blog post https://geoenergymath.com/2024/03/08/dynamic-time-warping. A DTW metric as a PySR loss function may also be very useful in an exploratory mode, as it can narrow in on a potential solution more quickly. It wouldn't be difficult to prototype as a custom loss function with a configurable window size. A code snippet is in the following gist: https://gist.github.com/pukpr/fcc9ba38c5f92bde0b53dc95c44ff7dc -- note that it outputs 1/cost because I was experimenting with a maximizing goal instead of a minimizing one.
What I don't understand as well is how to normalize the DTW metric so it acts like a correlation coefficient. I describe my attempt in the first link.
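For what it's worth, one heuristic (my own assumption, not something from the blog post) is to normalize the accumulated cost by a bound on the warping-path length and then map it into (0, 1], so that identical series score exactly 1:

```python
def dtw_similarity(d, n, m):
    """Map a DTW cost d between sequences of length n and m into (0, 1].

    Dividing by (n + m) roughly compensates for sequence length (the
    warping path has between max(n, m) and n + m - 1 steps); the
    1 / (1 + x) map then sends cost 0 to similarity 1. This is a
    heuristic score, not a true correlation coefficient.
    """
    per_step = d / (n + m)
    return 1.0 / (1.0 + per_step)
```

Larger costs decay smoothly toward 0, which also gives a bounded quantity to maximize instead of the raw 1/cost.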
See this 2024 paper as well: https://arxiv.org/pdf/2401.10359.pdf