-
Here is a Julia function:

```julia
function f(tree, dataset::Dataset{T,L}, options) where {T,L}
    X, completed = eval_tree_array(tree, dataset.X, options)
    if !completed
        return L(Inf)
    end
    Y = dataset.y
    # DTW algorithm with a Sakoe-Chiba band
    window_size = 9
    n = length(X)
    max_size = max(length(X), length(Y))
    # Initialize the DTW rows with infinite values
    dtw_current = fill(Inf, max_size + 2 * window_size)
    dtw_previous = fill(Inf, max_size + 2 * window_size)
    dtw_previous[1] = 0.0  # Boundary condition
    for i in 1:n
        # Reset the whole current row, so stale values left over from
        # two iterations back (via the row swap) cannot leak into the band
        fill!(dtw_current, Inf)
        # Determine the iteration bounds considering the window size
        start = max(1, i - window_size)
        finish = min(n, i + window_size)
        for j in start:finish
            cost = abs(X[i] - Y[j])
            # Compute minimum cost considering the DTW constraint
            min_cost = min(
                dtw_previous[j],
                dtw_current[j],
                j + 1 <= length(Y) ? dtw_previous[j + 1] : Inf,
            )
            dtw_current[j + 1] = cost + min_cost
        end
        # Swap the rows for the next iteration
        dtw_previous, dtw_current = dtw_current, dtw_previous
    end
    return L(dtw_previous[n + 1])
end
```

I have been evaluating it today and note that it has some nice properties in accommodating corrections to a found solution. For example, using RMSE, I found the … A question I have is about the possibility of passing in the window size as a parameter.
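For reference, the same banded recurrence (a Sakoe-Chiba window) can be sketched in plain Python. This is only an illustrative re-implementation, not a drop-in for the Julia loss above:

```python
import math

def dtw_banded(x, y, window_size=9):
    """Windowed DTW distance between sequences x and y.

    Cells outside the |i - j| <= window_size band are treated as
    unreachable (infinite cost), i.e. a Sakoe-Chiba band.
    """
    n, m = len(x), len(y)
    # Row index 0 is the boundary column; entries 1..m hold the band.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # boundary condition: empty prefixes align at zero cost
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)  # reset the whole row each step
        lo = max(1, i - window_size)
        hi = min(m, i + window_size)
        for j in range(lo, hi + 1):
            cost = abs(x[i - 1] - y[j - 1])
            curr[j] = cost + min(prev[j - 1],   # match (diagonal)
                                 curr[j - 1],   # insertion
                                 prev[j])       # deletion
        prev = curr
    return prev[m]
```

Identical sequences give a distance of 0, and the warping lets a single sample absorb a repeated neighbor, which is what makes the metric forgiving of small timing shifts.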
-
Very cool! I'm happy to hear you have had good results with this. I will try it out too when I get a chance.
You can customize the backend; see https://astroautomata.com/PySR/backend/ for a walkthrough of how to do this. If you want something simple and quick from within PySR, another option is to simply do a search-and-replace on the string:

```python
loss_function_base = """
function f(tree, dataset::Dataset{T,L}, options) where {T,L}
    X, completed = eval_tree_array(tree, dataset.X, options)
    if !completed
        return L(Inf)
    end
    Y = dataset.y
    # DTW algorithm with a Sakoe-Chiba band
    window_size = WINDOW_SIZE
    n = length(X)
    max_size = max(length(X), length(Y))
    # Initialize the DTW rows with infinite values
    dtw_current = fill(Inf, max_size + 2 * window_size)
    dtw_previous = fill(Inf, max_size + 2 * window_size)
    dtw_previous[1] = 0.0  # Boundary condition
    for i in 1:n
        fill!(dtw_current, Inf)  # Reset the whole current row
        # Determine the iteration bounds considering the window size
        start = max(1, i - window_size)
        finish = min(n, i + window_size)
        for j in start:finish
            cost = abs(X[i] - Y[j])
            # Compute minimum cost considering the DTW constraint
            min_cost = min(
                dtw_previous[j],
                dtw_current[j],
                j + 1 <= length(Y) ? dtw_previous[j + 1] : Inf,
            )
            dtw_current[j + 1] = cost + min_cost
        end
        # Swap the rows for the next iteration
        dtw_previous, dtw_current = dtw_current, dtw_previous
    end
    return L(dtw_previous[n + 1])
end
"""
```

followed by

```python
loss_function = loss_function_base.replace("WINDOW_SIZE", "9")
```

to pass in your parameter (and then pass that to PySR).
-
I've been using a home-grown optimizer for non-linear modeling of time-series data. I discovered that a loss function built on a Dynamic Time Warping metric is very helpful for guarding against over-fitting; see my blog post https://geoenergymath.com/2024/03/08/dynamic-time-warping. A DTW metric as a PySR loss function may also be very useful in an exploratory mode, as it can narrow in on a potential solution more quickly. It wouldn't be difficult to prototype as a custom loss function with a configurable window size. A code snippet is in the following gist: https://gist.github.com/pukpr/fcc9ba38c5f92bde0b53dc95c44ff7dc -- note that it outputs 1/cost because I was experimenting with a maximizing goal instead of a minimizing one.
What I don't understand as well is how to normalize the DTW metric so it acts like a correlation coefficient. I describe my attempt in the first link.
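For what it's worth, one heuristic (my own assumption, not something from the blog post) is to normalize the accumulated cost by a bound on the warping-path length and then map it into (0, 1], so that identical series score exactly 1:

```python
def dtw_similarity(d, n, m):
    """Map a DTW cost d between sequences of length n and m into (0, 1].

    Dividing by (n + m) roughly compensates for sequence length (the
    warping path has between max(n, m) and n + m - 1 steps); the
    1 / (1 + x) map then sends cost 0 to similarity 1. This is a
    heuristic score, not a true correlation coefficient.
    """
    per_step = d / (n + m)
    return 1.0 / (1.0 + per_step)
```

Larger costs decay smoothly toward 0, which also gives a bounded quantity to maximize instead of the raw 1/cost.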
See this 2024 paper as well: https://arxiv.org/pdf/2401.10359.pdf