Help with custom loss function #613
-
Summary:
Back_Calculated_e is a variable that can be back-calculated from the other measurements.
PySR is used to predict e based on pre-operative variables.
Following this, we can verify how good our Predicted_E is by using it to reconstruct P and S and comparing those against the measured values.
Why not just use Back_Calculated_e as y and MSE as the loss function? This is what I'm currently doing, with Back_Calculated_e as the y and a, k, d, L, w, o as X. However, since there are measurement errors in the baseline variables (a in particular), Back_Calculated_e becomes unreliable as a increases. It is therefore much better to anchor the loss function to absolute truths, e.g. an average of the MSE between [Predicted_P and Actual_P] and the MSE between [Predicted_S and Actual_S]. How can I feed Actual_S and Actual_P into a custom loss function when they are not included in the X variables? Thank you very much in advance; any ideas on how this could be achieved would be much appreciated.
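In symbols, with the equal weights used in the pseudocode below, the intended objective is:

loss = 0.5 * MSE(Predicted_P, Actual_P) + 0.5 * MSE(Predicted_S, Actual_S)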
Cannot get this to work...

function custom_loss(tree, data)
    a, k, d, L, w, o, g = extract_variables(tree)  # how to extract these variables?
    # Extract actual P and S from the data structure
    actual_p = ?  # how to feed this through?
    actual_s = ?  # how to feed this through?
    # Predicted e from the symbolic expression (tree)
    predicted_e = evaluate(tree, [a, k, d, L, w, o, g])
    # Calculate predicted P and S using pre-existing functions
    predicted_p = calculate_predicted_p(predicted_e, actual_s)
    predicted_s = calculate_predicted_s(predicted_e, actual_p)
    # Combine errors using weighted MSE
    weight_p = 0.5  # Adjust weight for P prediction
    weight_s = 0.5  # Adjust weight for S prediction
    return weight_p * mean((predicted_p .- actual_p) .^ 2) + weight_s * mean((predicted_s .- actual_s) .^ 2)
end

Current code:

import numpy as np
from pysr import PySRRegressor
import pandas as pd
from pandas import ExcelFile
df = pd.read_excel("path.xlsx", sheet_name="DataTrain")
x = df[['a', 'k', 'd', 'L', 'w', 'g', 'o']].to_numpy()  # 'AVAL',
y = df[['e']].to_numpy()
model = PySRRegressor(
elementwise_loss="L1DistLoss()",
model_selection="accuracy",
niterations=1000000, #1000000
#ncycles_per_iteration=1500,
binary_operators=["+", "*", "-", "/"],
unary_operators=[
"cos",
"tan",
"exp",
"sin",
"sqrt",
"inv",
"square",
"log"
],
maxsize=100,
warm_start=True,
populations=18,
population_size=300,
fraction_replaced_hof=0.1,
parsimony=0.01,
bumper=True,
nested_constraints={
"sin": {"sin": 0, "cos": 0, "tan": 0},
"cos": {"sin": 0, "cos": 0, "tan": 0},
"tan": {"sin": 0, "cos": 0, "tan": 0},
"exp": {"exp": 0, "log": 1},
"log": {"exp": 1, "log": 0},
"square": {"square": 2, "sqrt": 4},
"sqrt": {"square": 4, "sqrt": 2},
}
)
model.fit(x, y, variable_names=["a", "k", "d", "L", "w", "g", "o"])
-
Happy to help. I am a bit confused about one thing in your question:
Does this mean you want to find the same …
-
What I normally do here is just feed in the variables as additional columns of X, but then zero them out within the custom loss function. For example:

function my_loss_function(tree, dataset::Dataset{T,L}, options)::L where {T,L}
    X = copy(dataset.X)
    y = copy(dataset.y)  # Don't need to copy if you aren't modifying; just a safer habit
    # Ordering (depends how you pass to .fit):
    # 'a' - 1; 'k' - 2; 'd' - 3; 'L' - 4; 'w' - 5; 'g' - 6; 'o' - 7.
    # Thus, say that 'P' is 8 and 'S' is 9
    X_without_P_and_S = vcat(
        X[1:7, :],       # The actual data
        X[8:9, :] .* 0,  # Pass a zeroed version
    )
    # Need to pass the full shape, as the genetic algorithm
    # will still sometimes use features 8 and 9!
    # Thus, we simply hide that information from it.
    prediction, complete = eval_tree_array(tree, X_without_P_and_S, options)
    if !complete
        return L(Inf)
    end
    mse = sum(i -> (prediction[i] - y[i])^2, eachindex(y)) / length(y)
    # Do something with X[8, :] and X[9, :] here to build the final loss...
    loss = mse  # placeholder: combine with your P and S error terms
    return loss
end

Hopefully this helps you get started! And pass this entire thing as a string to the `loss_function` parameter of PySRRegressor.
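To make the end-to-end wiring concrete, here is a minimal, untested sketch of the Python side. The "P" and "S" column names and the predicted_p / predicted_s formulas inside the Julia string are stand-ins for your own data layout and back-calculation formulas; they are not part of PySR:

import pandas as pd
from pysr import PySRRegressor

# Assumes the DataTrain sheet also contains measured "P" and "S" columns.
df = pd.read_excel("path.xlsx", sheet_name="DataTrain")

# Columns 1-7 are the real inputs; columns 8 and 9 carry Actual_P and
# Actual_S into the dataset so the Julia loss can read them.
x = df[["a", "k", "d", "L", "w", "g", "o", "P", "S"]].to_numpy()
y = df["e"].to_numpy()

# Full-objective loss in Julia, following the skeleton above.
objective = """
function my_loss_function(tree, dataset::Dataset{T,L}, options)::L where {T,L}
    X = copy(dataset.X)
    actual_p = X[8, :]
    actual_s = X[9, :]
    X[8:9, :] .= 0  # hide P and S from the expression itself
    predicted_e, complete = eval_tree_array(tree, X, options)
    !complete && return L(Inf)
    predicted_p = predicted_e .* actual_s  # stand-in formula -- replace
    predicted_s = predicted_e .* actual_p  # stand-in formula -- replace
    loss_p = sum(abs2, predicted_p .- actual_p) / length(actual_p)
    loss_s = sum(abs2, predicted_s .- actual_s) / length(actual_s)
    return L(0.5 * loss_p + 0.5 * loss_s)
end
"""

model = PySRRegressor(
    loss_function=objective,  # replaces elementwise_loss; don't pass both
    binary_operators=["+", "*", "-", "/"],
)
model.fit(x, y, variable_names=["a", "k", "d", "L", "w", "g", "o", "P", "S"])

Because columns 8 and 9 are zeroed before evaluation, any candidate expression that tries to use P or S contributes nothing useful and is selected against, so the final equation should only involve the seven real inputs.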