Help with custom loss function #613
-
Summary:
Back_Calculated_e is a variable that can be back-calculated from the other measurements.
PySR is used to predict e based on pre-operative variables.
Following this, we can verify how good our Predicted_E is by using it to reconstruct P and S and comparing those against the measured values.
Why not just use Back_Calculated_e as y and MSE as the loss function? This is what I'm currently doing, with Back_Calculated_e as the y and a, k, d, L, w, o as X. However, since there are measurement errors in the baseline variables (a in particular), Back_Calculated_e becomes unreliable as a increases. It is therefore much better to anchor the loss function to absolute truths, e.g. an average of the MSE between [Predicted_P and Actual_P] and the MSE between [Predicted_S and Actual_S]. How can I feed Actual_S and Actual_P into a custom loss function when they are not included in the X variables? Thank you very much in advance; any ideas on how this could be achieved would be much appreciated.
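In symbols, with the equal weights used in the pseudocode below, the intended objective is:

loss = 0.5 * MSE(Predicted_P, Actual_P) + 0.5 * MSE(Predicted_S, Actual_S)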
Cannot get this to work...

function custom_loss(tree, data)
    a, k, d, L, w, o, g = extract_variables(tree)  # how to extract these variables?
    # Extract actual P and S from the data structure
    actual_p = ?  # how to feed this through?
    actual_s = ?  # how to feed this through?
    # Predicted e from the symbolic expression (tree)
    predicted_e = evaluate(tree, [a, k, d, L, w, o, g])
    # Calculate predicted P and S using pre-existing functions
    predicted_p = calculate_predicted_p(predicted_e, actual_s)
    predicted_s = calculate_predicted_s(predicted_e, actual_p)
    # Combine errors using weighted MSE
    weight_p = 0.5  # Adjust weight for P prediction
    weight_s = 0.5  # Adjust weight for S prediction
    return weight_p * mean((predicted_p .- actual_p) .^ 2) + weight_s * mean((predicted_s .- actual_s) .^ 2)
end

Current code:

import numpy as np
from pysr import PySRRegressor
import pandas as pd
from pandas import ExcelFile
df = pd.read_excel("path.xlsx", sheet_name="DataTrain")
x = df[['a', 'k', 'd', 'L', 'w', 'g', 'o']].to_numpy()  # 'AVAL',
y = df[['e']].to_numpy()
model = PySRRegressor(
elementwise_loss="L1DistLoss()",
model_selection="accuracy",
niterations=1000000, #1000000
#ncycles_per_iteration=1500,
binary_operators=["+", "*", "-", "/"],
unary_operators=[
"cos",
"tan",
"exp",
"sin",
"sqrt",
"inv",
"square",
"log"
],
maxsize=100,
warm_start=True,
populations=18,
population_size=300,
fraction_replaced_hof=0.1,
parsimony=0.01,
bumper=True,
nested_constraints={
"sin": {"sin": 0, "cos": 0, "tan": 0},
"cos": {"sin": 0, "cos": 0, "tan": 0},
"tan": {"sin": 0, "cos": 0, "tan": 0},
"exp": {"exp": 0, "log": 1},
"log": {"exp": 1, "log": 0},
"square": {"square": 2, "sqrt": 4},
"sqrt": {"square": 4, "sqrt": 2},
}
)
model.fit(x, y, variable_names=["a", "k", "d", "L", "w", "g", "o"])
-
Happy to help. I am a bit confused about one thing in your question:
Does this mean you want to find the same …
-
What I normally do here is just feed in the variables as additional columns of X, but then zero them out within the custom loss function. For example:

function my_loss_function(tree, dataset::Dataset{T,L}, options)::L where {T,L}
    X = copy(dataset.X)
    y = copy(dataset.y)  # Don't need to copy if you aren't modifying; just a safer habit
    # Ordering (depends how you pass to .fit):
    # 'a' - 1; 'k' - 2; 'd' - 3; 'L' - 4; 'w' - 5; 'g' - 6; 'o' - 7.
    # Thus, say that 'P' is 8 and 'S' is 9
    X_without_P_and_S = vcat(
        X[1:7, :],       # The actual data
        X[8:9, :] .* 0,  # Pass a zeroed version
    )
    # Need to pass the full shape, as the genetic algorithm
    # will still sometimes use features 8 and 9!
    # Thus, we simply hide that information from it.
    prediction, complete = eval_tree_array(tree, X_without_P_and_S, options)
    if !complete
        return L(Inf)
    end
    mse = sum(i -> (prediction[i] - y[i])^2, eachindex(y)) / length(y)
    # Do something with X[8, :] and X[9, :] here to build the final loss...
    loss = mse  # placeholder: combine with your P and S error terms
    return loss
end

Hopefully this helps you get started! And pass this entire thing as a string to the `loss_function` parameter of PySRRegressor.
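To make the end-to-end wiring concrete, here is a minimal, untested sketch of the Python side. The "P" and "S" column names and the predicted_p / predicted_s formulas inside the Julia string are stand-ins for your own data layout and back-calculation formulas; they are not part of PySR:

import pandas as pd
from pysr import PySRRegressor

# Assumes the DataTrain sheet also contains measured "P" and "S" columns.
df = pd.read_excel("path.xlsx", sheet_name="DataTrain")

# Columns 1-7 are the real inputs; columns 8 and 9 carry Actual_P and
# Actual_S into the dataset so the Julia loss can read them.
x = df[["a", "k", "d", "L", "w", "g", "o", "P", "S"]].to_numpy()
y = df["e"].to_numpy()

# Full-objective loss in Julia, following the skeleton above.
objective = """
function my_loss_function(tree, dataset::Dataset{T,L}, options)::L where {T,L}
    X = copy(dataset.X)
    actual_p = X[8, :]
    actual_s = X[9, :]
    X[8:9, :] .= 0  # hide P and S from the expression itself
    predicted_e, complete = eval_tree_array(tree, X, options)
    !complete && return L(Inf)
    predicted_p = predicted_e .* actual_s  # stand-in formula -- replace
    predicted_s = predicted_e .* actual_p  # stand-in formula -- replace
    loss_p = sum(abs2, predicted_p .- actual_p) / length(actual_p)
    loss_s = sum(abs2, predicted_s .- actual_s) / length(actual_s)
    return L(0.5 * loss_p + 0.5 * loss_s)
end
"""

model = PySRRegressor(
    loss_function=objective,  # replaces elementwise_loss; don't pass both
    binary_operators=["+", "*", "-", "/"],
)
model.fit(x, y, variable_names=["a", "k", "d", "L", "w", "g", "o", "P", "S"])

Because columns 8 and 9 are zeroed before evaluation, any candidate expression that tries to use P or S contributes nothing useful and is selected against, so the final equation should only involve the seven real inputs.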