Stochastic Custom Loss Function #738

gm89uk · 2024-10-21T00:25:00Z

gm89uk
Oct 21, 2024

Thank you again for your work on this package!

With my limited compute power available, I am trying to improve efficiency of my custom loss function.
I have several variables in X that I am trying to ensure all have monotonicity with Y in a custom loss function.
Instead of checking each one at each loss function, I have made it select a random feature and hold the other variables stable (randomly selected) and check the monotonicity.

Just wondering if this would work with pysr? i.e. if the loss is really low and an equation enters the HoF, then the next loss is high (due to the random nature of the feature selection), would pysr know to remove that equation from the HoF?

jl.seval("""
using Statistics, DataFrames
const common_bounds = -8:0.5:9 #standardised data with wide spread, -8 to 9 represent SDs
const feature_length = length(common_bounds)
const num_features = 6
X_fixed = Matrix{Float32}(undef, num_features, feature_length)

""")
elementwise_loss = """
function loss_function(tree::Node, dataset::Dataset{T,L}, options::Options, idx) where {T,L}    
    # Extract data for the given indices
    X = idx === nothing ? dataset.X : view(dataset.X, :, idx)
    y = idx === nothing ? dataset.y : view(dataset.y, idx)
    weights = idx === nothing ? dataset.weights : view(dataset.weights, idx)
    prediction, complete = eval_tree_array(tree, X, options)
    if !complete
        return L(Inf)
    end
    penalty = 0
    featureCounter = rand(1:5)
    for i in 1:5
        if i != featureCounter
            X_fixed[i, :] .= rand(common_bounds)  # Randomly initialise for all features except the one of interest
        end
    end
    X_fixed[6, :] .= rand(0:1)  #boolean variable
    # Replace the feature of interest with the fixed common bounds
    X_fixed[featureCounter, :] .= common_bounds  # Vary only the selected feature
    s_values, completeSub = eval_tree_array(tree, X_fixed, options)
     s_diff = diff(s_values)
     if !(all(s_diff .>= 0) || all(s_diff .<= 0))
 	     penalty += 0.5  # Add penalty for non-monotonicity if there is a sign change
     end        
    mse = mean((prediction .- y).^2)
    return mse + penalty

Answered by MilesCranmer

Oct 21, 2024

So PySR expects that the loss function for an expression is deterministic, due to caching, as well as the absolute ordering in the hall of fame. Therefore, if you have randomness, you could either use a fixed seed in the loss (and maybe average the loss over a few different evaluations), or perhaps re-run the search each time (with a warm start) and randomness introduced each time?

View full answer

MilesCranmer · 2024-10-21T03:10:49Z

MilesCranmer
Oct 21, 2024
Maintainer

So PySR expects that the loss function for an expression is deterministic, due to caching, as well as the absolute ordering in the hall of fame. Therefore, if you have randomness, you could either use a fixed seed in the loss (and maybe average the loss over a few different evaluations), or perhaps re-run the search each time (with a warm start) and randomness introduced each time?

1 reply

gm89uk Oct 24, 2024
Author

Thanks @MilesCranmer. Before your response I did try the Schotastic loss function (but with Inf as penalty for non-monotonicity) and surprisingly it worked perfectly well! Although I'm not sure how with the explanation you gave above. I did manage to code a deterministic loss function that is reasonable in speed but obviously much slower, which is exponentially slower with more features.

Perhaps with a long enough ncycles_per_iteration between iterations, there is enough randomness between each iteration to weed out any breaches in monotonicity? Thank you again for the explanation.

Edit: the random approach did not work, there was a combination of features that breached monotonicity, which at least makes sense.

gm89uk · 2024-11-10T04:52:04Z

gm89uk
Nov 10, 2024
Author

Hi @MilesCranmer, I think I have a working compromise. When batching=true then the loss function is also nondeterministic, but when idx == nothing, at the end of each iteration, the whole data is assessed making it deterministic again.

So, the loss function can combine a much faster stochastic loss function when idx=batch_size but when idx==nothing, perform a deterministic loss function, before updating the HoF.

If you wanted to do take advantage of this but without batching, could you turn batching=true and batch_size=length(y)?

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stochastic Custom Loss Function #738

{{title}}

Replies: 2 comments 1 reply

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Stochastic Custom Loss Function #738

gm89uk Oct 21, 2024

Replies: 2 comments · 1 reply

MilesCranmer Oct 21, 2024 Maintainer

gm89uk Oct 24, 2024 Author

gm89uk Nov 10, 2024 Author

gm89uk
Oct 21, 2024

Replies: 2 comments 1 reply

MilesCranmer
Oct 21, 2024
Maintainer

gm89uk Oct 24, 2024
Author

gm89uk
Nov 10, 2024
Author