Dimensional constraints don't seem to be applied #423

SrGonao · 2023-09-06T09:05:27Z


What happened?

Hi! I'm experimenting with dimensional constraints and seeing unexpected behaviour.

When initializing the model, I set "dimensional_constraint_penalty=10**5" and "select_k_features=8".
Then, when fitting the model, I pass a pandas DataFrame (with 8 columns) with X_units=["mol","V","nm","nm","J / V^2 / nm","mol / nm^3","J"] and y_units=["J / V"].

The first two equations found by the model have, as expected, a 10000 penalty added to the loss, and they have the wrong units ([nm] and [nm]+[J / V ^ 2]). The next equation is a constant and does not have the 10000 penalty. None of the following equations have the 10000 penalty either, even though they are not in units of J / V (which in SI base units should be A*s).
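As a quick check of the parenthetical above, derived units can be written as exponent tuples over the SI base units. This is a standalone sketch (the (kg, m, s, A) tuple layout is an illustrative choice, not anything from PySR) showing that J / V reduces to A·s:

```python
# Exponents of the SI base units (kg, m, s, A) for each derived unit.
JOULE = (1, 2, -2, 0)  # kg m^2 s^-2
VOLT = (1, 2, -3, -1)  # kg m^2 s^-3 A^-1


def divide(a, b):
    """Divide two units by subtracting their base-unit exponents."""
    return tuple(x - y for x, y in zip(a, b))


j_per_v = divide(JOULE, VOLT)
print(j_per_v)  # (0, 0, 1, 1), i.e. s A: one coulomb
```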

The same happens when I set the units to J / V ^ 2, for instance. Is this the expected behaviour, or am I missing something? The function chosen, with a loss of 0.8, is not dimensionally consistent with itself, and is not even close to J/V.

Thank you

Version

0.16.3

Operating System

Windows

Package Manager

pip

Interface

Jupyter Notebook

Relevant log output

No response

Extra Info

This is what I'm running. The data here is random, just so that it can be run; in my actual code I'm using real data.

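import numpy as np
import pandas as pd
from pysr import PySRRegressor

# Model settings from the description above. select_k_features=8 is left out
# here because this toy DataFrame has only 7 columns.
model = PySRRegressor(dimensional_constraint_penalty=10**5)

radius = 0.58
length = 2.8
e0 = 2.15
rho = 33.3

x = np.random.rand(1000, 2)

X = pd.DataFrame(x, columns=["ns", "voltage"])
X["radius"] = radius
X["length"] = length
X["e0"] = e0
X["rho"] = rho
X["kbt"] = 1

# A single target column, matching the single entry in y_units.
y = np.random.rand(1000)
# Per-sample weights (assumed: inverse-variance weights from a random error, as a stand-in).
error = np.random.rand(1000)
weights = 1 / error**2

model.fit(
    X, y, weights=weights,
    X_units=["mol", "V", "nm", "nm", "J / V^2 / nm", "mol / nm^3", "J"],
    y_units=["J / V"],
)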


MilesCranmer · 2023-09-06T10:55:56Z

Maintainer

Also, I would consider lowering the dimensional_constraint_penalty. When the penalty is too harsh, it can prevent the search from exploring efficiently.


MilesCranmer · 2023-09-06T11:18:01Z

Maintainer

Actually, maybe I misunderstood the problem. It could already be working, and the output may just be unclear. Could you describe, with an example, what you mean here:

The function chosen, with a loss of 0.8, is not dimensionally consistent with itself, and is not even close to J/V.

Note that any constants found during the search actually have their own units. The string [⋅] basically means the constant can take on any units that make the equation work. (I'd consider adding an option to remove this "wildcard dimensions" functionality if you want.)


SrGonao · 2023-09-06T20:43:36Z

Author

Also, I would consider lowering the dimensional_constraint_penalty. Sometimes when it is too harsh of a penalty, it prevents the search from exploring efficiently.

Thanks for the tip, I will try it. I had set it to such a high value because that is the value suggested in the dimensional analysis example, but I guess it was so high there because the values were also very high.

Actually maybe I misunderstood the problem. It could already be working but maybe the output is unclear. Could you describe:

The function chosen, with a loss of 0.8, is not dimensionally consistent with itself, and is not even close to J/V.

what you mean here with an example? Note that any constants found during the search actually have their own units. The string [⋅] basically means it can take on any units that make the equation work. (I'd consider adding an option to remove this "wildcard dimensions" functionality if you want)

I was thinking exactly that: maybe this was not a bug, just a misunderstanding on my part about the functionality. From what I gathered, there is no way to get the individual units of each constant term, which could be important.

About your suggestion: removing the wildcard, or at least having a way to distinguish between unitless constants and constants with units (for instance by giving them a chosen complexity), would be very useful. Maybe I'm misunderstanding the algorithm behind the scenes, but if any constant can take on any unit, it seems the search will most likely bypass the unit constraints (maybe this was more pressing in my case, because some of my variables were in fact constants).

So this clearly was not a bug, just a misunderstanding on my part.


MilesCranmer · 2023-09-07T12:56:36Z

Maintainer

Right, as an example, the expression:

"y[m s⁻² kg] = (M[kg] * 2.6353e-22[⋅])"

is actually dimensionally consistent, because the ⋅, when solved, can take on the units of m s⁻².

However, the expression:

"y[m s⁻² kg] = (M[kg] * 2.6353e-22[⋅] + m[kg])"

would not be dimensionally consistent, because no choice of units for the ⋅ could make this expression work.
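The same reasoning can be done by hand by solving for the ⋅ directly. Below is a minimal Python sketch, assuming dimensions are stored as exponent tuples over (m, s, kg); the helper names are illustrative and not part of SymbolicRegression.jl:

```python
# Dimensions as exponent tuples over (m, s, kg); e.g. m s^-2 kg -> (1, -2, 1).
DIMENSIONLESS = (0, 0, 0)
KG = (0, 0, 1)
Y = (1, -2, 1)  # units of y: m s^-2 kg


def mul(a, b):
    """Multiplying quantities adds their exponents."""
    return tuple(x + y for x, y in zip(a, b))


def div(a, b):
    """Dividing quantities subtracts their exponents."""
    return tuple(x - y for x, y in zip(a, b))


# Expression 1: y = M * c, so the constant must carry y / M.
c1 = div(Y, KG)
print(c1)  # (1, -2, 0), i.e. m s^-2

# Expression 2: y = M * c + m. The sum forces M * c to share m's units (kg),
# so c must be dimensionless...
c2 = div(KG, KG)
# ...but then the whole right-hand side is in kg, which is not y's units.
consistent2 = mul(KG, c2) == Y
print(consistent2)  # False
```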

So, you may be asking: why not show the units in the ⋅ instead of leaving it blank (and having the user figure them out afterwards)? The reason is that I found it much faster to check dimensional consistency this way. Solving for exactly which units should go in each ⋅ would be a bit slower (not to mention that sometimes there are multiple solutions). For the dimensional check, you only have to trace from the leaves of the expression upward, recording whether or not there is a "wildcard" dimension. But to get the specific units, you would first have to trace from the leaves to the root, and then from the root back to the leaves to fill in the units.

Since we need to very rapidly evaluate dimensional consistency, the tradeoff just did not seem worth it, compared to the user figuring out the units afterwards.
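To make the extra cost of the two-pass approach concrete, here is a minimal Python sketch of a leaves-to-root pass followed by a root-to-leaves pass that fills in the units of each constant. It is a hypothetical illustration (only + and * nodes, exponent tuples over (m, s, kg)), it assumes the expression has already passed the consistency check, and it resolves the multiple-solution case arbitrarily; it is not how SymbolicRegression.jl works.

```python
from dataclasses import dataclass
from typing import Optional, Union

Dim = tuple  # exponents over (m, s, kg), e.g. (1, -2, 1) for m s^-2 kg
DIMENSIONLESS = (0, 0, 0)


def mul(a, b):
    return tuple(x + y for x, y in zip(a, b))


def div(a, b):
    return tuple(x - y for x, y in zip(a, b))


@dataclass
class Feature:
    name: str
    dim: Dim


@dataclass
class Const:
    value: float
    solved: Optional[Dim] = None  # filled in by the root-to-leaves pass


@dataclass
class Op:
    kind: str  # "+" or "*"
    left: "Node"
    right: "Node"


Node = Union[Feature, Const, Op]


def up(node: Node) -> Optional[Dim]:
    """Leaves-to-root pass: the subtree's dimensions, or None if it is still a free wildcard."""
    if isinstance(node, Feature):
        return node.dim
    if isinstance(node, Const):
        return None
    l, r = up(node.left), up(node.right)
    if node.kind == "*":
        return None if l is None or r is None else mul(l, r)
    return l if l is not None else r  # "+": any known side fixes the result


def down(node: Node, required: Dim) -> None:
    """Root-to-leaves pass: push the required dimensions down into every constant."""
    if isinstance(node, Const):
        node.solved = required
    elif isinstance(node, Op) and node.kind == "+":
        down(node.left, required)
        down(node.right, required)
    elif isinstance(node, Op):  # "*"
        l, r = up(node.left), up(node.right)
        if l is None and r is None:
            # Many solutions exist; arbitrarily make the left factor dimensionless.
            l = DIMENSIONLESS
        down(node.left, l if l is not None else div(required, r))
        down(node.right, r if r is not None else div(required, l))


# y[m s^-2 kg] = M[kg] * c
c = Const(2.6353e-22)
down(Op("*", Feature("M", (0, 0, 1)), c), (1, -2, 1))
print(c.solved)  # (1, -2, 0), i.e. m s^-2

# y[m s] = (x[m] + c2) * t[s]: c2 sits inside a subtree whose units are already known.
c2 = Const(1.0)
down(Op("*", Op("+", Feature("x", (1, 0, 0)), c2), Feature("t", (0, 1, 0))), (1, 1, 0))
print(c2.solved)  # (1, 0, 0), i.e. m
```

This naive version calls up again at every * node, so it repeats work; caching the first pass would avoid that, but a second traversal is still needed, which the consistency check alone does not require.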

But maybe there is a fast way to do it, and we could display the units instead of ⋅. The dimensional analysis portion of the code is fairly recent, and I'm open to suggestions/changing it!

You can see the dimensional analysis code here:

https://github.com/MilesCranmer/SymbolicRegression.jl/blob/5c95478d8a01f2c6b6b54da26615c07fb6d3aee1/src/DimensionalAnalysis.jl

For example, here is the code for the addition and subtraction operators (op is either + or -):

@eval function $(op)(l::W, r::W) where {Q,W<:WildcardQuantity{Q}}
    l.violates && return l
    r.violates && return r
    if same_dimensions(l, r)
        return W($(op)(l.val, r.val), l.wildcard && r.wildcard, false)
    elseif l.wildcard && r.wildcard
        return W(
            constructor_of(Q)($(op)(ustrip(l), ustrip(r)), typeof(dimension(l))),
            true,
            false,
        )
    elseif l.wildcard
        return W($(op)(constructor_of(Q)(ustrip(l), dimension(r)), r.val), false, false)
    elseif r.wildcard
        return W($(op)(l.val, constructor_of(Q)(ustrip(r), dimension(l))), false, false)
    else
        return W(one(Q), false, true)
    end
end

You can see there are five branches (after first checking whether either the left or right argument is already dimensionally invalid):

1. Both left and right have the same dimensions => the expression is valid AND any wildcard units are propagated upward to the parent expression (so they can be consumed later).
2. Otherwise, we check whether both left and right have a wildcard unit => the expression is valid AND the wildcard unit is propagated.
3. Otherwise, we check whether the left has a wildcard => the expression is valid, and no wildcard unit is propagated. This consumes the wildcard unit (basically the point that sets the unit of the constant. Maybe we could fill in the unit with a pointer here...).
4. Similarly, but for the right argument.
5. Otherwise, the expression is invalid, as it is not dimensionally consistent and there is no wildcard.
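The five branches can be mirrored in a few lines of Python. This is a hypothetical stand-in for the Julia method above (the WQ class, the exponent tuples over (m, s, kg), and the example quantities are assumptions for illustration), intended only to show which branch fires in a few simple cases:

```python
from dataclasses import dataclass

DIMENSIONLESS = (0, 0, 0)  # exponents over (m, s, kg)


@dataclass(frozen=True)
class WQ:
    """Hypothetical stand-in for WildcardQuantity: a value, its dimensions, and two flags."""
    val: float
    dim: tuple
    wildcard: bool  # contains a free constant whose units are not pinned down yet
    violates: bool  # this subexpression is already dimensionally inconsistent


def add(l: WQ, r: WQ) -> WQ:
    # Pass along any earlier violation unchanged.
    if l.violates:
        return l
    if r.violates:
        return r
    # 1. Same dimensions: valid; stays a wildcard only if both sides are.
    if l.dim == r.dim:
        return WQ(l.val + r.val, l.dim, l.wildcard and r.wildcard, False)
    # 2. Both wildcards: valid; the Julia code strips the units here, so the
    #    result is dimensionless but still a wildcard.
    if l.wildcard and r.wildcard:
        return WQ(l.val + r.val, DIMENSIONLESS, True, False)
    # 3. Left is a wildcard: it adopts the right side's dimensions and is consumed.
    if l.wildcard:
        return WQ(l.val + r.val, r.dim, False, False)
    # 4. Same, for the right side.
    if r.wildcard:
        return WQ(l.val + r.val, l.dim, False, False)
    # 5. Mismatched dimensions with nothing to absorb the difference.
    return WQ(1.0, DIMENSIONLESS, False, True)


c = WQ(2.0, DIMENSIONLESS, True, False)  # a learned constant
mass = WQ(3.0, (0, 0, 1), False, False)
length = WQ(1.5, (1, 0, 0), False, False)

print(add(c, mass))  # branch 3: WQ(val=5.0, dim=(0, 0, 1), wildcard=False, violates=False)
print(add(mass, length).violates)  # branch 5: True
print(add(c, c).wildcard)  # branch 1: True
```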


SrGonao Sep 7, 2023
Author

Thanks for the explanation. I'm not fluent at all in Julia, so I don't know if the comments I left below make sense/are easy to implement/go in the direction you are hoping for with the dimensional constraint.
I think it is OK to have wildcards be left to the user.

MilesCranmer · 2023-09-07T13:01:01Z

Maintainer

The type used for "wildcard" quantities is this one:

"""
    WildcardQuantity{Q<:AbstractQuantity}


A wrapper for a `AbstractQuantity` that allows for a wildcard feature, indicating
there is a free constant whose dimensions are not yet determined.
Also stores a flag indicating whether an expression is dimensionally consistent.
"""
struct WildcardQuantity{Q<:AbstractQuantity}
    val::Q
    wildcard::Bool
    violates::Bool
end

Q is a quantity-like type. The quantity objects come from the units package DynamicQuantities.jl, while WildcardQuantity itself is defined in SymbolicRegression.jl.


MilesCranmer · 2023-09-07T13:09:37Z

Maintainer

Also, as a quick and dirty way to avoid learned constants, you can set complexity_of_constants=100. Constants then become prohibitively expensive, so the search will avoid them.


SrGonao Sep 7, 2023
Author

I understand this point, but to me there seems to be a "fundamental" difference between dimensional and dimensionless constants when trying to keep dimensional consistency. E.g., I have a feature with units [V] and another with units [mol], and I want to find something that is [J / V], like in my example.
My idea of dimensional consistency was to give it some dimensional constants (that make sense in my system) so that these would act as placeholders for the dimensions (and also because I feel they could be useful for the physical interpretation of the data). That's why I give it a feature with units [m] (even though this feature is constant), as well as features with units [J], [J / V^2 / m] and [mol / m^3]. In this case, even if features are less complex than constants, it still has to find some dimensionless constants (maybe the expression is "1.5 * [length]**3 / [density]"), whereas when dimensional constants are equally complex, the expression found would just be "1.2412 [⋅]", which is less complex because it only has one constant instead of one constant and 4 features.
I understand my case may be an edge case, but it seemed more useful to me to be able to set some dimensional "pointers" as features, also because you might then get back expressions that are actually physically interpretable in terms of constant parameters of your system. This was just a thought; I have no idea how hard it would be to implement a complexity_of_wildcards, or to have both wildcard and non-wildcard constants.

Thanks for the discussion either way.

MilesCranmer Sep 7, 2023
Maintainer

complexity_of_wildcards could be a good idea. I'll think about how hard it would be to implement. It's definitely easier to just turn wildcards on or off completely.
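To illustrate the trade-off SrGonao describes, here is a hypothetical Python sketch of a complexity_of_wildcards option. It assumes a node-count complexity where each operator and feature costs 1, dimensionless constants cost complexity_of_constants, and wildcard constants cost the new, hypothetical complexity_of_wildcards; the tuple-based trees are illustrative and not PySR's internal representation.

```python
def complexity(node, complexity_of_constants=1, complexity_of_wildcards=1):
    """Node-count complexity in which the two kinds of constants can be priced differently.

    A node is a feature name (str), ("const", is_wildcard), or (op, *children).
    """
    if isinstance(node, str):
        return 1  # feature
    if node[0] == "const":
        return complexity_of_wildcards if node[1] else complexity_of_constants
    _, *children = node
    return 1 + sum(
        complexity(child, complexity_of_constants, complexity_of_wildcards)
        for child in children
    )


# "1.2412[⋅]": a single constant with wildcard units.
wildcard_only = ("const", True)
# "1.5 * cube(length) / density": a dimensionless constant plus two features.
physical = ("*", ("const", False), ("/", ("cube", "length"), "density"))

print(complexity(wildcard_only), complexity(physical))  # 1 6
print(complexity(wildcard_only, complexity_of_wildcards=10), complexity(physical))  # 10 6
```

With equal pricing the bare wildcard constant wins on complexity; once wildcards cost more than the whole feature-based expression, the search would prefer the physically interpretable form.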

MilesCranmer · 2023-09-07T13:10:28Z

Maintainer

(Moved to discussion as this seems to be not a bug. Let me know if otherwise and I can move it back!)


SrGonao Sep 7, 2023
Author

I agree

MilesCranmer Sep 7, 2023
Maintainer

@SrGonao btw, it should be pretty easy to turn off the "wildcard unit" functionality, so that all learned constants are dimensionless. We just need to add a boolean parameter to the Options struct in SymbolicRegression.jl. That parameter would just end up here:

    t.constant && return WildcardQuantity{Q}(Quantity(t.val::T, R), true, false)

https://github.com/MilesCranmer/SymbolicRegression.jl/blob/5c95478d8a01f2c6b6b54da26615c07fb6d3aee1/src/DimensionalAnalysis.jl#L121

It would set that true to false if the user chooses to disable the wildcard units. That's literally all that's needed. Now I just need to find some time to add it...
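A Python mirror of that one-line change, under stated assumptions: allow_wildcards is a hypothetical name for the proposed Options parameter, the multiplication rule (a product stays a wildcard if either factor is one) and the final root check are assumptions rather than code quoted from SymbolicRegression.jl, and dimensions are exponent tuples over (m, s, kg). It shows the first example expression from earlier in the thread passing with wildcards and failing without them:

```python
from dataclasses import dataclass

DIMENSIONLESS = (0, 0, 0)  # exponents over (m, s, kg)


@dataclass(frozen=True)
class WQ:
    dim: tuple
    wildcard: bool
    violates: bool


def constant_leaf(allow_wildcards: bool) -> WQ:
    # Like `WildcardQuantity{Q}(Quantity(t.val::T, R), true, false)`: the constant starts
    # dimensionless, and the flag decides whether its units are still free.
    return WQ(DIMENSIONLESS, allow_wildcards, False)


def feature_leaf(dim: tuple) -> WQ:
    return WQ(dim, False, False)


def mul(l: WQ, r: WQ) -> WQ:
    # Assumed rule: exponents add, and a wildcard on either side keeps the product free.
    if l.violates or r.violates:
        return WQ(DIMENSIONLESS, False, True)
    return WQ(tuple(a + b for a, b in zip(l.dim, r.dim)), l.wildcard or r.wildcard, False)


def matches_target(q: WQ, target: tuple) -> bool:
    # Assumed root check: a remaining wildcard can absorb whatever units are missing.
    return not q.violates and (q.wildcard or q.dim == target)


Y = (1, -2, 1)  # m s^-2 kg
M = feature_leaf((0, 0, 1))  # kg

for allow in (True, False):
    expr = mul(M, constant_leaf(allow))  # y = M * 2.6353e-22
    print(allow, matches_target(expr, Y))
# True True
# False False
```

With wildcards disabled, units can only enter through features, which is what SrGonao was aiming for with constant-valued features such as the radius and length columns.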

Answer selected by SrGonao

SrGonao Sep 7, 2023
Author

If that's the case, I'd say this would mostly solve my issue. Thanks for taking the time to reply. I'll keep an eye out for an update to the code; I understand you probably have other things on your plate.

MilesCranmer Sep 7, 2023
Maintainer

If you want to, you could easily add this tweak locally. Follow the guide here: https://astroautomata.com/PySR/backend/ for customizing the search code, and edit src/DimensionalAnalysis.jl to change that true to a false. No need to generalize it with a user option yet if you just want to tweak it locally.
