Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create TemplateExpression for providing a pre-defined functional structure and constraints #355

Merged
merged 60 commits into from
Oct 18, 2024

Conversation

MilesCranmer
Copy link
Owner

@MilesCranmer MilesCranmer commented Oct 13, 2024

TemplateExpression

This lets you easily create expressions with a specific structure.

For example, if you want to have $$f(x_1, x_2) + exp(- g(x_3) \cdot g(x_3))$$, this PR will let you.

There are some remaining issues though that we need to fix. It seems there is an undefined reference error when simplifying that I think is likely due to get_tree stitching the inner expressions together, whereby some function (that I haven't figured out yet) mutates the resultant expressions.

This should never happen, and is always avoided by mutate! because of the fact that they call the new function get_contents_for_mutation which prevents this. But I guess there are some other functions which also need this guard in place.

Fixes aliasing issues

This was a very bad bug that I just happened to identify as part of this PR ([1],[2]). Basically, the crossover operation would sometimes happen between a population member and itself. There was no check to prevent the crossover from sampling the exact same member from the population!

In addition to weird effects from actually performing a crossover operation on the same expression, this also meant that multiple individuals in a population could end up being the same. This means that all downstream operations (including ones ran in parallel!) would be susceptible to aliasing issues.

In short, this was a nasty bug, and it is surprising that it only showed up explicitly when implementing the BlueprintExpression. I wonder if this was the source of various crashes (especially for long-running jobs) over the past 1+ years, and I was just never able to identify the source of the race condition.

Anyways, this seems like it is fixed now, because we copy population members immediately after sampling.

Fix maxsize issues

There were some issues hanging around that had to do with maxsize being violated. It turns out this was simply an issue of a < appearing where a <= should have been. This has been fixed.


TODO:

  • Fix aliasing issue due to some function mutating the combined expression rather than working only on a single inner expression at a time.
  • Fix a name for this. Currently it is ConstrainedExpression but I'm not a fan of this. ConstrainedStructuredExpression is too long. Maybe MultiExpression?
  • Document the new type.
  • Add unittests. Want to reach 100% diff coverage as there's a lot of new features here.

@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Oct 13, 2024

I wonder if the aliasing could be due to the use of copy_node rather than copy in the codebase? Note that copy calls copy_node but not the other way around. Nope, the expression version of copy_node simply calls copy.

Very odd. It's extremely worrying if there's an aliasing issue though. I can't immediately see where it might be coming from.

@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Oct 14, 2024

It looks like sometimes, crossover attempts to cross the same expression with itself. Perhaps that is why this issue arises? Not because of the fact that there is aliasing, but because copy corrects any undefined fields in the tree?


Yep, it looks like crossover applied to the exact same population member is enough to cause aliasing! Because it will put both "children" back into the population. If that's the same member, then that will mean that multiple population members are the same!

Base automatically changed from deprecate-jl-1.9 to abstract-options October 14, 2024 00:57
@MilesCranmer MilesCranmer changed the title Add structured expressions Create BlueprintExpression for providing a pre-defined functional structure and constraints Oct 14, 2024
Base automatically changed from abstract-options to master October 14, 2024 01:32
Copy link
Contributor

github-actions bot commented Oct 14, 2024

Benchmark Results

master a9e5332... master/a9e5332c7a335e...
search/multithreading 25.6 ± 2.1 s 25.9 ± 2.6 s 0.988
search/serial 36.5 ± 1.5 s 38.3 ± 1.6 s 0.953
utils/best_of_sample 1 ± 0.41 μs 1.84 ± 1.7 μs 0.544
utils/check_constraints_x10 12.3 ± 3.3 μs 12.5 ± 3.4 μs 0.988
utils/compute_complexity_x10/Float64 2.09 ± 0.11 μs 2.08 ± 0.11 μs 1
utils/compute_complexity_x10/Int64 2.04 ± 0.11 μs 2.02 ± 0.11 μs 1.01
utils/compute_complexity_x10/nothing 1.43 ± 0.13 μs 1.41 ± 0.1 μs 1.01
utils/insert_random_op_x10 5.84 ± 1.9 μs 5.84 ± 1.8 μs 1
utils/next_generation_x100 0.424 ± 0.057 ms 0.436 ± 0.064 ms 0.973
utils/optimize_constants_x10 0.0371 ± 0.0087 s 0.0376 ± 0.0087 s 0.986
utils/randomly_rotate_tree_x10 5.21 ± 0.62 μs 5.34 ± 0.62 μs 0.976
time_to_load 1.92 ± 0.028 s 1.6 ± 0.0052 s 1.2

Benchmark Plots

A plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@coveralls
Copy link

coveralls commented Oct 14, 2024

Pull Request Test Coverage Report for Build 11394658450

Details

  • 253 of 257 (98.44%) changed or added relevant lines in 15 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.6%) to 95.332%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/Dataset.jl 2 3 66.67%
src/OptionsStruct.jl 1 2 50.0%
src/MutationFunctions.jl 45 47 95.74%
Files with Coverage Reduction New Missed Lines %
src/Configure.jl 1 88.39%
src/Dataset.jl 2 90.0%
Totals Coverage Status
Change from base Build 11319643087: 0.6%
Covered Lines: 2818
Relevant Lines: 2956

💛 - Coveralls

@MilesCranmer
Copy link
Owner Author

TemplateExpression might be better actually

@MilesCranmer MilesCranmer changed the title Create BlueprintExpression for providing a pre-defined functional structure and constraints Create TemplateExpression for providing a pre-defined functional structure and constraints Oct 15, 2024
@MilesCranmer MilesCranmer merged commit a38f901 into master Oct 18, 2024
17 checks passed
@MilesCranmer MilesCranmer deleted the structured-expressions branch October 18, 2024 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants