Code optimizations #325
Replies: 3 comments
-
MilesCranmer/SymbolicRegression.jl#210 implements caching of the complexities for item 2 (`count_nodes`).
-
Here is the profiling script I'm using:

```julia
using SymbolicRegression
using Profile
using DataFrames
using CSV
using PrettyTables

X = randn(5, 500);
y = randn(500);

options = Options(;
    unary_operators=[exp, sin],
    binary_operators=[+, -, *, /],
    nested_constraints=[exp => [exp => 0, sin => 0], sin => [sin => 0, exp => 1]],
    constraints=[exp => 5, sin => 7],
    maxsize=50,
    mutation_weights=(optimize=0.001,),
)

# The first search compiles everything; the second one is the one we profile:
EquationSearch(X, y; options, parallelism=:serial)
Profile.clear()
Profile.@profile EquationSearch(X, y; options, niterations=40, parallelism=:serial)

df = let
    d = mktemp() do path, io
        Profile.print(
            IOContext(io, :displaysize => ntuple(_ -> typemax(Int), 2));
            format=:flat,
            sortedby=:count,
            mincount=5,
        )
        flush(io)  # ensure the full report is on disk before CSV.read opens it
        CSV.read(
            path,
            DataFrame;
            skipto=3,
            footerskip=1,
            ignorerepeated=true,
            delim=' ',
            header=[:count, :overhead, :file, :line, :function],
        )
    end
    # The function signatures contain spaces, so `delim=' '` splits them
    # across columns; re-join everything from column 5 onward:
    d[:, 5] = [
        join(filter(t -> !ismissing(t), row |> collect), " ")
        for row in eachrow(d[!, 5:end])
    ]
    d[!, :line] = [
        let i = tryparse(Int, l)
            i isa Nothing ? missing : i
        end
        for l in d[!, :line]
    ]
    d = d[!, 1:5]
    sort(d, :count; rev=true)
end

# Rows coming from this package's own source files:
local_libs = filter(df) do r
    startswith(r[:file], r"@Symbolic|@Dynamic")
end

open("profile_results.md", "w") do io
    pretty_table(io, df; tf=tf_markdown)
end
```
-
With SymbolicRegression.jl v0.18.0, pretty much all of the performance bottleneck is now due to the constant optimization. Here are the full profiling results (click to expand):

A better way to optimize constants would probably go a long way. Maybe we could try re-introducing automatic differentiation, or play around with reverse-mode differentiation. @foolnotion mentioned he has found it to be much faster in Operon; might be worth trying it again. Enzyme.jl seems to be compatible and could be used for reverse-mode differentiation (I think my hand-coded version might not have been optimized), although it seems pretty unstable right now. Maybe in the future...
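To make that concrete, here is a toy sketch of what reverse-mode gradients for the constants could look like, using ReverseDiff.jl as the illustrative choice. Everything here (the loss, the data, the step size) is made up for illustration and is not SymbolicRegression.jl's actual optimizer:

```julia
using ReverseDiff

# Toy stand-in for "loss as a function of an expression's constants":
# fit y ≈ c[1] * sin(c[2] * x) on random data.
x = randn(100)
y = 2.0 .* sin.(0.5 .* x)
loss(c) = sum(abs2, y .- c[1] .* sin.(c[2] .* x)) / length(x)

c = [1.0, 1.0]
for _ in 1:200
    g = ReverseDiff.gradient(loss, c)  # reverse-mode gradient w.r.t. the constants
    c .-= 0.1 .* g                     # plain gradient-descent step
end
```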
-
Edit: these measurements are out-of-date. See below for updated ones (I think these were also run without `-O3`, so they give weird results).

Just putting up some code optimization ideas in case anybody is interested in helping out.
Here is the current breakdown of internal functions' share of the compute expense for a small-scale run:

1. `eval_loss` and `eval_tree_array`: ~25,000 units (likely near-optimal)
2. `count_nodes` (== `compute_complexity`): ~16,000 units (much more than I had anticipated)
3. `best_of_sample`: ~10,000 units
4. `copy_node`: ~10,000 units
5. `update_progress_bar!`: ~8,000 units
6. `next_generation`: ~4,500 units

Most of these overlap in some way. However, this clearly indicates what would give the best performance improvements if we could get them down.
In particular, I'm very, very surprised that `count_nodes` is still that high. Perhaps that just indicates how well the evaluation code has been optimized... But that seems like a clear target for performance improvements.
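For intuition, here is a simplified sketch (a toy type, not the package's actual `Node`) of why it costs so much: every call walks the entire tree, so each call is O(n) in the tree size, and it gets invoked constantly during a search:

```julia
# Toy expression tree for illustration only; the real Node type differs.
struct ToyNode
    degree::Int                 # 0 = constant/feature, 1 = unary op, 2 = binary op
    l::Union{ToyNode,Nothing}
    r::Union{ToyNode,Nothing}
end

# A full O(n) traversal on every call -- nothing is cached:
count_nodes(t::ToyNode) =
    t.degree == 0 ? 1 :
    t.degree == 1 ? 1 + count_nodes(t.l) :
                    1 + count_nodes(t.l) + count_nodes(t.r)

leaf = ToyNode(0, nothing, nothing)
count_nodes(ToyNode(2, ToyNode(1, leaf, nothing), leaf))  # == 4
```

Caching would turn the common case into an O(1) field read.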
I think the easiest thing would be to create a `complexity` field in `PopMember` to cache the complexity, and use that at various points in the code rather than recalculating it from scratch.

To ensure the complexity is always up to date, `PopMember` could have a customized `setindex!` such that calling it on the `tree` would trigger recalculation of cached properties, while `getindex` would simply return the `tree` as is.
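Here is a minimal sketch of that caching scheme, with hypothetical names (the real `PopMember` differs; I'm also using Julia's `Base.setproperty!` hook here, since that is what intercepts `member.tree = ...` assignments):

```julia
# Hypothetical cached member; not the actual SymbolicRegression.jl struct.
mutable struct CachedMember{T}
    tree::T
    complexity::Union{Int,Nothing}   # `nothing` marks the cache as stale
end
CachedMember(tree) = CachedMember(tree, nothing)

function Base.setproperty!(m::CachedMember, name::Symbol, value)
    # Reassigning the tree invalidates the cached complexity:
    name === :tree && setfield!(m, :complexity, nothing)
    return setfield!(m, name, value)
end

# Lazily recompute on demand; `compute` stands in for compute_complexity:
function get_complexity!(m::CachedMember, compute)
    if getfield(m, :complexity) === nothing
        setfield!(m, :complexity, compute(getfield(m, :tree)))
    end
    return getfield(m, :complexity)
end
```

Call sites that currently re-run `count_nodes` on every comparison would then just read the cached value.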