Can PySR handle recursive functions? #540
-
I am very impressed by the capabilities of PySR. I am just starting with it and it appears very useful to me! But I have a question: can PySR discover recursive functions? If this is not possible directly, how could I formulate the problem so that PySR could solve it, perhaps even partially or approximately? The main goal is to show that, although the Mandelbrot set looks infinitely complex to a human observer (it reveals ever new and intricate structure no matter how deep you zoom in), its structure in reality has a very low (algorithmic) complexity, because its generating (recursive) function is so simple: z_{n+1} = z_n^2 + c.

I hope someone can help me overcome this hurdle!
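For concreteness, the entire generating rule fits in a couple of lines of Python (a tiny sketch just to illustrate the claim):

```python
def mandelbrot_step(z, c):
    """One step of z_{n+1} = z_n^2 + c. A point c is in the set iff
    iterating this from z = 0 keeps |z| bounded (conventionally <= 2)."""
    return z**2 + c
```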
-
Hi @jabogithub,

Really interesting question! I spent some time on this out of interest and managed to get it to work. This lets you rediscover the Mandelbrot set's recursive relation from random samples of the Mandelbrot set.

First, here's a Julia version:

```julia
using SymbolicRegression
using MLJBase: machine, fit!, predict
using Random: Random

# Mandelbrot set membership test (the ground truth we sample from)
function in_mandelbrot(c; max_iteration = 1000)
    z = zero(c)
    iteration = 0
    while abs(z) <= 2.0 && iteration < max_iteration
        z = z^2 + c
        iteration += 1
    end
    return (abs(z) <= 2.0 && iteration == max_iteration)
end

"""
    my_loss(tree, dataset, options)

Custom loss function for sets defined by recursive functions.

- `tree`: The expression tree defining the recursive function. The first feature
  will be used to store the output of the function.
- `dataset`: The dataset to evaluate the function on, of shape (n_features, n_samples).
- `options`: The options for the symbolic regression model.
"""
function my_loss(tree, dataset::Dataset{T,L}, options::Options) where {T,L}
    num_rows = size(dataset.X, 2)
    max_iteration = 1000
    boundary = 2.0
    X = copy(dataset.X)  # Copy dataset for modification
    for i in axes(X, 2)
        X[1, i] = zero(T)  # Initialize z to 0.0 + 0.0im
    end
    in_bounds = [true for _ = 1:num_rows]
    for iteration in 1:max_iteration
        # Only work on the subset of data that is in-bounds
        sliced_X = @view X[:, in_bounds]
        sliced_in_bounds = @view in_bounds[in_bounds]
        # (The @view macro is used to avoid copying the data)
        sliced_out, completed = eval_tree_array(tree, sliced_X, options)
        if !completed
            # Penalty if incomplete evaluation due to NaNs,
            # but reduce penalty if it got further along
            return L(1000000 * (1 + max_iteration - iteration))
        end
        # Update z
        for i in axes(sliced_X, 2)
            sliced_in_bounds[i] &= abs(sliced_out[i]) <= boundary
            if sliced_in_bounds[i]
                # Only update if still in bounds
                sliced_X[1, i] = sliced_out[i]
            end
        end
        if !any(in_bounds)
            break
        end
    end
    return L(sum(i -> abs(dataset.y[i] - in_bounds[i]), 1:num_rows) / num_rows)
end

N = 300
z = [zero(ComplexF32) for _ = 1:N]
c = let RNG = Random.MersenneTwister(0)
    [rand(RNG, ComplexF32) * 2 - 1 for _ = 1:N]
end
X = (; z, c)
y = ComplexF32.(in_mandelbrot.(c))
println("Proportion of dataset in Mandelbrot set: ", Float64(sum(y) / N))

model = SRRegressor(
    binary_operators=(+, -, *),
    unary_operators=(cos,),
    niterations=100,
    loss_function=my_loss,
)
mach = machine(model, X, y)
fit!(mach)
```

We can see that this recovers the correct Mandelbrot set relation! Note that this code is written to be pedagogical, so it does not use all the Julia tricks out there. Also, it is currently quite expensive to run for some reason; perhaps that is unavoidable, or maybe there is a better way to structure things. Disabling features you don't need, such as constant optimization, might speed things up. Finally, note that the order of the input features matters: the code assumes that the first feature is `z`, which stores the running output of the recursion.

The key part here is the definition of `my_loss`: it initializes `z` to zero, repeatedly evaluates the candidate expression, feeds each output back in as the first feature, tracks which points stay within the boundary, and finally scores the predicted membership against the labels.
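To make that logic easier to follow outside of SymbolicRegression.jl, here is a minimal plain-numpy sketch of the same loop (illustrative only: `f` stands in for the compiled candidate expression, and the NaN-penalty branch is omitted):

```python
import numpy as np

def recursive_set_loss(f, c, y, max_iteration=1000, boundary=2.0):
    """Numpy sketch of `my_loss`: iterate z -> f(z, c), track which
    points stay bounded, and score against the membership labels y."""
    z = np.zeros_like(c)
    in_bounds = np.ones(len(c), dtype=bool)
    for _ in range(max_iteration):
        idx = np.flatnonzero(in_bounds)  # points still iterating
        out = f(z[idx], c[idx])          # one recursive step
        ok = np.abs(out) <= boundary
        z[idx[ok]] = out[ok]             # update only in-bounds points
        in_bounds[idx[~ok]] = False      # the rest have escaped
        if not in_bounds.any():
            break
    # Mean absolute difference between labels and predicted membership
    return np.mean(np.abs(y - in_bounds))

# The true relation should give (near-)zero loss:
# recursive_set_loss(lambda z, c: z**2 + c, c, y)
```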
The PySR equivalent would be as follows:

```python
import numpy as np
import pandas as pd
from pysr import PySRRegressor

def in_mandelbrot(c, max_iteration=1000):
    z = 0.0 + 0.0j
    iteration = 0
    while abs(z) <= 2.0 and iteration < max_iteration:
        z = z**2 + c
        iteration += 1
    return abs(z) <= 2.0 and iteration == max_iteration

full_objective = """
function my_loss(tree, dataset::Dataset{T,L}, options::Options) where {T,L}
    num_rows = size(dataset.X, 2)
    max_iteration = 1000
    boundary = 2.0
    X = copy(dataset.X)  # Copy dataset for modification
    for i in axes(X, 2)
        X[1, i] = zero(T)  # Initialize z to 0.0 + 0.0im
    end
    in_bounds = [true for _ = 1:num_rows]
    for iteration = 1:max_iteration
        # Only work on the subset of data that is in-bounds
        sliced_X = @view X[:, in_bounds]
        sliced_in_bounds = @view in_bounds[in_bounds]
        # (The @view macro is used to avoid copying the data)
        sliced_out, completed = eval_tree_array(tree, sliced_X, options)
        if !completed
            # Penalty if incomplete evaluation due to NaNs,
            # but reduce penalty if it got further along
            return L(1000000 * (1 + max_iteration - iteration))
        end
        # Update z
        for i in axes(sliced_X, 2)
            sliced_in_bounds[i] &= abs(sliced_out[i]) <= boundary
            if sliced_in_bounds[i]
                # Only update if still in bounds
                sliced_X[1, i] = sliced_out[i]
            end
        end
        if !any(in_bounds)
            break
        end
    end
    return L(sum(i -> abs(dataset.y[i] - in_bounds[i]), 1:num_rows) / num_rows)
end
"""

N = 300
rstate = np.random.RandomState(0)
z = np.zeros(N, dtype=np.complex128)
c = rstate.uniform(-1, 1, N) + 1j * rstate.uniform(-1, 1, N)
X = pd.DataFrame({"z": z, "c": c})
y = (np.vectorize(in_mandelbrot)(c)).astype(np.complex128)

model = PySRRegressor(
    binary_operators=["+", "-", "*"],
    unary_operators=["cos"],
    niterations=100,
    full_objective=full_objective,
)
model.fit(X, y)
```

[Screenshots in the original post show the search part way through, then a little bit farther along (with the correct relation now showing up), and finally once it finds it.]

Cheers,
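P.S. If you want to sanity-check the result after fitting, something along these lines should work (a sketch: `equations_` is the DataFrame of discovered expressions that PySR populates after `fit`, and the `iterate_membership` helper here is just for illustration):

```python
# Inspect the Pareto front of discovered expressions:
print(model.equations_[["complexity", "loss", "equation"]])

# Iterate the recovered rule f(z, c) = z*z + c by hand and compare the
# resulting membership predictions against the ground-truth labels.
def iterate_membership(f, c, max_iteration=1000, boundary=2.0):
    z = 0.0 + 0.0j
    for _ in range(max_iteration):
        z = f(z, c)
        if abs(z) > boundary:
            return False
    return True

recovered = np.array([iterate_membership(lambda z, c: z * z + c, ci) for ci in c])
print("Agreement with labels:", np.mean(recovered == y.real.astype(bool)))
```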
-
Thank you for your answer! Amazing work, which goes to show the power of symbolic regression in general, and more specifically the power of PySR. This also proves that PySR is able to uncover the simplicity behind seemingly overly complex data! (Although we know the ground truth beforehand in this use case, namely the recursive function generating the Mandelbrot data, it remained to be seen whether PySR has the power to uncover it, which it does, according to your effort!) That was my main goal: finding out whether PySR is up to this challenging problem. It strengthens my trust that PySR can, in principle, solve difficult problems from raw (experimental) data alone. I still have to work through your code, because it is not as straightforward as it seems, at least to me as a totally inexperienced PySR practitioner. In particular, the construction of the loss function does not look trivial at first sight! I will study your solution and of course try it out next week. Very exciting, I must say. Perhaps I will come back to you here in this space.