Replies: 4 comments 2 replies
-
It depends on how complex you want the equation to be, the noise level, and how many operators you are searching over. But symbolic regression is fairly data-efficient: equations are not that expressive, so you can get away with very few datapoints. What I would basically do is a train/validation/test split. Train on the training data, then evaluate the model on the validation set. Are the predictions good or bad? If they are bad, you need more data. If validation performance is close to the performance on the training data, you probably have enough data. https://scikit-learn.org/stable/modules/cross_validation.html
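A minimal sketch of that check, assuming `np.polyfit` as a stand-in for the symbolic-regression model (swap in your real fit, e.g. a PySR model): compare the error on the training split against the error on a held-out split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny dataset: 12 noisy samples of a quadratic ground truth.
x = np.sort(rng.uniform(-2, 2, size=12))
y = 2.0 * x**2 - x + rng.normal(0, 0.1, size=x.size)

# Simple train/validation split: 8 points for fitting, 4 held out.
idx = rng.permutation(x.size)
train, val = idx[:8], idx[8:]

# Fit the "discovered" equation on the training split only.
coefs = np.polyfit(x[train], y[train], deg=2)

mse = lambda a, b: float(np.mean((a - b) ** 2))
train_mse = mse(np.polyval(coefs, x[train]), y[train])
val_mse = mse(np.polyval(coefs, x[val]), y[val])

print(train_mse, val_mse)
# If val_mse is much larger than train_mse, the fit is not trustworthy
# at this sample size and more datapoints are needed.
```

If the two errors are comparable, the sample size is probably adequate for an equation of that complexity; a large gap suggests overfitting with too few points.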
-
We are looking at something like this: 10-12 datapoints, around 2-3 features. Your package provides a loss that can be calculated on train/test, but there are no p-values, right?
-
After some work, I have realised that my question was off. Still, for a function like the above, with X = [x1,n] and Y: there are many functions that can have the same shape over this range. How do you give scientific/statistical validity to the equation?
-
In binary classification I have seen some people using Classifier Two-Sample Tests.
Binary classification is easier, as it naturally allows for hypothesis testing.
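A sketch of the idea behind a Classifier Two-Sample Test: label the two samples 0 and 1, train a classifier on one half, and test whether its held-out accuracy beats chance. Everything here is illustrative, the `c2st_pvalue` helper is hypothetical, and the deliberately simple nearest-centroid classifier stands in for the stronger models (logistic regression, neural nets) usually used in practice.

```python
import numpy as np
from math import comb

def c2st_pvalue(X, Y):
    """Toy Classifier Two-Sample Test of H0: X and Y share a distribution.

    Uses a nearest-centroid classifier fit on the first halves of X and Y,
    then an exact one-sided binomial test of held-out accuracy vs. chance.
    """
    n = min(len(X), len(Y))
    X, Y = X[:n], Y[:n]
    h = n // 2
    cx, cy = X[:h].mean(axis=0), Y[:h].mean(axis=0)  # class centroids

    # Held-out points: X-half labeled 0, Y-half labeled 1.
    test = np.concatenate([X[h:], Y[h:]])
    labels = np.concatenate([np.zeros(n - h), np.ones(n - h)])
    dx = np.linalg.norm(test - cx, axis=1)
    dy = np.linalg.norm(test - cy, axis=1)
    pred = (dy < dx).astype(float)  # predict 1 if closer to Y's centroid

    correct = int((pred == labels).sum())
    m = len(labels)
    # Exact one-sided binomial p-value for accuracy above 0.5 under H0.
    p = sum(comb(m, k) for k in range(correct, m + 1)) / 2**m
    return correct / m, p

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(1.0, 1.0, size=(200, 2))  # mean-shifted: should be detected
acc, p = c2st_pvalue(X, Y)
print(acc, p)
```

A small p-value says the classifier distinguishes the two samples better than chance, i.e. the distributions differ; that is the hypothesis-testing handle binary classification gives you.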
-
I am running an experiment that is quite expensive, so each datapoint costs both time and budget.
What is the minimum number of datapoints needed for some reliability?
Is there any documentation/references/previous discussion?