-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]: Warn if better linear model available #536
Comments
Thanks for the feature request @Maltimore. Indeed this could be something to try. Actually, rather than warning the user, one thing that some SR algorithms do is initialise the fit with the best linear model (or even a polynomial fit), so that could be a very cool thing to try!! Should be pretty easy to set up as well. Take a look at the backend here: https://github.com/MilesCranmer/SymbolicRegression.jl The line that initialises the population is https://github.com/MilesCranmer/SymbolicRegression.jl/blob/feab045092498b018b03fbf83e2cfc32543d9775/src/SymbolicRegression.jl#L709-L715. Should be pretty easy to just replace this with the result of a linear fit, and then evolve from there. The only reason I haven't already done this is I have tried to make PySR/SymbolicRegression.jl completely invariant to the choice of operator, rather than have any specific "preference" or special behavior for some. In other words, I have tried to make the operators However I do realise that the vast majority of people are going to be happy if Actually a linear regression with 7 variables would be a complexity of 29 (if you are using default options for complexity) – since it is a binary tree with an additional julia> using SymbolicRegression: Options, Node, compute_complexity
julia> options = Options(binary_operators=(+, *));
julia> x = [Node(Float64; feature=i) for i in 1:7]; # 7 features
julia> A = ones(Float64, 7); b=one(Float64);
julia> A' * x + b # Create expression
((((((((1.0 * x1) + (1.0 * x2)) + (1.0 * x3)) + (1.0 * x4)) + (1.0 * x5)) + (1.0 * x6)) + (1.0 * x7)) + 1.0)
julia> compute_complexity(A' * x + b, options)
29 |
Also if you can make a working example with a different dataset that you can share, that would be much appreciated. It is very weird that it takes 1 hour to evolve a linear equation, perhaps there is something wrong in the hyperparams. Would love to take a look if you can give me an example. |
Just responding to your last comment, because I didn't make that clear before: I didn't set up the symbolic regression to only evolve linear equations, also a few nonlinear operators were allowed. And I used two-fold cross validation with a 2x2 hyperparameter gridsearch, which all in all means that the SR algorithm was run 8 times I guess. In total that needed roughly an hour to run. |
I thought a bit about what you said about special privileges of One idea I just had: keeping the binary |
Feature Request
I haven't thought this issue all the way through yet, but I thought I'd quickly make a github issue before I don't report it at all. But feel free to close this.
I have tested a few algorithms on a regression dataset (unfortunately I can't share the dataset, but it shouldn't be too atypical). It turned out that linear models like ElasticNet or LARS worked better than e.g. symbolic regression with PySR. I don't know exactly what to do in this case, but it felt a little silly that a linear model that takes a few miliseconds to train is better than SR which needed an hour of runtime (including some hyperparameter optimization). The dataset only has 7 variables, so it's not an issue of feature selection.
I guess a linear regression on 7 variables (including intercept) would have a rather high complexity of
2*7 + 1 + 7 = 22
(and would not even be considered with the defaultmaxsize
of 20), so it would likely never "win" against shorter equations in terms of the score?One thing I thought is that one could warn the user that a better linear model is available? The added computational cost would be negligible.
The text was updated successfully, but these errors were encountered: