add methods #4

lacava · 2019-02-20T18:02:46Z

add SR methods for comparison. the following come to mind:

jmmcd · 2019-03-05T11:27:57Z

Another one that would be very nice is PGE (Worm & Chiu, GECCO 2013).

Paper: http://seminars.math.binghamton.edu/ComboSem/worm-chiu.pge_gecco2013.pdf
Code: https://github.com/verdverm/pypge

lacava · 2019-03-05T16:59:04Z

thanks, I'll reach out. doesn't look like it's being maintained.

folivetti · 2020-12-11T20:19:31Z

if I may, my algorithm was just accepted for publication:

Paper: https://www.mitpressjournals.org/doi/abs/10.1162/evco_a_00285
Code: https://github.com/folivetti/ITEA

Even though the code is in Haskell, I have included a Python wrapper in my repository, similar to your wrappers. Let me know if I can be of any help!

lacava · 2020-12-21T19:49:42Z

hi @folivetti , thanks for sharing. I'm going to be uploading a contributing guide soon that will detail how to include your method. Please stay tuned.

lacava · 2021-01-13T16:21:48Z

hi @folivetti, please see the contributing guide on the dev branch: https://github.com/EpistasisLab/regression-benchmark/blob/dev/CONTRIBUTING.md

Eventually this will be merged into master (still working on some hiccups with existing methods), but if you would like to start now, you can issue a PR to contribute your method to the dev branch. Let me know if you have any questions!

folivetti · 2021-01-14T11:54:26Z

thanks! I guess my code is already halfway through. As soon as I get the time to do so, I'll make the PR.

folivetti · 2021-03-21T20:39:48Z

I finally took the time to implement the python wrapper for ITEA. I have one final question: my code is written in Haskell using stack as the project manager. Should I include the installation of stack into the install script or should I put this requirement in a README file?

To install the stack environment you only need to run curl -sSL https://get.haskellstack.org/ | sh, but it may require a sudo permission since it installs GMP.

lacava · 2021-03-21T21:54:17Z

Sounds great. The entire install needs to be automated, so yes, any installation requirement needs to be implemented. When the code is set and you issue a PR, the repo is set up to test to make sure the installation passes and a mini benchmark runs without error.

…

On Sun, Mar 21, 2021, 4:40 PM Fabricio Olivetti de Franca < ***@***.***> wrote: I finally took the time to implement the python wrapper for ITEA. I have one final question: my code is written in Haskell using stack as the project manager. Should I include the installation of stack into the install script or should I put this requirement in a README file? To install the stack environment you only need to run curl -sSL https://get.haskellstack.org/ | sh, but it may require a sudo permission since it installs GMP. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#4 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABSGCGX7637QNJQDCWYQOUDTEZKSHANCNFSM4GYXGDNQ> .

lacava · 2021-03-22T15:57:49Z

To install the stack environment you only need to run curl -sSL https://get.haskellstack.org/ | sh, but it may require a sudo permission since it installs GMP.

sudo is ok

also wanted to mention you can test the install locally, doing something like

./configure
./install 
cd experiment
python -m pytest -v

also see the github workflow for more info

lacava · 2021-04-17T20:42:20Z

@folivetti hope you got my email, but just checking if you think you'll have time to get ITEA integrated this week? many thanks!

folivetti · 2021-04-17T21:06:22Z

@folivetti hope you got my email, but just checking if you think you'll have time to get ITEA integrated this week? many thanks!

yes, I did receive the e-mail, thanks :-)
I have everything ready and should make the PR tomorrow. I'm just running some tests to double check that everything works. Thanks!

gkronber · 2021-05-04T06:31:27Z

I'm adding a few more methods for future reference.

Equation Learner (EQL) https://github.com/martius-lab/EQL (already has Python code)
SymbolicRegression.jl https://github.com/MilesCranmer/SymbolicRegression.jl
Continued Fraction Regression (CFR, I reached out to the corresponding author https://www.sciencedirect.com/science/article/abs/pii/S0957417421004590)
Universal Functions Originator (UFO, https://ieeexplore.ieee.org/abstract/document/9409797)
AI Feynman [2.0] (https://advances.sciencemag.org/content/6/16/eaay2631)
Feyn / QLattice Regression (https://arxiv.org/abs/2104.05417, https://arxiv.org/abs/2103.15147)
"Accelerating symbolic regression with deep learning" (https://github.com/twhughes/Symbolic-Regression)
Deep symbolic regression (https://arxiv.org/abs/1912.04871)

While it would be nice to have a transparent and objective way to compare all those methods it will probably be impossible to have all SymReg methods included into srbench for various reasons (e.g. closed source, difficulty to provide a Python wrapper, method is tuned to work well for certain problem characteristics, authors not cooperative, ...).

Researchers publishing SymReg methods should be made aware of srbench. I argue that we should be increasingly careful about new SymReg methods which are not included in srbench when reviewing or reading papers even when they are published in reputable journals.

lacava · 2021-05-04T13:23:21Z

Thanks for the list @gkronber. Deep SR is implemented and i'm working on AI-Feynman

MilesCranmer · 2021-12-12T01:44:35Z

Hi @lacava et al., thanks for making this benchmark suite, it looks great! I just found out about your efforts on this today, I think it is a great idea.

I would be interested in helping add my methods: the Julia library SymbolicRegression.jl (mentioned in @gkronber's post) and the Python frontend PySR which I actively maintain. Before I get started, just to check, would it be doable to include Julia as part of the benchmarking script?

Second, what kinds of resources are available for the benchmark? My library tends to find better results the longer it's run for and can be parallelized over multi-node.

Third, my methods output a list of equations rather than a single one. Is there I way I can pass the entire list through, or should I make a choice of one equation to pass?

Lastly, I was wondering about benchmark coverage: I have a "high-dimensional" SR method described a bit here (https://arxiv.org/abs/2006.11287) which is made for sequences, sets, and graphs. Is there a benchmark included here for high-dimensional SR?

Thanks!
Miles

lacava · 2021-12-13T15:13:23Z

Hi @lacava et al., thanks for making this benchmark suite, it looks great! I just found out about your efforts on this today, I think it is a great idea.

Great! Thanks for reaching out.

I would be interested in helping add my methods: the Julia library SymbolicRegression.jl (mentioned in @gkronber's post) and the Python frontend PySR which I actively maintain. Before I get started, just to check, would it be doable to include Julia as part of the benchmarking script?

We should definitely be able to support Julia. It will be easiest if there is a conda dependency for it. But we also are moving towards a Docker environment eventually.

Second, what kinds of resources are available for the benchmark? My library tends to find better results the longer it's run for and can be parallelized over multi-node.

In our current experiment (Table 2) we set the termination criteria to 500k evaluations per training or 48 hours for the real-world datasets, and 1M evaluations or 8 hours for the synthetic ground-truth datasets.

Most of the methods here are parallelizable, but because we're running 252 datasets, 10 trials, and 21 methods, it made more sense to give each a single core. The cluster we used has ~1100 CPU cores.

Third, my methods output a list of equations rather than a single one. Is there I way I can pass the entire list through, or should I make a choice of one equation to pass?

Only a single final model should be returned. Otherwise it wouldn't be a fair comparison since your method would have several chances to "win". (Incidentally, most of the GP-based SR methods also have a set of models, and use a hold-out set for final model selection. We could think about ways of comparing sets of equations in the future, but don't do so right now.)

Also, it would be ideal to return the equation string in sympy-compatible format to avoid a lot of post-processing from the last round.

Lastly, I was wondering about benchmark coverage: I have a "high-dimensional" SR method described a bit here (https://arxiv.org/abs/2006.11287) which is made for sequences, sets, and graphs. Is there a benchmark included here for high-dimensional SR?

Currently we've mostly look at tabular data. Have a look at the datasets in PMLB. the widest datasets are ~100s of features. But, we're always looking for good benchmark problems.

MilesCranmer · 2021-12-13T19:07:43Z

Thanks! This is very helpful.

I have a quick followup question about the suite. Are you benchmarking accuracy, or parsimony, or some combination? Or are you evaluating whether the recovered sympy expression is equal? PySR's default choice for "best" is similar to Eureqa, where they look for "cliffs" in the accuracy-vs-parsimony curve.

Also - final question - can the model use a different set of hyperparameters for the noisy vs non-noisy dataset? (e.g., to simulate whether the experimenter would know a priori if their data was noisy).

Thanks again,
Miles

lacava · 2021-12-14T14:55:02Z

Are you benchmarking accuracy, or parsimony, or some combination? Or are you evaluating whether the recovered sympy expression is equal? PySR's default choice for "best" is similar to Eureqa, where they look for "cliffs" in the accuracy-vs-parsimony curve.

It's probably worth checking details in the paper. we broke the comparison into real-world/"black-box" problems with no known model, and ground-truth problems generated for known functions. we benchmark accuracy and parsimony in the former case and symbolic equivalence (within a linear transformation of the true model) in the latter.

Also - final question - can the model use a different set of hyperparameters for the noisy vs non-noisy dataset? (e.g., to simulate whether the experimenter would know a priori if their data was noisy).

We don't support this at the moment, but most of the benchmarks have some amount of noise. One of our study findings was that AI-Feynman was particularly sensitive to target label noise.

MilesCranmer · 2022-01-15T04:07:49Z

Added PySR and SymbolicRegression.jl in this PR: #62. Let me know what else I need to add, thanks!

fnpdaml · 2023-03-02T16:16:24Z

add SR methods for comparison. the following come to mind:

FFX

Feat

EFS

FEW

TPOT? Not symbolic regression per se but interesting to include.

https://github.com/ying531/MCMC-SymReg

operon (added to dev)

differentiable CGP

gplearn

Also please consider:
"TuringBot"
https://turingbotsoftware.com/
(Free version is limited to max. 50 rows of input data and max. 3 variables)
nevertheless, best success ratio from my empirical, personal usage.

from the documentation:
"TuringBot is also a console application that can be executed in a fully automated and customizable way"
https://turingbotsoftware.com/documentation.html#command-line

Again, I have no relation to the authors and/or copyright holders.
Cheers.

lacava added help wanted Extra attention is needed enhancement New feature or request labels Feb 20, 2019

lacava mentioned this issue Dec 1, 2020

Refactor validation #11

Merged

lacava mentioned this issue May 7, 2021

Add AI-feynman #38

Merged

lacava mentioned this issue Feb 4, 2023

Have you considered including TensorGP? #138

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add methods #4

add methods #4

lacava commented Feb 20, 2019 •

edited

Loading

jmmcd commented Mar 5, 2019 •

edited

Loading

lacava commented Mar 5, 2019

folivetti commented Dec 11, 2020

lacava commented Dec 21, 2020

lacava commented Jan 13, 2021

folivetti commented Jan 14, 2021

folivetti commented Mar 21, 2021

lacava commented Mar 21, 2021 via email

lacava commented Mar 22, 2021

lacava commented Apr 17, 2021

folivetti commented Apr 17, 2021

gkronber commented May 4, 2021 •

edited

Loading

lacava commented May 4, 2021

MilesCranmer commented Dec 12, 2021

lacava commented Dec 13, 2021

MilesCranmer commented Dec 13, 2021

lacava commented Dec 14, 2021

MilesCranmer commented Jan 15, 2022

fnpdaml commented Mar 2, 2023 •

edited

Loading

add methods #4

add methods #4

Comments

lacava commented Feb 20, 2019 • edited Loading

jmmcd commented Mar 5, 2019 • edited Loading

lacava commented Mar 5, 2019

folivetti commented Dec 11, 2020

lacava commented Dec 21, 2020

lacava commented Jan 13, 2021

folivetti commented Jan 14, 2021

folivetti commented Mar 21, 2021

lacava commented Mar 21, 2021 via email

lacava commented Mar 22, 2021

lacava commented Apr 17, 2021

folivetti commented Apr 17, 2021

gkronber commented May 4, 2021 • edited Loading

lacava commented May 4, 2021

MilesCranmer commented Dec 12, 2021

lacava commented Dec 13, 2021

MilesCranmer commented Dec 13, 2021

lacava commented Dec 14, 2021

MilesCranmer commented Jan 15, 2022

fnpdaml commented Mar 2, 2023 • edited Loading

lacava commented Feb 20, 2019 •

edited

Loading

jmmcd commented Mar 5, 2019 •

edited

Loading

gkronber commented May 4, 2021 •

edited

Loading

fnpdaml commented Mar 2, 2023 •

edited

Loading