add methods #4
Another one that would be very nice is PGE (Worm & Chiu, GECCO 2013). Paper: http://seminars.math.binghamton.edu/ComboSem/worm-chiu.pge_gecco2013.pdf |
Thanks, I'll reach out. It doesn't look like it's being maintained. |
If I may, my algorithm was just accepted for publication. Paper: https://www.mitpressjournals.org/doi/abs/10.1162/evco_a_00285. Even though the code is in Haskell, I have included a Python wrapper in my repository, similar to your wrappers. Let me know if I can be of any help! |
Hi @folivetti, thanks for sharing. I'm going to be uploading a contributing guide soon that will detail how to include your method. Please stay tuned. |
Hi @folivetti, please see the contributing guide on the dev branch: https://github.com/EpistasisLab/regression-benchmark/blob/dev/CONTRIBUTING.md. Eventually this will be merged into master (I'm still working out some hiccups with existing methods), but if you would like to start now, you can issue a PR to contribute your method to the dev branch. Let me know if you have any questions! |
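A rough sketch of the kind of wrapper module a contributing guide like this tends to ask for, assuming a scikit-learn-compatible estimator plus a string representation of the final model. The names `est`, `hyper_params`, and `model` below are illustrative guesses, not the confirmed interface; CONTRIBUTING.md is the authoritative spec.

```python
# Hypothetical regressor.py for a contributed method; the expected names
# (est, hyper_params, model) are assumptions here, not the confirmed spec.
from sklearn.linear_model import LinearRegression  # stand-in for a real SR estimator

# scikit-learn-compatible estimator instance used by the benchmark harness
est = LinearRegression()

# hyperparameters the harness is allowed to tune
hyper_params = {"fit_intercept": [True, False]}

def model(est):
    """Return the final model of a fitted estimator as a string."""
    # A real SR method would return its evolved expression; for the
    # stand-in linear model we just spell out the linear combination.
    terms = [f"{coef}*x{i}" for i, coef in enumerate(est.coef_)]
    return " + ".join(terms) + f" + {est.intercept_}"
```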
Thanks! I guess my code is already halfway there. As soon as I get the time, I'll make the PR. |
I finally took the time to implement the Python wrapper for ITEA. I have one final question: my code is written in Haskell using stack as the project manager. Should I include the installation of stack in the install script, or should I put this requirement in a README file? To install the stack environment you only need to run `curl -sSL https://get.haskellstack.org/ | sh`, but it may require sudo permission since it installs GMP. |
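For context, a Python wrapper around a compiled (e.g., Haskell) binary usually just shells out and parses the printed expression. A minimal sketch, assuming a hypothetical `itea` executable and made-up command-line flags (this is not ITEA's real interface):

```python
# Illustrative scikit-learn-style wrapper around an external SR binary.
# The executable name ("itea") and its flags are hypothetical.
import subprocess
import tempfile

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, RegressorMixin

class ExternalSRRegressor(BaseEstimator, RegressorMixin):
    def __init__(self, generations=100):
        self.generations = generations

    def fit(self, X, y):
        # write the training data to a temporary CSV the binary can read
        with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as f:
            train_path = f.name
        pd.DataFrame(np.column_stack([X, y])).to_csv(train_path, index=False)
        # call the external binary; assumed to print the final expression
        result = subprocess.run(
            ["itea", "--train", train_path, "--gens", str(self.generations)],
            capture_output=True, text=True, check=True,
        )
        self.expression_ = result.stdout.strip()
        return self

    def predict(self, X):
        # a real wrapper would evaluate self.expression_ on X,
        # e.g. via sympy.lambdify; omitted in this sketch
        raise NotImplementedError

    def model(self):
        """Return the final symbolic expression as a string."""
        return self.expression_
```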
Sounds great. The entire install needs to be automated, so yes, any installation requirement needs to be implemented. When the code is set and you issue a PR, the repo is set up to test to make sure the installation passes and a mini benchmark runs without error. |
sudo is ok. Also wanted to mention that you can test the install locally, doing something like what the GitHub workflow does; see the workflow for more info. |
@folivetti hope you got my email, but just checking: do you think you'll have time to get ITEA integrated this week? Many thanks! |
Yes, I did receive the e-mail, thanks :-) |
I'm adding a few more methods for future reference.
While it would be nice to have a transparent and objective way to compare all of these methods, it will probably be impossible to include every SymReg method in srbench, for various reasons (e.g., closed source, difficulty of providing a Python wrapper, methods tuned to work well only for certain problem characteristics, uncooperative authors, ...). Researchers publishing SymReg methods should be made aware of srbench. I argue that, when reviewing or reading papers, we should be increasingly careful about new SymReg methods that are not included in srbench, even when they are published in reputable journals. |
Thanks for the list @gkronber. Deep SR is implemented, and I'm working on AI-Feynman. |
Hi @lacava et al., thanks for making this benchmark suite, it looks great! I just found out about your efforts on this today; I think it is a great idea. I would be interested in helping add my methods: the Julia library SymbolicRegression.jl (mentioned in @gkronber's post) and the Python frontend PySR, which I actively maintain. Before I get started, just to check, would it be doable to include Julia as part of the benchmarking script? Second, what kinds of resources are available for the benchmark? My library tends to find better results the longer it's run for, and it can be parallelized over multiple nodes. Third, my methods output a list of equations rather than a single one. Is there a way I can pass the entire list through, or should I pick one equation to pass? Lastly, I was wondering about benchmark coverage: I have a "high-dimensional" SR method described a bit here (https://arxiv.org/abs/2006.11287) which is made for sequences, sets, and graphs. Is there a benchmark included here for high-dimensional SR? Thanks! |
Great! Thanks for reaching out.
We should definitely be able to support Julia. It will be easiest if there is a conda dependency for it, but we are also moving towards a Docker environment eventually.
In our current experiment (Table 2) we set the termination criteria to 500k evaluations or 48 hours per training run for the real-world datasets, and 1M evaluations or 8 hours for the synthetic ground-truth datasets. Most of the methods here are parallelizable, but because we're running 252 datasets, 10 trials, and 21 methods, it made more sense to give each run a single core. The cluster we used has ~1100 CPU cores.
Only a single final model should be returned; otherwise it wouldn't be a fair comparison, since your method would have several chances to "win". (Incidentally, most of the GP-based SR methods also produce a set of models and use a hold-out set for final model selection. We could think about ways of comparing sets of equations in the future, but we don't do so right now.) Also, it would be ideal to return the equation string in a sympy-compatible format, to avoid the heavy post-processing we needed in the last round.
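For illustration, here is a minimal check of what "sympy-compatible" means in practice; the expression and variable names (x0, x1, x2) are made up, and this is only a sketch of the parsing step, not the benchmark's actual post-processing:

```python
# Sanity-check that a returned model string parses cleanly with sympy
# (the expression below is made up for illustration).
import sympy

model_str = "0.5*x0 + sin(x1)*exp(-x2)"
expr = sympy.parse_expr(model_str)  # raises if the string isn't sympy-compatible
print(expr.free_symbols)            # {x0, x1, x2}
print(sympy.simplify(expr))         # canonical, simplified form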
Currently we've mostly looked at tabular data. Have a look at the datasets in PMLB; the widest datasets have on the order of hundreds of features. But we're always looking for good benchmark problems. |
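For example, the PMLB datasets can be pulled straight from Python; `fetch_data` and `regression_dataset_names` are part of the pmlb package, and the dataset name below is just one example:

```python
# Fetch one of the PMLB regression datasets as numpy arrays.
from pmlb import fetch_data, regression_dataset_names

print(len(regression_dataset_names))              # number of regression datasets
X, y = fetch_data("1027_ESL", return_X_y=True)    # one example dataset name
print(X.shape, y.shape)
```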
Thanks! This is very helpful. I have a quick follow-up question about the suite: are you benchmarking accuracy, or parsimony, or some combination? Or are you evaluating whether the recovered sympy expression is equal to the ground truth? PySR's default choice of "best" is similar to Eureqa's, where they look for "cliffs" in the accuracy-vs-parsimony curve. Also, one final question: can the model use a different set of hyperparameters for the noisy vs. non-noisy datasets (e.g., to simulate whether the experimenter would know a priori that their data was noisy)? Thanks again! |
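As a rough illustration of the "cliff" idea mentioned above (a paraphrase for this thread, not PySR's or Eureqa's actual selection code): pick the model whose loss drops the most per unit of added complexity, relative to the next-simplest model.

```python
# Toy version of a "cliff" selection rule on an accuracy-vs-complexity
# Pareto front: pick the model with the largest drop in log-loss per unit
# of added complexity. Illustrative only, not PySR's or Eureqa's actual rule.
import numpy as np

def select_by_cliff(complexity, loss):
    """complexity and loss are parallel lists sorted by increasing complexity."""
    comp = np.asarray(complexity, dtype=float)
    loss = np.asarray(loss, dtype=float)
    scores = -np.diff(np.log(loss)) / np.diff(comp)
    return int(np.argmax(scores)) + 1  # index of the selected model

# the sharp loss drop at complexity 5 makes model index 2 the "best"
print(select_by_cliff([1, 3, 5, 9], [2.0, 1.8, 0.2, 0.19]))  # -> 2
```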
It's probably worth checking the details in the paper. We broke the comparison into real-world/"black-box" problems with no known model, and ground-truth problems generated from known functions. We benchmark accuracy and parsimony in the former case, and symbolic equivalence (within a linear transformation of the true model) in the latter (see the sketch below).
We don't support different hyperparameters for noisy vs. non-noisy data at the moment, but most of the benchmarks have some amount of noise. One of our study findings was that AI-Feynman was particularly sensitive to target label noise. |
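For anyone curious what "symbolic equivalence within a linear transformation" can look like in practice, a sketch with sympy; this mirrors the paper's description as I read it, not necessarily the exact srbench code, and the expressions are made up:

```python
# Sketch of an "equivalent up to a linear transformation" check in sympy:
# accept the candidate if its difference with the true model simplifies to
# a constant, or its ratio simplifies to a nonzero constant. Illustrative
# only; the actual srbench check may differ in details.
import sympy

x0, x1 = sympy.symbols("x0 x1")
true_expr = x0**2 + sympy.sin(x1)
candidate = 2*x0**2 + 2*sympy.sin(x1)   # the true model scaled by 2

diff = sympy.simplify(candidate - true_expr)
ratio = sympy.simplify(candidate / true_expr)

is_match = bool(diff.is_constant() or (ratio.is_constant() and ratio != 0))
print(is_match)  # True, because the ratio simplifies to the constant 2
```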
Added PySR and SymbolicRegression.jl in this PR: #62. Let me know what else I need to add, thanks! |
Also please consider: from the documentation: Again, I have no relation to the authors and/or copyright holders. |
Add SR methods for comparison. The following come to mind: