New datasets + reorganization of current benchmarks #153
Comments
As we discussed in our meeting, I'll make a first pass over the current results to identify potential datasets to remove from the benchmark, along with some ways to select and categorize them.
@lacava @foolnotion @MilesCranmer
I've also checked the uniqueness of the Z+ datasets:
Some of them look like multiclass problems. Plotting the median of medians of the R^2 error bars shows that most algorithms perform quite well on them. My suggestion is that we remove those from SRBench 3.0 and reinsert them in SRBench 4.0 under a different track (non-Gaussian distributions). In another analysis, I picked the top-10 algorithms w.r.t. r2_test and removed from the list of datasets those that:

With this procedure we end up with 85 datasets. If we only keep those with domain information, we get even fewer.
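As a rough illustration of this kind of filtering, here is a minimal Python sketch. It assumes a long-format results table with hypothetical columns `dataset`, `algorithm`, and `r2_test`, and uses a hypothetical spread-based removal criterion; the actual criteria discussed above are not reproduced here.

```python
import pandas as pd

# Hypothetical long-format results table: one row per (dataset, algorithm, run),
# with columns "dataset", "algorithm", "r2_test".
results = pd.read_csv("results.csv")

# Median r2_test per (dataset, algorithm) pair.
med = results.groupby(["dataset", "algorithm"])["r2_test"].median().reset_index()

# Keep only the top-10 algorithms by overall median r2_test.
top10 = (med.groupby("algorithm")["r2_test"].median()
            .sort_values(ascending=False).head(10).index)
med_top = med[med["algorithm"].isin(top10)]

# Hypothetical removal criterion: drop datasets where the top algorithms are
# practically indistinguishable (score spread below an arbitrary threshold).
spread = med_top.groupby("dataset")["r2_test"].agg(lambda s: s.max() - s.min())
keep = spread[spread > 0.05].index
print(f"{len(keep)} datasets retained")
```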
Some remarks on these:
These files (and many others in PMLB) were most likely taken verbatim from StatLib (http://lib.stat.cmu.edu/datasets/), which references the original sources. Also relevant is the effort by @alexzwanenburg (EpistasisLab/pmlb#180), who invested a lot of time identifying duplicates and cleaning up some of the datasets in PMLB.
Thanks @gkronber, this was going to be my next step (searching for duplicates), so this PR will make my life much easier :-)
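For the exact-duplicate part of that search, a minimal sketch, assuming the `pmlb` Python package (its `fetch_data` and `regression_dataset_names` exports) and hashing a normalized copy of each dataset; near-duplicates of the kind handled in EpistasisLab/pmlb#180 would need fuzzier matching than this:

```python
import hashlib
from collections import defaultdict

import pandas as pd
from pmlb import fetch_data, regression_dataset_names

# Map content hash -> dataset names; identical hashes flag exact duplicates.
seen = defaultdict(list)
for name in regression_dataset_names:
    df = fetch_data(name)
    # Normalize: sort columns by name and rows by value so that a pure
    # reordering does not hide a duplicate (renamed columns still will).
    df = df.reindex(sorted(df.columns), axis=1).sort_values(list(df.columns))
    row_hashes = pd.util.hash_pandas_object(df, index=False).values
    digest = hashlib.sha256(row_hashes.tobytes()).hexdigest()
    seen[digest].append(name)

duplicates = {h: names for h, names in seen.items() if len(names) > 1}
print(duplicates)
```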
In a recent study (https://dl.acm.org/doi/abs/10.1145/3597312) I noticed that the differences between the top-N (N = 15 or more) algorithms on most datasets are insignificant; they only differ on a small selection of the Friedman datasets. Maybe it is a good idea to separate the comparison of the algorithms into different groups:
Given this, my other proposal is to add the benchmarks from those two competitions, as well as the one proposed by @MilesCranmer, to the benchmark suite. For the 2023 competition I can also generate datasets with different levels of noise and other nasty features! We can also grab other benchmark functions from the multimodal optimization literature to create more of those.
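To put numbers on the "insignificant differences" observation above, one option is a Friedman test over per-dataset scores followed by pairwise post-hoc tests; a minimal sketch, assuming a hypothetical score matrix with one row per dataset and one column per algorithm (median r2_test values):

```python
import pandas as pd
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical score matrix: rows = datasets, columns = algorithms,
# values = median r2_test on the test set.
scores = pd.read_csv("median_r2_by_dataset.csv", index_col="dataset")

# Friedman test: can the algorithms' score distributions be distinguished at all?
stat, p = friedmanchisquare(*[scores[c] for c in scores.columns])
print(f"Friedman chi2={stat:.2f}, p={p:.3g}")

# Pairwise Wilcoxon signed-rank test between the two best algorithms
# (a Bonferroni/Nemenyi-style correction is needed when testing all pairs).
best_two = scores.median().sort_values(ascending=False).index[:2]
stat, p = wilcoxon(scores[best_two[0]], scores[best_two[1]])
print(f"{best_two[0]} vs {best_two[1]}: Wilcoxon p={p:.3g}")
```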