
First benchmarks for kmeans #1

Merged: 37 commits into main from first_kmeans_benchmarks on Oct 11, 2023

Conversation

@fcharras (Collaborator) commented Sep 12, 2023

This sets up the file tree for benchmarking KMeans using benchopt.

TODOs:

  • enable returning the OpenCL device in the result table (pyopencl should work for all devices, at the cost of making pyopencl a mandatory dependency of the project; lspci might be an alternative). Retained approach: record the device manually when aggregating the benchmark results
  • add kmeans_dpcpp benchmark
  • add cupy Kmeans benchmark
  • sklearn intelex: use public sklearnex frontend rather than calling kmeans from daal4py in sklearnex internals
  • sklearn intelex: investigate the newer SYCL code paths for KMeans, since we may currently be using deprecated implementations (possibly a future PR)
  • run the benchmarks on all available platforms and collect the benchopt parquet benchmark data
  • use the data to display a sortable / searchable table in a GitHub text file (see https://stackoverflow.com/a/73820969). Chosen solution: use gspread to synchronize the results with a Google spreadsheet
  • expose a command to easily increment the table from a new parquet file
  • setup CI that runs all benchmarks on CPU (except those that can't run on CPU)
    • sklearn numba dpex
    • scikit-learn
    • scikit-learn-intelex
    • sklearn-pytorch-engine
    • kmeans_dpcpp
  • write a README that explains how to install the environments, how to run the benchmarks, and how to add results to the table
  • address last review suggestion
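
The "expose a command to easily increment the table from a new parquet file" item could look roughly like the sketch below (hypothetical helper, not part of this PR; assumes pandas and a parquet file produced by benchopt):

```python
import pandas as pd

def append_benchmark_results(table: pd.DataFrame, new_results: pd.DataFrame) -> pd.DataFrame:
    """Concatenate freshly collected benchopt results into the aggregated
    table, dropping exact duplicate rows so that re-running the command
    with the same parquet file is idempotent."""
    merged = pd.concat([table, new_results], ignore_index=True)
    return merged.drop_duplicates(ignore_index=True)

# Typical use with a parquet file written by `benchopt run`:
# table = append_benchmark_results(table, pd.read_parquet("outputs/results.parquet"))
```

The dedup step keeps the command safe to re-run when the same results file is passed twice.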

Currently adding:

  • PCA benchmarks

In follow-up PRs:

  • matmul benchmark ?
  • topk benchmark ?
  • bfKNN benchmark ?

@fcharras force-pushed the first_kmeans_benchmarks branch from 4dcd241 to 27bd685 on September 12, 2023 18:23
@ogrisel (Contributor) left a comment

I think the CI should run the fast variant of the benchmark for all CPU-compatible engines.

EDIT: I see there is already an item in the TODO list:

setup CI that runs all benchmarks on CPU (except those that can't run on CPU)

Other than that, LGTM!

LICENSE.txt (outdated; review thread resolved)
benchmarks/kmeans/objective.py (outdated; review thread resolved)
setup.cfg (outdated):
[flake8]
# max line length for black
max-line-length = 88
target-version = ['py37']
Contributor:

I think we can already target 3.8 or 3.9 :)

Contributor:

Also, nowadays I would tend to use ruff instead of flake8, but it's no big deal, especially on a small code base.

Collaborator (author):

I mindlessly copied files that were originally taken from sklearn when sklearn-numba-dpex was created, but sklearn has evolved since then; we should indeed update these and apply ruff on all our repos now...

@fcharras force-pushed the first_kmeans_benchmarks branch from a23f683 to a345b04 on September 13, 2023 12:52
@fcharras force-pushed the first_kmeans_benchmarks branch 2 times, most recently from 677f367 to dd4a1a3 on September 14, 2023 08:45
@fcharras force-pushed the first_kmeans_benchmarks branch 11 times, most recently from 784002f to 75407b0 on September 14, 2023 15:00
@fcharras force-pushed the first_kmeans_benchmarks branch from 75407b0 to 5d9c3a0 on September 14, 2023 17:09
@fcharras force-pushed the first_kmeans_benchmarks branch 3 times, most recently from 8ad78fd to 6f9622b on September 15, 2023 14:18
@fcharras force-pushed the first_kmeans_benchmarks branch from 115e118 to cae8cbc on September 18, 2023 15:20
for running the benchmarks from a benchmark file tree, and refer to the documentation
of the dependencies of the solvers you're interested in running to gather prerequisite
installation instructions.

@ogrisel (Contributor), Sep 19, 2023:

I think you should give one or two examples of canonical commands (adapted to the folder structure of this repo) to get started here and then refer to the benchopt doc for variations.
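
For reference, canonical invocations for this repo's layout might look like the following (a sketch assuming a standard benchopt install and the `benchmarks/kmeans` folder of this repo; solver and dataset names are illustrative):

```shell
# Install the benchmark runner.
pip install benchopt

# Run every solver/dataset combination defined in the KMeans benchmark:
benchopt run ./benchmarks/kmeans

# Restrict the run to a single solver and dataset:
benchopt run ./benchmarks/kmeans -s scikit-learn -d simulated_blobs
```

See the benchopt documentation for the full set of `benchopt run` options.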

Collaborator (author):

I added a link to the GitHub workflow that tests on CPU; I think that works better as a complete practical guide?

README.md (outdated; review threads resolved)
@fcharras force-pushed the first_kmeans_benchmarks branch from 414b7d2 to 2854cb6 on October 7, 2023 08:59
@fcharras force-pushed the first_kmeans_benchmarks branch from 2854cb6 to 24bf8c0 on October 7, 2023 22:33
@ogrisel (Contributor) left a comment

Some feedback.

Also, please include scikit-learn's Array API support with PyTorch.


# if tol == 0:
# tol = 1e-16
# self.tol = tol
Contributor:

Can be removed.

benchmarks/pca/objective.py (outdated; review thread resolved)
parameters = dict(
n_components=[10],
whiten=[False],
tol=[0.0],
Contributor:

Which solver requires a tol parameter?

@fcharras (Collaborator, author), Oct 10, 2023:

All three current solvers expose the parameter, but it is only actually used by sklearn(/sklearnex)'s arpack solver and cupy's jacobi solver.
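
For illustration, on the scikit-learn side the parameter is only consumed on the arpack path (a minimal sketch, assuming scikit-learn and numpy are installed):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))

# With svd_solver="arpack", `tol` is forwarded to scipy's sparse
# eigensolver; with svd_solver="full" it is accepted but has no effect.
pca = PCA(n_components=10, svd_solver="arpack", tol=0.0)
pca.fit(X)
```

This is why `tol=[0.0]` can sit in the shared parameter grid without affecting the full-SVD solvers.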

random_state,
verbose,
):
if self.device == "cpu":
Contributor:

I don't think cuml.decomposition.PCA can ever run on CPU. It can accept host-allocated inputs but it will do the device-allocation + copy automatically in that case.

In my opinion, let's not waste benchmark time and reporting readability to measure this: for the cuml case, I would only run the device == "gpu" case and remove the fake device == "cpu" case.
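
Following this suggestion, the cpu case could be skipped in the style of benchopt's solver skip hook (a hedged sketch with a hypothetical helper name; the exact wiring into the solver class is left out):

```python
def skip_cuml_device(device):
    """Return (skip, reason), mirroring the (bool, str) contract of a
    benchopt Solver.skip hook, for a hypothetical cuml PCA solver."""
    if device == "cpu":
        # cuml would silently copy host data to the GPU, so the "cpu"
        # measurement would not reflect a real CPU implementation.
        return True, "cuml PCA always executes on GPU"
    return False, None
```

Skipping here keeps the fake cpu case out of both the benchmark time budget and the result table.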

@fcharras (Collaborator, author), Oct 10, 2023:

(I don't really know all the ins and outs, but cuml is introducing experimental CPU/GPU device selection and PCA is compatible, so maybe at some point it could be included.)

benchmarks/pca/solvers/scikit_learn_intelex.py (outdated; review thread resolved)
benchmarks/pca/datasets/simulated_blobs.py (outdated; review thread resolved)

parameters = dict(
svd_solver=["full", "arpack", "randomized"],
power_iteration_normalizer=["QR", "LU", "none"],
Contributor:

I would rather not bench all those combinations.

For randomized we could set power_iteration_normalizer="LU" when using numpy and power_iteration_normalizer="QR" when using Array API (PyTorch).
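
The suggested pruning could be expressed with a small helper instead of benchmarking the full grid (hypothetical function name, just sketching the rule above):

```python
def pick_power_iteration_normalizer(array_namespace):
    """Pick the normalizer for the randomized SVD solver:
    "LU" for plain numpy inputs, "QR" when the data goes through
    the Array API (e.g. PyTorch tensors), per the review suggestion."""
    return "LU" if array_namespace == "numpy" else "QR"
```

This collapses the `power_iteration_normalizer` axis of the grid to a single, namespace-dependent value.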

benchmarks/pca/solvers/scikit_learn_intelex.py (outdated; three review threads resolved)
@fcharras (Collaborator, author) commented Oct 11, 2023

For KMeans, the result spreadsheet is starting to look good after the latest round of debugging.

Still missing:

  • sklearn_pytorch_engine on cpu and xpu
  • sklearn_pytorch_engine on M1 for large dataset (10_000_000 and 50_000_000)
  • benchmarks on my laptop (8 cores / iGPU)

@fcharras merged commit 88357ca into main on Oct 11, 2023
@fcharras deleted the first_kmeans_benchmarks branch on October 24, 2023 13:16