Merge pull request #2 from soda-inria/fea/add_pca_benchmarks
Add PCA benchmarks
Adds a new tab "PCA" to the spreadsheet.
fcharras authored Oct 24, 2023
2 parents ed30c76 + 5191783 commit 8d29293
Showing 19 changed files with 1,236 additions and 9 deletions.
Original file line number Diff line number Diff line change
@@ -18,3 +18,4 @@ jobs:
       - name: Check sanity of benchmark files
         run: |
           python ./benchmarks/kmeans/consolidate_result_csv.py ./benchmarks/kmeans/results.csv --check-csv
+          python ./benchmarks/pca/consolidate_result_csv.py ./benchmarks/pca/results.csv --check-csv
5 changes: 4 additions & 1 deletion .github/workflows/sync_benchmark_files_to_gsheet.yaml
@@ -5,7 +5,7 @@ on:
     branches: main

 jobs:
-  sync_kmeans_benchmark_result_file_with_gsheet:
+  sync_benchmark_result_file_with_gsheet:
     name: Run synchronization script
     runs-on: ubuntu-latest
     environment: Publish
@@ -23,6 +23,9 @@ jobs:
           GSPREAD_URL: ${{vars.GSPREAD_URL}}
         run: |
           python ./benchmarks/kmeans/consolidate_result_csv.py ./benchmarks/kmeans/results.csv --check-csv
+          python ./benchmarks/pca/consolidate_result_csv.py ./benchmarks/pca/results.csv --check-csv
           echo "$GSPREAD_SERVICE_ACCOUNT_AUTH_KEY" > service_account.json
           python ./benchmarks/kmeans/consolidate_result_csv.py ./benchmarks/kmeans/results.csv \
             --sync-to-gspread --gspread-url $GSPREAD_URL --gspread-auth-key ./service_account.json
+          python ./benchmarks/pca/consolidate_result_csv.py ./benchmarks/pca/results.csv \
+            --sync-to-gspread --gspread-url $GSPREAD_URL --gspread-auth-key ./service_account.json
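Note on the workflow above: both `consolidate_result_csv.py` scripts are called with the same set of flags (`--check-csv`, `--sync-to-gspread`, `--gspread-url`, `--gspread-auth-key`). The real argument parser is not part of this diff, so the following is only a minimal sketch of a CLI that would accept these invocations. The positional name `results_csv`, the help strings, and the rule that syncing requires both the URL and the key are assumptions for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser: flag names are taken from the workflow commands,
    # everything else (dest names, help text) is assumed.
    parser = argparse.ArgumentParser(
        description="Check a benchmark result CSV and optionally sync it to a Google Sheet."
    )
    parser.add_argument("results_csv", help="Path to the results.csv file")
    parser.add_argument("--check-csv", action="store_true",
                        help="Only validate the CSV file")
    parser.add_argument("--sync-to-gspread", action="store_true",
                        help="Upload the CSV content to a worksheet")
    parser.add_argument("--gspread-url", help="URL of the target spreadsheet")
    parser.add_argument("--gspread-auth-key",
                        help="Path to the service account JSON key")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed validation: syncing needs both the URL and the auth key.
    if args.sync_to_gspread and not (args.gspread_url and args.gspread_auth_key):
        parser.error("--sync-to-gspread requires --gspread-url and --gspread-auth-key")
    return args


if __name__ == "__main__":
    example = parse_args(["./benchmarks/pca/results.csv", "--check-csv"])
    print(example)
```

Running the check step first and the sync step second, as the workflow does, means a malformed CSV fails the job before any credentials are written to disk.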
4 changes: 3 additions & 1 deletion .github/workflows/test_cpu_benchmarks.yaml
@@ -137,7 +137,9 @@ jobs:
       - name: Check oneapi default device
         run: python -c "import dpctl; dpctl.select_default_device().print_device_info()"

-      - name: Run k-means benchmarks
+      - name: Run benchmarks
         run: |
           cd benchmarks/kmeans
           PYTHONPATH=$PYTHONPATH:$(realpath ../../kmeans_dpcpp/) benchopt run --no-plot -l -d Simulated_correlated_data[n_samples=1000,n_features=14]
+          cd ../pca
+          benchopt run --no-plot -l -d Simulated_correlated_data[n_samples=100,n_features=100]
1 change: 1 addition & 0 deletions README.md
@@ -17,6 +17,7 @@ hardware.

 Benchmarks are currently available for the following algorithms:
 - [k-means](https://github.com/soda-inria/sklearn-engine-benchmarks/tree/main/benchmarks/kmeans)
+- [PCA](https://github.com/soda-inria/sklearn-engine-benchmarks/tree/main/benchmarks/pca)

 Here is a (non-exhaustive) list of libraries that are compared in the benchmarks:
 - [scikit-learn](https://scikit-learn.org/stable/index.html)
20 changes: 14 additions & 6 deletions benchmarks/kmeans/consolidate_result_csv.py
@@ -381,18 +381,30 @@ def _gspread_sync(source, gspread_url, gspread_auth_key):
     gs = gspread.service_account(gspread_auth_key)
     sheet = gs.open_by_url(gspread_url)

+    global_range = (
+        f"{gspread.utils.rowcol_to_a1(1, 1)}:"
+        f"{gspread.utils.rowcol_to_a1(n_rows + 1, n_cols)}"
+    )
+
     try:
         worksheet = sheet.worksheet(GOOGLE_WORKSHEET_NAME)
         worksheet.clear()
         worksheet.clear_basic_filter()
         worksheet.freeze(0, 0)
         worksheet.resize(rows=n_rows + 1, cols=n_cols)
+        worksheet.clear_notes(global_range)
+        white_background = dict(
+            backgroundColorStyle=dict(rgbColor=dict(red=1, green=1, blue=1, alpha=1))
+        )
+        worksheet.format(global_range, white_background)
     except gspread.WorksheetNotFound:
         worksheet = sheet.add_worksheet(
             GOOGLE_WORKSHEET_NAME, rows=n_rows + 1, cols=n_cols
         )
-        # ensure worksheets are sorted alphabetically
-        sheet.reorder_worksheets(sorted(sheet.worksheets(), key=attrgetter("title")))
+        # ensure worksheets are sorted anti-alphabetically
+        sheet.reorder_worksheets(
+            sorted(sheet.worksheets(), key=attrgetter("title"), reverse=True)
+        )

     # upload all values
     worksheet.update(
@@ -413,10 +425,6 @@ def _gspread_sync(source, gspread_url, gspread_auth_key):
         verticalAlignment="MIDDLE",
         wrapStrategy="WRAP",
     )
-    global_range = (
-        f"{gspread.utils.rowcol_to_a1(1, 1)}:"
-        f"{gspread.utils.rowcol_to_a1(n_rows + 1, n_cols)}"
-    )
     format_queries.append(dict(range=global_range, format=global_format))

     # benchmark_id and walltime columns are bold
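Note on the two changes to `_gspread_sync`: `global_range` is now computed before the worksheet is reset so that it can be reused for `clear_notes` and `format`, and the worksheet tabs are reordered in reverse alphabetical order. The sketch below illustrates both ideas without a Google Sheets connection. The `rowcol_to_a1` helper here is a local stand-in written for this example (it is not gspread's implementation), `Worksheet` is a hypothetical namedtuple, and the tab titles are made up. The `n_rows + 1` offset presumably accounts for a header row.

```python
from collections import namedtuple
from operator import attrgetter


def rowcol_to_a1(row, col):
    """Local stand-in: convert 1-based (row, col) to A1 notation."""
    if row < 1 or col < 1:
        raise ValueError("row and col are 1-based")
    letters = ""
    while col > 0:
        # Column letters form a bijective base-26 number (A..Z, AA..AZ, ...).
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{row}"


def global_range(n_rows, n_cols):
    # Spans the top-left cell to the bottom-right cell of the data,
    # with one extra row (assumed to be the header).
    return f"{rowcol_to_a1(1, 1)}:{rowcol_to_a1(n_rows + 1, n_cols)}"


# Hypothetical stand-in for worksheet objects, which expose a `title`.
Worksheet = namedtuple("Worksheet", ["title"])


def reverse_alphabetical(worksheets):
    # Same sort key as in the diff, with reverse=True.
    return sorted(worksheets, key=attrgetter("title"), reverse=True)


if __name__ == "__main__":
    print(global_range(3, 28))
    tabs = [Worksheet("alpha"), Worksheet("gamma"), Worksheet("beta")]
    print([w.title for w in reverse_alphabetical(tabs)])
```

Computing the range once up front avoids building the same A1 string twice and keeps the reset and formatting steps aligned on the same cell span.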
2 changes: 1 addition & 1 deletion benchmarks/kmeans/solvers/sklearn_pytorch_engine.py
@@ -54,7 +54,7 @@ def set_objective(
         device = self.device
         # Copy the data before running the benchmark to ensure that no unfortunate side
         # effects can happen
-        self.X = torch.asarray(X, copy=True, device=self.device)
+        self.X = torch.asarray(X, copy=True, device=device)

         if hasattr(sample_weight, "copy"):
             sample_weight = torch.asarray(sample_weight, copy=True, device=device)