Auto-sklearn baseline v1 #32
Open
AxiomAlive wants to merge 66 commits into main from auto-sklearn_baseline
Commits (66)
107714e  create requirements.txt (MorrisNein)
e67bde8  move to FEDOT 0.7.0 (MorrisNein)
2458654  create Dockerfile (MorrisNein)
e8fee30  prepare experiment demo (MorrisNein)
5fb00f0  adapt to FEDOT 0.7.0 again (MorrisNein)
310a578  fix similarity assessors (MorrisNein)
ae9c909  allow PymfeExtractor fill values with median (MorrisNein)
60dc77a  use FEDOT version with fixed initial assumptions (MorrisNein)
cf25066  optional cache usage for MFE extractor (MorrisNein)
a5a0c8a  allow to advise only the n best models (MorrisNein)
3bfaf50  finalize experiment (MorrisNein)
75ea275  finalize experiment [2] (MorrisNein)
8f29cf7  wrap & log exceptions; log progress to file (MorrisNein)
168a4dd  update requirements.txt (MorrisNein)
1ae8511  update timeouts (MorrisNein)
cc71c47  remove GOLEM from requirements.txt to inherit version required by FEDOT (MorrisNein)
1e7be91  clean openml cache (MorrisNein)
a10174c  update Dockerfile (MorrisNein)
a309eef  make experiment safer (MorrisNein)
066cd3e  add .dockerignore (MorrisNein)
69b4915  fix save path (MorrisNein)
b490f05  Making code more reusable and qualitative (AxiomAlive)
e7e4bf8  Adding auto-sklearn run script with an example (AxiomAlive)
1dead9c  Merge branch 'dont_download_cached_dataset_qualities' of github.com:I… (AxiomAlive)
7f74e70  move to FEDOT 0.7.0 (MorrisNein)
94e0afa  create Dockerfile (MorrisNein)
d24247f  prepare experiment demo (MorrisNein)
c4b3f91  fix similarity assessors (MorrisNein)
e0661f3  allow PymfeExtractor fill values with median (MorrisNein)
4f10b03  use FEDOT version with fixed initial assumptions (MorrisNein)
a78be30  optional cache usage for MFE extractor (MorrisNein)
9bf6d97  allow to advise only the n best models (MorrisNein)
fdee481  finalize experiment (MorrisNein)
169ab3e  finalize experiment [2] (MorrisNein)
1270d80  wrap & log exceptions; log progress to file (MorrisNein)
a796ea7  update timeouts (MorrisNein)
8665204  remove GOLEM from requirements.txt to inherit version required by FEDOT (MorrisNein)
0f5ac53  clean openml cache (MorrisNein)
6eddbb1  update Dockerfile (MorrisNein)
d8bd536  make experiment safer (MorrisNein)
36c1d01  add .dockerignore (MorrisNein)
29b8cb9  fix save path (MorrisNein)
e7b7861  Merging remote (AxiomAlive)
fbe04ea  Resolving conflict (AxiomAlive)
ac060ee  add logging in PymfeExtractor (MorrisNein)
7c42e79  add intelligent datasets train/test split (MorrisNein)
cb11a3c  Refactor data storage (#15) (MorrisNein)
0b9ed49  Auto-sklearn baseline in a progress (AxiomAlive)
42e343b  WIP: auto-sklearn baseline (AxiomAlive)
6c5e4b8  examples/4_advising_models conflict resolving (AxiomAlive)
26d57b8  Implemented Auto-sklearn baseline. (AxiomAlive)
5c10658  fix inner components (MorrisNein)
e2c1b89  separate framework cache from other data (MorrisNein)
20fb439  use yaml config for the experiment (MorrisNein)
d4d50ce  refactor run.py (MorrisNein)
e581c9e  update requirements (MorrisNein)
2f8b409  Removing IDE configuration files. (AxiomAlive)
fc105d2  Conflict resolving (AxiomAlive)
67812b7  make absolute path to config.yaml (MorrisNein)
4a0b144  fix train test split (MorrisNein)
44857b0  refactor for frequent results saving (MorrisNein)
68a2443  fix logging (MorrisNein)
b4c714f  Adding an AutoML baseline class (AxiomAlive)
645a98f  Reflecting API changes in an asklearn baseline (AxiomAlive)
6b91ca9  Merge pull request #37 from ITMO-NSS-team/automl_baseline (AxiomAlive)
359a4ce  Merge branch 'docker_and_experiments' into auto-sklearn_baseline (AxiomAlive)
Files changed

.dockerignore (new file):
# Config & info files
.pep8speaks.yml
Dockerfile
LICENSE
README.md

# Unnecessary files
examples
notebooks
test

# User data
data/cache
Dockerfile (new file):
# Download base image ubuntu 20.04
FROM ubuntu:20.04

# For apt to be noninteractive
ENV DEBIAN_FRONTEND noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN true

# Preseed tzdata, update package index, upgrade packages and install needed software
RUN truncate -s0 /tmp/preseed.cfg; \
    echo "tzdata tzdata/Areas select Europe" >> /tmp/preseed.cfg; \
    echo "tzdata tzdata/Zones/Europe select Berlin" >> /tmp/preseed.cfg; \
    debconf-set-selections /tmp/preseed.cfg && \
    rm -f /etc/timezone /etc/localtime && \
    apt-get update && \
    apt-get install -y nano && \
    apt-get install -y mc && \
    apt-get install -y python3.9 python3-pip && \
    apt-get install -y git && \
    rm -rf /var/lib/apt/lists/*

# Set the workdir
ENV WORKDIR /home/meta-automl-research
WORKDIR $WORKDIR
COPY . $WORKDIR

RUN pip3 install pip && \
    pip install wheel && \
    pip install --trusted-host pypi.python.org -r ${WORKDIR}/requirements.txt

ENV PYTHONPATH $WORKDIR
Two empty files are also added (no content shown).
Auto-sklearn baseline script (new file; exact path not shown in this view):
import csv
import time

from typing import Any, Tuple, Dict

import numpy as np
import logging

import autosklearn.classification
import autosklearn.ensembles

from sklearn import model_selection, metrics

from baselines.automl_baseline import AutoMLBaseline
from meta_automl.data_preparation.datasets_loaders import OpenMLDatasetsLoader
from meta_automl.data_preparation.models_loaders import KnowledgeBaseModelsLoader
from autosklearn.classification import AutoSklearnClassifier


class AutoSklearnBaseline(AutoMLBaseline):
    def __init__(self, ensemble_type, time_limit):
        self.estimator = AutoSklearnClassifier(
            ensemble_class=ensemble_type,
            time_left_for_this_task=time_limit,
        )
        self.knowledge_base_loader = KnowledgeBaseModelsLoader()

    @staticmethod
    def make_quality_metric_estimates(y, predictions, prediction_proba, is_multi_label):
        """ Compute roc_auc, f1, accuracy, log_loss and precision scores. """
        # All metrics except log_loss are negated, so lower values are better for every column.
        results = {
            'roc_auc': -1 * float(
                "{:.3f}".format(
                    metrics.roc_auc_score(
                        y,
                        prediction_proba if is_multi_label else predictions,
                        multi_class='ovr'
                    )
                )
            ),
            'f1': -1 * float(
                "{:.3f}".format(
                    metrics.f1_score(
                        y,
                        predictions,
                        average='macro' if is_multi_label else 'binary'
                    )
                )
            ),
            'accuracy': -1 * float(
                "{:.3f}".format(
                    metrics.accuracy_score(
                        y,
                        predictions
                    )
                )
            ),
            'logloss': float(
                "{:.3f}".format(
                    metrics.log_loss(
                        y,
                        prediction_proba if is_multi_label else predictions
                    )
                )
            ),
            'precision': -1 * float(
                "{:.3f}".format(
                    metrics.precision_score(
                        y,
                        predictions,
                        average='macro' if is_multi_label else 'binary',
                        labels=np.unique(predictions)
                    )
                )
            )
        }
        return results

    def run(self):
        """ Fit auto-sklearn meta-optimizer to knowledge base datasets and output a single best model. """
        dataset_ids_to_load = [
            dataset_id for dataset_id in self.knowledge_base_loader
            .parse_datasets('test')
            .loc[:, 'dataset_id']
        ]
        # dataset_ids_to_load = [dataset_ids_to_load[dataset_ids_to_load.index(41166)]]

        loaded_datasets = OpenMLDatasetsLoader().load(dataset_ids_to_load)

        for iteration, dataset in enumerate(loaded_datasets):
            logging.log(logging.INFO, f"Loaded dataset name: {dataset.name}")
            dataset_data = dataset.get_data()

            X_train, X_test, y_train, y_test = model_selection.train_test_split(
                dataset_data.x,
                dataset_data.y,
                test_size=0.2,
                random_state=42,
                stratify=dataset_data.y
            )

            fitting_start_time = time.time()
            ensemble = self.estimator.fit(X_train, y_train)
            fitting_time = time.time() - fitting_start_time
            logging.log(logging.INFO, f"Fitting time is {fitting_time}sec")

            inference_start_time = time.time()
            predicted_results = self.estimator.predict(X_test)
            inference_time = time.time() - inference_start_time
            logging.log(logging.INFO, f"Inference time is {inference_time}sec")

            predicted_probabilities = self.estimator.predict_proba(X_test)

            best_single_model = list(ensemble.show_models().values())[0].get('sklearn_classifier')

            # autosklearn_ensemble = pipeline.show_models()
            # formatted_ensemble = {
            #     model_id: {
            #         'rank': autosklearn_ensemble[model_id].get('rank'),
            #         'cost': float(f"{autosklearn_ensemble[model_id].get('cost'):.3f}"),
            #         'ensemble_weight': autosklearn_ensemble[model_id].get('ensemble_weight'),
            #         'model': autosklearn_ensemble[model_id].get('sklearn_classifier')
            #     } for model_id in autosklearn_ensemble.keys()
            # }
Review comment on lines +116 to +124: Commented-out code is better left out of git unless it helps in understanding the rest of the code. You can stash these changes if you need to keep them. The same applies to the other commented-out code.
            general_run_info = {
                'dataset_id': dataset.id_,
                'dataset_name': dataset.name,
                'run_label': 'Auto-sklearn',
            }

            # Treat more than two predicted classes as the multi-class case.
            is_multilabel_classification = len(set(predicted_results)) > 2
            quality_metric_estimates = AutoSklearnBaseline.make_quality_metric_estimates(
                y_test,
                predicted_results,
                predicted_probabilities,
                is_multilabel_classification
            )

            model_dependent_run_info = {
                'fit_time': float(f'{fitting_time:.1f}'),
                'inference_time': float(f'{inference_time:.1f}'),
                'model_str': repr(best_single_model)
            }

            results = {**general_run_info, **quality_metric_estimates, **model_dependent_run_info}

            # for key in autosklearn_ensemble.keys():
            #     ensemble_model = autosklearn_ensemble[key]
            #     formatted_ensemble = results['ensemble']
            #     for model_id in formatted_ensemble.keys():
            #         formatted_ensemble[model_id] = ensemble_model.get("rank", None)

            AutoSklearnBaseline.save_on_disk(results.values())

        return results

    @staticmethod
    def save_on_disk(data):
        with open('data/experimental_data.csv', 'a', newline='') as file:
            writer = csv.writer(file, delimiter=',')
            writer.writerow(data)


if __name__ == '__main__':
    AutoSklearnBaseline(autosklearn.ensembles.SingleBest, 600).run()
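A note on usage: the entry point above pins autosklearn.ensembles.SingleBest and a 600-second budget. Below is a minimal sketch of running the same baseline with different settings; it assumes auto-sklearn also exposes EnsembleSelection in autosklearn.ensembles (as in recent releases) and that the script above lives at a module path such as baselines.auto_sklearn_baseline, which is not shown in this diff and is therefore hypothetical:

```python
import autosklearn.ensembles

# Hypothetical import path for the AutoSklearnBaseline class defined above.
from baselines.auto_sklearn_baseline import AutoSklearnBaseline

# Same baseline, but with auto-sklearn's ensemble selection instead of the
# single best model, and a 30-minute budget (time_left_for_this_task is in seconds).
baseline = AutoSklearnBaseline(
    ensemble_type=autosklearn.ensembles.EnsembleSelection,  # assumed available, like SingleBest
    time_limit=1800,
)
results = baseline.run()
```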
Experimental results CSV (new file; written to data/experimental_data.csv by the script above):
1461,bank-marketing,Auto-sklearn,-0.711,-0.535,-0.907,3.34,-0.648,598.0,0.1,"HistGradientBoostingClassifier(early_stopping=True,
l2_regularization=1.7108930238344161e-10,
learning_rate=0.010827728124541558, loss='auto',
max_iter=512, max_leaf_nodes=25,
min_samples_leaf=4, n_iter_no_change=19,
random_state=1,
validation_fraction=0.1759114608225653,
warm_start=True)"
179,adult,Auto-sklearn,-0.774,-0.91,-0.859,5.077,-0.885,595.3,0.1,"HistGradientBoostingClassifier(early_stopping=True,
l2_regularization=1.7108930238344161e-10,
learning_rate=0.010827728124541558, loss='auto',
max_iter=512, max_leaf_nodes=25,
min_samples_leaf=4, n_iter_no_change=19,
random_state=1,
validation_fraction=0.1759114608225653,
warm_start=True)"
1464,blood-transfusion-service-center,Auto-sklearn,-0.669,-0.5,-0.8,7.209,-0.625,597.6,0.0,"PassiveAggressiveClassifier(C=0.253246830865058, average=True, max_iter=16,
random_state=1, tol=0.01676578241454229,
warm_start=True)"
991,car,Auto-sklearn,-1.0,-1.0,-1.0,0.0,-1.0,596.8,0.0,"HistGradientBoostingClassifier(early_stopping=True,
l2_regularization=1.9280388598217333e-10,
learning_rate=0.24233932723531437, loss='auto',
max_iter=128, max_leaf_nodes=35,
min_samples_leaf=17, n_iter_no_change=1,
random_state=1, validation_fraction=None,
warm_start=True)"
1489,phoneme,Auto-sklearn,-0.848,-0.797,-0.887,4.068,-0.845,600.4,0.1,"AdaBoostClassifier(algorithm='SAMME',
base_estimator=DecisionTreeClassifier(max_depth=10),
learning_rate=1.1377640450285444, n_estimators=352,
random_state=1)"
41027,jungle_chess_2pcs_raw_endgame_complete,Auto-sklearn,-0.975,-0.816,-0.865,0.271,-0.824,595.1,0.2,"HistGradientBoostingClassifier(early_stopping=True,
l2_regularization=9.674948183980905e-09,
learning_rate=0.014247987845444413, loss='auto',
max_iter=512, max_leaf_nodes=55,
min_samples_leaf=164, n_iter_no_change=1,
random_state=1,
validation_fraction=0.11770489601182355,
warm_start=True)"
41166,volkert,Auto-sklearn,-0.874,-0.586,-0.644,1.829,-0.587,595.8,0.3,"LinearDiscriminantAnalysis(shrinkage='auto', solver='lsqr',
tol=0.018821286956948503)"
54,vehicle,Auto-sklearn,-0.964,-0.86,-0.859,0.408,-0.861,595.5,0.0,"MLPClassifier(activation='tanh', alpha=0.0002060405669905105, beta_1=0.999,
beta_2=0.9, hidden_layer_sizes=(87, 87, 87),
learning_rate_init=0.00040205833939989724, max_iter=256,
n_iter_no_change=32, random_state=1, validation_fraction=0.0,
verbose=0, warm_start=True)"
40996,fashion-mnist,Auto-sklearn,-0.968,-0.864,-0.865,1.913,-0.866,296.1,1.2,"KNeighborsClassifier(n_neighbors=4, weights='distance')"
40996,fashion-mnist,Auto-sklearn,-0.968,-0.864,-0.865,1.913,-0.866,595.5,0.8,"KNeighborsClassifier(n_neighbors=4, weights='distance')"
42344,sf-police-incidents,Auto-sklearn,-0.574,-0.589,-0.574,15.367,-0.569,594.8,0.5,"HistGradientBoostingClassifier(early_stopping=True,
l2_regularization=3.609412172481434e-10,
learning_rate=0.05972079854295879, loss='auto',
max_iter=512, max_leaf_nodes=4,
min_samples_leaf=2, n_iter_no_change=14,
random_state=1, validation_fraction=None,
warm_start=True)"
1240,airlinescodrnaadult,Auto-sklearn,-0.62,-0.683,-0.631,13.306,-0.658,594.3,0.1,"SGDClassifier(alpha=1.6992296128865824e-07, average=True, eta0=0.01, loss='log',
max_iter=512, penalty='l1', random_state=1,
tol=1.535384699341134e-05, warm_start=True)"
baselines/automl_baseline.py (new file):
from abc import ABC


class AutoMLBaseline(ABC):
    def run(self):
        raise NotImplementedError

    @staticmethod
    def save_on_disk(data):
        raise NotImplementedError
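For illustration, here is a hypothetical subclass sketch showing the contract this base class implies. The DummyBaseline name and its internals are invented for this example; only the run/save_on_disk interface comes from the class above:

```python
import csv

from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

from baselines.automl_baseline import AutoMLBaseline


class DummyBaseline(AutoMLBaseline):
    """Hypothetical minimal AutoMLBaseline implementation (illustration only)."""

    def run(self):
        # Fit a trivial model on a toy dataset and report one metric.
        X, y = load_iris(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
        model = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
        results = {"run_label": "Dummy", "accuracy": model.score(X_test, y_test)}
        DummyBaseline.save_on_disk(results.values())
        return results

    @staticmethod
    def save_on_disk(data):
        # Append one row per run, mirroring the CSV convention used by AutoSklearnBaseline.
        with open("dummy_results.csv", "a", newline="") as file:
            csv.writer(file).writerow(data)
```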
Review comment: The comment can be removed.