Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix common issues #68

Merged
merged 28 commits into from
Nov 10, 2023
Merged

Fix common issues #68

merged 28 commits into from
Nov 10, 2023

Conversation

@pep8speaks
Copy link

pep8speaks commented Nov 1, 2023

Hello @MorrisNein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 1:1: F401 '.dataset_similarity_assessor.DatasetSimilarityAssessor' imported but unused
Line 2:1: F401 '.model_based_similarity_assessors.KNeighborsSimilarityAssessor' imported but unused
Line 2:1: F401 '.model_based_similarity_assessors.ModelBasedSimilarityAssessor' imported but unused

Line 1:1: F401 'meta_automl.meta_algorithm.model_advisors.model_advisor.DatasetSimilarityModelAdvisor' imported but unused
Line 1:1: F401 'meta_automl.meta_algorithm.model_advisors.model_advisor.ModelAdvisor' imported but unused
Line 2:1: F401 'meta_automl.meta_algorithm.model_advisors.diverse_model_advisor.DiverseModelAdvisor' imported but unused
Line 3:1: F401 'meta_automl.meta_algorithm.model_advisors.surrogate_advisor.SurrogateGNNPipelineAdvisor' imported but unused

Comment last updated at 2023-11-07 14:41:19 UTC

Copy link

codecov bot commented Nov 1, 2023

Codecov Report

Merging #68 (5edec59) into main (069719f) will increase coverage by 0.25%.
The diff coverage is 26.97%.

❗ Current head 5edec59 differs from pull request most recent head 4061730. Consider uploading reports for the commit 4061730 to get more accurate results

@@            Coverage Diff             @@
##             main      #68      +/-   ##
==========================================
+ Coverage   28.50%   28.76%   +0.25%     
==========================================
  Files          53       53              
  Lines        2319     2347      +28     
==========================================
+ Hits          661      675      +14     
- Misses       1658     1672      +14     
Files Coverage Δ
...ta_preparation/datasets_loaders/datasets_loader.py 88.88% <100.00%> (ø)
...tion/datasets_loaders/timeseries_dataset_loader.py 100.00% <100.00%> (ø)
...ion/feature_preprocessors/feature_preprocessors.py 25.00% <100.00%> (+1.74%) ⬆️
...automl/data_preparation/file_system/file_system.py 90.00% <100.00%> (+1.76%) ⬆️
...ration/meta_features_extractors/pymfe_extractor.py 67.81% <100.00%> (-0.37%) ⬇️
meta_automl/data_preparation/evaluated_model.py 0.00% <0.00%> (ø)
...ion/models_loaders/knowledge_base_models_loader.py 0.00% <0.00%> (ø)
...algorithm/dataset_similarity_assessors/__init__.py 0.00% <0.00%> (ø)
...imilarity_assessors/dataset_similarity_assessor.py 0.00% <0.00%> (ø)
...a_automl/meta_algorithm/model_advisors/__init__.py 0.00% <0.00%> (ø)
... and 9 more

Copy link
Collaborator

@ShikovEgor ShikovEgor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

В /experiments/base хранились чекпоинты обученной модели для использования в дальнейшем. Название директорий так себе, но на мой взгляд в /data им тоже не место.
Предлагаю создать директорию model_checkpoints (либо model_weights) в корне. Примерная структура:
model_checkpoinst
______ table_data
____________ checkpoints
____________ events.out.tfevents…..
____________ hparams.yaml
______ timeseries
.....

x_train, x_test = train_test_split(meta_features, train_size=0.75, random_state=42)
y_train = x_train.index
y_test = x_test.index
mf_train, mf_test = train_test_split(meta_features, train_size=0.75, random_state=42)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Здесь тогда тоже можно разделить вместе с индексом сразу

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Сделал

for idx, col in enumerate(x.columns):
is_categorical = cat_cols_indicator[idx]
if is_categorical:
most_frequent = x_new[col].value_counts(sort=True, ascending=False).values[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Мб информация о том, что не удалось посчитать данный признак более информативна, чем просто заменить все на моду и медиану?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Т.е. исключать целиком признак датасета из расчёта мета-признаков, если признак встречает none?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Пока что заменил на более лаконичный расчёт моды

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Нет. Я имел ввиду, что мы можем заменить нан на какое-то специальное значение, которое будет говорить о том, что значения там нет.

Но такое значение, чтобы модель смогла переварить

input_data = InputData(idx=np.array([0]), features=np.array(features).reshape(1, -1), target=None,
task=Task(TaskTypesEnum.classification),
data_type=DataTypesEnum.table)
with IndustrialModels():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Это лучше перенести вне цикла, чтобы не вызывать каждый раз тяжелую операцию подгрузки репозитория

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Попробовал убрать, всё работает. Контекст оказался нужен только при загрузке пайплайна

@MorrisNein MorrisNein requested a review from valer1435 November 9, 2023 20:49
else:
median = x_new[col].median()
x_new[col].fillna(median, inplace=True)
fill_value = x_new[col].median(skipna=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Можно оставить и как было. Про замену нанов каким-то специфичным значением просто идея

@MorrisNein MorrisNein merged commit 150d53c into main Nov 10, 2023
@MorrisNein MorrisNein deleted the fix-common-issues branch November 10, 2023 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants