Fix common issues #68
Hello @MorrisNein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2023-11-07 14:41:19 UTC
18b285a to c4d8ec7
Codecov Report
@@            Coverage Diff             @@
##             main      #68      +/-   ##
==========================================
+ Coverage   28.50%   28.76%   +0.25%
==========================================
  Files          53       53
  Lines        2319     2347      +28
==========================================
+ Hits          661      675      +14
- Misses       1658     1672      +14
/experiments/base used to store checkpoints of the trained model for later use. The directory name isn't great, but in my opinion they don't belong in /data either.
I suggest creating a model_checkpoints (or model_weights) directory in the repository root. Approximate structure:
model_checkpoints
    table_data
        checkpoints
        events.out.tfevents…..
        hparams.yaml
    timeseries
.....
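The proposed layout could be resolved in code with a small helper. The following is a minimal sketch, not the project's implementation: the function name `checkpoint_dir` and its `root` argument are hypothetical, and it assumes that `timeseries` mirrors the `table_data` subtree, which the comment above does not spell out.

```python
from pathlib import Path


def checkpoint_dir(root: Path, data_kind: str) -> Path:
    """Return <root>/model_checkpoints/<data_kind>/checkpoints, creating it if needed.

    `data_kind` is expected to be 'table_data' or 'timeseries' (the names from
    the proposed structure); the helper itself is hypothetical.
    """
    allowed = {"table_data", "timeseries"}
    if data_kind not in allowed:
        raise ValueError(f"Unknown data kind: {data_kind!r}")
    path = root / "model_checkpoints" / data_kind / "checkpoints"
    # parents=True builds the intermediate directories in one call.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Keeping the path logic in one place would let the directory be renamed (for example to `model_weights`) without touching the callers.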
5edec59 to 222d693
39f9557 to bb3f03a
x_train, x_test = train_test_split(meta_features, train_size=0.75, random_state=42)
y_train = x_train.index
y_test = x_test.index
mf_train, mf_test = train_test_split(meta_features, train_size=0.75, random_state=42)
In that case, the index can be split here in the same call as well.
Done.
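Splitting the frame and its index in one call might look like the sketch below. It relies on `train_test_split` accepting several array-likes and splitting them consistently. The toy `meta_features` frame, its column names, and the `idx_train`/`idx_test` names are made up for illustration and are not taken from the PR.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the real meta-features table (values are illustrative only).
meta_features = pd.DataFrame(
    {"n_rows": [100, 250, 80, 1200], "n_cols": [4, 10, 3, 25]},
    index=["ds_a", "ds_b", "ds_c", "ds_d"],
)

# Passing the frame and its index together yields aligned train/test parts,
# so the index does not have to be read back from the split frames afterwards.
mf_train, mf_test, idx_train, idx_test = train_test_split(
    meta_features, meta_features.index, train_size=0.75, random_state=42
)
```

Because both inputs go through the same shuffle, `idx_train` lists exactly the rows of `mf_train`, in the same order.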
for idx, col in enumerate(x.columns):
    is_categorical = cat_cols_indicator[idx]
    if is_categorical:
        most_frequent = x_new[col].value_counts(sort=True, ascending=False).values[0]
Maybe the fact that this feature could not be computed is more informative than simply replacing everything with the mode and the median?
So, exclude a dataset feature entirely from the meta-feature computation if it contains None?
For now, I've replaced it with a more concise mode calculation.
No. I meant that we could replace NaN with some special value that signals the value is missing,
but one that the model can still digest.
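To make the two options in this thread concrete, here is a minimal pandas sketch with a toy frame and hypothetical helper names (`impute_mode_median`, `impute_with_flags`). The first helper fills gaps with the mode or the median; note that `value_counts().index[0]` is the most frequent value, whereas `.values[0]` is its count. The second helper is only one possible reading of the special-value idea: it records missingness in extra 0/1 columns and fills the gaps with placeholders a model can consume. The `"__missing__"` label and the `0` fill are arbitrary choices, not something the PR prescribes.

```python
import numpy as np
import pandas as pd


def impute_mode_median(x: pd.DataFrame, cat_cols_indicator: list) -> pd.DataFrame:
    """Fill NaNs with the mode (categorical) or the median (numeric).

    Assumes every column has at least one non-missing value.
    """
    x_new = x.copy()
    for col, is_categorical in zip(x.columns, cat_cols_indicator):
        if is_categorical:
            # Series.mode() skips NaN by default; take the first mode.
            fill_value = x_new[col].mode().iloc[0]
        else:
            fill_value = x_new[col].median(skipna=True)
        x_new[col] = x_new[col].fillna(fill_value)
    return x_new


def impute_with_flags(x: pd.DataFrame, cat_cols_indicator: list) -> pd.DataFrame:
    """Keep the 'value was missing' signal as extra 0/1 columns."""
    x_new = x.copy()
    for col, is_categorical in zip(x.columns, cat_cols_indicator):
        x_new[f"{col}_was_missing"] = x[col].isna().astype(int)
        placeholder = "__missing__" if is_categorical else 0
        x_new[col] = x_new[col].fillna(placeholder)
    return x_new


toy = pd.DataFrame({"color": ["red", None, "red", "blue"],
                    "size": [1.0, np.nan, 3.0, 5.0]})
filled = impute_mode_median(toy, [True, False])
flagged = impute_with_flags(toy, [True, False])
```

The flag columns let a downstream model distinguish a real value from an imputed one, at the cost of doubling the number of columns.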
input_data = InputData(idx=np.array([0]), features=np.array(features).reshape(1, -1), target=None,
                       task=Task(TaskTypesEnum.classification),
                       data_type=DataTypesEnum.table)
with IndustrialModels():
It's better to move this outside the loop, so that the heavy repository-loading operation isn't invoked on every iteration.
I tried removing it, and everything works. The context turned out to be needed only when loading the pipeline.
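The cost the reviewer describes can be illustrated with a stand-in context manager. This sketch does not use the real `IndustrialModels` class: `heavy_context`, `SETUP_CALLS`, and both `predict_*` functions are hypothetical names that only count how often an expensive setup step would run.

```python
from contextlib import contextmanager

# Mutable counter: how many times the (pretend) expensive setup has run.
SETUP_CALLS = {"count": 0}


@contextmanager
def heavy_context():
    """Hypothetical stand-in for a context manager with costly setup."""
    SETUP_CALLS["count"] += 1  # imagine a slow repository load here
    try:
        yield
    finally:
        pass  # teardown would restore the previous state


def predict_inside_loop(items):
    # Setup runs once per item.
    results = []
    for item in items:
        with heavy_context():
            results.append(item * 2)
    return results


def predict_outside_loop(items):
    # Setup runs once for the whole batch.
    with heavy_context():
        return [item * 2 for item in items]
```

Both functions return the same results; only the number of setup calls differs, which is what matters when the setup is slow.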
else:
    median = x_new[col].median()
    x_new[col].fillna(median, inplace=True)
    fill_value = x_new[col].median(skipna=True)
It can also be left as it was. Replacing NaNs with some specific value is just an idea.
dc6c069 to 4061730
Solves the following issues:
Model
a more specific name #16