Redo/generalize/tighten args shorthand #530
main things are testing simulation changes and seeing if the humanize
package fulfills some of our needs here!
@@ -88,7 +88,7 @@ def get_train_dataset_params(input_params: dict, old_params: Optional[dict] = No
 train_dataset_params['cache_limit'] = input_params['cache_limit']
 train_dataset_params['shuffle'] = input_params['shuffle']
 train_dataset_params['shuffle_algo'] = input_params['shuffle_algo']
-train_dataset_params['shuffle_block_size'] = number_abbrev_to_int(
+train_dataset_params['shuffle_block_size'] = normalize_count(
Have you been able to test these changes? Could you make sure these numbers are displayed and passed as intended by running the simulator and passing in values for these parameters?
If you could add a short "testing" section to the PR description that would be great as well!
what is the procedure for checking i didn't break the simulator?
@@ -1,115 +1,371 @@
 # Copyright 2023 MosaicML Streaming authors
 # SPDX-License-Identifier: Apache-2.0

-"""Utilities for human-friendly argument shorthand."""
+"""Conversions between human-friendly string forms and int/float."""
just a suggestion -- would the humanize library be applicable for some of these functions? Would be nice to use an external library for stuff like this, removes some burden on us as well.
Observations for posterity:
- The `humanfriendly` library is very pretty
- But it lacks "counts"
- So we are rolling our own functionality either way for that vertical
- Also, we are tight on time and this is enough, so I figure let's just go with this and revisit in 2024
* Redo/generalize/tighten args shorthand, clean up usage, update tests.
* Fix (cruft).
* Fix (typo).
* Fix (reference to member).
* Tweak.
* Divide tests/test_util.py into tests/util/....py.
* Fix.
* Error messages.
* Lowercase, no space.
Get more specific functionality in place to set the stage for the more interesting PRs.
Beeves I have with existing approach:
- … `i` when using base-1024 (`1gb` vs `1gib`). Then we can recognize both at the same time, or either.
- … `Gb` and didn't get gigabits.
- … `1h23m45s`.

Paths:
- `normalize_bytes` -> `normalize_dec_bytes`, `normalize_bin_bytes` -> `_normalize_nonneg_int` -> `_normalize_int` -> `_normalize_num` -> `_normalize_arg`.
- `normalize_count` -> `_normalize_nonneg_int` -> `_normalize_int` -> `_normalize_num` -> `_normalize_arg`.
- `normalize_duration` -> `_normalize_float` -> `_normalize_num` -> `_normalize_arg`.

Steps of the `_normalize_arg` algorithm:

Example of how configuration and functionality are decomposed in this PR:
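The PR's own example is not reproduced here. As a rough illustration only, the sketch below shows one way the call paths listed above could share a single parser, with each public function supplying its own unit table (the "configuration") and the shared helpers doing the work (the "functionality"). All names (`parse_count`, `parse_bytes`, `parse_duration`, `_parse_arg`, ...) and all unit tables are hypothetical stand-ins, not the PR's code, and the `_normalize_num` layer is folded into the parser for brevity.

```python
import re
from typing import Dict, Union

Number = Union[int, float]

# Hypothetical unit tables (assumptions for illustration, not from the PR).
COUNT_UNITS = {'': 1, 'k': 10**3, 'm': 10**6, 'b': 10**9}
DEC_BYTE_UNITS = {'b': 1, 'kb': 10**3, 'mb': 10**6, 'gb': 10**9}
BIN_BYTE_UNITS = {'b': 1, 'kib': 2**10, 'mib': 2**20, 'gib': 2**30}
DURATION_UNITS = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400}

# One "<number><unit>" part, e.g. "23m" or "1.5k".
_PART = re.compile(r'(\d+(?:\.\d+)?)([a-z]*)')


def _parse_arg(arg: Union[str, Number], units: Dict[str, int]) -> float:
    """Sum every '<number><unit>' part of a lowercase, space-free shorthand."""
    if isinstance(arg, (int, float)):
        return float(arg)
    text = arg.strip().lower()
    pos, total = 0, 0.0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None or match.group(2) not in units:
            raise ValueError(f'Cannot parse shorthand: {arg!r}')
        total += float(match.group(1)) * units[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError('Empty shorthand')
    return total


def _parse_int(arg: Union[str, Number], units: Dict[str, int]) -> int:
    """Parse, then require an integral result."""
    value = _parse_arg(arg, units)
    if not value.is_integer():
        raise ValueError(f'Expected an integer, got {value} from {arg!r}')
    return int(value)


def _parse_nonneg_int(arg: Union[str, Number], units: Dict[str, int]) -> int:
    """Parse an integer, then require it to be >= 0."""
    value = _parse_int(arg, units)
    if value < 0:
        raise ValueError(f'Expected a non-negative integer, got {value}')
    return value


def parse_count(arg: Union[str, Number]) -> int:
    return _parse_nonneg_int(arg, COUNT_UNITS)


def parse_bytes(arg: Union[str, Number]) -> int:
    # Accept decimal ("gb") and binary ("gib") suffixes at the same time.
    return _parse_nonneg_int(arg, {**DEC_BYTE_UNITS, **BIN_BYTE_UNITS})


def parse_duration(arg: Union[str, Number]) -> float:
    return _parse_arg(arg, DURATION_UNITS)
```

In this arrangement, `parse_bytes('1gb')` and `parse_bytes('1gib')` differ only in which table entry they hit, and a compound duration such as `'1h23m45s'` is handled by the same loop that handles a single-part count.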