reduce chunkstore memory footprint #747
base: master
036a89f
There are some errors in the tests, so I think this will need to be tweaked a bit more.
I'm not sure I see this as a win. It seems like the caller may have a bug if they are specifying duplicate columns; we're just hiding the error now.
The current logic is confusing when subsetting data frames with indexes. For example, if you have the data frame:
index: date, security
columns: price, volume
The logic works if the user passes:
['price']
It raises a duplicate columns error if the user passes:
['date','security','price']
I don't see the value in the check. It should just do the right thing.
In pandas nomenclature, the columns and the index are separate. If there is an index, you always get it back, even if you specify a subset of columns (and even if they do not include the index columns). Maybe the documentation should be improved. If, for example, you specify price and security, you'll still get date as well as price and security, so your fix would only introduce more weirdness (in my opinion).
We could remove index columns from columns and then check for duplicates. This keeps the nomenclature while keeping the user interface 'minimum surprise'. Alternatively, we could raise an error saying they have included index columns in the column list.
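A minimal sketch of the first option, for illustration only. The helper name `check_columns` and its arguments are hypothetical and are not taken from the project's code; it assumes the caller already knows the names of the index levels.

```python
def check_columns(columns, index_names):
    """Drop index names from the requested columns, then reject real duplicates.

    Hypothetical helper: `columns` is the list the user passed,
    `index_names` is the list of index level names of the stored frame.
    """
    index_names = set(index_names)
    # The index is always returned, so index names are not real column requests.
    data_columns = [c for c in columns if c not in index_names]
    seen = set()
    for c in data_columns:
        if c in seen:
            raise ValueError("duplicate column requested: %r" % (c,))
        seen.add(c)
    return data_columns


# The example from this thread: index (date, security), columns (price, volume).
result = check_columns(['date', 'security', 'price'], ['date', 'security'])
```

With this approach, `['date', 'security', 'price']` is treated the same as `['price']`, while a request such as `['price', 'price']` still fails.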
The result would be the same though, no? You'd supply index columns and it won't complain. I foresee someone opening a bug complaining they only specified 1 of 3 index columns but still got all 3 back.
Sure
Actually, in retrospect, that means breaking the API for clients. How about we keep the fuzziness for clients and simply output a warning instead?
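Roughly what the warning variant could look like, as a sketch under assumptions: the function name `strip_index_columns` is hypothetical, and using the standard `warnings` module (rather than a logger) is my choice for the example, not something decided in this thread.

```python
import warnings


def strip_index_columns(columns, index_names):
    """Accept index names in `columns`, but warn that they are redundant.

    Hypothetical helper: existing callers keep working, and the warning
    tells them which names to remove from their column list.
    """
    index_names = set(index_names)
    overlap = [c for c in columns if c in index_names]
    if overlap:
        warnings.warn(
            "index columns are always returned and need not be listed "
            "in columns: %s" % ", ".join(overlap),
            UserWarning,
        )
    # Only the non-index names are real column selections.
    return [c for c in columns if c not in index_names]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    selected = strip_index_columns(['date', 'security', 'price'],
                                   ['date', 'security'])
```

Listing the offending names in the message gives the caller enough information to silence the warning by editing their column list.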
OK, let's see a warning :) But I still think that get_info should change, otherwise how would you ever know how to rid yourself of the warning?
Agree with get_info change
So it sounds like you just need to fix the broken tests and add the log, and we're all set :D