Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow missing values in columns #268

Merged
merged 2 commits into from
Jan 2, 2025

Conversation

smason
Copy link
Contributor

@smason smason commented Jan 2, 2025

Previously the code would fail with:

DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'someunit' ([somedimension]).

if any of the values were missing. It seems much nicer if they were maintained as missing values of appropriate unit.

smason added 2 commits January 2, 2025 16:39
partial fix for hgrecco#267

previously the code would fail with:

  DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'someunit' ([mass]).
@andrewgsavage andrewgsavage merged commit 8c2fc78 into hgrecco:master Jan 2, 2025
32 checks passed
@andrewgsavage
Copy link
Collaborator

nice

@smason
Copy link
Contributor Author

smason commented Jan 2, 2025

Sorry, I should have tagged that as WIP...

I've been trying to figure out the other changes and realized I don't understand what's going on in PintArray._from_sequence and PintArray._from_sequence_of_strings.

I think the code I changed in _from_sequence_of_strings should probably be in _from_sequence as that gets called when converting from a Pandas Series.

The code in _from_sequence also seems to override dtype in various situations. I'm not sure why, maybe I should put a pull-request that organizes things more how I'd expect them to be. Might be worth discussing there as I'm not sure if my understanding is correct

@andrewgsavage
Copy link
Collaborator

Sorry, I should have tagged that as WIP...

oh the tests looked like it fixed one of the issues

I've been trying to figure out the other changes and realized I don't understand what's going on in PintArray._from_sequence and PintArray._from_sequence_of_strings.

I think the code I changed in _from_sequence_of_strings should probably be in _from_sequence as that gets called when converting from a Pandas Series.

yes

The code in _from_sequence also seems to override dtype in various situations. I'm not sure why, maybe I should put a pull-request that organizes things more how I'd expect them to be. Might be worth discussing there as I'm not sure if my understanding is correct

yea go for it, looks like

if isinstance(scalars, PintArray):
needs a if dtype is None

@smason
Copy link
Contributor Author

smason commented Jan 3, 2025

Sorry, I should have tagged that as WIP...

oh the tests looked like it fixed one of the issues

I guess if you're happy with it, it's okay...

The code in _from_sequence also seems to override dtype in various situations. I'm not sure why, maybe I should put a pull-request that organizes things more how I'd expect them to be. Might be worth discussing there as I'm not sure if my understanding is correct

yea go for it, looks like

if isinstance(scalars, PintArray):

needs a if dtype is None

there's a lot of duplication of code from calls that go through _from_sequence_of_strings to _from_sequence then in __init__. I don't understand why all those checks and adjustments need to be made, along with seemingly many copies of the data, why not just have most of them in __init__?

I was trying to figure out everything that's valid to construct a PintArray with and discovering things like Quantitys can be array valued (I'd only ever used them as scalars).

I've also not done much with the internals of Pandas before, so don't know when/why those methods are called.

@andrewgsavage
Copy link
Collaborator

there's a lot of duplication of code from calls that go through _from_sequence_of_strings to _from_sequence then in __init__. I don't understand why all those checks and adjustments need to be made, along with seemingly many copies of the data, why not just have most of them in __init__?

I was trying to figure out everything that's valid to construct a PintArray with and discovering things like Quantitys can be array valued (I'd only ever used them as scalars).

I've also not done much with the internals of Pandas before, so don't know when/why those methods are called.

The methods _from_sequence_of_strings and _from_sequence are used by pandas internally. __init__ has been expanded on to allow more types as the values arguement. The pandas examples much of this was based on don't allow using strings in __init__, eg https://github.com/pandas-dev/pandas/blob/main/pandas/core/arrays/numeric.py#L243

I've listed the different possible inputs here https://pint-pandas.readthedocs.io/en/latest/user/initializing.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants