allow missing values in columns #268

smason · 2025-01-02T16:58:50Z

Previously the code would fail with:

DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'someunit' ([somedimension]).

if any of the values were missing. It seems much nicer if they were maintained as missing values of appropriate unit.

Partial fix for handling of data where each value has a unit label #267
Executed pre-commit run --all-files with no errors
The change is fully covered by automated unit tests
Documented in docs/ as appropriate
Added an entry to the CHANGES file

partial fix for hgrecco#267 previously the code would fail with: DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'someunit' ([mass]).

andrewgsavage · 2025-01-02T18:04:45Z

nice

smason · 2025-01-02T19:00:12Z

Sorry, I should have tagged that as WIP...

I've been trying to figure out the other changes and realized I don't understand what's going on in PintArray._from_sequence and PintArray._from_sequence_of_strings.

I think the code I changed in _from_sequence_of_strings should probably be in _from_sequence as that gets called when converting from a Pandas Series.

The code in _from_sequence also seems to override dtype in various situations. I'm not sure why, maybe I should put a pull-request that organizes things more how I'd expect them to be. Might be worth discussing there as I'm not sure if my understanding is correct

andrewgsavage · 2025-01-02T20:06:11Z

Sorry, I should have tagged that as WIP...

oh the tests looked like it fixed one of the issues

I've been trying to figure out the other changes and realized I don't understand what's going on in PintArray._from_sequence and PintArray._from_sequence_of_strings.

I think the code I changed in _from_sequence_of_strings should probably be in _from_sequence as that gets called when converting from a Pandas Series.

yes

The code in _from_sequence also seems to override dtype in various situations. I'm not sure why, maybe I should put a pull-request that organizes things more how I'd expect them to be. Might be worth discussing there as I'm not sure if my understanding is correct

yea go for it, looks like

pint-pandas/pint_pandas/pint_array.py

Line 672 in 8c2fc78

if isinstance(scalars, PintArray):

needs a if dtype is None

smason · 2025-01-03T12:19:54Z

Sorry, I should have tagged that as WIP...

oh the tests looked like it fixed one of the issues

I guess if you're happy with it, it's okay...

The code in _from_sequence also seems to override dtype in various situations. I'm not sure why, maybe I should put a pull-request that organizes things more how I'd expect them to be. Might be worth discussing there as I'm not sure if my understanding is correct

yea go for it, looks like

pint-pandas/pint_pandas/pint_array.py

Line 672 in 8c2fc78

if isinstance(scalars, PintArray):

needs a if dtype is None

there's a lot of duplication of code from calls that go through _from_sequence_of_strings to _from_sequence then in __init__. I don't understand why all those checks and adjustments need to be made, along with seemingly many copies of the data, why not just have most of them in __init__?

I was trying to figure out everything that's valid to construct a PintArray with and discovering things like Quantitys can be array valued (I'd only ever used them as scalars).

I've also not done much with the internals of Pandas before, so don't know when/why those methods are called.

andrewgsavage · 2025-01-03T14:49:03Z

there's a lot of duplication of code from calls that go through _from_sequence_of_strings to _from_sequence then in __init__. I don't understand why all those checks and adjustments need to be made, along with seemingly many copies of the data, why not just have most of them in __init__?

I was trying to figure out everything that's valid to construct a PintArray with and discovering things like Quantitys can be array valued (I'd only ever used them as scalars).

I've also not done much with the internals of Pandas before, so don't know when/why those methods are called.

The methods _from_sequence_of_strings and _from_sequence are used by pandas internally. __init__ has been expanded on to allow more types as the values arguement. The pandas examples much of this was based on don't allow using strings in __init__, eg https://github.com/pandas-dev/pandas/blob/main/pandas/core/arrays/numeric.py#L243

I've listed the different possible inputs here https://pint-pandas.readthedocs.io/en/latest/user/initializing.html

smason added 2 commits January 2, 2025 16:39

allow missing values in columns

1244b39

partial fix for hgrecco#267 previously the code would fail with: DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'someunit' ([mass]).

test that missing values are actually are

a30db55

andrewgsavage merged commit 8c2fc78 into hgrecco:master Jan 2, 2025
32 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

allow missing values in columns #268

allow missing values in columns #268

smason commented Jan 2, 2025

andrewgsavage commented Jan 2, 2025

smason commented Jan 2, 2025

andrewgsavage commented Jan 2, 2025

smason commented Jan 3, 2025

andrewgsavage commented Jan 3, 2025

allow missing values in columns #268

allow missing values in columns #268

Conversation

smason commented Jan 2, 2025

andrewgsavage commented Jan 2, 2025

smason commented Jan 2, 2025

andrewgsavage commented Jan 2, 2025

smason commented Jan 3, 2025

andrewgsavage commented Jan 3, 2025