sum() and cum_sum() do not cast to UInt64 when data overflows UInt32 #17340
Comments
No opinion on your issue, just chiming in with what may be a clarification.
polars is a data transformation/manipulation library with a strong focus on performance. IMO, to a certain point it is our responsibility as data people to know our data, data types, and the ranges of our expected outputs 🤔 😃 I'm not sure if this is the case, but if polars is inconsistent and sometimes casts and sometimes does not, that inconsistency would be unfortunate.
This is intended behavior. Polars, like other compute libraries such as numpy, does not check for overflow in arithmetic and other performance-sensitive operations. Separately, schemas are determined by the operations and not by the data (when summing very small integer types, we do cast them to a larger dtype). This means that Polars is not allowed to cast on overflow. Data types may not change based on the data; they are strictly dependent on the operation itself, so they can be resolved statically. The only exceptions are pivot and inference operations like CSV and JSON parsing.
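To make "does not check for overflow" concrete, here is a minimal pure-Python sketch of what an unchecked 32-bit unsigned sum does. It models fixed-width wrapping arithmetic in general and is not Polars' internal code; the helper name wrapping_sum_u32 is hypothetical.

```python
# Hypothetical helper: models an unchecked sum in a 32-bit unsigned accumulator.
UINT32_MODULUS = 2**32


def wrapping_sum_u32(values):
    """Sum non-negative integers, wrapping around at 2**32 like a UInt32 would."""
    total = 0
    for v in values:
        # Keep only the low 32 bits after each addition; no overflow check.
        total = (total + v) % UINT32_MODULUS
    return total


values = [2**32 - 1, 1]             # both values fit in UInt32 on their own
exact = sum(values)                 # Python ints are arbitrary precision
wrapped = wrapping_sum_u32(values)  # what a fixed-width accumulator keeps

print(exact)    # 4294967296
print(wrapped)  # 0
```

Every input fits in 32 bits, yet the exact total does not, so the wrapped result is silently wrong unless the caller picks a wider type up front.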
I think this issue: #1941 is related to our current discussion. Could we consider converting
Checks
Reproducible example
Log output
No response
Issue description
When a column has dtype pl.UInt32 and either sum or cum_sum is called, polars ignores any cases of overflow instead of casting to pl.UInt64, causing the incorrect result to be silently returned.
Expected behavior
Polars should either cast the data to UInt64 and return the correct value, or raise an error telling the user to first cast to UInt64. In my case, after several data transformations, I ended up with a large dataframe with a column of small values that polars correctly inferred to be UInt32. Because of the large number of rows, the sum overflowed UInt32, but I could not have known beforehand that this would happen.
Installed versions