Rolling mean operation is ignored after diff #21038
Comments
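Confusingly enough, this is actually expected.
When you write df.rolling('index', period='3i').agg(pl.col('x').log().diff().mean()), everything inside agg is evaluated separately for each window. That is to say, the whole expression pl.col('x').log().diff().mean() is applied to each window's values, not only the final mean.
Let's break that down:
In [3]: df.rolling('index', period='3i').agg(pl.col('x'))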
Out[3]:
shape: (19, 2)
┌───────┬──────────────┐
│ index ┆ x │
│ --- ┆ --- │
│ u32 ┆ list[i64] │
╞═══════╪══════════════╡
│ 0 ┆ [2] │
│ 1 ┆ [2, 3] │
│ 2 ┆ [2, 3, 4] │
│ 3 ┆ [3, 4, 5] │
│ 4 ┆ [4, 5, 6] │
│ … ┆ … │
│ 14 ┆ [14, 15, 16] │
│ 15 ┆ [15, 16, 17] │
│ 16 ┆ [16, 17, 18] │
│ 17 ┆ [17, 18, 19] │
│ 18 ┆ [18, 19, 20] │
└───────┴──────────────┘
In [4]: df.rolling('index', period='3i').agg(pl.col('x').log())
Out[4]:
shape: (19, 2)
┌───────┬────────────────────────────────┐
│ index ┆ x │
│ --- ┆ --- │
│ u32 ┆ list[f64] │
╞═══════╪════════════════════════════════╡
│ 0 ┆ [0.693147] │
│ 1 ┆ [0.693147, 1.098612] │
│ 2 ┆ [0.693147, 1.098612, 1.386294] │
│ 3 ┆ [1.098612, 1.386294, 1.609438] │
│ 4 ┆ [1.386294, 1.609438, 1.791759] │
│ … ┆ … │
│ 14 ┆ [2.639057, 2.70805, 2.772589] │
│ 15 ┆ [2.70805, 2.772589, 2.833213] │
│ 16 ┆ [2.772589, 2.833213, 2.890372] │
│ 17 ┆ [2.833213, 2.890372, 2.944439] │
│ 18 ┆ [2.890372, 2.944439, 2.995732] │
└───────┴────────────────────────────────┘
In [5]: df.rolling('index', period='3i').agg(pl.col('x').log().diff())
Out[5]:
shape: (19, 2)
┌───────┬────────────────────────────┐
│ index ┆ x │
│ --- ┆ --- │
│ u32 ┆ list[f64] │
╞═══════╪════════════════════════════╡
│ 0 ┆ [null] │
│ 1 ┆ [null, 0.405465] │
│ 2 ┆ [null, 0.405465, 0.287682] │
│ 3 ┆ [null, 0.287682, 0.223144] │
│ 4 ┆ [null, 0.223144, 0.182322] │
│ … ┆ … │
│ 14 ┆ [null, 0.068993, 0.064539] │
│ 15 ┆ [null, 0.064539, 0.060625] │
│ 16 ┆ [null, 0.060625, 0.057158] │
│ 17 ┆ [null, 0.057158, 0.054067] │
│ 18 ┆ [null, 0.054067, 0.051293] │
└───────┴────────────────────────────┘
In [6]: df.rolling('index', period='3i').agg(pl.col('x').log().diff().mean())
Out[6]:
shape: (19, 2)
┌───────┬──────────┐
│ index ┆ x │
│ --- ┆ --- │
│ u32 ┆ f64 │
╞═══════╪══════════╡
│ 0 ┆ null │
│ 1 ┆ 0.405465 │
│ 2 ┆ 0.346574 │
│ 3 ┆ 0.255413 │
│ 4 ┆ 0.202733 │
│ … ┆ … │
│ 14 ┆ 0.066766 │
│ 15 ┆ 0.062582 │
│ 16 ┆ 0.058892 │
│ 17 ┆ 0.055613 │
│ 18 ┆ 0.05268 │
└───────┴──────────┘
I appreciate that this is confusing, but I'd say that it is expected behaviour |
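For anyone who wants to check the numbers by hand, here is a minimal plain-Python sketch of the per-window evaluation described above. It is an emulation, not Polars code: it assumes x holds the integers 2 through 20 (as the printed windows suggest) and that period='3i' means "the current row plus the two rows before it". The helper name window_diff_mean is made up for illustration.

```python
import math

x = list(range(2, 21))  # assumed contents of the x column (2..20)
period = 3              # emulates period='3i': three rows ending at the current row


def window_diff_mean(values):
    """Emulate log().diff().mean() evaluated inside one window."""
    logs = [math.log(v) for v in values]
    # diff restarts in every window, so the first element has no predecessor
    # (Polars shows it as null); only the remaining differences are kept here
    diffs = [b - a for a, b in zip(logs, logs[1:])]
    # the mean skips the leading null; a window with nothing left yields None
    return sum(diffs) / len(diffs) if diffs else None


# one result per row, each computed from that row's own window only
r1 = [window_diff_mean(x[max(0, i - period + 1): i + 1]) for i in range(len(x))]

for i, value in enumerate(r1[:5]):
    print(i, value)
```

Each full window contributes only two differences, which is why the result at index 3 is the mean of 0.287682 and 0.223144 rather than a mean of three values.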
By changing the code a bit, I found a more interpretable reproduction. I removed the mean operation to see which values end up in the rolling window.
For some reason, in the |
because they don't have any previous elements to take a diff with |
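To see what those per-window nulls change, here is a rough plain-Python sketch of the other order of operations: take the diff once over the whole column, then average it per window. Again this is an emulation under assumptions (x = 2..20, a three-row window), not the reporter's code; the names ld and r2 are borrowed from the issue description, and mean_skip_none is a hypothetical helper.

```python
import math

x = list(range(2, 21))  # assumed contents of the x column (2..20)
period = 3              # three rows ending at the current row

logs = [math.log(v) for v in x]
# diff over the whole column: only the very first row lacks a predecessor
ld = [None] + [b - a for a, b in zip(logs, logs[1:])]


def mean_skip_none(values):
    """Mean that ignores missing entries, returning None if nothing is left."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# windowed mean over the precomputed differences
r2 = [mean_skip_none(ld[max(0, i - period + 1): i + 1]) for i in range(len(x))]

for i, value in enumerate(r2[:5]):
    print(i, value)
```

From index 3 onwards every window here holds three differences instead of two, so the values no longer match the per-window result shown in Out[6].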
Oh, okay, I get it now! Thank you for the answer! I have no other input on this issue, though maybe the documentation should mention this quirk somewhere. |
Actually, I may have a follow-up question. Can we have this |
I kind of found a hack, but better solutions are welcome! I multiplied the lag in the rolling window period by 2, so that the window has enough room for the lagged diff to be computed.
|
Checks
Reproducible example
Log output
Issue description
In the provided example the columns r1 and r2 should be equal, since the underlying calculations are the same, but expressed in two different ways:
- r1 is calculated directly
- r2 is calculated through an intermediate result aliased as ld
I suppose there is an error in how Polars optimizes the operations internally.
After removing the diff operation, the bug can no longer be reproduced: the two columns are equal.
Expected behavior
There should not be any difference between the two columns, since the underlying calculations are the same.
Installed versions