stats: std population or sample calculation on 1 value #231

slabasan · 2024-11-21T16:11:57Z

Calling th.stats.std(ttk, cols) may result in aggregation of a single row. In thicket, we are calling .agg(np.std), which calculates the standard deviation with a delta degrees of freedom (ddof) of 1. In other words, it divides by n-1 (where n is the number of elements). This is statistically appropriate when estimating the population standard deviation from a sample. With only one element, however, the calculation becomes 0/0 and the result is NaN.
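A minimal sketch of the NaN case, using plain pandas rather than thicket itself. The column names and values are made up for illustration, and the string "std" stands in for the aggregation, since it uses pandas' default ddof=1:

```python
import pandas as pd

# Hypothetical data: node "a" has two rows, node "b" has only one.
df = pd.DataFrame({"node": ["a", "a", "b"], "time": [1.0, 3.0, 5.0]})

# "std" uses pandas' default ddof=1 (sample standard deviation),
# so the single-row group divides 0 by n-1 = 0.
sample_std = df.groupby("node")["time"].agg("std")
print(sample_std)
```

Group "a" gets a finite value, while group "b" comes back as NaN because it contains a single row.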

One alternative is to set ddof=0, which divides by n instead of n-1 and gives a standard deviation of 0 for a single element:

   import pandas as pd
   import numpy as np

   df = pd.DataFrame({'A': [1]})

   # Calculate standard deviation with ddof=0
   result = df.agg(lambda x: np.std(x, ddof=0))
   print(result)

For standard deviation, it may be appropriate to have an option to toggle between population and sample calculation.
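One way such a toggle could look, as a sketch only. `grouped_std` and its parameters are hypothetical names for illustration and are not part of thicket's API; the sketch assumes the aggregation can be expressed as a pandas groupby, whose `std` method accepts `ddof`:

```python
import pandas as pd


def grouped_std(df, by, cols, ddof=1):
    """Hypothetical helper: per-group standard deviation with a selectable ddof.

    ddof=1 gives the sample standard deviation (divide by n-1);
    ddof=0 gives the population standard deviation (divide by n).
    """
    if ddof not in (0, 1):
        raise ValueError("ddof must be 0 (population) or 1 (sample)")
    return df.groupby(by)[cols].std(ddof=ddof)


# Made-up data: group "b" has a single row.
df = pd.DataFrame({"node": ["a", "a", "b"], "time": [1.0, 3.0, 5.0]})

sample = grouped_std(df, "node", ["time"])              # NaN for group "b"
population = grouped_std(df, "node", ["time"], ddof=0)  # 0.0 for group "b"
print(sample)
print(population)
```

Keeping ddof=1 as the default would preserve the current behavior, so callers would opt in to the population calculation explicitly.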

slabasan added the area-stats label (Issues and PRs related to Thicket's stats subpackage) on Nov 21, 2024