Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Calculate Total Spread #15

Closed
icastorm opened this issue Aug 23, 2024 · 5 comments
Closed

Feature Request: Calculate Total Spread #15

icastorm opened this issue Aug 23, 2024 · 5 comments

Comments

@icastorm
Copy link
Contributor

Issue

An important function of the DART diagnostic toolkit is to compare the RMSE to the "total spread", which is defined in src/pydartdiags/obs_sequence/obs_sequence.py as the sqrt(sum(sd+obs_err_var)). Given the usefulness of the total spread value in a variety of contexts (temporal evolution, comparisons between observation types, vertical distribution, etc.), it seem appropriate to add a function that calculates the "total spread" of some set of observations to the plots.py script. Total spread plots could be added later as well.

Solution(s)

  • To make calculating the total spread easier, the observation error variance should be added as a column to the dataframe created by the obs_sequence class. This would probably be a useful addition regardless.
  • A function should be added to the plots.py script that, given a dataframe of observations with columns for observation error variance ensemble variance returns the "total spread" value given by sqrt(sum(ensemble_sd+obs_err_sd)). Here, ensemble_sd and obs_err_sd are the square roots of ensemble variance and observation error variance respectively.

Testing

  • To ensure parity with the matlab diagnostic code base, total_spread values should be compared to those in the output of dart_diags
@hkershaw-brown
Copy link
Member

Thanks @icastorm great request, currently totalspread is missing.

Here is how I was thinking about it with rmse and bias (I was focusing only on these for a demo of the plots!):

The column is single observation calculation, so squared error and bias are created as columns when we create the dataframe

# calculate bias and sq_err is the obs_seq is an obs_seq.final
if 'prior_ensemble_mean'.casefold() in map(str.casefold, self.columns):
self.df['bias'] = (self.df['prior_ensemble_mean'] - self.df['observation'])
self.df['sq_err'] = self.df['bias']**2 # squared error

sq_err = (mean-obs)**2
bias = mean-obs

For rmse, this is over a group of observations, so you select the group of observations and get the rmse and bias for that group of obs.
rmse = sqrt( sum((mean-obs)**2)/n )
bias = sum((mean-obs)/n

def rmse_bias(df):
rmse_bias_df = df.groupby(['hPa', 'type']).agg({'sq_err':mean_then_sqrt, 'bias':'mean'}).reset_index()
rmse_bias_df.rename(columns={'sq_err':'rmse'}, inplace=True)
return rmse_bias_df

I think you're correct that we can treat totalspread in the same way. The function to calculate totalspread is the way to go.

obs_err_var is there as a column in the dataframe. There may be something funky going on if you are not seeing an 'obs_err_var' column

Longer term, I think we might want to split the diagnostic calculations into their own module - I'm guessing someone might want the calculations without necessarily making the plot.

@icastorm
Copy link
Contributor Author

icastorm commented Sep 6, 2024

Sorry for the radio silence, its been a busy couple of weeks and I've been on a bit of a time crunch. I will hopefully be back to working on this next week though...

@icastorm
Copy link
Contributor Author

icastorm commented Dec 5, 2024

Thinking on this more I feel that while for now simply adding to the calculations done in the plots.py script will work, but we should definitely consider moving the diagnostics to their own module that the plots script can draw on as needed.

@icastorm
Copy link
Contributor Author

Because I built this on top of the plotting changes in this pull request, I went ahead and put the commits adding the totalspread calculation and plot there.

@hkershaw-brown
Copy link
Member

total spread added in #24

@github-project-automation github-project-automation bot moved this from Todo to Done in Roadmap to v1 Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants