Computing Area under the ROC curve #1909

diegocarrio · 2022-11-02T08:52:22Z


Dear MET_help support team,

I am using MET to compute the ROC curve and the area under the ROC curve associated with precipitation fields. To do this, I run grid_stat as follows:

./MET_TOOL/10.1.2/bin/grid_stat
${file_model}
${file_obs}
GridStatConfig_default
-outdir ${output_dir} -v 2

In the grid_stat_000000L_19700101_000000V_nbrcts.txt file, I found the "PODY" and "POFD" columns needed to plot the ROC curve, and I was able to plot it without problems. However, I was wondering whether MET also computes the area under the ROC curve or whether I should compute it myself. If MET does compute that area, where can I find that variable? In the grid_stat_000000L_19700101_000000V_nbrcts.txt file, I haven't found anything that suggests the area under the ROC curve is computed.

Could you help me with this?

Thank you very much in advance,

Best Regards,
Diego

Answered by ericgilleland

Nov 14, 2022


j-opatz · 2022-11-02T16:56:46Z

Collaborator

Hi Diego,

And thank you for your question. MET does calculate area under the ROC curve; however, this output is only available in the PSTD line type, so you would need to set up your input forecast data as probabilistic. The variable from the PSTD line type is called Area Under the ROC curve, or AUC; documentation can be found here in the MET user's guide. From the documentation, you can see the equation that MET uses to calculate AUC.

You could also get a PSTD line type output from a Stat-Analysis run; looking at the documentation for an aggregate_stat command here, you would need to pass in PCT line types (contingency table counts for probabilistic forecasts) or matched pair (MPR) line types. Note that if you use MPR, you'll additionally need to specify the -out_fcst_thresh and -out_obs_thresh (to get the data into the probabilistic realm).

3 replies

diegocarrio Nov 3, 2022
Author

Hi @j-opatz,

Thank you for your response. However, I do not fully understand your explanation. I am quite new to MET, and it is not clear to me how to do that. Let me explain in more detail how I compute the ROC curves and the weird results I get:

1. I compute the 3h accumulated precipitation estimated by the WRF model.
2. I compute the 3h accumulated precipitation from observations (rain gauges).
3. I interpolate both fields to a common grid. Let's call the 3h accumulated precipitation from the model "pcp_model.nc" and the one from the observations "pcp_obs.nc".
4. I run the MET tool as follows:

./MET_TOOL/10.1.2/bin/grid_stat
pcp_model.nc
pcp_obs.nc
GridStatConfig_default
-outdir ${output_dir} -v 2

Are you saying that I cannot compute ROC curves in that way?

Results:

Have a look at the loop behavior of the red line... Are you saying that to compute ROC curves and area under ROC curves, I have to provide as inputs probabilistic fields instead of the raw precipitation fields I was using previously?

I do not know what PSTD, PCT and the other acronyms you used stand for... Could you please provide a clear example of how to compute the ROC curve for my case, assuming a precipitation threshold of 10 mm?

Thanks in advance, and apologies for the inconvenience,

BR,
Diego

jprestop Nov 8, 2022
Maintainer

Hi @diegocarrio. I have also reached out to @ericgilleland and @bgbrowntollerud who can hopefully provide some further clarification.

ericgilleland Nov 11, 2022

To compute the ROC/AUC using MET, it appears that the user needs to translate the continuous stats to dichotomous ones. The two standard verification texts (Wilks, and Jolliffe and Stephenson) only discuss ROC/AUC in terms of probabilistic forecasts, but I don't think there is any reason you can't do it with deterministic sets, as the components are just categorical scores.

Barb and I don't know how the points seemed to be out of order in the ROC plot that was included with his message. Hopefully a revised application to the dichotomous data will solve that.
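For readers who want to compute the area themselves from PODY/POFD values, here is a minimal Python sketch. It is not MET's implementation: it assumes one 2x2 contingency table per forecast threshold, the counts are made up, and the helper names `roc_point` and `roc_auc` are hypothetical. It sorts the points by POFD before applying the trapezoidal rule, which also avoids the kind of "loop" that appears when points are joined in the wrong order.

```python
def roc_point(fy_oy, fy_on, fn_oy, fn_on):
    """Return (POFD, PODY) for one 2x2 contingency table.

    Assumes at least one observed event and one observed non-event,
    otherwise the divisions below would fail.
    """
    pody = fy_oy / (fy_oy + fn_oy)  # hits / all observed events
    pofd = fy_on / (fy_on + fn_on)  # false alarms / all observed non-events
    return pofd, pody


def roc_auc(points):
    """Trapezoidal area under a ROC curve given as (POFD, PODY) pairs."""
    # Anchor the curve at (0, 0) and (1, 1), then sort so POFD increases.
    curve = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical counts (FY_OY, FY_ON, FN_OY, FN_ON), one table per
# forecast threshold, deliberately listed out of order.
tables = [
    (40, 30, 10, 20),
    (10, 5, 40, 45),
    (25, 10, 25, 40),
]
points = [roc_point(*t) for t in tables]
auc = roc_auc(points)
print(points, auc)
```

With real data, the four counts would come from the contingency table output of each threshold, and the same observation threshold should be used for every table.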

j-opatz · 2022-11-07T17:41:20Z

Collaborator

Hi Diego,

I can certainly help with those questions.

PSTD is the MET line type that holds contingency table statistics for probabilistic forecasts. It contains numerous statistics related to probabilistic forecasts, including the Brier Skill Score (BSS) and the Area Under the ROC curve (AUC). PCT is another line type for probabilistic forecasts, but for what you're trying to accomplish you can ignore it.

However, as I thought about how to get your forecast precipitation data into a probabilistic range (0 to 1 or 0 to 100) so that a PSTD line type could be requested, I realized this approach is probably not going to work in MET. There are methods for returning an uncalibrated ensemble probability forecast with the GenEnsProd tool, but none that would let you set thresholds for your forecast data and calculate probabilities for each grid point based on those thresholds. That would most likely require climatology data and a completely different setup.

Instead what I'll do is bring in @bikegeek to the discussion who might know a way to calculate the area under the ROC curve in our METcalcpy or METplotpy components. They should also be able to help you past the plotting behavior you're seeing.

2 replies

bikegeek Nov 7, 2022
Collaborator

Here is a link to generating a ROC diagram using METplotpy:
https://metplotpy.readthedocs.io/en/develop/Users_Guide/roc_diagram.html

We don't calculate the AUC, so that won't be available in either the METcalcpy or METplotpy repositories.

I am not well-versed in MET, but there is some sample data that we used to develop the ROC diagram in the METplotpy repository:
https://github.com/dtcenter/METplotpy/tree/develop/test/roc_diagram

We used MET data for the CTC line type: CTC_ROC.data. You can take a look at the data to get an idea of what you need to request from your MET runs.

diegocarrio Nov 14, 2022
Author

Thank you for all your responses. However, it is still not clear to me whether the method I was using to compute the ROC curve is right. In other words, I compute ROC curves in the following way:

First, I compute the 3h accumulated precipitation estimated by the WRF model. Then, I compute the 3h accumulated precipitation from observations (rain gauges). After this step, I interpolate both fields to a common grid. Let's call the 3h accumulated precipitation from the model "pcp_model.nc" and the one from the observations "pcp_obs.nc". Finally, I run the MET tool as follows:

./MET_TOOL/10.1.2/bin/grid_stat
pcp_model.nc
pcp_obs.nc
GridStatConfig_default
-outdir ${output_dir} -v 2

From the grid_stat_000000L_19700101_000000V_nbrcts.txt file, I obtain the "PODY" and "POFD", which are the only scores needed to plot the ROC curve.

Are you saying that I cannot compute ROC curves in that way?

Thanks in advance,
Diego

ericgilleland · 2022-11-14T15:18:32Z


Hi Diego,

I'm not exactly sure how this works in METplus, but my understanding is that you have to trick it to get the ROC/AUC by making it think you have probabilistic verification sets. So, I would figure out how it works for those data and try to arrange your data in that manner. Someone else (Barb maybe) might be able to better explain what form your data need to be in than I.

There is definitely something awry with your ROC plot as I don't think there should be a loop in it.

Best,

Eric

3 replies

JohnHalleyGotway Nov 15, 2022
Maintainer

@ericgilleland and @diegocarrio, just adding one more voice to the list of responses here. I'm an engineer who works with @jprestop on the development and support of MET.

To put it simply, yes, MET can compute the area under a ROC curve. However, that is only done in the context of verifying probability forecasts. It's reported in the ROC_AUC column of the PSTD output line type. In addition, the points of that ROC curve are reported in the PRC output line type. This would apply, for example, when verifying a probability of precipitation forecast to gridded or point observations of precipitation accumulation amounts.

I see that you're running Grid-Stat to verify accumulated precipitation, not probabilities. However, I also see that you're applying the neighborhood verification methods. These details get very confusing very quickly.

I can think of 2 approaches that might interest you.

1. (Which I don't recommend) It is technically possible to compute ROC curves from 2x2 contingency tables. But practically speaking, doing so is very tedious. Namely, the forecast thresholds must vary while the observation thresholds remain fixed, as below:

fcst = {
   field = [ { name = "APCP"; level = "A6";  cat_thresh = [ >5, >10, >15, >20, >25 ]; } ]
}
obs = {
   field = [ { name = "APCP"; level = "A6";  cat_thresh = [ >15, >15, >15, >15, >15 ]; } ]
}

And the corresponding contingency table stats can be used to construct a ROC curve. And in fact, the METviewer database and display tool CAN derive ROC curves when these conditions are met.
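To illustrate what that configuration amounts to, here is a small Python sketch that varies the forecast threshold while holding the observation threshold fixed at >15, mirroring the thresholds in the config above. It is only an illustration and not how MET or METviewer computes these values: the paired values are made up, and `contingency_counts` is a hypothetical helper.

```python
def contingency_counts(fcst, obs, fcst_thresh, obs_thresh):
    """Count (FY_OY, FY_ON, FN_OY, FN_ON) using '>' thresholds on paired values."""
    fy_oy = fy_on = fn_oy = fn_on = 0
    for f, o in zip(fcst, obs):
        f_yes = f > fcst_thresh
        o_yes = o > obs_thresh
        if f_yes and o_yes:
            fy_oy += 1
        elif f_yes:
            fy_on += 1
        elif o_yes:
            fn_oy += 1
        else:
            fn_on += 1
    return fy_oy, fy_on, fn_oy, fn_on


# Hypothetical matched forecast/observation pairs (mm).
fcst = [2.0, 8.0, 12.0, 18.0, 22.0, 30.0, 6.0, 16.0]
obs = [1.0, 17.0, 9.0, 20.0, 14.0, 28.0, 3.0, 19.0]

obs_thresh = 15.0  # fixed observation threshold, as in the config above
roc_points = []
for fcst_thresh in [5.0, 10.0, 15.0, 20.0, 25.0]:
    fy_oy, fy_on, fn_oy, fn_on = contingency_counts(fcst, obs, fcst_thresh, obs_thresh)
    pody = fy_oy / (fy_oy + fn_oy)  # assumes at least one observed event
    pofd = fy_on / (fy_on + fn_on)  # assumes at least one observed non-event
    roc_points.append((fcst_thresh, pofd, pody))

for fcst_thresh, pofd, pody in roc_points:
    print(f">{fcst_thresh:g}: POFD={pofd:.2f} PODY={pody:.2f}")
```

Because the event definition (the observation threshold) stays the same, raising the forecast threshold can only lower or hold both POFD and PODY, which is what makes the resulting points trace a single ROC curve.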

Practically speaking, we've found that configuring the tools in this way is so tedious that it isn't worth doing. Instead, we use a variety of other categorical statistics to evaluate the quality of precipitation forecasts, including multi-category contingency table counts (MCTC) and statistics (MCTS).

2. (Which you could try) When generating neighborhood statistics, you could configure Grid-Stat to write out the fractional coverage fields it computes in that process by setting nc_pairs_flag.nbrhd = TRUE; in the config file. That includes them in the NetCDF file written by Grid-Stat. Then you could run Grid-Stat a second time, passing that NetCDF file in as the forecast file. Setting prob = TRUE; tells Grid-Stat to interpret those fractional coverage values as being probabilities. And you could "verify" them against thresholded observations.

That would produce a ROC curve. The difficulty is figuring out how to interpret it. You'd be verifying a spatially derived probability against thresholded observations. So I'm pretty confident that we could get MET to compute the stats with 2 calls to Grid-Stat... but I'm less confident in their meaning.
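As a rough picture of what a fractional coverage field is, the Python sketch below thresholds a small made-up grid and computes, for each point, the fraction of points in a square neighborhood that exceed the threshold. This is a simplified illustration only: the function name `fractional_coverage` is hypothetical, and the edge handling here (averaging only over in-grid points) is an assumption that may differ from what Grid-Stat does.

```python
def fractional_coverage(field, thresh, width):
    """Fraction of points exceeding thresh in a width x width box around each point.

    Near the edges only the in-grid part of the box is used (an assumption
    made for this sketch, not necessarily Grid-Stat's behavior).
    """
    ny, nx = len(field), len(field[0])
    half = width // 2
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            hits = total = 0
            for jj in range(max(0, j - half), min(ny, j + half + 1)):
                for ii in range(max(0, i - half), min(nx, i + half + 1)):
                    total += 1
                    if field[jj][ii] > thresh:
                        hits += 1
            out[j][i] = hits / total
    return out


# Hypothetical 3x3 precipitation field (mm) and a 10 mm threshold.
pcp = [
    [0.0, 0.0, 12.0],
    [0.0, 11.0, 15.0],
    [0.0, 0.0, 0.0],
]
cov = fractional_coverage(pcp, thresh=10.0, width=3)
for row in cov:
    print(" ".join(f"{v:.2f}" for v in row))
```

Each value lies between 0 and 1, which is why such a field can be passed to a second run as if it were a probability forecast. The question of what that "probability" means remains as described above.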

JohnHalleyGotway Nov 15, 2022
Maintainer

It also occurs to me that this sounds a LOT LIKE the HiRA methodology described in this section of the MET User's Guide. HiRA is basically the neighborhood method applied when verifying against point observations. And it does compute a spatially derived probability forecast and verify it against observations at point locations.

So I suppose the interpretation of the Grid-Stat approach described above would match the interpretation of HiRA-derived probabilistic statistics, including the area under the ROC curve.

jprestop Nov 29, 2022
Maintainer

Hi @diegocarrio. I just wanted to follow up and see if you had any further questions. If not, I’d like to mark this discussion as being answered and “lock it” to prevent future posts. That’s how we encourage users to create new discussions for new questions. But I wanted to give you an opportunity to comment on it before doing so. Please feel free to select one of the responses as being the best answer to your original question. That’ll help future users with similar questions find answers.
