Models listed in the CVs without published data #1028

Closed
matthew-mizielinski opened this issue Jul 13, 2021 · 19 comments

Comments

@matthew-mizielinski
Collaborator

Following discussion on #512 I've scraped together data from the ESGF search pages (list of source ids) and the source id list within the CVs to pull out the following table of models where no data appears to be available at the time of writing (July 2021).

This includes a number of institutions where no data has been published for any of their models, and one institution without any registered models.

There are a total of 28 models in the table below with a further 4 registered in the last 12-18 months.

I'm not currently advocating purging all of these, but I think it is worth discussing how to handle them.

| Institution ID | Source ID | Release Year | Activity Participation | Notes |
|---|---|---|---|---|
| AWI | AWI-ESM-2-1-LR | 2019 | CMIP PMIP | |
| BNU | BNU-ESM-1-1 | 2016 | C4MIP CDRMIP CFMIP CMIP GMMIP GeoMIP OMIP RFMIP ScenarioMIP | No data for institution |
| CNRM-CERFACS | CNRM-ESM2-1-HR | 2017 | CMIP OMIP ScenarioMIP | |
| CSIR-Wits-CSIRO | VRESM-1-0 | 2016 | CMIP DAMIP HighResMIP PMIP ScenarioMIP | No data for institution |
| EC-Earth-Consortium | EC-Earth3-GrIS | 2019 | CMIP ISMIP6 PMIP | |
| | EC-Earth3-HR | 2019 | CMIP DCPP HighResMIP | |
| GFDL | GFDL-GLOBAL-LBL | 2019 | RFMIP | |
| INPE | BESM-2-9 | 2019 | CMIP DCPP ScenarioMIP | |
| IPSL | IPSL-CM7A-ATM-HR | 2019 | HighResMIP | |
| | IPSL-CM7A-ATM-LR | 2019 | HighResMIP | |
| MESSy-Consortium | EMAC-2-53-Vol | 2017 | CMIP VolMIP | No data for institution |
| | EMAC-2-54-AerChem | 2018 | AerChemMIP CMIP | |
| MIROC | MIROC-ES2H-NB | 2019 | AerChemMIP CMIP | |
| | NICAM16-9D-L78 | 2017 | CFMIP CMIP | |
| MOHC NERC | UKESM1-0-MMh | 2018 | AerChemMIP C4MIP CMIP ScenarioMIP | Data not expected |
| | UKESM1-ice-LL | 2019 | ISMIP6 | Processing in progress |
| MPI-M | ICON-ESM-LR | 2017 | CMIP OMIP SIMIP | |
| NASA-GISS | GISS-E2-2-H | 2021 | CMIP SIMIP ScenarioMIP | |
| NASA-GSFC | | | | No models registered (there is input4MIPs data) |
| NCAR | CESM2-SE | 2019 | CMIP HighResMIP | |
| NCC | NorESM2-HH | 2018 | CMIP HighResMIP | |
| | NorESM2-LME | 2017 | C4MIP CMIP GeoMIP LUMIP OMIP | |
| | NorESM2-LMEC | 2017 | AerChemMIP CMIP | |
| | NorESM2-MH | 2017 | AerChemMIP CFMIP CMIP DAMIP OMIP RFMIP ScenarioMIP | |
| PCMDI | PCMDI-test-1-0 | | | Testing record |
| PNNL-WACCEM | CAM-MPAS-HR | 2018 | HighResMIP | No data for institution |
| | CAM-MPAS-LR | 2018 | HighResMIP | |
| UofT | UofT-CCSM4 | 2014 | CMIP PMIP | No data for institution |
| UTAS | CSIRO-Mk3L-1-3 | 2006 | CMIP PMIP | No data for institution |

There are also the following recent additions (2020 and 2021 release years)

| Institution ID | Source ID | Release Year | Activity Participation | Notes |
|---|---|---|---|---|
| CSIRO-COSIMA | ACCESS-OM2 | 2020 | OMIP | No data for institution |
| | ACCESS-OM2-025 | 2020 | OMIP | |
| IPSL | IPSL-CM6A-MR025 | 2021 | CMIP | |
| | IPSL-CM6A-MR1 | 2021 | CMIP | |
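
For anyone wanting to repeat the comparison, the approach described at the top of this comment could be sketched roughly as below. This is an illustration only, not the script that produced the tables. It assumes the CV file exposes a top-level `source_id` mapping whose entries carry `institution_id` and `activity_participation` lists, and that the ESGF source ids arrive as a flat Solr-style facet list of the form `[name, count, name, count, ...]`. The function names, the `EXAMPLE-MODEL-1` entry and its count are hypothetical.

```python
def published_source_ids(facet_list):
    """Return the source_ids that have at least one dataset.

    Assumes a flat facet list: [name, count, name, count, ...].
    """
    names = facet_list[0::2]
    counts = facet_list[1::2]
    return {name for name, count in zip(names, counts) if count > 0}


def models_without_data(cv, published):
    """Return (institutions, source_id, activities) for CV entries with no data."""
    rows = []
    for source_id, entry in cv["source_id"].items():
        if source_id in published:
            continue
        institutions = " ".join(entry.get("institution_id", []))
        activities = " ".join(sorted(entry.get("activity_participation", [])))
        rows.append((institutions, source_id, activities))
    return sorted(rows)


# Hand-made inputs standing in for the CV file and an ESGF facet query.
sample_cv = {
    "source_id": {
        "EXAMPLE-MODEL-1": {  # hypothetical model with published data
            "institution_id": ["EXAMPLE-INST"],
            "activity_participation": ["CMIP"],
        },
        "UKESM1-0-MMh": {
            "institution_id": ["MOHC", "NERC"],
            "activity_participation": ["AerChemMIP", "C4MIP", "CMIP", "ScenarioMIP"],
        },
        "BESM-2-9": {
            "institution_id": ["INPE"],
            "activity_participation": ["CMIP", "DCPP", "ScenarioMIP"],
        },
    }
}
sample_facets = ["EXAMPLE-MODEL-1", 10, "BESM-2-9", 0]

for row in models_without_data(sample_cv, published_source_ids(sample_facets)):
    print(" | ".join(row))
```

A set comparison like this only says that no datasets are indexed under a source_id at query time. It cannot tell "data not expected" apart from "processing in progress", which is why the Notes column above was filled in by hand.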
@durack1
Member

durack1 commented Jul 13, 2021

@matthew-mizielinski @taylor13 let's centralize discussions here. As I noted, I already have code that pulls info from the CMIP6 (or 5, 3) indexes and will return information such as that found in durack1/CMIPOcean/CMIP_ESGF.json

@taylor13
Collaborator

I suggest not purging any registered source_ids or institution_ids at this time, but I think that, if practical, we should:

  1. Update the "cohort" classification for each model. I would only allow three options for CMIP6 models at this time: "registered", "DECK", or "CMIP, DECK". Models would be designated "DECK" if they have contributed results from the following 4 experiments: amip, abrupt-4xCO2, 1pctCO2, and piControl (or esm-piControl). They would be designated "CMIP, DECK" if in addition they have contributed results from historical (or esm-hist or historical-cmip5). Otherwise they would continue to be designated "registered".
  2. Encourage users to consult the "ESGF CMIP6 Data Holdings" summary at https://pcmdi.llnl.gov/CMIP6/ArchiveStatistics/esgf_data_holdings/, and advise them that models classified only as "registered" may not have completed all the simulations needed for a baseline assessment of their suitability for use in climate research.
  3. Eliminate "cohort" from the current ESGF search interface.
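
The three-way classification in item 1 could be expressed along the following lines. This is a minimal sketch under stated assumptions rather than an agreed implementation: the function and constant names are hypothetical, the input is assumed to be the set of experiment ids a model has published, and the experiment is written with the CMIP6 experiment_id spelling `abrupt-4xCO2`.

```python
# Experiments that must all be present for the "DECK" designation.
DECK_REQUIRED = ("amip", "abrupt-4xCO2", "1pctCO2")
# Either control run satisfies the fourth DECK requirement.
PICONTROL_OPTIONS = {"piControl", "esm-piControl"}
# Any one of these historical runs upgrades "DECK" to "CMIP, DECK".
HISTORICAL_OPTIONS = {"historical", "esm-hist", "historical-cmip5"}


def classify_cohort(published_experiments):
    """Return "registered", "DECK" or "CMIP, DECK" for one model."""
    experiments = set(published_experiments)
    has_deck = (
        all(name in experiments for name in DECK_REQUIRED)
        and bool(experiments & PICONTROL_OPTIONS)
    )
    if not has_deck:
        return "registered"
    if experiments & HISTORICAL_OPTIONS:
        return "CMIP, DECK"
    return "DECK"


print(classify_cohort(["amip", "historical"]))
print(classify_cohort(["amip", "abrupt-4xCO2", "1pctCO2", "esm-piControl"]))
print(classify_cohort(["amip", "abrupt-4xCO2", "1pctCO2", "piControl", "esm-hist"]))
```

Under this reading, a model with a historical run but an incomplete DECK stays "registered", because the historical run only counts "in addition" to the four DECK experiments.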

Regarding the update of "cohort" classification, we can be guided by the initially suggested policy on cohort designations (https://goo.gl/zDHUk7; 9 January 2018):


The only choices permitted under the “Model Cohort” category are the following: DECK, CMIP6, CMIP5, CMIP3, CMIP2, CMIP1, “CMIP6-fringe”, and “Registered”. A “Model Cohort” limits a search to models that meet certain MIP criteria (for example, completion of 4 DECK experiments plus the historical simulation is usually required to be included in the “CMIP6” cohort). The CMIP panel will record and update the “Model Cohorts” that each source_id (i.e., model) belongs to in the reference source_id CV found at WCRP-CMIP/CMIP6_CVs/CMIP6_source_id.json. Only models that qualify for at least one “Model Cohort” shall be considered for inclusion in a search result. The following define the cohorts:

  1. A model that has registered its intention to participate in CMIP6 (at WCRP-CMIP/CMIP6_CVs) but has not qualified for any other cohort belongs to the “Registered” cohort.
  2. A model that completes all the DECK simulations belongs to the "DECK" cohort.
  3. A model that completes the DECK and CMIP6 historical simulations belongs to the "CMIP6" cohort. The CMIP Panel may choose to relax this requirement on a case-by-case basis and designate some models as belonging to the CMIP6 cohort even if only a subset of the “required” simulations has been completed.
  4. A model that fails to qualify for the “DECK” or “CMIP6” cohorts but performs at least one of the CMIP6 experiments and does meet the criteria of at least one of the endorsed MIPs in CMIP6 belongs to the "CMIP6-fringe" cohort.
  5. A model that participated in CMIP5, CMIP3, CMIP2, and/or CMIP1, belongs to the correspondingly named cohort.
  6. A model may belong to multiple cohorts (e.g., “CMIP6, DECK”)

@MartinaSt

I agree with Karl for 1. and 2.

  3. ESGF search facet:
    I would keep it. The updated "Model Cohort" information is a quality criterion for a model contribution in terms of completeness and compliance with the CMIP6 guidelines. ESGF users benefit from being able to restrict the search without having to consult an external resource. More importantly, we can encourage modeling centers to comply with CMIP guidelines by displaying this kind of quality flag for model contributions in the ESGF portals.
  • We might consider renaming the ESGF search facet. Of course, only if practical.

@taylor13
Collaborator

Yes, I agree the Model Cohort could provide information of value to users. The reasons for possibly removing it as a search facet are:

  1. Software would need to be written to automatically update the ESGF database (Solr?) to reflect the changes to the source_id CV (json file) made when a model's cohort status changed.
  2. All index nodes would have to implement the updates.
  3. If 1 and 2 cannot be accomplished practically, then the information contained in the current "Model Cohort" list on ESGF will be incorrect (since all models are currently only designated as being "registered"). If the information is wrong, perhaps we should hide it from the users by removing the facet. (Of course that will require changes that would affect all tier 1 nodes at least and also may be impractical.)

Perhaps Sasha might say if any of the above is based on my misunderstanding ESGF.

@durack1
Member

durack1 commented Jul 15, 2021

@sashakames there is a query above directed your way

@sashakames

We can achieve it if it's worth the effort. (1) is easier to do than (2). It could take weeks at LLNL for scripts to complete for all our 5M replica records.
I'm open to dropping the facet.

@sashakames

Sorry, I must have too much else on my mind... There is a simple command to update all records that match a query. So each site just needs to re-run a query/update operation periodically. If we have new records published in the correct cohort, we can drop the need to make the corrections.

@taylor13
Collaborator

So I take it 2 is easier than 1?

@sashakames

sashakames commented Jul 15, 2021

Other way around: (2) involves herding cats. I should also mention that we need to check the performance implications of doing updates in bulk, which complicates things.

@taylor13
Collaborator

Got it. Executing (2) is technically trivial; getting folks to execute it could be difficult. On the other hand (1) requires some effort by PCMDI to write scripts: 1) to periodically check the ESG database and update the source_id CV so that it reflects the true "cohort" status for each model, and then 2) to transfer the updates from the CV to ESG and correct the ESG archive's database index.

(Again, @sashakames, I've probably not understood, so please correct, as needed, the above.)
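
The first of the two scripts described above boils down to comparing the cohort recorded in the CV with the cohort implied by what is actually published. A rough sketch of that comparison follows. It is an assumption-laden illustration: `cohort_changes` and the `MODEL-*` names are hypothetical, and both inputs are assumed to be plain mappings of source_id to a cohort string, however they were obtained.

```python
def cohort_changes(cv_cohorts, computed_cohorts):
    """Return {source_id: (old, new)} for models whose cohort should change.

    cv_cohorts: cohort currently recorded in the source_id CV.
    computed_cohorts: cohort derived from the published experiments.
    Models missing from the CV mapping are treated as "registered".
    """
    changes = {}
    for source_id, new in sorted(computed_cohorts.items()):
        old = cv_cohorts.get(source_id, "registered")
        if old != new:
            changes[source_id] = (old, new)
    return changes


# Hypothetical models: only MODEL-B has moved on since the CV was written.
recorded = {"MODEL-A": "registered", "MODEL-B": "registered", "MODEL-C": "DECK"}
derived = {"MODEL-A": "registered", "MODEL-B": "CMIP, DECK", "MODEL-C": "DECK"}

for source_id, (old, new) in cohort_changes(recorded, derived).items():
    print(f"{source_id}: {old} -> {new}")
```

Only the models returned here would need a CV edit and a matching index update, which keeps the second script's workload proportional to the number of status changes rather than to the size of the archive.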

@sashakames

I was thinking that 1.2 (the ESGF index update phase) would not be too challenging for me to implement. As for the query part of 1.1: doesn't ChrisM's "Big Table" already have this, i.e. the experiments for each model? We could leverage that, but I wouldn't consider performing the queries too challenging either, if need be.

To clarify the concern: a bulk update might time out if it has hundreds of thousands of records to process. If this is problematic, we would need to play with the granularity of the updates (e.g. do one experiment at a time).

Ideally once a model has changed cohort, we ask them to update their publisher config to have the cohort value set correctly, then we don't need to correct them again until the next change. And same goes for replica publishing.
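
The granularity idea could look something like the sketch below, which plans the bulk updates for one model so that no single update touches more than a chosen number of records. It is a planning helper only and makes no calls to the index. The function name, the record counts and the `max_records` threshold are all hypothetical, and grouping several small experiments into one batch is a generalisation of the "one experiment at a time" suggestion.

```python
def plan_update_batches(record_counts, max_records):
    """Group experiments into batches of at most max_records records.

    record_counts: {experiment_id: number of index records to update}.
    An experiment that alone exceeds max_records gets a batch of its own.
    """
    batches = []
    current, current_size = [], 0
    for experiment, count in sorted(record_counts.items()):
        # Close the running batch if this experiment would overflow it.
        if current and current_size + count > max_records:
            batches.append(current)
            current, current_size = [], 0
        current.append(experiment)
        current_size += count
    if current:
        batches.append(current)
    return batches


# Made-up record counts for a single model.
counts = {"amip": 40000, "historical": 90000, "piControl": 30000, "ssp585": 20000}
for batch in plan_update_batches(counts, max_records=100000):
    print(batch)
```

Each batch would then map to one query/update operation, so a timeout affects only that batch and can be retried at a finer granularity.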

@durack1
Member

durack1 commented Jul 27, 2021

A specific case that needs to be accounted for is #512

@taylor13
Collaborator

From WIP meeting discussion:

  • Tidy up the CV maybe at the beginning of CMIP7, but not regularly
  • Possibly add a "contributor" category to the cohort options. Once data from at least one CMIP6 experiment has been published to ESGF, a model will be considered in the "contributor" cohort.

@durack1
Member

durack1 commented May 18, 2022

As part of #1066 models that have no published data on ESGF have been left as "cohort" = ["Registered"], whereas models that have data have been updated to "cohort" = ["Published"].

It would be possible to contact the modeling groups of the non-published models, but I'm not sure we'd want to deregister any specific model.

@durack1
Member

durack1 commented May 24, 2022

All models that do not currently have data available anywhere on ESGF now have the entry "cohort": ["Registered"].

All models that have data have an updated entry as "Published" - as per #1066
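
For illustration, the update described in this comment amounts to something like the following. This is a sketch, not the code used for #1066: it assumes the CV is loaded as a dictionary with a top-level `source_id` mapping, that the set of source ids with data on ESGF is already known, and the function name and the `EXAMPLE-MODEL-1` entry are hypothetical.

```python
import copy


def apply_published_status(cv, published):
    """Return a copy of the CV with every model's cohort set from ESGF holdings.

    Models in `published` get ["Published"]; all others get ["Registered"].
    The input dictionary is left untouched.
    """
    updated = copy.deepcopy(cv)
    for source_id, entry in updated["source_id"].items():
        if source_id in published:
            entry["cohort"] = ["Published"]
        else:
            entry["cohort"] = ["Registered"]
    return updated


# Hand-made CV fragment; EXAMPLE-MODEL-1 is a made-up model with data.
cv_fragment = {
    "source_id": {
        "EXAMPLE-MODEL-1": {"cohort": ["Registered"]},
        "GFDL-GLOBAL-LBL": {"cohort": ["Registered"]},
    }
}

result = apply_published_status(cv_fragment, published={"EXAMPLE-MODEL-1"})
for source_id, entry in result["source_id"].items():
    print(source_id, entry["cohort"])
```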

@durack1
Member

durack1 commented Jun 2, 2022

An email was sent out today requesting an update for the 28 models that currently have no data published on ESGF. The request was for data to be published, or for deregistration to occur - once we have intel from these contacts, we can amend as required and close out this issue

@durack1
Member

durack1 commented Jun 9, 2022

@matthew-mizielinski I am closing this as a dupe (somewhat) of #1050, which includes the table of 28 models that are registered but have no data published on ESGF; these are now down to 14 in the updated table below. The process of identifying these, and either deregistering them or awaiting an update on imminent publication, is already underway and noted in #1076, #1078, #1079, #1083, and #1086, and for the NorESM2* deregistrations in #1079/#1084.

Updated 220701 - last merged PR #1126

| count | status | source_id | MIPs | LLNL files | ESGF datasets | contact status |
|---|---|---|---|---|---|---|
| #1076 | awaiting publication | EC-Earth3-GrIS | CMIP ISMIP6 PMIP | - | none | |
| #1076 | awaiting publication | EC-Earth3-HR | CMIP DCPP HighResMIP | - | none | |
| #1083 | awaiting update | GFDL-GLOBAL-LBL | RFMIP | - | none | |
| #1078 | awaiting publication | IPSL-CM6A-ATM-ICO-HR | HighResMIP | - | none | |
| #1078 | awaiting publication | IPSL-CM6A-ATM-ICO-LR | HighResMIP | - | none | |
| #1078 | awaiting publication | IPSL-CM6A-ATM-ICO-MR | HighResMIP | - | none | |
| #1078 | awaiting publication | IPSL-CM6A-ATM-ICO-VHR | HighResMIP | - | none | |
| #1078 | awaiting publication | IPSL-CM6A-ATM-LR-REPROBUS | AerChemMIP | - | none | |
| #1078 | awaiting publication | IPSL-CM6A-MR025 | CMIP | - | none | |
| #1078 | awaiting publication | IPSL-CM6A-MR1 | CMIP | - | none | |
| #1105 | awaiting publication | CAM-MPAS-HR | HighResMIP | - | none | Ruby, Bryce, Koichi emailed |
| #1105 | awaiting publication | CAM-MPAS-LR | HighResMIP | - | none | Ruby, Bryce, Koichi emailed |
| #1116 #1117 | awaiting publication | AWI-ESM-2-1-LR | CMIP PMIP | - | none | Tido, Christian, Gerrit, Martin, Paul and Christopher emailed |
| #1093 | deregistered | NICAM16-9D-L78 | CFMIP CMIP | - | none | |
| #1087 | deregistered | NorESM2-HH | CMIP HighResMIP | - | none | |
| #1100 #1103 | deregistered | BNU-ESM-1-1 | C4MIP CDRMIP CFMIP CMIP GMMIP GeoMIP OMIP RFMIP ScenarioMIP | - | none | Duoying emailed |
| #1102 #1106 | deregistered | CESM2-SE | CMIP HighResMIP | - | none | Gokhan & Gary emailed |
| #1101 #1104 | deregistered | CNRM-ESM2-1-HR | CMIP OMIP ScenarioMIP | - | none | David, Gaelle, Laurent and Marie-Pierre emailed |
| #1111 #1112 | deregistered | EMAC-2-53-Vol | CMIP VolMIP | - | none | 5 addresses emailed |
| #1111 #1112 | deregistered | EMAC-2-54-AerChem | AerChemMIP CMIP | - | none | 5 addresses emailed |
| #1122 #1123 | deregistered | VRESM-1-0 | CMIP DAMIP HighResMIP PMIP ScenarioMIP | - | none | Francois/Pedro emailed - deregister 220630 |
| #1086 #1124 | deregistered | UofT-CCSM4 | CMIP PMIP | - | none | dchandan pinged - deregister 220630 |
| #1120 #1125 | deregistered | BESM-2-9 | CMIP DCPP ScenarioMIP | - | none | Andre & Paulo emailed - deregister 220630 |
| #1121 #1126 | deregistered | CSIRO-Mk3L-1-3 | CMIP PMIP | - | none | Steve emailed - deregister 220630 |

@durack1 closed this as not planned Jun 9, 2022
@durack1
Member

durack1 commented Jun 13, 2022

@matthew-mizielinski I realised that closing this wasn't the best idea, as we need somewhere to keep track of the remaining unresolved/deregistrations, so will reopen and update the table above as required. 12 remaining questions to answer.

@durack1
Member

durack1 commented Jul 1, 2022

@matthew-mizielinski et al, all models with no data and no intention to publish data imminently have now been deregistered, so I can close out this issue, with the remaining license updates to be dealt with by #1113

@durack1 closed this as completed Jul 1, 2022