Models listed in the CVs without published data #1028
Comments
@matthew-mizielinski @taylor13 let's centralize discussions here. As I noted, I already have code that pulls info from the CMIP6 (or 5, 3) indexes and will return information such as that found in durack1/CMIPOcean/CMIP_ESGF.json |
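For reference, a minimal sketch of the kind of index query such code might run, assuming the public esg-search RESTful API on an LLNL index node (the hostname is an assumption; this is not the actual durack1/CMIPOcean code):

```python
# Sketch: ask the ESGF search API which source_ids have CMIP6 data published,
# using facet counts only (limit=0 returns no dataset records).
import requests

SEARCH_URL = "https://esgf-node.llnl.gov/esg-search/search"  # assumed index node

params = {
    "project": "CMIP6",
    "facets": "source_id",
    "limit": 0,
    "format": "application/solr+json",
}
response = requests.get(SEARCH_URL, params=params, timeout=60)
response.raise_for_status()

# Solr facet counts come back as a flat [value, count, value, count, ...] list
facet_list = response.json()["facet_counts"]["facet_fields"]["source_id"]
published = dict(zip(facet_list[0::2], facet_list[1::2]))
print(f"{len(published)} source_ids with published CMIP6 data")
```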
I suggest not purging any registered source_ids or institution_ids at this time, but I think that, if practical, we should:
Regarding the update of the “cohort” classification, we can be guided by the initially suggested policy on cohort designations (https://goo.gl/zDHUk7; 9 January 2018): The only choices permitted under the “Model Cohort” category are the following: DECK, CMIP6, CMIP5, CMIP3, CMIP2, CMIP1, “CMIP6-fringe”, and “Registered”. A “Model Cohort” limits a search to models that meet certain MIP criteria (for example, completion of the 4 DECK experiments plus the historical simulation is usually required for inclusion in the “CMIP6” cohort). The CMIP panel will record and update the “Model Cohorts” that each source_id (i.e., model) belongs to in the reference source_id CV found at WCRP-CMIP/CMIP6_CVs/CMIP6_source_id.json. Only models that qualify for at least one “Model Cohort” shall be considered for inclusion in a search result. The following define the cohorts:
|
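To make the quoted criterion concrete, here is a minimal sketch (an illustration, not part of the policy text) of how the “CMIP6” vs “Registered” decision could be expressed, assuming the set of experiments already published for a model has been retrieved from the index:

```python
# Minimal sketch of the cohort rule quoted above: a model qualifies for the
# "CMIP6" cohort once the four DECK experiments plus historical are published.
# The set of published experiments per model is assumed to come from an ESGF
# experiment_id facet query; here it is simply passed in.
DECK_PLUS_HISTORICAL = {"piControl", "abrupt-4xCO2", "1pctCO2", "amip", "historical"}

def implied_cohort(published_experiments):
    """Return the cohort implied by the experiments published for a model."""
    if DECK_PLUS_HISTORICAL <= set(published_experiments):
        return "CMIP6"
    return "Registered"  # registered in the CV, criterion not (yet) met

# A model with only piControl and amip published stays "Registered"
print(implied_cohort({"piControl", "amip"}))
```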
I agree with Karl for 1. and 2.
|
Yes, I agree the Model Cohort could provide information of value to users. The reasons for possibly removing it as a search facet are:
Perhaps Sasha might say if any of the above is based on my misunderstanding of ESGF. |
@sashakames there is a query above directed your way |
We can achieve it if it's worth the effort. (1) is easier to do than (2). It could take weeks at LLNL for scripts to complete for all our 5M replica records. |
Sorry, I must have too much else on my mind... There is a simple command to update all records that match a query. So each site just needs to re-run a query/update operation periodically. If we have new records published in the correct cohort, we can drop the need to make the corrections. |
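As a rough illustration of that query/update operation, here is a sketch written directly against a Solr core (the ESGF index is Solr-backed, but the host, core name, and cohort field name below are assumptions rather than the actual ESGF publisher tooling):

```python
# Sketch of "query all records that match, then apply an atomic update".
# Host, core name ("datasets") and the cohort field name are assumptions.
import requests

SOLR = "http://localhost:8983/solr/datasets"

def set_cohort(source_id, cohort):
    """Find all records for one source_id and apply a Solr atomic update."""
    params = {"q": f"source_id:{source_id}", "fl": "id", "rows": 100000, "wt": "json"}
    docs = requests.get(f"{SOLR}/select", params=params, timeout=300).json()["response"]["docs"]
    updates = [{"id": doc["id"], "model_cohort": {"set": cohort}} for doc in docs]
    requests.post(f"{SOLR}/update", params={"commit": "true"}, json=updates,
                  timeout=300).raise_for_status()

# Example: promote a (hypothetical) model to the CMIP6 cohort
set_cohort("SOME-MODEL", "CMIP6")
```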
So I take it 2 is easier than 1? |
Other way around: (2) involves herding cats. I should also mention that we need to check the performance implications of doing updates in bulk, which complicates things. |
Got it. Executing (2) is technically trivial; getting folks to execute it could be difficult. On the other hand (1) requires some effort by PCMDI to write scripts: 1) to periodically check the ESG database and update the source_id CV so that it reflects the true "cohort" status for each model, and then 2) to transfer the updates from the CV to ESG and correct the ESG archive's database index. (Again, @sashakames, I've probably not understood, so please correct, as needed, the above.) |
I was thinking of 1.2 (the ESGF index update phase) as being not too challenging for me to implement. For the query part of 1.1: doesn't ChrisM's "Big Table" already have this - experiments for each model? We could leverage that, but I wouldn't consider performing the queries too challenging, if need be. To clarify the concern, the bulk updates might time out if there are 100,000s of records to process in each batch. If this is problematic we would need to play with the granularity of the update (e.g., do one experiment at a time). Ideally, once a model has changed cohort, we ask the group to update their publisher config to have the cohort value set correctly; then we don't need to correct their records again until the next change. And the same goes for replica publishing. |
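A sketch of that granularity idea, building one bounded query per experiment rather than a single bulk update over all of a model's records (the experiment list and the example source_id are placeholders):

```python
# Sketch of per-experiment update granularity: each query below would drive one
# bounded query/update pass instead of one huge batch for the whole model.
EXPERIMENTS = ["piControl", "abrupt-4xCO2", "1pctCO2", "amip", "historical"]

def per_experiment_queries(source_id):
    """Build one small index query per experiment for a given model."""
    return [f"source_id:{source_id} AND experiment_id:{exp}" for exp in EXPERIMENTS]

for query in per_experiment_queries("SOME-MODEL"):  # placeholder source_id
    print(query)  # e.g. "source_id:SOME-MODEL AND experiment_id:piControl"
```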
A specific case that needs to be accounted for is #512 |
From WIP meeting discussion:
|
As part of #1066, models that have no published data on ESGF have been left as-is. It would be possible to contact the modeling groups of the non-published models; I'm not sure we'd want to deregister any specific model.
An email was sent out today requesting an update for the 28 models that currently have no data published on ESGF. The request was for data to be published, or for deregistration to occur - once we have intel from these contacts, we can amend as required and close out this issue.
@matthew-mizielinski I am closing this as a dupe (somewhat) of #1050, which includes the table of 28 models that are registered but have no data published on ESGF; these are now down to 14 in the updated table below. The process of identifying these, and either deregistering them or awaiting an update ahead of imminent publication, is already underway and noted in #1076, #1078, #1079, #1083, and #1086, as well as the NorESM2* deregistrations - see #1079/#1084. Updated 2022-07-01 - last merged PR #1126
|
@matthew-mizielinski I realised that closing this wasn't the best idea, as we need somewhere to keep track of the remaining unresolved cases/deregistrations, so I will reopen and update the table above as required. 12 remaining questions to answer.
@matthew-mizielinski et al, all models with no data and no intention to publish data imminently have now been deregistered, so I can close out this issue, with the remaining license updates to be dealt with by #1113.
Following discussion on #512, I've scraped together data from the ESGF search pages (the list of source_ids) and the source_id list within the CVs to pull out the following table of models for which no data appears to be available at the time of writing (July 2021).
This includes a number of institutions where no data has been published for any of their models, and one institution without any models.
There are a total of 28 models in the table below with a further 4 registered in the last 12-18 months.
I'm not currently advocating purging all of these, but I think it's worth a discussion as to how to handle this.
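For transparency, a minimal sketch of the kind of cross-check used to assemble the table, comparing the registered source_ids in the CV against the source_id facet from the ESGF search API (the raw CV URL/branch and the index node are assumptions; the actual scraping may have differed):

```python
# Sketch: report source_ids registered in CMIP6_source_id.json that do not
# appear in the ESGF source_id facet, i.e. models with no published data.
# The raw CV URL (including branch) and the index node are assumptions.
import requests

CV_URL = ("https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/"
          "main/CMIP6_source_id.json")
SEARCH_URL = "https://esgf-node.llnl.gov/esg-search/search"

registered = set(requests.get(CV_URL, timeout=60).json()["source_id"])

params = {"project": "CMIP6", "facets": "source_id", "limit": 0,
          "format": "application/solr+json"}
facets = requests.get(SEARCH_URL, params=params, timeout=60).json()
published = set(facets["facet_counts"]["facet_fields"]["source_id"][0::2])

for source_id in sorted(registered - published):
    print(source_id)  # registered in the CV but no data found on ESGF
```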
There are also the following recent additions (2020 and 2021 release years):