Models listed in the CVs without published data #1028
Comments
@matthew-mizielinski @taylor13 let's centralize discussions here. As I noted, I already have code that pulls info from the CMIP6 (or 5, 3) indexes and will return information such as that found in durack1/CMIPOcean/CMIP_ESGF.json |
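For reference, a minimal sketch of the kind of index query such code might run, assuming the public esg-search RESTful API on an LLNL index node (the hostname is an assumption; this is not the actual durack1/CMIPOcean code):

```python
# Sketch: ask the ESGF search API which source_ids have CMIP6 data published,
# using facet counts only (limit=0 returns no dataset records).
import requests

SEARCH_URL = "https://esgf-node.llnl.gov/esg-search/search"  # assumed index node

params = {
    "project": "CMIP6",
    "facets": "source_id",
    "limit": 0,
    "format": "application/solr+json",
}
response = requests.get(SEARCH_URL, params=params, timeout=60)
response.raise_for_status()

# Solr facet counts come back as a flat [value, count, value, count, ...] list
facet_list = response.json()["facet_counts"]["facet_fields"]["source_id"]
published = dict(zip(facet_list[0::2], facet_list[1::2]))
print(f"{len(published)} source_ids with published CMIP6 data")
```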
I suggest not purging any registered source_ids or institution_ids at this time, but I think that, if practical, we should:
Regarding the update of the “cohort” classification, we can be guided by the initially suggested policy on cohort designations (https://goo.gl/zDHUk7; 9 January 2018): The only choices permitted under the “Model Cohort” category are the following: DECK, CMIP6, CMIP5, CMIP3, CMIP2, CMIP1, “CMIP6-fringe”, and “Registered”. A “Model Cohort” limits a search to models that meet certain MIP criteria (for example, completion of the 4 DECK experiments plus the historical simulation is usually required for inclusion in the “CMIP6” cohort). The CMIP panel will record and update the “Model Cohorts” that each source_id (i.e., model) belongs to in the reference source_id CV found at WCRP-CMIP/CMIP6_CVs/CMIP6_source_id.json. Only models that qualify for at least one “Model Cohort” shall be considered for inclusion in a search result. The following define the cohorts:
|
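To make the quoted criterion concrete, here is a minimal sketch (an illustration, not part of the policy text) of how the “CMIP6” vs “Registered” decision could be expressed, assuming the set of experiments already published for a model has been retrieved from the index:

```python
# Minimal sketch of the cohort rule quoted above: a model qualifies for the
# "CMIP6" cohort once the four DECK experiments plus historical are published.
# The set of published experiments per model is assumed to come from an ESGF
# experiment_id facet query; here it is simply passed in.
DECK_PLUS_HISTORICAL = {"piControl", "abrupt-4xCO2", "1pctCO2", "amip", "historical"}

def implied_cohort(published_experiments):
    """Return the cohort implied by the experiments published for a model."""
    if DECK_PLUS_HISTORICAL <= set(published_experiments):
        return "CMIP6"
    return "Registered"  # registered in the CV, criterion not (yet) met

# A model with only piControl and amip published stays "Registered"
print(implied_cohort({"piControl", "amip"}))
```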
I agree with Karl for 1. and 2.
|
Yes, I agree the Model Cohort could provide information of value to users. The reasons for possibly removing it as a search facet are:
Perhaps Sasha might say if any of the above is based on my misunderstanding of ESGF. |
@sashakames there is a query above directed your way |
We can achieve it if it's worth the effort. (1) is easier to do than (2). It could take weeks at LLNL for scripts to complete for all our 5M replica records. |
Sorry, I must have too much else on my mind... There is a simple command to update all records that match a query. So each site just needs to re-run a query/update operation periodically. If we have new records published in the correct cohort, we can drop the need to make the corrections. |
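As a rough illustration of that query/update operation, here is a sketch written directly against a Solr core (the ESGF index is Solr-backed, but the host, core name, and cohort field name below are assumptions rather than the actual ESGF publisher tooling):

```python
# Sketch of "query all records that match, then apply an atomic update".
# Host, core name ("datasets") and the cohort field name are assumptions.
import requests

SOLR = "http://localhost:8983/solr/datasets"

def set_cohort(source_id, cohort):
    """Find all records for one source_id and apply a Solr atomic update."""
    params = {"q": f"source_id:{source_id}", "fl": "id", "rows": 100000, "wt": "json"}
    docs = requests.get(f"{SOLR}/select", params=params, timeout=300).json()["response"]["docs"]
    updates = [{"id": doc["id"], "model_cohort": {"set": cohort}} for doc in docs]
    requests.post(f"{SOLR}/update", params={"commit": "true"}, json=updates,
                  timeout=300).raise_for_status()

# Example: promote a (hypothetical) model to the CMIP6 cohort
set_cohort("SOME-MODEL", "CMIP6")
```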
So I take it 2 is easier than 1? |
Other way around: (2) involves herding cats. I should also mention that we need to check the performance implications of doing updates in bulk, which complicates things. |
Got it. Executing (2) is technically trivial; getting folks to execute it could be difficult. On the other hand (1) requires some effort by PCMDI to write scripts: 1) to periodically check the ESG database and update the source_id CV so that it reflects the true "cohort" status for each model, and then 2) to transfer the updates from the CV to ESG and correct the ESG archive's database index. (Again, @sashakames, I've probably not understood, so please correct, as needed, the above.) |
I was thinking of 1.2 (the ESGF index update phase) as being not too challenging for me to implement. For the query part of 1.1: doesn't ChrisM's "Big Table" already have this - experiments for each model? We could leverage that, but I wouldn't consider performing the queries too challenging, if need be. To clarify the concern, the bulk updates might time out if there are 100,000s of records to process in each batch. If this is problematic we would need to play with the granularity of the update (e.g., do one experiment at a time). Ideally, once a model has changed cohort, we ask the group to update their publisher config to have the cohort value set correctly; then we don't need to correct their records again until the next change. And the same goes for replica publishing. |
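A sketch of that granularity idea, building one bounded query per experiment rather than a single bulk update over all of a model's records (the experiment list and the example source_id are placeholders):

```python
# Sketch of per-experiment update granularity: each query below would drive one
# bounded query/update pass instead of one huge batch for the whole model.
EXPERIMENTS = ["piControl", "abrupt-4xCO2", "1pctCO2", "amip", "historical"]

def per_experiment_queries(source_id):
    """Build one small index query per experiment for a given model."""
    return [f"source_id:{source_id} AND experiment_id:{exp}" for exp in EXPERIMENTS]

for query in per_experiment_queries("SOME-MODEL"):  # placeholder source_id
    print(query)  # e.g. "source_id:SOME-MODEL AND experiment_id:piControl"
```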
A specific case that needs to be accounted for is #512 |
From WIP meeting discussion:
|
As part of #1066, models that have no published data on ESGF have been left as-is. It would be possible to contact the modeling groups of the non-published models; I'm not sure we'd want to deregister any specific model.
An email was sent out today requesting an update for the 28 models that currently have no data published on ESGF. The request was for data to be published, or for deregistration to occur - once we have intel from these contacts, we can amend as required and close out this issue.
@matthew-mizielinski I am closing this as a dupe (somewhat) of #1050, which includes the table of 28 models that are registered but have no data published on ESGF; these are now down to 14 in the updated table below. The process of identifying these, and either deregistering them or awaiting an update ahead of imminent publication, is already underway and noted in #1076, #1078, #1079, #1083, and #1086, as well as the NorESM2* deregistrations - see #1079/#1084. Updated 2022-07-01 - last merged PR #1126
|
@matthew-mizielinski I realised that closing this wasn't the best idea, as we need somewhere to keep track of the remaining unresolved cases/deregistrations, so I will reopen and update the table above as required. 12 remaining questions to answer.
@matthew-mizielinski et al, all models with no data and no intention to publish data imminently have now been deregistered, so I can close out this issue, with the remaining license updates to be dealt with by #1113.
Following discussion on #512, I've scraped together data from the ESGF search pages (the list of source_ids) and the source_id list within the CVs to pull out the following table of models for which no data appears to be available at the time of writing (July 2021).
This includes a number of institutions where no data has been published for any of their models, and one institution without any models.
There are a total of 28 models in the table below with a further 4 registered in the last 12-18 months.
I'm not currently advocating purging all of these, but I think it's worth a discussion as to how to handle this.
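For transparency, a minimal sketch of the kind of cross-check used to assemble the table, comparing the registered source_ids in the CV against the source_id facet from the ESGF search API (the raw CV URL/branch and the index node are assumptions; the actual scraping may have differed):

```python
# Sketch: report source_ids registered in CMIP6_source_id.json that do not
# appear in the ESGF source_id facet, i.e. models with no published data.
# The raw CV URL (including branch) and the index node are assumptions.
import requests

CV_URL = ("https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/"
          "main/CMIP6_source_id.json")
SEARCH_URL = "https://esgf-node.llnl.gov/esg-search/search"

registered = set(requests.get(CV_URL, timeout=60).json()["source_id"])

params = {"project": "CMIP6", "facets": "source_id", "limit": 0,
          "format": "application/solr+json"}
facets = requests.get(SEARCH_URL, params=params, timeout=60).json()
published = set(facets["facet_counts"]["facet_fields"]["source_id"][0::2])

for source_id in sorted(registered - published):
    print(source_id)  # registered in the CV but no data found on ESGF
```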
There are also the following recent additions (2020 and 2021 release years):