(When) should Crossref metadata take precedence over already existing metadata? #94

Open
cbroschinski opened this issue Feb 19, 2016 · 0 comments

Comments

@cbroschinski
Member

Stumbled upon an interesting case today with the latest contribution from TUHH (#93). As it turned out, the file was already enriched with all relevant metadata (in the old schema, though).
Here are the first 3 entries from the original file:

| institution | period | euro | doi | is_hybrid | publisher | journal_full_title | issn | issn_print | issn_electronic | indexed_in_CrossRef | pmid | pmcid | DOAJ | license_ref |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hamburg TUHH | 2015 | 694,76 | 10.1155/2015/898651 | FALSE | Hindawi Publishing Corporation | Advances in Fuzzy Systems | 1687-711X | 1687-7101 | 1687-711X | TRUE | NA | NA | TRUE | http://creativecommons.org/licenses/by/3.0/ |
| Hamburg TUHH | 2015 | 1248,46 | 10.3390/ma8010285 | FALSE | MPID | Materials | 1996-1944 | NA | 1996-1944 | TRUE | NA | NA | TRUE | http://creativecommons.org/licenses/by/4.0/ |
| Hamburg TUHH | 2015 | 1874,25 | 10.1186/s13568-015-0122-7 | FALSE | Springer Open | AMB Express | 2191-0855 | NA | 2191-0855 | TRUE | 26054736 | PMC4460186 | TRUE | http://creativecommons.org/licenses/by/4.0/ |

And here is the result of the automated metadata enrichment. The publisher and issn columns differ in several places (journal_full_title also often differs, although not in these first 3 entries). On the other hand, the original file contributes much more information in the license_ref column, where Crossref often reports "NA" values.

| "institution" | "period" | "euro" | "doi" | "is_hybrid" | "publisher" | "journal_full_title" | "issn" | "issn_print" | "issn_electronic" | "license_ref" | "indexed_in_crossref" | "pmid" | "pmcid" | "ut" | "url" | "doaj" |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| "Hamburg TUHH" | 2015 | 694.76 | "10.1155/2015/898651" | FALSE | "Hindawi Publishing Corporation" | "Advances in Fuzzy Systems" | "1687-7101" | "1687-7101" | "1687-711X" | "http://creativecommons.org/licenses/by/3.0/" | TRUE | NA | NA | NA | NA | TRUE |
| "Hamburg TUHH" | 2015 | 1248.46 | "10.3390/ma8010285" | FALSE | "MDPI AG" | "Materials" | "1996-1944" | NA | "1996-1944" | NA | TRUE | NA | NA | NA | NA | TRUE |
| "Hamburg TUHH" | 2015 | 1874.25 | "10.1186/s13568-015-0122-7 " | FALSE | "Springer Science + Business Media" | "AMB Express" | "2191-0855" | NA | "2191-0855" | NA | TRUE | "26054736" | "PMC4460186" | NA | NA | TRUE |
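
To make the differences easier to spot, here is a minimal Python sketch that compares one entry before and after enrichment, using the values of the second entry (10.3390/ma8010285) from the two listings. The helper `differing_fields` and the dict layout are assumptions for illustration only, not part of the existing enrichment scripts.

```python
# Second entry (doi 10.3390/ma8010285) as it appears in the original file ...
original = {
    "publisher": "MPID",
    "journal_full_title": "Materials",
    "issn": "1996-1944",
    "issn_print": "NA",
    "issn_electronic": "1996-1944",
    "license_ref": "http://creativecommons.org/licenses/by/4.0/",
}

# ... and after the automated enrichment.
enriched = {
    "publisher": "MDPI AG",
    "journal_full_title": "Materials",
    "issn": "1996-1944",
    "issn_print": "NA",
    "issn_electronic": "1996-1944",
    "license_ref": "NA",
}


def differing_fields(before, after):
    """Hypothetical helper: map each shared field whose value changed
    to a (before, after) tuple. Surrounding whitespace is ignored."""
    return {
        field: (before[field], after[field])
        for field in before
        if field in after and before[field].strip() != after[field].strip()
    }


for field, (old, new) in differing_fields(original, enriched).items():
    print(f"{field}: {old!r} -> {new!r}")
```

For this entry only publisher and license_ref are reported, which matches the pattern described above: Crossref changes the publisher name but drops the license.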

The question is: how should we proceed in such cases? Where should Crossref (or PubMed or DOAJ) imports overwrite existing data?
For the current task, I decided to replace journal_full_title, publisher and the ISSNs with the Crossref imports but keep the license_ref column.
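
The decision taken for this file could be written down as a per-field precedence rule. The following Python sketch is one possible reading, not the existing enrichment code: `CROSSREF_WINS` and `merge_record` are hypothetical names. It also makes two assumptions beyond what is stated above: an imported "NA" never overwrites an existing value, and fields outside the overwrite list are only filled in when the existing value is missing or "NA".

```python
NA = "NA"

# Fields where the Crossref import replaces the existing value
# (the choice made for the TUHH file; the name is hypothetical).
CROSSREF_WINS = (
    "publisher",
    "journal_full_title",
    "issn",
    "issn_print",
    "issn_electronic",
)


def merge_record(existing, imported, overwrite=CROSSREF_WINS):
    """Merge one imported record into an existing one, field by field."""
    merged = dict(existing)
    for field, value in imported.items():
        if value == NA:
            # Assumption: an imported NA never replaces existing data.
            continue
        if field in overwrite or merged.get(field, NA) == NA:
            # Either the import has precedence for this field,
            # or there was nothing to preserve.
            merged[field] = value
    return merged


existing = {
    "publisher": "MPID",
    "journal_full_title": "Materials",
    "issn_print": NA,
    "license_ref": "http://creativecommons.org/licenses/by/4.0/",
}
imported = {
    "publisher": "MDPI AG",
    "journal_full_title": "Materials",
    "issn_print": NA,
    "license_ref": NA,
}

merged = merge_record(existing, imported)
print(merged["publisher"])    # publisher taken from the import
print(merged["license_ref"])  # license kept from the original file
```

Keeping the rule in a single tuple makes it easy to discuss which columns belong on the overwrite list, and other sources such as PubMed or DOAJ could each get their own list.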
