The nested data structure of the eupmc_results.json output makes it a little tricky to get human-readable summaries of the results, particularly the journal title per result, which is nested within journalInfo and then within journal -> title (incidentally, title is not a unique key: it is also used for the article title). I'm not suggesting a change to the structure of the results.json files, just that a simpler overview CSV could be created, as a non-default option within getpapers, for people who find JSON hard or intimidating.
As a workaround I have created a short R script to create this non-interactively from the JSON, although it's far from ideal as it requires the installation of an R package (jsonlite) which users probably won't have.
Script below:
#!/usr/bin/env Rscript
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 0) {
  stop("At least one argument must be supplied (input file).\n", call. = FALSE)
} else if (length(args) == 1) {
  # default output file
  args[2] <- "summary.csv"
}
# install.packages('jsonlite')
library(jsonlite)
mymatrix <- fromJSON(args[1])
# one journal title per result; NA until filled in by the loop below
journals <- data.frame(rep(NA, dim(mymatrix)[1]))
for (i in seq_len(dim(mymatrix)[1])) {
  if (is.null(mymatrix$journalInfo[[i]]$journal[[1]]$title)) {
    # some results (e.g. patents) have no journal title
    journals[i, 1] <- "not published in a journal"
  } else {
    journals[i, 1] <- mymatrix$journalInfo[[i]]$journal[[1]]$title
  }
}
# keep only the flat fields of interest, plus the extracted journal title
zzz <- cbind(as.character(mymatrix$pmcid), as.character(mymatrix$title), journals[1], as.character(mymatrix$pubYear), as.character(mymatrix$authorString), as.character(mymatrix$doi), as.character(mymatrix$hasPDF), as.character(mymatrix$hasSuppl), as.character(mymatrix$isOpenAccess), as.character(mymatrix$citedByCount), as.character(mymatrix$electronicPublicationDate))
colnames(zzz) <- c("pmcid", "article.title", "journal", "pubYear", "authorString", "doi", "hasPDF", "hasSuppl", "isOpenAccess", "citedByCount", "electronicPublicationDate")
write.csv(zzz, file = args[2])
This creates an overview CSV file with a much-reduced set of fields, including the basics that 90% of users are most likely to want to know, e.g. journal, article title and year of publication.
Right now we just take the eupmc API response object and serialise it to JSON. My personal opinion is that it might be out of scope for getpapers to do more with it, and that there's space for more tools in the ecosystem that do more. We could link to other tools that handle the output, including linking to this issue.
I've made a minor update to my script, adding an is.null check in the for loop. Patents do not have a journal title and were creating NULLs that broke my simple for loop.
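For anyone who would rather not install R and jsonlite, the same flattening can be sketched with only the Python standard library. This is a rough, untested-against-real-output sketch rather than anything getpapers provides: the field names are taken from the R script above, the helper names (unwrap, journal_title, summarise, write_summary) are made up for illustration, and it assumes eupmc_results.json is a list of records whose values may be wrapped in single-element lists.

```python
import csv
import json

# Output columns, in the same order as the R script above.
COLUMNS = ["pmcid", "article.title", "journal", "pubYear", "authorString",
           "doi", "hasPDF", "hasSuppl", "isOpenAccess", "citedByCount",
           "electronicPublicationDate"]


def unwrap(value):
    """Peel off nested lists, returning the first element (None if empty)."""
    while isinstance(value, list):
        if not value:
            return None
        value = value[0]
    return value


def journal_title(record):
    """Look up journalInfo -> journal -> title, tolerating missing levels."""
    info = unwrap(record.get("journalInfo"))
    journal = unwrap(info.get("journal")) if isinstance(info, dict) else None
    title = unwrap(journal.get("title")) if isinstance(journal, dict) else None
    # Assumption carried over from the R script: results such as patents
    # have no journal title at all.
    return title if title is not None else "not published in a journal"


def summarise(records):
    """Build one flat dict per result, keyed by COLUMNS."""
    rows = []
    for record in records:
        row = {}
        for column in COLUMNS:
            if column == "journal":
                row[column] = journal_title(record)
            elif column == "article.title":
                # The article title lives under the top-level "title" key.
                row[column] = unwrap(record.get("title"))
            else:
                row[column] = unwrap(record.get(column))
        rows.append(row)
    return rows


def write_summary(in_path, out_path="summary.csv"):
    """Read the JSON results file and write the overview CSV."""
    with open(in_path, encoding="utf-8") as handle:
        records = json.load(handle)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(summarise(records))
```

Calling write_summary("eupmc_results.json") would then write summary.csv to the current directory. Missing fields come out as empty cells, and the nested lookup falls back to "not published in a journal" instead of failing, which is the same case the is.null check handles in the R version.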