epmc_search returns fewer fields than available in the API #57

Open
arvi1000 opened this issue Jun 11, 2024 · 1 comment

@arvi1000

Thank you for this package, maintainers!

I notice that epmc_search doesn't return some of the useful fields that are available in the API. I think it would be valuable to return all fields. For example, the API returns both the boolean hasTMAccessionNumbers and the accessionType, while the package returns only the former.

Example of different fields returned:

library(europepmc)
library(httr)

# get results for one id from the package and the api
package_result <- epmc_search("PMC10669250")
direct_api_result <-
  GET("https://www.ebi.ac.uk/europepmc/webservices/rest/search",
      query = list(query = "PMC10669250",
                   resultType = "lite",
                   format = "json")) |>
  content()

# compare fields returned
package_result |> names()
direct_api_result$resultList$result[[1]] |> unlist() |> names()

from the package:

 [1] "id"                    "source"                "pmcid"                 "title"                 "authorString"          "journalTitle"          "issue"                
 [8] "journalVolume"         "pubYear"               "journalIssn"           "pubType"               "isOpenAccess"          "inEPMC"                "inPMC"                
[15] "hasPDF"                "hasBook"               "hasSuppl"              "citedByCount"          "hasReferences"         "hasTextMinedTerms"     "hasDbCrossReferences" 
[22] "hasLabsLinks"          "hasTMAccessionNumbers" "firstIndexDate"        "firstPublicationDate" 

from the API:

 [1] "id"                                "source"                            "pmcid"                             "fullTextIdList.fullTextId"        
 [5] "title"                             "authorString"                      "journalTitle"                      "issue"                            
 [9] "journalVolume"                     "pubYear"                           "journalIssn"                       "pubType"                          
[13] "isOpenAccess"                      "inEPMC"                            "inPMC"                             "hasPDF"                           
[17] "hasBook"                           "hasSuppl"                          "citedByCount"                      "hasReferences"                    
[21] "hasTextMinedTerms"                 "hasDbCrossReferences"              "hasLabsLinks"                      "hasTMAccessionNumbers"            
[25] "tmAccessionTypeList.accessionType" "firstIndexDate"                    "firstPublicationDate"    
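
The gap between the two listings can also be computed rather than read off by eye. The following is a minimal sketch that hard-codes the field names copied from the two outputs above, so it runs without network access; the variable names `package_fields`, `api_only` and `missing_from_package` are illustrative only.

```r
# Field names returned by epmc_search(), copied from the listing above
package_fields <- c(
  "id", "source", "pmcid", "title", "authorString", "journalTitle", "issue",
  "journalVolume", "pubYear", "journalIssn", "pubType", "isOpenAccess",
  "inEPMC", "inPMC", "hasPDF", "hasBook", "hasSuppl", "citedByCount",
  "hasReferences", "hasTextMinedTerms", "hasDbCrossReferences",
  "hasLabsLinks", "hasTMAccessionNumbers", "firstIndexDate",
  "firstPublicationDate"
)

# The API listing above contains the same names plus two nested fields
api_only <- c("fullTextIdList.fullTextId", "tmAccessionTypeList.accessionType")
api_fields <- append(package_fields, api_only[1], after = 3)
api_fields <- append(api_fields, api_only[2], after = 24)

# Fields present in the API response but absent from the package result
missing_from_package <- setdiff(api_fields, package_fields)
print(missing_from_package)

# Fields present in the package result but absent from the API response
extra_in_package <- setdiff(package_fields, api_fields)
```

With the live objects from the example, the same comparison would be `setdiff(names(unlist(direct_api_result$resultList$result[[1]])), names(package_result))`.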
arvi1000 added a commit to arvi1000/make-data-count that referenced this issue Jun 12, 2024
@njahn82
Member

njahn82 commented Jun 12, 2024

Hi @arvi1000,
You're right, the default method only returns a subset of Europe PMC data. To access all data, use the raw option (output = "raw"). Here's an example parser for your query:

library(europepmc)
library(tidyverse)
my_epmc_data <- epmc_search("PMC10669250", output = "raw")
#> 1 records found, returning 1

tibble::tibble(
  id = map_chr(my_epmc_data, "id"),
  tm_accession_type = map(my_epmc_data, "tmAccessionTypeList") |>
    map_chr("accessionType")
)
#> # A tibble: 1 × 2
#>   id          tm_accession_type
#>   <chr>       <chr>            
#> 1 PMC10669250 chebi

Created on 2024-06-12 with reprex v2.1.0
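
The parser above assumes every record carries exactly one accession type. A search that returns several records may include some with no tmAccessionTypeList or with more than one accessionType, in which case map_chr() would fail. The sketch below is one possible way to handle that; `flatten_accession_types()` is a hypothetical helper, not part of europepmc, and the mock records only imitate the nested shape implied by the example above rather than real API output.

```r
library(purrr)
library(tibble)

# Hypothetical helper (not part of europepmc): one row per raw record,
# with all accession types collapsed into a single string.
flatten_accession_types <- function(records) {
  tibble(
    id = map_chr(records, "id"),
    tm_accession_type = map_chr(records, function(rec) {
      # pluck() returns NULL if either level is missing
      types <- unlist(pluck(rec, "tmAccessionTypeList", "accessionType"))
      if (length(types) == 0) NA_character_ else paste(types, collapse = ";")
    })
  )
}

# Mock records with an assumed structure, so the sketch runs offline
mock_records <- list(
  list(id = "A1", tmAccessionTypeList = list(accessionType = list("chebi"))),
  list(id = "A2", tmAccessionTypeList = list(accessionType = list("chebi", "pdb"))),
  list(id = "A3")
)

result <- flatten_accession_types(mock_records)
print(result)
```

Against live data the call would be `flatten_accession_types(epmc_search("PMC10669250", output = "raw"))`. Collapsing to a semicolon-separated string keeps one row per record; a list column would be an alternative if the individual types need to be queried later.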
