epmc_search returns fewer fields than available in the API #57

arvi1000 · 2024-06-11T16:30:38Z

Thank you for this package, maintainers!

I notice that epmc_search doesn't return some of the useful fields that are available in the API. I think it would be valuable to return all fields. For example, the API returns both the boolean hasTMAccessionNumbers and the list of accession types (tmAccessionTypeList.accessionType), while the package returns only the former.

Example of different fields returned:

library(europepmc)
library(httr)

# get results for one id from the package and the api
package_result <- epmc_search("PMC10669250")
direct_api_result <-
  GET('https://www.ebi.ac.uk/europepmc/webservices/rest/search',
      query = list(query = 'PMC10669250',
                   resultType = 'lite',
                   format = 'json')) |>
  content()

# compare fields returned
package_result |> names()
direct_api_result$resultList$result[[1]] |> unlist() |> names()

From the package:

 [1] "id"                    "source"                "pmcid"                 "title"                 "authorString"          "journalTitle"          "issue"                
 [8] "journalVolume"         "pubYear"               "journalIssn"           "pubType"               "isOpenAccess"          "inEPMC"                "inPMC"                
[15] "hasPDF"                "hasBook"               "hasSuppl"              "citedByCount"          "hasReferences"         "hasTextMinedTerms"     "hasDbCrossReferences" 
[22] "hasLabsLinks"          "hasTMAccessionNumbers" "firstIndexDate"        "firstPublicationDate"

From the API:

 [1] "id"                                "source"                            "pmcid"                             "fullTextIdList.fullTextId"        
 [5] "title"                             "authorString"                      "journalTitle"                      "issue"                            
 [9] "journalVolume"                     "pubYear"                           "journalIssn"                       "pubType"                          
[13] "isOpenAccess"                      "inEPMC"                            "inPMC"                             "hasPDF"                           
[17] "hasBook"                           "hasSuppl"                          "citedByCount"                      "hasReferences"                    
[21] "hasTextMinedTerms"                 "hasDbCrossReferences"              "hasLabsLinks"                      "hasTMAccessionNumbers"            
[25] "tmAccessionTypeList.accessionType" "firstIndexDate"                    "firstPublicationDate"
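To pinpoint exactly which fields are dropped, the two name lists printed above can be compared with base R's setdiff(). This sketch simply hard-codes the names copied from those outputs, so it runs offline; in a live session the same comparison would be setdiff(api_names, names(package_result)), where api_names is a placeholder for the unlisted API names.

```r
# Field names copied from the two outputs printed above
package_fields <- c(
  "id", "source", "pmcid", "title", "authorString", "journalTitle", "issue",
  "journalVolume", "pubYear", "journalIssn", "pubType", "isOpenAccess",
  "inEPMC", "inPMC", "hasPDF", "hasBook", "hasSuppl", "citedByCount",
  "hasReferences", "hasTextMinedTerms", "hasDbCrossReferences",
  "hasLabsLinks", "hasTMAccessionNumbers", "firstIndexDate",
  "firstPublicationDate"
)
api_fields <- c(
  "id", "source", "pmcid", "fullTextIdList.fullTextId", "title",
  "authorString", "journalTitle", "issue", "journalVolume", "pubYear",
  "journalIssn", "pubType", "isOpenAccess", "inEPMC", "inPMC", "hasPDF",
  "hasBook", "hasSuppl", "citedByCount", "hasReferences",
  "hasTextMinedTerms", "hasDbCrossReferences", "hasLabsLinks",
  "hasTMAccessionNumbers", "tmAccessionTypeList.accessionType",
  "firstIndexDate", "firstPublicationDate"
)

# Fields in the API response that epmc_search() does not return by default
missing_fields <- setdiff(api_fields, package_fields)
print(missing_fields)
# returns c("fullTextIdList.fullTextId", "tmAccessionTypeList.accessionType")
```

For this record, the only differences are the two nested fields; every field the package returns is also in the API response.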


njahn82 · 2024-06-12T15:35:10Z

Hi @arvi1000,
You're right: by default, epmc_search() returns only a subset of the Europe PMC metadata. To access all fields, use output = "raw". Here's an example parser for your query:

library(europepmc)
library(tidyverse)
my_epmc_data <- epmc_search("PMC10669250", output = "raw")
#> 1 records found, returning 1

tibble::tibble(
  id = map_chr(my_epmc_data, "id"),
  tm_accession_type = map(my_epmc_data, "tmAccessionTypeList") |>
    map_chr("accessionType")
)
#> # A tibble: 1 × 2
#>   id          tm_accession_type
#>   <chr>       <chr>            
#> 1 PMC10669250 chebi

Created on 2024-06-12 with reprex v2.1.0
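The purrr example above pulls out one field at a time. To keep every top-level field from the raw output in a single table, a minimal base-R sketch could look like the following. flatten_epmc_raw() is a hypothetical helper, not part of europepmc, and it assumes (as the map_chr(my_epmc_data, "id") call suggests) that output = "raw" returns a list with one element per record. Nested or multi-valued fields such as tmAccessionTypeList are collapsed into one "; "-separated string, and every column comes back as character.

```r
# Hypothetical helper, not part of europepmc: turn the list returned by
# epmc_search(..., output = "raw") into a data frame with one row per record.
flatten_epmc_raw <- function(records) {
  # One named character vector per record; nested or multi-valued fields
  # (e.g. tmAccessionTypeList) are collapsed into a "; "-separated string
  rows <- lapply(records, function(rec) {
    vapply(rec, function(x) paste(unlist(x), collapse = "; "), character(1))
  })
  # Records can carry different fields, so use the union of all field names
  fields <- unique(unlist(lapply(rows, names)))
  cols <- lapply(fields, function(f) {
    # Fill fields a record lacks with NA
    vapply(rows, function(r) if (f %in% names(r)) r[[f]] else NA_character_,
           character(1))
  })
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE, check.names = FALSE)
}
```

Usage would be flatten_epmc_raw(epmc_search("PMC10669250", output = "raw")). Collapsing nested values keeps exactly one row per record; if the individual accession types are needed as separate values, a list-column (as in the purrr approach) is the better fit.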

arvi1000 added a commit to arvi1000/make-data-count that referenced this issue Jun 12, 2024

compare package and api. filed ropensci/europepmc#57

b39ff3d
