Function parse_medline_xml only returns the first affiliation if an author has multiple affiliations #160
Hi team,
Describe the bug
I encountered another issue when using the package to extract PubMed affiliation information from XML files. When an author has multiple affiliations, the parse_medline_xml function extracts only the first one.
To Reproduce
An example of this issue is PMID 39029952. In the XML file (attached as pmid_39029952.txt), each author has multiple affiliations.
medline_parser.parse_author_affiliation uses author.find("AffiliationInfo/Affiliation") to find the affiliation information. However, find returns only the first matching element, so only the first affiliation is returned.
Expected behavior
I expect the parser to return all of an author's affiliations as a list. One option might be to replace author.find("AffiliationInfo/Affiliation").text with list(chain(*([c.text] for c in author.findall("AffiliationInfo/Affiliation")))).
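To illustrate the difference, here is a minimal sketch using Python's standard xml.etree.ElementTree on a made-up author fragment (it is not the actual record for PMID 39029952). The package itself may use a different XML library, and the variable names are only illustrative. It also suggests that a plain list comprehension should give the same result as the chain expression:

```python
import xml.etree.ElementTree as ET
from itertools import chain

# Hypothetical, hand-written fragment in the MEDLINE author layout;
# the names and affiliations are invented for this example.
AUTHOR_XML = """
<Author>
  <LastName>Doe</LastName>
  <AffiliationInfo><Affiliation>Dept. A, University X</Affiliation></AffiliationInfo>
  <AffiliationInfo><Affiliation>Institute B, Hospital Y</Affiliation></AffiliationInfo>
</Author>
"""

author = ET.fromstring(AUTHOR_XML)
path = "AffiliationInfo/Affiliation"

# find() stops at the first matching element, so one affiliation is lost.
first_only = author.find(path).text

# findall() returns every matching element, in document order.
all_affiliations = [aff.text for aff in author.findall(path)]

# The chain-based expression from this issue, for comparison.
proposed = list(chain(*([c.text] for c in author.findall(path))))

print(first_only)        # Dept. A, University X
print(all_affiliations)  # ['Dept. A, University X', 'Institute B, Hospital Y']
print(proposed == all_affiliations)  # True
```

If an author has no AffiliationInfo element, findall returns an empty list, whereas author.find(path).text would raise an AttributeError because find returns None.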
Screenshots
XML file example
pmid_39029952.txt
Thank you all for the great support.