Function parse_medline_xml only returns the first affiliation if an author has multiple affiliations. #160

Closed
ZhangWoW123 opened this issue Oct 25, 2024 · 3 comments

@ZhangWoW123
Contributor

Hi team,

Describe the bug
I encountered another issue when using the package to extract PubMed affiliation information from XML files. When an author has multiple affiliations, the parse_medline_xml function extracts only the first one.

To Reproduce
An example of this issue is PMID 39029952. In the XML file, the AuthorList section is structured as follows; each author has multiple affiliations:

<AuthorList CompleteYN="Y">
  <Author ValidYN="Y">
    <LastName>Kim</LastName>
    <ForeName>Jennifer</ForeName>
    <Initials>J</Initials>
    <AffiliationInfo>
      <Affiliation>Graduate Program in Neuroscience, University of British Columbia, Vancouver, Canada.</Affiliation>
    </AffiliationInfo>
    <AffiliationInfo>
      <Affiliation>Djavad Mowafaghian Centre for Brain Health, Vancouver, Canada.</Affiliation>
    </AffiliationInfo>
  </Author>
  ...
</AuthorList>

medline_parser.parse_author_affiliation uses author.find("AffiliationInfo/Affiliation") to find the affiliation information. However, find returns only one object (the first matching element), so only the first affiliation is returned.
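
The difference can be reproduced outside the package. Below is a minimal sketch on a trimmed copy of the record above. It uses the standard library's xml.etree.ElementTree, which is an assumption made to keep the example self-contained; the package's own XML backend may differ, but find and findall follow the same first-match versus all-matches semantics.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the first <Author> element of PMID 39029952 shown above.
AUTHOR_XML = """
<Author ValidYN="Y">
  <LastName>Kim</LastName>
  <ForeName>Jennifer</ForeName>
  <Initials>J</Initials>
  <AffiliationInfo>
    <Affiliation>Graduate Program in Neuroscience, University of British Columbia, Vancouver, Canada.</Affiliation>
  </AffiliationInfo>
  <AffiliationInfo>
    <Affiliation>Djavad Mowafaghian Centre for Brain Health, Vancouver, Canada.</Affiliation>
  </AffiliationInfo>
</Author>
"""

author = ET.fromstring(AUTHOR_XML)

# find() stops at the first matching element, so the second affiliation is lost.
first_only = author.find("AffiliationInfo/Affiliation").text

# findall() returns every matching element in document order.
all_affiliations = [a.text for a in author.findall("AffiliationInfo/Affiliation")]

print(first_only)
print(all_affiliations)
```

Here first_only holds only the Graduate Program in Neuroscience entry, while all_affiliations holds both entries.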

Expected behavior
I expect the parser to return all of an author's affiliations as a list. Maybe consider replacing author.find("AffiliationInfo/Affiliation").text with list(chain(*([c.text] for c in author.findall("AffiliationInfo/Affiliation"))))?
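
For what it's worth, the chain(...) expression above should produce the same list as a plain list comprehension. The sketch below checks this on placeholder data. collect_affiliations is a hypothetical helper name (not part of pubmed_parser), "Dept. A" and "Dept. B" are made-up values, and the `or ""` guard for empty Affiliation elements is an added assumption rather than existing package behavior.

```python
from itertools import chain
import xml.etree.ElementTree as ET


def collect_affiliations(author):
    """Hypothetical helper: text of every affiliation of one <Author> element."""
    return [
        (a.text or "").strip()  # assumption: treat an empty element as ""
        for a in author.findall("AffiliationInfo/Affiliation")
    ]


# Placeholder author with two affiliations (made-up values).
author = ET.fromstring(
    "<Author>"
    "<AffiliationInfo><Affiliation>Dept. A</Affiliation></AffiliationInfo>"
    "<AffiliationInfo><Affiliation>Dept. B</Affiliation></AffiliationInfo>"
    "</Author>"
)

# The expression proposed in this issue, unchanged.
proposed = list(chain(*([c.text] for c in author.findall("AffiliationInfo/Affiliation"))))

# Both approaches yield the same list for this input.
assert proposed == collect_affiliations(author) == ["Dept. A", "Dept. B"]

# An author without any <AffiliationInfo> yields an empty list instead of raising.
no_affiliations = collect_affiliations(ET.fromstring("<Author/>"))
assert no_affiliations == []
```

One design note: unlike find(...).text, neither variant raises AttributeError when an author has no affiliation at all, so callers would need to handle an empty list instead.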

XML file example
pmid_39029952.txt

Thank you all for the great support.

@Michael-E-Rose
Collaborator

I expect the parser to return all of an author's affiliations as a list. Maybe consider replacing author.find("AffiliationInfo/Affiliation").text with list(chain(*([c.text] for c in author.findall("AffiliationInfo/Affiliation"))))?

This looks like the solution to me. Do you want to provide a PR? Ideally with an additional test case in https://github.com/titipata/pubmed_parser/blob/master/tests/test_medline_parser.py#L33.

@ZhangWoW123
Contributor Author

@Michael-E-Rose,

Sure, I created PR #162.

@Michael-E-Rose
Collaborator

Thank you for your service!
