Drop argument all_paragraph from parse_pubmed_paragraph() #153

Merged on Aug 30, 2024 (1 commit)

13 changes: 3 additions & 10 deletions pubmed_parser/pubmed_oa_parser.py
@@ -367,7 +367,7 @@ def parse_pubmed_references(path):
     return dict_refs


-def parse_pubmed_paragraph(path, all_paragraph=False):
+def parse_pubmed_paragraph(path):
     """
     Give path to a given PubMed OA file, parse and return
     a dictionary of all paragraphs, section that it belongs to,
@@ -377,13 +377,6 @@ def parse_pubmed_paragraph(path, all_paragraph=False):
     ----------
     path: str
         A string to an XML path.
-    all_paragraph: bool
-        By default, this function will only append a paragraph if there is at least
-        one reference made in a paragraph (to aviod noisy parsed text).
-        A boolean indicating if you want to include paragraph with no references made or not
-        if True, include all paragraphs
-        if False, include only paragraphs that have references
-        default: False

     Return
     ------
@@ -421,8 +414,8 @@ def parse_pubmed_paragraph(path, all_paragraph=False):
             "section": section,
             "text": paragraph_text,
         }
-        if len(ref_ids) >= 1 or all_paragraph:
-            dict_pars.append(dict_par)
+
+        dict_pars.append(dict_par)

     return dict_pars
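
With the `all_paragraph` argument gone, the function returns every paragraph, and callers that relied on the old default (only paragraphs with at least one reference) would need to filter the result themselves. The following is a minimal sketch of such a shim, not part of this PR: `parse_paragraphs_compat` and `fake_parse` are hypothetical names, and the sample data is invented so the example runs without an XML file. Only the `reference_ids`, `section`, and `text` keys are taken from the diff and test.

```python
def parse_paragraphs_compat(parse_fn, path, all_paragraph=False):
    """Hypothetical shim that restores the removed all_paragraph switch.

    parse_fn is assumed to behave like the new parse_pubmed_paragraph(path):
    it returns a list of dicts, each carrying a "reference_ids" list.
    """
    paragraphs = parse_fn(path)
    if all_paragraph:
        return paragraphs
    # Old default: keep only paragraphs that cite at least one reference.
    return [p for p in paragraphs if len(p["reference_ids"]) >= 1]


def fake_parse(path):
    # Stand-in for the real parser; the records below are made up.
    return [
        {"section": "Introduction", "text": "Prior work ...", "reference_ids": ["B1", "B2"]},
        {"section": "Methods", "text": "We collected ...", "reference_ids": []},
    ]


referenced_only = parse_paragraphs_compat(fake_parse, "article.nxml")
everything = parse_paragraphs_compat(fake_parse, "article.nxml", all_paragraph=True)
```

In real use, `parse_fn` would presumably be `pubmed_parser.parse_pubmed_paragraph`. Filtering after parsing keeps the parser simple and leaves the noise trade-off to the caller.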

2 changes: 1 addition & 1 deletion tests/test_pubmed_oa_parser.py
@@ -60,7 +60,7 @@ def test_parse_pubmed_paragraph():
     paragraphs = pp.parse_pubmed_paragraph(pubmed_xml_3460867)
     assert isinstance(paragraphs, list)
     assert isinstance(paragraphs[0], dict)
-    assert len(paragraphs) == 29, "Expected number of paragraphs to be 29"
+    assert len(paragraphs) == 58, "Expected number of paragraphs to be 58"
     assert (
         len(paragraphs[0]["reference_ids"]) == 11
     ), "Expected number of references in the first paragraph to be 11"
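
The jump from 29 to 58 follows from the removed filter: the old default returned only paragraphs with at least one reference, so the additional entries should be paragraphs that cite nothing, assuming nothing else in the parser changed. A small tally like the sketch below could confirm that split on real output. `count_by_reference_presence` is a hypothetical helper and the sample list is invented for illustration.

```python
def count_by_reference_presence(paragraphs):
    """Tally paragraphs by whether their "reference_ids" list is non-empty."""
    with_refs = sum(1 for p in paragraphs if p["reference_ids"])
    return {
        "with_references": with_refs,
        "without_references": len(paragraphs) - with_refs,
    }


# Invented stand-in for the list returned by parse_pubmed_paragraph().
sample = [
    {"reference_ids": ["B1"]},
    {"reference_ids": []},
    {"reference_ids": ["B2", "B3"]},
]

counts = count_by_reference_presence(sample)
```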