Commit d1eab02
#15 update example CSV file + fix take into account maxclaims + ignore local dev file

schuellersa committed Jul 26, 2021
1 parent e7cd5d1 commit d1eab02
Showing 4 changed files with 86 additions and 32 deletions.
.gitignore (4 changes: 3 additions & 1 deletion)
@@ -18,11 +18,13 @@ VishvasnewsFactCheckingSiteExtractor_extraction_failed.log
 SnopesFactCheckingSiteExtractor_extraction_failed.log
 EuvsdisinfoFactCheckingSiteExtractor_extraction_failed.log
 PolitifactFactCheckingSiteExtractor_extraction_failed.log
+TruthorfictionFactCheckingSiteExtractor_extraction_failed.log
+CheckyourfactFactCheckingSiteExtractor_extraction_failed.log
 output_dev_fatabyyano.csv
 output_dev_vishvasnews.csv
 output_dev_aap.csv
 output_dev_fullfact.csv
 output_dev_snopes.csv
 output_dev_politifact.csv
-TruthorfictionFactCheckingSiteExtractor_extraction_failed.log
 output_dev_truthorfiction.csv
+output_dev_checkyourfact.csv
claim_extractor/extractors/checkyourfact.py (5 changes: 4 additions & 1 deletion)
@@ -19,11 +19,14 @@ def retrieve_listing_page_urls(self) -> List[str]:
         return ["https://checkyourfact.com/page/1/"]
 
     def find_page_count(self, parsed_listing_page: BeautifulSoup) -> int:
-        count = 26
+        count = 1
         url = "https://checkyourfact.com/page/" + str(count + 1)
         result = caching.get(url, headers=self.headers, timeout=10)
         if result:
             while result:
+                # each page 20 articles:
+                if (((count+1)*20)-20 >= self.configuration.maxClaims):
+                    break
                 count += 1
                 url = "https://checkyourfact.com/page/" + str(count)
                 result = caching.get(url, headers=self.headers, timeout=10)
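
The fix changes the page-count probe in two ways: the counter now starts at 1 instead of a hard-coded 26, and the loop stops early once the pages already counted cover the configured claim limit. Since each Check Your Fact listing page holds 20 articles, the guard ((count+1)*20)-20 >= maxClaims simplifies to count*20 >= maxClaims, i.e. "the first count pages already contain enough articles". A minimal standalone sketch of the same logic, assuming requests in place of the repository's caching.get helper and a plain max_claims argument in place of self.configuration.maxClaims:

import requests

PAGE_SIZE = 20  # each Check Your Fact listing page shows 20 articles

def find_page_count(max_claims: int, timeout: int = 10) -> int:
    """Probe successive listing pages until one stops resolving, or until
    the pages seen so far already cover max_claims articles."""
    count = 1
    url = "https://checkyourfact.com/page/" + str(count + 1)
    result = requests.get(url, timeout=timeout)  # stand-in for caching.get
    while result.ok:
        # The first `count` pages hold count * PAGE_SIZE articles;
        # this is the diff's ((count+1)*20)-20 guard, simplified.
        if count * PAGE_SIZE >= max_claims:
            break
        count += 1
        url = "https://checkyourfact.com/page/" + str(count)
        result = requests.get(url, timeout=timeout)
    return count

Because the cap is checked before the counter is incremented, a max_claims of 20 or less stops the probe at page 1, so the extractor no longer fetches listing pages it will never use.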
