Increase scroll duration
janheinrichmerker committed Nov 21, 2023
1 parent 83677bd commit c26951a
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion archive_query_log/downloaders/warc.py
@@ -116,7 +116,14 @@ def download_serps_warc(config: Config) -> None:
         echo("No new/changed captures.")
         return

-    changed_serps: Iterable[Serp] = changed_serps_search.scan()
+    changed_serps: Iterable[Serp] = (
+        changed_serps_search
+        # Downloading WARCs is very slow, so we keep track
+        # of the Elasticsearch query for a full day, assuming that
+        # 1000 WARCs can be downloaded in 24h.
+        .params(scroll="24h")
+        .scan()
+    )
     changed_serps = safe_iter_scan(changed_serps)
     # noinspection PyTypeChecker
     changed_serps = tqdm(changed_serps, total=num_changed_serps,
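
Note on the chosen duration: the code comment ties the 24h scroll keep-alive to a batch of 1000 WARCs. The sketch below works through that arithmetic. It is not part of the commit. The helper names `min_scroll_hours` and `scroll_param` are hypothetical, and it assumes that the scan fetches hits in batches of 1000 and that the keep-alive only has to outlast the processing of one batch before the next batch is requested.

```python
def min_scroll_hours(batch_size: int, seconds_per_item: int) -> int:
    """Smallest whole number of hours that covers processing one batch.

    Hypothetical helper: assumes the scroll context must stay alive
    for as long as it takes to download every item in a single batch.
    """
    total_seconds = batch_size * seconds_per_item
    # Integer ceiling division, so a partial hour is rounded up.
    return -(-total_seconds // 3600)


def scroll_param(batch_size: int, seconds_per_item: int) -> str:
    """Format the keep-alive as an Elasticsearch-style time value, e.g. "24h"."""
    return f"{min_scroll_hours(batch_size, seconds_per_item)}h"


if __name__ == "__main__":
    # Assumed figures: 1000 items per batch, 86 seconds per WARC download.
    print(scroll_param(1000, 86))
```

Under these assumptions, 24h is enough as long as one WARC download averages no more than about 86 seconds. Slower downloads would need a longer keep-alive or a smaller batch.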