Increase DiskSpaceMonitor default pause threshold to 8 GiB
When less than 5 GiB is free, BDB throws DiskLimitException, which
Heritrix will likely be unable to handle gracefully, and the crawl job
will break in various ways. internetarchive#499
ato committed Jan 23, 2023
1 parent d9f83bb commit 3986b14
Showing 2 changed files with 6 additions and 3 deletions.
@@ -35,7 +35,7 @@ public class DiskSpaceMonitor implements ApplicationListener<ApplicationEvent> {
 private static final Logger logger = Logger.getLogger(DiskSpaceMonitor.class.getName());

 protected List<String> monitorPaths = new ArrayList<String>();
-protected long pauseThresholdMiB = 500;
+protected long pauseThresholdMiB = 8192;
 protected CrawlController controller;
 protected ConfigPathConfigurer configPathConfigurer;
 protected boolean monitorConfigPaths = true;
@@ -624,10 +624,13 @@ http://example.example/example
 -->

 <!-- DISK SPACE MONITOR:
-Pauses the crawl if disk space at monitored paths falls below minimum threshold -->
+Pauses the crawl if disk space at monitored paths falls below minimum threshold
+Note: If there's less than 5 GiB free for state directory BDB will throw
+an error which the crawl job will likely not be able to fully recover from.
+-->
 <!--
 <bean id="diskSpaceMonitor" class="org.archive.crawler.monitor.DiskSpaceMonitor">
-<property name="pauseThresholdMiB" value="500" />
+<property name="pauseThresholdMiB" value="8192" />
 <property name="monitorConfigPaths" value="true" />
 <property name="monitorPaths">
 <list>
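
The reasoning behind the new default can be illustrated with a small, self-contained Java sketch. It is not the actual DiskSpaceMonitor code: the class name DiskSpaceCheckSketch and the helpers isBelowThreshold and firstLowPath are hypothetical, and the comparison in whole MiB is an assumption. It only shows why a 500 MiB threshold can fail to pause a crawl before free space drops under the 5 GiB mark mentioned in the commit message, while 8192 MiB (8 GiB) leaves headroom.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and comparison logic are assumptions,
// not taken from the Heritrix source.
class DiskSpaceCheckSketch {
    static final long BYTES_PER_MIB = 1024L * 1024L;

    /** True when the usable space, expressed in whole MiB, is below the threshold. */
    static boolean isBelowThreshold(long usableBytes, long pauseThresholdMiB) {
        return usableBytes / BYTES_PER_MIB < pauseThresholdMiB;
    }

    /**
     * Returns the first monitored path whose usable space is below the threshold,
     * or null if every path has enough room. File.getUsableSpace() returns 0 for
     * a path that does not name a partition, so such a path counts as "low" here.
     */
    static String firstLowPath(List<String> monitorPaths, long pauseThresholdMiB) {
        for (String path : monitorPaths) {
            long usableBytes = new File(path).getUsableSpace();
            if (isBelowThreshold(usableBytes, pauseThresholdMiB)) {
                return path;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // 4 GiB free: already under the 5 GiB limit described in the commit message.
        long fourGiB = 4L * 1024L * BYTES_PER_MIB;

        // Old default: 4096 MiB is not below 500 MiB, so no pause would be triggered.
        System.out.println("threshold 500:  " + isBelowThreshold(fourGiB, 500));
        // New default: 4096 MiB is below 8192 MiB, so the check would request a pause.
        System.out.println("threshold 8192: " + isBelowThreshold(fourGiB, 8192));

        // Check the current working directory against the new default.
        System.out.println("low path: " + firstLowPath(Arrays.asList("."), 8192));
    }
}
```

Under these assumptions, a volume with 4 GiB free passes the old 500 MiB check even though it is already below the 5 GiB limit, whereas the 8192 MiB check flags it.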