I'm crawling with a somewhat less polite configuration than the default, and I keep getting the following warning (1,971 times in the 12 hours I've been crawling):
Mar 27, 2023 10:43:53 AM org.archive.modules.CrawlURI getPolitenessDelay
WARNING: politessDelay unset, returning default 5000 for https://www.unidavi.edu.br/fiqueAtento/2023/3/pedidos-vagas-1-2023-fora-do-prazo-07 (in thread 'ToeThread #163: https://www.unidavi.edu.br/fiqueAtento/2023/3/pedidos-vagas-1-2023-fora-do-prazo-07')
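The warning text reads like a getter that falls back to a hard-coded 5000 ms when no delay was ever stored for that URI. The sketch below shows that pattern, assuming the delay is kept in a field initialised to a sentinel; the class and method names are hypothetical and this is not Heritrix's actual `CrawlURI` code. If that reading is right, the fallback value does not come from `minDelayMs`/`maxDelayMs` at all.

```java
import java.util.logging.Logger;

/**
 * Hypothetical stand-in for a per-URI record that carries a politeness delay.
 * Illustrative only; this is not Heritrix's CrawlURI API.
 */
class PolitenessDelayHolder {
    private static final Logger LOGGER = Logger.getLogger(PolitenessDelayHolder.class.getName());

    /** Sentinel meaning "no component has computed a delay for this URI". */
    static final long UNSET = -1L;
    /** Hard-coded fallback, matching the 5000 in the logged warning. */
    static final long DEFAULT_DELAY_MS = 5000L;

    private final String uri;
    private long politenessDelayMs = UNSET;

    PolitenessDelayHolder(String uri) {
        this.uri = uri;
    }

    void setPolitenessDelayMs(long delayMs) {
        this.politenessDelayMs = delayMs;
    }

    long getPolitenessDelayMs() {
        if (politenessDelayMs == UNSET) {
            // Nothing stored a delay: warn and fall back to the fixed default,
            // ignoring any configured min/max bounds.
            LOGGER.warning("politeness delay unset, returning default " + DEFAULT_DELAY_MS + " for " + uri);
            return DEFAULT_DELAY_MS;
        }
        return politenessDelayMs;
    }

    public static void main(String[] args) {
        PolitenessDelayHolder computed = new PolitenessDelayHolder("https://example.com/a");
        computed.setPolitenessDelayMs(2000L);
        System.out.println(computed.getPolitenessDelayMs()); // prints 2000

        PolitenessDelayHolder skipped = new PolitenessDelayHolder("https://example.com/b");
        System.out.println(skipped.getPolitenessDelayMs()); // prints 5000 (and logs a WARNING)
    }
}
```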
Is this expected? The politeness-related settings I've changed are:
<bean id="fetchHttp" class="org.archive.modules.fetcher.FetchHTTP">
    <!-- <property name="timeoutSeconds" value="1200" /> -->
    <property name="timeoutSeconds" value="300" /> <!-- 5 min -->
</bean>
<bean id="disposition" class="org.archive.crawler.postprocessor.DispositionProcessor">
    <!-- <property name="delayFactor" value="5.0" /> -->
    <property name="delayFactor" value="2.0" />
    <!-- <property name="minDelayMs" value="3000" /> -->
    <property name="minDelayMs" value="1000" /> <!-- 1 sec -->
    <!-- <property name="respectCrawlDelayUpToSeconds" value="300" /> -->
    <property name="respectCrawlDelayUpToSeconds" value="100" />
    <!-- <property name="maxDelayMs" value="30000" /> -->
    <property name="maxDelayMs" value="10000" /> <!-- 10 sec -->
</bean>
<bean id="frontier" class="org.archive.crawler.frontier.BdbFrontier">
    <!-- <property name="snoozeLongMs" value="300000" /> -->
    <property name="snoozeLongMs" value="250000" /> <!-- ~4.2 min -->
    <!-- <property name="retryDelaySeconds" value="900" /> -->
    <property name="retryDelaySeconds" value="300" /> <!-- 5 min -->
    <!-- <property name="maxRetries" value="30" /> -->
    <property name="maxRetries" value="3" /> <!-- should be increased for large crawls (e.g. months) -->
</bean>
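For reference, here is a minimal sketch of how the `DispositionProcessor` settings above plausibly combine, assuming: delay = `delayFactor` × last fetch duration, clamped to [`minDelayMs`, `maxDelayMs`], then raised to honor a robots.txt Crawl-delay capped at `respectCrawlDelayUpToSeconds`. These semantics are inferred from the property names, not taken from Heritrix's source; the class, method, and parameter names are hypothetical.

```java
/**
 * Hypothetical sketch of how delayFactor, minDelayMs, maxDelayMs and
 * respectCrawlDelayUpToSeconds could combine. Assumed semantics, not Heritrix's code.
 */
class PolitenessDelaySketch {

    static long delayMs(long fetchDurationMs, double delayFactor,
                        long minDelayMs, long maxDelayMs,
                        long robotsCrawlDelaySec, long respectCrawlDelayUpToSeconds) {
        // Scale the time the last fetch took.
        long delay = (long) (delayFactor * fetchDurationMs);
        // Clamp into [minDelayMs, maxDelayMs].
        delay = Math.max(delay, minDelayMs);
        delay = Math.min(delay, maxDelayMs);
        // Honor a robots.txt Crawl-delay (seconds, 0 if absent), but only up to the cap.
        // Assumption: this step may push the delay above maxDelayMs.
        long honoredCrawlDelayMs = Math.min(robotsCrawlDelaySec * 1000L,
                                            respectCrawlDelayUpToSeconds * 1000L);
        return Math.max(delay, honoredCrawlDelayMs);
    }

    public static void main(String[] args) {
        // Values from the DispositionProcessor bean above.
        double factor = 2.0;
        long min = 1000, max = 10000, respectUpToSec = 100;
        System.out.println(delayMs(300, factor, min, max, 0, respectUpToSec));    // 1000 (floor)
        System.out.println(delayMs(3000, factor, min, max, 0, respectUpToSec));   // 6000
        System.out.println(delayMs(8000, factor, min, max, 0, respectUpToSec));   // 10000 (cap)
        System.out.println(delayMs(3000, factor, min, max, 30, respectUpToSec));  // 30000 (robots)
        System.out.println(delayMs(3000, factor, min, max, 200, respectUpToSec)); // 100000 (robots, capped)
    }
}
```

Under these assumptions, the settings give a 1 s floor for fast fetches, twice the fetch time in between, and a 10 s ceiling once a fetch takes longer than 5 s. None of that would apply to a URI that hits the 5000 ms fallback in the warning above.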
Thank you!