jensmeichler opened this issue
Jan 9, 2025
· 2 comments
Labels
feature: Issues that represent new features or improvements to existing features.
t-tooling: Issues with this label are in the ownership of the tooling team.
Which package is the feature request for? If unsure which one to select, leave blank
@crawlee/memory-storage
Feature
Sometimes we want to test our crawler against static HTML files (for a test mode). It would therefore be beneficial to support 'file' as a protocol in addition to 'http' and 'https'.
I guess there are more use cases for this feature, but this is the only one I have 😅
Motivation
I am currently trying to build a crawler that crawls a static HTML page. It would make testing easier to load the page via file:// directly instead of having to serve it over http or https.
Ideal solution or implementation, and any additional constraints
Could be easily changed in packages/memory-storage/src/resource-clients/request-queue.ts:22.
Alternative solutions or implementations
No response
Other context
No response
It's a bit more complicated, since the HTTP client we use by default (got-scraping) won't work with a file:// URL either.
The usual solution to this is to start a local web server, which you can easily do e.g. via npx http-server -o /path/to/static/content or a similar dependency, and then scrape from localhost instead.
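If you would rather not add a dependency for tests, the same idea can be sketched with Node's built-in http module. This is only a minimal sketch, not something Crawlee ships: the helper names `resolveFixturePath` and `startFixtureServer` and the content-type table are made up for illustration, and it assumes your fixtures live in a single directory with an index.html as the entry page.

```typescript
import { createServer, type Server } from 'node:http';
import { readFile } from 'node:fs/promises';
import { extname, resolve, sep } from 'node:path';
import type { AddressInfo } from 'node:net';

// Minimal content-type table (an assumption; extend it for your fixtures).
const CONTENT_TYPES: Record<string, string> = {
    '.html': 'text/html; charset=utf-8',
    '.css': 'text/css; charset=utf-8',
    '.js': 'text/javascript; charset=utf-8',
    '.json': 'application/json; charset=utf-8',
};

// Hypothetical helper: maps a request URL to a file inside rootDir.
// Returns null when the URL is malformed or would escape rootDir.
function resolveFixturePath(rootDir: string, requestUrl: string): string | null {
    const root = resolve(rootDir);
    let pathname: string;
    try {
        pathname = decodeURIComponent(new URL(requestUrl, 'http://localhost').pathname);
    } catch {
        return null;
    }
    // Directory-style URLs fall back to index.html.
    if (pathname.endsWith('/')) pathname += 'index.html';
    const target = resolve(root, `.${pathname}`);
    return target.startsWith(root + sep) ? target : null;
}

// Hypothetical helper: serves rootDir on a random free port on 127.0.0.1.
async function startFixtureServer(rootDir: string): Promise<{ url: string; close: () => Promise<void> }> {
    const server: Server = createServer(async (req, res) => {
        const filePath = resolveFixturePath(rootDir, req.url ?? '/');
        if (filePath === null) {
            res.writeHead(403).end('Forbidden');
            return;
        }
        try {
            const body = await readFile(filePath);
            const type = CONTENT_TYPES[extname(filePath)] ?? 'application/octet-stream';
            res.writeHead(200, { 'content-type': type }).end(body);
        } catch {
            // Missing or unreadable files are reported as 404.
            res.writeHead(404).end('Not found');
        }
    });
    // Port 0 asks the OS for any free port.
    await new Promise<void>((done) => server.listen(0, '127.0.0.1', done));
    const { port } = server.address() as AddressInfo;
    return {
        url: `http://127.0.0.1:${port}/`,
        close: () => new Promise<void>((done, fail) => server.close((err) => (err ? fail(err) : done()))),
    };
}
```

In a test you would await `startFixtureServer('./fixtures')`, hand the returned `url` to the crawler as the start URL (for example via `crawler.run([url])`), and call `close()` afterwards. Since the pages are then served over plain http, neither the request queue nor the HTTP client needs to know about file://.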