feat: ignore HTTPS certificate errors (#72)
* feat: ignore SSL and HTTPS errors

* docs: update CHANGELOG.md
Patai5 authored Aug 12, 2024
1 parent 60077dd commit 5ff9938
Showing 2 changed files with 4 additions and 3 deletions.
4 changes: 2 additions & 2 deletions code/src/crawler.ts
@@ -11,8 +11,8 @@ export const createCrawler = async (config: Config) => {
     const crawler = new PlaywrightCrawler({
         launchContext: {
             launchOptions: {
-                // TODO: Just headless
-                headless: true,
+                /** We intentionally ignore these errors, because some broken websites would otherwise not be scraped */
+                args: ['--ignore-certificate-errors'],
             },
         },
         /**
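For readers who want to toggle this behaviour rather than hard-code it, the change above could be sketched as a small helper. This is only an illustration: `buildLaunchOptions`, `LaunchOptionsSketch`, and `IGNORE_CERT_ERRORS_FLAG` are hypothetical names that do not appear in the commit, and the sketch deliberately avoids importing Crawlee so it stays self-contained. The only value taken from the diff is the Chromium flag `--ignore-certificate-errors`.

```typescript
// Hypothetical sketch, not part of the commit: builds the `launchOptions`
// object that the diff passes under `launchContext`.

// Minimal local shape; the real Playwright launch options type has many more fields.
interface LaunchOptionsSketch {
    args: string[];
}

// The Chromium command-line flag added by the commit.
const IGNORE_CERT_ERRORS_FLAG = '--ignore-certificate-errors';

function buildLaunchOptions(
    ignoreCertificateErrors: boolean,
    extraArgs: string[] = [],
): LaunchOptionsSketch {
    // Copy so the caller's array is never mutated.
    const args = extraArgs.slice();
    // Add the flag once, and only when requested.
    if (ignoreCertificateErrors && args.indexOf(IGNORE_CERT_ERRORS_FLAG) === -1) {
        args.push(IGNORE_CERT_ERRORS_FLAG);
    }
    return { args };
}

// Intended use, assuming the PlaywrightCrawler setup shown in the diff:
//   new PlaywrightCrawler({
//       launchContext: { launchOptions: buildLaunchOptions(true) },
//   });
```

Keeping the flag behind a boolean would make it easier to turn certificate checks back on for runs where invalid certificates should be treated as failures; the commit itself enables the flag unconditionally.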
3 changes: 2 additions & 1 deletion shared/CHANGELOG.md
@@ -1,10 +1,11 @@
 This changelog tracks updates to both GPT Scraper and Extended GPT Scraper actors.
 
-# 2024-07-30
+# 2024-08-12
 *Features*
 - Added support for GPT-4o-mini model. (Extended GPT scraper)
 - Set this model as the default one for the *Pay Per Result* scraper with a set token limit.
 - With this, the maximum token limit for the *Pay Per Result* scraper was increased by 150%.
+- Ignore HTTPS errors, which will allow the scraper to work on broken websites with invalid certificates.
 
 *Fixes*
 - Fixed concurrency scaling issues that were causing the Actor to fail due to scaling too quickly.