Does the scraper actually support basic auth? #78
Basically I'd expect that the scraper would:
I would also hope redirects would be respected. For example, the domain I'm trying to reach internally redirects to an opaque domain (a Cloudflare Worker), so I'd need these headers to be sent there as well.
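Some context for the redirect concern: Scrapy scopes Basic-auth credentials to the `http_auth_domain` spider attribute and its subdomains. It also documents `None` as "authenticate every request", at the risk of leaking credentials to unrelated domains. The sketch below approximates that host check with the standard library only. It is not Scrapy's code: `in_auth_scope` is a hypothetical helper and the URLs are placeholders. Under this rule, a redirect target on an unrelated domain falls outside the scope.

```python
from typing import Optional
from urllib.parse import urlparse


def in_auth_scope(url: str, auth_domain: Optional[str]) -> bool:
    """Approximate the host check applied before Basic-auth credentials are attached.

    auth_domain=None means "every domain" (credentials may leak to third parties).
    Otherwise the host must equal auth_domain or be one of its subdomains.
    """
    if auth_domain is None:
        return True
    host = (urlparse(url).hostname or "").lower()
    domain = auth_domain.lower().strip(".")
    return host == domain or host.endswith("." + domain)


# Placeholder URLs: the protected site, a subdomain, and an opaque redirect target.
for url in (
    "https://example.com/docs/",
    "https://docs.example.com/guide/",
    "https://some-worker.example.net/docs/",
):
    print(url, in_auth_scope(url, "example.com"))
```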
The documentation is correct and the authentication is actually working; it's just happening behind the scenes through Scrapy's built-in `HttpAuthMiddleware`. When you set these properties on a spider:

```python
spider.http_user = "testuser"
spider.http_pass = "testpass"
spider.http_auth_domain = "example.com"
```
You don't see this explicitly in the codebase because it's handled by Scrapy's middleware pipeline. The environment variables are read and set as spider attributes, and Scrapy's built-in auth middleware then uses them to add the proper `Authorization` header. I've also introduced a request interceptor middleware that logs the headers of each request, so you can check it yourself. So your authentication should work as expected (see the recent test addition). Let me know if you need any clarification or have additional questions.
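A header-logging downloader middleware along the lines described above might look like the sketch below. It is not the project's actual interceptor. The class name `AuthHeaderLoggingMiddleware` and the module path in the docstring are hypothetical. Scrapy downloader middlewares need no base class; returning `None` from `process_request(request, spider)` lets the request continue. Because `process_request` runs in increasing priority order and the built-in `HttpAuthMiddleware` sits at 300, registering this one above 300 should let it see the header after it has been added. Redirected requests pass through the downloader middlewares again, so the log should also show whether a redirect target received the header.

```python
class AuthHeaderLoggingMiddleware:
    """Log whether each outgoing request carries an Authorization header.

    Enable in settings.py (hypothetical module path):
        DOWNLOADER_MIDDLEWARES = {
            "scraper.middlewares.AuthHeaderLoggingMiddleware": 310,
        }
    """

    def process_request(self, request, spider):
        # Scrapy header values are bytes; log only the scheme, never the credentials.
        auth = request.headers.get(b"Authorization")
        if auth:
            scheme = auth.split(b" ", 1)[0].decode("latin-1")
            spider.logger.debug("%s -> Authorization: %s <redacted>", request.url, scheme)
        else:
            spider.logger.debug("%s -> no Authorization header", request.url)
        return None  # continue normal processing
```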
Thank you for the clarification 👍
The documentation mentions two env variables:

- `DOCSEARCH_BASICAUTH_USERNAME`
- `DOCSEARCH_BASICAUTH_PASSWORD`
I was looking through the code to figure out how they are encoded into `Authorization` headers so I can set up my private internal site correctly. However, the only mention I see is in the `documentation_spider.py` file. This file reads the environment variables but does not seem to do anything with them. I see these are class properties assigned to `http_user` and `http_pass`, so I tried searching the codebase for them, but did not find anything. Am I right to assume that this does not actually work? Is the documentation lying, or did I miss some important piece?
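To make the encoding question concrete: Basic authentication (RFC 7617) sends `Authorization: Basic <base64("username:password")>`. The sketch below reads the two documented environment variables and builds that header value with the standard library. It approximates what Scrapy's `HttpAuthMiddleware` does with `http_user`/`http_pass` and is not the project's code. The function name `basic_auth_from_env` is hypothetical, and it assumes ASCII credentials, so the byte encoding choice does not matter.

```python
import base64
import os
from typing import Mapping, Optional


def basic_auth_from_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Build a Basic Authorization header value from the documented env vars.

    Missing variables are treated as empty strings; returns None only when
    both are empty, i.e. no credentials were configured.
    """
    user = environ.get("DOCSEARCH_BASICAUTH_USERNAME", "")
    password = environ.get("DOCSEARCH_BASICAUTH_PASSWORD", "")
    if not (user or password):
        return None
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


print(basic_auth_from_env({
    "DOCSEARCH_BASICAUTH_USERNAME": "testuser",
    "DOCSEARCH_BASICAUTH_PASSWORD": "testpass",
}))
# Basic dGVzdHVzZXI6dGVzdHBhc3M=
```

Decoding the token after `Basic ` with any base64 tool gives back `testuser:testpass`. This is a quick way to check what a debug log or proxy actually saw.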
Any clarification would help. Meanwhile, I hope #76 gets merged; that way I could specify the `Authorization` header directly without relying on the implementation.