arXiv.org PDFs denying access to User-Agent "getpapers/(TDM Crawler [email protected])" #167
Comments
We have clearly made a mistake here. I imagine that we're hammering them too hard/not following a delay between requests etc... Probably the answer is to
Also: did we get an email to [email protected]? |
...or simply spoof the UserAgent header with some innocent string and move on. :-) |
I also bumped into this issue just now. A pity... |
Same problem here. It's important to keep the arXiv downloader working, thus I suggest
For starters, crawl delays of at least 15 seconds must be introduced.
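A crawl delay like the one suggested above could be sketched roughly as follows. This is a minimal illustration in Python, not getpapers code: the `CrawlDelay` class and its `wait` method are hypothetical names, and the 15-second default is simply the figure proposed in this thread, not a value confirmed by arXiv.

```python
import time
from typing import Callable, Optional


class CrawlDelay:
    """Hypothetical helper: enforce a minimum gap between consecutive requests."""

    def __init__(
        self,
        min_delay: float = 15.0,  # assumption: the figure proposed in this thread
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was released

    def wait(self) -> float:
        """Pause until min_delay has elapsed since the previous call.

        Returns the number of seconds actually slept.
        """
        pause = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            pause = max(0.0, self.min_delay - elapsed)
            if pause > 0:
                self._sleep(pause)
        # Record the release time after any sleep, so the next gap is measured from here.
        self._last = self._clock()
        return pause
```

Calling `wait()` once before every PDF request keeps requests at least `min_delay` seconds apart; the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.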
Yes and yes. It is a bit strange that arXiv discourages automated access to /api, but this is probably (?) a bug. |
OK I will write to Paul. |
@petermr actually, you might be better off emailing the lead software architect at arXiv (Erick Peirson). I've found him to be quite helpful & communicative: [email protected] https://erickpeirson.github.io/ |
Thanks Ross. |
Just to re-iterate that arXiv will block your IP (or your employer's) if you use the getpapers API as is to try and download PDFs. May I suggest switching off the PDF download for now until a refactoring to conform with the current API guidelines is in place? |
On Fri, Aug 9, 2019 at 10:39 AM Stephan Druskat <***@***.***> wrote:
> Just to re-iterate that arXiv will block your IP (or your employer's) if
> you use the getpapers API as is to try and download PDFs.
Thank you. I'll add a comment
> May I suggest switching off the PDF download for now until a refactoring
> to conform with the current API guidelines is in place?
--
Peter Murray-Rust
Founder ContentMine.org
and
Reader Emeritus in Molecular Informatics
Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
|
See #166 (comment). Note that
User-Agent: getpapers/TDM
seems to be working (for me) again (for now).
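For anyone wanting to try the header mentioned above outside getpapers, here is a rough Python sketch using only the standard library. The `build_request` function is a hypothetical name, the URL is a placeholder, and nothing here reflects how getpapers itself sets the header; it only builds a request object and does not contact any server.

```python
import urllib.request


def build_request(url: str, user_agent: str = "getpapers/TDM") -> urllib.request.Request:
    """Hypothetical helper: attach an explicit User-Agent header to a request.

    The default value is the string reported as working in this thread;
    whether it keeps working is not guaranteed.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL for illustration only; no request is sent here.
req = build_request("https://example.org/paper.pdf")
```

Actually sending the request (for example with `urllib.request.urlopen(req)`) should still be combined with a delay between requests, as discussed earlier in the thread.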