Gist Searching in GitHub Provider is now Rate Limited (and doesn't appear to be affected by OAuth authentication) #823
Comments
This is a critical issue for all social identity discovery systems. |
Sustainable solution is to switch to using the Github gist API, and using This means the graph has to be extended with metadata like the |
Technically this could be defined by the |
I think GitHub is basically preventing scraping the gists. |
We should also take into account this information as part of the metadata of any given provider, so we can track how often we hit the API: https://docs.github.com/en/rest/rate-limit/rate-limit?apiVersion=2022-11-28 |
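A minimal sketch of what tracking this could look like, assuming the provider keeps a small metadata record built from GitHub's documented `x-ratelimit-*` response headers. The `RateLimitInfo` type and both function names are hypothetical and not part of the existing provider interface; header keys are assumed to be lower-cased already.

```typescript
// Hypothetical per-provider rate-limit metadata (not an existing Polykey type).
type RateLimitInfo = {
  limit: number; // Maximum requests allowed in the current window
  remaining: number; // Requests left in the current window
  resetAt: Date; // When the current window resets
};

// Parses GitHub's `x-ratelimit-limit`, `x-ratelimit-remaining` and
// `x-ratelimit-reset` (UTC epoch seconds) headers.
// Assumes header names are lower-cased. Returns undefined if any of the
// three headers is missing or not numeric.
function parseRateLimitHeaders(
  headers: Record<string, string | undefined>,
): RateLimitInfo | undefined {
  const limit = Number(headers['x-ratelimit-limit']);
  const remaining = Number(headers['x-ratelimit-remaining']);
  const resetEpochSeconds = Number(headers['x-ratelimit-reset']);
  if (!isFinite(limit) || !isFinite(remaining) || !isFinite(resetEpochSeconds)) {
    return undefined;
  }
  return { limit, remaining, resetAt: new Date(resetEpochSeconds * 1000) };
}

// Milliseconds to wait before the window resets; 0 if it has already reset.
function msUntilReset(info: RateLimitInfo, now: Date): number {
  return Math.max(0, info.resetAt.getTime() - now.getTime());
}
```

A provider could store the latest `RateLimitInfo` alongside its other metadata and delay the next call by `msUntilReset` whenever `remaining` reaches 0. Note that this only covers the primary rate limit; the secondary limit described in the issue may need separate handling.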
To get around this atm, you can also do |
Since the provider requires "state" to keep track of things, the interface needs to be updated to support arbitrary state for the providers. This can be as simple as supporting an arbitrary JSON value that the provider provides the type for. |
So, I need to implement a new field 'state' for the GitHub identity provider. In this case, it would store the 'since' as a cursor of sorts, and we can use that combined with the gists REST API as a robust solution for searching for gists. This 'since' can become a field of the state, and the identity provider must be able to validate any provided state. Did I understand this correctly, or am I missing something? |
Well, I think it needs to be made generic, so that other providers can be developed along similar guidelines in the future. That's why I'm saying you'd want to have a |
And then the abstract provider class would have to provide default implementations of the methods for persisting the state. |
Remember this is like a "resumable" task if you think about it. |
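To make the discussion above concrete, here is a rough sketch of a generic, resumable provider state, under several assumptions: `JSONValue`, `StateStore`, `Provider`, `GithubProvider` and all method names are hypothetical and do not reflect the actual Polykey interfaces. The only real API detail used is the `since` query parameter and the `updated_at` field of the "List gists for a user" REST endpoint.

```typescript
// Any JSON-compatible value, so provider state can be persisted as-is.
type JSONValue =
  | null | boolean | number | string
  | JSONValue[]
  | { [key: string]: JSONValue };

// Hypothetical persistence backend (e.g. a DB level keyed by provider ID).
interface StateStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Hypothetical abstract provider: each provider supplies the state type and
// its validation; persistence has a default implementation.
abstract class Provider<State extends JSONValue> {
  protected readonly id: string;
  protected readonly store: StateStore;
  constructor(id: string, store: StateStore) {
    this.id = id;
    this.store = store;
  }
  protected abstract defaultState(): State;
  // Must throw if the value is not a valid state for this provider.
  protected abstract validateState(value: JSONValue): State;

  loadState(): State {
    const raw = this.store.get(this.id);
    if (raw === undefined) return this.defaultState();
    return this.validateState(JSON.parse(raw) as JSONValue);
  }
  saveState(state: State): void {
    this.store.set(this.id, JSON.stringify(state));
  }
}

// The GitHub state is just the `since` timestamp acting as a cursor.
type GithubState = { since: string | null };

class GithubProvider extends Provider<GithubState> {
  protected defaultState(): GithubState {
    return { since: null };
  }
  protected validateState(value: JSONValue): GithubState {
    if (typeof value !== 'object' || value === null || Array.isArray(value)) {
      throw new Error('GitHub provider state must be an object');
    }
    const since = value['since'];
    if (since === null) return { since: null };
    if (typeof since !== 'string' || isNaN(Date.parse(since))) {
      throw new Error('Invalid `since` cursor');
    }
    return { since };
  }
  // URL for listing a user's gists updated after the cursor.
  listGistsUrl(username: string, state: GithubState): string {
    const base = `https://api.github.com/users/${encodeURIComponent(username)}/gists`;
    return state.since === null
      ? base
      : `${base}?since=${encodeURIComponent(state.since)}`;
  }
  // Moves the cursor to the latest `updated_at` seen so far.
  advance(state: GithubState, gists: { updated_at: string }[]): GithubState {
    let latest = state.since;
    for (const gist of gists) {
      if (latest === null || Date.parse(gist.updated_at) > Date.parse(latest)) {
        latest = gist.updated_at;
      }
    }
    return { since: latest };
  }
}
```

With this shape, a discovery task can call `loadState`, fetch the URL from `listGistsUrl`, process the returned gists, then `advance` and `saveState`, so an interrupted run resumes from the last saved cursor instead of re-indexing every gist.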
Describe the bug
In our discovery system, we use the getIdentityData method from the GitHub provider, which will look up gists via https://gist.github.com/search.
It seems recently this now has some secondary rate limit applied (https://docs.github.com/en/rest/using-the-rest-api/troubleshooting-the-rest-api?apiVersion=2022-11-28), which is not solvable even with authenticated requests. Atm it is done unauthenticated, because it's basically a public page that we index over.
Gists are not currently searchable via the official GitHub API, so it seems that gist search has basically become impossible to index programmatically now. This is pretty bad. Especially because it's a secondary rate limit.
I tried doing things like:
But no use, it's just 429 too many requests.
The only other option right now is to switch to using the API for gists. Because there's no search feature, you basically have to index over all gists via the API, but we could use the since parameter to do this efficiently without repeating work: https://docs.github.com/en/rest/gists/gists?apiVersion=2022-11-28#list-gists-for-a-user. Effectively we would only go over the new gists representing new claims. The timestamp acts like a cursor.
To Reproduce
WARN:polykey.PolykeyAgent.task v0pocinl3mpo0195g4m2kd1t8k0:Failed - Reason: ErrorProviderCall: Provider responded with 429 Too Many Requests
show up in the agent logs.
Expected behavior
Discovery needs to work as normal, without running into rate-limit errors.
Screenshots
Notify maintainers
@tegefaulkes