Use streaming request & parser API in validate_manifest_uris #38

Open
mtratsiuk opened this issue Jun 1, 2020 · 1 comment
@mtratsiuk
Contributor

validate_manifest_uris validates the format of the files referenced by the groundtruth_uri and taskdata_uri fields of a manifest. Currently it downloads each file in full and only then runs validation.

These files can be quite large, so we could reduce validation time and memory consumption by making a streaming request and feeding the chunks into a streaming JSON parser. Here is a potential solution using the ijson library: https://github.com/hCaptcha/hmt-basemodels/blob/30-add-gt-models/basemodels/streaming_json.py
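
To make the idea concrete, here is a rough sketch of chunk-by-chunk validation. It is not the code from the linked branch: the linked solution uses ijson, while this version swaps in a standard-library-only approach (`json.JSONDecoder.raw_decode` over a growing buffer) so it runs without extra dependencies. The names `iter_array_items` and `validate_stream`, the assumption that the file is a top-level JSON array, and the `task_key` check in the usage note are all hypothetical.

```python
import codecs
import json
from typing import Callable, Iterable, Iterator

_WS = " \t\r\n"


def iter_array_items(chunks: Iterable[bytes]) -> Iterator[object]:
    """Yield elements of a top-level JSON array as soon as each is complete.

    Only the current, not yet complete element is kept in memory.
    """
    utf8 = codecs.getincrementaldecoder("utf-8")()  # handles split multibyte chars
    decoder = json.JSONDecoder()
    buf, state = "", "start"  # start -> first -> delim <-> value -> done
    for chunk in chunks:
        buf += utf8.decode(chunk)
        while True:
            buf = buf.lstrip(_WS)
            if not buf:
                break
            if state == "done":
                raise ValueError("unexpected data after end of array")
            if state == "start":
                if buf[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                buf, state = buf[1:], "first"
            elif state == "first" and buf[0] == "]":
                buf, state = buf[1:], "done"
            elif state in ("first", "value"):
                try:
                    item, end = decoder.raw_decode(buf)
                except json.JSONDecodeError:
                    break  # element is incomplete so far: wait for more data
                if end == len(buf) or buf[end] not in ",]" + _WS:
                    break  # e.g. a number split across chunks: wait for more
                yield item
                buf, state = buf[end:], "delim"
            else:  # state == "delim"
                if buf[0] == ",":
                    buf, state = buf[1:], "value"
                elif buf[0] == "]":
                    buf, state = buf[1:], "done"
                else:
                    raise ValueError("expected ',' or ']' between elements")
    if state != "done":
        raise ValueError("truncated or malformed JSON array")


def validate_stream(chunks: Iterable[bytes],
                    validate_item: Callable[[object], None]) -> int:
    """Run validate_item on every element; return how many were checked."""
    count = 0
    for item in iter_array_items(chunks):
        validate_item(item)  # expected to raise on an invalid entry
        count += 1
    return count
```

With requests, the chunks could come from `requests.get(url, stream=True).iter_content(chunk_size=8192)`, and `validate_item` could be something as simple as a function that raises when an entry is not a dict containing `task_key`. One trade-off of this stand-in: an incomplete element is re-parsed each time a new chunk arrives, so an event-based parser such as ijson is likely the better fit for files with very large individual entries.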


@mtratsiuk mtratsiuk added the enhancement New feature or request label Jun 1, 2020
@mtratsiuk mtratsiuk self-assigned this Jun 1, 2020
@gaieges
Contributor

gaieges commented Jun 1, 2020

Neat. Let's get the validation rolled out and see how much of a pain point the non-streaming approach is.

cc: @e271828- as well.
