Metadata download fails on particular file #10

Open
justingist opened this issue Oct 13, 2016 · 13 comments

Comments

@justingist

After downloading 123 metadata folders, it just keeps failing on the next one... I know I should have a lot more than this. Is there a way I can bypass the one that is failing, or go back to it later? I cleared out the directory and started again from scratch, and it produced the same result. It will stay on this one for hours on end, with no end in sight.

@justingist
Author

Update: I created some dummy files and now it is going along nicely again... will have to see what happens when I try to download the photos. I figured 123 was a bit low when I have 1,857 moments (372,500 photos)...

@deadcyclo
Owner

Several people are having similar issues. Obviously the data quality Narrative has isn't very good. It seems like a lot of people have pointers to missing (I'm guessing deleted) data. Great to hear you got it working again. Alternatively, you could have opened the moments/page-1.json file, found the reference to the failing moment, and deleted it.
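
If anyone wants to do that cleanup with a script rather than by hand, something along these lines should work. This is only a rough, untested sketch: it assumes page-1.json has a top-level "results" list where each moment carries a "uuid" key, which may not exactly match the real file layout, and BAD_UUID is a placeholder you'd fill in yourself.

```python
import json

# Hypothetical example: drop one known-bad moment from moments/page-1.json.
# Assumes the file has a top-level "results" list of moment objects with a "uuid" key.
BAD_UUID = "put-the-failing-moment-uuid-here"

with open("moments/page-1.json") as f:
    page = json.load(f)

# Keep every moment except the failing one.
page["results"] = [m for m in page.get("results", []) if m.get("uuid") != BAD_UUID]

with open("moments/page-1.json", "w") as f:
    json.dump(page, f)
```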

What were the names of the files you created dummy files for? I'll try to determine whether you should delete the dummy files, the whole directory, or nothing at all to avoid issues when downloading the photos.

@justingist
Author

I should add that I think I got a 404 when I tried to open the file in Chrome, but I wasn't sure if that was just because I wasn't using a REST client...

@deadcyclo
Owner

Ah... Position files shouldn't be an issue; they aren't used during image download at all...

@thederan

I potentially have tons of broken moments (1.3 million photos, not sure how many moments). I tried deleting the entries from page-1.json, but there are just more and more moments that it keeps retrying forever, so manually deleting them seems futile. Can we change the script so that it gives up trying and moves on to the next moment?

@deadcyclo
Owner

Is it always the position that fails?

@deadcyclo
Owner

@thederan Could you try the code in the allow-failure branch? I tried to make a version that would skip after some retries, but I can't test it right now. Test it, and let me know if it works.
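
The rough idea is just to cap the retries and move on instead of looping forever. Something like the sketch below (not the exact branch code; `download_one` and `RETRY_LIMIT` are illustrative names standing in for whatever helper actually performs a single request):

```python
import logging

RETRY_LIMIT = 10  # the allow-failure branch caps retries; lower it to give up sooner

def fetch_with_skip(download_one, url, dest_path):
    """Try a download a few times, then give up and move on.

    `download_one` is a placeholder for whatever function performs a single
    request and writes the result; it is assumed to raise on failure.
    """
    for attempt in range(1, RETRY_LIMIT + 1):
        try:
            return download_one(url, dest_path)
        except Exception as exc:
            logging.warning("attempt %d/%d failed for %s: %s",
                            attempt, RETRY_LIMIT, url, exc)
    logging.error("giving up on %s after %d attempts, skipping", url, RETRY_LIMIT)
    return None
```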

@justingist
Author

Just to update: I only had an issue with the one file, so the rest of the metadata completed just fine. Currently downloading using an 8-core Azure VM... so far I think I've got 40 GB of photos in!

@thederan

@deadcyclo Yes, always failing at /moments/.../positions/?limit=1500

Thanks for the other branch! I reduced the retries from 10 to 2, and after a number of further failures it finally got to a point where the moments weren't failing anymore.

Does the 1500 limit mean that if I have more moments than that, they will get cut off?

@deadcyclo
Owner

@thederan Great.

No. Basically, you can tell them how many results you want for a single request, and 1500 is the maximum number of results allowed in a single request on their side. So if you have more than 1500 moments, the fetch is split into multiple requests and multiple files, and you will get moments/page-1.json, moments/page-2.json, etc., depending on how many moments you actually have.
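
Roughly, the metadata fetch works like the sketch below (not the actual ripper code; the `"next"` field name for the cursor link is an assumption based on the `?cursor=...` URLs, and may differ in the real API response):

```python
import json
import os

def fetch_pages(session, base_url, dest_dir, name_pattern="page-{cnt}.json"):
    """Follow pagination and write one JSON file per page.

    Sketch only: assumes each response carries a "next" URL (the ?cursor=...
    style links) that is null/absent on the last page.
    """
    url, cnt = base_url, 1
    while url:
        resp = session.get(url)
        resp.raise_for_status()
        data = resp.json()
        with open(os.path.join(dest_dir, name_pattern.format(cnt=cnt)), "w") as f:
            json.dump(data, f)
        url = data.get("next")  # assumed field name for the next-page cursor link
        cnt += 1
```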

@thederan

Now, without any error message, the ripper seems to be stuck in an endless loop over the same /moments/.../positions/?cursor=... URL. After restarting the script, it eventually ends up looping over the same URL again.

@deadcyclo
Owner

@thederan I saw this potential issue when I was creating the script, but it didn't hit me since I didn't have any moments with more than 1500 positions. Basically, there is an error with pagination in the locations part of the service.

Could you try replacing line 75:

get_multiple(session, "https://narrativeapp.com/api/v2/moments/{uuid}/positions/?limit=1500".format(uuid=moment['uuid']), moment_path, "positions-{cnt}.json")

with:
get_from_file_or_service(session, "https://narrativeapp.com/api/v2/moments/{uuid}/positions/?limit=1500".format(uuid=moment['uuid']), moment_path, "positions-1.json")

and see if that fixes the issue for you?
