Metadata download fails on particular file #10

Open
justingist opened this issue Oct 13, 2016 · 13 comments

Comments

@justingist

After downloading 123 metadata folders, it just keeps failing on the next one... I know I should have a lot more than this. Is there a way I can bypass the one that is failing, or go back to it later? I cleared out the directory and started again from scratch, and it produced the same result. It will stay on this one for hours on end, with no end in sight.

@justingist
Author

Update: I created some dummy files and now it is going along nicely again... will have to see what happens when I try to download the photos. I figured 123 was a bit low when I have 1,857 moments (372,500 photos)...

@deadcyclo
Owner

Several people are having similar issues. Obviously the data quality Narrative has isn't very good. It seems like a lot of people have pointers to missing (I'm guessing deleted) data. Great to hear you got it working again. Alternatively, you could have opened the moments/page-1.json file, found the reference to the failing moment, and deleted it.
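
If anyone wants to do that cleanup with a script rather than by hand, something along these lines should work. This is only a rough, untested sketch: it assumes page-1.json has a top-level "results" list where each moment carries a "uuid" key, which may not exactly match the real file layout, and BAD_UUID is a placeholder you'd fill in yourself.

```python
import json

# Hypothetical example: drop one known-bad moment from moments/page-1.json.
# Assumes the file has a top-level "results" list of moment objects with a "uuid" key.
BAD_UUID = "put-the-failing-moment-uuid-here"

with open("moments/page-1.json") as f:
    page = json.load(f)

# Keep every moment except the failing one.
page["results"] = [m for m in page.get("results", []) if m.get("uuid") != BAD_UUID]

with open("moments/page-1.json", "w") as f:
    json.dump(page, f)
```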

What were the names of the files you created dummy files for? I'll try to determine whether you should delete the dummy files, the whole directory, or nothing at all to avoid issues when downloading the photos.

@justingist
Author

I should add that I think I got a 404 when I tried to open the file in Chrome, but I wasn't sure if that was just because I wasn't using a REST client...

@deadcyclo
Owner

Ah... Position files shouldn't be an issue; they aren't used during image download at all...

@thederan

I potentially have tons of broken moments (1.3 million photos, not sure how many moments). I tried deleting the entries from page-1.json, but there are just more and more moments that it keeps retrying forever, so manually deleting them seems futile. Can we change the script so that it gives up trying and moves on to the next moment?

@deadcyclo
Owner

Is it always the position that fails?

@deadcyclo
Owner

@thederan Could you try the code in the allow-failure branch? I tried to make a version that would skip after some retries, but I can't test it right now. Test it, and let me know if it works.
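
The rough idea is just to cap the retries and move on instead of looping forever. Something like the sketch below (not the exact branch code; `download_one` and `RETRY_LIMIT` are illustrative names standing in for whatever helper actually performs a single request):

```python
import logging

RETRY_LIMIT = 10  # the allow-failure branch caps retries; lower it to give up sooner

def fetch_with_skip(download_one, url, dest_path):
    """Try a download a few times, then give up and move on.

    `download_one` is a placeholder for whatever function performs a single
    request and writes the result; it is assumed to raise on failure.
    """
    for attempt in range(1, RETRY_LIMIT + 1):
        try:
            return download_one(url, dest_path)
        except Exception as exc:
            logging.warning("attempt %d/%d failed for %s: %s",
                            attempt, RETRY_LIMIT, url, exc)
    logging.error("giving up on %s after %d attempts, skipping", url, RETRY_LIMIT)
    return None
```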

@justingist
Author

Just to update: I only had an issue with the one file, so the rest of the metadata completed just fine. Currently downloading using an 8-core Azure VM... so far I think I've got 40 GB of photos in!

@thederan

@deadcyclo Yes, always failing at /moments/.../positions/?limit=1500

Thanks for the other branch! I reduced the retries from 10 to 2, and after a number of further failures it finally got to a point where the moments weren't failing anymore.

Does the 1500 limit mean that if I have more moments than that, they will get cut off?

@deadcyclo
Owner

@thederan Great.

No. Basically, you can tell them how many results you want for a single request, and 1500 is the maximum number of results allowed in a single request on their side. So if you have more than 1500 moments, the fetch is split into multiple requests and multiple files, and you will get moments/page-1.json, moments/page-2.json, etc., depending on how many moments you actually have.
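
Roughly, the metadata fetch works like the sketch below (not the actual ripper code; the `"next"` field name for the cursor link is an assumption based on the `?cursor=...` URLs, and may differ in the real API response):

```python
import json
import os

def fetch_pages(session, base_url, dest_dir, name_pattern="page-{cnt}.json"):
    """Follow pagination and write one JSON file per page.

    Sketch only: assumes each response carries a "next" URL (the ?cursor=...
    style links) that is null/absent on the last page.
    """
    url, cnt = base_url, 1
    while url:
        resp = session.get(url)
        resp.raise_for_status()
        data = resp.json()
        with open(os.path.join(dest_dir, name_pattern.format(cnt=cnt)), "w") as f:
            json.dump(data, f)
        url = data.get("next")  # assumed field name for the next-page cursor link
        cnt += 1
```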

@thederan

Now, without any error message, the ripper seems to be stuck in an endless loop over the same /moments/.../positions/?cursor=... URL. After restarting the script, it eventually ends up looping over the same URL again.

@deadcyclo
Owner

@thederan I saw this potential issue when I was creating the script, but it didn't hit me since I didn't have any moments with more than 1500 positions. Basically, there is an error with pagination in the locations part of the service.

Could you try replacing line 75:

get_multiple(session, "https://narrativeapp.com/api/v2/moments/{uuid}/positions/?limit=1500".format(uuid=moment['uuid']), moment_path, "positions-{cnt}.json")

with:
get_from_file_or_service(session, "https://narrativeapp.com/api/v2/moments/{uuid}/positions/?limit=1500".format(uuid=moment['uuid']), moment_path, "positions-1.json")

and see if that fixes the issue for you?
