This repository has been archived by the owner on Nov 26, 2024. It is now read-only.

get_captures.py returns no ids #11

Open
gpip opened this issue Sep 28, 2016 · 1 comment

gpip commented Sep 28, 2016

Hello,

I'm trying out the utilities you provide and noticed that get_captures.py is not returning the capture ids. Maybe the JSON format has changed since the script was first written? I ran it as python get_captures.py data-and-utilities/items/pd_items_1.ndjson captures_pd_items_1.json (data-and-utilities is your repo) and got:

50000 items with no captures
0 items with invalid captures
Wrote 50000 lines to captures_pd_items_1.json

To get it to return the ids, I patched get_captures.py as follows:

--- get_captures.py 2016-09-28 19:06:39.000000000 -0300
+++ get_captures_gp.py  2016-09-28 19:17:36.000000000 -0300
@@ -32,8 +32,10 @@

     # Retrieve capture ids of item's first capture
     captureId = ""
-    if "captureIds" in item and len(item["captureIds"]) > 0:
-        captureId = item["captureIds"][0].strip()
+    if "captures" in item and len(item["captures"]) > 0:
+        match = re.match(imageURLPattern, item['captures'][0])
+        if match:
+            captureId = match.groups()[0]
     else:
         noCaptureCount += 1

New results:

$ python get_captures_gp.py data-and-utilities/items/pd_items_1.ndjson captures_pd_items_1.json
0 items with no captures
0 items with invalid captures
Wrote 50000 lines to captures_pd_items_1.json
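
For readers without the full script at hand, the patched logic can be sketched as a standalone snippet. This is only a minimal sketch under stated assumptions: it assumes each entry in "captures" is an image URL that carries the capture id in an id query parameter. The names IMAGE_URL_PATTERN, first_capture_id and count_captures, and the example.org URLs, are hypothetical; the real imageURLPattern defined in get_captures.py is not shown in this issue and may differ.

```python
import json
import re

# Hypothetical pattern: the real imageURLPattern from get_captures.py is not
# shown in this issue. Assumes the capture id is an "id" query parameter.
IMAGE_URL_PATTERN = re.compile(r"[?&]id=([^&]+)")


def first_capture_id(item):
    """Return the id parsed from the item's first capture URL, or "" if none."""
    captures = item.get("captures") or []
    if not captures:
        return ""
    match = IMAGE_URL_PATTERN.search(captures[0])
    return match.group(1) if match else ""


def count_captures(lines):
    """Collect one id per NDJSON line and tally missing / non-matching captures."""
    ids, no_capture_count, invalid_count = [], 0, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the NDJSON input
        item = json.loads(line)
        if not item.get("captures"):
            no_capture_count += 1
            ids.append("")
            continue
        capture_id = first_capture_id(item)
        if not capture_id:
            # Counting non-matching URLs as invalid is a choice of this sketch.
            invalid_count += 1
        ids.append(capture_id)
    return ids, no_capture_count, invalid_count


if __name__ == "__main__":
    sample = [
        '{"captures": ["http://example.org/image?id=12345&t=w"]}',
        '{"captures": []}',
    ]
    ids, missing, invalid = count_captures(sample)
    print(ids, missing, invalid)
```

Reading "captures" with item.get(...) treats a missing key and an empty list the same way, which matches the combined check in the patch.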

beefoo added a commit that referenced this issue Sep 29, 2016

beefoo commented Sep 29, 2016

Hey, yes, it does seem like the JSON format has changed. As a short-term fix, I updated the data dump link to point to the format that I used.
