[feat] export annotation set [Round 2] #2704

Merged on Jan 31, 2025 (37 commits)


panaC
Member

@panaC panaC commented Dec 11, 2024

This PR allows :

For the CFI selector: vivliostyle-cfi doesn't support CFI ranges with the comma separator cf (only positions are supported).

This PR is blocked until the CFI library issue is resolved.

For the moment, we are setting CFI aside in favour of a CssSelector refined by a TextPositionSelector.

See the spec written by Laurent: https://github.com/readium/annotations

I reintroduced the ProgressionSelector to keep this information across export/import in the Thorium annotation format. The progression is generated by r2-navigator and passed to the chrome UI, not from the UI to r2-navigator.
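
As an illustration of the selector shapes discussed above, here is a minimal TypeScript sketch. The type names and the `buildSelectors` helper are hypothetical (they are not taken from the Thorium code base); only the JSON field names mirror the exported annotation sets shown later in this conversation.

```typescript
// Type names are hypothetical; field names mirror the exported JSON in this thread.
interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number;
  end: number;
}

interface CssSelector {
  type: "CssSelector";
  value: string;
  // Character offsets counted inside the element matched by `value`.
  refinedBy?: TextPositionSelector;
}

interface ProgressionSelector {
  type: "ProgressionSelector";
  value: number;
}

type Selector = CssSelector | TextPositionSelector | ProgressionSelector;

interface TextSpan {
  start: number;
  end: number;
}

// Hypothetical helper: assembles the selector list for one highlight.
function buildSelectors(
  cssPath: string,
  inElement: TextSpan,
  inBody: TextSpan,
  progression: number,
): Selector[] {
  return [
    {
      type: "CssSelector",
      value: cssPath,
      refinedBy: {
        type: "TextPositionSelector",
        start: inElement.start,
        end: inElement.end,
      },
    },
    { type: "TextPositionSelector", start: inBody.start, end: inBody.end },
    { type: "ProgressionSelector", value: progression },
  ];
}
```

Keeping the ProgressionSelector as its own entry in the list means it can survive an export/import round trip without depending on the other selectors.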

@panaC panaC self-assigned this Dec 11, 2024
@danielweck
Member

Yes, I would also use the HTML body as the DOM tree root to compute text position and quote.
Interestingly, in principle anyway, the head could be omitted, in which case the body index in CFI and other indexed node path notations would be shifted ... which of course is not an issue in real-world EPUB publications, where the head pretty much always exists, especially as that's where the reading system injects stylesheets etc. :)
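
To make the "body as root" idea concrete, here is a minimal sketch of how a text position counted inside a CSS-selected element relates to one counted from the body. The `TNode` tree is a deliberately simplified stand-in for the DOM, and all function names are hypothetical; real code would walk actual DOM text nodes.

```typescript
// Simplified stand-in for a DOM tree (an assumption of this sketch).
type TNode = { text: string } | { tag: string; children: TNode[] };

// Concatenated text of a subtree, in document order.
function textContent(node: TNode): string {
  return "text" in node ? node.text : node.children.map(textContent).join("");
}

// Offset of `target`'s first character within the text of `root`,
// or -1 when `target` is not inside `root`.
function textOffsetOf(root: TNode, target: TNode): number {
  if (root === target) return 0;
  if ("text" in root) return -1;
  let offset = 0;
  for (const child of root.children) {
    const inner = textOffsetOf(child, target);
    if (inner >= 0) return offset + inner;
    offset += textContent(child).length;
  }
  return -1;
}

// Rebase a span counted inside an element onto the root's text.
function toRootSpan(
  elementOffset: number,
  span: { start: number; end: number },
): { start: number; end: number } {
  return { start: elementOffset + span.start, end: elementOffset + span.end };
}

// Usage: an <i> element preceded by three characters of text.
const italic: TNode = { tag: "i", children: [{ text: "hello" }] };
const body: TNode = { tag: "body", children: [{ text: "abc" }, italic] };
const rebased = toRootSpan(textOffsetOf(body, italic), { start: 1, end: 3 });
// rebased is { start: 4, end: 6 }
```

The exported data later in this thread is consistent with this relationship: for the `.title1a > i` annotation, the refined span is 1 to 8 and the standalone TextPositionSelector is 48 to 55, a constant shift of 47.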

@panaC panaC requested a review from danielweck December 18, 2024 14:21
@panaC
Member Author

panaC commented Dec 18, 2024

Hello @danielweck, I'll let you review the import/export annotationSet part of this PR.

@panaC panaC marked this pull request as ready for review December 30, 2024 12:53
@danielweck
Member

I have been running a number of tests and found this bug when importing 4 saved annotations into the same document, after deleting 2 of the original ones:

importAnnotationSet TypeError: Cannot read properties of undefined (reading 'type')
at http://localhost:8191/index_reader.js:13477:16
at createMatcherWithRefinement (http://localhost:8191/index_reader.js:22351:21)
at convertSelectorTargetToLocatorExtended (http://localhost:8191/index_reader.js:13496:22)

tests.annotation:

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "urn:uuid:f53528cd-31fa-42b7-8f51-99afe1a4fae2",
  "type": "AnnotationSet",
  "generator": {
    "id": "https://github.com/edrlab/thorium-reader/releases/tag/v3.1.0-alpha.1",
    "type": "Software",
    "name": "Thorium 3.1.0-alpha.1",
    "homepage": "https://thorium.edrlab.org"
  },
  "generated": "2025-01-29T20:13:46.805Z",
  "title": "tests",
  "about": {
    "dc:identifier": [
      "urn:thorium:5e8bad02-6ac3-4ebd-a8d7-a68b7359eed2",
      "978-1-135-30619-9"
    ],
    "dc:format": "application/epub+zip",
    "dc:title": "LEARNING DISABILITIES",
    "dc:publisher": [
      "Routledge"
    ],
    "dc:creator": [
      "Bryant J.Cratty and Richard L.Goldman"
    ],
    "dc:date": ""
  },
  "items": [
    {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      "id": "urn:uuid:de5c7924-82ac-4fba-abd4-113570d2c191",
      "created": "2025-01-29T20:12:06.393Z",
      "type": "Annotation",
      "body": {
        "type": "TextualBody",
        "value": "test4",
        "format": "text/plain",
        "color": "#D4C4FB",
        "tag": "test2",
        "highlight": "outline"
      },
      "creator": {
        "id": "urn:uuid:395b3760-e3c6-4ea5-b9c5-35dae31e0d19",
        "type": "Organization",
        "name": ""
      },
      "target": {
        "source": "OPS/xhtml/03_Title01.xhtml",
        "meta": {
          "headings": [],
          "page": ""
        },
        "selector": [
          {
            "type": "CssSelector",
            "value": "body",
            "refinedBy": {
              "type": "TextPositionSelector",
              "start": 243,
              "end": 245
            }
          },
          {
            "type": "TextPositionSelector",
            "start": 243,
            "end": 245
          },
          {
            "type": "ProgressionSelector",
            "value": 0.544386877457405
          }
        ]
      }
    },
    {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      "id": "urn:uuid:5fbb5eb5-d8de-4418-8c0a-7e4bda864b4e",
      "created": "2025-01-29T20:11:29.752Z",
      "type": "Annotation",
      "body": {
        "type": "TextualBody",
        "value": "test3",
        "format": "text/plain",
        "color": "#C1EAC5",
        "tag": "test1",
        "highlight": "strikethrough"
      },
      "creator": {
        "id": "urn:uuid:395b3760-e3c6-4ea5-b9c5-35dae31e0d19",
        "type": "Organization",
        "name": ""
      },
      "target": {
        "source": "OPS/xhtml/03_Title01.xhtml",
        "meta": {
          "headings": [],
          "page": ""
        },
        "selector": [
          {
            "type": "CssSelector",
            "value": "body",
            "refinedBy": {
              "type": "TextPositionSelector",
              "start": 135,
              "end": 154
            }
          },
          {
            "type": "TextPositionSelector",
            "start": 135,
            "end": 154
          },
          {
            "type": "ProgressionSelector",
            "value": 0.32963016055045874
          }
        ]
      }
    },
    {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      "id": "urn:uuid:061e66d7-0201-4572-9bdf-9e89efc932e4",
      "created": "2025-01-29T20:10:57.738Z",
      "type": "Annotation",
      "body": {
        "type": "TextualBody",
        "value": "test2",
        "format": "text/plain",
        "color": "#FEF3BD",
        "tag": "test2",
        "highlight": "underline"
      },
      "creator": {
        "id": "urn:uuid:395b3760-e3c6-4ea5-b9c5-35dae31e0d19",
        "type": "Organization",
        "name": ""
      },
      "target": {
        "source": "OPS/xhtml/03_Title01.xhtml",
        "meta": {
          "headings": [],
          "page": ""
        },
        "selector": [
          {
            "type": "CssSelector",
            "value": "body",
            "refinedBy": {
              "type": "TextPositionSelector",
              "start": 73,
              "end": 132
            }
          },
          {
            "type": "TextPositionSelector",
            "start": 73,
            "end": 132
          },
          {
            "type": "ProgressionSelector",
            "value": 0.20932994757536041
          }
        ]
      }
    },
    {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      "id": "urn:uuid:1d195e51-b3af-49d5-9ff8-21fc0262b9d2",
      "created": "2025-01-29T20:10:39.507Z",
      "type": "Annotation",
      "body": {
        "type": "TextualBody",
        "value": "test1",
        "format": "text/plain",
        "color": "#EB9694",
        "tag": "test1",
        "highlight": "solid"
      },
      "creator": {
        "id": "urn:uuid:395b3760-e3c6-4ea5-b9c5-35dae31e0d19",
        "type": "Organization",
        "name": ""
      },
      "target": {
        "source": "OPS/xhtml/03_Title01.xhtml",
        "meta": {
          "headings": [],
          "page": ""
        },
        "selector": [
          {
            "type": "CssSelector",
            "value": ".title1a > i",
            "refinedBy": {
              "type": "TextPositionSelector",
              "start": 1,
              "end": 8
            }
          },
          {
            "type": "TextPositionSelector",
            "start": 48,
            "end": 55
          },
          {
            "type": "ProgressionSelector",
            "value": 0.12891136959370905
          }
        ]
      }
    }
  ]
}

@danielweck
Member

Note that I successfully ran the same scenario in a different publication (4 annotations in a single document, obviously with different DOM Ranges):

test.annotation

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "urn:uuid:9f6178bd-b016-4c18-a8a2-bc9f58f3a70f",
  "type": "AnnotationSet",
  "generator": {
    "id": "https://github.com/edrlab/thorium-reader/releases/tag/v3.1.0-alpha.1",
    "type": "Software",
    "name": "Thorium 3.1.0-alpha.1",
    "homepage": "https://thorium.edrlab.org"
  },
  "generated": "2025-01-29T20:23:48.671Z",
  "title": "daniel",
  "about": {
    "dc:identifier": [
      "urn:thorium:839a4c73-4fdf-4437-99fb-599e6b97a5af",
      "urn:isbn:9781449328030"
    ],
    "dc:format": "application/epub+zip",
    "dc:title": "Accessible EPUB 3",
    "dc:publisher": [
      "O’Reilly Media, Inc."
    ],
    "dc:creator": [
      "Matt Garrish"
    ],
    "dc:date": "2012-02-20T00:00:00.000Z"
  },
  "items": [
    {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      "id": "urn:uuid:5d67dcc9-6cd6-487e-bb32-c96e40f252a0",
      "created": "2025-01-29T20:23:23.274Z",
      "type": "Annotation",
      "body": {
        "type": "TextualBody",
        "value": "test4",
        "format": "text/plain",
        "color": "#FEF3BD",
        "tag": "test1",
        "highlight": "strikethrough"
      },
      "creator": {
        "id": "urn:uuid:395b3760-e3c6-4ea5-b9c5-35dae31e0d19",
        "type": "Organization",
        "name": ""
      },
      "target": {
        "source": "EPUB/index.xhtml",
        "meta": {
          "headings": [
            {
              "txt": "Accessible EPUB 3",
              "level": 1
            }
          ],
          "page": ""
        },
        "selector": [
          {
            "type": "CssSelector",
            "value": "#id2602563 > p:nth-child(2)",
            "refinedBy": {
              "type": "TextPositionSelector",
              "start": 227,
              "end": 234
            }
          },
          {
            "type": "TextPositionSelector",
            "start": 986,
            "end": 993
          },
          {
            "type": "TextQuoteSelector",
            "exact": "rademar",
            "prefix": "a t",
            "suffix": "k claim,"
          },
          {
            "type": "ProgressionSelector",
            "value": 0.5571244266055045
          }
        ]
      }
    },
    {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      "id": "urn:uuid:f6d61ee6-dab0-466d-b346-f4785c9e9ef2",
      "created": "2025-01-29T20:22:56.067Z",
      "type": "Annotation",
      "body": {
        "type": "TextualBody",
        "value": "test3",
        "format": "text/plain",
        "color": "#D4C4FB",
        "tag": "test2",
        "highlight": "outline"
      },
      "creator": {
        "id": "urn:uuid:395b3760-e3c6-4ea5-b9c5-35dae31e0d19",
        "type": "Organization",
        "name": ""
      },
      "target": {
        "source": "EPUB/index.xhtml",
        "meta": {
          "headings": [
            {
              "txt": "Accessible EPUB 3",
              "level": 1
            }
          ],
          "page": ""
        },
        "selector": [
          {
            "type": "CssSelector",
            "value": "#I_book_d1e1",
            "refinedBy": {
              "type": "TextPositionSelector",
              "start": 1356,
              "end": 1431
            }
          },
          {
            "type": "TextPositionSelector",
            "start": 1359,
            "end": 1434
          },
          {
            "type": "TextQuoteSelector",
            "exact": "O’Reilly Media, Inc. 1005 Gravenstein Highway North Sebastopol CA 95472\n\t\t\t",
            "prefix": "",
            "suffix": ""
          },
          {
            "type": "ProgressionSelector",
            "value": 0.7799393840104849
          }
        ]
      }
    },
    {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      "id": "urn:uuid:437b65f3-e5c6-4b09-a776-cd6625ebdf53",
      "created": "2025-01-29T20:22:35.483Z",
      "type": "Annotation",
      "body": {
        "type": "TextualBody",
        "value": "test2",
        "format": "text/plain",
        "color": "#C1EAC5",
        "tag": "test2",
        "highlight": "underline"
      },
      "creator": {
        "id": "urn:uuid:395b3760-e3c6-4ea5-b9c5-35dae31e0d19",
        "type": "Organization",
        "name": ""
      },
      "target": {
        "source": "EPUB/index.xhtml",
        "meta": {
          "headings": [
            {
              "txt": "Accessible EPUB 3",
              "level": 1
            }
          ],
          "page": ""
        },
        "selector": [
          {
            "type": "CssSelector",
            "value": "#I_book_d1e1",
            "refinedBy": {
              "type": "TextPositionSelector",
              "start": 492,
              "end": 518
            }
          },
          {
            "type": "TextPositionSelector",
            "start": 495,
            "end": 521
          },
          {
            "type": "TextQuoteSelector",
            "exact": ".\n\t\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\tN",
            "prefix": "[email protected]",
            "suffix": "utshell"
          },
          {
            "type": "ProgressionSelector",
            "value": 0.46238122542595017
          }
        ]
      }
    },
    {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      "id": "urn:uuid:d81e21a5-8128-4e6c-8f2f-6d9c2b8167f1",
      "created": "2025-01-29T20:22:13.499Z",
      "type": "Annotation",
      "body": {
        "type": "TextualBody",
        "value": "test1",
        "format": "text/plain",
        "color": "#EB9694",
        "tag": "test1",
        "highlight": "solid"
      },
      "creator": {
        "id": "urn:uuid:395b3760-e3c6-4ea5-b9c5-35dae31e0d19",
        "type": "Organization",
        "name": ""
      },
      "target": {
        "source": "EPUB/index.xhtml",
        "meta": {
          "headings": [
            {
              "txt": "Accessible EPUB 3",
              "level": 1
            }
          ],
          "page": ""
        },
        "selector": [
          {
            "type": "CssSelector",
            "value": ".copyright",
            "refinedBy": {
              "type": "TextPositionSelector",
              "start": 9,
              "end": 17
            }
          },
          {
            "type": "TextPositionSelector",
            "start": 131,
            "end": 139
          },
          {
            "type": "TextQuoteSelector",
            "exact": " © 2012 ",
            "prefix": "Copyright",
            "suffix": "O’Reilly"
          },
          {
            "type": "ProgressionSelector",
            "value": 0.30229153014416776
          }
        ]
      }
    }
  ]
}

@danielweck
Member

The EPUB which causes the problem is protected by LCP, and I notice that the annotation set does not contain annotations with selector TextQuoteSelector ... whereas the non-DRM EPUB does. @panaC does this ring a bell?
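
A quick way to confirm this kind of difference between two exported files is to list the annotations that lack a given selector type. The sketch below is only illustrative: the `ExportedAnnotation` type is a hypothetical, trimmed-down view of the items in the JSON above, and `idsMissingSelector` is not a function from the Thorium code base.

```typescript
// Trimmed-down, hypothetical view of one entry in "items".
interface ExportedAnnotation {
  id: string;
  target: { selector: Array<{ type: string }> };
}

// Returns the ids of annotations that carry no selector of the given type.
function idsMissingSelector(
  items: ExportedAnnotation[],
  selectorType: string,
): string[] {
  return items
    .filter((item) => !item.target.selector.some((s) => s.type === selectorType))
    .map((item) => item.id);
}
```

Calling `idsMissingSelector(annotationSet.items, "TextQuoteSelector")` on the parsed first export would return all four annotation ids, and an empty array for the second export.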

… import (LCP protected EPUB?), also added DOM Range normalisation prior to convertRange() to guard against ranges created outside the navigator
@danielweck
Member

danielweck commented Jan 29, 2025

Adding if (textQuoteSelector) { seems to have fixed the import bug, but @panaC could you please confirm that this is because with LCP publications, this particular selector is intentionally left out?

EDIT: I found if (!isLcp) { ... now we just have to make sure that there are no other places where textQuoteSelector is used without guarding against undefined/null. Can you help @panaC ?
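
The guard discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual import code: the type names, the lookup helpers and `usableMatchers` are all hypothetical, and the string "matchers" merely stand in for whatever the real matcher objects are.

```typescript
// Hypothetical types; field names mirror the exported JSON in this thread.
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix: string;
  suffix: string;
}

interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number;
  end: number;
}

interface ProgressionSelector {
  type: "ProgressionSelector";
  value: number;
}

type Selector = TextQuoteSelector | TextPositionSelector | ProgressionSelector;

// Each lookup returns undefined when the selector is absent.
function findTextQuote(selectors: Selector[]): TextQuoteSelector | undefined {
  return selectors.find(
    (s): s is TextQuoteSelector => s.type === "TextQuoteSelector",
  );
}

function findTextPosition(selectors: Selector[]): TextPositionSelector | undefined {
  return selectors.find(
    (s): s is TextPositionSelector => s.type === "TextPositionSelector",
  );
}

// Collects only the matchers whose selector is actually present.
function usableMatchers(selectors: Selector[]): string[] {
  const matchers: string[] = [];
  const textQuote = findTextQuote(selectors);
  if (textQuote) {
    // May be undefined, as in the first export shown in this thread.
    matchers.push(`quote:${textQuote.exact}`);
  }
  const textPosition = findTextPosition(selectors);
  if (textPosition) {
    matchers.push(`position:${textPosition.start}-${textPosition.end}`);
  }
  return matchers;
}
```

Returning `undefined` from the lookups and checking it at each call site makes the compiler flag any unguarded use when strict null checks are enabled, which helps with finding other places where the selector is read without a guard.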

@danielweck
Member

I found bugs in the "creator" filtering, which also affected export because of hidden / not-shown annotations in the list. I will merge this PR as-is and we can fix this later.

@danielweck danielweck merged commit b9afc76 into develop Jan 31, 2025
8 checks passed
@danielweck
Member

Well done @panaC, great work as usual! :)
I will now file a bug report in the issue tracker regarding the broken creator filtering.

@danielweck danielweck deleted the feat/export-annotation-2 branch January 31, 2025 13:12
@danielweck
Member

Follow-up issue #2758
