Metadata injection in epub3-to-epub3 does not update dc:identifier and title in content files #22

martinpub · 2021-05-20T09:59:35Z

Currently, the metadata injection is able to update dc:identifier and dc:title in package.opf. However, it does not touch the corresponding <title> and <meta name="dc:identifier" content="xxx"/> values in the content files. This causes a mismatch between the package.opf metadata and the content file metadata, and it also produces EPUB 3 files that are invalid according to the Nordic Guidelines.

Could this be easily added @bertfrees? I thought we discussed it at the spec stage, but perhaps we didn't. Also, it apparently hasn't been tested enough until now.


bertfrees · 2021-05-20T10:57:14Z

This is because the EPUB specification doesn't say anything about metadata in content documents.

It could be a bit tricky to implement this because you need to detect that the metadata fields in content and package documents indeed correspond. For example we could say a title element in a content document corresponds with a dc:title element in the package document if

- there is exactly one title element (not more than one is allowed in HTML)
- there is exactly one dc:title element
- their text matches

and something similar for dc:identifier vs. <meta name="dc:identifier">. We would then only update metadata fields in content documents that correspond to metadata fields in the package document that were updated (and maybe it should only be done when enabled with an option).
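As a rough illustration of the correspondence check for titles described above, here is a minimal Python sketch. It is not the Pipeline implementation; the function name `title_corresponds`, the whitespace normalization, and the choice to pass both documents as strings are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
XHTML = "http://www.w3.org/1999/xhtml"


def _normalize(text):
    # Assumption: titles "match" when they are equal after collapsing whitespace.
    return " ".join((text or "").split())


def title_corresponds(package_xml: str, content_xml: str) -> bool:
    """Return True if the content document's title corresponds to dc:title.

    The three conditions from the list above:
    exactly one title element, exactly one dc:title element, matching text.
    """
    package = ET.fromstring(package_xml)
    content = ET.fromstring(content_xml)

    dc_titles = package.findall(f".//{{{DC}}}title")
    html_titles = content.findall(f"{{{XHTML}}}head/{{{XHTML}}}title")

    if len(dc_titles) != 1 or len(html_titles) != 1:
        return False
    return _normalize(dc_titles[0].text) == _normalize(html_titles[0].text)
```

A content document's title would then only be rewritten when this check passes for the old dc:title value and that dc:title was actually changed by the injection.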

Alternatively, we could have an option similar to update-lang-attributes that synchronizes the metadata in the content documents with the package document:

- derive <meta name="dc:identifier"> element from the dc:identifier element marked as the "unique identifier", and remove any existing <meta name="dc:identifier"> elements
- derive title element from the dc:title element if there is exactly one (or from the first)
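To make the synchronization variant concrete, here is a minimal Python sketch of the two rules above. It is only an illustration under stated assumptions, not the Pipeline code: the function name `sync_content_metadata` is hypothetical, the "unique identifier" is taken to be the dc:identifier whose id matches the package element's unique-identifier attribute, and the first dc:title is used when there are several.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
XHTML = "http://www.w3.org/1999/xhtml"


def sync_content_metadata(package_xml: str, content_xml: str) -> str:
    """Overwrite title and dc:identifier meta in a content document
    with the values found in the package document."""
    # Serialize XHTML elements without a prefix (global ElementTree setting).
    ET.register_namespace("", XHTML)
    package = ET.fromstring(package_xml)
    content = ET.fromstring(content_xml)

    head = content.find(f"{{{XHTML}}}head")
    if head is None:
        return content_xml  # nothing to synchronize

    # Rule 1: derive <meta name="dc:identifier"> from the unique identifier.
    uid = package.get("unique-identifier")
    identifier = None
    if uid:
        for elem in package.iter(f"{{{DC}}}identifier"):
            if elem.get("id") == uid:
                identifier = elem
                break
    if identifier is not None and identifier.text:
        # Remove any existing dc:identifier meta elements first.
        for meta in head.findall(f"{{{XHTML}}}meta"):
            if meta.get("name") == "dc:identifier":
                head.remove(meta)
        ET.SubElement(
            head,
            f"{{{XHTML}}}meta",
            {"name": "dc:identifier", "content": identifier.text.strip()},
        )

    # Rule 2: derive <title> from the first dc:title in document order.
    dc_title = package.find(f".//{{{DC}}}title")
    if dc_title is not None and dc_title.text:
        title = head.find(f"{{{XHTML}}}title")
        if title is None:
            title = ET.SubElement(head, f"{{{XHTML}}}title")
        title.text = dc_title.text.strip()

    return ET.tostring(content, encoding="unicode")
```

Unlike the correspondence approach, this never compares old values: whatever is in the content document is replaced, which is what makes it behave like a "force" option.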

martinpub · 2021-05-20T11:59:13Z

Thanks @bertfrees, yes I see the implementation design issues here. I like the second option for its simplicity and because it works as a kind of "force" toggle, similar to "update-lang-attributes". It could be named "update-contentdoc-meta"?

bertfrees · 2021-05-20T12:07:16Z

Yes I think I agree.

martinpub · 2021-05-21T09:43:04Z

That's great. Could you prioritise this over #18? This one is earlier in our production line. Also, an estimate of when it could be done would be very much appreciated, even though I know that's not always possible.

martinpub · 2021-06-01T11:24:29Z

Hi @bertfrees, any progress here? Just checking in.

bertfrees · 2021-06-01T11:59:37Z

Not yet.

bertfrees · 2021-06-21T14:51:41Z

I have implemented this. It's available on the master branch of daisy/pipeline-modules: daisy/pipeline-modules@78a2e6d.

martinpub · 2021-06-21T15:20:43Z

Great, thank you very much @bertfrees!

@kalaspuffar can I kindly request a cherry-pick to the fork?

kalaspuffar · 2021-06-22T06:41:40Z

Hi @bertfrees and @martinpub

Of course, I could help with this, but wouldn't it be safer if Bert creates a PR with this change, as usual, ensuring that all code will be merged?

Best regards
Daniel

martinpub · 2021-06-22T07:19:37Z

Sure @kalaspuffar. @bertfrees is that possible?

bertfrees · 2021-06-22T10:22:25Z

It is possible, and I'll do it, but I wish I didn't have to. I think we need to talk about how to organize this in the future.

> ensuring that all code will be merged.

IMO, the best way to make sure that everything is there is to always base your fork on the latest upstream. My suggestion is to maintain a branch that you always keep up to date with the latest upstream version of Pipeline (releases or development version) by doing git rebases. The rebases can be done at the level of the "super project" to avoid the extra technical burden of working with git-subrepo. A consequence of this is that some of the git tags that you create may end up pointing to commits that are no longer on any branch, but that should be fine.

martinpub added high-prio metadata injection labels May 20, 2021

bertfrees self-assigned this May 21, 2021

bertfrees closed this as completed in daisy/pipeline-modules@78a2e6d Jun 21, 2021

bertfrees reopened this Jun 21, 2021
