Conversion Error: found 0 References section(s) but expected one and only one #65

Open
jkrybicki opened this issue Apr 13, 2018 · 3 comments

Comments

@jkrybicki

Uploading an ODT file (attached as a zip) produces the error in the title. Uploading a DOCX file (attached) produces a different error: ERROR: nu.xom.ParsingException: cvc-datatype-valid.1.2.1: '30j0zll' is not a valid value for 'NCName'.
Rybicki_Polysystem.docx
Rybicki_Polysystem.zip
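
For readers wondering why the DOCX error names '30j0zll': an XML NCName must begin with a letter or an underscore, and this value begins with a digit. The following is a minimal sketch of that rule, not part of the DHConvalidator. The helper name `is_ascii_ncname` is made up for illustration, and the pattern covers only the ASCII subset of NCName (the full grammar also admits many non-ASCII characters).

```python
import re

# Simplified NCName test restricted to ASCII. The real XML grammar also
# allows many non-ASCII letters, so this is only an approximation.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(value: str) -> bool:
    """Return True if value matches the ASCII subset of NCName."""
    return NCNAME_ASCII.fullmatch(value) is not None


print(is_ascii_ncname("30j0zll"))   # False: starts with a digit
print(is_ascii_ncname("_30j0zll"))  # True: starts with an underscore
```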

@dodinh
Contributor

dodinh commented Sep 28, 2018

I also stumbled upon this problem (at least the DOCX part). It can easily be worked around on the user side, as follows:

  1. Open the DOCX in Word
  2. Click somewhere editable
  3. Ribbon > Insert > Bookmark
  4. Activate "Show hidden bookmarks"
  5. Remove all bookmarks
  6. Save and upload

On the dev side: I have no idea how hard it is to exclude bookmarks from validation, or if they even produce problems with the Convalidator.
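
The manual steps above could probably also be scripted. The sketch below is one possible way, under several assumptions: the main document part is `word/document.xml` and is UTF-8 encoded, and bookmarks are serialized as self-closing `w:bookmarkStart` / `w:bookmarkEnd` elements with the conventional `w` prefix. The function names are hypothetical. Like "Remove all bookmarks" in Word, it would leave any internal links that target a bookmark without a destination.

```python
import re
import zipfile

# Assumption: bookmarks appear as self-closing elements with the "w" prefix,
# e.g. <w:bookmarkStart w:id="0" w:name="30j0zll"/> and <w:bookmarkEnd w:id="0"/>.
BOOKMARK = re.compile(r"<w:bookmark(?:Start|End)\b[^>]*/>")


def strip_bookmarks(document_xml: str) -> str:
    """Remove bookmark start/end markers from WordprocessingML text."""
    return BOOKMARK.sub("", document_xml)


def strip_docx_bookmarks(src_path: str, dst_path: str) -> None:
    """Copy a DOCX archive, stripping bookmarks from word/document.xml only."""
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                # Assumption: the part is UTF-8 encoded.
                data = strip_bookmarks(data.decode("utf-8")).encode("utf-8")
            dst.writestr(item, data)
```

A call such as `strip_docx_bookmarks("Rybicki_Polysystem.docx", "cleaned.docx")` would write a copy to upload instead of the original. The output file name here is only an example.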

@jkrybicki
Author

jkrybicki commented Sep 29, 2018 via email

@mpetris
Contributor

mpetris commented Oct 11, 2018

> On the dev side: I have no idea how hard it is to exclude bookmarks from validation, or if they even produce problems with the Convalidator.

Since the error occurs during validation, I don't think it is a conversion problem. The bookmarks should nevertheless be excluded from the results.
There are three places where this could happen:

  1. in the docx XSLT of the conversion profile of the dhconvalidator in the TEI Stylesheets
  2. in the DocxInputConverter
  3. in the DocxOutputConverter

For an experienced XSLT programmer, option 1 would be a good choice. The advantage is that the bookmark handling would also apply when working directly with the Stylesheets or the OxGarage API. The disadvantage is that this kind of XSLT development can get fairly complex and needs strong tool support, such as the oXygen debugger. My impression is that the DOCX stylesheet is designed to convert as much of the original DOCX to TEI as possible, and that validation was not a major concern. As a result, even rarely used constructs get converted, but anything unwanted, however rare, has to be excluded afterwards.
To choose between options 2 and 3, one would need to look at the XML and see which conversion stage is easier to tweak. With option 2 one would work directly on the DOCX XML, which can be awful. With option 3 one would work on the TEI XML, which is generally easier.
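
To illustrate what option 3 might look like, here is a rough Python sketch (not the DHConvalidator's own code) that post-processes the TEI output. It assumes the offending bookmark names end up as `xml:id` values in the TEI, which this thread does not confirm, and it simply drops any `xml:id` that fails a simplified ASCII NCName check. The function name `drop_invalid_ids` is hypothetical. Any pointer in the document that targets a dropped id would be left dangling, so a real fix might prefer to rename ids instead.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified ASCII-only NCName pattern (the full grammar is wider).
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

# Keep TEI as the default namespace when serializing.
ET.register_namespace("", TEI_NS)


def drop_invalid_ids(tei_xml: str) -> str:
    """Remove xml:id attributes whose value is not an (ASCII) NCName."""
    root = ET.fromstring(tei_xml)
    for elem in root.iter():
        value = elem.get(XML_ID)
        if value is not None and NCNAME_ASCII.fullmatch(value) is None:
            # Assumption: the id is not needed downstream.
            del elem.attrib[XML_ID]
    return ET.tostring(root, encoding="unicode")
```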
