Update tagging-status.yml #801

Merged
merged 2 commits into from
Feb 15, 2025
Conversation

John02139
Contributor

Update status of asmeconf
included-in:
priority: 9
supported-through: [latest]
comments: "Use class option [captionpatch] with pdftex and do not load [spanish] option"
comments: "Use class option [captionpatch] with pdftex and do not load [spanish] option. With these settings,
asmeconf pdfs validate as PDF 2.0/A-4F. PDF 2.0/UA-2 validation is not possible at present."
Member

Is that correct? I mean, if you don't use subcaptions, would it be UA-2, or what else fails? I would suggest listing what doesn't work if that can be easily enumerated (and is not too much).

Contributor Author

Subcaptions throw several errors in the validator with UA-2, mainly something about captions needing to be the last item. Without subcaptions, one other error is still raised in the UA-2 validator, but I don't know precisely what causes that error. I might be able to isolate it with a couple hours work, but for now I did not want to speculate on the cause without exact information.

If partial information is helpful, I can mention subcaptions (but note - I am not sure if the issue is subcaptioning per se or something about the subcaption package).

Member

Put the remaining error you get from veraPDF here; perhaps @u-fischer or I can easily identify what the cause is.

Sub-captioning is something that is probably a bigger task. There is already #85 on that and that package needs to get proper tagging support.

Contributor Author

Do let me know if you or Ulrike discover the cause! I can only ask chatGPT what the PDF codes mean :-)

Contributor Author

Screenshot 2025-02-12 at 2 45 41 PM

@davidcarlisle
Member

I realise the file isn't claiming UA-2 conformance yet, but some notes on that might come in useful later?

I tried the asmeconf test file with TeX Live 2025 pdflatex.

There were several:

Package hyperref Warning: Counter 'SECTION' or the representation '\theHSECTION `
(hyperref)                don't exist. No target created. on input line 616.

Presumably something is being uppercased at the wrong time.

If I look at the extracted XML structure tree with show-pdf-tags, then I get

a.xml:4993:3: error: element http://iso.org/pdf2/ssn^Caption not allowed

image

where the second Caption is not allowed in /Aside (if each Caption were in a Div, then it would validate).

veraPDF also complains about the caption

<errorMessage>Structure element 761 0 obj contains Caption as its 2 child instead of first or last one</errorMessage>

but also several instances of bad Unicode mapping

<context>root/document[0]/pages[5](1192 0 obj PDPage)/contentStream[0](1427 0 obj PDSemanticContentStream)/operators[2059]/usedGlyphs[11](RBHSNX+txexas RBHSNX+txexas 122 0 1264941544 5 true)</context>
<errorMessage>The glyph has Unicode value 0, U+FEFF or U+FFFE, which is invalid by Unicode standard</errorMessage>

The math display seems off, which is a bit odd, as there were no missing character warnings in the log.

image
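
As a rough illustration of the caption rule quoted in the veraPDF message above (a Caption has to be the first or last child of its parent), here is a minimal Python sketch. It is a simplified stand-in, not veraPDF's implementation; the function name `caption_position_errors` and the child list are hypothetical and only mimic a figure with two subcaptions.

```python
def caption_position_errors(children):
    """Return 1-based positions of Caption children that are neither first nor last."""
    last = len(children)
    return [
        pos
        for pos, tag in enumerate(children, start=1)
        if tag == "Caption" and pos not in (1, last)
    ]


# Hypothetical children of one structure element holding two subfigures,
# each followed by its own caption.
aside_children = ["Figure", "Caption", "Figure", "Caption"]
print(caption_position_errors(aside_children))  # [2]

# Wrapping each Figure/Caption pair in its own Div leaves no Caption
# in a middle position at either level.
print(caption_position_errors(["Div", "Div"]))         # []
print(caption_position_errors(["Figure", "Caption"]))  # []
```

The first call flags position 2, which corresponds to the "Caption as its 2 child" wording in the error message.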

@John02139
Contributor Author

John02139 commented Feb 12, 2025

Thanks, David.

The uppercase results from the bug in the section tagging code reported here: 787.

The caption issues match what I see, and I believe the unicode mapping may account for the other error I saw (?). I believe those mappings are all traceable to newtx (which provides most of the math).

However, I cannot reproduce the missing characters you see in math (your screenshot). What I am seeing is this:

Screenshot 2025-02-12 at 4 25 14 PM

Is there an obvious way to figure out why our results differ? These example equations have been in use for many years without change.

@FrankMittelbach
Member

> Is there an obvious way to figure out why our results differ? These example equations have been in use for many years without change.

@davidcarlisle used the TL 2025 pretest. If that gives this output, then I think we need to investigate and perhaps file a bug. There have been changes in pdftex, but this looks as if some font is missing in the output; perhaps something is wrong in the pretest setup?

@John02139
Contributor Author

On a closer look, I am actually not seeing the glyph errors that David observes in the XML report from https://dev.verapdf-rest.duallab.com/. (I'm running pdftex-dev + UA-2.) The XML error I get is:

<rule specification="ISO 14289-2:2024" clause="8.2.2" testNumber="1" status="failed" failedChecks="191" tags="artifact">
  <description>Content that is not considered real shall be an artifact</description>
  <object>SESimpleContentItem</object>
  <test>isTaggedContent == true || parentsTags.contains('Artifact') == true</test>
  <check status="failed">
    <context>root/document[0]/pages[0](21 0 obj PDPage)/contentStream[0](274 0 obj PDSemanticContentStream)/content[126]/contentItem[0]</context>
  </check>
</rule>

The others are the caption errors already noted.
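
For anyone who wants a quick overview of such a report without reading the raw XML, the following Python sketch lists the failed rules with their clause, number of failed checks, and description. It assumes the report contains `rule` elements shaped like the one quoted above and uses no XML namespace; the `<report>` wrapper, the trimmed `SAMPLE` string, and the function name `failed_rules` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the rule quoted above, wrapped in a hypothetical root element.
SAMPLE = """<report>
  <rule specification="ISO 14289-2:2024" clause="8.2.2" testNumber="1"
        status="failed" failedChecks="191" tags="artifact">
    <description>Content that is not considered real shall be an artifact</description>
    <object>SESimpleContentItem</object>
    <check status="failed">
      <context>root/document[0]/pages[0](21 0 obj PDPage)/contentStream[0](274 0 obj PDSemanticContentStream)/content[126]/contentItem[0]</context>
    </check>
  </rule>
</report>"""


def failed_rules(xml_text):
    """Return (clause, failed_checks, description) for every rule with status 'failed'."""
    root = ET.fromstring(xml_text)
    rows = []
    for rule in root.iter("rule"):
        if rule.get("status") != "failed":
            continue
        description = rule.findtext("description", default="").strip()
        rows.append((rule.get("clause"), int(rule.get("failedChecks", "0")), description))
    return rows


for clause, count, description in failed_rules(SAMPLE):
    print(f"clause {clause}: {count} failed checks: {description}")
```

On a real report the same function could be fed the file contents read from disk, provided the element and attribute names match this assumption.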

@davidcarlisle
Member

Ah, with TL 2024 I get (from veraPDF)

<context>root/document[0]/pages[2](511 0 obj PDPage)/contentStream[0](856 0 obj PDSemanticContentStream)/content[133]/contentItem[0]</context>
<errorMessage>Content is neither marked as Artifact nor tagged as real content</errorMessage>

and the math looks like math if I view the PDF.

So something is broken with my TL 2025 setup; that's probably an issue in the TeX Live pretest, not an issue with your document.

@davidcarlisle
Member

@John02139 the glyph issues are a feature of the TeX Live 2025 test release; we can see the same with the article class, so ignore that aspect. The change is unrelated to your test file. We'll raise that elsewhere.

@John02139
Contributor Author

With additional digging, the 13 occurrences of "Content is neither marked as Artifact nor tagged as real content" are caused by the following:

9 cases: \toprule, \midrule, and \bottomrule from the booktabs package (note: I have "firstaid" loaded in testphase which I believe patches booktabs)
3 cases: \footnote in text (basically all of them - two sequential footnotes produce just one error). I'm not sure why that's happening, since footnotes validate fine in other contexts.
1 case: \footnotemark associated with title block (I have not drilled down on this one enough to separate it from other code)

So, a short summary would be "UA-2 validation fails as a result of subcaptions, the booktabs package, and issues related to footnotes". Does that make sense?
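
One way to speed up this kind of attribution is to group the failed checks by page, using the `pages[N]` index in each veraPDF context string. The Python sketch below does that; it assumes the index is zero-based (the reports quoted in this thread include `pages[0]`), and the function name `failures_per_page` and the third context string are hypothetical.

```python
import re
from collections import Counter

# Matches the page index inside a veraPDF context path, e.g. ".../pages[2](511 0 obj PDPage)/...".
PAGE_RE = re.compile(r"/pages\[(\d+)\]")


def failures_per_page(contexts):
    """Count failed checks per page, reported as 1-based page numbers."""
    counts = Counter()
    for ctx in contexts:
        match = PAGE_RE.search(ctx)
        if match:
            # Assumption: veraPDF page indices are zero-based.
            counts[int(match.group(1)) + 1] += 1
    return dict(sorted(counts.items()))


CONTEXTS = [
    # The first two strings are taken from the reports quoted in this thread.
    "root/document[0]/pages[0](21 0 obj PDPage)/contentStream[0](274 0 obj PDSemanticContentStream)/content[126]/contentItem[0]",
    "root/document[0]/pages[2](511 0 obj PDPage)/contentStream[0](856 0 obj PDSemanticContentStream)/content[133]/contentItem[0]",
    # Hypothetical second failure on the same page, for illustration only.
    "root/document[0]/pages[2](511 0 obj PDPage)/contentStream[0](856 0 obj PDSemanticContentStream)/content[140]/contentItem[0]",
]

print(failures_per_page(CONTEXTS))  # {1: 1, 3: 2}
```

Knowing the page narrows the search to the rules, footnotes, or captions that appear there.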

@davidcarlisle
Member

> Does that make sense?

Yes, thanks, although (at least for the table rules) they are probably not issues with this class, and you'd get the same problem with that table in article class + booktabs? (We should check, and perhaps copy it to a booktabs package test if that's the case.)

@John02139
Contributor Author

Using article class, pdftex, and UA-2, I see the same errors for the booktabs rules AND for \footnote or \footnotetext. The error in footnotes occurs for the first footnote (or footnotetext) on a page but does not seem to repeat for subsequent footnotes on the same page.

Additionally, I find that the problem with subcaptions is linked to the [captionpatch] option, which loads code from issue 720. Subcaptions and the subcaption package seem to be ok if one does not change the caption fonts using the caption package (tagging breaks the font change, as discussed in #720). The patch is necessary in asmeconf to obtain the desired fonts for captions.

@u-fischer
Member

Various rules (in both tables and footnotes) are not yet handled correctly with pdftex. I would recommend using lualatex, which automatically tags them as artifacts.

The caption problem is probably long term (but I haven't looked at the details yet). But I will try to resolve the uppercase problem soon; I think I already have something, but it slipped my mind after I marked it as a bug.

@John02139
Contributor Author

Lualatex is a longer term goal for asmeconf, which currently relies on newtxmath and does not load unicode-math. (The code uses a variety of fonts that pdftex can be expected to find in TeX Live without user intervention.)

For the purpose of updating the yml status file, I guess the following is accurate: "PDF 2.0/A-4F validation is possible using pdftex with the class option [captionpatch] and omitting the [spanish] option. PDF 2.0/UA-2 validation fails due to external issues related to the caption package and tagging of rules."

update tag status of asmeconf
@John02139
Contributor Author

I've updated the pull request to account for the above discussion. I think it's ready to go.

@FrankMittelbach FrankMittelbach merged commit 5796f57 into latex3:main Feb 15, 2025