WAT extractor: do not extract page title from embedded SVG images #37

Merged Oct 18, 2024 (2 commits)

sebastian-nagel

Address #36:

  • do not use <title> elements embedded in <svg> as page/document title
  • use the first non-empty <title> element to set the page/document title. This is required for documents where the <title> is not enclosed in the <head> element.
    Note: HTML5 allows the <head> element to be omitted, see https://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#optional-tags
  • overwrite the page/document title with the content of a <title> element inside the <head> element
  • for text extraction: define the title element as block element
  • add unit test that correct title is extracted from a document which includes an embedded SVG image containing a title element
  • extend existing unit tests to test for proper title extraction
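
The title selection rules listed above can be sketched roughly as follows. This is an illustration only, written in Python with the standard library's `html.parser`; it is not the extractor code changed by this pull request, and the names `TitleExtractor` and `extract_title` are hypothetical. It assumes that "first non-empty" means the first <title> whose whitespace-normalized text is not empty.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Illustrative sketch of the title rules (hypothetical, not the PR's code)."""

    def __init__(self):
        super().__init__()
        self.title = None      # selected page/document title
        self._svg_depth = 0    # > 0 while inside an <svg> element
        self._in_head = False  # True while inside <head>
        self._in_title = False # True while collecting text of a candidate <title>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "svg":
            self._svg_depth += 1
        elif tag == "head":
            self._in_head = True
        elif tag == "title" and self._svg_depth == 0:
            # Only <title> elements outside of <svg> are candidates.
            self._in_title = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "svg" and self._svg_depth > 0:
            self._svg_depth -= 1
        elif tag == "head":
            self._in_head = False
        elif tag == "title" and self._in_title:
            self._in_title = False
            text = " ".join("".join(self._buf).split())
            if not text:
                return  # skip empty titles
            if self._in_head:
                # A <title> inside <head> overwrites any earlier choice.
                self.title = text
            elif self.title is None:
                # Otherwise keep only the first non-empty <title>.
                self.title = text

    def handle_data(self, data):
        if self._in_title:
            self._buf.append(data)


def extract_title(html):
    """Return the selected title, or None if no usable <title> exists."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title
```

For a document such as `<html><head><title>Page</title></head><body><svg><title>Icon</title></svg></body></html>`, the sketch selects `Page` and ignores the SVG title. The rule that treats the title as a block element during text extraction is not covered here.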

- add unit test that correct title is extracted from a document
  which includes an embedded SVG image containing a title element
- extend existing unit tests to test for proper title extraction
- do not use <title> elements embedded in <svg> as page/document title
- use the first non-empty <title> element to set the page/document
  title. This is required for documents where the <title> is not
  enclosed in the <head> element. Note: HTML5 allows the <head> element
  to be omitted, see
   https://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#optional-tags
- overwrite the page/document title by the content of a <title> element
  inside the <head> element
- for text extraction: define the title element as block element
@sebastian-nagel sebastian-nagel merged commit da324f9 into master Oct 18, 2024
5 checks passed
@sebastian-nagel sebastian-nagel deleted the ia-web-commons-36-title-embedded-svg branch October 18, 2024 14:06