Support bibliography citeproc tests #273

Merged (12 commits) on Feb 5, 2025

PgBiel (Contributor) commented on Feb 3, 2025

Fixes #229.

As a drive-by, affixes around bibliography <layout> are now included in the output. (This should be the only visible change of this PR to users.)

This PR takes initial inspiration from #228 and includes its drive-by fix for affixes in the bibliography layout, but the approach to test comparison is entirely different: I implemented code that generates HTML similar to what citeproc produces for the citeproc tests. The generated HTML is not indented, so indentation in the original test result is stripped before comparison.
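
As a rough illustration of the indentation stripping described above, here is a minimal Rust sketch. The helper name `strip_indentation` is hypothetical, and the sketch assumes that only leading whitespace on each line is removed while line breaks are kept; the actual code in tests/citeproc.rs may normalize differently.

```rust
/// Removes leading whitespace from every line, so that an indented expected
/// result can be compared with unindented generated HTML.
/// Assumption: line breaks are preserved; only indentation is dropped.
fn strip_indentation(html: &str) -> String {
    html.lines()
        .map(str::trim_start)
        .collect::<Vec<_>>()
        .join("\n")
}
```

With this, the expected RESULT of a test can be passed through `strip_indentation` before being compared against the generated string with `assert_eq!`.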

Future work

This revealed a bug: hayagriva appears to always apply formatting to affixes (cf. sort_VariousNameMacros1.txt, where bold is applied to the spaces around a bold group). This can be fixed later.
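
To make the suspected difference concrete, here is a toy Rust illustration. It is not hayagriva's rendering code: `bold`, `affixes_inside`, and `affixes_outside` are hypothetical helpers that only contrast the two orders in which formatting and affixes can be applied.

```rust
/// Wraps text in <b> tags; stands in for any kind of formatting.
fn bold(text: &str) -> String {
    format!("<b>{}</b>", text)
}

/// Suspected current behavior: the affixes are formatted together with the content.
fn affixes_inside(prefix: &str, content: &str, suffix: &str) -> String {
    bold(&format!("{}{}{}", prefix, content, suffix))
}

/// Behavior the test seems to expect: only the content is formatted,
/// and the affixes are added around the formatted result.
fn affixes_outside(prefix: &str, content: &str, suffix: &str) -> String {
    format!("{}{}{}", prefix, bold(content), suffix)
}
```

For a bold group with a single space as prefix and suffix, `affixes_inside(" ", "Doe", " ")` gives `<b> Doe </b>`, while `affixes_outside(" ", "Doe", " ")` gives ` <b>Doe</b> `.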

Also, there are various test failures to investigate. For example, bugreports_parenthesis seems to have some extraneous spacing on the left in hayagriva's output, which is weird. Perhaps citeproc is compressing two consecutive spaces in a way we didn't think of.

In addition, I considered moving affix application into layout::render so that it would also apply anywhere else that uses the bibliography, but it appears that citation already handles this manually, so doing that breaks things.

Scripts used

Nushell file (parse.nu):

# Converts citeproc test in stdin into record with one column per section:
# { MODE: "citation | bibliography | ...", RESULT: "...", ... }
def parse-citeproc-test [] {
  parse --regex '(?:>>=+ (?P<section>[\w\-]+) =+>>
(?P<body>[\s\S]+?)
<<=+ [\w\-]+ =+<<)' | transpose --header-row -d
}

# Parse all tests in a folder (or current folder)
# Returns a table with columns name (test filename), MODE (citation / bibliography), RESULT, ...
def parse-citeproc-test-dir [dir: string = "."] {
  ls $dir | filter { $in.type == "file" } | get name | each { |name| { name: $name, results: ($"(open --raw $name)" | parse-citeproc-test) } } | flatten
}

# Converts parsed XML or HTML (from '$row.RESULT | from xml', which is a good approximation for our case)
# into a flattened list of element tag / attribute / text information. 
# Good to extract all kinds of elements.
def html-flattener []: record<tag: string, attributes: record, content: any> -> list<record<tag: string, attributes: record, text: any>> {
  let rec = $in
  mut queue = [$rec]
  mut results = []
  while $queue != [] {
    let next = $queue | last
    $queue = $queue | drop
    let desc = $next.content | describe

    let text = if $desc == "string" { $next.content }
    let attributes = if $next.attributes == null { {} } else { $next.attributes }
    $results = $results | append {tag: $next.tag, attributes: $attributes, text: $text}

    if $desc != "nothing" and $desc != "string" {
      # Content is a list of elements
      $queue = $queue | append $next.content
    }
  }

  return $results
}

# Receives a result from 'parse-citeproc-test-dir' and returns parsed bibliography tests only.
# The parsed_res column contains parsed HTML.
# The flattened_res column contains the flattened parsed HTML (from html-flattener), which is a list
# of element tag / attributes / text.
def parse-citeproc-bib-tests []: table -> table { 
  filter { $in.MODE? == "bibliography" }
  | insert parsed_res { if $in.RESULT? != null { $in.RESULT | from xml } }
  | insert flattened_res { if $in.parsed_res? == null { [] } else { get parsed_res | html-flattener } }
}

You can use this at processor-tests/humans with

source parse.nu
let parsed_bibs = parse-citeproc-test-dir | parse-citeproc-bib-tests

This provides a table with columns name (filename), MODE (citation / bibliography), RESULT (test result), ..., as well as parsed_res (RESULT parsed as HTML into a table; it is actually parsed as XML, which works here) and flattened_res (a flattened list of HTML element information).
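
For readers who would rather stay in Rust, the section splitting done by parse-citeproc-test can be sketched with the standard library alone. This is only an approximation of the regex above: the function names are hypothetical, markers are expected on their own lines, and the marker check is slightly more lenient than the regex.

```rust
use std::collections::HashMap;

/// Returns the section name if `line` looks like `>>===== NAME =====>>`.
fn section_start(line: &str) -> Option<&str> {
    let inner = line.trim().strip_prefix(">>")?.strip_suffix(">>")?;
    let name = inner.trim_matches('=').trim();
    if name.is_empty() { None } else { Some(name) }
}

/// Returns true if `line` looks like `<<===== NAME =====<<`.
fn is_section_end(line: &str) -> bool {
    let line = line.trim();
    line.starts_with("<<=") && line.ends_with("=<<")
}

/// Splits a citeproc test file into a map from section name to section body.
fn parse_sections(input: &str) -> HashMap<String, String> {
    let mut sections = HashMap::new();
    // Name and collected body lines of the section currently being read.
    let mut current: Option<(String, Vec<&str>)> = None;
    for line in input.lines() {
        match current.take() {
            // Outside a section: only a start marker is of interest.
            None => {
                if let Some(name) = section_start(line) {
                    current = Some((name.to_string(), Vec::new()));
                }
            }
            // Inside a section and the end marker was reached: store the body.
            Some((name, body)) if is_section_end(line) => {
                sections.insert(name, body.join("\n"));
            }
            // Inside a section: keep collecting body lines.
            Some((name, mut body)) => {
                body.push(line);
                current = Some((name, body));
            }
        }
    }
    sections
}
```

Calling `parse_sections` on the contents of a test file yields entries such as `MODE` and `RESULT`, mirroring the columns of the Nushell record.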

The command below provides the list of HTML elements used:

$parsed_bibs.flattened_res | flatten | filter { $in.tag != null } | uniq
Output

╭────┬──────┬──────────────────────────────────────┬──────╮
│  # │ tag  │              attributes              │ text │
├────┼──────┼──────────────────────────────────────┼──────┤
│  0 │ div  │ ╭───────┬──────────────╮             │      │
│    │      │ │ class │ csl-bib-body │             │      │
│    │      │ ╰───────┴──────────────╯             │      │
│  1 │ div  │ ╭───────┬───────────╮                │      │
│    │      │ │ class │ csl-entry │                │      │
│    │      │ ╰───────┴───────────╯                │      │
│  2 │ i    │ {record 0 fields}                    │      │
│  3 │ span │ ╭───────┬──────────────────────────╮ │      │
│    │      │ │ style │ font-variant:small-caps; │ │      │
│    │      │ ╰───────┴──────────────────────────╯ │      │
│  4 │ div  │ ╭───────┬──────────────────╮         │      │
│    │      │ │ class │ csl-right-inline │         │      │
│    │      │ ╰───────┴──────────────────╯         │      │
│  5 │ b    │ {record 0 fields}                    │      │
│  6 │ div  │ ╭───────┬─────────────────╮          │      │
│    │      │ │ class │ csl-left-margin │          │      │
│    │      │ ╰───────┴─────────────────╯          │      │
│  7 │ div  │ ╭───────┬───────────╮                │      │
│    │      │ │ class │ csl-block │                │      │
│    │      │ ╰───────┴───────────╯                │      │
│  8 │ sup  │ {record 0 fields}                    │      │
│  9 │ span │ ╭───────┬──────────╮                 │      │
│    │      │ │ style │ baseline │                 │      │
│    │      │ ╰───────┴──────────╯                 │      │
│ 10 │ div  │ ╭───────┬────────────╮               │      │
│    │      │ │ class │ csl-indent │               │      │
│    │      │ ╰───────┴────────────╯               │      │
│ 11 │ span │ ╭───────┬────────────────────╮       │      │
│    │      │ │ style │ font-style:normal; │       │      │
│    │      │ ╰───────┴────────────────────╯       │      │
│ 12 │ span │ ╭───────┬──────────────────────╮     │      │
│    │      │ │ style │ font-variant:normal; │     │      │
│    │      │ ╰───────┴──────────────────────╯     │      │
│ 13 │ sub  │ {record 0 fields}                    │      │
╰────┴──────┴──────────────────────────────────────┴──────╯

TODO

  • Fix some bugs and differences on some failing tests (e.g. no formatting when the text is just whitespace...)
    • Actually, this formatting difference (from the sort_VariousNameMacros1.txt test) is probably a bug in hayagriva and best fixed separately. I think the problem is that formatting is being applied to affixes when it shouldn't.
  • Share some of the scripts and commands I used perhaps

@PgBiel PgBiel marked this pull request as ready for review February 4, 2025 23:59
@PgBiel PgBiel merged commit 1058be7 into main Feb 5, 2025
4 checks passed
@PgBiel PgBiel deleted the bib-citeproc-tests branch February 5, 2025 01:09
Successfully merging this pull request may close these issues.

Support for Citeproc bibliography tests
3 participants