Pretty-print all String-like formats as strings, rather than sequences of characters #88

Open
archaephyrryx opened this issue Nov 2, 2023 · 2 comments

archaephyrryx commented Nov 2, 2023

Currently, the ASCII string types we want to display as ASCII text are:

  • Any compound Format whose leaf-values are all base.ascii-char or base.ascii-char.* (glob, not regex)
    • (With the following exceptions: ???)
  • "base.asciiz-string" :=
    record([("string", repeat(not_byte(0x00))), ("null", is_byte(0x00))])
  • "tar.ascii-string" :=
    record([("string", repeat(not_byte(0x00))), ("padding", repeat1(is_byte(0x00)))])
  • "tar.ascii-string.opt0" :=
    record([("string", repeat(not_byte(0x00))), ("padding", repeat(is_byte(0x00)))])
  • "tar.ascii-string.opt0.nonempty" :=
    record([("string", repeat1(not_byte(0x00))), ("padding", repeat(is_byte(0x00)))])
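As a rough illustration of the desired output for the record shapes listed above, here is a minimal sketch that renders the "string" field as a quoted literal and ignores the trailing "null" or "padding" field. The `Value` enum and both function names are hypothetical stand-ins, not the project's actual types, and the ASCII-only check is an assumption (the definitions above only exclude 0x00).

```rust
/// Hypothetical, simplified stand-in for a parsed value tree.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    U8(u8),
    Seq(Vec<Value>),
    Record(Vec<(String, Value)>),
}

/// View a value as a flat run of bytes, if it is a sequence of `U8` leaves.
fn as_bytes(v: &Value) -> Option<Vec<u8>> {
    match v {
        Value::Seq(items) => items
            .iter()
            .map(|item| match item {
                Value::U8(b) => Some(*b),
                _ => None,
            })
            .collect(),
        _ => None,
    }
}

/// Render a record that has a "string" field as a quoted, escaped string.
/// Returns `None` when the value does not have that shape, so the caller
/// can fall back to the ordinary sequence display.
fn render_string_record(v: &Value) -> Option<String> {
    let fields = match v {
        Value::Record(fields) => fields,
        _ => return None,
    };
    let (_, payload) = fields.iter().find(|(name, _)| name == "string")?;
    let bytes = as_bytes(payload)?;
    // Assumption: only pure-ASCII payloads get string treatment.
    if !bytes.is_ascii() {
        return None;
    }
    let mut out = String::from("\"");
    for b in bytes {
        // Escapes quotes, backslashes, and non-printable bytes.
        out.extend(std::ascii::escape_default(b).map(char::from));
    }
    out.push('"');
    Some(out)
}
```

Returning `Option` keeps the string display strictly opt-in: anything that does not match the shape is still printed the way it is today.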

Possible classifications

  • By direct properties of leaf elements
  • By name of interior elements (e.g. Seq, Repeat, Tuple, or Record whose non-implicit elements are all base.ascii-char(.*)?)
  • By name pattern-match (e.g. /ascii/ && /string/ or /ascii-string/ || /asciiz-string/)
  • By structural pattern match (e.g. simply-nested or top-level record with a field named "string" satisfying certain ascii-like properties)
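The two name-based options above could look roughly like this. It is only a sketch: both function names are made up for illustration, and the checks use plain string operations rather than a real glob or regex engine.

```rust
/// Glob-style check for `base.ascii-char` or `base.ascii-char.*`:
/// the name is either exactly the prefix, or the prefix followed by
/// a '.' and anything else.
fn is_ascii_char_name(name: &str) -> bool {
    match name.strip_prefix("base.ascii-char") {
        Some(rest) => rest.is_empty() || rest.starts_with('.'),
        None => false,
    }
}

/// Name pattern-match in the spirit of /ascii-string/ || /asciiz-string/.
fn is_ascii_string_name(name: &str) -> bool {
    name.contains("ascii-string") || name.contains("asciiz-string")
}
```

For example, `is_ascii_char_name("base.ascii-charset")` is false because the remainder does not start with a dot, while `is_ascii_string_name` accepts all four named formats listed earlier but not `utf8.string`.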

Possible workarounds:

  • Move format-specific string types into base and give them the same kind of name prefixing that base.ascii-char and its derivatives use. This may require renaming base.asciiz-string to base.ascii-string.cstr or similar.

EDIT:

This should extend to Unicode strings as well, since utf8.string = repeat utf8.char is currently displayed as a sequence of characters rather than in string form. The current output may be good enough for now, but it will matter more if text formats are ever unified (i.e. ASCII treated as a subset of UTF-8).

mikeday commented Nov 2, 2023

I'd like to see how far we can get by leveraging ascii-char and building up from there. The only complicated part is that some strings require a nul terminator, some have an optional nul terminator, and some have a fixed width and may include arbitrary nul characters. Perhaps ascii-char, ascii-nul, and ascii-non-nul would be sufficient?
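To make the three cases concrete, here is a minimal sketch of how they might be told apart when slicing the text out of raw bytes. `NulPolicy` and `take_string` are hypothetical names, and restricting the text to ASCII is an assumption rather than something settled in this thread.

```rust
/// The three terminator disciplines described above (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum NulPolicy {
    /// Non-nul bytes followed by a mandatory nul.
    Required,
    /// Non-nul bytes followed by a nul, or by end of input.
    Optional,
    /// Exactly N bytes; nul bytes may appear anywhere inside.
    FixedWidth(usize),
}

/// Returns the text bytes and the number of input bytes consumed,
/// or `None` if the input does not match the policy.
fn take_string(input: &[u8], policy: NulPolicy) -> Option<(&[u8], usize)> {
    match policy {
        NulPolicy::FixedWidth(n) => {
            // Fails if fewer than `n` bytes are available.
            let field = input.get(..n)?;
            field.is_ascii().then_some((field, n))
        }
        NulPolicy::Required | NulPolicy::Optional => {
            // Text runs up to the first nul, or to the end if there is none.
            let len = input.iter().position(|&b| b == 0).unwrap_or(input.len());
            let text = &input[..len];
            if !text.is_ascii() {
                return None;
            }
            match input.get(len) {
                // A nul was found: consume it along with the text.
                Some(_) => Some((text, len + 1)),
                // No nul: acceptable only when the terminator is optional.
                None if policy == NulPolicy::Optional => Some((text, len)),
                None => None,
            }
        }
    }
}
```

With this shape, `take_string(b"abc", NulPolicy::Required)` is rejected while the same input under `NulPolicy::Optional` is accepted, and a fixed-width field keeps any embedded nul bytes as part of the text.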

archaephyrryx changed the title from "Handle all ASCII-String-like formats as first-class ASCII strings" to "Pretty-print all String-like formats as strings, rather than sequences of characters" on Nov 8, 2023
@archaephyrryx

Now that we have tentative support for UTF-8 based on #99, we might consider broadening the criteria for 'string-treatment' a bit.
