Handling process flags in OBFL #27

Open
kalaspuffar opened this issue Jul 3, 2020 · 9 comments

@kalaspuffar

kalaspuffar commented Jul 3, 2020

Hi @bertfrees and @PaulRambags

As we discussed in our meeting, I propose we add some metadata to carry specific processing instructions that today can only be passed as parameters to the tooling.

All parameters in the metadata definition below that use the prefix proc (processing instructions) can be found in https://github.com/mtmse/dotify.api/blob/master/src/org/daisy/dotify/api/formatter/FormatterConfiguration.java

<meta xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:proc="http://www.daisy.org/ns/2020/dotify/processing-instruction">
    <dc:title>Competitive Marketing via Subtle Messaging</dc:title>
    <dc:creator>Persson, Daniel</dc:creator>
    <dc:language>sv</dc:language>
    <dc:publisher>MTM</dc:publisher>
    <dc:source>DTB31337</dc:source>
    <proc:translationMode>uncontracted</proc:translationMode>
    <proc:locale>sv-SE</proc:locale>
    <proc:allowsTextOverflowTrimming>false</proc:allowsTextOverflowTrimming>
    <proc:allowsEndingVolumeOnHyphen>true</proc:allowsEndingVolumeOnHyphen>
    <proc:hyphenating>true</proc:hyphenating>
    <proc:markCapitalLetters>true</proc:markCapitalLetters>
    <proc:ignoredStyles>em,strong</proc:ignoredStyles>
</meta>

In the current implementation, only the mode, locale, markCapitalLetters, and ignoredStyles are set by Studio in code; they are configurable by the user and available in different output templates.

This change ensures that the dotify library can produce a PEF document at any later time using only the OBFL as input, which makes the OBFL document more useful.

My suggestion for the later implementation is to treat these values as defaults, with the option to override them at execution time if you want to change the output after the OBFL document has been created.
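The "metadata as defaults, parameters as overrides" idea can be sketched roughly as follows. This is only an illustration in Python, not Dotify code: the function names `read_proc_defaults` and `effective_settings` are hypothetical, the namespace URI is the one from the example above, and the override value is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the metadata example in this issue.
PROC_NS = "http://www.daisy.org/ns/2020/dotify/processing-instruction"

# Shortened copy of the proposed metadata block.
SAMPLE_META = """
<meta xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:proc="http://www.daisy.org/ns/2020/dotify/processing-instruction">
    <dc:language>sv</dc:language>
    <proc:translationMode>uncontracted</proc:translationMode>
    <proc:locale>sv-SE</proc:locale>
    <proc:hyphenating>true</proc:hyphenating>
</meta>
"""


def read_proc_defaults(meta_xml):
    """Return {local name: text} for every direct child in the proc namespace."""
    root = ET.fromstring(meta_xml)
    prefix = "{" + PROC_NS + "}"
    return {
        child.tag[len(prefix):]: (child.text or "").strip()
        for child in root
        if child.tag.startswith(prefix)
    }


def effective_settings(meta_xml, overrides=None):
    """Start from the values stored in the OBFL, then apply run-time overrides."""
    settings = read_proc_defaults(meta_xml)
    settings.update(overrides or {})
    return settings


# Values stored in the document are used when nothing is overridden.
defaults = effective_settings(SAMPLE_META)

# "contracted" is an illustrative override value, not a confirmed Dotify mode name.
overridden = effective_settings(SAMPLE_META, {"translationMode": "contracted"})
```

Elements outside the proc namespace, such as dc:language, are ignored by this sketch, so ordinary Dublin Core metadata is left alone.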

Best regards
Daniel

kalaspuffar added the enhancement (New feature or request) label on Jul 3, 2020
@PaulRambags

Just some quick remarks:

  • The proc prefix needs to be defined as a namespace
  • Please explain what these processing instructions do. For instance, what is the meaning of proc:locale?

In the past I have been testing with the maximum number of iterations, which is currently a fixed value. This would also make a nice processing instruction: proc:maxNumberOfIterations

@bertfrees

bertfrees commented Jul 3, 2020

I see, I thought the idea was more to standardize these options in OBFL itself. The change that you propose here could be a good intermediary solution. We don't even need to change OBFL for it. You can already include any metadata you want. And if you make sure "Dotify" is in the "proc" namespace URI, it is immediately obvious what the metadata is for.

@kalaspuffar

And if you make sure "Dotify" is in the "proc" namespace URI, it is immediately obvious what the metadata is for.

Hi @bertfrees

I'm sorry but I don't really understand this paragraph. Can you elaborate?

Best regards
Daniel

@bertfrees

You have not specified what the namespace URI of these metadata elements will be. I'm just saying that whatever you choose, you should make sure it has the word "Dotify" in it somewhere. This makes it immediately clear that they are Dotify settings.

@kalaspuffar

Please explain what these processing instructions do. For instance, what is the meaning of proc:locale?

Hi @PaulRambags

The locale is the locale of the book. For instance, this book is written in Swedish for readers in Sweden (sv-SE); there is a different locale for Swedish for readers in Finland.

When you run a book through the Dotify library, you need to give it the language the book should be processed in and whether the output should be contracted in any way. Those two parameters are mandatory; all others are optional.

The language in the locale is used to select the correct translation table.

The difference between the book language and the processing locale is that the book could be in a different language but still use the translation table for Swedish.
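A small illustration of the distinction, under the assumption that the translation table is looked up by the primary language subtag of the processing locale. How Dotify actually maps a locale to a table is not shown in this thread, and `language_of` is a hypothetical helper.

```python
def language_of(locale_tag):
    """Return the primary language subtag of a tag such as 'sv-SE' or 'sv_SE'."""
    return locale_tag.replace("_", "-").split("-")[0].lower()


# Hypothetical book: the content is English (dc:language), but it is
# processed with the Swedish translation table (proc:locale).
book_language = "en"
processing_locale = "sv-SE"

# Only the processing locale is consulted when choosing the table language.
table_language = language_of(processing_locale)
```

Under this assumption, Swedish for Sweden and Swedish for Finland share the language part "sv" and differ only in the region part.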

Best regards
Daniel

@kalaspuffar

kalaspuffar commented Jul 3, 2020

Hi @bertfrees

What do you think about:

<meta xmlns:proc="http://www.daisy.org/ns/2020/dotify/processing-instruction">

Best regards
Daniel

@bertfrees

bertfrees commented Jul 3, 2020

Please explain what these processing instructions do

@PaulRambags I think it's clear that these metadata elements map one to one to Dotify settings. So as far as I'm concerned they do not need to be explained, at least not in the OBFL specification. If the settings are not documented anywhere, you could consider creating a web page specifically for them. Maybe there is already something in https://mtmse.github.io/dotify.api/latest/javadoc/index.html, or perhaps on the wiki?

@kalaspuffar I'm not sure about the "processing-instruction" because in XML land that means something else. Come to think of it, using processing instructions could actually also be a solution to this problem. But since the meta element is available in OBFL, it's probably more appropriate to use that.

The form of the URI you chose is common for XML namespaces, but not for vocabularies. These are two different things. For one, I would drop the "ns". For metadata one usually doesn't use the term "namespace". I would also drop the year because it doesn't mean anything in this case. Also, I've looked at some other existing URIs and they always seem to end with / or # or /#. This is probably related to how RDF works. I think it's a good idea to follow that pattern.

Finally, it's also not so common to use namespaced elements to specify metadata (dc is an exception). It's more common to have something like <meta name="proc:locale" content="sv-SE"/> (HTML) or <meta property="proc:locale">sv-SE</meta> (RDFa) or <meta property="proc:locale" content="sv-SE" /> (also RDFa). However, the issue with this is that OBFL's meta element is what most other formats call metadata or head: it is the container for the metadata entries, not a single entry.
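To show that the three attribute-based forms carry the same information, here is a hedged Python sketch that normalises each of them to a (key, value) pair. The helper `read_meta_entry` is hypothetical, and the sketch treats "proc:locale" as an opaque string rather than resolving the prefix against a vocabulary URI.

```python
import xml.etree.ElementTree as ET


def read_meta_entry(element):
    """Normalise one metadata entry to (key, value).

    The key comes from 'property' (RDFa style) or 'name' (HTML style);
    the value comes from 'content' if present, otherwise from the text.
    """
    key = element.get("property") or element.get("name")
    value = element.get("content")
    if value is None:
        value = (element.text or "").strip()
    return key, value


# The three forms quoted in the comment above.
FORMS = [
    '<meta name="proc:locale" content="sv-SE"/>',
    '<meta property="proc:locale">sv-SE</meta>',
    '<meta property="proc:locale" content="sv-SE" />',
]

entries = [read_meta_entry(ET.fromstring(form)) for form in FORMS]
```

All three forms yield the same pair, so the choice between them is mostly a question of which convention fits OBFL's existing meta container best.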

@bertfrees

bertfrees commented Jul 10, 2020

Including Dotify instructions in OBFL is a good short-term solution. However, what we are trying to solve here is (partly) an important limitation of OBFL which I think should be properly addressed at some point, namely that it does not provide sufficiently detailed control over translation. Whether this needs to be solved within OBFL itself is not clear yet, but it probably should be.

There is a disclaimer in the text that says that OBFL is for controlling formatting (the F in OBFL), not translation:

Scope Limitations

OBFL is a braille formatting language. This includes areas specific to braille formatting, such as volume splitting. However, there are many other issues in braille production that OBFL neither can nor should solve. For example, issues involving controlling text to braille translations (except by means of formatting related abstractions).

This last "except" is an important nuance. Indeed, OBFL already has the style element and the translate attribute, which do not fully determine the braille translation, but have an influence on it.

These features are great; however, I think there are some issues with them:

  • OBFL does not provide a vocabulary for the style attribute, and if you want to use braille CSS as the standard, like I do, there is no way to specify that in OBFL. See "Provide a way to specify the syntax/vocabulary of the styles used in the OBFL" (braillespecs/obfl#84).
  • OBFL does not provide many options for the translate attribute. Dotify handles many more values, but this is not reflected in the OBFL spec yet. For Pipeline I currently still use my own syntax. See "Update allowed values for translate" (braillespecs/obfl#83). An idea that came up before is to allow specifying an IDREF as the value of translate, which would reference a translator previously defined in XML. This comes down to the same thing as specifying the translator in the attribute, but it is more XML-y, easier to validate, and allows you to define a translator once and use it many times.
  • The specification discourages setting the translate attribute on the obfl element. I think we should reconsider this. Since the xml:lang and translate attributes are used to select different translators to process different portions of the document (at least that's how I understand it; it's somewhat unclear, see "Not clear which translator is used in which cases" (brailleapps/dotify.formatter.impl#116)), I don't see why specifying these on the root element would be less justified than specifying them on other elements. It would be a different situation if the main translator were selected through Dotify options and sub-translators were queried from this main translator on encountering xml:lang and translate attributes (but this is not how it works).
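The IDREF idea from the second bullet could look roughly like the sketch below. The markup is entirely hypothetical: OBFL defines neither a `translator` element nor IDREF values for translate, and the attribute names `locale` and `mode` are invented for the illustration. The point is only that a translator is defined once, referenced many times, and that dangling references are easy to detect.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: 'translator' elements and IDREF-valued 'translate'
# attributes are NOT part of the OBFL specification.
SAMPLE = """
<obfl>
  <translator id="sv-g1" locale="sv-SE" mode="uncontracted"/>
  <block translate="sv-g1">Hej</block>
  <block translate="sv-g1">Hej igen</block>
  <block translate="missing">?</block>
</obfl>
"""


def resolve_translators(obfl_xml):
    """Map each translate reference to its definition; collect unknown ids."""
    root = ET.fromstring(obfl_xml)
    # One definition per id, declared once and shared by all referrers.
    definitions = {t.get("id"): dict(t.attrib) for t in root.iter("translator")}
    resolved, dangling = [], []
    for element in root.iter():
        ref = element.get("translate")
        if ref is None:
            continue
        if ref in definitions:
            resolved.append((element.text, definitions[ref]))
        else:
            dangling.append(ref)
    return resolved, dangling


resolved, dangling = resolve_translators(SAMPLE)
```

In a real schema the dangling check would come for free from ID/IDREF validation, which is the "easier to validate" advantage mentioned above.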

@kalaspuffar

Hi @bertfrees and @PaulRambags

After reviewing the code, I realized that hyphenation is handled in a totally different way, one that lets it be controlled throughout the document. I think this is a good solution, and very similar to the way mark-capital-letters should work, as they both change the output text in different ways. So I've opened some PRs for this, covering additions to the OBFL specification, the API, and the formatter.

#28
mtmse/dotify.api#14
mtmse/dotify.formatter.impl#33

I hope this is a good solution: it is in line with earlier implementations, and handling this extra parameter required only a small change to the code.

Best regards
Daniel
