Liblouis table for Dutch without capsign #9

Closed
Tracked by #6
bertfrees opened this issue Jun 1, 2015 · 17 comments
@bertfrees
Member

No description provided.

@bertfrees bertfrees added this to the dutch (1) milestone Jun 1, 2015
@bertfrees bertfrees mentioned this issue Jun 1, 2015
7 tasks
@dkager

dkager commented Jun 2, 2015

In addition to capsign, the begcaps, endcaps and emphasis signs should also be suppressed.
Ideally what I'd like to do is include nl-NL-g1.ctb and undo the cap/emphasis signs. Can you "unset" opcodes? Or use pass2 to pick up the capsign dots and remove them? But that sounds dangerous.

@bertfrees
Member Author

No you can't unset. Just move all the things you don't want to a separate table, include it in the main table, and don't include it in the alternative table.
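To illustrate the idea of splitting off the unwanted rules, here is a minimal Python model of include resolution. It is only a sketch: the file names and rule lines below are hypothetical placeholders, not the real Dutch tables, and the resolver is a toy, not liblouis's own table compiler.

```python
# Hypothetical table files, modelled as lists of lines. The file names and
# rule lines are illustrative only, not the real Dutch tables.
TABLES = {
    "base.uti": ["letter a 1", "letter b 12"],
    "caps.cti": ["capsign 46"],
    "main.utb": ["include base.uti", "include caps.cti"],
    "main-nocaps.utb": ["include base.uti"],
}


def resolve(name, tables=TABLES):
    """Return the flat list of rules for a table, expanding include lines."""
    rules = []
    for line in tables[name]:
        if line.startswith("include "):
            # Recursively splice in the rules of the included file.
            rules.extend(resolve(line.split(None, 1)[1], tables))
        else:
            rules.append(line)
    return rules


print(resolve("main.utb"))         # all rules, including the caps module
print(resolve("main-nocaps.utb"))  # same base rules, caps module left out
```

In this model the main table resolves to the same rules it had before the split, while the alternative table simply never sees the caps module.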

@dkager

dkager commented Jun 2, 2015

This implies that for the stock nl-NL table we'll have to split off caps and emphasis signs into their own file. Is that acceptable? I'm reluctant to split off the entire nl-NL table into a package specific to Dedicon because of the extra maintenance that will cause. IMO standard printing should be done with the standard (stock) table, and only exceptions should be split off into a different package.

@bertfrees
Member Author

Sounds acceptable to me. Splitting the table into modules doesn't affect the behavior of the "root" table. I think we can even include this table without capsign in the official liblouis distribution, in addition to the main table. Sounds like something generally useful.

@bertfrees
Member Author

Then we'd have 3 new tables: nl-caps.cti, nl-g0-nocaps.utb and nl-NL-g0-nocaps.utb.

@dkager

dkager commented Jun 2, 2015

Sounds good. So we'll also rename to g0.utb? For 3rd parties using liblouis it's probably best to do all the file renames at once so they don't have to update their products all the time.

@bertfrees
Member Author

I have already renamed to g0. See commit c224f06.

@dkager

dkager commented Jun 5, 2015

I wanted an ordered list of possible solutions to this problem, so here goes:

  1. Add two translationModes: noCaps and noEmphasis, that will cause liblouis to ignore the relevant opcodes. This is probably the least error-prone and most general-purpose if done right.
  2. Use the swapping trick as in da-dk-nocaps.uti. I don't like this because you have to make sure you include every capital letter in Unicode in order to guarantee it works for all inputs.
  3. Refactor the Dutch tables into yet more files in the hopes that they will work again and keep working across upgrades. This means we end up with about eight files for two tables. It is also very fragile and the resulting "includable files" can't actually freely be included because of class order. So one important secondary requirement is better handling of class names, i.e. allow for a disjunction of a user-defined class with a built-in class such as %myclass|$l.
  4. See if the patches by @MikeGray-APH improve the situation. This is a bit of a blank for me still.
  5. Because the simplest solutions are often the best: make sure the input has no capitals and emphasis. Or, implement solution 1 except instead of ignoring opcodes just transform the input string to lowercase before processing it and set the typeforms to 0. This is really a shift of responsibilities from liblouis to 3rd party code.

My preferred solution is 5.
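A rough sketch of what solution 5 could look like on the caller's side, in Python for brevity. The function name is hypothetical, and the typeform representation (a list of integers with 0 meaning "no emphasis") is an assumption for illustration; the real caller would pass the results on to whatever translation API it uses.

```python
def suppress_caps_and_emphasis(text):
    """Lowercase the text and return it with an all-zero typeform list."""
    lowered = text.lower()
    # One typeform entry per character, 0 meaning "no emphasis". The length
    # is taken from the lowered string because lower() can change the length
    # for a few characters (for example U+0130).
    typeforms = [0] * len(lowered)
    return lowered, typeforms


text, typeforms = suppress_caps_and_emphasis("Hallo Wereld")
print(text)
print(typeforms)
```

Because the input no longer contains capitals or emphasis, the stock table can be used unchanged; the cost is that the preprocessing lives outside the table.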

@bertfrees
Member Author

Thanks for the analysis. Some remarks:

  2. Could you elaborate on what that trick is?
  3. Having a lot of module files isn't necessarily a problem, I think. The fragility of $wl is a problem of course, but we should treat it as a separate issue.
  5. Yes, simple solutions are often the best, and your proposed solution would indeed be easy to implement. If you prefer this I'm OK with it; at least we can do it this way until a better solution comes up.

I like to think of liblouis tables (or possibly the combination table/mode) as complete and portable representations of a braille code. So I like to avoid pre-processing as much as possible. That brings me to my preferred solution 3 (or 1).

@dkager

dkager commented Jun 8, 2015

  2. That table appears to use swapcc, i.e. it swaps ABCDE for abcde, and then uses correct rules to apply the swap. Semantically this isn't intuitive: you aren't correcting the original text; you are given some text and choose to represent it in a certain way.
  3. Can you explain why we're only getting 66% success on dkager_dutch_ueb? I haven't looked at which rules these tests apply but I think the failure is due to the order of including tables.
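The concern raised earlier about the swap approach is that every capital letter has to be listed explicitly. As a rough indication of how many that is, the following Python sketch counts the code points that the interpreter's Unicode database classifies as uppercase or titlecase letters. The exact number depends on the Unicode version shipped with Python, so no figure is quoted here.

```python
import sys
import unicodedata


def capital_letters():
    """Yield every code point whose general category is Lu or Lt."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in ("Lu", "Lt"):
            yield ch


capitals = list(capital_letters())
# Far more than the 26 ASCII capitals; the exact count varies by version.
print(len(capitals))
```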

I feel (1) is the cleanest solution because (2) is "semantically unpleasant", (4) isn't really a solution in and of itself, and (5) alters input text which is the same problem as (2). My second preferred option is (3), but maybe the concept of a "table without caps" isn't very useful to most people. Plus, (1) should work for any table. I just hope it won't get too hacky dealing with all these different opcodes.

Regardless of how we implement this, I was wondering how/where to configure this. Could be a configuration parameter or (more flexible) a braille CSS element. I.e. on the cover of children's books we use caps but inside we don't, so some granularity would be good. Is this integrated yet/should I look for a sample document (which also uses double line-spacing)?

@dkager

dkager commented Jun 8, 2015

OK so tests are likely failing due to @MikeGray-APH's patches. That having been said, I still really like solution (1) so am going to do at least a casual code review to see how hard this would be to do.

@bertfrees
Member Author

I was going to ask you about the configuration granularity. First I thought it was going to be a top-level setting, i.e. either caps everywhere or caps nowhere. In that case the most appropriate solution would have been to add e.g. (nocaps) to the "transformer query" (see Braille modules design document for what that means). The transformer could then just pass on the nocaps feature to lou_findTable (solution 2/3), it could change the "nocaps" mode bit in case of solution (1), or it could change the input text to lowercase (5).

Things change slightly if configuration is required at a finer level, because then we need Braille CSS. The sections where no caps should be used would get a text-transform:nocaps (or text-transform:lowercase) property. It is then up to the transformer to decide what to do with this property. It could delegate to a "sub-transformer" with the (nocaps) feature. Or it can -- more directly -- convert fragments of the input to lowercase (solution 5).
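A minimal sketch of the fragment-level variant, in Python for brevity. The data model (a list of text/style pairs) and the function name are assumptions made for illustration; they do not reflect how the pipeline actually represents styled text.

```python
def apply_text_transform(fragments):
    """Lowercase only the fragments styled with text-transform: lowercase."""
    result = []
    for text, style in fragments:
        if style.get("text-transform") == "lowercase":
            text = text.lower()
        result.append(text)
    return result


# Hypothetical input: caps are kept on the cover but suppressed in the body.
cover_and_body = [
    ("De Kleine Prins", {}),
    ("Er was eens een Prins.", {"text-transform": "lowercase"}),
]
print(apply_text_transform(cover_and_body))
```

Fragments without the property pass through untouched, which gives the per-section granularity described above.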

All options are still open in both cases, but this background info might change your view a bit.

Go ahead if you want to explore solution 1. Remember that we can just do what's easiest to implement, and record issues for things that can possibly be improved later.

@dkager

dkager commented Jun 8, 2015

I think text-transform (with value lowercase) is the right one. This does make solution (5) more appropriate. I'll ask around to find out how much fine-grained control we want/need.

Is this handled in Java or in XML/XSLT? A simple String.toLowerCase() would definitely be quicker than a change in liblouis. That having been said, I do see potential value in having the flags as per (1). For instance, my current screen reader has a "suppress capital signs" option that may be useful to beginner readers. So maybe a feature request for upstream, outside of this project?

We currently have (3) implemented in dkager_dutch, so I'm tempted to stick with that. However:

  • The Dutch tables are quite cluttered right now and therefore not so easy to maintain.
  • The patches by @MikeGray-APH break...something, see dkager_dutch_ueb. It would be easier to verify what's going on if we didn't have 3 different include files to worry about.
  • We'll have to add a configuration parameter or braille CSS property anyway, so might as well go all the way and implement (5).

Thoughts? I can update the CSS spec, and if the implementation is mostly in Java I can also look at that. Merging the tables back together isn't a big deal either, so that shouldn't be a reason to hold off on implementing (5).

@bertfrees
Member Author

OK, do what you think is the right thing to do. The text-transform:lowercase can be handled in either Java or XSLT, but I would prefer Java.

I will take care of updating the Braille CSS spec, I have been preparing to do that anyway. The text-transform property is already largely implemented. See also braillespecs/braille-css#12 and daisy/pipeline-mod-braille#23.

@dkager

dkager commented Jun 15, 2015

Seems that joining the nocaps tables back together was a good idea as now Mike's patches don't cause issues anymore. I can use that branch for further development, but we'll have to decide if/when this makes its way into liblouis (upstream) and when this will be used for pipeline-mod-braille. I'll spend some time on it today unless something else comes up.

@dkager

dkager commented Jun 18, 2015

This is now implemented in liblouis-core (aef6204). Can we tick this issue off, or is there more to do?

@bertfrees
Member Author

Yes, this is done. Thanks Davy. Will close as soon as I merge the branch.
