backend feature compile can be a long pole #456

Open · rsheeter opened this issue Sep 25, 2023 · 7 comments

@rsheeter (Contributor) commented Sep 25, 2023

#443 reveals features can be the long pole. There are two non-exclusive options to improve this:

  1. Make it faster
    • We write fea for kerning (and soon marks) for fea-rs to parse; we could instead build some other form, e.g. the AST, directly
  2. Make it parallel
    • The more cores you have, the more fea-rs stands out as the long pole (ref #443, "Add a visualization of thread activity")
    • Could we break up the compile, for example processing GSUB and GPOS separately? (a rough sketch follows this list)
      • Requires splitting up things like `feature foo { pos a b -5; sub x by y; }` (courtesy of @cmyr)
      • Some parts, e.g. classes, are common to both
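
To make the split concrete, here is a minimal sketch of partitioning one mixed feature into GPOS-only and GSUB-only halves so the two tables could be compiled on separate threads. The `Rule` and `Feature` types are hypothetical stand-ins, not fea-rs's actual AST:

```rust
/// Hypothetical rule representation; fea-rs's real AST is richer.
enum Rule {
    Gpos(String), // e.g. "pos a b -5"
    Gsub(String), // e.g. "sub x by y"
}

struct Feature {
    tag: [u8; 4],
    rules: Vec<Rule>,
}

/// Split one mixed feature into a GPOS-only and a GSUB-only copy so
/// the two tables' compiles can proceed independently. Shared inputs
/// (e.g. glyph classes) would still be resolved once, up front.
fn split_feature(feature: Feature) -> (Feature, Feature) {
    let (gpos, gsub): (Vec<Rule>, Vec<Rule>) = feature
        .rules
        .into_iter()
        .partition(|rule| matches!(rule, Rule::Gpos(_)));
    (
        Feature { tag: feature.tag, rules: gpos },
        Feature { tag: feature.tag, rules: gsub },
    )
}
```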

We should start by profiling. Flamegraphs of the overall build are fairly noisy; perhaps we can rig fea-rs to run on just the debug feature file?

$ rm -rf build/ && cargo run -p fontc --  --source ../OswaldFont/sources/Oswald.glyphs --emit-debug
# produces build/debug/features.fea which is the complete combined feature file
@anthrotype (Member) commented:

May I also suggest:

  1. Start it sooner rather than later

In #443 you wrote:

We can see any speedup to glyph order production is a big win as it's delaying everything.

Since auto-FEA generation and FEA compilation are blocked on glyph order, we might also try to speed up glyph order itself so that they get an early start?

@rsheeter (Contributor, Author) commented:

Since auto-FEA generation and FEA compilation are blocked on glyph order, we might also try to speed up glyph order itself so that they get an early start?

+1, but it looks to me like fea will still be a problem on a machine with lots of cores. #443 (comment) has a plot from a 72-way system that seems to show this.

@rsheeter (Contributor, Author) commented:

#458 for glyph order

@cmyr (Member) commented Oct 12, 2023

I've dipped into this a little bit and have some useful preliminary findings, based on profiling the compilation of Google Sans (running fea-rs directly, using as input the FEA we generate internally, dumped with --emit-debug):

  • ~20-30% of our time is spent parsing.
    • There may be some room for small optimizations here by reducing the number of nodes (in which we group tokens) in certain cases (specifically, I'm thinking about parsing GPOS ValueRecords)
    • In cases where features are split across multiple files, we may be able to parse multiple files in parallel, which might offer significant improvements. A simple sketch of how I think this might work (see the code sketch after this list):
      • use a queue to store 'sources to parse'
      • when a source is loaded, in addition to going onto the work queue it also gets a cheap strcmp-style scan for include statements
      • parse the files separately, and then stitch them together at the end.
    • There are other options; for instance it would be simpler/faster if the client could just provide a list of relevant feature files ahead of time?
    • If the bottleneck is parsing a single large file we could also look into breaking the input into chunks and parsing those in parallel.
  • ~20% of our time is spent in fea-rs's compilation pass. Google Sans doesn't make heavy use of GSUB, so there would be no gains here from splitting our implementation, although this might be relevant for other fonts?
  • ~10% of our time is spent dropping Arcs!? This needs some further investigation.
  • ~8% on validation
  • ~20% on actually compiling to binary, with the vast majority of that time spent in table packing.
    • I think there is also room for some wins in our table packing impl. In particular, I've noticed that when we split subtables we always then spend a bunch of time deciding which of them should be promoted to extension space; but if we had to split a subtable, it is very likely to need promotion anyway. So I think we could be smarter here and pre-promote any subtables that have been split (a rough sketch follows below).
    • there are likely some other small gains to be made here just by general optimizations, like implementing a good integer set à la hb_set (fontations#192)
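
A minimal, self-contained sketch of the two-phase parallel-parsing approach described above, assuming a hypothetical `parse_one`/`ParsedFile` and a deliberately naive textual include scan (fea-rs's real parser and include resolution are more involved):

```rust
use std::collections::{HashSet, VecDeque};
use std::path::{Path, PathBuf};

/// Stand-in for a parsed file; fea-rs's real AST types differ.
struct ParsedFile;

fn parse_one(_path: &Path) -> ParsedFile {
    // Real implementation: run the full FEA parser over this file.
    ParsedFile
}

/// Cheap textual scan for `include(...)` statements, without running
/// the full parser. Naive: would be fooled by comments, for example.
fn find_includes(path: &Path) -> Vec<PathBuf> {
    let Ok(text) = std::fs::read_to_string(path) else {
        return Vec::new();
    };
    let dir = path.parent().unwrap_or(Path::new("."));
    text.lines()
        .filter_map(|line| {
            let rest = line.trim().strip_prefix("include(")?;
            let inner = rest.split(')').next()?;
            Some(dir.join(inner.trim()))
        })
        .collect()
}

fn parse_all(root: PathBuf) -> Vec<(PathBuf, ParsedFile)> {
    // Phase 1: discover every source by chasing includes.
    let mut queue = VecDeque::from([root]);
    let mut seen = HashSet::new();
    let mut sources = Vec::new();
    while let Some(path) = queue.pop_front() {
        if !seen.insert(path.clone()) {
            continue; // already discovered (also guards against cycles)
        }
        queue.extend(find_includes(&path));
        sources.push(path);
    }
    // Phase 2: parse every discovered file in parallel.
    std::thread::scope(|scope| {
        let handles: Vec<_> = sources
            .iter()
            .map(|path| scope.spawn(move || (path.clone(), parse_one(path))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
    // Phase 3 (not shown): stitch the parsed files back together in
    // include order.
}
```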

These are my preliminary thoughts; I'll dig a bit more and see if anything else jumps out at me.
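
On the pre-promotion idea in the table-packing bullet above, a minimal sketch of the heuristic; the `Subtable` type here is hypothetical and the real packer tracks much more state:

```rust
/// Hypothetical subtable metadata; the real packer is more involved.
struct Subtable {
    was_split: bool,
    is_extension: bool,
}

/// Heuristic: a subtable that had to be split (because its offsets
/// overflowed 16 bits) is very likely to need extension promotion
/// anyway, so promote it eagerly instead of re-discovering that
/// during repeated packing attempts.
fn pre_promote(subtables: &mut [Subtable]) {
    for sub in subtables.iter_mut().filter(|s| s.was_split) {
        sub.is_extension = true;
    }
}
```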

edit:

We're creating a lot of nodes for GPOS; this is why dropping Arcs shows up in our trace, and it is likely also significantly impacting parsing time. See cmyr/fea-rs#240.
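
For context on why dropping Arcs can register at ~10%: every `Arc` drop is an atomic refcount decrement plus, usually, a deallocation, so a tree built from millions of small Arc'd nodes pays that cost on every node when it is freed. A toy illustration with made-up types, not fea-rs's actual node representation:

```rust
use std::sync::Arc;
use std::time::Instant;

// Toy tree of Arc'd nodes; fea-rs's real node/token types differ.
enum Node {
    Token(String),
    Group(Vec<Arc<Node>>),
}

fn build(depth: usize, width: usize) -> Arc<Node> {
    if depth == 0 {
        return Arc::new(Node::Token("tok".into()));
    }
    let children = (0..width).map(|_| build(depth - 1, width)).collect();
    Arc::new(Node::Group(children))
}

fn main() {
    let tree = build(7, 8); // ~2M leaf nodes
    let start = Instant::now();
    // Dropping the root cascades: every node is an atomic refcount
    // decrement plus a deallocation, all on one thread.
    drop(tree);
    println!("dropped in {:?}", start.elapsed());
}
```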

@rsheeter (Contributor, Author) commented:

~20-30% of our time is spent parsing.

So if we generated data structures directly, rather than writing fea to be parsed, would it make a significant difference?

@cmyr (Member) commented Oct 12, 2023

Let's see how much I can improve the situation by addressing cmyr/fea-rs#240. Using fea as an IR definitely introduces some overhead, although we'd have to think pretty carefully about what the alternative would look like.

@rsheeter (Contributor, Author) commented:

To keep the thread updated: we now generate backend structures directly, which definitely helps. It remains observably true that kerning & features end up the long pole and could start sooner.

[image: thread activity plot showing kerning & features finishing last]
