backend feature compile can be a long pole #456

Open · rsheeter opened this issue Sep 25, 2023 · 7 comments

@rsheeter (Contributor) commented Sep 25, 2023

#443 reveals features can be the long pole. There are two non-exclusive options to improve this:

  1. Make it faster
    • We write fea for kerning (and soon marks) for fea-rs to parse; we could instead build some other form, e.g. the AST, directly
  2. Make it parallel
    • The more cores you have, the more fea-rs stands out as the long pole (ref #443, "Add a visualization of thread activity")
    • Could we break up the compile, for example processing GSUB and GPOS separately? (a rough sketch follows this list)
      • Requires splitting up things like `feature foo { pos a b -5; sub x by y; }` (courtesy of @cmyr)
      • Some parts, e.g. classes, are common to both
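
To make the split concrete, here is a minimal sketch of partitioning one mixed feature into GPOS-only and GSUB-only halves so the two tables could be compiled on separate threads. The `Rule` and `Feature` types are hypothetical stand-ins, not fea-rs's actual AST:

```rust
/// Hypothetical rule representation; fea-rs's real AST is richer.
enum Rule {
    Gpos(String), // e.g. "pos a b -5"
    Gsub(String), // e.g. "sub x by y"
}

struct Feature {
    tag: [u8; 4],
    rules: Vec<Rule>,
}

/// Split one mixed feature into a GPOS-only and a GSUB-only copy so
/// the two tables' compiles can proceed independently. Shared inputs
/// (e.g. glyph classes) would still be resolved once, up front.
fn split_feature(feature: Feature) -> (Feature, Feature) {
    let (gpos, gsub): (Vec<Rule>, Vec<Rule>) = feature
        .rules
        .into_iter()
        .partition(|rule| matches!(rule, Rule::Gpos(_)));
    (
        Feature { tag: feature.tag, rules: gpos },
        Feature { tag: feature.tag, rules: gsub },
    )
}
```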

We should start by profiling. Flamegraphs of the overall build are fairly noisy; perhaps we can rig fea-rs to run on just the debug feature file?

$ rm -rf build/ && cargo run -p fontc --  --source ../OswaldFont/sources/Oswald.glyphs --emit-debug
# produces build/debug/features.fea which is the complete combined feature file
@anthrotype (Member) commented:

May I also suggest:

  1. Start it sooner rather than later

In #443 you wrote:

We can see any speedup to glyph order production is a big win as it's delaying everything.

Since auto-FEA generation and FEA compilation are blocked on glyph order, we might also try to speed up glyph order itself so that they get an early start?

@rsheeter (Contributor, Author) commented:

Since auto-FEA generation and FEA compilation are blocked on glyph order, we might also try to speed up glyph order itself so that they get an early start?

+1, but it looks to me like fea will still be a problem on a machine with lots of cores. #443 (comment) has a plot from a 72-way system that seems to show this.

@rsheeter (Contributor, Author) commented:

#458 for glyph order

@cmyr (Member) commented Oct 12, 2023

I've dipped into this a little bit and have some useful preliminary findings, based on profiling the compilation of Google Sans (running fea-rs directly, using as input the FEA we generate internally, dumped with --emit-debug):

  • ~20-30% of our time is spent parsing.
    • There may be some room for small optimizations here by reducing the number of nodes (in which we group tokens) in certain cases (specifically, I'm thinking about parsing GPOS ValueRecords)
    • In cases where features are split across multiple files, we may be able to parse multiple files in parallel, which might offer significant improvements. A simple sketch of how I think this might work (see the code sketch after this list):
      • use a queue to store 'sources to parse'
      • when a source is loaded, in addition to going onto the work queue it also gets a cheap strcmp-style scan for include statements
      • parse the files separately, and then stitch them together at the end.
    • There are other options; for instance it would be simpler/faster if the client could just provide a list of relevant feature files ahead of time?
    • If the bottleneck is parsing a single large file we could also look into breaking the input into chunks and parsing those in parallel.
  • ~20% of our time is spent in fea-rs's compilation pass. Google Sans doesn't make heavy use of GSUB, so there would be no gains here from splitting our implementation, although this might be relevant for other fonts?
  • ~10% of our time is spent dropping Arcs!? This needs some further investigation.
  • ~8% on validation
  • ~20% on actually compiling to binary, with the vast majority of that time spent in table packing.
    • I think there is also room for some wins in our table packing impl. In particular, I've noticed that when we split subtables we always then spend a bunch of time deciding which of them should be promoted to extension space; but if we had to split a subtable, it is very likely to need promotion anyway. So I think we could be smarter here and pre-promote any subtables that have been split (a rough sketch follows below).
    • there are likely some other small gains to be made here just by general optimizations, like implementing a good integer set à la hb_set (fontations#192)
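
A minimal, self-contained sketch of the two-phase parallel-parsing approach described above, assuming a hypothetical `parse_one`/`ParsedFile` and a deliberately naive textual include scan (fea-rs's real parser and include resolution are more involved):

```rust
use std::collections::{HashSet, VecDeque};
use std::path::{Path, PathBuf};

/// Stand-in for a parsed file; fea-rs's real AST types differ.
struct ParsedFile;

fn parse_one(_path: &Path) -> ParsedFile {
    // Real implementation: run the full FEA parser over this file.
    ParsedFile
}

/// Cheap textual scan for `include(...)` statements, without running
/// the full parser. Naive: would be fooled by comments, for example.
fn find_includes(path: &Path) -> Vec<PathBuf> {
    let Ok(text) = std::fs::read_to_string(path) else {
        return Vec::new();
    };
    let dir = path.parent().unwrap_or(Path::new("."));
    text.lines()
        .filter_map(|line| {
            let rest = line.trim().strip_prefix("include(")?;
            let inner = rest.split(')').next()?;
            Some(dir.join(inner.trim()))
        })
        .collect()
}

fn parse_all(root: PathBuf) -> Vec<(PathBuf, ParsedFile)> {
    // Phase 1: discover every source by chasing includes.
    let mut queue = VecDeque::from([root]);
    let mut seen = HashSet::new();
    let mut sources = Vec::new();
    while let Some(path) = queue.pop_front() {
        if !seen.insert(path.clone()) {
            continue; // already discovered (also guards against cycles)
        }
        queue.extend(find_includes(&path));
        sources.push(path);
    }
    // Phase 2: parse every discovered file in parallel.
    std::thread::scope(|scope| {
        let handles: Vec<_> = sources
            .iter()
            .map(|path| scope.spawn(move || (path.clone(), parse_one(path))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
    // Phase 3 (not shown): stitch the parsed files back together in
    // include order.
}
```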

These are my preliminary thoughts; I'll dig a bit more and see if anything else jumps out at me.
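
On the pre-promotion idea in the table-packing bullet above, a minimal sketch of the heuristic; the `Subtable` type here is hypothetical and the real packer tracks much more state:

```rust
/// Hypothetical subtable metadata; the real packer is more involved.
struct Subtable {
    was_split: bool,
    is_extension: bool,
}

/// Heuristic: a subtable that had to be split (because its offsets
/// overflowed 16 bits) is very likely to need extension promotion
/// anyway, so promote it eagerly instead of re-discovering that
/// during repeated packing attempts.
fn pre_promote(subtables: &mut [Subtable]) {
    for sub in subtables.iter_mut().filter(|s| s.was_split) {
        sub.is_extension = true;
    }
}
```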

edit:

We're creating a lot of nodes for GPOS; this is why dropping Arcs shows up in our trace, and it is likely also significantly impacting parsing time. See cmyr/fea-rs#240.
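
For context on why dropping Arcs can register at ~10%: every `Arc` drop is an atomic refcount decrement plus, usually, a deallocation, so a tree built from millions of small Arc'd nodes pays that cost on every node when it is freed. A toy illustration with made-up types, not fea-rs's actual node representation:

```rust
use std::sync::Arc;
use std::time::Instant;

// Toy tree of Arc'd nodes; fea-rs's real node/token types differ.
enum Node {
    Token(String),
    Group(Vec<Arc<Node>>),
}

fn build(depth: usize, width: usize) -> Arc<Node> {
    if depth == 0 {
        return Arc::new(Node::Token("tok".into()));
    }
    let children = (0..width).map(|_| build(depth - 1, width)).collect();
    Arc::new(Node::Group(children))
}

fn main() {
    let tree = build(7, 8); // ~2M leaf nodes
    let start = Instant::now();
    // Dropping the root cascades: every node is an atomic refcount
    // decrement plus a deallocation, all on one thread.
    drop(tree);
    println!("dropped in {:?}", start.elapsed());
}
```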

@rsheeter (Contributor, Author) commented:

~20-30% of our time is spent parsing.

So if we generated data structures directly, rather than writing fea to be parsed, would it make a significant difference?

@cmyr (Member) commented Oct 12, 2023

Let's see how much I can improve the situation by addressing cmyr/fea-rs#240. Using fea as an IR definitely introduces some overhead, although we'd have to think pretty carefully about what the alternative would look like.

@rsheeter (Contributor, Author) commented:

To keep the thread updated: we now generate backend structures directly, which definitely helps. It remains observably true that kerning & features end up the long pole and could start sooner.

[image: thread activity plot showing kerning & features finishing last]
