
Segment layering/blending #4550

Open
blazoncek opened this issue Feb 13, 2025 · 7 comments

@blazoncek
Collaborator

With 0.14 we got proper blending of colors and palettes across all segments; 0.15 improved on that and allowed effects to be blended as well. Recent blending styles (#4158) introduced effect transitions with different styles.

What is missing IMO is correct segment blending/layering with the different blending modes known to Photoshop users, e.g. lighten, darken, multiply, add, subtract, difference, etc.

I've recently come across a snippet that I find useful and relatively simple to implement in WLED. Unfortunately, it would require re-implementing the pixel drawing functions.

Let's first see the code and then discuss the nuances:

void WS2812FX::blendSegments(const Segment &topSeg, const Segment &bottomSeg) {
  // Per-channel blend functions: a = top layer value, b = bottom layer value.
  // (Parameters renamed to topSeg/bottomSeg so the top/bottom lambdas below do not shadow them.)
  constexpr auto top        = [](uint8_t a, uint8_t b){ return a; };
  constexpr auto bottom     = [](uint8_t a, uint8_t b){ return b; };
  constexpr auto add        = [](uint8_t a, uint8_t b){ return uint8_t(a + b > 255 ? 255 : a + b); };   // saturate instead of wrapping
  constexpr auto subtract   = [](uint8_t a, uint8_t b){ return uint8_t(b > a ? (b - a) : 0); };
  constexpr auto difference = [](uint8_t a, uint8_t b){ return uint8_t(b > a ? (b - a) : (a - b)); };
  constexpr auto average    = [](uint8_t a, uint8_t b){ return uint8_t((a + b) >> 1); };
  constexpr auto multiply   = [](uint8_t a, uint8_t b){ return uint8_t((a * b) / 255); };               // scale product back to 0-255
  constexpr auto divide     = [](uint8_t a, uint8_t b){ int v = b ? (a * 255) / b : 255; return uint8_t(v > 255 ? 255 : v); }; // normalized divide, clamped
  constexpr auto lighten    = [](uint8_t a, uint8_t b){ return uint8_t(a > b ? a : b); };
  constexpr auto darken     = [](uint8_t a, uint8_t b){ return uint8_t(a < b ? a : b); };
  constexpr auto screen     = [](uint8_t a, uint8_t b){ return uint8_t(255 - ((255 - a) * (255 - b)) / 255); };               // 255 - (255-a)*(255-b)/255
  constexpr auto overlay    = [](uint8_t a, uint8_t b){ return uint8_t(a < 128 ? (2 * a * b) / 255 : 255 - (2 * (255 - a) * (255 - b)) / 255); };

  using FuncType = uint8_t(*)(uint8_t, uint8_t);
  FuncType funcs[] = {
    top, bottom, add, subtract, difference, average, multiply, divide, lighten, darken, screen, overlay
  };

  // pick the blend function from the top segment's blend mode (index kept within the table)
  const uint8_t blendMode = topSeg.blendMode;
  auto func = funcs[blendMode % (sizeof(funcs) / sizeof(FuncType))];

  for (unsigned i = 0; i < topSeg.length(); i++) {
    uint32_t c_a = topSeg.getPixelColor(i);     // top layer pixel
    uint32_t c_b = bottomSeg.getPixelColor(i);  // bottom layer pixel
    uint8_t r_a = R(c_a), g_a = G(c_a), b_a = B(c_a), w_a = W(c_a);
    uint8_t r_b = R(c_b), g_b = G(c_b), b_b = B(c_b), w_b = W(c_b);
    // blend each channel independently and write the result into the canvas buffer
    pixels[i] = RGBW32(func(r_a,r_b), func(g_a,g_b), func(b_a,b_b), func(w_a,w_b));
  }
}

This assumes that Segment::getPixelColor() returns the unmodified value set by Segment::setPixelColor() during effect execution. To achieve that, each segment must maintain its own pixel drawing buffer (as was done in the past with setUpLeds()).
It also assumes the WS2812FX instance maintains a buffer for its entire canvas (called pixels[] above; similar to the global buffer).

The process by which segments/layers would be blended is as follows (a rough code sketch follows the list):

  • render each effect in its segment buffer (entirely independent and unaware of other segments)
  • blend all segment buffers into canvas buffer starting with black canvas and adding segments from bottom to top
  • transfer canvas buffer to LEDs
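
A minimal sketch of that per-frame pipeline, assuming placeholder names (renderEffect(), blendIntoCanvas() and pushCanvasToBus() do not exist in WLED; they stand for the three stages above):

// Sketch only: per-frame pipeline; function names are placeholders for the stages above.
void renderFrame(std::vector<Segment> &segments, std::vector<uint32_t> &canvas) {
  std::fill(canvas.begin(), canvas.end(), 0u);   // 1) start from a black canvas
  for (Segment &seg : segments) {                //    segments ordered bottom to top
    renderEffect(seg);                           //    effect writes only into seg's own pixel buffer
    blendIntoCanvas(seg, canvas);                // 2) composite per the segment's blend mode
  }
  pushCanvasToBus(canvas);                       // 3) transfer the finished canvas to the LEDs
}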

This process does not handle color/palette blending (which does not change from the current behaviour) or effect transitions (from one effect to another); it only lets users stack segments one atop another, which allows mixing the content of two segments even if the effect function itself does not support layering.

The price is, of course, memory and (possibly) some speed degradation, as there will be more operations per pixel. However, the segment's setPixelColor()/getPixelColor() could be simplified and there would be no need for WS2812FX::setPixelColor().

I would like to get some feedback and thoughts about layering and this implementation, and whether it is worth the effort even if speed would be impaired.

@willmmiles
Member

Your proposed pipeline is more or less how I'd've expected WLED to be implemented before I got into the actual code. I'd thought of it more as: a "segment" renders to a texture buffer; textures are then rendered to the frame buffer (with grouping, spacing, any other stretching/scaling transformations going on here); and finally the frame buffer is mapped to pixels on the output bus(es).

Memory costs aside: we've observed in prototype implementations that having the FX write directly to local texture buffers (setUpLeds() style) can often yield an FPS speedup. There's potential that pipelining the other aspects (texture to frame buffer, and frame to physical) may also yield net speed improvements, as the code for each phase becomes simpler -- less indirection, fewer conditionals, and easier to fit in cache. Some ESP models (S3, I'm looking at you!) also offer SIMD instructions that can further accelerate pipelined simple transforms.

Lastly I think the pipelined approach will also make the code simpler and easier to follow. The key is breaking "Segment" down into the component parts -- the FX renderer, the texture->frame renderer, and the physical mapper can all be pulled into individual stages and components instead of packing everything into one giant class.

@blazoncek
Collaborator Author

textures are then rendered to the frame buffer (with grouping, spacing, any other stretching/scaling transformations going on here);

Thanks. This is an interesting approach which indeed simplifies single-segment processing (and reduces memory requirements) but will inevitably make blending into the "frame buffer" more complex (imagine two partially overlapping segments with spacing and grouping set).

I would leave frame-buffer-to-physical mapping to the bus-level logic, but that would make anything other than a Cartesian coordinate system difficult to implement (mapping arbitrary 3D pixel locations into the frame rendering logic, especially in sparse set-ups). But we can leave this mapping out of the scope of this discussion.

@DedeHai
Collaborator

DedeHai commented Feb 14, 2025

In general I think this is a good approach!
Some thoughts:

  • this would greatly simplify the code logic and offer potential speed improvements (as we saw in Soap FX optimization #4543)
  • the question of white-channel use arises: we discussed using it as a transparency/opacity channel before, which would allow FX to be used as masks for underlying segments
  • should the buffers be 24-bit, 32-bit (or maybe even just 16-bit, reducing color accuracy and adding some shift-logic overhead)?
  • you already mentioned partial segment overlap: how to handle layering in that case? I ran into the same issue in the PS where I chose the simple approach of just using "color_add()" when detecting partial overlay, which is not the cleanest solution: ideally, "color_add()" should be used if there is an underlying segment and "set_color()" if not. This requires a buffer mask or per-pixel scanning of all segment borders (or a more clever approach I could not think of)
  • what would be the fallback solution if buffer allocation fails? For example, on large, sparse-mapped setups the buffer is huge
  • should the segment buffers be allocated in FX-data memory (like I have done in the PS)?

Just an idea: use a "mask" checkmark in segment settings to treat an FX as a transparency mask instead of using its colors; the mask could be R+G+B or even something more elaborate.
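
A rough sketch of how such a mask could be applied while compositing; applyMaskPixel() is a hypothetical helper, using the existing R/G/B/W/RGBW32 color accessors:

// Sketch only: derive an opacity value from a mask segment's pixel and apply it
// to the underlying canvas pixel (simple luminance-style mask; could be weighted instead).
uint32_t applyMaskPixel(uint32_t maskPixel, uint32_t canvasPixel) {
  uint8_t opacity = (R(maskPixel) + G(maskPixel) + B(maskPixel)) / 3;
  return RGBW32((R(canvasPixel) * opacity) / 255,
                (G(canvasPixel) * opacity) / 255,
                (B(canvasPixel) * opacity) / 255,
                (W(canvasPixel) * opacity) / 255);
}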

@willmmiles
Member

Thanks. This is an interesting approach which indeed simplifies single-segment processing (and reduces memory requirements) but will inevitably make blending into the "frame buffer" more complex (imagine two partially overlapping segments with spacing and grouping set).

By "frame buffer", I mean what you are calling a "canvas buffer": a single global buffer of all LEDs in the virtual space to which all segments are rendered, top to bottom, for each output frame (eg. show() call). I would expect that segment spacing and grouping would be best implemented as part of the segment->canvas render process -- if output canvas coordinates overlap from one segment to the next, you blend; if not, they'll end up interleaved as expected.

Mostly I was trying to highlight that I had expected an implementation with the same sequence of concepts, but I'd've used different names -- I don't think there's any significant difference between what I described and what you proposed.

@willmmiles
Member

  • the question of white-channel use arises: we discussed using it as a transparency/opacity channel before, which would allow FX to be used as masks for underlying segments
  • should the buffers be 24-bit, 32-bit (or maybe even just 16-bit, reducing color accuracy and adding some shift-logic overhead)?

RGBA at the FX level would be very practical if we're serious about considering blending, I think...

If we really want to get radical, I'd also float the idea of struct-of-arrays instead of array-of-structs, e.g. struct pixel_buf { vector<uint8_t> R, G, B, A; }; I've heard tell that some algorithms can operate much faster in this space. Loop unrolling can make it quite efficient, and it vectorizes to SIMD instructions well.
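
As a minimal sketch of that idea (pixel_buf and blend_add are illustrative names), keeping each channel in its own contiguous plane gives tight loops that unroll and vectorize well:

#include <cstdint>
#include <vector>

// Sketch only: struct-of-arrays pixel buffer.
struct pixel_buf {
  std::vector<uint8_t> R, G, B, A;
  explicit pixel_buf(size_t n) : R(n), G(n), B(n), A(n) {}
};

// Saturating add of one channel plane; the same loop shape works for any per-channel blend mode.
static void blend_add(const std::vector<uint8_t> &top, std::vector<uint8_t> &bottom) {
  for (size_t i = 0; i < top.size(); i++) {
    unsigned v = top[i] + bottom[i];
    bottom[i] = uint8_t(v > 255 ? 255 : v);
  }
}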

  • what would be the fallback solution if buffer allocation fails? For example, on large, sparse-mapped setups the buffer is huge

This is the true crux of this approach. If we move to this architecture, the solution might be to explicitly develop a sparse canvas buffer abstraction. Sparse matrices have a lot of prior art in numeric processing code, I'm sure there's some insights there that could be helpful.
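
One possible shape for that abstraction, as a sketch (SparseCanvas is an illustrative name; a hash map keyed by canvas index stands in for more sophisticated sparse-matrix layouts):

#include <cstdint>
#include <unordered_map>

// Sketch only: stores colors only for indices that were actually written;
// unwritten pixels read back as black, so large gaps cost no memory.
class SparseCanvas {
public:
  void set(uint32_t index, uint32_t color) {
    if (color == 0) pixels_.erase(index);   // keep the map minimal
    else            pixels_[index] = color;
  }
  uint32_t get(uint32_t index) const {
    auto it = pixels_.find(index);
    return it == pixels_.end() ? 0 : it->second;
  }
private:
  std::unordered_map<uint32_t, uint32_t> pixels_;
};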

@blazoncek
Collaborator Author

If we introduce an alpha channel (instead of W in effect functions) I would assume it only applies to that segment with regard to blending it with the lower segment. Right?
Would it only be used when blending into the frame/canvas buffer, alongside the segment's opacity? Even though it would work on the pixel level, that's sort of duplicate functionality IMO.

I do admit that it will open new possibilities when writing new effects, but none of the current effects are written to utilise it, so there will be no immediate gain.

So, if I summarize what we've defined so far:

  • segment will contain a pixel buffer (uint32_t; RGBA, but A may be interpreted as W for some effects)
  • segment's setPixelColor() operates on a "condensed" segment without grouping and spacing (current virtualLength() or virtualWidth()/virtualHeight()), writing only to the pixel buffer
  • effect function uses the W channel as an alpha channel used in blending (we'll need to make an exception for the Solid effect, perhaps also for all single-pixel effects)
  • strip will feature a canvas/frame buffer large enough to encompass all virtual pixels (including pixels that are actually missing; it also needs to take into account a combination of matrix+strip)
  • canvas/frame buffer might be RGB only; if it includes W it is not alpha but white
  • segments are blended in the strip.show() function
  • segments are blended into the canvas/frame buffer using segment opacity and the pixel alpha channel
  • segment's blending mode is applied from the segment to the canvas, expanding, skipping, reversing and mirroring pixels if needed (the lowest segment, usually 0, is blended with the black canvas/frame first, followed by segments above it)
  • W channel will need to be maintained for some effects (clash with alpha channel)
  • when all segments are processed, the canvas/frame buffer is sent to the bus manager for display (converting RGB into RGBW if necessary)
  • bus manager will handle virtual-to-physical mapping (we will need to move mapping functions from WS2812FX class)
  • handling sparse set-ups will need a follow-up

Caveats:

  • pixel buffer memory reallocation when segment dimensions change
  • canvas/frame buffer memory reallocation when LED count or 2D geometry changes
  • some effects may benefit using W channel instead of alpha
  • do we share pixel buffer when segment is in transition (copy/move constructors & operator=)
  • do we share pixel buffer when segment is copied
  • prevent segment changes during async JSON calls (this one is very important) or cancel any effect drawing immediately
  • moving ledmap handling into bus manager may be challenging

@willmmiles
Member

If we introduce an alpha channel (instead of W in effect functions) I would assume it only applies to that segment with regard to blending it with the lower segment. Right? Would it only be used when blending into the frame/canvas buffer, alongside the segment's opacity? Even though it would work on the pixel level, that's sort of duplicate functionality IMO.

Yes -- I'd expect segment-level opacity to apply "on top" of the FX computed alpha channel, much the same way segment-level brightness applies "on top" of the FX computed colors. IIRC the computation is something like blend_A = pixel[index].A * segment.A / 255 -- we don't have to blend twice.
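
As a sketch, the per-pixel compositing could then combine the two like this (blendPixelOntoCanvas is a hypothetical helper; W() is read as the alpha channel here):

// Sketch only: composite one segment pixel onto the canvas using the pixel's
// alpha scaled by the segment-level opacity.
uint32_t blendPixelOntoCanvas(uint32_t segPixel, uint32_t canvasPixel, uint8_t segOpacity) {
  uint8_t a = (W(segPixel) * segOpacity) / 255;   // effective alpha = pixel alpha * segment opacity
  auto mix = [a](uint8_t top, uint8_t bottom) {
    return uint8_t((top * a + bottom * (255 - a)) / 255);
  };
  return RGBW32(mix(R(segPixel), R(canvasPixel)),
                mix(G(segPixel), G(canvasPixel)),
                mix(B(segPixel), B(canvasPixel)),
                0);                               // W handling on the canvas is a separate question (see above)
}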

  • canvas/frame buffer might be RGB only; if it includes W it is not alpha but white
  • W channel will need to be maintained for some effects (clash with alpha channel)

Maybe Segment wants a "buffer type" indicator, set or carried by the FX selection, to inform the blending layer how to composite the segment to the canvas.

  • bus manager will handle virtual-to-physical mapping (we will need to move mapping functions from WS2812FX class)
  • moving ledmap handling into bus manager may be challenging

Bikeshedding a bit, but I'd probably put the canvas to bus functionality in its own free function(s) to start off with. Some renderToBus(Bus&, canvas, ledmap) type call. This operation is not really involved in managing buses, so IMHO it doesn't belong to BusManager.
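
Something along these lines, as a sketch (renderToBus and the index-based ledmap shape are illustrative; Bus::getLength()/setPixelColor() are assumed to behave as today):

// Sketch only: map the finished canvas onto one bus via a ledmap.
void renderToBus(Bus &bus, const std::vector<uint32_t> &canvas, const std::vector<uint16_t> &ledmap) {
  for (uint16_t physIdx = 0; physIdx < bus.getLength(); physIdx++) {
    uint16_t virtIdx = physIdx < ledmap.size() ? ledmap[physIdx] : physIdx; // identity mapping without a ledmap entry
    if (virtIdx < canvas.size()) bus.setPixelColor(physIdx, canvas[virtIdx]);
  }
}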

  • canvas/frame buffer memory reallocation when LED count or 2D geometry changes

I think it's reasonable to flush everything (segments, transitions, etc.) and start clean if the bus configuration changes. If we can't re-allocate everything from a clean slate, the config is in trouble anyways...

  • pixel buffer memory reallocation when segment dimensions change
  • do we share pixel buffer when segment is in transition (copy/move constructors & operator=)
  • do we share pixel buffer when segment is copied

One neat feature of this architecture is that transitions can be implemented as part of canvas composition -- each segment need only contain enough information for the compositor to know which pixels to draw. So I'd suggest trying a broad approach of "don't re-use segments or buffers at all" as the place to start (a rough sketch follows the list below):

  • Move "outgoing" segments to an "active transition" list, and store their local transition properties; they are kept there while their transition counts go down
  • Attempt to allocate new segments for the new FX selection or size definition (pixel buffers and all)
  • If allocation fails, fall back by purging something from the transition list
  • If the active transition list is empty, you're just SOL anyways; engage next fallback behaviour
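
A rough sketch of that allocation-with-fallback loop (SegmentConfig, TransitionEntry, activeTransitions and tryAllocateSegment() are hypothetical names; none of this exists in WLED today):

// Sketch only: keep retiring outgoing-transition buffers until the new segment fits.
Segment *allocateSegmentWithFallback(const SegmentConfig &cfg,
                                     std::list<TransitionEntry> &activeTransitions) {
  for (;;) {
    if (Segment *seg = tryAllocateSegment(cfg)) return seg;  // success: fresh pixel buffers allocated
    if (activeTransitions.empty()) return nullptr;           // out of options: caller engages the next fallback
    activeTransitions.pop_front();                           // purge the oldest outgoing segment's buffers
  }
}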

From there we can explore progressively more sophisticated fallbacks to handle cases where memory is tight. Some ideas:

  • Migrating segment pixel buffers to SPI RAM if needed;
  • A custom allocator for pixel buffers, independent from the system heap; this could even permit re-arranging buffers after creation to support more advantageous physical layouts
  • Using the MMU to allow for discontiguous buffer allocation - see esp_mmu_map()

The segment object itself shouldn't be big enough to matter (compared to the pixel buffers), so we can just allocate new ones whenever it's convenient.

  • prevent segment changes during async JSON calls (this one is very important) or cancel any effect drawing immediately

This is on my todo list already -- the render loop really needs to be mutexed with any source of segment state changes. Every platform we support has multiple tasks, even ESP8266! I believe this may be responsible for some crash cases in some configurations in the current code. I haven't had time to look at it yet -- either we want a new "render state lock", or we can expand the scope of the current JSON buffer lock to cover any case of reading/writing the core state.
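
For illustration, the simplest form of such a lock might look like this (renderMutex and both call sites are hypothetical; the real fix may instead widen the existing JSON buffer lock, and ESP8266/Arduino builds may need a different primitive than std::mutex):

#include <mutex>

// Sketch only: one mutex shared by the render loop and any state-mutating path.
static std::mutex renderMutex;

void renderLoop() {
  std::lock_guard<std::mutex> lock(renderMutex);  // block state changes while a frame renders
  // ... render segments, composite the canvas, push to buses ...
}

void applyJsonStateChange(/* parsed state */) {
  std::lock_guard<std::mutex> lock(renderMutex);  // block rendering while segments are modified
  // ... modify segments ...
}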
