segfault in 0.1.66 #988

Open
Julusian opened this issue Feb 4, 2025 · 5 comments

@Julusian
Contributor

Julusian commented Feb 4, 2025

I can't say much about this yet, other than that it is happening roughly every hour in our application. The stack trace I am getting is:

PID 11246 received SIGSEGV for address: 0x0
/opt/companion/prebuilds/julusian_segfault_handler-linux-x64/node-napi-v9.node(+0x1b10e)[0x7661e914b10e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7661e9bfb140]
/opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node(+0x4098fb)[0x7661c06098fb]
/opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node(+0x7745f9)[0x7661c09745f9]
/opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node(+0xe7471)[0x7661c02e7471]
/opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node(+0x125492)[0x7661c0325492]
/opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node(+0xcbf12)[0x7661c02cbf12]
/opt/companion/node-runtimes/main/bin/node(_ZZN4node14ThreadPoolWork12ScheduleWorkEvENUlP9uv_work_sE_4_FUNES2_+0x59)[0xf534d9]
/opt/companion/node-runtimes/main/bin/node[0x1d487c0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7661e9befea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7661e9b0facf]

Are there debug symbols I can drop in to get a better stack trace, or will I need to build it myself for that?

I will dig into this later to try to figure out what triggers it.
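
The frames in the trace above only carry module-relative offsets of the form `(+0x...)`. As a rough sketch, a helper like the one below could pull those offsets out so they can be handed to a symbolizer. The function name `extractOffsets` is made up for illustration, and the pattern only assumes the line shape visible in the trace above.

```javascript
// Sketch: extract the module path and module-relative offset from
// backtrace lines shaped like  /path/to/module(+0x1234)[0xabsolute].
// Lines that carry a symbol name or no "(+0x...)" part are skipped.
function extractOffsets(trace) {
  const frames = [];
  for (const line of trace.split("\n")) {
    const match = /^(\S+)\(\+(0x[0-9a-f]+)\)\[0x[0-9a-f]+\]$/i.exec(line.trim());
    if (match) {
      frames.push({ module: match[1], offset: match[2] });
    }
  }
  return frames;
}
```

Each resulting offset could then be passed to something like `addr2line -f -C -e <module> <offset>`, which only yields useful names if the module was built with symbols.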

@Julusian
Contributor Author

Julusian commented Feb 4, 2025

Hmm. I produced a build with yarn build:debug, and it hasn't really added any more detail:

/opt/companion/prebuilds/julusian_segfault_handler-linux-x64/node-napi-v9.node(+0x1b10e)[0x75109af5410e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x75109ba43140]
/opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node(+0x3d345b)[0x751071bd345b]
/opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node(+0x73e64d)[0x751071f3e64d]
/opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node(+0x102a16)[0x751071902a16]
/opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node(+0x1cd7d2)[0x7510719cd7d2]
/opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node(+0x1d1fe7)[0x7510719d1fe7]
/opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node(+0xf2e60)[0x7510718f2e60]
/opt/companion/node-runtimes/main/bin/node(_ZZN4node14ThreadPoolWork12ScheduleWorkEvENUlP9uv_work_sE_4_FUNES2_+0x59)[0xf534d9]
/opt/companion/node-runtimes/main/bin/node[0x1d487c0]

The binary is 82MB (compared to 29MB for the npm package), so it definitely did build differently.

@Julusian
Contributor Author

Julusian commented Feb 4, 2025

Re-running with that debug binary through gdb gives:

[Switching to Thread 0x7fffc6000700 (LWP 18898)]
0x00007fffcdbd345b in SkSurface::makeImageSnapshot() () from /opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node
(gdb) backtrace
#0  0x00007fffcdbd345b in SkSurface::makeImageSnapshot() () from /opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node
#1  0x00007fffcdf3e64d in skiac_surface_png_data () from /opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node
#2  0x00007fffcd902a16 in canvas::sk::SurfaceRef::png_data () from /opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node
#3  0x00007fffcd9cd7d2 in canvas::get_data_ref () from /opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node
#4  0x00007fffcd9d1fe7 in <canvas::AsyncDataUrl as napi::task::Task>::compute () from /opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node
#5  0x00007fffcd8f2e60 in napi::async_work::execute () from /opt/companion/node_modules/@napi-rs/canvas-linux-x64-gnu/skia.linux-x64-gnu.node
#6  0x0000000000f534d9 in node::ThreadPoolWork::ScheduleWork()::{lambda(uv_work_s*)#1}::_FUN(uv_work_s*) ()
#7  0x0000000001d487c0 in worker (arg=0x0) at ../deps/uv/src/threadpool.c:122
#8  0x00007ffff7c77ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#9  0x00007ffff7b97acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@Julusian
Contributor Author

Julusian commented Feb 4, 2025

This looks to be a concurrency issue. The following script crashes within 2s without completing any renders:

const { createCanvas, loadImage } = require("@napi-rs/canvas");

let completed = 0;

const generatePng = () => {
  const canvas = createCanvas(200, 200);

  canvas
    .toDataURLAsync("image/png")
    .then(() => {
      completed++;

      console.log("complete", completed);

      setImmediate(() => {
        generatePng();
      });
    })
    .catch((err) => {
      console.log("gen failed", err);
    });
};

const CONCURRENCY = 50;

for (let i = 0; i < CONCURRENCY; i++) {
  generatePng();
}

Reducing CONCURRENCY increases the number of canvases it will process before crashing. Below 5 appears to avoid crashing (though that likely just reduces the risk massively).
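
For anyone who wants to keep the async API but cap how many calls are in flight, a minimal limiter sketch follows. `createLimiter` is a hypothetical helper, not part of @napi-rs/canvas, and going by the observation above it would only lower the odds of a crash rather than remove the underlying problem.

```javascript
// Hypothetical helper: run async tasks with at most `limit` in flight.
// Tasks beyond the limit wait in a FIFO queue until a slot frees up.
function createLimiter(limit) {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    // Promise.resolve().then(task) turns a synchronous throw into a rejection.
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--;
        next();
      });
  };

  // Returns a promise that settles with the task's own result or error.
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}
```

In the script above this could be used as `const limit = createLimiter(4)` followed by `limit(() => canvas.toDataURLAsync("image/png"))` in place of the direct call.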

I sampled this with a few other random versions of the library going back as far as 0.1.40 and it occurred in all of them.
This was tested on x64 Linux (Fedora; originally seen on Ubuntu 24.04). I haven't yet tried to reproduce it on other OSes.

Strangely, I only started seeing this the other day, when it began happening at a very regular interval (~30 mins). I am not aware of having seen it before in the 6 months of using this library. The concurrency we use hasn't changed since switching to this library.

@Brooooooklyn
Owner

I haven't changed the relevant code recently, so it was probably introduced by the Skia upgrade. I need to find time to study it in depth.

@Julusian
Contributor Author

Julusian commented Feb 5, 2025

My workaround of using toDataURL instead of toDataURLAsync appears to avoid this crash.
It's been running for 11 hours without a crash, compared to the 20-30 minute crash interval it had before.

I don't think this will have a negative performance impact for my usage, so I am satisfied that this workaround is good enough. No rush on a fix.
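
For reference, a rough sketch of what the workaround looks like in code. The gdb backtrace earlier in the thread places the crash on a libuv worker thread inside the async task, so this version encodes on the calling thread instead. `generatePngs` is an illustrative name, and the canvas factory is passed in as a parameter so the sketch does not depend on how the library is loaded.

```javascript
// Sketch of the workaround: encode synchronously with toDataURL()
// instead of handing the work to the thread pool via toDataURLAsync().
// `createCanvas` is expected to be the function exported by @napi-rs/canvas.
function generatePngs(createCanvas, count) {
  const urls = [];
  for (let i = 0; i < count; i++) {
    const canvas = createCanvas(200, 200);
    // Runs on the calling thread, so the event loop is blocked while encoding.
    urls.push(canvas.toDataURL("image/png"));
  }
  return urls;
}
```

It could be called as `generatePngs(require("@napi-rs/canvas").createCanvas, 1000)`. The trade-off is that each encode blocks the event loop, which may matter for larger canvases or tighter latency budgets than mine.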
