
BUG: kernel NULL pointer dereference and BUG: workqueue lockups at sleep and hibernate with LRNG. #27

Open
dreirund opened this issue Feb 25, 2023 · 33 comments


@dreirund

dreirund commented Feb 25, 2023

At the suggestion of @pfactum, the maintainer of the -pf Linux kernel (patchset), I am reporting here that I had issues at sleep and hibernation with the LRNG patchset.

Affected kernels

I had it with

  • Linux 6.1-pf2 and
  • Linux 6.1-pf5 (I did not try the intermediates), and
  • it was gone with Linux 6.1-pf6, where LRNG was dropped, and
  • I did not have it with 6.0-pf5.

I had it with

  • self-compiled pf-kernels where I had many LRNG options enabled, but
  • not with pre-compiled pf-kernel where far fewer LRNG options were set.
  • I also did not have it with self-compiled vanilla kernel which does not have LRNG.

Problem description

I got
BUG: kernel NULL pointer dereference, address: 0000000000000088
followed by
BUG: workqueue lockup - pool cpus=3 node=0 flags=0x0 nice=0 stuck for [number]s!
at suspend-to-disk and at sleep.

When doing suspend-to-disk, the machine did not power off and showed the kernel errors. When I then forced a power-off, I could resume, but got an unstable working environment (programmes started to stall or did not execute at all) that kept spitting out dmesg logs.

When doing sleep, I had to force power off to get the machine usable again.

Attached error logs

I attach two dmesg logs I could capture after resume (the first was captured after a longer period of use, then suspend, then resume, and then letting the machine sit and trying to use it for some time; the second after a suspend shortly after bootup, with not much usage time after resume):

Attached kernel configs

I attach:

(Note that kernels i., ii. and iii. did not show the problem, but iv. did.)

Notes.

I do not have the capacity to assist with debugging (recompiling the kernel takes a lot of time, and for my productivity I heavily rely on suspend-to-disk, so each time I reboot into another kernel there is a considerable productivity impact).

I am already running a kernel that is no longer affected by this.

I report this here in the hope that it might help you nevertheless.

@smuellerDD
Owner

smuellerDD commented Feb 25, 2023 via email

@smuellerDD
Owner

smuellerDD commented Feb 25, 2023 via email

@dreirund
Author

Using linux 6.1-pf5 with the kernel config 2 from your list on my system:

[...]

I see no bugs, hangs or weird behavior so far.

As I wrote, kernel config 2 works fine (it is from a precompiled kernel from some repository); kernel config 4 causes the problems.

What kind of system are you using?

@smuellerDD
Owner

smuellerDD commented Feb 26, 2023 via email

@dreirund
Author

dreirund commented Feb 26, 2023

With all of these considerations, could you please help me understand how you concluded that the LRNG patch series is the culprit of the issues you see?

Because 6.1-pf6 dropped LRNG and it worked again.

I also attach a config of that custom compiled kernel I used: .config_6.1-pf6-custom.kconf.txt.

[...]
[ 2192.917882] Workqueue: events_long ucsi_resume_work [typec_ucsi]
[ 2192.926040] RIP: 0010:ucsi_resume_work+0x2d/0x90 [typec_ucsi]

[...]
This function is in drivers/usb/typec/ucsi/ucsi.c - as the function is not in the stock kernel, it is added with a patch.

But maybe the problem appears in the combination of ucsi and LRNG (and the real bug might be in ucsi, or in LRNG, or somewhere completely elsewhere)?
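The oops reports `address: 0000000000000088` with RIP in `ucsi_resume_work+0x2d`. A small faulting address like this is commonly read as a member load through a NULL struct pointer, where 0x88 is the member's offset inside the struct. The sketch below only illustrates that arithmetic; `struct demo_ctx`, its `con` member and `demo_fault_address()` are hypothetical and do not reflect the real layout of any ucsi structure, so this is an interpretation aid, not a diagnosis.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (NOT the real ucsi struct): 0x88 bytes of other
 * fields followed by a pointer member, so that member sits at 0x88. */
struct demo_ctx {
	unsigned char other_fields[0x88];
	void *con;
};

/* Address a load of ctx->con would touch for a given base pointer value:
 * base + offsetof(member). With base == 0 (a NULL ctx) this is just the
 * offset, i.e. the kind of small address seen in the oops. */
static uintptr_t demo_fault_address(uintptr_t base)
{
	return base + offsetof(struct demo_ctx, con);
}
```

Calling `demo_fault_address(0)` yields 0x88, matching the reported address; finding which real member lives at that offset would require the actual struct layout of the running kernel (for example from its debug info).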

Anyway, I find that both ucsi.c and ucsi_resume_work() are in fact present in the vanilla kernels 6.1, 6.1.10, 6.1.12, 6.1.14 and 6.2.1 from kernel.org (others I have not tested), so I cannot see how they would come from another patch. Can you clarify this if you still think they come from somewhere else?

How I came to the conclusion that they are in the vanilla kernels:
I directly downloaded said kernels from kernel.org, unpacked them, and then
find linux-6.{1,2}* -name 'ucsi.c':

linux-6.1/drivers/usb/typec/ucsi/ucsi.c
linux-6.1.10/drivers/usb/typec/ucsi/ucsi.c
linux-6.1.12/drivers/usb/typec/ucsi/ucsi.c
linux-6.1.14/drivers/usb/typec/ucsi/ucsi.c
linux-6.2.1/drivers/usb/typec/ucsi/ucsi.c

as well as
grep -r 'ucsi_resume_work' linux-6.{1,2}*:

linux-6.1.10/drivers/usb/typec/ucsi/ucsi.c:static void ucsi_resume_work(struct work_struct *work)
linux-6.1.10/drivers/usb/typec/ucsi/ucsi.c:	INIT_WORK(&ucsi->resume_work, ucsi_resume_work);
linux-6.1.12/drivers/usb/typec/ucsi/ucsi.c:static void ucsi_resume_work(struct work_struct *work)
linux-6.1.12/drivers/usb/typec/ucsi/ucsi.c:	INIT_WORK(&ucsi->resume_work, ucsi_resume_work);
linux-6.1.14/drivers/usb/typec/ucsi/ucsi.c:static void ucsi_resume_work(struct work_struct *work)
linux-6.1.14/drivers/usb/typec/ucsi/ucsi.c:	INIT_WORK(&ucsi->resume_work, ucsi_resume_work);
linux-6.2.1/drivers/usb/typec/ucsi/ucsi.c:static void ucsi_resume_work(struct work_struct *work)
linux-6.2.1/drivers/usb/typec/ucsi/ucsi.c:	INIT_WORK(&ucsi->resume_work, ucsi_resume_work);

Regards!


EDIT: I have to correct myself:
The function is not in linux-6.1, but was added in a later 6.1.x release (it is present in 6.1.10).

@smuellerDD
Owner

smuellerDD commented Feb 26, 2023 via email

@smuellerDD
Owner

smuellerDD commented Feb 26, 2023 via email

@dreirund
Author

dreirund commented Feb 28, 2023 via email

@smuellerDD
Owner

smuellerDD commented Feb 28, 2023 via email

@dreirund
Author

dreirund commented Feb 28, 2023 via email

@smuellerDD
Owner

smuellerDD commented Feb 28, 2023 via email

@smuellerDD
Owner

smuellerDD commented Mar 2, 2023 via email

@dreirund
Author

dreirund commented Mar 4, 2023

Attached error logs

And here is a photo I took with my camera when I tried to put the machine into sleep state:


workqueue_lookup_screenshot_at_sleep


@smuellerDD
Owner

smuellerDD commented Mar 5, 2023 via email

@dreirund
Author

dreirund commented Mar 5, 2023 via email

@smuellerDD
Owner

smuellerDD commented Mar 5, 2023 via email

@dreirund
Author

dreirund commented Mar 13, 2023

OK, to get a clean testing environment, I actually started a rebuild with a patched vanilla kernel.

To first reproduce my failing setup as closely as possible, I took vanilla kernel 6.1.12 and LRNG patchset v48, from which I applied the patches kernel_patches/v6.1/*.patch (except v48-0000-cover-letter.patch, since it contains no actual patch data, only descriptive text).

Then, when trying to compile the kernel, I get the error
vma.c:(.text+0x1151): undefined reference to `__get_random_u32_below'.

More context in the output of make -j4 vmlinux:

  DESCEND objtool
  CALL    scripts/checksyscalls.sh
  CC      init/main.o
[...]
  CC      drivers/char/lrng/lrng_selftest.o
  CC      drivers/tty/tty_baudrate.o
In file included from ./include/linux/string.h:253,
                 from ./include/linux/bitmap.h:11,
                 from ./include/linux/cpumask.h:12,
                 from ./arch/x86/include/asm/cpumask.h:5,
                 from ./arch/x86/include/asm/msr.h:11,
                 from ./arch/x86/include/asm/processor.h:22,
                 from ./arch/x86/include/asm/cpufeature.h:5,
                 from ./arch/x86/include/asm/thread_info.h:53,
                 from ./include/linux/thread_info.h:60,
                 from ./arch/x86/include/asm/preempt.h:7,
                 from ./include/linux/preempt.h:78,
                 from ./include/linux/spinlock.h:56,
                 from ./include/linux/mmzone.h:8,
                 from ./include/linux/gfp.h:7,
                 from ./include/linux/slab.h:15,
                 from ./include/linux/crypto.h:20,
                 from ./include/crypto/hash.h:11,
                 from ./include/linux/lrng.h:9,
                 from drivers/char/lrng/lrng_selftest.c:27:
In function ‘fortify_memset_chk’,
    inlined from ‘lrng_chacha20_drng_selftest’ at drivers/char/lrng/lrng_selftest.c:307:2:
./include/linux/fortify-string.h:314:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
  314 |                         __write_overflow_field(p_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘fortify_memset_chk’,
    inlined from ‘lrng_chacha20_drng_selftest’ at drivers/char/lrng/lrng_selftest.c:321:2:
./include/linux/fortify-string.h:314:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
  314 |                         __write_overflow_field(p_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  CC      drivers/gpu/drm/i915/i915_mitigations.o
  CC      drivers/char/lrng/lrng_interface_dev_common.o
[...]
  CC      drivers/gpu/drm/drm_mipi_dsi.o
  AR      drivers/gpu/drm/built-in.a
  AR      drivers/gpu/built-in.a
  AR      drivers/built-in.a
  AR      built-in.a
  AR      vmlinux.a
  LD      vmlinux.o
vmlinux.o: warning: objtool: early_init_dt_scan_memory+0x1a2: unreachable instruction
  OBJCOPY modules.builtin.modinfo
  GEN     modules.builtin
  GEN     .vmlinux.objs
  MODPOST vmlinux.symvers
  CC      .vmlinux.export.o
  UPD     include/generated/utsversion.h
  CC      init/version-timestamp.o
  LD      .tmp_vmlinux.kallsyms1
ld: vmlinux.o: in function `map_vdso_randomized':
vma.c:(.text+0x1151): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `bpf_jit_binary_alloc':
(.text+0x232695): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `bpf_jit_binary_pack_alloc':
(.text+0x233d26): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `__do_sys_swapon':
swapfile.c:(.text+0x319a30): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `scan_swap_map_slots':
swapfile.c:(.text+0x31b2a9): undefined reference to `__get_random_u32_below'
ld: vmlinux.o:slub.c:(.text+0x3364aa): more undefined references to `__get_random_u32_below' follow
make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make: *** [Makefile:1248: vmlinux] Error 2

Full make log attached: make_-j4_vmlinux.log, and the used .config attached: .config_6.1.12-lrng-custom.kconf.txt.

I then simply issued another make -j4 vmlinux on top, since I vaguely remember having had an error of this kind before and nevertheless somehow ending up with a successful build. It fails again:

make -j4 vmlinux:

  UPD     include/generated/compile.h
  DESCEND objtool
  CALL    scripts/checksyscalls.sh
  UPD     init/utsversion-tmp.h
  CC      init/version.o
  AR      init/built-in.a
  CHK     kernel/kheaders_data.tar.xz
  GEN     kernel/kheaders_data.tar.xz
  CC      kernel/kheaders.o
  AR      kernel/built-in.a
  AR      built-in.a
  AR      vmlinux.a
  LD      vmlinux.o
vmlinux.o: warning: objtool: early_init_dt_scan_memory+0x1a2: unreachable instruction
  OBJCOPY modules.builtin.modinfo
  GEN     modules.builtin
  GEN     .vmlinux.objs
  MODPOST vmlinux.symvers
  UPD     include/generated/utsversion.h
  CC      init/version-timestamp.o
  LD      .tmp_vmlinux.kallsyms1
ld: vmlinux.o: in function `map_vdso_randomized':
vma.c:(.text+0x1151): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `bpf_jit_binary_alloc':
(.text+0x232695): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `bpf_jit_binary_pack_alloc':
(.text+0x233d26): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `__do_sys_swapon':
swapfile.c:(.text+0x319a30): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `scan_swap_map_slots':
swapfile.c:(.text+0x31b2a9): undefined reference to `__get_random_u32_below'
ld: vmlinux.o:slub.c:(.text+0x3364aa): more undefined references to `__get_random_u32_below' follow
make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make: *** [Makefile:1248: vmlinux] Error 2

Then make -j1 vmlinux fails similarly:

  CALL    scripts/checksyscalls.sh
  DESCEND objtool
  CHK     kernel/kheaders_data.tar.xz
  UPD     include/generated/utsversion.h
  CC      init/version-timestamp.o
  LD      .tmp_vmlinux.kallsyms1
ld: vmlinux.o: in function `map_vdso_randomized':
vma.c:(.text+0x1151): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `bpf_jit_binary_alloc':
(.text+0x232695): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `bpf_jit_binary_pack_alloc':
(.text+0x233d26): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `__do_sys_swapon':
swapfile.c:(.text+0x319a30): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `scan_swap_map_slots':
swapfile.c:(.text+0x31b2a9): undefined reference to `__get_random_u32_below'
ld: vmlinux.o:slub.c:(.text+0x3364aa): more undefined references to `__get_random_u32_below' follow
make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make: *** [Makefile:1248: vmlinux] Error 2

Now about to try vanilla kernel 6.1.12 with LRNG patchset v49.

@smuellerDD
Owner

smuellerDD commented Mar 13, 2023 via email

@dreirund
Author

dreirund commented Mar 13, 2023

There was a new change added on 6.1.x backported from 6.2.

Same with latest LRNG release v49:

[...]
  LD      .tmp_vmlinux.kallsyms1
ld: vmlinux.o: in function `map_vdso_randomized':
vma.c:(.text+0x1151): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `bpf_jit_binary_alloc':
(.text+0x232695): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `bpf_jit_binary_pack_alloc':
(.text+0x233d26): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `__do_sys_swapon':
swapfile.c:(.text+0x319a30): undefined reference to `__get_random_u32_below'
ld: vmlinux.o: in function `scan_swap_map_slots':
swapfile.c:(.text+0x31b2a9): undefined reference to `__get_random_u32_below'
ld: vmlinux.o:slub.c:(.text+0x3364aa): more undefined references to `__get_random_u32_below' follow
make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make: *** [Makefile:1248: vmlinux] Error 2

Is your fix ("change added on 6.1.x backported from 6.2. This is handled with 6784a5d") not yet in any release?

@smuellerDD
Owner

smuellerDD commented Mar 14, 2023 via email

@ptr1337

ptr1337 commented Mar 14, 2023

@dreirund
Feel free to pick the working patchset from here: https://github.com/CachyOS/kernel-patches/blob/master/6.1/misc/0001-lrng.patch

@dreirund
Author

dreirund commented Mar 15, 2023

I think that something went wrong with the application of your patch:

$ grep __get_random_u32_below *.patch

v49-0023-LRMG-add-drop-in-replacement-random-4-API.patch:+u32 __get_random_u32_below(u32 ceil)
v49-0023-LRMG-add-drop-in-replacement-random-4-API.patch:+EXPORT_SYMBOL(__get_random_u32_below);

It is not in kernel_patches/v6.1, only in kernel_patches/v6.2:

grep -r __get_random_u32_below kernel_patches:

kernel_patches/v6.2/v49-0023-LRMG-add-drop-in-replacement-random-4-API.patch:+u32 __get_random_u32_below(u32 ceil)
kernel_patches/v6.2/v49-0023-LRMG-add-drop-in-replacement-random-4-API.patch:+EXPORT_SYMBOL(__get_random_u32_below);

So it is only present for 6.2.x kernels and not backported to 6.1.x kernels (and since 6.1.x is an LTS kernel, I think it should be maintained).
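For context, `__get_random_u32_below(ceil)` is the out-of-line helper behind `get_random_u32_below()`, which returns a uniformly distributed value in `[0, ceil)`. As far as we know, mainline's random.c implements it with multiply-and-shift plus rejection of biased samples (Lemire's method). The sketch below is a user-space illustration of that technique only, not the LRNG or kernel code; `demo_get_random_u32()` is a hypothetical stand-in (a xorshift32 generator, not cryptographically secure) for the kernel's `get_random_u32()`.

```c
#include <stdint.h>

/* Hypothetical stand-in for get_random_u32(): xorshift32.
 * NOT cryptographically secure; only here to make the sketch runnable. */
static uint32_t demo_state = 0x12345678u;

static uint32_t demo_get_random_u32(void)
{
	uint32_t x = demo_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	demo_state = x;
	return x;
}

/* Uniform value in [0, ceil) via multiply-shift with rejection
 * (Lemire's method). ceil == 0 returns the raw 32-bit sample. */
static uint32_t demo_u32_below(uint32_t ceil)
{
	uint32_t rand = demo_get_random_u32();
	uint64_t mult;

	if (ceil == 0)
		return rand;

	/* High 32 bits of ceil * rand are always < ceil. */
	mult = (uint64_t)ceil * rand;

	/* Only samples whose low half is < ceil can be in the biased region. */
	if ((uint32_t)mult < ceil) {
		/* (2^32 - ceil) % ceil == 2^32 % ceil, computable in 32 bits. */
		uint32_t bound = (uint32_t)(-ceil) % ceil;

		while ((uint32_t)mult < bound)
			mult = (uint64_t)ceil * demo_get_random_u32();
	}
	return (uint32_t)(mult >> 32);
}
```

The cheap `(uint32_t)mult < ceil` pre-check means the modulo is usually skipped, so the common case costs one multiplication.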

Regards!

@dreirund
Author

Feel free to pick the working patchset from here: https://github.com/CachyOS/kernel-patches/blob/master/6.1/misc/0001-lrng.patch

How closely is this in sync with this repository, "github.com/smuellerDD/lrng"?

@smuellerDD
Owner

smuellerDD commented Mar 15, 2023 via email

@dreirund
Author

As I pointed out, you need to add the followup patch which you simply need to apply on top 6784a5d

OK, I didn't realize that I still need it when I use your v49 release: said commit is from 2023-01-08, and the v49 release was "2 weeks ago" (so ca. 2023-03-01, definitely after the commit), so I assumed that you had incorporated this fix. Is there any reason that you do not incorporate it? You could provide two 6.1 subdirectories, one for 6.1.0 to 6.1.x and one for 6.1.(x+1) and later, or similar.

Regards!

@smuellerDD
Owner

smuellerDD commented Mar 15, 2023 via email

@dreirund
Author

dreirund commented Mar 15, 2023 via email

@smuellerDD
Owner

smuellerDD commented Mar 15, 2023 via email

@dreirund
Author

  • v49 is intended for 6.2. Yet, there are backport patches to 6.1 available as 6.1 is an LTS kernel. v49 does include the symbol that causes you grief.

As I reported above with my grep search, this does not seem to be the case (the symbol is only present for 6.2.x kernels, not for 6.1.x kernels), and, as I also reported above, LRNG v49 caused me the same grief. I have now compiled vanilla kernel 6.1.12 with LRNG v49 plus the additional patch, which applied without errors and did relieve my grief.

(I never used git checkout, but always downloaded the v48/v49 tarballs.)

@dreirund
Author

dreirund commented Mar 17, 2023

OK, I have done some more in-depth testing under different conditions.

I do not observe the problem with vanilla kernel 6.1.12 and LRNG patchsets v48 or v49 (both additionally with this patch, which adds u32 __get_random_u32_below(u32 ceil)), but I do observe the problem with @pfactum's -pf-kernel 6.1-pf5. So the problem appears in the interaction of LRNG with some other changes in the -pf-kernel, or in the way LRNG was applied to the -pf-kernel. So I am pulling @pfactum into this discussion; maybe he can help sort things out.


Intermezzo: About the LRNG version that is used in 6.1-pf5:

I asked @pfactum which exact LRNG code he used; the answer was somewhat reverse-engineered:

$ git log v6.1..lrng-6.1~ --oneline
b2659d10137f lrng: adopt e9a688bcb19348862afe30d7c85bc37c4c293471
970ec216f6ca lrng: fix return code of RNDADDENTROPY
f5ba2aec029d lrng: fix unlocking of backed entropy
767231139c9a lrng: limit kernel crypto API default DRNGs
e264d23da376 lrng-6.1: merge latest changes
a246a00bde6b lrng-6.1: accommodate v6.1 changes
bcab8deac91e LRNG - add hwrand framework interface
a1fab06005d0 LRNG - add /dev/lrng device file support
c77889487228 LRNG - add kernel crypto API interface
d595113de131 LRMG - add drop-in replacement random(4) API
487c70479912 LRNG - sysctls and /proc interface
f75e0b5160e0 LRNG - add power-on and runtime self-tests
210b7dd7f2cf LRNG - add interface for gathering of raw entropy
9aacb1ecd3bd LRNG - add option to enable runtime entropy rate configuration
2da57d8d7038 LRNG - add Jitter RNG fast noise source
77b5e0f28b4c crypto: move Jitter RNG header include dir
bd01627e0852 LRNG - CPU entropy source
1470d33db295 LRNG - add random.c entropy source support
88ad1d0ef3b3 LRNG - add SP800-90B compliant health tests
ef43add50768 LRNG - add scheduler-based entropy source
2f7065ddc579 scheduler - add entropy sampling hook
dbb4373ac57f LRNG - add interrupt entropy source
403059ab34ef LRNG - add common timer-based entropy source code
fb791a6c2a44 LRNG - add atomic DRNG implementation
5bf756e51df6 LRNG - add kernel crypto API PRNG extension
42c87bc978dc LRNG - add SP800-90A DRBG extension
7e75f46ebfcb crypto: DRBG - externalize DRBG functions for LRNG
b72fb7d7709d LRNG - add common generic hash support
c3422383f155 LRNG - add switchable DRNG support
f1be3e1c7bc9 LRNG - /proc interface
bac26cc2ba91 LRNG - allocate one DRNG instance per NUMA node
36d65e51658e crypto: Entropy Source and DRNG Manager

$ git diff v6.1..lrng-6.1~ | curl -F 'f:1=<-' ix.io
http://ix.io/4r6g

Maybe you two can sort out whether LRNG was applied correctly.

I have not yet applied this (http://ix.io/4r6g) to a vanilla kernel to test it.


And now to my findings:

I had four modules loaded which are related to ucsi and depend on each other:

lsmod | grep -E '(ucsi|^Module)':

Module                  Size  Used by
ucsi_acpi              16384  0
typec_ucsi             36864  1 ucsi_acpi
typec                  94208  1 typec_ucsi
roles                  20480  1 typec_ucsi
  • When I unload ucsi_acpi, the problem does not appear.
  • When I reload ucsi_acpi, the problem does appear.
  • Exception: When I unload all four modules and then reload them, the problem does not appear (I have triple-checked this). If I unload only up to typec and reload, the problem appears.

Whether something is connected to the USB-C port or not does not affect the result.

Here are the results of all my tests (testresults.ods, testresults.csv):

Kernel USB-C connected modules Problem observed
ucsi_acpi typec_ucsi typec roles
kept loaded from bootup manually unloaded … and reloaded kept loaded from bootup manually unloaded … and reloaded kept loaded from bootup manually unloaded … and reloaded kept loaded from bootup manually unloaded … and reloaded at sleep at suspend to disk
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
vanilla 6.1.12 + lrng v49 + patch 1 x x x x 0 0
vanilla 6.1.12 + lrng v48 + patch 1 x x x x 0 0
vanilla 6.1.12 + LRNG as used for 6.1-pf5 (power only) x x x x 0 0
6.1-pf5 0 x x x x 1 1
6.1-pf5 1 x x x x 1
6.1-pf5 1 x x x x 1
6.1-pf5 1 x x x x 0
6.1-pf5 1 x x x x 1
6.1-pf5 1 x x x x 0
6.1-pf5 1 x x x x 0
6.1-pf5 1 x x x x 1
6.1-pf5 1 x x x x 0
6.1-pf5 1 x x x x 0
6.1-pf5 1 x x x x 0
6.1-pf5 1 x x x x 0 (triple checked!)
6.1-pf5 1 x x x x 0
6.1-pf5 1 x x x x 0
6.1-pf5 1 x x x x 0
6.1-pf5 1 x x x x 0

Another remark: whenever ucsi_acpi gets loaded, the following line appears in the dmesg kernel log:

ucsi_acpi USBC000:00: PPM init failed (-110)

(Addendum: I also see this with a kernel without LRNG, 6.2-pf4.) I now also see

ucsi_acpi USBC000:00: failed to re-enable notifications (-110)

about 9 seconds after my reload. (I have not narrowed down when this message appears and when it does not; I found it in my current session, in which I unloaded all four modules and reloaded them.)


I have now reached the end of what I can do.

@smuellerDD: If it makes sense to dig deeper interactively, I would be very happy to meet in person, as hinted at in our private email communication.

Regards!

@dreirund
Author

dreirund commented Mar 17, 2023

I do not observe the problem with vanilla kernel 6.1.12 and LRNG patchsets v48 or v49 (both with additionally this patch which adds u32 __get_random_u32_below(u32 ceil)), but I do observe the problem with @pfactum's -pf-kernel 6.1-pf5. So the problem appears in interaction with LRNG and some other stuff in the -pf-kernel or the way LRNG was applied to the -pf-kernel.

To cross-check, I wanted to compile and test 6.1-pf6 (where LRNG has been dropped) with LRNG patchset v48 (and this necessary additional patch).

But I could not apply the patch:

  -> Applying patch 'v48-0023-LRMG-add-drop-in-replacement-random-4-API.patch' ...
patching file drivers/char/Makefile
Hunk #1 FAILED at 3.
1 out of 1 hunk FAILED -- saving rejects to file drivers/char/Makefile.rej
patching file drivers/char/lrng/Makefile
patching file drivers/char/lrng/lrng_interface_aux.c
patching file drivers/char/lrng/lrng_interface_dev_common.c
patching file drivers/char/lrng/lrng_interface_random_kernel.c
patching file drivers/char/lrng/lrng_interface_random_user.c

drivers/char/Makefile.rej reads

--- drivers/char/Makefile
+++ drivers/char/Makefile
@@ -3,7 +3,8 @@
 # Makefile for the kernel character device drivers.
 #
 
-obj-y                          += mem.o random.o
+obj-y                          += mem.o
+obj-$(CONFIG_RANDOM_DEFAULT_IMPL) += random.o
 obj-$(CONFIG_TTY_PRINTK)       += ttyprintk.o
 obj-y                          += misc.o
 obj-$(CONFIG_ATARI_DSP56K)     += dsp56k.o

@dreirund
Author

dreirund commented Mar 17, 2023

  • v48 is intended to be used with 6.1. However, during the "stable" development cycle of 6.1 this one offending symbol was added which causes you grief. To handle that, I provide an extra update patch to v48 to make it work with later 6.1 kernels.
  • v49 is intended for 6.2. Yet, there are backport patches to 6.1 available as 6.1 is an LTS kernel. v49 does include the symbol that causes you grief.

A diff -r lrng-48/kernel_patches/v6.1 lrng-49/kernel_patches/v6.1 confirms that v48 and v49 are identical with regard to the code for 6.1 kernels: the diff output is empty. No backport-to-6.1 patches are present in v49.

Or do you mean that kernel_patches/v6.2 (i.e. the patches for 6.2.x kernels) should be used for newer 6.1 kernels?

@dreirund
Author

dreirund commented Mar 25, 2023

I have now done a cross-test by patching the LRNG code that, according to @pfactum, was used in kernel 6.1-pf5 into vanilla kernel 6.1.12:

$ git diff v6.1..lrng-6.1~ | curl -F 'f:1=<-' ix.io
http://ix.io/4r6g

This also did not reproduce the issue.

So the initial issue seems to come neither from LRNG alone, nor from the vanilla kernel alone, nor from the combination of vanilla kernel and LRNG, but from a combination of LRNG and other changes made in 6.1-pf*.

If there is a wish to investigate further, I think @pfactum would be the one to take interest; otherwise I propose to stop the investigation here and leave the "bug buried somewhere" alone :-(.
