spin_lock_irqsave+sched_lock #14578
@patacongo please review this patch, which fixes the long-standing issue with sched_lock.
… with the _wo_note suffix. Signed-off-by: hujun5 <[email protected]>
…ed_[un]lock

reason:
1 Speed up sched_lock: remove enter_critical_section from sched_lock and call enter_critical_section only when task scheduling is required.
2 Add sched_lock_wo_note/sched_unlock_wo_note, which do not perform the instrumentation logic.

Signed-off-by: hujun5 <[email protected]>
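The fast path described in this commit can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the NuttX implementation: the `g_*` globals and every `model_*`/`fast_*` name are hypothetical, and the critical section is reduced to a counter so that the number of entries can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not NuttX data structures. */

static int  g_lockcount;        /* scheduler lock depth of the running task */
static bool g_pending;          /* a task became ready while locked */
static int  g_critical_entries; /* how often the big lock was taken */

static void model_enter_critical_section(void) { g_critical_entries++; }
static void model_leave_critical_section(void) { }

/* Locking only bumps the counter; no critical section is needed. */

static void fast_sched_lock(void)
{
  g_lockcount++;
}

/* Unlocking takes the critical section only if scheduling work is pending. */

static void fast_sched_unlock(void)
{
  assert(g_lockcount > 0);
  if (--g_lockcount == 0 && g_pending)
    {
      model_enter_critical_section();
      g_pending = false;          /* stand-in for the actual reschedule */
      model_leave_critical_section();
    }
}
```

In this model a lock/unlock pair with nothing pending never touches the critical section, which is the cost the commit message says it wants to remove.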
reason:
We aim to replace big locks with smaller ones, so subsequent changes will use spin_lock_irqsave extensively in place of enter_critical_section. Following the Linux implementation, we add sched_lock to spin_lock_irqsave to handle the case where sem_post is called while a spin_lock_irqsave lock is held, which can otherwise lead to spinlock failures and deadlocks.

Signed-off-by: hujun5 <[email protected]>
@hujun260 Great work. I think we can prepare a report showing the benefits of SMP mode after all the spinlock changes.
After the change below was merged into the kernel, spin_lock() turns off preemption by default, but this change is not applicable to all scenarios. Most places in the kernel that use spin_lock() only require short critical sections and do not trigger scheduling, so the change leads to serious performance degradation of NuttX in AMP mode. In this PR, I try to expose similar problems and hope that each subsystem will carefully check the code coverage.

apache#14578

|commit b69111d
|Author: hujun5 <[email protected]>
|Date: Thu Jan 23 16:14:18 2025 +0800
|
| spinlock: add sched_lock to spin_lock_irqsave
|
| reason:
| We aim to replace big locks with smaller ones. So we will use spin_lock_irqsave extensively to
| replace enter_critical_section in the subsequent process. We imitate the implementation of Linux
| by adding sched_lock to spin_lock_irqsave in order to address scenarios where sem_post occurs
| within spin_lock_irqsave, which can lead to spinlock failures and deadlocks.
|
| Signed-off-by: hujun5 <[email protected]>

Signed-off-by: chao an <[email protected]>
Hi, it seems this PR is crashing most of our devices. My local tests and internal tests seem to be failing because of it. Here is the open issue: #15688
Is this affecting all Espressif devices, @eren-terzioglu? Can you please take a look as soon as possible, @hujun260?
As far as I can see, the failing defconfigs are:
Xtensa
Risc-V
OK, I will look into this issue.
…69111d of apache#14578 reason: Because sched_lock was added to the spinlock, the use of a spinlock in the *cpustart files during the boot phase is a special case. CPU0 waits for CPU1 to start up, using a spinlock as the multi-core synchronization mechanism. However, the matching lock and unlock calls are not made within the same task, so the scheduler lock counts become mismatched and the system fails to boot. The sequence is: CPU0 spin_lock, spin_lock, spin_unlock; CPU1 spin_unlock. CPU0 and CPU1 are running different tasks. Signed-off-by: hujun5 <[email protected]>
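The count mismatch described in this commit message can be replayed with a small stand-alone model. It is only a sketch: `struct task`, `model_spin_lock`, `model_spin_unlock` and `replay_cpustart` are hypothetical names, and the model tracks just the per-task scheduler lock count that a sched_lock-aware spinlock would touch, with the actual spinning left out.

```c
/* Hypothetical model: each task carries its own scheduler lock count. */

struct task
{
  int lockcount;
};

/* Only the counter side effect of the spinlock calls is modelled. */

static void model_spin_lock(struct task *t)
{
  t->lockcount++;
}

static void model_spin_unlock(struct task *t)
{
  t->lockcount--;
}

/* Replay the boot handshake: three calls on CPU0's task, one on CPU1's. */

static void replay_cpustart(struct task *cpu0_task, struct task *cpu1_task)
{
  model_spin_lock(cpu0_task);    /* CPU0: spin_lock */
  model_spin_lock(cpu0_task);    /* CPU0: spin_lock (waits for CPU1) */
  model_spin_unlock(cpu1_task);  /* CPU1: spin_unlock, in a different task */
  model_spin_unlock(cpu0_task);  /* CPU0: spin_unlock */
}
```

Starting from zero, the replay leaves CPU0's task with a count of 1 and CPU1's task with a count of -1, so neither task ends with a balanced scheduler lock count.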
Summary
1 Speed up sched_lock: remove enter_critical_section from sched_lock and
call enter_critical_section only when task scheduling is required.
2 Add sched_lock_wo_note/sched_unlock_wo_note, which do not perform the instrumentation logic.
3 We aim to replace big locks with smaller ones, so subsequent changes will use spin_lock_irqsave
extensively in place of enter_critical_section. Following the Linux implementation, we add
sched_lock to spin_lock_irqsave to handle the case where sem_post is called while a
spin_lock_irqsave lock is held, which can otherwise lead to spinlock failures and deadlocks.
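To illustrate item 3, the following stand-alone model shows how taking the scheduler lock inside the spinlock defers a context switch requested by sem_post until the spinlock has been released. It is a sketch under stated assumptions: all `g_*` and `model_*` names are hypothetical, interrupt flag handling is omitted, and the ordering of the scheduler lock relative to the spinlock is assumed rather than taken from the NuttX source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not NuttX data structures. */

static int  g_lockcount;       /* scheduler lock depth of the running task */
static bool g_spin_held;       /* the modelled spinlock is held */
static bool g_switch_pending;  /* a wakeup asked for a context switch */
static int  g_switches;        /* context switches actually performed */

static void model_sched_lock(void)
{
  g_lockcount++;
}

static void model_sched_unlock(void)
{
  assert(g_lockcount > 0);
  if (--g_lockcount == 0 && g_switch_pending)
    {
      assert(!g_spin_held);    /* never switch away while holding the lock */
      g_switch_pending = false;
      g_switches++;
    }
}

/* Assumed ordering: lock the scheduler, then take the spinlock. */

static void model_spin_lock_irqsave(void)
{
  model_sched_lock();
  g_spin_held = true;
}

static void model_spin_unlock_irqrestore(void)
{
  g_spin_held = false;
  model_sched_unlock();
}

/* Posting wakes a waiter; the switch is deferred while the scheduler is locked. */

static void model_sem_post(void)
{
  if (g_lockcount > 0)
    {
      g_switch_pending = true;
    }
  else
    {
      g_switches++;
    }
}
```

Calling `model_sem_post()` between `model_spin_lock_irqsave()` and `model_spin_unlock_irqrestore()` performs no switch until the unlock, so the holder is never switched out while it still owns the spinlock.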
The entire implementation process includes:
1 spin_lock_irqsave + sched_lock
2 spin_lock/rw/spin_trylock + sched_lock
3 enter_critical_section + sched_lock
We are currently implementing the first step.
Impact
spinlock and sched_lock
Testing
Build Host:
Configure and build NuttX:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Run with QEMU:
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx