Potential Bugs in Per-CPU Scheduling and Solutions
Background
Originally, ArceOS used a single coarse-grained global RunQueue, and scheduling operations such as
task yielding, waiting on a wait queue, and waking a blocked task were
all performed under the protection of a global SpinNoIrq lock held by the RunQueue.
To support per-CPU scheduling, we must refactor the run queue structure,
as well as the locking mechanism of the current scheduling framework.
AxRunQueue and Scheduler crate
For the design of the scheduler interface, we can reference the design used in Linux.
The current scheduler crate used by ArceOS provides a more fundamental scheduling interface,
which includes only basic task operations and does not account for multiple run queues:
the current scheduler design focuses solely on the task states within its own ready queue.
The scheduler is held by AxRunQueue as a global static variable
and serves as the globally unique scheduler for all cores.
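A minimal sketch of what such a single-ready-queue scheduler interface looks like, using a FIFO policy; the trait and method names here are illustrative, not the scheduler crate's exact API:

```rust
use std::collections::VecDeque;

/// Illustrative sketch of a single-ready-queue scheduler interface:
/// it only manipulates tasks inside its own ready queue and knows
/// nothing about other run queues or load balancing.
pub trait BaseScheduler {
    type Task;
    /// Add a task to this scheduler's ready queue.
    fn add_task(&mut self, task: Self::Task);
    /// Pick the next task to run, removing it from the ready queue.
    fn pick_next_task(&mut self) -> Option<Self::Task>;
    /// Put a previously running task back into the ready queue.
    fn put_prev_task(&mut self, prev: Self::Task, _preempted: bool);
}

/// A minimal FIFO scheduler implementing the sketch above.
pub struct FifoScheduler<T> {
    ready_queue: VecDeque<T>,
}

impl<T> FifoScheduler<T> {
    pub fn new() -> Self {
        Self { ready_queue: VecDeque::new() }
    }
}

impl<T> BaseScheduler for FifoScheduler<T> {
    type Task = T;
    fn add_task(&mut self, task: T) {
        self.ready_queue.push_back(task);
    }
    fn pick_next_task(&mut self) -> Option<T> {
        self.ready_queue.pop_front()
    }
    fn put_prev_task(&mut self, prev: T, _preempted: bool) {
        self.ready_queue.push_back(prev);
    }
}
```

Note that nothing in this interface says which CPU a task should run on; that decision has to live outside the scheduler.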
Referencing Linux's design, methods such as select_task_rq
and those related to load balancing should be provided by the scheduler itself.
However, to simplify the design and minimize modifications to the scheduler crate,
we keep ArceOS's original design and continue to manage the scheduler through AxRunQueue.
We will change AxRunQueue to be a per-CPU variable instead of a globally unique instance
(as well as EXITED_TASKS, WAIT_FOR_EXIT, and TIMER_LIST).
By doing this, the originally unique SpinNoIrq of AxRunQueue has to be distributed across the cores,
so we will refactor the locking mechanism and refine the granularity of the locks.
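The shape of this change can be sketched with a simulation in plain Rust, assuming each CPU's queue holds task ids; in the kernel the queues would be per-CPU statics (e.g. via a percpu mechanism), while a Vec indexed by CPU id is used here only for illustration:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Simulation of per-CPU run queues: instead of one global RunQueue
/// behind one global lock, each CPU owns its own queue and its own lock,
/// so two CPUs scheduling at the same time never contend.
pub struct PerCpuRunQueues {
    queues: Vec<Mutex<VecDeque<u64>>>, // task ids, one queue per CPU
}

impl PerCpuRunQueues {
    pub fn new(num_cpus: usize) -> Self {
        Self {
            queues: (0..num_cpus).map(|_| Mutex::new(VecDeque::new())).collect(),
        }
    }

    /// Push a task onto a given CPU's queue; only that CPU's lock is taken.
    pub fn push(&self, cpu: usize, task: u64) {
        self.queues[cpu].lock().unwrap().push_back(task);
    }

    /// Pop the next task for a given CPU.
    pub fn pop(&self, cpu: usize) -> Option<u64> {
        self.queues[cpu].lock().unwrap().pop_front()
    }
}
```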
cpumask crate
We introduce the cpumask crate to provide a CPU-affinity attribute for each task.
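Conceptually, a cpumask is a bitmap with one bit per CPU; the sketch below uses a bare u64 for illustration (the real cpumask crate provides a richer fixed-size bitmap type):

```rust
/// Minimal cpumask sketch: one bit per CPU, used as a task's
/// CPU-affinity attribute. Illustrative only; assumes at most 64 CPUs.
#[derive(Clone, Copy)]
pub struct CpuMask(u64);

impl CpuMask {
    /// Mask allowing all of the first `num_cpus` CPUs.
    pub fn full(num_cpus: usize) -> Self {
        if num_cpus >= 64 {
            CpuMask(u64::MAX)
        } else {
            CpuMask((1u64 << num_cpus) - 1)
        }
    }
    /// Mask pinning a task to a single CPU.
    pub fn single(cpu: usize) -> Self {
        CpuMask(1 << cpu)
    }
    pub fn contains(&self, cpu: usize) -> bool {
        self.0 & (1 << cpu) != 0
    }
    /// Lowest-numbered CPU allowed by this mask, if any;
    /// a run-queue selection method could start from here.
    pub fn first(&self) -> Option<usize> {
        if self.0 == 0 {
            None
        } else {
            Some(self.0.trailing_zeros() as usize)
        }
    }
}
```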
Lock, Irq and Preemption
AxRunQueue
The big lock on AxRunQueue no longer exists.
For the run queue, we have refined the locks down to the ready queue inside the scheduler,
meaning that only operations that interact with the ready queue,
such as picking the next task and pushing tasks, need to be locked.
The current run queue of a core can be obtained through the current_run_queue method.
This must be done under the protection of kernel_guard
to ensure the safety of preemption and interrupt states.
WaitQueue
Operations on a wait queue are no longer protected by the big lock of AxRunQueue.
We protect each wait queue with its own SpinNoIrq and keep wait-queue operations
distinct from run-queue operations: when a task is woken,
the select_run_queue method chooses an appropriate run queue for its insertion.
TimerList
The TimerList is also designed to be per-CPU, used for recording and firing specific clock events.
This allows us to eliminate the lock for the TimerList itself.
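A per-CPU timer list can be sketched as a min-heap of (deadline, task) pairs owned by one CPU; deadlines are plain u64 ticks here and the structure is illustrative, not the axtask implementation:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Sketch of a per-CPU timer list: each CPU records its own timed events,
/// so adding and expiring timers never needs a cross-CPU lock.
pub struct TimerList {
    // (deadline, task id), earliest deadline first via Reverse.
    events: BinaryHeap<Reverse<(u64, u64)>>,
}

impl TimerList {
    pub fn new() -> Self {
        Self { events: BinaryHeap::new() }
    }

    pub fn set_timer(&mut self, deadline: u64, task: u64) {
        self.events.push(Reverse((deadline, task)));
    }

    /// Pop every event whose deadline has been reached at time `now`,
    /// returning the tasks to wake.
    pub fn expire(&mut self, now: u64) -> Vec<u64> {
        let mut expired = Vec::new();
        while let Some(&Reverse((deadline, task))) = self.events.peek() {
            if deadline > now {
                break;
            }
            self.events.pop();
            expired.push(task);
        }
        expired
    }
}
```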
A TimerList may be used in wait_timeout_until,
where a task simultaneously waits for either a notification or a timer event.
Therefore, a task may be placed in both a TimerList and a WaitQueue.
To prevent the task from being woken through both paths at once,
we apply an unblock_lock to the task,
ensuring that the unblock operation for a task can succeed only once.
The on_cpu flag
When we reduce the lock granularity of the run queue and separate it from the wait-queue locks,
we need to address a phenomenon:
when a task calls a wait_xxx method to wait on a specific wait queue,
it may be woken by another CPU through that wait queue, and start running on that CPU,
before it has actually been switched away from its current CPU.
The general flow may be as follows:
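The flow diagram from the original post is not preserved here; a plausible reconstruction of the race, with hypothetical names, is:

```text
CPU 0 (task A)                        CPU 1
--------------                        -----
wq.wait():
  lock wq; push A; unlock wq          wq.notify():
  // A is now visible to CPU 1,         lock wq; pop A; unlock wq
  // but has NOT switched away yet      unblock A onto CPU 1's run queue
  schedule():                           // A may start running on CPU 1
    pick next task B                    // while CPU 0 is still using
    context_switch(A -> B)              // A's stack!
```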
We have to use some strategy to ensure read-after-write consistency.
Shenango and Skyloft introduce a stack_busy flag in the task struct
to indicate whether the task has finished switching stacks.
It is set to true for a task when a yield is about to happen,
and cleared after the task's context has been saved to its stack.
During scheduling, when the scheduler tries to switch to a task whose stack_busy flag
is still true, it has to enter a spin loop.
Linux uses an on_cpu flag
During a scheduling event in Linux, the process begins by calling prepare_task
to set the on_cpu flag of the next task to 1.
After invoking the switch_to method to switch to the next task,
the next task receives a return value pointing to the previous task,
allowing it to clear the on_cpu flag of the previous task.
Basic workflow:
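This workflow can be sketched in simplified form (the real code is C in the Linux kernel; the structure and names below are a paraphrase, not the kernel's actual implementation):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Paraphrase of Linux's on_cpu handshake around a context switch.
/// (Linux stores an int; an AtomicBool is used here for simplicity.)
struct Task {
    on_cpu: AtomicBool,
}

/// Called before switch_to(): mark the next task as running on a CPU.
fn prepare_task(next: &Task) {
    next.on_cpu.store(true, Ordering::Relaxed);
}

/// Called by the *next* task after switch_to() handed it the pointer to
/// the previous task: prev has fully left this CPU, so release its flag.
fn finish_task(prev: &Task) {
    prev.on_cpu.store(false, Ordering::Release);
}

/// One scheduling step: switch from `prev` to `next`.
fn context_switch_demo(prev: &Task, next: &Task) {
    prepare_task(next);
    // ... switch_to(prev, next) would swap stacks and registers here ...
    finish_task(prev);
}
```

The key point is the ordering: the previous task's flag is cleared only after the switch, on the next task's side, so any other CPU that sees on_cpu == false knows the stack is no longer in use.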
The TTWU_QUEUE feature in Linux allows the use of an IPI to wake up a remote CPU
within the try_to_wake_up function,
instead of waiting for the task on the remote CPU to complete its scheduling process.
This reduces the overhead of spinning and locking.
on_cpu flag in axtask
Inspired by Linux's on_cpu flag design, we adopt a simpler logic.
The on_cpu flag of a task running on a CPU is set to true,
and after a task yields the CPU, the next task clears the on_cpu flag
of the previous task using the clear_prev_task_on_cpu method.
This method requires the task structure to store a pointer to the on_cpu flag of the previous task.
In the unblock_task method, if the on_cpu flag of the target task is found to be true,
it indicates that the task has not yet completed its scheduling.
We spin and wait for the task's on_cpu flag to become false,
ensuring that any task being placed into a run queue has already vacated its CPU.
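The combined effect of unblock_lock and the on_cpu spin can be sketched as follows; this is a simplified model, not the axtask source, and a compare-exchange on an `unblocked` flag stands in for unblock_lock:

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Simplified task state for the unblock path.
struct Task {
    on_cpu: AtomicBool,    // still running on (or leaving) some CPU?
    unblocked: AtomicBool, // has some waker already claimed this task?
}

/// Sketch of unblock_task: wake a blocked task at most once, and only
/// enqueue it after it has really left its old CPU.
fn unblock_task(task: &Arc<Task>, run_queue: &Mutex<VecDeque<Arc<Task>>>) {
    // Only one waker (wait queue OR timer) may win this exchange;
    // this plays the role of unblock_lock.
    if task
        .unblocked
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_err()
    {
        return; // another path already unblocked this task
    }
    // Spin until the task has finished switching away from its old CPU,
    // so we never run one task on two CPUs (or reuse a live stack).
    while task.on_cpu.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    run_queue.lock().unwrap().push_back(task.clone());
}
```

With this shape, a task sitting in both a WaitQueue and the TimerList is enqueued exactly once, whichever event fires first.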