kernel_samsung_a53x/kernel/sched
K Prateek Nayak e6fdbec5ef sched/core: Prevent wakeup of ksoftirqd during idle load balance
[ Upstream commit e932c4ab38f072ce5894b2851fea8bc5754bb8e5 ]

The scheduler raises a SCHED_SOFTIRQ to trigger a load balancing event
from the IPI handler on the idle CPU. If the SMP function is invoked
from an idle CPU via flush_smp_call_function_queue() then the HARD-IRQ
flag is not set and raise_softirq_irqoff() needlessly wakes ksoftirqd
because soft interrupts are handled before ksoftirqd gets on the CPU.

Adding a trace_printk() in nohz_csd_func() at the spot of raising
SCHED_SOFTIRQ and enabling trace events for sched_switch, sched_wakeup,
and softirq_entry (for SCHED_SOFTIRQ vector alone) helps observe the
current behavior:

       <idle>-0   [000] dN.1.:  nohz_csd_func: Raising SCHED_SOFTIRQ from nohz_csd_func
       <idle>-0   [000] dN.4.:  sched_wakeup: comm=ksoftirqd/0 pid=16 prio=120 target_cpu=000
       <idle>-0   [000] .Ns1.:  softirq_entry: vec=7 [action=SCHED]
       <idle>-0   [000] .Ns1.:  softirq_exit: vec=7  [action=SCHED]
       <idle>-0   [000] d..2.:  sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=16 next_prio=120
  ksoftirqd/0-16  [000] d..2.:  sched_switch: prev_comm=ksoftirqd/0 prev_pid=16 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
       ...

Use __raise_softirq_irqoff() to raise the softirq. The SMP function call
is always invoked on the requested CPU in an interrupt handler. It is
guaranteed that soft interrupts are handled at the end.

Following are the observations with the changes when enabling the same
set of events:

       <idle>-0       [000] dN.1.: nohz_csd_func: Raising SCHED_SOFTIRQ for nohz_idle_balance
       <idle>-0       [000] dN.1.: softirq_raise: vec=7 [action=SCHED]
       <idle>-0       [000] .Ns1.: softirq_entry: vec=7 [action=SCHED]

No unnecessary ksoftirqd wakeups are seen from the idle task's context to
service the softirq.
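
The difference between the two raise helpers can be illustrated with a
small userspace model. This is only a sketch, not kernel code: struct
cpu_model and the model_* functions are hypothetical names introduced
for illustration, and the in_interrupt field stands in for the kernel's
in_interrupt() check. The model also ignores any further conditions the
real raise_softirq_irqoff() may apply before waking ksoftirqd.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum { SCHED_SOFTIRQ = 7 };             /* matches vec=7 in the traces above */

struct cpu_model {
	unsigned int pending;           /* bitmask of raised softirq vectors */
	bool in_interrupt;              /* stands in for in_interrupt() */
	unsigned int ksoftirqd_wakeups; /* counts modeled ksoftirqd wakeups */
};

/* Models __raise_softirq_irqoff(): only marks the vector as pending. */
static void model_raise_nowake(struct cpu_model *c, unsigned int nr)
{
	c->pending |= 1u << nr;
}

/*
 * Models raise_softirq_irqoff(): marks the vector as pending and, when
 * not called from interrupt context, also wakes ksoftirqd.
 */
static void model_raise(struct cpu_model *c, unsigned int nr)
{
	model_raise_nowake(c, nr);
	if (!c->in_interrupt)
		c->ksoftirqd_wakeups++;
}
```

With in_interrupt left false, as in the flush_smp_call_function_queue()
path from the idle task, model_raise() records one wakeup while
model_raise_nowake() records none. Both leave the SCHED_SOFTIRQ bit
pending, so the softirq is still serviced.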

Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Closes: https://lore.kernel.org/lkml/fcf823f-195e-6c9a-eac3-25f870cb35ac@inria.fr/ [1]
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20241119054432.6405-5-kprateek.nayak@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-17 13:24:33 +01:00
..
ems kernel: sched: ems: drop usage of SCHED_FEAT 2024-11-19 17:52:14 +01:00
autogroup.c
autogroup.h
clock.c
completion.c
core.c sched/core: Prevent wakeup of ksoftirqd during idle load balance 2024-12-17 13:24:33 +01:00
cpuacct.c
cpudeadline.c
cpudeadline.h
cpufreq.c
cpufreq_schedutil.c schedutil: Allow CPU frequency changes to be amended before they're set 2024-11-19 18:06:02 +01:00
cpupri.c
cpupri.h
cputime.c sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-11-23 23:20:24 +01:00
deadline.c
debug.c
fair.c sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy 2024-12-17 13:24:33 +01:00
features.h
idle.c sched/fair: Trigger the update of blocked load on newly idle cpu 2024-12-17 13:24:33 +01:00
isolation.c
loadavg.c
Makefile
membarrier.c sched/membarrier: reduce the ability to hammer on sys_membarrier 2024-11-18 12:13:39 +01:00
pelt.c
pelt.h
psi.c
rt.c sched/rt: Disallow writing invalid values to sched_rt_period_us 2024-11-18 22:25:32 +01:00
sched-pelt.h
sched.h sched/fair: Add NOHZ balancer flag for nohz.next_balance updates 2024-12-17 13:24:33 +01:00
sec_mpam.c
sec_mpam_cpbm.h
sec_mpam_sysfs.c
sec_mpam_sysfs.h
smp.h
stats.c
stats.h
stop_task.c
swait.c
topology.c sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level 2024-11-19 12:27:00 +01:00
wait.c
wait_bit.c