(cherry picked from commit 52e60a930f0480f2082bc65836c7edb4861de7aa)
(cherry picked from commit bb05861a4f3bb99b76e7e43ba17f3659fcb12443)
(cherry picked from commit d92be4aa5222a52104bed81fe60c5c10a091b4c1)
(cherry picked from commit cc59747fd00d37a9cbcfc3d5175de1a9062dd758)
(cherry picked from commit 101acdd830d9b9cd02104b7721886ee3d16a3930)
(cherry picked from commit faf5434d8d14ae88886aa7b21308c68ff2bce208)
(cherry picked from commit edb793442e06e5499761c40335b109118d13d676)
(cherry picked from commit 36d9884dd06be11d312174806f5635fad205d96c)
(cherry picked from commit e70e90b8bd8f7eba350557a51048f6431bd990b6)
(cherry picked from commit 9eaab60f5360aa13185c63b82231d1f2c814bdcf)
(cherry picked from commit e51c6715a6110da0697d65530154e44f729d7f70)
(cherry picked from commit 06c1d2b640eec244403d0c52da508b210af94881)
(cherry picked from commit 9d1b56fdba1323c41725873e2367b2bdd2693817)
(cherry picked from commit 7df85bddf4000f0f9146ca72e89da11ea82a0a99)
(cherry picked from commit 63765755ceb1e0e74b13395a3c3a9d4c9badb1a1)
(cherry picked from commit 47ba0aaf2ab21b1b1d8c3bb3f651b7254020443d)
(cherry picked from commit d5a3efc7ea799fd64bae357c4d42d315a14c4909)
(cherry picked from commit 620d3a3d388d3c8ec1227ba54ef7ec74b240831d)
Running the devfreq monitor in a normal-priority workqueue can result
in the devfreq mutex lock being held for a while when the monitor
worker sleeps inside update_devfreq(). Freeing up the devfreq mutex
lock is important to ensure timely devfreq boosts, so run the monitor
worker in a high-priority workqueue to reduce the lock hold time.
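A minimal sketch of the change, assuming the monitor work stays on the
driver-wide devfreq_wq queue from drivers/devfreq/devfreq.c; the flag
combination shown here is illustrative, not the literal patch:

  #include <linux/errno.h>
  #include <linux/workqueue.h>

  static struct workqueue_struct *devfreq_wq;

  static int __init devfreq_init(void)
  {
          /*
           * WQ_HIGHPRI puts the monitor work on high-priority kworkers so
           * it gets CPU time promptly and spends less wall-clock time
           * holding the devfreq mutex.
           */
          devfreq_wq = alloc_workqueue("devfreq_wq",
                                       WQ_HIGHPRI | WQ_FREEZABLE | WQ_MEM_RECLAIM,
                                       0);
          if (!devfreq_wq)
                  return -ENOMEM;

          return 0;
  }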
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
update_idle_core() is only done when sched_smt_present is set, but
test_idle_cores() is done for all machines, even those without SMT.
This can contribute up to an 8%+ hackbench performance loss on a
machine like the Kunpeng 920, which has no SMT. This patch removes the
redundant test_idle_cores() check for !SMT machines.
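A minimal sketch of the gating, assuming the v5.10-era
test_idle_cores(cpu, def) helper in kernel/sched/fair.c and
sched_smt_active() from <linux/sched/smt.h>; the local variable name is
illustrative and this paraphrases the idea rather than the exact diff:

  /* In the wakeup fast path (select_idle_sibling() and friends): */
  bool has_idle_core = false;

  /*
   * Only consult the LLC-shared has_idle_cores hint when SMT is actually
   * active; on !SMT machines the lookup can never find an idle SMT core
   * and is pure overhead.
   */
  if (sched_smt_active())
          has_idle_core = test_idle_cores(target, false);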
Hackbench is run with -g {2..14}; for each g it is run 10 times to get
an average.
$ numactl -N 0 hackbench -p -T -l 20000 -g $1
Below is the hackbench result with (w/) and without (w/o) this patch:
g=         2       4       6       8      10      12      14
w/o:  1.8151  3.8499  5.5142  7.2491  9.0340 10.7345 12.0929
w/ :  1.8428  3.7436  5.4501  6.9522  8.2882  9.9535 11.3367
                               +4.1%   +8.3%   +7.3%   +6.3%
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Change-Id: I0dd9363d2b8da9dda0bed205a5ddc36f75fabeef
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
(cherry picked from commit 7c201829c9c1e1ebb1384de66e02b8249d83167e)
Signed-off-by: TogoFire <togofire@mailfence.com>
Signed-off-by: onettboots <blackcocopet@gmail.com>
Replace a bunch of cpumask_any*() instances with
cpumask_any*_distribute(). By injecting this little bit of randomness
into CPU selection, we reduce the chance that two competing balance
operations working off the same lowest_mask pick the same CPU.
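For reference, a sketch of the distributing variant, modelled on
cpumask_any_and_distribute() in lib/cpumask.c (treat the body as a
paraphrase, not the authoritative implementation):

  #include <linux/cpumask.h>
  #include <linux/percpu.h>

  static DEFINE_PER_CPU(int, distribute_cpu_mask_prev);

  int cpumask_any_and_distribute(const struct cpumask *src1p,
                                 const struct cpumask *src2p)
  {
          int next, prev;

          /* Start the scan after the CPU this caller handed out last time. */
          prev = __this_cpu_read(distribute_cpu_mask_prev);

          next = cpumask_next_and(prev, src1p, src2p);
          if (next >= nr_cpu_ids)
                  next = cpumask_first_and(src1p, src2p);

          if (next < nr_cpu_ids)
                  __this_cpu_write(distribute_cpu_mask_prev, next);

          return next;
  }

Because the remembered position is per CPU, concurrent balance
operations scanning the same lowest_mask tend to land on different CPUs.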
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102347.190759694@infradead.org
Signed-off-by: John Vincent <git@tenseventyseven.cf>
I have an RT task X at a high priority and cyclictest on each CPU at a
lower priority than X's. If X is active and each CPU wakes its own
cyclictest thread, the result is a lengthy rto_push storm.
A random CPU determines via balance_rt() that the CPU on which X is
running needs to push tasks. X has the highest priority and cyclictest
is next in line, so there is nothing that can be done since the task
with the higher priority is not touched.
tell_cpu_to_push() increments rto_loop_next and schedules
rto_push_irq_work_func() on X's CPU. The other CPUs also increment the
loop counter and do the same. Once rto_push_irq_work_func() is active it
does nothing because it has _no_ pushable tasks on its runqueue. It then
checks rto_next_cpu() and decides to queue irq_work on the local CPU
because another CPU requested a push by incrementing the counter.
I have traces where ~30 CPUs request this ~3 times each before it
finally ends. This greatly increases X's runtime while X isn't making
much progress.
Teach rto_next_cpu() to only return CPUs which also have tasks on their
runqueue which can be pushed away. This does not reduce the
tell_cpu_to_push() invocations (rto_loop_next counter increments) but
reduces the number of rto_push_irq_work_func() invocations issued when
nothing can be done. As a result, the overloaded CPU is blocked less often.
There are still cases where the "same job" is repeated several times
(for instance the current CPU needs to resched but hasn't yet, so the
irq-work is repeated a few times and the old task remains on the CPU),
but the majority of requests end in tell_cpu_to_push() before an IPI
is issued.
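A sketch of the resulting check inside rto_next_cpu()'s scan of
rd->rto_mask in kernel/sched/rt.c (paraphrased; the rto_loop_next
handling is elided):

  for (;;) {
          /* When rto_cpu is -1 this acts like cpumask_first(). */
          cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
          rd->rto_cpu = cpu;

          if (cpu < nr_cpu_ids) {
                  /*
                   * Only hand out CPUs that actually have a pushable task
                   * queued; the irq_work would be a no-op anywhere else,
                   * so don't interrupt those CPUs at all.
                   */
                  if (!has_pushable_tasks(cpu_rq(cpu)))
                          continue;

                  return cpu;
          }

          rd->rto_cpu = -1;
          /* ... check rto_loop_next to decide whether to rescan ... */
          break;
  }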
Reviewed-by: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230801152648._y603AS_@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>