Commit graph

7437 commits

Vincent Guittot
a8fb0d2a01 sched/fair: Skip update_blocked_averages if we are deferring load balance
In newidle_balance(), the scheduler skips load balancing for the newly
idle CPU when the first sched_domain of this_rq satisfies:

   this_rq->avg_idle < sd->max_newidle_lb_cost

When this condition is true, a costly call to update_blocked_averages()
is not useful and simply adds overhead.

Check the condition early in newidle_balance() to skip
update_blocked_averages() when possible.
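
A condensed sketch of the resulting flow (illustrative only; the real
function carries more state than shown here):

  static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
  {
      struct sched_domain *sd;
      int pulled_task = 0;

      rcu_read_lock();
      sd = rcu_dereference_check_sched_domain(this_rq->sd);

      /* The expected idle time is shorter than the cheapest balance
       * cost: skip both the balance and the now-useless update. */
      if (sd && this_rq->avg_idle < sd->max_newidle_lb_cost) {
          rcu_read_unlock();
          return pulled_task;
      }
      rcu_read_unlock();

      update_blocked_averages(this_rq->cpu);
      /* ... iterate the sched domains and try to pull a task ... */
      return pulled_task;
  }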

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lore.kernel.org/r/20211019123537.17146-3-vincent.guittot@linaro.org
2024-12-18 12:21:09 +01:00
Vincent Guittot
1840166c64 sched/fair: Account update_blocked_averages in newidle_balance cost
The time spent updating the blocked load can be significant, depending
on the complexity of the cgroup hierarchy. Take this time into account
in the cost of the first load balance of a newly idle CPU.

Also reduce the number of calls to sched_clock_cpu() and track more of
the actual work.
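
A sketch of the accounting (names follow the upstream scheduler; the
placement is illustrative):

  u64 t0, t1, curr_cost = 0;

  t0 = sched_clock_cpu(this_cpu);
  update_blocked_averages(this_cpu);
  t1 = sched_clock_cpu(this_cpu);
  curr_cost = t1 - t0;    /* charge the update to this balance attempt */

  for_each_domain(this_cpu, sd) {
      /* ... per-domain costs accumulate into curr_cost, which in
       * turn raises sd->max_newidle_lb_cost when exceeded ... */
  }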

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lore.kernel.org/r/20211019123537.17146-2-vincent.guittot@linaro.org
2024-12-18 12:21:05 +01:00
Vincent Guittot
6c5c1812b2 sched/fair: Sync load_sum with load_avg after dequeue
commit 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
reported some inconsistencies between *_avg and *_sum.

commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
fixed some of them, but one case remains when dequeuing load.

Sync the cfs_rq's load_sum with its load_avg after dequeuing the load
of a sched_entity.
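
In dequeue_load_avg() this amounts to re-deriving load_sum from the
freshly updated load_avg instead of subtracting it independently (a
sketch; the choice of divider is a detail of the real patch):

  static inline void
  dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
  {
      u32 divider = get_pelt_divider(&cfs_rq->avg);

      sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
      /* Recompute load_sum from load_avg so rounding can never
       * leave the two out of sync. */
      cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
  }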

Fixes: 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Odin Ugedal <odin@uged.al>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20210701171837.32156-1-vincent.guittot@linaro.org
2024-12-18 12:20:59 +01:00
Rik van Riel
02d6e7246b sched/fair: Ensure that the CFS parent is added after unthrottling
Ensure that a CFS parent will be in the list whenever one of its children is also
in the list.

A warning on rq->tmp_alone_branch != &rq->leaf_cfs_rq_list has been
reported while running LTP test cfs_bandwidth01.

Odin Ugedal found the root cause:

	$ tree /sys/fs/cgroup/ltp/ -d --charset=ascii
	/sys/fs/cgroup/ltp/
	|-- drain
	`-- test-6851
	    `-- level2
		|-- level3a
		|   |-- worker1
		|   `-- worker2
		`-- level3b
		    `-- worker3

Timeline (ish):
- worker3 gets throttled
- level3b is decayed, since it has no more load
- level2 gets throttled
- worker3 gets unthrottled
- level2 gets unthrottled
  - worker3 is added to list
  - level3b is not added to the list, since nr_running==0 and it is fully decayed

 [ Vincent Guittot: Rebased and updated to fix the reported warning. ]
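
The fix walks back up the hierarchy on unthrottle so that no child is
ever left on the leaf list without its ancestors (sketch; names follow
the upstream scheduler):

  /* In unthrottle_cfs_rq(), after the enqueue loop: */
  for_each_sched_entity(se) {
      struct cfs_rq *cfs_rq = cfs_rq_of(se);

      /* Re-add any ancestor missing from the leaf list, so a child
       * never sits on the list without its parent. */
      list_add_leaf_cfs_rq(cfs_rq);
  }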

Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Acked-by: Odin Ugedal <odin@uged.al>
Link: https://lore.kernel.org/r/20210621174330.11258-1-vincent.guittot@linaro.org
2024-12-18 12:20:53 +01:00
Dietmar Eggemann
7552240c84 sched/fair: Return early from update_tg_cfs_load() if delta == 0
In case the _avg delta is 0, there is no need to update se's _avg
(level n) nor cfs_rq's _avg (level n-1). These values stay the same.

Since cfs_rq's _avg isn't changed, i.e. no load is propagated down,
cfs_rq's _sum should stay the same as well.

So bail out after se's _sum has been updated.
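
Per the message above, the early return sits after se's own values
have been refreshed, roughly like this (sketch, not the verified diff):

  delta_avg = load_avg - se->avg.load_avg;

  /* se's own _sum and _avg are always refreshed first. */
  se->avg.load_sum = runnable_sum;
  se->avg.load_avg = load_avg;

  /* Nothing propagates down: cfs_rq's _avg and _sum stay the same. */
  if (!delta_avg)
      return;

  add_positive(&cfs_rq->avg.load_avg, delta_avg);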

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210601083616.804229-1-dietmar.eggemann@arm.com
2024-12-18 12:20:46 +01:00
Odin Ugedal
e6ce990033 sched/fair: Correctly insert cfs_rq's to list on unthrottle
Fix an issue where fairness is decreased since cfs_rq's can end up not
being decayed properly. For two sibling control groups with the same
priority, this can often lead to a load ratio of 99/1 (!!).

This happens because when a cfs_rq is throttled, all the descendant
cfs_rq's will be removed from the leaf list. When the initial cfs_rq
is unthrottled, it will currently only re-add descendant cfs_rq's if
they have one or more entities enqueued. This is not a perfect
heuristic.

Instead, we insert all cfs_rq's that contain one or more enqueued
entities, or whose load is not completely decayed.

The old behaviour can often lead to situations like this for equally
weighted control groups:

  $ ps u -C stress
  USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  root       10009 88.8  0.0   3676   100 pts/1    R+   11:04   0:13 stress --cpu 1
  root       10023  3.0  0.0   3676   104 pts/1    R+   11:04   0:00 stress --cpu 1
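
The new condition on the unthrottle path (in tg_unthrottle_up()) might
look roughly like this sketch:

  /* Re-add the cfs_rq unless it is both empty and fully decayed. */
  if (cfs_rq->nr_running || !cfs_rq_is_decayed(cfs_rq))
      list_add_leaf_cfs_rq(cfs_rq);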

Fixes: 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
[vingo: !SMP build fix]
Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210612112815.61678-1-odin@uged.al
2024-12-18 12:20:43 +01:00
Vincent Guittot
4dcff238b1 sched/pelt: Ensure that *_sum is always synced with *_avg
Rounding in the PELT calculation when entities are attached to or
detached from a cfs_rq can result in situations where util/runnable_avg
is not null but util/runnable_sum is. This is normally not possible, so
we need to ensure that util/runnable_sum stays synced with
util/runnable_avg.

detach_entity_load_avg() is the last place where we don't sync
util/runnable_sum with util/runnable_avg when moving some sched_entities.
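
A sketch of the synced detach (names follow the upstream scheduler;
propagation details are elided):

  static void detach_entity_load_avg(struct cfs_rq *cfs_rq,
                                     struct sched_entity *se)
  {
      u32 divider = get_pelt_divider(&cfs_rq->avg);

      dequeue_load_avg(cfs_rq, se);
      sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
      /* Recompute _sum from _avg so rounding can never leave a
       * non-null _avg with a null _sum. */
      cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;

      sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg);
      cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
  }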

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210601085832.12626-1-vincent.guittot@linaro.org
2024-12-18 12:20:38 +01:00
Vincent Guittot
f2b5f774cd sched/fair: Keep load_avg and load_sum synced
When removing a cfs_rq from the list we only check the _sum value, so
we must ensure that _avg and _sum stay synced, so that load_sum can't
be null while load_avg is not, after propagating load in the cgroup
hierarchy.

Use load_avg to compute load_sum similarly to what is done for util_sum
and runnable_sum.
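
In update_tg_cfs_load() this means deriving load_sum from load_avg
rather than adjusting it independently (sketch):

  delta = load_avg - se->avg.load_avg;

  se->avg.load_sum = runnable_sum;
  se->avg.load_avg = load_avg;

  add_positive(&cfs_rq->avg.load_avg, delta);
  /* Compute load_sum from load_avg, as already done for util_sum
   * and runnable_sum, so the pair can never diverge. */
  cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;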

Fixes: 0e2d2aaaae52 ("sched/fair: Rewrite PELT migration propagation")
Reported-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Odin Ugedal <odin@uged.al>
Link: https://lkml.kernel.org/r/20210527122916.27683-2-vincent.guittot@linaro.org
2024-12-18 12:20:35 +01:00
Rik van Riel
5f7d83f746 sched,fair: Skip newidle_balance if a wakeup is pending
The try_to_wake_up() function has an optimization where it can queue
a task for wakeup on its previous CPU, if the task is still in the
middle of going to sleep inside schedule().

Once schedule() re-enables IRQs, the task will be woken up with an
IPI, and placed back on the runqueue.

If we have such a wakeup pending, there is no need to search other
CPUs for runnable tasks. Just skip (or bail out early from) newidle
balancing and run the just-woken task.
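
A condensed sketch of the early bail-out (illustrative):

  /* At the top of newidle_balance(), and rechecked inside the
   * balancing loop (sketch): */
  if (this_rq->ttwu_pending)
      return 0;    /* a woken task is about to land on this CPU */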

For a memcache-like workload test, this reduces total CPU use by
about 2%, split proportionally between user and system time, and
reduces p95 and p99 application response times by 10% on average.
The schedstats run_delay number shows a similar improvement.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210422130236.0bb353df@imladris.surriel.com
2024-12-18 12:20:32 +01:00
Valentin Schneider
2786c1d912 sched/fair: Introduce a CPU capacity comparison helper
During load-balance, groups classified as group_misfit_task are filtered
out if they do not pass

  group_smaller_max_cpu_capacity(<candidate group>, <local group>);

which itself employs fits_capacity() to compare the sgc->max_capacity of
both groups.

Due to the underlying margin, fits_capacity(X, 1024) will return false for
any X > 819. Tough luck, the capacity_orig's on e.g. the Pixel 4 are
{261, 871, 1024}. If a CPU-bound task ends up on one of those "medium"
CPUs, misfit migration will never intentionally upmigrate it to a CPU of
higher capacity due to the aforementioned margin.

One may argue the 20% margin of fits_capacity() is excessive with the advent
of counter-enhanced load tracking (APERF/MPERF, AMUs), but one point here
is that fits_capacity() is meant to compare a utilization value to a
capacity value, whereas here it is being used to compare two capacity
values. As CPU capacity and task utilization have different dynamics, a
sensible approach here would be to add a new helper dedicated to comparing
CPU capacities.
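
Such a helper can use a much smaller margin than the 20% of
fits_capacity(); a sketch assuming a ~5% margin:

  /*
   * Is cap1 noticeably greater than cap2? The ~5% margin below is an
   * assumption for illustration; fits_capacity() keeps its larger
   * margin for comparing a utilization value against a capacity.
   */
  #define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)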

Also note that comparing the capacity extrema of the local and source
sched_groups doesn't make much sense when, at the end of the day, the
imbalance will be pulled by a known env->dst_cpu, whose capacity can be
anywhere within the local group's capacity extrema.

While at it, replace group_smaller_{min, max}_cpu_capacity() with
comparisons of the source group's min/max capacity and the destination
CPU's capacity.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Link: https://lkml.kernel.org/r/20210407220628.3798191-4-valentin.schneider@arm.com
2024-12-18 12:20:29 +01:00
Valentin Schneider
69f4b0d9cd sched/fair: Clean up active balance nr_balance_failed trickery
When triggering an active load balance, sd->nr_balance_failed is set to
such a value that any further can_migrate_task() using said sd will ignore
the output of task_hot().

This behaviour makes sense, as active load balance intentionally preempts a
rq's running task to migrate it right away, but this asynchronous write is
a bit shoddy, as the stopper thread might run active_load_balance_cpu_stop
before the sd->nr_balance_failed write either becomes visible to the
stopper's CPU or even happens on the CPU that appended the stopper work.

Add a struct lb_env flag to denote active balancing, and use it in
can_migrate_task(). Remove the sd->nr_balance_failed write that served the
same purpose. Clean up the LBF_DST_PINNED active balance special case.
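
A sketch of the flag-based approach (the flag's value and placement
are illustrative):

  #define LBF_ACTIVE_LB  0x10    /* value chosen for illustration */

  /* In can_migrate_task(), before the task_hot() test: */

  /* Active balance: migrate regardless of cache hotness, with no
   * cross-CPU write to sd->nr_balance_failed needed. */
  if (env->flags & LBF_ACTIVE_LB)
      return 1;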

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210407220628.3798191-3-valentin.schneider@arm.com
2024-12-18 12:20:26 +01:00
Aubrey Li
06dc4d7881 sched/fair: Reduce long-tail newly idle balance cost
A long-tail load balance cost is observed on the newly idle path; it
is caused by a race window between the first nr_running check of the
busiest runqueue and its nr_running recheck in detach_tasks().

Before the busiest runqueue is locked, its tasks could be pulled by
other CPUs, and the busiest runqueue's nr_running becomes 1, or even 0
if the running task goes idle. This causes detach_tasks() to break out
with the LBF_ALL_PINNED flag set, triggering a load_balance() redo at
the same sched_domain level.

In order to find the new busiest sched_group and CPU, load balance will
recompute and update the various load statistics, which eventually leads
to the long-tail load balance cost.

This patch clears the LBF_ALL_PINNED flag for this race condition, and
hence reduces the long-tail cost of newly idle balancing.
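
A sketch of the fix in load_balance(), just after the busiest runqueue
has been locked:

  if (!busiest->nr_running) {
      /* The tasks were pulled by other CPUs during the race window;
       * they are not pinned here, so don't force a redo. */
      env.flags &= ~LBF_ALL_PINNED;
      /* ... unlock and bail out of this balance attempt ... */
  }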

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/1614154549-116078-1-git-send-email-aubrey.li@intel.com
2024-12-18 12:20:21 +01:00
Barry Song
4d6c2b17f9 sched/fair: Optimize test_idle_cores() for !SMT
update_idle_core() is only called when sched_smt_present is set, but
test_idle_cores() is called on all machines, even those without SMT.

This can contribute to an 8%+ hackbench performance loss on a machine
like the Kunpeng 920, which has no SMT. This patch removes the
redundant test_idle_cores() call for !SMT machines.
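
One way to realize this (sketch; the actual patch may place the guard
differently):

  static inline bool test_idle_cores(int cpu, bool def)
  {
      struct sched_domain_shared *sds;

      /* Without SMT there are no "idle cores" to look for. */
      if (!static_branch_likely(&sched_smt_present))
          return def;

      sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
      if (sds)
          return READ_ONCE(sds->has_idle_cores);

      return def;
  }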

Hackbench is run with -g {2..14}; for each g, it is run 10 times to
get an average.

  $ numactl -N 0 hackbench -p -T -l 20000 -g $1

Below are the hackbench results with and without this patch:

  g=    2      4     6       8      10     12      14
  w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
  w/ : 1.8428 3.7436 5.4501 6.9522 8.2882  9.9535 11.3367
			    +4.1%  +8.3%  +7.3%   +6.3%

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210320221432.924-1-song.bao.hua@hisilicon.com
2024-12-18 12:20:16 +01:00
Vincent Guittot
2cff544d66 sched/fair: Reorder newidle_balance pulled_task tests
Reorder the tests and skip the useless ones when no load balance has
been performed and the rq lock has not been released.
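
The reordered tail of newidle_balance() might look roughly like this
(sketch; the exact guards are per the patch):

  /* These rechecks only matter when the rq lock was released above:
   * a task may have been enqueued in the meantime. */
  if (this_rq->cfs.h_nr_running && !pulled_task)
      pulled_task = 1;

  /* Is there a task of a higher priority class? */
  if (this_rq->nr_running != this_rq->cfs.h_nr_running)
      pulled_task = -1;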

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224133007.28644-6-vincent.guittot@linaro.org
2024-12-18 12:20:06 +01:00
Nahuel Gómez
27515e820a battery: sm5451_charger: fix build on 5.10
debugfs_create_x32() returns void on 5.10, so its return value must not be used.
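
A sketch of the kind of change involved (the field and parent names
below are hypothetical, not taken from the driver):

  /* Before: assumes an older API where the helper returned a
   * struct dentry * that could be error-checked. */
  chip->debug_addr_entry = debugfs_create_x32("address", 0644,
                                              debug_root,
                                              &chip->debug_address);

  /* After: on 5.10, debugfs_create_x32() returns void. */
  debugfs_create_x32("address", 0644, debug_root, &chip->debug_address);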

Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 12:15:56 +01:00
Nahuel Gómez
7fb3935edb battery: import sm5451_charger driver from F926B
Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 12:15:51 +01:00
Nahuel Gómez
cb6a5e60da battery: nuke sm5451_charger driver from a53x
Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 12:15:46 +01:00
Nahuel Gómez
633af00caf configs: drop HICCUP_CC_DISABLE
Match the a33x defconfig.

Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 12:15:20 +01:00
1bcc615dc7 Reapply "mfc: Import IS_UHD_RES definition"
This reverts commit d9434755e0.
2024-12-18 11:46:00 +01:00
53464baf85 Reapply "configs: kill SCHEDSTATS and SCHED_DEBUG"
This reverts commit 326c808f5c.
2024-12-18 11:25:40 +01:00
f656b91682 Reapply "configs: drop KZEROD"
This reverts commit 0cb16ca2c7.
2024-12-18 11:25:29 +01:00
20898b7053 Reapply "mm: vmstat: use power efficient workingqueues"
This reverts commit dcc7a0ac59.
2024-12-18 11:25:22 +01:00
e4f1c291cf Reapply "PM / freezer: Reduce freeze timeout to 1 second for Android"
This reverts commit cac147d1b1.
2024-12-18 11:25:17 +01:00
427697671c Reapply "mfc: Reduce QoS boosting from Samsung hacks"
This reverts commit e1b24976b4.
2024-12-18 11:25:12 +01:00
c88287ceb0 Reapply "sysctl: promote several nodes out of CONFIG_SCHED_DEBUG"
This reverts commit ea96a0db96.
2024-12-18 11:25:07 +01:00
4704b76edc Reapply "blkdev: set max nr_requests to 64"
This reverts commit 07889cad44.
2024-12-18 11:24:59 +01:00
933c6f2671 Reapply "rcu: Make the grace period workers unbound again"
This reverts commit 613c9dd5f1.
2024-12-18 11:24:55 +01:00
2ad6f215fd Reapply "smp: Use migrate disable/enable in smp_call_function_single_async()"
This reverts commit f37c1e4577.
2024-12-18 11:24:48 +01:00
f6ee50f616 Reapply "sched: core: Minimize number of tasks to load balance"
This reverts commit d03ac752be.
2024-12-18 11:24:44 +01:00
357b4c015c Reapply "mm: Add likelihood labels to quiet_vmstat conditions"
This reverts commit 46ff115435.
2024-12-18 11:24:38 +01:00
55d2e44b8e Reapply "ARM64: dts/s5e8825: disable more unused stuff"
This reverts commit 30bd5b7761.
2024-12-18 11:24:33 +01:00
178633ff08 Reapply "configs: disable some unnecessary DSS stuff"
This reverts commit 47e32f67d1.
2024-12-18 11:24:28 +01:00
52028ebf99 Reapply "ARM64: dts/s5e8825: drop more reserved memory"
This reverts commit 46ca369f57.
2024-12-18 11:24:24 +01:00
12ce49e196 Reapply "timekeeping: Keep the tick alive when CPUs cycle out of s2idle"
This reverts commit 0a04e29636.
2024-12-18 11:24:12 +01:00
0a8f9b7b96 Reapply "media: v4l: Use interruptible waits"
This reverts commit 1d724a61ea.
2024-12-18 11:24:07 +01:00
eb0ca57e21 Revert "lib/Kconfig.debug: Remove DEBUG_KERNEL depend on DEBUG_KMEMLEAK|SCHED_DEBUG|SCHEDSTATS"
This reverts commit bb13d89a87.
2024-12-18 11:09:39 +01:00
febb7ecbd1 Revert "sysctl: promote several nodes out of CONFIG_SCHED_DEBUG"
This reverts commit 26944181d5.
2024-12-18 11:09:28 +01:00
8ce8f6324e Revert "fs,kernel,mm: tune to Ktweak balance"
This reverts commit 50e7a3b302.
2024-12-18 11:09:11 +01:00
afd9b3fd16 Revert "BACKPORT: FROMGIT: kbuild: Remove support for Clang's ThinLTO caching"
This reverts commit f268fdb66d.
2024-12-18 11:08:09 +01:00
48c33cf21c Revert "sched: core: Minimize number of tasks to load balance"
This reverts commit e983caf81d.
2024-12-18 11:07:27 +01:00
d9434755e0 Revert "mfc: Import IS_UHD_RES definition"
This reverts commit 50f2fde1f2.
2024-12-18 11:06:02 +01:00
0147ae88b9 Revert "ARM64: dts: s5e8825: disable some debug stuff"
This reverts commit 554d5fd356.
2024-12-18 11:05:39 +01:00
2381d696fa Revert "FROMLIST: memblock: handle overlapped reserved memory region"
This reverts commit 0adf530671.
2024-12-18 11:05:10 +01:00
58e2d66f91 Revert "fs,kernel,mm: tune to Ktweak balance"
This reverts commit 50e7a3b302.
2024-12-18 09:40:48 +01:00
1d724a61ea Revert "media: v4l: Use interruptible waits"
This reverts commit 4c1b6e4beb.
2024-12-18 09:40:12 +01:00
0a04e29636 Revert "timekeeping: Keep the tick alive when CPUs cycle out of s2idle"
This reverts commit 3af8699aee.
2024-12-18 09:38:06 +01:00
46ca369f57 Revert "ARM64: dts/s5e8825: drop more reserved memory"
This reverts commit 9632f64cde.
2024-12-18 09:37:47 +01:00
47e32f67d1 Revert "configs: disable some unnecessary DSS stuff"
This reverts commit b8c6547f74.
2024-12-18 00:32:45 +01:00
30bd5b7761 Revert "ARM64: dts/s5e8825: disable more unused stuff"
This reverts commit 171825cf94.
2024-12-18 00:32:35 +01:00
46ff115435 Revert "mm: Add likelihood labels to quiet_vmstat conditions"
This reverts commit 39410cb2f3.
2024-12-18 00:32:31 +01:00