Commit graph

7185 commits

Author SHA1 Message Date
Vincent Guittot
6c5c1812b2 sched/fair: Sync load_sum with load_avg after dequeue
commit 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
reported some inconsistencies between *_avg and *_sum.

commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
fixed some but one remains when dequeuing load.

Sync the cfs_rq's load_sum with its load_avg after dequeuing the load of a
sched_entity.

Fixes: 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Odin Ugedal <odin@uged.al>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20210701171837.32156-1-vincent.guittot@linaro.org
2024-12-18 12:20:59 +01:00
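
A minimal sketch of the change in 6c5c1812b2 (illustrative only, not the exact
upstream diff; sub_positive() and get_pelt_divider() are the helpers fair.c
already uses in this area): after removing the entity's contribution from
load_avg, re-derive the cfs_rq's load_sum from the remaining load_avg so the
two values cannot drift apart.

  static void
  dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
  {
      u32 divider = get_pelt_divider(&cfs_rq->avg);

      /* Remove the sched_entity's contribution from the cfs_rq. */
      sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
      /* Sync: load_sum is recomputed from load_avg instead of subtracted. */
      cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
  }
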
Rik van Riel
02d6e7246b sched/fair: Ensure that the CFS parent is added after unthrottling
Ensure that a CFS parent will be in the list whenever one of its children is also
in the list.

A warning on rq->tmp_alone_branch != &rq->leaf_cfs_rq_list has been
reported while running LTP test cfs_bandwidth01.

Odin Ugedal found the root cause:

	$ tree /sys/fs/cgroup/ltp/ -d --charset=ascii
	/sys/fs/cgroup/ltp/
	|-- drain
	`-- test-6851
	    `-- level2
		|-- level3a
		|   |-- worker1
		|   `-- worker2
		`-- level3b
		    `-- worker3

Timeline (ish):
- worker3 gets throttled
- level3b is decayed, since it has no more load
- level2 gets throttled
- worker3 gets unthrottled
- level2 gets unthrottled
  - worker3 is added to the list
  - level3b is not added to the list, since nr_running==0 and it is decayed

 [ Vincent Guittot: Rebased and updated to fix the reported warning. ]

Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Acked-by: Odin Ugedal <odin@uged.al>
Link: https://lore.kernel.org/r/20210621174330.11258-1-vincent.guittot@linaro.org
2024-12-18 12:20:53 +01:00
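
In code terms, the fix in 02d6e7246b roughly amounts to finishing
unthrottle_cfs_rq() with a walk back up the hierarchy (a sketch, not the exact
diff; list_add_leaf_cfs_rq() returns true once the branch is fully connected
to rq->leaf_cfs_rq_list):

  /* Sketch: make sure every ancestor of a listed child is listed too. */
  for_each_sched_entity(se) {
      struct cfs_rq *cfs_rq = cfs_rq_of(se);

      if (list_add_leaf_cfs_rq(cfs_rq))
          break;    /* branch already connected, parents are in the list */
  }
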
Dietmar Eggemann
7552240c84 sched/fair: Return early from update_tg_cfs_load() if delta == 0
In case the _avg delta is 0 there is no need to update se's _avg
(level n) nor cfs_rq's _avg (level n-1). These values stay the same.

Since cfs_rq's _avg isn't changed, i.e. no load is propagated down,
cfs_rq's _sum should stay the same as well.

So bail out after se's _sum has been updated.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210601083616.804229-1-dietmar.eggemann@arm.com
2024-12-18 12:20:46 +01:00
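
A rough sketch of the control flow in 7552240c84 (not the exact diff; the
names follow the local variables update_tg_cfs_load() works with, treat them
as illustrative):

  delta = load_avg - se->avg.load_avg;
  if (!delta) {
      /* se's _sum is still refreshed, but nothing propagates to level n-1. */
      se->avg.load_sum = runnable_sum;
      return;
  }
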
Odin Ugedal
e6ce990033 sched/fair: Correctly insert cfs_rq's to list on unthrottle
Fix an issue where fairness is decreased since cfs_rq's can end up not
being decayed properly. For two sibling control groups with the same
priority, this can often lead to a load ratio of 99/1 (!!).

This happens because when a cfs_rq is throttled, all the descendant
cfs_rq's will be removed from the leaf list. When the initial cfs_rq
is unthrottled, it will currently only re-add descendant cfs_rq's if
they have one or more entities enqueued. This is not a perfect
heuristic.

Instead, we insert all cfs_rq's that contain one or more enqueued
entities, or whose load is not completely decayed.

This can often lead to situations like the following for equally
weighted control groups:

  $ ps u -C stress
  USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  root       10009 88.8  0.0   3676   100 pts/1    R+   11:04   0:13 stress --cpu 1
  root       10023  3.0  0.0   3676   104 pts/1    R+   11:04   0:00 stress --cpu 1

Fixes: 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
[vingo: !SMP build fix]
Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210612112815.61678-1-odin@uged.al
2024-12-18 12:20:43 +01:00
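
The resulting insertion condition on unthrottle is roughly the following (a
sketch reusing fair.c's cfs_rq_is_decayed() helper, not the exact diff):

  /* Re-insert if the cfs_rq has queued entities OR not-fully-decayed load. */
  if (!cfs_rq_is_decayed(cfs_rq) || cfs_rq->nr_running)
      list_add_leaf_cfs_rq(cfs_rq);
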
Vincent Guittot
4dcff238b1 sched/pelt: Ensure that *_sum is always synced with *_avg
Rounding in the PELT calculation when entities are attached to or detached
from a cfs_rq can result in situations where util/runnable_avg is not null
but util/runnable_sum is. This is normally not possible, so we need to
ensure that util/runnable_sum stays synced with util/runnable_avg.

detach_entity_load_avg() is the last place where we don't sync
util/runnable_sum with util/runnable_avg when moving some sched_entities.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210601085832.12626-1-vincent.guittot@linaro.org
2024-12-18 12:20:38 +01:00
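
A sketch of the detach path after 4dcff238b1 (illustrative, not the exact
diff): rather than subtracting the entity's *_sum, the cfs_rq's *_sum is
recomputed from the freshly updated *_avg.

  /* In detach_entity_load_avg(): */
  u32 divider = get_pelt_divider(&cfs_rq->avg);

  sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
  cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;

  sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg);
  cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
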
Vincent Guittot
f2b5f774cd sched/fair: Keep load_avg and load_sum synced
When removing a cfs_rq from the list we only check the _sum value, so we
must ensure that _avg and _sum stay synced so that load_sum can't be null
while load_avg is not, after propagating load in the cgroup hierarchy.

Use load_avg to compute load_sum similarly to what is done for util_sum
and runnable_sum.

Fixes: 0e2d2aaaae52 ("sched/fair: Rewrite PELT migration propagation")
Reported-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Odin Ugedal <odin@uged.al>
Link: https://lkml.kernel.org/r/20210527122916.27683-2-vincent.guittot@linaro.org
2024-12-18 12:20:35 +01:00
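
The corresponding sketch for the propagation path in f2b5f774cd (illustrative,
not the exact diff): once the parent's load_avg has been updated, load_sum is
derived from it, mirroring what is already done for util_sum and runnable_sum.

  /* In update_tg_cfs_load(), after updating the parent's load_avg: */
  add_positive(&cfs_rq->avg.load_avg, delta);
  cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
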
Rik van Riel
5f7d83f746 sched,fair: Skip newidle_balance if a wakeup is pending
The try_to_wake_up function has an optimization where it can queue
a task for wakeup on its previous CPU, if the task is still in the
middle of going to sleep inside schedule().

Once schedule() re-enables IRQs, the task will be woken up with an
IPI, and placed back on the runqueue.

If we have such a wakeup pending, there is no need to search other
CPUs for runnable tasks. Just skip (or bail out early from) newidle
balancing, and run the just woken up task.

For a memcache-like workload test, this reduces total CPU use by
about 2%, proportionally split between user and system time, and
reduces p99 and p95 application response time by 10% on average.
The schedstats run_delay number shows a similar improvement.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210422130236.0bb353df@imladris.surriel.com
2024-12-18 12:20:32 +01:00
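
The core of 5f7d83f746 is a single early bail-out (a sketch, assuming
rq->ttwu_pending, the flag set when a remote wakeup has been queued for this
CPU; not the exact diff):

  /* In newidle_balance(): a queued wakeup means there is already work here. */
  if (this_rq->ttwu_pending)
      return 0;
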
Valentin Schneider
2786c1d912 sched/fair: Introduce a CPU capacity comparison helper
During load-balance, groups classified as group_misfit_task are filtered
out if they do not pass

  group_smaller_max_cpu_capacity(<candidate group>, <local group>);

which itself employs fits_capacity() to compare the sgc->max_capacity of
both groups.

Due to the underlying margin, fits_capacity(X, 1024) will return false for
any X > 819. Tough luck, the capacity_orig's on e.g. the Pixel 4 are
{261, 871, 1024}. If a CPU-bound task ends up on one of those "medium"
CPUs, misfit migration will never intentionally upmigrate it to a CPU of
higher capacity due to the aforementioned margin.

One may argue the 20% margin of fits_capacity() is excessive with the advent
of counter-enhanced load tracking (APERF/MPERF, AMUs), but one point here
is that fits_capacity() is meant to compare a utilization value to a
capacity value, whereas here it is being used to compare two capacity
values. As CPU capacity and task utilization have different dynamics, a
sensible approach here would be to add a new helper dedicated to comparing
CPU capacities.

Also note that comparing capacity extrema of local and source sched_group's
doesn't make much sense when at the end of the day the imbalance will be
pulled by a known env->dst_cpu, whose capacity can be anywhere within the
local group's capacity extrema.

While at it, replace group_smaller_{min, max}_cpu_capacity() with
comparisons of the source group's min/max capacity and the destination
CPU's capacity.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Link: https://lkml.kernel.org/r/20210407220628.3798191-4-valentin.schneider@arm.com
2024-12-18 12:20:29 +01:00
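
The helper boils down to a dedicated, smaller margin for capacity-vs-capacity
comparisons, roughly (a sketch; treat the exact constant as an assumption,
around 5% instead of fits_capacity()'s ~20%):

  /* Is cap1 noticeably greater than cap2? */
  #define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)
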
Valentin Schneider
69f4b0d9cd sched/fair: Clean up active balance nr_balance_failed trickery
When triggering an active load balance, sd->nr_balance_failed is set to
such a value that any further can_migrate_task() using said sd will ignore
the output of task_hot().

This behaviour makes sense, as active load balance intentionally preempts a
rq's running task to migrate it right away, but this asynchronous write is
a bit shoddy, as the stopper thread might run active_load_balance_cpu_stop
before the sd->nr_balance_failed write either becomes visible to the
stopper's CPU or even happens on the CPU that appended the stopper work.

Add a struct lb_env flag to denote active balancing, and use it in
can_migrate_task(). Remove the sd->nr_balance_failed write that served the
same purpose. Clean up the LBF_DST_PINNED active balance special case.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210407220628.3798191-3-valentin.schneider@arm.com
2024-12-18 12:20:26 +01:00
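
Schematically, the lb_env flag replaces the asynchronous sd->nr_balance_failed
write (a sketch; the LBF_ACTIVE_LB name follows the existing LBF_* convention,
but treat the exact spelling as an assumption):

  /* In can_migrate_task(): an active balance always bypasses task_hot(). */
  if (env->flags & LBF_ACTIVE_LB)
      return 1;
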
Aubrey Li
06dc4d7881 sched/fair: Reduce long-tail newly idle balance cost
A long-tail load balance cost is observed on the newly idle path. This
is caused by a race window between the first nr_running check of the
busiest runqueue and its nr_running recheck in detach_tasks.

Before the busiest runqueue is locked, the tasks on the busiest
runqueue could be pulled by other CPUs and nr_running of the busiest
runqueue becomes 1, or even 0 if the running task becomes idle. This
causes detach_tasks to break out with the LBF_ALL_PINNED flag set and
triggers a load_balance redo at the same sched_domain level.

In order to find the new busiest sched_group and CPU, load balance will
recompute and update the various load statistics, which eventually leads
to the long-tail load balance cost.

This patch clears the LBF_ALL_PINNED flag for this race condition, and
hence reduces the long-tail cost of newly idle balance.

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/1614154549-116078-1-git-send-email-aubrey.li@intel.com
2024-12-18 12:20:21 +01:00
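
The gist, as a sketch of the detach_tasks() bail-out (not the exact diff):
when the recheck finds the busiest runqueue nearly empty on a newly idle pull,
LBF_ALL_PINNED is cleared before breaking, so load_balance() does not treat
the situation as "all tasks pinned" and redo the balance.

  /* In detach_tasks(): */
  if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1) {
      /* No task was examined, so "all pinned" would be misleading. */
      env->flags &= ~LBF_ALL_PINNED;
      break;
  }
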
Barry Song
4d6c2b17f9 sched/fair: Optimize test_idle_cores() for !SMT
update_idle_core() is only done for the sched_smt_present case, but
test_idle_cores() is done for all machines, even those without SMT.

This can contribute to a hackbench performance loss of 8% or more on a
machine like the Kunpeng 920, which has no SMT. This patch removes the
redundant test_idle_cores() call for !SMT machines.

Hackbench is run with -g {2..14}; for each g it is run 10 times to get
an average.

  $ numactl -N 0 hackbench -p -T -l 20000 -g $1

Below is the result of hackbench w/ and w/o this patch:

  g=    2      4     6       8      10     12      14
  w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
  w/ : 1.8428 3.7436 5.4501 6.9522 8.2882  9.9535 11.3367
			    +4.1%  +8.3%  +7.3%   +6.3%

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210320221432.924-1-song.bao.hua@hisilicon.com
2024-12-18 12:20:16 +01:00
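
The change amounts to guarding the idle-cores cache lookup with the SMT check,
roughly (a sketch, not the exact diff; has_idle_core is an illustrative local
name):

  /* In select_idle_cpu(): only SMT machines maintain the idle-cores hint. */
  if (sched_smt_active())
      has_idle_core = test_idle_cores(target, false);
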
Vincent Guittot
2cff544d66 sched/fair: Reorder newidle_balance pulled_task tests
Reorder the tests and skip the useless ones when no load balance has
been performed and the rq lock has not been released.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224133007.28644-6-vincent.guittot@linaro.org
2024-12-18 12:20:06 +01:00
Nahuel Gómez
27515e820a battery: sm5451_charger: fix build on 5.10
debugfs_create_x32() returns void.

Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 12:15:56 +01:00
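
For reference, on 5.10 the call is simply made without checking or storing a
return value (a sketch with hypothetical field names; debugfs_create_x32() has
returned void since the debugfs cleanups around v5.4):

  /* No dentry to check any more; the charger fields here are hypothetical. */
  debugfs_create_x32("addr", 0644, charger->debug_root, &charger->addr);
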
Nahuel Gómez
7fb3935edb battery: import sm5451_charger driver from F926B
Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 12:15:51 +01:00
Nahuel Gómez
cb6a5e60da battery: nuke sm5451_charger driver from a53x
Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 12:15:46 +01:00
Nahuel Gómez
633af00caf configs: drop HICCUP_CC_DISABLE
Match a33x defconfig

Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 12:15:20 +01:00
1bcc615dc7 Reapply "mfc: Import IS_UHD_RES definition"
This reverts commit d9434755e0.
2024-12-18 11:46:00 +01:00
53464baf85 Reapply "configs: kill SCHEDSTATS and SCHED_DEBUG"
This reverts commit 326c808f5c.
2024-12-18 11:25:40 +01:00
f656b91682 Reapply "configs: drop KZEROD"
This reverts commit 0cb16ca2c7.
2024-12-18 11:25:29 +01:00
20898b7053 Reapply "mm: vmstat: use power efficient workingqueues"
This reverts commit dcc7a0ac59.
2024-12-18 11:25:22 +01:00
e4f1c291cf Reapply "PM / freezer: Reduce freeze timeout to 1 second for Android"
This reverts commit cac147d1b1.
2024-12-18 11:25:17 +01:00
427697671c Reapply "mfc: Reduce QoS boosting from Samsung hacks"
This reverts commit e1b24976b4.
2024-12-18 11:25:12 +01:00
c88287ceb0 Reapply "sysctl: promote several nodes out of CONFIG_SCHED_DEBUG"
This reverts commit ea96a0db96.
2024-12-18 11:25:07 +01:00
4704b76edc Reapply "blkdev: set max nr_requests to 64"
This reverts commit 07889cad44.
2024-12-18 11:24:59 +01:00
933c6f2671 Reapply "rcu: Make the grace period workers unbound again"
This reverts commit 613c9dd5f1.
2024-12-18 11:24:55 +01:00
2ad6f215fd Reapply "smp: Use migrate disable/enable in smp_call_function_single_async()"
This reverts commit f37c1e4577.
2024-12-18 11:24:48 +01:00
f6ee50f616 Reapply "sched: core: Minimize number of tasks to load balance"
This reverts commit d03ac752be.
2024-12-18 11:24:44 +01:00
357b4c015c Reapply "mm: Add likelihood labels to quiet_vmstat conditions"
This reverts commit 46ff115435.
2024-12-18 11:24:38 +01:00
55d2e44b8e Reapply "ARM64: dts/s5e8825: disable more unused stuff"
This reverts commit 30bd5b7761.
2024-12-18 11:24:33 +01:00
178633ff08 Reapply "configs: disable some unnecessary DSS stuff"
This reverts commit 47e32f67d1.
2024-12-18 11:24:28 +01:00
52028ebf99 Reapply "ARM64: dts/s5e8825: drop more reserved memory"
This reverts commit 46ca369f57.
2024-12-18 11:24:24 +01:00
12ce49e196 Reapply "timekeeping: Keep the tick alive when CPUs cycle out of s2idle"
This reverts commit 0a04e29636.
2024-12-18 11:24:12 +01:00
0a8f9b7b96 Reapply "media: v4l: Use interruptible waits"
This reverts commit 1d724a61ea.
2024-12-18 11:24:07 +01:00
eb0ca57e21 Revert "lib/Kconfig.debug: Remove DEBUG_KERNEL depend on DEBUG_KMEMLEAK|SCHED_DEBUG|SCHEDSTATS"
This reverts commit bb13d89a87.
2024-12-18 11:09:39 +01:00
febb7ecbd1 Revert "sysctl: promote several nodes out of CONFIG_SCHED_DEBUG"
This reverts commit 26944181d5.
2024-12-18 11:09:28 +01:00
8ce8f6324e Revert "fs,kernel,mm: tune to Ktweak balance"
This reverts commit 50e7a3b302.
2024-12-18 11:09:11 +01:00
afd9b3fd16 Revert "BACKPORT: FROMGIT: kbuild: Remove support for Clang's ThinLTO caching"
This reverts commit f268fdb66d.
2024-12-18 11:08:09 +01:00
48c33cf21c Revert "sched: core: Minimize number of tasks to load balance"
This reverts commit e983caf81d.
2024-12-18 11:07:27 +01:00
d9434755e0 Revert "mfc: Import IS_UHD_RES definition"
This reverts commit 50f2fde1f2.
2024-12-18 11:06:02 +01:00
0147ae88b9 Revert "ARM64: dts: s5e8825: disable some debug stuff"
This reverts commit 554d5fd356.
2024-12-18 11:05:39 +01:00
2381d696fa Revert "FROMLIST: memblock: handle overlapped reserved memory region"
This reverts commit 0adf530671.
2024-12-18 11:05:10 +01:00
58e2d66f91 Revert "fs,kernel,mm: tune to Ktweak balance"
This reverts commit 50e7a3b302.
2024-12-18 09:40:48 +01:00
1d724a61ea Revert "media: v4l: Use interruptible waits"
This reverts commit 4c1b6e4beb.
2024-12-18 09:40:12 +01:00
0a04e29636 Revert "timekeeping: Keep the tick alive when CPUs cycle out of s2idle"
This reverts commit 3af8699aee.
2024-12-18 09:38:06 +01:00
46ca369f57 Revert "ARM64: dts/s5e8825: drop more reserved memory"
This reverts commit 9632f64cde.
2024-12-18 09:37:47 +01:00
47e32f67d1 Revert "configs: disable some unnecessary DSS stuff"
This reverts commit b8c6547f74.
2024-12-18 00:32:45 +01:00
30bd5b7761 Revert "ARM64: dts/s5e8825: disable more unused stuff"
This reverts commit 171825cf94.
2024-12-18 00:32:35 +01:00
46ff115435 Revert "mm: Add likelihood labels to quiet_vmstat conditions"
This reverts commit 39410cb2f3.
2024-12-18 00:32:31 +01:00
d03ac752be Revert "sched: core: Minimize number of tasks to load balance"
This reverts commit e983caf81d.
2024-12-18 00:32:20 +01:00
f37c1e4577 Revert "smp: Use migrate disable/enable in smp_call_function_single_async()"
This reverts commit b6764b2064.
2024-12-18 00:32:11 +01:00