kernel_samsung_a53x

Author	SHA1	Message	Date
Sultan Alsawaf	73ad55b1e6	sched/rt: Change default SCHED_RR timeslice from 100 ms to 1 jiffy For us, it's most helpful to have the round-robin timeslice as low as is allowed by the scheduler to reduce latency. Since it's limited by the scheduler tick rate, just set the default to 1 jiffy, which is the lowest possible value. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Change-Id: I6c9f6bb5bbadf363efb719d3e30b0b073654d688	2024-12-18 15:03:27 +01:00
Vincent Guittot	0f92567070	sched/fair: Make sure to try to detach at least one movable task During load balance, we try at most env->loop_max times to move a task. But it can happen that the loop_max LRU tasks (i.e. the tail of the cfs_tasks list) can't be moved to dst_cpu because of affinity. In this case, loop in the list until we find at least one. The maximum number of detached tasks remains the same as before. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220825122726.20819-2-vincent.guittot@linaro.org	2024-12-18 15:03:21 +01:00
Miguel de Dios	8b98e08a04	sched: reduce softirq conflicts with RT This is a forward port of pa/890483 with modifications from the original patch due to changes in sched/softirq.c which applies the same logic. We're finding audio glitches caused by audio-producing RT tasks that are either interrupted to handle softirq's or that are scheduled onto cpu's that are handling softirq's. In a previous patch, we attempted to catch many cases of the latter problem, but it's clear that we are still losing significant numbers of races in some apps. This patch attempts to address the following problem: It attempts to reduce the most common windows in which we lose the race between scheduling an RT task on a remote core and starting to handle softirq's on that core. We still lose some races, but we lose significantly fewer. (And we don't want to introduce any heavyweight forms of synchronization on these paths.) Bug: 64912585 Bug: 136771796 Bug: 144961676 Change-Id: Ida89a903be0f1965552dd0e84e67ef1d3158c7d8 Signed-off-by: Miguel de Dios <migueldedios@google.com> Signed-off-by: Jimmy Shiu <jimmyshiu@google.com> (cherry picked from commit 3c2569f21be6b7c457b389821b098d30bd03b84c) (cherry picked from commit f9cf84966a315169f3f505993b9e15b6c53f2bbb) (cherry picked from commit e6edaa29efee9e2f2451ab50030d8cde19ab227b) (cherry picked from commit db52e23658eaf304ba2333f04d910313c0c21e05) (cherry picked from commit aa2d4d07929d1c06c6b99ed8718394cf8e471d21) (cherry picked from commit cefdb96edccdc3f788275d1bcef89c8dfeeb8b11) (cherry picked from commit 5a45a0208069ac5704713b35a0858fcd5997f97a) (cherry picked from commit 86ac137eb7399547c6cd20d60063d244091e920f)	2024-12-18 15:03:18 +01:00
Samuel Pascua	04ff6cbc91	drivers: soc: acpm: Prevent optimization of 'acpm_initdata' More optimization issues when compiling with Clang. Panics happen when the device goes into standby with the following report. <6>[ 1470.900859] [0: Binder:4157_2: 8735] EXYNOS-PM:: MIF down. cur_count: 5, acc_count: 5 <6>[ 1470.900859] [0: Binder:4157_2: 8735] EXYNOS-PM:: MIF_UP history: <6>[ 1470.900859] [0: Binder:4157_2: 8735] EXYNOS-PM: mifuser: 0x540000, time: 5:35:40, latency: 1955[usec] <6>[ 1470.900859] [0: Binder:4157_2: 8735] EXYNOS-PM: mifuser: 0x400000, time: 5:35:40, latency: 1956[usec] <6>[ 1470.900859] [0: Binder:4157_2: 8735] EXYNOS-PM: mifuser: 0x100000, time: 5:35:41, latency: 1954[usec] <6>[ 1470.900859] [0: Binder:4157_2: 8735] EXYNOS-PM: mifuser: 0x400000, time: 5:35:41, latency: 1955[usec] <6>[ 1470.900859] [0: Binder:4157_2: 8735] EXYNOS-PM: mifuser: 0x100000, time: 5:35:41, latency: 1955[usec] <0>[ 1470.900859] [0: Binder:4157_2: 8735] Unable to handle kernel paging request at virtual address ffffff800b346f9c <2>[ 1470.900859] [0: Binder:4157_2: 8735] sec_debug_set_extra_info_fault = KERN / 0xffffff800b346f9c <1>[ 1470.900859] [0: Binder:4157_2: 8735] Mem abort info: <1>[ 1470.900859] [0: Binder:4157_2: 8735] Exception class = DABT (current EL), IL = 32 bits <1>[ 1470.900859] [0: Binder:4157_2: 8735] SET = 0, FnV = 0 <1>[ 1470.900859] [0: Binder:4157_2: 8735] EA = 0, S1PTW = 0 <1>[ 1470.900859] [0: Binder:4157_2: 8735] Data abort info: <1>[ 1470.900859] [0: Binder:4157_2: 8735] ISV = 0, ISS = 0x00000061 <1>[ 1470.900859] [0: Binder:4157_2: 8735] CM = 0, WnR = 1 <1>[ 1470.900859] [0: Binder:4157_2: 8735] swapper pgtable: 4k pages, 39-bit VAs, pgd = ffffff800a66a000 <1>[ 1470.900859] [0: Binder:4157_2: 8735] [ffffff800b346f9c] pgd=000000097cdfe003, pud=000000097cdfe003, pmd=00000009740b7003, pte=00e800000203f707 <0>[ 1470.900859] [0: Binder:4157_2: 8735] Internal error: Oops: 96000061 [#1] PREEMPT SMP <4>[ 1470.900859] [0: Binder:4157_2: 8735] Modules linked 
in: <0>[ 1470.900859] [0: Binder:4157_2: 8735] Process Binder:4157_2 (pid: 8735, stack limit = 0xffffff8039708000) <0>[ 1470.900859] [0: Binder:4157_2: 8735] debug-snapshot: core register saved(CPU:0) <0>[ 1470.900859] [0: Binder:4157_2: 8735] L2ECTLR_EL1: 0000000000000007 <0>[ 1470.900859] [0: Binder:4157_2: 8735] L2ECTLR_EL1 valid_bit(30) is NOT set (0x0) <0>[ 1470.900859] [0: Binder:4157_2: 8735] CPUMERRSR: 0000000008000001, L2MERRSR: 0000000010200c00 <0>[ 1470.900859] [0: Binder:4157_2: 8735] CPUMERRSR valid_bit(31) is NOT set (0x0) <0>[ 1470.900859] [0: Binder:4157_2: 8735] L2MERRSR valid_bit(31) is NOT set (0x0) <0>[ 1470.900859] [0: Binder:4157_2: 8735] debug-snapshot: context saved(CPU:0) <6>[ 1470.900859] [0: Binder:4157_2: 8735] debug-snapshot: item - log_kevents is disabled <6>[ 1470.900859] [0: Binder:4157_2: 8735] TIF_FOREIGN_FPSTATE: 1, FP/SIMD depth 0, cpu: 0 <4>[ 1470.900859] [0: Binder:4157_2: 8735] CPU: 0 PID: 8735 Comm: Binder:4157_2 Not tainted 4.14.113 - Fresh Core-user #1 <4>[ 1470.900859] [0: Binder:4157_2: 8735] Hardware name: Samsung A50 LTN OPEN rev04 board based on Exynos9610 (DT) <4>[ 1470.900859] [0: Binder:4157_2: 8735] task: ffffffc0466d6000 task.stack: ffffff8039708000 <4>[ 1470.900859] [0: Binder:4157_2: 8735] PC is at acpm_get_inform+0x90/0x100 <4>[ 1470.900859] [0: Binder:4157_2: 8735] LR is at acpm_get_inform+0x7c/0x100 <4>[ 1470.900859] [0: Binder:4157_2: 8735] pc : [<ffffff8008505cd4>] lr : [<ffffff8008505cc0>] pstate: 604001c5 <4>[ 1470.900859] [0: Binder:4157_2: 8735] sp : ffffff803970bac0 <4>[ 1470.900859] [0: Binder:4157_2: 8735] x29: ffffff803970bac0 x28: ffffffc0466d6000 <4>[ 1470.900859] [0: Binder:4157_2: 8735] x27: ffffff8008e44b64 x26: ffffff8008e44b3e <4>[ 1470.900859] [0: Binder:4157_2: 8735] x25: ffffff8009e5f210 x24: 0000000010624dd3 <4>[ 1470.900859] [0: Binder:4157_2: 8735] x23: 0000000000000029 x22: 0000000000000018 <4>[ 1470.900859] [0: Binder:4157_2: 8735] x21: ffffff8009e2c000 x20: ffffff8008ef6785 <4>[ 
1470.900859] [0: Binder:4157_2: 8735] x19: ffffff8008ef674c x18: 00000000000000a0 <4>[ 1470.900859] [0: Binder:4157_2: 8735] x17: ffffff8009b3023c x16: 0000000000000001 <4>[ 1470.900859] [0: Binder:4157_2: 8735] x15: ffffff8008c8a964 x14: 202c303030303031 <4>[ 1470.900859] [0: Binder:4157_2: 8735] x13: 7830203a72657375 x12: 0000000000000000 <4>[ 1470.900859] [0: Binder:4157_2: 8735] x11: 0000000000000000 x10: ffffffffffffffff <4>[ 1470.900859] [0: Binder:4157_2: 8735] x9 : ffffff800b346f00 x8 : ffffff800b346f00 <4>[ 1470.900859] [0: Binder:4157_2: 8735] x7 : 203a79636e657461 x6 : ffffff80f615273c <4>[ 1470.900859] [0: Binder:4157_2: 8735] x5 : 000000000000221f x4 : 000000000000000c <4>[ 1470.900859] [0: Binder:4157_2: 8735] x3 : 000000000000000a x2 : 0000000000000000 <4>[ 1470.900859] [0: Binder:4157_2: 8735] x1 : 00000000000001c0 x0 : 0000000000000041 Similar solution as d855e6f. Make the structs volatile to prevent optimization. Signed-off-by: John Vincent <git@tensevntysevn.cf> Signed-off-by: Samuel Pascua <sgpascua@ngcp.ph>	2024-12-18 15:02:50 +01:00
xxmustafacooTR	5843cd8543	drivers: soc: samsung: acpm: disable lto	2024-12-18 15:02:45 +01:00
xxmustafacooTR	9ca91577b1	fvmap: optimize voltages	2024-12-18 15:02:40 +01:00
Redick Lin	70c79241d2	soc: samsung: acpm: extend the timeout for acpm ipc retry Extend it from 15ms to 200ms Bug: 172883429 Change-Id: I39e8e860dfeaa4d1d3b702f06dca51dd01bc8367 Signed-off-by: Redick Lin <redicklin@google.com>	2024-12-18 15:02:36 +01:00
Nahuel Gómez	a13ddad5af	vboot_dlkm: drop duplicate sm5451 module Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2024-12-18 15:02:19 +01:00
Nahuel Gómez	375f7e1318	ARM64: dts/s5e8825: GPU undervolt to 790mV This is a base value only though, I'm not sure if it actually works. Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2024-12-18 15:01:20 +01:00
Zhaoyang Huang	f11d790f3a	psi: fix possible trigger missing in the window When a new threshold breaching stall happens after a psi event was generated and within the window duration, the new event is not generated because the events are rate-limited to one per window. If after that no new stall is recorded then the event will not be generated even after rate-limiting duration has passed. This is happening because with no new stall, window_update will not be called even though threshold was previously breached. To fix this, record threshold breaching occurrence and generate the event once window duration is passed. Suggested-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Suren Baghdasaryan <surenb@google.com> Link: https://lore.kernel.org/r/1643093818-19835-1-git-send-email-huangzhaoyang@gmail.com (cherry picked from commit e6df4ead85d9da1b07dd40bd4c6d2182f3e210c4) (cherry picked from commit 007db79713251fd67cc5ce7142942772f6074aca) (cherry picked from commit 129c4324ce9ea8be9c528629bca72e1e1f80d3fb)	2024-12-18 15:01:14 +01:00
Hailong Liu	e48c85b067	psi: Fix trigger being fired unexpectedly at initial When a trigger is being created, its win.start_value and win.start_time are reset to zero. If group->total[PSI_POLL][t->state] has accumulated before, this trigger will be fired unexpectedly in the next period, even if its growth time does not reach its threshold. So set the window of the new trigger to the current state value. Signed-off-by: Hailong Liu <liuhailong@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Suren Baghdasaryan <surenb@google.com> Link: https://lore.kernel.org/r/1648789811-3788971-1-git-send-email-liuhailong@linux.alibaba.com (cherry picked from commit 915a087e4c47334a2f7ba2a4092c4bade0873769) (cherry picked from commit 168af79cd0d6750ddf6c775c79833059da606c34) (cherry picked from commit 9b3b0d7ba99e41a1159a0b9c58ff138bb51d52a1)	2024-12-18 15:01:10 +01:00
Chengming Zhou	09558667e8	sched/psi: report zeroes for CPU full at the system level Martin found the /proc/pressure/cpu output confusing, and found no hint about the CPU "full" line in the psi Documentation. % cat /proc/pressure/cpu some avg10=0.92 avg60=0.91 avg300=0.73 total=933490489 full avg10=0.22 avg60=0.23 avg300=0.16 total=358783277 The PSI_CPU_FULL state was introduced by commit e7fcd7622823 ("psi: Add PSI_CPU_FULL state"), mainly for the cgroup level, but it is also counted at the system level as a side effect. Naturally, the FULL state doesn't exist for the CPU resource at the system level. These "full" numbers can come from CPU idle schedule latency. For example, t1 is the time when a task wakes up on an idle CPU, and t2 is the time when the CPU picks it and switches to it. The delta (t2 - t1) will be in the CPU_FULL state. Another case in which all processes can be stalled is when all cgroups have been throttled at the same time, which is unlikely to happen. Anyway, the CPU_FULL metric is meaningless and confusing at the system level. So this patch reports zeroes for CPU full at the system level, and updates the psi Documentation accordingly. Fixes: e7fcd7622823 ("psi: Add PSI_CPU_FULL state") Reported-by: Martin Steigerwald <Martin.Steigerwald@proact.de> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20220408121914.82855-1-zhouchengming@bytedance.com (cherry picked from commit 890d550d7dbac7a31ecaa78732aa22be282bb6b8) (cherry picked from commit f5187de2b75d019739caec97e8e7886a27e8554c) (cherry picked from commit 669718aaede6df28e37f6a11c0e257d18f050b1a)	2024-12-18 15:01:07 +01:00
John Stultz	91839ef7d5	ANDROID: sched: Fix off-by-one with cpupri MAX_RT_PRIO evaluation This patch addresses an issue seen where SCHED_FIFO prio 99 tasks were being woken up on a cpu where a long-running softirq was executing, and the RT task was not being migrated, causing long (10+ms) wakeup latencies. Prior to upstream commit 934fc3314b39 ("sched/cpupri: Remap CPUPRI_NORMAL to MAX_RT_PRIO-1") the task->prio -> cpupri mapping is a little awkward. For RT tasks, it's calculated as: cpupri = MAX_RT_PRIO - prio + 1; See: https://android.googlesource.com/kernel/common/+/refs/heads/android13-5.10/kernel/sched/cpupri.c#39 This is added on top of the also awkward detail that the task->prio is inverted (RT prio99 -> 0), which means the cpupri mapping for RT tasks goes from 2->101. This makes it easy to evaluate the cpupri incorrectly, which, it turns out, happened in commit 3adfd8e344ac ("ANDROID: sched: avoid placing RT threads on cores handling softirqs"): `3adfd8e344`%5E%21/ With the lines: int task_pri = convert_prio(p->prio); bool drop_nopreempts = task_pri <= MAX_RT_PRIO; Where the added logic to decide to migrate an rt task off of a cpu depended on this drop_nopreempts being true. This works properly for rt tasks from prio 99 to 1, but for the case of task->prio == 0 (userland rt prio 99 tasks) it breaks, as the cpupri will be MAX_RT_PRIO - 0 + 1, which then gets checked as <= MAX_RT_PRIO. This prevents the cpu from being dropped from the scheduling set and prevents the rt user prio 99 task from migrating, which results in high priority rt tasks being left on cpus where long running softirqs are executing, causing long latencies. This patch fixes the off by one by changing the evaluation to MAX_RT_PRIO + 1, so that user-prio 99 tasks will also be migrated if appropriate. Luckily this odd cpupri mapping has been fixed upstream, making this patch no longer necessary in 5.15 and newer kernels. 
Fixes: 3adfd8e344ac ("ANDROID: sched: avoid placing RT threads on cores handling softirqs") Signed-off-by: John Stultz <jstultz@google.com> Change-Id: Ia2db7cd461eb4c90f5850b791de1ae95582f7530 (cherry picked from commit bec3b2f002cd13d44e32710315de91e636ecef84) (cherry picked from commit e8e441d4a2f6b6a316d229f8ff1ebe0f40099817) (cherry picked from commit 1c1f38f1e1fa82e09e0e3f6f941ca81127682ec3)	2024-12-18 15:01:04 +01:00
Hui Su	f93cf8f6a1	sched: Use task_current() instead of 'rq->curr == p' Use the task_current() function where appropriate. No functional change. Signed-off-by: Hui Su <sh_def@163.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20201030173223.GA52339@rlk	2024-12-18 15:00:48 +01:00
Ksawlii	b6b17fef92	Revert "defconfig: a53x_defconfig: Nuke KPROBE" This reverts commit `d30b19e276`.	2024-12-18 14:59:29 +01:00
Ksawlii	d30b19e276	defconfig: a53x_defconfig: Nuke KPROBE	2024-12-18 14:14:56 +01:00
Ksawlii	d1a6ca7818	defconfigs: a53x*: Regenerated with newer clang (19) and Linux 5.10.231	2024-12-18 12:27:51 +01:00
Josh Don	bb74b7f456	sched: Allow newidle balancing to bail out of load_balance While doing newidle load balancing, it is possible for new tasks to arrive, such as with pending wakeups. newidle_balance() already accounts for this by exiting the sched_domain load_balance() iteration if it detects these cases. This is very important for minimizing wakeup latency. However, if we are already in load_balance(), we may stay there for a while before returning back to newidle_balance(). This is most exacerbated if we enter a 'goto redo' loop in the LBF_ALL_PINNED case. A very straightforward workaround to this is to adjust should_we_balance() to bail out if we're doing a CPU_NEWLY_IDLE balance and new tasks are detected. This was tested with the following reproduction: - two threads that take turns sleeping and waking each other up are affined to two cores - a large number of threads with 100% utilization are pinned to all other cores Without this patch, wakeup latency was ~120us for the pair of threads, almost entirely spent in load_balance(). With this patch, wakeup latency is ~6us. Signed-off-by: Josh Don <joshdon@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220609025515.2086253-1-joshdon@google.com	2024-12-18 12:21:46 +01:00
Vincent Guittot	180d869ae6	sched/pelt: Relax the sync of load_sum with load_avg Similarly to util_avg and util_sum, don't sync load_sum with the low bound of load_avg but only ensure that load_sum stays in the correct range. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Link: https://lkml.kernel.org/r/20220111134659.24961-5-vincent.guittot@linaro.org	2024-12-18 12:21:43 +01:00
Vincent Guittot	9340880407	sched/pelt: Relax the sync of runnable_sum with runnable_avg Similarly to util_avg and util_sum, don't sync runnable_sum with the low bound of runnable_avg but only ensure that runnable_sum stays in the correct range. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Link: https://lkml.kernel.org/r/20220111134659.24961-4-vincent.guittot@linaro.org	2024-12-18 12:21:39 +01:00
Vincent Guittot	90b4b30575	sched/pelt: Continue to relax the sync of util_sum with util_avg Rick reported performance regressions in bugzilla because of cpu frequency being lower than before: https://bugzilla.kernel.org/show_bug.cgi?id=215045 He bisected the problem to: commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent") This commit forces util_sum to be synced with the new util_avg after removing the contribution of a task and before the next periodic sync. By doing so util_sum is rounded to its lower bound and might lose up to LOAD_AVG_MAX-1 of accumulated contribution which has not yet been reflected in util_avg. update_tg_cfs_util() is not the only place where we round util_sum and lose some accumulated contributions that are not already reflected in util_avg. Modify update_tg_cfs_util() and detach_entity_load_avg() to not sync util_sum with the new util_avg. Instead of always setting util_sum to the low bound of util_avg, which can significantly lower the utilization, we propagate the difference. In addition, we also check that cfs's util_sum always stays above the lower bound for a given util_avg as it has been observed that sched_entity's util_sum is sometimes above cfs one. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Link: https://lkml.kernel.org/r/20220111134659.24961-3-vincent.guittot@linaro.org	2024-12-18 12:21:35 +01:00
Mathias Krause	b59eaa682e	sched/fair: Prevent dead task groups from regaining cfs_rq's Kevin is reporting crashes which point to a use-after-free of a cfs_rq in update_blocked_averages(). Initial debugging revealed that we've live cfs_rq's (on_list=1) in an about to be kfree()'d task group in free_fair_sched_group(). However, it was unclear how that can happen. His kernel config happened to lead to a layout of struct sched_entity that put the 'my_q' member directly into the middle of the object which makes it incidentally overlap with SLUB's freelist pointer. That, in combination with SLAB_FREELIST_HARDENED's freelist pointer mangling, leads to a reliable access violation in form of a #GP which made the UAF fail fast. Michal seems to have run into the same issue[1]. He already correctly diagnosed that commit a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") is causing the preconditions for the UAF to happen by re-adding cfs_rq's also to task groups that have no more running tasks, i.e. also to dead ones. His analysis, however, misses the real root cause and it cannot be seen from the crash backtrace only, as the real offender is tg_unthrottle_up() getting called via sched_cfs_period_timer() via the timer interrupt at an inconvenient time. When unregister_fair_sched_group() unlinks all cfs_rq's from the dying task group, it doesn't protect itself from getting interrupted. If the timer interrupt triggers while we iterate over all CPUs or after unregister_fair_sched_group() has finished but prior to unlinking the task group, sched_cfs_period_timer() will execute and walk the list of task groups, trying to unthrottle cfs_rq's, i.e. re-add them to the dying task group. These will later -- in free_fair_sched_group() -- be kfree()'ed while still being linked, leading to the fireworks Kevin and Michal are seeing. To fix this race, ensure the dying task group gets unlinked first. 
However, simply switching the order of unregistering and unlinking the task group isn't sufficient, as concurrent RCU walkers might still see it, as can be seen below: CPU1: CPU2: : timer IRQ: : do_sched_cfs_period_timer(): : : : distribute_cfs_runtime(): : rcu_read_lock(); : : : unthrottle_cfs_rq(): sched_offline_group(): : : walk_tg_tree_from(…,tg_unthrottle_up,…): list_del_rcu(&tg->list); : (1) : list_for_each_entry_rcu(child, &parent->children, siblings) : : (2) list_del_rcu(&tg->siblings); : : tg_unthrottle_up(): unregister_fair_sched_group(): struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)]; : : list_del_leaf_cfs_rq(tg->cfs_rq[cpu]); : : : : if (!cfs_rq_is_decayed(cfs_rq) || cfs_rq->nr_running) (3) : list_add_leaf_cfs_rq(cfs_rq); : : : : : : : : : : (4) : rcu_read_unlock(); CPU 2 walks the task group list in parallel to sched_offline_group(), specifically, it'll read the soon to be unlinked task group entry at (1). Unlinking it on CPU 1 at (2) therefore won't prevent CPU 2 from still passing it on to tg_unthrottle_up(). CPU 1 now tries to unlink all cfs_rq's via list_del_leaf_cfs_rq() in unregister_fair_sched_group(). Meanwhile CPU 2 will re-add some of these at (3), which is the cause of the UAF later on. To prevent this additional race from happening, we need to wait until walk_tg_tree_from() has finished traversing the task groups, i.e. after the RCU read critical section ends in (4). Afterwards we're safe to call unregister_fair_sched_group(), as each new walk won't see the dying task group any more. On top of that, we need to wait yet another RCU grace period after unregister_fair_sched_group() to ensure print_cfs_stats(), which might run concurrently, always sees valid objects, i.e. not already free'd ones. This patch survives Michal's reproducer[2] for 8h+ now, which used to trigger within minutes before. 
[1] https://lore.kernel.org/lkml/20211011172236.11223-1-mkoutny@suse.com/ [2] https://lore.kernel.org/lkml/20211102160228.GA57072@blackbody.suse.cz/ Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") [peterz: shuffle code around a bit] Reported-by: Kevin Tanguy <kevin.tanguy@corp.ovh.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2024-12-18 12:21:30 +01:00
Vincent Guittot	cd10658059	sched/fair: Remove sysctl_sched_migration_cost condition With a default value of 500us, sysctl_sched_migration_cost is significantly higher than the cost of load_balance. Remove the condition and rely on the sd->max_newidle_lb_cost to abort newidle_balance. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20211019123537.17146-5-vincent.guittot@linaro.org	2024-12-18 12:21:25 +01:00
Vincent Guittot	3dc3531c78	sched/fair: Wait before decaying max_newidle_lb_cost Decay max_newidle_lb_cost only when it has not been updated for a while and ensure to not decay a recently changed value. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20211019123537.17146-4-vincent.guittot@linaro.org	2024-12-18 12:21:16 +01:00
Vincent Guittot	a8fb0d2a01	sched/fair: Skip update_blocked_averages if we are defering load balance In newidle_balance(), the scheduler skips load balance to the new idle cpu when the 1st sd of this_rq is: this_rq->avg_idle < sd->max_newidle_lb_cost Doing a costly call to update_blocked_averages() will not be useful and simply adds overhead when this condition is true. Check the condition early in newidle_balance() to skip update_blocked_averages() when possible. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20211019123537.17146-3-vincent.guittot@linaro.org	2024-12-18 12:21:09 +01:00
Vincent Guittot	1840166c64	sched/fair: Account update_blocked_averages in newidle_balance cost The time spent updating the blocked load can be significant depending on the complexity of the cgroup hierarchy. Take this time into account in the cost of the 1st load balance of a newly idle cpu. Also reduce the number of calls to sched_clock_cpu() and track more actual work. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20211019123537.17146-2-vincent.guittot@linaro.org	2024-12-18 12:21:05 +01:00
Vincent Guittot	6c5c1812b2	sched/fair: Sync load_sum with load_avg after dequeue Commit 9e077b52d86a ("sched/pelt: Check that _avg are null when _sum are") reported some inconsistencies between _avg and _sum. Commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent") fixed some, but one remains when dequeuing load. Sync the cfs's load_sum with its load_avg after dequeuing the load of a sched_entity. Fixes: 9e077b52d86a ("sched/pelt: Check that _avg are null when _sum are") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Odin Ugedal <odin@uged.al> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20210701171837.32156-1-vincent.guittot@linaro.org	2024-12-18 12:20:59 +01:00
Rik van Riel	02d6e7246b	sched/fair: Ensure that the CFS parent is added after unthrottling Ensure that a CFS parent will be in the list whenever one of its children is also in the list. A warning on rq->tmp_alone_branch != &rq->leaf_cfs_rq_list has been reported while running LTP test cfs_bandwidth01. Odin Ugedal found the root cause: $ tree /sys/fs/cgroup/ltp/ -d --charset=ascii /sys/fs/cgroup/ltp/ |-- drain `-- test-6851 `-- level2 |-- level3a | |-- worker1 | `-- worker2 `-- level3b `-- worker3 Timeline (ish): - worker3 gets throttled - level3b is decayed, since it has no more load - level2 get throttled - worker3 get unthrottled - level2 get unthrottled - worker3 is added to list - level3b is not added to list, since nr_running==0 and is decayed [ Vincent Guittot: Rebased and updated to fix for the reported warning. ] Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Suggested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Acked-by: Odin Ugedal <odin@uged.al> Link: https://lore.kernel.org/r/20210621174330.11258-1-vincent.guittot@linaro.org	2024-12-18 12:20:53 +01:00
Dietmar Eggemann	7552240c84	sched/fair: Return early from update_tg_cfs_load() if delta == 0 In case the _avg delta is 0 there is no need to update se's _avg (level n) nor cfs_rq's _avg (level n-1). These values stay the same. Since cfs_rq's _avg isn't changed, i.e. no load is propagated down, cfs_rq's _sum should stay the same as well. So bail out after se's _sum has been updated. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20210601083616.804229-1-dietmar.eggemann@arm.com	2024-12-18 12:20:46 +01:00
Odin Ugedal	e6ce990033	sched/fair: Correctly insert cfs_rq's to list on unthrottle Fix an issue where fairness is decreased since cfs_rq's can end up not being decayed properly. For two sibling control groups with the same priority, this can often lead to a load ratio of 99/1 (!!). This happens because when a cfs_rq is throttled, all the descendant cfs_rq's will be removed from the leaf list. When the initial cfs_rq is unthrottled, it will currently only re-add descendant cfs_rq's if they have one or more entities enqueued. This is not a perfect heuristic. Instead, we insert all cfs_rq's that contain one or more enqueued entities, or whose load is not completely decayed. The current behavior can often lead to situations like this for equally weighted control groups: $ ps u -C stress USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 10009 88.8 0.0 3676 100 pts/1 R+ 11:04 0:13 stress --cpu 1 root 10023 3.0 0.0 3676 104 pts/1 R+ 11:04 0:00 stress --cpu 1 Fixes: 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()") [vingo: !SMP build fix] Signed-off-by: Odin Ugedal <odin@uged.al> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20210612112815.61678-1-odin@uged.al	2024-12-18 12:20:43 +01:00
Vincent Guittot	4dcff238b1	sched/pelt: Ensure that _sum is always synced with _avg Rounding in the PELT calculation, which happens when entities are attached to or detached from a cfs_rq, can result in situations where util/runnable_avg is not null but util/runnable_sum is. This is normally not possible, so we need to ensure that util/runnable_sum stays synced with util/runnable_avg. detach_entity_load_avg() is the last place where we don't sync util/runnable_sum with util/runnable_avg when moving some sched_entities. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210601085832.12626-1-vincent.guittot@linaro.org	2024-12-18 12:20:38 +01:00
Vincent Guittot	f2b5f774cd	sched/fair: Keep load_avg and load_sum synced When removing a cfs_rq from the list, we only check the _sum value, so we must ensure that _avg and _sum stay synced so that load_sum can't be null while load_avg is not after propagating load in the cgroup hierarchy. Use load_avg to compute load_sum similarly to what is done for util_sum and runnable_sum. Fixes: 0e2d2aaaae52 ("sched/fair: Rewrite PELT migration propagation") Reported-by: Odin Ugedal <odin@uged.al> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Odin Ugedal <odin@uged.al> Link: https://lkml.kernel.org/r/20210527122916.27683-2-vincent.guittot@linaro.org	2024-12-18 12:20:35 +01:00
Rik van Riel	5f7d83f746	sched,fair: Skip newidle_balance if a wakeup is pending The try_to_wake_up function has an optimization where it can queue a task for wakeup on its previous CPU, if the task is still in the middle of going to sleep inside schedule(). Once schedule() re-enables IRQs, the task will be woken up with an IPI, and placed back on the runqueue. If we have such a wakeup pending, there is no need to search other CPUs for runnable tasks. Just skip (or bail out early from) newidle balancing, and run the just woken up task. For a memcache like workload test, this reduces total CPU use by about 2%, proportionally split between user and system time, and p99 and p95 application response time by 10% on average. The schedstats run_delay number shows a similar improvement. Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lkml.kernel.org/r/20210422130236.0bb353df@imladris.surriel.com	2024-12-18 12:20:32 +01:00
Valentin Schneider	2786c1d912	sched/fair: Introduce a CPU capacity comparison helper During load-balance, groups classified as group_misfit_task are filtered out if they do not pass group_smaller_max_cpu_capacity(<candidate group>, <local group>); which itself employs fits_capacity() to compare the sgc->max_capacity of both groups. Due to the underlying margin, fits_capacity(X, 1024) will return false for any X > 819. Tough luck, the capacity_orig's on e.g. the Pixel 4 are {261, 871, 1024}. If a CPU-bound task ends up on one of those "medium" CPUs, misfit migration will never intentionally upmigrate it to a CPU of higher capacity due to the aforementioned margin. One may argue the 20% margin of fits_capacity() is excessive in the advent of counter-enhanced load tracking (APERF/MPERF, AMUs), but one point here is that fits_capacity() is meant to compare a utilization value to a capacity value, whereas here it is being used to compare two capacity values. As CPU capacity and task utilization have different dynamics, a sensible approach here would be to add a new helper dedicated to comparing CPU capacities. Also note that comparing capacity extrema of local and source sched_group's doesn't make much sense when at the end of the day the imbalance will be pulled by a known env->dst_cpu, whose capacity can be anywhere within the local group's capacity extrema. While at it, replace group_smaller_{min, max}_cpu_capacity() with comparisons of the source group's min/max capacity and the destination CPU's capacity. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Qais Yousef <qais.yousef@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Lingutla Chandrasekhar <clingutla@codeaurora.org> Link: https://lkml.kernel.org/r/20210407220628.3798191-4-valentin.schneider@arm.com	2024-12-18 12:20:29 +01:00
Valentin Schneider	69f4b0d9cd	sched/fair: Clean up active balance nr_balance_failed trickery When triggering an active load balance, sd->nr_balance_failed is set to such a value that any further can_migrate_task() using said sd will ignore the output of task_hot(). This behaviour makes sense, as active load balance intentionally preempts a rq's running task to migrate it right away, but this asynchronous write is a bit shoddy, as the stopper thread might run active_load_balance_cpu_stop before the sd->nr_balance_failed write either becomes visible to the stopper's CPU or even happens on the CPU that appended the stopper work. Add a struct lb_env flag to denote active balancing, and use it in can_migrate_task(). Remove the sd->nr_balance_failed write that served the same purpose. Cleanup the LBF_DST_PINNED active balance special case. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210407220628.3798191-3-valentin.schneider@arm.com	2024-12-18 12:20:26 +01:00
Aubrey Li	06dc4d7881	sched/fair: Reduce long-tail newly idle balance cost A long-tail load balance cost is observed on the newly idle path; this is caused by a race window between the first nr_running check of the busiest runqueue and its nr_running recheck in detach_tasks. Before the busiest runqueue is locked, the tasks on the busiest runqueue could be pulled by other CPUs and nr_running of the busiest runqueue becomes 1 or even 0 if the running task becomes idle. This causes detach_tasks to break with the LBF_ALL_PINNED flag set, and triggers a load_balance redo at the same sched_domain level. In order to find the new busiest sched_group and CPU, load balance will recompute and update the various load statistics, which eventually leads to the long-tail load balance cost. This patch clears the LBF_ALL_PINNED flag for this race condition, and hence reduces the long-tail cost of newly idle balance. Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/1614154549-116078-1-git-send-email-aubrey.li@intel.com	2024-12-18 12:20:21 +01:00
Barry Song	4d6c2b17f9	sched/fair: Optimize test_idle_cores() for !SMT update_idle_core() is only done for the case of sched_smt_present, but test_idle_cores() is done for all machines, even those without SMT. This can contribute to up to 8%+ hackbench performance loss on a machine like kunpeng 920 which has no SMT. This patch removes the redundant test_idle_cores() for !SMT machines. Hackbench is run with -g {2..14}; for each g it is run 10 times to get an average. $ numactl -N 0 hackbench -p -T -l 20000 -g $1 The below is the result of hackbench w/ and w/o this patch: g= 2 4 6 8 10 12 14 w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929 w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367 +4.1% +8.3% +7.3% +6.3% Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lkml.kernel.org/r/20210320221432.924-1-song.bao.hua@hisilicon.com	2024-12-18 12:20:16 +01:00
Vincent Guittot	2cff544d66	sched/fair: Reorder newidle_balance pulled_task tests Reorder the tests and skip useless ones when no load balance has been performed and rq lock has not been released. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210224133007.28644-6-vincent.guittot@linaro.org	2024-12-18 12:20:06 +01:00
Nahuel Gómez	27515e820a	battery: sm5451_charger: fix build on 5.10 debugfs_create_x32 is a void. Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2024-12-18 12:15:56 +01:00
Nahuel Gómez	7fb3935edb	battery: import sm5451_charger driver from F926B Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2024-12-18 12:15:51 +01:00
Nahuel Gómez	cb6a5e60da	battery: nuke sm5451_charger driver from a53x Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2024-12-18 12:15:46 +01:00
Nahuel Gómez	633af00caf	configs: drop HICCUP_CC_DISABLE Match a33x defconfig Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2024-12-18 12:15:20 +01:00
Ksawlii	1bcc615dc7	Reapply "mfc: Import IS_UHD_RES definition" This reverts commit `d9434755e0`.	2024-12-18 11:46:00 +01:00
Ksawlii	53464baf85	Reapply "configs: kill SCHEDSTATS and SCHED_DEBUG" This reverts commit `326c808f5c`.	2024-12-18 11:25:40 +01:00
Ksawlii	f656b91682	Reapply "configs: drop KZEROD" This reverts commit `0cb16ca2c7`.	2024-12-18 11:25:29 +01:00
Ksawlii	20898b7053	Reapply "mm: vmstat: use power efficient workingqueues" This reverts commit `dcc7a0ac59`.	2024-12-18 11:25:22 +01:00
Ksawlii	e4f1c291cf	Reapply "PM / freezer: Reduce freeze timeout to 1 second for Android" This reverts commit `cac147d1b1`.	2024-12-18 11:25:17 +01:00
Ksawlii	427697671c	Reapply "mfc: Reduce QoS boosting from Samsung hacks" This reverts commit `e1b24976b4`.	2024-12-18 11:25:12 +01:00
Ksawlii	c88287ceb0	Reapply "sysctl: promote several nodes out of CONFIG_SCHED_DEBUG" This reverts commit `ea96a0db96`.	2024-12-18 11:25:07 +01:00
Ksawlii	4704b76edc	Reapply "blkdev: set max nr_requests to 64" This reverts commit `07889cad44`.	2024-12-18 11:24:59 +01:00


7211 commits
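
The margin arithmetic described in 2786c1d912 ("sched/fair: Introduce a CPU capacity comparison helper") can be checked with a small user-space sketch. The functions below are stand-ins written for illustration, not code taken from this tree. The 1280/1024 ratio reproduces the 20% margin and the "false for any X > 819" behaviour quoted in the commit message. The 1078/1024 ratio (roughly 5%) for the capacity-to-capacity helper is an assumption based on the upstream `capacity_greater()` macro as best recalled; the commit message itself does not state the new margin.

```c
/*
 * Illustrative user-space sketch only. The kernel uses macros in
 * kernel/sched/fair.c; these function forms and their exact constants
 * are assumptions for demonstration.
 */

/* Utilization-vs-capacity check with a ~20% margin: cap must stay below 80% of max. */
static inline int fits_capacity(unsigned long cap, unsigned long max)
{
	return cap * 1280 < max * 1024;
}

/* Capacity-vs-capacity check with a ~5% margin (assumed): cap1 must exceed cap2 by more than ~5%. */
static inline int capacity_greater(unsigned long cap1, unsigned long cap2)
{
	return cap1 * 1024 > cap2 * 1078;
}
```

With the Pixel 4 capacities quoted in the commit message, `fits_capacity(871, 1024)` is false, which is why a group of "medium" CPUs was filtered out as a misfit source, while `capacity_greater(1024, 871)` is true under the assumed 5% margin, so a big CPU would be treated as a valid upmigration target.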