Commit graph

341 commits

Author SHA1 Message Date
c09929ddba Revert "kernel: panic.c: Set panic_timeoutu = 0; and panic_on_oops = 0;"
This reverts commit 5eecb1807a.
2024-12-13 19:39:25 +01:00
5eecb1807a kernel: panic.c: Set panic_timeoutu = 0; and panic_on_oops = 0;
Before setting sysctl -w kernel.panic=0
and sysctl -w kernel.panic_on_oops=0 (via adb shell), I had random crashes.
2024-12-08 00:01:12 +01:00
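For reference, the two values this (later reverted) commit hard-coded have documented meanings as the kernel.panic and kernel.panic_on_oops sysctls. For kernel.panic, 0 makes the kernel loop forever after a panic, a negative value reboots immediately, and a positive value reboots after that many seconds. For kernel.panic_on_oops, 0 tries to continue after an oops, and 1 escalates the oops to a panic. The sketch below encodes only those documented rules. The enum and function names are hypothetical and do not come from kernel/panic.c.

```c
#include <assert.h>

/* Hypothetical names; this mirrors the documented sysctl semantics only. */
enum panic_action {
	PANIC_LOOP_FOREVER,          /* kernel.panic == 0 */
	PANIC_REBOOT_NOW,            /* kernel.panic < 0 */
	PANIC_REBOOT_AFTER_TIMEOUT   /* kernel.panic > 0: wait that many seconds */
};

enum oops_action {
	OOPS_TRY_TO_CONTINUE,        /* kernel.panic_on_oops == 0 */
	OOPS_ESCALATE_TO_PANIC       /* kernel.panic_on_oops != 0 */
};

/* What happens after a panic for a given kernel.panic value. */
enum panic_action action_on_panic(int panic_timeout)
{
	if (panic_timeout == 0)
		return PANIC_LOOP_FOREVER;
	if (panic_timeout < 0)
		return PANIC_REBOOT_NOW;
	return PANIC_REBOOT_AFTER_TIMEOUT;
}

/* What happens after an oops for a given kernel.panic_on_oops value. */
enum oops_action action_on_oops(int panic_on_oops)
{
	return panic_on_oops ? OOPS_ESCALATE_TO_PANIC : OOPS_TRY_TO_CONTINUE;
}
```

With both values at 0, as in this commit, an oops is not escalated, and a panic that does occur no longer triggers an automatic reboot.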
Barry Song
b949395346 sched/fair: Optimize test_idle_cores() for !SMT
update_idle_core() is only done when sched_smt_present is true,
but test_idle_cores() is done on all machines, even those without
SMT.

This can contribute up to 8%+ hackbench performance loss on a
machine like Kunpeng 920, which has no SMT. This patch removes the
redundant test_idle_cores() call on !SMT machines.

Hackbench is run with -g {2..14}; for each g it is run 10 times to get
an average.

  $ numactl -N 0 hackbench -p -T -l 20000 -g $1

Below are the hackbench results with (w/) and without (w/o) this patch:

  g=         2       4       6       8      10      12      14
  w/o:  1.8151  3.8499  5.5142  7.2491  9.0340 10.7345 12.0929
  w/ :  1.8428  3.7436  5.4501  6.9522  8.2882  9.9535 11.3367
                                 +4.1%   +8.3%   +7.3%   +6.3%
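The bottom row of the table is the relative reduction in hackbench time, (w/o - w/) / w/o x 100, listed for g = 8 to 14. Here is a small sketch of that arithmetic. The helper names are hypothetical.

```c
#include <assert.h>

/* Relative improvement in percent: positive when the patched run ("w/")
 * is faster than the baseline ("w/o"). */
double improvement_pct(double without, double with)
{
	assert(without > 0.0);
	return (without - with) / without * 100.0;
}

/* Round half away from zero to one decimal place, as printed above. */
double round1(double x)
{
	long scaled = (long)(x * 10.0 + (x >= 0.0 ? 0.5 : -0.5));

	return scaled / 10.0;
}
```

For example, round1(improvement_pct(7.2491, 6.9522)) gives 4.1 for g=8. The same formula gives about -1.5% at g=2, which is a small slowdown. The table only lists g=8 to 14.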

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Change-Id: I0dd9363d2b8da9dda0bed205a5ddc36f75fabeef
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
(cherry picked from commit 7c201829c9c1e1ebb1384de66e02b8249d83167e)
Signed-off-by: TogoFire <togofire@mailfence.com>
Signed-off-by: onettboots <blackcocopet@gmail.com>
2024-12-03 21:50:09 +01:00
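The idea of b949395346 can be sketched as a guard in front of the idle-core query. When the machine has no SMT, update_idle_core() never maintains the has-idle-cores hint, so reading it is wasted work. The user-space model below uses hypothetical names (struct llc_model, want_idle_core()) and only counts the reads that the guard avoids. It is a simplified model, not the code in kernel/sched/fair.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of per-LLC shared scheduler state. */
struct llc_model {
	bool smt_present;          /* stands in for sched_smt_present */
	bool has_idle_cores;       /* hint maintained by update_idle_core() */
	unsigned long hint_reads;  /* reads of the shared hint, for illustration */
};

/* Models test_idle_cores(): read the shared hint. */
static bool test_idle_cores_model(struct llc_model *llc)
{
	llc->hint_reads++;
	return llc->has_idle_cores;
}

/* After the patch: without SMT the hint is never maintained, so skip it. */
bool want_idle_core(struct llc_model *llc)
{
	if (!llc->smt_present)
		return false;
	return test_idle_cores_model(llc);
}
```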
Exynos-nibba
6b31cca191 gfs: disable for better ui perf
2024-12-03 21:49:35 +01:00
Peter Zijlstra
88fde1823a sched,rt: Use cpumask_any*_distribute()
Replace a bunch of cpumask_any*() instances with
cpumask_any*_distribute(). By injecting this little bit of randomness into
CPU selection, we reduce the chance that two competing balance operations
working off the same lowest_mask pick the same CPU.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102347.190759694@infradead.org
Signed-off-by: John Vincent <git@tenseventyseven.cf>
2024-12-03 21:47:07 +01:00
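The mainline cpumask_any_distribute() is not random in a strict sense. It remembers the CPU it returned last time (one such value per calling CPU) and returns the next set bit after it, wrapping around. The sketch below models that rotating choice on a 64-bit mask with a single cursor. The names struct distribute_state, any_distribute() and any_first() are hypothetical. any_first() stands in for the old behaviour of always returning the lowest set bit.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 64   /* hypothetical: one bit per CPU in a uint64_t */

struct distribute_state {
	int prev;   /* last CPU returned; the kernel keeps one such value per CPU */
};

/* Old behaviour: always the lowest set bit, so competing callers collide. */
int any_first(uint64_t mask)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPUS_MODEL;               /* empty mask */
}

/* Rotating choice: first set bit strictly after prev, wrapping around. */
int any_distribute(uint64_t mask, struct distribute_state *st)
{
	assert(st->prev >= 0 && st->prev < NR_CPUS_MODEL);
	for (int i = 1; i <= NR_CPUS_MODEL; i++) {
		int cpu = (st->prev + i) % NR_CPUS_MODEL;

		if (mask & (UINT64_C(1) << cpu)) {
			st->prev = cpu;
			return cpu;
		}
	}
	return NR_CPUS_MODEL;               /* empty mask: cursor unchanged */
}
```

Repeated calls to any_first() on the same mask always return the same CPU. Successive any_distribute() calls cycle through every set bit instead.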
Sebastian Andrzej Siewior
98e696a272 sched/rt: Don't try push tasks if there are none.
I have an RT task X at a high priority and cyclictest on each CPU with
lower priority than X's. If X is active and each CPU wakes its own
cyclictest thread, it ends in a longer rto_push storm.
A random CPU determines via balance_rt() that the CPU on which X is
running needs to push tasks. X has the highest priority, cyclictest is
next in line so there is nothing that can be done since the task with
the higher priority is not touched.

tell_cpu_to_push() increments rto_loop_next and schedules
rto_push_irq_work_func() on X's CPU. The other CPUs also increment the
loop counter and do the same. Once rto_push_irq_work_func() is active it
does nothing because it has _no_ pushable tasks on its runqueue. It then
checks rto_next_cpu() and decides to queue irq_work on the local CPU
because another CPU requested a push by incrementing the counter.

I have traces where ~30 CPUs request this ~3 times each before it
finally ends. This greatly increases X's runtime while X isn't making
much progress.

Teach rto_next_cpu() to only return CPUs which also have tasks on their
runqueue which can be pushed away. This does not reduce the
tell_cpu_to_push() invocations (rto_loop_next counter increments) but
reduces the number of rto_push_irq_work_func() invocations issued when
nothing can be done. As a result, the overloaded CPU is blocked less often.

There are still cases where the "same job" is repeated several times
(for instance the current CPU needs to resched but didn't yet because
the irq-work is repeated a few times and so the old task remains on the
CPU) but the majority of requests end in tell_cpu_to_push() before an IPI
is issued.

Reviewed-by: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230801152648._y603AS_@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2024-12-03 21:45:38 +01:00
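The core of this change can be modelled as one extra condition in the scan over overloaded CPUs. A CPU that is in the RT-overload mask but has no pushable tasks (like X's CPU above) is skipped instead of being sent irq_work. The model below is a simplified, non-wrapping scan with hypothetical types. The real rto_next_cpu() also wraps around and tracks rto_loop_next.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 8   /* hypothetical small system */

struct rq_model {
	bool overloaded;      /* CPU is set in the root domain's rto_mask */
	int  nr_pushable;     /* RT tasks that could be pushed to another CPU */
};

/* Returns the next overloaded CPU after prev that has something to push,
 * or -1 when the scan is done. Without the nr_pushable test, CPUs with
 * nothing to push would still receive rto_push irq_work. */
int rto_next_cpu_model(const struct rq_model rqs[], int prev)
{
	assert(prev >= -1 && prev < NR_CPUS_MODEL);
	for (int cpu = prev + 1; cpu < NR_CPUS_MODEL; cpu++) {
		if (!rqs[cpu].overloaded)
			continue;
		if (rqs[cpu].nr_pushable == 0)   /* the check this patch adds */
			continue;
		return cpu;
	}
	return -1;
}
```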
Park Ju Hyung
51cd5d4626 trace: add CONFIG_DISABLE_TRACE_PRINTK option
Poorly made kernel trees often use trace_printk() without
properly guarding the calls with #ifdef.
Such usage of trace_printk() causes a warning at
boot and additional memory allocation.

This option disables all of them at once.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Alex Naidis <alex.naidis@linux.com>
(cherry picked from commit 9ec68f89188e461721de418545e31f37800dfa02)
(cherry picked from commit 8fb7e59ccd6cda94e29af9e6e38a96eda458c9da)
(cherry picked from commit 515ff4ab9e2428b642fcd158af94c83e3059b33b)
(cherry picked from commit ac9a6d9d6a744a11c49e9d756bd0229c912b773d)
(cherry picked from commit e4f4c2c3e696ab4135e5e63b2acae6400476ac35)
Signed-off-by: SirRGB <sirrgb@proton.me>
2024-12-03 21:45:24 +01:00
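One way such a switch can work is to make every trace_printk() call site expand to nothing when the option is set, so unguarded calls in vendor code never reach the tracer. The sketch below is a user-space illustration of that idea only. do_trace_printk() and trace_calls are hypothetical, and the macro shown is an assumption, not necessarily how the patch implements CONFIG_DISABLE_TRACE_PRINTK.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int trace_calls;   /* how many calls actually reached the tracer */

/* Hypothetical stand-in for the real tracer: format into a scratch buffer. */
static int do_trace_printk(const char *fmt, ...)
{
	char buf[128];
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);
	trace_calls++;
	return 0;
}

#ifdef CONFIG_DISABLE_TRACE_PRINTK
/* Every call site compiles away; the arguments are not evaluated. */
#define trace_printk(...) ((void)0)
#else
#define trace_printk(...) do_trace_printk(__VA_ARGS__)
#endif

/* An unguarded call site, as found in some vendor trees. */
void vendor_hot_path(int value)
{
	trace_printk("value=%d\n", value);
}
```

Building with -DCONFIG_DISABLE_TRACE_PRINTK turns vendor_hot_path() into a no-op without touching its source.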
Joel Gómez
613ee071be sysctl: read-only dirty*_bytes
Signed-off-by: Joel Gómez <thegame455min@gmail.com>
2024-12-03 21:44:27 +01:00
Sultan Alsawaf
d30b7253ed qos: Don't allow userspace to impose restrictions on CPU idle levels
Giving userspace intimate control over CPU latency requirements is
nonsense. Userspace can't even stop itself from being preempted, so
there's no reason for it to have access to a mechanism primarily used to
eliminate CPU delays on the order of microseconds.

Remove userspace's ability to send pm_qos requests so that it can't hurt
power consumption.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-03 21:43:46 +01:00
John Vincent
fff787f527 sched: fair: Reduce runtime allocated for tasks constrained by CFS bandwidth
Several desktop Linux kernels, from Zen[1] to CachyOS[2], have been reducing this value to improve interactivity. There have been attempts to reduce it on Android as well.

Experiment with reducing the CFS bandwidth slice to 4 msec, 1 msec less than the default. I honestly don't want userspace to touch this, so keep it out of sysfs and modify it in the kernel directly instead. I think the 'interactivity' benefits of this change (if they hold water) should apply to all performance modes on FreshROMs.

Test for performance and battery life.

[1]: https://github.com/zen-kernel/zen-kernel/commit/7de2596b35ac1db
[2]: https://github.com/CachyOS/linux/blob/base-5.18/kernel/sched/fair.c

Signed-off-by: John Vincent <git@tenseventyseven.cf>
2024-12-03 21:42:01 +01:00
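The slice is the amount of runtime a CPU pulls from a group's global CFS bandwidth pool each time its local pool runs out. The upstream default for sched_cfs_bandwidth_slice_us is 5000 (5 msec). The sketch below assumes a hypothetical 100 msec quota per period and shows how many slices it splits into when the slice drops to 4 msec. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/* Slice length in nanoseconds for a value given in microseconds,
 * matching the _us unit of the sched_cfs_bandwidth_slice tunable. */
uint64_t slice_ns(unsigned int slice_us)
{
	return (uint64_t)slice_us * NSEC_PER_USEC;
}

/* How many whole slices a group's quota can hand out per period. */
uint64_t slices_per_period(uint64_t quota_us, unsigned int slice_us)
{
	assert(slice_us > 0);
	return quota_us / slice_us;
}
```

With a 100 msec quota, the default slice yields 20 refills per period and a 4 msec slice yields 25. Smaller slices hand runtime out in finer-grained pieces, at the cost of more frequent transfers from the global pool.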
Tyler Nijmeh
9341ba576e sched: Process new forks before processing their parent
This should let brand new tasks launch marginally faster.

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
Signed-off-by: Dušan Uverić <dusan.uveric9@gmail.com>
Signed-off-by: sohamxda7 <sensoham135@gmail.com>
Signed-off-by: Peppe289 <gsperanza204@gmail.com>
Signed-off-by: dodyirawan85 <40514988+dodyirawan85@users.noreply.github.com>
2024-12-03 21:39:00 +01:00
Tyler Nijmeh
ff454a80ba sched: Allow realtime tasks to consume entire sched periods
In the right scenario, realtime tasks can run 5% longer. This also
disables the lockup protection against runaway realtime tasks.

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
Signed-off-by: ZyCromerZ <neetroid97@gmail.com>
2024-12-03 21:38:07 +01:00
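The 5% comes from the default RT bandwidth limits: sched_rt_runtime_us = 950000 out of sched_rt_period_us = 1000000, which leaves 0.05 s per second for non-RT tasks. That reserve is what keeps a runaway RT task from starving every non-RT task, and it is presumably the lockup protection the message refers to. The commit message does not show which value it uses. A runtime of -1 (no limit) and a runtime equal to the period both remove the reserve. The helper below is hypothetical.

```c
#include <assert.h>

/* RT bandwidth share in parts per thousand for the given sysctl values.
 * A runtime of -1 means "no limit": RT tasks are never throttled. */
long rt_share_permille(long rt_runtime_us, long rt_period_us)
{
	assert(rt_period_us > 0);
	if (rt_runtime_us < 0)
		return 1000;
	return rt_runtime_us * 1000 / rt_period_us;
}
```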
tytydraco
9e751eb5cb sched: Do not reduce perceived CPU capacity while idle
CPUs that are idle are excellent candidates for latency sensitive or
high-performance tasks. Decrementing their capacity while they are idle
will result in these CPUs being chosen less, and they will prefer to
schedule smaller tasks instead of large ones. Disable this.

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
Signed-off-by: clarencelol <clarencekuiek@icloud.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2024-12-03 21:37:52 +01:00
Rik van Riel
2a2e1902e3 bpf: use kvzmalloc to allocate BPF verifier environment
[ Upstream commit 434247637c66e1be2bc71a9987d4c3f0d8672387 ]

The kzalloc() call in bpf_check() can fail when memory is very fragmented,
which in turn can lead to an OOM kill.

Use kvzalloc() to fall back to vmalloc when memory is too fragmented to
allocate an order-3 sized BPF verifier environment.

Admittedly this is not a very common case, and only happens on systems
where memory has already been squeezed close to the limit, but this does
not seem like much of a hot path, and it's a simple enough fix.

Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://lore.kernel.org/r/20241008170735.16766766@imladris.surriel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-30 02:33:27 +01:00
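An order-3 allocation is 2^3 = 8 contiguous pages, which is 32 KiB with 4 KiB pages. kvzalloc() first tries a physically contiguous, zeroed allocation. If that fails, it falls back to vmalloc-backed memory that only needs to be virtually contiguous, and either result is released with kvfree(). The user-space sketch below models only that policy. contig_zalloc() and its max_contig threshold are hypothetical stand-ins for fragmentation, and calloc() plays the role of the vmalloc fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_MODEL 4096UL   /* hypothetical: 4 KiB pages */

/* Bytes in an allocation of the given page order: 2^order pages. */
unsigned long order_bytes(unsigned int order)
{
	assert(order < 16);            /* keep the shift well within range */
	return PAGE_SIZE_MODEL << order;
}

/* Hypothetical contiguous allocator that fails above max_contig bytes,
 * standing in for a fragmented physical memory map. */
static void *contig_zalloc(size_t size, size_t max_contig)
{
	return size <= max_contig ? calloc(1, size) : NULL;
}

/* kvzalloc()-style policy: contiguous first, then the fallback. */
void *kv_zalloc_model(size_t size, size_t max_contig, int *used_fallback)
{
	void *p = contig_zalloc(size, max_contig);

	*used_fallback = (p == NULL);
	if (!p)
		p = calloc(1, size);       /* plays the role of the vmalloc fallback */
	return p;
}
```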
Daniel Micay
1f63f26cd2 make sysctl constants read-only
Most of this is extracted from the last publicly available version of
the PaX patches where it's part of KERNEXEC as __read_only. It has been
extended to a few more of these constants.
2024-11-30 02:16:12 +01:00
Ksawlii
9947944ca2 Revert "cgroup/cpuset: Prevent UAF in proc_cpuset_show()"
This reverts commit 106a2662b1.
2024-11-24 00:23:50 +01:00
Ksawlii
481b3fc579 Revert "dma-debug: avoid deadlock between dma debug vs printk and netconsole"
This reverts commit a4688d6248.
2024-11-24 00:23:48 +01:00
Ksawlii
8a0aebb9d1 Revert "rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflow"
This reverts commit 84d6dbf54c.
2024-11-24 00:23:46 +01:00
Ksawlii
04dba1cb02 Revert "cgroup: Protect css->cgroup write under css_set_lock"
This reverts commit 630897cdcb.
2024-11-24 00:23:40 +01:00
Ksawlii
93a14d2549 Revert "smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu()"
This reverts commit 0b1e77f743.
2024-11-24 00:23:40 +01:00
Ksawlii
d5a08b5325 Revert "uprobes: Use kzalloc to allocate xol area"
This reverts commit 9a18ce1f12.
2024-11-24 00:23:37 +01:00
Ksawlii
afebebc45a Revert "perf/aux: Fix AUX buffer serialization"
This reverts commit 6d56f2f8a3.
2024-11-24 00:23:37 +01:00
Ksawlii
1dfae2e328 Revert "rtmutex: Drop rt_mutex::wait_lock before scheduling"
This reverts commit 07dcd58fea.
2024-11-24 00:23:36 +01:00
Ksawlii
7534e491aa Revert "cgroup: Make operations on the cgroup root_list RCU safe"
This reverts commit 6c357bd6a8.
2024-11-24 00:23:32 +01:00
Ksawlii
92c7fd94e3 Revert "x86/ibt,ftrace: Search for __fentry__ location"
This reverts commit f6f1a8e333.
2024-11-24 00:23:31 +01:00
Ksawlii
cf372342bd Revert "ftrace: Fix possible use-after-free issue in ftrace_location()"
This reverts commit 2c12c9f7ef.
2024-11-24 00:23:31 +01:00
Ksawlii
88d2580c4c Revert "padata: Honor the caller's alignment in case of chunk_size 0"
This reverts commit d937fc3fb1.
2024-11-24 00:23:30 +01:00
Ksawlii
9629f5d149 Revert "kthread: add kthread_work tracepoints"
This reverts commit 0944044e57.
2024-11-24 00:23:23 +01:00
Ksawlii
e6affe90e3 Revert "kthread: fix task state in kthread worker if being frozen"
This reverts commit b1ce87a881.
2024-11-24 00:23:23 +01:00
Ksawlii
fdff7f158f Revert "bpf: Fix bpf_strtol and bpf_strtoul helpers for 32bit"
This reverts commit 1f10bbe850.
2024-11-24 00:23:22 +01:00
Ksawlii
40967b338d Reapply "bpf: Fix DEVMAP_HASH overflow check on 32-bit arches"
This reverts commit ccff2373d6.
2024-11-24 00:23:17 +01:00
Ksawlii
d439656b70 Reapply "bpf: Eliminate rlimit-based memory accounting for devmap maps"
This reverts commit 41a87832ee.
2024-11-24 00:23:17 +01:00
Ksawlii
78f121c9d6 Revert "bpf: Fix DEVMAP_HASH overflow check on 32-bit arches"
This reverts commit 736d05560d.
2024-11-24 00:23:17 +01:00
Ksawlii
268d243925 Revert "padata: use integer wrap around to prevent deadlock on seq_nr overflow"
This reverts commit 3b9a874cc0.
2024-11-24 00:23:15 +01:00
Ksawlii
c848ca572d Revert "lockdep: fix deadlock issue between lockdep and rcu"
This reverts commit 6624949eca.
2024-11-24 00:23:13 +01:00
Ksawlii
62994ec6e3 Revert "signal: Replace BUG_ON()s"
This reverts commit dd7f63056a.
2024-11-24 00:23:07 +01:00
Ksawlii
734a82005c Revert "rcuscale: Provide clear error when async specified without primitives"
This reverts commit 6a6821675d.
2024-11-24 00:23:07 +01:00
Ksawlii
06e1842e46 Revert "perf/core: Fix small negative period being ignored"
This reverts commit 347040dc8f.
2024-11-24 00:23:05 +01:00
Ksawlii
adff2b9a7c Revert "uprobes: fix kernel info leak via "[uprobes]" vma"
This reverts commit 6ec781ea39.
2024-11-24 00:23:00 +01:00
Ksawlii
3447182bc8 Revert "tracing: Remove precision vsnprintf() check from print event"
This reverts commit 67512a7336.
2024-11-24 00:22:59 +01:00
Ksawlii
bc40c2aebd Revert "tracing: Have saved_cmdlines arrays all in one allocation"
This reverts commit 6154d7268a.
2024-11-24 00:22:59 +01:00
Ksawlii
bef5e7428f Revert "kallsyms: Make kallsyms_on_each_symbol generally available"
This reverts commit 9dc3580302.
2024-11-24 00:22:59 +01:00
Ksawlii
62030ec18e Revert "kallsyms: Make module_kallsyms_on_each_symbol generally available"
This reverts commit 7b296fe960.
2024-11-24 00:22:59 +01:00
Ksawlii
5fca109408 Revert "tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols"
This reverts commit 058ee93b4b.
2024-11-24 00:22:59 +01:00
Ksawlii
93ad10a774 Revert "tracing/kprobes: Fix symbol counting logic by looking at modules as well"
This reverts commit ef8f154c7c.
2024-11-24 00:22:59 +01:00
Ksawlii
5ec15cdd1e Revert "bpf: Check percpu map value size first"
This reverts commit 22c88ba3cc.
2024-11-24 00:22:59 +01:00
Ksawlii
a8299c44cf Revert "resource: fix region_intersects() vs add_memory_driver_managed()"
This reverts commit 29fa0c8808.
2024-11-24 00:22:55 +01:00
Byeonguk Jeong
6b0b24ccd0 bpf: Fix out-of-bounds write in trie_get_next_key()
[ Upstream commit 13400ac8fb80c57c2bfb12ebd35ee121ce9b4d21 ]

trie_get_next_key() allocates a node stack with size trie->max_prefixlen,
while it writes (trie->max_prefixlen + 1) nodes to the stack when it has
full paths from the root to leaves. For example, consider a trie with
max_prefixlen of 8, and the nodes with keys 0x00/0, 0x00/1, 0x00/2, ...
0x00/8 inserted. Subsequent calls to trie_get_next_key() with a _key whose
.prefixlen = 8 cause 9 nodes to be written on the node stack of size 8.

Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Tested-by: Hou Tao <houtao1@huawei.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/Zxx384ZfdlFYnz6J@localhost.localdomain
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-23 23:22:03 +01:00
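The fix rests on a counting argument. A full path can hold one node for every prefix length from 0 to max_prefixlen, which is max_prefixlen + 1 nodes, one more than the old allocation. The sketch below simulates pushing such a path onto a bounded stack. The helper names are hypothetical, and the walk is reduced to its depth bookkeeping.

```c
#include <assert.h>
#include <stdlib.h>

/* A full root-to-leaf path has one node per prefix length 0..max_prefixlen. */
unsigned int max_path_nodes(unsigned int max_prefixlen)
{
	return max_prefixlen + 1;
}

/* Pushes one node per prefix length onto a stack of stack_slots entries.
 * Returns the depth reached, or -1 where the real code would have written
 * past the end of the allocation. */
int push_full_path(unsigned int max_prefixlen, unsigned int stack_slots)
{
	unsigned int *stack = calloc(stack_slots, sizeof(*stack));
	unsigned int depth = 0;

	if (!stack)
		return -1;
	for (unsigned int plen = 0; plen <= max_prefixlen; plen++) {
		if (depth == stack_slots) {   /* out-of-bounds write avoided */
			free(stack);
			return -1;
		}
		stack[depth++] = plen;        /* node whose prefixlen == plen */
	}
	free(stack);
	return (int)depth;
}
```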
Xiu Jianfeng
ed45b3440f cgroup: Fix potential overflow issue when checking max_depth
[ Upstream commit 3cc4e13bb1617f6a13e5e6882465984148743cf4 ]

cgroup.max.depth is the maximum allowed descent depth below the current
cgroup. If the actual descent depth is equal to or larger than it, an attempt
to create a new child cgroup will fail. However, because cgroup->max_depth
is an int with the default value INT_MAX, the condition
'level > cgroup->max_depth' can never be satisfied, and level will
overflow after it reaches INT_MAX.

Fix it by starting the level from 0 and using '>=' instead.

It's worth mentioning that this issue is unlikely to occur in reality,
as it's impossible to have a hierarchy INT_MAX levels deep, but it should
still be avoided logically.

Fixes: 1a926e0bbab8 ("cgroup: implement hierarchy limits")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-23 23:22:02 +01:00
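Here is a reduced model of the two checks. depth_limits[i] is the max_depth of the ancestor i + 1 levels above the cgroup being created. This array representation is hypothetical; the kernel walks cgroup_parent() instead. For every input where the old loop does not overflow, both versions give the same answer. The new one also stops at level == INT_MAX before level++ can overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Old check: level starts at 1 and uses '>'. With a limit of INT_MAX the
 * test can never fire, so level++ could eventually overflow (undefined
 * behaviour for a signed int). */
bool depth_ok_old(const int depth_limits[], int n)
{
	int level = 1;

	for (int i = 0; i < n; i++) {
		if (level > depth_limits[i])
			return false;
		level++;
	}
	return true;
}

/* Fixed check: level starts at 0 and uses '>='. At level == INT_MAX the
 * test is true for any int limit, so it returns before level++. */
bool depth_ok_new(const int depth_limits[], int n)
{
	int level = 0;

	for (int i = 0; i < n; i++) {
		if (level >= depth_limits[i])
			return false;
		level++;
	}
	return true;
}
```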
Jinjie Ruan
a2c183d885 posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()
[ Upstream commit 6e62807c7fbb3c758d233018caf94dfea9c65dbd ]

If get_clock_desc() succeeds, it calls fget() for the clockid's fd
and takes the clk->rwsem read lock, so the error path should release
the lock to keep the locking balanced and fput the clockid's fd to keep
the refcount balanced and release the fd-related resources.

However, the commit below left the lock held on the error path, resulting
in unbalanced locking. Fix it by checking timespec64_valid_strict() before
get_clock_desc(), because "ts" is not changed after that.

Fixes: d8794ac20a29 ("posix-clock: Fix missing timespec64 check in pc_clock_settime()")
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
[pabeni@redhat.com: fixed commit message typo]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-23 23:22:01 +01:00
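Here is a reduced model of the orderings before and after the fix. The read lock is replaced by a simple reader count (struct rwsem_model, hypothetical) so the imbalance is directly visible. The fd reference taken by fget() is left out. ts_valid() only approximates timespec64_valid_strict(), which also rejects times at or beyond the kernel's maximum representable value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a reader count instead of clk->rwsem, and a
 * timespec64-like pair. */
struct rwsem_model { int readers; };
struct ts_model { long long tv_sec; long tv_nsec; };

struct clock_model {
	struct rwsem_model rwsem;
	bool zombie;               /* device already unregistered */
};

static void down_read_model(struct rwsem_model *s) { s->readers++; }

static void up_read_model(struct rwsem_model *s)
{
	assert(s->readers > 0);
	s->readers--;
}

/* Approximates timespec64_valid_strict() (no upper-bound check here). */
static bool ts_valid(const struct ts_model *ts)
{
	return ts->tv_sec >= 0 && ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L;
}

/* Before the fix: validation happens after the lock is taken, and the
 * error return forgets to drop it. */
int settime_buggy(struct clock_model *clk, const struct ts_model *ts)
{
	down_read_model(&clk->rwsem);
	if (clk->zombie) {
		up_read_model(&clk->rwsem);
		return -ENODEV;
	}
	if (!ts_valid(ts))
		return -EINVAL;            /* lock still held: unbalanced */
	up_read_model(&clk->rwsem);
	return 0;
}

/* After the fix: ts is never modified, so validate it first; every path
 * that takes the lock also releases it. */
int settime_fixed(struct clock_model *clk, const struct ts_model *ts)
{
	int ret = 0;

	if (!ts_valid(ts))
		return -EINVAL;

	down_read_model(&clk->rwsem);
	if (clk->zombie)
		ret = -ENODEV;
	/* else: the driver's clock_settime operation would run here */
	up_read_model(&clk->rwsem);
	return ret;
}
```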