kernel_samsung_a53x

Author	SHA1	Message	Date
K Prateek Nayak	d4d43810f2	sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy [ Upstream commit ff47a0acfcce309cf9e175149c75614491953c8f ] Commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()") optimizes IPIs to idle CPUs in TIF_POLLING_NRFLAG mode by setting the TIF_NEED_RESCHED flag in idle task's thread info and relying on flush_smp_call_function_queue() in idle exit path to run the call-function. A softirq raised by the call-function is handled shortly after in do_softirq_post_smp_call_flush() but the TIF_NEED_RESCHED flag remains set and is only cleared later when schedule_idle() calls __schedule(). need_resched() check in _nohz_idle_balance() exists to bail out of load balancing if another task has woken up on the CPU currently in-charge of idle load balancing which is being processed in SCHED_SOFTIRQ context. Since the optimization mentioned above overloads the interpretation of TIF_NEED_RESCHED, check for idle_cpu() before going with the existing need_resched() check which can catch a genuine task wakeup on an idle CPU processing SCHED_SOFTIRQ from do_softirq_post_smp_call_flush(), as well as the case where ksoftirqd needs to be preempted as a result of new task wakeup or slice expiry. In case of PREEMPT_RT or threadirqs, although the idle load balancing may be inhibited in some cases on the ilb CPU, the fact that ksoftirqd is the only fair task going back to sleep will trigger a newidle balance on the CPU which will alleviate some imbalance if it exists if idle balance fails to do so. Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()") Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241119054432.6405-4-kprateek.nayak@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:33 +01:00
Valentin Schneider	8a8ef40c42	sched/fair: Add NOHZ balancer flag for nohz.next_balance updates [ Upstream commit efd984c481abb516fab8bafb25bf41fd9397a43c ] A following patch will trigger NOHZ idle balances as a means to update nohz.next_balance. Vincent noted that blocked load updates can have non-negligible overhead, which should be avoided if the intent is to only update nohz.next_balance. Add a new NOHZ balance kick flag, NOHZ_NEXT_KICK. Gate NOHZ blocked load update by the presence of NOHZ_STATS_KICK - currently all NOHZ balance kicks will have the NOHZ_STATS_KICK flag set, so no change in behaviour is expected. Suggested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210823111700.2842997-2-valentin.schneider@arm.com Stable-dep-of: ff47a0acfcce ("sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:33 +01:00
Vincent Guittot	ab620a407a	sched/fair: Trigger the update of blocked load on newly idle cpu [ Upstream commit c6f886546cb8a38617cdbe755fe50d3acd2463e4 ] Instead of waking up a random and already idle CPU, we can take advantage of this_cpu being about to enter idle to run the ILB and update the blocked load. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210224133007.28644-7-vincent.guittot@linaro.org Stable-dep-of: ff47a0acfcce ("sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:33 +01:00
Vincent Guittot	9ae9714a14	sched/fair: Merge for each idle cpu loop of ILB [ Upstream commit 7a82e5f52a3506bc35a4dc04d53ad2c9daf82e7f ] Remove the specific case for handling this_cpu outside for_each_cpu() loop when running ILB. Instead we use for_each_cpu_wrap() and start with the next cpu after this_cpu so we will continue to finish with this_cpu. update_nohz_stats() is now used for this_cpu too and will prevents unnecessary update. We don't need a special case for handling the update of nohz.next_balance for this_cpu anymore because it is now handled by the loop like others. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210224133007.28644-5-vincent.guittot@linaro.org Stable-dep-of: ff47a0acfcce ("sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:33 +01:00
Vincent Guittot	4ae526c326	sched/fair: Remove unused parameter of update_nohz_stats [ Upstream commit 64f84f273592d17dcdca20244168ad9f525a39c3 ] idle load balance is the only user of update_nohz_stats and doesn't use force parameter. Remove it Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210224133007.28644-4-vincent.guittot@linaro.org Stable-dep-of: ff47a0acfcce ("sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:33 +01:00
Vincent Guittot	fe0cdb4e3f	sched/fair: Remove update of blocked load from newidle_balance [ Upstream commit 0826530de3cbdc89e60a89e86def94a5f0fc81ca ] newidle_balance runs with both preempt and irq disabled which prevent local irq to run during this period. The duration for updating the blocked load of CPUs varies according to the number of CPU cgroups with non-decayed load and extends this critical period to an uncontrolled level. Remove the update from newidle_balance and trigger a normal ILB that will take care of the update instead. This reduces the IRQ latency from O(nr_cgroups * nr_nohz_cpus) to O(nr_cgroups). Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210224133007.28644-2-vincent.guittot@linaro.org Stable-dep-of: ff47a0acfcce ("sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:33 +01:00
K Prateek Nayak	d08554baac	sched/core: Remove the unnecessary need_resched() check in nohz_csd_func() [ Upstream commit ea9cffc0a154124821531991d5afdd7e8b20d7aa ] The need_resched() check currently in nohz_csd_func() can be tracked to have been added in scheduler_ipi() back in 2011 via commit ca38062e57e9 ("sched: Use resched IPI to kick off the nohz idle balance") Since then, it has travelled quite a bit but it seems like an idle_cpu() check currently is sufficient to detect the need to bail out from an idle load balancing. To justify this removal, consider all the following case where an idle load balancing could race with a task wakeup: o Since commit f3dd3f674555b ("sched: Remove the limitation of WF_ON_CPU on wakelist if wakee cpu is idle") a target perceived to be idle (target_rq->nr_running == 0) will return true for ttwu_queue_cond(target) which will offload the task wakeup to the idle target via an IPI. In all such cases target_rq->ttwu_pending will be set to 1 before queuing the wake function. If an idle load balance races here, following scenarios are possible: - The CPU is not in TIF_POLLING_NRFLAG mode in which case an actual IPI is sent to the CPU to wake it out of idle. If the nohz_csd_func() queues before sched_ttwu_pending(), the idle load balance will bail out since idle_cpu(target) returns 0 since target_rq->ttwu_pending is 1. If the nohz_csd_func() is queued after sched_ttwu_pending() it should see rq->nr_running to be non-zero and bail out of idle load balancing. - The CPU is in TIF_POLLING_NRFLAG mode and instead of an actual IPI, the sender will simply set TIF_NEED_RESCHED for the target to put it out of idle and flush_smp_call_function_queue() in do_idle() will execute the call function. Depending on the ordering of the queuing of nohz_csd_func() and sched_ttwu_pending(), the idle_cpu() check in nohz_csd_func() should either see target_rq->ttwu_pending = 1 or target_rq->nr_running to be non-zero if there is a genuine task wakeup racing with the idle load balance kick. o The waker CPU perceives the target CPU to be busy (targer_rq->nr_running != 0) but the CPU is in fact going idle and due to a series of unfortunate events, the system reaches a case where the waker CPU decides to perform the wakeup by itself in ttwu_queue() on the target CPU but target is concurrently selected for idle load balance (XXX: Can this happen? I'm not sure, but we'll consider the mother of all coincidences to estimate the worst case scenario). ttwu_do_activate() calls enqueue_task() which would increment "rq->nr_running" post which it calls wakeup_preempt() which is responsible for setting TIF_NEED_RESCHED (via a resched IPI or by setting TIF_NEED_RESCHED on a TIF_POLLING_NRFLAG idle CPU) The key thing to note in this case is that rq->nr_running is already non-zero in case of a wakeup before TIF_NEED_RESCHED is set which would lead to idle_cpu() check returning false. In all cases, it seems that need_resched() check is unnecessary when checking for idle_cpu() first since an impending wakeup racing with idle load balancer will either set the "rq->ttwu_pending" or indicate a newly woken task via "rq->nr_running". Chasing the reason why this check might have existed in the first place, I came across Peter's suggestion on the fist iteration of Suresh's patch from 2011 [1] where the condition to raise the SCHED_SOFTIRQ was: sched_ttwu_do_pending(list); if (unlikely((rq->idle == current) && rq->nohz_balance_kick && !need_resched())) raise_softirq_irqoff(SCHED_SOFTIRQ); Since the condition to raise the SCHED_SOFIRQ was preceded by sched_ttwu_do_pending() (which is equivalent of sched_ttwu_pending()) in the current upstream kernel, the need_resched() check was necessary to catch a newly queued task. Peter suggested modifying it to: if (idle_cpu() && rq->nohz_balance_kick && !need_resched()) raise_softirq_irqoff(SCHED_SOFTIRQ); where idle_cpu() seems to have replaced "rq->idle == current" check. Even back then, the idle_cpu() check would have been sufficient to catch a new task being enqueued. Since commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()") overloads the interpretation of TIF_NEED_RESCHED for TIF_POLLING_NRFLAG idling, remove the need_resched() check in nohz_csd_func() to raise SCHED_SOFTIRQ based on Peter's suggestion. Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241119054432.6405-3-kprateek.nayak@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:33 +01:00
Uros Bizjak	7870fff3b6	tracing: Use atomic64_inc_return() in trace_clock_counter() [ Upstream commit eb887c4567d1b0e7684c026fe7df44afa96589e6 ] Use atomic64_inc_return(&ref) instead of atomic64_add_return(1, &ref) to use optimized implementation and ease register pressure around the primitive for targets that implement optimized variant. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241007085651.48544-1-ubizjak@gmail.com Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:31 +01:00
Levi Yun	e992d99dcd	dma-debug: fix a possible deadlock on radix_lock [ Upstream commit 7543c3e3b9b88212fcd0aaf5cab5588797bdc7de ] radix_lock() shouldn't be held while holding dma_hash_entry[idx].lock otherwise, there's a possible deadlock scenario when dma debug API is called holding rq_lock(): CPU0 CPU1 CPU2 dma_free_attrs() check_unmap() add_dma_entry() __schedule() //out (A) rq_lock() get_hash_bucket() (A) dma_entry_hash check_sync() (A) radix_lock() (W) dma_entry_hash dma_entry_free() (W) radix_lock() // CPU2's one (W) rq_lock() CPU1 situation can happen when it extending radix tree and it tries to wake up kswapd via wake_all_kswapd(). CPU2 situation can happen while perf_event_task_sched_out() (i.e. dma sync operation is called while deleting perf_event using etm and etr tmc which are Arm Coresight hwtracing driver backends). To remove this possible situation, call dma_entry_free() after put_hash_bucket() in check_unmap(). Reported-by: Denis Nikitin <denik@chromium.org> Closes: https://lists.linaro.org/archives/list/coresight@lists.linaro.org/thread/2WMS7BBSF5OZYB63VT44U5YWLFP5HL6U/#RWM6MLQX5ANBTEQ2PRM7OXCBGCE6NPWU Signed-off-by: Levi Yun <yeoreum.yun@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:30 +01:00
Marco Elver	148c1e84e1	kcsan: Turn report_filterlist_lock into a raw_spinlock [ Upstream commit 59458fa4ddb47e7891c61b4a928d13d5f5b00aa0 ] Ran Xiaokai reports that with a KCSAN-enabled PREEMPT_RT kernel, we can see splats like: \| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 \| in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 \| preempt_count: 10002, expected: 0 \| RCU nest depth: 0, expected: 0 \| no locks held by swapper/1/0. \| irq event stamp: 156674 \| hardirqs last enabled at (156673): [<ffffffff81130bd9>] do_idle+0x1f9/0x240 \| hardirqs last disabled at (156674): [<ffffffff82254f84>] sysvec_apic_timer_interrupt+0x14/0xc0 \| softirqs last enabled at (0): [<ffffffff81099f47>] copy_process+0xfc7/0x4b60 \| softirqs last disabled at (0): [<0000000000000000>] 0x0 \| Preemption disabled at: \| [<ffffffff814a3e2a>] paint_ptr+0x2a/0x90 \| CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.11.0+ #3 \| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 \| Call Trace: \| <IRQ> \| dump_stack_lvl+0x7e/0xc0 \| dump_stack+0x1d/0x30 \| __might_resched+0x1a2/0x270 \| rt_spin_lock+0x68/0x170 \| kcsan_skip_report_debugfs+0x43/0xe0 \| print_report+0xb5/0x590 \| kcsan_report_known_origin+0x1b1/0x1d0 \| kcsan_setup_watchpoint+0x348/0x650 \| __tsan_unaligned_write1+0x16d/0x1d0 \| hrtimer_interrupt+0x3d6/0x430 \| __sysvec_apic_timer_interrupt+0xe8/0x3a0 \| sysvec_apic_timer_interrupt+0x97/0xc0 \| </IRQ> On a detected data race, KCSAN's reporting logic checks if it should filter the report. That list is protected by the report_filterlist_lock non-raw spinlock which may sleep on RT kernels. Since KCSAN may report data races in any context, convert it to a raw_spinlock. This requires being careful about when to allocate memory for the filter list itself which can be done via KCSAN's debugfs interface. Concurrent modification of the filter list via debugfs should be rare: the chosen strategy is to optimistically pre-allocate memory before the critical section and discard if unused. Link: https://lore.kernel.org/all/20240925143154.2322926-1-ranxiaokai627@163.com/ Reported-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Tested-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:29 +01:00
Maciej Fijalkowski	7d4a3d040e	bpf: fix OOB devmap writes when deleting elements commit ab244dd7cf4c291f82faacdc50b45cc0f55b674d upstream. Jordy reported issue against XSKMAP which also applies to DEVMAP - the index used for accessing map entry, due to being a signed integer, causes the OOB writes. Fix is simple as changing the type from int to u32, however, when compared to XSKMAP case, one more thing needs to be addressed. When map is released from system via dev_map_free(), we iterate through all of the entries and an iterator variable is also an int, which implies OOB accesses. Again, change it to be u32. Example splat below: [ 160.724676] BUG: unable to handle page fault for address: ffffc8fc2c001000 [ 160.731662] #PF: supervisor read access in kernel mode [ 160.736876] #PF: error_code(0x0000) - not-present page [ 160.742095] PGD 0 P4D 0 [ 160.744678] Oops: Oops: 0000 [#1] PREEMPT SMP [ 160.749106] CPU: 1 UID: 0 PID: 520 Comm: kworker/u145:12 Not tainted 6.12.0-rc1+ #487 [ 160.757050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 160.767642] Workqueue: events_unbound bpf_map_free_deferred [ 160.773308] RIP: 0010:dev_map_free+0x77/0x170 [ 160.777735] Code: 00 e8 fd 91 ed ff e8 b8 73 ed ff 41 83 7d 18 19 74 6e 41 8b 45 24 49 8b bd f8 00 00 00 31 db 85 c0 74 48 48 63 c3 48 8d 04 c7 <48> 8b 28 48 85 ed 74 30 48 8b 7d 18 48 85 ff 74 05 e8 b3 52 fa ff [ 160.796777] RSP: 0018:ffffc9000ee1fe38 EFLAGS: 00010202 [ 160.802086] RAX: ffffc8fc2c001000 RBX: 0000000080000000 RCX: 0000000000000024 [ 160.809331] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffc9002c001000 [ 160.816576] RBP: 0000000000000000 R08: 0000000000000023 R09: 0000000000000001 [ 160.823823] R10: 0000000000000001 R11: 00000000000ee6b2 R12: dead000000000122 [ 160.831066] R13: ffff88810c928e00 R14: ffff8881002df405 R15: 0000000000000000 [ 160.838310] FS: 0000000000000000(0000) GS:ffff8897e0c40000(0000) knlGS:0000000000000000 [ 160.846528] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 160.852357] CR2: ffffc8fc2c001000 CR3: 0000000005c32006 CR4: 00000000007726f0 [ 160.859604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 160.866847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 160.874092] PKRU: 55555554 [ 160.876847] Call Trace: [ 160.879338] <TASK> [ 160.881477] ? __die+0x20/0x60 [ 160.884586] ? page_fault_oops+0x15a/0x450 [ 160.888746] ? search_extable+0x22/0x30 [ 160.892647] ? search_bpf_extables+0x5f/0x80 [ 160.896988] ? exc_page_fault+0xa9/0x140 [ 160.900973] ? asm_exc_page_fault+0x22/0x30 [ 160.905232] ? dev_map_free+0x77/0x170 [ 160.909043] ? dev_map_free+0x58/0x170 [ 160.912857] bpf_map_free_deferred+0x51/0x90 [ 160.917196] process_one_work+0x142/0x370 [ 160.921272] worker_thread+0x29e/0x3b0 [ 160.925082] ? rescuer_thread+0x4b0/0x4b0 [ 160.929157] kthread+0xd4/0x110 [ 160.932355] ? kthread_park+0x80/0x80 [ 160.936079] ret_from_fork+0x2d/0x50 [ 160.943396] ? kthread_park+0x80/0x80 [ 160.950803] ret_from_fork_asm+0x11/0x20 [ 160.958482] </TASK> Fixes: 546ac1ffb70d ("bpf: add devmap, a map for storing net device references") CC: stable@vger.kernel.org Reported-by: Jordy Zomer <jordyzomer@google.com> Suggested-by: Jordy Zomer <jordyzomer@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20241122121030.716788-3-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-17 13:24:29 +01:00
Kuan-Wei Chiu	c31c8dd62f	tracing: Fix cmp_entries_dup() to respect sort() comparison rules commit e63fbd5f6810ed756bbb8a1549c7d4132968baa9 upstream. The cmp_entries_dup() function used as the comparator for sort() violated the symmetry and transitivity properties required by the sorting algorithm. Specifically, it returned 1 whenever memcmp() was non-zero, which broke the following expectations: * Symmetry: If x < y, then y > x. * Transitivity: If x < y and y < z, then x < z. These violations could lead to incorrect sorting and failure to correctly identify duplicate elements. Fix the issue by directly returning the result of memcmp(), which adheres to the required comparison properties. Cc: stable@vger.kernel.org Fixes: 08d43a5fa063 ("tracing: Add lock-free tracing_map") Link: https://lore.kernel.org/20241203202228.1274403-1-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-17 13:24:28 +01:00
Hou Tao	6260fcd754	bpf: Fix exact match conditions in trie_get_next_key() [ Upstream commit 27abc7b3fa2e09bbe41e2924d328121546865eda ] trie_get_next_key() uses node->prefixlen == key->prefixlen to identify an exact match, However, it is incorrect because when the target key doesn't fully match the found node (e.g., node->prefixlen != matchlen), these two nodes may also have the same prefixlen. It will return expected result when the passed key exist in the trie. However when a recently-deleted key or nonexistent key is passed to trie_get_next_key(), it may skip keys and return incorrect result. Fix it by using node->prefixlen == matchlen to identify exact matches. When the condition is true after the search, it also implies node->prefixlen equals key->prefixlen, otherwise, the search would return NULL instead. Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map") Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241206110622.1161752-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:28 +01:00
Hou Tao	50e0b53ff1	bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie [ Upstream commit eae6a075e9537dd69891cf77ca5a88fa8a28b4a1 ] Add the currently missing handling for the BPF_EXIST and BPF_NOEXIST flags. These flags can be specified by users and are relevant since LPM trie supports exact matches during update. Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation") Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241206110622.1161752-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:28 +01:00
Levi Yun	c20eee8bbe	trace/trace_event_perf: remove duplicate samples on the first tracepoint event [ Upstream commit afe5960dc208fe069ddaaeb0994d857b24ac19d1 ] When a tracepoint event is created with attr.freq = 1, 'hwc->period_left' is not initialized correctly. As a result, in the perf_swevent_overflow() function, when the first time the event occurs, it calculates the event overflow and the perf_swevent_set_period() returns 3, this leads to the event are recorded for three duplicate times. Step to reproduce: 1. Enable the tracepoint event & starting tracing $ echo 1 > /sys/kernel/tracing/events/module/module_free $ echo 1 > /sys/kernel/tracing/tracing_on 2. Record with perf $ perf record -a --strict-freq -F 1 -e "module:module_free" 3. Trigger module_free event. $ modprobe -i sunrpc $ modprobe -r sunrpc Result: - Trace pipe result: $ cat trace_pipe modprobe-174509 [003] ..... 6504.868896: module_free: sunrpc - perf sample: modprobe 174509 [003] 6504.868980: module:module_free: sunrpc modprobe 174509 [003] 6504.868980: module:module_free: sunrpc modprobe 174509 [003] 6504.868980: module:module_free: sunrpc By setting period_left via perf_swevent_set_period() as other sw_event did, This problem could be solved. After patch: - Trace pipe result: $ cat trace_pipe modprobe 1153096 [068] 613468.867774: module:module_free: xfs - perf sample modprobe 1153096 [068] 613468.867794: module:module_free: xfs Link: https://lore.kernel.org/20240913021347.595330-1-yeoreum.yun@arm.com Fixes: bd2b5b12849a ("perf_counter: More aggressive frequency adjustment") Signed-off-by: Levi Yun <yeoreum.yun@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:08 +01:00
Chen Ridong	5c074418aa	cgroup/bpf: only cgroup v2 can be attached by bpf programs [ Upstream commit 2190df6c91373fdec6db9fc07e427084f232f57e ] Only cgroup v2 can be attached by bpf programs, so this patch introduces that cgroup_bpf_inherit and cgroup_bpf_offline can only be called in cgroup v2, and this can fix the memleak mentioned by commit 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline"), which has been reverted. Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") Link: https://lore.kernel.org/cgroups/aka2hk5jsel5zomucpwlxsej6iwnfw4qu5jkrmjhyfhesjlfdw@46zxhg5bdnr7/ Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:02 +01:00
Chen Ridong	f70b602307	Revert "cgroup: Fix memory leak caused by missing cgroup_bpf_offline" [ Upstream commit feb301c60970bd2a1310a53ce2d6e4375397a51b ] This reverts commit 04f8ef5643bcd8bcde25dfdebef998aea480b2ba. Only cgroup v2 can be attached by cgroup by BPF programs. Revert this commit and cgroup_bpf_inherit and cgroup_bpf_offline won't be called in cgroup v1. The memory leak issue will be fixed with next patch. Fixes: 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline") Link: https://lore.kernel.org/cgroups/aka2hk5jsel5zomucpwlxsej6iwnfw4qu5jkrmjhyfhesjlfdw@46zxhg5bdnr7/ Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:02 +01:00
Miguel Ojeda	1d4d4e5c76	time: Fix references to _msecs_to_jiffies() handling of values [ Upstream commit 92b043fd995a63a57aae29ff85a39b6f30cd440c ] The details about the handling of the "normal" values were moved to the _msecs_to_jiffies() helpers in commit ca42aaf0c861 ("time: Refactor msecs_to_jiffies"). However, the same commit still mentioned __msecs_to_jiffies() in the added documentation. Thus point to _msecs_to_jiffies() instead. Fixes: ca42aaf0c861 ("time: Refactor msecs_to_jiffies") Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241025110141.157205-2-ojeda@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:24:00 +01:00
Paul E. McKenney	3f7db10a69	rcu-tasks: Idle tasks on offline CPUs are in quiescent states commit 5c9a9ca44fda41c5e82f50efced5297a9c19760d upstream. Any idle task corresponding to an offline CPU is in an RCU Tasks Trace quiescent state. This commit causes rcu_tasks_trace_postscan() to ignore idle tasks for offline CPUs, which it can do safely due to CPU-hotplug operations being disabled. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org> Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-17 13:23:58 +01:00
guoweikang	be2d07b4f2	ftrace: Fix regression with module command in stack_trace_filter commit 45af52e7d3b8560f21d139b3759735eead8b1653 upstream. When executing the following command: # echo "write*:mod:ext3" > /sys/kernel/tracing/stack_trace_filter The current mod command causes a null pointer dereference. While commit 0f17976568b3f ("ftrace: Fix regression with module command in stack_trace_filter") has addressed part of the issue, it left a corner case unhandled, which still results in a kernel crash. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241120052750.275463-1-guoweikang.kernel@gmail.com Fixes: 04ec7bb642b77 ("tracing: Have the trace_array hold the list of registered func probes"); Signed-off-by: guoweikang <guoweikang.kernel@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-17 13:20:50 +01:00
Ksawlii	90bc687dab	Revert "make sysctl constants read-only" This reverts commit `1f63f26cd2`.	2024-12-03 19:57:26 +01:00
Rik van Riel	2a2e1902e3	bpf: use kvzmalloc to allocate BPF verifier environment [ Upstream commit 434247637c66e1be2bc71a9987d4c3f0d8672387 ] The kzmalloc call in bpf_check can fail when memory is very fragmented, which in turn can lead to an OOM kill. Use kvzmalloc to fall back to vmalloc when memory is too fragmented to allocate an order 3 sized bpf verifier environment. Admittedly this is not a very common case, and only happens on systems where memory has already been squeezed close to the limit, but this does not seem like much of a hot path, and it's a simple enough fix. Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://lore.kernel.org/r/20241008170735.16766766@imladris.surriel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-30 02:33:27 +01:00
Daniel Micay	1f63f26cd2	make sysctl constants read-only Most of this is extracted from the last publicly available version of the PaX patches where it's part of KERNEXEC as __read_only. It has been extended to a few more of these constants.	2024-11-30 02:16:12 +01:00
Ksawlii	9947944ca2	Revert "cgroup/cpuset: Prevent UAF in proc_cpuset_show()" This reverts commit `106a2662b1`.	2024-11-24 00:23:50 +01:00
Ksawlii	481b3fc579	Revert "dma-debug: avoid deadlock between dma debug vs printk and netconsole" This reverts commit `a4688d6248`.	2024-11-24 00:23:48 +01:00
Ksawlii	8a0aebb9d1	Revert "rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflow" This reverts commit `84d6dbf54c`.	2024-11-24 00:23:46 +01:00
Ksawlii	04dba1cb02	Revert "cgroup: Protect css->cgroup write under css_set_lock" This reverts commit `630897cdcb`.	2024-11-24 00:23:40 +01:00
Ksawlii	93a14d2549	Revert "smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu()" This reverts commit `0b1e77f743`.	2024-11-24 00:23:40 +01:00
Ksawlii	d5a08b5325	Revert "uprobes: Use kzalloc to allocate xol area" This reverts commit `9a18ce1f12`.	2024-11-24 00:23:37 +01:00
Ksawlii	afebebc45a	Revert "perf/aux: Fix AUX buffer serialization" This reverts commit `6d56f2f8a3`.	2024-11-24 00:23:37 +01:00
Ksawlii	1dfae2e328	Revert "rtmutex: Drop rt_mutex::wait_lock before scheduling" This reverts commit `07dcd58fea`.	2024-11-24 00:23:36 +01:00
Ksawlii	7534e491aa	Revert "cgroup: Make operations on the cgroup root_list RCU safe" This reverts commit `6c357bd6a8`.	2024-11-24 00:23:32 +01:00
Ksawlii	92c7fd94e3	Revert "x86/ibt,ftrace: Search for __fentry__ location" This reverts commit `f6f1a8e333`.	2024-11-24 00:23:31 +01:00
Ksawlii	cf372342bd	Revert "ftrace: Fix possible use-after-free issue in ftrace_location()" This reverts commit `2c12c9f7ef`.	2024-11-24 00:23:31 +01:00
Ksawlii	88d2580c4c	Revert "padata: Honor the caller's alignment in case of chunk_size 0" This reverts commit `d937fc3fb1`.	2024-11-24 00:23:30 +01:00
Ksawlii	9629f5d149	Revert "kthread: add kthread_work tracepoints" This reverts commit `0944044e57`.	2024-11-24 00:23:23 +01:00
Ksawlii	e6affe90e3	Revert "kthread: fix task state in kthread worker if being frozen" This reverts commit `b1ce87a881`.	2024-11-24 00:23:23 +01:00
Ksawlii	fdff7f158f	Revert "bpf: Fix bpf_strtol and bpf_strtoul helpers for 32bit" This reverts commit `1f10bbe850`.	2024-11-24 00:23:22 +01:00
Ksawlii	40967b338d	Reapply "bpf: Fix DEVMAP_HASH overflow check on 32-bit arches" This reverts commit `ccff2373d6`.	2024-11-24 00:23:17 +01:00
Ksawlii	d439656b70	Reapply "bpf: Eliminate rlimit-based memory accounting for devmap maps" This reverts commit `41a87832ee`.	2024-11-24 00:23:17 +01:00
Ksawlii	78f121c9d6	Revert "bpf: Fix DEVMAP_HASH overflow check on 32-bit arches" This reverts commit `736d05560d`.	2024-11-24 00:23:17 +01:00
Ksawlii	268d243925	Revert "padata: use integer wrap around to prevent deadlock on seq_nr overflow" This reverts commit `3b9a874cc0`.	2024-11-24 00:23:15 +01:00
Ksawlii	c848ca572d	Revert "lockdep: fix deadlock issue between lockdep and rcu" This reverts commit `6624949eca`.	2024-11-24 00:23:13 +01:00
Ksawlii	62994ec6e3	Revert "signal: Replace BUG_ON()s" This reverts commit `dd7f63056a`.	2024-11-24 00:23:07 +01:00
Ksawlii	734a82005c	Revert "rcuscale: Provide clear error when async specified without primitives" This reverts commit `6a6821675d`.	2024-11-24 00:23:07 +01:00
Ksawlii	06e1842e46	Revert "perf/core: Fix small negative period being ignored" This reverts commit `347040dc8f`.	2024-11-24 00:23:05 +01:00
Ksawlii	adff2b9a7c	Revert "uprobes: fix kernel info leak via "[uprobes]" vma" This reverts commit `6ec781ea39`.	2024-11-24 00:23:00 +01:00
Ksawlii	3447182bc8	Revert "tracing: Remove precision vsnprintf() check from print event" This reverts commit `67512a7336`.	2024-11-24 00:22:59 +01:00
Ksawlii	bc40c2aebd	Revert "tracing: Have saved_cmdlines arrays all in one allocation" This reverts commit `6154d7268a`.	2024-11-24 00:22:59 +01:00
Ksawlii	bef5e7428f	Revert "kallsyms: Make kallsyms_on_each_symbol generally available" This reverts commit `9dc3580302`.	2024-11-24 00:22:59 +01:00

1 2 3 4 5 ...

349 commits