Commit graph

7351 commits

Author SHA1 Message Date
Ksawlii
d2a123efff Revert "BACKPORT: FROMLIST: binder: fix freeze UAF in binder_release_work()"
This reverts commit 5380adeb80.
2024-12-18 15:46:58 +01:00
Ksawlii
d61e735066 Revert "BACKPORT: FROMGIT: binder: fix memleak of proc->delivered_freeze"
This reverts commit 1cf14664be.
2024-12-18 15:46:57 +01:00
Ksawlii
4a640601c9 Revert "BACKPORT: FROMGIT: binder: add delivered_freeze to debugfs output"
This reverts commit 4554485eb2.
2024-12-18 15:46:56 +01:00
Ksawlii
7225849ad5 Revert "tcp: tracking packets with CE marks in BW rate sample"
This reverts commit 33eb09a194.
2024-12-18 15:36:41 +01:00
Ksawlii
1f9b6d5fff Revert "net-tcp_bbr: v2: adjust skb tx.in_flight upon split in tcp_fragment()"
This reverts commit eddb362fbb.
2024-12-18 15:36:35 +01:00
Ksawlii
77303eca0a Revert "tcp: introduce per-route feature RTAX_FEATURE_ECN_LOW"
This reverts commit c57053bdc6.
2024-12-18 15:36:29 +01:00
Ksawlii
b9f262b660 Revert "tcp: add rcv_wnd and plb_rehash to TCP_INFO"
This reverts commit a00aad3a04.
2024-12-18 15:32:40 +01:00
Ksawlii
7905b1f2d0 Revert "net-tcp: add fast_ack_mode=1: skip rwin check in tcp_fast_ack_mode__tcp_ack_snd_check()"
This reverts commit 54113b9ad5.
2024-12-18 15:32:27 +01:00
Ksawlii
c2cfa39ed7 Revert "tcp: export TCPI_OPT_ECN_LOW in tcp_info tcpi_options field"
This reverts commit 62a04f2316.
2024-12-18 15:32:26 +01:00
Ksawlii
bfea72055d Revert "tcp: add accessors to read/set tp->snd_cwnd"
This reverts commit c8a587ff65.
2024-12-18 15:30:18 +01:00
Nahuel Gómez
958fb60b40 configs: use bbr as default tcp cong. algorithm
We already have the BBRv3 patches merged.

Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 15:14:17 +01:00
Nahuel Gómez
a6f250c0e8 sched: fair: tune BORE
Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 15:10:03 +01:00
Nahuel Gómez
46aa3942ce sched: fair: properly call reweight_task for BORE
Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 15:09:59 +01:00
Masahito S
f299ae0372 5.4 bootup fix: Variable declaration types 2024-12-18 15:09:53 +01:00
Masahito S
0769637133 5.4 compile error fix: ISO C90 forbids mixed declarations and code 2024-12-18 15:09:49 +01:00
Masahito S
30aca2aa4e 5.4 compile error fix: Use SYSCTL_ZERO and SYSCTL_ONE 2024-12-18 15:09:47 +01:00
Masahito S
ff6bc15f2a sched: linux5.4.y-bore5.1.0 2024-12-18 15:09:41 +01:00
Mubashir Adnan Qureshi
d496daa833 tcp: add sysctls for TCP PLB parameters
PLB (Protective Load Balancing) is a host based mechanism for load
balancing across switch links. It leverages congestion signals(e.g. ECN)
from transport layer to randomly change the path of the connection
experiencing congestion. PLB changes the path of the connection by
changing the outgoing IPv6 flow label for IPv6 connections (implemented
in Linux by calling sk_rethink_txhash()). Because of this implementation
mechanism, PLB can currently only work for IPv6 traffic. For more
information, see the SIGCOMM 2022 paper:
  https://doi.org/10.1145/3544216.3544226

This commit adds new sysctl knobs and sets their default values for
TCP PLB.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:08:12 +01:00
Mubashir Adnan Qureshi
f3217ac2c6 tcp: add support for PLB in DCTCP
PLB support is added to TCP DCTCP code. As DCTCP uses ECN as the
congestion signal, PLB also uses ECN to make decisions whether to change
the path or not upon sustained congestion.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:07:57 +01:00
Mubashir Adnan Qureshi
a00aad3a04 tcp: add rcv_wnd and plb_rehash to TCP_INFO
rcv_wnd can be useful to diagnose TCP performance where receiver window
becomes the bottleneck. rehash reports the PLB and timeout triggered
rehash attempts by the TCP connection.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:07:43 +01:00
Eric Dumazet
c8a587ff65 tcp: add accessors to read/set tp->snd_cwnd
We had various bugs over the years with code
breaking the assumption that tp->snd_cwnd is greater
than zero.

Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added
in commit 8b8a321ff72c ("tcp: fix zero cwnd in tcp_cwnd_reduction")
can trigger, and without a repro we would have to spend
considerable time finding the bug.

Instead of complaining too late, we want to catch where
and when tp->snd_cwnd is set to an illegal value.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:07:40 +01:00
Yuchung Cheng
33eb09a194 tcp: tracking packets with CE marks in BW rate sample
In order to track CE marks per rate sample (one round trip), TCP needs a
per-skb header field to record the tp->delivered_ce count when the skb
was sent. To make space, we replace the "last_in_flight" field which is
used exclusively for NV congestion control. The stat needed by NV can be
alternatively approximated by existing stats tcp_sock delivered and
mss_cache.

This patch counts the number of packets delivered which have CE marks in
the rate sample, using similar approach of delivery accounting.

Cc: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Luke Hsiao <lukehsiao@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:07:35 +01:00
Neal Cardwell
0f3a19122b net-tcp_bbr: broaden app-limited rate sample detection
This commit is a bug fix for the Linux TCP app-limited
(application-limited) logic that is used for collecting rate
(bandwidth) samples.

Previously the app-limited logic only looked for "bubbles" of
silence in between application writes, by checking at the start
of each sendmsg. But "bubbles" of silence can also happen before
retransmits: e.g. bubbles can happen between an application write
and a retransmit, or between two retransmits.

Retransmits are triggered by ACKs or timers. So this commit checks
for bubbles of app-limited silence upon ACKs or timers.

Why does this commit check for app-limited state at the start of
ACKs and timer handling? Because at that point we know whether
inflight was fully using the cwnd.  During processing the ACK or
timer event we often change the cwnd; after changing the cwnd we
can't know whether inflight was fully using the old cwnd.

Origin-9xx-SHA1: 3fe9b53291e018407780fb8c356adb5666722cbc
Change-Id: I37221506f5166877c2b110753d39bb0757985e68
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:07:30 +01:00
Neal Cardwell
0aafceb293 net-tcp_bbr: v2: export FLAG_ECE in rate_sample.is_ece
For understanding the relationship between inflight and ECN signals,
to try to find the highest inflight value that has acceptable levels
ECN marking.

Effort: net-tcp_bbr
Origin-9xx-SHA1: 3eba998f2898541406c2666781182200934965a8
Change-Id: I3a964e04cee83e11649a54507043d2dfe769a3b3
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:07:14 +01:00
Neal Cardwell
1970659e8a net-tcp_bbr: v2: introduce ca_ops->skb_marked_lost() CC module callback API
For connections experiencing reordering, RACK can mark packets lost
long after we receive the SACKs/ACKs hinting that the packets were
actually lost.

This means that CC modules cannot easily learn the volume of inflight
data at which packet loss happens by looking at the current inflight
or even the packets in flight when the most recently SACKed packet was
sent. To learn this, CC modules need to know how many packets were in
flight at the time lost packets were sent. This new callback, combined
with TCP_SKB_CB(skb)->tx.in_flight, allows them to learn this.

This also provides a consistent callback that is invoked whether
packets are marked lost upon ACK processing, using the RACK reordering
timer, or at RTO time.

Effort: net-tcp_bbr
Origin-9xx-SHA1: afcbebe3374e4632ac6714d39e4dc8a8455956f4
Change-Id: I54826ab53df636be537e5d3c618a46145d12d51a
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:07:09 +01:00
Neal Cardwell
20cf76b03c net-tcp_bbr: v2: adjust skb tx.in_flight upon merge in tcp_shifted_skb()
When tcp_shifted_skb() updates state as adjacent SACKed skbs are
coalesced, previously the tx.in_flight was not adjusted, so we could
get contradictory state where the skb's recorded pcount was bigger
than the tx.in_flight (the number of segments that were in_flight
after sending the skb).

Normally have a SACKed skb with contradictory pcount/tx.in_flight
would not matter. However, with SACK reneging, the SACKed bit is
removed, and an skb once again becomes eligible for retransmitting,
fragmenting, SACKing, etc. Packetdrill testing verified the following
sequence is possible in a kernel that does not have this commit:

 - skb N is SACKed
 - skb N+1 is SACKed and combined with skb N using tcp_shifted_skb()
   - tcp_shifted_skb() will increase the pcount of prev,
     but leave tx.in_flight as-is
   - so prev skb can have pcount > tx.in_flight
 - RTO, tcp_timeout_mark_lost(), detect reneg,
   remove "SACKed" bit, mark skb N as lost
   - find pcount of skb N is greater than its tx.in_flight

I suspect this issue iw what caused the bbr2_inflight_hi_from_lost_skb():
  WARN_ON_ONCE(inflight_prev < 0)
to fire in production machines using bbr2.

Effort: net-tcp_bbr
Origin-9xx-SHA1: 1a3e997e613d2dcf32b947992882854ebe873715
Change-Id: I1b0b75c27519953430c7db51c6f358f104c7af55
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:07:04 +01:00
Neal Cardwell
eddb362fbb net-tcp_bbr: v2: adjust skb tx.in_flight upon split in tcp_fragment()
When we fragment an skb that has already been sent, we need to update
the tx.in_flight for the first skb in the resulting pair ("buff").

Because we were not updating the tx.in_flight, the tx.in_flight value
was inconsistent with the pcount of the "buff" skb (tx.in_flight would
be too high). That meant that if the "buff" skb was lost, then
bbr2_inflight_hi_from_lost_skb() would calculate an inflight_hi value
that is too high. This could result in longer queues and higher packet
loss.

Packetdrill testing verified that without this commit, when the second
half of an skb is SACKed and then later the first half of that skb is
marked lost, the calculated inflight_hi was incorrect.

Effort: net-tcp_bbr
Origin-9xx-SHA1: 385f1ddc610798fab2837f9f372857438b25f874
Origin-9xx-SHA1: a0eb099690af net-tcp_bbr: v2: fix tcp_fragment() tx.in_flight recomputation [prod feb 8 2021; use as a fixup]
Origin-9xx-SHA1: 885503228153ff0c9114e net-tcp_bbr: v2: introduce tcp_skb_tx_in_flight_is_suspicious() helper for warnings
Change-Id: I617f8cab4e9be7a0b8e8d30b047bf8645393354d
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:07:00 +01:00
Yousuk Seung
f1bd4e759f net-tcp: add new ca opts flag TCP_CONG_WANTS_CE_EVENTS
Add a a new ca opts flag TCP_CONG_WANTS_CE_EVENTS that allows a
congestion control module to receive CE events.

Currently congestion control modules have to set the TCP_CONG_NEEDS_ECN
bit in opts flag to receive CE events but this may incur changes in ECN
behavior elsewhere. This patch adds a new bit TCP_CONG_WANTS_CE_EVENTS
that allows congestion control modules to receive CE events
independently of TCP_CONG_NEEDS_ECN.

Effort: net-tcp
Origin-9xx-SHA1: 9f7e14716cde760bc6c67ef8ef7e1ee48501d95b
Change-Id: I2255506985242f376d910c6fd37daabaf4744f24
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:06:57 +01:00
Neal Cardwell
54113b9ad5 net-tcp: add fast_ack_mode=1: skip rwin check in tcp_fast_ack_mode__tcp_ack_snd_check()
Add logic for an experimental TCP connection behavior, enabled with
tp->fast_ack_mode = 1, which disables checking the receive window
before sending an ack in __tcp_ack_snd_check(). If this behavior is
enabled, the data receiver sends an ACK if the amount of data is >
RCV.MSS.

Change-Id: Iaa0a0fd7108221f883137a79d5bfa724f1b096d4
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:06:47 +01:00
Jianfeng Wang
96c42cd35d net-tcp_bbr: v2: inform CC module of losses repaired by TLP probe
Before this commit, when there is a packet loss that creates a sequence
hole that is filled by a TLP loss probe, then tcp_process_tlp_ack()
only informs the congestion control (CC) module via a back-to-back entry
and exit of CWR. But some congestion control modules (e.g. BBR) do not
respond to CWR events.

This commit adds a new CA event with which the core TCP stack notifies
the CC module when a loss is repaired by a TLP. This will allow CC
modules that do not use the CWR mechanism to have a custom handler for
such TLP recoveries.

Effort: net-tcp_bbr
Change-Id: Ieba72332b401b329bff5a641d2b2043a3fb8f632
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:06:40 +01:00
Neal Cardwell
4a3e7b65cd net-tcp_bbr: v2: introduce is_acking_tlp_retrans_seq into rate_sample
Introduce is_acking_tlp_retrans_seq into rate_sample. This bool will
export to the CC module the knowledge of whether the current ACK
matched a TLP retransmit.

Note that when this bool is true, we cannot yet tell (in general) whether
this ACK is for the original or the TLP retransmit.

Effort: net-tcp_bbr
Change-Id: I2e6494332167e75efcbdc99bd5c119034e9c39b4
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:06:33 +01:00
David Morley
c57053bdc6 tcp: introduce per-route feature RTAX_FEATURE_ECN_LOW
Define and implement a new per-route feature, RTAX_FEATURE_ECN_LOW.

This feature indicates that the given destination network is a
low-latency ECN environment, meaning both that ECN CE marks are
applied by the network using a low-latency marking threshold and also
that TCP endpoints provide precise per-data-segment ECN feedback in
ACKs (where the ACK ECE flag echoes the received CE status of all
newly-acknowledged data segments). This feature indication can be used
by congestion control algorithms to decide how to interpret ECN
signals over the given destination network.

This feature is appropriate for datacenter-style ECN marking, such as
the ECN marking approach expected by DCTCP or BBR congestion control
modules.

Signed-off-by: David Morley <morleyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Tested-by: David Morley <morleyd@google.com>
Change-Id: I6bc06e9c6cb426fbae7243fc71c9a8c18175f5d3
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:06:30 +01:00
Neal Cardwell
62a04f2316 tcp: export TCPI_OPT_ECN_LOW in tcp_info tcpi_options field
Analogous to other important ECN information, export TCPI_OPT_ECN_LOW
in tcp_info tcpi_options field.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Change-Id: I08d8d8c7e8780e6e37df54038ee50301ac5a0320
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-12-18 15:06:07 +01:00
Sultan Alsawaf
3764b4f297 mm: Omit RCU read lock in list_lru_count_one() when RCU isn't needed
The RCU read lock isn't necessary in list_lru_count_one() when the
condition that requires RCU (CONFIG_MEMCG && !CONFIG_SLOB) isn't met.
The highly-frequent RCU lock and unlock adds measurable overhead to the
shrink_slab() path when it isn't needed. As such, we can simply omit the
RCU read lock in this case to improve performance.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: priiii1808 <priyanshusinghal0818@gmail.com>
2024-12-18 15:05:51 +01:00
Al Viro
436eb37b50 d_add_ci(): make sure we don't miss d_lookup_done()
All callers of d_alloc_parallel() must make sure that resulting
in-lookup dentry (if any) will encounter __d_lookup_done() before
the final dput().  d_add_ci() might end up creating in-lookup
dentries; they are fed to d_splice_alias(), which will normally
make sure they meet __d_lookup_done().  However, it is possible
to end up with d_splice_alias() failing with ERR_PTR(-ELOOP)
without having done so.  It takes a corrupted ntfs or case-insensitive
xfs image, but neither should end up with memory corruption...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-12-18 15:05:46 +01:00
Sebastian Andrzej Siewior
16ecd487af sched/rt: Don't try push tasks if there are none.
I have a RT task X at a high priority and cyclictest on each CPU with
lower priority than X's. If X is active and each CPU wakes their own
cylictest thread then it ends in a longer rto_push storm.
A random CPU determines via balance_rt() that the CPU on which X is
running needs to push tasks. X has the highest priority, cyclictest is
next in line so there is nothing that can be done since the task with
the higher priority is not touched.

tell_cpu_to_push() increments rto_loop_next and schedules
rto_push_irq_work_func() on X's CPU. The other CPUs also increment the
loop counter and do the same. Once rto_push_irq_work_func() is active it
does nothing because it has _no_ pushable tasks on its runqueue. Then
checks rto_next_cpu() and decides to queue irq_work on the local CPU
because another CPU requested a push by incrementing the counter.

I have traces where ~30 CPUs request this ~3 times each before it
finally ends. This greatly increases X's runtime while X isn't making
much progress.

Teach rto_next_cpu() to only return CPUs which also have tasks on their
runqueue which can be pushed away. This does not reduce the
tell_cpu_to_push() invocations (rto_loop_next counter increments) but
reduces the amount of issued rto_push_irq_work_func() if nothing can be
done. As the result the overloaded CPU is blocked less often.

There are still cases where the "same job" is repeated several times
(for instance the current CPU needs to resched but didn't yet because
the irq-work is repeated a few times and so the old task remains on the
CPU) but the majority of request end in tell_cpu_to_push() before an IPI
is issued.

Reviewed-by: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230801152648._y603AS_@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2024-12-18 15:05:40 +01:00
Yaroslav Furman
9c41dd9383 rcu: Fix a performance regression.
Commit "rcu: Create RCU-specific workqueues with rescuers" switched RCU
to using local workqueses and removed power efficiency flag from them.

This caused a performance regression that can be observed in Geekbench 5
after enabling CONFIG_WQ_POWER_EFFICIENT_DEFAULT: score went down from
760/2500 to 620/2300 (single/multi core respectively).

Add WQ_POWER_EFFICIENT flag to avoid this regression.

Change-Id: I2c4f41faa55548f9e81a1c0cbe10703948062d89
2024-12-18 15:05:33 +01:00
Carlos Llamas
5380adeb80 BACKPORT: FROMLIST: binder: fix freeze UAF in binder_release_work()
When a binder reference is cleaned up, any freeze work queued in the
associated process should also be removed. Otherwise, the reference is
freed while its ref->freeze.work is still queued in proc->work leading
to a use-after-free issue as shown by the following KASAN report:

  ==================================================================
  BUG: KASAN: slab-use-after-free in binder_release_work+0x398/0x3d0
  Read of size 8 at addr ffff31600ee91488 by task kworker/5:1/211

  CPU: 5 UID: 0 PID: 211 Comm: kworker/5:1 Not tainted 6.11.0-rc7-00382-gfc6c92196396 #22
  Hardware name: linux,dummy-virt (DT)
  Workqueue: events binder_deferred_func
  Call trace:
   binder_release_work+0x398/0x3d0
   binder_deferred_func+0xb60/0x109c
   process_one_work+0x51c/0xbd4
   worker_thread+0x608/0xee8

  Allocated by task 703:
   __kmalloc_cache_noprof+0x130/0x280
   binder_thread_write+0xdb4/0x42a0
   binder_ioctl+0x18f0/0x25ac
   __arm64_sys_ioctl+0x124/0x190
   invoke_syscall+0x6c/0x254

  Freed by task 211:
   kfree+0xc4/0x230
   binder_deferred_func+0xae8/0x109c
   process_one_work+0x51c/0xbd4
   worker_thread+0x608/0xee8
  ==================================================================

This commit fixes the issue by ensuring any queued freeze work is removed
when cleaning up a binder reference.

Fixes: d579b04a52a1 ("binder: frozen notification")
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas@google.com>

Bug: 366003708
Link: https://lore.kernel.org/all/20240924184401.76043-4-cmllamas@google.com/
Change-Id: Icc40e7dd6157981f4adbea7243e55be118552321
[cmllamas: drop BINDER_STAT_FREEZE as it's not supported here]
Signed-off-by: Carlos Llamas <cmllamas@google.com>
2024-12-18 15:03:54 +01:00
Carlos Llamas
1cf14664be BACKPORT: FROMGIT: binder: fix memleak of proc->delivered_freeze
If a freeze notification is cleared with BC_CLEAR_FREEZE_NOTIFICATION
before calling binder_freeze_notification_done(), then it is detached
from its reference (e.g. ref->freeze) but the work remains queued in
proc->delivered_freeze. This leads to a memory leak when the process
exits as any pending entries in proc->delivered_freeze are not freed:

  unreferenced object 0xffff38e8cfa36180 (size 64):
    comm "binder-util", pid 655, jiffies 4294936641
    hex dump (first 32 bytes):
      b8 e9 9e c8 e8 38 ff ff b8 e9 9e c8 e8 38 ff ff  .....8.......8..
      0b 00 00 00 00 00 00 00 3c 1f 4b 00 00 00 00 00  ........<.K.....
    backtrace (crc 95983b32):
      [<000000000d0582cf>] kmemleak_alloc+0x34/0x40
      [<000000009c99a513>] __kmalloc_cache_noprof+0x208/0x280
      [<00000000313b1704>] binder_thread_write+0xdec/0x439c
      [<000000000cbd33bb>] binder_ioctl+0x1b68/0x22cc
      [<000000002bbedeeb>] __arm64_sys_ioctl+0x124/0x190
      [<00000000b439adee>] invoke_syscall+0x6c/0x254
      [<00000000173558fc>] el0_svc_common.constprop.0+0xac/0x230
      [<0000000084f72311>] do_el0_svc+0x40/0x58
      [<000000008b872457>] el0_svc+0x38/0x78
      [<00000000ee778653>] el0t_64_sync_handler+0x120/0x12c
      [<00000000a8ec61bf>] el0t_64_sync+0x190/0x194

This patch fixes the leak by ensuring that any pending entries in
proc->delivered_freeze are freed during binder_deferred_release().

Fixes: d579b04a52a1 ("binder: frozen notification")
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20240926233632.821189-8-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bug: 366003708
(cherry picked from commit 1db76ec2b4b206ff943e292a0b55e68ff3443598
 git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
 char-misc-next)
Change-Id: Iafdec3421c521b4b591b94455deba7ee5102c8ca
[cmllamas: drop BINDER_STAT_FREEZE and use binder_proc_ext_entry()]
Signed-off-by: Carlos Llamas <cmllamas@google.com>
2024-12-18 15:03:35 +01:00
Carlos Llamas
4554485eb2 BACKPORT: FROMGIT: binder: add delivered_freeze to debugfs output
Add the pending proc->delivered_freeze work to the debugfs output. This
information was omitted in the original implementation of the freeze
notification and can be valuable for debugging issues.

Fixes: d579b04a52a1 ("binder: frozen notification")
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240926233632.821189-9-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bug: 366003708
(cherry picked from commit cb2aeb2ec25884133110ffe5a67ff3cf7dee5ceb
 git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
 char-misc-next)
Change-Id: Ifc9a22b52e38c35af661732486fa1f154adb34de
[cmllamas: fix KMI break with binder_proc_ext_entry()]
Signed-off-by: Carlos Llamas <cmllamas@google.com>
2024-12-18 15:03:31 +01:00
Sultan Alsawaf
73ad55b1e6 sched/rt: Change default SCHED_RR timeslice from 100 ms to 1 jiffy
For us, it's most helpful to have the round-robin timeslice as low as is
allowed by the scheduler to reduce latency. Since it's limited by the
scheduler tick rate, just set the default to 1 jiffy, which is the
lowest possible value.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Change-Id: I6c9f6bb5bbadf363efb719d3e30b0b073654d688
2024-12-18 15:03:27 +01:00
Vincent Guittot
0f92567070 sched/fair: Make sure to try to detach at least one movable task
During load balance, we try at most env->loop_max time to move a task.
But it can happen that the loop_max LRU tasks (ie tail of
the cfs_tasks list) can't be moved to dst_cpu because of affinity.
In this case, loop in the list until we found at least one.

The maximum of detached tasks remained the same as before.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220825122726.20819-2-vincent.guittot@linaro.org
2024-12-18 15:03:21 +01:00
Miguel de Dios
8b98e08a04 sched: reduce softirq conflicts with RT
This is a forward port of pa/890483 with modifications from the original
patch due to changes in sched/softirq.c which applies the same logic.

We're finding audio glitches caused by audio-producing RT tasks
that are either interrupted to handle softirq's or that are
scheduled onto cpu's that are handling softirq's.
In a previous patch, we attempted to catch many cases of the
latter problem, but it's clear that we are still losing
significant numbers of races in some apps.

This patch attempts to address the following problem::
   It attempts to reduce the most common windows in which
   we lose the race between scheduling an RT task on a remote
   core and starting to handle softirq's on that core.
   We still lose some races, but we lose significantly fewer.
   (And we don't want to introduce any heavyweight forms
   of synchronization on these paths.)

Bug: 64912585
Bug: 136771796
Bug: 144961676
Change-Id: Ida89a903be0f1965552dd0e84e67ef1d3158c7d8
Signed-off-by: Miguel de Dios <migueldedios@google.com>
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
(cherry picked from commit 3c2569f21be6b7c457b389821b098d30bd03b84c)
(cherry picked from commit f9cf84966a315169f3f505993b9e15b6c53f2bbb)
(cherry picked from commit e6edaa29efee9e2f2451ab50030d8cde19ab227b)
(cherry picked from commit db52e23658eaf304ba2333f04d910313c0c21e05)
(cherry picked from commit aa2d4d07929d1c06c6b99ed8718394cf8e471d21)
(cherry picked from commit cefdb96edccdc3f788275d1bcef89c8dfeeb8b11)
(cherry picked from commit 5a45a0208069ac5704713b35a0858fcd5997f97a)
(cherry picked from commit 86ac137eb7399547c6cd20d60063d244091e920f)
2024-12-18 15:03:18 +01:00
Samuel Pascua
04ff6cbc91 drivers: soc: acpm: Prevent optimization of 'acpm_initdata'
More optimization issues when compiling with Clang. Panics happen when the device goes into standby with the following report.

<6>[ 1470.900859]  [0:  Binder:4157_2: 8735] EXYNOS-PM:: MIF down. cur_count: 5, acc_count: 5
<6>[ 1470.900859]  [0:  Binder:4157_2: 8735] EXYNOS-PM:: MIF_UP history:
<6>[ 1470.900859]  [0:  Binder:4157_2: 8735] EXYNOS-PM: mifuser: 0x540000, time: 5:35:40, latency: 1955[usec]
<6>[ 1470.900859]  [0:  Binder:4157_2: 8735] EXYNOS-PM: mifuser: 0x400000, time: 5:35:40, latency: 1956[usec]
<6>[ 1470.900859]  [0:  Binder:4157_2: 8735] EXYNOS-PM: mifuser: 0x100000, time: 5:35:41, latency: 1954[usec]
<6>[ 1470.900859]  [0:  Binder:4157_2: 8735] EXYNOS-PM: mifuser: 0x400000, time: 5:35:41, latency: 1955[usec]
<6>[ 1470.900859]  [0:  Binder:4157_2: 8735] EXYNOS-PM: mifuser: 0x100000, time: 5:35:41, latency: 1955[usec]
<0>[ 1470.900859]  [0:  Binder:4157_2: 8735] Unable to handle kernel paging request at virtual address ffffff800b346f9c
<2>[ 1470.900859]  [0:  Binder:4157_2: 8735] sec_debug_set_extra_info_fault = KERN / 0xffffff800b346f9c
<1>[ 1470.900859]  [0:  Binder:4157_2: 8735] Mem abort info:
<1>[ 1470.900859]  [0:  Binder:4157_2: 8735]   Exception class = DABT (current EL), IL = 32 bits
<1>[ 1470.900859]  [0:  Binder:4157_2: 8735]   SET = 0, FnV = 0
<1>[ 1470.900859]  [0:  Binder:4157_2: 8735]   EA = 0, S1PTW = 0
<1>[ 1470.900859]  [0:  Binder:4157_2: 8735] Data abort info:
<1>[ 1470.900859]  [0:  Binder:4157_2: 8735]   ISV = 0, ISS = 0x00000061
<1>[ 1470.900859]  [0:  Binder:4157_2: 8735]   CM = 0, WnR = 1
<1>[ 1470.900859]  [0:  Binder:4157_2: 8735] swapper pgtable: 4k pages, 39-bit VAs, pgd = ffffff800a66a000
<1>[ 1470.900859]  [0:  Binder:4157_2: 8735] [ffffff800b346f9c] *pgd=000000097cdfe003, *pud=000000097cdfe003, *pmd=00000009740b7003, *pte=00e800000203f707
<0>[ 1470.900859]  [0:  Binder:4157_2: 8735] Internal error: Oops: 96000061 [#1] PREEMPT SMP
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] Modules linked in:
<0>[ 1470.900859]  [0:  Binder:4157_2: 8735] Process Binder:4157_2 (pid: 8735, stack limit = 0xffffff8039708000)
<0>[ 1470.900859]  [0:  Binder:4157_2: 8735] debug-snapshot: core register saved(CPU:0)
<0>[ 1470.900859]  [0:  Binder:4157_2: 8735] L2ECTLR_EL1: 0000000000000007
<0>[ 1470.900859]  [0:  Binder:4157_2: 8735] L2ECTLR_EL1 valid_bit(30) is NOT set (0x0)
<0>[ 1470.900859]  [0:  Binder:4157_2: 8735] CPUMERRSR: 0000000008000001, L2MERRSR: 0000000010200c00
<0>[ 1470.900859]  [0:  Binder:4157_2: 8735] CPUMERRSR valid_bit(31) is NOT set (0x0)
<0>[ 1470.900859]  [0:  Binder:4157_2: 8735] L2MERRSR valid_bit(31) is NOT set (0x0)
<0>[ 1470.900859]  [0:  Binder:4157_2: 8735] debug-snapshot: context saved(CPU:0)
<6>[ 1470.900859]  [0:  Binder:4157_2: 8735] debug-snapshot: item - log_kevents is disabled
<6>[ 1470.900859]  [0:  Binder:4157_2: 8735] TIF_FOREIGN_FPSTATE: 1, FP/SIMD depth 0, cpu: 0
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] CPU: 0 PID: 8735 Comm: Binder:4157_2 Not tainted 4.14.113 - Fresh Core-user #1
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] Hardware name: Samsung A50 LTN OPEN rev04 board based on Exynos9610 (DT)
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] task: ffffffc0466d6000 task.stack: ffffff8039708000
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] PC is at acpm_get_inform+0x90/0x100
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] LR is at acpm_get_inform+0x7c/0x100
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] pc : [<ffffff8008505cd4>] lr : [<ffffff8008505cc0>] pstate: 604001c5
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] sp : ffffff803970bac0
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x29: ffffff803970bac0 x28: ffffffc0466d6000
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x27: ffffff8008e44b64 x26: ffffff8008e44b3e
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x25: ffffff8009e5f210 x24: 0000000010624dd3
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x23: 0000000000000029 x22: 0000000000000018
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x21: ffffff8009e2c000 x20: ffffff8008ef6785
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x19: ffffff8008ef674c x18: 00000000000000a0
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x17: ffffff8009b3023c x16: 0000000000000001
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x15: ffffff8008c8a964 x14: 202c303030303031
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x13: 7830203a72657375 x12: 0000000000000000
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x11: 0000000000000000 x10: ffffffffffffffff
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x9 : ffffff800b346f00 x8 : ffffff800b346f00
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x7 : 203a79636e657461 x6 : ffffff80f615273c
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x5 : 000000000000221f x4 : 000000000000000c
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x3 : 000000000000000a x2 : 0000000000000000
<4>[ 1470.900859]  [0:  Binder:4157_2: 8735] x1 : 00000000000001c0 x0 : 0000000000000041

Similar solution as d855e6f. Make the structs volatile to prevent optimization.

Signed-off-by: John Vincent <git@tensevntysevn.cf>
Signed-off-by: Samuel Pascua <sgpascua@ngcp.ph>
2024-12-18 15:02:50 +01:00
xxmustafacooTR
5843cd8543 drivers: soc: samsung: acpm: disable lto 2024-12-18 15:02:45 +01:00
xxmustafacooTR
9ca91577b1 fvmap: optimize voltages 2024-12-18 15:02:40 +01:00
Redick Lin
70c79241d2 soc: samsung: acpm: extend the timeout for acpm ipc retry
Extend it from 15ms to 200ms

Bug: 172883429
Change-Id: I39e8e860dfeaa4d1d3b702f06dca51dd01bc8367
Signed-off-by: Redick Lin <redicklin@google.com>
2024-12-18 15:02:36 +01:00
Nahuel Gómez
a13ddad5af vboot_dlkm: drop duplicate sm5451 module
Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 15:02:19 +01:00
Nahuel Gómez
375f7e1318 ARM64: dts/s5e8825: GPU undervolt to 790mV
This is a base value only though, I'm not sure if it actually works.

Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>
2024-12-18 15:01:20 +01:00
Zhaoyang Huang
f11d790f3a psi: fix possible trigger missing in the window
When a new threshold breaching stall happens after a psi event was
generated and within the window duration, the new event is not
generated because the events are rate-limited to one per window. If
after that no new stall is recorded then the event will not be
generated even after rate-limiting duration has passed. This is
happening because with no new stall, window_update will not be called
even though threshold was previously breached. To fix this, record
threshold breaching occurrence and generate the event once window
duration is passed.

Suggested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Link: https://lore.kernel.org/r/1643093818-19835-1-git-send-email-huangzhaoyang@gmail.com
(cherry picked from commit e6df4ead85d9da1b07dd40bd4c6d2182f3e210c4)
(cherry picked from commit 007db79713251fd67cc5ce7142942772f6074aca)
(cherry picked from commit 129c4324ce9ea8be9c528629bca72e1e1f80d3fb)
2024-12-18 15:01:14 +01:00