kernel_samsung_a53x/net
Zijian Zhang 6a90728288 tcp_bpf: Fix the sk_mem_uncharge logic in tcp_bpf_sendmsg
[ Upstream commit ca70b8baf2bd125b2a4d96e76db79375c07d7ff2 ]

The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging
tosend bytes, which is either msg->sg.size or a smaller value apply_bytes.

Potential problems with this strategy are as follows:

- If the actual sent bytes are smaller than tosend, we need to charge some
  bytes back, as in line 487, which is okay but seems not clean.

- When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may
  miss uncharging (msg->sg.size - apply_bytes) bytes.

[...]
415 tosend = msg->sg.size;
416 if (psock->apply_bytes && psock->apply_bytes < tosend)
417   tosend = psock->apply_bytes;
[...]
443 sk_msg_return(sk, msg, tosend);
444 release_sock(sk);
446 origsize = msg->sg.size;
447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
448                             msg, tosend, flags);
449 sent = origsize - msg->sg.size;
[...]
454 lock_sock(sk);
455 if (unlikely(ret < 0)) {
456   int free = sk_msg_free_nocharge(sk, msg);
458   if (!cork)
459     *copied -= free;
460 }
[...]
487 if (eval == __SK_REDIRECT)
488   sk_mem_charge(sk, tosend - sent);
[...]

When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply,
the following warning will be reported:

------------[ cut here ]------------
WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0
Modules linked in:
CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events sk_psock_destroy
RIP: 0010:inet_sock_destruct+0x190/0x1a0
RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206
RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800
RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900
RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0
R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400
R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100
FS:  0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x89/0x130
? inet_sock_destruct+0x190/0x1a0
? report_bug+0xfc/0x1e0
? handle_bug+0x5c/0xa0
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? inet_sock_destruct+0x190/0x1a0
__sk_destruct+0x25/0x220
sk_psock_destroy+0x2b2/0x310
process_scheduled_works+0xa3/0x3e0
worker_thread+0x117/0x240
? __pfx_worker_thread+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---

In __SK_REDIRECT, a more concise way is delaying the uncharging after sent
bytes are finalized, and uncharge this value. When (ret < 0), we shall
invoke sk_msg_free.

Same thing happens in case __SK_DROP, when tosend is set to apply_bytes,
we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same
warning will be reported in selftest.

[...]
468 case __SK_DROP:
469 default:
470 sk_msg_free_partial(sk, msg, tosend);
471 sk_msg_apply_bytes(psock, tosend);
472 *copied -= (tosend + delta);
473 return -EACCES;
[...]

So instead of sk_msg_free_partial we can do sk_msg_free here.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Fixes: 8ec95b94716a ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241016234838.3167769-3-zijianzhang@bytedance.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-17 13:24:27 +01:00
..
6lowpan
9p 9p/xen: fix release of IRQ 2024-12-17 13:24:22 +01:00
802
8021q Revert "gro: remove rcu_read_lock/rcu_read_unlock from gro_receive handlers" 2024-11-24 00:23:41 +01:00
appletalk
atm
ax25 Revert "Make more sysctl constants read-only" 2024-12-03 19:56:17 +01:00
batman-adv batman-adv: fix random jitter calculation 2024-11-19 17:55:48 +01:00
bluetooth Bluetooth: Fix type of len in rfcomm_sock_getsockopt{,_old}() 2024-12-17 13:24:18 +01:00
bpf
bpfilter
bridge net: bridge: xmit: make sure we have at least eth header len bytes 2024-11-30 02:33:25 +01:00
caif
can can: j1939: j1939_session_new(): fix skb reference counting 2024-12-17 13:24:26 +01:00
ceph libceph: fix race between delayed_work() and ceph_monc_stop() 2024-11-19 14:19:45 +01:00
core bpf, sockmap: Fix sk_msg_reset_curr 2024-12-17 13:24:06 +01:00
dcb
dccp dccp: Fix memory leak in dccp_feat_change_recv 2024-12-17 13:24:26 +01:00
decnet
dns_resolver
dsa
ethernet Revert "gro: remove rcu_read_lock/rcu_read_unlock from gro_receive handlers" 2024-11-24 00:23:41 +01:00
ethtool ethtool: Fix wrong mod state in case of verbose and no_mask bitset 2024-12-17 13:24:27 +01:00
hsr net: hsr: avoid potential out-of-bound access in fill_frame_info() 2024-12-17 13:24:26 +01:00
ieee802154
ife
ipv4 tcp_bpf: Fix the sk_mem_uncharge logic in tcp_bpf_sendmsg 2024-12-17 13:24:27 +01:00
ipv6 net/ipv6: release expired exception dst cached in socket 2024-12-17 13:24:26 +01:00
iucv Revert "net/iucv: fix use after free in iucv_sock_close()" 2024-11-24 00:23:55 +01:00
kcm kcm: Serialise kcm_sendmsg() for the same socket. 2024-11-23 23:20:48 +01:00
key
l2tp genetlink: hold RCU in genlmsg_mcast() 2024-11-23 23:21:59 +01:00
l3mdev net: Add l3mdev index to flow struct and avoid oif reset for port devices 2024-11-23 23:21:52 +01:00
lapb
llc
mac80211 mac80211: fix user-power when emulating chanctx 2024-12-17 13:23:57 +01:00
mac802154 Revert "net: mac802154: Fix racy device stats updates by DEV_STATS_INC() and DEV_STATS_ADD()" 2024-11-19 14:52:14 +01:00
mpls
mptcp Revert "mptcp: correct MPTCP_SUBFLOW_ATTR_SSN_OFFSET reserved size" 2024-11-24 00:23:53 +01:00
ncm
ncsi net/ncsi: Fix the multi thread manner of NCSI driver 2024-11-19 14:19:00 +01:00
netfilter netfilter: nft_set_hash: skip duplicated elements pending gc run 2024-12-17 13:24:26 +01:00
netlabel
netlink netlink: terminate outstanding dump on socket close 2024-12-17 13:20:50 +01:00
netrom Revert "Make more sysctl constants read-only" 2024-12-03 19:56:17 +01:00
nfc nfc: nci: Fix handling of zero-length payload packets in nci_rx_work() 2024-11-19 12:27:10 +01:00
nsh nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment(). 2024-11-19 11:32:42 +01:00
openvswitch openvswitch: Set the skbuff pkt_type for proper pmtud support. 2024-11-19 12:27:09 +01:00
packet af_packet: Handle outgoing VLAN packets without hardware offloading 2024-11-23 23:20:12 +01:00
phonet Revert "Make more sysctl constants read-only" 2024-12-03 19:56:17 +01:00
psample
qrtr Revert "net: qrtr: Update packets cloning when broadcasting" 2024-11-24 00:23:18 +01:00
rds Revert "net:rds: Fix possible deadlock in rds_message_put" 2024-11-24 00:23:49 +01:00
rfkill net: rfkill: gpio: Add check for clk_enable() 2024-12-17 13:24:07 +01:00
rose Revert "Make more sysctl constants read-only" 2024-12-03 19:56:17 +01:00
rxrpc
sched net: sched: fix erspan_opt settings in cls_flower 2024-12-17 13:24:26 +01:00
sctp Revert "Make more sysctl constants read-only" 2024-12-03 19:56:17 +01:00
skb_tracer
smc Revert "net/smc: Allow SMC-D 1MB DMB allocations" 2024-11-24 00:23:56 +01:00
strparser
sunrpc sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport 2024-12-17 13:24:23 +01:00
switchdev
tipc tipc: Fix use-after-free of kernel socket in cleanup_bearer(). 2024-12-17 13:24:26 +01:00
tls tls: fix missing memory barrier in tls_init 2024-11-19 12:27:09 +01:00
unix Revert "af_unix: Remove put_pid()/put_cred() in copy_peercred()." 2024-11-24 00:23:43 +01:00
vmw_vsock vsock/virtio: Initialization of the dangling pointer occurring in vsk->trans 2024-11-30 02:33:27 +01:00
wimax
wireless Revert "wifi: nl80211: don't give key data to userspace" 2024-11-24 00:23:55 +01:00
x25 Revert "Make more sysctl constants read-only" 2024-12-03 19:56:17 +01:00
xdp xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING 2024-11-19 11:32:19 +01:00
xfrm xfrm: store and rely on direction to construct offload flags 2024-12-17 13:24:04 +01:00
compat.c
devres.c
Kconfig
Makefile
socket.c
sysctl_net.c
TEST_MAPPING