[ Upstream commit ad4a3ca6a8e886f6491910a3ae5d53595e40597d ]
There are code paths from which the function is called without holding
the RCU read lock, resulting in a suspicious RCU usage warning [1].
Fix by using l3mdev_master_upper_ifindex_by_index() which will acquire
the RCU read lock before calling
l3mdev_master_upper_ifindex_by_index_rcu().
[1]
WARNING: suspicious RCU usage
6.12.0-rc3-custom-gac8f72681cf2 #141 Not tainted
-----------------------------
net/core/dev.c:876 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/361:
#0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60
stack backtrace:
CPU: 3 UID: 0 PID: 361 Comm: ip Not tainted 6.12.0-rc3-custom-gac8f72681cf2 #141
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0xba/0x110
lockdep_rcu_suspicious.cold+0x4f/0xd6
dev_get_by_index_rcu+0x1d3/0x210
l3mdev_master_upper_ifindex_by_index_rcu+0x2b/0xf0
ip_tunnel_bind_dev+0x72f/0xa00
ip_tunnel_newlink+0x368/0x7a0
ipgre_newlink+0x14c/0x170
__rtnl_newlink+0x1173/0x19c0
rtnl_newlink+0x6c/0xa0
rtnetlink_rcv_msg+0x3cc/0xf60
netlink_rcv_skb+0x171/0x450
netlink_unicast+0x539/0x7f0
netlink_sendmsg+0x8c1/0xd80
____sys_sendmsg+0x8f9/0xc20
___sys_sendmsg+0x197/0x1e0
__sys_sendmsg+0x122/0x1f0
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: db53cd3d88dc ("net: Handle l3mdev in ip_tunnel_init_flow")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241022063822.462057-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7d352ccf1e9935b5222ca84e8baeb07a0c8f94b9 ]
Currently in case of target hardware restart, we just reconfig and
re-enable the security keys and enable the network queues to start
data traffic back from where it was interrupted.
Many ath10k wifi chipsets have sequence numbers for the data
packets assigned by firmware and the mac sequence number will
restart from zero after target hardware restart leading to mismatch
in the sequence number expected by the remote peer vs the sequence
number of the frame sent by the target firmware.
This mismatch in sequence number will cause out-of-order packets
on the remote peer and all the frames sent by the device are dropped
until we reach the sequence number which was sent before we restarted
the target hardware
In order to fix this, we trigger a sta disconnect, in case of target
hw restart. After this there will be a fresh connection and thereby
avoiding the dropping of frames by remote peer.
The right fix would be to pull the entire data path into the host
which is not feasible or would need lots of complex changes and
will still be inefficient.
Tested on ath10k using WCN3990, QCA6174
Signed-off-by: Youghandhar Chintala <youghand@codeaurora.org>
Link: https://lore.kernel.org/r/20220308115325.5246-2-youghand@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Stable-dep-of: 07a6e3b78a65 ("wifi: iwlwifi: mvm: Fix response handling in iwl_mvm_send_recovery_cmd()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b8469721034300bbb6dec5b4bf32492c95e16a0c ]
The series in the "fixes" tag added the ability to consider L4 attributes
in routing rules.
The dst lookup on the outer packet of encapsulated traffic in the xfrm
code was not adapted to this change, thus routing behavior that relies
on L4 information is not respected.
Pass the ip protocol information when performing dst lookups.
Fixes: a25724b05af0 ("Merge branch 'fib_rules-support-sport-dport-and-proto-match'")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Tested-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit db53cd3d88dc328dea2e968c9c8d3b4294a8a674 upstream.
Ido reported that the commit referenced in the Fixes tag broke
a gre use case with dummy devices. Add a check to ip_tunnel_init_flow
to see if the oif is an l3mdev port and if so set the oif to 0 to
avoid the oif comparison in fib_lookup_good_nhc.
Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1dae9f1187189bc09ff6d25ca97ead711f7e26f9 upstream.
The kernel may crash when deleting a genetlink family if there are still
listeners for that family:
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0
LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0
Call Trace:
__netlink_clear_multicast_users+0x74/0xc0
genl_unregister_family+0xd4/0x2d0
Change the unsafe loop on the list to a safe one, because inside the
loop there is an element removal from this list.
Fixes: b8273570f802 ("genetlink: fix netns vs. netlink table locking (2)")
Cc: stable@vger.kernel.org
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 40867d74c374b235e14d839f3a77f26684feefe5 ]
The fundamental premise of VRF and l3mdev core code is binding a socket
to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
Legacy code resets flowi_oif to the l3mdev losing any original port
device binding. Ben (among others) has demonstrated use cases where the
original port device binding is important and needs to be retained.
This patch handles that by adding a new entry to the common flow struct
that can indicate the l3mdev index for later rule and table matching
avoiding the need to reset flowi_oif.
In addition to allowing more use cases that require port device binds,
this patch brings a few datapath simplications:
1. l3mdev_fib_rule_match is only called when walking fib rules and
always after l3mdev_update_flow. That allows an optimization to bail
early for non-VRF type uses cases when flowi_l3mdev is not set. Also,
only that index needs to be checked for the FIB table id.
2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev
(e.g., VRF) device. By resetting flowi_oif only for this case the
FLOWI_FLAG_SKIP_NH_OIF flag is not longer needed and can be removed,
removing several checks in the datapath. The flowi_iif path can be
simplified to only be called if the it is not loopback (loopback can
not be assigned to an L3 domain) and the l3mdev index is not already
set.
3. Avoid another device lookup in the output path when the fib lookup
returns a reject failure.
Note: 2 functional tests for local traffic with reject fib rules are
updated to reflect the new direct failure at FIB lookup time for ping
rather than the failure on packet path. The current code fails like this:
HINT: Fails since address on vrf device is out of device scope
COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
ping: Warning: source address might be selected on device other than: eth1
PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.
--- 172.16.3.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
where the test now directly fails:
HINT: Fails since address on vrf device is out of device scope
COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
ping: connect: No route to host
Signed-off-by: David Ahern <dsahern@kernel.org>
Tested-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 05ef7055debc ("netfilter: fib: check correct rtable in vrf setups")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c6ae073f5903f6c6439d0ac855836a4da5c0a701 ]
When innerprotoinherit is set, the tunneled packets do not have an inner
Ethernet header.
Change 'maclen' to not always assume the header length is ETH_HLEN, as
there might not be a MAC header.
This resolves issues with drivers (e.g. mlx5, in
mlx5e_tx_tunnel_accel()) who rely on the skb inner network header offset
to be correct, and use it for TX offloads.
Fixes: d8a6213d70ac ("geneve: fix header validation in geneve[6]_xmit_skb")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: c471236b2359 ("bareudp: Pull inner IP header on xmit.")
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 29b359cf6d95fd60730533f7f10464e95bd17c73 upstream.
The generation mask can be updated while netlink dump is in progress.
The pipapo set backend walk iterator cannot rely on it to infer what
view of the datastructure is to be used. Add notation to specify if user
wants to read/update the set.
Based on patch from Florian Westphal.
Fixes: 2b84e215f874 ("netfilter: nft_set_pipapo: .walk does not deal with generations")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0870b0d8b393dde53106678a1e2cec9dfa52f9b7 ]
Typically, busy-polling durations are below 100 usec.
When/if the busy-poller thread migrates to another cpu,
local_clock() can be off by +/-2msec or more for small
values of HZ, depending on the platform.
Use ktimer_get_ns() to ensure deterministic behavior,
which is the whole point of busy-polling.
Fixes: 060212928670 ("net: add low latency socket poll")
Fixes: 9a3c71aa8024 ("net: convert low latency sockets to sched_clock()")
Fixes: 37089834528b ("sched, net: Fixup busy_loop_us_clock()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240827114916.223377-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit fa23e0d4b756d25829e124d6b670a4c6bbd4bf7e upstream.
Sven Auhagen reports transaction failures with following error:
./main.nft:13:1-26: Error: Could not process rule: Cannot allocate memory
percpu: allocation failed, size=16 align=8 atomic=1, atomic alloc failed, no space left
This points to failing pcpu allocation with GFP_ATOMIC flag.
However, transactions happen from user context and are allowed to sleep.
One case where we can call into percpu allocator with GFP_ATOMIC is
nft_counter expression.
Normally this happens from control plane, so this could use GFP_KERNEL
instead. But one use case, element insertion from packet path,
needs to use GFP_ATOMIC allocations (nft_dynset expression).
At this time, .clone callbacks always use GFP_ATOMIC for this reason.
Add gfp_t argument to the .clone function and pass GFP_KERNEL or
GFP_ATOMIC flag depending on context, this allows all clone memory
allocations to sleep for the normal (transaction) case.
Cc: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7395dfacfff65e9938ac0889dafa1ab01e987d15 upstream
Add a timestamp field at the beginning of the transaction, store it
in the nftables per-netns area.
Update set backend .insert, .deactivate and sync gc path to use the
timestamp, this avoids that an element expires while control plane
transaction is still unfinished.
.lookup and .update, which are used from packet path, still use the
current time to check if the element has expired. And .get path and dump
also since this runs lockless under rcu read size lock. Then, there is
async gc which also needs to check the current time since it runs
asynchronously from a workqueue.
[ NB: rbtree GC updates has been excluded because GC is asynchronous. ]
Fixes: c3e1b005ed1c ("netfilter: nf_tables: add set element timeout support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3d3b2f57d4447e6e9f4096ad01d0e4129f7bc7e9 ]
Struct sctp_ep_common is included in both asoc and ep, but hlist_node
and hashent are only needed by ep after asoc_hashtable was dropped by
Commit b5eff7128366 ("sctp: drop the old assoc hashtable of sctp").
So it is better to move hlist_node and hashent from sctp_ep_common to
sctp_endpoint, and it saves some space for each asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 9ab0faa7f9ff ("sctp: Fix null-ptr-deref in reuseport_add_sock().")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7931d32955e09d0a11b1fe0b6aac1bfa061c005c ]
register store validation for NFT_DATA_VALUE is conditional, however,
the datatype is always either NFT_DATA_VALUE or NFT_DATA_VERDICT. This
only requires a new helper function to infer the register type from the
set datatype so this conditional check can be removed. Otherwise,
pointer to chain object can be leaked through the registers.
Fixes: 96518518cc41 ("netfilter: add nftables")
Reported-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a48ef70b93b8c7ed5190adfca18849e76387b80 ]
The functions that register an XDP memory model take a struct xdp_rxq as
parameter, but the RXQ is not actually used for anything other than pulling
out the struct xdp_mem_info that it embeds. So refactor the register
functions and export variants that just take a pointer to the xdp_mem_info.
This is in preparation for enabling XDP_REDIRECT in bpf_prog_run(), using a
page_pool instance that is not connected to any network device.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103150812.87914-2-toke@redhat.com
Stable-dep-of: 7e9f79428372 ("xdp: Remove WARN() from __xdp_reg_mem_model()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 806a5198c05987b748b50f3d0c0cfb3d417381a4 ]
This removes the bogus check for max > hcon->le_conn_max_interval since
the later is just the initial maximum conn interval not the maximum the
stack could support which is really 3200=4000ms.
In order to pass GAP/CONN/CPUP/BV-05-C one shall probably enter values
of the following fields in IXIT that would cause hci_check_conn_params
to fail:
TSPX_conn_update_int_min
TSPX_conn_update_int_max
TSPX_conn_update_peripheral_latency
TSPX_conn_update_supervision_timeout
Link: https://github.com/bluez/bluez/issues/847
Fixes: e4b019515f95 ("Bluetooth: Enforce validation on max value of connection interval")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 92f1655aa2b2294d0b49925f3b875a634bd3b59e upstream.
__dst_negative_advice() does not enforce proper RCU rules when
sk->dst_cache must be cleared, leading to possible UAF.
RCU rules are that we must first clear sk->sk_dst_cache,
then call dst_release(old_dst).
Note that sk_dst_reset(sk) is implementing this protocol correctly,
while __dst_negative_advice() uses the wrong order.
Given that ip6_negative_advice() has special logic
against RTF_CACHE, this means each of the three ->negative_advice()
existing methods must perform the sk_dst_reset() themselves.
Note the check against NULL dst is centralized in
__dst_negative_advice(), there is no need to duplicate
it in various callbacks.
Many thanks to Clement Lecigne for tracking this issue.
This old bug became visible after the blamed commit, using UDP sockets.
Fixes: a87cb3e48ee8 ("net: Facility to report route quality of connected sockets")
Reported-by: Clement Lecigne <clecigne@google.com>
Diagnosed-by: Clement Lecigne <clecigne@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[Lee: Stable backport]
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 776d451648443f9884be4a1b4e38e8faf1c621f9 upstream.
Bail out on using the tunnel dst template from other than netdev family.
Add the infrastructure to check for the family in objects.
Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[KN: Backport patch according to v5.10.x source]
Signed-off-by: Kuntal Nayak <kuntal.nayak@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0f495f7617229772403e683033abc473f0f0553c ]
There are currently four copies of reuseport_lookup: one each for
(TCP, UDP)x(IPv4, IPv6). This forces us to duplicate all callers of
those functions as well. This is already the case for sk_lookup
helpers (inet,inet6,udp4,udp6)_lookup_run_bpf.
There are two differences between the reuseport_lookup helpers:
1. They call different hash functions depending on protocol
2. UDP reuseport_lookup checks that sk_state != TCP_ESTABLISHED
Move the check for sk_state into the caller and use the INDIRECT_CALL
infrastructure to cut down the helpers to one per IP version.
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-4-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Stable-dep-of: 50aee97d1511 ("udp: Avoid call to compute_score on multiple sites")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce796e60b3b196b61fcc565df195443cbb846ef0 ]
Rename the existing reuseport helpers for IPv4 and IPv6 so that they
can be invoked in the follow up commit. Export them so that building
DCCP and IPv6 as a module works.
No change in functionality.
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-3-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Stable-dep-of: 50aee97d1511 ("udp: Avoid call to compute_score on multiple sites")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 58fbfecab965014b6e3cc956a76b4a96265a1add ]
The software GRO path for esp transport mode uses skb_mac_header_rebuild
prior to re-injecting the packet via the xfrm_napi_dev. This only
copies skb->mac_len bytes of header which may not be sufficient if the
packet contains 802.1Q tags or other VLAN tags. Worse copying only the
initial header will leave a packet marked as being VLAN tagged but
without the corresponding tag leading to mangling when it is later
untagged.
The VLAN tags are important when receiving the decrypted esp transport
mode packet after GRO processing to ensure it is received on the correct
interface.
Therefore record the full mac header length in xfrm*_transport_input for
later use in corresponding xfrm*_transport_finish to copy the entire mac
header when rebuilding the mac header for GRO. The skb->data pointer is
left pointing skb->mac_header bytes after the start of the mac header as
is expected by the network stack and network and transport header
offsets reset to this location.
Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 97af84a6bba2ab2b9c704c08e67de3b5ea551bb2 ]
When touching unix_sk(sk)->inflight, we are always under
spin_lock(&unix_gc_lock).
Let's convert unix_sk(sk)->inflight to the normal unsigned long.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240123170856.41348-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 51eda36d33e43201e7a4fd35232e069b2c850b01 ]
syzbot reported sco_sock_setsockopt() is copying data without
checking user input length.
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset
include/linux/sockptr.h:49 [inline]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr
include/linux/sockptr.h:55 [inline]
BUG: KASAN: slab-out-of-bounds in sco_sock_setsockopt+0xc0b/0xf90
net/bluetooth/sco.c:893
Read of size 4 at addr ffff88805f7b15a3 by task syz-executor.5/12578
Fixes: ad10b1a48754 ("Bluetooth: Add Bluetooth socket voice option")
Fixes: b96e9c671b05 ("Bluetooth: Add BT_DEFER_SETUP option to sco socket")
Fixes: 00398e1d5183 ("Bluetooth: Add support for BT_PKT_STATUS CMSG data for SCO connections")
Fixes: f6873401a608 ("Bluetooth: Allow setting of codec for HFP offload use case")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d8a6213d70accb403b82924a1c229e733433a5ef ]
syzbot is able to trigger an uninit-value in geneve_xmit() [1]
Problem : While most ip tunnel helpers (like ip_tunnel_get_dsfield())
uses skb_protocol(skb, true), pskb_inet_may_pull() is only using
skb->protocol.
If anything else than ETH_P_IPV6 or ETH_P_IP is found in skb->protocol,
pskb_inet_may_pull() does nothing at all.
If a vlan tag was provided by the caller (af_packet in the syzbot case),
the network header might not point to the correct location, and skb
linear part could be smaller than expected.
Add skb_vlan_inet_prepare() to perform a complete mac validation.
Use this in geneve for the moment, I suspect we need to adopt this
more broadly.
v4 - Jakub reported v3 broke l2_tos_ttl_inherit.sh selftest
- Only call __vlan_get_protocol() for vlan types.
Link: https://lore.kernel.org/netdev/20240404100035.3270a7d5@kernel.org/
v2,v3 - Addressed Sabrina comments on v1 and v2
Link: https://lore.kernel.org/netdev/Zg1l9L2BNoZWZDZG@hog/
[1]
BUG: KMSAN: uninit-value in geneve_xmit_skb drivers/net/geneve.c:910 [inline]
BUG: KMSAN: uninit-value in geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030
geneve_xmit_skb drivers/net/geneve.c:910 [inline]
geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030
__netdev_start_xmit include/linux/netdevice.h:4903 [inline]
netdev_start_xmit include/linux/netdevice.h:4917 [inline]
xmit_one net/core/dev.c:3531 [inline]
dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547
__dev_queue_xmit+0x348d/0x52c0 net/core/dev.c:4335
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x8bb0/0x9ef0 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:745
__sys_sendto+0x685/0x830 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1d0 net/socket.c:2199
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3804 [inline]
slab_alloc_node mm/slub.c:3845 [inline]
kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577
__alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668
alloc_skb include/linux/skbuff.h:1318 [inline]
alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504
sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795
packet_alloc_skb net/packet/af_packet.c:2930 [inline]
packet_snd net/packet/af_packet.c:3024 [inline]
packet_sendmsg+0x722d/0x9ef0 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:745
__sys_sendto+0x685/0x830 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1d0 net/socket.c:2199
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
CPU: 0 PID: 5033 Comm: syz-executor346 Not tainted 6.9.0-rc1-syzkaller-00005-g928a87efa423 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Fixes: d13f048dd40e ("net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb")
Reported-by: syzbot+9ee20ec1de7b3168db09@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/000000000000d19c3a06152f9ee4@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Phillip Potter <phil@philpotter.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 151c9c724d05d5b0dd8acd3e11cb69ef1f2dbada ]
We had various syzbot reports about tcp timers firing after
the corresponding netns has been dismantled.
Fortunately Josef Bacik could trigger the issue more often,
and could test a patch I wrote two years ago.
When TCP sockets are closed, we call inet_csk_clear_xmit_timers()
to 'stop' the timers.
inet_csk_clear_xmit_timers() can be called from any context,
including when socket lock is held.
This is the reason it uses sk_stop_timer(), aka del_timer().
This means that ongoing timers might finish much later.
For user sockets, this is fine because each running timer
holds a reference on the socket, and the user socket holds
a reference on the netns.
For kernel sockets, we risk that the netns is freed before
timer can complete, because kernel sockets do not hold
reference on the netns.
This patch adds inet_csk_clear_xmit_timers_sync() function
that using sk_stop_timer_sync() to make sure all timers
are terminated before the kernel socket is released.
Modules using kernel sockets close them in their netns exit()
handler.
Also add sock_not_owned_by_me() helper to get LOCKDEP
support : inet_csk_clear_xmit_timers_sync() must not be called
while socket lock is held.
It is very possible we can revert in the future commit
3a58f13a881e ("net: rds: acquire refcount on TCP sockets")
which attempted to solve the issue in rds only.
(net/smc/af_smc.c and net/mptcp/subflow.c have similar code)
We probably can remove the check_net() tests from
tcp_out_of_resources() and __tcp_close() in the future.
Reported-by: Josef Bacik <josef@toxicpanda.com>
Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/
Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Fixes: 8a68173691f0 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Josef Bacik <josef@toxicpanda.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20240322135732.1535772-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8a1e58345cf40b7b272e08ac7b32328b2543e40 ]
mac802154_llsec_key_del() can free resources of a key directly without
following the RCU rules for waiting before the end of a grace period. This
may lead to use-after-free in case llsec_lookup_key() is traversing the
list of keys in parallel with a key deletion:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 4 PID: 16000 at lib/refcount.c:25 refcount_warn_saturate+0x162/0x2a0
Modules linked in:
CPU: 4 PID: 16000 Comm: wpan-ping Not tainted 6.7.0 #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:refcount_warn_saturate+0x162/0x2a0
Call Trace:
<TASK>
llsec_lookup_key.isra.0+0x890/0x9e0
mac802154_llsec_encrypt+0x30c/0x9c0
ieee802154_subif_start_xmit+0x24/0x1e0
dev_hard_start_xmit+0x13e/0x690
sch_direct_xmit+0x2ae/0xbc0
__dev_queue_xmit+0x11dd/0x3c20
dgram_sendmsg+0x90b/0xd60
__sys_sendto+0x466/0x4c0
__x64_sys_sendto+0xe0/0x1c0
do_syscall_64+0x45/0xf0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
Also, ieee802154_llsec_key_entry structures are not freed by
mac802154_llsec_key_del():
unreferenced object 0xffff8880613b6980 (size 64):
comm "iwpan", pid 2176, jiffies 4294761134 (age 60.475s)
hex dump (first 32 bytes):
78 0d 8f 18 80 88 ff ff 22 01 00 00 00 00 ad de x.......".......
00 00 00 00 00 00 00 00 03 00 cd ab 00 00 00 00 ................
backtrace:
[<ffffffff81dcfa62>] __kmem_cache_alloc_node+0x1e2/0x2d0
[<ffffffff81c43865>] kmalloc_trace+0x25/0xc0
[<ffffffff88968b09>] mac802154_llsec_key_add+0xac9/0xcf0
[<ffffffff8896e41a>] ieee802154_add_llsec_key+0x5a/0x80
[<ffffffff8892adc6>] nl802154_add_llsec_key+0x426/0x5b0
[<ffffffff86ff293e>] genl_family_rcv_msg_doit+0x1fe/0x2f0
[<ffffffff86ff46d1>] genl_rcv_msg+0x531/0x7d0
[<ffffffff86fee7a9>] netlink_rcv_skb+0x169/0x440
[<ffffffff86ff1d88>] genl_rcv+0x28/0x40
[<ffffffff86fec15c>] netlink_unicast+0x53c/0x820
[<ffffffff86fecd8b>] netlink_sendmsg+0x93b/0xe60
[<ffffffff86b91b35>] ____sys_sendmsg+0xac5/0xca0
[<ffffffff86b9c3dd>] ___sys_sendmsg+0x11d/0x1c0
[<ffffffff86b9c65a>] __sys_sendmsg+0xfa/0x1d0
[<ffffffff88eadbf5>] do_syscall_64+0x45/0xf0
[<ffffffff890000ea>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
Handle the proper resource release in the RCU callback function
mac802154_llsec_key_del_rcu().
Note that if llsec_lookup_key() finds a key, it gets a refcount via
llsec_key_get() and locally copies key id from key_entry (which is a
list element). So it's safe to call llsec_key_put() and free the list
entry after the RCU grace period elapses.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 5d637d5aabd8 ("mac802154: add llsec structures and mutators")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Alexander Aring <aahringo@redhat.com>
Message-ID: <20240228163840.6667-1-pchelkin@ispras.ru>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit b8adb69a7d29c2d33eb327bca66476fb6066516b upstream.
Since the introduction of the subflow ULP diag interface, the
dump callback accessed all the subflow data with lockless.
We need either to annotate all the read and write operation accordingly,
or acquire the subflow socket lock. Let's do latter, even if slower, to
avoid a diffstat havoc.
Fixes: 5147dfb50832 ("mptcp: allow dumping subflow context to userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c301f0981fdd3fd1ffac6836b423c4d7a8e0eb63 upstream.
The problem is in nft_byteorder_eval() where we are iterating through a
loop and writing to dst[0], dst[1], dst[2] and so on... On each
iteration we are writing 8 bytes. But dst[] is an array of u32 so each
element only has space for 4 bytes. That means that every iteration
overwrites part of the previous element.
I spotted this bug while reviewing commit caf3ef7468f7 ("netfilter:
nf_tables: prevent OOB access in nft_byteorder_eval") which is a related
issue. I think that the reason we have not detected this bug in testing
is that most of time we only write one element.
Fixes: ce1e7989d989 ("netfilter: nft_byteorder: provide 64bit le/be conversion")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[Ajay: Modified to apply on v5.10.y]
Signed-off-by: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e3f9bed9bee261e3347131764e42aeedf1ffea61 ]
syzbot reported an uninit-value bug below. [0]
llc supports ETH_P_802_2 (0x0004) and used to support ETH_P_TR_802_2
(0x0011), and syzbot abused the latter to trigger the bug.
write$tun(r0, &(0x7f0000000040)={@val={0x0, 0x11}, @val, @mpls={[], @llc={@snap={0xaa, 0x1, ')', "90e5dd"}}}}, 0x16)
llc_conn_handler() initialises local variables {saddr,daddr}.mac
based on skb in llc_pdu_decode_sa()/llc_pdu_decode_da() and passes
them to __llc_lookup().
However, the initialisation is done only when skb->protocol is
htons(ETH_P_802_2), otherwise, __llc_lookup_established() and
__llc_lookup_listener() will read garbage.
The missing initialisation existed prior to commit 211ed865108e
("net: delete all instances of special processing for token ring").
It removed the part to kick out the token ring stuff but forgot to
close the door allowing ETH_P_TR_802_2 packets to sneak into llc_rcv().
Let's remove llc_tr_packet_type and complete the deprecation.
[0]:
BUG: KMSAN: uninit-value in __llc_lookup_established+0xe9d/0xf90
__llc_lookup_established+0xe9d/0xf90
__llc_lookup net/llc/llc_conn.c:611 [inline]
llc_conn_handler+0x4bd/0x1360 net/llc/llc_conn.c:791
llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
__netif_receive_skb_one_core net/core/dev.c:5527 [inline]
__netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5641
netif_receive_skb_internal net/core/dev.c:5727 [inline]
netif_receive_skb+0x58/0x660 net/core/dev.c:5786
tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555
tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002
tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x8ef/0x1490 fs/read_write.c:584
ksys_write+0x20f/0x4c0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x93/0xd0 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Local variable daddr created at:
llc_conn_handler+0x53/0x1360 net/llc/llc_conn.c:783
llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
CPU: 1 PID: 5004 Comm: syz-executor994 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
Fixes: 211ed865108e ("net: delete all instances of special processing for token ring")
Reported-by: syzbot+b5ad66046b913bc04c6f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b5ad66046b913bc04c6f
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240119015515.61898-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d03376c185926098cb4d668d6458801eb785c0a5 ]
This reverts 19f8def031bfa50c579149b200bfeeb919727b27
"Bluetooth: Fix auth_complete_evt for legacy units" which seems to be
working around a bug on a broken controller rather then any limitation
imposed by the Bluetooth spec, in fact if there ws not possible to
re-auth the command shall fail not succeed.
Fixes: 19f8def031bf ("Bluetooth: Fix auth_complete_evt for legacy units")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59b047bc98084f8af2c41483e4d68a5adf2fa7f7 ]
If two Bluetooth devices both support BR/EDR and BLE, and also
support Secure Connections, then they only need to pair once.
The LTK generated during the LE pairing process may be converted
into a BR/EDR link key for BR/EDR transport, and conversely, a
link key generated during the BR/EDR SSP pairing process can be
converted into an LTK for LE transport. Hence, the link type of
the link key and LTK is not fixed, they can be either an LE LINK
or an ACL LINK.
Currently, in the mgmt_new_irk/ltk/crsk/link_key functions, the
link type is fixed, which could lead to incorrect address types
being reported to the application layer. Therefore, it is necessary
to add link_type/addr_type to the smp_irk/ltk/crsk and link_key,
to ensure the generation of the correct address type.
SMP over BREDR:
Before Fix:
> ACL Data RX: Handle 11 flags 0x02 dlen 12
BR/EDR SMP: Identity Address Information (0x09) len 7
Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
Random address: 00:00:00:00:00:00 (Non-Resolvable)
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated key from P-256 (0x03)
After Fix:
> ACL Data RX: Handle 11 flags 0x02 dlen 12
BR/EDR SMP: Identity Address Information (0x09) len 7
Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
Random address: 00:00:00:00:00:00 (Non-Resolvable)
BR/EDR Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
BR/EDR Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated key from P-256 (0x03)
SMP over LE:
Before Fix:
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
Random address: 5F:5C:07:37:47:D5 (Resolvable)
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated key from P-256 (0x03)
@ MGMT Event: New Link Key (0x0009) plen 26
BR/EDR Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated Combination key from P-256 (0x08)
After Fix:
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
Random address: 5E:03:1C:00:38:21 (Resolvable)
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated key from P-256 (0x03)
@ MGMT Event: New Link Key (0x0009) plen 26
Store hint: Yes (0x01)
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated Combination key from P-256 (0x08)
Cc: stable@vger.kernel.org
Signed-off-by: Xiao Yao <xiaoyao@rock-chips.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fad646e16d3cafd67d3cfff8e66f77401190957e ]
This patch replaces some non-inclusive terms based on the appropriate
language mapping table compiled by the Bluetooth SIG:
https://specificationrefs.bluetooth.com/language-mapping/Appropriate_Language_Mapping_Table.pdf
Specifically, these terms are replaced:
master -> initiator
slave -> responder
Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Stable-dep-of: 59b047bc9808 ("Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd4a816752bab609dd6d65ae021387beb9e2ddbd ]
Lorenzo points out that we effectively clear all unknown
flags from PIO when copying them to userspace in the netlink
RTM_NEWPREFIX notification.
We could fix this one at a time as new flags are defined,
or in one fell swoop - I choose the latter.
We could either define 6 new reserved flags (reserved1..6) and handle
them individually (and rename them as new flags are defined), or we
could simply copy the entire unmodified byte over - I choose the latter.
This unfortunately requires some anonymous union/struct magic,
so we add a static assert on the struct size for a little extra safety.
Cc: David Ahern <dsahern@kernel.org>
Cc: Lorenzo Colitti <lorenzo@google.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit e03781879a0d524ce3126678d50a80484a513c4b upstream.
The "NET_DM" generic netlink family notifies drop locations over the
"events" multicast group. This is problematic since by default generic
netlink allows non-root users to listen to these notifications.
Fix by adding a new field to the generic netlink multicast group
structure that when set prevents non-root users or root without the
'CAP_SYS_ADMIN' capability (in the user namespace owning the network
namespace) from joining the group. Set this field for the "events"
group. Use 'CAP_SYS_ADMIN' rather than 'CAP_NET_ADMIN' because of the
nature of the information that is shared over this group.
Note that the capability check in this case will always be performed
against the initial user namespace since the family is not netns aware
and only operates in the initial network namespace.
A new field is added to the structure rather than using the "flags"
field because the existing field uses uAPI flags and it is inappropriate
to add a new uAPI flag for an internal kernel check. In net-next we can
rework the "flags" field to use internal flags and fold the new field
into it. But for now, in order to reduce the amount of changes, add a
new field.
Since the information can only be consumed by root, mark the control
plane operations that start and stop the tracing as root-only using the
'GENL_ADMIN_PERM' flag.
Tested using [1].
Before:
# capsh -- -c ./dm_repo
# capsh --drop=cap_sys_admin -- -c ./dm_repo
After:
# capsh -- -c ./dm_repo
# capsh --drop=cap_sys_admin -- -c ./dm_repo
Failed to join "events" multicast group
[1]
$ cat dm.c
#include <stdio.h>
#include <netlink/genl/ctrl.h>
#include <netlink/genl/genl.h>
#include <netlink/socket.h>
int main(int argc, char **argv)
{
struct nl_sock *sk;
int grp, err;
sk = nl_socket_alloc();
if (!sk) {
fprintf(stderr, "Failed to allocate socket\n");
return -1;
}
err = genl_connect(sk);
if (err) {
fprintf(stderr, "Failed to connect socket\n");
return err;
}
grp = genl_ctrl_resolve_grp(sk, "NET_DM", "events");
if (grp < 0) {
fprintf(stderr,
"Failed to resolve \"events\" multicast group\n");
return grp;
}
err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE);
if (err) {
fprintf(stderr, "Failed to join \"events\" multicast group\n");
return err;
}
return 0;
}
$ gcc -I/usr/include/libnl3 -lnl-3 -lnl-genl-3 -o dm_repo dm.c
Fixes: 9a8afc8d3962 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol")
Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231206213102.1824398-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a partial backport of upstream commit 4d54cc32112d ("mptcp:
avoid lock_fast usage in accept path"). It is only a partial backport
because the patch in the link below was erroneously squash-merged into
upstream commit 4d54cc32112d ("mptcp: avoid lock_fast usage in accept
path"). Below is the original patch description from Florian Westphal:
"
genetlink sets NL_CFG_F_NONROOT_RECV for its netlink socket so anyone can
subscribe to multicast messages.
rtnetlink doesn't allow this unconditionally, rtnetlink_bind() restricts
bind requests to CAP_NET_ADMIN for a few groups.
This allows to set GENL_UNS_ADMIN_PERM flag on genl mcast groups to
mandate CAP_NET_ADMIN.
This will be used by the upcoming mptcp netlink event facility which
exposes the token (mptcp connection identifier) to userspace.
"
Link: https://lore.kernel.org/mptcp/20210213000001.379332-8-mathew.j.martineau@linux.intel.com/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 179d9ba5559a756f4322583388b3213fe4e391b0 upstream.
The dormant flag need to be updated from the preparation phase,
otherwise, two consecutive requests to dorm a table in the same batch
might try to remove the same hooks twice, resulting in the following
warning:
hook not found, pf 3 num 0
WARNING: CPU: 0 PID: 334 at net/netfilter/core.c:480 __nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480
Modules linked in:
CPU: 0 PID: 334 Comm: kworker/u4:5 Not tainted 5.12.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
RIP: 0010:__nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480
This patch is a partial revert of 0ce7cf4127f1 ("netfilter: nftables:
update table flags from the commit phase") to restore the previous
behaviour.
However, there is still another problem: A batch containing a series of
dorm-wakeup-dorm table and vice-versa also trigger the warning above
since hook unregistration happens from the preparation phase, while hook
registration occurs from the commit phase.
To fix this problem, this patch adds two internal flags to annotate the
original dormant flag status which are __NFT_TABLE_F_WAS_DORMANT and
__NFT_TABLE_F_WAS_AWAKEN, to restore it from the abort path.
The __NFT_TABLE_F_UPDATE bitmask allows to handle the dormant flag update
with one single transaction.
Reported-by: syzbot+7ad5cd1615f2d89c6e7e@syzkaller.appspotmail.com
Fixes: 0ce7cf4127f1 ("netfilter: nftables: update table flags from the commit phase")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ce7cf4127f14078ca598ba9700d813178a59409 upstream.
Do not update table flags from the preparation phase. Store the flags
update into the transaction, then update the flags from the commit
phase.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit eb44ad4e635132754bfbcb18103f1dcb7058aedd ]
This field can be read or written without socket lock being held.
Add annotations to avoid load-store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>