kernel_samsung_a53x/tools
Amit Cohen 401c14f9b3 mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
[ Upstream commit 6d6eeabcfaba2fcadf5443b575789ea606f9de83 ]

Lately, a bug was found when many TC filters are added - at some point,
several bugs are printed to dmesg [1] and the switch is crashed with
segmentation fault.

The issue starts when gen_pool_free() fails because of unexpected
behavior - a try to free memory which is already freed, this leads to BUG()
call which crashes the switch and makes many other bugs.

Trying to track down the unexpected behavior led to a bug in eRP code. The
function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated
index, sets the value and returns an error code. When gen_pool_alloc()
fails it returns address 0, we track it and return -ENOBUFS outside, BUT
the call for gen_pool_alloc() already override the index in erp_table
structure. This is a problem when such allocation is done as part of
table expansion. This is not a new table, which will not be used in case
of allocation failure. We try to expand eRP table and override the
current index (non-zero) with zero. Then, it leads to an unexpected
behavior when address 0 is freed twice. Note that address 0 is valid in
erp_table->base_index and indeed other tables use it.

gen_pool_alloc() fails in case that there is no space left in the
pre-allocated pool, in our case, the pool is limited to
ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than max
erp entries are required, we exceed the limit and return an error, this
error leads to "Failed to migrate vregion" print.

Fix this by changing erp_table->base_index only in case of a successful
allocation.

Add a test case for such a scenario. Without this fix it causes
segmentation fault:

$ TESTS="max_erp_entries_test" ./tc_flower.sh
./tc_flower.sh: line 988:  1560 Segmentation fault      tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &>/dev/null

[1]:
kernel BUG at lib/genalloc.c:508!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 3531 Comm: tc Not tainted 6.7.0-rc5-custom-ga6893f479f5e #1
Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
RIP: 0010:gen_pool_free_owner+0xc9/0xe0
...
Call Trace:
 <TASK>
 __mlxsw_sp_acl_erp_table_other_dec+0x70/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_erp_mask_destroy+0xf5/0x110 [mlxsw_spectrum]
 objagg_obj_root_destroy+0x18/0x80 [objagg]
 objagg_obj_destroy+0x12c/0x130 [objagg]
 mlxsw_sp_acl_erp_mask_put+0x37/0x50 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_region_entry_remove+0x74/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_entry_del+0x1e/0x40 [mlxsw_spectrum]
 mlxsw_sp_acl_tcam_ventry_del+0x78/0xd0 [mlxsw_spectrum]
 mlxsw_sp_flower_destroy+0x4d/0x70 [mlxsw_spectrum]
 mlxsw_sp_flow_block_cb+0x73/0xb0 [mlxsw_spectrum]
 tc_setup_cb_destroy+0xc1/0x180
 fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
 __fl_delete+0x1ac/0x1c0 [cls_flower]
 fl_destroy+0xc2/0x150 [cls_flower]
 tcf_proto_destroy+0x1a/0xa0
...
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion

Fixes: f465261aa105 ("mlxsw: spectrum_acl: Implement common eRP core")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/4cfca254dfc0e5d283974801a24371c7b6db5989.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-18 12:12:52 +01:00
..
accounting Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
arch parisc: Drop the HP-UX ENOSYM and EREMOTERELEASE error codes 2024-11-18 12:11:10 +01:00
bootconfig Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
bpf Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
build Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
cgroup Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
crypto Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
debugging Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
edid Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
firewire Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
firmware Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
gpio Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
hv Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
iio tools: iio: iio_generic_buffer ensure alignment 2024-11-18 11:43:05 +01:00
include bpf: Add crosstask check to __bpf_get_stack 2024-11-18 12:12:28 +01:00
io_uring Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
kvm/kvm_stat Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
laptop Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
leds Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
lib libapi: Add missing linux/types.h header to get the __u64 type on io.h 2024-11-18 12:12:49 +01:00
memory-model Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
objtool objtool/x86: add missing embedded_insn check 2024-11-18 10:58:45 +01:00
pci Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
pcmcia Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
perf perf env: Avoid recursively taking env->bpf_progs.lock 2024-11-18 12:12:50 +01:00
power tools/power/turbostat: Fix a knl bug 2024-11-18 11:43:20 +01:00
scripts Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
spi Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
testing mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure 2024-11-18 12:12:52 +01:00
thermal/tmon Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
time Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
usb Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
virtio Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
vm Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
wmi Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00
Makefile Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00