
commit 6715df8d5d24655b9fd368e904028112b54c7de1 upstream. This commits updates the following functions to allow reads from uninitialized stack locations when env->allow_uninit_stack option is enabled: - check_stack_read_fixed_off() - check_stack_range_initialized(), called from: - check_stack_read_var_off() - check_helper_mem_access() Such change allows to relax logic in stacksafe() to treat STACK_MISC and STACK_INVALID in a same way and make the following stack slot configurations equivalent: | Cached state | Current state | | stack slot | stack slot | |------------------+------------------| | STACK_INVALID or | STACK_INVALID or | | STACK_MISC | STACK_SPILL or | | | STACK_MISC or | | | STACK_ZERO or | | | STACK_DYNPTR | This leads to significant verification speed gains (see below). The idea was suggested by Andrii Nakryiko [1] and initial patch was created by Alexei Starovoitov [2]. Currently the env->allow_uninit_stack is allowed for programs loaded by users with CAP_PERFMON or CAP_SYS_ADMIN capabilities. A number of test cases from verifier/*.c were expecting uninitialized stack access to be an error. These test cases were updated to execute in unprivileged mode (thus preserving the tests). The test progs/test_global_func10.c expected "invalid indirect read from stack" error message because of the access to uninitialized memory region. This error is no longer possible in privileged mode. The test is updated to provoke an error "invalid indirect access to stack" because of access to invalid stack address (such error is not verified by progs/test_global_func*.c series of tests). The following tests had to be removed because these can't be made unprivileged: - verifier/sock.c: - "sk_storage_get(map, skb->sk, &stack_value, 1): partially init stack_value" BPF_PROG_TYPE_SCHED_CLS programs are not executed in unprivileged mode. - verifier/var_off.c: - "indirect variable-offset stack access, max_off+size > max_initialized" - "indirect variable-offset stack access, uninitialized" These tests verify that access to uninitialized stack values is detected when stack offset is not a constant. However, variable stack access is prohibited in unprivileged mode, thus these tests are no longer valid. * * * Here is veristat log comparing this patch with current master on a set of selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg and cilium BPF binaries (see [3]): $ ./veristat -e file,prog,states -C -f 'states_pct<-30' master.log current.log File Program States (A) States (B) States (DIFF) -------------------------- -------------------------- ---------- ---------- ---------------- bpf_host.o tail_handle_ipv6_from_host 349 244 -105 (-30.09%) bpf_host.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%) bpf_lxc.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%) bpf_sock.o cil_sock4_connect 70 48 -22 (-31.43%) bpf_sock.o cil_sock4_sendmsg 68 46 -22 (-32.35%) bpf_xdp.o tail_handle_nat_fwd_ipv4 1554 803 -751 (-48.33%) bpf_xdp.o tail_lb_ipv4 6457 2473 -3984 (-61.70%) bpf_xdp.o tail_lb_ipv6 7249 3908 -3341 (-46.09%) pyperf600_bpf_loop.bpf.o on_event 287 145 -142 (-49.48%) strobemeta.bpf.o on_event 15915 4772 -11143 (-70.02%) strobemeta_nounroll2.bpf.o on_event 17087 3820 -13267 (-77.64%) xdp_synproxy_kern.bpf.o syncookie_tc 21271 6635 -14636 (-68.81%) xdp_synproxy_kern.bpf.o syncookie_xdp 23122 6024 -17098 (-73.95%) -------------------------- -------------------------- ---------- ---------- ---------------- Note: I limited selection by states_pct<-30%. Inspection of differences in pyperf600_bpf_loop behavior shows that the following patch for the test removes almost all differences: - a/tools/testing/selftests/bpf/progs/pyperf.h + b/tools/testing/selftests/bpf/progs/pyperf.h @ -266,8 +266,8 @ int __on_event(struct bpf_raw_tracepoint_args *ctx) } if (event->pthread_match || !pidData->use_tls) { - void* frame_ptr; - FrameData frame; + void* frame_ptr = 0; + FrameData frame = {}; Symbol sym = {}; int cur_cpu = bpf_get_smp_processor_id(); W/o this patch the difference comes from the following pattern (for different variables): static bool get_frame_data(... FrameData *frame ...) { ... bpf_probe_read_user(&frame->f_code, ...); if (!frame->f_code) return false; ... bpf_probe_read_user(&frame->co_name, ...); if (frame->co_name) ...; } int __on_event(struct bpf_raw_tracepoint_args *ctx) { FrameData frame; ... get_frame_data(... &frame ...) // indirectly via a bpf_loop & callback ... } SEC("raw_tracepoint/kfree_skb") int on_event(struct bpf_raw_tracepoint_args* ctx) { ... ret |= __on_event(ctx); ret |= __on_event(ctx); ... } With regards to value `frame->co_name` the following is important: - Because of the conditional `if (!frame->f_code)` each call to __on_event() produces two states, one with `frame->co_name` marked as STACK_MISC, another with it as is (and marked STACK_INVALID on a first call). - The call to bpf_probe_read_user() does not mark stack slots corresponding to `&frame->co_name` as REG_LIVE_WRITTEN but it marks these slots as BPF_MISC, this happens because of the following loop in the check_helper_call(): for (i = 0; i < meta.access_size; i++) { err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B, BPF_WRITE, -1, false); if (err) return err; } Note the size of the write, it is a one byte write for each byte touched by a helper. The BPF_B write does not lead to write marks for the target stack slot. - Which means that w/o this patch when second __on_event() call is verified `if (frame->co_name)` will propagate read marks first to a stack slot with STACK_MISC marks and second to a stack slot with STACK_INVALID marks and these states would be considered different. [1] https://lore.kernel.org/bpf/CAEf4BzY3e+ZuC6HUa8dCiUovQRg2SzEk7M-dSkqNZyn=xEmnPA@mail.gmail.com/ [2] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/ [3] git@github.com:anakryiko/cilium.git Suggested-by: Andrii Nakryiko <andrii@kernel.org> Co-developed-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230219200427.606541-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
291 lines
9.5 KiB
C
Executable file
291 lines
9.5 KiB
C
Executable file
{
|
|
"variable-offset ctx access",
|
|
.insns = {
|
|
/* Get an unknown value */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
|
|
/* Make it small and 4-byte aligned */
|
|
BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
|
|
/* add it to skb. We now have either &skb->len or
|
|
* &skb->pkt_type, but we don't know which
|
|
*/
|
|
BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2),
|
|
/* dereference it */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1, 0),
|
|
BPF_EXIT_INSN(),
|
|
},
|
|
.errstr = "variable ctx access var_off=(0x0; 0x4)",
|
|
.result = REJECT,
|
|
.prog_type = BPF_PROG_TYPE_LWT_IN,
|
|
},
|
|
{
|
|
"variable-offset stack read, priv vs unpriv",
|
|
.insns = {
|
|
/* Fill the top 8 bytes of the stack */
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
|
/* Get an unknown value */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
|
|
/* Make it small and 4-byte aligned */
|
|
BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
|
|
BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 8),
|
|
/* add it to fp. We now have either fp-4 or fp-8, but
|
|
* we don't know which
|
|
*/
|
|
BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
|
|
/* dereference it for a stack read */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 0),
|
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
|
BPF_EXIT_INSN(),
|
|
},
|
|
.result = ACCEPT,
|
|
.result_unpriv = REJECT,
|
|
.errstr_unpriv = "R2 variable stack access prohibited for !root",
|
|
.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
|
|
},
|
|
{
|
|
"variable-offset stack read, uninitialized",
|
|
.insns = {
|
|
/* Get an unknown value */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
|
|
/* Make it small and 4-byte aligned */
|
|
BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
|
|
BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 8),
|
|
/* add it to fp. We now have either fp-4 or fp-8, but
|
|
* we don't know which
|
|
*/
|
|
BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
|
|
/* dereference it for a stack read */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 0),
|
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
|
BPF_EXIT_INSN(),
|
|
},
|
|
.result = REJECT,
|
|
.errstr = "invalid variable-offset read from stack R2",
|
|
.prog_type = BPF_PROG_TYPE_LWT_IN,
|
|
},
|
|
{
|
|
"variable-offset stack write, priv vs unpriv",
|
|
.insns = {
|
|
/* Get an unknown value */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
|
|
/* Make it small and 8-byte aligned */
|
|
BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 8),
|
|
BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16),
|
|
/* Add it to fp. We now have either fp-8 or fp-16, but
|
|
* we don't know which
|
|
*/
|
|
BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
|
|
/* Dereference it for a stack write */
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
|
|
/* Now read from the address we just wrote. This shows
|
|
* that, after a variable-offset write, a priviledged
|
|
* program can read the slots that were in the range of
|
|
* that write (even if the verifier doesn't actually know
|
|
* if the slot being read was really written to or not.
|
|
*/
|
|
BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_2, 0),
|
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
|
BPF_EXIT_INSN(),
|
|
},
|
|
/* Variable stack access is rejected for unprivileged.
|
|
*/
|
|
.errstr_unpriv = "R2 variable stack access prohibited for !root",
|
|
.result_unpriv = REJECT,
|
|
.result = ACCEPT,
|
|
},
|
|
{
|
|
"variable-offset stack write clobbers spilled regs",
|
|
.insns = {
|
|
/* Dummy instruction; needed because we need to patch the next one
|
|
* and we can't patch the first instruction.
|
|
*/
|
|
BPF_MOV64_IMM(BPF_REG_6, 0),
|
|
/* Make R0 a map ptr */
|
|
BPF_LD_MAP_FD(BPF_REG_0, 0),
|
|
/* Get an unknown value */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
|
|
/* Make it small and 8-byte aligned */
|
|
BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 8),
|
|
BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16),
|
|
/* Add it to fp. We now have either fp-8 or fp-16, but
|
|
* we don't know which.
|
|
*/
|
|
BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
|
|
/* Spill R0(map ptr) into stack */
|
|
BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
|
|
/* Dereference the unknown value for a stack write */
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
|
|
/* Fill the register back into R2 */
|
|
BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_10, -8),
|
|
/* Try to dereference R2 for a memory load */
|
|
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 8),
|
|
BPF_EXIT_INSN(),
|
|
},
|
|
.fixup_map_hash_8b = { 1 },
|
|
/* The unpriviledged case is not too interesting; variable
|
|
* stack access is rejected.
|
|
*/
|
|
.errstr_unpriv = "R2 variable stack access prohibited for !root",
|
|
.result_unpriv = REJECT,
|
|
/* In the priviledged case, dereferencing a spilled-and-then-filled
|
|
* register is rejected because the previous variable offset stack
|
|
* write might have overwritten the spilled pointer (i.e. we lose track
|
|
* of the spilled register when we analyze the write).
|
|
*/
|
|
.errstr = "R2 invalid mem access 'inv'",
|
|
.result = REJECT,
|
|
},
|
|
{
|
|
"indirect variable-offset stack access, unbounded",
|
|
.insns = {
|
|
BPF_MOV64_IMM(BPF_REG_2, 6),
|
|
BPF_MOV64_IMM(BPF_REG_3, 28),
|
|
/* Fill the top 16 bytes of the stack. */
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0),
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
|
/* Get an unknown value. */
|
|
BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_1, offsetof(struct bpf_sock_ops,
|
|
bytes_received)),
|
|
/* Check the lower bound but don't check the upper one. */
|
|
BPF_JMP_IMM(BPF_JSLT, BPF_REG_4, 0, 4),
|
|
/* Point the lower bound to initialized stack. Offset is now in range
|
|
* from fp-16 to fp+0x7fffffffffffffef, i.e. max value is unbounded.
|
|
*/
|
|
BPF_ALU64_IMM(BPF_SUB, BPF_REG_4, 16),
|
|
BPF_ALU64_REG(BPF_ADD, BPF_REG_4, BPF_REG_10),
|
|
BPF_MOV64_IMM(BPF_REG_5, 8),
|
|
/* Dereference it indirectly. */
|
|
BPF_EMIT_CALL(BPF_FUNC_getsockopt),
|
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
|
BPF_EXIT_INSN(),
|
|
},
|
|
.errstr = "invalid unbounded variable-offset indirect access to stack R4",
|
|
.result = REJECT,
|
|
.prog_type = BPF_PROG_TYPE_SOCK_OPS,
|
|
},
|
|
{
|
|
"indirect variable-offset stack access, max out of bound",
|
|
.insns = {
|
|
/* Fill the top 8 bytes of the stack */
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
|
/* Get an unknown value */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
|
|
/* Make it small and 4-byte aligned */
|
|
BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
|
|
BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 8),
|
|
/* add it to fp. We now have either fp-4 or fp-8, but
|
|
* we don't know which
|
|
*/
|
|
BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
|
|
/* dereference it indirectly */
|
|
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
|
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
|
BPF_EXIT_INSN(),
|
|
},
|
|
.fixup_map_hash_8b = { 5 },
|
|
.errstr = "invalid variable-offset indirect access to stack R2",
|
|
.result = REJECT,
|
|
.prog_type = BPF_PROG_TYPE_LWT_IN,
|
|
},
|
|
{
|
|
"indirect variable-offset stack access, min out of bound",
|
|
.insns = {
|
|
/* Fill the top 8 bytes of the stack */
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
|
/* Get an unknown value */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
|
|
/* Make it small and 4-byte aligned */
|
|
BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
|
|
BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 516),
|
|
/* add it to fp. We now have either fp-516 or fp-512, but
|
|
* we don't know which
|
|
*/
|
|
BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
|
|
/* dereference it indirectly */
|
|
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
|
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
|
BPF_EXIT_INSN(),
|
|
},
|
|
.fixup_map_hash_8b = { 5 },
|
|
.errstr = "invalid variable-offset indirect access to stack R2",
|
|
.result = REJECT,
|
|
.prog_type = BPF_PROG_TYPE_LWT_IN,
|
|
},
|
|
{
|
|
"indirect variable-offset stack access, min_off < min_initialized",
|
|
.insns = {
|
|
/* Fill only the top 8 bytes of the stack. */
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
|
/* Get an unknown value */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
|
|
/* Make it small and 4-byte aligned. */
|
|
BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
|
|
BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16),
|
|
/* Add it to fp. We now have either fp-12 or fp-16, but we don't know
|
|
* which. fp-16 size 8 is partially uninitialized stack.
|
|
*/
|
|
BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
|
|
/* Dereference it indirectly. */
|
|
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
|
BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
|
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
|
BPF_EXIT_INSN(),
|
|
},
|
|
.fixup_map_hash_8b = { 5 },
|
|
.errstr = "invalid indirect read from stack R2 var_off",
|
|
.result = REJECT,
|
|
.prog_type = BPF_PROG_TYPE_LWT_IN,
|
|
},
|
|
{
|
|
"indirect variable-offset stack access, priv vs unpriv",
|
|
.insns = {
|
|
/* Fill the top 16 bytes of the stack. */
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0),
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
|
/* Get an unknown value. */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
|
|
/* Make it small and 4-byte aligned. */
|
|
BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
|
|
BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16),
|
|
/* Add it to fp. We now have either fp-12 or fp-16, we don't know
|
|
* which, but either way it points to initialized stack.
|
|
*/
|
|
BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
|
|
/* Dereference it indirectly. */
|
|
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
|
BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
|
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
|
BPF_EXIT_INSN(),
|
|
},
|
|
.fixup_map_hash_8b = { 6 },
|
|
.errstr_unpriv = "R2 variable stack access prohibited for !root",
|
|
.result_unpriv = REJECT,
|
|
.result = ACCEPT,
|
|
.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
|
|
},
|
|
{
|
|
"indirect variable-offset stack access, ok",
|
|
.insns = {
|
|
/* Fill the top 16 bytes of the stack. */
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0),
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
|
/* Get an unknown value. */
|
|
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
|
|
/* Make it small and 4-byte aligned. */
|
|
BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
|
|
BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16),
|
|
/* Add it to fp. We now have either fp-12 or fp-16, we don't know
|
|
* which, but either way it points to initialized stack.
|
|
*/
|
|
BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
|
|
/* Dereference it indirectly. */
|
|
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
|
BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
|
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
|
BPF_EXIT_INSN(),
|
|
},
|
|
.fixup_map_hash_8b = { 6 },
|
|
.result = ACCEPT,
|
|
.prog_type = BPF_PROG_TYPE_LWT_IN,
|
|
},
|