Some protocols (e.g., TCP, UDP) have their own memory accounting for
socket buffers and charge memory to global per-protocol counters such
as /proc/sys/net/ipv4/tcp_mem.

When running under a non-root cgroup, this memory is also charged to
the memcg as "sock" in memory.stat.

Sockets of such protocols are still subject to the global limits, and
are thus affected by a noisy neighbour outside the cgroup.

This makes it difficult to accurately estimate and configure
appropriate global limits.

This series allows decoupling memcg from the global memory accounting
if the socket is configured as such by a BPF prog.

This simplifies the memcg configuration while keeping the global
limits within a reasonable range, which is only 10% of the physical
memory by default.


Overview of the series:

  patch 1 & 2 are prep
  patch 3 introduces SK_BPF_MEMCG_SOCK_ISOLATED for bpf_setsockopt()
  patch 4 decouples memcg from sk_prot->memory_allocated based on the flag
  patch 5 is selftest


Changes:
  v5:
    * Patch 2
      * Rename new variants to bpf_sock_create_{get,set}sockopt()
    * Patch 3
      * Limit getsockopt() to BPF_CGROUP_INET_SOCK_CREATE
    * Patch 5
      * Use kern_sync_rcu()
      * Double NR_SEND to 128

  v4: https://lore.kernel.org/netdev/20250829010026.347440-1-kuniyu@xxxxxxxxxx/
    * Patch 2
      * Use __bpf_setsockopt() instead of _bpf_setsockopt()
      * Add getsockopt() for a cgroup with multiple bpf progs running
    * Patch 3
      * Only allow inet_create() to set flags
      * Inherit flags from listener to child in sk_clone_lock()
      * Support clearing flags
    * Patch 5
      * Only use inet_create() hook
      * Test bpf_getsockopt()
      * Add serial_ prefix
      * Reduce sleep() and the amount of sent data

  v3: https://lore.kernel.org/netdev/20250826183940.3310118-1-kuniyu@xxxxxxxxxx/
    * Drop patches for accept() hook
    * Patch 1
      * Merge if blocks
    * Patch 2
      * Drop bpf_func_proto for accept()
    * Patch 3
      * Allow flagging without sk->sk_memcg
      * Inherit SK_BPF_MEMCG_SOCK_ISOLATED in __inet_accept()

  v2: https://lore.kernel.org/bpf/20250825204158.2414402-1-kuniyu@xxxxxxxxxx/
    * Patch 2
      * Define BPF_CGROUP_RUN_PROG_INET_SOCK_ACCEPT() when CONFIG_CGROUP_BPF=n
    * Patch 5
      * Make 2 new bpf_func_proto static
    * Patch 6
      * s/mem_cgroup_sk_set_flag/mem_cgroup_sk_set_flags/ when CONFIG_MEMCG=n
      * Use finer CONFIG_CGROUP_BPF instead of CONFIG_BPF_SYSCALL for ifdef

  v1: https://lore.kernel.org/netdev/20250822221846.744252-1-kuniyu@xxxxxxxxxx/


Kuniyuki Iwashima (5):
  tcp: Save lock_sock() for memcg in inet_csk_accept().
  bpf: Support bpf_setsockopt() for BPF_CGROUP_INET_SOCK_CREATE.
  bpf: Introduce SK_BPF_MEMCG_FLAGS and SK_BPF_MEMCG_SOCK_ISOLATED.
  net-memcg: Allow decoupling memcg from global protocol memory
    accounting.
  selftest: bpf: Add test for SK_BPF_MEMCG_SOCK_ISOLATED.

 include/net/proto_memory.h                    |  15 +-
 include/net/sock.h                            |  50 ++++
 include/net/tcp.h                             |  10 +-
 include/uapi/linux/bpf.h                      |   6 +
 net/core/filter.c                             |  82 +++++++
 net/core/sock.c                               |  65 ++++--
 net/ipv4/af_inet.c                            |  37 +++
 net/ipv4/inet_connection_sock.c               |  26 +--
 net/ipv4/tcp.c                                |   3 +-
 net/ipv4/tcp_output.c                         |  10 +-
 net/mptcp/protocol.c                          |   3 +-
 net/tls/tls_device.c                          |   4 +-
 tools/include/uapi/linux/bpf.h                |   6 +
 .../selftests/bpf/prog_tests/sk_memcg.c       | 218 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/sk_memcg.c  |  38 +++
 15 files changed, 517 insertions(+), 56 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_memcg.c
 create mode 100644 tools/testing/selftests/bpf/progs/sk_memcg.c

-- 
2.51.0.338.gd7d06c2dae-goog