On 6/23/25 15:23, Al Viro wrote:
On Mon, Jun 23, 2025 at 10:37:47PM +0100, Al Viro wrote:
Could you explain what exclusion are you trying to get there?
The mechanism is currently broken, but what is it trying to achieve?
While we are at it:
root@kvm1:~# cd /sys/kernel/security/apparmor/policy
root@kvm1:/sys/kernel/security/apparmor/policy# (for i in `seq 270`; do mkdir namespaces/$i; cd namespaces/$i; done)
root@kvm1:/sys/kernel/security/apparmor/policy# rmdir namespaces/1
[ 40.980453] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI
[ 40.980457] CPU: 3 UID: 0 PID: 2223 Comm: rmdir Not tainted 6.12.27-amd64 #11
[ 40.980459] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.164
[ 40.980460] RIP: 0010:inode_set_ctime_current+0x2c/0x100
[ 40.980490] Code: 1e fa 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 31 db 48 8f
[ 40.980491] RSP: 0018:ffffc1cbc2cfbff8 EFLAGS: 00010292
[ 40.980493] RAX: 0000000000400000 RBX: 0000000000000000 RCX: ffff9dbcc358ac70
[ 40.980494] RDX: 0000000000000001 RSI: ffff9dbcc48c0300 RDI: ffffc1cbc2cfbff8
[ 40.980495] RBP: ffffc1cbc2cfc028 R08: 0000000000000000 R09: ffffffffa484c6c0
[ 40.980495] R10: ffff9dbcc0729cc0 R11: 0000000000000002 R12: ffff9dbcc4a75b28
[ 40.980496] R13: ffff9dbcc4a75b28 R14: ffff9dbcc01fe600 R15: ffff9dbcc51a9e00
[ 40.980498] FS: 00007ffb70ea4740(0000) GS:ffff9dbfefd80000(0000) knlGS:00000
[ 40.980499] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 40.980499] CR2: ffffc1cbc2cfbfe8 CR3: 000000010619a000 CR4: 00000000000006f0
[ 40.980501] Call Trace:
[ 40.980510] <TASK>
[ 40.980513] simple_unlink+0x24/0x50
[ 40.980526] aafs_remove+0x9a/0xb0
[ 40.980543] __aafs_ns_rmdir+0x2ec/0x3b0
[ 40.980548] destroy_ns.part.0+0x9f/0xc0
[ 40.980558] __aa_remove_ns+0x44/0x90
[ 40.980560] destroy_ns.part.0+0x40/0xc0
[ 40.980562] __aa_remove_ns+0x44/0x90
[ 40.980563] destroy_ns.part.0+0x40/0xc0
.....
[ 40.981324] ns_rmdir_op+0x189/0x300
[ 40.981327] vfs_rmdir+0x9b/0x200
[ 40.981335] do_rmdir+0x1ac/0x1c0
[ 40.981340] __x64_sys_rmdir+0x3f/0x70
[ 40.981342] do_syscall_64+0x82/0x190
[ 40.981360] ? do_fault+0x31a/0x550
[ 40.981372] ? __handle_mm_fault+0x7c2/0xf70
[ 40.981373] ? syscall_exit_to_user_mode_prepare+0x149/0x170
[ 40.981388] ? __count_memcg_events+0x53/0xf0
[ 40.981392] ? count_memcg_events.constprop.0+0x1a/0x30
[ 40.981394] ? handle_mm_fault+0x1bb/0x2c0
[ 40.981396] ? do_user_addr_fault+0x36c/0x620
[ 40.981408] ? exc_page_fault+0x7e/0x180
[ 40.981412] entry_SYSCALL_64_after_hwframe+0x76/0x7e
.....
[ 40.981486] Kernel panic - not syncing: Fatal exception in interrupt
I realize that anyone who can play with apparmor config can screw the
box into the ground in a lot of ways, but... when you have a recursion
kernel-side, it would be nice to have its depth bounded. Not even root
should be able to panic the box with a single call of rmdir(2)...
Indeed, we can cap this something about ~3x what realistically we would see with
nesting of user namespaces. Some where between between 8-16 deep should be enough.
Something like
diff --git a/security/apparmor/include/policy_ns.h b/security/apparmor/include/policy_ns.h
index d646070fd966..081a0d5988d4 100644
--- a/security/apparmor/include/policy_ns.h
+++ b/security/apparmor/include/policy_ns.h
@@ -19,6 +19,9 @@
#include "policy.h"
+/* maximum nesting of policy namespaces */
+#define MAX_NS_DEPTH 8
+
/* struct aa_ns_acct - accounting of profiles in namespace
* @max_size: maximum space allowed for all profiles in namespace
* @max_count: maximum number of profiles that can be in this namespace
diff --git a/security/apparmor/policy_ns.c b/security/apparmor/policy_ns.c
index 64783ca3b0f2..89a6fecfd39a 100644
--- a/security/apparmor/policy_ns.c
+++ b/security/apparmor/policy_ns.c
@@ -223,6 +223,9 @@ static struct aa_ns *__aa_create_ns(struct aa_ns *parent, const char *name,
AA_BUG(!name);
AA_BUG(!mutex_is_locked(&parent->lock));
+ if (ns->level >= MAX_NS_LEVEL)
+ return ERR_PTR(-EPERM);
+
ns = alloc_ns(parent->base.hname, name);
if (!ns)
return ERR_PTR(-ENOMEM);
would do for the creation side, which should be enough. But We could also
throw in a bug check and bail against the the ns->level on the rmdir as well.