On 5/7/25 08:26, Song Liu wrote:
> On Tue, May 6, 2025 at 7:40 AM Maxime Bélair
> <maxime.belair@xxxxxxxxxxxxx> wrote:
>>
>> Add support for the new lsm_manage_policy syscall, providing a unified
>> API for loading and modifying LSM policies without requiring the LSM’s
>> pseudo-filesystem.
>>
>> Benefits:
>>   - Works even if the LSM pseudo-filesystem isn’t mounted or available
>>     (e.g. in containers)
>>   - Offers a logical and unified interface rather than multiple
>>     heterogeneous pseudo-filesystems.
>
> These two do not feel like real benefits:
> - One syscall cannot fit all use cases well...

This syscall is not intended to cover every case, nor to replace existing
kernel interfaces.

Each LSM can decide which operations it wants to support (if any). For
example, when loading policies, an LSM may choose to allow only policies
that further restrict privileges.

> - Not working in containers is often not an issue, but a feature.

Indeed, using this syscall requires appropriate capabilities and will not
permit unprivileged containers to manage policies arbitrarily.

With this syscall, capability checks remain the responsibility of each
LSM. For instance, in the AppArmor patch, a profile can be loaded only if
aa_policy_admin_capable() succeeds (which requires CAP_MAC_ADMIN).
Moreover, by design, policies can be loaded only in the current namespace.

I see this syscall as a middle ground between exposing the entire sysfs,
which creates a large attack surface, and blocking everything.

Landlock’s existing syscalls already improve security by allowing
processes to further restrict their ambient rights while adding only a
modest attack surface.

This syscall is a further step in that direction: it lets LSMs add
restrictive policies without having to expose every other interface.

Again, each module decides which operations to expose through this syscall.
In many cases the operation will still require CAP_SYS_ADMIN or a similar
capability, so environments that choose this interface remain secure
while gaining its advantages.

>>   - Avoids overhead of other kernel interfaces for better efficiency
>
> .. and it is is probably less efficient, because everything need to
> fit in the same API.

As shown below, the syscall can significantly improve the performance of
policy management. A more detailed benchmark is available in [1].

The following table presents the time required to load an AppArmor
profile.

For every cell, the first value is the total time taken by aa-load, and
the value in parentheses is the time spent loading the policy in the
kernel only (total - dry-run).

Results are in microseconds and are averaged over 10 000 runs to reduce
variance.

| t (µs)    | syscall     | pseudofs    | Speedup       |
|-----------|-------------|-------------|---------------|
| 1password | 3333 (192)  | 4257 (1127) | x1.28 (x5.86) |
| Xorg      | 5167 (2020) | 6099 (2961) | x1.18 (x1.47) |

If an LSM wants to allow several operations for a single LSM_POLICY_XXX,
it can multiplex a sub-opcode in flags and select the appropriate
handler; this incurs negligible overhead.

Thanks,

Maxime

[1] https://gitlab.com/-/snippets/4840792
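
To make the flags multiplexing mentioned above more concrete, here is a
minimal user-space sketch of how a handler could be selected from a
sub-opcode. It is only an illustration under assumptions: the bit layout
(low 8 bits of flags), the EX_* names, and the ex_manage_policy() entry
point are all hypothetical and are not taken from the patch series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the low 8 bits of flags carry the sub-opcode. */
#define EX_SUBOP_MASK   0xffu
#define EX_SUBOP(flags) ((flags) & EX_SUBOP_MASK)

/* Hypothetical sub-operations an LSM might expose for one operation. */
enum ex_subop {
	EX_SUBOP_LOAD = 0,
	EX_SUBOP_REPLACE,
	EX_SUBOP_REMOVE,
	EX_SUBOP_COUNT
};

static_assert(EX_SUBOP_COUNT <= EX_SUBOP_MASK + 1,
	      "sub-opcodes must fit in the mask");

typedef int (*ex_handler_t)(const void *buf, size_t size, uint32_t flags);

/* Stub handlers: load and replace need a non-empty buffer. */
static int ex_load(const void *buf, size_t size, uint32_t flags)
{
	(void)flags;
	return (buf && size) ? 0 : -EINVAL;
}

static int ex_replace(const void *buf, size_t size, uint32_t flags)
{
	(void)flags;
	return (buf && size) ? 0 : -EINVAL;
}

static int ex_remove(const void *buf, size_t size, uint32_t flags)
{
	(void)buf; (void)size; (void)flags;
	return 0;
}

static const ex_handler_t ex_handlers[EX_SUBOP_COUNT] = {
	[EX_SUBOP_LOAD]    = ex_load,
	[EX_SUBOP_REPLACE] = ex_replace,
	[EX_SUBOP_REMOVE]  = ex_remove,
};

/* Selecting the handler is one mask, one bounds check, one indirect call. */
int ex_manage_policy(uint32_t flags, const void *buf, size_t size)
{
	uint32_t subop = EX_SUBOP(flags);

	if (subop >= EX_SUBOP_COUNT)
		return -EOPNOTSUPP;

	/* Pass only the remaining flag bits on to the handler. */
	return ex_handlers[subop](buf, size, flags & ~EX_SUBOP_MASK);
}
```

In this sketch, ex_manage_policy(EX_SUBOP_REMOVE, NULL, 0) reaches
ex_remove(), while an unknown sub-opcode is rejected with -EOPNOTSUPP
before any handler runs.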