On Thu, May 29, 2025 at 8:03 PM Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx> wrote:
>
> On Thu, 2025-05-29 at 17:45 +0000, Dennis Marttinen wrote:
> > The CephFS kernel driver forgets to set the filesystem magic signature in
> > its superblock. As a result, IMA policy rules based on fsmagic matching do
> > not apply as intended. This causes a major performance regression in Talos
> > Linux [1] when mounting CephFS volumes, such as when deploying Rook Ceph
> > [2]. Talos Linux ships a hardened kernel with the following IMA policy
> > (irrelevant lines omitted):
> >
> > # cat /sys/kernel/security/integrity/ima/policy
> > [...]
> > dont_measure fsmagic=0xc36400 # CEPH_SUPER_MAGIC
> > [...]
> > measure func=FILE_CHECK mask=^MAY_READ euid=0
> > measure func=FILE_CHECK mask=^MAY_READ uid=0
> > [...]
> >
> > Currently, IMA compares 0xc36400 == 0x0 for CephFS files, resulting in all
> > files opened with O_RDONLY or O_RDWR getting measured with SHA512 on every
> > open(2):
> >
> > # cat /data/cephfs/test-file
> > # tail -1 /sys/kernel/security/integrity/ima/ascii_runtime_measurements
> > 10 69990c87e8af323d47e2d6ae4... ima-ng sha512:<hash> /data/cephfs/test-file
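For reference, the 0xc36400 == 0x0 comparison above happens in IMA's
policy rule matching. A simplified sketch of the fsmagic check, modeled
on ima_match_rules() in security/integrity/ima/ima_policy.c (condensed
and renamed here, not the exact upstream code):

  /*
   * Condensed from IMA's rule matcher: a rule carrying a fsmagic
   * condition only applies when it equals the superblock's s_magic.
   */
  static bool ima_rule_fsmagic_matches(const struct ima_rule_entry *rule,
                                       const struct inode *inode)
  {
          if ((rule->flags & IMA_FSMAGIC) &&
              rule->fsmagic != inode->i_sb->s_magic)
                  return false;  /* 0xc36400 != 0x0 on unpatched CephFS */
          return true;
  }

With s_magic left at zero, the dont_measure rule above never matches
CephFS inodes, so the generic FILE_CHECK measure rules fire on every
read instead.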
> >
> > Since O_WRONLY is rare, this results in an order of magnitude lower
> > performance than expected for practically all file operations. Properly
> > setting CEPH_SUPER_MAGIC in the CephFS superblock resolves the regression.
> >
> > Tests were performed on a 3x replicated Ceph v19.3.0 cluster across three
> > i5-7200U nodes, each equipped with one Micron 7400 MAX M.2 disk (BlueStore)
> > and Gigabit Ethernet, on Talos Linux v1.10.2:
> >
> > FS-Mark 3.3
> > Test: 500 Files, Empty
> > Files/s > Higher Is Better
> > 6.12.27-talos . 16.6  |====
> > +twelho patch . 208.4 |====================================================
> >
> > FS-Mark 3.3
> > Test: 500 Files, 1KB Size
> > Files/s > Higher Is Better
> > 6.12.27-talos . 15.6  |=======
> > +twelho patch . 118.6 |====================================================
> >
> > FS-Mark 3.3
> > Test: 500 Files, 32 Sub Dirs, 1MB Size
> > Files/s > Higher Is Better
> > 6.12.27-talos . 12.7 |===============
> > +twelho patch . 44.7 |=====================================================
> >
> > IO500 [3] 2fcd6d6 results (benchmarks within variance omitted):
> >
> > > IO500 benchmark    | 6.12.27-talos  | +twelho patch  | Speedup   |
> > > -------------------|----------------|----------------|-----------|
> > > mdtest-easy-write  | 0.018524 kIOPS | 1.135027 kIOPS | 6027.33 % |
> > > mdtest-hard-write  | 0.018498 kIOPS | 0.973312 kIOPS | 5161.71 % |
> > > ior-easy-read      | 0.064727 GiB/s | 0.155324 GiB/s | 139.97 %  |
> > > mdtest-hard-read   | 0.018246 kIOPS | 0.780800 kIOPS | 4179.29 % |
> >
> > This applies outside of synthetic benchmarks as well; for example, the time
> > to rsync a 55 MiB directory of ~12k mostly small files drops from an
> > unusable 10m5s to a reasonable 26s (23x the throughput).
> >
> > [1]: https://www.talos.dev/
> > [2]: https://www.talos.dev/v1.10/kubernetes-guides/configuration/ceph-with-rook/
> > [3]: https://github.com/IO500/io500
> >
> > Signed-off-by: Dennis Marttinen <twelho@xxxxxxxxxx>
> > ---
> > It took me a year to hunt this down: profiling distributed filesystems is
> > non-trivial. Since the regression is associated with IMA use, I received a
> > hint to CC the folks maintaining the IMA code. The patch targets the 6.12
> > kernel series currently used by Talos Linux, but should apply on top of
> > master as well. Please note that this is an independent contribution -
> > I am not affiliated with any company or organization.
> >
> >  fs/ceph/super.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > index 73f321b52895e..9549f97233a9e 100644
> > --- a/fs/ceph/super.c
> > +++ b/fs/ceph/super.c
> > @@ -1217,6 +1217,7 @@ static int ceph_set_super(struct super_block *s, struct fs_context *fc)
> >  	s->s_time_min = 0;
> >  	s->s_time_max = U32_MAX;
> >  	s->s_flags |= SB_NODIRATIME | SB_NOATIME;
> > +	s->s_magic = CEPH_SUPER_MAGIC;
> >
>
> Yeah, makes sense. Thanks a lot for the fix. It's a really non-trivial issue.
>
> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx>

Applied.

Thanks,

                Ilya
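A footnote on why the zero s_magic could stay hidden for so long (an
observation beyond the thread above, assuming current upstream
behavior): ceph fills in the statfs f_type field itself, so userspace
tools such as stat -f report the expected magic even though the
superblock field that IMA consults was never set. Abbreviated sketch:

  /*
   * Abbreviated from ceph_statfs() in fs/ceph/super.c (unchanged by
   * the patch): statfs(2) output carries the magic regardless of
   * sb->s_magic.
   */
  static int ceph_statfs(struct dentry *dentry, struct kstatfs *buf)
  {
          /* ... */
          buf->f_type = CEPH_SUPER_MAGIC;  /* what stat -f reports */
          /* ... */
          return 0;
  }

IMA, by contrast, reads inode->i_sb->s_magic directly, which is why
only s_magic-based consumers such as fsmagic policy rules ever saw the
zero value.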