2 MDSs behind on trimming on my Ceph Cluster since the upgrade from 18.2.6 (reef) to 19.2.2 (squid)

Dear all,

 

I have the following issue on my Ceph cluster: since the upgrade (done with cephadm) from 18.2.6 to 19.2.2, two MDSs are repeatedly reported as behind on trimming.

 

Here are some cluster logs:

 

8/5/25 09:00 AM [WRN] overall HEALTH_WARN 2 MDSs behind on trimming
8/5/25 08:50 AM [WRN] overall HEALTH_WARN 2 MDSs behind on trimming
8/5/25 08:40 AM [WRN] mds.cephfs.node2.isqjza(mds.0): Behind on trimming (326/128) max_segments: 128, num_segments: 326
8/5/25 08:40 AM [WRN] mds.cephfs.node1.ojmpnk(mds.0): Behind on trimming (326/128) max_segments: 128, num_segments: 326
8/5/25 08:40 AM [WRN] [WRN] MDS_TRIM: 2 MDSs behind on trimming
8/5/25 08:40 AM [WRN] Health detail: HEALTH_WARN 2 MDSs behind on trimming
8/5/25 08:33 AM [WRN] Health check update: 2 MDSs behind on trimming (MDS_TRIM)
8/5/25 08:33 AM [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)
8/5/25 08:30 AM [INF] overall HEALTH_OK
8/5/25 08:22 AM [INF] Cluster is now healthy
8/5/25 08:22 AM [INF] Health check cleared: MDS_TRIM (was: 1 MDSs behind on trimming)
8/5/25 08:22 AM [INF] MDS health message cleared (mds.?): Behind on trimming (525/128)
8/5/25 08:22 AM [WRN] Health check update: 1 MDSs behind on trimming (MDS_TRIM)
8/5/25 08:22 AM [INF] MDS health message cleared (mds.?): Behind on trimming (525/128)
8/5/25 08:20 AM [WRN] overall HEALTH_WARN 2 MDSs behind on trimming
8/5/25 08:10 AM [WRN] mds.cephfs.node2.isqjza(mds.0): Behind on trimming (332/128) max_segments: 128, num_segments: 332
8/5/25 08:10 AM [WRN] mds.cephfs.node1.ojmpnk(mds.0): Behind on trimming (332/128) max_segments: 128, num_segments: 332
8/5/25 08:10 AM [WRN] [WRN] MDS_TRIM: 2 MDSs behind on trimming
8/5/25 08:10 AM [WRN] Health detail: HEALTH_WARN 2 MDSs behind on trimming
8/5/25 08:03 AM [WRN] Health check update: 2 MDSs behind on trimming (MDS_TRIM)
8/5/25 08:03 AM [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)
8/5/25 08:00 AM [INF] overall HEALTH_OK
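In case it helps, the live segment count can also be polled straight from the MDS perf counters, to see whether the journal keeps growing or just trims slowly (a sketch; I am assuming squid still exposes the current segment/event counts as "seg"/"ev" in the mds_log section, and that the tell interface accepts perf dump as in recent releases):

# ceph tell mds.cephfs.node1.ojmpnk perf dump mds_log   # "seg" = current segments, "ev" = current events (assumed counter names)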

 

# ceph fs status
cephfs - 50 clients
======
RANK      STATE                 MDS              ACTIVITY     DNS    INOS   DIRS   CAPS
 0        active      cephfs.node1.ojmpnk  Reqs:   10 /s   305k   294k  91.8k   6818
 0-s  standby-replay  cephfs.node2.isqjza  Evts:    0 /s   551k   243k  90.6k      0
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata  2630M  2413G
  cephfs_data      data    12.7T  3620G
      STANDBY MDS
cephfs.node3.vdicdn
MDS version: ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)

 

# ceph versions
{
    "mon": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 3
    },
    "mgr": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 2
    },
    "osd": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 18
    },
    "mds": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 3
    },
    "rgw": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 6
    },
    "overall": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 32
    }
}

 

# ceph orch ps --daemon-type mds
NAME                     HOST       PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mds.cephfs.node1.ojmpnk  rke-sh1-1         running (18h)     4m ago  19M    1709M        -  19.2.2   4892a7ef541b  8dd8db30a1de
mds.cephfs.node2.isqjza  rke-sh1-2         running (18h)     2m ago   3y    1720M        -  19.2.2   4892a7ef541b  7b9d5b692764
mds.cephfs.node3.vdicdn  rke-sh1-3         running (18h)   108s ago  18M    27.9M        -  19.2.2   4892a7ef541b  d2de22a15e18

 

root@node1:~# ceph config show-with-defaults mds.cephfs.rke-sh1-3.vdicdn | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
mds_cache_memory_limit     4294967296  default
mds_cache_trim_decay_rate  1.000000    default
mds_cache_trim_threshold   262144      default
mds_recall_max_caps        30000       default
mds_recall_max_decay_rate  1.500000    default

root@node2:~# ceph config show-with-defaults mds.cephfs.rke-sh1-2.isqjza | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
mds_cache_memory_limit     4294967296  default
mds_cache_trim_decay_rate  1.000000    default
mds_cache_trim_threshold   262144      default
mds_recall_max_caps        30000       default
mds_recall_max_decay_rate  1.500000    default

root@node3:~# ceph config show-with-defaults mds.cephfs.rke-sh1-1.ojmpnk | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
mds_cache_memory_limit     4294967296  default
mds_cache_trim_decay_rate  1.000000    default
mds_cache_trim_threshold   262144      default
mds_recall_max_caps        30000       default
mds_recall_max_decay_rate  1.500000    default
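One thing I notice is that the options above all govern cache trimming and cap recall, while the MDS_TRIM warning counts journal segments against mds_log_max_segments (the 128 in the warning), so the journal-side options may be the ones to look at (sketch; mds_log_trim_threshold is an option I believe exists in recent releases, please correct me if not):

# ceph config get mds mds_log_max_segments    # default 128, matches "max_segments: 128" in the warnings
# ceph config get mds mds_log_trim_threshold  # assumed present in squid, throttles journal trimming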

 

# ceph mds stat
cephfs:1 {0=cephfs.node1.ojmpnk=up:active} 1 up:standby-replay 1 up:standby

 

Do you have an idea of what could be happening? Should I increase mds_cache_trim_decay_rate?
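If tuning is the way to go, I suppose it would look something like this (illustrative values only, nothing applied yet):

# ceph config set mds mds_cache_trim_decay_rate 0.5   # lower should let the MDS trim its cache faster, as I understand it
# ceph config set mds mds_log_max_segments 256        # or raise the journal segment threshold behind the warning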

 

I saw the following issue, which is maybe related: Bug #66948 ('"mon.a (mon.0) 326 : cluster [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)" in cluster log' - CephFS - Ceph), and the corresponding backport PR #60838 ("squid: mds: trim mdlog when segments exceed threshold and trim was idle" by vshankar, ceph/ceph on GitHub).
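If it is that bug, I guess a stopgap until a fix lands would be to bounce the active MDS so the rank restarts and replays/trims its journal, e.g. (disruptive, so only as a last resort):

# ceph mds fail cephfs.node1.ojmpnk   # standby-replay should take over rank 0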

 

Thanks for the help 😊

 

Best Regards, Edouard Fazenda.

 


Swiss Cloud Provider

Edouard Fazenda
Technical Support

Chemin du Curé-Desclouds, 2
CH-1226 Thonex
+41 22 869 04 40

www.csti.ch
