Dear all,

I have the following issue on my Ceph cluster: the MDSs are behind on trimming since the upgrade (with cephadm) from 18.2.6 to 19.2.2.

Here are some cluster logs (the excerpt did not paste correctly; only a run of [WRN]/[INF] severity tags came through, the message text is missing):

# ceph fs status
cephfs - 50 clients
======
RANK  STATE           MDS                  ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active          cephfs.node1.ojmpnk  Reqs:   10 /s  305k   294k   91.8k  6818
0-s   standby-replay  cephfs.node2.isqjza  Evts:    0 /s  551k   243k   90.6k     0

      POOL          TYPE      USED   AVAIL
cephfs_metadata   metadata   2630M   2413G
  cephfs_data       data     12.7T   3620G

STANDBY MDS
cephfs.node3.vdicdn
MDS version: ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)

# ceph versions
{
    "mon": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 3
    },
    "mgr": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 2
    },
    "osd": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 18
    },
    "mds": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 3
    },
    "rgw": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 6
    },
    "overall": {
        "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 32
    }
}

# ceph orch ps --daemon-type mds
NAME                     HOST       PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mds.cephfs.node1.ojmpnk  rke-sh1-1         running (18h)  4m ago     19M  1709M    -        19.2.2   4892a7ef541b  8dd8db30a1de
mds.cephfs.node2.isqjza  rke-sh1-2         running (18h)  2m ago     3y   1720M    -        19.2.2   4892a7ef541b  7b9d5b692764
mds.cephfs.node3.vdicdn  rke-sh1-3         running (18h)  108s ago   18M  27.9M    -        19.2.2   4892a7ef541b  d2de22a15e18

root@node1:~# ceph config show-with-defaults mds.cephfs.rke-sh1-3.vdicdn | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
mds_cache_memory_limit       4294967296  default
mds_cache_trim_decay_rate    1.000000    default
mds_cache_trim_threshold     262144      default
mds_recall_max_caps          30000       default
mds_recall_max_decay_rate    1.500000    default

root@node2:~# ceph config show-with-defaults mds.cephfs.rke-sh1-2.isqjza | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
mds_cache_memory_limit       4294967296  default
mds_cache_trim_decay_rate    1.000000    default
mds_cache_trim_threshold     262144      default
mds_recall_max_caps          30000       default
mds_recall_max_decay_rate    1.500000    default

root@node3:~# ceph config show-with-defaults mds.cephfs.rke-sh1-1.ojmpnk | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
mds_cache_memory_limit       4294967296  default
mds_cache_trim_decay_rate    1.000000    default
mds_cache_trim_threshold     262144      default
mds_recall_max_caps          30000       default
mds_recall_max_decay_rate    1.500000    default

# ceph mds stat
cephfs:1 {0=cephfs.node1.ojmpnk=up:active} 1 up:standby-replay 1 up:standby

Do you have an idea of what could be happening? Should I increase mds_cache_trim_decay_rate?

I saw the following issue, which may be related:

Bug #66948: mon.a (mon.0) 326 : cluster [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)" in cluster log - CephFS - Ceph
(squid: mds: trim mdlog when segments exceed threshold and trim was idle by vshankar · Pull Request #60838 · ceph/ceph · GitHub)

Thanks for the help 😊

Best Regards,
Edouard Fazenda.
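
PS: To make the question concrete, here is a rough sketch of what I am considering trying, assuming the cache/log trim settings are the right knobs to start with. The daemon name is taken from my "ceph fs status" output above, and the values (524288, 2.0) are only illustrative doublings of the defaults shown earlier, not recommendations:

    # check how far behind trimming is (MDS_TRIM detail) and inspect the MDS log counters
    ceph health detail
    ceph tell mds.cephfs.node1.ojmpnk perf dump        # look at the mds_log section for the current segment count

    # raise the trim threshold and decay rate for all MDS daemons
    ceph config set mds mds_cache_trim_threshold 524288
    ceph config set mds mds_cache_trim_decay_rate 2.0

    # revert to the defaults afterwards if it does not help
    ceph config rm mds mds_cache_trim_threshold
    ceph config rm mds mds_cache_trim_decay_rate

Please tell me if this is the wrong direction, in particular whether touching mds_cache_trim_decay_rate makes sense here at all.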