Re: Ceph MDS stuck in reconnect -> rejoin -> failover loop

ISSUE FIXED

After rebooting all clients and running "ceph config set mon mds_beacon_grace 120", the MDS finally went active.
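For anyone hitting the same loop, here is a minimal sketch of applying and checking that setting. It assumes an admin node with the ceph CLI and an admin keyring. The verification step and the final revert are my own assumptions, not steps reported in this thread.

```shell
# Sketch only: assumes the ceph CLI and an admin keyring on this host.

# Give the MDS more time between beacons before the monitors mark it
# laggy (value from this thread; the default is 15 seconds).
ceph config set mon mds_beacon_grace 120

# Confirm the value stored in the monitor config database.
ceph config get mon mds_beacon_grace

# Check whether the rank has left reconnect/rejoin and gone active.
ceph fs status

# Assumption, not from this thread: once the MDS has been stable for a
# while, remove the override so the default grace applies again.
ceph config rm mon mds_beacon_grace
```

Keeping the override only as long as needed avoids masking a genuinely hung MDS later, since a long grace also delays failover when an MDS really dies.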

________________________________
From: Kasper Rasmussen <kasper_steengaard@xxxxxxxxxxx>
Sent: Monday, March 31, 2025 09:46
To: ceph-users <ceph-users@xxxxxxx>
Subject:  Ceph MDS stuck in reconnect -> rejoin -> failover loop

Hi

Ceph Pacific 16.2.15

I have 5 MDS hosts: 4 active (one per file system, 4 file systems in total) and 1 standby.

One MDS was restarted today as part of OS patching, which triggered a failover. This is usually not an issue, but today the affected FS got stuck in a reconnect -> rejoin -> failover loop.

"ceph fs status" shows that while the FS is in the "rejoin" state, the INOS count rises to over 50M (it is usually around 10-12M).

The MDS eats up the host's memory: the MDS cache size is 36 GB, but memory usage rises to over 140 GB.

Finally it fails over, and the cycle starts again.

We are currently restarting all clients, in an effort to rule out buggy clients.


Any help with this issue would be very much appreciated. Thank you.



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx