Re: OSD failed: still recovering

Hi Alan,

----- On 25 Mar 25, at 16:47, Alan Murrell Alan@xxxxxxxx wrote:

> OK, so just an update that the recovery did finally complete, and I am pretty
> sure that the "inconsistent" PGs were PGs that the failed OSD was part of.
> Running 'ceph pg repair' sorted them out, along with the 600+ "scrub errors"
> I had.
> 
> I was able to remove the OSD from the cluster, and am now just awaiting a
> replacement drive.  My cluster is now showing healthy.
> 
> Related question: the OSD had its DB/WAL on a partition on an SSD.  Would I just
> "zap" the partition like I would a drive, so it is available to be used again
> when I replace the HDD,

Yes. Find the 'db device' path for this specific OSD in the output of 'cephadm ceph-volume lvm list', then zap it with:

cephadm ceph-volume lvm zap --destroy /dev/ceph-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/osd-db-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
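
In case the listing is long, filtering it on the OSD id makes the right 'db device' entry easier to spot (osd.12 below is just a placeholder, adjust it to the id of the failed OSD):

cephadm ceph-volume lvm list | grep -B 5 -A 20 'osd\.12'

The 'db device' line in that block gives the exact path to use in the zap command above.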

Refresh the device info with 'ceph orch device ls --hostname=<hostname> --wide --refresh' and the orchestrator should bootstrap a new OSD right away.
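
For example, with <hostname> kept as a placeholder, the whole sequence after the zap would look something like this:

ceph orch device ls --hostname=<hostname> --wide --refresh
ceph osd tree   # the replacement OSD should show up here once deployed
ceph -s         # and the cluster should return to HEALTH_OK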

Regards,
Frédéric.

> or is there another method for "reclaiming" that DB/WAL
> partition?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx