Re: ceph 19.2.2 - adding new hard drives messed up the order of existing ones - OSD down

Hi Steven,

The fact that the device name has changed should not be an issue for
Ceph, as it normally relies on LVM labels to keep track of disks.
Device naming consistency has never been guaranteed across reboots,
and this is even more true now with EL9 systems, which implement
asynchronous device probing and discovery at boot to improve boot time.
Anyway, the 'block' and 'block.db' links should still be present and
point to the correct LVM devices /dev/ceph-<VG>/osd-block-<LV>,
whatever the device names are.
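
You can quickly check that the symlink still resolves, for example with
something like this (the <FSID>/<ID>/<VG>/<LV> placeholders are just an
illustration, adapt them to your cluster):

    # Does the OSD data dir still have its 'block' symlink, and does it
    # resolve to an existing LVM device (regardless of the /dev/sdX name)?
    ls -l /var/lib/ceph/<FSID>/osd.<ID>/block
    readlink -f /var/lib/ceph/<FSID>/osd.<ID>/block
    # Which physical disk currently backs that LV:
    lsblk /dev/ceph-<VG>/osd-block-<LV>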

To fix that, you could check Ceph volumes and LVM signatures with
'cephadm ceph-volume lvm list', check that the LVM volumes associated
with that OSD still exist (lvdisplay), recreate the missing 'block'
(and possibly 'block.db') symlink(s) pointing to the correct device(s),
and restart the systemd service associated with that OSD (see the
sketch below).
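
Roughly, something along these lines (a sketch only; substitute your
FSID, OSD id and the VG/LV names reported by ceph-volume, and adapt
the ownership to match the other files in that directory):

    # 1. Find the VG/LV backing the down OSD:
    cephadm ceph-volume lvm list
    lvdisplay
    # 2. Recreate the missing symlink(s) in the OSD data dir
    #    (same for 'block.db' if this OSD has its DB on NVMe):
    ln -s /dev/ceph-<VG>/osd-block-<LV> /var/lib/ceph/<FSID>/osd.<ID>/block
    chown -h ceph:ceph /var/lib/ceph/<FSID>/osd.<ID>/block
    # 3. Restart the OSD's systemd unit:
    systemctl restart ceph-<FSID>@osd.<ID>.service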

Have a look at this documentation [1] written by Eugen.

Best regards,
Frédéric

[1] https://heiterbiswolkig.blogs.nde.ag/2021/02/08/cephadm-reusing-osds-on-reinstalled-server/

--
Frédéric Nass
Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
Try our Ceph Analyzer -- https://analyzer.clyso.com/
https://clyso.com | frederic.nass@xxxxxxxxx

On Thu, Aug 7, 2025 at 6:00 PM, Steven Vacaroaia <stef97@xxxxxxxxx> wrote:
>
> Hi,
>
> I needed to add more spinning HDDs to my nodes (SuperMicro SSG-641E-E1CR36L)
> and made the mistake of NOT setting osd_auto_discovery to "false", so
> ceph created OSDs on all 5 new spinning HDDs.
>
> This was an issue, as I wanted to configure the OSDs the same as the existing
> ones (i.e. with WAL/DB on NVMe) when the other 37 drives arrive.
>
> No big harm done though, because I can zap them and reconfigure (after
> running ceph orch apply osd --all-available-devices --unmanaged=true) when I
> receive the remaining 37 drives (I will be adding 6 drives to each
> server).
>
> The interesting part is that, for whatever reason, one of the existing
> SSD-based OSDs is now down, because the SSD drive it used changed from
> /dev/sdp to /dev/sdu; as a result,
> there is no "block" entry under /var/lib/ceph/FSID/osd.XX.
>
> I am not sure why adding spinning disks messed up the order/naming of the
> SSD disks.
>
> I would appreciate some advice regarding the best course of action
> to reconfigure the OSD that is down.
>
> The cluster is healthy and not busy, with all the other OSDs working as
> expected.
> It has 7 hosts with 12 SSD and 6 HDD drives each (one of them is the host
> with the issue),
> 2 EC 4+2 pools, 2 MDS, and a few metadata pools replicated on NVMe.
> There are also 3 NVMe disks on each host dedicated to pools.
>
>
> Many thanks
> Steven
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx