That's quite a large number of storage units per machine.
My suspicion is that, since you apparently have an unusually high number
of LVs coming online at boot, activating them serially takes long enough
to overlap with the point at which Ceph starts bringing up its
storage-dependent components. Likely not only the OSDs, but other
resources that might keep internal databases and the like.
The cure for that under systemd would be to make Ceph - or at least its
storage-dependent services - wait on LV availability.
The fun part is figuring out how to do that. Offhand, I don't know what
in systemd controls the activation of LVM resources, and it is almost
certainly done asynchronously, so you'd need to provide a detector
service that can tell when the LVs are actually available. Then you'd
have to keep Ceph from starting until that safe point has been reached.
You might be able to add such a dependency to the master ceph target
with a drop-in under /etc/systemd/system, though admittedly that gates
everything on the slowest LV rather than letting each piece come up as
soon as possible but no sooner.
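To make that concrete, here is a rough sketch of the two pieces. Heavy
caveats apply: the script and unit names are made up, the LV count and
timeout are placeholders, it assumes your lvm2 supports -S selection on
lvs, and on a cephadm host the target you need to gate may be the
fsid-specific one rather than ceph.target, so check what
"systemctl list-dependencies ceph.target" reports on your box first.

    #!/bin/sh
    # /usr/local/sbin/wait-for-lvs.sh  (hypothetical helper; chmod +x)
    # Poll until at least EXPECTED logical volumes report as active,
    # giving up after TIMEOUT seconds so a missing LV cannot hang boot forever.
    EXPECTED=60      # placeholder: set to the number of LVs this host activates
    TIMEOUT=300
    elapsed=0
    while [ "$elapsed" -lt "$TIMEOUT" ]; do
        active=$(lvs --noheadings -S lv_active=active -o lv_name 2>/dev/null | wc -l)
        [ "$active" -ge "$EXPECTED" ] && exit 0
        sleep 2
        elapsed=$((elapsed + 2))
    done
    echo "wait-for-lvs: only ${active:-0} of $EXPECTED LVs active after ${TIMEOUT}s" >&2
    exit 1

    # /etc/systemd/system/wait-for-lvs.service  (hypothetical name)
    [Unit]
    Description=Block until expected LVM logical volumes are active
    # Order after LVM's own monitoring unit so we are not polling
    # before activation has even begun.
    After=lvm2-monitor.service

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/local/sbin/wait-for-lvs.sh
    TimeoutStartSec=360

    [Install]
    WantedBy=multi-user.target

    # /etc/systemd/system/ceph.target.d/override.conf
    [Unit]
    # Pull in the detector and do not start Ceph before it has finished.
    Wants=wait-for-lvs.service
    After=wait-for-lvs.service

Follow that with "systemctl daemon-reload" and
"systemctl enable wait-for-lvs.service". One thing worth verifying is
whether the per-daemon Ceph units on your version are actually ordered
after ceph.target ("systemctl show -p After" on one of them will tell
you); if they are not, the gate will not hold them back, and you are
back to gating the individual daemon units, which is the problem
described next.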
In particular, it would be hard to make the individual OSDs wait on
their own LVs: the systemd units for OSDs on an administered system are
generated dynamically and do not persist across reboots, so you would
likely end up needing a worst-case delay instead.
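If counting or enumerating the LVs turns out to be impractical, the
crudest form of that worst-case delay is just a fixed sleep gated the
same way - again only a sketch, with a made-up unit name and an
arbitrary delay value:

    # /etc/systemd/system/ceph-boot-delay.service  (hypothetical; blunt fallback)
    [Unit]
    Description=Fixed worst-case delay before Ceph starts at boot
    # Before= makes ceph.target wait for this oneshot to finish,
    # assuming ceph.target is part of the same boot transaction.
    Before=ceph.target

    [Service]
    Type=oneshot
    ExecStart=/bin/sleep 180

    [Install]
    WantedBy=multi-user.target

It wastes time on every reboot, which is exactly the drawback of a
worst-case delay, but it needs no knowledge of the LV layout at all.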
Regards,
Tim
On 4/10/25 07:45, Alex from North wrote:
Hello Dominique!
OS is quite new - Ubuntu 22.04 with all the latest upgrades.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx