Re: Failing upgrade 18.2.7 to 19.2.3 - failing to activate via raw takes 5 minutes (before proceeding to lvm)

Hi all

Yes, there was also a similar O(n^2) bug caused by indentation (in
lsblk_all, if I recall correctly). That time it took well over 15 minutes
to get through, so it was even worse. This time it's not quite such a
subtle bug.

I plan to write up a bug report with the details; I just got my bug
tracker account approved.

I suspect only users with large JBODs plus multipath see this issue as
badly as I do. If I didn't have multipath devices that repeatedly
re-trigger the expensive "disk.get_devices()" call, it would "only" have
taken an extra ~30 seconds to launch an OSD daemon. Not good, but that
would still be well within the systemd timeout and wouldn't break the
daemon completely. It's slow because "ceph-volume activate" also attempts
to find raw devices before proceeding to LVM.

The change was introduced in https://github.com/ceph/ceph/pull/60395 and I
can confirm from
https://github.com/ceph/ceph/blob/v19.2.2/src/ceph-volume/ceph_volume/devices/raw/list.py
that 19.2.2 does not have the specific problematic code. In 19.2.2 it's
still technically O(n^2), but since it uses a local info_devices variable
that is generated just once, it doesn't have the multipath issue that makes
things roughly 10x worse.
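
For comparison, this is roughly the shape of the 19.2.2 approach as I read
it (a loose paraphrase with illustrative lsblk-style field names, not the
actual raw/list.py): the scan output lands in a local variable once, so the
nested loops only do cheap in-memory work.

# Loose paraphrase of the 19.2.2-style shape (illustrative only, not the
# real raw/list.py): one scan pass up front, then nested loops over it.
def list_raw_candidates(scan_once):
    info_devices = scan_once()                 # one scan pass, kept locally
    candidates = []
    for dev in info_devices:                   # outer loop: n devices
        name = dev.get("NAME")
        # inner loop: n devices again (technically O(n^2)), but each step
        # is only an in-memory comparison, never another full scan
        has_children = any(name and other.get("PKNAME") == name
                           for other in info_devices)
        if has_children:
            continue                           # e.g. skip parents of partitions
        candidates.append(dev)
    return candidates

# example: only the leaf partition survives
print(list_raw_candidates(lambda: [
    {"NAME": "/dev/sda"},
    {"NAME": "/dev/sda1", "PKNAME": "/dev/sda"},
]))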

I worked around this problem by setting up a little local container mirror
where I monkeypatched the raw part out of ceph-volume activate.
Dockerfile:

FROM quay.io/ceph/ceph:v19.2.3
RUN sed -i '46,52d' /usr/lib/python3.9/site-packages/ceph_volume/activate/main.py

This just deletes the "first try raw" section from ceph-volume activate:
https://github.com/ceph/ceph/blob/50d6a3d454763cea76ca45a846cde9702364c773/src/ceph-volume/ceph_volume/activate/main.py#L46-L52
Since, like all recommended setups these days, I use LVM for all devices, I
don't need the raw path at all (I don't understand why raw must be tried
first). "ceph-volume raw list" still takes 5 minutes (and correctly outputs
0 devices, since I don't use raw), but I don't care about that because I
will only use "ceph-volume lvm list". At least this way activation is fast.

On Sun, Sep 14, 2025 at 11:46 AM Michel Jouvin <
michel.jouvin@xxxxxxxxxxxxxxx> wrote:

> Hi Mikael,
>
> Thanks for the report. I was also considering upgrading from 19.2.2 to
> 19.2.3. This should be related to a change between those two versions, as
> I experienced no problem during the 18.2.7 to 19.2.2 upgrade... It reminds
> me of a problem in one of the Quincy updates, if I remember right, with
> something similar (but probably a different cause), where the device
> activation was running the same command far too many times (at that time
> it was a trivial indentation issue in the code)... but at the very least
> it seems that activation of many OSDs per node was insufficiently tested.
> I don't know if testing has improved...
>
> Best regards,
>
> Michel
>
> Le 14/09/2025 à 10:23, Eugen Block a écrit :
> > This is interesting, I was planning to upgrade our own cluster from
> > 18.2.7 to 19.2.3 next week as well, so now I'm hesitating. We don't
> > have that many OSDs per node, though, so we probably won't hit the
> > issue you're describing. But I can confirm that 'cephadm ceph-volume
> > raw list' on my virtual test environment with only 3 OSDs per node
> > takes around 11 seconds (with empty output). On Reef the output is not
> > empty (probably because exclude_lvm_osd_devices is not present there,
> > as I understand it) and it only takes 4 seconds to complete with
> > around 10 OSDs per node.
> > I'll have to check with my colleagues whether we should still move
> > forward with the upgrade...
> >
> > Thanks for reporting that! Did you check if there's a tracker issue
> > for that?
> >
> > Thanks,
> > Eugen
> >
> > Zitat von Mikael Öhman <micketeer@xxxxxxxxx>:
> >
> >> I'm fighting with a ceph upgrade, going from 18.2.7 to 19.2.3.
> >>
> >> This time, again, the ceph-volume activate step is taking too long,
> >> which makes the systemd service time out, so the orch daemon fails
> >> (though the OSD does eventually come up, the daemon is still dead, and
> >> the upgrade halts).
> >>
> >> I can also reproduce the slow startup with
> >> cephadm ceph-volume raw list
> >>
> >> (I don't use raw devices, but the ceph-volume activation method
> >> hardcodes
> >> checking raw first
> >>
> https://github.com/ceph/ceph/blob/4d5ad8c1ef04f38d14402f0d89f2df2b7d254c2c/src/ceph-volume/ceph_volume/activate/main.py#L46
> >>
> >> )
> >>
> >> That takes 6s on 18.2.7, but 4m32s on 19.2.3!
> >> I have 42 spinning drives per host (with multipath).
> >>
> >> It's spending all of its time in the new method
> >> self.exclude_lvm_osd_devices(),
> >> and given all the duplication from multipath and mapper names, the
> >> list of items to scan ends up at 308 entries in my setup.
> >>
> >> With good old print debugging, I found that while the threadpool
> >> speeds things up a bit, it simply takes too long to construct all
> >> those Device() objects.
> >> Even creating a single Device() object calls disk.get_devices() at
> >> least once. That list does not include all devices (it filters out
> >> things like "/dev/mapper/mpathxx"), but the code always regenerates
> >> the (same) device list whenever a path isn't found:
> >>
> >>        if not sys_info.devices.get(self.path):
> >>            sys_info.devices = disk.get_devices()
> >>
> >> With the multipath names in the list, this forces the device list to
> >> be regenerated >400 times (the initial 32 in parallel, followed by
> >> about 400 more that will never match a device name).
> >> In the end, it's again O(n^2) computational time to list these raw
> >> devices with ceph-volume.
> >> So with 32 threads in the pool, it also now means running under heavy
> >> load for 5 minutes before completing this trivial task, every time the
> >> daemon needs to start.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



