Failing upgrade 18.2.7 to 19.2.3 - failing to activate via raw takes 5 minutes (before proceeding to lvm)

I'm fighting with a Ceph upgrade, going from 18.2.7 to 19.2.3.

Once again the ceph-volume activate step is taking too long, triggering
failures because the systemd service times out, so the orch daemon fails
(though the OSD does eventually come up, the daemon is still dead and the
upgrade halts).

I can also reproduce the startup slowdown with:
cephadm ceph-volume raw list

(I don't use raw devices, but the ceph-volume activation method hardcodes
checking raw first
https://github.com/ceph/ceph/blob/4d5ad8c1ef04f38d14402f0d89f2df2b7d254c2c/src/ceph-volume/ceph_volume/activate/main.py#L46
)
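
To make that ordering concrete, the control flow is roughly the sketch
below. This is a paraphrase, not the actual activate/main.py source, and
the RawActivate/LvmActivate class names are placeholders:

    # Paraphrased sketch of the raw-before-lvm activation order; the
    # classes here are placeholders, not the real ceph-volume internals.
    class RawActivate:
        def activate(self, osd_id: str, osd_fsid: str) -> None:
            # In the real code this attempt triggers the expensive raw
            # device scan, even when the OSD was deployed with LVM.
            raise RuntimeError("not a raw OSD")

    class LvmActivate:
        def activate(self, osd_id: str, osd_fsid: str) -> None:
            print(f"activating LVM OSD {osd_id} ({osd_fsid})")

    def activate(osd_id: str, osd_fsid: str) -> None:
        try:
            # Raw activation is tried unconditionally first...
            RawActivate().activate(osd_id, osd_fsid)
        except Exception:
            # ...and only then does activation fall back to LVM.
            LvmActivate().activate(osd_id, osd_fsid)

    activate("12", "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")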

That takes 6s on 18.2.7, but 4m32s on 19.2.3!
I have 42 spinning drives per host (with multipath).

It's spending all of its time in the new method
self.exclude_lvm_osd_devices(),
and given all the duplication from multipath paths and mapper names, the
list of items to scan ends up at 308 entries in my setup.

With good old print debugging, I found that while the thread pool speeds
things up a bit, it simply takes too long to construct all those Device()
objects.
Even creating a single Device() object has to call
disk.get_devices()
at least once. That list does not include all devices (it filters out
entries like "/dev/mapper/mpathxx"), but the code always regenerates the
(same) device list whenever a path isn't found:

        if not sys_info.devices.get(self.path):
            sys_info.devices = disk.get_devices()

so it ends up regenerating this list >400 times (the initial 32 in
parallel, followed by about 400 more that will never match the device
name).
In the end it's again O(n^2) computational time to list these raw devices
with ceph-volume.
So even with 32 threads in the pool, it now means running a heavy load for
about 5 minutes before this trivial task completes, every time the daemon
needs to start.
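
For illustration, here is a toy reproduction of that pattern. It only
mirrors the shape of the quoted check; get_devices() and the classes are
simplified stand-ins, not the real ceph-volume code, and the "cached"
variant is just a sketch of scanning once instead of once per unknown
path:

    # Toy model of the quadratic pattern: every Device() whose path is
    # missing from the (filtered) sys_info.devices list forces a fresh
    # full scan, and /dev/mapper/mpath* paths never appear in that list.
    import time

    class SysInfo:
        devices: dict = {}

    sys_info = SysInfo()

    def get_devices() -> dict:
        # Stand-in for disk.get_devices(): an expensive scan whose result
        # deliberately never contains the mpath paths we look up.
        time.sleep(0.01)                       # simulate real scan cost
        return {f"/dev/sd{c}": {} for c in "abcdefgh"}

    class NaiveDevice:
        def __init__(self, path: str) -> None:
            self.path = path
            # Same shape as the quoted snippet: unknown path => re-scan.
            if not sys_info.devices.get(self.path):
                sys_info.devices = get_devices()

    class CachedDevice:
        def __init__(self, path: str) -> None:
            self.path = path
            # Sketch of a fix: scan only when nothing has been scanned
            # yet, instead of re-scanning for every filtered-out path.
            if not sys_info.devices:
                sys_info.devices = get_devices()

    paths = [f"/dev/mapper/mpath{i}" for i in range(300)]

    for cls in (NaiveDevice, CachedDevice):
        sys_info.devices = {}
        start = time.monotonic()
        for p in paths:
            cls(p)
        print(f"{cls.__name__}: {time.monotonic() - start:.2f}s")

With 300 lookups the naive variant does 300 scans and the cached one does
a single scan; a real fix would of course still need some way to refresh
the list when devices genuinely appear after startup.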
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


