Why does recovering objects take much longer than the outage that caused them?

I noticed that for my clusters, even a short 5-minute network outage or single-host reboot can cause

    pgs:     5586988/366684639 objects misplaced (1.524%)

which at the speed of

    recovery: 2.2 GiB/s, 676 objects/s

can take hours to recover.
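
As a rough sanity check on "hours", here is the arithmetic from the two status lines above. It assumes the recovery rate stays constant at the reported 676 objects/s, which a real cluster will not guarantee. The function name is purely illustrative.

```python
# Back-of-the-envelope estimate from the status output quoted above.
# Assumption: the recovery rate stays constant for the whole recovery.

def recovery_seconds(misplaced_objects: int, objects_per_second: float) -> float:
    """Time needed to move all misplaced objects at a fixed rate."""
    return misplaced_objects / objects_per_second

misplaced = 5_586_988      # from the "pgs:" line
total = 366_684_639        # from the "pgs:" line
rate = 676.0               # objects/s, from the "recovery:" line

seconds = recovery_seconds(misplaced, rate)
hours = seconds / 3600
fraction = misplaced / total

print(f"misplaced fraction: {fraction:.3%}")
print(f"estimated recovery: {seconds:.0f} s (~{hours:.1f} h)")
```

At that rate the recovery takes about 2.3 hours, for an outage of roughly 5 minutes.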

I don't understand how this can be. If the cluster is only degraded for such a short time, how can rebalancing take this long?

I'm using Ceph 19.2.2 on HDDs with SSDs as the BlueStore "db" device.
Could it be that new files are written to the HDDs sequentially (fast), while recovery seeks around on the HDDs in random order (slow)?

In any case, this asymmetry is quite annoying.
Can anything be done about it?

Thanks!
Niklas