Hint: the default max misplaced setting for the balancer module is 5%. This is a common question; I should see if there’s somewhere in the docs where this could be called out.

Most likely it IS making progress. If you look at `ceph health detail` periodically, the set of misplaced PGs should change over time, and the stddev reported by `ceph osd df` should decrease too, unless you have PGs stuck in backfill_wait or backfill_toofull forever. A few command sketches for checking this follow the quoted message below.

> On Mar 23, 2025, at 9:58 PM, Alan Murrell <Alan@xxxxxxxx> wrote:
>
> Hello,
>
> We had a drive (OSD) fail in our 5-node cluster three days ago (late afternoon of Mar 20). The PGs have sorted themselves out, but the cluster has been recovering with backfill since then. Every time I run 'ceph -s' it shows a little over 5% misplaced objects, with several PGs in backfill_wait and some scrubbing.
>
> What is sort of weird is that if I run 'ceph -s' a few times in a row, I can see the percentage of misplaced objects go down a bit, but if I leave it for a while and run 'ceph -s' again, it is still just over 5% misplaced objects and has typically slightly increased.
>
> For example, it might be 5.364% when I check it, and then after checking it several times in a row it might go down to 5.276%, but if I check it again after a few hours, it might be something like 5.478% (so still in the 5% range but slightly increased from the last check).
>
> The cluster is on 10Gbit, and I have increased max_backfills to 4 while the recovery runs, but it just doesn't seem to be making much progress.
>
> I know the failed drive needs to be replaced, but I think it is recommended to wait until the cluster is finished recovering?
>
> Your thoughts/advice (as usual) are greatly appreciated.
>
> Sent from my mobile device. Please excuse brevity and typos.
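
To see what ceiling the balancer is working to, you can check its mode and the misplaced-ratio target. A rough sketch (on recent releases this is the mgr option target_max_misplaced_ratio, default 0.05; older balancer builds exposed it under a module-specific key, so the exact name may differ on your version):

    # Is the balancer on, and in which mode?
    ceph balancer status

    # Misplaced-ratio ceiling the balancer will keep filling up to (0.05 = 5%)
    ceph config get mgr target_max_misplaced_ratio

While backfill from the failed OSD is running, the balancer keeps topping the misplaced count back up toward that target, which would explain why the percentage hovers around 5% instead of steadily falling.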
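
To confirm recovery is actually moving and nothing is wedged, something along these lines (assuming a Bourne-style shell on a node with the admin keyring) is more telling than eyeballing the percentage:

    # The absolute misplaced object count should trend down over hours,
    # even if the percentage sits near the balancer's 5% target
    while true; do date; ceph -s | grep -E 'misplaced|backfill'; sleep 600; done

    # The STDDEV in the summary line of 'ceph osd df' should shrink over time
    ceph osd df | grep -i stddev

    # PGs stuck in these states will stall recovery indefinitely
    ceph pg ls backfill_toofull
    ceph pg ls backfill_wait

backfill_wait on its own is normal while backfills are queued behind max_backfills; it is only a concern if the same PGs sit there for days without ever reaching backfilling.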