Hi,

After checking the OSD scrub config via "ceph config get osd | grep scrub", I saw that osd_deep_scrub_interval / osd_scrub_max_interval / osd_scrub_min_interval were set to 0 for some obscure reason. I put them back to 7 / 7 / 1 days and everything is now active+clean, with no deep-scrubbing.

> Switching osd_op_queue to WPQ or setting
> osd_scrub_disable_reservation_queuing = true with mClock could help if
> that's the case.

I'll keep that in mind if that happens again.

Thanks to everyone in this thread,

Vivien

________________________________
From: Michel Jouvin <michel.jouvin@xxxxxxxxxxxxxxx>
Sent: Monday, August 4, 2025 17:53:32
To: Frédéric Nass; GLE, Vivien
Cc: ceph-users@xxxxxxx
Subject: Re: Re: Pgs troubleshooting

Hi Vivien,

You may want to check, if not done already, that the scrubbing PGs are still the same. The number may look constant while the PGs being scrubbed change (although the number always staying at 25 is suspect and may be related to the problem Frédéric mentioned).

Michel

Sent from my mobile

On August 4, 2025 at 17:20:32, Frédéric Nass <frederic.nass@xxxxxxxxx> wrote:

Hi Vivien,

Great to hear that all PGs are now active+clean. Just so you know, the PG export/import procedure Eugen mentioned should have worked to restore them without dropping their data.

Regarding the PGs scrubbing for 5 days, it might be a consequence of many PGs now being processed concurrently when they couldn't be while not active+clean. Alternatively, you might be encountering this bug [1] with mClock. Switching osd_op_queue to WPQ or setting osd_scrub_disable_reservation_queuing = true with mClock could help if that's the case.

Regards,
Frédéric.

[1] https://tracker.ceph.com/issues/69078

--
Frédéric Nass
Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
Try our Ceph Analyzer -- https://analyzer.clyso.com/
https://clyso.com | frederic.nass@xxxxxxxxx

On Mon, Aug 4, 2025 at 09:39, GLE, Vivien <Vivien.GLE@xxxxxxxx> wrote:

Hi,

I had 3 incomplete PGs that I marked complete (mark-complete) because they were empty (I think I lost their data). One was recovery_unfound; I ran mark_unfound_lost revert on it. But I still have between 5 and 25 deep-scrubbing PGs, and it has been like this for 5 days -- I believe this is not normal?

Vivien

________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Friday, August 1, 2025 15:58:22
To: GLE, Vivien
Cc: ceph-users@xxxxxxx
Subject: Re: Re: Pgs troubleshooting

Don't worry, I just wanted to point out that careful reading is crucial. :-) So you got the OSDs back up, but were you also able to recover the PG?

Quoting "GLE, Vivien" <Vivien.GLE@xxxxxxxx>:

I lost all perspective and didn't read this message carefully... Sorry for that. Thanks for your help, I'm very grateful.

Vivien

________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Friday, August 1, 2025 15:27:56
To: GLE, Vivien
Cc: ceph-users@xxxxxxx
Subject: Re: Re: Pgs troubleshooting

That's why I mentioned this two days ago:

cephadm shell -- ceph-objectstore-tool --op list …

That's how you can execute commands directly with cephadm shell; this is useful for batch operations like a for loop or similar. Of course, first entering the shell and then executing commands works just as well.
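For illustration, a batch run over a few OSDs might look like this (the OSD ids are placeholders, and each OSD daemon must be stopped before ceph-objectstore-tool can open its data path):

    # list the PGs held by each stopped OSD, one container at a time
    for id in 0 1 2; do
        cephadm shell --name osd.$id -- \
            ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$id \
            --op list --no-mon-config
    done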
Zitat von "GLE, Vivien" <Vivien.GLE@xxxxxxxx>: I was using ceph-objectstore-tool the wrong way by doing it on host instead of inside container via cephadm shell --name osd.x ________________________________ De : GLE, Vivien <Vivien.GLE@xxxxxxxx> Envoyé : vendredi 1 août 2025 09:02:59 À : Eugen Block Cc : ceph-users@xxxxxxx Objet : Re: Pgs troubleshooting Hi, What is the good way of using objectstore tool ? My OSD are up ! I purged ceph-* on my host following this thread : https://www.reddit.com/r/ceph/comments/1me3kvd/containerized_ceph_base_os_experience/ " Make sure that the base OS does not have any ceph packages installed, with Ubuntu in the past had issues with ceph-common being installed on the host OS and it trying to take ownership of the containerized ceph deployment. If you run into any issues check the base OS for ceph-* packages and uninstall. " I believe the only good way to use ceph commands is in cephadm Thanks for your help ! ________________________________ De : Eugen Block <eblock@xxxxxx> Envoyé : jeudi 31 juillet 2025 19:42:21 À : GLE, Vivien Cc : ceph-users@xxxxxxx Objet : Re: Re: Pgs troubleshooting To use the objectstore tool within the container you don’t have to specify the cluster’s FSID because it’s mapped into the container. By using the objectstore tool you might have changed the ownership of the directory, change it back to the previous state. Other OSDs will show you which uid/user and/or gid/group that is. Zitat von "GLE, Vivien" <Vivien.GLE@xxxxxxxx>: I'm sorry for the confusion ! I paste the wrong output. ceph-objectstore-tool --data-path /var/lib/ceph/Id/osd.1 --op list --pgid 11.4 --no-mon-config OSD.1 log 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 set uid:gid to 167:167 (ceph:ceph) 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable), process ceph-osd, pid 7 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 pidfile_write: ignore empty --pid-file 2025-07-31T12:06:56.274+0000 7a9c2bf47680 1 bdev(0x57bd64210e00 /var/lib/ceph/osd/ceph-1/block) open path /var/lib/ceph/osd/ceph-1/block 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 bdev(0x57bd64210e00 /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or directory ---------------------- I retried on OSD.2 with PG 2.1 to see if I disabled instead of just stopped the OSD.2 before objectstore-tool operation will change something but same error occurred ________________________________ De : Eugen Block <eblock@xxxxxx> Envoyé : jeudi 31 juillet 2025 13:27:51 À : GLE, Vivien Cc : ceph-users@xxxxxxx Objet : Re: Re: Pgs troubleshooting Why did you look at OSD.2? According to the query output you provided I would have looked at OSD.1 (acting set). And you pasted the output of PG 11.4, now you’re trying to list PG 2.1, that is quite confusing. Zitat von "GLE, Vivien" <Vivien.GLE@xxxxxxxx>: I dont get why is he searching in this path because there is nothing and this is the command I used to check bluestore ceph-objectstore-tool --data-path /var/lib/ceph/"ID"/osd.2 --op list --pgid 2.1 --no-mon-config ________________________________ De : GLE, Vivien Envoyé : jeudi 31 juillet 2025 09:38:25 À : Eugen Block Cc : ceph-users@xxxxxxx Objet : RE: Re: Pgs troubleshooting Hi, Or could reducing min_size to 1 help here (Thanks, Anthony)? I’m not entirely sure and am on vacation. 😅 it could be worth a try. 
I did it, but nothing really changed. How long should I wait to see if it does something?

> No, you use the ceph-objectstore-tool to export the PG from the intact
> OSD (you need to stop it though, set the noout flag), make sure you have
> enough disk space.

I stopped my OSD and set noout to check whether my PG is stored in bluestore (it is not), but when I tried to restart the OSD, the OSD superblock was gone:

2025-07-31T08:33:14.696+0000 7f0c7c889680  1 bdev(0x60945520ae00 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 bdev(0x60945520ae00 /var/lib/ceph/osd/ceph-2/block) open open got: (13) Permission denied
2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or directory

Did I miss something?

Thanks,

Vivien

________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, July 30, 2025 16:56:50
To: GLE, Vivien
Cc: ceph-users@xxxxxxx
Subject: Re: Pgs troubleshooting

Or could reducing min_size to 1 help here (Thanks, Anthony)? I'm not entirely sure and am on vacation. 😅 It could be worth a try. But don't forget to reset min_size back to 2 afterwards.

Quoting "GLE, Vivien" <Vivien.GLE@xxxxxxxx>:

Hi,

> did the two replaced OSDs fail at the same time (before they were
> completely drained)? This would most likely mean that both those failed
> OSDs contained the other two replicas of this PG

Unfortunately, yes.

> This would most likely mean that both those failed OSDs contained the
> other two replicas of this PG. A pg query should show which OSDs are
> missing.

If I understand correctly, I need to move my PG onto OSD 1?

ceph -w
osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost

ceph pg query 11.4

    "up": [1, 4, 5],
    "acting": [1, 4, 5],
    "avail_no_missing": [],
    "object_location_counts": [
        { "shards": "3,4,5", "objects": 2 }
    ],
    "blocked_by": [],
    "up_primary": 1,
    "acting_primary": 1,
    "purged_snaps": []
},

Thanks,

Vivien

________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Tuesday, July 29, 2025 16:48:41
To: ceph-users@xxxxxxx
Subject: Re: Pgs troubleshooting

Hi,

did the two replaced OSDs fail at the same time (before they were completely drained)? This would most likely mean that both those failed OSDs contained the other two replicas of this PG. A pg query should show which OSDs are missing. You could try with the objectstore tool to export the PG from the remaining OSD and import it on different OSDs. Or you mark the data as lost if you don't care about the data and want a healthy state quickly.

Regards,
Eugen
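A rough sketch of that export/import flow, assuming the intact copy is on osd.1 and osd.5 is the import target (the OSD ids and the dump file path are illustrative, and both daemons must be stopped first):

    # keep CRUSH from rebalancing while the OSDs are down
    ceph osd set noout
    ceph orch daemon stop osd.1
    # inside `cephadm shell --name osd.1`: export the PG to a file
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
        --pgid 11.4 --op export --file /mnt/pg11.4.export
    # inside `cephadm shell --name osd.5` (osd.5 stopped as well): import it
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
        --pgid 11.4 --op import --file /mnt/pg11.4.export
    # restart the OSDs and clear the flag
    ceph orch daemon start osd.1
    ceph orch daemon start osd.5
    ceph osd unset noout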
Quoting "GLE, Vivien" <Vivien.GLE@xxxxxxxx>:

Thanks for your help!

This is my new pg stat, with no more peering PGs (after rebooting some OSDs):

ceph pg stat ->
498 pgs: 1 active+recovery_unfound+degraded, 3 recovery_unfound+undersized+degraded+remapped+peered, 14 active+clean+scrubbing+deep, 480 active+clean; 36 GiB data, 169 GiB used, 6.2 TiB / 6.4 TiB avail; 8.8 KiB/s rd, 0 B/s wr, 12 op/s; 715/41838 objects degraded (1.709%); 5/13946 objects unfound (0.036%)

ceph pg ls recovery_unfound -> shows that the PGs are replica 3; I tried to repair them but nothing happened.

ceph -w -> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost

________________________________
From: Frédéric Nass <frederic.nass@xxxxxxxxx>
Sent: Tuesday, July 29, 2025 14:03:37
To: GLE, Vivien
Cc: ceph-users@xxxxxxx
Subject: Re: Pgs troubleshooting

Hi Vivien,

Unless you ran the 'ceph pg stat' command while peering was occurring, the 37 peering PGs might indicate a temporary peering issue with one or more OSDs. If that's the case, then restarting the associated OSDs could help with the peering. You could list those PGs and the associated OSDs with 'ceph pg ls peering' and trigger peering by either restarting one common OSD or by using 'ceph pg repeer <pg_id>'.

Regarding the unfound object and its associated backfill_unfound PG, you could identify this PG with 'ceph pg ls backfill_unfound' and investigate it with 'ceph pg <pg_id> query'. Depending on the output, you could try running 'ceph pg repair <pg_id>'. Could you confirm that this PG is not part of a size=2 pool?

Best regards,
Frédéric.

--
Frédéric Nass
Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
Try our Ceph Analyzer -- https://analyzer.clyso.com/
https://clyso.com | frederic.nass@xxxxxxxxx
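Put together, that triage pass might look like this (the pg id is a placeholder):

    # which PGs are stuck peering, and on which OSDs?
    ceph pg ls peering
    # force a fresh peering round for one of them
    ceph pg repeer <pg_id>
    # locate and inspect the unfound PG
    ceph pg ls backfill_unfound
    ceph pg <pg_id> query
    # if the query output warrants it, attempt a repair
    ceph pg repair <pg_id>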
On Tue, Jul 29, 2025 at 14:19, GLE, Vivien <Vivien.GLE@xxxxxxxx> wrote:

Hi,

After replacing 2 OSDs (data corruption), these are the stats of my testing Ceph cluster:

ceph pg stat
498 pgs: 37 peering, 1 active+remapped+backfilling, 1 active+clean+remapped, 1 active+recovery_wait+undersized+remapped, 1 backfill_unfound+undersized+degraded+remapped+peered, 1 remapped+peering, 12 active+clean+scrubbing+deep, 1 active+undersized, 442 active+clean, 1 active+recovering+undersized+remapped
34 GiB data, 175 GiB used, 6.2 TiB / 6.4 TiB avail; 1.7 KiB/s rd, 1 op/s; 31/39768 objects degraded (0.078%); 6/39768 objects misplaced (0.015%); 1/13256 objects unfound (0.008%)

ceph osd stat
7 osds: 7 up (since 20h), 7 in (since 20h); epoch: e427538; 4 remapped pgs

Does anyone have an idea of where to start to get a healthy cluster?

Thanks!

Vivien

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx