Hi,

I had 3 incomplete PGs that I marked as complete (mark-complete) because they were empty (I think I lost the data from them). One of them was recovery_unfound, and I ran mark_unfound_lost revert on it. But I now have between 5 and 25 deep-scrubbing PGs, and I believe this is not normal? (It has been like this for 5 days.)

Vivien

________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Friday, August 1, 2025 15:58:22
To: GLE, Vivien
Cc: ceph-users@xxxxxxx
Subject: Re: Re: Pgs troubleshooting

Don't worry, I just wanted to point out that careful reading is crucial. :-)
So you got the OSDs back up, but were you also able to recover the PG?

Quoting "GLE, Vivien" <Vivien.GLE@xxxxxxxx>:

> I lost all perspective and didn't read this message carefully.
> Sorry for that.
>
> Thanks for your help, I'm very grateful.
>
> Vivien
>
> ________________________________
> From: Eugen Block <eblock@xxxxxx>
> Sent: Friday, August 1, 2025 15:27:56
> To: GLE, Vivien
> Cc: ceph-users@xxxxxxx
> Subject: Re: Re: Pgs troubleshooting
>
> That's why I mentioned this two days ago:
>
> cephadm shell -- ceph-objectstore-tool --op list …
>
> That's how you can execute commands directly with cephadm shell; this
> is useful for batch operations like a for loop or similar. Of course,
> first entering the shell and then executing commands works just as well.
>
> Quoting "GLE, Vivien" <Vivien.GLE@xxxxxxxx>:
>
>> I was using ceph-objectstore-tool the wrong way, by running it on the host
>> instead of inside the container via cephadm shell --name osd.x.
>>
>> ________________________________
>> From: GLE, Vivien <Vivien.GLE@xxxxxxxx>
>> Sent: Friday, August 1, 2025 09:02:59
>> To: Eugen Block
>> Cc: ceph-users@xxxxxxx
>> Subject: Re: Pgs troubleshooting
>>
>> Hi,
>>
>> What is the right way of using the objectstore tool?
>>
>> My OSDs are up! I purged ceph-* on my host following this thread:
>> https://www.reddit.com/r/ceph/comments/1me3kvd/containerized_ceph_base_os_experience/
>>
>> "Make sure that the base OS does not have any ceph packages
>> installed, with Ubuntu in the past had issues with ceph-common being
>> installed on the host OS and it trying to take ownership of the
>> containerized ceph deployment. If you run into any issues check the
>> base OS for ceph-* packages and uninstall."
>>
>> I believe the only good way to use ceph commands is inside cephadm.
>>
>> Thanks for your help!
>>
>> ________________________________
>> From: Eugen Block <eblock@xxxxxx>
>> Sent: Thursday, July 31, 2025 19:42:21
>> To: GLE, Vivien
>> Cc: ceph-users@xxxxxxx
>> Subject: Re: Re: Pgs troubleshooting
>>
>> To use the objectstore tool within the container you don't have to
>> specify the cluster's FSID because it's mapped into the container. By
>> using the objectstore tool you might have changed the ownership of the
>> directory; change it back to the previous state. Other OSDs will show
>> you which uid/user and/or gid/group that is.
>>
>> Quoting "GLE, Vivien" <Vivien.GLE@xxxxxxxx>:
>>
>>> I'm sorry for the confusion!
>>>
>>> I pasted the wrong output.
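For reference, the direct invocation Eugen describes above can be wrapped in a loop for batch listing. This is only a sketch: the PG IDs and osd.1 are illustrative, and the OSD has to be stopped before ceph-objectstore-tool touches its store.

# Hypothetical batch listing of objects in a few PGs of a stopped OSD,
# run from the host via cephadm (the PG IDs below are examples only).
for pg in 11.4 2.1; do
    cephadm shell --name osd.1 -- \
        ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
        --op list --pgid "$pg" --no-mon-config
done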
>>> ceph-objectstore-tool --data-path /var/lib/ceph/Id/osd.1 --op list --pgid 11.4 --no-mon-config
>>>
>>> OSD.1 log:
>>>
>>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 set uid:gid to 167:167 (ceph:ceph)
>>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable), process ceph-osd, pid 7
>>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 pidfile_write: ignore empty --pid-file
>>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 1 bdev(0x57bd64210e00 /var/lib/ceph/osd/ceph-1/block) open path /var/lib/ceph/osd/ceph-1/block
>>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 bdev(0x57bd64210e00 /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
>>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or directory
>>>
>>> ----------------------
>>>
>>> I retried on OSD.2 with PG 2.1, to see if disabling OSD.2 instead of just stopping it before the objectstore-tool operation would change something, but the same error occurred.
>>>
>>> ________________________________
>>> From: Eugen Block <eblock@xxxxxx>
>>> Sent: Thursday, July 31, 2025 13:27:51
>>> To: GLE, Vivien
>>> Cc: ceph-users@xxxxxxx
>>> Subject: Re: Re: Pgs troubleshooting
>>>
>>> Why did you look at OSD.2? According to the query output you provided
>>> I would have looked at OSD.1 (acting set). And you pasted the output
>>> of PG 11.4, but now you're trying to list PG 2.1, which is quite confusing.
>>>
>>> Quoting "GLE, Vivien" <Vivien.GLE@xxxxxxxx>:
>>>
>>>> I don't get why it is searching in this path, because there is nothing
>>>> there. This is the command I used to check BlueStore:
>>>>
>>>> ceph-objectstore-tool --data-path /var/lib/ceph/"ID"/osd.2 --op list --pgid 2.1 --no-mon-config
>>>>
>>>> ________________________________
>>>> From: GLE, Vivien
>>>> Sent: Thursday, July 31, 2025 09:38:25
>>>> To: Eugen Block
>>>> Cc: ceph-users@xxxxxxx
>>>> Subject: RE: Re: Pgs troubleshooting
>>>>
>>>> Hi,
>>>>
>>>>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I'm not
>>>>> entirely sure and am on vacation. 😅 It could be worth a try. But don't
>>>>> forget to reset min_size back to 2 afterwards.
>>>>
>>>> I did, but nothing really changed. How long should I wait to
>>>> see if it does something?
>>>>
>>>>> No, you use the ceph-objectstore-tool to export the PG from the intact
>>>>> OSD (you need to stop it though, and set the noout flag), and make sure
>>>>> you have enough disk space.
>>>>
>>>> I stopped my OSD and set noout to check whether my PG is stored in
>>>> BlueStore (it is not), but when I tried to restart my OSD, the OSD
>>>> superblock was gone:
>>>>
>>>> 2025-07-31T08:33:14.696+0000 7f0c7c889680 1 bdev(0x60945520ae00 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
>>>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 bdev(0x60945520ae00 /var/lib/ceph/osd/ceph-2/block) open open got: (13) Permission denied
>>>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or directory
>>>>
>>>> Did I miss something?
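The "Permission denied" followed by the missing-superblock error in the logs above matches what Eugen points out further up the thread: running the objectstore tool outside the container can leave the OSD directory with the wrong owner. A minimal sketch of the check and fix, assuming the cephadm layout used in this thread; the 167:167 (ceph:ceph) owner comes from the OSD log above, and the OSD IDs are simply the ones from this thread.

# On the OSD host: compare with a healthy OSD's directory, then restore ownership.
fsid=$(ceph fsid)                            # cluster FSID used in the cephadm data path
ls -ln /var/lib/ceph/$fsid/osd.1             # a healthy directory is owned by 167:167 (ceph:ceph)
chown -R 167:167 /var/lib/ceph/$fsid/osd.2   # put the affected OSD's directory back
ceph orch daemon restart osd.2               # then start the daemon again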
>>>> Thanks,
>>>> Vivien
>>>>
>>>> ________________________________
>>>> From: Eugen Block <eblock@xxxxxx>
>>>> Sent: Wednesday, July 30, 2025 16:56:50
>>>> To: GLE, Vivien
>>>> Cc: ceph-users@xxxxxxx
>>>> Subject: Re: Pgs troubleshooting
>>>>
>>>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I'm not
>>>> entirely sure and am on vacation. 😅 It could be worth a try. But don't
>>>> forget to reset min_size back to 2 afterwards.
>>>>
>>>> Quoting "GLE, Vivien" <Vivien.GLE@xxxxxxxx>:
>>>>
>>>>> Hi,
>>>>>
>>>>>> did the two replaced OSDs fail at the same time (before they were
>>>>>> completely drained)? This would most likely mean that both those
>>>>>> failed OSDs contained the other two replicas of this PG
>>>>>
>>>>> Unfortunately yes.
>>>>>
>>>>>> This would most likely mean that both those
>>>>>> failed OSDs contained the other two replicas of this PG. A pg query
>>>>>> should show which OSDs are missing.
>>>>>
>>>>> If I understand correctly, I need to move my PG onto OSD 1?
>>>>>
>>>>> ceph -w
>>>>>
>>>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>>>>
>>>>> ceph pg query 11.4
>>>>>
>>>>>     "up": [
>>>>>         1,
>>>>>         4,
>>>>>         5
>>>>>     ],
>>>>>     "acting": [
>>>>>         1,
>>>>>         4,
>>>>>         5
>>>>>     ],
>>>>>     "avail_no_missing": [],
>>>>>     "object_location_counts": [
>>>>>         {
>>>>>             "shards": "3,4,5",
>>>>>             "objects": 2
>>>>>         }
>>>>>     ],
>>>>>     "blocked_by": [],
>>>>>     "up_primary": 1,
>>>>>     "acting_primary": 1,
>>>>>     "purged_snaps": []
>>>>> },
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Vivien
>>>>>
>>>>> ________________________________
>>>>> From: Eugen Block <eblock@xxxxxx>
>>>>> Sent: Tuesday, July 29, 2025 16:48:41
>>>>> To: ceph-users@xxxxxxx
>>>>> Subject: Re: Pgs troubleshooting
>>>>>
>>>>> Hi,
>>>>>
>>>>> did the two replaced OSDs fail at the same time (before they were
>>>>> completely drained)? This would most likely mean that both those
>>>>> failed OSDs contained the other two replicas of this PG. A pg query
>>>>> should show which OSDs are missing.
>>>>> You could try with the objectstore-tool to export the PG from the
>>>>> remaining OSD and import it on different OSDs. Or you can mark the data
>>>>> as lost if you don't care about the data and want a healthy state quickly.
>>>>>
>>>>> Regards,
>>>>> Eugen
>>>>>
>>>>> Quoting "GLE, Vivien" <Vivien.GLE@xxxxxxxx>:
>>>>>
>>>>>> Thanks for your help!
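The export/import route Eugen outlines above could look roughly like the sketch below. Assumptions to be loud about: osd.1 is used as the source because the pg query above shows it as acting primary for PG 11.4, osd.6 as the destination is purely hypothetical, the orchestrator stop/start steps and the --mount path are mine rather than from the thread, and each OSD must stay down while the tool runs against it.

# Export PG 11.4 from the stopped source OSD, then import it on a stopped destination OSD.
# /tmp/pg-export on the host is mapped into the container so the export file is reachable.
mkdir -p /tmp/pg-export
ceph osd set noout
ceph orch daemon stop osd.1
cephadm shell --name osd.1 --mount /tmp/pg-export:/mnt/export -- \
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
    --op export --pgid 11.4 --file /mnt/export/pg11.4.export --no-mon-config
ceph orch daemon start osd.1
ceph orch daemon stop osd.6
cephadm shell --name osd.6 --mount /tmp/pg-export:/mnt/export -- \
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 \
    --op import --file /mnt/export/pg11.4.export --no-mon-config
ceph orch daemon start osd.6
ceph osd unset noout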
>>>>>> This is my new pg stat, with no more peering PGs (after rebooting some OSDs):
>>>>>>
>>>>>> ceph pg stat ->
>>>>>>
>>>>>> 498 pgs: 1 active+recovery_unfound+degraded, 3 recovery_unfound+undersized+degraded+remapped+peered, 14 active+clean+scrubbing+deep, 480 active+clean;
>>>>>> 36 GiB data, 169 GiB used, 6.2 TiB / 6.4 TiB avail; 8.8 KiB/s rd, 0 B/s wr, 12 op/s; 715/41838 objects degraded (1.709%); 5/13946 objects unfound (0.036%)
>>>>>>
>>>>>> ceph pg ls recovery_unfound -> shows that the PGs are replica 3; I tried to repair them but nothing happened.
>>>>>>
>>>>>> ceph -w ->
>>>>>>
>>>>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>>>>>
>>>>>> ________________________________
>>>>>> From: Frédéric Nass <frederic.nass@xxxxxxxxx>
>>>>>> Sent: Tuesday, July 29, 2025 14:03:37
>>>>>> To: GLE, Vivien
>>>>>> Cc: ceph-users@xxxxxxx
>>>>>> Subject: Re: Pgs troubleshooting
>>>>>>
>>>>>> Hi Vivien,
>>>>>>
>>>>>> Unless you ran the 'ceph pg stat' command while peering was occurring, the
>>>>>> 37 peering PGs might indicate a temporary peering issue with one or
>>>>>> more OSDs. If that's the case, then restarting the associated OSDs could
>>>>>> help with the peering. You could list those PGs and the
>>>>>> associated OSDs with 'ceph pg ls peering' and trigger peering by
>>>>>> either restarting one common OSD or by using 'ceph pg repeer <pg_id>'.
>>>>>>
>>>>>> Regarding the unfound object and its associated backfill_unfound PG,
>>>>>> you could identify this PG with 'ceph pg ls backfill_unfound' and
>>>>>> investigate it with 'ceph pg <pg_id> query'. Depending on the
>>>>>> output, you could try running 'ceph pg repair <pg_id>'. Could you
>>>>>> confirm that this PG is not part of a size=2 pool?
>>>>>>
>>>>>> Best regards,
>>>>>> Frédéric.
>>>>>>
>>>>>> --
>>>>>> Frédéric Nass
>>>>>> Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
>>>>>> Try our Ceph Analyzer -- https://analyzer.clyso.com/
>>>>>> https://clyso.com | frederic.nass@xxxxxxxxx
>>>>>>
>>>>>> On Tue, Jul 29, 2025 at 14:19, GLE, Vivien <Vivien.GLE@xxxxxxxx> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> After replacing 2 OSDs (data corruption), these are the stats of my test Ceph cluster:
>>>>>>
>>>>>> ceph pg stat
>>>>>>
>>>>>> 498 pgs: 37 peering, 1 active+remapped+backfilling, 1 active+clean+remapped, 1 active+recovery_wait+undersized+remapped, 1 backfill_unfound+undersized+degraded+remapped+peered, 1 remapped+peering, 12 active+clean+scrubbing+deep, 1 active+undersized, 442 active+clean, 1 active+recovering+undersized+remapped
>>>>>> 34 GiB data, 175 GiB used, 6.2 TiB / 6.4 TiB avail; 1.7 KiB/s rd, 1 op/s; 31/39768 objects degraded (0.078%); 6/39768 objects misplaced (0.015%); 1/13256 objects unfound (0.008%)
>>>>>>
>>>>>> ceph osd stat
>>>>>>
>>>>>> 7 osds: 7 up (since 20h), 7 in (since 20h); epoch: e427538; 4 remapped pgs
>>>>>>
>>>>>> Does anyone have an idea of where to start to get back to a healthy cluster?
>>>>>>
>>>>>> Thanks!
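Spelled out, the triage sequence Frédéric suggests above looks roughly like this. PG 11.4 is only used as an example (it is the PG from the rest of the thread); substitute whatever the ls commands actually report.

ceph pg ls peering               # list the stuck PGs and the OSDs they map to
ceph pg repeer 11.4              # or restart one OSD common to several stuck PGs
ceph pg ls backfill_unfound      # identify the PG holding the unfound object
ceph pg 11.4 query               # inspect it (recovery state, might_have_unfound)
ceph osd pool ls detail          # confirm the affected pool is not size=2
ceph pg repair 11.4              # only if the query output points that way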
>>>>>>
>>>>>> Vivien

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
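Coming back to the open question at the very top of the thread, whether having 5-25 PGs in deep scrub for days on end is something to worry about, a quick way to see what is actually scrubbing; the grep pattern matches the state strings shown in the pg stat output above.

ceph pg ls | grep 'scrubbing+deep'   # which PGs are deep scrubbing right now, and their full state
ceph -s                              # overall health plus the scrub/recovery summary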