> 'ceph osd ok-to-stop' is a safety check, nothing more.

Oh OK, I thought you could add "true" to ignore PGs becoming inactive.

This is the output for one OSD; all of my OSDs are unsafe to stop.

root@ceph-monitor-1:/# ceph osd ok-to-stop 1
{"ok_to_stop":false,"osds":[1],"num_ok_pgs":30,"num_not_ok_pgs":170,"bad_become_inactive":["2.0","2.2","2.3","2.4","2.7","2.f","2.11","2.14","2.17","2.18","2.19","2.1a","2.1d","3.2","3.5","3.a","3.d","7.0","7.6","7.7","7.8","7.b","7.d","7.11","7.16","7.17","7.19","7.1f","10.2","10.9","10.b","10.c","10.d","10.e","10.10","10.11","10.14","10.1a","10.1b","10.1d","10.1f","11.1","11.4","11.5","11.7","11.9","11.a","11.c","11.d","11.e","11.11","11.13","11.14","11.15","11.16","11.1a","11.1b","11.1e","15.1","15.2","15.9","15.a","15.b","15.d","15.f","15.10","15.11","15.13","15.14","15.16","15.17","15.19","15.1c","15.1d","15.1f","16.2","16.3","16.5","16.6","16.9","16.c","16.d","16.f","16.18","16.19","16.1b","16.1c","16.1d","16.1e","17.6","17.a","17.b","17.c","17.d","17.10","17.11","17.18","17.1f","25.2","25.3","25.5","25.7","25.8","25.b","25.10","25.11","25.13","25.14","25.17","25.19","25.1a","25.1b","25.21","25.22","25.23","25.25","25.27","25.2a","25.2b","25.2e","25.31","25.35","25.37","25.3d","26.1","26.8","26.9","26.a","26.f","26.10","26.14","26.16","26.1a","26.1e","27.0","27.1","27.2","27.4","27.5","27.6","27.c","27.d","27.f","28.2","28.3","28.6","28.7","28.a","28.10","28.14","28.18","28.1a","28.1d","28.1e","28.1f","28.20","28.21","28.25","28.26","28.2a","28.2b","28.2c","28.2d","28.2f","28.33","28.34","28.37","28.3a","28.3d","28.3f"],"ok_become_degraded":["4.0","4.3","4.6","4.8","4.a","4.b","4.d","4.e","4.11","4.13","4.14","4.15","4.17","4.18","4.1b","4.1d","4.1f","18.0","18.2","18.6","18.c","18.d","18.e","18.14","18.16","18.17","18.19","18.1a","18.1d","18.1e"]}
Error EBUSY: unsafe to stop osd(s) at this time (170 PGs are or would become offline)

root@ceph-monitor-1:/# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 886201 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 6.98
pool 2 'rbd' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 670923 lfor 0/1645/1643 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 2.19
pool 3 'cephfs.toto-fs.meta' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 886203 lfor 0/0/64 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs read_balance_score 2.19
pool 4 'cephfs.toto-fs.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3008 lfor 0/3008/3006 flags hashpspool,bulk max_bytes 32212254720 stripe_width 0 application cephfs read_balance_score 1.53
pool 7 '.nfs' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 886199 lfor 0/0/140 flags hashpspool stripe_width 0 application nfs read_balance_score 1.53
pool 10 'prbd' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 670919 lfor 0/0/833 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 1.53
pool 11 '.rgw.root' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 886205 lfor 0/0/833 flags hashpspool stripe_width 0 application rgw read_balance_score 1.53
pool 15 'testzone.rgw.log' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 886225 lfor 0/0/1153 flags hashpspool stripe_width 0 application rgw read_balance_score 1.97
pool 16 'testzone.rgw.control' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 886213 lfor 0/0/1153 flags hashpspool stripe_width 0 application rgw read_balance_score 1.53
pool 17 'testzone.rgw.meta' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 886215 lfor 0/0/1875 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw read_balance_score 1.53
pool 18 'testzone.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1965 lfor 0/0/1877 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw read_balance_score 1.31
pool 25 'pool_VM' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 886217 lfor 0/0/885938 flags hashpspool,selfmanaged_snaps max_bytes 107374182400 stripe_width 0 target_size_bytes 536870912000 application rbd read_balance_score 1.64
pool 26 'k8s' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 886233 lfor 0/0/886231 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 1.53
pool 27 'cephfs.testfs.meta' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 886221 lfor 0/0/885939 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs read_balance_score 1.75
pool 28 'cephfs.testfs.data' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 886242 lfor 0/0/886240 flags hashpspool,bulk stripe_width 0 application cephfs read_balance_score 1.31

root@ceph-monitor-1:/# ceph osd df tree
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA    OMAP     META     AVAIL    %USE  VAR   PGS  STATUS  TYPE NAME
 -1         6.63879         -  6.4 TiB  104 GiB  95 GiB  164 KiB  9.6 GiB  6.3 TiB  1.60  1.00    -          root inist
-15         0.90970         -  932 GiB   12 GiB  11 GiB   35 KiB  889 MiB  920 GiB  1.29  0.80    -              host ceph-monitor-1
  6    hdd  0.90970   1.00000  932 GiB   12 GiB  11 GiB   35 KiB  889 MiB  920 GiB  1.29  0.80  197      up          osd.6
-21         5.72910         -  5.5 TiB   92 GiB  84 GiB  129 KiB  8.7 GiB  5.4 TiB  1.65  1.03    -              datacenter dcml
-20         2.72910         -  2.7 TiB   46 GiB  42 GiB   62 KiB  4.0 GiB  2.7 TiB  1.64  1.02    -                  room it02
-19         2.72910         -  2.7 TiB   46 GiB  42 GiB   62 KiB  4.0 GiB  2.7 TiB  1.64  1.02    -                      row left
-18         2.72910         -  2.7 TiB   46 GiB  42 GiB   62 KiB  4.0 GiB  2.7 TiB  1.64  1.02    -                          rack 10
 -3         0.90970         -  932 GiB   14 GiB  13 GiB   17 KiB  1.2 GiB  918 GiB  1.50  0.94    -                              host ceph-node-1
  2    hdd  0.90970   1.00000  932 GiB   14 GiB  13 GiB   17 KiB  1.2 GiB  918 GiB  1.50  0.94  201      up                          osd.2
 -5         0.90970         -  932 GiB   16 GiB  14 GiB   28 KiB  1.6 GiB  916 GiB  1.70  1.06    -                              host ceph-node-2
  1    hdd  0.90970   1.00000  932 GiB   16 GiB  14 GiB   28 KiB  1.6 GiB  916 GiB  1.70  1.06  200      up                          osd.1
 -9         0.90970         -  932 GiB   16 GiB  15 GiB   17 KiB  1.2 GiB  915 GiB  1.72  1.07    -                              host ceph-node-3
  5    hdd  0.90970   1.00000  932 GiB   16 GiB  15 GiB   17 KiB  1.2 GiB  915 GiB  1.72  1.07  200      up                          osd.5
-36         3.00000         -  2.7 TiB   47 GiB  42 GiB   67 KiB  4.8 GiB  2.7 TiB  1.67  1.04    -                  room it06
-35         3.00000         -  2.7 TiB   47 GiB  42 GiB   67 KiB  4.8 GiB  2.7 TiB  1.67  1.04    -                      row left06
-34         3.00000         -  2.7 TiB   47 GiB  42 GiB   67 KiB  4.8 GiB  2.7 TiB  1.67  1.04    -                          rack 08
 -7         1.00000         -  932 GiB   15 GiB  13 GiB   21 KiB  1.6 GiB  917 GiB  1.57  0.98    -                              host ceph-node-4
  0    hdd  1.00000   1.00000  932 GiB   15 GiB  13 GiB   21 KiB  1.6 GiB  917 GiB  1.57  0.98  207      up                          osd.0
-13         1.00000         -  932 GiB   15 GiB  14 GiB   31 KiB  1.6 GiB  916 GiB  1.63  1.02    -                              host ceph-node-5
  3    hdd  1.00000   1.00000  932 GiB   15 GiB  14 GiB   31 KiB  1.6 GiB  916 GiB  1.63  1.02  217      up                          osd.3
-11         1.00000         -  932 GiB   17 GiB  15 GiB   15 KiB  1.5 GiB  915 GiB  1.80  1.12    -                              host ceph-node-6
  4    hdd  1.00000   1.00000  932 GiB   17 GiB  15 GiB   15 KiB  1.5 GiB  915 GiB  1.80  1.12  221      up                          osd.4
                        TOTAL  6.4 TiB  104 GiB  95 GiB  168 KiB  9.6 GiB  6.3 TiB  1.60
MIN/MAX VAR: 0.80/1.12  STDDEV: 0.16

Vivien

________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, 18 August 2025 12:37:14
To: ceph-users@xxxxxxx
Subject: Re: Ceph upgrade OSD unsafe to stop

Hi,

'ceph osd ok-to-stop' is a safety check, nothing more. It basically checks whether PGs would become inactive if you stopped said OSD, or whether those PGs would only become degraded. Which OSD reports that it's unsafe to stop? Can you paste the output of 'ceph osd ok-to-stop <OSD_ID>'? And with that also 'ceph osd pool ls detail' to see which pool(s) is/are affected. 'ceph osd df tree' can also be useful here.

Regards,
Eugen

Quoting "GLE, Vivien" <Vivien.GLE@xxxxxxxx>:

> Hi,
>
> I'm trying to update my cluster (19.2.2 -> 19.2.3). The mon and mgr
> upgrades went well, but I had an issue with the OSDs:
>
> Upgrade: unsafe to stop osd(s) at this time (165 PGs are or would
> become offline)
>
> The cluster is in HEALTH_OK.
> All pools are replica 3 and all PGs are active+clean.
> The autoscaler is off, following the Ceph docs.
>
> Does ceph osd ok-to-stop lead to lost data?
>
> The only rule used in the cluster is replicated_rule:
>
> root@ceph-monitor-1:/# ceph osd crush rule dump replicated_rule
> {
>     "rule_id": 0,
>     "rule_name": "replicated_rule",
>     "type": 1,
>     "steps": [
>         {
>             "op": "take",
>             "item": -1,
>             "item_name": "inist"
>         },
>         {
>             "op": "chooseleaf_firstn",
>             "num": 0,
>             "type": "host"
>         },
>         {
>             "op": "emit"
>         }
>     ]
> }
>
> root@ceph-monitor-1:/# ceph osd tree
> ID   CLASS  WEIGHT   TYPE NAME                STATUS  REWEIGHT  PRI-AFF
>  -1         6.63879  root inist
> -15         0.90970      host ceph-monitor-1
>   6    hdd  0.90970          osd.6                up   1.00000  1.00000
> -21         5.72910      datacenter bat1
> -20         2.72910          room room01
> -19         2.72910              row left
> -18         2.72910                  rack 10
>  -3         0.90970                      host ceph-node-1
>   2    hdd  0.90970                          osd.2    up   1.00000  1.00000
>  -5         0.90970                      host ceph-node-2
>   1    hdd  0.90970                          osd.1    up   1.00000  1.00000
>  -9         0.90970                      host ceph-node-3
>   5    hdd  0.90970                          osd.5    up   1.00000  1.00000
> -36         3.00000          room room03
> -35         3.00000              row left06
> -34         3.00000                  rack 08
>  -7         1.00000                      host ceph-node-4
>   0    hdd  1.00000                          osd.0    up   1.00000  1.00000
> -13         1.00000                      host ceph-node-5
>   3    hdd  1.00000                          osd.3    up   1.00000  1.00000
> -11         1.00000                      host ceph-node-6
>   4    hdd  1.00000                          osd.4    up   1.00000  1.00000
>
> Thanks!
>
> Vivien
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
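For readers skimming the thread, the long JSON that 'ceph osd ok-to-stop' prints can be condensed instead of read PG by PG. A minimal sketch in Python; the sample is trimmed from the output shown above (only a few PG IDs kept), and in practice you would feed the command's stdout into json.loads():

```python
import json

# Trimmed sample of 'ceph osd ok-to-stop 1' output from the thread.
# Field names match the real output; the PG lists are shortened here.
sample = '''
{"ok_to_stop": false, "osds": [1], "num_ok_pgs": 30, "num_not_ok_pgs": 170,
 "bad_become_inactive": ["2.0", "2.2", "2.3"],
 "ok_become_degraded": ["4.0", "4.3"]}
'''

report = json.loads(sample)

# PGs in bad_become_inactive would stop serving I/O if the OSD went down;
# PGs in ok_become_degraded would keep serving I/O with one replica missing.
inactive = report["bad_become_inactive"]
degraded = report["ok_become_degraded"]

print(f"osd(s) {report['osds']}: ok_to_stop={report['ok_to_stop']}, "
      f"{len(inactive)} PGs would go inactive, "
      f"{len(degraded)} would only degrade")
```

This mirrors what the check itself reports: the EBUSY error counts the PGs that are or would become offline, which is exactly the length of bad_become_inactive in the full output.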