Igor,

Thank you kindly, sir! You were exactly correct on the root cause, and the explanation makes perfect sense. Thank you so much.

Marco

On Tue, Apr 29, 2025 at 1:34 PM Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> Marco,
>
> this validation was introduced in v18.2.5, as not following the rule could
> result in an OSD crash in some cases.
>
> So it's better to catch that sooner rather than later.
>
> Thanks,
> Igor
>
> On 29.04.2025 14:27, Marco Pizzolo wrote:
>
> Hi Igor,
>
> Thank you so very much for responding so quickly. Interestingly, I don't
> remember setting these values, but I did see a global-level override of
> 0.8 on one and 0.2 on another, so I removed the global overrides and am
> rebooting the server to see what happens.
>
> I should know soon enough how things are looking.
>
> I'll report back, but I don't understand why I was able to upgrade this
> cluster over the past 4-5 years from 14 --> 15 --> 16 --> 17 --> 18.2.4
> without issues, yet now, going from 18.2.4 --> 18.2.6, I am dead in the
> water.
>
> Thanks,
> Marco
>
> On Tue, Apr 29, 2025 at 1:18 PM Igor Fedotov <igor.fedotov@xxxxxxxx>
> wrote:
>
>> Hi Marco,
>>
>> the following log line (unfortunately it was cut off) sheds some light:
>>
>> "
>> Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740 -1 bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes bluestore_cache_meta_>
>> "
>>
>> It likely says that the sum of the bluestore_cache_meta_ratio +
>> bluestore_cache_kv_ratio + bluestore_cache_kv_onode_ratio config
>> parameters exceeds 1.0.
>>
>> So the parameters have to be tuned so that the sum is less than or equal
>> to 1.0.
>>
>> The default settings are:
>>
>> bluestore_cache_meta_ratio = 0.45
>> bluestore_cache_kv_ratio = 0.45
>> bluestore_cache_kv_onode_ratio = 0.04
>>
>> Thanks,
>> Igor
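
For reference, the effective values and any overrides of these options can be checked and, if needed, cleared or corrected with the ceph config CLI. A rough sketch (the option names come from Igor's list above; whether an override lives under the global or osd section depends on how it was originally set):

  ceph config dump | grep bluestore_cache           # list any overrides stored in the mon config db
  ceph config get osd bluestore_cache_meta_ratio    # show the value OSDs will pick up
  ceph config rm global bluestore_cache_meta_ratio  # drop a global override and fall back to the default
  ceph config set osd bluestore_cache_kv_ratio 0.45 # or set an explicit value so the three ratios sum to <= 1.0

The same rm/set pattern applies to bluestore_cache_kv_ratio and bluestore_cache_kv_onode_ratio; the affected OSDs can then be restarted, as Marco did above.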
>> On 29.04.2025 13:36, Marco Pizzolo wrote:
>> > Hello Everyone,
>> >
>> > I'm upgrading from 18.2.4 to 18.2.6, and I have a 4-node cluster with 8
>> > NVMes per node. Each NVMe is split into 2 OSDs. The upgrade went through
>> > the mgr, mon, and crash daemons and then began upgrading the OSDs.
>> >
>> > The OSDs it was upgrading were not coming back online.
>> >
>> > I tried rebooting, with no luck.
>> >
>> > journalctl -xe shows the following:
>> >
>> > ░░ The unit docker-02cb79ef9a657cdaa26b781966aa6d2f1d5e54cdc9efa6c5ff1f0e98c3a866e4.scope has successfully entered the 'dead' state.
>> > Apr 29 06:24:09 prdhcistonode01 dockerd[2967]: time="2025-04-29T06:24:09.282073583-04:00" level=info msg="ignoring event" container=76c56ddd668015de0022bfa2527060e64a9513>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.282129114-04:00" level=info msg="shim disconnected" id=76c56ddd668015de0022bfa2527060e64a95137>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.282219664-04:00" level=warning msg="cleaning up after shim disconnected" id=76c56ddd668015de00>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.282242484-04:00" level=info msg="cleaning up dead shim"
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740 1 mClockScheduler: set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740 0 osd.3:0.OSDShard using op scheduler mclock_scheduler, cutoff=196
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740 1 bdev(0x56046b4c8000 /var/lib/ceph/osd/ceph-3/block) open path /var/lib/cep>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.292047607-04:00" level=warning msg="cleanup warnings time=\"2025-04-29T06:24:09-04:00\" level=>
>> > Apr 29 06:24:09 prdhcistonode01 dockerd[2967]: time="2025-04-29T06:24:09.292163618-04:00" level=info msg="ignoring event" container=02cb79ef9a657cdaa26b781966aa6d2f1d5e54>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.292216428-04:00" level=info msg="shim disconnected" id=02cb79ef9a657cdaa26b781966aa6d2f1d5e54c>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.292277279-04:00" level=warning msg="cleaning up after shim disconnected" id=02cb79ef9a657cdaa2>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.292291949-04:00" level=info msg="cleaning up dead shim"
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740 1 bdev(0x56046b4c8000 /var/lib/ceph/osd/ceph-3/block) open size 640122932428>
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740 -1 bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes bluestore_cache_meta_>
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740 1 bdev(0x56046b4c8000 /var/lib/ceph/osd/ceph-3/block) close
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.303385220-04:00" level=warning msg="cleanup warnings time=\"2025-04-29T06:24:09-04:00\" level=>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740 1 mClockScheduler: set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740 0 osd.0:0.OSDShard using op scheduler mclock_scheduler, cutoff=196
>> > Apr 29 06:24:09 prdhcistonode01 bash[23144]: debug 2025-04-29T10:24:09.307+0000 7f12f08c5740 -1 osd.15 0 OSD:init: unable to mount object store
>> > Apr 29 06:24:09 prdhcistonode01 bash[23144]: debug 2025-04-29T10:24:09.307+0000 7f12f08c5740 -1 ** ERROR: osd init failed: (22) Invalid argument
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740 1 bdev(0x55d5e45f0000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/cep>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740 1 bdev(0x55d5e45f0000 /var/lib/ceph/osd/ceph-0/block) open size 640122932428>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740 -1 bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes bluestore_cache_meta_>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740 1 bdev(0x55d5e45f0000 /var/lib/ceph/osd/ceph-0/block) close
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740 1 mClockScheduler: set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740 0 osd.8:0.OSDShard using op scheduler mclock_scheduler, cutoff=196
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740 1 bdev(0x555f40688000 /var/lib/ceph/osd/ceph-8/block) open path /var/lib/cep>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740 1 bdev(0x555f40688000 /var/lib/ceph/osd/ceph-8/block) open size 640122932428>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740 -1 bluestore(/var/lib/ceph/osd/ceph-8) _set_cache_sizes bluestore_cache_meta_>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740 1 bdev(0x555f40688000 /var/lib/ceph/osd/ceph-8/block) close
>> > Apr 29 06:24:09 prdhcistonode01 systemd[1]: ceph-fbc38f5c-a3a6-11ea-805c-3b954db9ce7a@osd.12.service: Main process exited, code=exited, status=1/FAILURE
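
As an aside, the trailing ">" on the long entries above is the pager chopping each line at the terminal width, which is why the decisive _set_cache_sizes message is cut off. The full text can usually be recovered by bypassing the pager, since journalctl only invokes it when writing to a terminal, for example:

  journalctl -xe | grep _set_cache_sizes

or by scrolling right with the arrow keys inside journalctl's default pager.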
>> >
>> > Any help you can offer would be greatly appreciated. This is running in
>> > Docker:
>> >
>> > Client: Docker Engine - Community
>> >  Version:        24.0.7
>> >  API version:    1.43
>> >  Go version:     go1.20.10
>> >  Git commit:     afdd53b
>> >  Built:          Thu Oct 26 09:08:01 2023
>> >  OS/Arch:        linux/amd64
>> >  Context:        default
>> >
>> > Server: Docker Engine - Community
>> >  Engine:
>> >   Version:       24.0.7
>> >   API version:   1.43 (minimum version 1.12)
>> >   Go version:    go1.20.10
>> >   Git commit:    311b9ff
>> >   Built:         Thu Oct 26 09:08:01 2023
>> >   OS/Arch:       linux/amd64
>> >   Experimental:  false
>> >  containerd:
>> >   Version:       1.6.25
>> >   GitCommit:     d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
>> >  runc:
>> >   Version:       1.1.10
>> >   GitCommit:     v1.1.10-0-g18a0cb0
>> >  docker-init:
>> >   Version:       0.19.0
>> >   GitCommit:     de40ad0
>> >
>> > Thanks,
>> > Marco

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx