Re: 18.2.6 upgrade OSDs fail to mount

Hi Igor,

Thank you so very much for responding so quickly.  Interestingly, I don't
remember setting these values, but I did see a global-level override of
0.8 on one and 0.2 on another, so I removed the global overrides and am
rebooting the server to see what happens.

I should know soon enough how things are looking.
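In case it helps anyone else who hits this, here is roughly what I used to
find and clear the overrides (a sketch from memory; it assumes the
overrides live at the global level, as they did here, and the dump will
show which of the ratio options actually carry the stray values):

  # show any cache ratio overrides stored in the mon config database
  ceph config dump | grep bluestore_cache

  # check the effective values the OSDs will pick up
  ceph config get osd bluestore_cache_meta_ratio
  ceph config get osd bluestore_cache_kv_ratio
  ceph config get osd bluestore_cache_kv_onode_ratio

  # drop the offending global overrides so the defaults apply again
  ceph config rm global bluestore_cache_meta_ratio
  ceph config rm global bluestore_cache_kv_ratio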

I'll report back, but I don't understand why I was able to upgrade this
cluster over the past 4-5 years from 14 --> 15 --> 16 --> 17 --> 18.2.4
without issues, yet going from 18.2.4 --> 18.2.6 leaves me dead in the
water.
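
Putting rough numbers to Igor's explanation below: if one of the ratios
carries a stale 0.8 override while the others stay at their defaults, the
sum lands well above 1.0 (0.8 + 0.45 + 0.04 = 1.29), which would line up
with the _set_cache_sizes error in the log.  A quick way to sanity-check a
host before restarting the OSDs (just a sketch, assuming the ceph CLI and
an admin keyring are available on the node):

  # the three ratios must sum to <= 1.0 for the OSDs to start
  for opt in bluestore_cache_meta_ratio bluestore_cache_kv_ratio \
             bluestore_cache_kv_onode_ratio; do
      echo -n "$opt = "; ceph config get osd $opt
  done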

Thanks,
Marco

On Tue, Apr 29, 2025 at 1:18 PM Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> Hi Marco,
>
> the following log line (unfortunately it was cut off) sheds some light:
>
> "
> Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
> 2025-04-29T10:24:09.287+0000 7f6961ae9740 -1
> bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes bluestore_cache_meta_>
>
> "
>
> Likely it says that the sum of the bluestore_cache_meta_ratio +
> bluestore_cache_kv_ratio + bluestore_cache_kv_onode_ratio config
> parameters exceeds 1.0.
>
> So one has to tune the parameters so that the sum is less than or equal
> to 1.0.
>
> Default settings are:
>
> bluestore_cache_meta_ratio = 0.45
>
> bluestore_cache_kv_ratio = 0.45
>
> bluestore_cache_kv_onode_ratio = 0.04
>
>
> Thanks,
>
> Igor
>
>
>
> On 29.04.2025 13:36, Marco Pizzolo wrote:
> > Hello Everyone,
> >
> > I'm upgrading from 18.2.4 to 18.2.6, and I have a 4-node cluster with 8
> > NVMes per node.  Each NVMe is split into 2 OSDs.  The upgrade went
> > through the mgr, mon, and crash daemons and began upgrading OSDs.
> >
> > The OSDs it was upgrading were not coming back online.
> >
> > I tried rebooting, and no luck.
> >
> > journalctl -xe shows the following:
> >
> > ░░ The unit
> >
> docker-02cb79ef9a657cdaa26b781966aa6d2f1d5e54cdc9efa6c5ff1f0e98c3a866e4.scope
> > has successfully entered the 'dead' state.
> > Apr 29 06:24:09 prdhcistonode01 dockerd[2967]:
> > time="2025-04-29T06:24:09.282073583-04:00" level=info msg="ignoring
> event"
> > container=76c56ddd668015de0022bfa2527060e64a9513>
> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
> > time="2025-04-29T06:24:09.282129114-04:00" level=info msg="shim
> > disconnected" id=76c56ddd668015de0022bfa2527060e64a95137>
> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
> > time="2025-04-29T06:24:09.282219664-04:00" level=warning msg="cleaning up
> > after shim disconnected" id=76c56ddd668015de00>
> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
> > time="2025-04-29T06:24:09.282242484-04:00" level=info msg="cleaning up
> dead
> > shim"
> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
> > 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 mClockScheduler:
> > set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
> > 2025-04-29T10:24:09.287+0000 7f6961ae9740  0 osd.3:0.OSDShard using op
> > scheduler mclock_scheduler, cutoff=196
> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
> > 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 bdev(0x56046b4c8000
> > /var/lib/ceph/osd/ceph-3/block) open path /var/lib/cep>
> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
> > time="2025-04-29T06:24:09.292047607-04:00" level=warning msg="cleanup
> > warnings time=\"2025-04-29T06:24:09-04:00\" level=>
> > Apr 29 06:24:09 prdhcistonode01 dockerd[2967]:
> > time="2025-04-29T06:24:09.292163618-04:00" level=info msg="ignoring
> event"
> > container=02cb79ef9a657cdaa26b781966aa6d2f1d5e54>
> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
> > time="2025-04-29T06:24:09.292216428-04:00" level=info msg="shim
> > disconnected" id=02cb79ef9a657cdaa26b781966aa6d2f1d5e54c>
> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
> > time="2025-04-29T06:24:09.292277279-04:00" level=warning msg="cleaning up
> > after shim disconnected" id=02cb79ef9a657cdaa2>
> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
> > time="2025-04-29T06:24:09.292291949-04:00" level=info msg="cleaning up
> dead
> > shim"
> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
> > 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 bdev(0x56046b4c8000
> > /var/lib/ceph/osd/ceph-3/block) open size 640122932428>
> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
> > 2025-04-29T10:24:09.287+0000 7f6961ae9740 -1
> > bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes
> bluestore_cache_meta_>
> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
> > 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 bdev(0x56046b4c8000
> > /var/lib/ceph/osd/ceph-3/block) close
> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
> > time="2025-04-29T06:24:09.303385220-04:00" level=warning msg="cleanup
> > warnings time=\"2025-04-29T06:24:09-04:00\" level=>
> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
> > 2025-04-29T10:24:09.307+0000 7f2c10403740  1 mClockScheduler:
> > set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
> > 2025-04-29T10:24:09.307+0000 7f2c10403740  0 osd.0:0.OSDShard using op
> > scheduler mclock_scheduler, cutoff=196
> > Apr 29 06:24:09 prdhcistonode01 bash[23144]: debug
> > 2025-04-29T10:24:09.307+0000 7f12f08c5740 -1 osd.15 0 OSD:init: unable to
> > mount object store
> > Apr 29 06:24:09 prdhcistonode01 bash[23144]: debug
> > 2025-04-29T10:24:09.307+0000 7f12f08c5740 -1  ** ERROR: osd init failed:
> > (22) Invalid argument
> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
> > 2025-04-29T10:24:09.307+0000 7f2c10403740  1 bdev(0x55d5e45f0000
> > /var/lib/ceph/osd/ceph-0/block) open path /var/lib/cep>
> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
> > 2025-04-29T10:24:09.307+0000 7f2c10403740  1 bdev(0x55d5e45f0000
> > /var/lib/ceph/osd/ceph-0/block) open size 640122932428>
> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
> > 2025-04-29T10:24:09.307+0000 7f2c10403740 -1
> > bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes
> bluestore_cache_meta_>
> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
> > 2025-04-29T10:24:09.307+0000 7f2c10403740  1 bdev(0x55d5e45f0000
> > /var/lib/ceph/osd/ceph-0/block) close
> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
> > 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 mClockScheduler:
> > set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
> > 2025-04-29T10:24:09.363+0000 7f30b83b1740  0 osd.8:0.OSDShard using op
> > scheduler mclock_scheduler, cutoff=196
> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
> > 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 bdev(0x555f40688000
> > /var/lib/ceph/osd/ceph-8/block) open path /var/lib/cep>
> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
> > 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 bdev(0x555f40688000
> > /var/lib/ceph/osd/ceph-8/block) open size 640122932428>
> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
> > 2025-04-29T10:24:09.363+0000 7f30b83b1740 -1
> > bluestore(/var/lib/ceph/osd/ceph-8) _set_cache_sizes
> bluestore_cache_meta_>
> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
> > 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 bdev(0x555f40688000
> > /var/lib/ceph/osd/ceph-8/block) close
> > Apr 29 06:24:09 prdhcistonode01 systemd[1]:
> > ceph-fbc38f5c-a3a6-11ea-805c-3b954db9ce7a@osd.12.service: Main process
> > exited, code=exited, status=1/FAILURE
> >
> >
> > Any help you can offer would be greatly appreciated.  This is running in
> > docker:
> >
> > Client: Docker Engine - Community
> >   Version:           24.0.7
> >   API version:       1.43
> >   Go version:        go1.20.10
> >   Git commit:        afdd53b
> >   Built:             Thu Oct 26 09:08:01 2023
> >   OS/Arch:           linux/amd64
> >   Context:           default
> >
> > Server: Docker Engine - Community
> >   Engine:
> >    Version:          24.0.7
> >    API version:      1.43 (minimum version 1.12)
> >    Go version:       go1.20.10
> >    Git commit:       311b9ff
> >    Built:            Thu Oct 26 09:08:01 2023
> >    OS/Arch:          linux/amd64
> >    Experimental:     false
> >   containerd:
> >    Version:          1.6.25
> >    GitCommit:        d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
> >   runc:
> >    Version:          1.1.10
> >    GitCommit:        v1.1.10-0-g18a0cb0
> >   docker-init:
> >    Version:          0.19.0
> >    GitCommit:        de40ad0
> >
> > Thanks,
> > Marco
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



