Re: 18.2.6 upgrade OSDs fail to mount

Igor,

Thank you kindly, sir!

You were exactly correct on the root cause, and the explanation makes
perfect sense.

Thank you so much....

Marco


On Tue, Apr 29, 2025 at 1:34 PM Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> Marco,
>
> this validation was introduced in v18.2.5, since violating the rule could
> result in an OSD crash in some cases.
>
> So it's better to catch that sooner rather than later.
>
>
> Thanks,
>
> Igor
> On 29.04.2025 14:27, Marco Pizzolo wrote:
>
> Hi Igor,
>
> Thank you so very much for responding so quickly.  Interestingly, I don't
> remember setting these values, but I did see a global-level override of
> 0.8 on one and 0.2 on another, so I removed the global overrides and am
> rebooting the server to see what happens.
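>
> For reference, in case anyone else hits this: removing the overrides was
> roughly along these lines (adjust the who/option names to whatever
> "ceph config dump" actually shows in your cluster):
>
>     ceph config rm global bluestore_cache_meta_ratio
>     ceph config rm global bluestore_cache_kv_ratio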
>
> I should know soon enough how things are looking.
>
> I'll report back, but I don't understand why I would have been able to
> upgrade this over the past 4-5 years from 14 --> 15 --> 16 --> 17 -->
> 18.2.4 without issues, but now going from 18.2.4 --> 18.2.6 I am dead in
> the water.
>
> Thanks,
> Marco
>
> On Tue, Apr 29, 2025 at 1:18 PM Igor Fedotov <igor.fedotov@xxxxxxxx>
> wrote:
>
>> Hi Marco,
>>
>> the following log line (unfortunately it was cut off) sheds some light:
>>
>> "
>> Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
>> 2025-04-29T10:24:09.287+0000 7f6961ae9740 -1
>> bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes bluestore_cache_meta_>
>>
>> "
>>
>> Likely it says that the sum of the bluestore_cache_meta_ratio +
>> bluestore_cache_kv_ratio + bluestore_cache_kv_onode_ratio config
>> parameters exceeds 1.0.
>>
>> So the parameters have to be tuned so that the sum is less than or equal
>> to 1.0.
>>
>> Default settings are:
>>
>> bluestore_cache_meta_ratio = 0.45
>>
>> bluestore_cache_kv_ratio = 0.45
>>
>> bluestore_cache_kv_onode_ratio = 0.04
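>>
>> For example, a quick way to check for such overrides (just a sketch; the
>> exact output and which section holds the override depend on your setup) is:
>>
>>     ceph config dump | grep bluestore_cache
>>     ceph config get osd bluestore_cache_meta_ratio
>>
>> If, say, the meta ratio were overridden to 0.8 and the kv ratio to 0.2,
>> then with the default kv_onode ratio of 0.04 the sum would be 1.04 > 1.0,
>> and the OSD will now refuse to start.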
>>
>>
>> Thanks,
>>
>> Igor
>>
>>
>>
>> On 29.04.2025 13:36, Marco Pizzolo wrote:
>> > Hello Everyone,
>> >
>> > I'm upgrading from 18.2.4 to 18.2.6, and I have a 4-node cluster with 8
>> > NVMe's per node.  Each NVMe is split into 2 OSDs.  The upgrade went through
>> > the mgr, mon, crash and began upgrading OSDs.
>> >
>> > The OSDs it was upgrading were not coming back online.
>> >
>> > I tried rebooting, and no luck.
>> >
>> > journalctl -xe shows the following:
>> >
>> > ░░ The unit
>> > docker-02cb79ef9a657cdaa26b781966aa6d2f1d5e54cdc9efa6c5ff1f0e98c3a866e4.scope
>> > has successfully entered the 'dead' state.
>> > Apr 29 06:24:09 prdhcistonode01 dockerd[2967]:
>> > time="2025-04-29T06:24:09.282073583-04:00" level=info msg="ignoring
>> event"
>> > container=76c56ddd668015de0022bfa2527060e64a9513>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
>> > time="2025-04-29T06:24:09.282129114-04:00" level=info msg="shim
>> > disconnected" id=76c56ddd668015de0022bfa2527060e64a95137>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
>> > time="2025-04-29T06:24:09.282219664-04:00" level=warning msg="cleaning
>> up
>> > after shim disconnected" id=76c56ddd668015de00>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
>> > time="2025-04-29T06:24:09.282242484-04:00" level=info msg="cleaning up
>> dead
>> > shim"
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
>> > 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 mClockScheduler:
>> > set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
>> > 2025-04-29T10:24:09.287+0000 7f6961ae9740  0 osd.3:0.OSDShard using op
>> > scheduler mclock_scheduler, cutoff=196
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
>> > 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 bdev(0x56046b4c8000
>> > /var/lib/ceph/osd/ceph-3/block) open path /var/lib/cep>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
>> > time="2025-04-29T06:24:09.292047607-04:00" level=warning msg="cleanup
>> > warnings time=\"2025-04-29T06:24:09-04:00\" level=>
>> > Apr 29 06:24:09 prdhcistonode01 dockerd[2967]:
>> > time="2025-04-29T06:24:09.292163618-04:00" level=info msg="ignoring
>> event"
>> > container=02cb79ef9a657cdaa26b781966aa6d2f1d5e54>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
>> > time="2025-04-29T06:24:09.292216428-04:00" level=info msg="shim
>> > disconnected" id=02cb79ef9a657cdaa26b781966aa6d2f1d5e54c>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
>> > time="2025-04-29T06:24:09.292277279-04:00" level=warning msg="cleaning
>> up
>> > after shim disconnected" id=02cb79ef9a657cdaa2>
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
>> > time="2025-04-29T06:24:09.292291949-04:00" level=info msg="cleaning up
>> dead
>> > shim"
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
>> > 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 bdev(0x56046b4c8000
>> > /var/lib/ceph/osd/ceph-3/block) open size 640122932428>
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
>> > 2025-04-29T10:24:09.287+0000 7f6961ae9740 -1
>> > bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes bluestore_cache_meta_>
>> > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
>> > 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 bdev(0x56046b4c8000
>> > /var/lib/ceph/osd/ceph-3/block) close
>> > Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
>> > time="2025-04-29T06:24:09.303385220-04:00" level=warning msg="cleanup
>> > warnings time=\"2025-04-29T06:24:09-04:00\" level=>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
>> > 2025-04-29T10:24:09.307+0000 7f2c10403740  1 mClockScheduler:
>> > set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
>> > 2025-04-29T10:24:09.307+0000 7f2c10403740  0 osd.0:0.OSDShard using op
>> > scheduler mclock_scheduler, cutoff=196
>> > Apr 29 06:24:09 prdhcistonode01 bash[23144]: debug
>> > 2025-04-29T10:24:09.307+0000 7f12f08c5740 -1 osd.15 0 OSD:init: unable to
>> > mount object store
>> > Apr 29 06:24:09 prdhcistonode01 bash[23144]: debug
>> > 2025-04-29T10:24:09.307+0000 7f12f08c5740 -1  ** ERROR: osd init failed:
>> > (22) Invalid argument
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
>> > 2025-04-29T10:24:09.307+0000 7f2c10403740  1 bdev(0x55d5e45f0000
>> > /var/lib/ceph/osd/ceph-0/block) open path /var/lib/cep>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
>> > 2025-04-29T10:24:09.307+0000 7f2c10403740  1 bdev(0x55d5e45f0000
>> > /var/lib/ceph/osd/ceph-0/block) open size 640122932428>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
>> > 2025-04-29T10:24:09.307+0000 7f2c10403740 -1
>> > bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes bluestore_cache_meta_>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
>> > 2025-04-29T10:24:09.307+0000 7f2c10403740  1 bdev(0x55d5e45f0000
>> > /var/lib/ceph/osd/ceph-0/block) close
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
>> > 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 mClockScheduler:
>> > set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
>> > 2025-04-29T10:24:09.363+0000 7f30b83b1740  0 osd.8:0.OSDShard using op
>> > scheduler mclock_scheduler, cutoff=196
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
>> > 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 bdev(0x555f40688000
>> > /var/lib/ceph/osd/ceph-8/block) open path /var/lib/cep>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
>> > 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 bdev(0x555f40688000
>> > /var/lib/ceph/osd/ceph-8/block) open size 640122932428>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
>> > 2025-04-29T10:24:09.363+0000 7f30b83b1740 -1
>> > bluestore(/var/lib/ceph/osd/ceph-8) _set_cache_sizes bluestore_cache_meta_>
>> > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
>> > 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 bdev(0x555f40688000
>> > /var/lib/ceph/osd/ceph-8/block) close
>> > Apr 29 06:24:09 prdhcistonode01 systemd[1]:
>> > ceph-fbc38f5c-a3a6-11ea-805c-3b954db9ce7a@osd.12.service: Main process
>> > exited, code=exited, status=1/FAILURE
>> >
>> >
>> > Any help you can offer would be greatly appreciated.  This is running in
>> > docker:
>> >
>> > Client: Docker Engine - Community
>> >   Version:           24.0.7
>> >   API version:       1.43
>> >   Go version:        go1.20.10
>> >   Git commit:        afdd53b
>> >   Built:             Thu Oct 26 09:08:01 2023
>> >   OS/Arch:           linux/amd64
>> >   Context:           default
>> >
>> > Server: Docker Engine - Community
>> >   Engine:
>> >    Version:          24.0.7
>> >    API version:      1.43 (minimum version 1.12)
>> >    Go version:       go1.20.10
>> >    Git commit:       311b9ff
>> >    Built:            Thu Oct 26 09:08:01 2023
>> >    OS/Arch:          linux/amd64
>> >    Experimental:     false
>> >   containerd:
>> >    Version:          1.6.25
>> >    GitCommit:        d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
>> >   runc:
>> >    Version:          1.1.10
>> >    GitCommit:        v1.1.10-0-g18a0cb0
>> >   docker-init:
>> >    Version:          0.19.0
>> >    GitCommit:        de40ad0
>> >
>> > Thanks,
>> > Marco
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



