Re: Disk failure (with osds failure) cause 'unrelated|different' osd device to crash

Hi,


The issue is that one NVMe device failed, so the OSDs on it (let's say osd.1, osd.2, osd.3 and osd.4) all crashed, and in the meantime another OSD (osd.108) on the same host crashed too. This has happened twice, with two different failed NVMe devices, each time with the same assert_line and assert_file as shown below.

So why did this happen just when one NVMe device failed? Are the two events related?

I can't get a clear picture of this...


Regards

On 8/21/25 15:57, Anthony D'Atri wrote:

We had a ceph cluster version 18.2.7 deployed with four osds per nvme device.
Post-Quincy the benefits of multiple OSDs per NVMe device are mostly obviated.

Or are you saying that you have HDD OSDs that offload WAL+DB?

Two weeks ago, we lost one hard drive (so 4 OSDs). Just after those OSDs crashed, another "healthy" OSD crashed as well (a few minutes after the initial hard drive failure).

This week we lost another hard drive (4 more OSDs), and again the same OSD crashed a few minutes later.

Below is the crash information; it is identical for both crashes:


  ceph crash info 2025-08-21T00:31:13.929426Z_576e6e42-7c6b-49b8-90f1-9d51730f8ac2
{
     "assert_condition": "r == 0",
     "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.7/rpm/el9/BUILD/ceph-18.2.7/src/os/bluestore/BlueStore.cc",
     "assert_func": "void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)",
     "assert_line": 12944,
     "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.7/rpm/el9/BUILD/ceph-18.2.7/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7f5fba6a3640 time 2025-08-21T00:31:13.926894+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.7/rpm/el9/BUILD/ceph-18.2.7/src/os/bluestore/BlueStore.cc: 12944: FAILED ceph_assert(r == 0)\n",
     "assert_thread_name": "bstore_kv_sync",
     "backtrace": [
         "/lib64/libc.so.6(+0x3ebf0) [0x7f5fce5a5bf0]",
         "/lib64/libc.so.6(+0x8bf5c) [0x7f5fce5f2f5c]",
         "raise()",
         "abort()",
         "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x169) [0x560bc2d0e44d]",
         "/usr/bin/ceph-osd(+0x3cc5ae) [0x560bc2d0e5ae]",
         "/usr/bin/ceph-osd(+0x3b8864) [0x560bc2cfa864]",
         "(BlueStore::_kv_sync_thread()+0x1073) [0x560bc32bb623]",
         "/usr/bin/ceph-osd(+0x8fd3b1) [0x560bc323f3b1]",
         "/lib64/libc.so.6(+0x8a21a) [0x7f5fce5f121a]",
         "/lib64/libc.so.6(+0x10f290) [0x7f5fce676290]"
     ],
     "ceph_version": "18.2.7",
     "crash_id": "2025-08-21T00:31:13.929426Z_576e6e42-7c6b-49b8-90f1-9d51730f8ac2",
     "entity_name": "osd.108",
     "os_id": "centos",
     "os_name": "CentOS Stream",
     "os_version": "9",
     "os_version_id": "9",
     "process_name": "ceph-osd",
     "stack_sig": "0080731b49e5583e6d168903c1ea7df8bc2caded6b5e24ee4381077d54b045e2",
     "timestamp": "2025-08-21T00:31:13.929426Z",
     "utsname_hostname": "pub1-cephosd-9",
     "utsname_machine": "x86_64",
     "utsname_release": "6.1.0-31-amd64",
     "utsname_sysname": "Linux",
     "utsname_version": "#1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07)"
}
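
To check whether several crash reports really share the same failure point, one option is to group them by signature. The following is only a minimal sketch, assuming each report has already been parsed from the JSON printed by `ceph crash info <id>` into a Python dict. The field names (`stack_sig`, `assert_file`, `assert_line`, `entity_name`, `timestamp`) are taken from the output above; the function name `group_crashes` and the second sample record are hypothetical and only there for illustration.

```python
from collections import defaultdict
from os.path import basename


def group_crashes(crashes):
    """Group parsed crash reports by (stack_sig, source file name, assert line).

    Each value is a list of (entity_name, timestamp) tuples, in input order.
    Missing fields are tolerated and show up as None (or "" for the file name).
    """
    groups = defaultdict(list)
    for crash in crashes:
        key = (
            crash.get("stack_sig"),
            basename(crash.get("assert_file", "")),
            crash.get("assert_line"),
        )
        groups[key].append((crash.get("entity_name"), crash.get("timestamp")))
    return dict(groups)


# First record: values copied from the crash report above (path shortened
# to the file name). Second record: hypothetical placeholder, NOT real data.
SAMPLE = [
    {
        "stack_sig": "0080731b49e5583e6d168903c1ea7df8bc2caded6b5e24ee4381077d54b045e2",
        "assert_file": "BlueStore.cc",
        "assert_line": 12944,
        "entity_name": "osd.108",
        "timestamp": "2025-08-21T00:31:13.929426Z",
    },
    {
        "stack_sig": "hypothetical-other-signature",
        "entity_name": "osd.X",
        "timestamp": "hypothetical-timestamp",
    },
]

if __name__ == "__main__":
    for key, hits in group_crashes(SAMPLE).items():
        print(key, hits)
```

If the crashes of osd.108 and those of the OSDs on the failed device fall into the same group, they hit the same assertion; if they fall into different groups, the failures are at least not identical at the code level.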


Regards
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


