Hi Sönke,
have you verified the OSD's metadata?
ceph osd metadata 0 -f json | jq -r '.bluefs_dedicated_db,.devices'
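For an OSD with a dedicated DB I would expect output roughly like this (just an illustration, the exact values and device names will differ on your cluster):

  1
  sdaa,sdc

i.e. bluefs_dedicated_db should be 1 and the devices list should include the disk backing the new DB LV, not the old partition's disk.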
And I would also check if the labels are correct:
cephadm shell -- ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block | grep osd_key
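It might also be worth running show-label against the DB volume itself (assuming the block.db symlink in the OSD directory points at the new LV):

  cephadm shell -- ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db

The osd_uuid in both labels should match the OSD's fsid.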
We wrote two blog posts ([0], [1]) about RocksDB migration; maybe they can help you figure out what's wrong with the OSD(s).
My suspicion is that the LV tags were not set properly after the
migration, but it might be something else, of course.
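If the tags do turn out to be stale, something along these lines is what I would look at (a rough sketch only, using the names from your output below; double-check the current tag values with lvs first, since --deltag has to match the existing tag exactly):

  # show the ceph.* tags ceph-volume stored on the LVs
  lvs -o lv_name,vg_name,lv_tags | grep 93811afc

  # UUID of the new DB LV (needed for the ceph.db_uuid tag)
  lvs -o lv_name,lv_uuid ceph-blockdb-01/osd-db-01

  # repoint the block LV from the old partition to the new DB LV
  # (and update ceph.db_uuid the same way)
  lvchange \
    --deltag "ceph.db_device=/dev/sdai1" \
    --addtag "ceph.db_device=/dev/ceph-blockdb-01/osd-db-01" \
    ceph-402b182b-c5bd-416d-bde6-668e772b0c4c/osd-block-93811afc-17a3-4458-8e00-506eb9c92cb0

As far as I know, the new DB LV itself also needs the usual ceph.* tags, otherwise ceph-volume lvm list/activate won't associate it with the OSD.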
Regards,
Eugen
[0]
https://heiterbiswolkig.blogs.nde.ag/2018/04/08/migrating-bluestores-block-db/
[1]
https://heiterbiswolkig.blogs.nde.ag/2025/02/05/cephadm-migrate-db-wal-to-new-device/
Quoting Soenke Schippmann <schippmann@xxxxxxxxxxxxx>:
Hi,
after upgrading one of our Ceph clusters from 18.2.7 to 19.2.3, some OSDs fail to start. For these OSDs, the DB devices were moved manually from a partition to an LVM volume months ago.
OSD log shows:
2025-09-04T11:38:22.055+0000 7fec1bbc4740 0 set uid:gid to 167:167 (ceph:ceph)
2025-09-04T11:38:22.055+0000 7fec1bbc4740 0 ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable), process ceph-osd, pid 7
2025-09-04T11:38:22.055+0000 7fec1bbc4740 0 pidfile_write: ignore empty --pid-file
2025-09-04T11:38:22.055+0000 7fec1bbc4740 1 bdev(0x556f24a77400 /var/lib/ceph/osd/ceph-256/block) open path /var/lib/ceph/osd/ceph-256/block
2025-09-04T11:38:22.055+0000 7fec1bbc4740 -1 bdev(0x556f24a77400 /var/lib/ceph/osd/ceph-256/block) open stat got: (1) Operation not permitted
2025-09-04T11:38:22.055+0000 7fec1bbc4740 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-256: (2) No such file or directory
The block and block.db links within the OSD path get deleted after each startup attempt, and recreating them manually does not help. ceph-bluestore-tool fsck --path... shows no errors as long as the block and block.db links are recreated first.
Running "ceph-volume activate --osd-id 256" manually within cephadm
shell fails with the follwing error:
--> Failed to activate via LVM: could not find db with uuid
6d676bcd-1f3c-e740-8fdf-6a5156605a3f
ceph-volume lvm list shows the outdated db uuid and db device:
====== osd.256 ======

  [block]       /dev/ceph-402b182b-c5bd-416d-bde6-668e772b0c4c/osd-block-93811afc-17a3-4458-8e00-506eb9c92cb0

      block device              /dev/ceph-402b182b-c5bd-416d-bde6-668e772b0c4c/osd-block-93811afc-17a3-4458-8e00-506eb9c92cb0
      block uuid                GRd7Zo-dXdx-23Wf-507d-dHPw-6UOA-oaf7Qy
      cephx lockbox secret
      cluster fsid              1f3b3198-08b1-418c-a279-7050a2eb1ce3
      cluster name              ceph
      crush device class        None
      db device                 /dev/sdai1
      db uuid                   6d676bcd-1f3c-e740-8fdf-6a5156605a3f
      encrypted                 0
      osd fsid                  93811afc-17a3-4458-8e00-506eb9c92cb0
      osd id                    256
      type                      block
      vdo                       0
      devices                   /dev/sdc

  [db]          /dev/sdai1

      PARTUUID                  6d676bcd-1f3c-e740-8fdf-6a5156605a3f
The DB was migrated from partition /dev/sdai1 to the LVM volume ceph-blockdb-01/osd-db-01 on /dev/sdaa months ago and was running fine with Ceph 18.2. The migration was done manually with "ceph-bluestore-tool bluefs-bdev-migrate" ("ceph-volume lvm migrate" failed, though).
Is there any way to fix this?
Best,
Sönke
--
Sönke Schippmann
Universität Bremen
Dezernat 8 - IT Service Center
Referat 82 Serverbetrieb
Office address:
Universität Bremen
Dez. 8-Bi, SFG 1390
Enrique-Schmidt-Str. 7
28359 Bremen
E-Mail: schippmann@xxxxxxxxxxxxx
Tel: +49 421 218-61327
Fax: +49 421 218-98-61327
http://www.uni-bremen.de/zfn/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx