Hi Soenke,
Migrating the DB volume with ceph-bluestore-tool alone was the wrong step: it
does not set up the LV tags on the underlying volumes, which prevents proper
OSD device detection after reboot.
You have to set these tags manually with "lvchange --addtag". To a large
degree the DB tags are the same as the ones on the block device, but some
additional adjustments are still needed.
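A rough, untested sketch of what that could look like for osd.256, filled in
with the values from your "ceph-volume lvm list" output and the new DB LV
ceph-blockdb-01/osd-db-01 you mentioned. Treat every value as something to
verify against a healthy OSD first; in particular I believe the db uuid for
an LV-backed DB has to be the LVM LV UUID (not a PARTUUID):

  # LV UUID of the new DB LV -- this becomes the ceph.db_uuid tag value
  lvs --noheadings -o lv_uuid ceph-blockdb-01/osd-db-01

  DB_LV=/dev/ceph-blockdb-01/osd-db-01
  BLOCK_LV=/dev/ceph-402b182b-c5bd-416d-bde6-668e772b0c4c/osd-block-93811afc-17a3-4458-8e00-506eb9c92cb0

  # give the new DB LV the same ceph.* tags as the block LV,
  # but with ceph.type=db and the db_* tags pointing at the new LV
  lvchange --addtag ceph.type=db $DB_LV
  lvchange --addtag ceph.osd_id=256 $DB_LV
  lvchange --addtag ceph.osd_fsid=93811afc-17a3-4458-8e00-506eb9c92cb0 $DB_LV
  lvchange --addtag ceph.cluster_fsid=1f3b3198-08b1-418c-a279-7050a2eb1ce3 $DB_LV
  lvchange --addtag ceph.cluster_name=ceph $DB_LV
  lvchange --addtag ceph.block_device=$BLOCK_LV $DB_LV
  lvchange --addtag ceph.block_uuid=GRd7Zo-dXdx-23Wf-507d-dHPw-6UOA-oaf7Qy $DB_LV
  lvchange --addtag ceph.db_device=$DB_LV $DB_LV
  lvchange --addtag ceph.db_uuid=<lv_uuid from the lvs call above> $DB_LV
  # ...plus whatever other ceph.* tags the block LV carries
  # (encrypted, crush_device_class, vdo, cephx_lockbox_secret, ...)

  # replace the stale db tags on the block LV, which still point to /dev/sdai1
  lvchange --deltag ceph.db_device=/dev/sdai1 $BLOCK_LV
  lvchange --deltag ceph.db_uuid=6d676bcd-1f3c-e740-8fdf-6a5156605a3f $BLOCK_LV
  lvchange --addtag ceph.db_device=$DB_LV $BLOCK_LV
  lvchange --addtag ceph.db_uuid=<lv_uuid from the lvs call above> $BLOCK_LV

If the tags are right, "ceph-volume lvm list" should show the new DB LV for
osd.256 instead of /dev/sdai1 before you retry the activation.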
Unfortunately, AFAIK there is no complete how-to for this available. One of
Eugene's links covers the topic only partly, so you should rather use an
existing, correctly deployed OSD as a reference.
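For example (assuming you have at least one OSD whose DB LV was created by
ceph-volume itself), dumping its tags gives you a template to compare
against; osd.255 below is just a placeholder id:

  # a correctly deployed OSD carries a full ceph.* tag set on both
  # its block LV and its db LV
  lvs -o lv_name,vg_name,lv_tags | grep ceph.osd_id=255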
Thanks,
Igor
On 9/4/2025 3:55 PM, Soenke Schippmann wrote:
Hi,
after upgrading one of our Ceph clusters from 18.2.7 to 19.2.3, some
OSDs fail to start. For these OSDs, the DB devices were moved manually
from a partition to an LVM volume months ago.
OSD log shows:
2025-09-04T11:38:22.055+0000 7fec1bbc4740 0 set uid:gid to 167:167 (ceph:ceph)
2025-09-04T11:38:22.055+0000 7fec1bbc4740 0 ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable), process ceph-osd, pid 7
2025-09-04T11:38:22.055+0000 7fec1bbc4740 0 pidfile_write: ignore empty --pid-file
2025-09-04T11:38:22.055+0000 7fec1bbc4740 1 bdev(0x556f24a77400 /var/lib/ceph/osd/ceph-256/block) open path /var/lib/ceph/osd/ceph-256/block
2025-09-04T11:38:22.055+0000 7fec1bbc4740 -1 bdev(0x556f24a77400 /var/lib/ceph/osd/ceph-256/block) open stat got: (1) Operation not permitted
2025-09-04T11:38:22.055+0000 7fec1bbc4740 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-256: (2) No such file or directory
The block and block.db symlinks in the OSD path get deleted after each
startup attempt, and recreating them manually does not help.
"ceph-bluestore-tool fsck --path..." shows no errors if the links to
block and block.db are recreated.
Running "ceph-volume activate --osd-id 256" manually within cephadm
shell fails with the follwing error:
--> Failed to activate via LVM: could not find db with uuid
6d676bcd-1f3c-e740-8fdf-6a5156605a3f
"ceph-volume lvm list" shows the outdated db uuid and db device:
====== osd.256 ======

  [block]       /dev/ceph-402b182b-c5bd-416d-bde6-668e772b0c4c/osd-block-93811afc-17a3-4458-8e00-506eb9c92cb0

      block device              /dev/ceph-402b182b-c5bd-416d-bde6-668e772b0c4c/osd-block-93811afc-17a3-4458-8e00-506eb9c92cb0
      block uuid                GRd7Zo-dXdx-23Wf-507d-dHPw-6UOA-oaf7Qy
      cephx lockbox secret
      cluster fsid              1f3b3198-08b1-418c-a279-7050a2eb1ce3
      cluster name              ceph
      crush device class        None
      db device                 /dev/sdai1
      db uuid                   6d676bcd-1f3c-e740-8fdf-6a5156605a3f
      encrypted                 0
      osd fsid                  93811afc-17a3-4458-8e00-506eb9c92cb0
      osd id                    256
      type                      block
      vdo                       0
      devices                   /dev/sdc

  [db]          /dev/sdai1

      PARTUUID                  6d676bcd-1f3c-e740-8fdf-6a5156605a3f
The DB was migrated from the partition /dev/sdai1 to the LVM volume
ceph-blockdb-01/osd-db-01 on /dev/sdaa months ago and was running fine
with Ceph 18.2. The migration was done manually with
"ceph-bluestore-tool bluefs-bdev-migrate" ("ceph-volume lvm migrate"
failed, though).
Is there any way to fix this?
Best,
Sönke
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx