> I've modified Ceph to allow using a RAID1 array as the metadata device. Specifically, I've updated ceph_volume/util/device.py and ceph_volume/util/disk.py to recognize the "raid1" device type as a valid device for OSDs and metadata storage.

IMHO this isn't the right layer for it. An admin wishing to mirror the offload device should do so (via MD or (sigh) an HBA) and present that device in the OSD spec. YMMV. There's a quick lsblk sketch at the end of this message showing the TYPE value those valid_types checks are matched against.

> The changes are listed below.
>
> My questions:
>
> - What are the risks or issues that may arise from using RAID1 for metadata (e.g., performance, reliability, data integrity)?
> - Why could this be a bad idea, and in what scenarios could it be beneficial?

Conventional wisdom has instead favored offloading fewer OSDs to each SSD, to reduce write amplification and the blast radius.

> My test setup:
>
> - Ceph version: 18.2.4 Reef (stable)
> - OS: Ubuntu 22.04.5 LTS
> - Hardware: 1 host, a RAID1 array on two NVMe SSDs (/dev/nvme0n1 and /dev/nvme0n2, 19.98 GiB) via mdadm for block.db, and 4 HDDs for OSD data

That's an unusual size, and as written that appears to be an MD array of two namespaces on the same NVMe device. Is that what you intended? What SSD model is this? Is it serving purposes other than OSD WAL+DB offload?

> - Additional details: Tested on a small cluster running on VMs; no noticeable performance changes so far, but I'm concerned about long-term implications.

Honestly, IMHO the economics and hassle would favor an all-NVMe chassis and monolithic NVMe OSDs, especially once you factor in the cost of the HBA.

> Code changes:
>
> diff --git a/src/ceph-volume/ceph_volume/util/device.py b/src/ceph-volume/ceph_volume/util/device.py
> index 1b52774d1a1..148a8e326f8 100644
> --- a/src/ceph-volume/ceph_volume/util/device.py
> +++ b/src/ceph-volume/ceph_volume/util/device.py
> @@ -237,7 +237,7 @@ class Device(object):
>              self.disk_api = dev
>              device_type = dev.get('TYPE', '')
>              # always check is this is an lvm member
> -            valid_types = ['part', 'disk', 'mpath']
> +            valid_types = ['part', 'disk', 'mpath', 'raid1']
>              if allow_loop_devices():
>                  valid_types.append('loop')
>              if device_type in valid_types:
>
> @@ -489,7 +489,7 @@ class Device(object):
>          elif self.blkid_api:
>              api = self.blkid_api
>          if api:
> -            valid_types = ['disk', 'device', 'mpath']
> +            valid_types = ['disk', 'device', 'mpath', 'raid1']
>              if allow_loop_devices():
>                  valid_types.append('loop')
>              return self.device_type in valid_types
>
> diff --git a/src/ceph-volume/ceph_volume/util/disk.py b/src/ceph-volume/ceph_volume/util/disk.py
> index 2984c391d06..5dafd6dc62a 100644
> --- a/src/ceph-volume/ceph_volume/util/disk.py
> +++ b/src/ceph-volume/ceph_volume/util/disk.py
> @@ -362,7 +362,7 @@ def is_device(dev):
>
>      TYPE = lsblk(dev).get('TYPE')
>      if TYPE:
> -        return TYPE in ['disk', 'mpath']
> +        return TYPE in ['disk', 'mpath', 'raid1']
>
>      # fallback to stat
>      return _stat_is_device(os.lstat(dev).st_mode) and not is_partition(dev)
>
> Any feedback is really appreciated, as all of us care about not losing our precious bytes :)
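For reference, here is a minimal standalone sketch (not part of ceph-volume; the device paths below are just examples) of the lsblk TYPE value that the valid_types checks above are matched against. A plain disk or NVMe namespace reports "disk", while an mdadm mirror reports "raid1", which is why unpatched ceph-volume refuses it; a mirror built by an HBA is presented to the OS as an ordinary "disk" and passes the existing check.

import json
import subprocess


def lsblk_type(dev):
    """Return the lsblk TYPE (e.g. 'disk', 'part', 'mpath', 'raid1') for dev."""
    out = subprocess.run(
        ["lsblk", "--json", "--nodeps", "-o", "NAME,TYPE", dev],
        check=True, capture_output=True, text=True,
    ).stdout
    # --nodeps restricts the output to the named device itself, no children.
    return json.loads(out)["blockdevices"][0]["type"]


if __name__ == "__main__":
    # Example paths only; substitute your own devices.
    for dev in ("/dev/nvme0n1", "/dev/md0"):
        print(dev, "->", lsblk_type(dev))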