md raid0 Direct IO DMA alignment

Hi, I have a question about DMA alignment for MD block devices (RAID0 in our case, but the same applies to other MD array types). Many underlying block devices have DMA alignment requirements that are more permissive than the value the MD device ends up advertising. For example, in https://github.com/torvalds/linux/commit/52fde2c07da606f3f120af4f734eadcfb52b04be#diff-dc92ff74575224dc8a460fa8ea47dd00968c082be4205ecc672530e116a0043bL1776 the NVMe controller DMA requirements were relaxed to only require 8 byte alignment on the buffer provided for Direct IO.

However, when NVMe devices (or any other devices with less restrictive DMA alignment) back an MD device (RAID0 in our case), the dma_alignment on the MD block device queue is set to a much more restrictive value than the member devices support. From initial exploration, I don't see why that is necessary. If the underlying devices accept less strictly aligned Direct IO buffers, and the sector/block size is a multiple of that alignment, then every address handed off to the backing devices will be correctly aligned. For example, even if the buffer is split across multiple stripes of an mdraid array, the IO is sector aligned on disk, so each split falls a whole number of sectors from the start of the buffer and that address is still correctly aligned.
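To make the alignment argument above concrete, here is a small user-space sketch (not kernel code). It assumes dma_alignment is a mask of the form "alignment - 1", and the helper names dma_aligned() and all_sector_offsets_aligned() are made up for illustration. It checks that when SECTOR_SIZE is a multiple of the required alignment, every sector boundary inside an aligned buffer is itself aligned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u	/* same value as the kernel's SECTOR_SIZE */

/* True if addr satisfies a dma_alignment-style mask (alignment - 1). */
bool dma_aligned(uintptr_t addr, unsigned int mask)
{
	return (addr & mask) == 0;
}

/*
 * Hypothetical helper: walk a buffer of nr_sectors sectors starting at
 * `start` and check that every sector boundary inside it is aligned to
 * `mask`. A split on a sector boundary (for example at a RAID0 chunk
 * edge) always begins at one of these addresses.
 */
bool all_sector_offsets_aligned(uintptr_t start, size_t nr_sectors,
				unsigned int mask)
{
	for (size_t i = 0; i < nr_sectors; i++)
		if (!dma_aligned(start + i * SECTOR_SIZE, mask))
			return false;
	return true;
}
```

With a hypothetical mask of 7 (8-byte alignment), a buffer at address 0x1008 passes this check for any number of sectors, while the same buffer is rejected outright by a mask of SECTOR_SIZE - 1.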

Within the md driver and block layer, setting up the md block device queue limits starts with md_init_stacking_limits(), which in turn takes its default values from blk_set_stacking_limits() here: https://github.com/torvalds/linux/blob/9afe652958c3ee88f24df1e4a97f298afce89407/block/blk-settings.c#L42. The DMA alignment requirement initialized there (SECTOR_SIZE - 1) is far stricter than what many/most actual backing devices require. When the md layer later calls mddev_stack_rdev_limits(), that calls queue_limits_stack_bdev(), which takes the max of the dma_alignment in the current queue limits and that of the next device in the mddev.
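The resulting values can be compared from user space. The sketch below assumes a kernel recent enough to expose the queue/dma_alignment attribute in sysfs; read_queue_attr() is a made-up helper name, and it simply parses one unsigned integer from a file.

```c
#include <stdio.h>

/*
 * Read a single unsigned integer from a sysfs-style attribute file.
 * Returns 0 on success and stores the value in *out, -1 on any error.
 */
int read_queue_attr(const char *path, unsigned int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%u", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling it on, say, /sys/block/md0/queue/dma_alignment and /sys/block/nvme0n1/queue/dma_alignment (device names are placeholders) would show whether the array reports a stricter mask than its members.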

It seems that rather than setting dma_alignment to SECTOR_SIZE - 1 in md_init_stacking_limits(), it should be set to zero; then, as queue_limits_stack_bdev() is called on each backing device, dma_alignment will be raised to the largest value among all backing devices. Are there any thoughts/concerns about changing the mddev dma_alignment computation to track the underlying backing devices more closely, without today's SECTOR_SIZE - 1 lower bound?
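As a rough illustration of the difference, here is a user-space model of the stacking step only. struct limits_model, stack_one() and stacked_dma_alignment() are stand-ins invented for this sketch, not the kernel's struct queue_limits or its functions, and the model ignores any other validation the block layer applies to the limits.

```c
#include <stddef.h>

#define SECTOR_SIZE 512u

/* Simplified stand-in for the one queue limit this question is about. */
struct limits_model {
	unsigned int dma_alignment;	/* mask: required alignment - 1 */
};

/* Models the max() step described for queue_limits_stack_bdev(). */
void stack_one(struct limits_model *top, const struct limits_model *bottom)
{
	if (bottom->dma_alignment > top->dma_alignment)
		top->dma_alignment = bottom->dma_alignment;
}

/* Stack n member devices onto an array whose mask starts at `initial`. */
unsigned int stacked_dma_alignment(unsigned int initial,
				   const struct limits_model *members,
				   size_t n)
{
	struct limits_model top = { .dma_alignment = initial };

	for (size_t i = 0; i < n; i++)
		stack_one(&top, &members[i]);
	return top.dma_alignment;
}
```

With two members that each have a hypothetical mask of 7, an initial value of SECTOR_SIZE - 1 leaves the array at 511, while an initial value of 0 yields 7. If any member itself requires 511, the array still ends up at 511 in both cases.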

Regards,
   Jason




