Keith Busch <kbusch@xxxxxxxxxx> writes:

> On Mon, Aug 25, 2025 at 02:07:15PM +0200, Jan Kara wrote:
>> On Fri 22-08-25 18:57:08, Ritesh Harjani wrote:
>> > Keith Busch <kbusch@xxxxxxxx> writes:
>> > >
>> > >   - EXT4 falls back to buffered io for writes but not for reads.
>> >
>> > ++linux-ext4 to get any historical context behind why the difference of
>> > behaviour in reads v/s writes for EXT4 DIO.
>>
>> Hum, how did you test? Because in the basic testing I did (with vanilla
>> kernel) I get EINVAL when doing unaligned DIO write in ext4... We should be
>> falling back to buffered IO only if the underlying file itself does not
>> support any kind of direct IO.
>
> Simple test case (dio-offset-test.c) below.
>
> I also ran this on vanilla kernel and got these results:
>
>   # mkfs.ext4 /dev/vda
>   # mount /dev/vda /mnt/ext4/
>   # make dio-offset-test
>   # ./dio-offset-test /mnt/ext4/foobar
>   write: Success
>   read: Invalid argument
>
> I tracked the "write: Success" down to ext4's handling for the "special"
> -ENOTBLK error after ext4_want_directio_fallback() returns "true".
>

Right. Ext4 has a fallback only for DIO writes, but not for DIO reads...

static inline bool ext4_want_directio_fallback(unsigned flags, ssize_t written)
{
	/* must be a directio to fall back to buffered */
	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) !=
		    (IOMAP_WRITE | IOMAP_DIRECT))
		return false;
	...
}

So basically the path is ext4_file_[read|write]_iter() -> iomap_dio_rw
    -> iomap_dio_bio_iter()
        -> return -EINVAL. i.e. from...

	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
		return -EINVAL;

EXT4 then falls back to buffered-io only for writes, but not for reads.
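
To make the read/write asymmetry concrete, here is a minimal userspace sketch of just the flags test shown in ext4_want_directio_fallback(). It is not kernel code: the MODEL_* values and the function name are hypothetical stand-ins (the real IOMAP_* flags are defined in include/linux/iomap.h), and the elided remainder of the real function is deliberately not reproduced, so a "true" here only means "not ruled out by the flags check".

```c
#include <stdbool.h>

/* Hypothetical stand-ins for IOMAP_WRITE / IOMAP_DIRECT (assumed values). */
#define MODEL_IOMAP_WRITE  (1u << 0)
#define MODEL_IOMAP_DIRECT (1u << 1)

/*
 * Models only the first check of ext4_want_directio_fallback():
 * both the write and the direct flag must be set, otherwise the
 * fallback to buffered I/O is refused.
 */
static bool model_passes_fallback_flag_check(unsigned flags)
{
	unsigned need = MODEL_IOMAP_WRITE | MODEL_IOMAP_DIRECT;

	return (flags & need) == need;
}
```

A DIO read carries the direct flag without the write flag, so under this model it never gets past the check, which would be consistent with the "read: Invalid argument" result above.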
-ritesh

> dio-offset-test.c:
> ---
> #ifndef _GNU_SOURCE
> #define _GNU_SOURCE
> #endif
>
> #include <sys/uio.h>
> #include <err.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
>
> int main(int argc, char **argv)
> {
> 	unsigned int pagesize;
> 	struct iovec iov[2];
> 	int ret, fd;
> 	void *buf;
>
> 	if (argc < 2)
> 		err(EINVAL, "usage: %s <file>", argv[0]);
>
> 	pagesize = sysconf(_SC_PAGE_SIZE);
> 	ret = posix_memalign((void **)&buf, pagesize, 2 * pagesize);
> 	if (ret)
> 		err(errno, "%s: failed to allocate buf", __func__);
>
> 	fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC | O_DIRECT);
> 	if (fd < 0)
> 		err(errno, "%s: failed to open %s", __func__, argv[1]);
>
> 	iov[0].iov_base = buf;
> 	iov[0].iov_len = 256;
> 	iov[1].iov_base = buf + pagesize;
> 	iov[1].iov_len = 256;
> 	ret = pwritev(fd, iov, 2, 0);
> 	perror("write");
>
> 	ret = preadv(fd, iov, 2, 0);
> 	perror("read");
>
> 	return 0;
> }
> --
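
For reference, a simplified userspace model of the alignment test in iomap_dio_bio_iter() quoted earlier in this mail, so the iovecs from dio-offset-test.c can be checked by hand. This is only a sketch under stated assumptions: the function name is hypothetical, lbs stands in for bdev_logical_block_size(), and the per-segment rule (address checked against an assumed alignment mask, length against lbs - 1) is my assumption about what bdev_iter_is_aligned() enforces, not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Returns true if the request would pass the modelled check.
 * lbs:       assumed logical block size, must be a power of two
 * addr_mask: assumed memory alignment mask for segment addresses
 */
static bool dio_model_is_aligned(off_t pos, const struct iovec *iov,
				 int iovcnt, size_t lbs, size_t addr_mask)
{
	size_t length = 0;
	int i;

	assert(lbs && (lbs & (lbs - 1)) == 0);

	for (i = 0; i < iovcnt; i++) {
		/* assumed per-segment rule: address and length both aligned */
		if ((uintptr_t)iov[i].iov_base & addr_mask)
			return false;
		if (iov[i].iov_len & (lbs - 1))
			return false;
		length += iov[i].iov_len;
	}

	/* mirrors "(pos | length) & (logical_block_size - 1)" */
	return (((uint64_t)pos | length) & (lbs - 1)) == 0;
}
```

With an assumed lbs of 512, the two 256-byte segments at offset 0 from the test add up to a block-aligned total of 512 bytes, yet the model still rejects them because each individual segment length is not a multiple of 512. Two 512-byte segments at offset 0 pass.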