On Fri, Jul 18, 2025 at 09:15:07AM +0100, John Garry wrote: > On 17/07/2025 17:02, Darrick J. Wong wrote: > > On Thu, Jul 17, 2025 at 03:19:00PM +0000, John Garry wrote: > > > Atomic writes are not currently supported for DAX, but two problems exist: > > > - we may go down DAX write path for IOCB_ATOMIC, which does not handle > > > IOCB_ATOMIC properly > > > - we report non-zero atomic write limits in statx (for DAX inodes) > > > > > > We may want atomic writes support on DAX in future, but just disallow for > > > now. > > > > > > For this, ensure when IOCB_ATOMIC is set that we check the write size > > > versus the atomic write min and max before branching off to the DAX write > > > path. This is not strictly required for DAX, as we should not get this far > > > in the write path as FMODE_CAN_ATOMIC_WRITE should not be set. > > > > > > In addition, due to reflink being supported for DAX, we automatically get > > > CoW-based atomic writes support being advertised. Remedy this by > > > disallowing atomic writes for a DAX inode for both sw and hw modes. > > > > ...because it's fsdax and who's really going to test/use software atomic > > writes there ? > > Right > > > > > > Finally make atomic write size and DAX mount always mutually exclusive. > > > > Why? You could have a rt-on-pmem filesystem with S_DAX files, and still > > want to do atomic writes to files on the data device. > > How can I test that, i.e. put something on data device? > > I tried something like this: > > $mkfs.xfs -f -m rmapbt=0,reflink=1 -d rtinherit=1 -r rtdev=/dev/pmem0 > /dev/pmem1 > $mount /dev/pmem1 mnt -o dax=always,rtdev=/dev/pmem0 -o > max_atomic_write=16k > $mkdir mnt/non_rt > $xfs_io -c "chattr -t" mnt/non_rt/ #make non-rt > $touch mnt/non_rt/file > $xfs_io -c "lsattr -v" mnt/non_rt/file > [has-xattr] mnt/non_rt/file > $xfs_io -c "statx -r -m 0x10000" mnt/non_rt/file > --- > stat.atomic_write_unit_min = 0 > stat.atomic_write_unit_max = 0 > stat.atomic_write_segments_max = 0 > --- > > I thought that losing the rtinherit flag would put the file on the data > device. From adding some kernel debug, it seems that this file is still > IS_DAX() == true The data device should be a regular scsi disk, not pmem. --D > > > > > Reported-by: "Darrick J. Wong" <djwong@xxxxxxxxxx> > > > Fixes: 9dffc58f2384 ("xfs: update atomic write limits") > > > Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx> > > > > > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > > > index db21b5a4b881..84876f41da93 100644 > > > --- a/fs/xfs/xfs_file.c > > > +++ b/fs/xfs/xfs_file.c > > > @@ -1102,9 +1102,6 @@ xfs_file_write_iter( > > > if (xfs_is_shutdown(ip->i_mount)) > > > return -EIO; > > > - if (IS_DAX(inode)) > > > - return xfs_file_dax_write(iocb, from); > > > - > > > if (iocb->ki_flags & IOCB_ATOMIC) { > > > if (ocount < xfs_get_atomic_write_min(ip)) > > > return -EINVAL; > > > @@ -1117,6 +1114,9 @@ xfs_file_write_iter( > > > return ret; > > > } > > > + if (IS_DAX(inode)) > > > + return xfs_file_dax_write(iocb, from); > > > + > > > if (iocb->ki_flags & IOCB_DIRECT) { > > > /* > > > * Allow a directio write to fall back to a buffered > > > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h > > > index 07fbdcc4cbf5..b142cd4f446a 100644 > > > --- a/fs/xfs/xfs_inode.h > > > +++ b/fs/xfs/xfs_inode.h > > > @@ -356,11 +356,22 @@ static inline bool xfs_inode_has_bigrtalloc(const struct xfs_inode *ip) > > > (XFS_IS_REALTIME_INODE(ip) ? \ > > > (ip)->i_mount->m_rtdev_targp : (ip)->i_mount->m_ddev_targp) > > > -static inline bool xfs_inode_can_hw_atomic_write(const struct xfs_inode *ip) > > > +static inline bool xfs_inode_can_hw_atomic_write(struct xfs_inode *ip) > > > > Why drop const here? VFS_IC() should be sufficient, I think. > > > > I dropped that const as I got a complaint about ignoring the const when > passing to VFS_I(). But, as you say, I can use VFS_IC() > > Thanks, > John