On Mon, Jul 07, 2025 at 08:56:58PM -0600, Keith Busch wrote:
> On Tue, Jul 08, 2025 at 10:46:06AM +0800, Ming Lei wrote:
> > On Mon, Jul 07, 2025 at 08:27:43PM -0600, Keith Busch wrote:
> > > On Tue, Jul 08, 2025 at 09:27:06AM +0800, Ming Lei wrote:
> > > > On Mon, Jul 07, 2025 at 04:18:34PM +0200, Christoph Hellwig wrote:
> > > > > Hi all,
> > > > >
> > > > > I'm a bit lost on what to do about the sad state of NVMe atomic writes.
> > > > >
> > > > > As a short reminder the main issues are:
> > > > >
> > > > >  1) there is no flag on a command to request atomic (aka non-torn)
> > > > >     behavior, instead writes adhering to the atomicity requirements will
> > > > >     never be torn, and writes not adhering to them can be torn any time.
> > > > >     This differs from SCSI where atomic writes have to be explicitly
> > > > >     requested and fail when they can't be satisfied
> > > > >  2) the original way to indicate the main atomicity limit is the AWUPF
> > > > >     field, which is in Identify Controller, but specified in logical
> > > > >     blocks which only exist at a namespace layer.  This a) lead to
> > > >
> > > > If controller-wide AWUPF is a must property, the length has to be aligned
> > > > with block size.
> > >
> > > What block size? The controller doesn't have one. Block sizes are
> >
> > It should be any NS format's block size.
>
> That requires an artificial reduction to a meaningless value.

Any value has to be 'block size' aligned.

>
> > > properties of namespaces, not controllers or subsystems. If you have 10
> > > namespaces with 10 different block formats, what does AWUPF mean? If the
> > > controller must report something, the only rational thing it could
> > > declare is reduced to the greatest common denominator, which is out of
> > > sync with the true value reported in the appropriately scoped NAWUPF
> > > value.
> >
> > Yes, please see the words I quoted from NVMe spec, also `6.4 Atomic Operations`
> > mentioned: `NAWUPF >= AWUPF`.
>
> The problem is when Namespace X changes its format that then alters
> Namespace Y's reported atomic size. That's unacceptable for any
> filesystem utilizing this feature.

When X changes its format, the FS has to be unmounted.

The actual length (in bytes) of the atomic write does not change for Y; only
the unit (block size) changes, at least according to Yi's report.

Thanks,
Ming
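
P.S. To make the unit-vs-length distinction concrete, here is a rough sketch
of the arithmetic only. It is not kernel code, the helper names are made up
for illustration, and the 16 KiB figure is a hypothetical value, not something
taken from any device or from Yi's report. It assumes the field is a 0's based
count of logical blocks, as AWUPF/NAWUPF are defined in the spec.

```python
# Illustration only: helper names and example numbers are hypothetical,
# not taken from the kernel or from any real device.

def atomic_bytes(field_0based: int, lba_size: int) -> int:
    """Byte length covered by a 0's based atomic unit given in logical blocks."""
    return (field_0based + 1) * lba_size


def field_for_bytes(nbytes: int, lba_size: int) -> int:
    """0's based block count that expresses a fixed byte length."""
    if nbytes <= 0 or nbytes % lba_size:
        raise ValueError("atomic length must be a positive multiple of the block size")
    return nbytes // lba_size - 1


if __name__ == "__main__":
    fixed = 16 * 1024  # hypothetical fixed atomic length in bytes
    for lba in (512, 4096):
        field = field_for_bytes(fixed, lba)
        # Same byte length, different reported block count.
        print(f"lba={lba} field={field} bytes={atomic_bytes(field, lba)}")
```

The reported block count differs between the two block sizes while the byte
length stays the same, which is the case I am describing above. A length that
is not a multiple of the block size cannot be expressed at all, hence the
alignment requirement.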