On 7/10/25 2:58 AM, Keith Busch wrote:
> On Wed, Jul 09, 2025 at 01:21:17PM +0530, Nilay Shroff wrote:
>> I believe there are multi-controller NVMe disks in the field (including the
>> one I have) that do not exhibit such inconsistencies, i.e., they report a
>> consistent AWUPF value across controllers and do not change it based on
>> namespace format. The NVMe specification states this (quoting it from
>> NVM-Command-Set-Specification-1.0e):
>>
>> "The values (referencing AWUPF / AWUN) reported in the Identify Controller
>> data structure are valid across all namespaces with any supported namespace
>> format, forming a baseline value that is guaranteed not to change."
>
> I don't think that's a backward compatible requirement. Controllers
> often rescale these after a format command, and it was the only way for
> 1.0 and 1.1 controllers to report atomic sizes.
>
> Lets say the controller can do 128k byte atomic writes, If all
> namespaces used 512b LBA format, then AWUPF would be 255. If you change
> one namespace format to 4k, AWUPF scales down to 31, yielding a
> sub-optimal result for all the other namespaces.
>
On the multi-controller disk I’ve been testing, each controller consistently
reports an AWUPF value of 63. I created shared namespaces with mixed LBA
formats (some using 512-byte LBAs and others using 4KB LBAs) and observed
that the AWUPF value remained constant at 63 across all controllers and
formats. This implies that:

- A namespace with 4KB LBA format can support up to 256KB of atomic writes
  (4KB × 64),
- A namespace with 512-byte LBA format can only support up to 32KB of atomic
  writes (512B × 64).

So in this case, it's actually the opposite of what one might assume: users
of namespaces with 4KB LBA format would see the best possible atomic write
performance, while those using 512-byte LBA format may observe sub-optimal
performance, since the maximum atomic write size scales down with smaller
LBAs.

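To make the arithmetic in both examples easy to check, here is a rough sketch. It only assumes that AWUPF is a 0's based count of logical blocks; the helper names awupf_to_bytes() and bytes_to_awupf() are made up for illustration and are not taken from the driver.

```python
def awupf_to_bytes(awupf: int, lba_size: int) -> int:
    """Atomic write size in bytes implied by a 0's based AWUPF value."""
    return (awupf + 1) * lba_size


def bytes_to_awupf(atomic_bytes: int, lba_size: int) -> int:
    """0's based AWUPF value that expresses atomic_bytes in lba_size blocks."""
    return atomic_bytes // lba_size - 1


# Rescaling controller from the quoted example: 128KB atomic write capability.
awupf_512 = bytes_to_awupf(128 * 1024, 512)    # 255
awupf_4k = bytes_to_awupf(128 * 1024, 4096)    # 31

# If the controller reports 31, a 512-byte namespace appears limited to 16KB.
penalised_512 = awupf_to_bytes(awupf_4k, 512)  # 16384

# Fixed-AWUPF disk from my testing: AWUPF stays 63 for every format.
fixed_4k = awupf_to_bytes(63, 4096)            # 262144 (256KB)
fixed_512 = awupf_to_bytes(63, 512)            # 32768 (32KB)
```

With a rescaling controller the 512-byte namespaces lose out once a 4KB namespace exists, whereas with a fixed AWUPF the 512-byte namespaces are simply capped at a smaller byte size.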
>> While the spec doesn't explicitly require that AWUPF be consistent across
>> controllers within the same subsystem, it seems to be implied. That said,
>> I agree this should have been stated explicitly in the specification.
>
> Considering multi-controller subsystems, some controllers might have
> namespaces with only 512b formats attached, and other controllers might
> have some 4k mixed in, so then they can't all consistently report the
> desired AWUPF value. They'd have to just scale AWUPF based on the
> largest sector size supported. Which I guess is what the current wording
> is guiding toward, but that just suggests host drivers disregard the
> value and use NAWUPF instead. So still option III.

Yes, I agree, option III seems to be the best possible way forward.

However, does this mean we would disregard atomic write support for any
multi-controller NVMe vendor that consistently reports a valid AWUPF value
across all controllers and namespace formats, but sets NAWUPF to zero?

Thanks,
--Nilay