On 5/29/2025 8:32 AM, Martin K. Petersen wrote:
>
> Hi Anuj!
>
> Thanks for working on this!
>
Hi Martin,
Thanks for the feedback!

>> 4. tuple_size: size (in bytes) of the protection information tuple.
>> 6. pi_offset: offset of protection info within the tuple.
>
> I find this a little confusing. The T10 PI tuple is <guard, app, ref>.
>
> I acknowledge things currently are a bit muddy in the block layer since
> tuple_size has been transmogrified to hold the NVMe metadata size.
>
> But for a new user-visible interface I think we should make the
> terminology clear. The tuple is the PI and not the rest of the metadata.
>
> So I think you'd want:
>
> 4. metadata_size: size (in bytes) of the metadata associated with each interval.
> 6. pi_offset: offset of protection information tuple within the metadata.
>
Yes, this representation looks better. Will make this change.

>> +#define FILE_PI_CAP_INTEGRITY (1 << 0)
>> +#define FILE_PI_CAP_REFTAG (1 << 1)
>
> You'll also need to have corresponding uapi defines for:
>
> enum blk_integrity_checksum {
>         BLK_INTEGRITY_CSUM_NONE         = 0,
>         BLK_INTEGRITY_CSUM_IP           = 1,
>         BLK_INTEGRITY_CSUM_CRC          = 2,
>         BLK_INTEGRITY_CSUM_CRC64        = 3,
> } __packed ;
>
Right, I'll add these definitions to the UAPI.

>> +
>> +/*
>> + * struct fs_pi_cap - protection information(PI) capability descriptor
>> + * @flags: Bitmask of capability flags
>> + * @interval: Number of bytes of data per PI tuple
>> + * @csum_type: Checksum type
>> + * @tuple_size: Size in bytes of the PI tuple
>> + * @tag_size: Size of the tag area within the tuple
>> + * @pi_offset: Offset in bytes of the PI metadata within the tuple
>> + * @rsvd: Reserved for future use
>
> See above for distinction between metadata and PI tuple. The question is
> whether we need to report the size of those two separately (both in
> kernel and in this structure). Otherwise how do we know how big the PI
> tuple is? Or do we infer that from the csum_type?
>
The block layer currently infers this by looking at the csum_type (e.g.,
in blk_integrity_generate). I assumed userspace could do the same, so I
didn't expose a separate pi_tuple_size field. Do you see this
differently? As you mentioned, the other option would be to report the
PI tuple size explicitly in both the kernel and in the uapi struct.

> Also, for the extended NVMe PI types we'd probably need to know the size
> of the ref tag and the storage tag.
>
Makes sense, I will introduce ref_tag_size and storage_tag_size in the
UAPI struct to account for this.

I did a respin based on your feedback here [1]. If this looks good to
you, I'll roll out a v2.

Thanks,
Anuj Gupta

[1]

[PATCH] fs: add ioctl to query protection info capabilities

Add a new ioctl, FS_IOC_GETPICAP, to query protection info (PI)
capabilities. This ioctl returns information about the file's integrity
profile. This is useful for userspace applications to understand a
file's end-to-end data protection support and configure the I/O
accordingly. For now this interface is only supported by block devices.
However, the design and placement of this ioctl in the generic FS ioctl
space allows us to extend it to work over files as well. This may be
useful when filesystems start supporting PI-aware layouts.

A new structure, struct fs_pi_cap, is introduced, which contains the
following fields:
1. flags: bitmask of capability flags.
2. interval: the data block interval (in bytes) for which the protection
   information is generated.
3. csum_type: type of checksum used.
4. metadata_size: size (in bytes) of the metadata associated with each
   interval.
5. tag_size: size (in bytes) of tag information.
6. pi_offset: offset of the protection information tuple within the
   metadata.
7. ref_tag_size: size (in bytes) of the reference tag.
8. storage_tag_size: size (in bytes) of the storage tag.
9. rsvd: reserved for future use.
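A minimal userspace sketch of the layout listed above, assuming a local
mirror of the proposed struct (these definitions are not in any released
header; they are copied from this patch, with the fourth field named
metadata_size as the list describes). It also shows how userspace could
infer the PI tuple size from csum_type, as discussed earlier in the thread:
the 8-byte and 16-byte sizes correspond to the kernel's struct t10_pi_tuple
and struct crc64_pi_tuple. pi_tuple_size() and pi_fits_in_metadata() are
hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the proposed uapi definitions (assumption: not upstream). */
#define FS_PI_CSUM_NONE  0
#define FS_PI_CSUM_IP    1
#define FS_PI_CSUM_CRC   2
#define FS_PI_CSUM_CRC64 3

struct fs_pi_cap {
	uint32_t flags;
	uint16_t interval;
	uint8_t  csum_type;
	uint8_t  metadata_size;
	uint8_t  tag_size;
	uint8_t  pi_offset;
	uint8_t  ref_tag_size;
	uint8_t  storage_tag_size;
	uint8_t  rsvd[4];
};

/* 4 + 2 + 6 * 1 + 4 = 16 bytes, with no implicit padding. */
static_assert(sizeof(struct fs_pi_cap) == 16, "unexpected fs_pi_cap size");
static_assert(offsetof(struct fs_pi_cap, rsvd) == 12, "unexpected rsvd offset");

/*
 * Hypothetical helper: PI tuple size implied by csum_type.
 * IP and CRC (16-bit guard): guard(2) + app(2) + ref(4) = 8 bytes.
 * CRC64 (NVMe 64-bit guard): guard(8) + app(2) + ref(6) = 16 bytes.
 */
static unsigned int pi_tuple_size(uint8_t csum_type)
{
	switch (csum_type) {
	case FS_PI_CSUM_IP:
	case FS_PI_CSUM_CRC:
		return 8;
	case FS_PI_CSUM_CRC64:
		return 16;
	default:
		return 0; /* FS_PI_CSUM_NONE or unknown: no PI tuple */
	}
}

/* Hypothetical helper: does the PI tuple fit inside the per-interval metadata? */
static bool pi_fits_in_metadata(const struct fs_pi_cap *cap)
{
	unsigned int tuple = pi_tuple_size(cap->csum_type);

	if (tuple == 0)
		return false;
	return (unsigned int)cap->pi_offset + tuple <= cap->metadata_size;
}
```

For example, a 64-byte metadata area with CRC64 PI at pi_offset 48 passes
the check (48 + 16 = 64), while pi_offset 56 does not.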
The internal logic to fetch the capability is encapsulated in a helper
function, blk_get_pi_cap(), which uses the blk_integrity profile
associated with the device. The ioctl returns -EOPNOTSUPP if
CONFIG_BLK_DEV_INTEGRITY is not enabled.

Signed-off-by: Anuj Gupta <anuj20.g@xxxxxxxxxxx>
Signed-off-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
---
 block/blk-integrity.c         | 38 +++++++++++++++++++++++++++++++++++
 block/ioctl.c                 |  3 +++
 include/linux/blk-integrity.h |  6 ++++++
 include/uapi/linux/fs.h       | 36 +++++++++++++++++++++++++++++++++
 4 files changed, 83 insertions(+)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index a1678f0a9f81..9bd2888a85ce 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -13,6 +13,7 @@
 #include <linux/scatterlist.h>
 #include <linux/export.h>
 #include <linux/slab.h>
+#include <linux/t10-pi.h>
 
 #include "blk.h"
 
@@ -54,6 +55,43 @@ int blk_rq_count_integrity_sg(struct request_queue *q, struct bio *bio)
 	return segments;
 }
 
+int blk_get_pi_cap(struct block_device *bdev, struct fs_pi_cap __user *argp)
+{
+	struct blk_integrity *bi = blk_get_integrity(bdev->bd_disk);
+	struct fs_pi_cap pi_cap = {};
+
+	if (!bi)
+		goto out;
+
+	if (bi->flags & BLK_INTEGRITY_DEVICE_CAPABLE)
+		pi_cap.flags |= FILE_PI_CAP_INTEGRITY;
+	if (bi->flags & BLK_INTEGRITY_REF_TAG)
+		pi_cap.flags |= FILE_PI_CAP_REFTAG;
+	pi_cap.csum_type = bi->csum_type;
+	pi_cap.metadata_size = bi->tuple_size;
+	pi_cap.tag_size = bi->tag_size;
+	pi_cap.interval = 1 << bi->interval_exp;
+	pi_cap.pi_offset = bi->pi_offset;
+	switch (bi->csum_type) {
+	case BLK_INTEGRITY_CSUM_CRC64:
+		pi_cap.ref_tag_size = sizeof_field(struct crc64_pi_tuple,
+						   ref_tag);
+		break;
+	case BLK_INTEGRITY_CSUM_CRC:
+	case BLK_INTEGRITY_CSUM_IP:
+		pi_cap.ref_tag_size = sizeof_field(struct t10_pi_tuple,
+						   ref_tag);
+		break;
+	default:
+		break;
+	}
+
+out:
+	if (copy_to_user(argp, &pi_cap, sizeof(struct fs_pi_cap)))
+		return -EFAULT;
+	return 0;
+}
+
 /**
  * blk_rq_map_integrity_sg - Map integrity metadata into a scatterlist
  * @rq: request to map
diff --git a/block/ioctl.c b/block/ioctl.c
index e472cc1030c6..53b35bf3e6fa 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -13,6 +13,7 @@
 #include <linux/uaccess.h>
 #include <linux/pagemap.h>
 #include <linux/io_uring/cmd.h>
+#include <linux/blk-integrity.h>
 #include <uapi/linux/blkdev.h>
 #include "blk.h"
 #include "blk-crypto-internal.h"
@@ -643,6 +644,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
 		return blkdev_pr_preempt(bdev, mode, argp, true);
 	case IOC_PR_CLEAR:
 		return blkdev_pr_clear(bdev, mode, argp);
+	case FS_IOC_GETPICAP:
+		return blk_get_pi_cap(bdev, argp);
 	default:
 		return -ENOIOCTLCMD;
 	}
diff --git a/include/linux/blk-integrity.h b/include/linux/blk-integrity.h
index c7eae0bfb013..6118a0c28605 100644
--- a/include/linux/blk-integrity.h
+++ b/include/linux/blk-integrity.h
@@ -29,6 +29,7 @@ int blk_rq_map_integrity_sg(struct request *, struct scatterlist *);
 int blk_rq_count_integrity_sg(struct request_queue *, struct bio *);
 int blk_rq_integrity_map_user(struct request *rq, void __user *ubuf,
 			      ssize_t bytes);
+int blk_get_pi_cap(struct block_device *bdev, struct fs_pi_cap __user *argp);
 
 static inline bool
 blk_integrity_queue_supports_integrity(struct request_queue *q)
@@ -92,6 +93,11 @@ static inline struct bio_vec rq_integrity_vec(struct request *rq)
 				 rq->bio->bi_integrity->bip_iter);
 }
 #else /* CONFIG_BLK_DEV_INTEGRITY */
+static inline int blk_get_pi_cap(struct block_device *bdev,
+				 struct fs_pi_cap __user *argp)
+{
+	return -EOPNOTSUPP;
+}
 static inline int blk_rq_count_integrity_sg(struct request_queue *q,
 					    struct bio *b)
 {
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index e762e1af650c..c70584b09bed 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -91,6 +91,40 @@ struct fs_sysfs_path {
 	__u8			name[128];
 };
 
+/* Protection info capability flags */
+#define FILE_PI_CAP_INTEGRITY		(1 << 0)
+#define FILE_PI_CAP_REFTAG		(1 << 1)
+
+/* Checksum types for Protection Information */
+#define FS_PI_CSUM_NONE			0
+#define FS_PI_CSUM_IP			1
+#define FS_PI_CSUM_CRC			2
+#define FS_PI_CSUM_CRC64		3
+
+/*
+ * struct fs_pi_cap - protection information (PI) capability descriptor
+ * @flags: Bitmask of capability flags
+ * @interval: Number of bytes of data per PI tuple
+ * @csum_type: Checksum type
+ * @metadata_size: Size in bytes of the metadata associated with each interval
+ * @tag_size: Size of the tag area within the tuple
+ * @pi_offset: Offset of protection information tuple within the metadata
+ * @ref_tag_size: Size in bytes of the reference tag
+ * @storage_tag_size: Size in bytes of the storage tag
+ * @rsvd: Reserved for future use
+ */
+struct fs_pi_cap {
+	__u32	flags;
+	__u16	interval;
+	__u8	csum_type;
+	__u8	metadata_size;
+	__u8	tag_size;
+	__u8	pi_offset;
+	__u8	ref_tag_size;
+	__u8	storage_tag_size;
+	__u8	rsvd[4];
+};
+
 /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
 #define FILE_DEDUPE_RANGE_SAME		0
 #define FILE_DEDUPE_RANGE_DIFFERS	1
@@ -247,6 +281,8 @@ struct fsxattr {
  * also /sys/kernel/debug/ for filesystems with debugfs exports
  */
 #define FS_IOC_GETFSSYSFSPATH		_IOR(0x15, 1, struct fs_sysfs_path)
+/* Get protection info capability details */
+#define FS_IOC_GETPICAP			_IOR('f', 3, struct fs_pi_cap)
 
 /*
  * Inode flags (FS_IOC_GETFLAGS / FS_IOC_SETFLAGS)
-- 
2.25.1
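A minimal userspace sketch of calling the new ioctl on a block device.
It assumes the installed headers do not yet carry this patch, so struct
fs_pi_cap, the FILE_PI_CAP_* flags and FS_IOC_GETPICAP are mirrored
locally from the diff above (with the fourth field named metadata_size).
If the ioctl number or layout changes in v2, the mirror has to change
with it. query_pi_cap() and print_pi_cap() are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ioctl.h>

/* Local mirror of this patch's uapi additions (assumption: not upstream). */
struct fs_pi_cap {
	uint32_t flags;
	uint16_t interval;
	uint8_t  csum_type;
	uint8_t  metadata_size;
	uint8_t  tag_size;
	uint8_t  pi_offset;
	uint8_t  ref_tag_size;
	uint8_t  storage_tag_size;
	uint8_t  rsvd[4];
};

/* _IOR() encodes sizeof(struct fs_pi_cap), so the mirror must stay 16 bytes. */
static_assert(sizeof(struct fs_pi_cap) == 16, "must match the uapi layout");

#define FILE_PI_CAP_INTEGRITY	(1 << 0)
#define FILE_PI_CAP_REFTAG	(1 << 1)
#define FS_IOC_GETPICAP		_IOR('f', 3, struct fs_pi_cap)

/* Hypothetical helper: query PI capabilities of @path; returns 0 or -errno. */
static int query_pi_cap(const char *path, struct fs_pi_cap *cap)
{
	int fd = open(path, O_RDONLY);
	int ret = 0;

	if (fd < 0)
		return -errno;
	/* -EOPNOTSUPP when the kernel lacks CONFIG_BLK_DEV_INTEGRITY */
	if (ioctl(fd, FS_IOC_GETPICAP, cap) < 0)
		ret = -errno; /* saved before close() can overwrite errno */
	close(fd);
	return ret;
}

/* Hypothetical helper: dump every field of the returned descriptor. */
static void print_pi_cap(const struct fs_pi_cap *cap)
{
	printf("integrity: %s, ref tag: %s\n",
	       (cap->flags & FILE_PI_CAP_INTEGRITY) ? "yes" : "no",
	       (cap->flags & FILE_PI_CAP_REFTAG) ? "yes" : "no");
	printf("interval=%u csum_type=%u metadata_size=%u tag_size=%u\n",
	       cap->interval, cap->csum_type, cap->metadata_size,
	       cap->tag_size);
	printf("pi_offset=%u ref_tag_size=%u storage_tag_size=%u\n",
	       cap->pi_offset, cap->ref_tag_size, cap->storage_tag_size);
}
```

A small main() would call query_pi_cap() on a device node such as
/dev/nvme0n1 (an illustrative path) and then print_pi_cap(). Note that
blk_get_pi_cap() returns 0 with an all-zero struct when the disk has no
integrity profile, so callers should test the flags rather than rely on
the return value alone.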