On Tue, Jun 24, 2025 at 4:15 PM Jan Kara <jack@xxxxxxx> wrote: > > On Tue 24-06-25 12:59:26, Christian Brauner wrote: > > > For pidfs this means a file handle can function as full replacement for > > > storing a pid in a file. Instead a file handle can be stored and > > > reopened purely based on the file handle. > > > > One thing I want to comment on generally. I know we document that a file > > handle is an opaque thing and userspace shouldn't rely on the layout or > > format (Propably so that we're free to redefine it.). > > > > Realistically though that's just not what's happening. I've linked Amir > > to that code already a few times but I'm doing it here for all of you > > again: > > > > [1]: https://github.com/systemd/systemd/blob/7e1647ae4e33dd8354bd311a7f7f5eb701be2391/src/basic/cgroup-util.c#L62-L77 > > > > Specifically: > > > > /* The structure to pass to name_to_handle_at() on cgroupfs2 */ > > typedef union { > > struct file_handle file_handle; > > uint8_t space[offsetof(struct file_handle, f_handle) + sizeof(uint64_t)]; > > } cg_file_handle; > > > > #define CG_FILE_HANDLE_INIT \ > > (cg_file_handle) { \ > > .file_handle.handle_bytes = sizeof(uint64_t), \ > > .file_handle.handle_type = FILEID_KERNFS, \ > > } > > > > #define CG_FILE_HANDLE_CGROUPID(fh) (*CAST_ALIGN_PTR(uint64_t, (fh).file_handle.f_handle)) > > > > cg_file_handle fh = CG_FILE_HANDLE_INIT; > > CG_FILE_HANDLE_CGROUPID(fh) = id; > > > > return RET_NERRNO(open_by_handle_at(cgroupfs_fd, &fh.file_handle, O_DIRECTORY|O_CLOEXEC)); > > > > Another example where the layout is assumed to be uapi/uabi is: > > > > [2]: https://github.com/systemd/systemd/blob/7e1647ae4e33dd8354bd311a7f7f5eb701be2391/src/basic/pidfd-util.c#L232-L259 > > > > int pidfd_get_inode_id_impl(int fd, uint64_t *ret) { > > <snip> > > union { > > struct file_handle file_handle; > > uint8_t space[offsetof(struct file_handle, f_handle) + sizeof(uint64_t)]; > > } fh = { > > .file_handle.handle_bytes = sizeof(uint64_t), > > .file_handle.handle_type = FILEID_KERNFS, > > }; > > int mnt_id; > > > > r = RET_NERRNO(name_to_handle_at(fd, "", &fh.file_handle, &mnt_id, AT_EMPTY_PATH)); > > if (r >= 0) { > > if (ret) > > *ret = *CAST_ALIGN_PTR(uint64_t, fh.file_handle.f_handle); > > return 0; > > } > > Thanks for sharing. Sigh... Personal note for the future: If something > should be opaque blob for userspace, don't forget to encrypt the data > before handing it over to userspace. :-P > > > In (1) you can see that the layout is assumed to be uabi by > > reconstructing the handle. In (2) you can see that the layout is assumed > > to be uabi by extrating the inode number. > > > > So both points mean that the "don't rely on it"-ship has already sailed. > > If we get regressions reports for this (and we surely would) because we > > changed it we're bound by the no-regression rule. > > Yep, FILEID_KERNFS is pretty much set in stone. OTOH I don't expect these > kinds of hacks to be very widespread (I guess I'm naive ;) so if we really > need to change it we could talk to systemd folks. > > > So, for pidfs I'm very tempted to explicitly give the guarantee that > > systemd just assumes currently. > > > > The reason is that it will allow userspace to just store the 64-bit > > pidfs inode number in a file or wherever they want and then reconstruct > > the file handle without ever having to involve name_to_handle_at(). That > > means you can just read the pid file and see the inode number you're > > dealing with and not some binary gunk. > > Well, you could just fprintf() the fhandle into the pid file if you don't > like binary gunk. Those numbers would be telling about as much as the pidfs > inode number tells you, don't they? I mean I'm still not at the point where > I would *encourage* userspace to decode what's supposed to be opaque blob > ;). But maybe I'll get used to that idea... As I was also writing a similar response, I will share my somewhat duplicate feedback ;) I still hold the option that as a rule, we should try not to encourage crafting file handles in userspace. Exceptions to the rule are easier to accept for non-persistent pseudo filesystems and especially in the case that the handle format is simply the u64 inode number. But if you want to hand craft the pidfs file handles, I get that the addition of FILEID_IS_AUTONOMOUS flag is a nuisance, because you'd want something as simple as the examples you quoted and not more complicated encodings. Therefore, see my suggestion to toss the FILEID_IS_AUTONOMOUS flag as well as the new FILEID_PIDFS type and opt-in to opening fd without a path fd with FD_PIDFS_ROOT instead, so systemd can continue to use the same simple hand crafted file handles. Thanks, Amir.