On Mon 16-06-25 19:00:42, Amir Goldstein wrote:
> On Mon, Jun 16, 2025 at 11:07 AM Jan Kara <jack@xxxxxxx> wrote:
> > On Tue 10-06-25 17:25:48, Amir Goldstein wrote:
> > > On Tue, Jun 10, 2025 at 3:49 PM Jan Kara <jack@xxxxxxx> wrote:
> > > > On Wed 04-06-25 18:09:15, Amir Goldstein wrote:
> > > > > If we decide that we want to support FAN_PATH_ACCESS from all the
> > > > > path-less lookup_one*() helpers, then we need to support reporting
> > > > > the FAN_PATH_ACCESS event with a directory fid.
> > > > >
> > > > > If we allow FAN_PATH_ACCESS events from path-less vfs helpers, we still
> > > > > have to allow setting FAN_PATH_ACCESS in a mount mark/ignore mask, because
> > > > > we need to provide a way for HSM to opt out of FAN_PATH_ACCESS events
> > > > > on its "work" mount - the path via which directories are populated.
> > > > >
> > > > > There may be a middle ground:
> > > > > - Pass an optional path arg to __lookup_slow() (i.e. from walk_component())
> > > > > - Move the fsnotify hook into __lookup_slow()
> > > > > - fsnotify_lookup_perm() passes optional path data to fsnotify()
> > > > > - fanotify_handle_event() returns -EPERM for FAN_PATH_ACCESS without
> > > > >   path data
> > > > >
> > > > > This way, if HSM is enabled on an sb and not ignored on a specific dir
> > > > > after it was populated, path lookup from a syscall will trigger
> > > > > FAN_PATH_ACCESS events and overlayfs/nfsd will fail to look up inside
> > > > > non-populated directories.
> > > >
> > > > OK, but how will this manifest from the user POV? If we have, say, an nfs
> > > > exported filesystem that is HSM managed, then there would have to be some
> > > > knowledge in nfsd of how to access the needed files so that HSM can pull
> > > > them? I guess I'm missing the advantage of this middle-ground solution...
> > >
> > > The advantage is that an admin is able to set up a "lazy populated fs"
> > > with the guarantee that:
> > > 1. Non-populated objects can never be accessed
> > > 2. If the remote fetch service is up and the objects are accessed
> > >    from a supported path (i.e. not an overlayfs layer) then the objects
> > >    will be populated on access
> > >
> > > This is stronger and more useful than silently serving invalid content IMO.
> > >
> > > This is related to the discussion about persistent marks and how to protect
> > > against access to non-populated objects while the service is down, but since
> > > we have at least one case that can result in an EIO error (service down),
> > > maybe another case (access from overlayfs) is not a game changer(?)
> >
> > Yes, reporting an error for unpopulated content would be acceptable behavior.
> > I just don't see that this would be all that useful.
> >
> Regarding overlayfs, I think there is an even bigger problem.
> There is the promise that we are not calling the blocking pre-content hook
> with freeze protection held.
> In overlayfs it is very common to take the upper layer freeze protection
> for a relatively large scope (e.g. ovl_want_write() in ovl_create_object())
> and perform lookups on the upper fs or lower fs within this scope.
> I am afraid that cleaning that up is not going to be realistic.
>
> IMO, it is perfectly reasonable that overlayfs and HSM (at least pre-dir-access)
> will be mutually exclusive features.
>
> This is quite similar to overlayfs resulting in EIO if the lower fs has an
> automount point.
>
> It is quite common for users to want overlayfs mounted over
> /var/lib/docker/overlay2 on the root fs.
> HSM is not likely to be running on / and /etc, but likely on a very
> distinct lazy populated source dir or something.
> We can easily document and deny mounting overlayfs over subtrees where
> HSM is enabled (or just pre-path events).
>
> This way we can provide HSM lazy dir population to the users that do not care
> about overlayfs, without having to solve very hard or even unsolvable issues.
>
> I will need to audit all the other users of vfs lookup helpers other than
> overlayfs and nfsd, to estimate how many of them are pre-content event
> safe and how many are a hopeless case.
>
> Off the top of my head, trying to make a cachefilesd directory an HSM
> directory is absolutely insane, so not every user of vfs lookup helpers
> should be able to populate HSM content - they should simply fail
> (with a meaningful kmsg log).

Right. What you write makes a lot of sense. You've convinced me that
returning an error from overlayfs (or similar users) when they try to access
an HSM managed dir is the least painful solution :).

> > As I wrote in my first email, what I'd like to avoid is having part of the
> > functionality accessible in one way (say through FAN_REPORT_DIR_FD) and
> > having to switch to a different way (FAN_REPORT_DFID_NAME) for full
> > functionality. That is in my opinion confusing to users and makes the API
> > messy in the long term. So I'd lean more towards implementing fid-based
> > events from the start. I don't think implementation-wise it's going to be
> > much higher effort than FAN_REPORT_DIR_FD. I agree that for users it is
> > somewhat more effort - they have to keep the private mount and open the
> > fhandle to get to the dir so that they can fill it in. But that doesn't
> > seem to be that high a bar either?
>
> ok.
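
Just to spell out what the userspace side would look like: resolving the dir
fid is the same dance FAN_REPORT_DFID_NAME consumers already do today - parse
the fid+name info record that follows the event metadata and call
open_by_handle_at() against an fd on that private mount. Something like the
sketch below (untested, error handling omitted, and of course the
pre-content + fid combination itself is only a proposal at this point):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/fanotify.h>

/*
 * Resolve the directory fid + name info record of an event into an
 * O_PATH fd. 'mount_fd' is an fd on the service's private mount of the
 * filesystem, used to anchor open_by_handle_at().
 */
static int resolve_event_dir(int mount_fd,
			     const struct fanotify_event_metadata *md)
{
	const struct fanotify_event_info_fid *fid;
	struct file_handle *fh;
	const char *name;
	int dirfd;

	/* The fid info record follows the fixed-size event metadata */
	fid = (const struct fanotify_event_info_fid *)(md + 1);
	if (fid->hdr.info_type != FAN_EVENT_INFO_TYPE_DFID_NAME)
		return -1;

	fh = (struct file_handle *)fid->handle;
	/* The entry name follows the variable-size file handle */
	name = (const char *)fh->f_handle + fh->handle_bytes;

	dirfd = open_by_handle_at(mount_fd, fh, O_PATH | O_DIRECTORY);
	if (dirfd >= 0)
		printf("lookup of '%s' in dir fd %d\n", name, dirfd);
	return dirfd;
}

So the extra burden really boils down to keeping that mount fd open for the
lifetime of the group, which indeed doesn't seem like a high bar.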
> > We even have some precedent in that events for regular files support both
> > fd and fid reporting while for directory operations only fid events are
> > supported. We could do it similarly for HSM events...

> That's true.
>
> Another advantage is that FAN_REPORT_FID | FAN_CLASS_PRE_CONTENT
> has not been allowed so far, so we can use it to set new semantics
> that do not allow FAN_ONDIR and FAN_EVENT_ON_CHILD at all.
> The two would be fully implied from the event type, unlike today
> where we ignore them for some event types and give them different
> meanings for other event types.

Right.

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR