Re: [RFC PATCH v2 0/3] fanotify HSM events for directories

Jan Kara <jack@xxxxxxx> · Tue, 17 Jun 2025 11:43:11 +0200

On Mon 16-06-25 19:00:42, Amir Goldstein wrote:
> On Mon, Jun 16, 2025 at 11:07 AM Jan Kara <jack@xxxxxxx> wrote:
> > On Tue 10-06-25 17:25:48, Amir Goldstein wrote:
> > > On Tue, Jun 10, 2025 at 3:49 PM Jan Kara <jack@xxxxxxx> wrote:
> > > > On Wed 04-06-25 18:09:15, Amir Goldstein wrote:
> > > > > If we decide that we want to support FAN_PATH_ACCESS from all the
> > > > > path-less lookup_one*() helpers, then we need to support reporting
> > > > > FAN_PATH_ACCESS event with directory fid.
> > > > >
> > > > > If we allow FAN_PATH_ACCESS event from path-less vfs helpers, we still
> > > > > have to allow setting FAN_PATH_ACCESS in a mount mark/ignore mask, because
> > > > > we need to provide a way for HSM to opt-out of FAN_PATH_ACCESS events
> > > > > on its "work" mount - the path via which directories are populated.
> > > > >
> > > > > There may be a middle ground:
> > > > > - Pass optional path arg to __lookup_slow() (i.e. from walk_component())
> > > > > - Move fsnotify hook into __lookup_slow()
> > > > > - fsnotify_lookup_perm() passes optional path data to fsnotify()
> > > > > - fanotify_handle_event() returns -EPERM for FAN_PATH_ACCESS without
> > > > >   path data
> > > > >
> > > > > This way, if HSM is enabled on an sb and not ignored on specific dir
> > > > > after it was populated, path lookup from syscall will trigger
> > > > > FAN_PATH_ACCESS events and overlayfs/nfsd will fail to look up inside
> > > > > non-populated directories.
> > > >
> > > > OK, but how will this manifest from the user POV? If we have say nfs
> > > > exported filesystem that is HSM managed then there would have to be some
> > > > knowledge in nfsd to know how to access needed files so that HSM can pull
> > > > them? I guess I'm missing the advantage of this middle-ground solution...
> > >
> > > The advantage is that an admin is able to set up a "lazy populated fs"
> > > with the guarantee that:
> > > 1. Non-populated objects can never be accessed
> > > 2. If the remote fetch service is up and the objects are accessed
> > >     from a supported path (i.e. not overlayfs layer) then the objects
> > >     will be populated on access
> > >
> > > This is stronger and more useful than silently serving invalid content IMO.
> > >
> > > This is related to the discussion about persistent marks and how to protect
> > > against access to non-populated objects while service is down, but since
> > > we have at least one case that can result in an EIO error (service down)
> > > then another case (access from overlayfs) maybe is not a game changer(?)
> >
> > Yes, reporting error for unpopulated content would be acceptable behavior.
> > I just don't see that this would be all that useful.
> >
> 
> Regarding overlayfs, I think there is an even bigger problem.
> There is the promise that we are not calling the blocking pre-content hook
> with freeze protection held.
> In overlayfs it is very common to take the upper layer freeze protection
> for a relatively large scope (e.g. ovl_want_write() in ovl_create_object())
> and perform lookups on upper fs or lower fs within this scope.
> I am afraid that cleaning that up is not going to be realistic.
> 
> IMO, it is perfectly reasonable that overlayfs and HSM (at least pre-dir-access)
> will be mutually exclusive features.
> 
> This is quite similar to overlayfs resulting in EIO if lower fs has an
> auto mount point.
> 
> It is quite common for users to want overlayfs mounted over
> /var/lib/docker/overlay2
> on the root fs.
> HSM is not likely to be running on / and /etc, but likely on a very
> distinct lazy populated source dir or something.
> We can easily document and deny mounting overlayfs over subtrees where
> HSM is enabled (or just pre-path events).
> 
> This way we can provide HSM lazy dir populate to the users that do not care
> about overlayfs without having to solve very hard to unsolvable issues.
> 
> I will need to audit all the other users of vfs lookup helpers other than
> overlayfs and nfsd, to estimate how many of them are pre-content event
> safe and how many are a hopeless case.
> 
> Off the top of my head, trying to make a cachefilesd directory an HSM
> directory is absolutely insane, so not every user of vfs lookup helpers
> should be able to populate HSM content - some should simply fail
> (with a meaningful kmsg log).

Right. What you write makes a lot of sense. You've convinced me that
returning error from overlayfs (or similar users) when they try to access
HSM managed dir is the least painful solution :).
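
To make sure we mean the same thing, here is a toy userspace model of the
decision as I understand it. It is only a sketch, not kernel code: every
name with a sketch_ prefix is made up for illustration, and the "ignored"
flag is my shorthand for the HSM having added the dir to its ignore mask
after populating it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the optional path passed down to the lookup. */
struct sketch_path {
	const char *mnt_name;
};

/* Hypothetical per-directory state relevant to the decision. */
struct sketch_dir {
	bool hsm_enabled;	/* pre-content lookup events enabled on the sb */
	bool ignored;		/* HSM ignores this dir (already populated) */
};

/*
 * Returns 0 if the lookup may proceed, -EPERM if it must fail.
 * *notify_hsm is set when the HSM should get a chance to populate the dir
 * before the lookup proceeds.
 */
static int sketch_lookup_perm(const struct sketch_dir *dir,
			      const struct sketch_path *path,
			      bool *notify_hsm)
{
	assert(dir && notify_hsm);
	*notify_hsm = false;

	/* Not HSM managed, or already populated: nothing to do. */
	if (!dir->hsm_enabled || dir->ignored)
		return 0;

	/* Path-less caller (overlayfs, nfsd, ...): fail the lookup. */
	if (!path)
		return -EPERM;

	/* Lookup with a path (e.g. from a syscall): let the HSM populate. */
	*notify_hsm = true;
	return 0;
}
```

So a path-less lookup in a non-populated dir fails, a lookup with a path
generates an event, and a populated (ignored) dir is unaffected either way.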

> > As I wrote in my first email what I'd like to avoid is having part of the
> > functionality accessible in one way (say through FAN_REPORT_DIR_FD) and
> > having to switch to different way (FAN_REPORT_DFID_NAME) for full
> > functionality. That is in my opinion confusing to users and makes the API
> > messy in the long term. So I'd lean more towards implementing fid-based
> > events from the start. I don't think implementation-wise it's going to be
> > much higher effort than FAN_REPORT_DIR_FD. I agree that for users it is
> > somewhat more effort - they have to keep the private mount, open fhandle to
> > get to the dir so that they can fill it in. But that doesn't seem to be
> > that high bar either?
> >
> 
> ok.
> 
> > We even have some precedent in that events for regular files support both fd
> > and fid events and for directory operations only fid events are supported.
> > We could do it similarly for HSM events...
> 
> That's true.
> 
> Another advantage is that FAN_REPORT_FID | FAN_CLASS_PRE_CONTENT
> has not been allowed so far, so we can use it to set new semantics
> that do not allow FAN_ONDIR and FAN_EVENT_ON_CHILD at all.
> The two would be fully implied from the event type, unlike today
> where we ignore them for some event types and give them different meanings
> for other event types.

Right.
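
Something along these lines, I suppose. This is a standalone sketch only:
the SKETCH_* bits are illustrative stand-ins rather than the uapi values,
the function name is made up, and returning -EINVAL is my assumption about
how the rejection would be reported.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative stand-ins for group init flags (not the uapi values). */
#define SKETCH_CLASS_PRE_CONTENT	0x1u
#define SKETCH_REPORT_FID		0x2u

/* Illustrative stand-ins for the mark mask modifier bits. */
#define SKETCH_ONDIR			0x1ull
#define SKETCH_EVENT_ON_CHILD		0x2ull

static_assert((SKETCH_ONDIR & SKETCH_EVENT_ON_CHILD) == 0,
	      "modifier bits must not overlap");

/*
 * For a pre-content group that reports fids, the two modifier bits are
 * implied by the event type, so setting them explicitly is rejected.
 * Other groups keep their existing behaviour (not modelled here).
 */
static int sketch_check_mark_mask(uint32_t init_flags, uint64_t mask)
{
	const uint32_t fid_hsm = SKETCH_CLASS_PRE_CONTENT | SKETCH_REPORT_FID;

	if ((init_flags & fid_hsm) != fid_hsm)
		return 0;

	if (mask & (SKETCH_ONDIR | SKETCH_EVENT_ON_CHILD))
		return -EINVAL;	/* assumed error code */

	return 0;
}
```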

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR