Re: [RFC PATCH v2 0/3] fanotify HSM events for directories

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Amir!

On Wed 04-06-25 18:09:15, Amir Goldstein wrote:
> In v1 there was only patch 1 [1] to allow FAN_PRE_ACCESS events
> on readdir (with FAN_ONDIR).
> 
> Following your feedback on v1, v2 adds support for FAN_PATH_ACCESS
> event so that a non-populated directory could be populted either on
> first readdir or on first lookup.

OK, it's good that now we have a bit more wider context for the discussion
:). First, when reading this I've started wondering whether we need both
FAN_PRE_ACCESS on directories and FAN_PATH_ACCESS (only on directories).
Firstly, I don't love adding more use to the FAN_ONDIR flag when creating
marks because you can only specify you want FAN_PRE_ACCESS on files,
FAN_PRE_ACCESS on files & dirs but there's no way to tell you care only
about FAN_PRE_ACCESS on dirs. You have to filter that when receiving
events. Secondly, the distinction between FAN_PRE_ACCESS and
FAN_PATH_ACCESS is somewhat weak - it's kind of similar to the situation
with regular files when we notify about access to the whole file vs only to
a specific range.  So what if we had an event like FAN_PRE_DIR_ACCESS that
would report looked up name on lookup and nothing on readdir meaning you
need to fetch everything?

> I am still tagging this as RFC for two semi-related reasons:
> 
> 1) In my original draft of man-page for FAN_PATH_ACCESS [2],
> I had introduced a new class FAN_CLASS_PRE_PATH, which FAN_PATH_ACCESS
> requires and is defined as:
> "Unlike FAN_CLASS_PRE_CONTENT, this class can be used along with
>  FAN_REPORT_DFID_NAME to report the names of the looked up files along
>  with O_PATH file descriptos in the new path lookup events."
> 
> I am not sure if we really need FAN_CLASS_PRE_PATH, so wanted to ask
> your opinion.
> 
> The basic HSM (as implemented in my POC) does not need to get the lookup
> name in the event - it populates dir on first readdir or lookup access.
> So I think that support for (FAN_CLASS_PRE_CONTENT | FAN_REPORT_DFID_NAME)
> could be added later per demand.

The question here is what a real user is going to do? I know Meta guys
don't care about directory events for their usecase. You seem to care about
them so I presume you have some production use in mind? How's that going to
work? Because if I should guess I'd think that someone asks for the name
being looked up sooner rather than later because for large dirs not having
to fetch everything on lookup would be a noticeable win... And if that's
the case then IMHO we should design (but not necessarily fully implement)
API that's the simplest and most logical when this is added.

This ties to the discussion how the FAN_PATH_ACCESS / FAN_PRE_DIR_ACCESS
event is going to report the name.

> 2) Current code does not generate FAN_PRE_ACCESS from vfs internal
> lookup helpers such as  lookup_one*() helpers from overalyfs and nfsd.
> This is related to the API of reporting an O_PATH event->fd for
> FAN_PATH_ACCESS event, which requires a mount.

AFAIU this means that you could not NFS export a filesystem that is HSM
managed and you could not use HSM managed filesystem to compose overlayfs.
I don't find either of those a critical feature but OTOH it would be nice
if the API didn't restrict us from somehow implementing this in the future.

> If we decide that we want to support FAN_PATH_ACCESS from all the
> path-less lookup_one*() helpers, then we need to support reporting
> FAN_PATH_ACCESS event with directory fid.
> 
> If we allow FAN_PATH_ACCESS event from path-less vfs helpers, we still
> have to allow setting FAN_PATH_ACCESS in a mount mark/ignore mask, because
> we need to provide a way for HSM to opt-out of FAN_PATH_ACCESS events
> on its "work" mount - the path via which directories are populated.
> 
> There may be a middle ground:
> - Pass optional path arg to __lookup_slow() (i.e. from walk_component())
> - Move fsnotify hook into __lookup_slow()
> - fsnotify_lookup_perm() passes optional path data to fsnotify()
> - fanotify_handle_event() returns -EPERM for FAN_PATH_ACCESS without
>   path data
> 
> This way, if HSM is enabled on an sb and not ignored on specific dir
> after it was populated, path lookup from syscall will trigger
> FAN_PATH_ACCESS events and overalyfs/nfsd will fail to lookup inside
> non-populated directories.

OK, but how will this manifest from the user POV? If we have say nfs
exported filesystem that is HSM managed then there would have to be some
knowledge in nfsd to know how to access needed files so that HSM can pull
them? I guess I'm missing the advantage of this middle-ground solution...

> Supporting populate events from overalyfs/nfsd could be implemented
> later per demand by reporting directory fid instead of O_PATH fd.
> 
> If you think that is worth checking, I can prepare a patch for the above
> so we can expose it to performance regression bots.
> 
> Better yet, if you have no issues with the implementation in this
> patch set, maybe let it soak in for_next/for_testing as is to make
> sure that it does not already introduce any performance regressions.
> 
> Thoughts?

If I should summarize the API situation: If we ever want to support HSM +
NFS export / overlayfs, we must implement support for pre-content events
with FID (DFID + name to be precise). If we want to support HSM events on
lookup with looked up name, we don't have to go for full DFID + name but we
must at least add additional info with the name to the event. Also if we go
for reporting directory pre-content events with standard events, you want
to add support for returning O_PATH fds for better efficiency and the code
to handle FMODE_NONOTIFY directory fds in path lookup.

Frankly seeing all this I think that going for DFID + name events for
directory HSM events from the start may be the cleanest choice long term
because then we'll have one way how to access the directory HSM
functionality with possibility of extensions without having to add
different APIs for it. We'd just have to implement replying to FID events
because we won't have fd to identify event we reply to so that will need
some thought.

What are your thoughts? Am I missing something?

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux