Re: [RFC PATCH] fanotify: wake-up all waiters on release

Jan Kara <jack@xxxxxxx> · Mon, 26 May 2025 18:47:38 +0200

On Mon 26-05-25 23:12:20, Sergey Senozhatsky wrote:
> On (25/05/26 14:52), Jan Kara wrote:
> > > > We don't use exclusive waits with access_waitq so wake_up() and
> > > > wake_up_all() should do the same thing?
> > > 
> > > Oh, non-exclusive waiters, I see.  I totally missed that, thanks.
> > > 
> > > So... the problem is somewhere else then.  I'm currently looking
> > > at some crashes (across all LTS kernels) where group owner just
> > > gets stuck and then hung-task watchdog kicks in and panics the
> > > system.  Basically just a single backtrace in the kernel logs:
> > > 
> > >  schedule+0x534/0x2540
> > >  fsnotify_destroy_group+0xa7/0x150
> > >  fanotify_release+0x147/0x160
> > >  ____fput+0xe4/0x2a0
> > >  task_work_run+0x71/0xb0
> > >  do_exit+0x1ea/0x800
> > >  do_group_exit+0x81/0x90
> > >  get_signal+0x32d/0x4e0
> > > 
> > > My assumption was that it's this wait:
> > > 	wait_event(group->notification_waitq, !atomic_read(&group->user_waits));
> > 
> > Well, you're likely correct we are sleeping in this wait. But likely
> > there's some process that's indeed waiting for response to fanotify event
> > from userspace. Do you have a reproducer? Can you dump all blocked tasks
> > when this happens?
> 
> Unfortunately, no.  This happens on consumer devices, which are
> not available for any sort of debugging, due to various privacy
> protection reasons.  We only get anonymized kernel ramoops/dmesg
> on crashes.
> 
> So my only option is to add something to the kernel, then roll-out
> the patched kernel to the fleet and wait for new crash reports.  The
> problem is, all that I can think of sort of fixes the crash as far as
> the hung-task watchdog is concerned.  Let me think more about it.
> 
> Another silly question: what decrements group->user_waits in case of
> that race-condition?
> 
> ---
> 
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 9dac7f6e72d2b..38b977fe37a71 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -945,8 +945,10 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
>         if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS)) {
>                 fsid = fanotify_get_fsid(iter_info);
>                 /* Racing with mark destruction or creation? */
> -               if (!fsid.val[0] && !fsid.val[1])
> -                       return 0;
> +               if (!fsid.val[0] && !fsid.val[1]) {
> +                       ret = 0;
> +                       goto finish;
> +               }
>         }

This code is not present in the current upstream kernel. The problem seems
to have been inadvertently fixed by commit 30ad1938326b ("fanotify: allow
"weak" fsid when watching a single filesystem") which you likely don't have
in your kernel.
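
To illustrate the kind of leak the quoted diff is guarding against, here is a
minimal userspace sketch. It is a model, not the kernel code: all demo_* names
are hypothetical, and it assumes the permission-event path takes a reference
counted in group->user_waits before the fsid check and only drops it at the
"finish" label, so an early "return 0" leaves the count elevated and the
wait_event() in fsnotify_destroy_group() never sees it reach zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct fsnotify_group; only the counter matters. */
struct demo_group {
	int user_waits;	/* models group->user_waits (an atomic_t in the kernel) */
};

/* Assumed to model the "prepare" step that bumps the counter. */
void demo_prepare_user_wait(struct demo_group *g)
{
	g->user_waits++;
}

/* Assumed to model the "finish" step that drops the counter. */
void demo_finish_user_wait(struct demo_group *g)
{
	assert(g->user_waits > 0);
	g->user_waits--;
}

/* Old shape: the early return skips the finish step and leaks a count. */
int demo_handle_event_leaky(struct demo_group *g, bool perm_event,
			    bool fsid_is_zero)
{
	if (perm_event)
		demo_prepare_user_wait(g);

	if (fsid_is_zero)
		return 0;	/* bug: user_waits is never dropped */

	/* ... event allocation and queueing would happen here ... */

	if (perm_event)
		demo_finish_user_wait(g);
	return 0;
}

/* Patched shape: every exit after the prepare step goes through "finish". */
int demo_handle_event_fixed(struct demo_group *g, bool perm_event,
			    bool fsid_is_zero)
{
	int ret = 0;

	if (perm_event)
		demo_prepare_user_wait(g);

	if (fsid_is_zero)
		goto finish;

	/* ... event allocation and queueing would happen here ... */

finish:
	if (perm_event)
		demo_finish_user_wait(g);
	return ret;
}
```

With a zero fsid and a permission event, the leaky variant leaves user_waits
at 1 while the fixed variant returns it to 0, which is the condition the
release path is waiting on.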

								Honza

-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR