Re: [PATCH v2 0/3] Fix localio hangs

On Thu, 17 Jul 2025, Mike Snitzer wrote:
> On Thu, Jul 17, 2025 at 08:09:11AM +1000, NeilBrown wrote:
> > On Thu, 17 Jul 2025, Trond Myklebust wrote:
> > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > 
> > > The following patch series fixes a series of issues with the current
> > > localio code, as reported in the link
> > > https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@xxxxxxxxxx/
> > > 
> > > 
> > > Trond Myklebust (3):
> > >   NFS/localio: nfs_close_local_fh() fix check for file closed
> > >   NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
> > >   NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
> > 
> > That all looks good to me - thanks a lot for finding and fixing my bugs.
> > 
> > Reviewed-by: NeilBrown <neil@xxxxxxxxxx>
> > 
> > I'd still like to fix the nfsd_file_cache_purge() issue but that is
> > quite separate especially now that you've prevented it causing problems
> > for nfs_uuid_put().
> > 
> > thanks,
> > NeilBrown
> 
> Unfortunately even with these 3 v2 fixes I was just able to hit the
> same hang on NFSD shutdown.  It took 5 iterations of the fio test,
> reported here:
> https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@xxxxxxxxxx/
> So it is harder to hit with these v2 fixes, nevertheless:
> 
> [  369.528839] task:rpc.nfsd        state:D stack:0     pid:10569 tgid:10569 ppid:1      flags:0x00004006

Are there any other tasks which are in "state:D", or any nfsd or nfs
processes that are waiting in any state?
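
For gathering that list, "echo w > /proc/sysrq-trigger" dumps the blocked
tasks to the kernel log. A userspace alternative is sketched below. It is
not part of this patch series, and the helper names stat_state() and
list_d_state_tasks() are made up for illustration. It only reads
/proc/<pid>/stat, so it reports thread-group leaders and kernel threads but
not the individual threads of a multi-threaded process.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the state character from one /proc/<pid>/stat line, or '\0' if
 * the line is malformed.  The comm field is wrapped in parentheses and may
 * itself contain spaces or ')', so the search starts from the last ')'.
 */
static char stat_state(const char *line)
{
	const char *close = strrchr(line, ')');

	if (!close || close[1] != ' ' || close[2] == '\0')
		return '\0';
	return close[2];
}

/*
 * Print the stat line of every task currently in uninterruptible sleep
 * (state 'D').  Returns the number of such tasks, or -1 if /proc cannot
 * be opened.
 */
static int list_d_state_tasks(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	int found = 0;

	if (!proc)
		return -1;

	while ((de = readdir(proc)) != NULL) {
		char path[288], line[512];
		FILE *f;

		/* Only numeric directory names are tasks. */
		if (!isdigit((unsigned char)de->d_name[0]))
			continue;

		snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;	/* task exited while we were scanning */

		if (fgets(line, sizeof(line), f) && stat_state(line) == 'D') {
			printf("%s", line);
			found++;
		}
		fclose(f);
	}
	closedir(proc);
	return found;
}
```

Calling list_d_state_tasks() from a small main() while rpc.nfsd is stuck
would print one line per blocked task. /proc/<pid>/stack (root only, and
only if the kernel provides it) can then show where each one is waiting.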

I'll see if I can work out any way that an nfsd_net_ref reference might leak.

Thanks,
NeilBrown

> [  369.528985] Call Trace:
> [  369.529127]  <TASK>
> [  369.529295]  __schedule+0x26d/0x530
> [  369.529435]  schedule+0x27/0xa0
> [  369.529566]  schedule_timeout+0x14e/0x160
> [  369.529700]  ? svc_destroy+0xce/0x160 [sunrpc]
> [  369.529882]  ? lockd_put+0x5f/0x90 [lockd]
> [  369.530022]  __wait_for_common+0x8f/0x1d0
> [  369.530154]  ? __pfx_schedule_timeout+0x10/0x10
> [  369.530329]  nfsd_destroy_serv+0x13f/0x1a0 [nfsd]
> [  369.530516]  nfsd_svc+0xe0/0x170 [nfsd]
> [  369.530684]  write_threads+0xc3/0x190 [nfsd]
> [  369.530845]  ? simple_transaction_get+0xc2/0xe0
> [  369.530973]  ? __pfx_write_threads+0x10/0x10 [nfsd]
> [  369.531133]  nfsctl_transaction_write+0x47/0x80 [nfsd]
> [  369.531324]  vfs_write+0xfa/0x420
> [  369.531448]  ? do_filp_open+0xae/0x150
> [  369.531574]  ksys_write+0x63/0xe0
> [  369.531693]  do_syscall_64+0x7d/0x160
> [  369.531816]  ? do_sys_openat2+0x81/0xd0
> [  369.531937]  ? syscall_exit_work+0xf3/0x120
> [  369.532058]  ? syscall_exit_to_user_mode+0x32/0x1b0
> [  369.532178]  ? do_syscall_64+0x89/0x160
> [  369.532344]  ? __mod_memcg_lruvec_state+0x95/0x150
> [  369.532465]  ? __lruvec_stat_mod_folio+0x84/0xd0
> [  369.532584]  ? syscall_exit_work+0xf3/0x120
> [  369.532705]  ? syscall_exit_to_user_mode+0x32/0x1b0
> [  369.532827]  ? do_syscall_64+0x89/0x160
> [  369.532947]  ? __handle_mm_fault+0x326/0x730
> [  369.533066]  ? __mod_memcg_lruvec_state+0x95/0x150
> [  369.533187]  ? __count_memcg_events+0x53/0xf0
> [  369.533306]  ? handle_mm_fault+0x245/0x340
> [  369.533427]  ? do_user_addr_fault+0x341/0x6b0
> [  369.533547]  ? exc_page_fault+0x70/0x160
> [  369.533666]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  369.533787] RIP: 0033:0x7f1db10fd617
> 
> crash> dis -l nfsd_destroy_serv+0x13f
> /root/snitm/git/linux-HS/fs/nfsd/nfssvc.c: 468
> 0xffffffffc172e36f <nfsd_destroy_serv+319>:     mov    %r12,%rdi
> 
> which is the percpu_ref_exit() in nfsd_shutdown_net():
> 
> static void nfsd_shutdown_net(struct net *net)
> {
>         struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> 
>         if (!nn->nfsd_net_up)
>                 return;
> 
>         percpu_ref_kill_and_confirm(&nn->nfsd_net_ref, nfsd_net_done);
>         wait_for_completion(&nn->nfsd_net_confirm_done);
> 
>         nfsd_export_flush(net);
>         nfs4_state_shutdown_net(net);
>         nfsd_reply_cache_shutdown(nn);
>         nfsd_file_cache_shutdown_net(net);
>         if (nn->lockd_up) {
>                 lockd_down(net);
>                 nn->lockd_up = false;
>         }
> 
>         wait_for_completion(&nn->nfsd_net_free_done);
>    ---> percpu_ref_exit(&nn->nfsd_net_ref);
> 
>         nn->nfsd_net_up = false;
>         nfsd_shutdown_generic();
> }
> 





