On Thu, 17 Jul 2025, Mike Snitzer wrote:
> On Thu, Jul 17, 2025 at 08:09:11AM +1000, NeilBrown wrote:
> > On Thu, 17 Jul 2025, Trond Myklebust wrote:
> > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > 
> > > The following patch series fixes a series of issues with the current
> > > localio code, as reported in the link
> > > https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@xxxxxxxxxx/
> > > 
> > > 
> > > Trond Myklebust (3):
> > >   NFS/localio: nfs_close_local_fh() fix check for file closed
> > >   NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
> > >   NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
> > 
> > That all looks good to me - thanks a lot for finding and fixing my bugs.
> > 
> > Reviewed-by: NeilBrown <neil@xxxxxxxxxx>
> > 
> > I'd still like to fix the nfsd_file_cache_purge() issue but that is
> > quite separate especially now that you've prevented it causing problems
> > for nfs_uuid_put().
> > 
> > thanks,
> > NeilBrown
> 
> Unfortunately even with these 3 v2 fixes I was just able to hit the
> same hang on NFSD shutdown.  It took 5 iterations of the fio test,
> reported here:
> https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@xxxxxxxxxx/
> So it is harder to hit with these v2 fixes, nevertheless:
> 
> [  369.528839] task:rpc.nfsd        state:D stack:0     pid:10569 tgid:10569 ppid:1      flags:0x00004006

Are there any other tasks which are in "state:D", or any nfsd or nfs
processes that are waiting in any state?

I'll see if I can work out any way that an nfsd_net_ref reference might
leak.

Thanks,
NeilBrown

> [  369.528985] Call Trace:
> [  369.529127]  <TASK>
> [  369.529295]  __schedule+0x26d/0x530
> [  369.529435]  schedule+0x27/0xa0
> [  369.529566]  schedule_timeout+0x14e/0x160
> [  369.529700]  ? svc_destroy+0xce/0x160 [sunrpc]
> [  369.529882]  ? lockd_put+0x5f/0x90 [lockd]
> [  369.530022]  __wait_for_common+0x8f/0x1d0
> [  369.530154]  ? __pfx_schedule_timeout+0x10/0x10
> [  369.530329]  nfsd_destroy_serv+0x13f/0x1a0 [nfsd]
> [  369.530516]  nfsd_svc+0xe0/0x170 [nfsd]
> [  369.530684]  write_threads+0xc3/0x190 [nfsd]
> [  369.530845]  ? simple_transaction_get+0xc2/0xe0
> [  369.530973]  ? __pfx_write_threads+0x10/0x10 [nfsd]
> [  369.531133]  nfsctl_transaction_write+0x47/0x80 [nfsd]
> [  369.531324]  vfs_write+0xfa/0x420
> [  369.531448]  ? do_filp_open+0xae/0x150
> [  369.531574]  ksys_write+0x63/0xe0
> [  369.531693]  do_syscall_64+0x7d/0x160
> [  369.531816]  ? do_sys_openat2+0x81/0xd0
> [  369.531937]  ? syscall_exit_work+0xf3/0x120
> [  369.532058]  ? syscall_exit_to_user_mode+0x32/0x1b0
> [  369.532178]  ? do_syscall_64+0x89/0x160
> [  369.532344]  ? __mod_memcg_lruvec_state+0x95/0x150
> [  369.532465]  ? __lruvec_stat_mod_folio+0x84/0xd0
> [  369.532584]  ? syscall_exit_work+0xf3/0x120
> [  369.532705]  ? syscall_exit_to_user_mode+0x32/0x1b0
> [  369.532827]  ? do_syscall_64+0x89/0x160
> [  369.532947]  ? __handle_mm_fault+0x326/0x730
> [  369.533066]  ? __mod_memcg_lruvec_state+0x95/0x150
> [  369.533187]  ? __count_memcg_events+0x53/0xf0
> [  369.533306]  ? handle_mm_fault+0x245/0x340
> [  369.533427]  ? do_user_addr_fault+0x341/0x6b0
> [  369.533547]  ? exc_page_fault+0x70/0x160
> [  369.533666]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  369.533787] RIP: 0033:0x7f1db10fd617
> 
> crash> dis -l nfsd_destroy_serv+0x13f
> /root/snitm/git/linux-HS/fs/nfsd/nfssvc.c: 468
> 0xffffffffc172e36f <nfsd_destroy_serv+319>:     mov    %r12,%rdi
> 
> which is the percpu_ref_exit() in nfsd_shutdown_net():
> 
> static void nfsd_shutdown_net(struct net *net)
> {
>         struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> 
>         if (!nn->nfsd_net_up)
>                 return;
> 
>         percpu_ref_kill_and_confirm(&nn->nfsd_net_ref, nfsd_net_done);
>         wait_for_completion(&nn->nfsd_net_confirm_done);
> 
>         nfsd_export_flush(net);
>         nfs4_state_shutdown_net(net);
>         nfsd_reply_cache_shutdown(nn);
>         nfsd_file_cache_shutdown_net(net);
>         if (nn->lockd_up) {
>                 lockd_down(net);
>                 nn->lockd_up = false;
>         }
> 
>         wait_for_completion(&nn->nfsd_net_free_done);
> --->    percpu_ref_exit(&nn->nfsd_net_ref);
> 
>         nn->nfsd_net_up = false;
>         nfsd_shutdown_generic();
> }
> 
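
For anyone following the trace: given the __wait_for_common frame, the
task is presumably stuck in the wait_for_completion(&nn->nfsd_net_free_done)
immediately before the marked percpu_ref_exit().  That completion only
fires from the percpu_ref release callback, i.e. once the very last
nfsd_net_ref reference has been dropped, so a single leaked get would
leave rpc.nfsd in D state forever.  A rough sketch of the lifecycle,
assuming the usual percpu_ref + completion pattern (nfsd_net_free(),
nfsd_net_try_get() and nfsd_net_put() are illustrative names here, not
necessarily the exact fs/nfsd helpers; only nfsd_net_done() appears in
the code quoted above):

/* confirm-kill callback: the ref has switched to atomic mode, so no
 * new percpu_ref_tryget_live() can succeed from here on */
static void nfsd_net_done(struct percpu_ref *ref)
{
	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_net_ref);

	complete(&nn->nfsd_net_confirm_done);
}

/* release callback: runs only when the reference count reaches zero */
static void nfsd_net_free(struct percpu_ref *ref)
{
	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_net_ref);

	complete(&nn->nfsd_net_free_done);
}

static int nfsd_net_ref_init(struct nfsd_net *nn)
{
	init_completion(&nn->nfsd_net_confirm_done);
	init_completion(&nn->nfsd_net_free_done);
	return percpu_ref_init(&nn->nfsd_net_ref, nfsd_net_free,
			       0, GFP_KERNEL);
}

/* every transient user (e.g. a localio open) must pair these two */
static bool nfsd_net_try_get(struct nfsd_net *nn)
{
	return percpu_ref_tryget_live(&nn->nfsd_net_ref);
}

static void nfsd_net_put(struct nfsd_net *nn)
{
	/* a get that never reaches this put is exactly the sort of leak
	 * that would leave nfsd_net_free_done forever incomplete */
	percpu_ref_put(&nn->nfsd_net_ref);
}

So the question reduces to which successful try-get on the localio side
is not being paired with a put before NFSD shutdown waits.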