On Thu, Jul 17, 2025 at 08:09:11AM +1000, NeilBrown wrote:
> On Thu, 17 Jul 2025, Trond Myklebust wrote:
> > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> >
> > The following patch series fixes a series of issues with the current
> > localio code, as reported in the link
> > https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@xxxxxxxxxx/
> >
> >
> > Trond Myklebust (3):
> >   NFS/localio: nfs_close_local_fh() fix check for file closed
> >   NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
> >   NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
>
> That all looks good to me - thanks a lot for finding and fixing my bugs.
>
> Reviewed-by: NeilBrown <neil@xxxxxxxxxx>
>
> I'd still like to fix the nfsd_file_cache_purge() issue but that is
> quite separate especially now that you've prevented it causing problems
> for nfs_uuid_put().
>
> thanks,
> NeilBrown

Unfortunately even with these 3 v2 fixes I was just able to hit the same
hang on NFSD shutdown. It took 5 iterations of the fio test, reported
here:
https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@xxxxxxxxxx/

So it is harder to hit with these v2 fixes, nevertheless:

[ 369.528839] task:rpc.nfsd state:D stack:0 pid:10569 tgid:10569 ppid:1 flags:0x00004006
[ 369.528985] Call Trace:
[ 369.529127] <TASK>
[ 369.529295] __schedule+0x26d/0x530
[ 369.529435] schedule+0x27/0xa0
[ 369.529566] schedule_timeout+0x14e/0x160
[ 369.529700] ? svc_destroy+0xce/0x160 [sunrpc]
[ 369.529882] ? lockd_put+0x5f/0x90 [lockd]
[ 369.530022] __wait_for_common+0x8f/0x1d0
[ 369.530154] ? __pfx_schedule_timeout+0x10/0x10
[ 369.530329] nfsd_destroy_serv+0x13f/0x1a0 [nfsd]
[ 369.530516] nfsd_svc+0xe0/0x170 [nfsd]
[ 369.530684] write_threads+0xc3/0x190 [nfsd]
[ 369.530845] ? simple_transaction_get+0xc2/0xe0
[ 369.530973] ? __pfx_write_threads+0x10/0x10 [nfsd]
[ 369.531133] nfsctl_transaction_write+0x47/0x80 [nfsd]
[ 369.531324] vfs_write+0xfa/0x420
[ 369.531448] ? do_filp_open+0xae/0x150
[ 369.531574] ksys_write+0x63/0xe0
[ 369.531693] do_syscall_64+0x7d/0x160
[ 369.531816] ? do_sys_openat2+0x81/0xd0
[ 369.531937] ? syscall_exit_work+0xf3/0x120
[ 369.532058] ? syscall_exit_to_user_mode+0x32/0x1b0
[ 369.532178] ? do_syscall_64+0x89/0x160
[ 369.532344] ? __mod_memcg_lruvec_state+0x95/0x150
[ 369.532465] ? __lruvec_stat_mod_folio+0x84/0xd0
[ 369.532584] ? syscall_exit_work+0xf3/0x120
[ 369.532705] ? syscall_exit_to_user_mode+0x32/0x1b0
[ 369.532827] ? do_syscall_64+0x89/0x160
[ 369.532947] ? __handle_mm_fault+0x326/0x730
[ 369.533066] ? __mod_memcg_lruvec_state+0x95/0x150
[ 369.533187] ? __count_memcg_events+0x53/0xf0
[ 369.533306] ? handle_mm_fault+0x245/0x340
[ 369.533427] ? do_user_addr_fault+0x341/0x6b0
[ 369.533547] ? exc_page_fault+0x70/0x160
[ 369.533666] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 369.533787] RIP: 0033:0x7f1db10fd617

crash> dis -l nfsd_destroy_serv+0x13f
/root/snitm/git/linux-HS/fs/nfsd/nfssvc.c: 468
0xffffffffc172e36f <nfsd_destroy_serv+319>:     mov    %r12,%rdi

which is the percpu_ref_exit() in nfsd_shutdown_net():

static void nfsd_shutdown_net(struct net *net)
{
	struct nfsd_net *nn = net_generic(net, nfsd_net_id);

	if (!nn->nfsd_net_up)
		return;

	percpu_ref_kill_and_confirm(&nn->nfsd_net_ref, nfsd_net_done);
	wait_for_completion(&nn->nfsd_net_confirm_done);

	nfsd_export_flush(net);
	nfs4_state_shutdown_net(net);
	nfsd_reply_cache_shutdown(nn);
	nfsd_file_cache_shutdown_net(net);
	if (nn->lockd_up) {
		lockd_down(net);
		nn->lockd_up = false;
	}

	wait_for_completion(&nn->nfsd_net_free_done);
--->	percpu_ref_exit(&nn->nfsd_net_ref);

	nn->nfsd_net_up = false;
	nfsd_shutdown_generic();
}