As mentioned in the thread

  https://lore.kernel.org/linux-nfs/601285843.50695650.1748800817824.JavaMail.zimbra@xxxxxxx/T/#u

we observe that interrupted batch processing jobs put the client into an
unrecoverable state that requires a reboot of the client host.

I was finally able to build a custom kernel with all the required
third-party drivers to verify my assumption: marking the pNFS device
unavailable indeed fixes the issue. Please consider the proposed change
and backporting it to older kernels.

I did testing with (which is not part of the patch) and will try to add a
tracepoint as soon as I find out how to implement one.

Tigran Mkrtchyan (1):
  pNFS/flexfiles: mark device unavailable on fatal connection error

 fs/nfs/flexfilelayout/flexfilelayoutdev.c | 4 ++++
 1 file changed, 4 insertions(+)

--
2.49.0