Re: [PATCH] mm/userfaultfd: prevent busy looping for tasks with signals pending

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/24/25 1:13 PM, Peter Xu wrote:

(skipping to this bit as I think we're mostly in agreement on the above)

>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>>> index 296d294142c8..fa721525d93a 100644
>>> --- a/arch/x86/mm/fault.c
>>> +++ b/arch/x86/mm/fault.c
>>> @@ -1300,9 +1300,14 @@ void do_user_addr_fault(struct pt_regs *regs,
>>>          * We set FAULT_FLAG_USER based on the register state, not
>>>          * based on X86_PF_USER. User space accesses that cause
>>>          * system page faults are still user accesses.
>>> +        *
>>> +        * When we're in user mode, allow fast response on non-fatal
>>> +        * signals.  Do not set this in kernel mode faults because normally
>>> +        * a kernel fault means the fault must be resolved anyway before
>>> +        * going back to userspace.
>>>          */
>>>         if (user_mode(regs))
>>> -               flags |= FAULT_FLAG_USER;
>>> +               flags |= FAULT_FLAG_USER | FAULT_FLAG_INTERRUPTIBLE;
>>>  
>>>  #ifdef CONFIG_X86_64
>>>         /*
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 9b701cfbef22..a80f3f609b37 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -487,8 +487,7 @@ extern unsigned int kobjsize(const void *objp);
>>>   * arch-specific page fault handlers.
>>>   */
>>>  #define FAULT_FLAG_DEFAULT  (FAULT_FLAG_ALLOW_RETRY | \
>>> -                            FAULT_FLAG_KILLABLE | \
>>> -                            FAULT_FLAG_INTERRUPTIBLE)
>>> +                            FAULT_FLAG_KILLABLE)
>>> ===8<===
>>>
>>> That also kind of matches with what we do with fault_signal_pending().
>>> Would it make sense?
>>
>> I don't think doing a non-bounded non-interruptible sleep for a
>> condition that may never resolve (eg userfaultfd never fills the fault)
>> is a good idea. What happens if the condition never becomes true? You
> 
> If page fault is never going to be resolved, normally we sigkill the
> program as it can't move any further with no way to resolve the page fault.
> 
> But yeah that's based on the fact sigkill will work first..

Yep

>> can't even kill the task at that point... Generally UNINTERRUPTIBLE
>> sleep should only be used if it's a bounded wait.
>>
>> For example, if I ran my previous write(2) reproducer here and the task
>> got killed or exited before the userfaultfd fills the fault, then you'd
>> have the task stuck in 'D' forever. Can't be killed, can't get
>> reclaimed.
>>
>> In other words, this won't work.
> 
> .. Would you help explain why it didn't work even for SIGKILL?  Above will
> still set FAULT_FLAG_KILLABLE, hence I thought SIGKILL would always work
> regardless.
> 
> For such kernel user page access, IIUC it should respond to SIGKILL in
> handle_userfault(), then fault_signal_pending() would trap the SIGKILL this
> time -> going kernel fixups. Then the upper stack should get -EFAULT in the
> exception fixup path.
> 
> I could have missed something..

It won't work because sending the signal will not wake the process in
question as it's sleeping uninterruptibly, forever. My looping approach
still works for fatal signals as we abort the loop every now and then,
hence we know it won't be stuck forever. But if you don't have a timeout
on that uninterruptible sleep, it's not waking from being sent a signal
alone.

Example:

axboe@m2max-kvm ~> sudo ./tufd 
got buf 0xffff89800000
child will write
Page fault
flags = 0; address = ffff89800000
wait on child
fish: Job 1, 'sudo ./tufd' terminated by signal SIGKILL (Forced quit)

meanwhile in ps:

root         837     837  0.0    2  0.0  14628  1220 ?        Dl   12:37   0:00 ./tufd
root         837     838  0.0    2  0.0  14628  1220 ?        Sl   12:37   0:00 ./tufd

-- 
Jens Axboe




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux