Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 4/3/25 11:18, David Hildenbrand wrote:
> On 03.04.25 05:31, Jingbo Xu wrote:
>>
>>
>> On 4/3/25 5:34 AM, Joanne Koong wrote:
>>> On Thu, Dec 19, 2024 at 5:05 AM David Hildenbrand <david@xxxxxxxxxx>
>>> wrote:
>>>>
>>>> On 23.11.24 00:23, Joanne Koong wrote:
>>>>> For migrations called in MIGRATE_SYNC mode, skip migrating the
>>>>> folio if
>>>>> it is under writeback and has the AS_WRITEBACK_INDETERMINATE flag
>>>>> set on its
>>>>> mapping. If the AS_WRITEBACK_INDETERMINATE flag is set on the
>>>>> mapping, the
>>>>> writeback may take an indeterminate amount of time to complete, and
>>>>> waits may get stuck.
>>>>>
>>>>> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
>>>>> Reviewed-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
>>>>> ---
>>>>>    mm/migrate.c | 5 ++++-
>>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>> index df91248755e4..fe73284e5246 100644
>>>>> --- a/mm/migrate.c
>>>>> +++ b/mm/migrate.c
>>>>> @@ -1260,7 +1260,10 @@ static int migrate_folio_unmap(new_folio_t
>>>>> get_new_folio,
>>>>>                 */
>>>>>                switch (mode) {
>>>>>                case MIGRATE_SYNC:
>>>>> -                     break;
>>>>> +                     if (!src->mapping ||
>>>>> +                         !mapping_writeback_indeterminate(src-
>>>>> >mapping))
>>>>> +                             break;
>>>>> +                     fallthrough;
>>>>>                default:
>>>>>                        rc = -EBUSY;
>>>>>                        goto out;
>>>>
>>>> Ehm, doesn't this mean that any fuse user can essentially completely
>>>> block CMA allocations, memory compaction, memory hotunplug, memory
>>>> poisoning... ?!
>>>>
>>>> That sounds very bad.
>>>
>>> I took a closer look at the migration code and the FUSE code. In the
>>> migration code in migrate_folio_unmap(), I see that any MIGATE_SYNC
>>> mode folio lock holds will block migration until that folio is
>>> unlocked. This is the snippet in migrate_folio_unmap() I'm looking at:
>>>
>>>          if (!folio_trylock(src)) {
>>>                  if (mode == MIGRATE_ASYNC)
>>>                          goto out;
>>>
>>>                  if (current->flags & PF_MEMALLOC)
>>>                          goto out;
>>>
>>>                  if (mode == MIGRATE_SYNC_LIGHT && !
>>> folio_test_uptodate(src))
>>>                          goto out;
>>>
>>>                  folio_lock(src);
>>>          }
>>>
> 
> Right, I raised that also in my LSF/MM talk: waiting for readahead
> currently implies waiting for the folio lock (there is no separate
> readahead flag like there would be for writeback).
> 
> The more I look into this and fuse, the more I realize that what fuse
> does is just completely broken right now.
> 
>>> If this is all that is needed for a malicious FUSE server to block
>>> migration, then it makes no difference if AS_WRITEBACK_INDETERMINATE
>>> mappings are skipped in migration. A malicious server has easier and
>>> more powerful ways of blocking migration in FUSE than trying to do it
>>> through writeback. For a malicious fuse server, we in fact wouldn't
>>> even get far enough to hit writeback - a write triggers
>>> aops->write_begin() and a malicious server would deliberately hang
>>> forever while the folio is locked in write_begin().
>>
>> Indeed it seems possible.  A malicious FUSE server may already be
>> capable of blocking the synchronous migration in this way.
> 
> Yes, I think the conclusion is that we should advise people from not
> using unprivileged FUSE if they care about any features that rely on
> page migration or page reclaim.
> 
>>
>>
>>>
>>> I looked into whether we could eradicate all the places in FUSE where
>>> we may hold the folio lock for an indeterminate amount of time,
>>> because if that is possible, then we should not add this writeback way
>>> for a malicious fuse server to affect migration. But I don't think we
>>> can, for example taking one case, the folio lock needs to be held as
>>> we read in the folio from the server when servicing page faults, else
>>> the page cache would contain stale data if there was a concurrent
>>> write that happened just before, which would lead to data corruption
>>> in the filesystem. Imo, we need a more encompassing solution for all
>>> these cases if we're serious about preventing FUSE from blocking
>>> migration, which probably looks like a globally enforced default
>>> timeout of some sort or an mm solution for mitigating the blast radius
>>> of how much memory can be blocked from migration, but that is outside
>>> the scope of this patchset and is its own standalone topic.
> 
> I'm still skeptical about timeouts: we can only get it wrong.
> 
> I think a proper solution is making these pages movable, which does seem
> feasible if (a) splice is not involved and (b) we can find a way to not
> hold the folio lock forever e.g., in the readahead case.
> 
> Maybe readahead would have to be handled more similar to writeback
> (e.g., having a separate flag, or using a combination of e.g.,
> writeback+uptodate flag, not sure)
> 
> In both cases (readahead+writeback), we'd want to call into the FS to
> migrate a folio that is under readahread/writeback. In case of fuse
> without splice, a migration might be doable, and as discussed, splice
> might just be avoided.

My personal take is here that we should move away from splice.
Keith (or colleague) is working on ZC with io-uring anyway, so
maybe a good timing. We should just ensure that the new approach
doesn't have the same issue.

Thanks,
Bernd




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux