On 4/3/25 11:18, David Hildenbrand wrote: > On 03.04.25 05:31, Jingbo Xu wrote: >> >> >> On 4/3/25 5:34 AM, Joanne Koong wrote: >>> On Thu, Dec 19, 2024 at 5:05 AM David Hildenbrand <david@xxxxxxxxxx> >>> wrote: >>>> >>>> On 23.11.24 00:23, Joanne Koong wrote: >>>>> For migrations called in MIGRATE_SYNC mode, skip migrating the >>>>> folio if >>>>> it is under writeback and has the AS_WRITEBACK_INDETERMINATE flag >>>>> set on its >>>>> mapping. If the AS_WRITEBACK_INDETERMINATE flag is set on the >>>>> mapping, the >>>>> writeback may take an indeterminate amount of time to complete, and >>>>> waits may get stuck. >>>>> >>>>> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> >>>>> Reviewed-by: Shakeel Butt <shakeel.butt@xxxxxxxxx> >>>>> --- >>>>> mm/migrate.c | 5 ++++- >>>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/mm/migrate.c b/mm/migrate.c >>>>> index df91248755e4..fe73284e5246 100644 >>>>> --- a/mm/migrate.c >>>>> +++ b/mm/migrate.c >>>>> @@ -1260,7 +1260,10 @@ static int migrate_folio_unmap(new_folio_t >>>>> get_new_folio, >>>>> */ >>>>> switch (mode) { >>>>> case MIGRATE_SYNC: >>>>> - break; >>>>> + if (!src->mapping || >>>>> + !mapping_writeback_indeterminate(src- >>>>> >mapping)) >>>>> + break; >>>>> + fallthrough; >>>>> default: >>>>> rc = -EBUSY; >>>>> goto out; >>>> >>>> Ehm, doesn't this mean that any fuse user can essentially completely >>>> block CMA allocations, memory compaction, memory hotunplug, memory >>>> poisoning... ?! >>>> >>>> That sounds very bad. >>> >>> I took a closer look at the migration code and the FUSE code. In the >>> migration code in migrate_folio_unmap(), I see that any MIGATE_SYNC >>> mode folio lock holds will block migration until that folio is >>> unlocked. This is the snippet in migrate_folio_unmap() I'm looking at: >>> >>> if (!folio_trylock(src)) { >>> if (mode == MIGRATE_ASYNC) >>> goto out; >>> >>> if (current->flags & PF_MEMALLOC) >>> goto out; >>> >>> if (mode == MIGRATE_SYNC_LIGHT && ! >>> folio_test_uptodate(src)) >>> goto out; >>> >>> folio_lock(src); >>> } >>> > > Right, I raised that also in my LSF/MM talk: waiting for readahead > currently implies waiting for the folio lock (there is no separate > readahead flag like there would be for writeback). > > The more I look into this and fuse, the more I realize that what fuse > does is just completely broken right now. > >>> If this is all that is needed for a malicious FUSE server to block >>> migration, then it makes no difference if AS_WRITEBACK_INDETERMINATE >>> mappings are skipped in migration. A malicious server has easier and >>> more powerful ways of blocking migration in FUSE than trying to do it >>> through writeback. For a malicious fuse server, we in fact wouldn't >>> even get far enough to hit writeback - a write triggers >>> aops->write_begin() and a malicious server would deliberately hang >>> forever while the folio is locked in write_begin(). >> >> Indeed it seems possible. A malicious FUSE server may already be >> capable of blocking the synchronous migration in this way. > > Yes, I think the conclusion is that we should advise people from not > using unprivileged FUSE if they care about any features that rely on > page migration or page reclaim. > >> >> >>> >>> I looked into whether we could eradicate all the places in FUSE where >>> we may hold the folio lock for an indeterminate amount of time, >>> because if that is possible, then we should not add this writeback way >>> for a malicious fuse server to affect migration. But I don't think we >>> can, for example taking one case, the folio lock needs to be held as >>> we read in the folio from the server when servicing page faults, else >>> the page cache would contain stale data if there was a concurrent >>> write that happened just before, which would lead to data corruption >>> in the filesystem. Imo, we need a more encompassing solution for all >>> these cases if we're serious about preventing FUSE from blocking >>> migration, which probably looks like a globally enforced default >>> timeout of some sort or an mm solution for mitigating the blast radius >>> of how much memory can be blocked from migration, but that is outside >>> the scope of this patchset and is its own standalone topic. > > I'm still skeptical about timeouts: we can only get it wrong. > > I think a proper solution is making these pages movable, which does seem > feasible if (a) splice is not involved and (b) we can find a way to not > hold the folio lock forever e.g., in the readahead case. > > Maybe readahead would have to be handled more similar to writeback > (e.g., having a separate flag, or using a combination of e.g., > writeback+uptodate flag, not sure) > > In both cases (readahead+writeback), we'd want to call into the FS to > migrate a folio that is under readahread/writeback. In case of fuse > without splice, a migration might be doable, and as discussed, splice > might just be avoided. My personal take is here that we should move away from splice. Keith (or colleague) is working on ZC with io-uring anyway, so maybe a good timing. We should just ensure that the new approach doesn't have the same issue. Thanks, Bernd