On 03.04.25 05:31, Jingbo Xu wrote:
On 4/3/25 5:34 AM, Joanne Koong wrote:
On Thu, Dec 19, 2024 at 5:05 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
On 23.11.24 00:23, Joanne Koong wrote:
For migrations called in MIGRATE_SYNC mode, skip migrating the folio if
it is under writeback and has the AS_WRITEBACK_INDETERMINATE flag set on its
mapping. If the AS_WRITEBACK_INDETERMINATE flag is set on the mapping, the
writeback may take an indeterminate amount of time to complete, and
waits may get stuck.
Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
Reviewed-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
---
mm/migrate.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index df91248755e4..fe73284e5246 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1260,7 +1260,10 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
*/
switch (mode) {
case MIGRATE_SYNC:
- break;
+ if (!src->mapping ||
+ !mapping_writeback_indeterminate(src->mapping))
+ break;
+ fallthrough;
default:
rc = -EBUSY;
goto out;
Ehm, doesn't this mean that any fuse user can essentially completely
block CMA allocations, memory compaction, memory hotunplug, memory
poisoning... ?!
That sounds very bad.
I took a closer look at the migration code and the FUSE code. In the
migration code in migrate_folio_unmap(), I see that any MIGATE_SYNC
mode folio lock holds will block migration until that folio is
unlocked. This is the snippet in migrate_folio_unmap() I'm looking at:
if (!folio_trylock(src)) {
if (mode == MIGRATE_ASYNC)
goto out;
if (current->flags & PF_MEMALLOC)
goto out;
if (mode == MIGRATE_SYNC_LIGHT && !folio_test_uptodate(src))
goto out;
folio_lock(src);
}
Right, I raised that also in my LSF/MM talk: waiting for readahead
currently implies waiting for the folio lock (there is no separate
readahead flag like there would be for writeback).
The more I look into this and fuse, the more I realize that what fuse
does is just completely broken right now.
If this is all that is needed for a malicious FUSE server to block
migration, then it makes no difference if AS_WRITEBACK_INDETERMINATE
mappings are skipped in migration. A malicious server has easier and
more powerful ways of blocking migration in FUSE than trying to do it
through writeback. For a malicious fuse server, we in fact wouldn't
even get far enough to hit writeback - a write triggers
aops->write_begin() and a malicious server would deliberately hang
forever while the folio is locked in write_begin().
Indeed it seems possible. A malicious FUSE server may already be
capable of blocking the synchronous migration in this way.
Yes, I think the conclusion is that we should advise people from not
using unprivileged FUSE if they care about any features that rely on
page migration or page reclaim.
I looked into whether we could eradicate all the places in FUSE where
we may hold the folio lock for an indeterminate amount of time,
because if that is possible, then we should not add this writeback way
for a malicious fuse server to affect migration. But I don't think we
can, for example taking one case, the folio lock needs to be held as
we read in the folio from the server when servicing page faults, else
the page cache would contain stale data if there was a concurrent
write that happened just before, which would lead to data corruption
in the filesystem. Imo, we need a more encompassing solution for all
these cases if we're serious about preventing FUSE from blocking
migration, which probably looks like a globally enforced default
timeout of some sort or an mm solution for mitigating the blast radius
of how much memory can be blocked from migration, but that is outside
the scope of this patchset and is its own standalone topic.
I'm still skeptical about timeouts: we can only get it wrong.
I think a proper solution is making these pages movable, which does seem
feasible if (a) splice is not involved and (b) we can find a way to not
hold the folio lock forever e.g., in the readahead case.
Maybe readahead would have to be handled more similar to writeback
(e.g., having a separate flag, or using a combination of e.g.,
writeback+uptodate flag, not sure)
In both cases (readahead+writeback), we'd want to call into the FS to
migrate a folio that is under readahread/writeback. In case of fuse
without splice, a migration might be doable, and as discussed, splice
might just be avoided.
I don't see how this patch has any additional negative impact on
memory migration for the case of malicious servers that the server
can't already (and more easily) do. In fact, this patchset if anything
helps memory given that malicious servers now can't also trigger page
allocations for temp pages that would never get freed.
If that's true, maybe we could drop this patch out of this patchset? So
that both before and after this patchset, synchronous migration could be
blocked by a malicious FUSE server, while the usability of continuous
memory (CMA) won't be affected.
I had exactly the same thought: if we can block forever on the folio
lock, there is no need for AS_WRITEBACK_INDETERMINATE. It's already all
completely broken.
--
Cheers,
David / dhildenb