On Fri, Jul 11, 2025 at 03:34:23PM +0200, Vlastimil Babka wrote: > +cc linux-api - see the description of the new behavior below Ah yeah :) I sent on 0/10 also. Friday... > > On 7/11/25 13:38, Lorenzo Stoakes wrote: > > Historically we've made it a uAPI requirement that mremap() may only > > operate on a single VMA at a time. > > > > For instances where VMAs need to be resized, this makes sense, as it > > becomes very difficult to determine what a user actually wants should they > > indicate a desire to expand or shrink the size of multiple VMAs (truncate? > > Adjust sizes individually? Some other strategy?). > > > > However, in instances where a user is moving VMAs, it is restrictive to > > disallow this. > > > > This is especially the case when anonymous mapping remap may or may not be > > mergeable depending on whether VMAs have or have not been faulted due to > > anon_vma assignment and folio index alignment with vma->vm_pgoff. > > > > Often this can result in surprising impact where a moved region is faulted, > > then moved back and a user fails to observe a merge from otherwise > > compatible, adjacent VMAs. > > > > This change allows such cases to work without the user having to be > > cognizant of whether a prior mremap() move or other VMA operations has > > resulted in VMA fragmentation. > > > > We only permit this for mremap() operations that do NOT change the size of > > the VMA and DO specify MREMAP_MAYMOVE | MREMAP_FIXED. > > > > Should no VMA exist in the range, -EFAULT is returned as usual. > > > > If a VMA move spans a single VMA - then there is no functional change. > > > > Otherwise, we place additional requirements upon VMAs: > > > > * They must not have a userfaultfd context associated with them - this > > requires dropping the lock to notify users, and we want to perform the > > operation with the mmap write lock held throughout. > > > > * If file-backed, they cannot have a custom get_unmapped_area handler - > > this might result in MREMAP_FIXED not being honoured, which could result > > in unexpected positioning of VMAs in the moved region. > > > > There may be gaps in the range of VMAs that are moved: > > > > X Y X Y > > <---> <-> <---> <-> > > |-------| |-----| |-----| |-------| |-----| |-----| > > | A | | B | | C | ---> | A' | | B' | | C' | > > |-------| |-----| |-----| |-------| |-----| |-----| > > addr new_addr > > > > The move will preserve the gaps between each VMA. > > AFAIU "moving a gap" doesn't mean we unmap anything pre-existing where the > moved gap's range falls to, right? Worth pointing out explicitly. > > > Note that any failures encountered will result in a partial move. Since an > > mremap() can fail at any time, this might result in only some of the VMAs > > being moved. > > > > Note that failures are very rare and typically require an out of a memory > > condition or a mapping limit condition to be hit, assuming the VMAs being > > moved are valid. > > > > We don't try to assess ahead of time whether VMAs are valid according to > > the multi VMA rules, as it would be rather unusual for a user to mix > > uffd-enabled VMAs and/or VMAs which map unusual driver mappings that > > specify custom get_unmapped_area() handlers in an aggregate operation. > > > > So we optimise for the far, far more likely case of the operation being > > entirely permissible. > > Guess it's the sanest thing to do given all the cirumstances. > > > In the case of the move of a single VMA, the above conditions are > > permitted. This makes the behaviour identical for a single VMA as before. > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> > > Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx> > > Some nits: > > > --- > > mm/mremap.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++--- > > 1 file changed, 150 insertions(+), 7 deletions(-) > > > > diff --git a/mm/mremap.c b/mm/mremap.c > > index 8cb08ccea6ad..59f49de0f84e 100644 > > --- a/mm/mremap.c > > +++ b/mm/mremap.c > > @@ -69,6 +69,8 @@ struct vma_remap_struct { > > enum mremap_type remap_type; /* expand, shrink, etc. */ > > bool mmap_locked; /* Is mm currently write-locked? */ > > unsigned long charged; /* If VM_ACCOUNT, # pages to account. */ > > + bool seen_vma; /* Is >1 VMA being moved? */ > > Seems this could be local variable of remap_move(). Yes, this is because before there _was_ some external use, but after rework not any more. Will fix up in a fix-patch. > > > + bool vmi_needs_reset; /* Was the VMA iterator invalidated? */ > > }; > > > > static pud_t *get_old_pud(struct mm_struct *mm, unsigned long addr) > > @@ -1188,6 +1190,9 @@ static int copy_vma_and_data(struct vma_remap_struct *vrm, > > *new_vma_ptr = NULL; > > return -ENOMEM; > > } > > A newline here? I kinda thought it made sense to 'group' it with logic above, so this was on purpose. > > > + if (vma != vrm->vma) > > + vrm->vmi_needs_reset = true; > > A comment on what this condition means wouldn't hurt? Is it when "Source vma > may have been merged into new_vma" in copy_vma(), or when not? > Sure will add in a fix-patch.