On Fri, May 02, 2025 at 01:59:49PM +0100, Lorenzo Stoakes wrote: > On Fri, May 02, 2025 at 02:20:38PM +0200, Jan Kara wrote: > > On Thu 01-05-25 18:25:26, Lorenzo Stoakes wrote: > > > During the mmap() of a file-backed mapping, we invoke the underlying driver > > > file's mmap() callback in order to perform driver/file system > > > initialisation of the underlying VMA. > > > > > > This has been a source of issues in the past, including a significant > > > security concern relating to unwinding of error state discovered by Jann > > > Horn, as fixed in commit 5de195060b2e ("mm: resolve faulty mmap_region() > > > error path behaviour") which performed the recent, significant, rework of > > > mmap() as a whole. > > > > > > However, we have had a fly in the ointment remain - drivers have a great > > > deal of freedom in the .mmap() hook to manipulate VMA state (as well as > > > page table state). > > > > > > This can be problematic, as we can no longer reason sensibly about VMA > > > state once the call is complete (the ability to do - anything - here does > > > rather interfere with that). > > > > > > In addition, callers may choose to do odd or unusual things which might > > > interfere with subsequent steps in the mmap() process, and it may do so and > > > then raise an error, requiring very careful unwinding of state about which > > > we can make no assumptions. > > > > > > Rather than providing such an open-ended interface, this series provides an > > > alternative, far more restrictive one - we expose a whitelist of fields > > > which can be adjusted by the driver, along with immutable state upon which > > > the driver can make such decisions: > > > > > > struct vm_area_desc { > > > /* Immutable state. */ > > > struct mm_struct *mm; > > > unsigned long start; > > > unsigned long end; > > > > > > /* Mutable fields. Populated with initial state. */ > > > pgoff_t pgoff; > > > struct file *file; > > > vm_flags_t vm_flags; > > > pgprot_t page_prot; > > > > > > /* Write-only fields. */ > > > const struct vm_operations_struct *vm_ops; > > > void *private_data; > > > }; > > > > > > The mmap logic then updates the state used to either merge with a VMA or > > > establish a new VMA based upon this logic. > > > > > > This is achieved via new file hook .mmap_prepare(), which is, importantly, > > > invoked very early on in the mmap() process. > > > > > > If an error arises, we can very simply abort the operation with very little > > > unwinding of state required. > > > > Looks sensible. So is there a plan to transform existing .mmap hooks to > > .mmap_prepare hooks? I agree that for most filesystems this should be just > > easy 1:1 replacement and AFAIU this would be prefered? > > Thanks! > > Yeah the intent is to convert _all_ callers away from .mmap() so we can > lock down what drivers are doing and be able to (relatively) safely make > assumptions about what's going on in mmap logic. > > As David points out, we may need to add new callbacks to account for other The plural is a little worrying, let's please aim minimize the number of new methods we need for this.