Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"), The f_op->mmap hook has been deprecated in favour of f_op->mmap_prepare. This was introduced in order to make it possible for us to eventually eliminate the f_op->mmap hook which is highly problematic as it allows drivers and filesystems raw access to a VMA which is not yet correctly initialised. This hook also introduces complexity for the memory mapping operation, as we must correctly unwind what we do should an error arises. Overall this interface being so open has caused significant problems for us, including security issues, it is important for us to simply eliminate this as a source of problems. Therefore this series continues what was established by extending the functionality further to permit more drivers and filesystems to use mmap_prepare. After updating some areas that can simply use mmap_prepare as-is, and performing some housekeeping, we then introduce two new hooks: f_op->mmap_complete - this is invoked at the point of the VMA having been correctly inserted, though with the VMA write lock still held. mmap_prepare must also be specified. This expands the use of mmap_prepare to those callers which need to prepopulate mappings, as well as any which does genuinely require access to the VMA. It's simple - we will let the caller access the VMA, but only once it's established. At this point unwinding issues is simple - we just unmap the VMA. The VMA is also then correctly initialised at this stage so there can be no issues arising from a not-fully initialised VMA at this point. The other newly added hook is: f_op->mmap_abort - this is only valid in conjunction with mmap_prepare and mmap_complete. This is called should an error arise between mmap_prepare and mmap_complete (not as a result of mmap_prepare but rather some other part of the mapping logic). This is required in case mmap_prepare wishes to establish state or locks which need to be cleaned up on completion. If we did not provide this, then this could not be permitted as this cleanup would otherwise not occur should the mapping fail between the two calls. We then add split remap_pfn_range*() functions which allow for PFN remap (a typical mapping prepopulation operation) split between a prepare/complete step, as well as io_mremap_pfn_range_prepare, complete for a similar purpose. >From there we update various mm-adjacent logic to use this functionality as a first set of changes, as well as resctl and cramfs filesystems to round off the non-stacked filesystem instances. REVIEWER NOTE: ~~~~~~~~~~~~~~ I considered putting the complete, abort callbacks in vm_ops, however this won't work because then we would be unable to adjust helpers like generic_file_mmap_prepare() (which provides vm_ops) to provide the correct complete, abort callbacks. Conceptually it also makes more sense to have these in f_op as they are one-off operations performed at mmap time to establish the VMA, rather than a property of the VMA itself. Lorenzo Stoakes (16): mm/shmem: update shmem to use mmap_prepare device/dax: update devdax to use mmap_prepare mm: add vma_desc_size(), vma_desc_pages() helpers relay: update relay to use mmap_prepare mm/vma: rename mmap internal functions to avoid confusion mm: introduce the f_op->mmap_complete, mmap_abort hooks doc: update porting, vfs documentation for mmap_[complete, abort] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete() mm: introduce io_remap_pfn_range_prepare, complete mm/hugetlb: update hugetlbfs to use mmap_prepare, mmap_complete mm: update mem char driver to use mmap_prepare, mmap_complete mm: update resctl to use mmap_prepare, mmap_complete, mmap_abort mm: update cramfs to use mmap_prepare, mmap_complete fs/proc: add proc_mmap_[prepare, complete] hooks for procfs fs/proc: update vmcore to use .proc_mmap_[prepare, complete] kcov: update kcov to use mmap_prepare, mmap_complete Documentation/filesystems/porting.rst | 9 ++ Documentation/filesystems/vfs.rst | 35 +++++++ arch/csky/include/asm/pgtable.h | 5 + arch/mips/alchemy/common/setup.c | 28 +++++- arch/mips/include/asm/pgtable.h | 10 ++ arch/s390/kernel/crash_dump.c | 6 +- arch/sparc/include/asm/pgtable_32.h | 29 +++++- arch/sparc/include/asm/pgtable_64.h | 29 +++++- drivers/char/mem.c | 80 ++++++++------- drivers/dax/device.c | 32 +++--- fs/cramfs/inode.c | 134 ++++++++++++++++++-------- fs/hugetlbfs/inode.c | 86 +++++++++-------- fs/ntfs3/file.c | 2 +- fs/proc/inode.c | 13 ++- fs/proc/vmcore.c | 53 +++++++--- fs/resctrl/pseudo_lock.c | 56 ++++++++--- include/linux/fs.h | 4 + include/linux/mm.h | 53 +++++++++- include/linux/mm_types.h | 5 + include/linux/proc_fs.h | 5 + include/linux/shmem_fs.h | 3 +- include/linux/vmalloc.h | 10 +- kernel/kcov.c | 40 +++++--- kernel/relay.c | 32 +++--- mm/memory.c | 128 +++++++++++++++--------- mm/secretmem.c | 2 +- mm/shmem.c | 49 +++++++--- mm/util.c | 18 +++- mm/vma.c | 96 +++++++++++++++--- mm/vmalloc.c | 16 ++- tools/testing/vma/vma_internal.h | 31 +++++- 31 files changed, 810 insertions(+), 289 deletions(-) -- 2.51.0