On Wed, Jun 04, 2025 at 07:03:08PM +0100, Lorenzo Stoakes wrote: > diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst > index e6756e78b476..be49e2a269e4 100644 > --- a/Documentation/mm/process_addrs.rst > +++ b/Documentation/mm/process_addrs.rst > @@ -303,7 +303,9 @@ There are four key operations typically performed on page tables: > 1. **Traversing** page tables - Simply reading page tables in order to traverse > them. This only requires that the VMA is kept stable, so a lock which > establishes this suffices for traversal (there are also lockless variants > - which eliminate even this requirement, such as :c:func:`!gup_fast`). > + which eliminate even this requirement, such as :c:func:`!gup_fast`). There is > + also a special case of page table traversal for non-VMA regions which we > + consider separately below. > 2. **Installing** page table mappings - Whether creating a new mapping or > modifying an existing one in such a way as to change its identity. This > requires that the VMA is kept stable via an mmap or VMA lock (explicitly not > @@ -335,15 +337,13 @@ ahead and perform these operations on page tables (though internally, kernel > operations that perform writes also acquire internal page table locks to > serialise - see the page table implementation detail section for more details). > > +.. note:: We free empty PTE tables on zap under the RCU lock - this does not > + change the aforementioned locking requirements around zapping. > + > When **installing** page table entries, the mmap or VMA lock must be held to > keep the VMA stable. We explore why this is in the page table locking details > section below. > > -.. warning:: Page tables are normally only traversed in regions covered by VMAs. > - If you want to traverse page tables in areas that might not be > - covered by VMAs, heavier locking is required. > - See :c:func:`!walk_page_range_novma` for details. > - > **Freeing** page tables is an entirely internal memory management operation and > has special requirements (see the page freeing section below for more details). > > @@ -355,6 +355,44 @@ has special requirements (see the page freeing section below for more details). > from the reverse mappings, but no other VMAs can be permitted to be > accessible and span the specified range. > > +Traversing non-VMA page tables > +------------------------------ > + > +We've focused above on traversal of page tables belonging to VMAs. It is also > +possible to traverse page tables which are not represented by VMAs. > + > +Kernel page table mappings themselves are generally managed but whatever part of > +the kernel established them and the aforementioned locking rules do not apply - > +for instance vmalloc has its own set of locks which are utilised for > +establishing and tearing down page its page tables. > + > +However, for convenience we provide the :c:func:`!walk_kernel_page_table_range` > +function which is synchronised via the mmap lock on the :c:macro:`!init_mm` > +kernel instantiation of the :c:struct:`!struct mm_struct` metadata object. > + > +If an operation requires exclusive access, a write lock is used, but if not, a > +read lock suffices - we assert only that at least a read lock has been acquired. > + > +Since, aside from vmalloc and memory hot plug, kernel page tables are not torn > +down all that often - this usually suffices, however any caller of this > +functionality must ensure that any additionally required locks are acquired in > +advance. > + > +We also permit a truly unusual case is the traversal of non-VMA ranges in > +**userland** ranges, as provided for by :c:func:`!walk_page_range_debug`. > + > +This has only one user - the general page table dumping logic (implemented in > +:c:macro:`!mm/ptdump.c`) - which seeks to expose all mappings for debug purposes > +even if they are highly unusual (possibly architecture-specific) and are not > +backed by a VMA. > + > +We must take great care in this case, as the :c:func:`!munmap` implementation > +detaches VMAs under an mmap write lock before tearing down page tables under a > +downgraded mmap read lock. > + > +This means such an operation could race with this, and thus an mmap **write** > +lock is required. > + > Lock ordering > ------------- > > @@ -461,6 +499,10 @@ Locking Implementation Details > Page table locking details > -------------------------- > > +.. note:: This section explores page table locking requirements for page tables > + encompassed by a VMA. See the above section on non-VMA page table > + traversal for details on how we handle that case. > + > In addition to the locks described in the terminology section above, we have > additional locks dedicated to page tables: > The wording looks good, thanks! Reviewed-by: Bagas Sanjaya <bagasdotme@xxxxxxxxx> -- An old man doll... just what I always wanted! - Clara
Attachment:
signature.asc
Description: PGP signature