Re: [PATCH 2/4] mm: perform VMA allocation, freeing, duplication in mm

On Fri, Apr 25, 2025 at 8:32 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Fri, Apr 25, 2025 at 6:55 AM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> >
> > * Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> [250425 06:40]:
> > > On Thu, Apr 24, 2025 at 08:15:26PM -0700, Kees Cook wrote:
> > > >
> > > >
> > > > On April 24, 2025 2:15:27 PM PDT, Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote:
> > > > >+static void vm_area_init_from(const struct vm_area_struct *src,
> > > > >+                        struct vm_area_struct *dest)
> > > > >+{
> > > > >+  dest->vm_mm = src->vm_mm;
> > > > >+  dest->vm_ops = src->vm_ops;
> > > > >+  dest->vm_start = src->vm_start;
> > > > >+  dest->vm_end = src->vm_end;
> > > > >+  dest->anon_vma = src->anon_vma;
> > > > >+  dest->vm_pgoff = src->vm_pgoff;
> > > > >+  dest->vm_file = src->vm_file;
> > > > >+  dest->vm_private_data = src->vm_private_data;
> > > > >+  vm_flags_init(dest, src->vm_flags);
> > > > >+  memcpy(&dest->vm_page_prot, &src->vm_page_prot,
> > > > >+         sizeof(dest->vm_page_prot));
> > > > >+  /*
> > > > >+   * src->shared.rb may be modified concurrently when called from
> > > > >+   * dup_mmap(), but the clone will reinitialize it.
> > > > >+   */
> > > > >+  data_race(memcpy(&dest->shared, &src->shared, sizeof(dest->shared)));
> > > > >+  memcpy(&dest->vm_userfaultfd_ctx, &src->vm_userfaultfd_ctx,
> > > > >+         sizeof(dest->vm_userfaultfd_ctx));
> > > > >+#ifdef CONFIG_ANON_VMA_NAME
> > > > >+  dest->anon_name = src->anon_name;
> > > > >+#endif
> > > > >+#ifdef CONFIG_SWAP
> > > > >+  memcpy(&dest->swap_readahead_info, &src->swap_readahead_info,
> > > > >+         sizeof(dest->swap_readahead_info));
> > > > >+#endif
> > > > >+#ifdef CONFIG_NUMA
> > > > >+  dest->vm_policy = src->vm_policy;
> > > > >+#endif
> > > > >+}
> > > >
> > > > I know you're doing a big cut/paste here, but why in the world is this function written this way? Why not just:
> > > >
> > > > *dest = *src;
> > > >
> > > > And then do any one-off cleanups?
> > >
> > > Yup I find it odd, and error prone to be honest. We'll end up with uninitialised
> > > state for some fields if we miss them here, seems unwise...
> > >
> > > Presumably for performance?
> > >
> > > This is, as you say, me simply propagating what exists, but I do wonder.
> >
> > Two things come to mind:
> >
> > 1. How ctors are done.  (v3 of Suren's RCU safe patch series, willy made
> > a comment.. I think)
> >
> > 2. Some race that Vlastimil came up with involving the copy and the RCU safeness.
> > IIRC it had to do with the ordering of the setting of things?
> >
> > Also, looking at it again...
> >
> > How is it safe to do dest->anon_name = src->anon_name?  Isn't that ref
> > counted?
>
> dest->anon_name = src->anon_name is fine here because right after
> vm_area_init_from() we call dup_anon_vma_name() which will bump up the
> refcount. I don't recall why this is done this way but now looking at
> it I wonder if I could call dup_anon_vma_name() directly instead of
> this assignment. Might be just an overlooked legacy from the time we
> memcpy'd the entire structure. I'll need to double-check.
>
> >
> > Pretty sure it's okay, but Suren would know for sure on all of this.
> >
> > Suren, maybe you could send a patch with comments on this stuff?
>
> Yeah, I think I need to add some comments in this code for
> clarification. We do not copy the entire vm_area_struct because we
> have to preserve vma->vm_refcnt field of the dest vma. Since these
> structures are allocated from a cache with SLAB_TYPESAFE_BY_RCU,
> another thread might be concurrently checking the state of the dest
> object by reading dest->vm_refcnt. Therefore it's important here not
> to overwrite vm_refcnt. The changelog in
> https://lore.kernel.org/all/20250213224655.1680278-18-surenb@xxxxxxxxxx/
> touches on it but a comment in the code would indeed be helpful. Will
> add it but will wait for Lorenzo's refactoring to land into linux-mm

s/linux-mm/mm-unstable/. I need my morning coffee.

> first to avoid adding merge conflicts.
>
> >
> > Thanks,
> > Liam
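
To make the vm_refcnt point concrete, here is a minimal user-space
sketch of a field-by-field copy that leaves the destination's refcount
alone. It is an assumption-laden illustration, not the kernel code:
struct demo_vma, demo_vma_init_from() and demo_refcnt_after_copy() are
hypothetical names, and C11 atomics stand in for the kernel's refcount
type.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for vm_area_struct; not the kernel definition. */
struct demo_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
	atomic_int refcnt;	/* other threads may read this at any time */
};

/*
 * Copy every field except refcnt.  A whole-struct assignment
 * (*dest = *src) would also overwrite dest->refcnt.
 */
static void demo_vma_init_from(const struct demo_vma *src,
			       struct demo_vma *dest)
{
	assert(src != NULL && dest != NULL);
	dest->start = src->start;
	dest->end = src->end;
	dest->flags = src->flags;
	/* dest->refcnt is deliberately left untouched. */
}

/* Copy src into a dest whose refcnt is already 3 and return dest's refcnt. */
static int demo_refcnt_after_copy(void)
{
	struct demo_vma src = { .start = 0x1000, .end = 0x2000, .flags = 0x3 };
	struct demo_vma dest = { 0 };

	atomic_init(&src.refcnt, 1);
	atomic_init(&dest.refcnt, 3);	/* state a concurrent reader may inspect */
	demo_vma_init_from(&src, &dest);
	return atomic_load(&dest.refcnt);
}
```

The cost of this approach is the one raised earlier in the thread: any
field added to the struct later must also be added to the copy
function, or it stays uninitialised in the destination.
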




