On Fri, Apr 25, 2025 at 6:55 AM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
>
> * Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> [250425 06:40]:
> > On Thu, Apr 24, 2025 at 08:15:26PM -0700, Kees Cook wrote:
> > >
> > >
> > > On April 24, 2025 2:15:27 PM PDT, Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote:
> > > >+static void vm_area_init_from(const struct vm_area_struct *src,
> > > >+			      struct vm_area_struct *dest)
> > > >+{
> > > >+	dest->vm_mm = src->vm_mm;
> > > >+	dest->vm_ops = src->vm_ops;
> > > >+	dest->vm_start = src->vm_start;
> > > >+	dest->vm_end = src->vm_end;
> > > >+	dest->anon_vma = src->anon_vma;
> > > >+	dest->vm_pgoff = src->vm_pgoff;
> > > >+	dest->vm_file = src->vm_file;
> > > >+	dest->vm_private_data = src->vm_private_data;
> > > >+	vm_flags_init(dest, src->vm_flags);
> > > >+	memcpy(&dest->vm_page_prot, &src->vm_page_prot,
> > > >+	       sizeof(dest->vm_page_prot));
> > > >+	/*
> > > >+	 * src->shared.rb may be modified concurrently when called from
> > > >+	 * dup_mmap(), but the clone will reinitialize it.
> > > >+	 */
> > > >+	data_race(memcpy(&dest->shared, &src->shared, sizeof(dest->shared)));
> > > >+	memcpy(&dest->vm_userfaultfd_ctx, &src->vm_userfaultfd_ctx,
> > > >+	       sizeof(dest->vm_userfaultfd_ctx));
> > > >+#ifdef CONFIG_ANON_VMA_NAME
> > > >+	dest->anon_name = src->anon_name;
> > > >+#endif
> > > >+#ifdef CONFIG_SWAP
> > > >+	memcpy(&dest->swap_readahead_info, &src->swap_readahead_info,
> > > >+	       sizeof(dest->swap_readahead_info));
> > > >+#endif
> > > >+#ifdef CONFIG_NUMA
> > > >+	dest->vm_policy = src->vm_policy;
> > > >+#endif
> > > >+}
> > >
> > > I know you're doing a big cut/paste here, but why in the world is this function written this way? Why not just:
> > >
> > > *dest = *src;
> > >
> > > And then do any one-off cleanups?
> >
> > Yup I find it odd, and error prone to be honest. We'll end up with uninitialised
> > state for some fields if we miss them here, seems unwise...
> >
> > Presumably for performance?
> >
> > This is, as you say, me simply propagating what exists, but I do wonder.
>
> Two things come to mind:
>
> 1. How ctors are done. (v3 of Suren's RCU safe patch series, willy made
> a comment.. I think)
>
> 2. Some race that Vlastimil came up with the copy and the RCU safeness.
> IIRC it had to do with the ordering of the setting of things?
>
> Also, looking at it again...
>
> How is it safe to do dest->anon_name = src->anon_name? Isn't that ref
> counted?

dest->anon_name = src->anon_name is fine here because right after
vm_area_init_from() we call dup_anon_vma_name(), which bumps the
refcount. I don't recall why it is done this way, but looking at it now I
wonder if I could call dup_anon_vma_name() directly instead of doing this
assignment. It might just be an overlooked legacy from the time we
memcpy'd the entire structure. I'll need to double-check.

>
> Pretty sure it's okay, but Suren would know for sure on all of this.
>
> Suren, maybe you could send a patch with comments on this stuff?

Yeah, I think I need to add some comments in this code for
clarification. We do not copy the entire vm_area_struct because we have
to preserve the vma->vm_refcnt field of the dest vma. Since these
structures are allocated from a cache with SLAB_TYPESAFE_BY_RCU, another
thread might be concurrently checking the state of the dest object by
reading dest->vm_refcnt. Therefore it's important not to overwrite
vm_refcnt here. The changelog in
https://lore.kernel.org/all/20250213224655.1680278-18-surenb@xxxxxxxxxx/
touches on it, but a comment in the code would indeed be helpful. I will
add it, but will wait for Lorenzo's refactoring to land in linux-mm first
to avoid creating merge conflicts.

>
> Thanks,
> Liam
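
To illustrate the vm_refcnt point above, here is a rough userspace
sketch, not kernel code. struct demo_vma, demo_vma_init_from() and
demo_check() are made-up names, and a plain int stands in for the
kernel's refcount type. It does not model concurrency or
SLAB_TYPESAFE_BY_RCU at all; it only shows that a field-by-field copy
leaves the destination's refcount alone while *dest = *src overwrites it:

```c
#include <assert.h>

/* Hypothetical stand-in for vm_area_struct; not the real kernel layout. */
struct demo_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	int vm_refcnt;	/* plain int here, only for illustration */
};

/*
 * Copy every field except vm_refcnt, in the spirit of
 * vm_area_init_from(). The destination keeps its own refcount.
 */
static void demo_vma_init_from(const struct demo_vma *src,
			       struct demo_vma *dest)
{
	dest->vm_start = src->vm_start;
	dest->vm_end = src->vm_end;
	dest->vm_flags = src->vm_flags;
	/* dest->vm_refcnt is deliberately left untouched. */
}

/* Compare the two copy styles on destinations that start with refcnt 1. */
static void demo_check(void)
{
	struct demo_vma src = {
		.vm_start = 0x1000, .vm_end = 0x2000,
		.vm_flags = 0x3, .vm_refcnt = 5,
	};
	struct demo_vma by_field = { .vm_refcnt = 1 };
	struct demo_vma by_struct = { .vm_refcnt = 1 };

	demo_vma_init_from(&src, &by_field);
	by_struct = src;	/* the "*dest = *src" variant */

	assert(by_field.vm_start == 0x1000);
	assert(by_field.vm_refcnt == 1);	/* preserved */
	assert(by_struct.vm_refcnt == 5);	/* clobbered by src's value */
}
```

In the real code the concern is a concurrent reader looking at
dest->vm_refcnt, which this single-threaded sketch cannot show.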