On 6/5/25 5:54 PM, Sebastian Andrzej Siewior wrote:
> On 2025-06-05 15:44:23 [+0200], Petr Pavlu wrote:
>> Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
>> (pre-Git, [1])?
>
> Looking further back into the history, we have
>   21af2f0289dea ("[PATCH] per-cpu support inside modules (minimal)")
>
> which does
>
> +	if (pcpuindex) {
> +		/* We have a special allocation for this section. */
> +		mod->percpu = percpu_modalloc(sechdrs[pcpuindex].sh_size,
> +					      sechdrs[pcpuindex].sh_addralign);
> +		if (!mod->percpu) {
> +			err = -ENOMEM;
> +			goto free_mod;
> +		}
> +		sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
> +	}
>
> so this looks like the origin.

This patch added the initial per-cpu support for modules. The relocation
handling at that point appears correct to me. I think it's the mentioned
patch "Don't relocate non-allocated regions in modules" that broke it.

>
> …
>>> --- a/kernel/module/main.c
>>> +++ b/kernel/module/main.c
>>> @@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
>>>  	if (err)
>>>  		return ERR_PTR(err);
>>>
>>> +	/* Add SHF_ALLOC back so that relocations are applied. */
>>> +	if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
>>> +		info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
>>> +
>>>  	/* Module has been copied to its final place now: return it. */
>>>  	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
>>>  	kmemleak_load_module(mod, info);
>>
>> This looks like a valid fix. The info->sechdrs[info->index.pcpu].sh_addr
>> is set by rewrite_section_headers() to point to the percpu data in the
>> userspace-passed ELF copy. The section has SHF_ALLOC reset, so it
>> doesn't move and the sh_addr isn't adjusted by move_module(). The
>> function apply_relocations() then applies the relocations in the initial
>> ELF copy. Finally, post_relocation() copies the relocated percpu data to
>> their final per-CPU destinations.
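
To make the flag dependency concrete, here is a small userspace sketch of
the two ways a relocation filter could treat the percpu section. It is not
kernel code: struct toy_shdr and both toy_* helpers are hypothetical names
made up for illustration, and treating pcpu == 0 as "no percpu section" is
an assumption of the sketch. Only the SHF_ALLOC value comes from the ELF
specification.

```c
#include <stdbool.h>

/* SHF_ALLOC as defined by the ELF specification. */
#define SHF_ALLOC 0x2UL

/* Hypothetical, heavily reduced stand-in for an ELF section header. */
struct toy_shdr {
	unsigned long sh_flags;
};

/* Flag-only filter: a section is relocated only if SHF_ALLOC is set. */
static bool toy_relocate_by_flag(const struct toy_shdr *sechdrs,
				 unsigned int infosec)
{
	return (sechdrs[infosec].sh_flags & SHF_ALLOC) != 0;
}

/*
 * Filter with a percpu exception: allocated sections are relocated, and
 * so is the percpu section even when its SHF_ALLOC flag is cleared.
 * Assumption of this sketch: pcpu == 0 means there is no percpu section.
 */
static bool toy_relocate_with_pcpu(const struct toy_shdr *sechdrs,
				   unsigned int infosec, unsigned int pcpu)
{
	if (toy_relocate_by_flag(sechdrs, infosec))
		return true;
	return pcpu != 0 && infosec == pcpu;
}
```

With the flag-only filter, the percpu section is relocated only while
SHF_ALLOC is set, so the flag has to be restored before relocation. With
the index-based exception, the flag can stay cleared for the whole load.
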
>>
>> However, I'm not sure if it is best to manipulate the SHF_ALLOC flag in
>> this way. It is ok to reset it once, but if we need to set it back again
>> then I would reconsider this.
>
> I had the other way around but this flag is not considered anywhere
> else other than the functions called here. So I decided to add back what
> was taken once.
>
>> An alternative approach could be to teach apply_relocations() that the
>> percpu section is special and should be relocated even though it doesn't
>> have SHF_ALLOC set. This would also allow adding a comment explaining
>> that we're relocating the data in the original ELF copy, which I find
>> useful to mention as it is different to other relocation processing.
>
> Not sure if this makes it better. It looks like it continues a
> workaround…
> The only reason why it has been removed in the first place is to skip
> the copy process.

The SHF_ALLOC flag is also removed to prevent the section from being
allocated by layout_sections().

> We could also keep the flag and skip the section during the copy
> process based on its id. This was the original intention.
>
>> For instance:
>>
>> 	/*
>> 	 * Don't bother with non-allocated sections.
>> 	 *
>> 	 * An exception is the percpu section, which has separate allocations
>> 	 * for individual CPUs. We relocate the percpu section in the initial
>> 	 * ELF template and subsequently copy it to the per-CPU destinations.
>> 	 */
>> 	if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
>> 	    infosec != info->index.pcpu)
>> 		continue;
>>
>
> If you insist but…

It seems logical to me that the SHF_ALLOC flag is removed for the percpu
section since it isn't directly allocated by the regular process. This
is consistent with what the module loader does in other similar cases.
I could also understand keeping the flag and explicitly skipping the
layout and allocate process for the section.
However, adjusting the flag back and forth to trigger the right code
paths in between seems fragile to me and harder to maintain if we need
to shuffle things around in the future.

-- 
Cheers,
Petr