On 8/27/25 23:06, David Hildenbrand wrote:
> On 27.08.25 16:08, Eugen Hristev wrote:
>>
>>
>> On 8/27/25 15:18, David Hildenbrand wrote:
>>> On 27.08.25 13:59, Eugen Hristev wrote:
>>>>
>>>>
>>>> On 8/25/25 16:58, David Hildenbrand wrote:
>>>>> On 25.08.25 15:36, Eugen Hristev wrote:
>>>>>>
>>>>>>
>>>>>> On 8/25/25 16:20, David Hildenbrand wrote:
>>>>>>>
>>>>>>>>>
>>>>>>>>> IIRC, kernel/vmcore_info.c is never built as a module, as it
>>>>>>>>> also accesses non-exported symbols.
>>>>>>>>
>>>>>>>> Hello David,
>>>>>>>>
>>>>>>>> I am looking again into this, and there are some things which in
>>>>>>>> my opinion would be difficult to achieve.
>>>>>>>> For example, I looked into my patch #11, which adds `runqueues`
>>>>>>>> to kmemdump.
>>>>>>>>
>>>>>>>> runqueues is a variable of type `struct rq`, which is defined in
>>>>>>>> kernel/sched/sched.h, a header that is not supposed to be
>>>>>>>> included outside of sched.
>>>>>>>> Moving the whole struct definition out of sched.h into another,
>>>>>>>> public header would be rather painful, and I don't think it is a
>>>>>>>> really good option (the struct would be needed to compute the
>>>>>>>> sizeof inside vmcoreinfo). Secondly, it would also imply moving
>>>>>>>> all the nested struct definitions out as well. I doubt this is
>>>>>>>> something that we want for the sched subsystem. As I understand
>>>>>>>> it, the subsystem is deliberately designed to keep these
>>>>>>>> internal structs opaque to the outside.
>>>>>>>
>>>>>>> All the kmemdump module needs is a start and a length, correct?
>>>>>>> So the only tricky part is getting the length.
>>>>>>
>>>>>> I also have the kernel user case in mind. How would a kernel
>>>>>> programmer want to add some kernel structs/info/buffers into
>>>>>> kmemdump such that the dump would contain their data? Having
>>>>>> "KMEMDUMP_VAR(...)" looks simple enough.
>>>>>
>>>>> The other way around: why should anybody have a say in adding
>>>>> their data to kmemdump? Why do we have that all over the kernel?
>>>>>
>>>>> Is your mechanism really so special?
>>>>>
>>>>> A single composer should take care of that, and it's really just
>>>>> start + len of physical memory areas.
>>>>>
>>>>>> Otherwise maybe the programmer has to write helpers to compute
>>>>>> lengths etc., and stitch them into the kmemdump core.
>>>>>> I am not saying it's impossible, just tiresome perhaps.
>>>>>
>>>>> In your patch set, how many of these instances did you encounter
>>>>> where that was a problem?
>>>>>
>>>>>>>
>>>>>>> One could just add a const variable that holds this information,
>>>>>>> or even better, a simple helper function to calculate that.
>>>>>>>
>>>>>>> Maybe someone else reading along has a better idea.
>>>>>>
>>>>>> This could work, but it again requires adding some code into the
>>>>>> specific subsystem, e.g. struct_rq_get_size().
>>>>>> I am open to ideas, and thank you very much for your thoughts.
>>>>>>
>>>>>>>
>>>>>>> Interestingly, runqueues is a percpu variable, which makes me
>>>>>>> wonder if what you had would work as intended (maybe it does,
>>>>>>> not sure).
>>>>>>
>>>>>> I would not really need to dump the runqueues, but the crash tool
>>>>>> which I am using for testing requires it. Without the runqueues
>>>>>> it will not progress further to load the kernel dump.
>>>>>> So I am not really sure what it does with the runqueues, but it
>>>>>> works. Perhaps using crash/gdb more, to actually do something
>>>>>> with this data, would give more insight about its utility.
>>>>>> For me, it is a prerequisite to run crash, and then to be able to
>>>>>> extract the log buffer from the dump.
>>>>>
>>>>> I have the faint recollection that percpu vars might not be stored
>>>>> in a single contiguous physical memory area, but maybe my memory
>>>>> is just wrong; that's why I was raising it.
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> From my perspective it's much simpler and cleaner to just add
>>>>>>>> the kmemdump annotation macro inside sched/core.c, as is done
>>>>>>>> in my patch. This macro translates to a no-op if kmemdump is
>>>>>>>> not selected.
>>>>>>>
>>>>>>> I really don't like how we are spreading kmemdump all over the
>>>>>>> kernel, and adding complexity with __section when really, all we
>>>>>>> need is a place to obtain a start and a length.
>>>>>>>
>>>>>>
>>>>>> I understand. The section idea was suggested by Thomas. Initially
>>>>>> I was skeptical, but I like how it turned out.
>>>>>
>>>>> Yeah, I don't like it. Taste differs ;)
>>>>>
>>>>> I am in particular unhappy about custom memblock wrappers.
>>>>>
>>>>> [...]
>>>>>
>>>>>>>>
>>>>>>>> To have this working outside of printk, it would be required to
>>>>>>>> walk through all the printk structs/allocations and select the
>>>>>>>> required info. Is this something that we want to do outside of
>>>>>>>> printk?
>>>>>>>
>>>>>>> I don't follow, please elaborate.
>>>>>>>
>>>>>>> How is e.g. log_buf_len_get() + log_buf_addr_get() not
>>>>>>> sufficient, given that you run your initialization after
>>>>>>> setup_log_buf()?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> My initial thought was the same. However, I got some feedback
>>>>>> from Petr Mladek here:
>>>>>>
>>>>>> https://lore.kernel.org/lkml/aBm5QH2p6p9Wxe_M@localhost.localdomain/
>>>>>>
>>>>>> where he explained how to register the structs correctly.
>>>>>> It may be that setup_log_buf() is called again at a later time.
>>>>>>
>>>>>
>>>>> setup_log_buf() is an __init function, so there is only a certain
>>>>> time frame where it can be called.
>>>>>
>>>>> In particular, once the buddy allocator is up, memblock
>>>>> allocations are impossible and it would be deeply flawed to call
>>>>> this function again.
>>>>>
>>>>> Let's not over-engineer this.
>>>>>
>>>>> Petr is on CC, so hopefully he can share his thoughts.
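>>>>>
>>>>> To spell out what I mean (an untested sketch; kmemdump_register_id()
>>>>> is from your series, and KMEMDUMP_ID_COREIMAGE_LOG_BUF is a made-up
>>>>> id following its naming), any initcall that runs after
>>>>> setup_log_buf() could simply do:
>>>>>
>>>>> 	/* log_buf_addr_get()/log_buf_len_get() come from <linux/printk.h> */
>>>>> 	static int __init kmemdump_register_log_buf(void)
>>>>> 	{
>>>>> 		/* by late_initcall time, setup_log_buf() has long finished */
>>>>> 		kmemdump_register_id(KMEMDUMP_ID_COREIMAGE_LOG_BUF,
>>>>> 				     log_buf_addr_get(), log_buf_len_get());
>>>>> 		return 0;
>>>>> 	}
>>>>> 	late_initcall(kmemdump_register_log_buf);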
>>>>>
>>>>
>>>> Hello David,
>>>>
>>>> I tested out this snippet (on top of my series, so you can see what
>>>> I changed):
>>>>
>>>>
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index 18ba6c1e174f..7ac4248a00e5 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -67,7 +67,6 @@
>>>>  #include <linux/wait_api.h>
>>>>  #include <linux/workqueue_api.h>
>>>>  #include <linux/livepatch_sched.h>
>>>> -#include <linux/kmemdump.h>
>>>>
>>>>  #ifdef CONFIG_PREEMPT_DYNAMIC
>>>>  # ifdef CONFIG_GENERIC_IRQ_ENTRY
>>>> @@ -120,7 +119,12 @@
>>>>  EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
>>>>  EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);
>>>>
>>>>  DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
>>>> -KMEMDUMP_VAR_CORE(runqueues, sizeof(runqueues));
>>>> +
>>>> +size_t runqueues_get_size(void);
>>>> +size_t runqueues_get_size(void)
>>>> +{
>>>> +	return sizeof(runqueues);
>>>> +}
>>>>
>>>>  #ifdef CONFIG_SCHED_PROXY_EXEC
>>>>  DEFINE_STATIC_KEY_TRUE(__sched_proxy_exec);
>>>> diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
>>>> index d808c5e67f35..c6dd2d6e96dd 100644
>>>> --- a/kernel/vmcore_info.c
>>>> +++ b/kernel/vmcore_info.c
>>>> @@ -24,6 +24,12 @@
>>>>  #include "kallsyms_internal.h"
>>>>  #include "kexec_internal.h"
>>>>
>>>> +typedef void* kmemdump_opaque_t;
>>>> +
>>>> +size_t runqueues_get_size(void);
>>>> +
>>>> +extern kmemdump_opaque_t runqueues;
>>>
>>> I would have tried that through:
>>>
>>> struct rq;
>>> extern struct rq runqueues;
>>>
>>> But the whole PER_CPU_SHARED_ALIGNED makes this all weird, and
>>> likely not the way we would want to handle that.
>>>
>>>>  /* vmcoreinfo stuff */
>>>>  unsigned char *vmcoreinfo_data;
>>>>  size_t vmcoreinfo_size;
>>>> @@ -230,6 +236,9 @@ static int __init crash_save_vmcoreinfo_init(void)
>>>>
>>>>  	kmemdump_register_id(KMEMDUMP_ID_COREIMAGE_VMCOREINFO,
>>>>  			     (void *)vmcoreinfo_data, vmcoreinfo_size);
>>>> +	kmemdump_register_id(KMEMDUMP_ID_COREIMAGE_runqueues,
>>>> +			     (void *)&runqueues, runqueues_get_size());
>>>> +
>>>>  	return 0;
>>>>  }
>>>>
>>>> With this there is no more .section and no kmemdump code in sched;
>>>> however, there are a few things:
>>>
>>> I would really just do here something like the following:
>>>
>>> /**
>>>  * sched_get_runqueues_area - obtain the runqueues area for dumping
>>>  * @start: ...
>>>  * @size: ...
>>>  *
>>>  * The obtained area is only to be used for dumping purposes.
>>>  */
>>> void sched_get_runqueues_area(void **start, size_t *size)
>>> {
>>> 	*start = &runqueues;
>>> 	*size = sizeof(runqueues);
>>> }
>>>
>>> might be cleaner.
>>>
>>
>> How about this in the header:
>>
>> #define DECLARE_DUMP_AREA_FUNC(subsys, symbol) \
>> void subsys ## _get_ ## symbol ## _area(void **start, size_t *size);
>>
>> #define DEFINE_DUMP_AREA_FUNC(subsys, symbol) \
>> void subsys ## _get_ ## symbol ## _area(void **start, size_t *size)\
>> {\
>> 	*start = &symbol;\
>> 	*size = sizeof(symbol);\
>> }
>>
>> then, in sched, just:
>>
>> DECLARE_DUMP_AREA_FUNC(sched, runqueues);
>> DEFINE_DUMP_AREA_FUNC(sched, runqueues);
>>
>> or a single macro that wraps both.
>>
>> That would make it shorter and neater.
>>
>> What do you think?
>
> Looks a bit over-engineered, and will require us to import a header
> (likely kmemdump.h) in these files, which I don't really enjoy.
>
> I would start simple, without any such macro-magic. It's a very
> simple function after all, and likely you won't end up having many of
> these?
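>
> FWIW, the consumer side would then be just (an untested sketch,
> reusing kmemdump_register_id() and the KMEMDUMP_ID_* naming from your
> series):
>
> 	void *start;
> 	size_t size;
>
> 	/* ask sched for the area; only start + len cross the boundary */
> 	sched_get_runqueues_area(&start, &size);
> 	kmemdump_register_id(KMEMDUMP_ID_COREIMAGE_runqueues, start, size);
>
> with only the sched_get_runqueues_area() prototype exposed in a
> header, wherever it fits best.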
>

Thanks David, I will do it as you suggested and see what comes out of
it.

I have one side question which you may be better placed to answer: as
we have a start and a size for each region, the start is a virtual
address. The firmware/coprocessor that reads the memory and dumps it
requires physical addresses. What do you suggest using to retrieve
that address? virt_to_phys() might be problematic; __pa() or
__pa_symbol()? Or better, lm_alias()? kmemdump is agnostic of which
region of memory `start` comes from, and it should stay portable and
platform independent.

Thanks again,
Eugen
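P.S. For context, here is roughly the zoo of conversions I am weighing
(an illustrative fragment only, not code from the series; `start` is a
registered region's virtual address and `cpu` whichever CPU's copy we
want):

	void *start;
	phys_addr_t phys;
	int cpu;

	/* memory in the linear map (memblock/page allocator, log_buf): */
	phys = virt_to_phys(start);		/* equivalent to __pa(start) */

	/*
	 * kernel image symbols; with CONFIG_DEBUG_VIRTUAL enabled,
	 * virt_to_phys()/__pa() can warn for these on some architectures:
	 */
	phys = __pa_symbol(start);		/* i.e. __pa(lm_alias(start)) */

	/*
	 * percpu variables: each CPU's copy may live in a separate,
	 * non-contiguous chunk, as you pointed out for runqueues:
	 */
	phys = per_cpu_ptr_to_phys(per_cpu_ptr(&runqueues, cpu));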