On 8/25/25 16:58, David Hildenbrand wrote:
> On 25.08.25 15:36, Eugen Hristev wrote:
>>
>>
>> On 8/25/25 16:20, David Hildenbrand wrote:
>>>
>>>>>
>>>>> IIRC, kernel/vmcore_info.c is never built as a module, as it also
>>>>> accesses non-exported symbols.
>>>>
>>>> Hello David,
>>>>
>>>> I am looking again into this, and there are some things which in my
>>>> opinion would be difficult to achieve.
>>>> For example I looked into my patch #11 , which adds the `runqueues` into
>>>> kmemdump.
>>>>
>>>> The runqueues is a variable of `struct rq` which is defined in
>>>> kernel/sched/sched.h , which is not supposed to be included outside of
>>>> sched.
>>>> Now moving all the struct definition outside of sched.h into another
>>>> public header would be rather painful and I don't think it's a really
>>>> good option (The struct would be needed to compute the sizeof inside
>>>> vmcoreinfo). Secondly, it would also imply moving all the nested struct
>>>> definitions outside as well. I doubt this is something that we want for
>>>> the sched subsys. How the subsys is designed, out of my understanding,
>>>> is to keep these internal structs opaque outside of it.
>>>
>>> All the kmemdump module needs is a start and a length, correct? So the
>>> only tricky part is getting the length.
>>
>> I also have in mind the kernel user case. How would a kernel programmer
>> want to add some kernel structs/info/buffers into kmemdump such that the
>> dump would contain their data ? Having "KMEMDUMP_VAR(...)" looks simple
>> enough.
>
> The other way around, why should anybody have a saying in adding their
> data to kmemdump? Why do we have that all over the kernel?
>
> Is your mechanism really so special?
>
> A single composer should take care of that, and it's really just start +
> len of physical memory areas.
>
>> Otherwise maybe the programmer has to write helpers to compute lengths
>> etc, and stitch them into kmemdump core.
>> I am not saying it's impossible, but just tiresome perhaps.
>
> In your patch set, how many of these instances did you encounter where
> that was a problem?
>
>>>
>>> One could just add a const variable that holds this information, or even
>>> better, a simple helper function to calculate that.
>>>
>>> Maybe someone else reading along has a better idea.
>>
>> This could work, but it requires again adding some code into the
>> specific subsystem. E.g. struct_rq_get_size()
>> I am open to ideas , and thank you very much for your thoughts.
>>
>>>
>>> Interestingly, runqueues is a percpu variable, which makes me wonder if
>>> what you had would work as intended (maybe it does, not sure).
>>
>> I would not really need to dump the runqueues. But the crash tool which
>> I am using for testing, requires it. Without the runqueues it will not
>> progress further to load the kernel dump.
>> So I am not really sure what it does with the runqueues, but it works.
>> Perhaps using crash/gdb more, to actually do something with this data,
>> would give more insight about its utility.
>> For me, it is a prerequisite to run crash, and then to be able to
>> extract the log buffer from the dump.
>
> I have the faint recollection that percpu vars might not be stored in a
> single contiguous physical memory area, but maybe my memory is just
> wrong, that's why I was raising it.
>
>>
>>>
>>>>
>>>> From my perspective it's much simpler and cleaner to just add the
>>>> kmemdump annotation macro inside the sched/core.c as it's done in my
>>>> patch. This macro translates to a noop if kmemdump is not selected.
>>>
>>> I really don't like how we are spreading kmemdump all over the kernel,
>>> and adding complexity with __section when really, all we need is a place
>>> to obtain a start and a length.
>>>
>>
>> I understand. The section idea was suggested by Thomas. Initially I was
>> skeptic, but I like how it turned out.
>
> Yeah, I don't like it. Taste differs ;)
>
> I am in particular unhappy about custom memblock wrappers.
>
> [...]
>
>>>>
>>>> To have this working outside of printk, it would be required to walk
>>>> through all the printk structs/allocations and select the required info.
>>>> Is this something that we want to do outside of printk ?
>>>
>>> I don't follow, please elaborate.
>>>
>>> How is e.g., log_buf_len_get() + log_buf_addr_get() not sufficient,
>>> given that you run your initialization after setup_log_buf() ?
>>>
>>>
>>
>> My initial thought was the same. However I got some feedback from Petr
>> Mladek here :
>>
>> https://lore.kernel.org/lkml/aBm5QH2p6p9Wxe_M@localhost.localdomain/
>>
>> Where he explained how to register the structs correctly.
>> It can be that setup_log_buf is called again at a later time perhaps.
>>
>
> setup_log_buf() is a __init function, so there is only a certain time
> frame where it can be called.
>
> In particular, once the buddy is up, memblock allocations are impossible
> and it would be deeply flawed to call this function again.
>
> Let's not over-engineer this.
>
> Peter is on CC, so hopefully he can share his thoughts.
>

Hello David,

I tested out this snippet (on top of my series, so you can see what I changed):

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 18ba6c1e174f..7ac4248a00e5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -67,7 +67,6 @@
 #include <linux/wait_api.h>
 #include <linux/workqueue_api.h>
 #include <linux/livepatch_sched.h>
-#include <linux/kmemdump.h>
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 # ifdef CONFIG_GENERIC_IRQ_ENTRY
@@ -120,7 +119,12 @@
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
-KMEMDUMP_VAR_CORE(runqueues, sizeof(runqueues));
+
+size_t runqueues_get_size(void);
+size_t runqueues_get_size(void)
+{
+	return sizeof(runqueues);
+}
 
 #ifdef CONFIG_SCHED_PROXY_EXEC
 DEFINE_STATIC_KEY_TRUE(__sched_proxy_exec);
diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index d808c5e67f35..c6dd2d6e96dd 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -24,6 +24,12 @@
 #include "kallsyms_internal.h"
 #include "kexec_internal.h"
 
+typedef void* kmemdump_opaque_t;
+
+size_t runqueues_get_size(void);
+
+extern kmemdump_opaque_t runqueues;
+
 /* vmcoreinfo stuff */
 unsigned char *vmcoreinfo_data;
 size_t vmcoreinfo_size;
@@ -230,6 +236,9 @@ static int __init crash_save_vmcoreinfo_init(void)
 	kmemdump_register_id(KMEMDUMP_ID_COREIMAGE_VMCOREINFO,
 			     (void *)vmcoreinfo_data, vmcoreinfo_size);
 
+	kmemdump_register_id(KMEMDUMP_ID_COREIMAGE_runqueues,
+			     (void *)&runqueues, runqueues_get_size());
+
 	return 0;
 }

With this there is no more .section and no kmemdump code in sched. However, there are
a few things: first, the size helper, which is quite dull and doesn't really belong in
sched; second, the extern with a different "opaque" type to avoid exposing the struct rq
definition, which is quite hackish.

What do you think? My opinion is that it's ugly, but maybe you have a better idea of how
to write this more nicely? (I am also not 100% sure I did this the way you wanted.)
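As a related aside, here is a rough, untested sketch of how the printk log buffer could
be registered through the same composer, relying on the existing log_buf_addr_get() /
log_buf_len_get() helpers you mentioned. KMEMDUMP_ID_COREIMAGE_LOG_BUF is a made-up ID,
and this assumes kmemdump_register_id() keeps the (id, addr, len) signature used above:

/*
 * Untested sketch: register the printk ring buffer with kmemdump once
 * setup_log_buf() has run, using the existing printk helpers.
 * KMEMDUMP_ID_COREIMAGE_LOG_BUF is hypothetical.
 */
#include <linux/printk.h>
#include <linux/init.h>
#include <linux/kmemdump.h>

static int __init kmemdump_register_log_buf(void)
{
	/* log_buf_addr_get()/log_buf_len_get() report the final buffer */
	kmemdump_register_id(KMEMDUMP_ID_COREIMAGE_LOG_BUF,
			     (void *)log_buf_addr_get(),
			     log_buf_len_get());
	return 0;
}
late_initcall(kmemdump_register_log_buf);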
Thanks for helping out,
Eugen