On Fri, Jul 11, 2025 at 10:41:36AM -0700, Casey Chen wrote:
> On Thu, Jul 10, 2025 at 8:09 PM Kent Overstreet
> <kent.overstreet@xxxxxxxxx> wrote:
> >
> > On Thu, Jul 10, 2025 at 06:07:13PM -0700, Casey Chen wrote:
> > > On Thu, Jul 10, 2025 at 5:54 PM Kent Overstreet
> > > <kent.overstreet@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jul 10, 2025 at 05:42:05PM -0700, Casey Chen wrote:
> > > > > Hi All,
> > > > >
> > > > > Thanks for reviewing my previous patches. I am replying to some
> > > > > comments from our previous discussion:
> > > > > https://lore.kernel.org/all/CAJuCfpHhSUhxer-6MP3503w6520YLfgBTGp7Q9Qm9kgN4TNsfw@xxxxxxxxxxxxxx/T/#u
> > > > >
> > > > > Most people care about the motivation and usage of this feature.
> > > > > Internally, we used to have systems with asymmetric memory across
> > > > > NUMA nodes: node 0 uses a lot of memory while node 1 is nearly
> > > > > empty, so requests to allocate memory on node 0 always fail. With
> > > > > this patch, we can find the imbalance and optimize the memory
> > > > > usage. David Rientjes and Sourav Panda have also provided
> > > > > scenarios in which this patch would be very useful. It is easy to
> > > > > turn on and off, so I think it is nice to have, and it enables
> > > > > more scenarios in the future.
> > > > >
> > > > > Andrew / Kent,
> > > > > * I agree with Kent on using for_each_possible_cpu rather than
> > > > >   for_each_online_cpu, considering CPU online/offline.
> > > > > * When we fail to allocate counters for an in-kernel alloc_tag,
> > > > >   panic() is better than WARN(), since the kernel would eventually
> > > > >   panic on an invalid memory access anyway.
> > > > > * percpu stats would bloat the data structures quite a bit.
> > > > >
> > > > > David Wang,
> > > > > I don't really understand what 'granularity of calling sites'
> > > > > means. If a NUMA imbalance is found, the calling site can request
> > > > > memory allocation from different nodes. Other factors can affect
> > > > > NUMA balance; that information can be implemented in a separate
> > > > > patch.
> > > >
> > > > Let's get this functionality in.
> > > >
> > > > We've already got userspace parsing and consuming /proc/allocinfo,
> > > > so we just need to do it without changing that format.
> > >
> > > You mean keep the format without per-NUMA info the same as before?
> > > My patch v3 changed the header and the alignment of bytes and calls.
> > > I can restore them.
> >
> > I mean an ioctl interface - so we can have a userspace program with
> > different switches for getting different types of output.
> >
> > Otherwise the existing programs people have already written for
> > consuming /proc/allocinfo are going to break.
>
> What does this ioctl interface do? Get bytes/calls per allocation
> site? Or total bytes/calls per module? Or per-NUMA bytes/calls for
> each allocation site or module?
> Would it be too much work for this patch? If you can show me an
> example, that would be useful. I can try implementing it.

Since we're adding optional features, the ioctl needs to take a flags
argument for which features we want - per-NUMA-node stats for now, but I
suspect more will come up (maybe we'll want to revisit number of calls
per callsite). Return -EINVAL if userspace asks for something the
running kernel doesn't support...
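
Something along these lines, maybe - to be clear, every name below is
made up for illustration; this is just a sketch of the shape of a
flags-based query ioctl, not a concrete proposal:

/*
 * Hypothetical sketch only: none of these names exist in the kernel
 * today, they're just to illustrate a flags-based query ioctl.
 */
#include <linux/fs.h>
#include <linux/ioctl.h>
#include <linux/types.h>
#include <linux/uaccess.h>

/* feature bits userspace can request */
#define ALLOCINFO_F_PER_NUMA_STATS	(1ULL << 0)
/* future bits: calls per callsite, per-module totals, ... */

/* everything this kernel build knows how to report */
#define ALLOCINFO_F_SUPPORTED		ALLOCINFO_F_PER_NUMA_STATS

struct allocinfo_query {
	__u64 flags;		/* in: requested features (ALLOCINFO_F_*) */
	__u64 buf;		/* in: userspace buffer for the output */
	__u64 buf_size;		/* in: size of that buffer */
};

#define ALLOCINFO_IOC_QUERY	_IOWR('A', 0x01, struct allocinfo_query)

static long allocinfo_ioctl(struct file *file, unsigned int cmd,
			    unsigned long arg)
{
	struct allocinfo_query q;

	if (cmd != ALLOCINFO_IOC_QUERY)
		return -ENOTTY;

	if (copy_from_user(&q, (void __user *)arg, sizeof(q)))
		return -EFAULT;

	/* reject any feature bit this kernel doesn't support */
	if (q.flags & ~ALLOCINFO_F_SUPPORTED)
		return -EINVAL;

	/*
	 * ... walk the alloc tags and fill q.buf, including per-node
	 * counts when ALLOCINFO_F_PER_NUMA_STATS is set ...
	 */
	return 0;
}

Masking against a "supported" bitmask is what keeps this forward
compatible: an older kernel rejects feature bits it doesn't know about
with -EINVAL instead of silently ignoring them, so userspace can probe
for what's available.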