Hi Dave,

Thanks for the advice. I will try to dump a size histogram in the next drop.

Wengang

> On Apr 29, 2025, at 7:38 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Mon, Apr 28, 2025 at 11:11:35AM -0700, Wengang Wang wrote:
>> This patch introduces new fields to per-mount and global stats,
>> and exports them to user space.
>>
>> @page_alloc -- number of pages allocated from buddy to buffer cache
>> @page_free -- number of pages freed to buddy from buffer cache
>> @kbb_alloc -- number of BBs allocated from kmalloc slab to buffer cache
>> @kbb_free -- number of BBs freed to kmalloc slab from buffer cache
>> @vbb_alloc -- number of BBs allocated from vmalloc system to buffer cache
>> @vbb_free -- number of BBs freed to vmalloc system from buffer cache
>
> This forms a permanent user API once created, so exposing internal
> implementation details like this doesn't make me feel good. We've
> changed how we allocate memory for buffers quite a bit recently
> to do things like support large folios and minimise vmap usage,
> then to use vmalloc instead of vmap, etc. e.g. we don't use pages
> at all in the buffer cache anymore.
>
> I'm actually looking at further simplifying the implementation - I
> think the custom folio/vmalloc stuff can be replaced entirely by a
> single call to kvmalloc() now, which means some stuff will come from
> slabs, some from the buddy and some from vmalloc. We won't know
> where it comes from at all, and if this stats interface already
> existed then such a change would render it completely useless.
>
>> By looking at the above stats fields, user space can easily know the
>> buffer cache usage.
>
> Not easily - the implementation only aggregates alloc/free values, so
> the user has to manually do the (alloc - free) calculation to
> determine how much memory is currently in use. And then we don't
> really know what size buffers are actually using that memory...
>
> i.e. buffers for everything other than xattrs are fixed sizes (single
> sector, single block, directory block, inode cluster), so it makes
> more sense to me to dump a buffer size histogram for memory usage.
> We can infer things like inode cluster memory usage from such
> output, so not only would we get memory usage, we would also get some
> insight into what is consuming the memory.
>
> Hence I think it would be better to track a set of buffer size based
> buckets so we get output something like:
>
> buffer size	count	Total Bytes
> -----------	-----	-----------
>  < 4kB		<n>	<aggregate count of b_length>
>    4kB
> <= 8kB
> <= 16kB
> <= 32kB
> <= 64kB
>
> I also think that it might be better to dump this in a separate
> sysfs file rather than add it to the existing stats file.
>
> With this information on any given system, we can infer what was
> allocated from slab based on the buffer sizes and system PAGE_SIZE.
>
> However, my main point is that for the general case of "how much
> memory is in use by the buffer cache", we really don't want to tie
> it to the internal allocation implementation. A histogram output like
> the above is not tied to the internal implementation, whilst giving
> additional insight into what size allocations are generating all the
> memory usage...
>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
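
For the next drop, the bucket accounting I have in mind looks roughly like
the sketch below. It is only a sketch: the names, the exact bucket
boundaries and the use of plain atomic64_t counters are placeholders (the
real code would probably live alongside the existing per-mount stats),
and the size fed in would be BBTOB(bp->b_length) at the points where
buffer memory is allocated and freed.

	#include <linux/atomic.h>
	#include <linux/sizes.h>

	enum {
		XB_HIST_4K,	/* <= 4kB  */
		XB_HIST_8K,	/* <= 8kB  */
		XB_HIST_16K,	/* <= 16kB */
		XB_HIST_32K,	/* <= 32kB */
		XB_HIST_64K,	/* <= 64kB */
		XB_HIST_LARGE,	/* > 64kB  */
		XB_HIST_NR,
	};

	struct xfs_buf_hist {
		atomic64_t	count[XB_HIST_NR];	/* live buffers per bucket */
		atomic64_t	bytes[XB_HIST_NR];	/* aggregate buffer bytes */
	};

	static int xfs_buf_hist_bucket(size_t bytes)
	{
		if (bytes <= SZ_4K)
			return XB_HIST_4K;
		if (bytes <= SZ_8K)
			return XB_HIST_8K;
		if (bytes <= SZ_16K)
			return XB_HIST_16K;
		if (bytes <= SZ_32K)
			return XB_HIST_32K;
		if (bytes <= SZ_64K)
			return XB_HIST_64K;
		return XB_HIST_LARGE;
	}

	/* called where buffer memory is allocated */
	static void xfs_buf_hist_add(struct xfs_buf_hist *hist, size_t bytes)
	{
		int b = xfs_buf_hist_bucket(bytes);

		atomic64_inc(&hist->count[b]);
		atomic64_add(bytes, &hist->bytes[b]);
	}

	/* called where buffer memory is freed */
	static void xfs_buf_hist_sub(struct xfs_buf_hist *hist, size_t bytes)
	{
		int b = xfs_buf_hist_bucket(bytes);

		atomic64_dec(&hist->count[b]);
		atomic64_sub(bytes, &hist->bytes[b]);
	}

The separate sysfs file would then just walk the two arrays and print one
line per bucket in the format you suggested.

Wengang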