Re: [PATCH v6 5/6] tracing: Show inode and device major:minor in deferred user space stacktrace

Steven Rostedt <rostedt@xxxxxxxxxxx> · Fri, 29 Aug 2025 12:57:56 -0400

On Fri, 29 Aug 2025 09:42:03 -0700
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Fri, 29 Aug 2025 at 09:33, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > I just realized that I'm using the rhashtable as a "does this hash exist" check.
> 
> The question is still *why*?

The reason is to avoid triggering the event that records the pathname on
every lookup.
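As a rough illustration of that "does this hash exist" use, here is a minimal user-space sketch of a fixed-size set of 64-bit identifiers. It is not the rhashtable code from the patch. The names `seen_set`, `seen_first_time()` and `SEEN_SLOTS` are hypothetical, and the sketch assumes id 0 is reserved to mean "empty slot". The caller emits the pathname event only when `seen_first_time()` returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size set; SEEN_SLOTS must be a power of two. */
#define SEEN_SLOTS 1024u

struct seen_set {
	uint64_t slot[SEEN_SLOTS]; /* 0 means "empty" */
};

static void seen_init(struct seen_set *s)
{
	memset(s->slot, 0, sizeof(s->slot));
}

/*
 * Returns true if @id was not recorded before (caller should emit the
 * pathname event now), false if it was already recorded.
 * Uses open addressing with linear probing.
 */
static bool seen_first_time(struct seen_set *s, uint64_t id)
{
	assert(id != 0); /* 0 is reserved for empty slots */

	for (uint32_t i = 0; i < SEEN_SLOTS; i++) {
		uint32_t idx = (uint32_t)((id + i) & (SEEN_SLOTS - 1));

		if (s->slot[idx] == id)
			return false;		/* already emitted */
		if (s->slot[idx] == 0) {
			s->slot[idx] = id;	/* record and emit */
			return true;
		}
	}
	/* Table full: fall back to emitting every time (correct, just slower). */
	return true;
}
```

The design choice here is that a full table degrades to "always emit" rather than "never emit", so a missing pathname can never be the result of running out of slots.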

> 
> SO JUST USE THE NUMBERS, for chissake! Don't make them mean anything.
> Don't try to think they mean something.
> 
> The *reason* I think hashing 'struct file *' is better than the
> alternative is exactly that it *cannot* mean anything. It will get
> re-used quite actively, even when nobody actually changes any of the
> files. So you are forced to deal with this correctly, even though you
> seem to be fighting dealing with it correctly tooth and nail.
> 
> And at no point have you explained why you can't just treat it as
> meaningless numbers. The patch that started this all did exactly that.
> It just used the *wrong* numbers, and I pointed out why they were
> wrong, and why you shouldn't use those numbers.

I agree. The hash I showed last time just used the pointers. The hash is
meaningless on its own. Its only job is to serve as an identifier in the
stack trace so that the pathname and buildid don't need to be generated
and saved every time.
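A minimal sketch of turning a pointer (for example a `struct file *`) into such a meaningless identifier, assuming a 64-bit mixer is acceptable. The mixer below is the well-known splitmix64 finalizer, swapped in purely for illustration; it is not what the patch uses, and it is not a keyed cryptographic hash, so it should not be relied on to hide kernel addresses. `opaque_id()` and the `secret` parameter are hypothetical.

```c
#include <stdint.h>

/* splitmix64 finalizer: a bijective 64-bit mixing function. */
static uint64_t mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/*
 * Hypothetical: map an object pointer to an opaque id for the stack trace.
 * @secret would be picked at random once (e.g. at boot) so that ids differ
 * between runs. The id carries no meaning; it only links stack entries to
 * a separately emitted pathname/buildid event.
 */
static uint64_t opaque_id(const void *ptr, uint64_t secret)
{
	uint64_t id = mix64((uint64_t)(uintptr_t)ptr ^ secret);

	return id ? id : 1; /* keep 0 free to mean "no id" */
}
```

Because the same pointer value will be reused for a different file after the old one is freed, a consumer of these ids still has to accept that an id can be re-bound to a new pathname later in the trace.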

In my other email, I'm thinking of using the pid / vma->vm_start pair as a
key to decide whether the pathname needs to be printed again. However, if a
task does a dlopen(), loads some text and executes it, then does a dlclose()
followed by another dlopen() that loads different text, the new mapping
could land at the same vm_start. That would break the assumption that
vm_start is unique per file.
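A small sketch of why the (pid, vm_start) key needs invalidation, assuming a simple linear cache. `struct map_key`, `map_needs_path()` and `map_forget()` are hypothetical names. Without a `map_forget()` call when the old mapping is torn down, a second library mapped at the same address looks "already emitted", and its stack entries would be attributed to the previous pathname.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cache key: the (pid, vm_start) pair. */
struct map_key {
	int32_t pid;
	uint64_t vm_start;
};

#define MAP_CACHE_SLOTS 64

struct map_cache {
	struct map_key key[MAP_CACHE_SLOTS];
	bool used[MAP_CACHE_SLOTS];
};

static void map_cache_init(struct map_cache *c)
{
	memset(c, 0, sizeof(*c));
}

static bool key_eq(struct map_key a, struct map_key b)
{
	return a.pid == b.pid && a.vm_start == b.vm_start;
}

/* Returns true if the pathname for this mapping must be emitted. */
static bool map_needs_path(struct map_cache *c, struct map_key k)
{
	int free_slot = -1;

	for (int i = 0; i < MAP_CACHE_SLOTS; i++) {
		if (c->used[i] && key_eq(c->key[i], k))
			return false;	/* key seen: assume same file (may be stale!) */
		if (!c->used[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot >= 0) {	/* if full, just emit without caching */
		c->key[free_slot] = k;
		c->used[free_slot] = true;
	}
	return true;
}

/*
 * Must run when the mapping goes away (e.g. the munmap behind dlclose(),
 * if the library is actually unloaded). Otherwise a different file mapped
 * later at the same vm_start is wrongly treated as already emitted.
 */
static void map_forget(struct map_cache *c, struct map_key k)
{
	for (int i = 0; i < MAP_CACHE_SLOTS; i++)
		if (c->used[i] && key_eq(c->key[i], k))
			c->used[i] = false;
}
```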

Just to clarify, the goal of this exercise is to avoid generating the
pathnames and buildids on every lookup / stacktrace.

Now, maybe hashing the pathname isn't as expensive as I think it is, and
just doing that could be "good enough".
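To give a feel for the cost of hashing the string itself, here is a sketch using 64-bit FNV-1a, chosen only as a familiar example (it is not collision-resistant, and the patch does not necessarily use it). The hash is one XOR and one multiply per byte, so it is linear in the path length. Producing the pathname string in the first place is a separate cost that this sketch does not model. `path_hash()` is a hypothetical name.

```c
#include <stdint.h>

/* Standard 64-bit FNV-1a parameters. */
#define FNV1A64_OFFSET 0xcbf29ce484222325ULL
#define FNV1A64_PRIME  0x100000001b3ULL

/* Hash a NUL-terminated pathname: one XOR and one multiply per byte. */
static uint64_t path_hash(const char *path)
{
	uint64_t h = FNV1A64_OFFSET;

	for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
		h ^= *p;
		h *= FNV1A64_PRIME;
	}
	return h;
}
```

For typical library paths of a few dozen bytes this is a few dozen multiplies per lookup; whether that counts as "good enough" depends on how often the lookup runs.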

-- Steve