On Fri, 29 Aug 2025 09:42:03 -0700
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Fri, 29 Aug 2025 at 09:33, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > I just realized that I'm using the rhashtable as an "does this hash exist".
>
> The question is still *why*?

The reason is to keep from triggering the event that records the pathname
on every lookup.

> SO JUST USE THE NUMBERS, for chissake! Don't make them mean anything.
> Don't try to think they mean something.
>
> The *reason* I think hashing 'struct file *' is better than the
> alternative is exactly that it *cannot* mean anything. It will get
> re-used quite actively, even when nobody actually changes any of the
> files. So you are forced to deal with this correctly, even though you
> seem to be fighting dealing with it correctly tooth and nail.
>
> And at no point have you explained why you can't just treat it as
> meaningless numbers. The patch that started this all did exactly that.
> It just used the *wrong* numbers, and I pointed out why they were
> wrong, and why you shouldn't use those numbers.

I agree. The hash I showed last time was just using the pointers. The hash
itself is meaningless and useless on its own. Its only purpose is to serve
as an identifier in the stack trace, so that the pathname and buildid don't
need to be generated and saved every time.

In my other email, I'm thinking of using the pid / vma->vm_start as a key
to know whether the pathname needs to be printed again. However, if a task
does a dlopen(), loads some text and executes it, then does a dlclose()
followed by another dlopen() that loads different text, this could break
the assumption that vm_start is unique per file.

Just to clarify, the goal of this exercise is to avoid the work of
generating the pathnames and buildids on every lookup / stacktrace.

Now maybe hashing the pathname isn't as expensive as I think it may be, and
just doing that could be "good enough".

-- Steve
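For illustration only, here is a minimal userspace sketch of the pid / vm_start "has the pathname already been emitted?" check described above. It assumes a small fixed-size, open-addressing table; the names `seen_table`, `seen_hash()` and `seen_check_and_insert()` are hypothetical and are not kernel APIs (the real code would presumably use the kernel's rhashtable or similar). As in the discussion, the hash value is an opaque number used only to pick a slot and means nothing by itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed table size; a real implementation would size or grow it. */
#define SEEN_SLOTS 64

/* Hypothetical key: which task, and where the executable mapping starts. */
struct seen_key {
	int32_t pid;
	uint64_t vm_start;
};

struct seen_table {
	struct seen_key keys[SEEN_SLOTS];
	bool used[SEEN_SLOTS];
};

static void seen_table_init(struct seen_table *t)
{
	memset(t, 0, sizeof(*t));
}

/* Mix pid and vm_start into an opaque number used only to choose a slot. */
static uint64_t seen_hash(int32_t pid, uint64_t vm_start)
{
	uint64_t h = (uint64_t)(uint32_t)pid * 0x9E3779B97F4A7C15ULL;

	h ^= vm_start + 0x9E3779B97F4A7C15ULL + (h << 6) + (h >> 2);
	return h;
}

/*
 * Return true if the caller should emit the pathname/buildid event for
 * (pid, vm_start), i.e. the key has not been seen before. Uses linear
 * probing with no deletion. If the table is full, return true so nothing
 * is lost; the event is merely emitted more often than necessary.
 */
static bool seen_check_and_insert(struct seen_table *t, int32_t pid,
				  uint64_t vm_start)
{
	size_t start;

	assert(t != NULL);
	start = (size_t)(seen_hash(pid, vm_start) % SEEN_SLOTS);
	for (size_t i = 0; i < SEEN_SLOTS; i++) {
		size_t slot = (start + i) % SEEN_SLOTS;

		if (!t->used[slot]) {
			/* First time for this key: record it, ask caller to emit. */
			t->used[slot] = true;
			t->keys[slot].pid = pid;
			t->keys[slot].vm_start = vm_start;
			return true;
		}
		if (t->keys[slot].pid == pid &&
		    t->keys[slot].vm_start == vm_start)
			return false;	/* already emitted for this key */
	}
	return true;	/* table full: fall back to always emitting */
}
```

Note that this sketch reproduces the hazard raised above: if a dlclose() and a later dlopen() map a different file at the same vm_start in the same task, `seen_check_and_insert()` still returns false for that key, so the new pathname would never be recorded. This sketch does nothing to handle that case.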