On Fri, 29 Aug 2025 19:42:46 -0400
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> 	vma = NULL;
> 	hash = 0;
> 	foreach addr in callchain
> 		if (!vma || addr not in range of vma) {
> 			vma = vma_lookup(addr);
> 			hash = get_hash(vma);
> 		}
> 		callchain[i] = addr - offset;
> 		hash[i] = hash;
>
>
> I had that get_hash(vma) have something like:
>
>
>  u32 get_hash(vma) {
> 	unsigned long ptr = (unsigned long)vma->vm_file;
> 	u32 hash;
>
> 	/* Remove alignment */
> 	ptr >>= 3;
> 	hash = siphash_1u32((u32)ptr, &key);

Oh, this hash isn't that great, as it did appear to have collisions. But I
saw in vsprintf() it has something like:

#ifdef CONFIG_64BIT
	return (u32)(unsigned long)siphash_1u64((u64)ptr, &key);
#else
	return (u32)siphash_1u32((u32)ptr, &key);
#endif

Which for the 64 bit version, it uses all the bits to calculate the hash,
and the resulting bottom 32 is rather a good spread.

>
> 	if (lookup_hash(hash))
> 		return hash; // already saved
>
> 	// The above is the most common case and is quick.
> 	// Especially compared to vma_lookup() and the hash algorithm
>
> 	/* Slow but only happens when a new vma is discovered */
> 	trigger_event_that_maps_hash_to_file_data(hash, vma);
>
> 	/* Doesn't happen again for this hash value */
> 	save_hash(hash);

So this basically creates the output of:

       trace-cmd-1034    [003] .....   142.197674: <user stack unwind> cookie=300000004
 =>  <000000000008f687> : 0x666220af
 =>  <0000000000014560> : 0x88512fee
 =>  <000000000001f94a> : 0x88512fee
 =>  <000000000001fc9e> : 0x88512fee
 =>  <000000000001fcfa> : 0x88512fee
 =>  <000000000000ebae> : 0x88512fee
 =>  <0000000000029ca8> : 0x666220af
       trace-cmd-1034    [003] ...1.   142.198063: file_cache: hash=0x666220af path=/usr/lib/x86_64-linux-gnu/libc.so.6 build_id={0x10bddb6d,0xf5234181,0xc2f72e26,0x1aa4f797,0x6aa19eda}
       trace-cmd-1034    [003] ...1.   142.198093: file_cache: hash=0x88512fee path=/usr/local/bin/trace-cmd build_id={0x3f399e26,0xf9eb2d4d,0x475fa369,0xf5bb7eeb,0x6244ae85}

Where the first instances of the vma with the values of 0x666220af and
0x88512fee get printed, but from then on, they are not. That is, from then
on, the lookup will return true, and no processing will take place.

And periodically, I could clear the hash cache, so that all vmas get
printed again. But this would be rate limited to not cause performance
issues.

-- Steve
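
For illustration only, the "emit the file mapping once per hash, then skip it until the cache is cleared" behavior described above could be sketched in user-space C roughly as follows. Everything here is an assumption made for the sketch and not the actual kernel patch: the names hash_cache, hash_ptr(), cache_test_and_set(), cache_clear() and record_file() are hypothetical, the table is a small fixed-size open-addressed set, and hash_ptr() uses a splitmix64-style mixer as a stand-in for the keyed siphash_1u64() call, keeping only the low 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SIZE 256u	/* must be a power of two */

/* Hypothetical set of already-reported hashes; 0 marks an empty slot. */
struct hash_cache {
	uint32_t slot[CACHE_SIZE];
	unsigned int used;
};

/*
 * Stand-in for siphash_1u64(): mixes all 64 bits of the pointer value
 * and keeps the low 32. This is NOT siphash and is not keyed.
 */
static uint32_t hash_ptr(uint64_t ptr)
{
	uint32_t hash;

	ptr ^= ptr >> 30;
	ptr *= 0xbf58476d1ce4e5b9ULL;
	ptr ^= ptr >> 27;
	ptr *= 0x94d049bb133111ebULL;
	ptr ^= ptr >> 31;
	hash = (uint32_t)ptr;

	return hash ? hash : 1;	/* 0 is reserved for empty slots */
}

/* Returns true if hash was already recorded, else records it and returns false. */
static bool cache_test_and_set(struct hash_cache *c, uint32_t hash)
{
	unsigned int i = hash & (CACHE_SIZE - 1);
	unsigned int n;

	assert(hash != 0);

	for (n = 0; n < CACHE_SIZE; n++) {
		if (c->slot[i] == hash)
			return true;	/* common, fast case */
		if (!c->slot[i]) {
			c->slot[i] = hash;
			c->used++;
			return false;
		}
		i = (i + 1) & (CACHE_SIZE - 1);	/* linear probing */
	}
	/* Table full: report "not seen" so the mapping is emitted again. */
	return false;
}

/* Forget everything, so every file gets reported again on next sight. */
static void cache_clear(struct hash_cache *c)
{
	memset(c, 0, sizeof(*c));
}

/* Emits a file_cache-style line only the first time a file is seen. */
static int record_file(struct hash_cache *c, uint64_t file_ptr, const char *path)
{
	uint32_t hash = hash_ptr(file_ptr);

	if (cache_test_and_set(c, hash))
		return 0;

	printf("file_cache: hash=0x%08x path=%s\n", (unsigned int)hash, path);
	return 1;
}
```

Calling record_file() twice with the same pointer value emits one line and returns 1 then 0; after cache_clear() the next call emits the line again. The rate limiting of the periodic clear is left out of this sketch.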