On Tue, Sep 02, 2025 at 08:51:37PM +0200, René Scharfe wrote: > > I suspect it won't make a big time difference. The old code should have > > been optimized down to a single word load, and now we have two word > > loads and an xor. That probably isn't important compared to the actual > > 5-word memcmp() we have to do in order to verify that we found the right > > bucket anyway. > > I see slightly worse performance, but within the noise. > > However, just stacking two words won't do if only a few bits of the > resulting hash will be used to find a bucket. We could mix in more bits > and smear them all over, but if that's done by a deterministic function > then it could be applied during the construction of manipulated object > hash values as well, no? I think the difficulty in manipulating scales as the number of bits increases. So yeah, if you are worried about the low 8 bits, then XOR-ing in another 8 bits is not going to do much. But your table is only 256 items long, so you don't care much either way. At even 16 bits, it gets hard for the attacker to choose the low 16 bits _and_ the low 16 bits of the next word (you mentioned a project earlier which claims 28 bits). If you XOR in a third word, now your 16-bit hash is using 48 bits that the attacker has to control. And so on. > Perhaps salting with a random value determined at runtime would help. > Not XORing it in (pointless if the other value is controlled by the > attacker, as the result would still collide), but using it as a mask to > choose the bits to take from the object hash? I think that would work, but XOR-ing the higher order bits is easier to do and I think produces a similar effect. Let's shrink the problem for a second. Imagine sha1 was 16 bits, and we wanted to create an 8-bit hash to use in our table. The attacker creates two objects with binary hashes: object a: 10111001 11110111 object b: 01001000 11110111 They collide in the lower 8 bits, but we don't want them to. In your scheme, as I understand it, we'd come up with a 16-bit mask that has exactly 8 bits set, like: 11010110 01011000 and then picking only the bits where the mask is "1", we get: object a: 10111001 11110111 mask: 11010110 01011000 hash a: 10100110 object b: 01001000 11110111 mask: 11010110 01011000 hash b: 01000110 So I agree that is hard to foil without the attacker knowing which bits you'll pick. You've made their job 8 bits harder, because they now have to control all 16 bits to get their collision. But if we instead XOR the words of the object hashes together, we get: object a[hi]: 10111001 object a[lo]: 11110111 hash a: 01001110 object b[hi]: 01001000 object b[lo]: 11110111 hash b: 10111111 So you're flipping bits "randomly". It's not truly random, but is coming from the rest of the hash the attacker provided. But for any bit they want to control, they have to control that position in both words. So they're back to needing to control all 16 bits to get their desired hash. And as somebody who just hand-computed those answers, I can tell you that the XOR one is much simpler to do. ;) -Peff