On Fri, May 30, 2025 at 6:34 PM Florian Westphal <fw@xxxxxxxxx> wrote: > > The logic added in the blamed commit was supposed to only omit nat source > port allocation if neither the existing nor the new entry are subject to > NAT. > > However, its not enough to lookup the conntrack based on the proposed > tuple, we must also check the reverse direction. > > Otherwise there are esoteric cases where the collision is in the reverse > direction because that colliding connection has a port rewrite, but the > new entry doesn't. In this case, we only check the new entry and then > erronously conclude that no clash exists anymore. > > The existing (udp) tuple is: > a:p -> b:P, with nat translation to s:P, i.e. pure daddr rewrite, > reverse tuple in conntrack table is s:P -> a:p. > > When another UDP packet is sent directly to s, i.e. a:p->s:P, this is > correctly detected as a colliding entry: tuple is taken by existing reply > tuple in reverse direction. > > But the colliding conntrack is only searched for with unreversed > direction, and we can't find such entry matching a:p->s:P. > > The incorrect conclusion is that the clashing entry has timed out and > that no port address translation is required. > > Such conntrack will then be discarded at nf_confirm time because the > proposed reverse direction clashes with an existing mapping in the > conntrack table. > > Search for the reverse tuple too, this will then check the NAT bits of > the colliding entry and triggers port reallocation. > > Followp patch extends nft_nat.sh selftest to cover this scenario. > > The IPS_SEQ_ADJUST change is also a bug fix: > Instead of checking for SEQ_ADJ this tested for SEEN_REPLY and ASSURED > by accident -- _BIT is only for use with the test_bit() API. > > This bug has little consequence in practice, because the sequence number > adjustments are only useful for TCP which doesn't support clash resolution. > > The existing test case (conntrack_clash_clash.sh) exercise a race > condition path (parallel conntrack creation on different CPUs), so > the colliding entries have neither SEEN_REPLY nor ASSURED set. > > Thanks to Yafang Shao and Shaun Brady for an initial investigation > of this bug. > > Fixes: d8f84a9bc7c4 ("netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash") > Reported-by: Yafang Shao <laoar.shao@xxxxxxxxx> > Reported-by: Shaun Brady <brady.1345@xxxxxxxxx> > Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1795 > Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Tested-by: Yafang Shao <laoar.shao@xxxxxxxxx> > --- > net/netfilter/nf_nat_core.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c > index aad84aabd7f1..f391cd267922 100644 > --- a/net/netfilter/nf_nat_core.c > +++ b/net/netfilter/nf_nat_core.c > @@ -248,7 +248,7 @@ static noinline bool > nf_nat_used_tuple_new(const struct nf_conntrack_tuple *tuple, > const struct nf_conn *ignored_ct) > { > - static const unsigned long uses_nat = IPS_NAT_MASK | IPS_SEQ_ADJUST_BIT; > + static const unsigned long uses_nat = IPS_NAT_MASK | IPS_SEQ_ADJUST; > const struct nf_conntrack_tuple_hash *thash; > const struct nf_conntrack_zone *zone; > struct nf_conn *ct; > @@ -287,8 +287,14 @@ nf_nat_used_tuple_new(const struct nf_conntrack_tuple *tuple, > zone = nf_ct_zone(ignored_ct); > > thash = nf_conntrack_find_get(net, zone, tuple); > - if (unlikely(!thash)) /* clashing entry went away */ > - return false; > + if (unlikely(!thash)) { > + struct nf_conntrack_tuple reply; > + > + nf_ct_invert_tuple(&reply, tuple); > + thash = nf_conntrack_find_get(net, zone, &reply); > + if (!thash) /* clashing entry went away */ > + return false; > + } > > ct = nf_ct_tuplehash_to_ctrack(thash); > > -- > 2.49.0 > -- Regards Yafang