On Wed, May 28, 2025 at 05:03:56PM +0800, Yafang Shao wrote:
> diff --git a/net/netfilter/nf_conntrack_core.c
> b/net/netfilter/nf_conntrack_core.c
> index 7bee5bd22be2..3481e9d333b0 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1245,9 +1245,9 @@ __nf_conntrack_confirm(struct sk_buff *skb)
>
>         chainlen = 0;
>         hlist_nulls_for_each_entry(h, n,
> &nf_conntrack_hash[reply_hash], hnnode) {
> -               if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
> -                                   zone, net))
> -                       goto out;
> +               //if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
> +               //                  zone, net))
> +               //      goto out;
>                 if (chainlen++ > max_chainlen) {
>  chaintoolong:
>                         NF_CT_STAT_INC(net, chaintoolong);

Forgive me for jumping in with very little information, but on a hunch
I tried something. I applied the above patch to another bug I've been
investigating:
https://bugzilla.netfilter.org/show_bug.cgi?id=1795
and the Ubuntu reference:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2109889

The Ubuntu reproduction steps were easier to follow, so I mimicked them:

# cat add_ip.sh
ip addr add 10.0.1.200/24 dev enp1s0
# cat nft.sh
nft -f - <<EOF
table ip dnat-test {
  chain prerouting {
    type nat hook prerouting priority dstnat; policy accept;
    ip daddr 10.0.1.200 udp dport 1234 counter dnat to 10.0.1.180:1234
  }
}
EOF
# cat listen.sh
echo pong|nc -l -u 10.0.1.180 1234
# ./add_ip.sh ; ./nft.sh ; ./listen.sh
(and then just ./listen.sh again)

On a client machine I ran:
$ echo ping|nc -u -p 4321 10.0.1.200 1234
$ echo ping|nc -u -p 4321 10.0.1.180 1234

And sure enough, the second listen.sh never completes, which demonstrates
the bug.

When I apply the above patch, the problem goes away.
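
For anyone following along, here is a rough userspace model of why I
suspect the two client commands collide. It is only a sketch under
assumptions: the client address 10.0.1.50 and the names FlowTuple and
reply_tuple are made up for illustration, and it models nothing of the
kernel beyond "the reply tuple is the original tuple reversed, after
DNAT is applied".

```python
from typing import NamedTuple, Optional, Tuple


class FlowTuple(NamedTuple):
    """Hypothetical stand-in for one direction of a conntrack entry (UDP)."""
    src: str
    sport: int
    dst: str
    dport: int


def reply_tuple(orig: FlowTuple,
                dnat_to: Optional[Tuple[str, int]] = None) -> FlowTuple:
    """Return the reply-direction tuple expected for `orig`.

    If `dnat_to` is given, the destination is rewritten first, so the
    reply is expected to come from the DNAT target.
    """
    dst, dport = dnat_to if dnat_to is not None else (orig.dst, orig.dport)
    return FlowTuple(src=dst, sport=dport, dst=orig.src, dport=orig.sport)


CLIENT = "10.0.1.50"  # assumption: the client address is not in the report

# First client command: hits the dnat rule for 10.0.1.200:1234.
via_dnat = FlowTuple(CLIENT, 4321, "10.0.1.200", 1234)
# Second client command: goes straight to 10.0.1.180:1234, no rule matches.
direct = FlowTuple(CLIENT, 4321, "10.0.1.180", 1234)

reply_a = reply_tuple(via_dnat, dnat_to=("10.0.1.180", 1234))
reply_b = reply_tuple(direct)

# Both flows expect their reply from 10.0.1.180:1234 to CLIENT:4321.
print("reply tuples collide:", reply_a == reply_b)
```

If that model is right, the second flow has a different original tuple
but the same reply tuple as the first, which would be the case the
commented-out nf_ct_key_equal() check on reply_hash is there to catch.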

What I _also_ was able to do to make the problem go away was to apply
the following patch:

diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index aad84aabd7f1..fecf5591f424 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -727,7 +727,7 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple,
             !(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
                 /* try the original tuple first */
                 if (nf_in_range(orig_tuple, range)) {
-                        if (!nf_nat_used_tuple_new(orig_tuple, ct)) {
+                        if (!nf_nat_used_tuple(orig_tuple, ct)) {
                                 *tuple = *orig_tuple;
                                 return;
                         }

This was suggested to me by the bug report. I had not brought it up yet,
as I have little understanding of why it helps or of what else breaks
when reverting from nf_nat_used_tuple_new to nf_nat_used_tuple.

I thought the fact that both patches fix the problem might be of
interest.

I'll keep digging to improve my understanding.

SB