On Thu, May 29, 2025 at 7:43 AM Shaun Brady <brady.1345@xxxxxxxxx> wrote:
>
> On Wed, May 28, 2025 at 05:03:56PM +0800, Yafang Shao wrote:
> > diff --git a/net/netfilter/nf_conntrack_core.c
> > b/net/netfilter/nf_conntrack_core.c
> > index 7bee5bd22be2..3481e9d333b0 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -1245,9 +1245,9 @@ __nf_conntrack_confirm(struct sk_buff *skb)
> >
> >         chainlen = 0;
> >         hlist_nulls_for_each_entry(h, n,
> > &nf_conntrack_hash[reply_hash], hnnode) {
> > -               if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
> > -                                   zone, net))
> > -                       goto out;
> > +               //if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
> > +               //                  zone, net))
> > +               //      goto out;
> >                 if (chainlen++ > max_chainlen) {
> >  chaintoolong:
> >                         NF_CT_STAT_INC(net, chaintoolong);
>
> Forgive me for jumping in with very little information, but on a hunch I
> tried something. I applied the above patch to another bug I've been
> investigating:
>
> https://bugzilla.netfilter.org/show_bug.cgi?id=1795
> and Ubuntu reference
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2109889
>
> The Ubuntu reproduction steps were easier to follow, so I mimicked
> them:
>
> # cat add_ip.sh
> ip addr add 10.0.1.200/24 dev enp1s0
> # cat nft.sh
> nft -f - <<EOF
> table ip dnat-test {
>  chain prerouting {
>   type nat hook prerouting priority dstnat; policy accept;
>   ip daddr 10.0.1.200 udp dport 1234 counter dnat to 10.0.1.180:1234
>  }
> }
> EOF
> # cat listen.sh
> echo pong|nc -l -u 10.0.1.180 1234
> # ./add_ip.sh ; ./nft.sh ; ./listen.sh (and then just ./listen.sh again)
>
> On a client machine I ran:
> $ echo ping|nc -u -p 4321 10.0.1.200 1234
> $ echo ping|nc -u -p 4321 10.0.1.180 1234
>
> And sure enough the listen.sh never completes (demonstrates the bug).
>
> When I apply the above patch, the problem goes away.
>
> What I _also_ was able to do to make the problem go away was to apply
> the following patch:
>
> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> index aad84aabd7f1..fecf5591f424 100644
> --- a/net/netfilter/nf_nat_core.c
> +++ b/net/netfilter/nf_nat_core.c
> @@ -727,7 +727,7 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple,
>             !(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
>                 /* try the original tuple first */
>                 if (nf_in_range(orig_tuple, range)) {
> -                       if (!nf_nat_used_tuple_new(orig_tuple, ct)) {
> +                       if (!nf_nat_used_tuple(orig_tuple, ct)) {
>                                 *tuple = *orig_tuple;
>                                 return;
>                         }
>
> This was suggested to me by the bug report. I had not brought this up
> yet, as I had little understanding of why and what else was broken by
> reverting to nf_nat_used_tuple from _new.
>
> I thought the fact that both patches fix the problem might be of
> interest. I'll keep digging into my understanding.....

Could you please extract and share the /proc/net/nf_conntrack entries
for the affected IP address?

--
Regards
Yafang
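
One way to pull out just the relevant entries is sketched below. This is only an illustration, not a required procedure: filter_conntrack is a hypothetical helper name, the CONNTRACK_FILE override is an assumption added so the filter can be pointed at a saved copy, and the default path assumes the kernel exposes /proc/net/nf_conntrack (CONFIG_NF_CONNTRACK_PROCFS).

```shell
#!/bin/sh
# Sketch: print conntrack entries that mention any of the given addresses.
# filter_conntrack is a hypothetical helper, not part of any existing tool.
filter_conntrack() {
    # Default to the procfs table; CONNTRACK_FILE (an assumption of this
    # sketch) lets the same filter run against a saved dump.
    file="${CONNTRACK_FILE:-/proc/net/nf_conntrack}"
    awk -v addrs="$*" '
        BEGIN { n = split(addrs, a, " ") }
        {
            # Match whole src=/dst= fields; the trailing space keeps
            # 10.0.1.20 from matching 10.0.1.200.
            for (i = 1; i <= n; i++) {
                if (index($0, "src=" a[i] " ") || index($0, "dst=" a[i] " ")) {
                    print
                    next
                }
            }
        }' "$file"
}
```

With the addresses from the reproduction above, running `filter_conntrack 10.0.1.200 10.0.1.180` as root right after the second client command should show both the original and reply tuples of any matching entry. If conntrack-tools is installed, `conntrack -L` prints the same table in a similar format.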