Hello,

We recently encountered an SNAT-related issue in our Kubernetes environment and have successfully reproduced it with the following configuration.

Kernel
------
Our kernel is 6.1.y (the issue is also reproduced on 6.14).

Host Network Configuration
--------------------------
We run a DNS proxy on our Kubernetes servers with the following iptables rules (nat table):

-A PREROUTING -d 169.254.1.2/32 -j DNS-DNAT
-A DNS-DNAT -d 169.254.1.2/32 -i eth0 -j RETURN
-A DNS-DNAT -d 169.254.1.2/32 -i eth1 -j RETURN
-A DNS-DNAT -d 169.254.1.2/32 -i bond0 -j RETURN
-A DNS-DNAT -j DNAT --to-destination 127.0.0.1
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A POSTROUTING -j KUBE-POSTROUTING
-A KUBE-POSTROUTING -m mark --mark 0x4000/0x4000 -j MASQUERADE

Container Network Configuration
-------------------------------
Containers use 169.254.1.2 as their DNS resolver:

$ cat /etc/resolv.conf
nameserver 169.254.1.2

Issue Description
-----------------
When performing a DNS lookup from a container, the query fails because the response arrives from an unexpected source port:

$ dig +short @169.254.1.2 A www.google.com
;; reply from unexpected source: 169.254.1.2#123, expected 169.254.1.2#53

The corresponding tcpdump output is as follows:

16:47:23.441705 veth9cffd2a4 P IP 10.242.249.78.37562 > 169.254.1.2.53: 298+ [1au] A? www.google.com. (55)
16:47:23.441705 bridge0 In IP 10.242.249.78.37562 > 127.0.0.1.53: 298+ [1au] A? www.google.com. (55)
16:47:23.441856 bridge0 Out IP 169.254.1.2.53 > 10.242.249.78.37562: 298 1/0/1 A 142.250.71.228 (59)
16:47:23.441863 bond0 Out IP 169.254.1.2.53 > 10.242.249.78.37562: 298 1/0/1 A 142.250.71.228 (59)
16:47:23.441867 eth1 Out IP 169.254.1.2.53 > 10.242.249.78.37562: 298 1/0/1 A 142.250.71.228 (59)
16:47:23.441885 eth1 P IP 169.254.1.2.53 > 10.242.249.78.37562: 298 1/0/1 A 142.250.71.228 (59)
16:47:23.441885 bond0 P IP 169.254.1.2.53 > 10.242.249.78.37562: 298 1/0/1 A 142.250.71.228 (59)
16:47:23.441916 veth9cffd2a4 Out IP 169.254.1.2.124 > 10.242.249.78.37562: UDP, length 59

The source port of the DNS response is unexpectedly rewritten from 53 on its way back into the container (to 124 in this capture, 123 in the dig output above; the port differs between runs), so the application never receives the response.

We suspected the issue might be related to commit d8f84a9bc7c4 ("netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash"). After applying that commit, the port remapping no longer occurs, but the DNS response is still dropped:

16:52:00.968814 veth9cffd2a4 P IP 10.242.249.78.54482 > 169.254.1.2.53: 15035+ [1au] A? www.google.com. (55)
16:52:00.968814 bridge0 In IP 10.242.249.78.54482 > 127.0.0.1.53: 15035+ [1au] A? www.google.com. (55)
16:52:00.996661 bridge0 Out IP 169.254.1.2.53 > 10.242.249.78.54482: 15035 1/0/1 A 142.250.198.100 (59)
16:52:00.996664 bond0 Out IP 169.254.1.2.53 > 10.242.249.78.54482: 15035 1/0/1 A 142.250.198.100 (59)
16:52:00.996665 eth0 Out IP 169.254.1.2.53 > 10.242.249.78.54482: 15035 1/0/1 A 142.250.198.100 (59)
16:52:00.996682 eth0 P IP 169.254.1.2.53 > 10.242.249.78.54482: 15035 1/0/1 A 142.250.198.100 (59)
16:52:00.996682 bond0 P IP 169.254.1.2.53 > 10.242.249.78.54482: 15035 1/0/1 A 142.250.198.100 (59)

The response now keeps source port 53, but it is dropped in __nf_conntrack_confirm() and never reaches the veth.
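In case it helps with reproducing or narrowing this down: our understanding is that the conntrack entry created for the response fails to be confirmed, and if that is right the drop should also be observable from the host with conntrack-tools. This is only a debugging sketch (it assumes conntrack-tools is installed, and the exact counters that increase are our assumption):

$ conntrack -S                      # watch the insert_failed and drop counters while repeating the dig query
$ conntrack -E -p udp --dport 53    # optionally, watch conntrack events for the DNS flow while reproducing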
We worked around the issue by modifying __nf_conntrack_confirm() to skip the conflicting conntrack entry check:

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 7bee5bd22be2..3481e9d333b0 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1245,9 +1245,9 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 
 	chainlen = 0;
 	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[reply_hash], hnnode) {
-		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
-				    zone, net))
-			goto out;
+		//if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
+		//		    zone, net))
+		//	goto out;
 		if (chainlen++ > max_chainlen) {
 chaintoolong:
 			NF_CT_STAT_INC(net, chaintoolong);

With this change, DNS resolution works as expected:

$ dig +short @169.254.1.2 A www.google.com
142.250.198.100

The corresponding tcpdump output is as follows:

16:54:43.618509 veth9cffd2a4 P IP 10.242.249.78.56805 > 169.254.1.2.53: 7503+ [1au] A? www.google.com. (55)
16:54:43.618509 bridge0 In IP 10.242.249.78.56805 > 127.0.0.1.53: 7503+ [1au] A? www.google.com. (55)
16:54:43.618666 bridge0 Out IP 169.254.1.2.53 > 10.242.249.78.56805: 7503 1/0/1 A 142.250.198.100 (59)
16:54:43.618677 bond0 Out IP 169.254.1.2.53 > 10.242.249.78.56805: 7503 1/0/1 A 142.250.198.100 (59)
16:54:43.618683 eth1 Out IP 169.254.1.2.53 > 10.242.249.78.56805: 7503 1/0/1 A 142.250.198.100 (59)
16:54:43.618700 eth1 P IP 169.254.1.2.53 > 10.242.249.78.56805: 7503 1/0/1 A 142.250.198.100 (59)
16:54:43.618700 bond0 P IP 169.254.1.2.53 > 10.242.249.78.56805: 7503 1/0/1 A 142.250.198.100 (59)
16:54:43.618765 veth9cffd2a4 Out IP 169.254.1.2.53 > 10.242.249.78.56805: 7503 1/0/1 A 142.250.198.100 (59)

The issue remains present on kernel 6.14 as well. Since we are not deeply familiar with NAT behavior, we would appreciate guidance on a proper fix, or on how to debug this further.

-- 
Regards
Yafang