Hi!
I'm facing a hard-to-solve situation where nftables DNAT
hooks do not work well together with the network stack's
martian source check. In my setup I have policy routing
(we're moving from one ISP to another, so we temporarily use
two default routes in two different routing tables), and DNAT
rules on the external interfaces which redirect certain ports
to different internal hosts.
We have two external Ethernet interfaces, let's say eth1 &
eth2, with two externally routed IP addresses, ip1 & ip2, and
the following policy routing rules (simplified):
from ip1 default via gw1 dev eth1
from ip2 default via gw2 dev eth2
from all default via gw1 dev eth1
There's also an internal interface, ethINT, with a host
behind it that has the address hostINT.
I'm doing DNAT from ip1:port to hostINT:port, and also
from ip2:port to hostINT:port.
When the initial packet arrives, it is first rewritten by
the nftables DNAT rule, and only after that is it checked
for martian sources. When the initial packet comes to
ip2:port, it looks like
extIP => ip2:port
after DNAT, it becomes
extIP => hostINT:port
so info about its original destination IP is lost at this
point.
Now the network stack checks this rewritten packet for
martian sources. It does not match the 2nd rule above (for
ip2), since the packet does not have ip2 in it anymore. So
the kernel "thinks" this packet should have arrived on eth1
(according to the 3rd routing rule above) instead of eth2,
and it gets dropped as a martian source.
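To make the sequence above concrete, here is a small Python
model of it. This is only a sketch of my understanding, not
kernel code: the Rule class and the function names are made up
for illustration, and I'm assuming the check is a strict
reverse-path lookup in which the packet's destination address
plays the role of the "from" selector for the hypothetical
reply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One simplified policy routing rule (hypothetical model)."""
    selector: Optional[str]  # "from" address; None means "from all"
    gateway: str
    dev: str


# The three simplified rules from above, in priority order.
RULES = [
    Rule("ip1", "gw1", "eth1"),
    Rule("ip2", "gw2", "eth2"),
    Rule(None, "gw1", "eth1"),
]


def reverse_path_dev(packet_dst: str) -> str:
    """Return the device a reply to this packet would leave through.

    The reply's source is the packet's destination, so that is what
    gets matched against each rule's "from" selector.
    """
    for rule in RULES:
        if rule.selector is None or rule.selector == packet_dst:
            return rule.dev
    raise LookupError("no matching rule")


def is_martian(packet_dst: str, in_dev: str) -> bool:
    """Strict check: the reply device must equal the arrival device."""
    return reverse_path_dev(packet_dst) != in_dev


# Packet arriving on eth2 for ip2, checked before DNAT: accepted.
print(is_martian("ip2", "eth2"))      # False
# Same packet after DNAT rewrote the destination to hostINT: dropped.
print(is_martian("hostINT", "eth2"))  # True
```

The only difference between the two calls is the destination
address, which is exactly the piece of information DNAT
replaces before the check runs.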
The same question has been asked multiple times in the
past, for example:
https://serverfault.com/questions/1021798/how-to-redirect-requests-on-port-80-to-localhost3000-using-nftables
There the DNAT target is 127.0.0.1, which is special, but
the situation is the same and the same logic applies.
The question is: why does the NAT table modify the packet
BEFORE the kernel checks it for martians? To me, it would be
much more logical to check for martians first, and only
when that is done, traverse the nat tables. Doing it the
other way around, as it is done currently, looks absolutely
illogical.
Maybe only the mangle netfilter table should be traversed
before the check, for "special" processing, and even that is
questionable, since there are applications for both ways.
To work around the current behaviour, I would need to
disable the martian check (at least on this interface) and,
basically, reimplement the whole routing-based logic in
netfilter rules as well. Given the complexity of our routing
table, that is a rather ambitious task, and it would be twice
as complex to maintain in the future too (compared to the
current setup).
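A note on the "at least on this interface" part, as a hedged
sketch: I'm assuming here that the drop comes from the
reverse-path filter controlled by the rp_filter sysctl
(0 = off, 1 = strict, 2 = loose). As far as I understand the
kernel's ip-sysctl documentation, the value used for an
interface is the maximum of conf/all/rp_filter and
conf/<iface>/rp_filter, so relaxing only the per-interface
value may not be enough. The helper names below are made up
for illustration.

```python
from pathlib import Path

# Standard procfs location of the per-interface IPv4 settings.
CONF_ROOT = "/proc/sys/net/ipv4/conf"


def effective_rp_filter(all_value: int, iface_value: int) -> int:
    """Value the kernel applies: max of conf/all and conf/<iface>."""
    return max(all_value, iface_value)


def read_rp_filter(name: str, root: str = CONF_ROOT) -> int:
    """Read rp_filter for 'all' or for one interface from procfs."""
    return int(Path(root, name, "rp_filter").read_text().strip())


def report(iface: str) -> int:
    """Effective rp_filter mode for one interface on this machine."""
    return effective_rp_filter(read_rp_filter("all"), read_rp_filter(iface))


# With all=1 (strict), setting eth2 to 0 changes nothing,
# while setting eth2 to 2 (loose) does take effect.
print(effective_rp_filter(1, 0))  # 1
print(effective_rp_filter(1, 2))  # 2
```

On a live system, report("eth2") would show which mode is
actually in effect for eth2 (eth2 being the interface name
used in this mail).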
Why is it done in this (seemingly illogical) order?
Thanks,
/mjt