On 24/06/2025 at 17:27, Eugene Crosser wrote:
> On 20/06/2025 18:20, Nicolas Dichtel wrote:
>
>>>>> It is possible, and very useful, to implement "two-stage routing" by
>>>>> installing a route that points to a VRF device:
>>>>>
>>>>>     ip link add vrfNNN type vrf table NNN
>>>>>     ...
>>>>>     ip route add xxxxx/yy dev vrfNNN
>>>>>
>>>>> however this causes surprising behaviour in relation to netfilter
>>>>> hooks. Namely, packets taking such a path traverse the _output_ nftables
>>>>> chain, with conntracking information reset. So, for example, even
>>>>> when "notrack" has been set in the prerouting chain, conntrack entries
>>>>> will still be created. Script attached below demonstrates this behaviour.
>>>> You can have a look at this commit to better understand this:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c9c296adfae9
>>>
>>> I've seen this commit.
>>> My point is that the packets are _not locally generated_ in this case,
>>> so it seems wrong to pass them to the _output_ hook, doesn't it?
>> They are, from the POV of the vrf. The first route sends packets to the vrf
>> device, which acts like a loopback.
>
> I see, this explains the behaviour that I observe.
> I believe that there are two problems here though:
>
> 1. This behaviour is _surprising_. Packets are not really "locally
> generated", they come from "outside", but are treated as if they were
> locally generated. In my view, it deserves a section in
> Documentation/networking/vrf.rst (see suggestion below).
>
> 2. Using the "output" hook makes it impossible(?) to define different
> nftables rules depending on which vrf was used for routing (because iif
> is not accessible in the "output" chain). For example, traffic from
> different tenants, that is routed via different VRFs but egresses over the
> same uplink interface, cannot be assigned different zones. Conntrack
> entries of different tenants will be mixed. As another example, one
> cannot disable conntracking of a tenant's traffic while continuing to
> track "true output" traffic from the processes running on the host.
>
Sorry for the late reply. I'll let netfilter/vrf experts answer these points.

> Thanks for consideration,
>
> Eugene
>
> ========================
> Suggested update to the documentation:
You can send a formal patch for this.

Regards,
Nicolas

>
> diff --git a/Documentation/networking/vrf.rst b/Documentation/networking/vrf.rst
> index 0a9a6f968cb9..74c6a69355df 100644
> --- a/Documentation/networking/vrf.rst
> +++ b/Documentation/networking/vrf.rst
> @@ -61,6 +61,11 @@ domain as a whole.
>         the VRF device. For egress POSTROUTING and OUTPUT rules can be written
>         using either the VRF device or real egress device.
>
> +.. [3] When a packet is forwarded to a VRF interface, it gets further
> +       routed according to the route table associated with the VRF, but
> +       processed by the "output" netfilter hook instead of the "forwarding"
> +       hook.
> +
>  Setup
>  -----
>  1. VRF device is created with an association to a FIB table.
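
For readers following the thread, here is a minimal sketch of the kind of setup being discussed: a route in the main table pointing at a VRF device, plus a "notrack" rule in a prerouting chain. It is not the script attached to the original report. The names used (vrf-demo, table 100, the prefix 192.0.2.0/24, the nft table "demo", the APPLY variable and both helper functions) are assumptions chosen for illustration only.

```shell
#!/bin/sh
# Sketch only: vrf-demo, table 100, 192.0.2.0/24 and the nft table "demo"
# are illustrative assumptions, not taken from the original report.

# Print each command by default; execute it only when APPLY=1 (needs root).
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Stage 1: the main table routes the prefix to the VRF device.
# Stage 2: the VRF's own table (not populated here) picks the real next hop.
setup_two_stage() {
    vrf=$1
    table=$2
    prefix=$3
    run ip link add "$vrf" type vrf table "$table"
    run ip link set dev "$vrf" up
    run ip route add "$prefix" dev "$vrf"
}

# "notrack" in a prerouting chain at raw priority (-300), i.e. before
# conntrack runs, matching the scenario described in the report.
setup_notrack() {
    run nft add table inet demo
    run nft add chain inet demo pre '{ type filter hook prerouting priority -300 ; }'
    run nft add rule inet demo pre notrack
}

setup_two_stage vrf-demo 100 192.0.2.0/24
setup_notrack
```

Run as-is, the script only prints the commands. With APPLY=1 and root privileges it executes them; listing conntrack entries afterwards (for example with `conntrack -L`) should then show whether entries are created for traffic taking the VRF route despite the "notrack" rule, as the report describes.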