On Thu, Jul 03, 2025 at 02:09:36PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > > Oh, right. So a decision whether this is feasible and, if so, how it
> > > should behave in detail, is urgent.
> >
> > Downside is that this flag adds more complexity, since there will be
> > two paths to test (flag on/off).
>
> Right.
>
> > Another concern is having a lot of devices: this is now iterating
> > linearly, performing strcmp() to find matches from the control plane
> > (i.e. maybe this slows down the time to load a ruleset?), IIRC you
> > mentioned this should not be an issue.
>
> Can you suggest an alternative?
>
> I see the following:
>
> - revert to old behaviour (no search, lookup-by-name), and introduce a
>   new netlink attribute for the 'wildcard name / full-search'.
>   Since that's a big change, this requires a revert in nf.git and then
>   a followup change in nf-next to amend this.
>
> - Only search if we have a wildcard.
>   It should be enough to check, from nft_netdev_hook_alloc, if
>   hook->ifname is null-terminated or not. If it is, lookup-by-name,
>   else for_each_netdev search.
>
> That's assuming that the netlink attributes are encoded as
> 'eth\0' (4 bytes, no wildcard), vs 'eth' (3 bytes, wildcard).

Yes, that's how "wildcards" are implemented. (Hence a simple strncmp()
is sufficient.) When Pablo asked about it, I also realized I could keep
the lookup-by-name unless a manual search was needed. I even implemented
it, but surprisingly couldn't measure a difference. Quoting myself from
another mail here:

| > > Quick follow-up here: To test the above, I created many dummy NICs
| > > with the same prefix and timed creation of a chain with a matching
| > > wildcard hook spec. Failing to measure a significant delay, I
| > > increased the number of those NICs. Currently I have 10k matching
| > > and 10k non-matching NICs and still can't measure a slowdown
| > > creating that wildcard chain, even with 'nft monitor' running in
| > > another terminal which reports the added interfaces (that's code I
| > > haven't submitted yet).
| >
| > Are you sure you see no spike in perf record for nft_hook_alloc()?
|
| I don't, flowtable creation is really fast (0.2s with 1k matching
| NICs), but deleting the same flowtable then takes about 60s. The perf
| report shows nf_flow_offload_xdp_setup() up high, I suspect the nested
| loops in nf_flowtable_by_dev_remove() to be the culprit.
|
| Testing a netdev chain instead, both creation and deletion are almost
| instant (0.16s and 0.26s).

> > > Yes, that was the premise upon which I wrote the patch. I didn't
> > > intend to make the flag toggle between the old interface hooks and
> > > the new interface name hooks.
> >
> > A mistyped name is another scenario where this flag could help.
>
> Regardless of this flag patch, one could update nftables userspace to
> display hints like we do for sets with the new '# count 42' comment
> annotation.
>
> Something that tells that the hook is subscribed for eth42 but
> currently not active.
>
> Same with flowtables, something that tells which devices are configured
> (subscribed) and which devices are used (should likely still display
> ppp* and not list 4000k ppp1234 :-) ).
>
> Phil, what's your take here?
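For reference, the two-path lookup discussed above boils down to
something like the sketch below (helper and parameter names are mine,
locking is glossed over, and a wildcard may of course match several
devices -- the real code would instantiate one hook per match, not just
return the first):

	/* Sketch only, not the actual nft_netdev_hook_alloc()
	 * internals: an ifname attribute sent as "eth\0" (len 4)
	 * requests an exact match, "eth" (len 3, no terminator) a
	 * prefix wildcard.
	 */
	static struct net_device *hook_dev_lookup(struct net *net,
						  const char *ifname,
						  u32 len)
	{
		struct net_device *dev;

		if (ifname[len - 1] == '\0')	/* exact match */
			return dev_get_by_name(net, ifname);

		/* wildcard: linear prefix search over all netdevs */
		for_each_netdev(net, dev) {
			if (!strncmp(dev->name, ifname, len)) {
				dev_hold(dev);
				return dev;
			}
		}
		return NULL;
	}
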
As suggested in my other reply, I could implement a GETDEV request so
nftables may print either:

| flowtable f {
|         hook ingress priority filter
|         devices = { eth* (active: eth0, eth1), test1 (inactive) }
| }

or:

| flowtable f {
|         hook ingress priority filter
|         devices = { eth*, test1 } # active: eth0, eth1
| }

The NEWDEV/DELDEV notifications the kernel currently emits will be
printed by 'nft monitor'. This is still useful despite the above, to
notify user space if a device is bound or unbound at run-time.

Cheers, Phil
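P.S.: Regarding the 60s flowtable deletion quoted above: the pattern I
suspect is roughly the following (illustrative only, with made-up list
and struct names, not the actual nf_flowtable_by_dev_remove() code).
Each hook removal rescans the whole offloaded-device list, so N
wildcard-matched devices cost O(N^2) on teardown while creation only
appends in O(N), which would fit the fast-create/slow-delete numbers:

	/* Illustration of the suspected complexity, not actual
	 * kernel code: per hook, rescan a global device list.
	 */
	list_for_each_entry_safe(hook, next, &flowtable->hook_list, list) {
		/* offload_dev_list is a made-up name */
		list_for_each_entry(entry, &offload_dev_list, list) {
			if (entry->dev == hook->dev) {
				list_del(&entry->list);
				kfree(entry);
				break;
			}
		}
		list_del(&hook->list);
		kfree(hook);
	}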