Re: [PATCH nf-next,v1 0/6] revisiting nf_tables ruleset validation

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> · Tue, 20 May 2025 13:29:43 +0200

On Tue, May 20, 2025 at 12:42:50PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > On Thu, May 15, 2025 at 08:03:13AM +0200, Florian Westphal wrote:
[...]
> > > >   netfilter: nf_tables: honor validation state in preparation phase
> > > >   netfilter: nf_tables: add infrastructure for chain validation on updates
> > > >   netfilter: nf_tables: add new binding infrastructure
> > > >   netfilter: nf_tables: use new binding infrastructure
> > > >   netfilter: nf_tables: add support for validating incremental ruleset updates
> > > > 
> > > >  include/net/netfilter/nf_tables.h |  52 +-
> > > >  net/netfilter/nf_tables_api.c     | 800 ++++++++++++++++++++++++++++--
> > > >  net/netfilter/nft_immediate.c     |  25 +-
> > > >  3 files changed, 844 insertions(+), 33 deletions(-)
> > > 
> > > This is a lot of new code but no explanation as to why is given.
> > 
> > These are the results of a test program I made, which is incrementally
> > adding elements to an already populated verdict map with 100k elements.
> > 
> > The ruleset validation shows in the perf callgraph, for each new
> > element that is added:
> > 
> >     55.06%  nft-buffer       [kernel.kallsyms]           [k] nft_chain_validate
> >      7.68%  nft-buffer       [kernel.kallsyms]           [k] nf_tables_commit
> >      7.19%  nft-buffer       [kernel.kallsyms]           [k] nft_table_validate
> >      5.34%  nft-buffer       [kernel.kallsyms]           [k] nft_rhash_walk
> >      2.82%  swapper          [kernel.kallsyms]           [k] mwait_idle_with_hints.constprop.0
> >      1.26%  nft-buffer       [kernel.kallsyms]           [k] __rhashtable_walk_find_next
> >      1.09%  nft-buffer       [kernel.kallsyms]           [k] nft_setelem_validate
> >      0.94%  nft-buffer       [kernel.kallsyms]           [k] rhashtable_walk_next
> >      0.54%  nft-buffer       libnftables.so.1.1.0        [.] nft_parse
> > 
> > This is a test adding 100 new elements including jump to chain in a
> > existing 100k verdict map.
> > 
> > With this approach, there is no need to fully validate the chain
> > graph.
> > 
> > > Does this fix bugs with the existing scheme?
> > 
> > The two initial patches are targetted at fixing minor issues. The
> > remaining patches are new code, they are passing tests/shell with
> > CONFIG_KASAN and CONFIG_KMEMLEAK. I will post v2 but I would like to
> > run more fuzz test on the error path.
> > 
> > > Or is this an optimization? If so, how big is the speedup?
> > 
> > Optimization. After this series, validation does not show up near the
> > top 10; this is the first symbol in the perf call graph:
> > 
> >      0.03%  nft-buffer       [nf_tables]               [k] __nft_chain_validate
> > 
> > and it is far from the top 10.
> > 
> > I can include this information in v2 so they can sit in the mailing
> > list for a while, there is a few bugs in v1 that I have addressed.
> > Phil has spotted one in them.
> > 
> > Moreover, I can move the bindings hashtable to make it per-net to
> > control the maximum number of jump/goto per family. This is a lot
> > larger than Shaun's update, you have to tell me your preference on
> > this.
> 
> I think Shauns approach is better due to backporting reasons.

Yes, it is small.

I could not find a way to extend the existing loop and validation
routines to make it incremental.

> Once the new code is stable enough you can move the jump limitation
> check there.
> 
> Makes sense?

OK.

> > Old binding code can possibly go away, but that requires closer look
> > too.
> 
> Yes, lets keep it for now until there is confidence that the split
> approach catches all issues.
> 
> I suggest to do this by adding a WARN_ON_ONCE() in the old path once
> you think the new code should have prevented cycles in 100% of cases,
> then remove it after e.g. 6-12 months or so.

This is how it works in v2:

On new table, the classic full ruleset validation from basechain is
done.

On existing tables, from the chain that has been updated (either new
rule or new jump/goto to reach it), the incremental routine backtracks
to find the basechain. If chain callstack is too deep, then it falls
back to the classic full ruleset validation from basechains. If it
detects a loop when backtracking, it also falls back to classic full
ruleset validation (it could be a loop in chains that are not
connected to basechain, this is currently allowed). Then, from the
chain that has been updated, call the existing nft_validate_chain()
using the collected basechains that can reach such updated chain. The
nft_validate_chain() function already relies on too deep callstack to
detect loops.

So the backtracking is used to find and collect the basechains that
can reach this chain.

Loop and checking if expression is accepted from this chain (eg.
masquerade from filter chain type) is done in one single pass.

Incremental is a fast path, but in case backtracking cannot collect
basechains, then it fallbacks to classic ruleset validation.

I will look into the WARN_ON_ONCE() you suggest.