Re: [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure

Florian Westphal <fw@xxxxxxxxx> · Wed, 30 Jul 2025 18:35:20 +0200

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > Yes, that part works, but we still need to kfree the elements after unlink.
> > 
> > When commit phase does the unlink, the element becomes unreachable from
> > the set.  At this time, the DELSETELEM object keeps a pointer to the
> > unlinked elements, and that allows us to kfree after synchronize_rcu
> > from the worker.  If we don't want DELSETELEM for flush, we need to
> > provide the address to free by other means, e.g. stick a pointer into
> > struct nft_set_ext.
> 
> For the commit phase, I suggest to add a list of dying elements to the
> transaction object. After unlinking the element from the (internal)
> set data structure, add it to this transaction dying list so it
> remains reachable to be released after the rcu grace period.

Thats what I meant by 'stick a pointer into struct nft_set_ext'.
Its awkward but I should be able to get the priv pointer back
by doing the inverse of nft_set_elem_ext().

The cleaner solution would be to turn nft_elem_priv into a
'nft_elem_common', place a hlist_node into that and then
use container_of().  But its too much code churn for my
liking.

So I'll extend each set element with a pointer and
add a removed_elements hlist_head to struct nft_trans_elem.

The transacion id isn't needed I think once that list exist:
it provides the needed info to undo previous operations
without the need to walk the set again.

We can probably even rework struct nft_trans_elem to always use
this pointer, even for inserts, and only use the 

struct nft_trans_one_elem       elems[]

member for elements that we update (no add or removal).
But thats something for a later time.