Re: [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure

Florian Westphal <fw@xxxxxxxxx> · Tue, 19 Aug 2025 21:10:42 +0200

Florian Westphal <fw@xxxxxxxxx> wrote:
> > For the commit phase, I suggest to add a list of dying elements to the
> > transaction object. After unlinking the element from the (internal)
> > set data structure, add it to this transaction dying list so it
> > remains reachable to be released after the rcu grace period.
> 
> Thats what I meant by 'stick a pointer into struct nft_set_ext'.
> Its awkward but I should be able to get the priv pointer back
> by doing the inverse of nft_set_elem_ext().
> 
> The cleaner solution would be to turn nft_elem_priv into a
> 'nft_elem_common', place a hlist_node into that and then
> use container_of().  But its too much code churn for my
> liking.
> 
> So I'll extend each set element with a pointer and
> add a removed_elements hlist_head to struct nft_trans_elem.
> 
> The transacion id isn't needed I think once that list exist:
> it provides the needed info to undo previous operations
> without the need to walk the set again.
> 
> We can probably even rework struct nft_trans_elem to always use
> this pointer, even for inserts, and only use the 
> 
> struct nft_trans_one_elem       elems[]
> 
> member for elements that we update (no add or removal).
> But thats something for a later time.

This doesn't work.
NEWSETELEM cannot (re)use the same list node as DELSETELEM.

Reason is that a set flush will also flush elements
added in the same batch.

But if NEWSETELEM uses a list (instead of priv pointer
as we do now), then at the time of the set flush, the
encountered element is already on a NEWSETELEM trans_elem list.

I'll try doing:

 struct nft_set_ext {
        u8      genmask;
        u8      offset[NFT_SET_EXT_NUM];
+       struct llist_node trans_list_new;
+       struct llist_node trans_list_del;
        char    data[];

to avoid this problem.