Re: [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure

Florian Westphal <fw@xxxxxxxxx> · Mon, 28 Jul 2025 23:28:50 +0200

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> Yes, u32 flush_id (or trans_id) needs to be added, then set
> transaction id incrementally.

Not enough, unfortunately.

The key difference between flush (delete all elements) and delset
(remove the set and all elements) is that the set itself gets detached
from the dataplane.  Then, when elements get free'd, we can just iterate
the set and free all elements, they are all unreachable from the
dataplane.

But in case of a flush, thats not the case, releasing the elements will
cause use-after-free.  Current DELSETELEM method unlinks the elements
from the set, links them to the DELSETELEM transactional container.

Then, on abort they get re-linked to the set, or, in case of commit,
they can be free'd after the final synchronize_rcu().

That leaves two options:
1.  Use the first patchset, that makes delsetelem allocations sleepable.
2.  Add a pointer + and id to nft_set_ext.

The drawback of 2) is the added mem cost for every set eleemnt (first
patch series only forces it for rhashtable).

The major upside however is that DELSETELEM transaction objects are
simplified a lot, the to-be-deleted elements could be linked to it by
the then-always-available nft_set_ext pointer, i.e., each DELSETELEM
transaction object can take an arbitrary number of elements.

Unless you disagree, I will go for 2).
This will also allow to remove the krealloc() compaction for DELSETELEM,
so it should be a net code-removal patch.

Another option might be to replace a flush with delset+newset
internally, but this will get tricky because the set/map still being
referenced by other rules, we'd have to fixup the ruleset internally to
use the new/empty set while still being able to roll back.

Proably too tricky / hard to get right, but I'll check it anyway.