On Fri, Jul 25, 2025 at 01:15:27PM +0200, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > > I think the key is to be able to identify what elements have been > > flushed by what flush command, so abort path can just restore/undo the > > state for the given elements. > > > > Because this also is possible: > > > > flush set x + [...] + flush set x > > > > And [...] includes possible new/delete elements in x. > > > > It should be possible to store an flush command id in the set element > > (this increases the memory consumption of the set element, which your > > series already does it) to identify what flush command has deleted it. > > This is needed because the transaction object won't be in place but I > > think it is a fair tradeoff. The flush command id can be incremental > > in the batch (the netlink sequence number cannot be used for this > > purpose). > > OK, that might work. So the idea is to do the set walk as-is, but > instead of allocating a NFT_MSG_DELSETELEM for each transaction > object, introduce NFT_MSG_FLUSHSET transaction. Or simply using NFT_MSG_DELSET and add a flag to note this is a flush. > Then, for a DELSETELEM request with no elements (== flush), > allocate *one* NFT_MSG_FLUSHSET transaction. Yes. > The NFT_MSG_FLUSHSET transaction holds the set being flushed > and an id, that increments sequentially once for each flush. Yes. > Then, do the walk as-is: > > static int nft_setelem_flush(const struct nft_ctx *ctx, > struct nft_set *set, > const struct nft_set_iter *iter, > struct nft_elem_priv *elem_priv) > { > const struct nft_set_ext *ext = nft_set_elem_ext(set, elem_priv); > struct nft_trans *trans; > > /* previous delsetelem or erlier flush marked it inactive */ > if (!nft_set_elem_active(ext, iter->genmask)) > return 0; > > /* No allocation per set elemenet anymore */ > /* trans = nft_trans_alloc_gfp(ctx, NFT_MSG_DELSETELEM, */ > > /* trans_flush could be obtained from the tail of > * the transaction list. Or placed in *iter. > */ > elem_priv->flush_id = trans_flush->flush_id > set->ops->flush(ctx->net, set, elem_priv); > set->ndeact++; > > nft_setelem_data_deactivate(ctx->net, set, elem_priv); > > return 0; Maybe use nft_map_deactivate() ? > } > > On abort, NFT_MSG_FLUSHSET would do another walk, for all set elements, > and then calls nft_setelem_data_activate/nft_setelem_activate in case > elem_priv->flush_id == trans_flush->flush_id. Exactly, maybe nft_map_activate() can help. > Did I get that right? I don't see any major issues with this, except > the need to add u32 flush_id to struct nft_elem_priv. > Or perhaps struct nft_set_ext would be a better fit as nft_elem_priv is > just a proxy. Yes, u32 flush_id (or trans_id) needs to be added, then set transaction id incrementally. > > Of course, this needs careful look, but if the set element can be used > > to annotate the information that allows us to restore to previous > > state before flush (given the transaction object is not used anymore), > > then it should work. Your series is extending the set element size for > > a different purpose, so I think the extra memory should not be an > > issue. > > Agree, it would need 4 bytes per elem which isn't much compared to the > transaction log savings. > > Will you have a look or should I have a go at this? Please, go for it.