Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > I think the key is to be able to identify what elements have been > flushed by what flush command, so abort path can just restore/undo the > state for the given elements. > > Because this also is possible: > > flush set x + [...] + flush set x > > And [...] includes possible new/delete elements in x. > > It should be possible to store an flush command id in the set element > (this increases the memory consumption of the set element, which your > series already does it) to identify what flush command has deleted it. > This is needed because the transaction object won't be in place but I > think it is a fair tradeoff. The flush command id can be incremental > in the batch (the netlink sequence number cannot be used for this > purpose). OK, that might work. So the idea is to do the set walk as-is, but instead of allocating a NFT_MSG_DELSETELEM for each transaction object, introduce NFT_MSG_FLUSHSET transaction. Then, for a DELSETELEM request with no elements (== flush), allocate *one* NFT_MSG_FLUSHSET transaction. The NFT_MSG_FLUSHSET transaction holds the set being flushed and an id, that increments sequentially once for each flush. Then, do the walk as-is: static int nft_setelem_flush(const struct nft_ctx *ctx, struct nft_set *set, const struct nft_set_iter *iter, struct nft_elem_priv *elem_priv) { const struct nft_set_ext *ext = nft_set_elem_ext(set, elem_priv); struct nft_trans *trans; /* previous delsetelem or erlier flush marked it inactive */ if (!nft_set_elem_active(ext, iter->genmask)) return 0; /* No allocation per set elemenet anymore */ /* trans = nft_trans_alloc_gfp(ctx, NFT_MSG_DELSETELEM, */ /* trans_flush could be obtained from the tail of * the transaction list. Or placed in *iter. */ elem_priv->flush_id = trans_flush->flush_id set->ops->flush(ctx->net, set, elem_priv); set->ndeact++; nft_setelem_data_deactivate(ctx->net, set, elem_priv); return 0; } On abort, NFT_MSG_FLUSHSET would do another walk, for all set elements, and then calls nft_setelem_data_activate/nft_setelem_activate in case elem_priv->flush_id == trans_flush->flush_id. Did I get that right? I don't see any major issues with this, except the need to add u32 flush_id to struct nft_elem_priv. Or perhaps struct nft_set_ext would be a better fit as nft_elem_priv is just a proxy. > Of course, this needs careful look, but if the set element can be used > to annotate the information that allows us to restore to previous > state before flush (given the transaction object is not used anymore), > then it should work. Your series is extending the set element size for > a different purpose, so I think the extra memory should not be an > issue. Agree, it would need 4 bytes per elem which isn't much compared to the transaction log savings. Will you have a look or should I have a go at this?