Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote: > > > This can also be achieved with RCU: Each reader of the private pointer > > > will be with in an RCU read section. The new pointer will be published > > > with rcu_assign_pointer() and synchronize_rcu() is used to wait until > > > each reader left its critical section. > > > > Note we had this before and it was reverted in > > d3d40f237480 ("Revert "netfilter: x_tables: Switch synchronization to RCU"") > > > > I'm not saying its wrong, but I think you need a plan b when the same > > complaints wrt table replace slowdown come in. > > > > And unfortunately I can't find a solution for this, unless we keep > > either the existing wait-scheme for counters sake or we accept > > that some counter update might be lost between copy to userland and > > destruction (it would need to use rcu_work or similar, the xt > > target/module destroy callbacks can sleep). > > Urgh. Is this fast & frequent update a real-world thing or a benchmark > of some sort? I mean adding rule after rule is possible but… No idea, its certainly bad design (iptables-restore exists for a reason). > I used here synchronize_rcu() and there is also > synchronize_rcu_expedited() but I do hate it. With everything. Agreed. > What are the counters used in userland for? I've seen that they are > copied but did not understood why. > iptables-legacy -A INPUT -j ACCEPT > ends up in xt_replace_table() but iptables-nft doesn't. Different > interface, better design? Or I just used legacy and now it is considered > as the only? iptables-nft uses the nftables netlink API, I don't think it suffers from the preempt_rt issues you're resolving in the setsockopt api. It won't go anywhere near xt_replace_table and it doesn't use the xtables binary format on the user/kernel api side. > I see two invocations on iptables-legacy-restore. > > But the question remains: Why copy counters after replacing the rule > set? Good question. What I can gather from https://git.netfilter.org/iptables/tree/libiptc/libiptc.c#n2597 After replace we get copy of the old counters, depending on mode we can force-update counters post-replace in kernel, via ret = setsockopt(handle->sockfd, TC_IPPROTO, SO_SET_ADD_COUNTERS, newcounters, counterlen); I suspect this is whats used by 'iptables -L -v -Z INPUT'. But I don't know if anyone uses this in practice. Maybe Pablo remembers the details, this is ancient code by kernel standards. I think we might get away with losing counters on the update side, i.e. rcu_update_pointer() so new CPUs won't find the old binary blob, copy the counters, then defer destruction via rcu_work or similar and accept that the counters might have marginally changed after the copy to userland and before last cpu left its rcu critical section.