On Thu, Mar 27, 2025 at 07:12:29AM -0400, Dan Winship wrote:
> On 3/26/25 11:56, Phil Sutter wrote:
> > The suggested 'flush ruleset' stems from Fedora's nftables.service and
> > is also present in CentOS Stream and RHEL. So anyone running k8s there
> > either doesn't use nftables.service (likely, firewalld is default) or
> > doesn't restart the service. Maybe k8s should "officially" conflict with
> > nftables and iptables services?
>
> (It's weird that nftables.service is part of the nftables package, when
> with iptables it was in a separate package, iptables-services? But
> that's not a discussion for this mailing list...)
>
> >> (If the nftables "owner" flag thwarts "flush ruleset", then that's
> >> definitely *better*, though that flag is still too new to help very much.)
> >
> > Yes, "owned" tables may only be manipulated by their owner. Firewalld
> > will use it as well, for the same reason as k8s.
>
> So in the long run, this solves my problem, even if static firewalls are
> using "flush ruleset".
>
> >> Once upon a time, it was reasonable for the system firewall scripts to
> >> assume that they were the only users of netfilter on the system, but
> >> that is not the world we live in any more. Sure, *most* Linux users
> >> aren't running Kubernetes, but many people run hypervisors, or
> >> docker/podman, or other things that create a handful of dynamic
> >> iptables/nftables rules, and then expect those rules to not suddenly
> >> disappear for no apparent reason later.
> >
> > The question is whether the nftables and iptables services are meant for
> > the world we live in now.
>
> If they're not, then distros shouldn't install them by default. Having
> them installed on the system (or provided as an example in the nftables
> sources) suggests to admins that it's reasonable to use them.
> (And having nftables.service use "flush ruleset" suggests to admins that
> that's a reasonable command for them to run when they are building their
> own things based on our examples.)

We're drifting into downstream details here, but I agree that we should
have an nftables-service(s) RPM which is not installed by default.

> > At least with iptables, it is very hard not to
> > stomp on others' feet when restarting.
>
> Sure, there's nothing that can be done to improve the situation with
> iptables. It just doesn't have the features needed to support multiple
> users well. But nftables does. That's the whole point of multiple tables
> isn't it?

Probably, yes. Tables only separate name spaces, though. The actual merge
points are the netfilter hooks and tables don't matter there. The problem
of rule ordering in a builtin iptables chain has become a problem of base
chain ordering in a hook. Eventually rules are serialized and since one
can't undo an earlier drop/reject, there's still room for conflicts. Real
concurrent use therefore requires a mediating agent like firewalld and a
more abstract language than "accept this, drop that".

> > With nftables, we could cache the
> > 'add table' commands for use later when stopping the service. There is
> > margin for error though since the added table may well exist already.
>
> I was thinking more like, the service documents that all of your rules
> have to be in the table 'firewall', and while it may not actually
> *prevent* you from setting up rules in other tables, it doesn't make any
> effort to make that work either:
>
> ExecStart=/sbin/nft 'destroy table firewall; add table firewall; include "/etc/sysconfig/nftables.conf";'
> ExecReload=/sbin/nft 'destroy table firewall; add table firewall; include "/etc/sysconfig/nftables.conf";'
> ExecStop=/sbin/nft destroy table firewall

Since tables are bound to an IP version (or "inet"), a single table may
or may not suffice for users.
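To illustrate the base chain ordering problem, here is a hypothetical
ruleset; the table names, chain names and port number are made up for
the example. Both tables use the inet family, so each covers IPv4 and
IPv6, and neither touches the other's namespace. Their input chains
still attach to the same hook and run in priority order: an accept only
ends the current base chain, while a drop is final.

```nft
# Hypothetical example: names and ports are illustrative only.
table inet firewall {
	chain input {
		# Runs at the standard filter priority (0).
		type filter hook input priority filter; policy drop;
		ct state established,related accept
		tcp dport 22 accept
	}
}

table inet other {
	chain input {
		# Registered on the same hook, evaluated earlier (priority -10).
		type filter hook input priority filter - 10; policy accept;
		# This accept ends evaluation of this base chain only; the packet
		# then still traverses inet firewall's input chain.
		tcp dport 8080 accept
	}
}
```

With this loaded, a new TCP connection to port 8080 is accepted by
'other' at priority -10 and then dropped by the policy of 'firewall' at
priority 0, so keeping each user in its own table does not by itself
avoid the conflict.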
Apart from that, one may even do:

| ExecStart=nft 'add table firewall { include "/etc/sysconfig/nftables.conf"; }'

One can't dump the current ruleset into that file anymore, though.

Anyway, I think we're playing hide'n'seek here: Even if the nftables
service sticks to a given (set of) table(s), base chains may easily break
k8s. Marking the two as conflicting in systemd is a better choice IMO.

Cheers, Phil