Re: [PATCH V3] netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_max sysctl

Florian Westphal <fw@xxxxxxxxx> · Wed, 9 Apr 2025 11:42:06 +0200

lvxiafei <xiafei_xupt@xxxxxxx> wrote:
> Florian Westphal <fw@xxxxxxxxx> wrote:
> > What's the function of nf_conntrack_max?
> > After this change it's always 0?
> 
> nf_conntrack_max is a global (ancestor) limit, by default
> nf_conntrack_max = max_factor * nf_conntrack_htable_size.

Argh.

Your patch replaces net.netfilter.nf_conntrack_max
with init_net.nf_conntrack_max.

But it does not do the same for net.nf_conntrack_max, so the two are
now different and no longer related at all, except that the latter
overrides the former even in init_net.

I'm not sure this is sane.  In any case, it needs an update to
Documentation/networking/nf_conntrack-sysctl.rst.

Also:

-       if (nf_conntrack_max && unlikely(ct_count > nf_conntrack_max)) {
+       if (net->ct.sysctl_max && unlikely(ct_count > min(nf_conntrack_max, net->ct.sysctl_max))) {

... can't be right: with a 0 setting in the netns the whole check is
skipped, so that netns has no limit at all.
So, setting 0 in non-init-net must be disallowed.
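
To illustrate the problem, here is a minimal user-space sketch of the
condition from the V3 hunk above.  The function names and the plain
integer parameters are made up for illustration; this is not the kernel
code, only the same boolean logic:

```c
#include <stdbool.h>

/* Hypothetical helper standing in for the kernel's min() macro. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical stand-in for the V3 check.
 * global_max models nf_conntrack_max, netns_max models net->ct.sysctl_max.
 * Returns true when the allocation would be refused.
 */
static bool v3_over_limit(unsigned int ct_count,
			  unsigned int global_max,
			  unsigned int netns_max)
{
	/* netns_max == 0 short-circuits the test: nothing is ever refused. */
	return netns_max && ct_count > min_uint(global_max, netns_max);
}
```

With global_max = 100 and netns_max = 0, v3_over_limit() stays false for
any ct_count, i.e. the netns has escaped the global limit.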

I suggest removing nf_conntrack_max as a global variable and
making net.nf_conntrack_max use init_net.nf_conntrack_max internally too,
so that in the init_net both sysctls remain the same.

Then, change __nf_conntrack_alloc() to do:

unsigned int nf_conntrack_max = min(net->ct.sysctl_max, init_net.ct.sysctl_max);

and leave the if-condition as is, i.e.:

if (nf_conntrack_max && unlikely(ct_count > nf_conntrack_max)) { ...

It means:
each netns can pick an arbitrary value (but not 0, this ability needs to
be removed).

When a new conntrack is allocated, then:

If the limit in the init_net is lower than the one in the netns, then
the init_net limit applies, so it provides an upper cap.

If the limit in the init_net is higher, the lower pernet limit
is applied.

If the init_net has a 0 setting, no limit is applied.
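
The three cases above can be sketched in plain user-space C.  This is
only an illustration under stated assumptions: effective_max() and
over_limit() are hypothetical names, the parameters are plain integers
rather than the per-netns fields, and netns_max is assumed to be nonzero
in any non-init netns, as required above:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: the limit that applies to one netns.
 * init_net_max models init_net.ct.sysctl_max,
 * netns_max models net->ct.sysctl_max.
 */
static unsigned int effective_max(unsigned int init_net_max,
				  unsigned int netns_max)
{
	return netns_max < init_net_max ? netns_max : init_net_max;
}

/* Hypothetical stand-in for the unchanged if-condition. */
static bool over_limit(unsigned int ct_count,
		       unsigned int init_net_max,
		       unsigned int netns_max)
{
	unsigned int max = effective_max(init_net_max, netns_max);

	/* max is 0 only when init_net is set to 0: no limit at all. */
	return max && ct_count > max;
}
```

For example, with init_net at 100 and the netns at 200 the cap is 100;
with init_net at 200 and the netns at 50 the cap is 50; with init_net
at 0 the result is always false.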

This also needs an update to Documentation/networking/nf_conntrack-sysctl.rst
to explain the restrictions.

Or, alternatively, try the other suggestion I made
(memcg charge at sysctl change time,
 https://lore.kernel.org/netfilter-devel/20250408095854.GB536@xxxxxxxxxxxxx/).

Or come up with a better proposal.