Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> On Mon, Jul 14, 2025 at 04:36:35PM +0200, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > > On Thu, Jul 03, 2025 at 04:21:51PM +0200, Florian Westphal wrote:
> > > > Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > > > > Thanks for the description, this scenario is esoteric.
> > > > >
> > > > > Is this bug fully reproducible?
> > > >
> > > > No. Unicorn. Only happened once.
> > > > Everything is based off reading the backtrace and vmcore.
> > >
> > > I guess this needs a chaos monkey to trigger this bug. Else, can we
> > > try to catch this unicorn again?
> >
> > I would not hold my breath. But I don't see anything that prevents the
> > race described in 4/4, and all the things match in the vmcore, including
> > increment of clash resolution counter. If you think it's too perfect
> > then ok, we can keep 4/4 back until someone else reports this problem
> > again.
>
> Hm, I think your sequence is possible, it is the SLAB_TYPESAFE_BY_RCU rule
> that allows for this to occur.
>
> Could this rare sequence still happen?
>
> cpu x                   cpu y                   cpu z
>  found entry E          found entry E
>  E is expired           <preemption>
>  nf_ct_delete()
>  return E to rcu slab
>                                                 init_conntrack
>                                                 <preemption> NOTE: ct->status not yet set to zero
>
> cpu y resumes, it observes E as expired but CONFIRMED:
>                         <resumes>
>                         nf_ct_expired()
>                          -> yes (ct->timeout is 30s)
>                         confirmed bit set.

Yes, that can happen, but then the refcount can't be incremented as
it's 0 (-> entry is skipped). If it's nonzero but the object was
returned by the kmem cache we have a different kind of bug (free with
refcount > 0), or use-after-free.
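
To make the refcount argument concrete, here is a minimal userspace
sketch of the usual lookup rule for SLAB_TYPESAFE_BY_RCU objects: only
trust an entry after taking a reference with an "increment unless zero"
operation, then re-check its identity, since the memory may have been
recycled. This is not the conntrack code. It assumes C11 atomics as a
stand-in for the kernel's refcount_t, and struct entry,
ref_get_unless_zero(), ref_put() and lookup_get() are hypothetical names
made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hash table entry; NOT struct nf_conn. */
struct entry {
	atomic_uint refcnt;	/* 0: object is free or being re-initialised */
	unsigned int key;	/* identity, re-checked after taking a ref */
	bool expired;
};

/*
 * Models the semantics of the kernel's refcount_inc_not_zero():
 * take a reference only if the count is currently nonzero.
 */
static bool ref_get_unless_zero(atomic_uint *r)
{
	unsigned int old = atomic_load(r);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(r, &old, old + 1))
			return true;
	}
	return false;
}

static void ref_put(atomic_uint *r)
{
	unsigned int old = atomic_fetch_sub(r, 1);

	assert(old > 0);	/* dropping a ref we never held is a bug */
}

/* Returns true if the caller now holds a ref on a live, matching entry. */
static bool lookup_get(struct entry *e, unsigned int key)
{
	if (!ref_get_unless_zero(&e->refcnt))
		return false;	/* refcount is 0: entry is skipped */

	/* Object may have been recycled for another flow: re-check it. */
	if (e->key != key || e->expired) {
		ref_put(&e->refcnt);
		return false;
	}
	return true;
}
```

In the sequence quoted above, cpu y would fail in ref_get_unless_zero()
as long as the count is still 0, no matter what the stale timeout or
status bits look like at that point.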