On 9/4/25 10:26 AM, Kees Cook wrote:
> On Wed, Sep 03, 2025 at 08:38:03PM +0000, Tom Hromatka wrote:
> > Add an operation, SECCOMP_CLONE_FILTER, that can copy the seccomp filters
> > from another process to the current process.
> >
> > I roughly reproduced the Docker seccomp filter [1] and timed how long it
> > takes to build it (via libseccomp) and attach it to a process. After
> > 1000 runs, on average it took 3,740,000 TSC ticks (or ~1440 microseconds)
> > on an AMD EPYC 9J14 running at 2596 MHz. The median build/load time was
> > 3,715,000 TSC ticks.
> >
> > On the same system, I preloaded the above Docker seccomp filter onto a
> > process. (Note that I opened a pidfd to the reference process and left
> > the pidfd open for the entire run.) I then cloned the filter using the
> > feature in this patch to 1000 new processes. On average, it took 9,300
> > TSC ticks (or ~3.6 microseconds) to copy the filter to the new processes.
> > The median clone time was 9,048 TSC ticks.
> >
> > This is approximately a 400x performance improvement for those container
> > managers that are using the exact same seccomp filter across all of their
> > containers.
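
For anyone who wants to double-check the arithmetic in the numbers above,
here is a rough sketch in Python. It assumes the TSC ticks at the quoted
2596 MHz (i.e. an invariant TSC running at the nominal clock), which is my
assumption and not something the measurements themselves confirm. The
function and variable names are made up for illustration.

```python
# Assumption: the TSC frequency equals the quoted 2596 MHz clock.
TSC_HZ = 2596e6


def ticks_to_us(ticks, tsc_hz=TSC_HZ):
    """Convert a TSC tick count to microseconds."""
    return ticks / tsc_hz * 1e6


# Tick counts quoted in the thread.
build_avg, build_med = 3_740_000, 3_715_000  # build + load via libseccomp
clone_avg, clone_med = 9_300, 9_048          # clone via SECCOMP_CLONE_FILTER

print(f"build/load: {ticks_to_us(build_avg):.1f} us")
print(f"clone:      {ticks_to_us(clone_avg):.2f} us")
print(f"speedup (mean):   {build_avg / clone_avg:.0f}x")
print(f"speedup (median): {build_med / clone_med:.0f}x")
```

Both the mean and median ratios land slightly above 400x, which matches
the "approximately 400x" figure.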
Thanks for looking it over. I'll make the technical changes in a v2 in
the next week or two.
> This is a nice speedup, but with devil's advocate hat on, are launchers
> spawning at rates high enough that this makes a difference?
For users who launch VMs that last hours or more, you are correct: this
change doesn't matter to them.
But there is a small subset of users who launch containers at a very
high rate, and for them startup times are critical.
FWIW, easyseccomp [1] was created a few years ago in part because
generating filters with libseccomp can be challenging and somewhat
slow.
Thanks!
Tom
[1] https://github.com/giuseppe/easyseccomp