kdevops: scaling automated testing

We already have a kdevops BoF scheduled, so I don't think we need another
session, but I do think folks could use a reference: a memo on how I think
we can scale automated testing with kdevops.

Essentially we have kdevops kernel-ci support now [0], and we not only
have kernel-patches-daemon integration but have also worked with
kernel.org admins to PoC using lei-based patchwork to help us reduce the
scope of what we want to test [1]. So for instance, you can request a
lei-based patchwork for all posted kernel patches that modify just
the loopback driver.
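
As a hedged sketch of the kind of lei query this builds on (the exact
query and output target the kernel.org PoC uses may well differ):

lei q -I https://lore.kernel.org/all/ -o ~/Mail/loop --threads \
    'dfn:drivers/block/loop.c'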

Since kdevops leverages kconfig, once you write your kconfig logic (which
we require for kdevops anyway), you can leverage existing web CI
infrastructure to provide the variability of your tests: you just map your
target CI goals to an end-result kdevops .config. An example is provided
with XFS, letting you enable all tests or reduce the scope, and also
letting you modify, say, the SOAK_DURATION. The github<->KPD integration
git tree we use is also more useful than the usual KPD git trees, in that
kernel maintainers for each subsystem can now simply git push their
development branches.

That means kernel subsystems can opt in to automatic testing of patches
posted to the subsystem's mailing list, and/or just run tests whenever the
maintainer wants to.

The next aspect to this is scaling the archiving of test results. Although
github does let you upload artifacts, these are ephemeral and so won't be
around forever. That means the kernel configs may eventually be gone too...

To address this, kdevops both archives results as ephemeral github
artifacts and also pushes results to github persistently. We leverage
git LFS, which lets git trees be larger than usual and also enables
users to clone an archive tree and *not* download all tarballs, fetching
on demand only the files you really need. We do this with
kdevops-results-archive [3]. Since even git LFS trees have a size limit,
all you need to do is rotate the archive as "epochs". For example, see
some results from our 2025-02 epoch for XFS: a limited set of results [4],
and also a more expanded set of results [5].
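
As a hedged sketch of the on-demand fetch workflow (the path filter is
illustrative; use whatever subset of results you actually need):

# Clone the archive without downloading the LFS blobs themselves:
GIT_LFS_SKIP_SMUDGE=1 git clone \
    https://github.com/linux-kdevops/kdevops-results-archive-2025-02.git
cd kdevops-results-archive-2025-02
# Fetch only the LFS objects you care about, e.g. the XFS results:
git lfs pull --include="*xfs*"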

The next question is how to scale this in terms of infrastructure.

For that we can use a SAT solver, and wouldn't that be nice? We actually
have one proposed for kconfig, so we can just use that. Let me paste the
relevant parts:

How can we leverage a SAT solver on kdevops?

1) Feature-driven configuration and scriptable goals

Instead of having the user do the heavy lifting of figuring out what the
heck to enable in make menuconfig, the user just has to write a
requirement. Something like this:

ci-goal:
  - filesystem: xfs
  - features: [reflink, 4k]
  - workload: sysbench-mysql-docker

This can also enable scriptable CI goals:

kconfig-solve --enable sysbench --fs xfs --blocksize 4k --reflink

This generates a .config to let us test this.
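
As a rough illustration of what the resulting fragment could look like
(these symbol names are illustrative, not necessarily the exact kdevops
Kconfig symbols):

CONFIG_KDEVOPS_WORKFLOW_ENABLE_SYSBENCH=y
CONFIG_SYSBENCH_DOCKER=y
CONFIG_SYSBENCH_FS_XFS=y
CONFIG_SYSBENCH_XFS_BLOCKSIZE_4K=y
CONFIG_SYSBENCH_XFS_REFLINK=y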

2) Minimized configs to reproduce a test on our CI

Today, if someone wants to reproduce a generic/750 test on the xfs reflink
4k profile, they can just use the web interface to select the
xfs_reflink_4k defconfig, and we have a kconfig option that lets us limit
the run to a specified set of tests [0]. But that requires adding a
defconfig per test profile we support. Wouldn't it be nicer if we could
just say:

ci-goal:
  - filesystem: xfs
  - features: [reflink, 4k]
  - testsuite: fstests
  - tests: generic/750
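
Or, in the scriptable form proposed above (the --testsuite and --tests
flags here are hypothetical extensions of the earlier kconfig-solve
example):

kconfig-solve --fs xfs --reflink --blocksize 4k \
    --testsuite fstests --tests generic/750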

3) Generate a set of different tests for a goal

Given a set of features we want to test, we could have the SAT solver
look for satisfiable combinations. For example:

ci-goal:
  - filesystem: xfs
  - features: [reflink]
  - workload: sysbench-mysql-docker

And so this may generate several different .configs, one per setup, to
test XFS with mysql under docker across all XFS profiles.
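
For instance (the profile names here are only illustrative), the solver
might emit one .config per satisfiable reflink-capable XFS profile:

.config.xfs_reflink_4k
.config.xfs_reflink_1024
.config.xfs_reflink_normapbt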

Given we support all cloud providers...

This can also be something like:

matrix:
  providers: [aws, gcp]
  storage: [ebs, nvme]
  filesystems: [xfs, ext4]
  testsuites: [fstests]
 
If we could gather data about price...

  cost_limit: $0.50/hr

We then just need a mapping of code to tests.

code_paths:
  fs/xfs/: [fstests, ltp, gitr]
  block/: [blktests]

I.e., code paths map to Kconfig attributes, and so we know what tests to
run as code gets updated on each commit.
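
A hedged shell sketch of that mapping in CI terms (the paths and suite
names just mirror the example above):

# Figure out which test suites a commit needs based on what it touches.
suites=""
for path in $(git diff --name-only HEAD~1..HEAD); do
	case "$path" in
	fs/xfs/*) suites="$suites fstests ltp gitr" ;;
	block/*)  suites="$suites blktests" ;;
	esac
done
# De-duplicate and hand the list to whatever kicks off the kdevops runs.
echo $suites | tr ' ' '\n' | sort -u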

So... if we have hardware we can donate at the LF... then we can just run
our own cloud, like openstack or ubicloud, to let us describe our needs
for each test.

Does this make sense?

What do we need? More help and focus on KPD and its code, and also a
decision on whether we can give the LF hardware.

[0] https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/README.md
[1] https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/kernel-ci-kpd.md
[2] https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/linux-filesystems-kdevops-CI-testing.md
[3] https://github.com/linux-kdevops/kdevops-results-archive/
[4] https://github.com/linux-kdevops/kdevops-results-archive-2025-02/commit/1b94c7227e58c0fb8e3f6362fd59e482d373c433
[5] https://github.com/linux-kdevops/kdevops-results-archive-2025-02/commit/f5c35a745220d720423af939a81b7aba93451063
[6] https://lore.kernel.org/all/Z9_JA_tuFbVJRcTR@xxxxxxxxxxxxxxxxxxxxxx/

  Luis



