I will do a better job of communicating things like this in the future. We ran into some additional issues that resulted in another revert to non-pipeline builds on 8/22.
1. Error: creating events dirs: mkdir /run/user/1101: permission denied – Builds were being OOM-killed, which effectively logged out the jenkins-build user and broke systemd lingering. We realized the ceph spec file has logic that sets -j for RPM builds based on how much memory the system has. The make-debs.sh script in ceph.git, which ceph-dev-pipeline uses and non-containerized builds do not, did not have this memory calculation. It does now.
2. basic_outcome_exception_observers.hpp: Operation not permitted – The source tarball created inside a container was being untarred as root. In some scenarios this left root-owned files behind in the jenkins-build user’s home directory. Subsequent Wipe Workspace plugin runs would then fail because there is no way to tell the plugin to use sudo. make-debs.sh now untars files as the host user’s UID.
3. error running container: from /usr/bin/crun creating container – Same root cause as the OOM kills in item 1.
4. Error: current system boot ID differs from cached boot ID – Systemd didn’t correctly clean up podman-related files on boot. Dan got a fix into upstream Podman but we also merged a fix to manually put that tmpfiles.d config into place before jobs run.
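
For anyone curious what the memory-based -j calculation in item 1 looks like, here is a rough sketch in Python. It is for illustration only and is not the actual spec file or make-debs.sh logic. The function names and the 3 GiB-per-job budget are assumptions made for the example.

```python
import os

# Assumed memory budget per compile job (hypothetical value, in KiB).
MEM_PER_JOB_KIB = 3 * 1024 * 1024


def pick_build_jobs(mem_total_kib, cpu_count, mem_per_job_kib=MEM_PER_JOB_KIB):
    """Return a -j value capped by both CPU count and total memory."""
    by_memory = mem_total_kib // mem_per_job_kib
    # Always allow at least one job, even on very small hosts.
    return max(1, min(cpu_count, by_memory))


def read_mem_total_kib(path="/proc/meminfo"):
    """Read MemTotal (reported in KiB) from /proc/meminfo on Linux."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])
    raise RuntimeError("MemTotal not found in " + path)


if __name__ == "__main__":
    jobs = pick_build_jobs(read_mem_total_kib(), os.cpu_count() or 1)
    print(f"-j{jobs}")
```

The point is that a host with many cores but little memory gets a smaller -j than its core count, which keeps the build from being OOM-killed.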
TL;DR: we would like to switch back to the Pipeline tomorrow (9/12) and we’ll keep an eye on the builds. Thanks for your patience as we worked through these issues.
From:
David Galloway <David.Galloway@xxxxxxx>
Date: Thursday, August 21, 2025 at 3:41 PM
To: dev <dev@xxxxxxx>, sepia@xxxxxxxx <sepia@xxxxxxx>
Subject: Re: Maintenance Notification: Branch Push Behavior Change - 8/19/25
We believe the root cause of all failures has been identified and resolved. A recent shellcheck suggestion broke some bash logic so a podman command to prepare the container environment wasn’t getting run.
Additionally, a post-job script to chown files was missing a GID. The Wipe Workspace plugin that attempts to clean up the build environment before each job doesn’t use sudo and was unable to delete some files left over.
https://github.com/ceph/ceph-build/pull/2428/files
Thanks Dan, Zack, and John for the quick identification and resolution.
From:
David Galloway <David.Galloway@xxxxxxx>
Date: Thursday, August 21, 2025 at 12:00 PM
To: dev <dev@xxxxxxx>, sepia@xxxxxxxx <sepia@xxxxxxx>
Subject: Re: Maintenance Notification: Branch Push Behavior Change - 8/19/25
All,
I have reverted this change for the time being. We are seeing a few (what I believe to be) container-related permissions issues that need to be resolved.
https://github.com/ceph/ceph-build/pull/2427
Please re-push any branches you need rebuilt. Apologies for the inconvenience.
From:
David Galloway <David.Galloway@xxxxxxx>
Date: Tuesday, August 19, 2025 at 4:10 PM
To: dev <dev@xxxxxxx>, sepia@xxxxxxxx <sepia@xxxxxxx>
Subject: Re: Maintenance Notification: Branch Push Behavior Change - 8/19/25
This is complete. Any branches pushed to ceph-ci after this message will be using the new ceph-dev-pipeline job by default. Let us know if you encounter any issues.
From:
David Galloway <David.Galloway@xxxxxxx>
Date: Wednesday, August 13, 2025 at 4:11 PM
To: dev <dev@xxxxxxx>, sepia@xxxxxxxx <sepia@xxxxxxx>
Subject: Maintenance Notification: Branch Push Behavior Change - 8/19/25
Upstream development community,
You may recall we announced the availability of a new Jenkins pipeline in June:
https://lists.ceph.io/hyperkitty/list/sepia@xxxxxxx/thread/HBA4CVO2F6VVBZEOLCUPZJ6SOPB7KCDF/
Since that announcement, this pipeline has been opt-in only. We have observed enough successful builds and ironed out enough bugs that we’re comfortable making this pipeline the default Jenkins job for both ceph.git and ceph-ci.git branches.
This switch will be made Tuesday, August 19 at roughly 3PM Eastern.
There is no additional action required on your part. Some things to remember:
- DWZ is disabled by default
- SCCACHE is enabled by default
- The above behavior, as well as which DISTROs/ARCHs to build for, can be overridden using git trailers
- Builds will be done inside a container
For a refresher on how to use the git trailers and their behavior in our environment, see
https://github.com/ceph/ceph-build/blob/main/ceph-trigger-build/README.md.
David Galloway
Ceph Engineering Labs – Infrastructure Architect
+1 989 295 0091 - Mobile
david.galloway@xxxxxxx
IBM
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx