On Fri, 20 Jun 2025 20:44:59 +0200, Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:

> On Fri, 20 Jun 2025 15:05:39 +0200, Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:
>
> > On Fri, 20 Jun 2025 20:14:57 +0900, Akira Yokosawa <akiyks@xxxxxxxxx> wrote:
> >
> > > Mauro!
> > >
> > > On Fri, 20 Jun 2025 09:44:30 +0200, Mauro Carvalho Chehab wrote:
> > > > On Fri, 20 Jun 2025 11:22:48 +0900, Akira Yokosawa <akiyks@xxxxxxxxx> wrote:
> > > >
> > > [...]
> > > >
> > > > I didn't test it yet, but yesterday I wrote a script which allows us
> > > > to test for Sphinx version breakages on multiple versions in one go.
> > > >
> > > > Using it (and again before this patch, but after my parser-yaml
> > > > series), I noticed that 6.0.1 with "-jauto" with those packages:
> > >
> > > Why did you pick 6.0.1, which was in the middle of successive releases
> > > in the early 6.x days?
> >
> > I added all major.minor latest-patch versions since 3.4.3 to the script.
> > I didn't check which of those are shipped by a distro or not.
> >
> > > No distro Sphinx packagers have picked this version.
> >
> > The whole idea is to have a script where we can automate build tests
> > with old versions. Perhaps it makes sense to add a flag to the table
> > indicating which major distros ship which Sphinx version, plus a
> > command line parameter to either test all versions or just the ones
> > shipped on major distros.
> >
> > > Just see the release history:
> > >
> > >   [2022-10-16] 5.3.0  ### stable ###
> > >   [2022-12-29] 6.0.0
> > >   [2023-01-05] 6.0.1
> > >   [2023-01-05] 6.1.0 6.1.1
> > >   [2023-01-07] 6.1.2
> > >   [2023-01-10] 6.1.3  ### stable ###
> > >   [2023-04-23] 6.2.0
> > >
> > > The crash you observed is hardly related to this fix.
> >
> > Almost certainly, the breakage with 6.0.1 is unrelated to this change.
>
> Heh, I'm not even sure whether the problem is with 6.0.1 or with the
> Fedora OOM killer setup...
>
> Even with 64GB of RAM and 8GB of swap(*), I'm getting lots of those:
>
> jun 20 03:23:46 myhost kernel: [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> jun 20 03:23:46 myhost kernel: [   1762]   998  1762     4074      467       96      371         0          77824      144          -900 systemd-oomd
> jun 20 03:23:46 myhost kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.kde.konsole-433443.scope,task=sphinx-build,pid=1043271,uid=1000
> jun 20 03:23:46 myhost kernel: Out of memory: Killed process 1043271 (sphinx-build) total-vm:4222280kB, anon-rss:3934380kB, file-rss:688kB, shmem-rss:0kB, UID:1000 pgtables:7812kB oom_score_adj:200
> jun 20 03:24:28 myhost kernel: sphinx-build invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=200
> jun 20 03:24:28 myhost kernel:  oom_kill_process.cold+0xa/0xbe
>
> Will do some extra tests here and try to adjust this.
>
> (*) Granted, I need more swap... the FS was generated when 8GB
>     were good enough ;-)
>     Still, 64GB of RAM should be enough. Will try to change overcommit
>     and see how it goes.

Yeah, the problem with 6.0.1 was indeed with the OOM killer. Once I added
another 64GB of swap and waited for a long time, it finally completed the
task without crashes:

$ ./scripts/test_doc_build.py -m -V 6.0.1 -v
...
Finished doc build for Sphinx 6.0.1. Elapsed time: 00:31:02

Summary:
Sphinx 6.0.1 elapsed time: 00:31:02
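Just to make it easier to picture what a run like the one above boils down
to, and what the distro flag idea mentioned earlier could look like, here
is a rough sketch. It is only an illustration: the version/distro table,
the venv layout and the build invocation below are assumptions made for
the sake of the example, not the actual code of scripts/test_doc_build.py.

#!/usr/bin/env python3
# Illustration only, NOT the actual scripts/test_doc_build.py: the version
# table with distro tags, the venv layout and the build invocation are
# assumptions made for the sake of the example.

import argparse
import os
import subprocess
import time
import venv
from pathlib import Path

# Hypothetical table: Sphinx version -> major distros known to ship it
# (an empty list meaning "not shipped by any major distro").
SPHINX_VERSIONS = {
    "3.4.3": ["debian-11"],
    "6.1.3": [],
    "7.4.7": ["fedora-40"],
    "8.2.3": [],
}

def build_with(version: str, jobs: str = "auto") -> float:
    """Install the given Sphinx version in a venv and time 'make htmldocs'."""
    venv_dir = Path(f"sphinx-venv-{version}").resolve()
    if not venv_dir.exists():
        venv.create(venv_dir, with_pip=True)
        subprocess.run([venv_dir / "bin" / "pip", "install",
                        f"Sphinx=={version}"], check=True)

    # Put the venv first in PATH so the doc build picks its sphinx-build
    env = dict(os.environ, PATH=f"{venv_dir}/bin:{os.environ['PATH']}")
    start = time.monotonic()
    subprocess.run(["make", f"SPHINXOPTS=-j{jobs}", "htmldocs"],
                   check=True, env=env)
    return time.monotonic() - start

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-V", "--version", help="test only this Sphinx version")
    parser.add_argument("--distro-only", action="store_true",
                        help="test only versions shipped by major distros")
    args = parser.parse_args()

    for version, distros in SPHINX_VERSIONS.items():
        if args.version and version != args.version:
            continue
        if args.distro_only and not distros:
            continue
        secs = int(build_with(version))
        print(f"Sphinx {version} elapsed time: "
              f"{secs // 3600:02d}:{secs % 3600 // 60:02d}:{secs % 60:02d}")

if __name__ == "__main__":
    main()

With a table like that, a "--distro-only" switch would skip versions such
as 6.0.1 entirely, while the default run would still exercise every
major.minor series.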
Looking at the past logs I have handy, this is by far the worst one:

Finished doc build for Sphinx 6.1.3. Elapsed time: 00:11:15
Finished doc build for Sphinx 6.2.1. Elapsed time: 00:09:21
Finished doc build for Sphinx 7.0.1. Elapsed time: 00:09:17
Finished doc build for Sphinx 7.1.2. Elapsed time: 00:09:22
Finished doc build for Sphinx 7.2.3. Elapsed time: 00:09:17
Finished doc build for Sphinx 7.3.7. Elapsed time: 00:09:34
Finished doc build for Sphinx 7.4.7. Elapsed time: 00:04:54
Finished doc build for Sphinx 8.0.2. Elapsed time: 00:03:40
Finished doc build for Sphinx 8.1.3. Elapsed time: 00:03:47
Finished doc build for Sphinx 8.2.3. Elapsed time: 00:03:45

(3.4.3 was the previous "champion", at about 14 minutes.)

All of them were built with "-jauto" on a machine with 24 CPU threads.

The only one that didn't work with my past scenario was 6.0.1, so the OOM
killer seems to be the one to blame: it kills a sub-process but keeps the
main one active, causing Sphinx to run for a long time, only to notice at
the end that something bad happened and produce a completely bogus log.

Heh, systemd-oomd, shame on you!

Thanks,
Mauro