Re: Help needed for scotch FTBFS: illegal instruction of ppc64le - gcc regression?

On Thu, 31 Jul 2025 07:47:00 -0600
Orion Poplawski <orion@xxxxxxxx> wrote:

> On 7/28/25 03:12, Dan Horák wrote:
> > On Mon, 28 Jul 2025 10:40:03 +0200
> > Florian Weimer <fweimer@xxxxxxxxxx> wrote:
> > 
> >> * Dan Horák:
> >>
> >>> On Sun, 27 Jul 2025 21:34:12 +0200
> >>> Sandro Mani <manisandro@xxxxxxxxx> wrote:
> >>>
> >>>> Hi
> >>>>
> >>>> scotch is currently FTBFS on ppc64le (affects the current 7.0.7, the
> >>>> previous 7.0.6, as well as the new 7.0.8 release), failing with [1]
> >>>>
> >>>> gmake[2]: *** [src/libscotch/CMakeFiles/ptscotchf_h.dir/build.make:77: src/include/ptscotchf.h] Illegal instruction (core dumped)
> >>>>
> >>>> 7.0.6 previously successfully built with gcc-0:15.0.1-0.3.fc42.1.ppc64le and now fails with gcc-0:15.1.1-5.fc43.1.ppc64le, so this looks like a gcc regression.
> >>>>
> >>>> Since this is on ppc64le and I have no access to such a machine, how can I debug this?
> >>>
> >>> you have access to a system from
> >>> https://fedoraproject.org/wiki/Test_Machine_Resources_For_Package_Maintainers
> >>> and it can be reproduced there. But it needs the Power10 system (same
> >>> as current koji builders), not the Power9 (pre-DC-migration koji
> >>> builders, it builds there OK).
> >>>
> >>> ppc64le-redhat-linux-gnu-openmpi/src/libscotch/ptdummysizes is the
> >>> crashing binary ...
> >>>
> >>> and running it under gdb gives
> >>>
> >>> ...
> >>> Program received signal SIGILL, Illegal instruction.
> >>> 0x00007ffff774e404 in sbrk () from /lib64/glibc-hwcaps/power10/libc.so.6
> >>> (gdb) where
> >>> #0  0x00007ffff774e404 in sbrk () from /lib64/glibc-hwcaps/power10/libc.so.6
> >>> #1  0x00007ffff787b38c in ucm_fire_mmap_events_internal () from /lib64/libucm.so.0
> >>> #2  0x00007ffff787bd88 in ucm_mmap_test_events_nolock () from /lib64/libucm.so.0
> >>> #3  0x00007ffff78818b8 in ucm_mmap_install () from /lib64/libucm.so.0
> >>> #4  0x00007ffff7881b30 in ucm_mmap_init () from /lib64/libucm.so.0
> >>> #5  0x00007ffff7881c2c in ucm_library_init () from /lib64/libucm.so.0
> >>> #6  0x00007ffff7881cbc in ucm_set_global_opts () from /lib64/libucm.so.0
> >>> #7  0x00007ffff725745c in ucs_init_ucm_opts () from /lib64/libucs.so.0
> >>> #8  0x00007ffff7243fb0 in ucs_init () from /lib64/libucs.so.0
> >>> #9  0x00007ffff7f989bc in call_init (l=<optimized out>, argc=1, argv=0x7fffffffece8, env=0x7fffffffecf8) at dl-init.c:74
> >>> #10 _dl_init (main_map=0x7ffff7ff12f0, argc=1, argv=0x7fffffffece8, env=0x7fffffffecf8) at dl-init.c:121
> >>> #11 0x00007ffff7fc3eb8 in _dl_start_user () from /lib64/ld64.so.2
> >>
> >> The location of the crash:
> >>
> >> Dump of assembler code for function __GI___sbrk:
> >>     0x00007ffff774e400 <+0>:     d1 ff 21 f8     stdu    r1,-48(r1)
> >> => 0x00007ffff774e404 <+4>:     0e 00 10 06     .long 0x610000e
> >>     0x00007ffff774e408 <+8>:     00 00 60 3d     lis     r11,0
> >>     0x00007ffff774e40c <+12>:    ff 7f 6b 61     ori     r11,r11,32767
> >>     0x00007ffff774e410 <+16>:    c7 07 6b 79     sldi.   r11,r11,32
> >>     0x00007ffff774e414 <+20>:    87 f7 6b 65     oris    r11,r11,63367
> >>     0x00007ffff774e418 <+24>:    b8 9e 6b 61     ori     r11,r11,40632
> >>     0x00007ffff774e41c <+28>:    a6 03 69 7d     mtctr   r11
> >>     0x00007ffff774e420 <+32>:    20 04 80 4e     bctr
> >>     0x00007ffff774e424 <+36>:    40 00 01 f8     std     r0,64(r1)
> >>     0x00007ffff774e428 <+40>:    99 61 ff 4b     bl      0x7ffff77445c0 <__brk>
> >>
> >> This was patched by the ucx library.
> >>
> >> The original looks like this:
> >>
> >> 000000000014e400 <__sbrk>:
> >>    14e400:       d1 ff 21 f8     stdu    r1,-48(r1)
> >>    14e404:       0e 00 10 06     plbz    r9,961781       # 2390f9 <__libc_initial>
> >>    14e408:       f5 ac 20 89
> >>    14e40c:       78 1b 62 7c     mr      r2,r3
> >>    14e410:       00 00 09 2c     cmpwi   r9,0
> >>    14e414:       4c 00 82 40     bne     14e460 <__sbrk+0x60>
> >>    14e418:       00 00 23 2c     cmpdi   r3,0
> >>    14e41c:       b0 00 82 40     bne     14e4cc <__sbrk+0xcc>
> >>    14e420:       a6 02 08 7c     mflr    r0
> >>    14e424:       40 00 01 f8     std     r0,64(r1)
> >>    14e428:       99 61 ff 4b     bl      1445c0 <brk>
> >>
> >> So there was a 64-bit instruction bundle at the patched offset, and that
> >> may have been the reason why ucx failed to patch properly.
> > 
> > I agree
> >   
> >> I would very much prefer if there weren't any libraries like ucx in
> >> Fedora that patch glibc merely because you link against them.  It's fine
> >> to do this for debugging tools, but as part of regular execution, it
> >> risks too much breakage.
> > 
> > thanks, Florian, for the insight
> > 
> > IMO this issue is also causing the openmpi build failure in the
> > mass-rebuild
> > - https://koji.fedoraproject.org/koji/taskinfo?taskID=135241418
> > 
> > 
> > 		Dan
> 
> I'm planning on dropping ucx support in openmpi on ppc64le:
> https://src.fedoraproject.org/rpms/openmpi/pull-request/24

ack, makes sense
 
> This should hopefully fix a lot of FTBFS issues with MPI-using packages.
> 
> Thank you very much for the detailed analysis - I would not have known 
> where to start.

right, it is a weird one ...


		Dan
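
For anyone who wants to check the two dumps quoted above by hand, here is a minimal Python sketch. It relies on one fact from Power ISA 3.1: a prefixed (64-bit) instruction starts with a word whose primary opcode (top six bits) is 1. The byte values and offsets are copied from the dumps in this thread; the helper names (word, is_prefix, split_pairs, pcrel_target) are made up for illustration and are not part of ucx, glibc or gdb. It only shows that the prefix word at +4 survived while its suffix at +8 was overwritten, and that the original pair decodes to the __libc_initial address objdump printed. It does not claim to reproduce how ucx chooses its patch offset.

```python
import struct

def word(le_hex: str) -> int:
    """Turn a little-endian byte dump such as '0e 00 10 06' into a 32-bit word."""
    return struct.unpack("<I", bytes.fromhex(le_hex))[0]

def is_prefix(w: int) -> bool:
    """Power ISA 3.1: a prefix word has primary opcode 1 in its top six bits."""
    return (w >> 26) == 1

def split_pairs(before, after):
    """Offsets where a prefix word was kept but the following suffix word changed."""
    hits = []
    for i in range(len(before) - 1):
        off, old = before[i]
        same_prefix = after[i][1] == old
        suffix_changed = after[i + 1][1] != before[i + 1][1]
        if is_prefix(word(old)) and same_prefix and suffix_changed:
            hits.append(off)
    return hits

def pcrel_target(addr: int, prefix: int, suffix: int) -> int:
    """Target of a PC-relative prefixed load: 18 bits from the prefix, 16 from the suffix."""
    d = ((prefix & 0x3FFFF) << 16) | (suffix & 0xFFFF)
    if d & (1 << 33):            # sign-extend the 34-bit displacement
        d -= 1 << 34
    if not prefix & (1 << 20):   # R bit clear: not PC-relative, nothing to compute here
        raise ValueError("prefix word does not request PC-relative addressing")
    return addr + d

# (offset within sbrk, bytes) copied from the objdump and gdb listings in this thread.
original = [(0x0, "d1 ff 21 f8"), (0x4, "0e 00 10 06"),
            (0x8, "f5 ac 20 89"), (0xC, "78 1b 62 7c")]
patched = [(0x0, "d1 ff 21 f8"), (0x4, "0e 00 10 06"),
           (0x8, "00 00 60 3d"), (0xC, "ff 7f 6b 61")]

if __name__ == "__main__":
    print("orphaned prefix at offsets:", [hex(o) for o in split_pairs(original, patched)])
    target = pcrel_target(0x14E404, word("0e 00 10 06"), word("f5 ac 20 89"))
    print("original plbz loads from:", hex(target))
```

With the prefix word left in place, the CPU treats the word that now sits at +8 as its suffix, which would explain both the SIGILL at +4 and gdb printing that word as a bare .long.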