Re: parallel pahole hangs while building modules from nvidia-open-kernel-dkms

On 3/25/25 2:10 AM, Domenico Andreoli wrote:
> Hi,
>
>   This is a forward of Debian bug report [0] where you can find more
> details. At [1] and [2] you can get the kernel and module to reproduce.
> I could reproduce on both amd64 and arm64 using pahole 1.29.
>
> This is marked as serious severity because it makes the autobuilder hang
> as well [3].
>
> Could you please help?
>
> Regards,
> Domenico

Hi Domenico, thanks for the bug report.

I debugged the hang, and it appears that "abort" handling in case
of a BTF encoding error was overlooked in recent changes to speed up
parallel encoding.

Could you please try the diff below, and check if it resolves the
hang?


diff --git a/dwarf_loader.c b/dwarf_loader.c
index 84122d0..e1ba7bc 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3459,6 +3459,7 @@ static struct {
 	 */
 	uint32_t next_cu_id;
 	struct list_head jobs;
+	bool abort;
 } cus_processing_queue;
 
 enum job_type {
@@ -3479,6 +3480,7 @@ static void cus_queue__init(void)
 	pthread_cond_init(&cus_processing_queue.job_added, NULL);
 	INIT_LIST_HEAD(&cus_processing_queue.jobs);
 	cus_processing_queue.next_cu_id = 0;
+	cus_processing_queue.abort = false;
 }
 
 static void cus_queue__destroy(void)
@@ -3535,8 +3537,9 @@ static struct cu_processing_job *cus_queue__enqdeq_job(struct cu_processing_job
 		pthread_cond_signal(&cus_processing_queue.job_added);
 	}
 	for (;;) {
+		bool abort = __atomic_load_n(&cus_processing_queue.abort, __ATOMIC_SEQ_CST);
 		job = cus_queue__try_dequeue();
-		if (job)
+		if (job || abort)
 			break;
 		/* No jobs or only steals out of order */
 		pthread_cond_wait(&cus_processing_queue.job_added, &cus_processing_queue.mutex);
@@ -3653,6 +3656,9 @@ static void *dwarf_loader__worker_thread(void *arg)
 
 	while (!stop) {
 		job = cus_queue__enqdeq_job(job);
+		if (!job)
+			goto out_abort;
+
 		switch (job->type) {
 
 		case JOB_DECODE:
@@ -3688,6 +3694,8 @@ static void *dwarf_loader__worker_thread(void *arg)
 
 	return (void *)DWARF_CB_OK;
 out_abort:
+	__atomic_store_n(&cus_processing_queue.abort, true, __ATOMIC_SEQ_CST);
+	pthread_cond_signal(&cus_processing_queue.job_added);
 	return (void *)DWARF_CB_ABORT;
 }
 
@@ -4028,7 +4036,7 @@ static int cus__process_file(struct cus *cus, struct conf_load *conf, int fd,
 
 	/* Process the one or more modules gleaned from this file. */
 	int err = dwfl_getmodules(dwfl, cus__process_dwflmod, &parms, 0);
-	if (err < 0)
+	if (err)
 		return -1;
 
 	// We can't call dwfl_end(dwfl) here, as we keep pointers to strings
-- 
2.48.1
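
For readers who want to see the pattern in isolation, here is a minimal
standalone sketch of the idea behind the diff above. It is not pahole
code: struct demo_queue, demo_worker() and demo_run() are hypothetical
names made up for illustration, and the job chain is a stand-in for the
ordered decode/encode jobs. The sketch also assumes a simplification:
it wakes waiters with pthread_cond_broadcast() and reads the flag under
the mutex, whereas the diff uses an atomic flag and pthread_cond_signal().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_THREADS 16
#define DEMO_OK    ((void *)0)
#define DEMO_ABORT ((void *)1)

/* Hypothetical miniature queue; not the pahole data structure. */
struct demo_queue {
	pthread_mutex_t mutex;
	pthread_cond_t job_added;
	int pending;	/* jobs currently waiting to be picked up */
	int taken;	/* jobs handed out so far */
	int total;	/* length of the job chain */
	int fail_on;	/* job number that simulates an error, 0 for none */
	bool done;	/* the last job finished normally */
	bool abort;	/* some worker failed, everyone should leave */
};

static void *demo_worker(void *arg)
{
	struct demo_queue *q = arg;

	for (;;) {
		int job;

		pthread_mutex_lock(&q->mutex);
		/* Without the abort check, waiters would sleep forever
		 * once the failing worker stops feeding the queue. */
		while (q->pending == 0 && !q->abort && !q->done)
			pthread_cond_wait(&q->job_added, &q->mutex);
		if (q->abort || q->done) {
			void *ret = q->abort ? DEMO_ABORT : DEMO_OK;

			pthread_mutex_unlock(&q->mutex);
			return ret;
		}
		q->pending--;
		job = ++q->taken;
		pthread_mutex_unlock(&q->mutex);

		/* The job would be processed here, outside the lock. */

		pthread_mutex_lock(&q->mutex);
		if (job == q->fail_on) {
			/* Simulated error: raise the flag, wake all waiters. */
			q->abort = true;
			pthread_cond_broadcast(&q->job_added);
			pthread_mutex_unlock(&q->mutex);
			return DEMO_ABORT;
		}
		if (job == q->total)
			q->done = true;
		else
			q->pending++;	/* enqueue the follow-up job */
		pthread_cond_broadcast(&q->job_added);
		pthread_mutex_unlock(&q->mutex);
	}
}

/* Runs nthreads workers over a chain of total jobs (total >= 1) and
 * returns how many workers exited with DEMO_ABORT. */
int demo_run(int nthreads, int total, int fail_on)
{
	struct demo_queue q = { .pending = 1, .total = total, .fail_on = fail_on };
	pthread_t tids[DEMO_MAX_THREADS];
	int aborted = 0;

	assert(nthreads > 0 && nthreads <= DEMO_MAX_THREADS);
	pthread_mutex_init(&q.mutex, NULL);
	pthread_cond_init(&q.job_added, NULL);
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, demo_worker, &q);

		assert(rc == 0);
	}
	for (int i = 0; i < nthreads; i++) {
		void *ret = DEMO_OK;

		pthread_join(tids[i], &ret);
		if (ret == DEMO_ABORT)
			aborted++;
	}
	pthread_cond_destroy(&q.job_added);
	pthread_mutex_destroy(&q.mutex);
	return aborted;
}
```

Calling demo_run(4, 10, 3) makes the third job fail, and all four
workers return DEMO_ABORT instead of blocking in pthread_cond_wait().
With fail_on set to 0 the chain completes and no worker aborts. Build
with -pthread.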


>
>
> This simplified (sequential) command succeeds:
>
> cp nvidia-modeset.base.ko nvidia-modeset.ko
> LLVM_OBJCOPY="x86_64-linux-gnu-objcopy" pahole -J --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_features=distilled_base --btf_base vmlinux nvidia-modeset.ko -j1
> echo $?
>
> producing this output:
> ===== 8< =====
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> Unsupported DW_TAG_reference_type(0x10): type: 0x28172
> Error while encoding BTF.
> 0
> ===== >8 =====
>
>
> While this (parallel) command hangs:
>
> cp nvidia-modeset.base.ko nvidia-modeset.ko
> LLVM_OBJCOPY="x86_64-linux-gnu-objcopy" pahole -J --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_features=distilled_base --btf_base vmlinux nvidia-modeset.ko -j2
> echo $?
>
> producing this output:
> ===== 8< =====
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> Unsupported DW_TAG_reference_type(0x10): type: 0x28172
> Error while encoding BTF.
> Terminated
> 143
> ===== >8 =====

Please note that even though the sequential command exits with status 0,
the BTF output is going to be incomplete (and potentially invalid). The
underlying issue is a DW_TAG that the BTF encoder does not handle
(DW_TAG_reference_type in the output above). The encoding process exits
on errors like this.

It would be nice if you provided all the input (base vmlinux and the
module) that led to this error, so we could investigate.

Thank you!

>
>
> [0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1100503
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1100503;filename=vmlinux.zst;msg=19
> [2] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1100503;filename=nvidia-modeset.base.ko.zst;msg=12
> [3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1101262
>




