On 12/05/2025 10:17, Tony Ambardar wrote: > On Fri, May 09, 2025 at 11:40:47AM -0700, Andrii Nakryiko wrote: >> On Thu, May 8, 2025 at 6:22 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: >>> >>> When testing v1 of [1] we noticed that functions with 0-sized structs >>> as parameters were not part of BTF encoding; this was fixed in v2. >>> However we need to make sure we handle such zero-sized structs >>> correctly since they confound the calling convention expectations - >>> no registers are used for the empty struct so this has knock-on effects >>> for subsequent register-parameter matching. >> >> Do you have a list (or at least an example) of the function we are >> talking about, just curious to see what's that. >> > > BTW, Alan shared an example in the other pahole patch thread: > https://lore.kernel.org/dwarves/07d92da1-36f3-44d2-a0a4-cf7dabf278c6@xxxxxxxxxx/ > Yep, the one I came across on x86_64 was static int __io_run_local_work(struct io_ring_ctx *ctx, io_tw_token_t tw, int min_events, int max_events); (the io_tw_token_t parameter is a typedef for struct io_tw_state { }; >> The question I have is whether it's safe to assume that regardless of >> architecture we can assume that zero-sized struct has no effect on >> register allocation (which would seem logical, but is that true for >> all ABIs). >> >> BTW, while looking at patch #2, I noticed that >> btf_distill_func_proto() disallows functions returning >> struct-by-value, which seems overly aggressive, at least for structs >> of up to 8 bytes. So maybe if we can validate that both cases are not >> introducing any new quirks across all supported architectures, we can >> solve both limitations? >> > Good idea. I'll try and address this and add a return value test. > Given pahole (and my related patch) assume pass-by-value for well-sized > structs, I'd like to see this too. But while the pahole patch works on > 64/32-bit archs, I noticed from patch #1 that e.g. ___bpf_treg_cnt() > seems to hard-code a 64-bit register size. Perhaps we can fix that too? > So I think your concern is the assumptions __builtin_choose_expr(sizeof(t) == 8, 1, \ __builtin_choose_expr(sizeof(t) == 16, 2, \ ? We may need arch-specific macros that specify register size that we can use here, or is there a better way? >> P.S., oh, and s390x selftest (test_struct_args) isn't happy, please check. Yep, working to repro this locally now, thanks. Alan