Re: [PATCH v10 06/14] unwind_user/deferred: Add deferred unwinding interface

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Thu, 19 Jun 2025 09:50:08 +0200

On Wed, Jun 18, 2025 at 03:09:15PM -0400, Steven Rostedt wrote:
> On Wed, 18 Jun 2025 20:46:20 +0200
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> 
> > > +struct unwind_work;
> > > +
> > > +typedef void (*unwind_callback_t)(struct unwind_work *work, struct unwind_stacktrace *trace, u64 timestamp);
> > > +
> > > +struct unwind_work {
> > > +	struct list_head		list;  
> > 
> > Does this really need to be a list? Single linked list like
> > callback_head not good enough?
> 
> Doesn't a list head make it easier to remove without having to iterate the
> list?

Yeah, but why would you ever want to remove it? You asked for an unwind,
you get an unwind, no?

> > >  static __always_inline void unwind_exit_to_user_mode(void)
> > >  {
> > >  	if (unlikely(current->unwind_info.cache))
> > >  		current->unwind_info.cache->nr_entries = 0;
> > > +	current->unwind_info.timestamp = 0;  
> > 
> > Surely clearing that timestamp is only relevant when there is a cache
> > around? Better to not add this unconditional write to the exit path.
> 
> That's actually not quite true. If the allocation fails, we still want to
> clear the timestamp. But later patches add more data to check and it does
> exit out if there's been no requests:

Well, you could put in an error value on alloc fail or somesuch. Then
its non-zero.

> But for better reviewing, I could add a comment in this patch that states
> that this will eventually exit out early when it does more work.

You're making this really hard to review, why not do it right from the
get-go?

> > > +/* Guards adding to and reading the list of callbacks */
> > > +static DEFINE_MUTEX(callback_mutex);
> > > +static LIST_HEAD(callbacks);  
> > 
> > Global state.. smells like failure.
> 
> Yes, the unwind infrastructure is global, as it is the way tasks know what
> tracer's callbacks to call.

Well, that's apparently how you've set it up. I don't immediately see
this has to be like this.

And there's no comments no nothing.

I don't see why you can't have something like:

struct unwind_work {
	struct callback_head task_work;
	void *data;
	void (*func)(struct unwind_work *work, void *data);
};

void unwind_task_work_func(struct callback_head *task_work)
{
	struct unwind_work *uw = container_of(task_work, struct unwind_work, task_work);

	// do actual unwind

	uw->func(uw, uw->data);
}

or something along those lines. No global state involved.

> > > +	guard(mutex)(&callback_mutex);
> > > +	list_for_each_entry(work, &callbacks, list) {
> > > +		work->func(work, &trace, timestamp);
> > > +	}  
> > 
> > So now you're globally serializing all return-to-user instances. How is
> > that not a problem?
> 
> It was the original way we did things. The next patch changes this to SRCU.
> But it requires a bit more care. For breaking up the series, I preferred
> not to add that logic and make it a separate patch.
> 
> For better reviewing, I'll add a comment here that says:
> 
> 	/* TODO switch this global lock to SRCU */

Oh ffs :-(

So splitting up patches is for ease of review, but now you're making
splits that make review harder, how does that make sense?