On Wed, Jun 25, 2025 at 05:33:54PM +0200, Sebastian Andrzej Siewior wrote:
> On 2025-06-25 17:27:02 [+0200], Nam Cao wrote:
> > > > @@ -1896,21 +1732,30 @@ static int ep_send_events(struct eventpoll *ep,
> > > >  			__pm_relax(ws);
> > > >  		}
> > > >
> > > > -		list_del_init(&epi->rdllink);
> > > > -
> > > >  		/*
> > > >  		 * If the event mask intersect the caller-requested one,
> > > >  		 * deliver the event to userspace. Again, we are holding ep->mtx,
> > > >  		 * so no operations coming from userspace can change the item.
> > > >  		 */
> > > >  		revents = ep_item_poll(epi, &pt, 1);
> > > > -		if (!revents)
> > > > +		if (!revents) {
> > > > +			init_llist_node(n);
> > > > +
> > > > +			/*
> > > > +			 * Just in case epi becomes ready after ep_item_poll() above, but before
> > > > +			 * init_llist_node(). Make sure to add it to the ready list, otherwise an
> > > > +			 * event may be lost.
> > > > +			 */
> > >
> > > So why not llist_del_first_init() at the top? Wouldn't this avoid the
> > > add below?
> >
> > Look at that function:
> > 	static inline struct llist_node *llist_del_first_init(struct llist_head *head)
> > 	{
> > 		struct llist_node *n = llist_del_first(head);
> >
> > 		// BROKEN: another task does llist_add() here for the same node
> >
> > 		if (n)
> > 			init_llist_node(n);
> > 		return n;
> > 	}
> >
> > It is not atomic to another task doing llist_add() to the same node.
> > init_llist_node() would then put the list in an inconsistent state.
>
> Okay, I wasn't expecting another llist_add() from somewhere else. Makes
> sense.

Sorry, it's been a few weeks and I misremembered: that wasn't the reason.
epitem_ready() is atomic with respect to llist_del_first_init().

The actual reason is that llist_del_first_init() would allow another
llist_add() to happen while we are still looping. So in later loop
iterations, we could see the same item again and incorrectly report more
events than are actually available.

Thus, init_llist_node() doesn't happen until we are done looping.

> > To be sure, I tried your suggestion. Systemd sometimes failed to boot, and
> > my stress test crashed instantly.
>
> I had a trace_printk() there while testing and it never triggered.

This code path is only executed for broken userspace.

Nam