On 2025-06-25 17:27:02 [+0200], Nam Cao wrote: > > > @@ -1896,21 +1732,30 @@ static int ep_send_events(struct eventpoll *ep, > > > __pm_relax(ws); > > > } > > > > > > - list_del_init(&epi->rdllink); > > > - > > > /* > > > * If the event mask intersect the caller-requested one, > > > * deliver the event to userspace. Again, we are holding ep->mtx, > > > * so no operations coming from userspace can change the item. > > > */ > > > revents = ep_item_poll(epi, &pt, 1); > > > - if (!revents) > > > + if (!revents) { > > > + init_llist_node(n); > > > + > > > + /* > > > + * Just in case epi becomes ready after ep_item_poll() above, but before > > > + * init_llist_node(). Make sure to add it to the ready list, otherwise an > > > + * event may be lost. > > > + */ > > > > So why not llist_del_first_init() at the top? Wouldn't this avoid the > > add below? > > Look at that function: > static inline struct llist_node *llist_del_first_init(struct llist_head *head) > { > struct llist_node *n = llist_del_first(head); > > // BROKEN: another task does llist_add() here for the same node > > if (n) > init_llist_node(n); > return n; > } > > It is not atomic to another task doing llist_add() to the same node. > init_llist_node() would then put the list in an inconsistent state. Okay, I wasn't expecting another llist_add() from somewhere else. Makes sense. > To be sure, I tried your suggestion. Systemd sometimes failed to boot, and > my stress test crashed instantly. I had a trace_printk() there while testing and it never triggered. > > > > > + if (unlikely(ep_item_poll(epi, &pt, 1))) { > > > + ep_pm_stay_awake(epi); > > > + epitem_ready(epi); > > > + } > > > continue; > > > + } > > > > > > events = epoll_put_uevent(revents, epi->event.data, events); > > > if (!events) { > > > - list_add(&epi->rdllink, &txlist); > > > - ep_pm_stay_awake(epi); > > > + llist_add(&epi->rdllink, &ep->rdllist); > > > > That epitem_ready() above and this llist_add() add epi back where it was > > retrieved from. Wouldn't it loop in this case? > > This is the EFAULT case, we are giving up, therefore we put the item back > and bail out. Therefore no loop. Right. > If we have already done at least one item, then we report that to user. If > none, then we report -EFAULT. Regardless, this current item is not > "successfully consumed", so we put it back for the others to take it. We > are done here. > > > I think you can avoid the add above and here adding it to txlist would > > avoid the loop. (It returns NULL if the copy-to-user failed so I am not > > sure why another retry will change something but the old code did it, > > too so). > > > > > if (!res) > > > res = -EFAULT; > > > break; > > > > One note: The old code did "list_add() + ep_pm_stay_awake()". Now you do > > "ep_pm_stay_awake() + epitem_ready()". epitem_ready() adds the item > > conditionally to the list so you may do ep_pm_stay_awake() without > > adding it to the list because it already is. Looking through > > ep_pm_stay_awake() it shouldn't do any harm except incrementing a > > counter again. > > Yes, it shouldn't do any harm. > > Thanks for reviewing, I know this lockless thing is annoying to look at. but it looks now a bit smaller :) > Nam Sebastian