[PATCH 2/2] eventpoll: Fix epoll_wait() reporting false negatives

ep_events_available() checks for available events by looking at ep->rdllist
and ep->ovflist. However, this check is done without holding a lock, so the
returned value is not reliable: it is possible for both checks on
ep->rdllist and ep->ovflist to come up false while ep_start_scan() or
ep_done_scan() is executing on another CPU, even though events are
available.
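
To make the window concrete, here is a sketch of the race (not verbatim
kernel code; based on the current mainline helpers, which may differ
between versions):

  /*
   * Lockless check: neither load is protected by ep->lock.
   */
  static inline int ep_events_available(struct eventpoll *ep)
  {
          return !list_empty_careful(&ep->rdllist) ||
                  READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
  }

  /*
   * Meanwhile ep_done_scan(), under ep->lock, roughly does:
   *   1. move entries from ep->ovflist back onto ep->rdllist
   *   2. WRITE_ONCE(ep->ovflist, EP_UNACTIVE_PTR)
   *   3. list_splice(txlist, &ep->rdllist)
   * A lockless reader running between steps 2 and 3 sees an empty
   * rdllist and an inactive ovflist, i.e. "no events", while the
   * ready events are still sitting on txlist. ep_start_scan() has a
   * similar window between emptying rdllist (onto txlist) and
   * marking ovflist active.
   */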

This bug can be observed by:

  1. Create an eventpoll with at least one ready level-triggered event

  2. Create multiple threads that call epoll_wait() with a zero timeout.
     The threads do not consume the events, so every epoll_wait() should
     return at least one event.

If one thread executes ep_events_available() while another thread executes
ep_start_scan() or ep_done_scan(), epoll_wait() may wrongly return no
events to the former thread.

This reproducer is implemented as TEST(epoll65) in
tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
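
For illustration, a minimal standalone sketch of the same idea (a
hypothetical program, not the selftest itself; the authoritative
reproducer is TEST(epoll65)):

  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/epoll.h>
  #include <sys/eventfd.h>

  static int epfd;

  static void *waiter(void *arg)
  {
          struct epoll_event ev;
          int i;

          for (i = 0; i < 100000; i++) {
                  /* Zero timeout; the event is never consumed, so
                   * every call should report it. */
                  if (epoll_wait(epfd, &ev, 1, 0) < 1) {
                          fprintf(stderr, "false negative\n");
                          exit(1);
                  }
          }
          return NULL;
  }

  int main(void)
  {
          struct epoll_event ev = { .events = EPOLLIN };
          /* Non-zero counter: always readable, level-triggered. */
          int efd = eventfd(1, 0);
          pthread_t t[4];
          int i;

          epfd = epoll_create1(0);
          epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);

          for (i = 0; i < 4; i++)
                  pthread_create(&t[i], NULL, waiter, NULL);
          for (i = 0; i < 4; i++)
                  pthread_join(t[i], NULL);
          return 0;
  }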

Fix it by skipping the ep_events_available() check and calling
ep_try_send_events() directly.

epoll_sendevents() (used by io_uring) suffers from the same problem; fix
that as well.

There is still ep_busy_loop(), which uses ep_events_available() without a
lock, but that is probably acceptable for busy-polling.

Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout")
Fixes: ae3a4f1fdc2c ("eventpoll: add epoll_sendevents() helper")
Signed-off-by: Nam Cao <namcao@xxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---
 fs/eventpoll.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0fbf5dfedb24..541481eafc20 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2022,7 +2022,7 @@ static int ep_schedule_timeout(ktime_t *to)
 static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		   int maxevents, struct timespec64 *timeout)
 {
-	int res, eavail, timed_out = 0;
+	int res, eavail = 1, timed_out = 0;
 	u64 slack = 0;
 	wait_queue_entry_t wait;
 	ktime_t expires, *to = NULL;
@@ -2041,16 +2041,6 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		timed_out = 1;
 	}
 
-	/*
-	 * This call is racy: We may or may not see events that are being added
-	 * to the ready list under the lock (e.g., in IRQ callbacks). For cases
-	 * with a non-zero timeout, this thread will check the ready list under
-	 * lock and will add to the wait queue.  For cases with a zero
-	 * timeout, the user by definition should not care and will have to
-	 * recheck again.
-	 */
-	eavail = ep_events_available(ep);
-
 	while (1) {
 		if (eavail) {
 			res = ep_try_send_events(ep, events, maxevents);
@@ -2496,9 +2486,7 @@ int epoll_sendevents(struct file *file, struct epoll_event __user *events,
 	 * Racy call, but that's ok - it should get retried based on
 	 * poll readiness anyway.
 	 */
-	if (ep_events_available(ep))
-		return ep_try_send_events(ep, events, maxevents);
-	return 0;
+	return ep_try_send_events(ep, events, maxevents);
 }
 
 /*
-- 
2.39.5




