On Fri, Aug 29, 2025 at 01:17:16PM +0300, Alexander Monakov wrote: > > On Fri, 29 Aug 2025, Christian Brauner wrote: > > > On Fri, Aug 29, 2025 at 10:21:35AM +0300, Alexander Monakov wrote: > > > > > > On Wed, 27 Aug 2025, Alexander Monakov wrote: > > > > > > > Dear fs hackers, > > > > > > > > I suspect there's an unfortunate race window in __fput where file locks are > > > > dropped (locks_remove_file) prior to decreasing writer refcount > > > > (put_file_access). If I'm not mistaken, this window is observable and it > > > > breaks a solution to ETXTBSY problem on exec'ing a just-written file, explained > > > > in more detail below. > > > > > > The race in __fput is a problem irrespective of how the testcase triggers it, > > > right? It's just showing a real-world scenario. But the issue can be > > > demonstrated without a multithreaded fork: imagine one process placing an > > > exclusive lock on a file and writing to it, another process waiting on that > > > lock and immediately execve'ing when the lock is released. > > > > > > Can put_file_access be moved prior to locks_remove_file in __fput? > > > > Even if we fix this there's no guarantee that the kernel will give that > > letting the close() of a writably opened file race against a concurrent > > exec of the same file will not result in EBUSY in some arcane way > > currently or in the future. > > Forget Go and execve. Take the two-process scenario from my last email. > The program waiting on flock shouldn't be able to observe elevated > refcounts on the file after the lock is released. It matters not only > for execve, but also for unmounting the underlying filesystem, right? What? No. How?: with details, please. > And maybe other things too. So why not fix the ordering issue in __fput > and if there are other bugs breaking valid uses of flock, fix them too? For locks_remove_file() to be movable after put_file_access() we'd have to prove that no filesystem implementing f_op->lock() doesn't rely on f_op->release() to not have run. It is fundamentally backwards to have run f_ops after f_op->release() ran. Random quick look into 9pfs: static int v9fs_file_do_lock(struct file *filp, int cmd, struct file_lock *fl) { struct p9_flock flock; struct p9_fid *fid; uint8_t status = P9_LOCK_ERROR; int res = 0; struct v9fs_session_info *v9ses; fid = filp->private_data; BUG_ON(fid == NULL); This relies on filp->private_data to be valid which it wouldn't be anymore after f_op->release(). Moving put_file_access() before f_op->release() is also wrong and would require to prove that no filesystem depends on file access not having changed before f_op->release() has run. So no, not a trivial thing to move around. And you are explicitly advertising this as a fix to the go execve problem; both in the bugtracker and here. And it's simply not a good solution. The problem remains exec's deny-write mechanism. But hooking into file locks to serialize exec against writable openers isn't a good solution. It surely is creative though. We can give userspace a simple way to sidestep this problem though as I tried to show.