I believe I have a couple of adjustments to the counters that make things
flow properly again, including through numerous xfstests runs on top of
6.15-rc6.

I guess we had this bug all along; I'm glad Dave's patch uncovered it.
Still, I think Dave's patch probably should have been pulled during a
merge window rather than halfway through rc7. Maybe it got talked about
a lot and I missed it. I don't see where it has caused any other
problems, but 6.14 is what Fedora 42 ships, so orangefs is broken there.

-Mike

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 5ac743c6bc2e..08a6f372a352 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -32,12 +32,13 @@ static int orangefs_writepage_locked(struct folio *folio,
 	len = i_size_read(inode);
 	if (folio->private) {
 		wr = folio->private;
-		WARN_ON(wr->pos >= len);
 		off = wr->pos;
-		if (off + wr->len > len)
+		if ((off + wr->len > len) && (off <= len))
 			wlen = len - off;
 		else
 			wlen = wr->len;
+		if (wlen == 0)
+			wlen = wr->len;
 	} else {
 		WARN_ON(1);
 		off = folio_pos(folio);
@@ -46,8 +47,6 @@ static int orangefs_writepage_locked(struct folio *folio,
 		if (wlen > len - off)
 			wlen = len - off;
 	}
-	/* Should've been handled in orangefs_invalidate_folio. */
-	WARN_ON(off == len || off + wlen > len);
 
 	WARN_ON(wlen == 0);
 	bvec_set_folio(&bv, folio, wlen, offset_in_folio(folio, off));
@@ -320,6 +319,8 @@ static int orangefs_write_begin(struct file *file,
 			wr->len += len;
 			goto okay;
 		} else {
+			wr->pos = pos;
+			wr->len = len;
 			ret = orangefs_launder_folio(folio);
 			if (ret)
 				return ret;

On Wed, Apr 30, 2025 at 5:06 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 4/30/25 13:43, Mike Marshall wrote:
> > [ 1991.319111] orangefs_writepage_locked: wr->pos:0: len:4080:
> > [ 1991.319450] service_operation: file_write returning: 0 for 0000000018e1923a.
> > [ 1991.319457] orangefs_writepage_locked: wr->pos:4080: len:4080:
>
> Is that consistent with an attempt to write 4080 bytes that failed,
> returned a 0 and then encountered the WARN_ON()?
>
> While I guess it's possible that userspace might be trying to write
> 4080 bytes twice, the wr->pos:4080 looks suspicious. Is it possible
> that wr->pos inadvertently got set to 4080 during the write _failure_?
> Then, the write (aiming to write the beginning of the file) retries
> but pos==4080 and not 0.
>
> > [ 1991.319581] Call Trace:
> > [ 1991.319583] <TASK>
> ...
> > [ 1991.319613] orangefs_launder_folio+0x2e/0x50 [orangefs]
> > [ 1991.319619] orangefs_write_begin+0x87/0x150 [orangefs]
> > [ 1991.319624] generic_perform_write+0x81/0x280
> > [ 1991.319627] generic_file_write_iter+0x5e/0xe0
> > [ 1991.319629] orangefs_file_write_iter+0x44/0x50 [orangefs]
> > [ 1991.319633] vfs_write+0x240/0x410
> > [ 1991.319636] ksys_write+0x52/0xc0
> > [ 1991.319638] do_syscall_64+0x62/0x180
> > [ 1991.319640] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 1991.319643] RIP: 0033:0x7f218b134f44
>
> This is the path I was expecting. Note that my hackish patch will just
> lift the old (pre-regression) faulting from generic_file_write_iter()
> up to its caller: orangefs_file_write_iter().
>
> So now I'm doubly curious if that also hides the underlying bug.
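For anyone reviewing the first hunk, here is a small userspace model of how the patched orangefs_writepage_locked() picks the write length when folio->private holds a write range. It's only a sketch built from the logic in the diff: patched_wlen() is a hypothetical helper, not a kernel function, long long stands in for the kernel's loff_t/size_t, and the sanity assert is an assumption of the model, not something the kernel checks.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the write-length selection in the
 * patched orangefs_writepage_locked(), folio->private case only.
 *   off    - wr->pos, start of the dirty range
 *   wr_len - wr->len, length of the dirty range
 *   i_size - file size, as i_size_read() would return it
 * long long stands in for the kernel's loff_t/size_t here.
 */
long long patched_wlen(long long off, long long wr_len, long long i_size)
{
	long long wlen;

	/* Model-only assumption: a recorded dirty range is non-empty. */
	assert(off >= 0 && wr_len > 0 && i_size >= 0);

	/* Trim to EOF only when the range starts at or before i_size. */
	if ((off + wr_len > i_size) && (off <= i_size))
		wlen = i_size - off;
	else
		wlen = wr_len;

	/*
	 * A range starting exactly at EOF trims to 0 bytes; fall back to
	 * wr->len so the write is not dropped.
	 */
	if (wlen == 0)
		wlen = wr_len;

	return wlen;
}
```

With i_size at 4080, patched_wlen(0, 8192, 4080) trims to 4080, while patched_wlen(4080, 16, 4080), a range starting right at EOF similar to the wr->pos:4080 case quoted above, falls back to 16 where the old code would have hit WARN_ON(wr->pos >= len). A range entirely past EOF, such as patched_wlen(5000, 16, 4080), skips the trim and also yields 16.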