On 5/26/25 17:06, Jens Axboe wrote:
> On 5/26/25 7:05 AM, Jens Axboe wrote:
>> On 5/25/25 1:12 PM, Vlastimil Babka wrote:
>>
>> Thanks for taking a look at this! I tried to reproduce this this morning
>> and failed miserably. I then injected a delay for the above case, and it
>> does indeed then trigger for me. So far, so good.
>>
>> I agree with your analysis, we should only be doing the dropbehind for a
>> non-zero return from __folio_end_writeback(), and that includes the
>> test_and_clear to avoid dropping the drop-behind state. But we also need
>> to check/clear this state pre __folio_end_writeback(), which then puts
>> us in a spot where it needs to potentially be re-set. Which fails pretty
>> racy...
>>
>> I'll ponder this a bit. Good thing fsx got RWF_DONTCACHE support, or I
>> suspect this would've taken a while to run into.
>
> Took a closer look... I may be smoking something good here, but I don't
> see what the __folio_end_writeback() return value has to do with this
> at all. Regardless of what it returns, it should've cleared
> PG_writeback, and in fact the only thing it returns is whether or not we
> had anyone waiting on it. Which should have _zero_ bearing on whether or
> not we can clear/invalidate the range.

Yeah, it's very much possible that I was wrong.
folio_xor_flags_has_waiters() looked a bit impenetrable to me, and it
seemed like a simple explanation for the splats. But as you had to add
delays, this indeed smells like a race.

> To me, this smells more like a race of some sort, between dirty and
> invalidation. fsx does a lot of sub-page sized operations.
>
> I'll poke a bit more...
>