On 6/4/25 03:21, Al Viro wrote: > On Wed, Jun 04, 2025 at 02:12:11AM +0100, Tingmao Wang wrote: >> On 6/4/25 01:55, Al Viro wrote: >>> On Wed, Jun 04, 2025 at 01:45:45AM +0100, Tingmao Wang wrote: >>>> + rename_seqcount = read_seqbegin(&rename_lock); >>>> + if (rename_seqcount % 2 == 1) { >>> >>> Please, describe the condition when that can happen, preferably >>> along with a reproducer. >> >> My understanding is that when a rename is in progress the seqcount is odd, >> is that correct? >> >> If that's the case, then the fs_race_test in patch 2 should act as a >> reproducer, since it's constantly moving the directory. >> >> I can add a comment to explain this, thanks for pointing out. > > Please, read through the header declaring those primitives and read the > documentation it refers to - it's useful for background. Ok, so I didn't realize read_seqbegin actually waits for the seqcount to turn even. I did read the header earlier when following dget_parent but probably misremembered and mixed raw_seqcount_begin with read_seqbegin. > > What's more, look at the area covered by rename_lock - I seriously suspect > that you are greatly overestimating it. Admittedly "when a rename is in progress" is a vague statement. Looking at what takes rename_lock in the code, it's only when we actually do d_move where we take this lock (plus some other places), and the critical section isn't very large, and does not contain any waits etc. If we keep read_seqbegin, then that gives landlock more opportunity to do a reference-less parent walk, but at the expense that a d_move anywhere, even if it doesn't affect anything we're currently looking at, will temporarily block this landlocked application (even if not for very long), and multiple concurrent renames might cause us to wait for longer (but probably won't starve us since we just need one "cycle" where rename seqcount is even). Since we can still safely do a parent walk, just needing to take dentry references on our way, we could simply fallback to that in this situation. i.e. we can use raw_seqcount_begin and keep the seqcount & 1 check. Now, there is the argument that if d_move is very quick, then it might be worth waiting for it to finish, and we will fallback to the original parent walk if the seqcount changes again. I'm not sure which is best, but I'm inclining towards changing this to raw_seqcount_begin, as this is purely an optimization, and we do not _need_ to avoid concurrent renames.