On 6/4/25 18:15, Mickaël Salaün wrote: > On Wed, Jun 04, 2025 at 01:45:43AM +0100, Tingmao Wang wrote: >> [..] >> @@ -897,10 +898,14 @@ static bool is_access_to_paths_allowed( >> break; >> jump_up: >> if (walker_path.dentry == walker_path.mnt->mnt_root) { >> + /* follow_up gets the parent and puts the passed in path */ >> + path_get(&walker_path); >> if (follow_up(&walker_path)) { >> + path_put(&walker_path); > > path_put() cannot be safely called in a RCU read-side critical section > because it can free memory which can sleep, and also because it can wait > for a lock. However, we can call rcu_read_unlock() before and > rcu_read_lock() after (if we hold a reference). Thanks for pointing this out. Actually I think this might be even more tricky. I'm not sure if we can always rely on the dentry still being there after rcu_read_unlock(), regardless of whether we do a path_get() before unlocking... Even when we're inside a RCU read-side critical section, my understanding is that if a dentry reaches zero refcount and is selected to be freed (either immediately or by LRU) from another CPU, dentry_free will do call_rcu(&dentry->d_u.d_rcu, __d_free) which will cause the dentry to immediately be freed after our rcu_read_unlock(), regardless of whether we had a path_get() before that. In fact because lockref_mark_dead sets the refcount to negative, path_get() would simply be wrong. We could use lockref_get_not_dead() instead, and only continue if we actually acquired a reference, but then we have the problem of not being able to dput() the dentry acquired by follow_up(), without risking it getting killed before we can enter RCU again (although I do wonder if it's possible for it to be killed, given that there is an active mountpoint on it that we hold a reference for?). While we could probably do something like "defer the dput() until we next reach a mountpoint and can rcu_read_unlock()", or use lockref_put_return() and assert that the dentry must still have refcount > 0 since it's an in-use mountpoint, after a lot of thinking it seems to me the only clean solution is to have a mechanism of walking up mounts completely reference-free. Maybe what we actually need is choose_mountpoint_rcu(). That function is private, so I guess a question for Al and other VFS people here is, can we potentially expose an equivalent publicly? (Perhaps it would only do effectively what __prepend_path in d_path.c does, and we can track the mount_lock seqcount outside. Also the fact that throughout all this we have a valid reference to the leaf dentry we started from, to me should mean that the mount can't disappear under us anyway) > >> /* Ignores hidden mount points. */ >> goto jump_up; >> } else { >> + path_put(&walker_path); >> /* >> * Stops at the real root. Denies access >> * because not all layers have granted access. >> @@ -920,11 +925,11 @@ static bool is_access_to_paths_allowed( >> } >> break; >> } >> - parent_dentry = dget_parent(walker_path.dentry); >> - dput(walker_path.dentry); >> + parent_dentry = walker_path.dentry->d_parent; >> walker_path.dentry = parent_dentry; >> } >> - path_put(&walker_path); >> + >> + rcu_read_unlock(); >> >> if (!allowed_parent1) { >> log_request_parent1->type = LANDLOCK_REQUEST_FS_ACCESS; >> @@ -1045,12 +1050,11 @@ static bool collect_domain_accesses( >> layer_masks_dom, >> LANDLOCK_KEY_INODE); >> >> - dget(dir); >> - while (true) { >> - struct dentry *parent_dentry; >> + rcu_read_lock(); >> >> + while (true) { >> /* Gets all layers allowing all domain accesses. */ >> - if (landlock_unmask_layers(find_rule(domain, dir), access_dom, >> + if (landlock_unmask_layers(find_rule_rcu(domain, dir), access_dom, >> layer_masks_dom, >> ARRAY_SIZE(*layer_masks_dom))) { >> /* >> @@ -1065,11 +1069,11 @@ static bool collect_domain_accesses( >> if (dir == mnt_root || WARN_ON_ONCE(IS_ROOT(dir))) >> break; >> >> - parent_dentry = dget_parent(dir); >> - dput(dir); >> - dir = parent_dentry; >> + dir = dir->d_parent; >> } >> - dput(dir); >> + >> + rcu_read_unlock(); >> + >> return ret; >> } >> >> -- >> 2.49.0 >> >>