Hi David,
I was surprised to see this because I'm working on the very same problem.
But since I didn't have a reproducer, I've painstakingly worked through
the map-reload-related code.
I don't know if my changes have fixed the problem but I can post them
for you to try them out. The main reason I would prefer to use my changes
(if they do fix the problem) is that I found quite a few problems with the
map reload not working properly, which led to me spending a bunch of time on
that. One of the changes fixes the valid entry lookup and removes setting
the entry negative in lookup_prune_one_cache(), and I think it also fixes the
devid handling in do_readmap_mount().
I haven't quite finished the series yet, so I'll post it when I have,
hopefully today.
Umm, I probably should give your reproducer a go ... perhaps later ...
Ian
On 1/8/25 23:22, David Disseldorp wrote:
This effectively reverts commit 21ce28d ("autofs-5.1.4 - mark removed
cache entry negative"), which causes the kernel to stall in autofs_wait
for the following workload:
cat > /etc/auto.direct <<EOF
/nfs/share $mount_args ${NFS_SERVER}:/${NFS_SHARE}
EOF
setsid --fork automount --debug --foreground &> /automount.log
sleep 1
touch /test.run
setsid --fork /bin/bash -c \
"while [[ -f /test.run ]]; do df -ia >> /test.log; sleep 1; done"
echo "df loop logging to /test.log"
sleep 2
echo "changing and reloading auto.direct"
echo > /etc/auto.direct
killall -HUP automount
sleep 2
echo "unmounting..."
umount /nfs/share || echo "umount failed"
The current behaviour sees us hit:
handle_packet_missing_direct:1352: can't find map entry for ()
...which doesn't respond to the kernel, triggering the stall.
This approach adds a new MOUNT_FLAG_STALE flag to track removed map
entries, while keeping enough state around to respond in the
handle_packet_missing_direct case.
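As a rough illustration of the idea (not the daemon code itself), the
check can be sketched as below. Only MOUNT_FLAG_STALE and its value come
from the patch; struct demo_mapent, demo_mark_stale() and
demo_should_fail() are hypothetical, simplified stand-ins, and the real
handlers reply to the kernel via ops->send_fail() rather than returning
a boolean.

```c
#include <stdbool.h>
#include <time.h>

/* Flag value taken from the patch; everything else here is hypothetical. */
#define MOUNT_FLAG_STALE 0x2000

/* Hypothetical, simplified stand-in for the daemon's map entry. */
struct demo_mapent {
	unsigned int flags;	/* bit flags, e.g. MOUNT_FLAG_STALE */
	time_t status;		/* negative timeout; 0 if no failure recorded */
};

/* Flag an entry whose map line was removed, but keep it in the cache. */
static void demo_mark_stale(struct demo_mapent *me)
{
	me->flags |= MOUNT_FLAG_STALE;
}

/*
 * Return true when a missing-mount request for this entry should be
 * answered with a failure instead of attempting the mount: either a
 * recent mount failure is still recorded, or the entry is stale.
 */
static bool demo_should_fail(const struct demo_mapent *me, time_t now)
{
	return me->status >= now || (me->flags & MOUNT_FLAG_STALE) != 0;
}
```

The point of keeping the entry around is that the lookup still succeeds,
so the request can be failed explicitly instead of being left unanswered.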
RFC:
- needs further testing (e.g. indirect maps)
- I'm not familiar with the codebase so this may be the wrong approach
- we may need a background job to purge MOUNT_FLAG_STALE entries?
Signed-off-by: David Disseldorp <ddiss@xxxxxxx>
---
daemon/direct.c | 8 ++++++--
daemon/indirect.c | 8 ++++++--
daemon/lookup.c | 11 ++++-------
include/automount.h | 3 +++
4 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/daemon/direct.c b/daemon/direct.c
index 42baac8..5e78c40 100644
--- a/daemon/direct.c
+++ b/daemon/direct.c
@@ -1389,8 +1389,12 @@ int handle_packet_missing_direct(struct autofs_point *ap, autofs_packet_missing_
return 0;
}
- /* Check if we recorded a mount fail for this key */
- if (me->status >= monotonic_time(NULL)) {
+ /*
+ * Check if we recorded a mount fail for this key, or the entry has
+ * been removed.
+ */
+ if (me->status >= monotonic_time(NULL) ||
+ me->flags & MOUNT_FLAG_STALE) {
ops->send_fail(ap->logopt,
ioctlfd, pkt->wait_queue_token, -ENOENT);
ops->close(ap->logopt, ioctlfd);
diff --git a/daemon/indirect.c b/daemon/indirect.c
index 7d4aad7..934bb74 100644
--- a/daemon/indirect.c
+++ b/daemon/indirect.c
@@ -798,8 +798,12 @@ int handle_packet_missing_indirect(struct autofs_point *ap, autofs_packet_missin
me = lookup_source_mapent(ap, pkt->name, LKP_DISTINCT);
if (me) {
- /* Check if we recorded a mount fail for this key */
- if (me->status >= monotonic_time(NULL)) {
+ /*
+ * Check if we recorded a mount fail for this key, or the entry
+ * has been removed.
+ */
+ if (me->status >= monotonic_time(NULL) ||
+ me->flags & MOUNT_FLAG_STALE) {
ops->send_fail(ap->logopt, ap->ioctlfd,
pkt->wait_queue_token, -ENOENT);
cache_unlock(me->mc);
diff --git a/daemon/lookup.c b/daemon/lookup.c
index dc77948..ad0b460 100644
--- a/daemon/lookup.c
+++ b/daemon/lookup.c
@@ -1416,15 +1416,12 @@ void lookup_prune_one_cache(struct autofs_point *ap, struct mapent_cache *mc, ti
if (valid && valid->mc == mc) {
/*
* We've found a map entry that has been removed from
- * the current cache so it isn't really valid. Set the
- * mapent negative to prevent further mount requests
+ * the current cache so it isn't really valid. Flag the
+ * mapent stale to prevent further mount requests
* using the cache entry.
*/
- debug(ap->logopt, "removed map entry detected, mark negative");
- if (valid->mapent) {
- free(valid->mapent);
- valid->mapent = NULL;
- }
+ debug(ap->logopt, "removed map entry detected, mark stale");
+ valid->flags |= MOUNT_FLAG_STALE;
cache_unlock(valid->mc);
valid = NULL;
}
diff --git a/include/automount.h b/include/automount.h
index 9548db8..007d020 100644
--- a/include/automount.h
+++ b/include/automount.h
@@ -548,6 +548,9 @@ struct kernel_mod_version {
/* Indicator for applications to ignore the mount entry */
#define MOUNT_FLAG_IGNORE 0x1000
+/* map has been removed, but we can't clean up yet */
+#define MOUNT_FLAG_STALE 0x2000
+
struct autofs_point {
pthread_t thid;
char *path; /* Mount point name */