Hi,
After noticing this log "rejected association temporarily; comeback
duration 1000 TU (1024 ms)" I started looking more into how the kernel
handles this and noticed a few things:
1. The kernel takes the delay in the association response frame and
waits, but has no sane bounds for how long the wait is. An AP could send
0xffffffff and the kernel will just block for that entire duration.
2. The first issue would appear to be mitigated by the fact that
run_again() only reschedules if the new timeout expires sooner than the
currently pending one, but that check only applies when a timer is
already set.
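To make point 2 concrete, here is a rough user-space model of the guard
as I understand it. This is not the mac80211 code: struct model_timer
and model_run_again() are made-up names, and deadlines are plain
millisecond values instead of jiffies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel timer; not the real timer_list. */
struct model_timer {
	bool pending;        /* is a timer currently armed? */
	uint64_t expires_ms; /* absolute deadline, only valid if pending */
};

/*
 * Arm the timer if nothing is pending, or pull the deadline in if the
 * new one is earlier. A later deadline never replaces a pending one.
 */
static void model_run_again(struct model_timer *t, uint64_t deadline_ms)
{
	if (!t->pending || deadline_ms < t->expires_ms) {
		t->pending = true;
		t->expires_ms = deadline_ms;
	}
}
```

With no timer pending, whatever deadline the AP asks for is accepted
unchanged. With a 200 ms timer already pending, a 10240 ms request is
ignored and the 200 ms deadline stays in place.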
Looking at the code, the association timer gets set when we begin an
association so it _should_ be set when we hit this comeback delay case.
But through testing I found that it is not. After hacking hostapd to use
10000 TUs as the comeback delay, I see this:
[ 4.338185] wlan1: associate with 02:00:00:00:00:00 (try 1/3)
[ 4.340023] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
[ 4.340409] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[ 14.654103] wlan1: associate with 02:00:00:00:00:00 (try 2/3)
[ 14.657405] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
[ 14.658430] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[ 14.848706] wlan1: associate with 02:00:00:00:00:00 (try 3/3)
[ 14.851596] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
[ 14.854269] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
So the first association attempt waited the full 10 seconds. After that
the timer was presumably set, so the second attempt only waited the
default 200 ms (ASSOC_TIMEOUT) before retrying. To me, this feels like
either a bug or an oversight in how this should be handled:
- If the timer should already be set, this is a bug as I see the
kernel waiting excessively.
- If the timer being unset is expected, the kernel should be limiting
this wait to something reasonable.
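As a sketch of the kind of bound I mean in the second case, something
like the following would do. The helper names are made up for
illustration, and the cap is passed in by the caller because I am not
proposing a specific value here. The only fixed fact is that 1 TU is
1024 microseconds.

```c
#include <stdint.h>

/* Convert a comeback duration in TUs to milliseconds (1 TU = 1024 us). */
static uint64_t comeback_tu_to_ms(uint32_t tu)
{
	/* Widen first so 0xffffffff TU cannot overflow. */
	return ((uint64_t)tu * 1024ULL) / 1000ULL;
}

/* Hypothetical helper: limit the AP-supplied wait to max_ms. */
static uint64_t clamp_comeback_ms(uint32_t tu, uint64_t max_ms)
{
	uint64_t ms = comeback_tu_to_ms(tu);

	return ms > max_ms ? max_ms : ms;
}
```

This reproduces the values in the logs (1000 TU is 1024 ms, 10000 TU is
10240 ms). Unclamped, 0xffffffff TU works out to roughly 50 days.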
I also realize that CMD_ASSOC_COMEBACK was added and userspace gets
notified, but this feels excessive to handle in userspace when the
kernel could instead enforce a sane timeout on its own, without
requiring userspace to disconnect/reconnect when the AP sends an absurd
timeout.
My main concern here is a rogue AP scenario, where the AP can DoS every
client that tries to connect to it.
Thanks,
James