
Re: Association comeback delay behavior


Hi James,

> 1. The kernel takes the delay in the association response frame and 
> waits, but has no sane bounds for how long the wait is. An AP could send 
> 0xffffffff and the kernel will just block for that entire duration.

For some value of "block", it's not really blocking in the (traditional)
threading sense of the word :)


> 2. The first issue would appear to be guarded by the fact that 
> run_again() only reschedules if the new timeout is less than the current 
> time remaining but only if there is an existing timer set.
> 
> Looking at the code, the association timer gets set when we begin an 
> association so it _should_ be set when we hit this comeback delay case. 
> But through testing I found that it is not. Hacking hostapd to use 10000 
> TU's as the comeback delay I see this:
> 
> [    4.338185] wlan1: associate with 02:00:00:00:00:00 (try 1/3)
> [    4.340023] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 
> status=30 aid=0)
> [    4.340409] wlan1: 02:00:00:00:00:00 rejected association 
> temporarily; comeback duration 10000 TU (10240 ms)
> [   14.654103] wlan1: associate with 02:00:00:00:00:00 (try 2/3)
> [   14.657405] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 
> status=30 aid=0)
> [   14.658430] wlan1: 02:00:00:00:00:00 rejected association 
> temporarily; comeback duration 10000 TU (10240 ms)
> [   14.848706] wlan1: associate with 02:00:00:00:00:00 (try 3/3)
> [   14.851596] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 
> status=30 aid=0)
> [   14.854269] wlan1: 02:00:00:00:00:00 rejected association 
> temporarily; comeback duration 10000 TU (10240 ms)
> 
> So the first association attempt waited the full 10 seconds, then after 
> that the timer was presumably set, and we only waited the default 200ms 
> (ASSOC_TIMEOUT). 

That's not exactly how it works: run_again() multiplexes different
things onto the same timer by tracking the various sources. So the
_timer_ might be expiring again, but the actual "assoc handling" part
should only happen after 10000 TU.
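
To illustrate the multiplexing idea, here is a minimal sketch, not the mac80211 code. All names (struct mux_timer, mux_run_again(), mux_fire(), the SRC_* sources) are made up for the example, and time is a plain 64-bit millisecond counter rather than jiffies. Each source keeps its own deadline; the shared timer is only pulled in when a new deadline is earlier, and when it fires only the sources that are actually due get handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event sources sharing one timer. */
enum source { SRC_AUTH, SRC_ASSOC, SRC_COUNT };

struct mux_timer {
	bool armed;                     /* is the shared timer pending? */
	uint64_t expires;               /* when the shared timer fires (ms) */
	bool pending[SRC_COUNT];        /* does this source have a deadline? */
	uint64_t deadline[SRC_COUNT];   /* per-source deadline (ms) */
};

/* Record a per-source deadline; move the shared timer only if this is earlier. */
static void mux_run_again(struct mux_timer *t, enum source src, uint64_t when)
{
	assert(src < SRC_COUNT);
	t->pending[src] = true;
	t->deadline[src] = when;
	if (!t->armed || when < t->expires) {
		t->armed = true;
		t->expires = when;
	}
}

/*
 * The shared timer fired at 'now'. Return a bitmask of the sources that
 * are really due, and re-arm the timer for the earliest remaining one.
 */
static unsigned int mux_fire(struct mux_timer *t, uint64_t now)
{
	unsigned int due = 0;

	t->armed = false;
	for (int i = 0; i < SRC_COUNT; i++) {
		if (!t->pending[i])
			continue;
		if (t->deadline[i] <= now) {
			t->pending[i] = false;
			due |= 1u << i;
		} else if (!t->armed || t->deadline[i] < t->expires) {
			t->armed = true;
			t->expires = t->deadline[i];
		}
	}
	return due;
}
```

With an association deadline at 10240 ms and some other source at 200 ms, the timer fires at 200 ms, reports only the other source, and re-arms itself for 10240 ms. The timer expiring early does not mean the association is retried early.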

> So to me, this feels like either a bug

Yes. I can't reproduce it though:

[    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
[    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
[    4.300000] wlan0: authenticated
[    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
[    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   14.560000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
[   14.560000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[   14.560000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   25.440000] wlan0: associate with 02:00:00:00:00:00 (try 3/3)
[   25.440000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[   25.440000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   36.320000] wlan0: association with 02:00:00:00:00:00 timed out


That last "timed out" should really come earlier though, oops. Let me
fix that:

diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index fa7cf3b8ad59..f4a5deedfaab 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -6383,7 +6383,8 @@ static void ieee80211_rx_mgmt_assoc_resp(struct ieee80211_sub_if_data *sdata,
 
 	if (status_code == WLAN_STATUS_ASSOC_REJECTED_TEMPORARILY &&
 	    elems->timeout_int &&
-	    elems->timeout_int->type == WLAN_TIMEOUT_ASSOC_COMEBACK) {
+	    elems->timeout_int->type == WLAN_TIMEOUT_ASSOC_COMEBACK &&
+	    assoc_data->tries < IEEE80211_ASSOC_MAX_TRIES) {
 		u32 tu, ms;
 
 		cfg80211_assoc_comeback(sdata->dev, assoc_data->ap_addr,


So now I see:

[    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
[    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
[    4.300000] wlan0: authenticated
[    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
[    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   14.560000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
[   14.560000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[   14.560000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   25.440000] wlan0: associate with 02:00:00:00:00:00 (try 3/3)
[   25.440000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[   25.440000] wlan0: 02:00:00:00:00:00 denied association (code=30)
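
The effect of the extra condition can be modelled in isolation. This is only a sketch of the decision, assuming the try counter has already been incremented when the response arrives (as the "try n/3" lines suggest); on_temporary_reject(), the enum and ASSOC_MAX_TRIES are illustrative names, not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

#define ASSOC_MAX_TRIES 3	/* illustrative; matches "try n/3" in the logs */

enum assoc_action { ASSOC_WAIT_AND_RETRY, ASSOC_DENIED };

/*
 * Decide what to do with a "rejected temporarily" response.
 * 'tries' is the number of association attempts sent so far.
 */
static enum assoc_action on_temporary_reject(int tries, bool has_comeback_element)
{
	assert(tries >= 1 && tries <= ASSOC_MAX_TRIES);

	/* Honour the comeback delay only if another attempt is still allowed. */
	if (has_comeback_element && tries < ASSOC_MAX_TRIES)
		return ASSOC_WAIT_AND_RETRY;

	/* Last attempt, or no comeback element: treat it as a plain denial. */
	return ASSOC_DENIED;
}
```

On tries 1 and 2 this waits and retries; on try 3 it reports the denial right away instead of sitting out one more comeback period before timing out.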

>   - If the timer being unset is expected, the kernel should be limiting 
> this wait to something reasonable.

Define "reasonable"? I mean, sure, if it says 0xffffffff we'll even
overflow the calculation and end up trying way too early. And if it says
0x100000 instead, which avoids the overflow both inside the calculation
and in jiffies, we'll wait a very long time:

[    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
[    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
[    4.300000] wlan0: authenticated
[    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
[    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 1048576 TU (1073741 ms)
[ 1078.240000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
[ 1078.240000] wlan0: deauthenticated from 02:00:00:00:00:00 while associating (Reason: 6=CLASS2_FRAME_FROM_NONAUTH_STA)

Long enough, in fact, that hostapd forgot the STA even existed ;-)
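
The arithmetic behind both cases can be checked with a small sketch. It assumes the TU-to-millisecond conversion is done in 32-bit arithmetic as tu * 1024 / 1000, which is consistent with the logged values (10000 TU gives 10240 ms, 1048576 TU gives 1073741 ms) but is an assumption about the exact code. The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* One time unit (TU) is 1024 microseconds. */

/* Assumed conversion in 32-bit arithmetic: the multiplication can wrap. */
static uint32_t tu_to_ms_u32(uint32_t tu)
{
	return tu * 1024u / 1000u;
}

/* Same conversion in 64-bit arithmetic: cannot wrap for any 32-bit input. */
static uint64_t tu_to_ms_u64(uint32_t tu)
{
	return (uint64_t)tu * 1024u / 1000u;
}

/* Self-check against the values visible in the logs. */
static void check_against_logs(void)
{
	assert(tu_to_ms_u32(10000) == 10240);
	assert(tu_to_ms_u32(0x100000) == 1073741);
	/* 0xffffffff wraps, so the 32-bit result is far too small. */
	assert(tu_to_ms_u32(0xffffffffu) < tu_to_ms_u64(0xffffffffu));
}
```

Under that assumption, 0xffffffff TU comes out as 4294966 ms (about 71 minutes) in 32 bits, while the 64-bit result is 4398046510 ms (about 50 days). That is the "trying way too early" case. 0x100000 TU is the largest power of two that still fits: 0x100000 * 1024 is exactly 2^30.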


> I also realize that CMD_ASSOC_COMEBACK was added and userspace gets 
> notified, but this feels excessive to handle in userspace when the 
> kernel could instead enforce a sane timeout all on its own without 
> requiring userspace disconnect/reconnect when the AP sends an absurd 
> timeout.

Define "absurd". Anything much bigger than what I demonstrated above
doesn't actually work properly anyway due to the possible overflows, and
sure, roughly 18 minutes is long, but it doesn't feel "absurd".

I tend to think this is exactly right - the kernel will wait, but since
it's not doing anything else that doesn't really matter. Maybe it'll
work later (earlier tests above), maybe it won't (like when the AP
forgot about the STA above), but it's not like the kernel is holding
some important resource busy for all that time?

And userspace gets notified and gets a choice, so of course it can give
up on the association instead.

And yeah, I used "iw connect -w" here, and it'd be hard to actually work
around it with that, but iw could even make the association socket-owned,
and then it'd probably stop when you hit Ctrl-C. Anyway, nobody really
uses that.


> My main concern here is a rogue AP scenario that can then DoS all your 
> clients that try and connect to it.

Oh, so you're just trying to sell us a missing implementation in iwd as
a kernel security bug? :-)

johannes




