On 3/20/25 21:54, Cong Wang wrote: > On Thu, Mar 20, 2025 at 01:05:27PM +0100, Michal Luczaj wrote: >> On 3/19/25 23:18, Cong Wang wrote: >>> On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote: >>>> Signal delivery during connect() may lead to a disconnect of an already >>>> established socket. That involves removing socket from any sockmap and >>>> resetting state to SS_UNCONNECTED. While it correctly restores socket's >>>> proto, a call to vsock_bpf_recvmsg() might have been already under way in >>>> another thread. If the connect()ing thread reassigns the vsock transport to >>>> NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE. >>>> >> >> *THREAD 1* *THREAD 2* >> >>>> connect >>>> / state = SS_CONNECTED / >>>> sock_map_update_elem >>>> vsock_bpf_recvmsg >>>> psock = sk_psock_get() >>>> lock sk >>>> if signal_pending >>>> unhash >>>> sock_map_remove_links >>> >>> So vsock's ->recvmsg() should be restored after this, right? Then how is >>> vsock_bpf_recvmsg() called afterward? >> >> I'm not sure I understand the question, so I've added a header above: those >> are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called >> afterwards. It was called before sock_map_remove_links(). Note that at the >> time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still >> executing (in T2). > > I thought the above vsock_bpf_recvmsg() on the right side completed > before sock_map_remove_links(), sorry for the confusion. No problem, I see why you've might. Perhaps deeper indentation would make things clearer. >>>> state = SS_UNCONNECTED >>>> release sk >>>> >>>> connect >>>> transport = NULL >>>> lock sk >>>> WARN_ON_ONCE(!vsk->transport) >>>> >>> >>> And I am wondering why we need to WARN here since we can handle this error >>> case correctly? >> >> The WARN and transport check are here for defensive measures, and to state >> a contract. >> >> But I think I get your point. If we accept for a fact of life that BPF code >> should be able to handle transport disappearing - then WARN can be removed >> (while keeping the check) and this patch can be dropped. > > I am thinking whether we have more elegant way to handle this case, > WARN looks not pretty. Since the case should never happen, I like to think of WARN as a deliberate eyesore :) >> My aim, instead, was to keep things consistent. By which I mean sticking to >> the conditions expressed in vsock_bpf_update_proto() as invariants; so that >> vsock with a psock is guaranteed to have transport assigned. > > Other than the WARN, I am also concerned about locking vsock_bpf_recvmsg() > because for example UDP is (almost) lockless, so enforcing the sock lock > for all vsock types looks not flexible and may hurt performance. > > Maybe it is time to let vsock_bpf_rebuild_protos() build different hooks > for different struct proto (as we did for TCP/UDP)? By UDP you mean vsock SOCK_DGRAM? No need to worry. VMCI is the only transport that features VSOCK_TRANSPORT_F_DGRAM, but it does not implemented read_skb() callback, making it unsupported by BPF/sockmap.