On 09-Jul-25 10:41, Stefano Garzarella wrote:
On Wed, 9 Jul 2025 at 17:26, Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
On Wed, 9 Jul 2025 at 16:54, Konstantin Shkolnyy <kshk@xxxxxxxxxxxxx> wrote:
I'm seeing a problem on s390 with the new "SOCK_STREAM transport change
null-ptr-deref" test. Here is how it appears to happen:
test_stream_transport_change_client() spins for 2s and sends 70K+
CONTROL_CONTINUE messages to the "control" socket.
test_stream_transport_change_server() spins calling accept() because it
keeps receiving CONTROL_CONTINUE.
When the client exits, the server has received just under 1K of those
70K CONTROL_CONTINUE messages, so it calls accept() again, but the client
has exited, so accept() never returns and the server never exits.
Just to be clear, I was seeing something a bit different.
The accept() in the server is non-blocking, since we set O_NONBLOCK on
the socket, so I see the server looping around a failing accept()
(errno == EAGAIN) while dequeueing the CONTROL_CONTINUE messages;
after 10-15 seconds the server exits in my case.
It seems strange that in your case it blocks, since it should be a
non-blocking call.
It was my mistake. The accept() doesn't block. I've retested it more
carefully and it keeps returning and the loop eventually consumes all
queued CONTROL_CONTINUE messages and quits, as you described.
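For reference, the server loop described above can be sketched in C roughly as follows. This is a minimal single-process illustration under stated assumptions, not the vsock_test code: a pipe stands in for the control socket, the one-byte 'C'/'D' messages and the helpers server_loop() and run_demo() are hypothetical, and a TCP loopback listener replaces AF_VSOCK so it can run anywhere. What it shows is that with O_NONBLOCK set on the listener, accept() fails with EAGAIN instead of blocking, so the server only exits once it has dequeued every queued CONTINUE message.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical one-byte control messages standing in for the test's
 * CONTROL_CONTINUE and a final "done" marker. */
#define MSG_CONTINUE 'C'
#define MSG_DONE     'D'

/* Server side: for every CONTINUE read from the control channel, retry a
 * non-blocking accept(). Returns how many accept() calls failed with
 * EAGAIN/EWOULDBLOCK, or -1 on any other error. */
static int server_loop(int listen_fd, int ctrl_fd)
{
	int eagain = 0;
	char msg;

	for (;;) {
		if (read(ctrl_fd, &msg, 1) != 1)
			return -1;
		if (msg == MSG_DONE)
			return eagain;

		int fd = accept(listen_fd, NULL, NULL);
		if (fd >= 0)
			close(fd);	/* a peer did connect */
		else if (errno == EAGAIN || errno == EWOULDBLOCK)
			eagain++;	/* nothing pending: keep looping */
		else
			return -1;
	}
}

/* Hypothetical driver: queue n CONTINUE messages plus DONE (the "client"
 * side), then run the server loop against a non-blocking TCP loopback
 * listener that nobody connects to. */
static int run_demo(int n)
{
	char buf[4097];
	int ctrl[2], lfd, flags, ret;
	struct sockaddr_in addr;

	if (n < 0 || n >= (int)sizeof(buf))
		return -1;	/* keep the whole queue inside the pipe buffer */
	if (pipe(ctrl) < 0)
		return -1;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */
	if (lfd < 0 || bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    (flags = fcntl(lfd, F_GETFL)) < 0 ||
	    fcntl(lfd, F_SETFL, flags | O_NONBLOCK) < 0) {
		ret = -1;
		goto out;
	}

	/* Client side: everything is queued before the server starts. */
	memset(buf, MSG_CONTINUE, n);
	buf[n] = MSG_DONE;
	if (write(ctrl[1], buf, n + 1) != n + 1) {
		ret = -1;
		goto out;
	}

	ret = server_loop(lfd, ctrl[0]);
out:
	if (lfd >= 0)
		close(lfd);
	close(ctrl[0]);
	close(ctrl[1]);
	return ret;
}
```

For example, run_demo(1000) should return 1000: one EAGAIN per queued message, since no client ever connects, and the loop ends only after the last message has been consumed.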