2023/03/12
Keeping the TCP Window a byte open
Over the weekend, I was adding TCP support to cannelloni, and I used a pattern that only consumes as many bytes from the socket as are needed for the next decoding step. This is done with a decoder built around a state machine: after each call, it returns the number of bytes that must be present before it can be called again to decode the next segment.
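A minimal sketch of that decoder pattern might look like this. Note that the names, states and the 2-byte length prefix are my invention for illustration; cannelloni's actual decoder and wire format differ:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

/* Hypothetical sketch of the decoder pattern: each call consumes exactly
 * the bytes it asked for and returns how many bytes the caller must read
 * before calling it again. */
enum class DecoderState { WantLength, WantPayload };

struct Decoder {
    DecoderState state = DecoderState::WantLength;
    std::size_t expectedBytes = 2;  /* start by asking for a 2-byte length field */
    std::uint16_t payloadLength = 0;
};

/* Consumes len bytes from buf and returns the number of bytes the decoder
 * needs for the next step. */
std::size_t decodeStep(Decoder& d, const std::uint8_t* buf, std::size_t len) {
    switch (d.state) {
    case DecoderState::WantLength:
        assert(len == 2);
        d.payloadLength = static_cast<std::uint16_t>(buf[0] << 8 | buf[1]);
        d.state = DecoderState::WantPayload;
        return d.payloadLength;  /* next read must deliver the whole payload */
    case DecoderState::WantPayload:
        assert(len == d.payloadLength);
        /* ... hand the payload to the application here ... */
        d.state = DecoderState::WantLength;
        return 2;  /* back to waiting for a length field */
    }
    return 0;  /* unreachable */
}
```

The read loop then simply stores the return value as the number of bytes it has to collect before the next call.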
Reading exactly n bytes
A quote from the man page of read(2):
On success, the number of bytes read is returned (zero indicates
end of file), and the file position is advanced by this number.
It is not an error if this number is smaller than the number of
bytes requested; this may happen for example because fewer bytes
are actually available right now (maybe because we were close to
end-of-file, or because we are reading from a pipe, or from a
terminal), or because read() was interrupted by a signal.
This means one needs to pay extra attention to how many bytes have actually been read, not only to how many bytes could be read.
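The classic way to deal with such short reads is a small retry loop that keeps calling read(2) until the requested count has arrived. A sketch, assuming a blocking file descriptor (readExactly is a name I made up; it is not part of cannelloni):

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

/* Keeps calling read(2) until exactly n bytes have arrived, handling
 * short reads and EINTR. Returns n on success, 0 on EOF before n bytes,
 * -1 on error. */
ssize_t readExactly(int fd, void* buf, std::size_t n) {
    std::size_t done = 0;
    while (done < n) {
        ssize_t r = read(fd, static_cast<char*>(buf) + done, n - done);
        if (r < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal, just retry */
            return -1;     /* real error */
        }
        if (r == 0)
            return 0;      /* EOF before n bytes arrived */
        done += static_cast<std::size_t>(r);
    }
    return static_cast<ssize_t>(done);
}
```

A loop like this blocks until the data arrives, though, which is why cannelloni takes a different route.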
Since cannelloni is written in C++ and designed to run on Linux machines, I am using ioctl with FIONREAD to find out how many bytes are available in the read buffer of the socket. If the decoder has requested more bytes than are currently present, the process waits briefly for more bytes to arrive.
The whole process looks like this:
while (1) {
    /* [...] */
    /* check whether we can read enough bytes */
    ssize_t expectedBytes = decoder.expectedBytes;
    int available;
    if (ioctl(socket, FIONREAD, &available) == -1) {
        lerror << "ioctl failed" << std::endl;
        disconnect();
        continue;
    } else if (available > 0 && available < static_cast<int>(expectedBytes)) {
        /* not enough bytes are available, let's wait a bit */
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        continue;
    }
    receivedBytes = read(socket, buffer, expectedBytes);
    if (receivedBytes < 0) {
        lerror << "read error." << std::endl;
        /* close connection */
        disconnect();
        continue;
    } else if (receivedBytes == 0) {
        /* peer closed the connection */
        disconnect();
        continue;
    }
    /* [...] */
    decoder.expectedBytes = decodeFrame(buffer, receivedBytes, &decoder.tempFrame, &decoder.state);
    /* [...] */
}
So far, so good. The code above will not read fewer than expectedBytes from socket, which is exactly the property the decoder relies on.
Load testing slams the window shut
It worked fine under normal operation, but when load testing with cangen vcan0 -c 1 -v -g 0 > /dev/null, which generates 30-40 Mbit/s of CAN traffic on my laptop, things broke quite fast. No data would flow between the two instances of cannelloni, and no frames were bridged between the two SocketCAN interfaces. Even the TCP connection itself was silent apart from regular keep-alives.
The receiving end was waiting for more bytes to become available on the socket, e.g. 2 bytes would sit in the receive buffer when 4 were expected.
This was also visible when inspecting the connection state using ss -t - but what was more confusing were the several hundred kilobytes of data in the Send-Q of the sender. What was going on? One side was waiting for just a few more bytes of data while the other side was sitting on several hundred kilobytes!
I started Wireshark and stared at the output.
Wireshark log of the TCP zero-window condition being reached
The output shows that at some point a zero window is announced by the receiver. This happens when the receiver cannot accept more data, which typically occurs when the application is not consuming data fast enough and the receive buffer (ss: Recv-Q) fills up.
But ss -t clearly shows an almost empty queue instead of a full one!
So I replaced all the code above with a simple loop that reads up to 10 bytes from the socket and then waits 10 milliseconds. This is really inefficient: a perfect bottleneck. With this I could reproduce the zero-window condition without the decoder blocking the connection forever or ending up in weird protocol states.
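The throttled replacement loop can be sketched roughly like this (throttledDrain and its return value are my invention for illustration, not cannelloni's code):

```cpp
#include <chrono>
#include <cstddef>
#include <thread>
#include <unistd.h>

/* Deliberately inefficient reader used to reproduce the zero-window
 * condition: drain at most 10 bytes per iteration, then sleep 10 ms, so
 * the application consumes data far slower than the sender provides it.
 * Returns the total number of bytes drained before EOF or an error. */
std::size_t throttledDrain(int socketFd) {
    char buffer[10];
    std::size_t total = 0;
    while (true) {
        ssize_t received = read(socketFd, buffer, sizeof(buffer));
        if (received <= 0)
            break;  /* error or connection closed */
        total += static_cast<std::size_t>(received);
        /* ... the bytes would be handed to the decoder here ... */
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return total;
}
```

At 30-40 Mbit/s of incoming traffic, draining roughly 1 KB/s guarantees the receive buffer fills up and the window collapses within moments.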
Monitoring the socket again with ss -t, I was able to see that the receive queue would only fill up again once Recv-Q had drained really close to, or all the way to, zero.
Keeping the window open
With the assumption that the Recv-Q actually needs to be empty before the zero-window condition on that connection is lifted again by the kernel, I set TCP_WINDOW_CLAMP to 1, which means that the advertised window will never drop to zero.
const int min_window_size = 1;
if (setsockopt(m_socket, IPPROTO_TCP, TCP_WINDOW_CLAMP, &min_window_size, sizeof(min_window_size))) {
    lerror << "Could not set window size to " << min_window_size << std::endl;
}
The receiving end is now able to fill the receive buffer byte by byte until expectedBytes can be read, which fully resolved the deadlock on localhost while stress testing!
Theory
Most TCP congestion control algorithms like Reno, New Reno, or CUBIC use the round-trip time (RTT) as a key metric to measure the latency between two network endpoints. The RTT affects many variables of the connection, such as the estimated bandwidth, retransmission timeouts, the window size, and possibly also the threshold at which a once-full queue is deemed empty enough to accept new data.
Since the RTT on localhost should be close to zero, the threshold for the Recv-Q should also be relatively close to zero if the RTT is a factor in that calculation.
I couldn't find reliable information on the underlying algorithms without reading the kernel source code; if you know more, shoot me an e-mail and I will update this blog post.