Problems receiving large amounts of data through a Ruby TCP socket

I am sending and receiving JSON data through a TCP socket. It works fine for smaller amounts of data, around 200 bytes or so. But at about 10 KB it only receives part of the data. I have tried all the different calls for reading data from a TCP socket that I can find (read, gets, gets.chomp, recv), but none of them works for all of my tests.
Here is the code I have now:
socket = TCPSocket.new '10.11.50.xx', 13338
response = socket.recv(1000000000)
I have also tried adding a timeout but I could not get it to work:
socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, 1)
I am not sure what I am missing. Any help would be appreciated.

It's badly documented in the Ruby docs, but I think TCPSocket#recv actually just calls the recv system call. That call (see man 2 recv) reads however many bytes from the stream the kernel decides to hand over, though never more than the application specifies. To receive a larger "message", you will need to call it in a loop.
But there is an easier way: because TCPSocket indirectly inherits from the IO class, you get all of its methods for free, including IO#read, which does read exactly as many bytes as you specify (if possible).
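A minimal sketch of both approaches, assuming you already know the message is 10 KB (the figure from the question); use one or the other:

require 'socket'

socket = TCPSocket.new '10.11.50.xx', 13338

# IO#read blocks until exactly this many bytes arrive (or EOF).
payload = socket.read(10_240)

# The equivalent recv loop: keep calling recv until the count is reached.
expected = 10_240
data = String.new
while data.bytesize < expected
  chunk = socket.recv(expected - data.bytesize)
  break if chunk.empty?   # empty string means the peer closed the connection
  data << chunk
end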
You will also need to implement a way to delimit your messages:
use fixed-length messages
send the length of the message up front in a (fixed-size) header (sketched after this list)
use some kind of terminator, e.g. a NULL byte
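For instance, a minimal sketch of the second option using a 4-byte big-endian length header (send_message and receive_message are illustrative helper names, not part of any library):

# Sender: prefix each message with its length as a 32-bit big-endian integer.
def send_message(socket, payload)
  socket.write([payload.bytesize].pack('N') + payload)
end

# Receiver: read the fixed-size header first, then exactly that many bytes.
def receive_message(socket)
  header = socket.read(4) or return nil   # nil means the peer closed
  socket.read(header.unpack1('N'))
end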

Related

Linux filter in Ruby which maintains continuous data flow for a given time by its internal buffer?

For example, the incoming data stream on STDIN can have pauses or gaps of a few minutes during which no data is sent, because the actual data is still being prepared or isn't currently available for some reason.
I would like to write a CLI app in Ruby that reads its own STDIN and writes the data without any modification to its own STDOUT. However, it must maintain an internal buffer to bridge the periods when no data is coming in. In that case it should turn to its internal buffer and provide outgoing data to STDOUT from there, for example 64-1024 bytes per second or even less, until the internal buffer runs out.
I would like to use this filter in the following manner:
producer_app | filter | consumer_app
Without the buffering it is simple:
bufsize = 64 * 1024
while data = STDIN.read(bufsize) do
  STDOUT.write(data)
end
Unfortunately, when there is no data on STDIN for, say, a minute, the STDIN.read(bufsize) call simply blocks further execution for that minute. That is not good in my case, because consumer_app closes and exits as soon as it cannot read any data for 20 seconds (it uploads data to a server, which closes the connection when there is no data to be read for 20 seconds).
I think it should use the STDIN.read_nonblock() call somehow, for example:
data = STDIN.read_nonblock(bufsize) rescue nil
This leaves nil in the data variable when there is nothing to read; otherwise it returns some data of at most bufsize bytes.
Of course we can assume that the total byte count of the incoming data stream is greater than the internal buffer size mentioned above.
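A minimal sketch of such a filter, building on the read_nonblock idea (the reserve size and drip rate below are illustrative assumptions, not values from the question):

bufsize = 64 * 1024
reserve = 16 * 1024   # held back so there is something to drip during pauses
drip    = 512         # bytes per second emitted while STDIN is silent
buffer  = String.new

loop do
  if IO.select([STDIN], nil, nil, 1)   # wait up to one second for input
    begin
      buffer << STDIN.read_nonblock(bufsize)
    rescue IO::WaitReadable
      # spurious wakeup, nothing to read after all
    rescue EOFError
      break
    end
    # Forward everything except the reserve immediately.
    excess = buffer.bytesize - reserve
    STDOUT.write(buffer.slice!(0, excess)) if excess > 0
  elsif !buffer.empty?
    # Input paused: drip from the reserve to keep the consumer alive.
    STDOUT.write(buffer.slice!(0, drip))
  end
  STDOUT.flush
end
STDOUT.write(buffer)   # EOF reached: flush whatever is left
STDOUT.flush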

How to maintain the TCP connection using Ruby?

Using TCPSocket, I have to socket.puts "foobar" and then socket.close before the socket on the other side can socket.read the message.
Is there a way to send or receive a message through a socket without closing it, so that I can send another message without creating a new socket?
P.S. Something like a WebSocket.
If you traverse up the superclass chain you will eventually see that TCPSocket inherits from IO. Most IO objects in Ruby buffer their data so that writes and reads (to disk, for example) are more efficient.
In your case, the buffer never grew large enough (or enough time didn't pass) for it to flush on its own. Closing the socket, however, forced the buffer to flush.
You have a few options (the first two are sketched after this list):
You can manually force the buffer to flush using IO#flush.
You can set the buffer to always sync after a write / read, by setting IO#sync= to true. You can check the status of your IO object's syncing using IO#sync; I'm guessing you'd see socket.sync #=> false
Use BasicSocket#send, which calls POSIX send(2) directly and therefore bypasses Ruby's userspace buffering entirely.
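A minimal sketch of the first two options (host and port are placeholders):

require 'socket'

socket = TCPSocket.new('example.com', 13338)

socket.sync = true             # option 2: flush automatically after every write
socket.puts 'first message'    # arrives without closing the socket
socket.puts 'second message'

socket.sync = false            # or keep buffering...
socket.write('third message')
socket.flush                   # ...and force it out manually (option 1)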
It should not be necessary to close the connection in order for the other party to read the message. send should transfer the data over the connection immediately. Make sure the other party is actually reading from the socket.

About boost::asio::io_service::run with multiple threads

The Boost documentation says that io_service may distribute work across threads in an arbitrary fashion. Does that mean that when I'm using a TCP socket I may receive data out of order, because my receive handler may be distributed across threads in an arbitrary fashion?
When you schedule an async_read or a read using a Boost io_service, you act on a socket, either through socket->read(...) or read(socket, ...). If you look through the documentation, there are some variants that accept a criterion for finishing the read: a number of bytes, or a matching condition. Using these, you could have a connection that gives you, say, 20 bytes of data, and you read 10 bytes into one thread; while that thread is processing the data, the next bytes go to another thread. There are a few cases when you may want to do that, but usually you will want each thread to read in an entire packet.
If you want to ensure that only one thread at a time is handling your IO from a socket, you can wrap the callbacks in a strand. Here's a fairly generic example of what that would look like:
boost::asio::async_read(socket,
    buffer(*responseBuffer),
    transfer_all(),
    strand.wrap(boost::bind(&YourClass::handleRead,
                            this, /* or use shared_from_this */
                            placeholders::error)));

How do you know when all the data has been received by a Winsock control that has issued a POST or GET to a Web Server?

I'm using the VB6 Winsock control. When I do a POST to a server, I get back the response as multiple DataArrival events.
How do you know when all the data has arrived?
(I'm guessing it's when the Winsock_Close event fires)
I have used VB6 Winsock controls in the past, and what I did was format my messages in a certain way to know when all the data has arrived.
Example: Each message starts with a "[" and ends with a "]".
"[Message Text]"
When data comes in from the DataArrival event, check for the end-of-message marker "]". If it is there, you have received at least one whole message, and possibly the start of a new one. If more of the message is still on the way, store your message data in a form-level variable and append to it when the DataArrival event fires the next time.
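The same idea expressed in Ruby for consistency with the rest of this page (the bracket delimiters are from the example above; handle_chunk is an illustrative name):

buffer = String.new

# Call this for every chunk that arrives (the DataArrival equivalent).
handle_chunk = lambda do |chunk|
  buffer << chunk
  # Extract every complete "[...]" message currently in the buffer.
  while (m = buffer.match(/\[(.*?)\]/m))
    message = m[1]
    buffer  = buffer[m.end(0)..]   # keep any trailing partial message
    puts "received: #{message}"    # placeholder for real handling
  end
end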
In HTTP, you have to parse and analyze the reply data that the server is sending back to you in order to know how to read it all.
First, the server sends back a list of CRLF-delimited header lines, which are terminated by a blank CRLF-delimited line by itself. You then have to look at the actual values of the 'Content-Length' and 'Transfer-Encoding' headers to know how to read the remaining data.
If there is no 'Transfer-Encoding' header, or if it does not contain a 'chunked' item in it, then the 'Content-Length' header specifies how many remaining bytes to read. But if the 'Transfer-Encoding' header contains a 'chunked' item, then you have to read and parse the remaining data in chunks, one at a time, in order to know when the data ends (each chunk reports its own size, and the last chunk reports a size of 0).
And no, you cannot rely on the connection being closed after the reply has been sent, unless the 'Connection' header explicitly says 'close'. For HTTP 1.1, that header is usually set to 'keep-alive' instead, which means the socket is left open so the client can send more requests on the same socket.
Read RFC 2616 for more details.
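A minimal sketch of that reading logic in Ruby (read_http_reply is an illustrative name; the chunked branch is left out, and real code should use a proper HTTP client):

def read_http_reply(socket)
  status  = socket.gets("\r\n")
  headers = {}
  # Header lines are CRLF-delimited; a blank CRLF line ends them.
  while (line = socket.gets("\r\n")) && line != "\r\n"
    name, value = line.chomp.split(':', 2)
    headers[name.downcase] = value.to_s.strip
  end
  if headers['transfer-encoding'].to_s.include?('chunked')
    raise NotImplementedError, 'chunk-by-chunk reading not shown here'
  end
  body = socket.read(headers['content-length'].to_i)
  [status, headers, body]
end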
No, the Close event doesn't fire when all the data has arrived, it fires when you close the connection. It's not the Winsock control's job to know when all the data has been transmitted, it's yours. As part of your client/server communication protocol implementation, you have to tell the client what to expect.
Suppose your client wants the contents of a file from the server. The client doesn't know how much data is in the file. The exchange might go something like this:
client sends request for the data in the file
the server reads the file, determines the size, attaches the size (let's say it uses 4 bytes) to the beginning of the data to tell the client how much to expect, and starts sending
your client code knows to strip the first 4 bytes off any data that arrives after a file request and store them as the amount of data that is to follow, then accumulate the subsequent data, through any number of DataArrival events, until it has that amount
Ideally, the server would append a checksum to the data as well, and you'll have to implement some sort of timeout mechanism, figure out what to do if you don't get the expected amount of data, etc.

Optimally reading data from an Asynchronous Socket

I have a problem with a socket library that uses WSAAsyncSelect to put the socket into asynchronous mode. In asynchronous mode the socket is placed into non-blocking mode (WSAEWOULDBLOCK is returned on any operation that would block), and Windows messages are posted to a notification window to inform the application when the socket is ready to be read, written to, etc.
My problem is this: when receiving an FD_READ event, I don't know how many bytes to try to recv. If I pass a buffer that's too small, then Winsock will automatically post another FD_READ event telling me there's more data to read. If data is arriving very fast, this can saturate the message queue with FD_READ messages; and since WM_TIMER and WM_PAINT messages are only posted when the message queue is empty, an application could stop painting if it's receiving a lot of data and using asynchronous sockets with a too-small buffer.
How large should the buffer be, then? I tried using ioctlsocket(FIONREAD) to get the number of bytes to read and make a buffer exactly that large, BUT KB192599 explicitly warns that this approach is fraught with inefficiency.
How do I pick a buffer size that's big enough, but not crazy big?
As far as I could ever work out, the value set using setsockopt with the SO_RCVBUF option is an upper bound on the FIONREAD value. So rather than call ioctlsocket, it should be OK to call getsockopt to find out the SO_RCVBUF setting and use that as the (attempted) size for each recv.
Based on your comment to Aviad P.'s answer, it sounds like this would solve your problem.
(Disclaimer: I have always used FIONREAD myself. But after reading the linked-to KB article I will probably be changing...)
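The same approach expressed in Ruby for consistency with the rest of this page (the answer itself is about Winsock; host and port are placeholders):

require 'socket'

socket = TCPSocket.new('example.com', 80)
# Read the kernel's receive-buffer size and use it as the per-recv size.
rcvbuf = socket.getsockopt(Socket::SOL_SOCKET, Socket::SO_RCVBUF).int
data   = socket.recv(rcvbuf)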
You can set your buffer to be as big as you can without impacting performance, relying on the TCP PUSH flag to make your reads return before filling the buffer if the sender sent a smaller message.
The TCP PUSH flag is set at a logical message boundary (normally after a send operation, unless explicitly set to false). When the receiving end sees the PUSH flag on a TCP packet, it completes any blocking read (or asynchronous read, it doesn't matter) with whatever has accumulated in the receive buffer up to the PUSH point.
So if your sender is sending reasonably sized messages, you're OK; if it isn't, then you limit your buffer size so that even if you read into all of it, you don't negatively impact performance (subjective).
