Linux filter in Ruby that maintains continuous data flow for a given time from its internal buffer?

For example, the incoming data stream on STDIN can have pauses or gaps of a few minutes during which no data is sent, because the actual data is still being prepared or isn't currently available for some reason.
I would like to write a CLI app in Ruby that reads its own STDIN and writes the data without any modification to its own STDOUT. However, it must maintain an internal buffer to cover the periods when no data is coming in. In that case it should fall back to its internal buffer and keep providing outgoing data to STDOUT from that buffer, for example 64-1024 bytes per second or even less, until the internal buffer runs out.
I would like to use this filter in the following manner:
producer_app | filter | consumer_app
Without the buffering it is simple:
bufsize = 64 * 1024
while data = STDIN.read(bufsize)
  STDOUT.write(data)
end
Unfortunately, when there is no data on STDIN for, let's say, a minute, the STDIN.read(bufsize) call simply blocks further execution for that minute, which is not good in my case, because consumer_app closes and exits as soon as it can't read any data for 20 seconds (it uploads data to a server, which closes the connection when there is no data to be read for 20 seconds).
I think it should somehow use the STDIN.read_nonblock() call, for example:
data = STDIN.read_nonblock(bufsize) rescue nil
which returns nil in the data variable when there is no data to read; otherwise it returns some data whose size in bytes is <= bufsize.
Of course we can assume that the total byte count of the incoming data stream is greater than the above-mentioned internal buffer size.
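One possible shape for such a filter, as a minimal untested sketch built around IO.select with a timeout plus read_nonblock (the RESERVE and TRICKLE values are assumptions that would have to be tuned against the consumer's 20-second timeout; this is a sketch, not a drop-in solution):
BUFSIZE = 64 * 1024   # read chunk size
RESERVE = 64 * 1024   # bytes held back to bridge input gaps (assumption)
TRICKLE = 256         # bytes emitted per second while input is idle (assumption)

buffer = ''.b         # binary byte queue

loop do
  if IO.select([STDIN], nil, nil, 1)              # wait up to 1 second for input
    begin
      buffer << STDIN.read_nonblock(BUFSIZE)
    rescue IO::WaitReadable
      # woken up but nothing readable after all; fall through
    rescue EOFError
      STDOUT.write(buffer)                        # producer finished: drain and exit
      break
    end
    # While input is flowing, pass everything through except a small reserve.
    if buffer.bytesize > RESERVE
      STDOUT.write(buffer.slice!(0, buffer.bytesize - RESERVE))
    end
  else
    # Input has been idle for a second: trickle from the reserve so the
    # consumer keeps receiving data until the buffer runs out.
    STDOUT.write(buffer.slice!(0, [TRICKLE, buffer.bytesize].min)) unless buffer.empty?
  end
  STDOUT.flush
end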

Related

Serial Port Communication Protocol

In serial communication with devices such as a digital Multimeter (ex. BK Precision 2831E), why do I need to send a query command once but read the output twice? For instance, I sent a query command for the voltage measured, and received an echo but no value of voltage.
I then sent the query command twice which returned the echo and the measured voltage. In essence, to read out the voltage measured, I had to send the same query command in succession twice.
I do not understand this concept. Can anyone kindly help me out with this reasoning?
I have attached a sample code here below:
def readoutmm(portnumber_multimeter):
    import serial

    ser2 = serial.Serial(
        port="com" + str(portnumber_multimeter),
        baudrate=9600,
        bytesize=serial.EIGHTBITS,
        parity=serial.PARITY_NONE,
        stopbits=serial.STOPBITS_ONE
    )
    ser2.write(b'fetc?\n')     # Query command
    voltage = ser2.readline()  # Returns echo
    voltage = ser2.readline()  # Returns measured voltage
    voltage = float(voltage)
    ser2.close()
    packet = [voltage]
    return packet
This is actually quite common with devices based on RS232/RS485 protocols.
From the manual of the machine you mentioned, I quote:
The character received by the multimeter will be sent back to the controller again. The controller will not send the next character until the last returned character is received correctly from the meter. If the controller fails to receive the character sent back from the meter, the possible reasons are listed as follows:
The serial interface is not connected correctly.
Check if the same baud rate is selected for both the meter and the controller.
When the meter is busy with executing a bus command, it will not accept any character from the serial interface at the same time. So the character sent by controller will be ignored.
In order to make sure the whole command is sent and received correctly, the character without a return character should be sent again by the controller.
On a lot of devices this is actually a setting which you can turn on and off.
Now, as for your question:
why do I need to send a query command once but read the output twice?
You are supposed to read every character back before sending a new one, to validate that the character was received correctly. But in your code you are actually sending all the characters before reading a single one of them back.
In scenarios where you have a reliable connection, your method will work as well, but as a consequence you'll need to read twice: once to validate that the command was received and a second time to retrieve the actual data.
Do keep in mind that read buffers might be limited to a certain size. If you are experiencing unexpected behavior while querying large amounts of data and sending a lot of commands, it might be because these buffers are full.
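As an illustration of the per-character handshake, here is a rough sketch (in Ruby only for consistency with the rest of this page; the device path, the stty configuration, and the assumption that the meter echoes every byte including the terminator are all assumptions about your setup):
# Assumes the port was already configured, e.g.:
#   stty -F /dev/ttyUSB0 9600 cs8 -cstopb -parenb
def send_with_echo(port, command)
  command.each_char do |ch|
    port.write(ch)        # send one character...
    echo = port.read(1)   # ...and wait for the meter to echo it back
    raise "echo mismatch: sent #{ch.inspect}, got #{echo.inspect}" unless echo == ch
  end
end

port = File.open('/dev/ttyUSB0', 'r+')
port.sync = true
send_with_echo(port, "fetc?\n")   # the query is echoed character by character
voltage = port.readline.to_f      # the measurement itself follows the echo
port.close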

WinHttpWriteData completion

I'm using WinHTTP to transfer large files to a PHP-based web server and I want to display the progress and an estimated speed. After reading the docs I have decided to use chunked transfer encoding. The files get transferred correctly but there is an issue with estimating the time that I cannot solve.
I'm using a loop to send chunks with WinHttpWriteData (chunk header + data + footer) and I compute the time difference between start and finish with GetTickCount. I have a fixed bandwidth of 4 Mbit/s configured on my router in order to test the correctness of my estimation.
The typical time difference for chunks of 256 KB is between 450 and 550 ms, which is correct. The problem is that once in a while (every few seconds or tens of seconds) WinHttpWriteData returns really, really fast, like 4-10 ms, which is obviously not possible. The next difference is then much higher than the average 500 ms.
Why does WinHttpWriteData confirm, either synchronously or asynchronously, that it has written the data to the destination when, in reality, the data is still being transferred? Any ideas?
Oversimplified, my code looks like:
while (dataLeft)
{
    t1 = GetTickCount();
    WinHttpWriteData(hRequest, chunkHdr, chunkHdrLen, NULL);
    waitWriteConfirm();
    WinHttpWriteData(hRequest, actualData, actualDataLen, NULL);
    waitWriteConfirm();
    WinHttpWriteData(hRequest, chunkFtr, chunkFtrLen, NULL);
    waitWriteConfirm();
    t2 = GetTickCount();
    tdif = t2 - t1;
}
This is simply the nature of how sockets work in general.
Whether you call a lower level function like send() or a higher level function like WinHttpWriteData(), the functions return success/failure based on whether they are able to pass data to the underlying socket kernel buffer or not. The kernel queues up data for eventual transmission in the background. The kernel does not report back when the data is actually transmitted, or whether the receiver acks the data. The kernel happily accepts new data as long as there is room in the queue, even if it will take a while to actually transmit it. Otherwise, it blocks the sender until room becomes available in the queue.
If you need to monitor actual transmission speed, you have to monitor the low level network activity directly, such as with a packet sniffer or driver hook. Otherwise, you can only monitor how fast you are able to pass data to the kernel (which is usually good enough for most purposes).

Problems receiving large amount of data through ruby TCP Socket

I am sending and receiving JSON data through a TCP socket. It works fine with smaller amounts of data, like 200 bytes or so. But when it gets to about 10 KB it only receives part of the data. I have tried all the different TCP socket data-retrieval methods I could find (read, gets, gets.chomp, recv), but I cannot find one that works for all of my tests.
Here is the code I have now:
socket = TCPSocket.new '10.11.50.xx', 13338
response = socket.recv(1000000000)
I have also tried adding a timeout but I could not get it to work:
socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, 1)
I am not sure what I am missing. Any help would be appreciated.
It's badly documented in the Ruby docs, but I think TCPSocket#recv actually just calls the recv system call. That one (see man 2 recv) reads a number of bytes from the stream that is determined by the kernel, though never more than the application specifies. To receive a larger "message", you will need to call it in a loop.
But there is an easier way: because TCPSocket indirectly inherits from the IO class, you get all of its methods for free, including IO#read which does read as many bytes as you specify (if possible).
You will also need to implement a way to delimit your messages (a minimal sketch of the length-prefix approach follows the list below):
use fixed-length messages
send the length of the message up front in a (fixed-size) header
use some kind of terminator, e.g. a NULL byte
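For example, a minimal sketch of the length-prefix approach using IO#read (the 4-byte big-endian header and the helper names are assumptions; both ends of the connection have to agree on this framing):
require 'socket'

# Sender side: prefix each message with its length as a 4-byte big-endian integer.
def send_message(sock, payload)
  sock.write([payload.bytesize].pack('N'))
  sock.write(payload)
end

# Receiver side: IO#read(n) blocks until n bytes arrive (or returns nil at EOF).
def read_message(sock)
  header = sock.read(4) or return nil
  sock.read(header.unpack1('N'))
end

socket = TCPSocket.new('10.11.50.xx', 13338)
json = read_message(socket)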

How do you know when all the data has been received by the Winsock control that has issued a POST or GET to a Web Server?

I'm using the VB6 Winsock control. When I do a POST to a server I get back the response as multiple Data arrival events.
How do you know when all the data has arrived?
(I'm guessing it's when the Winsock_Close event fires)
I have used VB6 Winsock controls in the past, and what I did was format my messages in a certain way to know when all the data has arrived.
Example: Each message starts with a "[" and ends with a "]".
"[Message Text]"
When data comes in from the DataArrival event, check for the end-of-message marker "]". If it is there, you have received at least one whole message, and possibly the start of a new one. If more of the message is still to come, store your message data in a form-level variable and append to it when the DataArrival event fires the next time.
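A compact sketch of that start/end-marker bookkeeping (written in Ruby only for consistency with the rest of this page; the same buffering idea applies inside a VB6 DataArrival handler):
# Append incoming data to a persistent buffer, then pull out every complete
# "[...]" message; anything after the last ']' stays buffered for next time.
def extract_messages(buffer)
  messages = []
  while (close_idx = buffer.index(']'))
    raw = buffer.slice!(0..close_idx)                 # remove up to and including ']'
    open_idx = raw.index('[')
    messages << raw[(open_idx + 1)...-1] if open_idx  # text between the markers
  end
  messages
end

buffer = ''.b
buffer << '[Hello][Wor'
p extract_messages(buffer)   # => ["Hello"]
buffer << 'ld]'
p extract_messages(buffer)   # => ["World"]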
In HTTP, you have to parse and analyze the reply data that the server is sending back to you in order to know how to read it all.
First, the server sends back a list of CRLF-delimited header lines, which are terminated by a blank CRLF-delimited line by itself. You then have to look at the actual values of the 'Content-Length' and 'Transfer-Encoding' headers to know how to read the remaining data.
If there is no 'Transfer-Encoding' header, or if it does not contain a 'chunked' item in it, then the 'Content-Length' header specifies how many remaining bytes to read. But if the 'Transfer-Encoding' header contains a 'chunked' item, then you have to read and parse the remaining data in chunks, one at a time, in order to know when the data ends (each chunk reports its own size, and the last chunk reports a size of 0).
And no, you cannot rely on the connection being closed after the reply has been sent, unless the 'Connection' header explicitly says 'close'. For HTTP 1.1, that header is usually set to 'keep-alive' instead, which means the socket is left open so the client can send more requests on the same socket.
Read RFC 2616 for more details.
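To make the framing rules above concrete, here is a rough sketch of the decision logic (in Ruby, purely for consistency with the rest of this page; it ignores trailers, 'Connection: close' handling and error cases):
# Reads one HTTP reply from an already-connected socket.
def read_http_reply(sock)
  # 1. Status line and CRLF-delimited headers, terminated by a blank line.
  headers = {}
  _status = sock.gets("\r\n")
  while (line = sock.gets("\r\n").chomp) != ''
    name, value = line.split(':', 2)
    headers[name.strip.downcase] = value.to_s.strip
  end

  # 2. The headers tell you how to read the body.
  if headers['transfer-encoding'].to_s.include?('chunked')
    body = ''.b
    loop do
      size = sock.gets("\r\n").to_i(16)   # each chunk starts with its size in hex
      break if size.zero?                 # a chunk of size 0 ends the body
      body << sock.read(size)
      sock.read(2)                        # consume the CRLF that follows the chunk
    end
    body
  elsif headers['content-length']
    sock.read(headers['content-length'].to_i)  # exactly this many bytes remain
  else
    sock.read                                  # no framing info: read until close
  end
end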
No, the Close event doesn't fire when all the data has arrived, it fires when you close the connection. It's not the Winsock control's job to know when all the data has been transmitted, it's yours. As part of your client/server communication protocol implementation, you have to tell the client what to expect.
Suppose your client wants the contents of a file from the server. The client doesn't know how much data is in the file. The exchange might go something like this (a sketch of the client-side accumulation follows the list):
client sends request for the data in the file
the server reads the file, determines the size, attaches the size to the beginning of the data (let's say it uses 4 bytes) that tells the client how much data to expect, and starts sending it
your client code knows to strip the first 4 bytes off any data that arrives after a file request and store it as the amount of data that is to follow, then accumulate the subsequent data, through any number of DataArrival events, until it has that amount
Ideally, the server would append a checksum to the data as well, and you'll have to implement some sort of timeout mechanism, figure out what to do if you don't get the expected amount of data, etc.
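A sketch of that client-side accumulation, again in Ruby for consistency with the rest of this page (the 4-byte size prefix and the on_data_arrival entry point are assumptions standing in for the VB6 DataArrival handler):
class FileReceiver
  def initialize
    @expected = nil    # total payload size announced by the server
    @data     = ''.b   # bytes accumulated so far
  end

  # Called once per piece of arriving data, over any number of events.
  def on_data_arrival(chunk)
    @data << chunk
    if @expected.nil? && @data.bytesize >= 4
      @expected = @data.slice!(0, 4).unpack1('N')   # strip the 4-byte size prefix
    end
    finish(@data) if @expected && @data.bytesize >= @expected
  end

  def finish(payload)
    # payload now holds the complete file; verify the checksum, stop timers, etc.
  end
end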

Optimally reading data from an Asynchronous Socket

I have a problem with a socket library that uses WSAASyncSelect to put the socket into asynchronous mode. In asynchronous mode the socket is placed into a non-blocking mode (WSAWOULDBLOCK is returned on any operations that would block) and windows messages are posted to a notification window to inform the application when the socket is ready to be read, written to etc.
My problem is this: when receiving an FD_READ event I don't know how many bytes to try to recv. If I pass a buffer that's too small, then Winsock will automatically post another FD_READ event telling me there's more data to read. If data is arriving very fast, this can saturate the message queue with FD_READ messages, and since WM_TIMER and WM_PAINT messages are only posted when the message queue is empty, an application could stop painting if it's receiving a lot of data and using asynchronous sockets with a buffer that's too small.
How large to make the buffer then? I tried using ioctlsocket(FIONREAD) to get the number of bytes to read, and make a buffer exactly that large, BUT, KB192599 explicitly warns that that approach is fraught with inefficiency.
How do I pick a buffer size that's big enough, but not crazy big?
As far as I could ever work out, the value set using setsockopt with the SO_RCVBUF option is an upper bound on the FIONREAD value. So rather than call ioctlsocket it should be OK to call getsockopt to find out the SO_RCVBUF setting, and use that as the (attempted) value for each recv.
Based on your comment to Aviad P.'s answer, it sounds like this would solve your problem.
(Disclaimer: I have always used FIONREAD myself. But after reading the linked-to KB article I will probably be changing...)
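The idea, sketched in Ruby only because that's the language used elsewhere on this page (the original question is about Winsock/C, so treat this purely as an illustration):
require 'socket'

sock = TCPSocket.new('example.com', 80)

# Ask the kernel how large this socket's receive buffer is...
rcvbuf = sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_RCVBUF).int

# ...and use that, rather than FIONREAD, as the upper bound for each read.
while (data = sock.recv(rcvbuf)) && !data.empty?
  # process data
end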
You can set your buffer to be as big as you can without impacting performance, relying on the TCP PUSH flag to make your reads return before filling the buffer if the sender sent a smaller message.
The TCP PUSH flag is set at a logical message boundary (normally after a send operation, unless explicitly set to false). When the receiving end sees the PUSH flag on a TCP packet, it returns any blocking reads (or asynchronous reads, doesn't matter) with whatever's accumulated in the receive buffer up to the PUSH point.
So if your sender is sending reasonably sized messages, you're OK; if not, then you limit your buffer size so that even if you fill it completely, you don't negatively impact performance (subjective).
