I want to send messages between Ruby processes via TCP without using ending chars that could restrict the potential message content. That rules out the naïve socket.puts/gets approach.
Is there a basic TCP message implementation somewhere in the standard libs?.
(I'd like to avoid Drb to keep everything simple.)
It seems like there is no canonical, reusable solution.
So here's a basic implementation for the archives:
module Messaging
# Assumes 'msg' is single-byte encoded
# and not larger than 4,3 GB ((2**(4*8)-1) bytes)
def dispatch(msg)
write([msg.length].pack('N') + msg)
end
def receive
if (message_size = read(4)) # sizeof (N)
message_size = message_size.unpack('N')[0]
read(message_size)
end
end
end
# usage
message_hub = TCPSocket.new('localhost', 1234).extend(Messaging)
The usual way to send strings in that situation is to send an integer (encoded however you like) for the size of the string, followed by that many bytes. You can save space but still allow arbitrary sizes by using a UTF-8-like scheme for that integer.
Related
Title may be wildly incorrect for what I'm trying to work out.
I'm trying to interpret packets I am recieving from a racing game in a way that I understand, but I honestly don't really know what I'm looking at, or what to search to understand it.
Information on the packets I am recieving here:
https://forums.codemasters.com/topic/54423-f1%C2%AE-2020-udp-specification/?tab=comments#comment-532560
I'm using python to print the packets, here's a snippet of the output, which I don't understand how to interpret.
received message: b'\xe4\x07\x01\x03\x01\x07O\x90.\xea\xc2!7\x16\xa5\xbb\x02C\xda\n\x00\x00\x00\xff\x01\x00\x03:\x00\x00\x00 A\x00\x00\xdcB\xb5+\xc1#\xc82\xcc\x10\t\x00\xd9\x00\x00\x00\x00\x00\x12\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00$tJ\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01
I'm very new to coding, and not sure what my next step is, so a nudge in the right direction will help loads, thanks.
This is the python code:
import socket
UDP_IP = "127.0.0.1"
UDP_PORT = 20777
sock = socket.socket(socket.AF_INET, # Internet
socket.SOCK_DGRAM) # UDP
sock.bind((UDP_IP, UDP_PORT))
while True:
data, addr = sock.recvfrom(4096)
print ("received message:", data)
The website you link to is describing the data format. All data represented as a series of 1's and 0's. A byte is a series of 8 1's and 0's. However, just because you have a series of bytes doesn't mean you know how to interpret them. Do they represent a character? An integer? Can that integer be negative? All of that is defined by whoever crafted the data in the first place.
The type descriptions you see at the top are telling you how to actually interpret that series of 1's and 0's. When you see "unit8", that is an "unsigned integer that is 8 bits (1 byte) long". In other words, a positive number between 0 and 255. An "int8" on the other hand is an "8-bit integer", or a number that can be positive or negative (so the range is -128 to 127). The same basic idea applies to the *16 and *64 variants, just with 16 bits or 64 bits. A float represent a floating point number (a number with a fractional part, such as 1.2345), generally 4 bytes long. Additionally, you need to know the order to interpret the bytes within a word (left-to-right or right-to-left). This is referred to as the endianness, and every computer architecture has a native endianness (big-endian or little-endian).
Given all of that, you can interpret the PacketHeader. The easiest way is probably to use the struct package in Python. Details can be found here:
https://docs.python.org/3/library/struct.html
As a proof of concept, the following will interpret the first 24 bytes:
import struct
data = b'\xe4\x07\x01\x03\x01\x07O\x90.\xea\xc2!7\x16\xa5\xbb\x02C\xda\n\x00\x00\x00\xff\x01\x00\x03:\x00\x00\x00 A\x00\x00\xdcB\xb5+\xc1#\xc82\xcc\x10\t\x00\xd9\x00\x00\x00\x00\x00\x12\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00$tJ\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
#Note that I am only taking the first 24 bytes. You must pass data that is
#the appropriate length to the unpack function. We don't know what everything
#else is until after we parse out the header
header = struct.unpack('<HBBBBQfIBB', data[:24])
print(header)
You basically want to read the first 24 bytes to get the header of the message. From there, you need to use the m_packetId field to determine what the rest of the message is. As an example, this particular packet has a packetId of 7, which is a "Car Status" packet. So you would look at the packing format for the struct CarStatus further down on that page to figure out how to interpret the rest of the message. Rinse and repeat as data arrives.
Update: In the format string, the < tells you to interpret the bytes as little-endian with no alignment (based on the fact that the documentation says it is little-endian and packed). I would recommend reading through the entire section on Format Characters in the documentation above to fully understand what all is happening regarding alignment, but in a nutshell it will try to align those bytes with their representation in memory, which may not match exactly the format you specify. In this case, HBBBBQ takes up 2 bytes more than you'd expect. This is because your computer will try to pack structs in memory so that they are word-aligned. Your computer architecture determines the word alignment (on a 64-bit computer, words are 64-bits, or 8 bytes, long). A Q takes a full word, so the packer will try to align everything before the Q to a word. However, HBBBB only requires 6 bytes; so, Python will, by default, pad an extra 2 bytes to make sure everything lines up. Using < at the front both ensures that the bytes will be interpreted in the correct order, and that it won't try to align the bytes.
Just for information if someone else is looking for this. In python there is the library f1-2019-telemetry existing. On the documentation, there is a missing part about the "how to use" so here is a snippet:
from f1_2020_telemetry.packets import *
...
udp_socket = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)
udp_socket.bind((host, port))
while True:
udp_packet = udp_socket.recv(2048)
packet = unpack_udp_packet(udp_packet)
if isinstance(packet, PacketSessionData_V1): # refer to doc for classes / attribute
print(packet.trackTemperature) # for example
if isinstance(packet, PacketParticipantsData_V1):
for i, participant in enumerate(packet.participants):
print(DriverIDs[participant.driverId]) # the library has some mapping for pilot name / track name / ...
Regards,
Nicolas
I have a socket server listening for UDP packets from a GSM device. Some of the data comes in as multibytes, such as time, which requires multibytes for accuracy. This is an example:
179,248,164,14
The bytes are represented in decimal notation. My goal is to convert that into seconds:
245692595
I am trying to do that and was told:
"You must take those 4 bytes and place them into a single long integer in little endian format. If you are using Python to read and encode the data, you will need to look at using the .read() and struct.unpack() methods to successfully convert it to an integer. The resulting value is the number of seconds since 01/01/2000."
So, I tried to do this:
%w(179 248 164 14).sort.map(&:to_i).inject(&:+)
=> 605
And I obviously am getting the wrong answer.
You should use the pack and unpack methods to do this:
[179,248,164,14].pack('C*').unpack('I')[0]
# => 245692595
It's not about adding them together, though. You're doing the math wrong. The correct way to do it with inject is this:
[179,248,164,14].reverse.inject { |s,v| s * 256 + v }
# => 245692595
Note you will have to account for byte ordering when representing binary numbers that are more than one byte long.
If the original data is already a binary string you won't have to perform the pack operation and can proceed directly to the unpack phase.
Is it possible to design a streaming JSON algorithm that writes JSON directly to a socket with the following properties:
* can only write to, but cannot delete or seek within the stream
* does not use either an IMPLICIT or explicit stack
* uses only a constant amount of memory stack depth no matter how deep the object nesting within the json
{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{...}}}}}}}}}}}}}}}}}
Short answer: No.
Slightly longer: At least not in the general case. If you could guarantee that the nesting has no branching, you could use a simple counter to close the braces at the end.
No, because you could use such a program to compress infinite amounts of memory into a finite space.
Encoding implementation:
input = read('LIBRARY_OF_CONGRESS.tar.bz2')
input_binary = convert_to_binary(input)
json_opening = replace({'0':'[', '1':'{'}, input_binary)
your_program <INPUTPIPE >/dev/null
INPUTPIPE << json_opening
Execute the above program then clone virtual machine it is running on. That is your finite-space compressed version of the infinitely-large input data set. Then to decode...
Decoding implementation:
set_output_pipe(your_program, OUTPUTPIPE)
INPUTPIPE << EOL
json_closing << OUTPUTPIPE
output_binary = replace({']':'0', '}':'1'}, reverse(json_closing))
output = convert_from_binary(output_binary)
write(output, 'LIBRARY_OF_CONGRESS-copy.tar.bz2')
And of course, all good code should have a test case...
Test case:
bc 'LIBRARY_OF_CONGRESS.tar.bz2' 'LIBRARY_OF_CONGRESS-copy.tar.bz2'
I have a problem - I don't know the amount of data being sent to my UDP server.
The current code is this - testing in irb:
require 'sockets'
sock = UDPSocket.new
sock.bind('0.0.0.0',41588)
sock.read # Returns nothing
sock.recvfrom(1024) # Requires length of data to be read - I don't know this
I could set recvfrom to 65535 or some other large number but this seems like an unnecessary hack.
recvfrom and recvfrom_nonblock both throw away anything after that length specified.
Am I setting the socket up incorrectly?
Note that UDP is a datagram protocol, not a stream like TCP. Each read from UDP socket dequeues one full datagram. You might pass these flags to recvfrom(2):
MSG_PEEK
This flag causes the receive operation to return
data from the beginning of the receive queue without
removing that data from the queue. Thus, a subsequent
receive call will return the same data.
MSG_WAITALL
This flag requests that the operation block until the
full request is satisfied. However, the call may still
return less data than requested if a signal is caught,
an error or disconnect occurs, or the next data to be
received is of a different type than that returned.
MSG_TRUNC
Return the real length of the packet, even when it was
longer than the passed buffer. Only valid for packet sockets.
If you really don't know how large of a packet you might get (protocol limit is 65507 bytes, see here) and don't care about doubling the number of system calls, do the MSG_PEEK first, then read exact number of bytes from the socket.
Or you can set an approximate max buffer size, say 4096, then use MSG_TRUNC to check if you lost any data.
Also note that UDP datagrams are rarely larger then 1472 - ethernet data size of 1500 minus 20 bytes of IPv4 header minus 8 bytes of UDP header - nobody likes fragmentation.
Edit:
Socket::MSG_PEEK is there, for others you can use integer values:
MSG_TRUNC 0x20
MSG_WAITALL 0x100
Look into your system headers (/usr/include/bits/socket.h on Linux) to be sure.
Looking at the documentation for Ruby's recvfrom(), the argument is a maximum length. Just provide 65535 (max length of a UDP datagram); the returned data should be the sent datagram of whatever size it happens to be, and you should be able to determine the size of it the way you would for any stringlike thing in Ruby.
i'm still kind of newbie to ruby, but i became used on using ruby sockets, cause i used the 'Socket' class many times, and so i having full control to my sockets and their options.
the thing is i'm trying to make it a little easy for me, so i'm trying to use the 'TCPSocket' class ,witch (i guess) doesn't give u much control as the 'Socket' class.
my script looks like this:
require 'socket'
client = TCPSocket.open('5.5.5.5', '5555')
client.send("msg", 0) # 0 means standard packet
client.close
the question is, what is suppose to be instead of the '0' on the send line ?? and if '0' means standard packet, what other than standard can exist there, is it some control over the TCP packet ?? if so, then it will be much easier for me than writing the whole socket by hand using the 'Socket' class.
The second parameter to send is the flags parameter. It gets passed onto the the send system call. You'll normally want to leave this at 0.
On my system, according to the man page, the other possible flags are:
#define MSG_OOB 0x1 /* process out-of-band data */
#define MSG_DONTROUTE 0x4 /* bypass routing, use direct interface */