Sending a Unicode string to an A16 COMS mainframe via TCP/IP (UTF-8)

I need to send a Unicode string message to A16 COMS (a mainframe) via TCP/IP. What algorithm do I need, and what transformation of the string? The string can contain one or more Unicode characters.
When sending an ASCII-only string, I convert (map) it to EBCDIC and send it over the TCP/IP connection. I know that EBCDIC doesn't handle Unicode characters. Besides, over TCP/IP I can only send a byte array, where in the case of an ASCII string one character maps to one array cell. A Unicode character, by contrast, can occupy from 1 to 4 array cells.
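For illustration, the working ASCII path looks roughly like this in Python (the standard cp500 EBCDIC codec here is only a stand-in for whatever code page the mainframe actually expects):

payload = 'HELLO A16'.encode('cp500')  # one character -> one EBCDIC byte
print(len('HELLO A16'), len(payload))  # 9 9 -- lengths match, as COMS assumes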
The question is: how do I send a string containing Unicode characters to the A16 mainframe?
Further clarification:
When I run the code, the TCP client receives no response; it hits the timeout and raises an error. Increasing the timeout does not help. C# can convert a Unicode string to UTF-8, either with System.Text.Encoding or even almost manually with an algorithm; that is not the problem. The problem is that A16 COMS expects “one character = one byte” (mapped to EBCDIC), while with UTF-8 one character may occupy 2, 3, or 4 cells of the array. Plain EBCDIC mapping does not help here, because EBCDIC is designed to work with non-Unicode (ASCII-based) strings.
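To make the length mismatch concrete, here is what UTF-8 does to single characters (sketched in Python; System.Text.Encoding.UTF8.GetBytes in C# produces the same bytes):

for s in ('A', 'é', '€', '𐍈'):
    b = s.encode('utf-8')
    print(s, '->', len(b), 'byte(s):', b.hex())
# A -> 1, é -> 2, € -> 3, 𐍈 -> 4: "one character = one byte" no longer holds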
I hope that someone who has done this at some point in their career reads this post, because not much can be achieved by guesswork. Can it be done with TcpClient and its NetworkStream? The Send method takes only an array of bytes in its signature, but with UTF-8 the byte array can end up much longer than the limit.
It is a question asking to share experience, not knowledge.

Related

How do I interpret a Python byte string coming from an F1 2020 game UDP packet?

Title may be wildly incorrect for what I'm trying to work out.
I'm trying to interpret packets I am receiving from a racing game in a way that I understand, but I honestly don't really know what I'm looking at, or what to search for to understand it.
Information on the packets I am receiving is here:
https://forums.codemasters.com/topic/54423-f1%C2%AE-2020-udp-specification/?tab=comments#comment-532560
I'm using python to print the packets, here's a snippet of the output, which I don't understand how to interpret.
received message: b'\xe4\x07\x01\x03\x01\x07O\x90.\xea\xc2!7\x16\xa5\xbb\x02C\xda\n\x00\x00\x00\xff\x01\x00\x03:\x00\x00\x00 A\x00\x00\xdcB\xb5+\xc1#\xc82\xcc\x10\t\x00\xd9\x00\x00\x00\x00\x00\x12\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00$tJ\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
I'm very new to coding, and not sure what my next step is, so a nudge in the right direction will help loads, thanks.
This is the python code:
import socket

UDP_IP = "127.0.0.1"
UDP_PORT = 20777

sock = socket.socket(socket.AF_INET,    # Internet
                     socket.SOCK_DGRAM) # UDP
sock.bind((UDP_IP, UDP_PORT))

while True:
    data, addr = sock.recvfrom(4096)
    print("received message:", data)
The website you link to is describing the data format. All data is represented as a series of 1's and 0's. A byte is a group of eight 1's and 0's (bits). However, just because you have a series of bytes doesn't mean you know how to interpret them. Do they represent a character? An integer? Can that integer be negative? All of that is defined by whoever crafted the data in the first place.
The type descriptions you see at the top tell you how to actually interpret that series of 1's and 0's. When you see "uint8", that is an "unsigned integer that is 8 bits (1 byte) long"; in other words, a positive number between 0 and 255. An "int8", on the other hand, is an "8-bit signed integer", i.e. a number that can be positive or negative (so the range is -128 to 127). The same basic idea applies to the *16 and *64 variants, just with 16 or 64 bits. A float represents a floating-point number (a number with a fractional part, such as 1.2345), generally 4 bytes long. Additionally, you need to know the order in which to interpret the bytes within a word (left-to-right or right-to-left). This is referred to as endianness, and every computer architecture has a native endianness (big-endian or little-endian).
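As a tiny illustration of endianness with the packet above, here is the first field (the uint16 packet format) read both ways:

import struct
print(struct.unpack('<H', b'\xe4\x07'))  # (2020,)  little-endian, as the spec says
print(struct.unpack('>H', b'\xe4\x07'))  # (58375,) big-endian would be wrong here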
Given all of that, you can interpret the PacketHeader. The easiest way is probably to use the struct package in Python. Details can be found here:
https://docs.python.org/3/library/struct.html
As a proof of concept, the following will interpret the first 24 bytes:
import struct
data = b'\xe4\x07\x01\x03\x01\x07O\x90.\xea\xc2!7\x16\xa5\xbb\x02C\xda\n\x00\x00\x00\xff\x01\x00\x03:\x00\x00\x00 A\x00\x00\xdcB\xb5+\xc1#\xc82\xcc\x10\t\x00\xd9\x00\x00\x00\x00\x00\x12\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00$tJ\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
#Note that I am only taking the first 24 bytes. You must pass data that is
#the appropriate length to the unpack function. We don't know what everything
#else is until after we parse out the header
header = struct.unpack('<HBBBBQfIBB', data[:24])
print(header)
You basically want to read the first 24 bytes to get the header of the message. From there, you need to use the m_packetId field to determine what the rest of the message is. As an example, this particular packet has a packetId of 7, which is a "Car Status" packet. So you would look at the packing format for the struct CarStatus further down on that page to figure out how to interpret the rest of the message. Rinse and repeat as data arrives.
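Continuing the proof of concept above, a sketch of that dispatch: packetId is the fifth field of the '<HBBBBQfIBB' format, and the id-to-name table below is transcribed from the linked spec page.

PACKET_TYPES = {0: 'Motion', 1: 'Session', 2: 'Lap Data', 3: 'Event',
                4: 'Participants', 5: 'Car Setups', 6: 'Car Telemetry',
                7: 'Car Status', 8: 'Final Classification', 9: 'Lobby Info'}
packet_id = header[4]  # fifth field of the unpacked header
print(packet_id, PACKET_TYPES.get(packet_id, 'unknown'))  # -> 7 Car Status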
Update: In the format string, the < tells struct to interpret the bytes as little-endian with no alignment (based on the fact that the documentation says the data is little-endian and packed). I would recommend reading through the entire section on Format Characters in the documentation above to fully understand what is happening regarding alignment, but in a nutshell struct will otherwise try to align those bytes with their representation in memory, which may not match the format you specify. In this case, HBBBBQ takes up 2 bytes more than you'd expect. This is because your computer will try to pack structs in memory so that they are word-aligned. Your computer architecture determines the word alignment (on a 64-bit computer, words are 64 bits, or 8 bytes, long). A Q takes a full word, so the packer will try to align everything before the Q to a word boundary. However, HBBBB only requires 6 bytes, so Python will, by default, pad an extra 2 bytes to make sure everything lines up. Using < at the front ensures both that the bytes are interpreted in the correct order and that no alignment padding is added.
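You can observe that padding directly with struct.calcsize:

import struct
print(struct.calcsize('HBBBBQ'))   # 16 -- native mode pads 2 bytes so Q is word-aligned
print(struct.calcsize('<HBBBBQ'))  # 14 -- '<' means packed: 2+1+1+1+1+8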
Just for information, in case someone else is looking for this: in Python there is an existing library, f1-2020-telemetry. Its documentation is missing a "how to use" section, so here is a snippet:
from f1_2020_telemetry.packets import *
import socket

...

host, port = "127.0.0.1", 20777  # the game's default UDP telemetry port
udp_socket = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)
udp_socket.bind((host, port))
while True:
    udp_packet = udp_socket.recv(2048)
    packet = unpack_udp_packet(udp_packet)
    if isinstance(packet, PacketSessionData_V1):    # refer to the docs for classes/attributes
        print(packet.trackTemperature)              # for example
    if isinstance(packet, PacketParticipantsData_V1):
        for i, participant in enumerate(packet.participants):
            print(DriverIDs[participant.driverId])  # the library has mappings for driver/track names/...
Regards,
Nicolas

Concatenated SMS extended symbols at segments border - what is correct split method?

For concatenated SMS messages (in GSM encoding), if an extended-table symbol (one of these: }{[]|~^\€) falls at the end of a segment, what is the correct way to split the message:
Leave the first byte of the symbol (the 1B escape septet) at the end of the segment and put the second byte at the beginning of the next one, thereby filling all available bytes of the UD (which seems logically correct),
OR
Move both bytes of the symbol to the next segment and leave an unused byte at the end?
I didn't find any clarification in the SMPP 3.4 spec, its implementation guide, or the GSM 03.38 spec, so I assume the choice of method is up to the content provider or the sending software.
When you are using the 7-bit encoding, split when you reach 153 septets per segment, whether that boundary falls between the 1B escape and the character that follows it or not.
I can't think of any reason to prefer option 2 (leaving an unused byte at the end).
The message should not be rendered until it has been completely received by the subscriber's device. Furthermore, a lone 1B septet represents nothing in GSM7, so even if a device tried to render an incomplete segment, the dangling escape would simply fail to produce anything.
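Here is a minimal sketch of option 1 in Python. The extension-table values are from GSM 03.38; the basic-table lookup is faked with ASCII for brevity, so this is only illustrative:

ESC = 0x1B  # escape septet announcing an extension-table character
EXT = {'^': 0x14, '{': 0x28, '}': 0x29, '\\': 0x2F,
       '[': 0x3C, '~': 0x3D, ']': 0x3E, '|': 0x40, '€': 0x65}

def encode_septets(text):
    # Flatten the text into septet values; extended characters cost two.
    septets = []
    for ch in text:
        if ch in EXT:
            septets += [ESC, EXT[ch]]
        else:
            septets.append(ord(ch) & 0x7F)  # crude stand-in for the real GSM7 basic table
    return septets

def split_segments(septets, limit=153):
    # Option 1: cut strictly every 153 septets, so the ESC may end one
    # segment while its extension code starts the next.
    return [septets[i:i + limit] for i in range(0, len(septets), limit)]

segments = split_segments(encode_septets('A' * 152 + '€' + 'B'))
print([len(s) for s in segments])  # [153, 2] -- the escape pair straddles the border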

Windows DHCP client hostname encoding

Recently I have been trying to save a list of hostnames from captured DHCP packets. I have found that every DHCP hostname (option 12) should have the form defined in RFC 1035. So if I understand it correctly, a hostname should be encoded in 7-bit ASCII, with further restrictions such as:
- the name should not start with a digit and should omit certain forbidden characters.
Almost every device I have encountered in the packets fulfills this constraint, except Windows devices (vendor ID MSFT 5.0). It appears the Windows DHCP client simply takes the computer (mobile) name and puts it in the hostname option.
The problem occurs when the computer name is set, for example, to "Lukáš-PC". Wireshark displays this hostname as Luk\240\347-PC (240 and 347 are octal numbers). To see for myself, I printed the values in the packets with printf("%hhu", c) (C language):
á = 160
š = 231
I think this is a simple char overflow. I tried to deduce the original values from the overflowed ones, but I haven't found any relation between the characters and known encodings. So my questions are:
Is there any way to convert these values back to the original?
If yes, what was the original character encoding when the overflow happened?
Thanks.
char is usually signed by default, and it is promoted to int when passed to a variadic function. To ensure it is printed as unsigned, use printf("%hhu", c) or printf("%d", (unsigned char)c);.
The correct encoding is impossible to know for certain, because it depends on each system's settings.
Note that compliant systems MUST encode names according to RFC 3490, but Windows seems to enjoy violating standards.
The characters á and š that you are seeing are encoded using code page 852 (Latin-2, Central European languages).
Unfortunately, there is no simple way to figure out the encoding used just by looking at the DHCP requests; in principle the DHCP client can use any code page it wants. If you are working in a private/controlled network, it is probably safe to assume that all clients use the same code page and to decode the strings explicitly using that particular code page.
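A quick check in Python, assuming the captured option 12 bytes really are code page 852 as described above:

raw = b'Luk\xa0\xe7-PC'     # the hostname bytes from the capture (160, 231 decimal)
print(raw.decode('cp852'))  # -> Lukáš-PC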

when ParseFromArray return true in protocol buffer

I call ParseFromArray on a protocol buffer message. The data is not lacking any field, but ParseFromArray returns false. Why?
I'm assuming you are using C++. ParseFromArray() fails if:
The input data is not in valid protobuf format.
The input data is lacking a required field.
If you are sure that all required fields are set, then it must be the case that your input data is corrupted. You should verify that the bytes and size you are passing into ParseFromArray() are exactly the bytes and size that you got from SerializeToArray() and ByteSize() on the sending side. You will probably find that you are losing some bytes somewhere, or that some bytes got corrupted.
Common reasons for corruption include:
Passing the encoded bytes over a text-only channel. E.g. if you write the data to (or read it from) a file that is not opened in "binary" mode, or if you at some point store the bytes in a Java String, the data will become corrupted, as these channels expect text, and encoded protobufs are not text.
Passing the bytes as a char*, i.e. assuming NUL-termination. Encoded protobufs can contain '\0' bytes, meaning that you cannot represent one as a char* alone -- you must include the size separately.
Serializing to an array that is larger than needed, and then forgetting to pay attention to how much data was actually written. When you call SerializeToArray(), you must also call ByteSize() to see how large the message is, and you must make sure the receiving end receives that size and passes it to ParseFromArray(). Otherwise, the parser will think that the extra bytes at the end of the buffer are part of the message, and will fail to parse them.
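The same principle sketched in Python (matching the other examples in this document): length-prefix the payload so the receiver parses exactly the bytes that were written. MyMessage/msg_class below stands in for your own generated message class.

import struct

def send_message(sock, msg):
    payload = msg.SerializeToString()  # exact bytes, nothing extra
    sock.sendall(struct.pack('<I', len(payload)) + payload)

def recv_exactly(sock, n):
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed mid-message')
        buf += chunk
    return buf

def recv_message(sock, msg_class):  # e.g. recv_message(sock, MyMessage)
    (size,) = struct.unpack('<I', recv_exactly(sock, 4))
    msg = msg_class()
    msg.ParseFromString(recv_exactly(sock, size))  # parse exactly `size` bytes
    return msg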

Serial Port Data Structure

I need to send data to a hardware device over a serial port. I'm using a program called Serial Port Tool for OS X.
After I connect to the device, there is a form box where I can type data to send, but I have no idea how to format the data.
Here is an excerpt from the manual for the device.
"The Net Manager Command structure consists of one start byte, one command byte, five bytes of data, and a one byte checksum. Each message packet is formatted as follows:"
an example command is:
Byte0=30 Byte1=7 Byte2=5 Byte3=1 Byte4=2 Byte5=0 Byte6=245
How do I type that into the form box in serial port tool?
Thanks,
Seth
Does the "serial port tool" you're using come with any documentation?
Assuming the "form box" is expecting printable characters, what you're looking for is a way to input an arbitrary byte value. For example, there might be a mechanism that lets you use an octal or hexadecimal escape sequence (such as \036 or \24).
