Encoding of web socket channel and/or [encoding convertto...] messages? - websocket

I have data in SQLite that is text including a curly apostrophe (U-2019 &#8217) but the character is typed as in work’s. A browser makes the request for the data over a web socket connection which is configured as chan configure $sock -encoding iso8859-1 -translation crlf -buffering full simply because that is what I set it to before sending the headers to the browser in response to the request to upgrade the connection to a web socket and failed to consider it again thereafter.
I was mistaken it is endoded as chan configure $sock -buffering full -blocking 0 -translation binary.
If I use replace( ..., '’', '&#8217') in the SQL then the browser displays the text with the curly apostrophe rendered normally. And if I follow the example in this SO question and encode the result of the SQL query as set encoded [encoding convertto utf-8 $SQLResult] it renders correctly also.
What is the correct way to handle this? Should the SQL results be endoded to utf-8 or should the web socket be configured differently? If the socket is encoded as utf-8 then it fails on the first message sent from the browser/client.
Thank you.

The data in SQLite should be Unicode (probably UTF-8 but that shouldn't matter; the package binding handles that). That means that the data arrives in Tcl correctly encoded. Tcl's internal encoding schemes are complicated and should be ignored for the purposes of this discussion: pretend that's all Unicode too.
With a websocket, you need to say what sort of messages ("frames") you are sending. For normal payloads, they can be either binary frames or text frames. Binary frames are a bunch of bytes. Text frames are UTF-8 (as specified by the standard). STOMP frames are text frames where the content is JSON (with a few basic rules and some higher-order things over what operations can be stated).
If you are sending text, you should use UTF-8. Ideally by configuring the socket to do that in the part of the websocket client that mediates between the point that receives the text to send (maybe JSON) and the part that writes the frame onto the underlying socket itself. Above that point, you should just work with ordinary Tcl strings. If you're forming JSON messages in Tcl, the rl_json package is recommended, or you can get SQLite to produce JSON directly. (JSON's serialized form is always UTF-8, and it is conceptually Unicode.)
If you are sending binary frames, you handle all the details yourself. Using encoding convertto utf-8 may help with forming the message frame body, or you can use any other encoding scheme you prefer (such as the one you mention in your question).

Related

See raw bytes being sent in COSMOS

In COSMOS is there an easy way to see the raw bytes being sent over the line by the Command Sender? We want to capture what is sent out and don't know where all the protocol layers get added.
If you go to the Cmd Packets tab in the Command and Telemetry Server you can click View Raw on the command packet you're sending to see the raw bytes. Note that these are the raw bytes in the defined command. If your interface changes the data in some way (puts on a CRC, etc) then the bytes over the line may be different.
The following page describes the COSMOS log format: https://cosmosrb.com/docs/logging/
You can also add the LOG_RAW keyword after your interface definition to log all the raw data going out over the interface. Note that COSMOS does not interpret this log and you'll have to use a hex editor to view it.

Unable to send data in chunks to server in Golang

I'm completely new to Golang. I am trying to send a file from the client to the server. The client should split it into smaller chunks and send it to the rest end point exposed by the server. The server should combine those chunks and save it.
This is the client and server code I have written so far. When I run this to copy a file of size 39 bytes, the client is sending two requests to the server. But the server is displaying the following errors.
2017/05/30 20:19:28 Was not able to access the uploaded file: unexpected EOF
2017/05/30 20:19:28 Was not able to access the uploaded file: multipart: NextPart: EOF
You are dividing buffer with the file into separate chunks and sending each of them as separate HTTP message. This is not how multipart is intended to be used.
multipart MIME means that a single HTTP message may contain one or more entities, quoting HTTP RFC:
MIME provides for a number of "multipart" types -- encapsulations of
one or more entities within a single message-body. All multipart types
share a common syntax, as defined in section 5.1.1 of RFC 2046
You should send the whole file and send it in a single HTTP message (file contents should be a single entity). The HTTP protocol will take care of the rest but you may consider using FTP if the files you are planning to transfer are large (like > 2GB).
If you are using a multipart/form-data, then it is expected to take the entire file and send it up as a single byte stream. Go can handle multi-gigabyte files easily this way. But your code needs to be smart about this.
ioutil.ReadAll(r.Body) is out of the question unless you know for sure that the file will be very small. Please don't do this.
multipartReader, err := r.MultipartReader() use a multipart reader. This will iterate over uploading files, in the order they are included in the encoding. This is important, because you can keep the file entirely out of memory, and do a Copy from one filehandle to another. This is how large files are handled easily.
You will have issues with middle-boxes and reverse proxies. We have to change defaults in Nginx so that it will not cut off large files. Nginx (or whatever reverse-proxy you might use) will need to cooperate, as they often are going to default to some really tiny file size max like 300MB.
Even if you think you dealt with this issue on upload with some file part trick, you will then need to deal with large files on download. Go can do single large files very efficiently by doing a Copy from filehandle to filehandle. You will also end up needing to support partial content (http 206) and not modified (304) if you want great performance for downloading files that you uploaded. Some browsers will ignore your pleas to not ask for partial content when things like large video is involved. So, if you don't support this, then some content will fail to download.
If you want to use some tricks to cut up files and send them in parts, then you will end up needing to use a particular Javascript library. This is going to be quite harmful to interoperability if you are going for programmatic access from any client to your Go server. But maybe you can't fix middle-boxes that impose size limits, and you really want to cut files up into chunks. You will have a lot of work to handle downloading the files that you managed to upload in chunks.
What you are trying to do is the typical code that is written with a tcp connection with most other languages, in GO you can use tcp too with net.Listen and eventually accept on the listener object. Then this should be fine.

Binary mode in FTP?

The binary mode in FTP transfers data as raw bits. This raised a doubt for me recently when I was reading about Base64 encoding. The whole point of Base64 is to convert binary to text and it makes a lot of sense. Because the binary could be interpreted as some special character / command by some media in the middle.
So does the binary mode transfer raw binary? Or does it do some kind of encoding before transmission?
Contrary to telnet or SMTP (mail) connections the binary mode FTP data channel does not interpret any octets in a special way, so no encoding or escaping is necessary.

Should I use a binary or a text file for storing protobuf messages?

Using Google protobuf, I am saving my serialized messaged data to a file - in each file there are several messages. We have both C++ and Python versions of the code, so I need to use protobuf functions that are available in both languages. I have experimented with using SerializeToArray and SerializeAsString and there seems to be the following unfortunate conditions:
SerializeToArray: As suggested in one answer, the best way to use this is to prefix each message with it's data size. This would work great for C++, but in Python it doesn't look like this is possible - am I wrong?
SerializeAsString: This generates a serialized string equivalent to it's binary counterpart - which I can save to a file, but what happens if one of the characters in the serialization result is \n - how do we find line endings, or the ending of messages for that matter?
Update:
Please allow me to rephrase slightly. As I understand it, I cannot write binary data in C++ because then our Python application cannot read the data, since it can only parse string serialized messages. Should I then instead use SerializeAsString in both C++ and Python? If yes, then is it best practice to store such data in a text file rather than a binary file? My gut feeling is binary, but as you can see this doesn't look like an option.
We have had great success base64 encoding the messages, and using a simple \n to separate messages. This will ofcoirse depend a lot on your use - we need to store the messages in "log" files. It naturally has overhead encoding/decoding this - but this has not even remotely been an issue for us.
The advantage of keeping these messages as line separated text has so far been invaluable for maintenance and debugging. Figure out how many messages are in a file ? wc -l . Find the Nth message - head ... | tail. Figure out what's wrong with a record on a remote system you need to access through 2 VPNs and a citrix solution ? copy paste the message and mail it to the programmer.
The best practice for concatenating messages in this way is to prepend each message with its size. That way you read in the size (try a 32bit int or something), then read that number of bytes into a buffer and deserialize it. Then read the next size, etc. etc.
The same goes for writing, you first write out the size of the message, then the message itself.
See Streaming Multiple Messages on the protobuf documentation for more information.
Protobuf is a binary format, so reading and writing should be done as binary, not text.
If you don't want binary format, you should consider using something other than protobuf (there are lots of textual data formats, such as XML, JSON, CSV); just using text abstractions is not enough.

when using a FTPS connection to transfer a file, what is the difference between a 'Binary mode taransfer' and 'ASCII mode transfer'?

I am using a FTPS connection to send a text file
[this file will contain EDI(Electronic Data Interchange) information]to a mailbox INOVIS.I have configured the system to open a FTPS connection and using the PUT command I write the file to a folder on the FTP server.
The problem is: what mode of file transfer should I use? How do I switch between modes?
Moreover which mode is the 'best-practice' to use when transferring file over FTPS connection.
If some one can provide me a small ftp script it would be helpful.
Many of the other answers to this question are a collection of nearly correct to outright wrong information.
ASCII mode means that the file should be converted to canonical text form on the wire. Among other things this means:
NVT-ASCII character set. Even if the original file is in some other character set, such as ASCII, EBCDIC or UTF-8. Technically this disallows characters with the 8th bit set, but most implementations won't enforce this.
CRLF line endings.
EBCDIC mode means a similar set of rules, except that the data on the wire should be in EBCDIC.
LOCAL mode allows sending data with a size other than 8 bits per byte.
IMAGE (or BINARY) mode means that the data should be send without any changes. It is up to the user to ensure that the target system can understand the data once it arrives.
Among other things, this means that the recommendation to use BINARY mode to send text data will fail if one of the systems involved doesn't use a ASCII based character set.
ASCII mode changes new line characters between unix and DOS formats. \n to \r\n and viceversa.
Actually, ASCII/BINARY has nothing to do with the 8th bit. It's a convention for translating line endings.
When you are on a windows machine talking to a Unix FTP server (FTPS or FTP - doesn't matter - the protocol is the same), the server will replace any <CR><LF>-Combination with <LF> before storing the file and consequently do the translation in reverse in case you get the file from the unix server.
The idea behind ASCII mode is to convert the line endings to the respective endings of the target platform.
As todays world seems to be converging to the unix convention (<LF>) and as nearly all of todays editors (aside of notepad) can easily handle Unix-Line-Endings, the days of ASCII mode are, indeed, numbered and I would by all means recommend to always use BINARY transfer mode.
The prospect of having data altered in mid-transfer is somewhat frightening anyways.
ASCII mode also makes file sharing of text files across different platforms more straightforward for end users. They won't have to worry about the default line ending (cr/lf versus just lf for example) since the ASCII mode will do that translation for them on the fly.
For most file types you will ALWAYS want to use BINARY mode though.
ACSII mode converts text files between UNIX and Windows formats based on the server and client platform (CR/LF vs LF), Binary doesn't. Of course, if you transfer nearly anything in ASCII mode that isn't text, it will probably be corrupted for that reason.
If you want an exact copy the data use binary mode - using ascii mode will assume the data is 7bit text (chars 0-127) and truncate any data outside of this range. Dates back to arcane 7bit networking days where ascii mode could save you time.
In a globalized environment that we live in - such that it is quite common to find non-ascii characters e.g. foreign languages, currency symbols etc. - you should always use BINARY mode.
For the FTP protocol, the ASCII transfer mode will consider the 8th bit of each of your character as insignificant and will use it for error checking. As for binary transfer mode, your data will be sent as is. Note that sending binary data in ASCII mode will (almost) always end up in data corruption. However, transferring ASCII data in binary mode will work as long as the sending and receiving systems use the 8th bit in the same way (in modern system the 8th bit should stay at 0 to prevent collision with extended ASCII charsets).

Resources