Is Binary Opcode encoding and decoding implementation specific in websockets?

Suppose I am creating a WebSocket client, and a specific WebSocket URL returns frames as 'Binary Frame (Opcode 2)'. My questions are:
1. Why would the developer want to wrap the original message inside a binary opcode frame?
2. Is retrieving the message implementation-centric? In other words, does the client have to know the same logic that was used to encode it at the server?
3. If the above is false, is there a global way to decode/parse the binary opcode frame to see the actual data that is being sent?

Handling Websocket frames
WebSocket frames basically look like this:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
Explanation
FIN
The FIN bit tells you whether this is the final frame of a message. Large WebSocket messages can be fragmented (sent via multiple frames).
RSV1-3
These are reserved bits; they must be 0 unless a negotiated extension defines them
opcode
This can be used to determine what type of frame you received
Possible options are
0 (Continuation)
This frame is a later part (not the first part) of a fragmented message
1 (Text)
Normal UTF-8 Encoded Text data
2 (Binary)
Binary data
8 (Close)
This closes the Websocket
9 (Ping)
Sent (typically by the server) to check whether the other endpoint is still reachable
10 (Pong)
The other endpoint responds with a Pong carrying the Ping's data to confirm it is reachable
MASK
The most significant bit of the 2nd byte tells you whether the payload has been masked. A server must not mask any frame it sends, and a client must mask every frame it sends!
Payload Length (This is where things can get complicated)
Take the 2nd byte and read every bit except the most significant bit (i.e. AND it with 127):
If the value is 125 or less, that's your length
If the value is 126
Your length is a big-endian uint16 in bytes 3 and 4
If the value is 127
Your length is a big-endian uint64 in bytes 3 to 10
Masking key
Only exists if the MASK bit is set
The 4 bytes after the payload length are the masking key; this key is used to unmask the payload
Payload
This is not the whole message if the FIN bit is not set
The payload can be interpreted either as text (UTF-8) or as binary (can be any data)
The payload needs to be unmasked if the MASK bit is set
Steps to decode a Websocket Frame
Get the FIN bit
AND the first byte with 128 (0x80); if the result is non-zero, the FIN bit is set
Get the OpCode (The OpCode tells you what type of frame you have)
Get the first byte and & it with 15, the result is your opcode
Get the MASK Bit
AND the second byte with 128 (0x80); if the result is non-zero, you have a masking key
Get the payload length
Take the 2nd byte and read every bit except the most significant bit (i.e. AND it with 127):
If the value is 125 or less, that's your length
If the value is 126
Your length is a big-endian uint16 in bytes 3 and 4
If the value is 127
Your length is a big-endian uint64 in bytes 3 to 10
Get the masking key
The next 4 bytes are the masking key; this key is used to unmask the payload
Payload
The payload is length bytes long and starts right after the masking key (or right after the payload length if the frame is not masked)
If the MASK bit is set, the payload needs to be unmasked
To unmask the payload, XOR every byte with the masking key byte at index (byte position modulo 4):
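// frameData: the received frame bytes; key: the 4-byte masking key;
// dataIndex: index of the first payload byte; totalLength: dataIndex + payload length
int count = 0;
for (int i = dataIndex; i < totalLength; i++)
{
    frameData[i] = (byte)(frameData[i] ^ key[count % 4]); // XOR with key byte (count mod 4)
    count++;
}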
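The steps above can be put together into a small decoder. This is a minimal Python sketch, not a library API: the name parse_frame and its return tuple are illustrative. It decodes one frame at a time and leaves reassembly of fragmented messages (FIN = 0 plus continuation frames) to the caller.

```python
import struct


def parse_frame(data: bytes):
    """Decode one WebSocket frame from the start of `data` (RFC 6455 layout).

    Returns (fin, opcode, payload, bytes_consumed). Raises ValueError if the
    buffer does not yet hold a complete frame.
    """
    if len(data) < 2:
        raise ValueError("need at least 2 bytes")
    b0, b1 = data[0], data[1]
    fin = bool(b0 & 0x80)        # most significant bit of byte 1
    opcode = b0 & 0x0F           # low 4 bits of byte 1
    masked = bool(b1 & 0x80)     # most significant bit of byte 2
    length = b1 & 0x7F           # low 7 bits of byte 2
    pos = 2
    if length == 126:            # 16-bit big-endian extended length
        if len(data) < pos + 2:
            raise ValueError("incomplete extended length")
        (length,) = struct.unpack_from("!H", data, pos)
        pos += 2
    elif length == 127:          # 64-bit big-endian extended length
        if len(data) < pos + 8:
            raise ValueError("incomplete extended length")
        (length,) = struct.unpack_from("!Q", data, pos)
        pos += 8
    key = b""
    if masked:                   # 4-byte masking key follows the length
        key = data[pos:pos + 4]
        pos += 4
    if len(data) < pos + length:  # also catches a truncated masking key
        raise ValueError("incomplete payload")
    payload = bytearray(data[pos:pos + length])
    if masked:
        for i in range(length):  # XOR each byte with key[i % 4]
            payload[i] ^= key[i % 4]
    return fin, opcode, bytes(payload), pos + length
```

For example, the masked single-frame "Hello" from RFC 6455 section 5.7 (0x81 0x85 0x37 0xfa 0x21 0x3d 0x7f 0x9f 0x4d 0x51 0x58) decodes to (True, 1, b"Hello", 11).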
Interpret the data
Text
Data needs to be interpreted as UTF-8 Text
Binary
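This data needs to be interpreted according to what data you expect; WebSocket itself does not define its format, so client and server have to agree on one. This can be achieved, for example, by adding a type byte to the start of the payload and then interpreting the rest accordingly.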
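As a sketch of the type-byte idea in Python: the type codes below (0x01 for UTF-8 text, 0x02 for a big-endian 32-bit unsigned integer) are invented for illustration, not part of any standard. The point is that the client can only decode the payload if it knows the same convention the server used.

```python
import struct

# Hypothetical application-level convention agreed between client and server:
# the first payload byte says how to read the rest of a binary (opcode 2) message.
TYPE_TEXT = 0x01    # remainder is UTF-8 text
TYPE_UINT32 = 0x02  # remainder is one big-endian unsigned 32-bit integer


def interpret_binary(payload: bytes):
    """Interpret a binary payload according to the hypothetical type byte."""
    if not payload:
        raise ValueError("empty payload")
    kind, body = payload[0], payload[1:]
    if kind == TYPE_TEXT:
        return body.decode("utf-8")
    if kind == TYPE_UINT32:
        (value,) = struct.unpack("!I", body)  # body must be exactly 4 bytes
        return value
    return body  # unknown type: hand back the raw bytes
```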

Related

hpack encoding integer significance

After reading this, https://httpwg.org/specs/rfc7541.html#integer.representation
I am confused about quite a few things, although I seem to have the overall gist of the idea.
For one, what are the 'prefixes' exactly / what is their purpose?
For two:
C.1.1. Example 1: Encoding 10 Using a 5-Bit Prefix
The value 10 is to be encoded with a 5-bit prefix.
10 is less than 31 (2^5 - 1) and is represented using the 5-bit prefix.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| X | X | X | 0 | 1 | 0 | 1 | 0 | 10 stored on 5 bits
+---+---+---+---+---+---+---+---+
What are the leading Xs? What is the starting 0 for?
>>> bin(10)
'0b1010'
>>>
Typing this into the Python IDE, you see almost the same output... Why does it differ?
This is when the number fits within the number of prefix bits though, making it seemingly simple.
C.1.2. Example 2: Encoding 1337 Using a 5-Bit Prefix
The value I=1337 is to be encoded with a 5-bit prefix.
1337 is greater than 31 (2^5 - 1).
The 5-bit prefix is filled with its max value (31).
I = 1337 - (2^5 - 1) = 1306.
I (1306) is greater than or equal to 128, so the while loop body executes:
I % 128 == 26
26 + 128 == 154
154 is encoded in 8 bits as: 10011010
I is set to 10 (1306 / 128 == 10)
I is no longer greater than or equal to 128, so the while loop terminates.
I, now 10, is encoded in 8 bits as: 00001010.
The process ends.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| X | X | X | 1 | 1 | 1 | 1 | 1 | Prefix = 31, I = 1306
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1306>=128, encode(154), I=1306/128
| 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 10<128, encode(10), done
+---+---+---+---+---+---+---+---+
The octet-like diagram shows three different numbers being produced... Since the numbers are produced throughout the loop, how do you replicate this octet-like diagram within an integer? What is the actual final result? The diagram or "I" being 10, or 00001010.
def f(a, b):
    if a < 2**b - 1:
        print(a)
    else:
        c = 2**b - 1
        remain = a - c
        print(c)
        if remain >= 128:
            while 1:
                e = remain % 128
                g = e + 128
                remain = remain / 128
                if remain >= 128:
                    continue
                else:
                    print(remain)
                    c+=int(remain)
                    print(c)
                    break
As I'm trying to figure this out, I wrote a quick Python implementation of it. It seems that I am left with a few useless variables, one being g, which in the documentation is the 26 + 128 == 154.
Lastly, where does 128 come from? I can't find any relation between the numbers besides the fact that 2 raised to the 7th power is 128, but why is that significant? Is this because the first bit is reserved as a continuation flag, and an octet contains 8 bits, so 8 - 1 = 7?

For one, What are the 'prefixes' exactly/what is their purpose?
Integers are used in a few places in HPACK messages, and often the first octet has leading bits that cannot be used for the actual integer. Those unavailable bits are represented by the Xs. For the purposes of this calculation it doesn't matter what the Xs are: they could be 000, or 111, or 010, etc. Also, there will not always be 3 Xs; that is just an example. There could be only one leading X, or two, or four, etc.
For example, to look up a previously decoded HPACK header, we use 6.1. Indexed Header Field Representation, which starts with a leading 1 followed by the table index value. That 1 is the X in the previous example, which leaves us 7 bits (instead of only 5 bits in the original example in your question). If the table index value is 126 or less, we can represent it using those 7 bits. If it's 127 or more (127 = 2^7 - 1 is the all-1s overflow marker), then we need to do some extra work (we'll come back to this).
If it's a new value we want to add to the table (to reuse in future requests), but we already have that header name in the table (so it's just a new value for that name we want as a new entry) then we use 6.2.1. Literal Header Field with Incremental Indexing. This has 2 bits at the beginning (01 - which are the Xs), and we only have 6-bits this time to represent the index of the name we want to reuse. So in this case there are two Xs.
So don't worry about there being 3 Xs - that's just an example. In the above examples there was one X (as first bit had to be 1), and two Xs (as first two bits had to be 01) respectively. The Integer Representation section is telling you how to handle any prefixed integer, whether prefixed by 1, 2, 3... etc unusable "X" bits.
What are the leading Xs? What is the starting 0 for?
The leading Xs are discussed above. The starting 0 is just because, in this example we have 5-bits to represent the integers and only need 4-bits. So we pad it with 0. If the value to encode was 20 it would be 10100. If the value was 40, we couldn't fit it in 5-bits so need to do something else.
Typing this in the python IDE, you see almost the same output... Why does it differ?
Python uses 0b to show it's a binary number. It doesn't bother showing any leading zeros. So 0b1010 is the same as 0b01010 and also the same as 0b00001010.
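A quick way to see that these are the same number is to ask Python for a fixed width instead of relying on bin():

```python
value = 10
plain = bin(value)               # '0b1010': Python omits leading zeros
prefix5 = format(value, "05b")   # '01010': padded to the 5-bit prefix width
octet = format(value, "08b")     # '00001010': padded to a full 8-bit octet
print(plain, prefix5, octet)     # 0b1010 01010 00001010
```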
This is when the number fits within the number of prefix bits though, making it seemingly simple.
Exactly. If you need more than the number of bits you have, you don't have space for it. You can't just use more bits as HPACK will not know whether you are intending to use more bits (so should look at next byte) or if it's just a straight number (so only look at this one byte). It needs a signal to know that. That signal is using all 1s.
So to encode 40 in 5 bits, we use 11111 to say "it doesn't fit, overflow to the next byte". 11111 in binary is 31, so we know the value is at least that; rather than waste those bits, we count them and subtract 31 from 40, leaving 9 to encode in the next byte. A new additional byte gives us 8 new bits to play with (well, actually only 7, as we'll soon discover, because the first bit is used to signal a further overflow). That is enough, so we use 00001001 to encode our 9. So our number is represented in two bytes: XXX11111 and 00001001.
If we want to encode a value bigger than can fit in the prefix bits, AND the leftover is bigger than 127 (too big for the 7 available bits of the second byte), then we can't use this overflow mechanism with two bytes. Instead we use an "overflow, overflow" mechanism with three bytes:
For this "overflow, overflow" mechanism, we set the first byte bits to 1s as usual for an overflow (XXX11111) and then set the first bit of the second byte to 1. This leaves 7 bits available to encode the value, plus the next 8 bits in the third byte we're going to have to use (actually only 7 bits of the third byte, because again it uses the first bit to indicate another overflow).
There are various ways they could have gone about this using the second and third bytes. What they decided to do was encode the leftover as two numbers: the remainder mod 128 and the multiplier of 128.
1337 = 31 + (128 * 10) + 26
So that means the first byte is set to 31 as per the previous example, the second byte is set to 26 (0011010 in 7 bits) plus the leading 1 to show we're using the overflow, overflow method (so 10011010, i.e. 154), and the third byte is set to 10 (or 00001010).
So 1337 is encoded in three bytes: XXX11111 10011010 00001010 (with the Xs set to whatever those bits were).
Using the 128 mod and multiplier is quite efficient: with a 5-bit prefix, any leftover up to 16,383 (so any number up to 31 + 16,383 = 16,414) fits in three bytes. That is, not coincidentally, the largest integer that can be represented in 7 + 7 = 14 bits. But it does take a bit of getting your head around!
If the leftover is bigger than 16,383, then we need another round of overflow in a similar manner.
All this seems horrendously complex but is actually relatively simple, and efficient, to code up. Computers can do this pretty easily and quickly.
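For reference, here is a minimal Python sketch of the pseudocode in RFC 7541 section 5.1 (the function names are illustrative). It also answers "what is the actual final result?": the result is the whole sequence of octets, not any single intermediate value of I.

```python
def encode_int(value: int, prefix_bits: int, first_byte_flags: int = 0) -> bytes:
    """Encode `value` as an HPACK prefixed integer (RFC 7541, section 5.1).

    `first_byte_flags` holds the 'X' bits that sit above the prefix in the
    first octet; they are passed through unchanged.
    """
    max_prefix = (1 << prefix_bits) - 1          # 2^N - 1, e.g. 31 for N = 5
    if value < max_prefix:                       # fits in the prefix
        return bytes([first_byte_flags | value])
    out = [first_byte_flags | max_prefix]        # all 1s: "keep reading"
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) + 128)          # low 7 bits + continuation flag
        value //= 128
    out.append(value)                            # last octet: flag bit clear
    return bytes(out)


def decode_int(data: bytes, prefix_bits: int):
    """Decode an HPACK prefixed integer; returns (value, octets consumed)."""
    max_prefix = (1 << prefix_bits) - 1
    value = data[0] & max_prefix                 # ignore the 'X' bits
    if value < max_prefix:
        return value, 1
    shift, i = 0, 1
    while True:
        b = data[i]
        value += (b & 0x7F) << shift             # 7 payload bits per octet
        i += 1
        if not b & 0x80:                         # continuation flag clear: done
            return value, i
        shift += 7
```

encode_int(1337, 5) gives the three octets 31, 154, 10 from example C.1.2, and encode_int(40, 5) gives 31, 9 as described above.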
It seems that i am left with a few useless variables, one being g
You are not printing this value in the if branch, only the leftover value in the else branch. You need to print both.
which in the documentation is the 26 + 128 == 154.
Lastly, where does 128 come from? I can't find any relation between the numbers besides the fact 2 raised to the 7th power is 128, but why is that significant? Is this because the first bit is reserved as a continuation flag? and an octet contains 8 bits so 8 - 1 = 7?
Exactly, it's because the first bit (value 128) needs to be set as per explanation above, to show we are continuing/overflowing into needing a third byte.

Direct mapped cache example

I am really confused about the topic of direct mapped caches. I've been looking around for an example with a good explanation and it's making me more confused than ever.
For example: I have
2048 byte memory
64 byte big cache
8 byte cache lines
With a direct mapped cache, how do I determine the 'LINE', 'TAG' and 'Byte offset'?
I believe that the total number of addressing bits is 11 bits because 2048 = 2^11
2048/64 = 2^5 = 32 blocks (0 to 31) (5 bits needed) (tag)
64/8 = 8 = 2^3 = 3 bits for the index
8 byte cache lines = 2^3, which means I need 3 bits for the byte offset
so the address would be like this: 5 for the tag, 3 for the index and 3 for the byte offset
Do I have this figured out correctly?

Did I figure it out correctly? YES
Explanation
1) Main memory size is 2048 bytes = 2^11. So you need 11 bits to address a byte (if your word size is 1 byte) [word = smallest individual unit that will be accessed with the address]
2) You can calculate the number of tag bits in direct mapping as log2(main memory size / cache size) = log2(2048 / 64) = 5. But I will explain a little more about tag bits.
Here the size of a cache line (which is always the same as the size of a main memory block) is 8 bytes, which is 2^3 bytes. So you need 3 bits to represent a byte within a cache line. Now you have 8 bits (11 - 3) remaining in the address.
Now the total number of lines present in the cache is (cache size / line size) = 2^6 / 2^3 = 2^3.
So, you have 3 bits to represent the line in which your required byte is present.
The number of remaining bits now are 5 (8 - 3).
These 5 bits can be used to represent a tag. :)
3) 3 bit for index. If you were trying to label the number of bits needed to represent a line as index. Yes you are right.
4) 3 bits will be used to access a byte within a cache line (8 = 2^3).
So,
11 bits total address length = 5 tag bits + 3 bits to represent a line + 3 bits to represent a byte (word) within a line
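The same arithmetic as a small Python sketch (function names are illustrative), assuming byte-addressable memory and power-of-two sizes:

```python
def field_widths(memory_size: int, cache_size: int, line_size: int):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache.

    All sizes are in bytes and must be powers of two.
    """
    address_bits = memory_size.bit_length() - 1               # log2(2048) = 11
    offset_bits = line_size.bit_length() - 1                  # log2(8) = 3
    index_bits = (cache_size // line_size).bit_length() - 1   # log2(64 / 8) = 3
    tag_bits = address_bits - index_bits - offset_bits        # 11 - 3 - 3 = 5
    return tag_bits, index_bits, offset_bits


def split_address(addr: int, index_bits: int, offset_bits: int):
    """Split an address into (tag, line index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

For example, address 1437 (10110 011 101 in binary) has tag 22, line 3 and byte offset 5.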
Hope there is no confusion now.

Questions about websocket framing

According to the RFC 6455 specification about websocket's.
Data frame structure is follows:
frame-fin ; 1 bit in length
frame-rsv1 ; 1 bit in length
frame-rsv2 ; 1 bit in length
frame-rsv3 ; 1 bit in length
frame-opcode ; 4 bits in length
frame-masked ; 1 bit in length
frame-payload-length ; either 7, 7+16,
; or 7+64 bits in
; length
[ frame-masking-key ] ; 32 bits in length
frame-payload-data ; n*8 bits in
; length, where
; n >= 0
So the minimum length of a byte array to hold a frame would be 224 bytes (56 bits)? As I read on the internet, to represent a bit in a byte array we need 4 bytes (1000).
How do I mask data? And what data should I mask? Only frame-payload-data or all the frame except the mask key?

The frame-masking-key field is only present when the frame is masked, which is only done for frames sent by a client to a server. And the frame-payload-data is optional; a frame may be empty, containing no data. Therefore the minimum length of a frame in the client-to-server direction is (1+1+1+1+4+1+7+32)=48 bits or 6 bytes, and the minimum length of a frame in the server-to-client direction is (1+1+1+1+4+1+7)=16 bits or 2 bytes.
Those would be frames that carry no payload. Obviously frames that carry payload data will require additional space.
As I read on internet to represent a bit in byte array we need 4 bytes
(1000).
Umm, no, each byte holds 8 bits. It might be convenient within a program to use larger data units to represent bit values, but that is completely independent of the format that is used in the actual frame.
How do I mask data? And what data should I mask? Only frame-payload-data
or all the frame except the mask key?
You mask by XOR-ing the frame-masking-key over the frame-payload-data. This is described in section 5.3 of RFC 6455.
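To make this concrete, here is a minimal Python sketch (names are illustrative) that builds a single unfragmented frame. Only the payload is XOR-ed with the key, and an empty frame comes out at 2 bytes unmasked and 6 bytes masked, matching the minimum sizes above.

```python
import os
import struct


def mask_payload(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with key[i % 4] (RFC 6455, section 5.3).

    Applying the same key twice restores the original, so this also unmasks.
    """
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def build_frame(payload: bytes, opcode: int = 0x1, masked: bool = True, key=None) -> bytes:
    """Build a single unfragmented frame (FIN = 1, RSV bits 0)."""
    header = bytearray([0x80 | opcode])           # FIN set, RSV1-3 clear
    mask_bit = 0x80 if masked else 0x00
    n = len(payload)
    if n <= 125:
        header.append(mask_bit | n)
    elif n <= 0xFFFF:
        header.append(mask_bit | 126)
        header += struct.pack("!H", n)            # 16-bit extended length
    else:
        header.append(mask_bit | 127)
        header += struct.pack("!Q", n)            # 64-bit extended length
    if not masked:
        return bytes(header) + payload
    if key is None:
        key = os.urandom(4)                       # fresh random key per frame
    return bytes(header) + key + mask_payload(payload, key)
```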

Character length to be expected in Laravel 5 Crypt function [duplicate]

This question already has answers here:
Laravel AES-256 Encryption & MySQL
(2 answers)
Closed 2 years ago.
Just a quick question: if I'm using the Laravel 5 Crypt::encrypt() function and I would like to save the result into a database, how many characters should I expect? Does the character length depend on the length of my message, or is it a fixed length?
Currently I am using varchar 255 in my database and from time to time there are missing characters here and there thus causing problems during decryption.
Thank You

From the official Laravel documentation:
Laravel provides facilities for strong AES encryption via the Mcrypt
PHP extension.
From official PHP documentation using mcrypt_generic.
If you want to store the encrypted data in a database make sure to
store the entire string as returned by mcrypt_generic, or the string
will not entirely decrypt properly. If your original string is 10
characters long and the block size is 8 (use
mcrypt_enc_get_block_size() to determine the blocksize), you would
need at least 16 characters in your database field. Note the string
returned by mdecrypt_generic() will be 16 characters as well...use
rtrim($str, "\0") to remove the padding.
More here
So I guess the correct answer is that the number of characters generated by the encrypt function depends on the size of the text you are passing through the encrypt function.
Assuming you are using MySQL, why don't you just use a TEXT column if you are passing a lot of information?
More info about MySQL field types here

The answer is difficult to define because it does depend on your input size. But even a fixed input size yields different size output.
I created a simple script to test real-world sizes for different string lengths.
Here is the GitHub gist
Here's sample output:
Testing Laravel Crypt::encrypt() result length
Number of passes: 1000000
Minimum input length: 1
Maximum input length: 32
Input length: 1 - Output length 188 - 200
Input length: 2 - Output length 188 - 200
Input length: 3 - Output length 188 - 200
Input length: 4 - Output length 188 - 200
Input length: 5 - Output length 188 - 200
Input length: 6 - Output length 188 - 200
Input length: 7 - Output length 188 - 200
Input length: 8 - Output length 188 - 200
Input length: 9 - Output length 216 - 228
Input length: 10 - Output length 216 - 228
Input length: 11 - Output length 216 - 228
Input length: 12 - Output length 216 - 228
Input length: 13 - Output length 216 - 228
Input length: 14 - Output length 216 - 228
Input length: 15 - Output length 216 - 228
Input length: 16 - Output length 216 - 228
Input length: 17 - Output length 216 - 228
Input length: 18 - Output length 216 - 228
Input length: 19 - Output length 216 - 228
Input length: 20 - Output length 216 - 228
Input length: 21 - Output length 216 - 228
Input length: 22 - Output length 216 - 228
Input length: 23 - Output length 216 - 228
Input length: 24 - Output length 244 - 256
Input length: 25 - Output length 244 - 256
Input length: 26 - Output length 244 - 256
Input length: 27 - Output length 244 - 256
Input length: 28 - Output length 244 - 256
Input length: 29 - Output length 244 - 256
Input length: 30 - Output length 244 - 256
Input length: 31 - Output length 244 - 256
Input length: 32 - Output length 244 - 256
Note - if you're running this yourself, you'll need to set it to around 1 million passes per string length to get the actual hard min and max limits. 500,000 wasn't enough in my testing. Also, the get_random_input function only outputs a maximum 32 character string, so it would have to be modified to test longer strings.

The output DOES depend on the size of the input so it is safer to use a TEXT datatype for your column instead of a VARCHAR.
To test it take the largest possible string in your db column and run it through the encrypt() function to see how large the resulting string is. Note that if you are enforcing a length limit on raw text (before encryption) then you may get away with using VARCHAR.

What is the best way of sending the data to serial port?

This is related with microcontrollers but thought to post it here because it is a problem with algorithms and data types and not with any hardware stuff. I'll explain the problem so that someone that doesn't have any hardware knowledge can still participate :)
In the microcontroller there is an analog-to-digital converter with 10-bit resolution (it will output a value between 0 and 1023).
I need to send this value to the PC using the serial port.
But you can only write 8 bits at once (you need to write bytes). It is a limitation of the microcontroller.
So in the above case I need to send at least 2 bytes.
My PC application just reads a sequence of numbers for plotting, so it should capture two consecutive bytes and build the number back. But here we will need a delimiter character as well, and since the delimiter character still has an ASCII value between 0 and 255, it will mix up the process.
So what is a simplest way to do this? Should I send the values as a sequence of chars?
Ex: 1023 = "1""0""2""3" vs "Char(255)Char(3)"
In summary I need to send a sequence of 10 bit numbers over Serial in fastest way. :)

You need to send 10 bits, and because you send a byte at a time, you have to send 16 bits. The big question is how much is speed a priority, and how synchronised are the sender and receiver? I can think of 3 answers, depending on these conditions.
Regular sampling, unknown join point
If the device is running all the time and you aren't sure when you are going to connect (you could join at any point in the sequence), but the sampling rate is slower than the communication speed so you don't care about size, I think I'd probably do it as follows. Suppose you are trying to send the ten bits abcdefghij (each letter one bit).
I'd send pq0abcde then pq1fghij, where p and q are error checking bits. This way:
no delimiter is needed (you can tell which byte you are reading by the 0 or 1)
you can definitely spot any 1 bit error, so you know about bad data
I'm struggling to find a good two bit error correcting code, so I guess I'd just make p a parity bit for bits 2,3 and 4 (0, a b above) and q a parity bit for 5 6 and 7 (c,d,e above). This might be clearer with an example.
Suppose I want to send 714 = 1011001010.
Split in 2 10110 , 01010
Add bits to indicate first and second byte 010110, 101010
calculate parity for each half: p0=par(010)=1, q0=par(110)=0, p1=par(101)=0, q1=par(010)=1
bytes are then 10010110, 01101010
You then can detect a lot of different error conditions, quickly check which byte you are being sent if you lose synchronisation, and none of the operations take very long in a microcontroller (I'd do the parity with an 8 entry lookup table).
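The worked example translates to the following Python sketch (function names are illustrative). p covers the flag bit and the next two data bits, q covers the last three data bits, as described above; bin().count("1") stands in for the 8-entry lookup table you would use on the microcontroller. It reproduces the two bytes for 714 and rejects any single flipped bit.

```python
def parity(bits: int) -> int:
    """Parity bit: 1 if `bits` contains an odd number of 1s."""
    return bin(bits).count("1") & 1


def encode_sample(value: int) -> bytes:
    """Encode a 10-bit value as the two bytes pq0abcde, pq1fghij."""
    out = []
    for flag, half in ((0, (value >> 5) & 0x1F), (1, value & 0x1F)):
        six = (flag << 5) | half                 # 0abcde or 1fghij
        p = parity(six >> 3)                     # covers flag and next 2 bits
        q = parity(six & 0x7)                    # covers the last 3 bits
        out.append((p << 7) | (q << 6) | six)
    return bytes(out)


def check_byte(byte: int):
    """Return (flag, 5 data bits), or None if a parity check fails."""
    six = byte & 0x3F
    if (byte >> 7) != parity(six >> 3) or ((byte >> 6) & 1) != parity(six & 0x7):
        return None
    return six >> 5, six & 0x1F


def decode_sample(first: int, second: int):
    """Rebuild the 10-bit value, or None on a parity or ordering error."""
    a, b = check_byte(first), check_byte(second)
    if a is None or b is None or a[0] != 0 or b[0] != 1:
        return None
    return (a[1] << 5) | b[1]
```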
Dense data, known join point
If you know that the reader starts at the same time as the writer, just send four ten-bit values as 5 bytes. If you always read 5 bytes at a time then there are no problems. If you want even more space saving, and already have good sample data, I'd compress using Huffman coding.
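Packing four 10-bit values into 5 bytes (40 bits) is just bit concatenation. A minimal Python sketch (names are illustrative), assuming the reader stays aligned on 5-byte boundaries:

```python
def pack4(values):
    """Pack four 10-bit values into 5 bytes, most significant bits first."""
    if len(values) != 4 or not all(0 <= v < 1024 for v in values):
        raise ValueError("need four values in 0..1023")
    acc = 0
    for v in values:
        acc = (acc << 10) | v                    # 40-bit accumulator
    return acc.to_bytes(5, "big")


def unpack4(data: bytes):
    """Inverse of pack4: split 5 bytes back into four 10-bit values."""
    acc = int.from_bytes(data, "big")
    return [(acc >> shift) & 0x3FF for shift in (30, 20, 10, 0)]
```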
Dense data, unknown join point
In 7 bytes you can send 5 ten bit values with 6 spare bits. Send 5 values like this:
byte 0: 0 (7 bits)
byte 1: 1 (7 bits)
byte 2: 1 (7 bits)
byte 3: 1 (7 bits)
byte 4: 0 (7 bits)
byte 5: 0 (7 bits)
byte 6: (8 bits)
Then whenever you see 3 1's in a row for the most significant bit, you know you have bytes 1, 2 and 3. This idea wastes 1 bit in 56, so could be made even more efficient, but you'd have to send more data at a time. Eg (5 consecutive ones, 120 bits sent in 16 bytes):
byte 0: 0 (7 bits) 7
byte 1: 1 (7 bits) 14
byte 2: 1 (7 bits) 21
byte 3: 1 (7 bits) 28
byte 4: 1 (7 bits) 35
byte 5: 1 (7 bits) 42
byte 6: 0 (7 bits) 49
byte 7: (8 bits) 57
byte 8: (8 bits) 65
byte 9: (8 bits) 73
byte 10: (8 bits) 81
byte 11: 0 (7 bits) 88
byte 12: (8 bits) 96
byte 13: (8 bits) 104
byte 14: (8 bits) 112
byte 15: (8 bits) 120
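The 7-byte layout can be sketched in Python as follows (names are illustrative). The receiver resynchronises by looking for three consecutive bytes whose most significant bit is 1, which can only be bytes 1 to 3 of a frame, so the frame starts one byte earlier.

```python
MARKERS = (0, 1, 1, 1, 0, 0)  # MSBs of bytes 0-5; byte 6 uses all 8 bits


def pack5(values):
    """Pack five 10-bit values into 7 bytes using the MSB marker pattern."""
    if len(values) != 5 or not all(0 <= v < 1024 for v in values):
        raise ValueError("need five values in 0..1023")
    acc = 0
    for v in values:
        acc = (acc << 10) | v                    # 50 data bits
    out = []
    for i, marker in enumerate(MARKERS):
        shift = 50 - 7 * (i + 1)                 # next 7 bits, MSB first
        out.append((marker << 7) | ((acc >> shift) & 0x7F))
    out.append(acc & 0xFF)                       # final 8 data bits
    return bytes(out)


def unpack5(frame: bytes):
    """Inverse of pack5 for an aligned 7-byte frame."""
    acc = 0
    for b in frame[:6]:
        acc = (acc << 7) | (b & 0x7F)            # strip the marker bits
    acc = (acc << 8) | frame[6]
    return [(acc >> (10 * k)) & 0x3FF for k in (4, 3, 2, 1, 0)]


def find_frame_start(stream: bytes):
    """Return the offset of the first fully visible frame, or None."""
    for i in range(1, len(stream) - 2):
        if all(stream[j] & 0x80 for j in (i, i + 1, i + 2)):
            return i - 1
    return None
```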
This is quite a fun problem!

The best method is to convert the data to an ASCII string and send it that way - it makes debugging a lot easier and it avoids various communication issues (special meaning of certain control characters etc).
If you really need to use all the available bandwidth, though, then you can pack four 10-bit values into five consecutive 8-bit bytes. You will need to be careful about synchronization.

Since you specified "the fastest way" I think expanding the numbers to ASCII is ruled out.
In my opinion a good compromise of code simplicity and performance can be obtained by the following encoding:
Two 10-bit values will be encoded in 3 bytes like this.
first 10bit value bits := abcdefghij
second 10bit value bits := klmnopqrst
Bytes to encode:
1abcdefg
0hijklmn
0_opqrst
There is one bit more (_) available that could be used for a parity over all 20bits for error checking or just set to a fixed value.
Some example code (puts 0 at the position _):
#include <assert.h>
#include <inttypes.h>
void
write_byte(uint8_t byte); /* writes byte to serial */
/* a = abcdefghij, b = klmnopqrst (10-bit values) */
void
encode(uint16_t a, uint16_t b)
{
    write_byte(((a >> 3) & 0x7f) | 0x80);           /* 1abcdefg */
    write_byte(((a & 7) << 4) | ((b >> 6) & 0x0f)); /* 0hijklmn */
    write_byte(b & 0x3f);                           /* 00opqrst */
}
uint8_t
read_byte(void); /* read a byte from serial */
void
decode(uint16_t *a, uint16_t *b)
{
    uint16_t x;
    while (((x = read_byte()) & 0x80) == 0) {} /* sync on the byte with the high bit set */
    *a = (x & 0x7f) << 3;     /* abcdefg, marker bit dropped */
    x = read_byte();
    assert ((x & 0x80) == 0); /* put better error handling here */
    *a |= (x >> 4) & 7;       /* hij */
    *b = (x & 0x0f) << 6;     /* klmn */
    x = read_byte();
    assert ((x & 0xc0) == 0); /* put better error handling here */
    *b |= x;                  /* opqrst */
}

I normally use a start byte and a checksum, and in this case a fixed length, so send 4 bytes. The receiver can look for the start byte, and if the next three add up to a known quantity then it is a good packet: take out the middle two bytes. If not, keep looking. The receiver can always re-sync, and it doesn't waste the bandwidth of ASCII. ASCII is your other option: a start byte that is not a number and perhaps four digits for decimal. Decimal is definitely not fun in a microcontroller, so use a non-hex start character like X, followed by three bytes with the hex ASCII digits of your number. Search for the X, examine the next three bytes, and hope for the best.
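The start-byte-plus-checksum approach can be sketched in Python like this. The start byte 0xAA and the target sum 0x00 are arbitrary choices for illustration, not values from the answer. A data byte can still happen to look like a start byte, so the checksum only makes false matches unlikely, not impossible.

```python
START = 0xAA     # hypothetical start byte
TARGET = 0x00    # hypothetical "known quantity" the last three bytes sum to (mod 256)


def make_packet(value: int) -> bytes:
    """Build a 4-byte packet: start byte, high byte, low byte, checksum."""
    hi, lo = (value >> 8) & 0xFF, value & 0xFF
    chk = (TARGET - hi - lo) & 0xFF              # makes hi + lo + chk == TARGET (mod 256)
    return bytes([START, hi, lo, chk])


def scan_packets(stream: bytes):
    """Yield the value of every valid packet, resyncing on bad data."""
    i = 0
    while i + 4 <= len(stream):
        if stream[i] == START and (sum(stream[i + 1:i + 4]) & 0xFF) == TARGET:
            yield (stream[i + 1] << 8) | stream[i + 2]
            i += 4                               # good packet: skip past it
        else:
            i += 1                               # keep looking for a start byte
```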
