I have a stream of data that I'm attempting to encode with UUencode in order to pass the data on to an external chip. The chip accepts 512 bytes of raw data at once, so I encode 512 bytes with UUencode.
As far as I understand, the data should be converted into 11 lines of 45 bytes each (60 bytes per line after encoding) and 1 remaining line of 17 bytes.
Obviously the 17 bytes can't map directly to uuencoded segments, as 17 isn't a multiple of 3, yet when I get the uuencoded data back, the final line returns 24 encoded bytes (or 18 raw bytes).
This means that I now have 513 bytes of data in total. My question is: is this a fault in my UUencode algorithm (although from a purely mathematical perspective I can't see how it can be), or alternatively, where does the extra byte come from, and how do I get rid of it again?
UUEncoding 512 bytes will get you 684 encoded bytes (not 513): 512 bytes is 171 three-byte groups (the last one padded), and 171 × 4 = 684. An input data stream of 384 bytes will encode to exactly 512 bytes, since 384 / 3 × 4 = 512.
UUEncoding is simply a means to transform a 3-byte binary input segment into a 4-byte text output segment. Any input segment that is not 3 bytes long is padded with null bytes until it is. The UUEncoding algorithm itself has no representation for the original data length.
Contrast this with UUEncoded files, which format the data stream by breaking it into lines of a specific length and adding a length indicator to the front of each encoded line. In your example, your final 17 bytes would be encoded to 24 bytes, but this line of data would be preceded by a byte that gives the length of the line as 17 instead of 18.
The only way to get rid of the padding is to know it is there in the first place by encoding the length of the data.
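As a minimal sketch of the file-style framing, Python's binascii.b2a_uu produces exactly this layout: a length byte (value + 32), the 4-for-3 encoded data, and a newline. Encoding a 17-byte chunk shows the length byte recording 17 even though 24 encoded characters follow:

    import binascii

    chunk = bytes(range(17))        # stand-in for the final 17-byte chunk
    line = binascii.b2a_uu(chunk)   # one uuencoded line

    print(line[0] - 32)             # 17: the length byte records the raw length
    print(len(line) - 2)            # 24: encoded chars, excluding length byte and newline

A decoder reads the length byte first and simply discards the padding byte, which is how the extra byte disappears again.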
Situation:
I have a string that can be at most 240,000 characters long. I store this string in AWS DynamoDB. Since DynamoDB uses UTF-8 encoding and each String can be at most 400 KB, in the worst case my string would be 240,000 chars × 4 bytes/char = 960,000 bytes, which exceeds the limit of 409,600 bytes.
I did some testing on compressing strings. Based on the results, it seems that strings of 1-byte UTF-8 characters can be compressed by about 25% using gzip, and strings of 2-3-byte UTF-8 characters by about 60%.
I used this link to compress strings to binary: http://www.txtwizard.net/compression
Question:
How much can I compress using gzip? What are the worst and average cases? (How much can it compress strings of 4-byte UTF-8 characters? 1-byte UTF-8 characters?)
If I have a 240,000-character string, can I always compress it to less than 400,000 bytes?
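For what it's worth, rather than going through a web tool you can measure ratios locally with Python's gzip module. A quick sketch (the sample strings are arbitrary placeholders; such repetitive text compresses far better than typical data):

    import gzip

    def gzip_size(s: str) -> int:
        # Byte length of the gzip-compressed UTF-8 encoding of s.
        return len(gzip.compress(s.encode("utf-8")))

    samples = {
        "1-byte chars": "hello world " * 20_000,
        "3-byte chars": "\u4f60\u597d\u4e16\u754c" * 60_000,
    }
    for name, s in samples.items():
        raw = len(s.encode("utf-8"))
        print(f"{name}: {raw} -> {gzip_size(s)} bytes ({gzip_size(s) / raw:.1%})")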
I searched on Google, but all I could find is that the data length is variable; the maximum size is not given. I could not find it in IEEE 1609.3 either. Please help me out.
WSM messages have variable-length payloads. The theoretical maximum is 4096 bytes, because the length field is only 12 bits (4 of the 16 bits in the 2-byte field are reserved).
However, the recommended maximum length of a WSM, including data, is 1400 bytes, as specified in Annex B of the standard.
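A minimal sketch of pulling the 12-bit length out of the 2-byte field, assuming the reserved bits are the upper 4 of the first byte (check the exact bit layout against the standard):

    def wsm_payload_length(field: bytes) -> int:
        # Mask off the 4 reserved bits, then combine the remaining
        # 12 bits into the payload length (assumed layout).
        hi, lo = field
        return ((hi & 0x0F) << 8) | lo

    assert wsm_payload_length(b"\x0f\xff") == 4095  # largest 12-bit value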
I am compressing 8-bit bytes, and the algorithm works only if the number of unique byte values found in the data is 128 or less.
I take all the unique bytes. At the start I store a table containing each unique byte once. If there are 120 of them, I store 120 bytes.
Then, instead of storing each item in 8 bits, I store each item in 7 bits, one after another. Those 7 bits contain the item's position in the table.
Question: how can I avoid storing those 120 bytes at the start, by storing the possible tables in my code?
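For concreteness, a minimal sketch of the 7-bit packing step described above (a hypothetical helper; it assumes each item has already been replaced by its position in the table):

    def pack7(positions: list[int]) -> bytes:
        # Pack a sequence of 7-bit table positions into a byte stream.
        acc = nbits = 0
        out = bytearray()
        for pos in positions:
            acc = (acc << 7) | pos
            nbits += 7
            while nbits >= 8:
                nbits -= 8
                out.append((acc >> nbits) & 0xFF)
        if nbits:                        # flush the final partial byte
            out.append((acc << (8 - nbits)) & 0xFF)
        return bytes(out)

    assert len(pack7([0] * 8)) == 7      # 8 items x 7 bits = 56 bits = 7 bytes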
What you are trying to do is a special case of Huffman coding where you only consider the unique bytes, not their frequencies, and therefore give each byte a fixed-length code. You can do better: use the frequencies to give the bytes variable-length codes with Huffman coding and get more compression.
But if you intend to use the same algorithm, consider this:
Don't store the 120 bytes; store 256 bits (32 bytes) where a 1 indicates that the corresponding byte value is present. That bitmap gives you all the information you need: you use the set bits to recover which values occur in the file and reconstruct the mapping table on the decoding side.
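A minimal sketch of that idea, assuming both sides agree to order the table by ascending byte value:

    def make_bitmap(data: bytes) -> bytes:
        # Pack the set of byte values present in data into 256 bits (32 bytes).
        bits = 0
        for b in set(data):
            bits |= 1 << b
        return bits.to_bytes(32, "big")

    def table_from_bitmap(bitmap: bytes) -> list[int]:
        # Recover the sorted list of byte values from the 32-byte bitmap.
        bits = int.from_bytes(bitmap, "big")
        return [v for v in range(256) if bits >> v & 1]

    data = b"some sample data"
    assert table_from_bitmap(make_bitmap(data)) == sorted(set(data))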
I don't know your exact algorithm, but the point of the compression scheme is probably that you cannot. It has to store those values so it can write a shortcut for all the other bytes in the data.
There is one way you could avoid writing those 120 bytes: when you know the contents of those bytes beforehand, for example when you know that whatever you are going to send will only contain those bytes. Then you can simply make the table known on both sides and store everything but those 120 bytes.
It's clear how the algorithm handles plain text: the characters' byte values fill the state matrix.
But what about AES encryption of binary files?
How does the algorithm manage files larger than 16 bytes, given that the state is standardized to be 4×4 bytes?
The AES primitive is the basis of constructions that allow encryption/decryption of arbitrary binary streams.
AES-128 takes a 128-bit key and a 128-bit data block and "encrypts" or "decrypts" this block. 128 bits is 16 bytes. Those 16 bytes can be text (e.g. ASCII, one character per byte) or binary data.
A naive implementation would just break a file longer than 16 bytes into groups of 16 bytes and encrypt each of these with the same key. You might also need to "pad" the file to make it a multiple of 16 bytes. The problem with that is that it exposes information about the file, because every time you encrypt the same block with the same key you get the same ciphertext.
There are different ways to build on the AES function to encrypt/decrypt more than 16 bytes securely. For example, you can use CBC or counter (CTR) mode.
Counter mode is a little easier to explain, so let's look at that. If AES_e(k, b) encrypts block b with key k, we do not want to reuse the same key to encrypt the same block more than once. So the construction we'll use is something like this:
Calculate AES_e(k, 0), AES_e(k, 1), …, AES_e(k, n)
Now we can take arbitrary input, break it into 16-byte blocks, and XOR with this sequence. Since the attacker does not know the key, they cannot regenerate this sequence and decode our (longer) message. The XOR is applied bit by bit between the blocks generated above and the cleartext. The receiving side can generate the same sequence, XOR it with the ciphertext, and retrieve the cleartext.
In an application you also want to combine this with some sort of authentication mechanism, so you use something like AES-GCM or AES-CCM.
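Here is a minimal sketch of that construction using the pyca/cryptography package, with single-block ECB standing in for the raw AES_e(k, n) primitive. This demonstrates the idea only; real code should use the library's built-in CTR or GCM modes with a random nonce, and a counter must never be reused under the same key:

    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def ctr_xor(key: bytes, data: bytes) -> bytes:
        # Encrypt a running counter with the AES block primitive and
        # XOR the resulting keystream into the data, block by block.
        enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
        out = bytearray()
        for i in range(0, len(data), 16):
            keystream = enc.update((i // 16).to_bytes(16, "big"))  # AES_e(k, n)
            out.extend(b ^ s for b, s in zip(data[i:i + 16], keystream))
        return bytes(out)

    key = bytes(16)                      # demo key; use a random key in practice
    msg = b"arbitrary-length message, no padding needed"
    ct = ctr_xor(key, msg)
    assert ctr_xor(key, ct) == msg       # XOR with the same keystream decrypts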
Imagine you have 17 bytes of plain text. The state matrix will be filled with the first 16 bytes and that block will be encrypted. The next block consists of the 1 byte that is left, and the state matrix will be padded in order to fill the 16 bytes AES needs.
It works just as well with binary files because AES always operates on bytes. It does not matter whether a chunk is ASCII text or anything else; everything in a computer is binary/bytes/bits in the end. As long as the data is a byte stream (chunks of information in bytes), it will work fine.
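The answer above doesn't name a padding scheme; PKCS#7 is the usual choice. A minimal sketch of how the 17-byte example gets padded out to two full blocks:

    def pkcs7_pad(data: bytes, block: int = 16) -> bytes:
        # Append n copies of the byte n so the length becomes a multiple
        # of the block size; 17 bytes gain 15 bytes of value 0x0F.
        n = block - len(data) % block
        return data + bytes([n]) * n

    def pkcs7_unpad(padded: bytes) -> bytes:
        return padded[:-padded[-1]]      # the last byte says how much to strip

    pt = bytes(17)
    assert len(pkcs7_pad(pt)) == 32      # two full 16-byte AES blocks
    assert pkcs7_unpad(pkcs7_pad(pt)) == pt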
A packed array in PostScript is supposed to be a space-saving feature, where objects can be squeezed tightly in memory by omitting extraneous information. A null can be just a single byte because it carries no information. Booleans could be a single byte, too. Integers could be 5 (or, for small numbers, 3) bytes. Reference objects would need the full 8 bytes that a normal object does. But the PostScript manual says that packed objects occupy 1-9 bytes!
When the PostScript language scanner encounters a procedure delimited by { … }, it creates either an array or a packed array, according to the current packing mode (see the description of the setpacking operator in Chapter 8). An array value occupies 8 bytes per element. A packed array value occupies 1 to 9 bytes per element, depending on each element's type and value; a typical average is 2.5 bytes per element. --PLRM 3ed, B.2. Virtual Memory Use, p. 742
So what object gets bigger when packed? And why? Hydrogen bonding??!
Any value that you can't represent in 7 bytes or less will need 9 bytes.
The packed format starts with a byte that contains how many data bytes follow, so any value that needs all 8 bytes of data will be 9 bytes including the leading length byte.
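As a hypothetical illustration of that scheme (a length byte followed by a minimal big-endian payload; not the actual PostScript interpreter format):

    def pack_value(n: int) -> bytes:
        # One leading length byte, then only as many data bytes as the
        # value needs; a value needing all 8 data bytes costs 9 total.
        data = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
        return bytes([len(data)]) + data

    assert len(pack_value(0)) == 2       # length byte + 1 data byte
    assert len(pack_value(2**63)) == 9   # length byte + 8 data bytes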