LZMA compression settings details - lzma

I really need to know what each LZMA parameter (mf, fb, lp, ...) means. I could not find any good documentation on the internet, and I need details of this algorithm. The most detailed page I have found is:
http://www.bugaco.com/7zip/MANUAL/switches/method.htm
I would appreciate any help.
Best wishes,
Shadi.

According to Wikipedia, no complete natural-language specification of the compressed format seems to exist. However, the configuration settings are documented.
During my work with the LZMA SDK I discovered the following compression settings in the CLzmaEncProps and CLzma2EncProps structure types:
LZMA Options:
level
Description: The compression level.
Range: [0;9].
Default: 5.
dictSize
Description: The dictionary size.
Range: [1<<12;1<<27] for 32-bit version or [1<<12;1<<30] for 64-bit version.
Default: 1<<24.
lc
Description: The number of high bits of the previous byte to use as a context for literal encoding.
Range [0;8].
Default: 3
Sometimes lc = 4 gives a gain for big files.
lp
Description: The number of low bits of the dictionary position to include in literal_pos_state.
Range: [0;4].
Default: 0.
It is intended for periodic data whose period is equal to 2^value (where lp = value). For example, for 32-bit (4-byte) periodic data you can use lp = 2. It is often better to set lc = 0 if you change lp (see the sketch after the LZMA2 options below).
pb
Description: pb is the number of low bits of the dictionary position to include in pos_state.
Range: [0;4].
Default: 2.
It is intended for periodic data whose period is equal to 2^value (where pb = value).
algo
Description: Sets compression mode.
Options: 0 = fast, 1 = normal.
Default: 1.
fb
Description: Sets the number of fast bytes for the LZMA encoder.
Range: [5;273].
Default: 32 (the encoder switches to 64 when level >= 7).
Usually, a big number gives a little bit better compression ratio and a slower compression process. A large fast bytes parameter can significantly increase the compression ratio for files which contain long identical sequences of bytes.
btMode
Description: Sets Match Finder for LZMA.
Options: 0 = hashChain mode, 1 = binTree mode.
Default: 1.
Default method is bt4. Algorithms from the hc* group don't provide as good a compression ratio, but they often work quite fast in combination with fast mode.
numHashBytes
Description: Number of hash bytes. See the mf={MF_ID} section of the 7-Zip method switch documentation (linked below) for details.
Options: 2, 3 or 4.
Default: 4.
mc
Description: Sets the number of cycles (passes) for the match finder.
Range: [1;1<<30].
Default: 32.
If you specify mc = 0, LZMA will use the default value. Usually, a bigger number gives a slightly better compression ratio and a slower compression process. For example, mf=HC4 with mc=10000 can provide almost the same compression ratio as mf=BT4.
writeEndMark
Description: Option for writing or not writing the end mark.
Options: 0 - do not write EOPM, 1 - write EOPM.
Default: 0.
numThreads
Description: Number of threads.
Options: 1 or 2
Default: 2
LZMA2 Options:
LZMA2 is a modified version of LZMA. It provides the following advantages over LZMA:
Better compression ratio for data that can't be compressed: LZMA2 can store such blocks of data in uncompressed form, and it also decompresses such data faster.
Better multithreading support: if you compress a big file, LZMA2 can split the file into chunks and compress those chunks in multiple threads.
Note: LZMA2 also supports all LZMA parameters, but lp + lc cannot be larger than 4.
blockSize
Description: Sets chunk size.
Default: dictSize * 4.
numBlockThreads
Description: Sets the number of threads per chunk (block).
numTotalThreads
Description: The maximum number of threads LZMA2 can use.
Note: LZMA2 uses 1 thread per chunk in x1 and x3 modes, and 2 threads per chunk in x5, x7 and x9 modes. If LZMA2 is set to use only the number of threads required for one chunk, it doesn't split the stream into chunks. So you can get different compression ratios for different numbers of threads.
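For a concrete feel for how these options fit together, here is a small sketch using Python's built-in lzma module (liblzma), which exposes most of the same settings through its filter-chain options: dict_size, lc, lp, pb, mode (algo), nice_len (fb), mf and depth (mc); the threading and end-mark options are not exposed there. The particular values below are only illustrative, not recommendations:
import lzma

filters = [{
    "id": lzma.FILTER_LZMA2,
    "dict_size": 1 << 24,       # dictSize: 16 MiB dictionary
    "lc": 0, "lp": 2, "pb": 2,  # e.g. for 4-byte periodic data; lc + lp must stay <= 4 for LZMA2
    "mode": lzma.MODE_NORMAL,   # algo: fast vs normal mode
    "nice_len": 64,             # fb: fast bytes
    "mf": lzma.MF_BT4,          # btMode + numHashBytes: binary-tree match finder, 4 hash bytes
    "depth": 0,                 # mc: 0 means "let the encoder choose"
}]
data = b"example data " * 1000
compressed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
print(len(data), "->", len(compressed))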
I think that to get more information on this subject you have to study LZMA in more depth. There are very few examples on the internet about it, and the documentation is quite incomplete.
More Info Here:
http://sevenzip.sourceforge.jp/chm/cmdline/switches/method.htm
http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm
http://linux.die.net/man/1/lzma

Related

What's the most compression that we can hope for on a file that contains 1000 bits (Huffman algorithm)?

How much can a file that contains 1000 bits, where 1 appears with 10% probability and 0 with 90% probability, be compressed with a Huffman code?
Maybe a factor of two.
But only if you do not include the overhead of sending the description of the Huffman code along with the data. For 1000 bits, that overhead will dominate the problem, and determine your maximum compression ratio. I find that for that small of a sample, 125 bytes, general-purpose compressors get it down to only around 100 to 120 bytes, due to the overhead.
A custom Huffman code just for this on bytes from such a stream gives a factor of 2.10, assuming the other side already knows the code. The best you could hope for is the entropy, e.g. with an arithmetic code, which gives 2.13.
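The 2.13 figure is just the inverse of the per-bit entropy; a quick check in Python:
from math import log2

p = 0.10                                     # probability of a 1 bit
h = -(p * log2(p) + (1 - p) * log2(1 - p))   # entropy per stored bit
print(h)             # ~0.469 bits of information per stored bit
print(1 / h)         # ~2.13, the best possible compression factor
print(1000 * h / 8)  # ~58.6 bytes: the information content of 1000 such bits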

Why is the execution time of tf.nn.conv2d different while the number of multiplications is the same?

I am using TensorFlow to build a CNN for an image-classification experiment, and I found the following phenomenon:
operation 1: tf.nn.conv2d(x, [3,3,32,32], strides=[1,1,1,1], padding='SAME')
The shape of x is [128,128,32], meaning a convolution with a 3x3 kernel on x where both the input and output channels are 32; the total number of multiplications is
3*3*32*32*128*128 = 150,994,944
operation 2: tf.nn.conv2d(x, [3,3,64,64], strides=[1,1,1,1], padding='SAME')
The shape of x is [64,64,64], meaning a convolution with a 3x3 kernel on x where both the input and output channels are 64; the total number of multiplications is
3*3*64*64*64*64 = 150,994,944
Compared with operation 1, the feature-map size of operation 2 is scaled down to 1/2 per side and the number of channels is doubled. The numbers of multiplications are the same, so the running times should be the same. But in practice the running time of operation 1 is longer than that of operation 2.
My measurement method is shown below:
Eliminating one convolution of operation 1 reduced the training time for one epoch by 23 seconds, which means the running time of operation 1 is 23 seconds.
Eliminating one convolution of operation 2 reduced the training time for one epoch by 13 seconds, which means the running time of operation 2 is 13 seconds.
The phenomenon reproduces every time.
My GPU is an NVIDIA GTX 980 Ti and the OS is Ubuntu 16.04.
So the question is: why is the running time of operation 1 longer than that of operation 2?
If I had to guess, it has to do with how the image is ordered in memory. Remember that in memory everything is stored in a flattened format. This means that if you have a tensor of shape [128, 128, 32], the 32 features/channels are stored next to each other, then the pixels along a row, then the rows. https://en.wikipedia.org/wiki/Row-major_order
Accessing closely packed memory is very important to performance, especially on a GPU, which has a large memory bus and is optimized for aligned, in-order memory access. With the larger image you have to skip around the image more and the memory access is more out of order. In case 2 you can do more in-order memory access, which gives you more speed. Multiplications are very fast operations. I bet that with a convolution, memory access is the bottleneck which limits performance.
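To make the layout concrete, here is a small sketch of how a [height, width, channels] tensor is addressed in row-major order (the actual internal layout TensorFlow/cuDNN picks may differ, so treat this only as an illustration of the principle):
def flat_index(row, col, ch, width=128, channels=32):
    # Row-major offset for a [height, width, channels] tensor: channels vary fastest,
    # so the 32 channel values of one pixel sit at adjacent addresses, then the pixels
    # along a row, then the rows.
    return (row * width + col) * channels + ch

print(flat_index(0, 0, 0))   # 0
print(flat_index(0, 0, 31))  # 31    same pixel, adjacent addresses
print(flat_index(0, 1, 0))   # 32    next pixel along the row
print(flat_index(1, 0, 0))   # 4096  the next row is a much larger stride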
chasep255's answer is good and probably correct.
Another possibility (or alternative way of thinking about chasep255's answer) is to consider how caching (all the little hardware tricks that can speed up memory fetches, address mapping, etc) could be producing what you see...
You have basically two things: a stream of X input data and a static filter matrix. In case 1 you have 9*1024 static elements; in case 2 you have 4 times as many. Both cases have the same total multiplication count, but in case 2 the process finds more of its data where it expects it (i.e. where it was the last time it was asked for). Net result: fewer memory-access stalls, more speed.
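A more direct way to check this than per-epoch timing differences is to benchmark the two ops in isolation. Here is a minimal sketch against the TF 1.x API the question uses; the batch size, iteration count and warm-up policy are arbitrary choices:
import time
import tensorflow as tf

# Operation 1: 128x128 feature map, 32 -> 32 channels
x1 = tf.Variable(tf.random_normal([1, 128, 128, 32]))
w1 = tf.Variable(tf.random_normal([3, 3, 32, 32]))
op1 = tf.nn.conv2d(x1, w1, strides=[1, 1, 1, 1], padding='SAME')

# Operation 2: 64x64 feature map, 64 -> 64 channels (same multiplication count)
x2 = tf.Variable(tf.random_normal([1, 64, 64, 64]))
w2 = tf.Variable(tf.random_normal([3, 3, 64, 64]))
op2 = tf.nn.conv2d(x2, w2, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for name, op in [("op1 (128x128x32)", op1), ("op2 (64x64x64)", op2)]:
        sess.run(op)  # warm-up: kernel selection / autotuning
        start = time.time()
        for _ in range(100):
            sess.run(op)
        print(name, (time.time() - start) / 100, "s per run")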

Compress to fixed length

I am looking for an algorithm that could compress binary data, not to smallest size, but to specified size. For example, if I have uncompressed data that comes in various 1.2, 1.3, 1.4 ... KB sizes, I would specify "compress to 1 KB" and assuming the data could be compressed to .65, .74, .83 KB sizes, the algorithm would stop at 1 KB, and return this standard size, leaving some entropy in the data. I could not pinpoint an algorithm for that. Does one exist?
You can ZIP and pad with zeroes, but in some cases the data is highly random and even very efficient compression algorithms cannot compress it at all, because there is no correlation in the data. So getting compression down to a specific size is not always possible.
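A minimal sketch of that compress-then-pad idea in Python (zlib stands in for ZIP here; the helper name is made up for illustration):
import zlib

def compress_to_fixed_size(data, target):
    # Deflate the data, then zero-pad the result up to exactly `target` bytes.
    # If the data is too incompressible to fit, there is nothing lossless we can do.
    packed = zlib.compress(data, 9)
    if len(packed) > target:
        raise ValueError("data does not compress below the target size")
    return packed + b"\x00" * (target - len(packed))

fixed = compress_to_fixed_size(b"fairly repetitive input " * 100, 1024)
print(len(fixed))  # always 1024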
It is not possible.
Proof: some combinations of data can't be losslessly compressed, so that case fails your requirement of meeting a fixed size, assuming that size is smaller than the original data.
If you can accept lossy compression it is very possible and happens all the time for some video formats (fixed size container per unit of time; compression adjusts to maximize usage).

What would be the effect of increasing the number of bytes?

One byte is used to store each of the three color channels in a pixel. This gives 256 different levels each of red, green and blue. What would be the effect of increasing the number of bytes per channel to 2 bytes?
2^16 = 65536 values per channel.
The raw image size doubles.
Processing the file takes roughly 2 times more time ("roughly", because you have more data, but then again this new data size may be better suited for your CPU and/or memory alignment than the previous sections of 3 bytes -- "3" is an awkward data size for CPUs).
Displaying the image on a typical screen may take more time (where "a typical screen" is 24- or 32-bit and would as yet not have hardware acceleration for this particular job).
Chances are you cannot store the image back in the original file format. (Currently, TIFF is the only file format I know of that routinely uses 16 bits/channel. There may be more. Can yours?)
The image quality may degrade. (If you add bytes you cannot set them to a sensible value. If 3 bytes of 0xFF signified 'white' in your original image, what would be the comparable 16-bit value: 0xFFFF, or 0xFF00? Why? For either choice -- and remember, you have to make a similar choice for black. See the small sketch after this list.)
Common library routines may stop working correctly. Only the very best libraries are data size-ignorant (and they'd still need to be rewritten to make use of this new size.)
If this is a real world scenario -- say, I just finished writing a fully antialiased graphics 2D library, and then my boss offhandedly adds this "requirement" -- it'd have a particular graphic effect on me as well.
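On the white-point question in the list above: one common convention (though not the only one) is to replicate the 8-bit value into both bytes, i.e. multiply by 257, so black and white map exactly to 0x0000 and 0xFFFF. A tiny sketch:
def expand_8_to_16(v8):
    # Bit replication: v * 257 == (v << 8) | v, mapping 0x00 -> 0x0000 and 0xFF -> 0xFFFF.
    return v8 * 257

print(hex(expand_8_to_16(0x00)))  # 0x0
print(hex(expand_8_to_16(0x80)))  # 0x8080
print(hex(expand_8_to_16(0xff)))  # 0xffff  white stays white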

What is the typical memory space usage for a Google protocol buffer?

I'm working on a small device that has a reasonably large set of configuration parameters (~100 KB), which are generated from PC software. In the past we've stored the parameters in a binary file and loaded them into a data structure. Maintenance is a bit annoying (different languages, making sure the order of fields in the structure matches, different versions, etc.), so we're considering moving to Google protocol buffers.
From the small device's standpoint, I'm concerned about the memory space that will be required to store the serialized protocol buffer. I'm working in C, so I downloaded protobuf-embedded-c and started working on an example. I was a bit surprised by the maximum size of the buffer it was calculating. For example, what follows is the size of an empty buffer and then buffers containing a single variable of the named type:
#define MAX_M_Empty_SIZE 2
#define MAX_M_double_SIZE 12
#define MAX_M_float_SIZE 8
#define MAX_M_int32_SIZE 14
#define MAX_M_int64_SIZE 14
#define MAX_M_uint32_SIZE 9
#define MAX_M_uint64_SIZE 14
#define MAX_M_sint32_SIZE 9
#define MAX_M_sint64_SIZE 14
#define MAX_M_fixed32_SIZE 8
#define MAX_M_fixed64_SIZE 12
#define MAX_M_sfixed32_SIZE 8
#define MAX_M_sfixed64_SIZE 12
#define MAX_M_bool_SIZE 5
Every time I added an 'int32' to the structure, the maximum size increased by 14 bytes. I know that includes the key and probably a worst case for the varint encoding, but what can I expect going forward? Are larger messages more efficient than smaller messages, or is it more dependent on the encoded values?
In summary, I'm just trying to get a feel for the memory space usage of a protocol buffer. I would hate to trade ease of use for a large increase in the memory space needed to store the configuration data. Thanks!
int32 is written as a varint, which means that for positive values the space it takes is dependent on the magnitude. Small positive values can be single-byte; larger positive values can take more. Negative values take a lot more space - in particular, it takes the same as a very large 64-bit number. "varint" is 7-bit plus continuation; so a negative number (or a large positive number) can take 10 bytes. To avoid this, if you know your values could be negative you can use sint32 / sint64 - this uses zig-zag encoding (then varint) - which basically makes small magnitude values take less space than large magnitude values (irrespective of sign).
If you need to optimize for worst-case, then maybe consider using fixed32 / fixed64 instead; this guarantees to take exactly 4 or 8 bytes.
Summary:
always (or almost always) positive, and generally of small-to-moderate size: int32/int64
positive or negative, and generally of small-to-moderate magnitude: sint32/sint64
large values, or need to guarantee size: fixed32/fixed64
There are a few others as well; the full details are in the language guide
(in all cases above, you also need to include the header, but that is usually 1 or 2 bytes)
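To make the per-field worst cases above concrete, here is a small sketch of the varint size calculation and the zig-zag mapping (the remaining bytes in the MAX_*_SIZE figures are the field tag plus whatever fixed overhead protobuf-embedded-c budgets per message):
def varint_size(value):
    # Bytes needed for protobuf's base-128 varint encoding of a non-negative integer.
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size

def zigzag32(n):
    # ZigZag mapping used by sint32: small magnitudes of either sign become small unsigned values.
    return (n << 1) ^ (n >> 31)

print(varint_size(1))                 # 1 byte: a small positive int32
print(varint_size(300))               # 2 bytes
print(varint_size(-1 & (2**64 - 1)))  # 10 bytes: a negative int32 is sign-extended to 64 bits
print(varint_size(zigzag32(-1)))      # 1 byte: the same value encoded as sint32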
Nanopb can work with very small memory spaces, and it can also serialize directly to/from a file to avoid the need for a memory buffer:
http://koti.kapsi.fi/jpa/nanopb/
