How to use LZ4 compression in Linux 3.11 - linux-kernel

The LZ4 algorithm was included in the Linux 3.11 kernel.
Can I compress files with this algorithm without installing additional packages?

This refers to kernel-side compression, used for things like decompressing the kernel image itself. For a write-up of the benefits, and a comparison with the existing kernel compression algorithms, see http://events.linuxfoundation.org/sites/events/files/lcjpcojp13_klee.pdf.
For compression/decompression of files from userspace, you need the userspace utility, which can be obtained from https://code.google.com/p/lz4/
Even for use of the LZ4 compressed kernel described in the first paragraph, you still need a userspace utility to compress your kernel file.

Related

How to speed up libjpeg decompression

We are using libjpeg for JPEG decoding on our small embedded platform. We have problems with speed when we decode large images. For example, an image that is 20 MB and 5000x3000 pixels takes 10 seconds to load.
I need some tips on how to improve decoding speed. On another platform with similar performance, the same image loads in two seconds.
The best improvement, from 14 seconds down to 10 seconds, came from using a larger read buffer (64 kB instead of the default 4 kB). But nothing else helped.
We do not need to display the image at full resolution, so we use scale_num and scale_denom to display it at a smaller size. But I would like more performance. Is it possible to use some kind of multithreading, or different decoding settings? Anything, I've run out of ideas.
First - profile the code. You're left with little more than speculation if you cannot definitively identify the bottlenecks.
Next, scour the documentation for libjpeg speedup opportunities. You mentioned scale_num and scale_denom. What about the decompressor's dct_method? I've found that the JDCT_FASTEST option is good. There are other options to check: do_fancy_upsampling, do_block_smoothing, dither_mode, two_pass_quantize, etc. Some or all of these may be useful to you, depending on your system, libjpeg version, etc.
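For reference, here is a minimal sketch of where those knobs live in the libjpeg decompression API. The field names are standard libjpeg; the wrapper function itself is made up for illustration, and the exact trade-offs depend on your libjpeg version and build, so treat this as a starting point rather than a recipe:

    #include <stdio.h>
    #include <jpeglib.h>

    /* Sketch: configure libjpeg for fast (lower-quality) decoding at 1/4 scale. */
    void decode_fast(FILE *infile)
    {
        struct jpeg_decompress_struct cinfo;
        struct jpeg_error_mgr jerr;

        cinfo.err = jpeg_std_error(&jerr);
        jpeg_create_decompress(&cinfo);
        jpeg_stdio_src(&cinfo, infile);
        jpeg_read_header(&cinfo, TRUE);

        /* Speed-oriented settings mentioned above */
        cinfo.dct_method          = JDCT_FASTEST; /* fastest (least accurate) IDCT */
        cinfo.do_fancy_upsampling = FALSE;        /* cheap chroma upsampling */
        cinfo.do_block_smoothing  = FALSE;
        cinfo.two_pass_quantize   = FALSE;        /* only matters when quantizing colors */
        cinfo.dither_mode         = JDITHER_NONE;
        cinfo.scale_num           = 1;            /* decode at 1/4 of full size */
        cinfo.scale_denom         = 4;

        jpeg_start_decompress(&cinfo);

        JSAMPARRAY buffer = (*cinfo.mem->alloc_sarray)
            ((j_common_ptr)&cinfo, JPOOL_IMAGE,
             cinfo.output_width * cinfo.output_components, 1);

        while (cinfo.output_scanline < cinfo.output_height) {
            jpeg_read_scanlines(&cinfo, buffer, 1);
            /* ... consume buffer[0] (one output scanline) ... */
        }

        jpeg_finish_decompress(&cinfo);
        jpeg_destroy_decompress(&cinfo);
    }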
If profiling tools are unavailable, there are still some things to try. First, I suspect your bottlenecks are non-CPU related. To confirm, load the uncompressed image into a RAM buffer, then decompress it from there as you have been. Did that significantly improve the decompression time? If so, the culprit would appear to be the read operation from your image storage media. Depending on your system, reading from USB (or SD, etc) can be slow. (Note that I'm assuming a read from external media - although hardware details are scant.) Be sure to optimize relevant bus parameters, as well (SPI clocks, configurations, etc).
If you are reading from something like internal flash (i.e. NAND), there are some other things to inspect. How is your NAND controller configured? Have you ensured that the controller is configured for the fastest operation? Check wait states, timings, etc. Note that bus and/or memory contention can be an issue, too - so inspect their respective configurations, as well.
Finally, if you believe your system is actually CPU bound, this stackoverflow question may be of interest:
Can a high-performance jpeglib-turbo implmentation decompress/compress in <100ms?
Multi-threading can only help the decode process if the target has multiple execution units for true concurrent execution; otherwise it will just time-slice the existing CPU resources. It won't help in any case unless the library was designed to make use of it.
If you built the library from source, you should first ensure you built it with optimisation switched on, and carefully select the compiler options to match the build to your target and its instruction set, to enable the compiler to use SIMD or an FPU, for example.
Also you need to consider other possible bottlenecks. Is the 10 seconds just the time to decode, or does it include the time to read from a filesystem or network, for example? Given the improvement observed when you increased the read buffer size, it seems highly likely that it is the data read rather than the decode that is limiting in this case.
If in fact the filesystem access is the limiting factor rather than the decode, then there may be some benefit in separating the file read from the decode into a separate thread and passing the data via a pipe, a queue, or multiple shared memory buffers to the decoder. The decoder can then stream the decode without having to block on the filesystem.
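As a rough sketch of that split, using POSIX threads and a pipe (error handling is omitted, the function names are made up for illustration, and the decoder is left as a placeholder that simply drains the stream):

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Sketch: decouple file reading from decoding via a pipe.
     * The reader thread streams the file into the pipe; the decoder
     * consumes the read end as a FILE*. */

    struct reader_args { const char *path; int write_fd; };

    static void *reader_thread(void *arg)
    {
        struct reader_args *a = arg;
        FILE *f = fopen(a->path, "rb");
        char buf[64 * 1024];                 /* large read buffer, as noted above */
        size_t n;

        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            if (write(a->write_fd, buf, n) < 0)
                break;

        fclose(f);
        close(a->write_fd);                  /* signals EOF to the decoder */
        return NULL;
    }

    void decode_with_reader_thread(const char *path)
    {
        int fds[2];
        pipe(fds);

        struct reader_args args = { path, fds[1] };
        pthread_t tid;
        pthread_create(&tid, NULL, reader_thread, &args);

        FILE *in = fdopen(fds[0], "rb");
        /* Placeholder for the real decoder, e.g. hand `in` to jpeg_stdio_src().
         * Here we simply drain the stream. */
        char tmp[4096];
        while (fread(tmp, 1, sizeof tmp, in) > 0)
            ;
        fclose(in);

        pthread_join(tid, NULL);
    }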
Take a look at libjpeg-turbo. If you have supported hardware then it is generally 2-4 times faster than libjpeg on the same CPU. A typical 12 MB JPEG is decoded in less than 2 seconds on a Pandaboard. You can also take a look at a speed analysis of various JPEG decoders here.

What're lzo and lzf, and the differences?

Hi, I have heard of lzo and lzf, and it seems they are both compression algorithms. Are they the same thing? Are there any other algorithms like them (light and fast)?
lzo and lzf are two well-known, very simple compression algorithms.
lzf goes for low memory usage during compression.
lzo goes for maximum decoding speed.
Both are fast, both have small memory requirements, and both have comparable compression ratios (which is to say, fairly poor).
You can look at a direct comparison of them with other compressors here, for example:
http://phantasie.tonempire.net/t96-compression-benchmark#149
Are there any other algorithms like them(light and fast)?
There is also LZ4 and Google's snappy. According to the benchmarks published by the LZ4 author on the project homepage and Hadoop developers on issue HADOOP-7657, LZ4 seems the fastest of them all.
Both are basic Lempel-Ziv compressors, which allows fast operation (there is no second entropy-coding pass using Huffman coding, as gzip/zip do, or a statistical encoder) with moderate compression.
One benchmark for comparing codecs on java is jvm-compressor-benchmark. LZO is not yet included, but pure Java LZF has excellent performance (esp. compression speed), and I assume LZO might fare well too, if there was a driver for it.
Another LZ-based algorithm is Snappy by Google, and its native codec is the fastest codec at decompression (and compression is as fast as pure-java LZF compression).
Splittable LZ4 and ZSTD for Hadoop: recently created but promising -> https://github.com/carlomedas/4mc

I need to choose a compression algorithm

I need to choose a compression algorithm to compress some data. I don't know the type of data I'll be compressing in advance (think of it as kinda like the WinRAR program).
I've heard of the following algorithms but I don't know which one I should use. Can anyone post a short list of pros and cons? For my application the first priority is decompression speed; the second priority is space saved. Compression (not decompression) speed is irrelevant.
Deflate
Implode
Plain Huffman
bzip2
lzma
I ran a few benchmarks compressing a .tar that contained a mix of high entropy data and text. These are the results:
Name - Compression rate* - Decompression Time
7zip - 87.8% - 0.703s
bzip2 - 80.3% - 1.661s
gzip - 72.9% - 0.347s
lzo - 70.0% - 0.111s
*Higher is better
From this I came to the conclusion that the compression rate of an algorithm depends on its name; the first in alphabetical order will be the one with the best compression rate, and so on.
Therefore I decided to rename lzo to 1lzo. Now I have the best algorithm ever.
EDIT: It is worth noting that, of all of them, lzo is unfortunately the only one with a very restrictive license (GPL) :(
If you need high decompression speed then you should be using LZO. Its compression speed and ratio are decent, but it's hard to beat its decompression speed.
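If you go that route, the miniLZO subset of the LZO library is the easiest way to try it. A minimal round-trip sketch, assuming the miniLZO sources are added to your build (error handling trimmed; the test string is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>
    #include "minilzo.h"   /* miniLZO: the small, embeddable subset of LZO */

    int main(void)
    {
        if (lzo_init() != LZO_E_OK)
            return 1;

        unsigned char text[] = "example example example example example";
        lzo_uint in_len = sizeof(text);

        /* LZO's documented worst-case expansion for incompressible input */
        lzo_uint out_max = in_len + in_len / 16 + 64 + 3;
        unsigned char *out  = malloc(out_max);
        unsigned char *back = malloc(in_len);
        void *wrkmem = malloc(LZO1X_1_MEM_COMPRESS);   /* compression work memory */

        lzo_uint out_len = out_max;
        if (lzo1x_1_compress(text, in_len, out, &out_len, wrkmem) != LZO_E_OK)
            return 1;

        lzo_uint back_len = in_len;   /* size of the destination buffer */
        if (lzo1x_decompress_safe(out, out_len, back, &back_len, NULL) != LZO_E_OK)
            return 1;

        printf("%lu -> %lu -> %lu bytes\n",
               (unsigned long)in_len, (unsigned long)out_len, (unsigned long)back_len);
        free(out); free(back); free(wrkmem);
        return 0;
    }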
The Linux kernel documentation describes the trade-offs well (for the algorithms it includes):
Deflate (gzip) - Fast, worst compression
bzip2 - Slow, middle compression
lzma - Very slow compression, fast decompression (however slower than gzip), best compression
I haven't used the others, so it is hard to say, but the speed of an algorithm can depend largely on the architecture. For example, there are studies showing that compressing data stored on an HDD speeds up I/O, because the processor is so much faster than the disk that it is worth it. However, it depends largely on where the bottlenecks are.
Similarly, one algorithm may use memory extensively, which may or may not cause problems (12 MiB -- is that a lot or very little? On embedded systems it is a lot; on a modern x86 it is a tiny amount of memory).
Take a look at 7zip. It's open source and contains 7 separate compression methods. Some minor testing we've done shows the 7z format gives a much smaller result file than zip and it was also faster for the sample data we used.
Since our standard compression is zip, we didn't look at other compression methods yet.
For a comprehensive benchmark on text data you might want to check out the Large Text Compression Benchmark.
For other types, this might be indicative.

Fastest real time decompression algorithm

I'm looking for an algorithm to decompress chunks of data (1k-30k) in real time with minimal overhead. Compression should preferably be fast but isn't as important as decompression speed.
From what I could gather LZO1X would be the fastest one. Have I missed anything? Ideally the algorithm is not under GPL.
lz4 is what you're looking for here.
LZ4 is a lossless compression algorithm, providing compression speed at 400 MB/s per core, scalable with multi-core CPUs. It features an extremely fast decoder, with speed in multiple GB/s per core, typically reaching RAM speed limits on multi-core systems.
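A minimal round-trip sketch with the liblz4 C API (this assumes a reasonably recent liblz4; older releases exposed LZ4_compress() instead of LZ4_compress_default()):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <lz4.h>

    int main(void)
    {
        const char *src = "chunk of data, chunk of data, chunk of data";
        int src_len = (int)strlen(src) + 1;

        /* Compress */
        int max_dst = LZ4_compressBound(src_len);      /* worst-case output size */
        char *dst = malloc(max_dst);
        int dst_len = LZ4_compress_default(src, dst, src_len, max_dst);
        if (dst_len <= 0)
            return 1;

        /* Decompress (the "safe" variant validates the input) */
        char *back = malloc(src_len);
        int back_len = LZ4_decompress_safe(dst, back, dst_len, src_len);
        if (back_len < 0)
            return 1;

        printf("%d -> %d -> %d bytes\n", src_len, dst_len, back_len);
        free(dst); free(back);
        return 0;
    }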
Try Google's Snappy.
Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.
When you cannot use GPL-licensed code, your choice is clear: zlib. It has a very permissive license, fast compression, a fair compression ratio, very fast decompression, works everywhere, and is ported to every sane language.
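zlib's one-shot helpers make it easy to try. A minimal sketch (link with -lz; the test string is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        const Bytef *src = (const Bytef *)"some data some data some data some data";
        uLong src_len = (uLong)strlen((const char *)src) + 1;

        /* Compress */
        uLongf dst_len = compressBound(src_len);   /* worst-case compressed size */
        Bytef *dst = malloc(dst_len);
        if (compress(dst, &dst_len, src, src_len) != Z_OK)
            return 1;

        /* Decompress */
        uLongf back_len = src_len;                 /* size of the destination buffer */
        Bytef *back = malloc(back_len);
        if (uncompress(back, &back_len, dst, dst_len) != Z_OK)
            return 1;

        printf("%lu -> %lu -> %lu bytes\n",
               src_len, (uLong)dst_len, (uLong)back_len);
        free(dst); free(back);
        return 0;
    }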

Arduino: Lightweight compression algorithm to store data in EEPROM

I want to store a large amount of data onto my Arduino with an ATmega168/ATmega328 microcontroller, but unfortunately there's only 256 KB / 512 KB of EEPROM storage.
My idea is to make use of a compression algorithm to strip down the size. But well, my knowledge of compression algorithms is quite low and my search for ready-to-use libraries failed.
So, is there a good way to optimize the storage size?
You might have a look at the LZO algorithm, which is designed to be lightweight. I don't know whether there are any implementations for the AVR system, but it might be something you could implement yourself.
You may be somewhat misinformed about the amount of storage available in EEPROM on your chip though; according to the datasheet I have the EEPROM sizes are:
ATmega48P: 256
ATmega88P: 512
ATmega168P: 512
ATmega328P: 1024
Note that those values are in bytes, not KB as you mention in your question. This is not, by any measure, a large amount.
AVRs only have a few kilobytes of EEPROM at the most, and very few have much more than 64K of Flash (no standard Arduinos do).
If you need to store something that is seldom modified, for instance an image, you could try using the Flash, as there is much more space there to work with. For simple images, some crude RLE encoding would go a long way.
Compressing anything more random, for instance logged data, audio, etc., will take a tremendous amount of overhead on the AVR; you will have better luck getting a serial EEPROM chip to hold this data. Arduino's site has a page on interfacing with a 64K chip. If you want more than that, look at interfacing with an SD card over SPI, for instance as in this audio shield.
A NASA study here (Postscript)
A repost of a 1989 article on LZW here
Keep it simple and analyse the cost/benefit of adding compression. This includes time and effort, complexity, resource usage, data compressibility, etc.
An algorithm something like LZSS would probably be a good choice for an embedded platform. They are simple algorithms, and don't need much memory.
LZS is one I'm familiar with. It uses a 2 kB dictionary for compression and decompression (the dictionary is the most recent 2 kB of the uncompressed data stream). (LZS was patented by HiFn, however as far as I can tell, all patents have expired.)
But I see that an ATmega328, used on recent Arduinos, only has 512 bytes to 2 kB SRAM, so maybe even LZS is too big for it. I'm sure you could use a variant with a smaller dictionary, but I'm not sure what compression ratios you'd achieve.
The method described in the paper “Data Compression Algorithms for Energy-Constrained Devices in Delay Tolerant Networks” might run on an ATmega328.
Reference: C. Sadler and M. Martonosi, "Data Compression Algorithms for Energy-Constrained Devices in Delay Tolerant Networks," Proceedings of the ACM Conference on Embedded Networked Sensor Systems (SenSys) 2006, November 2006 (PDF).
S-LZW Source for MSPGCC: slzw.tar.gz. Updated 10 March 2007.
You might also want to take a look at LZJB, being very short, simple, and lightweight.
Also, FastLZ might be worth a look. It gets better compression ratios than LZJB and has pretty minimal memory requirements for decompression.
If you just want to compress away some repeating zeros or the like, use run-length encoding (RLE).
Repeating byte sequences will be stored as:
<mark><byte><count>
It's a super-simple algorithm that you can probably code yourself in a few lines.
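A minimal sketch of that scheme (the helper functions and the choice of mark byte are hypothetical, not from any particular library): runs of four or more identical bytes, and any literal occurrence of the mark byte, are emitted as <mark><byte><count>:

    #include <stddef.h>
    #include <stdint.h>

    #define RLE_MARK 0xAA  /* pick a byte that is rare in your data */

    /* Encode `in` into `out` as <mark><byte><count> for runs of 4+ bytes.
     * A literal RLE_MARK is escaped as a run of length 1.
     * Returns bytes written; `out` must allow worst case 3 bytes per input byte. */
    size_t rle_encode(const uint8_t *in, size_t in_len, uint8_t *out)
    {
        size_t o = 0;
        for (size_t i = 0; i < in_len; ) {
            /* measure the run at position i (cap at 255 to fit the count byte) */
            size_t run = 1;
            while (i + run < in_len && in[i + run] == in[i] && run < 255)
                run++;

            if (run >= 4 || in[i] == RLE_MARK) {
                out[o++] = RLE_MARK;
                out[o++] = in[i];
                out[o++] = (uint8_t)run;
            } else {
                for (size_t k = 0; k < run; k++)
                    out[o++] = in[i];
            }
            i += run;
        }
        return o;
    }

    /* Decode a stream produced by rle_encode(). Returns bytes written. */
    size_t rle_decode(const uint8_t *in, size_t in_len, uint8_t *out)
    {
        size_t o = 0;
        for (size_t i = 0; i < in_len; ) {
            if (in[i] == RLE_MARK) {
                uint8_t byte  = in[i + 1];
                uint8_t count = in[i + 2];
                for (uint8_t k = 0; k < count; k++)
                    out[o++] = byte;
                i += 3;
            } else {
                out[o++] = in[i++];
            }
        }
        return o;
    }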
Is an external EEPROM (for example via I2C) not an option? Even if you use a compression algorithm, the downside is that the amount of data you can store in the internal EEPROM can no longer be determined in a simple way.
And of course, if you really mean kBYTES, then consider an SD card connected over SPI... There are some lightweight open-source FAT-compatible file systems on the net.
heatshrink is a data compression/decompression library for embedded/real-time systems based on LZSS. It says it can run in under 100 bytes of memory.
