Fastest real time decompression algorithm

Fastest real time decompression algorithm - algorithm

I'm looking for an algorithm to decompress chunks of data (1k-30k) in real time with minimal overhead. Compression should preferably be fast but isn't as important as decompression speed.
From what I could gather LZO1X would be the fastest one. Have I missed anything? Ideally the algorithm is not under GPL.

lz4 is what you're looking for here.
LZ4 is lossless compression algorithm, providing compression speed at
400 MB/s per core, scalable with multi-cores CPU. It features an
extremely fast decoder, with speed in multiple GB/s per core,
typically reaching RAM speed limits on multi-core systems.

Try Google's Snappy.
Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

When you cannot use GPL licensed code your choice is clear - zlib. Very permissive license, fast compression, fair compression ratio, very fast decompression, works everywhere and ported to every sane language.

Related

Fast compression of large (100 MB ) text file with byte values in it

I have a large text file of size in order of 100 MB to compress. It has to be fast (12-14 seconds). What algorithms I can consider and what will be the expected compression ratio for them?
I got some file compression algorithms like FLZP,SR2,ZPAQ,Fp8,LPAQ8,PAQ9A.... which are performant among these ? The time limit is strict for me.

The algorithms that you have picked are the most well compressing in the world. Therefore, they are slow.
There are fast compression algorithms made for your use case. Names such as LZ4 and Snappy come up.

You have not defined what performance criterion you are looking for: more speed or more compression ? LZ based compressor (FLZP, LZO, LZ4, LZHAM, Snappy, ...) are the fastest. The PAQ compressors use context mixing for each bit, so they are slow but offer the best compression ratios. In between you can find things like Brotli, Zstd (they both offer a wide range of options to tune speed/compression) or the older Bzip/Bzip2. Personally I like BCM for its great speed/compression compromise and its simple code: https://github.com/encode84/bcm.

What're lzo and lzf, and the differences?

Hi I heard of lzo and lzf and seems they are all compression algorithms. Are they the same thing? Are there any other algorithms like them(light and fast)?

lzo and lzf are 2 well known very simple compression algorithms.
lzf goes for low memory usage during compression.
lzo goes for maximum decoding speed.
Both are fast, both have little memory requirements, both have comparable compression rates (which means very poor).
You can look at a direct comparison of them with other compressors here for example :
http://phantasie.tonempire.net/t96-compression-benchmark#149

Are there any other algorithms like them(light and fast)?
There is also LZ4 and Google's snappy. According to the benchmarks published by the LZ4 author on the project homepage and Hadoop developers on issue HADOOP-7657, LZ4 seems the fastest of them all.

Both are basic Lempel-Ziv compressors, which allows fast operation (since there is no second phase of encoding using huffman (as gzip/zip do) or statistical encoder) with moderate compression.
One benchmark for comparing codecs on java is jvm-compressor-benchmark. LZO is not yet included, but pure Java LZF has excellent performance (esp. compression speed), and I assume LZO might fare well too, if there was a driver for it.
Another LZ-based algorithm is Snappy by Google, and its native codec is the fastest codec at decompression (and compression is as fast as pure-java LZF compression).

Splittable LZ4 and ZSTD for hadoop, recently born but promising -> https://github.com/carlomedas/4mc

Parallelizeable jpeg like compression using only DCT, run length encoding stages, what sort of compression/performance possible?

We have to compress a ton o' (monochrome) image data and move it quickly. If one were to just use the parallelizeable stages of jpeg compression (DCT and run length encoding of the quantized results) and run it on a GPU so each block is compressed in parallel I am hoping that would be very fast and still yeild a very significant compression factor like full jpeg does.
Does anyone with more GPU / image compression experience have any idea how this would compare both compression and performance wise over using libjpeg on a CPU? (If it is a stupid idea, feel free to say so - I am extremely novice in my knowledge of cuda and the various stages of jpeg compression.) Certainly it will be less compression and hopefully(?) faster but I have no idea how significant those factors may be.

You could hardly get more compression in GPU - there are just no complex-enough algorithms which can use that MUCH power.
When working with simple alos like JPEG - it's so simple that you'll spend most of the time transferring data via PCI-E bus (which has significant latency, especially when card does not support DMA transfers).
Positive side is that if card have DMA, you can free up CPU for more important stuff, and get image compression "for free".
In the best case, you can get about 10x improvement on top-end GPU compared to top-end CPU provided that both CPU & GPU code is well-optimized.

I need to choose a compression algorithm

I need to choose a compression algorithm to compress some data. I don't know the type of data I'll be compressing in advance (think of it as kinda like the WinRAR program).
I've heard of the following algorithms but I don't know which one I should use. Can anyone post a short list of pros and cons? For my application the first priority is decompression speed; the second priority is space saved. Compression (not decompression) speed is irrelevant.
Deflate
Implode
Plain Huffman
bzip2
lzma

I ran a few benchmarks compressing a .tar that contained a mix of high entropy data and text. These are the results:
Name - Compression rate* - Decompression Time
7zip - 87.8% - 0.703s
bzip2 - 80.3% - 1.661s
gzip - 72.9% - 0.347s
lzo - 70.0% - 0.111s
*Higher is better
From this I came to the conclusion that the compression rate of an algorithm depends on its name; the first in alphabetical order will be the one with the best compression rate, and so on.
Therefore I decided to rename lzo to 1lzo. Now I have the best algorithm ever.
EDIT: worth noting that of all of them unfortunately lzo is the only one with a very restrictive license (GPL) :(

If you need high decompression speed then you should be using LZO. Its compression speed and ratio are decent, but it's hard to beat its decompression speed.

In the Linux kernel it is well explained (from those included):
Deflate (gzip) - Fast, worst compression
bzip2 - Slow, middle compression
lzma - Very slow compression, fast decompression (however slower than gzip), best compression
I haven't use others, so it is hard to say, but speeds of algorithms may depend largely on architecture. For example, there are studies that data compression on the HDD speeds the I/O, as the processor is so much faster than the disk that it is worth it. However, it depends largely on the size of bottlenecks.
Similarly, one algorithm may use memory extensively, which may or may not cause problems (12 MiB -- is it a lot or very small? On embedded systems it is a lot; on a modern x86 it is tiny fragment of memory).

Take a look at 7zip. It's open source and contains 7 separate compression methods. Some minor testing we've done shows the 7z format gives a much smaller result file than zip and it was also faster for the sample data we used.
Since our standard compression is zip, we didn't look at other compression methods yet.

For a comprehensive benchmark on text data you might want to check out the Large Text Compression Benchmark.
For other types, this might be indicative.

Compression to Improve Hard Disk Write Performance

On a modern system can local hard disk write speeds be improved by compressing the output stream?
This question derives from a case I'm working with where a program serially generates and dumps around 1-2GB of text logging data to a raw text file on the hard disk and I think it is IO bound. Would I expect to be able to decrease runtimes by compressing the data before it goes to disk or would the overhead of compression eat up any gain I could get? Would having an idle second core affect this?
I know this would be affected by how much CPU is being used to generate the data so rules of thumb on how much idle CPU time would be needed would be good.
I recall a video talk where someone used compression to improve read speeds for a database but IIRC compressing is a lot more CPU intensive than decompressing.

Yes, yes, yes, absolutely.
Look at it this way: take your maximum contiguous disk write speed in megabytes per second. (Go ahead and measure it, time a huge fwrite or something.) Let's say 100mb/s. Now take your CPU speed in megahertz; let's say 3Ghz = 3000mhz. Divide the CPU speed by the disk write speed. That's the number of cycles that the CPU is spending idle, that you can spend per byte on compression. In this case 3000/100 = 30 cycles per byte.
If you had an algorithm that could compress your data by 25% for an effective 125mb/s write speed, you would have 24 cycles per byte to run it in and it would basically be free because the CPU wouldn't be doing anything else anyway while waiting for the disk to churn. 24 cycles per byte = 3072 cycles per 128-byte cache line, easily achieved.
We do this all the time when reading optical media.
If you have an idle second core it's even easier. Just hand off the log buffer to that core's thread and it can take as long as it likes to compress the data since it's not doing anything else! The only tricky bit is you want to actually have a ring of buffers so that you don't have the producer thread (the one making the log) waiting on a mutex for a buffer that the consumer thread (the one writing it to disk) is holding.

Yes, this has been true for at least 10 years. There are operating-systems papers about it. I think Chris Small may have worked on some of them.
For speed, gzip/zlib compression on lower quality levels is pretty fast; if that's not fast enough you can try FastLZ. A quick way to use an extra core is just to use popen(3) to send output through gzip.

For what it is worth Sun's filesystem ZFS has the ability to have on the fly compression enabled to decrease the amount of disk IO without a significant increase in overhead as an example of this in practice.

The Filesystems and storage lab from Stony Brook published a rather extensive performance (and energy) evaluation on file data compression on server systems at IBM's SYSTOR systems research conference this year: paper at ACM Digital Library, presentation.
The results depend on the
used compression algorithm and settings,
the file workload and
the characteristics of your machine.
For example, in the measurements from the paper, using a textual workload and a server environment using lzop with low compression effort are faster than plain write, but bzip and gz aren't.
In your specific setting, you should try it out and measure. It really might improve performance, but it is not always the case.

CPUs have grown faster at a faster rate than hard drive access. Even back in the 80's a many compressed files could be read off the disk and uncompressed in less time than it took to read the original (uncompressed) file. That will not have changed.
Generally though, these days the compression/de-compression is handled at a lower level than you would be writing, for example in a database I/O layer.
As to the usefulness of a second core only counts if the system will be also doing a significant number of other things - and your program would have to be multi-threaded to take advantage of the additional CPU.

Logging the data in binary form may be a quick improvement. You'll write less to the disk and the CPU will spend less time converting numbers to text. It may not be useful if people are going to be reading the logs, but they won't be able to read compressed logs either.

Windows already supports File Compression in NTFS, so all you have to do is to set the "Compressed" flag in the file attributes.
You can then measure if it was worth it or not.

This depends on lots of factors and I don't think there is one correct answer. It comes down to this:
Can you compress the raw data faster than the raw write performance of your disk times the compression ratio you are achieving (or the multiple in speed you are trying to get) given the CPU bandwidth you have available to dedicate to this purpose?
Given today's relatively high data write rates in the 10's of MBytes/second this is a pretty high hurdle to get over. To the point of some of the other answers, you would likely have to have easily compressible data and would just have to benchmark it with some test of reasonableness type experiments and find out.
Relative to a specific opinion (guess!?) to the point about additional cores. If you thread up the compression of the data and keep the core(s) fed - with the high compression ratio of text, it is likely such a technique would bear some fruit. But this is just a guess. In a single threaded application alternating between disk writes and compression operations, it seems much less likely to me.

If it's just text, then compression could definitely help. Just choose an compression algorithm and settings that make the compression cheap. "gzip" is cheaper than "bzip2" and both have parameters that you can tweak to favor speed or compression ratio.

If you are I/O bound saving human-readable text to the hard drive, I expect compression to reduce your total runtime.
If you have an idle 2 GHz core, and a relatively fast 100 MB/s streaming hard drive,
halving the net logging time requires at least 2:1 compression and no more than roughly 10 CPU cycles per uncompressed byte for the compressor to ponder the data.
With a dual-pipe processor, that's (very roughly) 20 instructions per byte.
I see that LZRW1-A (one of the fastest compression algorithms) uses 10 to 20 instructions per byte, and compresses typical English text about 2:1.
At the upper end (20 instructions per byte), you're right on the edge between IO bound and CPU bound. At the middle and lower end, you're still IO bound, so there is a a few cycles available (not much) for a slightly more sophisticated compressor to ponder the data a little longer.
If you have a more typical non-top-of-the-line hard drive, or the hard drive is slower for some other reason (fragmentation, other multitasking processes using the disk, etc.)
then you have even more time for a more sophisticated compressor to ponder the data.
You might consider setting up a compressed partition, saving the data to that partition (letting the device driver compress it), and comparing the speed to your original speed.
That may take less time and be less likely to introduce new bugs than changing your program and linking in a compression algorithm.
I see a list of compressed file systems based on FUSE, and I hear that NTFS also supports compressed partitions.

If this particular machine is often IO bound,
another way to speed it up is to install a RAID array.
That would give a speedup to every program and every kind of data (even incompressible data).
For example, the popular RAID 1+0 configuration with 4 total disks gives a speedup of nearly 2x.
The nearly as popular RAID 5 configuration, with same 4 total disks, gives all a speedup of nearly 3x.
It is relatively straightforward to set up a RAID array with a speed 8x the speed of a single drive.
High compression ratios, on the other hand, are apparently not so straightforward. Compression of "merely" 6.30 to one would give you a cash prize for breaking the current world record for compression (Hutter Prize).

This used to be something that could improve performance in quite a few applications way back when. I'd guess that today it's less likely to pay off, but it might in your specific circumstance, particularly if the data you're logging is easily compressible,
However, as Shog9 commented:
Rules of thumb aren't going to help
you here. It's your disk, your CPU,
and your data. Set up a test case and
measure throughput and CPU load with
and without compression - see if it's
worth the tradeoff.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Fastest real time decompression algorithm - algorithm

lz4 is what you're looking for here. LZ4 is lossless compression algorithm, providing compression speed at 400 MB/s per core, scalable with multi-cores CPU. It features an extremely fast decoder, with speed in multiple GB/s per core, typically reaching RAM speed limits on multi-core systems.

When you cannot use GPL licensed code your choice is clear - zlib. Very permissive license, fast compression, fair compression ratio, very fast decompression, works everywhere and ported to every sane language.

Related

Fast compression of large (100 MB ) text file with byte values in it

What're lzo and lzf, and the differences?

Parallelizeable jpeg like compression using only DCT, run length encoding stages, what sort of compression/performance possible?

I need to choose a compression algorithm

Compression to Improve Hard Disk Write Performance

Categories

Resources