There are many implementations of the DEFLATE decompression algorithm in different languages. The decompression side is described in RFC 1951. The compression algorithm, however, seems more elusive, and I've only ever seen it implemented in long C/C++ files.
I'd like to find an implementation of the compression algorithm in a higher level language, e.g. Python/Ruby/Lua/etc., for study purposes. Can someone point me to one?
Pyflate is a pure-Python implementation of gzip (which uses DEFLATE).
http://www.paul.sladen.org/projects/pyflate/
Edit: Here is a Python implementation of LZ77 compression, which is the first step in DEFLATE.
https://github.com/olle/lz77-kit/blob/master/src/main/python/lz77.py
The next step, Huffman encoding of the symbols, is a simple greedy algorithm which shouldn't be too hard to implement.
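As an illustration, that greedy construction can be sketched in a few lines of Python with a heap (my own toy sketch, not code from Pyflate; real DEFLATE additionally requires canonical, length-limited codes on top of this):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a bit-string code for each symbol via greedy tree merging."""
    freq = Counter(data)
    # heap entries are (frequency, tiebreak, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:  # degenerate input: a single distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    tick = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # pop the two least-frequent subtrees...
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tick, merged))  # ...and merge them greedily
        tick += 1
    return heap[0][2]
```

The greedy choice (always merge the two least-frequent subtrees) is what makes the resulting prefix code optimal for the given symbol frequencies.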
For example, for the LZ family there is LZMA, but I can't find a Huffman-only example. I understand that BWT uses Huffman coding to some extent, but it uses another type of algorithm too.
I think you mean implementation, not algorithm. Huffman coding is an algorithm.
zlib provides the Z_HUFFMAN_ONLY compression strategy, which only uses Huffman coding to compress the input. The string matching zlib normally uses is turned off with that option.
(This question is probably flirting with the "no software recommendations" rule; I understand why it might be closed).
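For illustration, the strategy can be selected from Python's bundled zlib module as well (the sample string here is made up; just a rough sketch comparing Huffman-only coding against the default LZ77-plus-Huffman strategy):

```python
import zlib

data = b"abracadabra " * 50

# Huffman-only: no string matching, just entropy coding of the literal bytes
huff = zlib.compressobj(strategy=zlib.Z_HUFFMAN_ONLY)
huff_out = huff.compress(data) + huff.flush()

# Default strategy: LZ77 match-finding followed by Huffman coding
full = zlib.compressobj()
full_out = full.compress(data) + full.flush()

# On repetitive input, skipping the match-finding costs compression ratio
assert len(huff_out) > len(full_out)
```

Both outputs are valid zlib streams, so `zlib.decompress` recovers the original from either one.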
In their paper F_2 Lanczos revisited, Peterson and Monico give a version of the Lanczos algorithm for finding a subspace of the kernel of a linear map over Z/2Z. If my cursory reading of their paper is correct (whether it is or not is clearly not a question for SO), the algorithm presented requires a number of iterations that scales inversely with the word size of the machine used. The authors implemented their proof-of-concept algorithm with a 64-bit word size.
Does there exist a publicly available implementation of that algorithm utilizing wide SIMD words for (a potentially significant) speedup?
An existing implementation would be a software recommendation. A more interesting question is: "Is it possible to use SIMD to make this algorithm run faster?" From my glance at the paper, SIMD sounds like exactly what they are describing ("We will partition a 64 bit machine word x into eight subwords...where each ... is an 8-bit word"), so if the authors' implementation is publicly available somewhere, the answer is "yes", because they're already using it. If this algorithm were written in C/C++ or something similar, an optimizing compiler would likely do a good job of vectorizing it with SIMD even without manually specifying how to split the registers (this can be verified by looking at the assembly). It would arguably be preferable to implement it in a high-level language without splitting registers manually, because then the compiler could optimize it for any target machine's word size.
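To illustrate the word-level parallelism the paper builds on (my own toy sketch, not the authors' code): if you pack GF(2) matrix entries into machine words, a single AND plus a parity count performs a whole word's worth of GF(2) multiply-adds at once, and SIMD registers simply widen that word.

```python
def gf2_matvec(rows, v):
    """Multiply a GF(2) matrix (list of bit-packed row ints) by a bit-packed vector.

    Result bit i is the parity of popcount(rows[i] & v): one AND and one
    popcount handle an entire word of GF(2) multiplications in parallel.
    """
    out = 0
    for i, row in enumerate(rows):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out
```

Python ints make the "word" arbitrarily wide; in C/C++ the same loop over `uint64_t` rows is what an autovectorizer or explicit SIMD would chew through.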
I am doing my graduation project, which is Arabic-language text compression and encryption.
The encryption is done using AES, and it works very well.
But my problem is the compression: I don't know which algorithm to use, which one is easiest to implement, and which gives good output performance.
it is important that you FIRST compress and THEN encrypt.
otherwise, you won't be able to compress: good encryption makes the output indistinguishable from random data, and random data doesn't compress.
that said, use whatever compression you can get, like xz, bzip2, gzip, ...
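A quick Python sketch of why the order matters (`os.urandom` stands in for AES ciphertext here, since good ciphertext is statistically random):

```python
import os
import zlib

text = b"some representative plaintext, repeated to be compressible " * 60
fake_ciphertext = os.urandom(len(text))  # stands in for AES output

# plaintext compresses well
assert len(zlib.compress(text)) < len(text)
# "ciphertext" is incompressible; zlib can only store it, plus a little overhead
assert len(zlib.compress(fake_ciphertext)) >= len(fake_ciphertext)
```

So the pipeline should be plaintext → compress → encrypt, and on the way back decrypt → decompress.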
I know this is an old question, but none of the existing answers really addresses which compression algorithm the asker should use, hence this answer.
If I were you I would use an existing library such as zlib, which has been implemented in multiple languages.
Using this library you can decide if you want to use the deflate algorithm or the gzip one. The difference between these two is nicely described in their FAQ.
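For illustration, with Python's zlib binding you can produce either format from the same code (a rough sketch; the `wbits` parameter selects a raw DEFLATE stream versus the gzip wrapper, which adds a header and a CRC-32 trailer):

```python
import zlib

data = b"hello hello hello hello"

raw = zlib.compressobj(wbits=-15)  # raw DEFLATE, no header or trailer
gz = zlib.compressobj(wbits=31)    # DEFLATE inside a gzip header + CRC-32 trailer

raw_out = raw.compress(data) + raw.flush()
gz_out = gz.compress(data) + gz.flush()

assert gz_out[:2] == b"\x1f\x8b"   # gzip magic bytes
assert len(gz_out) > len(raw_out)  # the wrapper costs a few extra bytes
```

The compressed payload is the same DEFLATE stream either way; only the framing differs.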
Hope this helps.
What is the minimum source length (in bytes) for LZ77? Also, can anyone suggest a small and fast real-time compression technique (preferably with C source)? I need it to store compressed text and retrieve it quickly for excerpt generation in my search engine.
Thanks for all the responses. I'm using the D language for this project, so it's kind of hard to port LZO to D code; I'm going with either LZ77 or Predictor. Thanks again :)
I long ago had need for a simple, fast compression algorithm, and found Predictor.
While it may not be the best in terms of compression ratio, Predictor is certainly fast (very fast), easy to implement, and has a good worst-case performance. You also don't need a license to implement it, which is goodness.
You can find a description and C source for Predictor in Internet RFC 1978: PPP Predictor Compression Protocol.
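For study purposes, the guess-table scheme from RFC 1978 can be sketched in Python roughly like this (a simplified sketch of the RFC's C sample, not production code; the real protocol also frames lengths and falls back to raw data when a block doesn't shrink):

```python
def predictor_compress(data):
    guess = bytearray(65536)  # 16-bit context hash -> last byte seen in that context
    h = 0
    out = bytearray()
    i = 0
    while i < len(data):
        flags = 0
        literals = bytearray()
        for bit in range(8):  # one flags byte covers up to 8 input bytes
            if i >= len(data):
                break
            c = data[i]; i += 1
            if guess[h] == c:
                flags |= 1 << bit    # hit: the flag bit alone encodes the byte
            else:
                guess[h] = c
                literals.append(c)   # miss: update the table and emit the literal
            h = ((h << 4) ^ c) & 0xFFFF  # hash update from the RFC's sample code
        out.append(flags)
        out.extend(literals)
    return bytes(out)

def predictor_decompress(comp, orig_len):
    guess = bytearray(65536)  # mirrors the compressor's table state exactly
    h = 0
    out = bytearray()
    j = 0
    while len(out) < orig_len:
        flags = comp[j]; j += 1
        for bit in range(8):
            if len(out) >= orig_len:
                break
            if flags & (1 << bit):
                c = guess[h]         # predicted byte, reconstructed from context
            else:
                c = comp[j]; j += 1
                guess[h] = c
            out.append(c)
            h = ((h << 4) ^ c) & 0xFFFF
    return bytes(out)
```

Because compressor and decompressor update the same table in lockstep, a correct prediction costs only one flag bit, which is where all the compression comes from.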
The lzo compressor is noted for its smallness and high speed, making it suitable for real-time use. Decompression, which uses almost zero memory, is extremely fast and can even exceed memory-to-memory copy on modern CPUs due to the reduced number of memory reads. lzop is an open-source implementation; versions for several other languages are available.
If you're looking for something more well known, LZMA, the 7-Zip encoder, is about the best compressor you'll get in terms of general compression. http://www.7-zip.org/sdk.html
There's also LZJB:
https://hg.java.net/hg/solaris~on-src/file/tip/usr/src/uts/common/os/compress.c
It's pretty simple, based on LZRW1, and is used as the basic compression algorithm for ZFS.
In keeping with my interest in algorithms (see here), I would like to know whether there are (contrary to my previous question) algorithms and data structures that are mainstream in parallel programming. It is probably too early to ask about mainstream parallel algorithms and data structures, but some of the gurus here may have had good or bad experiences with some of them.
EDIT: I am more interested in successful practical applications of algos and ds than in academic papers.
Thanks
Many of Google's whitepapers, especially but not exclusively the ones linked from this page, describe successful practical applications of parallel distributed computing and/or their DS and algorithmic underpinnings. For example, this paper deals with modifying a DBMS's data structures to extract intra-transaction parallelism; this one (and some others) introduces the popular MapReduce architecture, since implemented e.g. in Hadoop; this one is about highly parallelizable approximate matrix factoring suitable for use in "kernel methods" in machine learning; etc., etc...
Maybe I'm totally missing the point, but there are a ton of mainstream parallel algorithms and data structures, e.g. matrix multiplication, FFT, PDE and linear equation solvers, integration and simulation (Monte Carlo / random numbers), searching and sorting, and so on. Take a look at Designing and Building Parallel Programs or Patterns for Parallel Programming. And then there is CUDA and the like. What are you after?
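As a concrete illustration of one such mainstream pattern, here is a toy sketch of an embarrassingly parallel Monte Carlo estimate of pi: split the samples into independent chunks, map them to workers, and reduce the counts at the end. (`ThreadPoolExecutor` is used so the sketch runs anywhere; swap in `ProcessPoolExecutor` to actually use multiple cores in CPython.)

```python
import random
from concurrent.futures import ThreadPoolExecutor

def count_hits(n):
    """Count how many of n random points in the unit square land inside the quarter circle."""
    rng = random.Random()  # independent generator per task
    return sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

def parallel_pi(total=400_000, workers=4):
    per = total // workers
    with ThreadPoolExecutor(workers) as ex:
        hits = sum(ex.map(count_hits, [per] * workers))  # map, then reduce
    return 4.0 * hits / (per * workers)
```

The map-then-reduce shape here is the same decomposition that scales up to the simulation and integration workloads mentioned above.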
Sorting:
Standard Template Library for Extra Large Data Sets
Sort Benchmark