What's the difference between 7z and lzma compressors? - 7zip

7-Zip claims to use LZMA as its compression algorithm.
However, the LZMA SDK comes with two executables, 7zr.exe and lzma.exe, which have different options/switches and which produce different, non-interchangeable results, even though the outputs are similar in size.
So the question is: what's the difference between these two compressors?

I ended up receiving a response to a similar question directly from Igor Pavlov on the 7-Zip forum on SourceForge, so I thought the response might be useful to others if I reproduce it here.
1) The .lzma file format uses a simple header, supports only the LZMA method, and supports only one file per archive. It doesn't store the file name.
2) The .7z file format uses complex headers, supports different methods (including LZMA), and supports a large number of files per archive.
lzma.exe works only with .lzma files.
7zr.exe supports .7z files and .lzma files.
lzma.exe and 7z.exe use different default settings for LZMA (dictionary size and others), so you see a difference in compression ratio. If you set the same LZMA settings, the difference will be smaller.
— Igor Pavlov, in the SourceForge forums
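For example, you can make the comparison fairer by pinning the dictionary size on both sides. This is only a sketch; the exact switch spellings vary between 7-Zip and LZMA SDK versions, so check each tool's usage text:

7zr a -m0=lzma -md=16m archive.7z file.bin
lzma e file.bin file.lzma -d24

Here -md=16m and -d24 both request a 16 MB (2^24 byte) dictionary, so the remaining size difference should mostly come down to the header overhead described above and the other method defaults.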

Related

How to compress file on HFS programmatically?

macOS HFS+ supports transparent filesystem-level compression. How can I enable this compression for certain files via a programmatic API? (e.g. Cocoa or C interface)
I'd like to achieve the effect of ditto --hfsCompression src dst, but without shelling out.
To clarify: I'm asking how to make uncompressed file compressed. I'm not interested in reading or preserving existing HFS compression state.
There's afsctool, an open-source implementation of HFS+ compression. It was originally written by brkirch (who still visits the MacRumors forums) but has since been expanded and improved a great deal by rjvb, who is doing amazing things with it.
The copyfile.c file discloses some of the implementation details.
There's also a compression tool based on that: afsctool.
I think you're asking two different questions (and might not know it).
If you're asking "How can I make an arbitrary file 'A' an HFS compressed file?" the answer is: you can't. HFS compressed files are created by the installer, and (technically[1]) only Apple can create them.
If you're asking "How can I emulate the --hfsCompression logic in ditto so that I can copy an HFS compressed file from one HFS+ volume to another HFS+ volume and preserve its compression?" the answer to that is pretty straightforward, albeit not well documented.
HFS compressed files have a special UF_COMPRESSED file flag. If you see that, the data fork of the file is actually an uncompressed image of a hidden resource. The compressed version of the file is stored in a special extended attribute. It's special because it normally doesn't appear in the list of attributes when you request them (so if you just ls -l@ the file, for example, you won't see it). To list and read this special attribute you must pass the XATTR_SHOWCOMPRESSION flag to both the listxattr() and getxattr() functions.
To restore a compressed file, you reverse the process: Write an empty file, then restore all of its extended attributes, specifically the special one. When you're done, set the file's UF_COMPRESSED flag and the uncompressed data will magically appear in its data fork.
[1] Note: it's rumored that the compressed resource of a file is just a zipped version of the data, possibly with some wrapper around it. I've never taken the time to experiment, but if you're intent on creating your own compressed files you could take a stab at reverse-engineering the compressed extended attribute.
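If you only need to detect such files programmatically, checking the UF_COMPRESSED flag is enough. Here is a minimal Go sketch (macOS only; the 0x20 value is UF_COMPRESSED from <sys/stat.h>). Listing the hidden attribute itself still requires calling listxattr()/getxattr() with XATTR_SHOWCOMPRESSION as described above.

package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
)

const ufCompressed = 0x20 // UF_COMPRESSED from <sys/stat.h>

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: hfscheck <file>")
	}
	fi, err := os.Stat(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	// On Darwin, Sys() exposes the BSD stat structure, including st_flags.
	if st, ok := fi.Sys().(*syscall.Stat_t); ok && st.Flags&ufCompressed != 0 {
		fmt.Println(os.Args[1], "is HFS-compressed")
	} else {
		fmt.Println(os.Args[1], "is not HFS-compressed")
	}
}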

Why do unix-compress and Go's compress/lzw produce different files, not readable by the other's decoder?

I compressed a file in a terminal with compress file.txt and got (as expected) file.txt.Z
When I pass that file to ioutil.ReadFile in Go,
buf0, err := ioutil.ReadFile("file.txt.Z")
I get the error (the line above is 116):
finder_test.go:116: lzw: invalid code
I found that Go would accept the file if I compressed it using the compress/lzw package; I just used code from a website that does that. I only modified the line
outputFile, err := os.Create("file.txt.lzw")
changing the .lzw to .Z, then used the resulting file.txt.Z in the Go code at the top, and it worked fine, no error.
Note: file.txt is 16.0 kB, unix-compressed file.txt.Z is 7.8 kB, and go-compressed file.txt.Z is 8.2 kB
Now, I was trying to understand why this happened. So, I tried to run
uncompress.real file.txt.Z
and it did not work. I got
file.txt.Z: not in compressed format
I need to use a compressor (preferably unix-compress) to LZW-compress files and then feed the same compressed files to two different algorithms, one written in C and the other in Go, because I intend to compare the performance of the two. The problem is that the C program will only accept files compressed with unix-compress, and the Go program will only accept files compressed with Go's compress/lzw.
Can someone explain why that happened? Why are the two .Z files not equivalent? How can I overcome this?
Note: I am working on Ubuntu installed in VirtualBox on a Mac.
A .Z file does not contain only LZW-compressed data; there is also a 3-byte header that the Go LZW code does not generate, because it is meant to compress a data stream, not to produce a .Z file.
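For reference, that header is just two magic bytes plus a flags byte. Here is a minimal Go check (a sketch assuming the usual compress(1) layout; note that the bit packing of the payload also differs, as a later answer explains, so stripping the header alone is not enough):

package main

import (
	"fmt"
	"io/ioutil"
	"log"
)

func main() {
	buf, err := ioutil.ReadFile("file.txt.Z")
	if err != nil {
		log.Fatal(err)
	}
	if len(buf) >= 3 && buf[0] == 0x1f && buf[1] == 0x9d {
		maxBits := buf[2] & 0x1f      // maximum LZW code width used by compress
		blockMode := buf[2]&0x80 != 0 // whether clear codes (block mode) are used
		fmt.Printf("compress(1) .Z file: max bits %d, block mode %v\n", maxBits, blockMode)
	} else {
		fmt.Println("not a compress(1) .Z file")
	}
}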
Presumably you only want to test the performance of your own or third-party algorithms (not the compression algorithms themselves), so you may want to write a shell script that calls the compress command on the required files/directories and then call that script from your C / Go program. This is one way to overcome the problem, though it leaves open the other parts of your question about the correct way to use the compression libraries.
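A minimal Go sketch of that approach (assuming compress(1) is installed and that its -c switch writes to standard output):

package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	out, err := os.Create("file.txt.Z")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// Run the system's compress(1) so the result is a real .Z file.
	cmd := exec.Command("compress", "-c", "file.txt")
	cmd.Stdout = out
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}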
There is an ancient bug named "alignment bit groups" behind this question. I've described it on Wikipedia, in the "Special output format" section; please read it.
I've implemented a new library lzws. It has all possible options:
--without-magic-header (-w) - disable magic header
--max-code-bit-length (-b) - set max code bit length (9-16)
--raw (-r) - disable block mode
--msb (-m) - enable most significant bit
--unaligned-bit-groups (-u) - enable unaligned bit groups
You can use any of these options in any combination; all combinations have been tested. I am sure you can find a combination that suits the Go lzw implementation.
You can use the ruby-lzws binding if you prefer Ruby.

In-depth understanding of binary files

I am learning C++, especially binary file structure/manipulation, and since I am totally new to the subject of binary files, bits, bytes & hexadecimal numbers, I decided to take a step back and establish a solid understanding of the subject.
In the picture I have included below, I wrote two words (blue thief) in a .txt file.
The reason for this is that when I open the file in a hex editor, I want to understand how the information is really stored in hex format. Now, don't get me wrong, I am not trying to make a living out of reading hex dumps all day, but only to have a minimum level of understanding of the basics of a binary file's composition. I also know all files have different structures, but just for the sake of understanding, I wanted to know how exactly the words "blue thief" and a single ' ' (space) were converted into those characters.
One more thing: I have heard that binary files contain three types of information:
header, fmt & data! Is that only relevant to multimedia files like audio and video? Because I can't seem to see anything other than what looks like the data chunk in this file.
The characters in your text file are encoded in a Windows extension of ASCII--one byte for each character that you see in Notepad. What you see is what you get.
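A quick way to see that byte-per-character mapping for yourself is to print each byte of the text in hex; a small sketch:

package main

import "fmt"

func main() {
	for _, b := range []byte("blue thief") {
		fmt.Printf("%02x ", b) // prints: 62 6c 75 65 20 74 68 69 65 66
	}
	fmt.Println()
}

The 0x20 in the middle is the space, and every other byte is the ASCII code of one letter, which is exactly what a hex editor shows for this file.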
Generally, a hard distinction is made between text and binary files on Windows systems. On Unix/Linux systems, the distinction is fuzzier... you could argue that there is no distinction, in fact.
On Windows systems, the distinction is enforced by file extensions. All files with the extension ".TXT" are assumed to be text files (i.e., to contain only hex codes that represent visible onscreen characters, where "visible" includes whitespace).
Binary files are a whole different kettle of fish. Most, as you mention, include some sort of header describing how the data that follows is encoded. These headers can vary tremendously in size depending on the type of data (again, assumed to be indicated by the extension on Windows systems as well as Unix). A simple example is the WAV format for uncompressed audio. If you open a WAV file in your hex editing program, you'll see that the first four bytes are "RIFF"--this is a marker, often called a "magic number" even though it is readable as text, indicating that the contents are an audio file. Newer versions of the WAV specification have complicated this somewhat, but originally the WAV header was just the "RIFF" tag plus a dozen or so bytes indicating the sample rate of the following data. (You can see this by comparing the raw data in a track on an audio CD to the WAV file created by ripping an uncompressed copy of that track at 44.1 KHz--the data should be the same, with just a header section added at the start of the WAV file.)
Executable files (compiled programs) are a special type of binary file, but they follow roughly the same scheme of a header followed by data in a prescribed format. In this case, though, the "data" is executable machine code, and the header indicates, among other things, what operating system the file runs on. (For example, most Linux executables begin with the characters "ELF".)
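As a tiny illustration of the "magic number" idea from the WAV example, this sketch reads the first four bytes of a file (the name sound.wav is just a placeholder) and checks them against "RIFF":

package main

import (
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	f, err := os.Open("sound.wav")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	magic := make([]byte, 4)
	if _, err := io.ReadFull(f, magic); err != nil {
		log.Fatal(err)
	}
	if string(magic) == "RIFF" {
		fmt.Println("looks like a RIFF container (e.g. a WAV file)")
	} else {
		fmt.Printf("unknown magic: % x\n", magic)
	}
}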

How to prevent rsync compression on files with no extension?

I'm using rsync to perform synchronisation between two machines on a network, so I have rsync's --compress setting enabled. However, I have various file types that I'm excluding from compression via the --skip-compress option because I know they are already compressed, such as .jpg, .mp4, etc.
However, I also have a large number of files with no extension that I know compress poorly (due to encryption), as part of OS X's sparsebundle disk image format (where each "block" of the image is its own file with no file extension).
Anyway, I don't have many other files that should conflict, as the other extensionless files I have are either already excluded or quite small (so not really worth compressing).
However, I'm at a loss as to how I should add extensionless files to rsync's --skip-compress list.
Going up one level: how much time are you actually saving with --skip-compress?
On a 0.5 megabyte/s network link I transferred a 21 megabyte mp3 file under the suffixes mp3, txt and none, trying both
--skip-compress="[]/gz/foo" and --skip-compress="gz//foo". I could not find a difference in speed over 5 tries.

How to convert .epsf to .eps?

I'm looking for a method of converting .epsf to .eps for a publication I'm submitting. The submission site requires .eps (even though my understanding is that modern renderers should be able to read .epsf as well; the site is archaic, and I have to upload all 100 images individually). My co-author sent me the zipped files to upload (and now to convert); I didn't make them myself. Further, the programs that made these images may exist on my co-author's computer, but which ones is uncertain.
I've tried this in Mathematica 8 with reasonable but not full success: colored files become black and white, and files with duplicate entries (e.g. Fig11a.eps and Fig11a.epsf both exist though they are different; it seems the .eps is the background and the .epsf is the foreground layer) convert incorrectly. My approach was to import the .epsf files into Mathematica and export them as .eps.
I've also tried using an intermediate format, e.g. GIF/TIFF/PNG/JPG, with similar results. I haven't been able to find a program that's free (I assume Photoshop could pull this off) that I could use, and I'd like to do it as a batch. A method that uses Python or Mathematica, or that works on XP or Linux, would be fine. Thanks.
You do not need to convert anything. Encapsulated PostScript files can have either extension (both EPS and EPSF). If your publisher refuses to accept files with an EPSF extension, just rename them to EPS.
Any processing/conversion you do on the files (using GhostScript, Mathematica, etc.) carries the risk of corrupting the graphics in some way. But there's no need to do it. Send them as they are or rename them if you prefer.
(If you have any doubt, you can check the EPS Format Specification from 1992, which says that on the Macintosh the recommended file extension is .epsf, while on DOS it's .EPS.)
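For the batch part of the question, a simple rename loop on Linux or macOS does the job (a sketch; run it in the directory that holds the images):

for f in *.epsf; do mv "$f" "${f%.epsf}.eps"; done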
