Why does 7z create different files? - bash

I'm using the 7z command in a bash script to create a 7z archive for backup purposes. The script also checks whether the newly created 7z archive already exists in my backup folder, and if it does, it runs md5sum to see if the content differs. So if the archive file doesn't exist yet, or its md5sum differs from the previous one, I copy it to my backup folder. I tried a simple example to test the script, but the problem is that I sometimes get a different md5sum for the same folder I am compressing. Why is that? Is there any other reliable way of checking whether file content differs? The commands are simple:
SourceFolder="/home/user/Documents/"
for file in "$SourceFolder"*
do
    localfile=${file##*/}
    7z a -t7z "$SourceFolder${localfile}.7z" "$file"
    md5value=$(md5sum "$SourceFolder${localfile}.7z" | cut -d ' ' -f 1)
    # ... copying of files goes on from here ...
done

The reliable way to check if two different losslessly compressed files have identical contents is to expand their contents and compare those (e.g. using md5sum). Comparing the compressed files is going to end badly sooner or later, regardless of which compression scheme you use.
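A minimal sketch of that comparison, assuming two archives old.7z and new.7z (placeholder names) produced by the same 7z version, so that the extraction order is stable:
# Stream the extracted contents to stdout and hash those bytes
# instead of hashing the archive files themselves.
old_sum=$(7z x -so old.7z | md5sum | cut -d ' ' -f 1)
new_sum=$(7z x -so new.7z | md5sum | cut -d ' ' -f 1)
if [ "$old_sum" = "$new_sum" ]; then
    echo "contents identical"
else
    echo "contents differ"
fi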

I've partially solved this. It looks like it matters whether or not you specify the full path to the folder you are compressing: the resulting file is not the same. This affects both 7z and tar. I mean like this:
value1=$(tar -c /tmp/at-spi2/|md5sum|cut -d ' ' -f 1)
value2=$(tar -c at-spi2/|md5sum|cut -d ' ' -f 1)
So obviously I'm doing this wrong. Is there a switch for 7z and tar that would remove the absolute path?
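For tar at least, the -C switch does this: it changes directory before adding files, so only relative names are stored in the archive. A sketch, assuming at-spi2 lives under /tmp:
# Both commands now store member names relative to at-spi2/,
# so where tar is invoked from no longer changes the output.
value1=$(tar -c -C /tmp at-spi2/ | md5sum | cut -d ' ' -f 1)
value2=$(tar -c at-spi2/ | md5sum | cut -d ' ' -f 1)   # run from inside /tmp
# Note: tar still stores file mtimes, so the sums only match while
# the files themselves are unchanged between runs.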

Related

file name comparison within specific folders and sub-folders using bash script

I would like to do some file name comparison in a bash script to determine whether a file should go through a compress routine or not.
Here is what I want to do: look through the UPLOAD folder and all sub-folders (a couple of hundred folders in total). If filenameA.jpg and filenameA.orig both exist in the same folder, that means it was compressed before and there is no need to compress it again; otherwise, compress the filenameA.jpg file.
This way only newly added files are compressed, and not files that were already compressed before.
Can someone tell me how to write the if / loop statements in a bash script? I plan to run it as a cron job.
Thank you for your help.
Use find to recursively search for all files named *.jpg.
For each file returned, check for a corresponding ".orig" file and, based on the result, compress or not.
Something like this should perhaps get you started:
find UPLOAD -type f -name '*.jpg' | while IFS= read -r JPG
do
    ORIG="${JPG%.jpg}.orig"
    if [ -s "${ORIG}" ]
    then
        echo "File ${JPG} already compressed to ${ORIG}"
    else
        echo "File ${JPG} needs compressing ..."
        gzip -c "${JPG}" > "${ORIG}"
    fi
done
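If file names may contain unusual characters (even newlines), a NUL-delimited variant of the same loop is more robust (a sketch, same UPLOAD layout assumed):
find UPLOAD -type f -name '*.jpg' -print0 | while IFS= read -r -d '' JPG
do
    ORIG="${JPG%.jpg}.orig"
    # Compress only when no corresponding .orig exists yet.
    [ -s "${ORIG}" ] || gzip -c "${JPG}" > "${ORIG}"
done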

How to create tar files automatically

I'd like to create tar files to distribute some scripts, using bash.
For every script, certain configuration files and libraries (or toolboxes) are needed,
e.g. a script called CheckTool.py needs Checks.ini, CheckToolbox.py and CommontToolbox.py to run. These are stored in specific folders on my hard disk and need to be copied in the same layout onto the user's hard disk.
I can create a tar file manually for each script, but I'd like it to be simpler.
My idea is to define, for each script, a list of all the needed files and their paths, and read this in a bash script which creates the tar file.
I started with:
#!/bin/bash
while IFS= read -r line
do
    echo "$line"
done < "$1"
This reads the file names and paths. In my example the lines are:
./CheckTools/CheckMesh.bs
./Configs/CheckMesh.ini
./Toolboxes/CommonToolbox.bs
./Toolboxes/CheckToolbox.bs
My question is: how do I have to organize the data to create a tar file containing the specified files using bash?
Or does someone have a better idea?
No need for a complicated script; use tar's -T option. Every file listed in there will be added to the tar file:
-T, --files-from FILE
get names to extract or create from FILE
So your script becomes:
#!/bin/bash
tar -cvpf something.tar -T listoffiles.txt
The listoffiles.txt format is super easy: one file per line. You might want to use full paths to ensure you get the right files:
./CheckTools/CheckMesh.bs
./Configs/CheckMesh.ini
./Toolboxes/CommonToolbox.bs
./Toolboxes/CheckToolbox.bs
You can add tar commands to the script as needed, or you could loop on the list files, from that point on, your imagination is the limit!
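For instance, a sketch of such a loop, assuming one list file per script named like CheckMesh.list (the .list naming scheme is hypothetical):
#!/bin/bash
# Build one tar per list file, naming the archive after the list.
for list in *.list
do
    tar -cvpf "${list%.list}.tar" -T "$list"
done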

Copying multiple files with same name in the same folder terminal script

I have a lot of files named the same, with a directory structure (simplified) like this:
../foo1/bar1/dir/file_1.ps
../foo1/bar2/dir/file_1.ps
../foo2/bar1/dir/file_1.ps
.... and many more
As it is extremely inefficient to view all of those ps files by going to the respective directories, I'd like to copy all of them into another directory, but include the names of the first two directories (which are the ones relevant to my purpose) in the file name.
I have previously tried it like this, but I cannot tell which file came from where, as they are all named consecutively:
#!/bin/bash -xv
cp -v --backup=numbered {} */*/dir/file* ../plots/;
Where ../plots is the folder I copy them to. However, they now have the form file.ps.~x~ (where x is a number), so I get rid of the ".ps.~*~" and leave only the .ps extension with:
rename 's/\.ps.~*~//g' *;
rename 's/\~/.ps/g' *;
Then, as the ps files have hundreds of points sometimes and take a long time to open, I just transform them into jpg.
for file in * ; do convert -density 150 -quality 70 "$file" "${file/.ps/}".jpg; done;
This is not really a working bash script, as I have to change the directory manually.
I guess the best way to do it is to copy the files from the beginning with the names of the first two directories incorporated into the copied file names.
How can I do this last thing?
If you just have two levels of directories, you can use
for file in */*/*.ps
do
ln "$file" "${file//\//_}"
done
This goes over each ps file, and hard links them to the current directory with the /s replaced by _. Use cp instead of ln if you intend to edit the files but don't want to update the originals.
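The cp variant of the same loop would be (same two-level layout assumed):
for file in */*/*.ps
do
    cp "$file" "${file//\//_}"
done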
For arbitrary directory levels, you can use the bash-specific
shopt -s globstar
for file in **/*.ps
do
ln "$file" "${file//\//_}"
done
But are you sure you need to copy them all to one directory? You might be able to open them all with yourreader */*/*.ps (where yourreader stands for your PS viewer), which depending on your reader may let you browse through them one by one while still seeing the full path.
You should first run a find command and print the names, like
find . -name "file_1.ps" -print
Then iterate over each of them and do a string replacement of / to '-' or any other character, like
${filename//\//-}
The general syntax is ${string/substring/replacement}; with a double slash, ${string//substring/replacement}, every occurrence is replaced, which is what you need here since the paths contain several slashes. Then you can copy it to the required directory. The complete script can be written as follows. I haven't tested it (not on Linux at the moment), so you might need to tweak the code if you get any syntax error ;)
find . -name "file_1.ps" -print | while IFS= read -r filename
do
    newFileName=${filename//\//-}
    cp "$filename" "YourNewDirectory/$newFileName"
done
You will need to place the script in the root directory you are searching, or change the find command to look in the particular directory, if you are placing the script somewhere else.
References
string manipulation in bash
find man page

Mac zip compress without __MACOSX folder?

When I compress files with the built in zip compressor in Mac OSX, it causes an extra folder titled "__MACOSX" to be created in the extracted zip.
Can I adjust my settings to keep this folder from being created or do I need to purchase a third party compression tool?
UPDATE: I just found a freeware app for OSX that solves my problem: "YemuZip"
UPDATE 2: YemuZip is no longer freeware.
Can be fixed after the fact by zip -d filename.zip __MACOSX/\*
And, to also delete .DS_Store files: zip -d filename.zip \*/.DS_Store
When I had this problem I did it from the command line:
zip file.zip uncompressed
EDIT, after many downvotes: I was using this option some time ago and I don't know where I learnt it, so I can't give you a better explanation. Chris Johnson's answer is correct, but I won't delete mine. As one comment says, it's more accurate for what the OP is asking, as it compresses without those files instead of removing them from a compressed file. I find it easier to remember, too.
Inside the folder you want to be compressed, in terminal:
zip -r -X Archive.zip *
Where -X means: exclude those invisible Mac resource files such as "__MACOSX" or "._Filename" and .DS_Store files
source
Note: this will only work for the folder (and its subsequent folder tree) you are in, and it has to have the * wildcard.
This command did it for me:
zip -r Target.zip Source -x "*.DS_Store"
Target.zip is the zip file to create. Source is the source file/folder to zip up. The -x parameter specifies the file/folder to exclude.
If the above doesn't work for whatever reason, try this instead:
zip -r Target.zip Source -x "*.DS_Store" -x "__MACOSX"
I'm using this Automator shell script to fix it afterwards.
It shows up as a contextual menu item (right-clicking on any file in Finder).
while read -r p; do
zip -d "$p" __MACOSX/\* || true
zip -d "$p" \*/.DS_Store || true
done
Create a new Service with Automator
Select "Files and Folders" in "Finder"
Add a "Shell Script Action"
zip -r "$destFileName.zip" "$srcFileName" -x "*/\__MACOSX" -x "*/\.*"
-x "*/\__MACOSX": ignore __MACOSX as you mention.
-x "*/\.*": ignore any hidden file, such as .DS_Store .
Quote the variables in case a file name contains a space.
Also, you can build an Automator Service to make this easy to use from Finder.
Check the link below for details if you need them.
Github
The unwanted folders can also be deleted in the following way:
zip -d filename.zip "__MACOSX*"
Works best for me
The zip command line utility never creates a __MACOSX directory, so you can just run a command like this:
zip directory.zip -x \*.DS_Store -r directory
In the output below, a.zip which I created with the zip command line utility does not contain a __MACOSX directory, but a 2.zip which I created from Finder does.
$ touch a
$ xattr -w somekey somevalue a
$ zip a.zip a
adding: a (stored 0%)
$ unzip -l a.zip
Archive: a.zip
Length Date Time Name
-------- ---- ---- ----
0 01-02-16 20:29 a
-------- -------
0 1 file
$ unzip -l a\ 2.zip # I created `a 2.zip` from Finder before this
Archive: a 2.zip
Length Date Time Name
-------- ---- ---- ----
0 01-02-16 20:29 a
0 01-02-16 20:31 __MACOSX/
149 01-02-16 20:29 __MACOSX/._a
-------- -------
149 3 files
-x .DS_Store does not exclude .DS_Store files inside directories but -x \*.DS_Store does.
The top level file of a zip archive with multiple files should usually be a single directory, because if it is not, some unarchiving utilities (like unzip and 7z, but not Archive Utility, The Unarchiver, unar, or dtrx) do not create a containing directory for the files when the archive is extracted, which often makes the files difficult to find, and if multiple archives like that are extracted at the same time, it can be difficult to tell which files belong to which archive.
Archive Utility only creates a __MACOSX directory when you create an archive where at least one file contains metadata such as extended attributes, file flags, or a resource fork. The __MACOSX directory contains AppleDouble files whose filename starts with ._ that are used to store OS X-specific metadata. The zip command line utility discards metadata such as extended attributes, file flags, and resource forks, which also means that metadata such as tags is lost, and that aliases stop working, because the information in an alias file is stored in a resource fork.
Normally you can just discard the OS X-specific metadata, but to see what metadata files contain, you can use xattr -l. xattr also includes resource forks and file flags, because even though they are not actually stored as extended attributes, they can be accessed through the extended attributes interface. Both Archive Utility and the zip command line utility discard ACLs.
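To see up front whether a given file would trigger a __MACOSX entry at all, inspect its metadata before archiving (a sketch; somefile is a placeholder name):
xattr -l somefile    # list extended attributes, resource fork, quarantine info
ls -l@ somefile      # on macOS, ls -@ shows extended attribute names and sizes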
You can't.
But what you can do is delete those unwanted folders after zipping. The command line zip takes different arguments, where one, -d, is for deleting entries from the archive based on a pattern. So you can use it like this:
zip -d filename.zip __MACOSX/\*
Cleanup .zip from .DS_Store and __MACOSX, including subfolders:
zip -d archive.zip '__MACOSX/*' '*/__MACOSX/*' .DS_Store '*/.DS_Store'
Walkthrough:
Create .zip as usual by right-clicking on the file (or folder) and selecting "Compress ..."
Open Terminal app (search Terminal in Spotlight search)
Type zip in the Terminal (but don't hit enter)
Drag .zip to the Terminal so it converts to the path
Copy paste -d '__MACOSX/*' '*/__MACOSX/*' .DS_Store '*/.DS_Store'
Hit enter
Use zipinfo archive.zip to list files inside, to check (optional)
I have a better solution after reading all of the existing answers. Everything can be done by a workflow in a single right click.
NO additional software, NO complicated command line stuff and NO shell tricks.
The automator workflow:
Input: files or folders from any application.
Step 1: Create Archive, the system built-in, with default parameters.
Step 2: Run Shell Script, with input passed as arguments. Copy the commands below.
zip -d "$@" "__MACOSX/*" || true
zip -d "$@" "*/.DS_Store" || true
Save it and we are done! Just right-click a folder or a bulk of files and choose the workflow from the Services menu. An archive with no metadata will be created alongside.
UPDATE: I chose "Quick Action" when creating a new workflow.
To not zip any hidden files:
zip newzipname filename.any -x ".*"
For this question, it should be:
zip newzipname filename.any -x "__MACOSX/*"
It must be said, though, that when the zip command is run in Terminal it just compresses the named file and does not add any others, so doing this the result is the same:
zip newzipname filename.any
Keka does this. Just drag your directory over the app screen.
Do you mean the zip command-line tool or the Finder's Compress command?
For zip, you can try the --data-fork option. If that doesn't do it, you might try --no-extra, although that seems to ignore other file metadata that might be valuable, like uid/gid and file times.
For the Finder's Compress command, I don't believe there are any options to control its behavior. It's for the simple case.
The other tool, and maybe the one that the Finder actually uses under the hood, is ditto. With the -c -k options, it creates zip archives. With this tool, you can experiment with --norsrc, --noextattr, --noqtn, --noacl and/or simply leave off the --sequesterRsrc option (which, according to the man page, may be responsible for the __MACOSX subdirectory). Although, perhaps the absence of --sequesterRsrc simply means to use AppleDouble format, which would create ._ files all over the place instead of one __MACOSX directory.
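For example, a sketch of the ditto route (src and out.zip are placeholder names):
# Create a zip while dropping resource forks, extended attributes
# and quarantine info, so no __MACOSX directory should appear.
ditto -c -k --norsrc --noextattr --noqtn src out.zip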
This is how I avoid the __MACOSX directory when compressing files with the tar command:
$ cd dir-you-want-to-archive
$ find . | xargs xattr -l # <- list all files with special xattr attributes
...
./conf/clamav: com.apple.quarantine: 0083;5a9018b1;Safari;9DCAFF33-C7F5-4848-9A87-5E061E5E2D55
./conf/global: com.apple.quarantine: 0083;5a9018b1;Safari;9DCAFF33-C7F5-4848-9A87-5E061E5E2D55
./conf/web_server: com.apple.quarantine: 0083;5a9018b1;Safari;9DCAFF33-C7F5-4848-9A87-5E061E5E2D55
Delete the attribute first:
find . | xargs xattr -d com.apple.quarantine
Run find . | xargs xattr -l again, and make sure no file has any xattr attributes. Then you're good to go:
tar cjvf file.tar.bz2 dir
Another shell script that could be used with the Automator tool (see also benedikt's answer on how to create the script) is:
while read -r f; do
    d="$(dirname "$f")"
    n="$(basename "$f")"
    # Use a subshell so the cd does not affect the next iteration.
    (cd "$d" && zip "$n.zip" -x \*.DS_Store -r "$n")
done
The difference here is that this code directly compresses selected folders without macOS specific files (and not first compressing and afterwards deleting).

Batch script to move files into a zip

Is anybody able to point me in the right direction for writing a batch script for a UNIX shell to move files into a zip one at a time and then delete the originals?
I can't use the standard zip function because I don't have enough space to hold both the originals and the zip being created.
So, any suggestions please?
Try this:
zip -r -m source.zip *
Not a great solution, but simple: I ended up finding a Python script that recursively zips a folder and just added a line to delete each file after it is added to the zip.
You can achieve this using find as
find . -type f -print0 | xargs -0 -n1 zip -m archive
This will move every file into the zip preserving the directory structure. You are then left with empty directories that you can easily remove. Moreover using find gives you a lot of freedom on what files you want to compress.
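A sketch of the cleanup for those leftover empty directories:
# Delete directories that became empty once their files were moved
# into the archive (-delete implies depth-first traversal).
find . -type d -empty -delete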
I use:
zip --move destination.zip src_file1 src_file2
Here is the detail of the "--move" option from the man pages:
--move
Move the specified files into the zip archive; actually, this
deletes the target directories/files after making the specified zip
archive. If a directory becomes empty after removal of the files, the
directory is also removed. No deletions are done until zip has
created the archive without error. This is useful for conserving disk
space, but is potentially dangerous so it is recommended to use it in
combination with -T to test the archive before removing all input
files.
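Following the man page's advice, a sketch combining --move with -T so nothing is deleted unless the archive tests clean:
# zip verifies destination.zip first; the inputs are removed only
# if the archive was created and tested without error.
zip --move -T destination.zip src_file1 src_file2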
