Zip a directory while retaining relative path - bash

I have a directory of files:
/home/user/files/1.txt
/home/user/files/2.txt
/home/user/files/3.txt
I'd like to zip up the files directory into files.zip so when extracted I get:
files/1.txt
files/2.txt
files/3.txt
I know I can do:
# bash
cd /home/user; zip -r files.zip files/
Is there a way to do this without cding to the user directory?
I know that the --junk-paths flag will store just the filenames and junk the path but I'd like to keep the files directory as a container.

I couldn't find a direct way to do this with the zip command, but you can use tar with its -C option. (Note that this produces a tar archive, even though the example names it files.zip.)
$ pwd
/home/shenzi
$ ls -l giga/files
total 3
-rw-r--r-- 1 shenzi Domain Users 3 Aug 5 11:24 1.txt
-rw-r--r-- 1 shenzi Domain Users 4 Aug 5 11:25 2.txt
-rw-r--r-- 1 shenzi Domain Users 9 Aug 5 11:25 3.txt
$ tar -C giga -cvf files.zip files
files/
files/1.txt
files/2.txt
files/3.txt
$ tar -tvf files.zip
-rw-r--r-- shenzi/Domain Users 3 2014-08-05 11:24 files/1.txt
-rw-r--r-- shenzi/Domain Users 4 2014-08-05 11:25 files/2.txt
-rw-r--r-- shenzi/Domain Users 9 2014-08-05 11:25 files/3.txt
Use -xvf to extract.
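If you do want a genuine .zip archive, running zip inside a subshell changes directory only for that one command, so your own shell never leaves its current directory. A runnable sketch using a throwaway stand-in for /home/user:

```shell
#!/usr/bin/env bash
set -eu
command -v zip >/dev/null 2>&1 || { echo "zip not installed; skipping"; exit 0; }

# throwaway stand-in for /home/user from the question
base=$(mktemp -d)
mkdir "$base/files"
touch "$base/files/1.txt" "$base/files/2.txt" "$base/files/3.txt"

# the cd happens in a subshell, so the parent shell's working
# directory is never changed
(cd "$base" && zip -q -r files.zip files/)
ls -l "$base/files.zip"
```

With real paths this is simply: `(cd /home/user && zip -r files.zip files/)`.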


Remove duplicates from tar archive

I'm trying to create an archive of multiple text files. Sometimes these files are updated; when they are, I use tar's --update option to append them to the archive.
Say we have two files, test1.txt, and test2.txt. These files are added to archive test.tar.
Inspecting the tar with tar -tf test.tar
I get as expected:
test1.txt
test2.txt
Now I update test2.txt and append it to the archive using tar -f test.tar -u test2.txt.
I expect the output of running tar -tf test.tar to be:
test1.txt
test2.txt
But instead I get:
test1.txt
test2.txt
test2.txt
So how do I shake this tar to remove the older test2.txt? I know that after extracting the archive, I'd get only the most recent changes to both files, so this problem might seem trivial in this demo, but I'm actually archiving thousands of 5000-line files so the archive sizes get comically large with repeated runs.
What I'm currently doing is I'm extracting the files into a temp directory then re-archiving each time my script is run. This is obviously very inefficient. I'm hoping there's a tar option I'm missing somewhere.
TAR is simply a concatenation of the raw file contents with some metadata mixed in between. As you noticed, updating a file simply appends the file to the end of the TAR and, by convention, the last file occurring in the TAR "wins". TAR does not simply update a file because that could mean that all file contents after the updated file might have to be moved some bytes away to make space for the larger newer file version.
There is actually a tar option, not yet mentioned here, that fits your use case: --occurrence=[NUMBER]. With this option, you can specify which of the multiple versions of a file with the same name/path is to be extracted or deleted. It works fine with your simple example. This is how I set it up:
echo foo > test1.txt
echo foo > test2.txt
tar -cf updated.tar test1.txt test2.txt
sleep 1s
echo barbara > test2.txt
tar --update -f updated.tar test1.txt test2.txt
sleep 1s
echo foobar > test2.txt
tar --update -f updated.tar test1.txt test2.txt
tar tvlf updated.tar
-rwx------ user/group 4 2022-03-29 19:00 test1.txt
-rwx------ user/group 4 2022-03-29 19:00 test2.txt
-rwx------ user/group 8 2022-03-29 19:01 test2.txt
-rwx------ user/group 7 2022-03-29 19:01 test2.txt
Note that tar --update checks only the timestamp, not the contents, and the timestamp has only 1 s granularity! Therefore, we need to wait 1 s to be sure that the timestamp is at least one second later, or tar will not add the file to the archive. This is especially important when copy-pasting this code.
Simply calling --delete will delete all versions:
tar --delete -f updated.tar test2.txt
tar tvlf updated.tar
-rwx------ user/group 4 2022-03-29 19:00 test1.txt
When specifying --occurrence=1, only the first occurrence, i.e., the oldest version, will be deleted:
tar --delete --occurrence=1 -f updated.tar test2.txt
tar tvlf updated.tar
-rwx------ user/group 4 2022-03-29 19:00 test1.txt
-rwx------ user/group 8 2022-03-29 19:01 test2.txt
-rwx------ user/group 7 2022-03-29 19:01 test2.txt
Unfortunately, for --delete, you can only delete exactly one file version. So, you would have to repeat deleting the oldest version until only the most recent is left. It is possible to do it in bash and that would at least be more space-efficient than extracting it to a temporary folder but it would probably be slower because it has to go over the archive many times and each time the archive is basically completely rewritten in place.
I recommend using ratarmount, which I wrote, instead. It will mount the archive (without actually extracting it) and expose a folder view showing the most recent versions of each file. Using this, you can create the new trimmed-down archive:
python3 -m pip install --user ratarmount
ratarmount updated.tar
ls -lA updated/
-rwx------ 1 user group 4 Mar 29 19:14 test1.txt
-rwx------ 1 user group 7 Mar 29 19:14 test2.txt
tar -c -f most-recent.tar -C updated/ .
tar tvlf most-recent.tar
drwxrwxrwx user/group 0 2022-03-29 19:00 ./
-rwx------ user/group 4 2022-03-29 19:00 ./test1.txt
-rwx------ user/group 7 2022-03-29 19:01 ./test2.txt
And there you have it. The output of tar tvlf looks a bit different with the preceding dot because we used -C and specified to archive the . folder. Normally, this doesn't hurt but you can circumvent this with any of these slightly more problematic alternatives:
tar -c -f most-recent.tar -C updated/ test1.txt test2.txt
tar -c -f most-recent.tar -C updated/ $( cd updated && find . -mindepth 1 -maxdepth 1 )
( cd updated/ && tar -c -f ../most-recent.tar {[^.],.[!.],..?}*; )
If you encounter problems with ratarmount, please open an issue on its GitHub repository.
Note that ratarmount even exposes those older versions but in well-hidden special folders:
ratarmount updated.tar
ls -lA updated/test2.txt.versions/
-rwx------ 1 user group 4 Mar 29 20:10 1
-rwx------ 1 user group 8 Mar 29 20:10 2
-rwx------ 1 user group 7 Mar 29 20:10 3
The file names inside the special .versions folder match the arguments given to --occurrence.
The above-mentioned bash version with --occurrence would look like this:
function deleteAllButMostRecentInTar()
{
    local archive=$1
    local filesToDelete=$( mktemp )
    while true; do
        tar --list --file "$archive" | sort | uniq -c |
            sed -n -E '/^[ \t]*1 /d; s|^[ \t]*[0-9]+ ||p' > "$filesToDelete"
        if [[ -s "$filesToDelete" ]]; then
            local fileCount=$( cat -- "$filesToDelete" | wc -l )
            echo -n "Found $fileCount files with more than one version. Deleting ..."
            tar --delete --occurrence=1 --files-from="$filesToDelete" \
                --file "$archive"
            echo " OK"
        else
            break
        fi
    done
    rm -- "$filesToDelete"
    echo
}
deleteAllButMostRecentInTar updated.tar
tar tvlf updated.tar
-rwx------ user/group 4 2022-03-29 19:00 test1.txt
-rwx------ user/group 7 2022-03-29 19:01 test2.txt

List only directories, including directories with spaces in their names? [duplicate]

This question already has answers here:
Listing only directories using ls in Bash? [closed]
(29 answers)
Closed 4 years ago.
I would like to list all directories in a directory. Some of them have spaces in their names. There are also files in the target directory, which I would like to ignore.
Here is the output of ls -lah data/:
drwxr-xr-x 5 me staff 160B 24 Sep 11:30 Wrecsam - Wrexham
-rw-r--r-- 1 me staff 77M 24 Sep 11:31 Wrexham.csv
drwxr-xr-x 5 me staff 160B 24 Sep 11:32 Wychavon
-rw-r--r-- 1 me staff 84M 24 Sep 11:33 Wychavon.csv
I would like to iterate only over the "Wrecsam - Wrexham" and "Wychavon" directories.
This is what I've tried.
for d in "$(find data -maxdepth 1 -type d -print | sort -r)"; do
echo $d
done
But this gives me output like this:
Wychavon
Wrecsam
-
Wrexham
I want output like this:
Wychavon
Wrecsam - Wrexham
What can I do?
Your for loop is not doing the right thing because of word splitting. You can use a glob instead of having to invoke an external command in a subshell:
shopt -s nullglob # make glob expand to nothing if there are no matches
for dir in data/*/; do
    echo dir="$dir"
done
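If you prefer to keep using find (for example, to keep the sort -r from your attempt), a null-delimited pipeline also survives spaces in names. A runnable sketch that recreates the question's layout on the fly (GNU sort's -z option assumed):

```shell
#!/usr/bin/env bash
set -euo pipefail

# sample layout matching the question, created here so the sketch runs as-is
tmp=$(mktemp -d)
mkdir -p "$tmp/data/Wrecsam - Wrexham" "$tmp/data/Wychavon"
touch "$tmp/data/Wrexham.csv"

# -print0 / read -d '' keep names with spaces intact; sort -rz is the
# null-delimited equivalent of the question's sort -r
while IFS= read -r -d '' d; do
    printf '%s\n' "$(basename "$d")"
done < <(find "$tmp/data" -mindepth 1 -maxdepth 1 -type d -print0 | sort -rz)
```

This prints "Wychavon" and then "Wrecsam - Wrexham", each on its own line.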
Related:
Looping over directories in Bash
Why you shouldn't parse the output of ls(1)

zip two files with the same content, but the final md5sum is different

I have the following operation on my mac:
$ echo "dgrgrrgrgrg" > test1.txt
after a few seconds, copy test1.txt:
$ cp test1.txt test2.txt
$ ls -l
total 16
-rw-r--r-- 1 hqfy staff 12 Mar 31 10:18 test1.txt
-rw-r--r-- 1 hqfy staff 12 Mar 31 10:19 test2.txt
now check the md5sum:
$ md5 *.txt
MD5 (test1.txt) = 8bab5a3e202c901499d83cb25d5a8c80
MD5 (test2.txt) = 8bab5a3e202c901499d83cb25d5a8c80
it's obvious that test1.txt and test2.txt have the same md5sum, now I zip these two files:
$ zip -X test1.zip test1.txt
adding: test1.txt (deflated 8%)
$ zip -X test2.zip test2.txt
adding: test2.txt (deflated 8%)
$ ls -l
total 32
-rw-r--r-- 1 hqfy staff 12 Mar 31 10:18 test1.txt
-rw-r--r-- 1 hqfy staff 127 Mar 31 10:22 test1.zip
-rw-r--r-- 1 hqfy staff 12 Mar 31 10:19 test2.txt
-rw-r--r-- 1 hqfy staff 127 Mar 31 10:23 test2.zip
size of test1.zip and test2.zip are the same, but when I check md5sum:
$ md5 *.zip
MD5 (test1.zip) = af8783f96ce98aef717ecf6229ffb07e
MD5 (test2.zip) = 59e752a03a2930adbe7f30b9cbf14561
I've googled it and tried zip with the -X option, but it did not work in my case. How can I create two zip files with the same md5sum?
Quoting from the zip man page:
With -X, zip strips all old fields and only includes the Unicode and
Zip64 extra fields (currently these two extra fields cannot be
disabled).
So, a different md5sum is expected when zipping (even with -X).
I know that this question is very old, but I may have an answer for you:
The timestamps for the two files (which are very obviously different) are included in the .zip file. That is why the md5sums are different. If you can somehow remove those timestamps, then the md5sums will be the same.
Also note that macOS adds a folder (__MACOSX) with extra metadata when you compress files from the Finder; that may also be an issue, although it does not happen with the zip command-line tool.
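One more detail: even with identical timestamps, test1.zip and test2.zip would still differ, because the archive also stores the member's filename (test1.txt vs. test2.txt). When both the stored name and the mtime match, zip -X archives of identical content should come out byte-for-byte identical. A sketch (guarded in case zip isn't installed; the a/b directories and data.txt name are illustrative):

```shell
#!/usr/bin/env bash
set -eu
command -v zip >/dev/null 2>&1 || { echo "zip not installed; skipping"; exit 0; }

tmp=$(mktemp -d)
cd "$tmp"
echo "dgrgrrgrgrg" > test1.txt
cp test1.txt test2.txt

# give both copies the same stored name and the same mtime
mkdir a b
cp test1.txt a/data.txt
cp test2.txt b/data.txt
touch -r a/data.txt b/data.txt      # equalize modification times
(cd a && zip -q -X data.zip data.txt)
(cd b && zip -q -X data.zip data.txt)
cmp -s a/data.zip b/data.zip && echo "archives are byte-identical"
```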

Bash script to get privileges difference of files in two directories

How can I write a bash script on Linux to determine which files in two directories have different permissions?
For example, I have two directories:
fold1 having two files:
1- file1 (-rw-rw-r--)
2- file2 (-rw-rw-r--)
fold2 having same-name files with different permissions:
1- file1 (-rwxrwxr-x)
2- file2 (-rw-rw-r--)
I need a script to output the file names that have different permissions,
so the script will print only file1
I am currently checking the permissions manually by displaying the files with:
for i in `find .`; do ls -l $i; ls -l ../fold2/$i; done
Parsing find . output with: for i in $(find .) is going to give you trouble for any filenames with spaces, newlines, or other perfectly normal characters:
$ touch "one file"
$ for i in `find .` ; do ls -l $i ; done
total 0
-rw-r--r-- 1 sarnold sarnold 0 2012-02-08 17:30 one file
ls: cannot access ./one: No such file or directory
ls: cannot access file: No such file or directory
$
Since permissions can also differ by owner or by group, I think you should include those as well. If you need to include the SELinux security label, the stat(1) program makes that easy to get as well via the %C directive:
for f in * ; do stat -c "%a%g%u" "$f" "../scatman/${f}" |
sort | uniq -c | grep -q '^\s*1' && echo "$f" is different ; done
(Do whatever you want for the echo command...)
Example:
$ ls -l sarnold/ scatman/
sarnold/:
total 0
-r--r--r-- 1 sarnold sarnold 0 2012-02-08 18:00 funky file
-rw-r--r-- 1 sarnold sarnold 0 2012-02-08 18:01 second file
-rw-r--r-- 1 root root 0 2012-02-08 18:05 third file
scatman/:
total 0
-rw-r--r-- 1 sarnold sarnold 0 2012-02-08 17:30 funky file
-rw-r--r-- 1 sarnold sarnold 0 2012-02-08 18:01 second file
-rw-r--r-- 1 sarnold sarnold 0 2012-02-08 18:05 third file
$ cd sarnold/
$ for f in * ; do stat -c "%a%g%u" "$f" "../scatman/${f}" | sort | uniq -c | grep -q '^\s*1' && echo "$f" is different ; done
funky file is different
third file is different
$
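Building on the same idea, when both directories share file names you can diff one stat listing per directory instead of looping over files. GNU stat's format letters are assumed, and the question's fold1/fold2 tree is recreated here so the sketch runs as-is:

```shell
#!/usr/bin/env bash
set -eu

# throwaway tree matching the question's fold1/fold2 example
tmp=$(mktemp -d)
cd "$tmp"
mkdir fold1 fold2
touch fold1/file1 fold1/file2 fold2/file1 fold2/file2
chmod 664 fold1/file1 fold1/file2 fold2/file2
chmod 775 fold2/file1   # rwxrwxr-x, as in the question

# one "name mode owner group" line per file; diff then reports exactly
# the entries whose metadata differs, and awk keeps just the names
diff <(cd fold1 && stat -c '%n %a %U %G' -- *) \
     <(cd fold2 && stat -c '%n %a %U %G' -- *) |
    awk '/^[<>]/ { print $2 }' | sort -u
```

For the example tree this prints only `file1`.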

Bash command to delete all but last 5 directories [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Delete all but the most recent X files in bash
I have a script that creates incremental backups daily, and I need to delete all backups except the last 5.
For example, I have these folders:
drwxr-xr-x 4 root root 4096 Oct 29 01:10 2010-10-29
drwxr-xr-x 4 root root 4096 Oct 30 01:10 2010-10-30
drwxr-xr-x 4 root root 4096 Oct 31 01:10 2010-10-31
drwxr-xr-x 4 root root 4096 Nov 1 01:10 2010-11-01
drwxr-xr-x 4 root root 4096 Nov 2 01:10 2010-11-02
drwxr-xr-x 4 root root 4096 Nov 3 01:10 2010-11-03
drwxr-xr-x 4 root root 4096 Nov 4 01:10 2010-11-04
drwxr-xr-x 4 root root 4096 Nov 5 01:10 2010-11-05
drwxr-xr-x 4 root root 4096 Nov 6 01:10 2010-11-06
drwxr-xr-x 4 root root 4096 Nov 7 01:10 2010-11-07
drwxr-xr-x 4 root root 4096 Nov 8 01:10 2010-11-08
And I need to keep only the last 5 directories and delete the others. After the command executes, I should have only this:
drwxr-xr-x 4 root root 4096 Nov 4 01:10 2010-11-04
drwxr-xr-x 4 root root 4096 Nov 5 01:10 2010-11-05
drwxr-xr-x 4 root root 4096 Nov 6 01:10 2010-11-06
drwxr-xr-x 4 root root 4096 Nov 7 01:10 2010-11-07
drwxr-xr-x 4 root root 4096 Nov 8 01:10 2010-11-08
I don't need to delete directories older than 5 days; I need to delete all except the last 5 directories :)
Now I'm using:
find /backup/increment -maxdepth 1 -type d -mtime +5 -exec rm -rf {} \;
But I need it improved so it is not based on time :)
EDIT: The above is an example for a server that makes backups every day, but I need a script that deletes all folders except the last 5. My computer makes its backup at 00:10 at night, but the backup is not made every night because my computer isn't running every day, and I always need to keep the last 5 backups :)
Use the tail command to print lines starting with the Nth line (option -n +N):
rm -rf `ls -t | tail -n +6`
ls -t lists the current directory sorted by modification time, newest first. tail -n +6 takes all lines starting with the 6th line. The backticks feed the result of the pipe to the rm command (-rf is needed because the entries are directories).
OLD SOLUTION, not correct ...
use the head command, which prints the first n lines of some output:
rm `ls -t1 | head -n 5`
ls -t outputs the current directory sorted by time. head -n 5 takes the first five entries of the previous output. Quoting with backticks feeds the result of the pipe into the rm command.
Please try out first before applying to live data :) ...
The first thing that came to my mind. It's not elegant:
a=0
for i in `ls -t`; do
    a=`expr $a + 1`
    if [ $a -gt 5 ]; then
        echo "removing $i"
        rm -rf $i
    fi
done
create two dummy files with the start and the end date
touch -t 1010290000 before
touch -t 1011040000 after
find all the files between the 2 dummy files and "rm -rf" the result
find . -newer before \! -newer after -exec rm -rf {} \;
ls -tr | perl -e '@files = <>; print @files[0..$#files-5]' | xargs -n1 echo rm -rf
You would remove the echo before the rm -rf to get it to work.
The trick will be the -t option to ls, which sorts by modification time, from newest to oldest.
A really naive solution, using a temporary file, might go this way:
ls -t > /tmp/file_list
num_files_to_keep=5
# wc -l gets the line count of a file
# first word of wc output is the actual line count, and that's all we need, so
# delete everything after the space.
num_files=`wc -l /tmp/file_list | sed "s/ .*//"`
#if appropriate you should add a check for num_files < num_files_to_keep
num_files_to_delete=$(( $num_files - $num_files_to_keep ))
rm -rf `tail -n $num_files_to_delete /tmp/file_list`
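Since the backup directories have ISO-style date names, a plain glob already sorts them chronologically, which sidesteps parsing ls output entirely. A bash sketch using array slicing, with a throwaway directory standing in for /backup/increment:

```shell
#!/usr/bin/env bash
set -eu

# throwaway stand-in for /backup/increment
backups=$(mktemp -d)
for d in 2010-10-29 2010-10-30 2010-10-31 2010-11-01 2010-11-02 \
         2010-11-03 2010-11-04 2010-11-05 2010-11-06 2010-11-07 2010-11-08; do
    mkdir "$backups/$d"
done

cd "$backups"
dirs=( */ )                           # glob sorts ascending: oldest first
count=${#dirs[@]}
if (( count > 5 )); then
    rm -rf -- "${dirs[@]:0:count-5}"  # drop everything but the 5 newest
fi
ls                                    # leaves only 2010-11-04 .. 2010-11-08
```

This only works because the names sort chronologically; for arbitrary names you would still need a modification-time sort.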
