Unpack .tar.gz and modify result files - bash

I wanted to write a bash script that unpacks .tar.gz archives and, for each resulting file, sets an additional attribute containing the name of the original archive, just so I know where each unpacked file came from.
I tried to store the contained file names in an array and then loop over them.
for archive in "$1"*.tar.gz; do
    if [ -f "${archive}" ]
    then
        readarray -t fileNames < <(tar tzf "$archive")
        for file in "${fileNames}"; do
            echo "${file}"
            tar xvzf "${archive}" -C "$1" --no-wildcards "${file}" &&
            attr -s package -V "${archive}" "${file}"
        done
    fi
done
The result is that only one file is extracted and no extra attribute is set.
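As an aside, the likely cause of the single-file symptom is the loop's array expansion: "${fileNames}" yields only the first element of the array. A minimal sketch of the corrected loop header, keeping the rest of the script unchanged:
for file in "${fileNames[@]}"; do   # [@] expands to every element, not just the first
The answer below takes a different route and avoids the per-file loop entirely.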

#! /bin/bash
for archive in "$1"*.tar.gz; do
    if [ -f "${archive}" ] ; then
        # Unpack the archive into subfolder $1
        tar xvf "$archive" -C "$1"
        # Assign attributes
        tar tf "$archive" | (cd "$1" && xargs -t -L1 attr -s package -V "$archive" )
    fi
done
Notes:
The script unpacks each archive with a single 'tar' call. This is more efficient than unpacking one file at a time, and it also avoids issues with unpacking folders, which would lead to unnecessary repeated work.
The script uses 'attr'. It would be better to use 'setfattr', if the target file system supports it, to set attributes on multiple files with only a few calls (using xargs with multiple files per command), as sketched below.
It is not clear what the structure of the output folder is. From the question, it looks as if all archives are unpacked into the same folder "$1". The solution above assumes that this is the intended behavior and that the archives have distinct file names. If each archive is to be placed into its own subfolder, that is even easier and more efficient to implement.
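A minimal sketch of the setfattr variant, assuming a file system with user extended attributes (ext4, xfs, ...) and GNU xargs; attr -s package stores the attribute as user.package, so the same name is used here:
tar tf "$archive" | (cd "$1" && xargs -d '\n' setfattr -n user.package -v "$archive")
Because xargs batches many file names per setfattr call, this needs far fewer processes than the -L1 attr version above.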

Related

How to decompress multiple nested archives of different formats?

I have hundreds of .zip and .tar archives nested inside each other to an unknown depth, and I need to decompress all of them to get to the last one. How can I achieve that?
I have the part for the zip files:
while 'true'
do
    find . '(' -iname '*.zip' ')' -exec sh -c 'unzip -o -d "${0%.*}" "$0"' '{}' ';'
done
but once it stumbles upon a .tar file it, expectedly, does nothing. I'm running the script on a Mac.
The structure is just an archive within an archive; the extensions are not in any particular order, like:
a.zip/b.zip/c.tar/d.tar/e.zip/f.tar...
and so on
You can use an existing command like 7z x to extract either archive type, or build your own using case "$file" in *.zip) unzip ... ;; *.tar) ... and so on.
The following script unpacks nested archives as long as the unpacked content is exactly one .tar or .zip archive. It stops when the unpacked content is several archives, several plain files, or even a directory containing just one .zip.
#! /usr/bin/env bash

# this function can be replaced by `7z x "$1"`
# if 7zip is installed (package managers often call it p7zip)
extract() {
    case "$1" in
        *.zip) unzip "$1" ;;
        *.tar) tar -xf "$1" ;;
        *) echo "Unknown archive type: $1"; exit 1 ;;
    esac
}

isOne() {
    [ $# = 1 ]
}

mkdir out tmp
ln {,out/}yourOutermostArchive.zip # <-- Adapt this line
cd out
shopt -s nullglob
while isOne * && isOne *.{zip,tar}
do
    a=(*)
    mv "$a" ../tmp/
    extract "../tmp/$a"
    rm "../tmp/$a"
done
rm -r ../tmp
cd ..
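For reference, the brace expansion on the ln line is just shorthand for linking the archive into out/; with the outermost archive from the question's example it would expand to:
ln a.zip out/a.zip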

Shell: Copy list of files with full folder structure stripping N leading components from file names

Consider a list of files (e.g. files.txt) similar (but not limited) to
/root/
/root/lib/
/root/lib/dir1/
/root/lib/dir1/file1
/root/lib/dir1/file2
/root/lib/dir2/
...
How can I copy the specified files (not any other content from the folders which are also specified) to a location of my choice (e.g. ~/destination) with a) intact folder structure but b) N folder components (in the example just /root/) stripped from the path?
I already managed to use
cp --parents `cat files.txt` ~/destination
to copy the files with an intact folder structure, however this results in all files ending up in ~/destination/root/... when I'd like to have them in ~/destination/...
I think I found a really nice and concise solution using GNU tar:
tar cf - -T files.txt | tar xf - -C ~/destination --strip-components=1
Note the --strip-components option, which allows removing an arbitrary number of path components from the beginning of the file name.
One minor problem though: it seems tar always archives the whole content of folders mentioned in files.txt (at least I couldn't find an option to ignore folders), but that is most easily solved with grep:
cat files.txt | grep -v '/$' > files2.txt
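Putting the two steps together, a sketch of the complete pipeline (file names as in the question):
grep -v '/$' files.txt > files2.txt
tar cf - -T files2.txt | tar xf - -C ~/destination --strip-components=1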
This might not be the most graceful solution - but it works:
for file in $(cat files.txt); do
    echo "checking for $file"
    if [[ -f "$file" ]]; then
        file_folder=$(dirname "$file")
        destination_folder=/destination/${file_folder#/root/}
        echo "copying file $file to $destination_folder"
        mkdir -p "$destination_folder"
        cp "$file" "$destination_folder"
    fi
done
I had a look at cp and rsync, but it looks like they would work better if you cd into /root first.
However, if you did cd to the correct directory beforehand, you could always run the copy in a subshell so that you are returned to your original location once the subshell has finished.
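For comparison, a sketch of the rsync route hinted at above, assuming a reasonably recent rsync with --files-from; the list has to be made relative to /root, and directory-only lines are dropped so only the listed files are copied:
sed 's|^/root/||' files.txt | grep -v '/$' > files2.txt
rsync -a --files-from=files2.txt /root/ ~/destination/
Since --files-from implies --relative, the remaining path components are recreated under ~/destination.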

Gzip no such file or directory error, still zips files

I'm just learning shell scripting, specifically in bash. I want to use gzip to take files from a target directory and send the compressed copies to a different directory. I enter the directories on the command line; ext is for the extensions I want to zip, and file will be the new zipped file. My script zips the files correctly, to and from the desired directories, but I get a "no such file or directory" error. How do I avoid this?
Current code
cd $1
for ext in $*; do
    for file in `ls *.$ext`; do
        gzip -c $file > $2/$file.gz
    done
done
and my I/O
blackton#ltsp-amd64-charlie:~/Desktop/60256$ bash myCompress /home/blackton/Desktop/ /home/blackton/ txt
ls: cannot access *./home/blackton/Desktop/: No such file or directory
ls: cannot access *./home/blackton/: No such file or directory
gzip: alg: No such file or directory
gzip: proj.txt: No such file or directory
There are two separate things causing problems here.
In your outer loop
for ext in $*; do
done
you are looping over all the command line parameters, using each as the extension to search for - including the directory names.
Since the extension is the third parameter, you only want to run the inner loop once on $3:
for file in `ls *.$3`; do
    gzip -c $file > $2/$file.gz
done
The next problem is spaces.
You do not want to run ls here - the wildcard expansion will provide the filenames directly, e.g. for file in *.$3, and it will fill $file with a whole filename at a time. The output from ls is split on each space, so you end up with two filenames alg and proj.txt, instead of one alg proj.txt.
That is not enough by itself, though. You also need to quote $file whenever you use it, so the command expands to gzip -c "alg proj.txt" instead of gzip -c alg proj.txt, which tells gzip to compress two files. In general, all variable expansions that you expect to be a filename should be quoted:
cd "$1"
for file in *."$3"; do
gzip -c "$file" > "$2/$file.gz"
done
One further problem is that if there are no files matching the extension, the wildcard will not expand and the command executed will be
gzip -c "*.txt" > "dir/*.txt.gz"
This will create a file that is literally called "*.txt.gz" in the target directory. A simple way to avoid this would be to check that the original file exists first - this will also avoid accidentally trying to gzip an oddly named directory.
cd "$1"
for file in *."$3"; do
if [ -f "$file" ]; then
gzip -c "$file" > "$2/$file.gz"
fi
done
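An alternative sketch, assuming bash: enabling nullglob makes an unmatched pattern expand to nothing, so the loop body simply never runs when there are no matching files:
cd "$1"
shopt -s nullglob
for file in *."$3"; do
    gzip -c "$file" > "$2/$file.gz"
done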
You can try this:
#!/bin/bash
Src=$1
Des=$2
ext="txt"
for file in "$Src"/*; do
    if [ "${file##*.}" = "${ext}" ]; then
        base=$(basename "$file")
        mkdir -p "$Des"   # -p ensures creation if the directory does not exist
        gzip -c "$file" > "$Des/$base.gz"
    fi
done
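A hypothetical invocation, with paths in the spirit of the question (the extension is hard-coded to txt here, so no third argument is needed):
bash myCompress /home/blackton/Desktop /home/blackton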

Extract file using bash script

I created a script which will extract all *.tar.gz files. The file is a .tar.gz that has been packed five times over, but the problem is that only the first *.tar.gz is being extracted.
for file in *.tar.gz; do
    gunzip -c "$file" | tar xf -
done
rm -vf "$file"
What should I do? Answers are greatly appreciated.
If your problem is that the tar.gz file contains another tar.gz file which should be extracted as well, you need a different sort of loop. The wildcard at the top of the for loop is only evaluated when the loop starts, so it doesn't include anything extracted from the tar.gz.
You could try something like
while true; do
    for f in *.tar.gz; do
        case $f in '*.tar.gz') exit 0;; esac
        tar zxf "$f"
        rm -v "$f"
    done
done
The case depends on the fact that (by default) when no files match the wildcard, it remains unexpanded. You may have to change your shell's globbing options if they differ from the default.
If you really mean that it is compressed (not decompressed) five times, despite the single .gz extension, perhaps you need instead
for i in 1 2 3 4; do
    gunzip file.tar.gz
    mv file.tar file.tar.gz
done
tar zxf file.tar.gz
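If the nesting depth is not known in advance, a sketch that keeps stripping gzip layers while the file still tests as gzip data (same file.tar.gz naming as above):
while gzip -t file.tar.gz 2>/dev/null; do
    gunzip file.tar.gz          # removes one gzip layer, producing file.tar
    mv file.tar file.tar.gz     # rename so the next pass sees a .gz suffix
done
mv file.tar.gz file.tar         # the remaining layer is a plain tar
tar xf file.tar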

Extract a .tgz into specific subfolder only if there are files in the tar that would extract to my CWD

Most tar files extract into their own subfolder (because the people who write open source utilities are amazing people).
Some extract into my cwd, which clutters everything up. I know there's a way to see what's in the tar, but I want to write a bash script that essentially guarantees I won't end up with 15 files extracted into my home folder.
Any pointers?
pseudo code:
if [listing of tar files] has any file that doesn't have a '/' in it:
    mkdir [tar filename without extension]
    tar xzvf [tar filename] into [the new folder]
else:
    tar xzvf [tar filename] into cwd
EDIT:
Both solutions are great; I chose the solution below because I was asking for a bash script, and it doesn't rely on extra software.
However, on my own machine, I am using aunpack because it can handle many, many more formats.
I am using it with a shell script that downloads and unpacks all at once. Here is what I am using:
#!/bin/bash
wget -o temp.log --content-disposition "$1"

old=$IFS
IFS='
'
r=`cat temp.log`
rm temp.log

for line in $r; do
    substring=$(expr "$line" : 'Saving to: `\(.*\)'\')
    if [ "$substring" != "" ]
    then
        aunpack "$substring"
        rm "$substring"
        IFS=$old
        exit
    fi
done
IFS=$old
The aunpack command from the atool package does that:
aunpack extracts files from an archive. Often one wants to extract all
files in an archive to a single subdirectory.
However, some archives contain multiple files in their root
directories. The aunpack program overcomes this problem
by first extracting files to a unique (temporary)
directory, and then moving its contents back if possible. This
also prevents local files from being overwritten by mistake.
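A minimal usage sketch (the archive name is illustrative):
aunpack some-archive.tgz    # loose top-level files end up in their own subdirectory rather than in the cwd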
You can use a combination of tar options to achieve this.
The tar option for listing is:
-t, --list
       list the contents of an archive
The tar option to extract into a different directory is:
-C, --directory DIR
       change to directory DIR
So in your script you can list the files, check whether any entries in the listing do not contain a "/", and based on that call tar with the appropriate options.
A sample for your reference is as follows:
TAR_FILE=<some_tar_file_to_be_extracted>
# List the files in the .tgz file using tar -tf
# Look for all the entries without "/" in their names using grep -v
# Count the number of such entries using wc -l; if the count is > 0, create a directory
if [ `tar -tf "${TAR_FILE}" | grep -v "/" | wc -l` -gt 0 ]; then
    echo "Found file(s) which is(are) not in any directory"
    # The directory name will be the tar file name excluding everything after the last "."
    # Thus "test.a.sh.tgz" will give the directory name "test.a.sh"
    DIR_NAME=${TAR_FILE%.*}
    echo "Extracting in ${DIR_NAME}"
    # Test if the directory exists, if not then create it
    [ -d "${DIR_NAME}" ] || mkdir "${DIR_NAME}"
    # Extract to the directory instead of the cwd
    tar xzvf "${TAR_FILE}" -C "${DIR_NAME}"
else
    # Extract to the cwd
    tar xzvf "${TAR_FILE}"
fi
In some cases the tar file may contain several different top-level directories. If you find it annoying to hunt for the various directories extracted from the same tar file, the script can be modified to create a new directory even when the listing contains multiple directories. A slightly more advanced sample follows:
TAR_FILE=<some_tar_file_to_be_extracted>
# List the files in the .tgz file using tar -tf
# Look at only the leading directory names using cut;
# the cut option used lists each file as a separate entry
# Count the number of unique leading entries; if the count is > 1, create a directory
if [ `tar -tf "${TAR_FILE}" | cut -d '/' -f 1 | uniq | wc -l` -gt 1 ]; then
    echo "Found file(s) which is(are) not in the same directory"
    # The directory name will be the tar file name excluding everything after the last "."
    # Thus "test.a.sh.tgz" will give the directory name "test.a.sh"
    DIR_NAME=${TAR_FILE%.*}
    echo "Extracting in ${DIR_NAME}"
    # Test if the directory exists, if not then create it
    # If the directory exists, prompt the user for a (new or existing) directory to extract to
    if [ -d "${DIR_NAME}" ]; then
        echo "${DIR_NAME} exists. Enter (new/existing) directory to extract to"
        read NEW_DIR_NAME
        # Test if the user-entered directory exists, if not then create it
        [ -d "${NEW_DIR_NAME}" ] || mkdir "${NEW_DIR_NAME}"
        # Use the user-entered directory for the extraction below
        DIR_NAME=${NEW_DIR_NAME}
    else
        mkdir "${DIR_NAME}"
    fi
    # Extract to the directory instead of the cwd
    tar xzvf "${TAR_FILE}" -C "${DIR_NAME}"
else
    # Extract to the cwd
    tar xzvf "${TAR_FILE}"
fi
Hope this helps!
