Extract files using bash script

I created a script which should extract all *.tar.gz files. The file has been compressed five times over as a .tar.gz, but the problem is that only the first *.tar.gz layer is being extracted.
for file in *.tar.gz; do
    gunzip -c "$file" | tar xf -
done
rm -vf "$file"
What should I do to fix this? Answers are greatly appreciated.

If your problem is that the tar.gz file contains another tar.gz file which should be extracted as well, you need a different sort of loop. The wildcard at the top of the for loop is only evaluated once, when the loop starts, so it doesn't include anything extracted from the tar.gz files afterwards.
You could try something like
while true; do
    for f in *.tar.gz; do
        # if the glob matched nothing, it stays literal and we are done
        case $f in '*.tar.gz') exit 0;; esac
        tar zxf "$f"
        rm -v "$f"
    done
done
The case depends on the fact that (by default) when no files match the wildcard, it remains unexpanded. You may have to change your shell's globbing options if they differ from the default.
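If you would rather not depend on that default, here is a minimal sketch of the same idea using nullglob (my variant, not from the original answer): a non-matching glob then expands to nothing, so the inner loop simply does not run and a flag can tell the outer loop to stop.
shopt -s nullglob
found=1
while [ "$found" -eq 1 ]; do
    found=0
    for f in *.tar.gz; do
        found=1          # something matched; try another pass afterwards
        tar zxf "$f"
        rm -v "$f"
    done
done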
If you really mean that it is compressed (not decompressed) five times, despite the single .gz extension, perhaps you need instead
for i in 1 2 3 4; do
    gunzip file.tar.gz         # produces file.tar, which is still gzipped underneath
    mv file.tar file.tar.gz    # rename so gunzip will accept it again
done
tar zxf file.tar.gz            # fifth decompression plus the actual untar
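If you do not know in advance how many times the file was compressed, here is a rough sketch (my own, not from the answer) that keeps stripping gzip layers until file(1) stops reporting gzip data; it assumes GNU tar and the file.tar.gz name from above:
f=file.tar.gz
while file -b "$f" | grep -q 'gzip compressed'; do
    gunzip "$f"              # file.tar.gz -> file.tar
    mv file.tar "$f"         # restore the .gz name so the next round works
done
tar xf "$f"                  # what's left is a plain tar; GNU tar detects that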

Extracting specific file from a tar.bz2 containing a matching pattern

So I have this one big tarball:
du -sh file.tar.bz2
871M file.tar.bz2
This tarball contains hundreds of files:
tar -jtvf file.tar.bz2 | head -3
./file-140556-001_045.txt
./file-121720-001_012.txt
./file-171008-001_036.txt
And I can do a bzgrep no problem:
bzgrep '0316629989093' file.tar.bz2
Binary file (standard input) matches
And using bzgrep -a I can extract the line containing the search pattern. But what I was trying to accomplish is getting the file name inside the tarball that matches the search pattern, so I can extract it without uncompressing the whole tarball.
For example: ./file-171008-001_036.txt
Is there any way to do this from the bzgrep command?
I tried all the options bzgrep offers, and it doesn't seem possible to extract the names of the files matching the pattern: bzgrep only sees the decompressed byte stream, so it knows nothing about the tar member names. That's too bad.
What you can do as a workaround is to extract the files one by one and delete each one after searching it.
Something like this :
#!/bin/bash
ARCHIVE="file.tar.bz2"
PATTERN="0316629989093"
tar -jtf "$ARCHIVE" | while IFS= read -r file; do
    tar -jxf "$ARCHIVE" "$file"                      # extract just this one member
    grep -q "$PATTERN" "$file" && echo "$file matches"
    rm "$file"
done
Outputs
file-171008-001_036.txt matches
Pros: not all the files are uncompressed at once, so disk usage stays limited.
Cons: the archive is decompressed again for every member, so the execution time is pretty bad.
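If disk space is not a concern, here is a sketch of the opposite trade-off (my variant, using a scratch directory from mktemp): extract everything once and let grep -rl print the matching file names.
#!/bin/bash
ARCHIVE="file.tar.bz2"
PATTERN="0316629989093"

tmpdir=$(mktemp -d)                  # scratch area for a single full extraction
tar -jxf "$ARCHIVE" -C "$tmpdir"
grep -rl "$PATTERN" "$tmpdir"        # prints the path of every matching file
rm -rf "$tmpdir"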

Unpack .tar.gz and modify result files

I want to write a bash script that unpacks .tar.gz archives and, for each resulting file, sets an additional attribute with the name of the original archive, just so I know what the origin of each unpacked file is.
I tried to store the contained file names in an array and then loop over them.
for archive in "$1"*.tar.gz; do
    if [ -f "${archive}" ]
    then
        readarray -t fileNames < <(tar tzf "$archive")
        for file in "${fileNames}"; do
            echo "${file}"
            tar xvzf "${archive}" -C "$1" --no-wildcards "${file}" &&
                attr -s package -V "${archive}" "${file}"
        done
    fi
done
The result is that only one file is extracted and no extra attribute is set.
#! /bin/bash
for archive in "$1"*.tar.gz; do
    if [ -f "${archive}" ] ; then
        # Unpack the archive into subfolder $1
        tar xvf "$archive" -C "$1"
        # Assign attributes
        tar tf "$archive" | (cd "$1" && xargs -t -L1 attr -s package -V "$archive")
    fi
done
Notes:
The script unpacks each archive with a single 'tar' invocation. This is more efficient than unpacking one file at a time, and it avoids the repeated work that extracting folder members one by one would cause.
The script uses 'attr'. It would be better to use 'setfattr', if the target file system supports it, since it can set the attribute on multiple files per call (using xargs with several files per command).
It is not clear what the structure of the output folder should be. From the question it looks as if all archives are unpacked into the same folder "$1". The solution above assumes that this is the intended behavior and that the archives contain distinct file names. If each archive should instead go into its own subfolder, that is even easier and more efficient to implement.
As an aside, the loop in the question runs only once because "${fileNames}" expands to just the first array element; it would need "${fileNames[@]}" to iterate over all of them.
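A sketch of the setfattr variant mentioned above (assumptions: the file system supports user xattrs, and the attribute name user.package is my choice; setfattr takes multiple files per call, so xargs no longer needs -L1):
#!/bin/bash
for archive in "$1"*.tar.gz; do
    if [ -f "${archive}" ]; then
        tar xvf "$archive" -C "$1"
        # one setfattr invocation per batch of names instead of one per file
        tar tf "$archive" | (cd "$1" && xargs setfattr -n user.package -v "$archive")
    fi
done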

Tar compress files when some can be missing

I am writing a bash script that pulls files from another server to the current directory. The issue is that I get a lot of files and I only need ~3 of them; however all 3 might not be there.
For example, the server call returns:
server call --> file1.txt file2.txt file3.xls file4.json .... (etc)
Then compress files with tar:
tar zcf needed_files.tgz file4.json file23.doc *.txt
But file4.json was not there, so I would expect tar to compress file23.doc and all the .txt files; instead the script fails with:
tar: file4.json: Cannot stat: No such file or directory
I have tried other combinations of tar commands like czvf but no luck.
With GNU tar, the existing files are in fact added to the archive despite the "Cannot stat: No such file or directory" errors; it is tar's nonzero exit status that makes the script fail.
Anyway, you could also use nullglob in combination with the extglob pattern @() to get only the existing files: appending an empty @() turns each fixed name into a glob, and under nullglob a glob that matches nothing expands to nothing.
shopt -s extglob nullglob
files=( "fileA"@() "fileB"@() *.txt )
(( ${#files[@]} )) && tar zcf needed_files.tgz -- "${files[@]}"
Try an extended glob.
shopt -s extglob # set extended globbing on
if echo file[1234].+(txt|xls|json) | grep -vq '\['
then tar cvzf needed_files.tgz file[1234].+(txt|xls|json)
else echo No matching files for extglob 'file[1234].+(txt|xls|json)'
fi
If matching files exist, it will list them.
If not, it will literally echo back the pattern.
grepping out the pattern metacharacters tells you whether there are any files in the set. If they do exist, use the same glob to provide the files to tar, and it will receive exactly the set of matching files. If they don't, the condition test lets you skip it.
Of course, it breaks if you make files with [ in the names, etc...
Or, you could do it in a loop....
for f in file[1234].+(txt|xls|json)
do  if [[ -e "$f" ]]
    then [[ -e needed_files.tar ]] && c=r || c=c   # append if the archive already exists
         tar ${c}vf needed_files.tar "$f"
    fi
done
Not perfect, but might suit your tastes better.
Neither is a great solution, but one of them ought to get you rolling.
tar zcf needed_files.tgz $(ls -d file4.json file23.doc *.txt 2>/dev/null)
Notice that this prints only the existing files:
ls -d file4.json file23.doc *.txt 2>/dev/null
You can also use the --ignore-failed-read option, but be aware that it ignores other read errors as well.
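For completeness, that would look like this (GNU tar; file names from the question):
tar zcf needed_files.tgz --ignore-failed-read file4.json file23.doc *.txt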

Shell: Copy list of files with full folder structure stripping N leading components from file names

Consider a list of files (e.g. files.txt) similar (but not limited) to
/root/
/root/lib/
/root/lib/dir1/
/root/lib/dir1/file1
/root/lib/dir1/file2
/root/lib/dir2/
...
How can I copy the specified files (not any other content from the folders which are also specified) to a location of my choice (e.g. ~/destination) with a) intact folder structure but b) N folder components (in the example just /root/) stripped from the path?
I already managed to use
cp --parents `cat files.txt` ~/destination
to copy the files with an intact folder structure, however this results in all files ending up in ~/destination/root/... when I'd like to have them in ~/destination/...
I think I found a really nice and concise solution by using GNU tar:
tar cf - -T files.txt | tar xf - -C ~/destination --strip-components=1
Note the --strip-components option, which removes an arbitrary number of path components from the beginning of each file name.
One minor problem though: it seems tar always archives the whole content of any folder mentioned in files.txt (at least I couldn't find an option to ignore folders), but that is most easily solved by filtering out the directory entries with grep:
grep -v '/$' files.txt > files2.txt
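Putting the two together, a sketch that feeds the filtered list straight to tar via -T - (GNU tar reads the name list from stdin), so no intermediate files2.txt is needed:
grep -v '/$' files.txt | tar cf - -T - | tar xf - -C ~/destination --strip-components=1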
This might not be the most graceful solution - but it works:
# read line by line so paths containing spaces survive intact
while IFS= read -r file; do
    echo "checking for $file"
    if [[ -f "$file" ]]; then
        file_folder=$(dirname "$file")
        destination_folder=/destination/${file_folder#/root/}
        echo "copying file $file to $destination_folder"
        mkdir -p "$destination_folder"
        cp "$file" "$destination_folder"
    fi
done < files.txt
I had a look at cp and rsync, but it looks like they would work better if you were to cd into /root first.
However, if you did cd to the correct directory beforehand, you could always run the copy in a subshell so that you are returned to your original location once the subshell has finished.
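A sketch of that subshell idea with rsync's --files-from (assumptions: the list lives at ~/files.txt, every entry starts with /root/, and directory entries are filtered out as above):
( cd /root &&
  grep -v '/$' ~/files.txt | sed 's|^/root/||' |
  rsync -av --files-from=- . ~/destination )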

Collapse nested directories in bash

Often after unzipping a file I end up with a directory containing nothing but another directory (e.g., mkdir foo; cd foo; tar xzf ~/bar.tgz may produce nothing but a bar directory in foo). I wanted to write a script to collapse that down to a single directory, but if there are dot files in the nested directory it complicates things a bit.
Here's a naive implementation:
mv -i $1/* $1/.* .
rmdir $1
The only problem here is that it'll also try to move . and .. and ask overwrite ./.? (y/n [n]). I can get around this by checking each file in turn:
IFS=$'\n'
for file in $1/* $1/.*; do
    if [ "$file" != "$1/." ] && [ "$file" != "$1/.." ]; then
        mv -i $file .
    fi
done
rmdir $1
But this seems like an inelegant workaround. I tried a cleaner method using find:
for file in $(find $1); do
    mv -i $file .
done
rmdir $1
But find $1 will also give $1 as a result, which gives an error of mv: bar and ./bar are identical.
While the second method seems to work, is there a better way to achieve this?
Turn on the dotglob shell option, which allows your pattern to match files beginning with . (the special entries . and .. are never matched by globs, so they need no special handling):
shopt -s dotglob
mv -i "$1"/* .
rmdir "$1"
First, consider that many tar implementations provide a --strip-components option that allows you to strip off that first path component. Not sure whether there is a single leading directory?
tar -tf yourball.tar | awk -F/ '!s[$1]++{print$1}'
will show you all the first-level contents. If there is only that one directory, then
tar --strip-components=1 -xf yourball.tar
will extract the contents of that one directory into the current directory.
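A sketch tying the two commands together (assuming the yourball.tar name from above): strip the leading component only when the listing shows exactly one top-level entry.
top=$(tar -tf yourball.tar | awk -F/ '!s[$1]++{print $1}')
if [ "$(printf '%s\n' "$top" | wc -l)" -eq 1 ]; then
    tar --strip-components=1 -xf yourball.tar   # single wrapper directory: flatten it
else
    tar -xf yourball.tar                        # several top-level entries: extract as-is
fi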
So that's how you can avoid the problem altogether. But --strip-components is also a solution to your immediate problem. Say you have already extracted the files, so you have
foo/bar/stuff
foo/bar/.otherstuff
you can do
tar -cf- foo | tar --strip-components=2 -C final_destination -xf-
The --strip-components feature is not part of the POSIX toolset (POSIX does not specify tar at all), but it is available in both the common GNU and macOS/BSD implementations.
