Parallel processing of untar/remove in unix shell script - shell

Question:
I want to untar a tar file that itself contains many tar files, and remove the files in all of those inner tar files, and I want all of these processes to run in parallel in a Unix bash script.
Conditions:
The script should return an error if any untar/remove process has any error.
It should only return success after all N (untar and remove) processes complete successfully.
Proposed solution:
mkdir a
tar -C a -xvf b.tar
cd a
for i in *
do
    rm -r "$i" &
done

If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:
tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' | parallel rm
The perl one-liner lags the output by one line: tar prints a file name when it starts unpacking that file, so by delaying one line the name is only handed to parallel rm after the file has been fully extracted.
It is useful if you do not have space to extract the full tar.gz file, but you need to process files as you unpack them:
tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' | parallel do_stuff {}\; rm {}
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
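If the archive has already been fully extracted, as in the question's own mkdir/tar -C approach, a simpler GNU Parallel invocation is possible (a sketch; ::: feeds the arguments directly and parallel exits non-zero if any job failed):
mkdir a && tar -C a -xf b.tar
cd a || exit 1
parallel -j4 rm -r -- {} ::: * || exit 1   # one rm job per top-level entry, 4 at a time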

mkdir a
tar -C a -xvf b.tar
cd a
success=$(for i in *
do
    rm -r "$i" || echo failed &   # if a job fails, "failed" is echoed
done
wait)
# if any of the jobs failed, success will be set to a value other than ""
[[ -z "$success" ]] && exit 0 || exit 1

The answer tar xvf a.tar | tac | xargs -P 4 rm -rv is inspired by Burton Samograd's comment about xargs -P:
$ mkdir -p a/b/c/d
mkdir: created directory `a'
mkdir: created directory `a/b'
mkdir: created directory `a/b/c'
mkdir: created directory `a/b/c/d'
$ touch a/1 a/2 a/3 a/b/4 a/b/5
$ tar cf a.tar a
$ rm -rfv a
removed directory: `a/b/c/d'
removed directory: `a/b/c'
removed `a/b/4'
removed `a/b/5'
removed directory: `a/b'
removed `a/3'
removed `a/1'
removed `a/2'
removed directory: `a'
$ tar xvf a.tar | tac | xargs -P 4 rm -rv
removed `a/2'
removed `a/1'
removed `a/3'
removed `a/b/5'
removed `a/b/4'
removed directory: `a/b/c/d'
removed directory: `a/b/c'
removed directory: `a/b'
removed directory: `a'
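To cover the question's error-handling requirement with this approach (a sketch, relying on the documented xargs behaviour of exiting with status 123 when any invocation fails; set -o pipefail is a bash option that also surfaces a failure of tar itself):
set -o pipefail
if tar xvf a.tar | tac | xargs -P 4 rm -rv
then
    echo "all files extracted and removed"
else
    echo "at least one untar/remove step failed" >&2
    exit 1
fi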

Related

Extract from tar file to different directory in Bash

I'm new to Bash and trying to unzip a tarball. Code so far:
#!/bin/bash
tar="/cdrom/java/jre1-8u181-x64tar.gz"
# Unpack tarball
gunzip < $tar | tar xf -
This extracts the archive into the current directory. How can I specify a location?
Using Solaris 10, Bash 3.2.51
This works pretty well everywhere - including Solaris, and as you only change directory in a sub-shell, it doesn't affect your location in the current session:
gunzip < $tar | ( cd /some/where/else && tar xf -)
To extract the file to a specific directory
gunzip < $tar | tar -xf - --directory /path/to/extract/to
or
gunzip < $tar | tar -xf - -C /path/to/extract/to
As you wrote, your command unpacks into the current directory:
gunzip < $tar | tar xf -
Add the "-C" option to give it an alternate target directory:
gunzip < $tar | tar xf - -C /another/target/directory
Note that the Solaris tar does not understand the --directory option.
See the Solaris tar manpage.
Just for the sake of completeness: if you have GNU tar (which is available for Solaris too) you can use this simpler command, which decompresses and unpacks in one go:
tar xzf $tar -C /another/target/directory
On a side note:
many people use a leading dash for the tar command parameters. That is redundant.
See the answers to this question if you are interested.
With tar, -x means extract and -f names the archive file ("-" here means standard input). Try changing the tar command to something like
Edit
...| tar -xf - -C /path/to/your/desired/result/folder
Sorry, @pitseeker is correct. The -C option tells tar to change to that directory and then do the extraction.

How to remove empty directories from a tarball in-place

I extracted a layer from a Docker image; it is archived in a file called layer.tar. I want to remove the empty directories from it.
I don't want to unpack and then repack the files in that archive; I want to keep the original info, so I want to do it in place.
I know how to delete files from a tar archive, but I don't know a simple method to delete empty directories in place.
Let's create an archive t.tar in which a/b/c/ and a/b/c/d/ are empty directories:
mkdir -p dir
cd dir
mkdir -p a/b/c/d
mkdir -p 1/2/3/4
touch a/file_a a/b/file_ab               # directories a/b/c and a/b/c/d are empty
touch 1/2/3/file_123 1/2/3/4/file_1234   # directories 1/2/3 and 1/2/3/4 are not empty
tar cf ../t.tar a 1
cd ..
Using tar tf and some filtering we can list the directories and the files in the tar archive separately. Then for each directory in tmpdirs we check with a simple grep whether it contains any of the files in tmpfiles, and remove the empty ones using tar's --delete option:
tar tf t.tar | tee >(grep '/$' > tmpdirs) | grep -v '/$' > tmpfiles
cat tmpdirs | xargs -n1 -- sh -c 'grep -q "$1" tmpfiles || echo "$1"' -- \
    | tac \
    | xargs -- tar --delete -f t.tar
Note that tac looks a bit unneeded, but the files were sorted alphabetically in the tar archive, so if tar removed the directory a/b/c/ (together with all its subdirectories) first and then tried to remove the a/b/c/d/ directory, it would fail with a "Not found in archive" error. tac is a cheap way to fix that, so tar first removes a/b/c/d/ and then a/b/c/.
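As a quick sanity check (a sketch; --delete already requires GNU tar), the new listing should no longer contain the two empty directories, while 1/2/3/4/ is kept because it holds file_1234:
tar tf t.tar | grep '^a/b/c' || echo "empty directories are gone"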

Script in Bash to delete

I am trying to delete all files whose names contain TRAR. This is for a Linux system and it is my first time writing such a script. Below is what I have tried, but it does not work:
cd /appl/virtuo/gways/input_d
rm -rf TRAR*
When I manually enter the directory and run rm -rf TRAR*, all the files are removed. I need this script to work so that it can be run via a cron job.
VENDOR=ericsson-msc
RELEASE=R13.2
BASE_DIR=/appl/virtuo/gways
RAW_DIR=${BASE_DIR}/config/${VENDOR}/${RELEASE}/trdipfile_raw_landing_area
#rm -rf $RAW_DIR/*
cd ${RAW_DIR}
ssh netperf@10.76.26.1 "cd /var/opt/ericsson/sgw/outputfiles/apgfiles/oms ; find . -newer ~/msc-trdif-timestamp -type f | egrep TRDIP | cpio -oc ; touch ~/msc-trdif-timestamp" 2>/dev/null | cpio -icdu 2>/dev/null
If you run this script via crontab then you should add
#!/bin/sh as the first line of the file, and change the permissions of the file, for example:
chmod 755 script.sh
Or you can add the crontab command as /bin/sh /<folder with scripts>/script.sh
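Putting it together, a minimal cron-ready version of what the question asks for might look like this (a sketch; the script name and the schedule below are only illustrative):
#!/bin/sh
# Delete all files starting with TRAR in the input directory.
# cron runs with a minimal environment, so use absolute paths
# and fail early if the directory is missing.
cd /appl/virtuo/gways/input_d || exit 1
rm -rf TRAR*
and a crontab entry such as:
0 * * * * /bin/sh /path/to/remove_trar.sh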

How do I write a shell script to remove the unzipped files in a wrong directory?

I accidentally unzipped files into the wrong directory; actually there are hundreds of files. Now the directory is messed up with the original files and the wrongly unzipped files. I want to pick out the unzipped files and remove them using a shell script, e.g.
$unzip foo.zip -d test_dir
$cd target_dir
$ls test_dir | rm -rf
Nothing happened; no files were deleted. What's wrong with my command? Thanks!
The following script has two main benefits over the other answers thus far:
It does not require you to unzip a whole 2nd copy to a temp dir (I just list the file names)
It works on files that may contain spaces (parsing ls will break on spaces)
while read -r _ _ _ file; do      # unzip -qql columns: size, date, time, then the name
    arr+=("$file")                # the name is the rest of the line, so spaces survive
done < <(unzip -qql foo.zip)
rm -f "${arr[@]}"
The right way to do this is with xargs:
$find ./test_dir -print | xargs rm -rf
Edited: thanks to SiegeX for explaining the OP's question to me.
This reads the wrongly unzipped file names from the test dir and removes them from the target dir.
$unzip foo.zip -d /path_to/test_dir
$cd target_dir
(cd /path_to/test_dir ; find ./ -type f -print0 ) | xargs -0 rm
I use find with -print0 because filenames can contain blanks and newlines. But if that is not your case, you can run it with ls:
$unzip foo.zip -d /path_to/test_dir
$cd target_dir
(cd /path_to/test_dir ; ls ) | xargs rm -rf
Before executing, you should test the script by changing rm to echo.
Try
for file in $( unzip -qql FILE.zip | awk '{ print $4 }'); do
    rm -rf DIR/YOU/MESSED/UP/$file
done
unzip -l lists the contents along with a bunch of information about the zipped files. You just have to grep the file names out of it.
EDIT: using -qql as suggested by SiegeX
The following worked for me (bash):
unzip -l filename.zip | awk '{print $NF}' | xargs rm -Rf
Do this:
$ ls test_dir | xargs rm -rf
You need ls test_dir | xargs rm -rf as your last command
Why:
rm doesn't take input from stdin, so you can't pipe a list of files to it. xargs takes the output of the ls command and passes it to rm as arguments so that rm can delete the files.
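If the file names may contain spaces, a slightly more robust variant of the same idea is possible (a sketch; -d '\n' is a GNU xargs option that treats each listing line as a single argument, though names containing newlines still break):
ls test_dir | xargs -d '\n' rm -rf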
Compacting the previous one: run this command in /DIR/YOU/MESSED/UP
unzip -qql FILE.zip | awk '{print "rm -rf " $4 }' | sh
Enjoy.

untar filename.tar.gz to directory "filename"

I would like to untar an archive, e.g. "tar123.tar.gz", to directory "/myunzip/tar123/" using a shell command.
tar -xf tar123.tar.gz will extract the files, but into the directory I'm currently working in.
If the filename were "tar233.tar.gz" I would want it to be extracted to "/myunzip/tar233.tar.gz", so the destination directory would be based on the filename.
Does anyone know if the tar command can do this?
tar -xzvf filename.tar.gz -C destination_directory
With Bash and GNU tar:
file=tar123.tar.gz
dir=/myunzip/${file%.tar.gz}
mkdir -p "$dir"
tar -C "$dir" -xzf "$file"
You can change directory before extracting with the -C flag, but the directory has to exist already. (If you create file-specific directories, I strongly recommend against calling them foo.tar.gz - the extension implies that it's an archive file but it's actually a directory. That will lead to confusion.)
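As an illustration of that pattern (a sketch, assuming GNU tar and bash; the downloads path is only an example), the same idea extends to several archives, with one destination directory per file name:
for file in /myunzip/downloads/*.tar.gz; do      # hypothetical source directory
    dir=/myunzip/$(basename "$file" .tar.gz)     # strip the path and the .tar.gz extension
    mkdir -p "$dir" || exit 1
    tar -C "$dir" -xzf "$file" || exit 1
done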
Try
file=tar123.tar.gz
dir=/myunzip/$(basename "$file" .tar.gz) # matter of taste and custom here
[ -d "$dir" ] && { echo "$dir already exists" >&2; exit 1; }
mkdir "$dir" && ( gzip -dc "$file" | ( cd "$dir" && tar xf - ) )
If you're using GNU tar you can also give it an option -C "$dir" which will cause it to change to the directory before extracting. But the code above should work even with a Bronze Age tar.
Disclaimer: none of the code above has been tested.
