Extract a .tgz into specific subfolder only if there are files in the tar that would extract to my CWD - bash

Most tar files extract into their own subfolder (because the people that write open source utilities are amazing people).
Some extract into my cwd, which clutters everything up. I know there's a way to see what's in the tar, but I want to write a bash script that essentially guarantees I won't end up with 15 files extracted into my home folder.
Any pointers?
pseudo code:
if [listing of tar files] has any file that doesn't have a '/' in it:
mkdir [tar filename without extension]
tar xzvf [tar filename] into [the new folder]
else:
tar xzvf [tar filename] into cwd
EDIT:
Both solutions are great, I chose the below solution because I was asking for a bash script, and it doesn't rely on extra software.
However, on my own machine, I am using aunpack because it can handle many, many more formats.
I am using it with a shell script that downloads and unpacks all at once. Here is what I am using:
#!/bin/bash
wget -o temp.log --content-disposition "$1"
old=$IFS
IFS='
'
r=`cat temp.log`
rm temp.log
for line in $r; do
substring=$(expr "$line" : 'Saving to: `\(.*\)'\')
if [ "$substring" != "" ]
then
aunpack "$substring"
rm "$substring"
IFS=$old
exit
fi
done
IFS=$old

The aunpack command from the atool package does that:
aunpack extracts files from an archive. Often one wants to extract all
files in an archive to a single subdirectory.
However, some archives contain multiple files in their root
directories. The aunpack program overcomes this problem
by first extracting files to a unique (temporary)
directory, and then moving its contents back if possible. This
also prevents local files from being overwritten by mistake.
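The behaviour described above can be approximated in plain bash when atool is not installed. The following is only a minimal sketch of the "extract to a scratch directory, then move back" idea, not aunpack's actual implementation; the function name safe_untar and the untar.XXXXXX template are made up for illustration, and it assumes a tar that detects compression automatically (GNU tar or bsdtar).

```shell
#!/usr/bin/env bash
# Sketch of the "extract to a scratch directory, then move back" idea.
# safe_untar is a hypothetical helper, not part of atool.
# The body runs in a subshell ( ... ) so the shopt changes do not leak out.
safe_untar() (
    archive=$1
    shopt -s nullglob dotglob      # empty result if nothing matches; include dotfiles

    # Scratch directory inside the CWD, so later moves stay on one filesystem
    tmp=$(mktemp -d ./untar.XXXXXX) || exit 1
    tar -xf "$archive" -C "$tmp" || { rm -rf "$tmp"; exit 1; }

    entries=("$tmp"/*)
    if (( ${#entries[@]} == 0 )); then
        rmdir "$tmp"               # empty archive, nothing to move
    elif (( ${#entries[@]} == 1 )) && [[ ! -e ${entries[0]##*/} ]]; then
        # One top-level entry and no name clash: move it straight into the CWD
        mv "${entries[0]}" . && rmdir "$tmp"
    else
        # Several top-level entries (or a name clash): gather them in a
        # directory named after the archive
        name=${archive##*/}
        name=${name%.tar.*}; name=${name%.tgz}; name=${name%.tar}
        if [[ -e $name ]]; then
            echo "$name already exists; files left in $tmp" >&2
            exit 1
        fi
        mkdir "$name" && mv "${entries[@]}" "$name"/ && rmdir "$tmp"
    fi
)
```

Called as safe_untar foo.tgz, this leaves either the archive's single top-level entry or one new directory named foo in the current directory.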

You can use a combination of tar options to achieve this:
The tar option for listing is:
-t, --list
list the contents of an archive
The tar option to extract into a different directory is:
-C, --directory DIR
change to directory DIR
So in your script you can list the files and check whether any entry in the listing lacks a "/"; based on that result, you can call tar with the appropriate options.
A sample for your reference follows:
TAR_FILE=<some_tar_file_to_be_extracted>
# List the files in the .tgz file using tar -tf
# Look for all the entries w/o "/" in their names using grep -v
# Count the number of such entries using wc -l, if the count is > 0, create directory
if [ $(tar -tf "${TAR_FILE}" | grep -v "/" | wc -l) -gt 0 ]; then
echo "Found file(s) which is(are) not in any directory"
# Directory name will be the tar file name excluding everything after last "."
# Thus "test.a.sh.tgz" will give a directory name "test.a.sh"
DIR_NAME=${TAR_FILE%.*}
echo "Extracting in ${DIR_NAME}"
# Test if the directory exists, if not then create it
[ -d "${DIR_NAME}" ] || mkdir "${DIR_NAME}"
# Extract to the directory instead of cwd
tar xzvf "${TAR_FILE}" -C "${DIR_NAME}"
else
# Extract to cwd
tar xzvf "${TAR_FILE}"
fi
In some cases the tar file may contain different directories. If you find it a little annoying to look for different directories which are extracted by the same tar file then the script can be modified to create a new directory even if the listing contains different directories. The slightly advanced sample is as follows:
TAR_FILE=<some_tar_file_to_be_extracted>
# List the files in the .tgz file using tar -tf
# Keep only the first path component of each entry using cut
# (cut prints one line per archive entry, so duplicates must be removed)
# Count the unique top-level names with sort -u; if the count is > 1, create a directory
if [ `tar -tf ${TAR_FILE} |cut -d '/' -f 1|sort -u|wc -l` -gt 1 ];then
echo "Found file(s) which is(are) not in same directory"
# Directory name will be the tar file name excluding everything after last "."
# Thus "test.a.sh.tgz" will give a directory name "test.a.sh"
DIR_NAME=${TAR_FILE%.*}
echo "Extracting in ${DIR_NAME}"
# Test if the directory exists, if not then create it
# If the directory exists, prompt the user for a directory to extract to
# It can be a new or existing directory
if [ -d "${DIR_NAME}" ];then
echo "${DIR_NAME} exists. Enter (new/existing) directory to extract to"
# Overwrite DIR_NAME so the extraction below uses the user's choice
read -r DIR_NAME
# Test if the user entered directory exists, if not then create it
[ -d "${DIR_NAME}" ] || mkdir "${DIR_NAME}"
else
mkdir "${DIR_NAME}"
fi
# Extract to the directory instead of cwd
tar xzvf "${TAR_FILE}" -C "${DIR_NAME}"
else
# Extract to cwd
tar xzvf ${TAR_FILE}
fi
Hope this helps!
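Two details of the listing check are easy to miss: some archives store their entries with a leading "./" (so every entry contains a "/" even when the files are loose), and duplicate names are only collapsed reliably after sorting. The sketch below shows one way to count top-level names with both cases in mind. The helper names tar_top_level and needs_own_dir are hypothetical, and the code assumes entry names without embedded newlines.

```shell
#!/usr/bin/env bash
# Print the unique top-level names stored in an archive, one per line.
# tar_top_level and needs_own_dir are hypothetical helper names.
tar_top_level() {
    tar -tf "$1" |
        sed -e 's|^\./||' -e '/^$/d' |   # drop a leading "./" and the bare "./" entry
        cut -d/ -f1 |                    # keep only the first path component
        sort -u                          # sort so duplicates are always collapsed
}

# Succeed (exit status 0) when extraction would create more than one
# top-level file or directory in the CWD.
needs_own_dir() {
    (( $(tar_top_level "$1" | wc -l) > 1 ))
}

# Possible use:
#   if needs_own_dir "$TAR_FILE"; then
#       mkdir -p "${TAR_FILE%.*}" && tar -xzf "$TAR_FILE" -C "${TAR_FILE%.*}"
#   else
#       tar -xzf "$TAR_FILE"
#   fi
```

Note that an archive holding a single loose file counts as one top-level name here, so it would still be extracted straight into the current directory.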

Related

change unzipped folder name

I have a fairly large number of directories (500+), each directory (and possible sub-directories) contains 4 or more zip files.
I managed to piece together a bash script that unzips the compressed files while maintaining zip filename as directory and all the directory hierarchy.
For example: If I have a zip file called 100011_test123.zip, and it contains 10 files. The script will uncompress all the files into 100011_test123/ directory.
The number before the underscore in the file/directory name (e.g. 100010) is arbitrary.
Here's the actual bash script:
#!/bin/bash
cd <directory-with-large-number-of-zip-files>
find . -name "*.zip" | while read filename; do unar -d -o "`dirname "$filename"`" "$filename"; done;
find . -name "*.zip" -type f -delete
Now I would like to update the script in order to remove the 100010_ from the .zip filename without tampering with the directory structure/hierarchy (I guess there's a way to rename the zip files before using unar command) and then uncompress the files into a directory without 100010_ at the beginning.
I have been stuck with this for more than 3 days. Any insights on this would be highly appreciated.
Thank you.
With all zip files at the same level, you don't need find; regular filename globbing is enough to iterate over each zip archive.
And with bash's globstar option, you can also find the zip archives inside sub-directories:
#!/usr/bin/env bash
shopt -s nullglob # Prevents iterating if no filename match
shopt -s globstar # ./**/ Allow searching inside sub-directories
# Set the basedir if you want all output directories at same place
#basedir="$PWD"
for zipfile in ./**/*.zip; do
# Extract the base directory containing the archive
zipdir="${zipfile%/*}"
# Extract the base name without the directory path
basename="${zipfile##*/}"
# Remove the .zip extension
# 100011_test123.zip -> 100011_test123
extensionless="${basename%.zip}"
# Remove everything up to and including the first underscore (100011_)
# 100011_test123 -> test123
outputdir="${basedir:-$zipdir}/${extensionless#*_}"
# Create output directory or continue with next archive
# mkdir -p test123
mkdir -p "$outputdir" || continue
# Unzip the zipfile into the outputdir and remove the zipfile if successful
# unar -d -o test123 100011_test123.zip && rm -f -- 100011_test123.zip
unar -d -o "$outputdir" "$zipfile" && rm -f -- "$zipfile"
done
You need to parse directory name and filename first for each entry. Please check the ${fullpath%/*} and ${fullpath##*/} for this purpose. And awk for splitting filename with '_' and getting second part of it.
You can try following code.
#!/bin/bash
# cd directory
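# Read find's output line by line so paths with spaces stay intact
mapfile -t zip_files < <(find . -name "*.zip")
for fullpath in "${zip_files[@]}"; do
echo "Processing: $fullpath"
DIRNAME="${fullpath%/*}"
FILENAME="${fullpath##*/}"
# Keep only the part after the last underscore: 100011_test123.zip -> test123.zip
NEW_FILENAME="$(echo "$FILENAME" | awk -F'_' '{print $NF}')"
echo " DIRNAME=$DIRNAME"
echo " NEW_FILENAME=$NEW_FILENAME"
mv "$fullpath" "$DIRNAME/$NEW_FILENAME"
# call unar command on the renamed archive
unar -d -o "$DIRNAME" "$DIRNAME/$NEW_FILENAME"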
# delete file if you want
done
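The parameter expansions used in both answers can be tried in isolation before any archive is touched. This is a small sketch with a hypothetical helper name (target_dir_for); it assumes the path contains at least one "/" (as paths printed by find do) and ends in a lowercase .zip.

```shell
#!/usr/bin/env bash
# Print the directory a zip archive would be unpacked into, with the
# numeric prefix removed. target_dir_for is a hypothetical helper name.
target_dir_for() {
    local zipfile=$1
    local dir=${zipfile%/*}      # ./a/b/100011_test123.zip -> ./a/b
    local base=${zipfile##*/}    # ./a/b/100011_test123.zip -> 100011_test123.zip
    base=${base%.zip}            # 100011_test123.zip       -> 100011_test123
    # ${base#*_} removes the shortest match of "*_" from the front, so only
    # the text up to the FIRST underscore is dropped
    printf '%s/%s\n' "$dir" "${base#*_}"
}
```

The difference between the two answers shows up with names that contain more than one underscore: ${base#*_} turns 7_my_file into my_file, while awk -F'_' '{print $NF}' keeps only file.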

Unpack .tar.gz and modify result files

I wanted to write a bash script that will unpack .tar.gz archives and for each result file it will set an additional attribute with the name of the original archive. Just to know what the origin is of the unpacked file.
I tried to store the inside files in an array and then for-loop them.
for archive in "$1"*.tar.gz; do
if [ -f "${archive}" ]
then
readarray -t fileNames < <(tar tzf "$archive")
for file in "${fileNames}"; do
echo "${file}"
tar xvzf "${archive}" -C "$1" --no-wildcards "${file}" &&
attr -s package -V "${archive}" "${file}"
done
fi
done
The result is that only one file is extracted and no extra attribute is set.
#! /bin/bash
for archive in "$1"*.tar.gz; do
if [ -f "${archive}" ] ; then
# Unpack the archive into subfolder $1
tar xvf "$archive" -C "$1"
# Assign attributes
tar tf "$archive" | (cd "$1" && xargs -t -L1 attr -s package -V "$archive" )
fi
done
Notes:
The script unpacks each archive with a single tar call. This is more efficient than unpacking one file at a time. It also avoids issues with unpacking folders, which would lead to unnecessary repeated work.
The script uses attr. It would be better to use setfattr, if supported on the target file system, to set attributes on multiple files with a few calls (using xargs, with multiple files per command).
The structure of the output folder is not clear. From the question, it looks as if all archives will be unpacked into the same folder "$1". The solution above assumes that this is the intended behavior and that the archives do not share file names. If each archive is to be unpacked into a different subfolder, it will be easier and more efficient to implement.

Gzip no such file or directory error, still zips files

I'm just learning shell scripting specifically in bash, I want to be able to use gzip to take files from a target directory and send them to a different directory. I enter directories in the command line. ext is for the extensions I want to zip and file will be the new zipped file. My script zips the files correctly, to and from the desired directories, but I get a no such file or directory error. How do I avoid this?
Current code
cd $1
for ext in $*; do
for file in `ls *.$ext`; do
gzip -c $file > $2/$file.gz
done
done
and my I/O
blackton@ltsp-amd64-charlie:~/Desktop/60256$ bash myCompress /home/blackton/Desktop/ /home/blackton/ txt
ls: cannot access *./home/blackton/Desktop/: No such file or directory
ls: cannot access *./home/blackton/: No such file or directory
gzip: alg: No such file or directory
gzip: proj.txt: No such file or directory
There are two separate things causing problems here.
In your outer loop
for ext in $*; do
done
you are looping over all the command line parameters, using each as the extension to search for - including the directory names.
Since the extension is the third parameter, you only want to run the inner loop once on $3:
for file in `ls *.$3`; do
gzip -c $file > $2/$file.gz
done
The next problem is spaces.
You do not want to run ls here - the wildcard expansion will provide the filenames directly, e.g. for file in *.$3, and it will fill $file with a whole filename at a time. The output from ls is split on each space, so you end up with two filenames alg and proj.txt, instead of one alg proj.txt.
That is not enough by itself, though. You also need to quote $file whenever you use it, so the command expands to gzip -c "alg proj.txt" instead of gzip -c alg proj.txt, which tells gzip to compress two files. In general, all variable expansions that you expect to be a filename should be quoted:
cd "$1"
for file in *."$3"; do
gzip -c "$file" > "$2/$file.gz"
done
One further problem is that if there are no files matching the extension, the wildcard will not expand and the command executed will be
gzip -c "*.txt" > "dir/*.txt.gz"
This will create a file that is literally called "*.txt.gz" in the target directory. A simple way to avoid this would be to check that the original file exists first - this will also avoid accidentally trying to gzip an oddly named directory.
cd "$1"
for file in *."$3"; do
if [ -f "$file" ]; then
gzip -c "$file" > "$2/$file.gz"
fi
done
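An alternative to the existence test is bash's nullglob option, which makes an unmatched pattern expand to nothing so the loop body never runs. The sketch below wraps the loop in a function; the name compress_by_ext and its argument order (source directory, target directory, extension) are assumptions chosen to mirror the question, and the -f test is kept to skip directories.

```shell
#!/usr/bin/env bash
# Compress every regular file with the given extension from one directory
# into another. compress_by_ext is a hypothetical helper name.
# The body runs in a subshell ( ... ) so nullglob is not left enabled.
compress_by_ext() (
    src=$1 dest=$2 ext=$3
    shopt -s nullglob                 # an unmatched pattern expands to nothing
    mkdir -p "$dest" || exit 1
    for file in "$src"/*."$ext"; do
        [ -f "$file" ] || continue    # skip directories with a matching name
        # ${file##*/} strips the directory part, leaving the bare file name
        gzip -c "$file" > "$dest/${file##*/}.gz"
    done
)
```

Because the loop uses the full path instead of cd, both directories may be given as relative paths, for example compress_by_ext ~/Desktop backups txt.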
You can try this:
#!/bin/bash
Src=$1
Des=$2
ext="txt"
mkdir -p "$Des" # -p creates the directory only if it does not exist
for file in "$Src"/*; do
if [ "${file##*.}" = "${ext}" ]; then
base=$(basename "$file")
gzip -c "$file" > "$Des/$base.gz"
fi
done

Bash script of unzipping unknown name files

I have a folder that will contain a zip file after an rsync. I want to unzip it into its own folder (if the zip is L155.zip, unzip its contents into an L155 folder). The problem is that I don't know its name beforehand (although I know it will be "letter-number-number-number"), so I have to unzip an unknown file into its unknown folder, and this has to be done automatically.
The command "unzip *" (or unzip *.zip) works in a terminal, but not in a script.
These are the commands that worked in the terminal one by one, but don't work in a script.
#!/bin/bash
unzip * #also tried .zip and /path/to/file/* when script is on different folder
i=$(ls | head -1)
y=${i:0:4}
mkdir $y
unzip * -d $y
First I unzip the file, then I read the name of the first extracted file through ls and save it in a variable. I take the first 4 chars, make a directory with that name, and then unzip the files again into that specific folder.
The whole procedure after the first unzip is needed because the files inside the .zip all start with the name of the zip itself, so if L155.ZIP is the zip, the files inside will be L155***.txt.
The zip file is at /path/to/file/NAME.zip.
When I run the script I get errors like the following:
unzip: cannot find or open /path/to/file/*.ZIP
unzip: cannot find or open /path/to/file//*.ZIP.zip
unzip: cannot find or open /path/to/file//*.ZIP.ZIP. No zipfiles found.
mkdir: cannot create directory 'data': File exists data
unzip: cannot find or open data, data.zip or data.ZIP.
Original answer
Supposing that foo.zip contains a folder foo, you could simply run
#!/bin/bash
unzip \*.zip \*
And then run it as bash auto-unzip.sh.
If you want to have these files extracted into a different folder, then I would modify the above as
#!/bin/bash
cp *.zip /home/user
cd /home/user
unzip \*.zip \*
rm *.zip
This, of course, you would run from the folder where all the zip files are stored.
Another answer
Another "simple" fix is to get dtrx (also available in the Ubuntu repos, possibly for other distros). This will extract each of your *.zip files into its own folder. So if you want the data in a different folder, I'd follow the second example and change it thusly:
#!/bin/bash
cp *.zip /home/user
cd /home/user
dtrx *.zip
rm *.zip
I would try the following.
for i in *.[Zz][Ii][Pp]; do
DIRECTORY=$(basename "$i" .zip)
DIRECTORY=$(basename "$DIRECTORY" .ZIP)
unzip "$i" -d "$DIRECTORY"
done
As noted, the basename program removes the indicated suffix .zip from the filename provided.
I have edited it to be case-insensitive. Both .zip and .ZIP will be recognized.
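The two basename calls can also be replaced by a single parameter expansion, since the suffix pattern may use bracket expressions. This is a minimal sketch; zip_dirname is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the folder name for a zip archive: the file name without its
# directory part and without a .zip suffix in any letter case.
# zip_dirname is a hypothetical helper name.
zip_dirname() {
    local name=${1##*/}                      # strip any leading directories
    printf '%s\n' "${name%.[Zz][Ii][Pp]}"    # strip .zip, .ZIP, .Zip, ...
}

# Possible use in the loop above:
#   for i in *.[Zz][Ii][Pp]; do
#       unzip "$i" -d "$(zip_dirname "$i")"
#   done
```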
for zfile in $(find . -maxdepth 1 -type f -name "*.zip")
do
fn=${zfile:2:4} # skips the leading "./" and keeps the next 4 characters (e.g. L155); assumes a 4-character base name
mkdir -p "$fn"
unzip "$zfile" -d "$fn"
done
If the folder has only one file with the extension .zip, you can extract the name without an extension with the basename tool:
BASE=$(basename *.zip .zip)
This will produce an error message if there is more than one file matching *.zip.
Just to be clear about the issue here, the assumption is that the zip file does not contain a folder structure. If it did, there would be no problem; you could simply extract it into the subfolders with unzip. The following is only needed if your zipfile contains loose files, and you want to extract them into a subfolder.
With that caveat, the following should work:
#!/bin/bash
DIR=${1:-.}
BASE=$(basename "$DIR/"*.zip .zip 2>/dev/null) ||
{ echo More than one zipfile >> /dev/stderr; exit 1; }
if [[ $BASE = "*" ]]; then
echo No zipfile found >> /dev/stderr
exit 1
fi
mkdir -p "$DIR/$BASE" ||
{ echo Could not create $DIR/$BASE >> /dev/stderr; exit 1; }
unzip "$DIR/$BASE.zip" -d "$DIR/$BASE"
Put it in a file (anywhere), call it something like unzipper.sh, and chmod a+x it. Then you can call it like this:
/path/to/unzipper.sh /path/to/data_directory
A simple one-liner I use all the time:
$ for file in `ls *.zip`; do unzip $file -d `echo $file | cut -d . -f 1`; done

Unzip ZIP file and extract unknown folder name's content

My users will be zipping up files which will look like this:
TEMPLATE1.ZIP
|--------- UnknownName
|------- index.html
|------- images
|------- image1.jpg
I want to extract this zip file as follows:
/mysite/user_uploaded_templates/myrandomname/index.html
/mysite/user_uploaded_templates/myrandomname/images/image1.jpg
My trouble is with UnknownName - I do not know what it is beforehand and extracting everything to the "base" level breaks all the relative paths in index.html
How can I extract from this ZIP file the contents of UnknownName?
Is there anything better than:
1. Extract everything
2. Detect which "new subdirectory" got created
3. mv newsubdir/* .
4. rmdir newsubdir/
If there is more than one subdirectory at UnknownName level, I can reject that user's zip file.
I think your approach is a good one. Step 2 could be improved by extracting to a newly created directory (later deleted) so that "detection" is trivial.
# Bash (minimally tested)
tempdest=$(mktemp -d)
unzip -d "$tempdest" TEMPLATE1.ZIP
dir=("$tempdest"/*)
if (( ${#dir[@]} == 1 )) && [[ -d $dir ]]; then
# in Bash, etc., scalar $var is the same as ${var[0]}
mv "$dir"/* /mysite/user_uploaded_templates/myrandomname
else
echo "rejected"
fi
rm -rf "$tempdest"
Another option, besides the one you suggested, is the unzip -j flag, which junks all stored paths and puts every file directly into the target directory. If you know for certain that each of your TEMPLATE1.ZIP files includes an index.html and *.jpg files, then you can do something like:
destdir=/mysite/user_uploaded_templates/myrandomname
unzip -j -d "$destdir" TEMPLATE1.ZIP
mkdir "${destdir}/images"
mv "${destdir}"/*.jpg "${destdir}/images"
It's not exactly the cleanest solution but at least you don't have to do any parsing like you do in your example. I can't seem to find any option similar to patch -p# that lets you specify the path level.
Each zip and unzip command differs, but there's usually a way to list the file contents. From there, you can parse the output to determine the unknown directory name.
With the 1996 Wales/Gaily/van der Linden/Rommel version on Windows, it is unzip -l.
Of course, you could just simply allow the unzip to unzip the files to whatever directory it wants, then use mv to rename the directory to what you want it as.
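tempDir="temp.$$"
mkdir "$tempDir"
mv "$zipFile" "$tempDir"/
cd "$tempDir" || exit 1
unzip "${zipFile##*/}"
unknownDir=(*/) # Should be the only directory here
mv "${unknownDir[0]%/}" "$whereItShouldBe" # $whereItShouldBe must be an absolute path
cd ..
rm -rf "$tempDir"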
It's always a good idea to create a temporary directory for these types of operations in case you end up running two instances of this command.
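The listing approach mentioned in this answer can be sketched as a small filter that reads entry names on standard input and prints the top-level directory only when there is exactly one and no loose files sit next to it. The function name single_top_dir is hypothetical. The usage line assumes Info-ZIP's unzip, whose -Z1 mode prints one entry name per line; other unzip implementations may need a different listing command.

```shell
#!/usr/bin/env bash
# Read archive entry names (one per line) on stdin. Print the single
# top-level directory and succeed, or fail if the listing is empty, has
# several top-level names, or has a file at the top level.
# single_top_dir is a hypothetical helper name.
single_top_dir() {
    awk -F/ '
        NF < 2      { loose = 1 }        # no "/" at all: a file at the top level
        !seen[$1]++ { top = $1; n++ }    # count distinct first path components
        END {
            if (loose || n != 1) exit 1
            print top
        }'
}

# Possible use (assumes Info-ZIP unzip):
#   if dir=$(unzip -Z1 TEMPLATE1.ZIP | single_top_dir); then
#       echo "top-level directory: $dir"
#   else
#       echo "rejected"
#   fi
```

This lets an upload be rejected before anything is written to disk; the extraction itself can then still go through a temporary directory as described above.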

Resources