change unzipped folder name - bash

I have a fairly large number of directories (500+), each directory (and possible sub-directories) contains 4 or more zip files.
I managed to piece together a bash script that unzips the compressed files while maintaining zip filename as directory and all the directory hierarchy.
For example: If I have a zip file called 100011_test123.zip, and it contains 10 files. The script will uncompress all the files into 100011_test123/ directory.
The occurrence of numbers 100010 before the underscore in the filename/directoryname is totally random.
Here's the actual bash script:
#!/bin/bash
cd <directory-with-large-number-of-zip-files>
find . -name "*.zip" | while read filename; do unar -d -o "`dirname "$filename"`" "$filename"; done;
find . -name "*.zip" -type f -delete
Now I would like to update the script in order to remove the 100010_ from the .zip filename without tampering with the directory structure/hierarchy (I guess there's a way to rename the zip files before using unar command) and then uncompress the files into a directory without 100010_ at the beginning.
I have been stuck with this for more than 3 days. Any insights on this would be highly appreciated.
Thank you.

With all zip files at the same level, you don't need find; ordinary filename globbing is enough to iterate over the archives.
With bash's globstar option, the same glob also finds zip archives inside sub-directories:
#!/usr/bin/env bash
shopt -s nullglob # Prevents iterating if no filename match
shopt -s globstar # ./**/ Allow searching inside sub-directories
# Set the basedir if you want all output directories at same place
#basedir="$PWD"
for zipfile in ./**/*.zip; do
# Extract the base directory containing the archive
zipdir="${zipfile%/*}"
# Extract the base name without the directory path
basename="${zipfile##*/}"
# Remove the .zip extension
# 100011_test123.zip -> 100011_test123
extensionless="${basename%.zip}"
# Remove everything up to and including the first underscore (100011_)
# 100011_test123 -> test123
outputdir="${basedir:-$zipdir}/${extensionless#*_}"
# Create output directory or continue with next archive
# mkdir -p test123
mkdir -p "$outputdir" || continue
# Unzip the zipfile into the outputdir and remove the zipfile if successful
# unar -d -o test123 100011_test123.zip && rm -f -- 100011_test123.zip
unar -d -o "$outputdir" "$zipfile" && rm -f -- "$zipfile"
done
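The parameter expansions used in the loop above can be checked on their own, without touching any archive. Here is a minimal sketch; the function name output_dir_for is made up for this example and is not part of the answer's script:

```bash
#!/usr/bin/env bash
# Hypothetical helper: compute the output directory for one archive path,
# using the same expansions as the loop above.
output_dir_for() {
  local zipfile=$1
  local zipdir=${zipfile%/*}     # directory containing the archive
  local name=${zipfile##*/}      # file name without the directory
  name=${name%.zip}              # drop the .zip extension
  printf '%s/%s\n' "$zipdir" "${name#*_}"   # drop everything up to the first _
}

output_dir_for ./a/b/100011_test123.zip   # prints ./a/b/test123
```

A name without an underscore is left unchanged by ${name#*_}, so an archive such as ./x/plain.zip maps to ./x/plain.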

You need to split each entry into its directory name and file name first; ${fullpath%/*} and ${fullpath##*/} do that. Then awk can split the file name on '_' and return the part after it.
You can try the following code.
#!/bin/bash
# cd directory
# Read one path per line so names containing spaces stay intact
mapfile -t zip_files < <(find . -name "*.zip")
for fullpath in "${zip_files[@]}"; do
    echo "Processing: $fullpath"
    DIRNAME="${fullpath%/*}"
    FILENAME="${fullpath##*/}"
    # $NF is the last '_'-separated field; this assumes a single underscore in the name
    NEW_FILENAME="$(echo "$FILENAME" | awk -F'_' '{print $NF}')"
    echo " DIRNAME=$DIRNAME"
    echo " NEW_FILENAME=$NEW_FILENAME"
    mv "$fullpath" "$DIRNAME/$NEW_FILENAME"
    # call unar command on the renamed archive
    unar -d -o "$DIRNAME" "$DIRNAME/$NEW_FILENAME"
    # delete file if you want
done
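One thing to watch with the awk approach: $NF is the last underscore-separated field, so it only equals "everything after the prefix" when the name contains a single underscore. A small comparison, using a made-up file name with two underscores:

```bash
#!/usr/bin/env bash
# Made-up example name with more than one underscore
name='100011_test_123.zip'

# awk's $NF keeps only the last field
last_field=$(printf '%s\n' "$name" | awk -F'_' '{print $NF}')

# ${name#*_} removes the shortest prefix ending in an underscore
after_first=${name#*_}

echo "$last_field"    # prints 123.zip
echo "$after_first"   # prints test_123.zip
```

If your archive names can contain several underscores, the parameter expansion is probably the safer of the two.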

Related

How to write a bash script to copy files from one base to another base location

I have a bash script I'm trying to write
I have 2 base directories:
./tmp/serve/
./src/
I want to go through all the directories in ./tmp and copy the *.html files into the same folder path in ./src
i.e
if I have an HTML file at ./tmp/serve/app/components/help/help.html,
copy it to ./src/app/components/help/, and recursively do this for all subdirectories in ./tmp/
NOTE: the folder structures should exist so just need to copy them only. If it doesn't then hopefully it could create the folder for me (not what I want) but with GIT I can track these folders to manually handle those loose html files.
I got as far as
echo $(find . -name "*.html")\n
But not sure how to actually extract the file path with pwd and do what I need to, maybe it's not a one liner and needs to be done with some vars.
something like
for i in `echo $(find /tmp/ -name "*.html")\n
do
cp -r $i /src/app/components/help/
done
going so far to create the directories would take some more time for me.
I'll try to do it on my own and see if I come up with something
but for argument sake if you do run pwd and get a response the pseudo code for that:
pwd
get response
if that directory does not exist in src create that directory
copy all the original directories contents into the new folder at /src/$newfolder
(possibly running two for loops, one to check the directory tree, and then one to go through each original directory, copying all the html files)
You can use process substitution to loop over the output of your find command, create the destination directories, and then copy the files:
#!/bin/bash
# accept the first two script parameters as src_dir and dest, or
# simply use default values if no parameter(s) passed
src_dir=${1:-./tmp/serve}
dest=${2:-./src}
while IFS= read -r orig_path ; do
    # Remove the leading src_dir from the path with ${parameter#pattern}
    # and put dest in front of what remains
    dest_path="${dest}${orig_path#"$src_dir"}"
    # Use dirname to remove the filename from the destination path
    # and create the destination directory.
    dest_dir=$(dirname "${dest_path}")
    mkdir -p "${dest_dir}"
    cp "${orig_path}" "${dest_path}"
done < <(find "${src_dir}" -name '*.html')
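The question mentions that the destination folders should already exist. If you would rather be told about missing folders than have them created, a variant along these lines might do. It is only a sketch: the function name copy_html_existing_only and the scratch layout are invented for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy *.html from $1 into the matching path under $2,
# but only where the destination directory already exists.
copy_html_existing_only() {
  local src_dir=${1%/} dest=${2%/} path rel target_dir
  while IFS= read -r -d '' path; do
    rel=${path#"$src_dir"/}                 # path relative to src_dir
    target_dir=$(dirname "$dest/$rel")
    if [ -d "$target_dir" ]; then
      cp "$path" "$dest/$rel"
    else
      printf 'skipped, no such directory: %s\n' "$target_dir" >&2
    fi
  done < <(find "$src_dir" -type f -name '*.html' -print0)
}

# Try it in a scratch directory (layout invented for the example)
work=$(mktemp -d)
mkdir -p "$work/tmp/serve/app/help" "$work/tmp/serve/app/new" "$work/src/app/help"
echo help > "$work/tmp/serve/app/help/help.html"
echo new > "$work/tmp/serve/app/new/new.html"
copy_html_existing_only "$work/tmp/serve" "$work/src"
```

Here help.html is copied because src/app/help exists, while new.html is reported on stderr and left alone.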
This script copies .html files from the src directory to the des directory (creating subdirectories if they do not exist).
It finds the files relative to src, so the src directory name is not part of the copied paths, and copies them into the destination directory.
#!/bin/bash
# Run find from inside src/ so the paths are relative to it;
# cp --parents (GNU cp) then recreates those paths under des/.
mkdir -p des
cd src || exit 1
find . -name "*.html" -exec cp --parents {} ../des/ \;
Not sure if you must use bash constructs or not, but here is a GNU tar solution (if you use GNU tar), which IMHO is the best way to handle this situation because all the metadata for the files (permissions, etc.) are preserved:
find ./tmp/serve -name '*.html' -type f -print0 | tar --null -T - -c | tar -x -v -C ./src --strip-components=3
This finds all the .html files (-type f) in the ./tmp/serve directory and prints them nul-terminated (-print0), then sends these filenames via stdin to tar as nul-terminated literals (--null) for inclusion (-T -), creating (-c) an archive which is then sent to another tar instance which extracts (-x) the archive printing its contents along the way (optional: -v), changing directory to the destination (-C ./src) before commencing and stripping (--strip-components=3) the ./tmp/serve/ prefix from the files. (You could also cd ./tmp/serve beforehand, using find . instead, and change -C to ../../src.)

Breaking down a filename into lexicographic based folders

Let's say I have thousands of images in a folder in the format filename_order.jpg.
filename is encoded as a 7-digit integer from 0000000 to 9999999
order is a number between 0 and 9
folder/
6398305_0.jpg
6398305_1.jpg
6398305_2.jpg
...
6399305_0.jpg
Is there an easy way to sort them into evenly distributed folders based on the filenames?
folder/
    6/3/9/
        8/3/0/5/
            6398305_0.jpg
            6398305_1.jpg
            6398305_2.jpg
            ...
        9/3/0/7/
            6399307_0.jpg
Is there a way to do the reverse operation as well: given a nested tree structure bringing it back to level 1 only.
The goal is being able to store them in S3 in an efficient way for millions of images.
Thank you.
This would do it in pure Bash:
#!/usr/bin/env bash
# extglob needed to expand the number into a series of folder paths
shopt -s extglob
# Starting folder name
folder=folder
# Iterate all *.jpg files in folder
for file in "$folder/"*.jpg; do
# Remove leading directory path from file to get basename
basename="${file##*/}"
# Remove everything from the last _ onward to get only the numbers
numbers="${basename%_*}"
# Insert / before each number to create a directory path from numbers
# Need Bash extglob
dir="$folder${numbers//?()/\/}"
# Create the directory path
echo mkdir -p "$dir"
# move file to its directory
echo mv "$file" "$dir/"
done
Remove the echo if the output matches your expectations.
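If the extglob substitution looks too cryptic, the same digit-by-digit path can be built with a plain loop. A minimal sketch, where digits_to_path is a name invented for this example:

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a string of digits into a nested relative path.
digits_to_path() {
  local digits=$1 path= i
  for ((i = 0; i < ${#digits}; i++)); do
    path+="/${digits:i:1}"       # append one character per directory level
  done
  printf '%s\n' "${path#/}"      # drop the leading slash
}

digits_to_path 6398305   # prints 6/3/9/8/3/0/5
```

Inside the loop over files you could then write dir="$folder/$(digits_to_path "$numbers")".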
Nesting a flat folder:
cp -R flat_folder/ nested_folder/
cd nested_folder/
for f in *_[0-9].jpg
do
filename=${f%.*}
extension=${f##*.}
number=${filename%_*}
index=${filename##*_}
folder=$(echo $number | sed 's/\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)/\1\/\2\/\3\/\4\/\5\/\6\/\7/')
mkdir -p $folder
mv $f $folder/
done
Flattening a nested folder:
cd nested_folder/
find . -name "*.jpg" -exec cp {} ../flat_folder/ \;
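If you want to move the files rather than copy them, and clean up the emptied directory tree afterwards, something like the following could work. It is a sketch under assumptions: the function name flatten and the scratch layout are invented, and it relies on find's -empty and -delete, which GNU and BSD find both provide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move every *.jpg under $1 into the single directory $2,
# then remove the directories that were left empty.
flatten() {
  local nested=$1 flat=$2
  mkdir -p "$flat"
  # mv -n refuses to overwrite a file that already exists in $flat
  find "$nested" -type f -name '*.jpg' -exec mv -n {} "$flat"/ \;
  find "$nested" -depth -type d -empty -delete
}

# Try it in a scratch directory (layout invented for the example)
work=$(mktemp -d)
mkdir -p "$work/nested/6/3/9/8/3/0/5"
touch "$work/nested/6/3/9/8/3/0/5/6398305_0.jpg" \
      "$work/nested/6/3/9/8/3/0/5/6398305_1.jpg"
flatten "$work/nested" "$work/flat"
ls "$work/flat"
```

Since the file names already contain the full number, flattening should not produce name collisions as long as the tree was built from those names.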

Can one automatically append file names with increasing numerical value when copying many files from one directory to another on Mac terminal?

I am trying to use a single line command in terminal to find and copy all the files of a certain type in one directory of my computer to another directory. I can do this right now using the below command:
find ./ -name '*.fileType' -exec cp -prv '{}' '/destination_directory/' ';'
The problem I'm having is that if a file that is being copied has the same name as a file that was previously copied, it will replace the previously copied file.
To remedy this, I would like to edit my command such that the files are numbered as they are copied to the new directory.
so the output should look something like this:
Original Files
cat.txt
dog.txt
dog.txt
Copied Files
cat1.txt
dog2.txt
dog3.txt
Edit:
The list of commands I can work with are linked here: https://ss64.com/osx/
Specifically for the cp command: https://ss64.com/osx/cp.html
-Note: --backup and -b are not available (it seems) for this version of cp
You are looking for the --backup option of the cp command (available in GNU cp). E.g.:
find ./ -name '*.fileType' -exec cp --backup=t -prv '{}' '/destination_directory/' ';'
Edit: If you are stuck with MacOS's cp you can emulate --backup's behaviour in a script:
#!/bin/bash
set -e
# First parameter: source directory
srcdir=$1
# Second parameter: destination directory
destdir=$2
# Print all filenames separated by '\0' in case you have strange
# characters in the names
find "$srcdir" -type f -print0 |
# Split the input using '\0' as separator and put the current line
# into the $file variable
while IFS= read -r -d '' file; do
# filename = just the name of the file, without dirs
filename=$(basename "$file")
# if destdir does not have a file named filename
if [ \! -f "$destdir/$filename" ]; then
cp -pv "$file" "$destdir/$filename";
continue;
fi
# Otherwise
suffix=1
# Find the first suffix number that is free
while [ -f "$destdir/$filename.$suffix" ]; do
suffix=$(($suffix + 1))
done
cp -pv "$file" "$destdir/$filename.$suffix"
done
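The script above appends the number after the extension (dog.txt.1). If you want the naming from the question instead (cat1.txt, dog2.txt, dog3.txt), a running counter placed before the extension is one way to get it. This is a sketch only; numbered_name and copy_numbered are names invented here, and the numbers follow whatever order find happens to return.

```bash
#!/usr/bin/env bash
# Hypothetical helper: put a counter before the extension,
# e.g. dog.txt with counter 2 becomes dog2.txt
numbered_name() {
  local base=${1##*/} n=$2
  if [[ $base == ?*.* ]]; then
    printf '%s%s.%s\n' "${base%.*}" "$n" "${base##*.}"
  else
    printf '%s%s\n' "$base" "$n"     # no extension: just append the counter
  fi
}

# Hypothetical helper: copy every regular file under $1 into $2,
# giving each copy the next number.
copy_numbered() {
  local srcdir=$1 destdir=$2 file n=0
  mkdir -p "$destdir"
  while IFS= read -r -d '' file; do
    n=$((n + 1))
    cp -p "$file" "$destdir/$(numbered_name "$file" "$n")"
  done < <(find "$srcdir" -type f -print0)
}

numbered_name ./pets/dog.txt 2   # prints dog2.txt
```

Call it as copy_numbered /source_directory /destination_directory, and keep the destination outside the source tree so find does not pick up the copies.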

Bash script of unzipping unknown name files

I have a folder that will have a zip in it after an rsync. I want to unzip it to its own folder (if the zip is L155.zip, unzip its contents to an L155 folder). The problem is that I don't know its name beforehand (although I know it will be "letter-number-number-number"), so I have to unzip an unknown file to its unknown folder, and this has to be done automatically.
The command “unzip *”(or unzip *.zip) works in terminal, but not in a script.
These are the commands that have worked in the terminal one by one, but don't work in a script.
#!/bin/bash
unzip * #also tried .zip and /path/to/file/* when script is on different folder
i=$(ls | head -1)
y=${i:0:4}
mkdir $y
unzip * -d $y
First I unzip the file, then I read the name of the first extracted file through ls and save it in a variable. I take the first 4 chars and make a directory with it, and then unzip the files again to that specific folder.
The whole procedure after the first unzip is needed because the files inside the .zip all start with the name of the zip itself, so if L155.ZIP is the zip, the files inside will be L155***.txt.
The zip file is at /path/to/file/NAME.zip.
When I run the script I get errors like the following:
unzip: cannot find or open /path/to/file/*.ZIP
unzip: cannot find or open /path/to/file//*.ZIP.zip
unzip: cannot find or open /path/to/file//*.ZIP.ZIP. No zipfiles found.
mkdir: cannot create directory 'data': File exists data
unzip: cannot find or open data, data.zip or data.ZIP.
Original answer
Supposing that foo.zip contains a folder foo, you could simply run
#!/bin/bash
unzip \*.zip \*
And then run it as bash auto-unzip.sh.
If you want to have these files extracted into a different folder, then I would modify the above as
#!/bin/bash
cp *.zip /home/user
cd /home/user
unzip \*.zip \*
rm *.zip
This, of course, you would run from the folder where all the zip files are stored.
Another answer
Another "simple" fix is to get dtrx (also available in the Ubuntu repos, possibly for other distros). This will extract each of your *.zip files into its own folder. So if you want the data in a different folder, I'd follow the second example and change it thusly:
#!/bin/bash
cp *.zip /home/user
cd /home/user
dtrx *.zip
rm *.zip
I would try the following.
for i in *.[Zz][Ii][Pp]; do
DIRECTORY=$(basename "$i" .zip)
DIRECTORY=$(basename "$DIRECTORY" .ZIP)
unzip "$i" -d "$DIRECTORY"
done
As noted, the basename program removes the indicated suffix .zip from the filename provided.
I have edited it to be case-insensitive. Both .zip and .ZIP will be recognized.
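The two basename calls can also be replaced by a single parameter expansion with a bracket pattern, which handles mixed case such as .Zip as well. A minimal sketch, with zip_stem being a name made up for this example:

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip the directory and a case-insensitive .zip suffix.
zip_stem() {
  local name=${1##*/}
  printf '%s\n' "${name%.[Zz][Ii][Pp]}"
}

zip_stem /path/to/file/L155.ZIP   # prints L155
zip_stem l200.zip                 # prints l200
```

In the loop above that would read DIRECTORY=$(zip_stem "$i").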
for zfile in $(find . -maxdepth 1 -type f -name "*.zip")
do
fn=${zfile:2:4} # skip the leading "./" and take the next 4 characters (e.g. L155); only right for 4-character names
mkdir -p "$fn"
unzip "$zfile" -d "$fn"
done
If the folder has only one file with the extension .zip, you can extract the name without an extension with the basename tool:
BASE=$(basename *.zip .zip)
This will produce an error message if there is more than one file matching *.zip.
Just to be clear about the issue here, the assumption is that the zip file does not contain a folder structure. If it did, there would be no problem; you could simply extract it into the subfolders with unzip. The following is only needed if your zipfile contains loose files, and you want to extract them into a subfolder.
With that caveat, the following should work:
#!/bin/bash
DIR=${1:-.}
BASE=$(basename "$DIR/"*.zip .zip 2>/dev/null) ||
{ echo More than one zipfile >> /dev/stderr; exit 1; }
if [[ $BASE = "*" ]]; then
echo No zipfile found >> /dev/stderr
exit 1
fi
mkdir -p "$DIR/$BASE" ||
{ echo Could not create $DIR/$BASE >> /dev/stderr; exit 1; }
unzip "$DIR/$BASE.zip" -d "$DIR/$BASE"
Put it in a file (anywhere), call it something like unzipper.sh, and chmod a+x it. Then you can call it like this:
/path/to/unzipper.sh /path/to/data_directory
A simple one-liner I use all the time (written with a glob instead of ls so that names with spaces survive):
$ for file in *.zip; do unzip "$file" -d "${file%.zip}"; done

Extract a .tgz into specific subfolder only if there are files in the tar that would extract to my CWD

Most tar files extract into their own subfolder (because the people that write open source utilities are amazing people).
Some extract into my cwd, which clutters everything up. I know there's a way to see what's in the tar, but I want to write a bash script that essentially guarantees I won't end up with 15 files extracted into my home folder.
Any pointers?
pseudo code:
if [listing of tar files] has any file that doesn't have a '/' in it:
mkdir [tar filename without extension]
tar xzvf [tar filename] into [the new folder]
else:
tar xzvf [tar filename] into cwd
EDIT:
Both solutions are great, I chose the below solution because I was asking for a bash script, and it doesn't rely on extra software.
However, on my own machine, I am using aunpack because it can handle many, many more formats.
I am using it with a shell script that downloads and unpacks all at once. Here is what I am using:
#!/bin/bash
wget -o temp.log --content-disposition $1
old=$IFS
IFS='
'
r=`cat temp.log`
rm temp.log
for line in $r; do
substring=$(expr "$line" : 'Saving to: `\(.*\)'\')
if [ "$substring" != "" ]
then
aunpack $substring
rm $substring
IFS=$old
exit
fi
done
IFS=$old
The aunpack command from the atool package does that:
aunpack extracts files from an archive. Often one wants to extract all
files in an archive to a single subdirectory.
However, some archives contain multiple files in their root
directories. The aunpack program overcomes this problem
by first extracting files to a unique (temporary)
directory, and then moving its contents back if possible. This
also prevents local files from being overwritten by mistake.
You can use combination of tar options to achieve this:
tar option for listing is:
-t, --list
list the contents of an archive
tar option to extract into different directory is:
-C, --directory DIR
change to directory DIR
So in your script you can list the files & check if there are any files in the listing which do not have "/" and based on that output you can call tar with appropriate options.
Sample for your reference is as follows:
TAR_FILE=<some_tar_file_to_be_extracted>
# List the files in the .tgz file using tar -tf
# Look for all the entries w/o "/" in their names using grep -v
# Count the number of such entries using wc -l, if the count is > 0, create directory
if [ `tar -tf ${TAR_FILE} |grep -v "/"|wc -l` -gt 0 ];then
echo "Found file(s) which is(are) not in any directory"
# Directory name will be the tar file name excluding everything after last "."
# Thus "test.a.sh.tgz" will give a directory name "test.a.sh"
DIR_NAME=${TAR_FILE%.*}
echo "Extracting in ${DIR_NAME}"
# Test if the directory exists, if not then create it
[ -d ${DIR_NAME} ] || mkdir ${DIR_NAME}
# Extract to the directory instead of cwd
tar xzvf ${TAR_FILE} -C ${DIR_NAME}
else
# Extract to cwd
tar xzvf ${TAR_FILE}
fi
In some cases the tar file may contain different directories. If you find it a little annoying to look for different directories which are extracted by the same tar file then the script can be modified to create a new directory even if the listing contains different directories. The slightly advanced sample is as follows:
TAR_FILE=<some_tar_file_to_be_extracted>
# List the files in the .tgz file using tar -tf
# Look for only the top-level names using cut,
# Current cut option used lists each file as a different entry
# Count the number of unique top-level entries (sort -u also catches
# duplicates that are not adjacent); if the count is > 1, create directory
if [ `tar -tf ${TAR_FILE} |cut -d '/' -f 1|sort -u|wc -l` -gt 1 ];then
echo "Found file(s) which is(are) not in same directory"
# Directory name will be the tar file name excluding everything after last "."
# Thus "test.a.sh.tgz" will give a directory name "test.a.sh"
DIR_NAME=${TAR_FILE%.*}
echo "Extracting in ${DIR_NAME}"
# Test if the directory exists, if not then create it
# If directory exists prompt user to enter directory to extract to
# It can be a new or existing directory
if [ -d ${DIR_NAME} ];then
echo "${DIR_NAME} exists. Enter (new/existing) directory to extract to"
read NEW_DIR_NAME
# Test if the user entered directory exists, if not then create it
[ -d ${NEW_DIR_NAME} ] || mkdir ${NEW_DIR_NAME}
# Extract into the directory the user chose
DIR_NAME=${NEW_DIR_NAME}
else
mkdir ${DIR_NAME}
fi
# Extract to the directory instead of cwd
tar xzvf ${TAR_FILE} -C ${DIR_NAME}
else
# Extract to cwd
tar xzvf ${TAR_FILE}
fi
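The two checks (a loose file at the top level, or more than one top-level entry) can be folded into one function that reads the listing from stdin, which also makes it easy to try without a real archive. This is only a sketch: needs_own_dir and extract_tidy are names invented here, and the directory name is derived as in the samples above by cutting everything after the last ".".

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a listing as printed by "tar -tf" on stdin and
# succeed (exit 0) if extracting it would put more than one entry, or any
# loose file, into the current directory.
needs_own_dir() {
  awk '
    { sub(/^\.\//, "") }            # drop a leading ./
    $0 == "" { next }               # skip the "./" entry itself
    !/\// { loose = 1 }             # a member with no slash lands in the cwd
    { split($0, parts, "/"); tops[parts[1]] = 1 }
    END {
      n = 0
      for (t in tops) n++
      exit !(loose || n > 1)
    }'
}

# Hypothetical wrapper showing how the check could drive the extraction.
extract_tidy() {
  local archive=$1 dir=${1%.*}
  if tar -tf "$archive" | needs_own_dir; then
    mkdir -p "$dir" && tar -xzf "$archive" -C "$dir"
  else
    tar -xzf "$archive"
  fi
}

# The check can be tried on a hand-written listing:
printf 'a.txt\nb.txt\n' | needs_own_dir && echo "would create a directory"
```

An archive whose members all sit under one top-level folder makes needs_own_dir fail, so extract_tidy unpacks it straight into the current directory.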
Hope this helps!

Resources