Rename files based on their parent directory in Bash

I've been trying to piece together a couple of previous posts for this task.
The directory tree looks like this:
TEST
├── ABC_12345678
│   └── 3_XYZ
├── ABC_23456789
│   └── 3_XYZ
etc.
Each folder within the parent folder named "TEST" always starts with ABC_\d{8}; the 8 digits are always different. Within each ABC_\d{8} folder there is always a folder entitled 3_XYZ that always contains a file named "MD2_Phd.txt". The goal is to rename each "MD2_PhD.txt" file with the specific 8-digit ID found in the ABC folder name, i.e. "\d{8}_PhD.txt".
After several iterations on various bits of code from different posts, this is the best I can come up with:
cd /home/etc/Desktop/etc/TEST
find -type d -name 'ABC_(\d{8})' |
find $d -name "*_PhD.txt" -execdir rename 's/MD2$/$d/' "{}" \;
done

find + bash solution:
find -type f -regextype posix-egrep -regex ".*/TEST/ABC_[0-9]{8}/3_XYZ/MD2_Phd\.txt" \
-exec bash -c 'abc="${0%/*/*}"; fp="${0%/*}/";
mv "$0" "$fp${abc##*_}_PhD.txt" ' {} \;
Viewing results:
$ tree TEST/ABC_*
TEST/ABC_12345678
└── 3_XYZ
    └── 12345678_PhD.txt
TEST/ABC_1234ss5678
└── 3_XYZ
    └── MD2_Phd.txt
TEST/ABC_23456789
└── 3_XYZ
    └── 23456789_PhD.txt
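As a side note, the parameter expansions used in that bash -c body can be checked in isolation; the sample path below is made up to match the question's layout, and no files are touched:

```shell
# Hypothetical path matching the layout in the question
f="TEST/ABC_12345678/3_XYZ/MD2_Phd.txt"

abc="${f%/*/*}"      # strip the last two path components -> TEST/ABC_12345678
fp="${f%/*}/"        # strip just the file name, keep a trailing slash
id="${abc##*_}"      # strip everything up to the last underscore -> 12345678

# The new name the mv would produce
echo "$fp${id}_PhD.txt"
```

Running this prints TEST/ABC_12345678/3_XYZ/12345678_PhD.txt, which is exactly the second argument the answer hands to mv.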

You are piping find output to another find. That won't work.
Use a loop instead:
dir_re='^.+_([[:digit:]]{8})/'
for file in *_????????/3_XYZ/MD2_PhD.txt; do
    [[ -f $file ]] || continue
    if [[ $file =~ $dir_re ]]; then
        dir_num="${BASH_REMATCH[1]}"
        new_name="${file%MD2_PhD.txt}${dir_num}_PhD.txt" # replace the MD2_PhD.txt at the end
        echo mv "$file" "$new_name" # remove echo from here once tested
    fi
done
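To see what the regex capture and the suffix removal produce, here is a standalone check against a sample path (the path is invented; no files are touched):

```shell
dir_re='^.+_([[:digit:]]{8})/'
file="ABC_12345678/3_XYZ/MD2_PhD.txt"   # sample path, not a real file

if [[ $file =~ $dir_re ]]; then
    dir_num="${BASH_REMATCH[1]}"        # the captured 8-digit ID
fi

# %MD2_PhD.txt strips that literal suffix; then append the new name
new_name="${file%MD2_PhD.txt}${dir_num}_PhD.txt"
echo "$new_name"
```

This prints ABC_12345678/3_XYZ/12345678_PhD.txt.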


Create Folder for Each File in Recursive Directory, Placing File in Folder
On MacOS, so far I have...
for file in $(ls -R); do
    if [[ -f "$file" ]]; then mkdir "${file%.*}"; mv "$file" "${file%.*}"; fi;
done
This operates correctly on the top level of the nested folder, but does nothing with lower levels.
To isolate the error, I tried this instead, operating on .rtf files:
for i in $(ls -R); do
    if [ $i = '*.rtf' ]; then
        echo "I do something with the file $i"
    fi
done
This hangs, so I simplified to:
for i in $(ls -R); do echo "file is $i" done
That hangs also, so I tried:
for i in $(ls -R); do echo hello
That hangs also.
ls -R works to provide a recursive list of all files.
Suggestions appreciated !!
First of all, don't use ls in scripts; it is meant to show output interactively. The newish GNU ls version has some features/options for shell parsing, though I'm not sure about the version on a Mac.
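A throwaway demo of why parsing ls breaks, run in a scratch directory (the file name is made up):

```shell
# Create a scratch directory with a file whose name contains a space
tmp=$(mktemp -d)
touch "$tmp/my file.rtf"

# Parsing ls: word splitting turns the one name into two bogus items
count=0
for f in $(ls "$tmp"); do count=$((count + 1)); done
echo "ls loop saw $count items"      # 2, not 1

# A glob keeps the name intact
count=0
for f in "$tmp"/*; do count=$((count + 1)); done
echo "glob loop saw $count items"    # 1

rm -rf "$tmp"
```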
Now, using find with the sh shell:
find . -type f -name '*.rtf' -execdir sh -c '
for f; do mkdir -p -- "${f%.*}" && mv -v -- "$f" "${f%.*}"; done' _ {} +
If for whatever reason -execdir is not available, one can use -exec:
find . -type f -name '*.rtf' -exec sh -c '
for f; do mkdir -p -- "${f%.*}" && mv -v -- "$f" "${f%.*}" ; done' _ {} +
See Understanding-the-exec-option-of-find
Given this file structure:
$ tree .
.
├── 123
│   └── test_2.rtf
├── bar
│   ├── 456
│   │   └── test_1.rtf
│   └── 789
└── foo
There are two common ways to find all the .rtf files in that tree. The first (and most common) is to use find:
while IFS= read -r path; do
    echo "$path"
done < <(find . -type f -name "*.rtf")
Prints:
./bar/456/test_1.rtf
./123/test_2.rtf
The second common way is to use a recursive glob. This is not a POSIX way and is only found in more recent shells such as Bash, zsh, etc:
shopt -s globstar # In Bash, this enables recursive globs
# In zsh, this is not required
for path in **/*.rtf; do
    echo "$path"
done
# same output
Now that you have the loop to find the files, you can modify the files found.
The first issue you will run across is that you cannot have two files with the same name in a single directory; a directory is just a type of file. So you will need to proceed this way:
1. Find all the files with their paths;
2. Create a temp name and create a sub-directory with that temp name;
3. Move the found file into the temp directory;
4. Rename the temp directory to the found file name.
Here is a Bash (or zsh, which is the default on macOS) script to do that:
shopt -s globstar # remove for zsh
for p in **/*.rtf; do
    [ -f "$p" ] || continue      # if not a file, loop onward
    tmp=$(uuidgen)               # easy temp name -- not POSIX, however
    path_to="${p%/*}/"           # the path without the file name, trailing slash kept
    mkdir "${path_to}${tmp}"     # use the temp name for the directory
    mv "$p" "${path_to}$tmp"     # move the file into that directory
    mv "${path_to}$tmp" "$p"     # rename the directory to the original file path
done
And the result:
.
├── 123
│   └── test_2.rtf
│       └── test_2.rtf
├── bar
│   ├── 456
│   │   └── test_1.rtf
│   │       └── test_1.rtf
│   └── 789
└── foo

Bash : Verify empty folder before ls -A

I'm starting to study some sh implementations and I'm running into some trouble when trying to perform actions on files inside some folders.
Here is the scenario:
I have a list of TXT files inside two different subfolders :
├── Folder A
│   ├── randomFile1.txt
│   └── randomFile2.txt
├── Folder B
│   └── File1.txt
└── Folder C
    └── File2.txt
And depending on which folder the file resides in, I should take a specific action.
obs1: The files from Folder A should not be processed.
Basically, I tried two different approaches.
The first one:
files_b="$incoming/$origin"/FolderB/*.txt
files_c="$incoming/$origin"/FolderC/*.txt
if [ "$(ls -A $files_b)" ]; then
    for file in $files_b
    do
        #take action
    done
else
    echo -e "\033[1;33mWarning: No files\033[0m"
fi
if [ "$(ls -A $files_c)" ]; then
    for file in $files_c
    do
        #take action
    done
else
    echo -e "\033[1;33mWarning: No files\033[0m"
fi
The problem with this one is that when I run the ls -A command, if one of the folders (B or C) is empty it throws an error because of the "*.txt" at the end of the path.
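The root cause is that an unmatched glob is left as the literal pattern, which is what ls then chokes on; bash's nullglob option changes that behaviour. A minimal demonstration in an empty scratch directory:

```shell
tmp=$(mktemp -d)   # empty scratch directory standing in for FolderB

# Default behaviour: the unmatched pattern survives as a literal word
set -- "$tmp"/*.txt
echo "without nullglob: $# word(s)"   # 1 -- the literal '*.txt' pattern

# With nullglob an unmatched pattern expands to nothing,
# so a for-loop over it simply runs zero times
shopt -s nullglob
set -- "$tmp"/*.txt
echo "with nullglob: $# word(s)"      # 0
shopt -u nullglob

rm -rf "$tmp"
```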
The second :
path="$incoming/$origin"/*.txt
find $path -type f -name "*.txt" | while read txt; do
for file in $txt
do
name=$(basename "$file")
dir=$(basename $(dirname $file))
if [ "$dir" == FolderB]; then
# Do something to files"
elif [ "$dir" == FolderC]; then
# Do something to files"
fi
done
done
With that approach the problem is that I'm also picking up the files from Folder A, which I don't want (it will decrease performance due to the extra "if" statements), and I don't know how to verify whether a folder is empty using the find command.
Can anyone help me?
Thank you all.
I would write the code like this:
No unquoted parameter expansions
Don't use ls to check if the directory is empty
Use printf instead of echo.
# You cannot safely expand a parameter so that it does file globbing
# but does *not* do word-splitting. Put the glob directly in the loop
# or use an array.
shopt -s nullglob
found=
for file in "$incoming/$origin"/FolderB/*.txt; do
    found=1
    #take action
done
if [ ! "$found" ]; then
    printf "\033[1;33mWarning: No files\033[0m\n"
fi
In the first solution you can simply hide the error messages.
if [ "$(ls -A $files_b 2>/dev/null)" ]; then
In the second solution, start find at the subdirectories instead of the parent directory:
path="$incoming/$origin/FolderB $incoming/$origin/FolderC"
I think using find would be better:
files_b="${incoming}/${origin}/FolderB"
files_c="${incoming}/${origin}/FolderC"
find "$files_b" -name "*.txt" -exec action1 {} \;
find "$files_c" -name "*.txt" -exec action2 {} \;
or even with just find directly:
find "${incoming}/${origin}/FolderB" -name "*.txt" -exec action1 {} \;
find "${incoming}/${origin}/FolderC" -name "*.txt" -exec action2 {} \;
Of course you should think about your action, but you can make it a function or a separate script which accepts file name(s).
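For instance, a hypothetical action1 function (the name and body are placeholders for the real processing) can be exported so the shell spawned by find can call it; export -f is bash-specific:

```shell
# Placeholder action: replace the body with the real processing
action1() {
    printf 'processing %s\n' "$1"
}

# Scratch tree standing in for $incoming/$origin/FolderB
tmp=$(mktemp -d)
mkdir "$tmp/FolderB"
touch "$tmp/FolderB/File1.txt"

# Export the function so the shell spawned by find can see it (bash-specific)
export -f action1
find "$tmp/FolderB" -name "*.txt" -exec bash -c 'action1 "$1"' _ {} \;

rm -rf "$tmp"
```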

md5 all files in a directory tree

I have a directory with a structure like so:
.
├── Test.txt
├── Test1
│   ├── Test1.txt
│   ├── Test1_copy.txt
│   └── Test1a
│       ├── Test1a.txt
│       └── Test1a_copy.txt
└── Test2
    ├── Test2.txt
    ├── Test2_copy.txt
    └── Test2a
        ├── Test2a.txt
        └── Test2a_copy.txt
I would like to create a bash script that makes a md5 checksum of every file in this directory. I want to be able to type the script name in the CLI and then the path to the directory I want to hash and have it work. I'm sure there are many ways to accomplish this. Currently I have:
#!/bin/bash
for file in "$1" ; do
    md5 >> "${1}__checksums.md5"
done
This just hangs and is not working. Perhaps I should use find?
One caveat - the directories I want to hash will have files with different extensions and may not always have this exact same tree structure. I want something that will work in these different situations, as well.
Using md5deep
md5deep -r path/to/dir > sums.md5
Using find and md5sum
find relative/path/to/dir -type f -exec md5sum {} + > sums.md5
Be aware that when you run a check on your MD5 sums with md5sum -c sums.md5, you need to run it from the same directory from which you generated the sums.md5 file. This is because find outputs paths that are relative to your current location, which are then put into the sums.md5 file.
If this is a problem, you can make relative/path/to/dir absolute (e.g. by putting $PWD/ in front of your path). This way you can run the check on sums.md5 from any location. The disadvantage is that sums.md5 then contains absolute paths, which makes it bigger.
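A quick sketch of the difference, using a throwaway directory (the directory and file names here are made up):

```shell
tmp=$(mktemp -d)
mkdir "$tmp/dir"
echo hello > "$tmp/dir/a.txt"

# Run both variants from inside $tmp (subshell so the cd doesn't leak)
(
    cd "$tmp"
    # Relative paths: sums-rel.md5 is only checkable from inside $tmp
    find dir -type f -exec md5sum {} + > sums-rel.md5
    # Absolute paths (note the $PWD/ prefix): checkable from anywhere
    find "$PWD/dir" -type f -exec md5sum {} + > sums-abs.md5
)

# Compare the stored paths
grep 'a.txt' "$tmp/sums-rel.md5" "$tmp/sums-abs.md5"

rm -rf "$tmp"
```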
Fully featured function using find and md5sum
You can put this function to your .bashrc file (located in your $HOME directory):
function md5sums {
    if [ "$#" -lt 1 ]; then
        echo -e "At least one parameter is expected\n" \
            "Usage: md5sums [OPTIONS] dir"
    else
        local OUTPUT="checksums.md5"
        local CHECK=false
        local MD5SUM_OPTIONS=""
        while [[ $# -gt 1 ]]; do
            local key="$1"
            case $key in
                -c|--check)
                    CHECK=true
                    ;;
                -o|--output)
                    OUTPUT=$2
                    shift
                    ;;
                *)
                    MD5SUM_OPTIONS="$MD5SUM_OPTIONS $1"
                    ;;
            esac
            shift
        done
        local DIR=$1
        if [ -d "$DIR" ]; then # if the $DIR directory exists
            cd "$DIR" # change to the $DIR directory
            if [ "$CHECK" = true ]; then # if -c or --check was specified
                md5sum --check $MD5SUM_OPTIONS "$OUTPUT" # check the MD5 sums in the $OUTPUT file
            else
                # calculate MD5 sums for files in the current directory and
                # subdirectories, excluding $OUTPUT, and save the result in $OUTPUT
                find . -type f ! -name "$OUTPUT" -exec md5sum $MD5SUM_OPTIONS {} + > "$OUTPUT"
            fi
            cd - > /dev/null # change back to the previous directory
        else
            cd "$DIR" # if $DIR doesn't exist, cd to it to generate a localized error message
        fi
    fi
}
After you run source ~/.bashrc, you can use md5sums like normal command:
md5sums path/to/dir
will generate checksums.md5 file in path/to/dir directory, containing MD5 sums of all files in this directory and subdirectories. Use:
md5sums -c path/to/dir
to check sums from path/to/dir/checksums.md5 file.
Note that path/to/dir can be relative or absolute, md5sums will work fine either way. Resulting checksums.md5 file always contains paths relative to path/to/dir.
You can use a different file name than the default checksums.md5 by supplying the -o or --output option. All options other than -c, --check, -o and --output are passed to md5sum.
The first half of the md5sums function definition is responsible for parsing options. See this answer for more information about it. The second half contains explanatory comments.
How about:
find /path/you/need -type f -exec md5sum {} \; > checksums.md5
Update#1: Improved the command based on @twalberg's recommendation to handle white spaces in file names.
Update#2: Improved based on @jil's suggestion, to remove the unnecessary xargs call and use the -exec option of find instead.
Update#3: @Blake, a naive implementation of your script would look something like this:
#!/bin/bash
# Usage: checksumchecker.sh <path>
find "$1" -type f -exec md5sum {} \; > "$1"__checksums.md5
Updated Answer
If you like the answer below, or any of the others, you can make a function that does the command for you. So, to test it, type the following into Terminal to declare a function:
function sumthem(){ find "$1" -type f -print0 | parallel -0 -X md5 > checksums.md5; }
Then you can just use:
sumthem /Users/somebody/somewhere
If that works how you like, you can add that line to the end of your "bash profile" and the function will be declared and available whenever you are logged in. Your "bash profile" is probably in $HOME/.profile
Original Answer
Why not get all your CPU cores working in parallel for you?
find . -type f -print0 | parallel -0 -X md5sum
This finds all the files (-type f) in the current directory (.) and prints them with a null byte at the end. These are then passed into GNU Parallel, which is told that the filenames end with a null byte (-0) and that it should do as many files as possible at a time (-X) to save creating a new process for each file, and that it should md5sum the files.
This approach will pay the largest bonus, in terms of speed, with big images like Photoshop files.
#!/bin/bash
shopt -s globstar
md5sum "$1"/** > "${1}__checksums.md5"
Explanation: shopt -s globstar (see the manual) enables the ** recursive glob wildcard. It means that "$1"/** will expand to a list of all the files recursively under the directory given as parameter $1. The script then simply calls md5sum with this file list as parameters, and > "${1}__checksums.md5" redirects the output to the file.
md5deep -r $your_directory | awk {'print $1'} | sort | md5sum | awk {'print $1'}
Use find command to list all files in directory tree,
then use xargs to provide input to md5sum command
find dirname -type f | xargs md5sum > checksums.md5
In case you prefer to have separate checksum files in every directory, rather than a single file, you can
find all subdirectories
keep only those which actually contain files (not only other subdirs)
cd to each of them and create a checksums.md5 file inside that directory
Here is an example script which does that:
#!/bin/bash
# Do separate md5 files in each subdirectory
md5_filename=checksums.md5
dir="$1"
[ -z "$dir" ] && dir="."
# Check OS to select md5 command
if [[ "$OSTYPE" == "linux-gnu"* ]]; then
    is_linux=1
    md5cmd="md5sum"
elif [[ "$OSTYPE" == "darwin"* ]]; then
    md5cmd="md5 -r"
else
    echo "Error: unknown OS '$OSTYPE'. Don't know correct md5 command."
    exit 1
fi
# go to base directory after saving where we started
start_dir="$PWD"
cd "$dir"
# if we're in a symlink, cd to the real path
if [ ! "$dir" = "$(pwd -P)" ]; then
    dir="$(pwd -P)"
    cd "$dir"
fi
if [ "$PWD" = "/" ]; then
    echo "Refusing to do it on system root '$PWD'" >&2
    exit 1
fi
# Find all folders to process
declare -a subdirs=()
declare -a wanted=()
# find all non-hidden subdirectories (not if the name begins with "." like ".Trashes", ".Spotlight-V100", etc.)
while IFS= read -r; do subdirs+=("$PWD/$REPLY"); done < <(find . -type d -not -name ".*" | LC_ALL=C sort)
# count files and if there are any, add dir to "wanted" array
echo "Counting files and sizes to process ..."
for d in "$dir" "${subdirs[@]}"; do # include "$dir" itself, not only its subdirs
    files_here=0
    while IFS= read -r; do
        (( files_here += 1 ))
    done < <(find "$d" -maxdepth 1 -type f -not -name "*.md5")
    (( files_here )) && wanted+=("$d")
done
echo "Found ${#wanted[@]} folders to process:"
printf " * %s\n" "${wanted[@]}"
if [ "${#wanted[*]}" = 0 ]; then
    echo "Nothing to do. Exiting."
    exit 0
fi
for d in "${wanted[@]}"; do
    cd "$d"
    find . -maxdepth 1 -type f -not -name "$md5_filename" -print0 \
        | LC_ALL=C sort -z \
        | while IFS= read -rd '' f; do
            $md5cmd "$f" | tee -a "$md5_filename"
        done
    cd "$dir"
done
cd "$start_dir"
(This is actually a very simplified version of this "md5dirs" script on Github. The original is quite specific and more complex, making it less illustrative as an example, and more difficult to adapt to other different needs.)
I wanted something similar to calculate the SHA256 of an entire directory, so I wrote this "checksum" script:
#!/bin/sh
cd "$1" || exit 1
find . -type f | LC_ALL=C sort |
(
    while IFS= read -r name; do
        sha256sum "$name"
    done
) | sha256sum
Example usage:
patrick@pop-os:~$ checksum tmp
d36bebfa415da8e08cbfae8d9e74f6606e86d9af9505c1993f5b949e2befeef0 -
In an earlier version I was feeding the file names to "xargs", but that wasn't working when file names had spaces.
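For reference, the whitespace problem can also be solved while keeping xargs, by passing null-delimited names; a sketch with a made-up file name:

```shell
tmp=$(mktemp -d)
echo data > "$tmp/a file.txt"   # made-up name containing a space

# Plain `find | xargs` would split "a file.txt" into two arguments;
# -print0 / sort -z / xargs -0 pass each name through as one null-delimited unit
find "$tmp" -type f -print0 | LC_ALL=C sort -z | xargs -0 sha256sum

rm -rf "$tmp"
```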

Move the files to a specific folder in unix

I have a script to move files of type .txt to a particular folder. It looks for the files in a work folder and moves them to a completed folder.
I would like to make the script generic, i.e. to enhance it so that it works not just for one particular folder but for other similar folders as well.
Example: If there is a .txt file in folder /tmp/swan/test/work and also in folder /tmp/swan/test11/work, the files should move to /tmp/swan/test/done and /tmp/swan/test11/done respectively.
EDIT: Also, if there is a .txt file in a sub-folder like /tmp/swan/test11/work/APX, that should also move to /tmp/swan/test11/done.
Below is the current script.
#!/bin/bash
MY_DIR=/tmp/swan
cd $MY_DIR
find . -path "*work*" -iname "*.txt" -type f -execdir mv '{}' /tmp/swan/test/done \;
With -execdir, the mv command is executed in whatever directory the file is found in. Since you just want to move the file to a "sibling" directory, each command can use the same relative path ../done.
find . -path "*work*" -iname "*.txt" -type f -execdir mv '{}' ../done \;
One way to do it:
Background:
$ tree
.
├── a
│   └── work
└── b
    └── work
Renaming:
find . -type f -name work -exec \
sh -c 'echo mv "$1" "$(dirname "$1")"/done' -- {} \;
Output:
mv ./a/work ./a/done
mv ./b/work ./b/done
You can remove the echo if it does what you want it to.
What about:
find . -path '*work/*.txt' -exec sh -c 'd=$(dirname "$(dirname "$1")")/done; mkdir -p "$d"; mv "$1" "$d"' _ {} \;
(also creates the target directory if it does not exist already).

Count number of specific file type of a directory and its sub dir in mac

I use ls -l *.filetype | wc -l, but it can only find files in the current directory.
How can I also count all files with a specific extension in its subdirectories?
Thank you very much.
You can do that with find command:
find . -name "*.filetype" | wc -l
The following compound command, albeit somewhat verbose, guarantees an accurate count because it handles filenames that contain newlines correctly:
total=0; while read -rd ''; do ((total++)); done < <(find . -name "*.filetype" -print0) && echo "$total"
Note: Before running the aforementioned compound command:
Firstly, cd to the directory that you want to count all files with specific extension in.
Change the filetype part as appropriate, e.g. txt
Demo:
To further demonstrate why piping the results of find to wc -l may produce incorrect results:
Run the following compound command to quickly create some test files:
mkdir -p ~/Desktop/test/{1..2} && touch ~/Desktop/test/{1..2}/a-file.txt && touch ~/Desktop/test/{1..2}/$'b\n-file.txt'
This produces the following directory structure on your "Desktop":
test
├── 1
│   ├── a-file.txt
│   └── b\n-file.txt
└── 2
    ├── a-file.txt
    └── b\n-file.txt
Note: It contains a total of four .txt files, two of which have multi-line filenames, i.e. b\n-file.txt.
On newer versions of macOS, files named b\n-file.txt will appear as b?-file.txt in the "Finder", i.e. a question mark indicates the newline in a multi-line filename.
Then run the following command that pipes the results of find to wc -l:
find ~/Desktop/test -name "*.txt" | wc -l
It incorrectly reports/prints:
6
Then run the following suggested compound command:
total=0; while read -rd ''; do ((total++)); done < <(find ~/Desktop/test -name "*.txt" -print0) && echo "$total"
It correctly reports/prints:
4
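For completeness, a recursive glob can do the same count without find, and it also copes with newlines in file names since no text stream is involved. This assumes bash 4+ for globstar (the bash shipped with macOS is 3.2, so there you would need a newer bash, or zsh where ** works out of the box):

```shell
shopt -s globstar nullglob   # bash 4+; ** then matches recursively
count=0
for f in **/*.txt; do
    [ -f "$f" ] || continue   # skip directories whose names happen to match
    count=$((count + 1))
done
echo "$count"
shopt -u globstar nullglob
```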
