Bash: Verify empty folder before ls -A

I'm starting to study some sh implementations and I'm running into some trouble when trying to perform actions on files inside certain folders.
Here is the scenario:
I have a list of TXT files inside two different subfolders:
├── FolderA
│   ├── randomFile1.txt
│   └── randomFile2.txt
├── FolderB
│   └── File1.txt
└── FolderC
    └── File2.txt
Depending on the folder the file resides in, I should take a specific action.
Note: the files from FolderA should not be processed.
Basically, I tried two different approaches.
The first one:
files_b="$incoming/$origin"/FolderB/*.txt
files_c="$incoming/$origin"/FolderC/*.txt
if [ "$(ls -A $files_b)" ]; then
    for file in $files_b
    do
        #take action
    done
else
    echo -e "\033[1;33mWarning: No files\033[0m"
fi
if [ "$(ls -A $files_c)" ]; then
    for file in $files_c
    do
        #take action
    done
else
    echo -e "\033[1;33mWarning: No files\033[0m"
fi
The problem with this one is that when I run ls -A and one of the folders (B or C) is empty, it throws an error because of the *.txt at the end of the path.
The second one:
path="$incoming/$origin"/*.txt
find $path -type f -name "*.txt" | while read txt; do
    for file in $txt
    do
        name=$(basename "$file")
        dir=$(basename $(dirname $file))
        if [ "$dir" == FolderB ]; then
            # Do something to files
        elif [ "$dir" == FolderC ]; then
            # Do something to files
        fi
    done
done
With that approach the problem is that I'm also picking up the files from FolderA and I don't want that (it will decrease performance because of the extra if checks), and I don't know how to verify whether a folder is empty using the find command.
Can anyone help me?
Thank you all.

I would write the code like this:
No unquoted parameter expansions
Don't use ls to check if the directory is empty
Use printf instead of echo.
# You cannot safely expand a parameter so that it does file globbing
# but does *not* to word-splitting. Put the glob directly in the loop
# or use an array.
shopt -s nullglob
found=
for file in "$incoming/$origin"/FolderB/*.txt; do
    found=1
    #take action
done
if [ -z "$found" ]; then
    printf "\033[1;33mWarning: No files\033[0m\n"
fi

In the first solution you can simply hide the error messages.
if [ "$(ls -A $files_b 2>/dev/null)" ]; then
In the second solution, start find at the subdirectories instead of the parent directory:
path="$incoming/$origin/FolderB $incoming/$origin/FolderC"
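For example, a minimal sketch of that variant (using the FolderB/FolderC names from the question, with placeholder actions), passing both subdirectories to find as separate arguments so FolderA is never scanned:
find "$incoming/$origin/FolderB" "$incoming/$origin/FolderC" -type f -name '*.txt' |
while IFS= read -r file; do
    dir=$(basename "$(dirname "$file")")
    case $dir in
        FolderB) : ;; # action for FolderB files (placeholder)
        FolderC) : ;; # action for FolderC files (placeholder)
    esac
done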

I think using find would be better:
files_b="${incoming}/${origin}/FolderB"
files_c="${incoming}/${origin}/FolderC"
find "$files_b" -name "*.txt" -exec action1 {} \;
find "$files_c" -name "*.txt" -exec action2 {} \;
or even just find
find "${incoming}/${origin}/FolderB" -name "*.txt" -exec action1 {} \;
find "${incoming}/${origin}/FolderC" -name "*.txt" -exec action2 {} \;
Of course you should think about your action, but you can make a function or a separate script that accepts file name(s).
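For instance, a rough sketch of such a wrapper script (the name process_b.sh and its body are placeholders):
#!/bin/sh
# process_b.sh -- placeholder action for FolderB files; receives one or more file names
for f in "$@"; do
    printf 'processing %s\n' "$f"   # replace with the real action
done
It could then be invoked once per batch of files:
find "${incoming}/${origin}/FolderB" -name "*.txt" -exec ./process_b.sh {} +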

Related

Rename files based on their parent directory in Bash

Been trying to piece together a couple previous posts for this task.
The directory tree looks like this:
TEST
├── ABC_12345678
│   └── 3_XYZ
├── ABC_23456789
│   └── 3_XYZ
└── etc.
Each folder within the parent folder named "TEST" always starts with ABC_\d{8} -the 8 digits are always different. Within the folder ABC_\d{8} is always a folder entitled 3_XYZ that always has a file named "MD2_Phd.txt". The goal is to rename each "MD2_PhD.txt" file with the specific 8 digit ID found in the ABC folder name i.e. "\d{8}_PhD.txt"
After several iterations on various bits of code from different posts, this is the best I can come up with:
cd /home/etc/Desktop/etc/TEST
find -type d -name 'ABC_(\d{8})' |
find $d -name "*_PhD.txt" -execdir rename 's/MD2$/$d/' "{}" \;
done
find + bash solution:
find -type f -regextype posix-egrep -regex ".*/TEST/ABC_[0-9]{8}/3_XYZ/MD2_Phd\.txt" \
-exec bash -c 'abc="${0%/*/*}"; fp="${0%/*}/";
mv "$0" "$fp${abc##*_}_PhD.txt" ' {} \;
Viewing results:
$ tree TEST/ABC_*
TEST/ABC_12345678
└── 3_XYZ
    └── 12345678_PhD.txt
TEST/ABC_1234ss5678
└── 3_XYZ
    └── MD2_Phd.txt
TEST/ABC_23456789
└── 3_XYZ
    └── 23456789_PhD.txt
You are piping find output to another find. That won't work.
Use a loop instead:
dir_re='^.+_([[:digit:]]{8})/'
for file in *_????????/3_XYZ/MD2_PhD.txt; do
    [[ -f $file ]] || continue
    if [[ $file =~ $dir_re ]]; then
        dir_num="${BASH_REMATCH[1]}"
        new_name="${file/%MD2_PhD.txt/${dir_num}_PhD.txt}" # replace the MD2_PhD.txt at the end
        echo mv "$file" "$new_name" # remove echo from here once tested
    fi
done

md5 all files in a directory tree

I have a directory with a structure like so:
.
├── Test.txt
├── Test1
│   ├── Test1.txt
│   ├── Test1_copy.txt
│   └── Test1a
│       ├── Test1a.txt
│       └── Test1a_copy.txt
└── Test2
    ├── Test2.txt
    ├── Test2_copy.txt
    └── Test2a
        ├── Test2a.txt
        └── Test2a_copy.txt
I would like to create a bash script that makes a md5 checksum of every file in this directory. I want to be able to type the script name in the CLI and then the path to the directory I want to hash and have it work. I'm sure there are many ways to accomplish this. Currently I have:
#!/bin/bash
for file in "$1" ; do
md5 >> "${1}__checksums.md5"
done
This just hangs and is not working. Perhaps I should use find?
One caveat - the directories I want to hash will have files with different extensions and may not always have this exact same tree structure. I want something that will work in these different situations, as well.
Using md5deep
md5deep -r path/to/dir > sums.md5
Using find and md5sum
find relative/path/to/dir -type f -exec md5sum {} + > sums.md5
Be aware that when you run a check on your MD5 sums with md5sum -c sums.md5, you need to run it from the same directory from which you generated the sums.md5 file. This is because find outputs paths relative to your current location, which are then put into the sums.md5 file.
If this is a problem, you can make relative/path/to/dir absolute (e.g. by putting $PWD/ in front of your path). This way you can run a check on sums.md5 from any location. The disadvantage is that sums.md5 then contains absolute paths, which makes it bigger.
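For example, a minimal sketch of the absolute-path variant:
find "$PWD"/relative/path/to/dir -type f -exec md5sum {} + > sums.md5
md5sum -c sums.md5    # the absolute paths let this run from any directory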
Fully featured function using find and md5sum
You can put this function to your .bashrc file (located in your $HOME directory):
function md5sums {
    if [ "$#" -lt 1 ]; then
        echo -e "At least one parameter is expected\n" \
            "Usage: md5sums [OPTIONS] dir"
    else
        local OUTPUT="checksums.md5"
        local CHECK=false
        local MD5SUM_OPTIONS=""
        while [[ $# -gt 1 ]]; do
            local key="$1"
            case $key in
                -c|--check)
                    CHECK=true
                    ;;
                -o|--output)
                    OUTPUT=$2
                    shift
                    ;;
                *)
                    MD5SUM_OPTIONS="$MD5SUM_OPTIONS $1"
                    ;;
            esac
            shift
        done
        local DIR=$1
        if [ -d "$DIR" ]; then # if the $DIR directory exists
            cd "$DIR" # change to $DIR directory
            if [ "$CHECK" = true ]; then # if -c or --check option specified
                md5sum --check $MD5SUM_OPTIONS "$OUTPUT" # check MD5 sums in $OUTPUT file
            else
                # Calculate MD5 sums for files in the current directory and subdirectories,
                # excluding the $OUTPUT file, and save the result in the $OUTPUT file
                find . -type f ! -name "$OUTPUT" -exec md5sum $MD5SUM_OPTIONS {} + > "$OUTPUT"
            fi
            cd - > /dev/null # change back to the previous directory
        else
            cd "$DIR" # if $DIR doesn't exist, cd to it to generate a localized error message
        fi
    fi
}
After you run source ~/.bashrc, you can use md5sums like a normal command:
md5sums path/to/dir
will generate a checksums.md5 file in the path/to/dir directory, containing MD5 sums of all files in this directory and its subdirectories. Use:
md5sums -c path/to/dir
to check sums from path/to/dir/checksums.md5 file.
Note that path/to/dir can be relative or absolute; md5sums will work fine either way. The resulting checksums.md5 file always contains paths relative to path/to/dir.
You can use a different file name than the default checksums.md5 by supplying the -o or --output option. All options other than -c, --check, -o and --output are passed on to md5sum.
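For example (a sketch; the file name my_sums.md5 is just an example, and --tag is one GNU md5sum option that would be passed through):
md5sums -o my_sums.md5 path/to/dir      # write sums to path/to/dir/my_sums.md5
md5sums -c -o my_sums.md5 path/to/dir   # verify against that file
md5sums --tag path/to/dir               # --tag is handed on to md5sum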
The first half of the md5sums function definition is responsible for parsing options; see this answer for more information about it. The second half contains explanatory comments.
How about:
find /path/you/need -type f -exec md5sum {} \; > checksums.md5
Update #1: Improved the command based on @twalberg's recommendation to handle white space in file names.
Update #2: Improved based on @jil's suggestion to remove the unnecessary xargs call and use the -exec option of find instead.
Update #3: @Blake, a naive implementation of your script would look something like this:
#!/bin/bash
# Usage: checksumchecker.sh <path>
find "$1" -type f -exec md5sum {} \; > "$1"__checksums.md5
Updated Answer
If you like the answer below, or any of the others, you can make a function that does the command for you. So, to test it, type the following into Terminal to declare a function:
function sumthem(){ find "$1" -type f -print0 | parallel -0 -X md5 > checksums.md5; }
Then you can just use:
sumthem /Users/somebody/somewhere
If that works how you like, you can add that line to the end of your "bash profile" and the function will be declared and available whenever you are logged in. Your "bash profile" is probably in $HOME/.profile
Original Answer
Why not get all your CPU cores working in parallel for you?
find . -type f -print0 | parallel -0 -X md5sum
This finds all the files (-type f) in the current directory (.) and prints them with a null byte at the end. These are then passed into GNU Parallel, which is told that the filenames end with a null byte (-0) and that it should do as many files as possible at a time (-X), to save creating a new process for each file, and that it should md5sum the files.
This approach will pay the largest bonus, in terms of speed, with big images like Photoshop files.
#!/bin/bash
shopt -s globstar
md5sum "$1"/** > "${1}__checksums.md5"
Explanation: shopt -s globstar (see the bash manual) enables the ** recursive glob wildcard. It means that "$1"/** will expand to a list of all the files recursively under the directory given as parameter $1. The script then simply calls md5sum with this file list as parameters, and > "${1}__checksums.md5" redirects the output to the file.
md5deep -r $your_directory | awk '{print $1}' | sort | md5sum | awk '{print $1}'
Use the find command to list all files in the directory tree, then use xargs to provide the input to the md5sum command:
find dirname -type f | xargs md5sum > checksums.md5
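If file names may contain spaces, a null-delimited variant of the same idea (assuming GNU find and xargs) is safer:
find dirname -type f -print0 | xargs -0 md5sum > checksums.md5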
In case you prefer to have separate checksum files in every directory, rather than a single file, you can
find all subdirectories
keep only those which actually contain files (not only other subdirs)
cd to each of them and create a checksums.md5 file inside that directory
Here is an example script which does that:
#!/bin/bash
# Do separate md5 files in each subdirectory
md5_filename=checksums.md5
dir="$1"
[ -z "$dir" ] && dir="."
# Check OS to select md5 command
if [[ "$OSTYPE" == "linux-gnu"* ]]; then
is_linux=1
md5cmd="md5sum"
elif [[ "$OSTYPE" == "darwin"* ]]; then
md5cmd="md5 -r"
else
echo "Error: unknown OS '$OSTYPE'. Don't know correct md5 command."
exit 1
fi
# go to base directory after saving where we started
start_dir="$PWD"
cd "$dir"
# if we're in a symlink cd to the real path
if [ ! "$dir" = "$(pwd -P)" ]; then
dir="$(pwd -P)"
cd "$dir"
fi
if [ "$PWD" = "/" ]; then
die "Refusing to do it on system root '$PWD'"
fi
# Find all folders to process
declare -a subdirs=()
declare -a wanted=()
# find all non-hidden subdirectories (not if the name begins with "." like ".Trashes", ".Spotlight-V100", etc.)
while IFS= read -r; do subdirs+=("$PWD/$REPLY"); done < <(find . -type d -not -name ".*" | LC_ALL=C sort)
# count files and if there are any, add dir to "wanted" array
echo "Counting files and sizes to process ..."
for d in "$dir" "${subdirs[#]}"; do # include "$dir" itself, not only it's subdirs
files_here=0
while IFS= read -r ; do
(( files_here += 1 ))
done < <(find "$d" -maxdepth 1 -type f -not -name "*.md5")
(( files_here )) && wanted+=("$d")
done
echo "Found ${#wanted[#]} folders to process:"
printf " * %s\n" "${wanted[#]}"
if [ "${#wanted[*]}" = 0 ]; then
echo "Nothing to do. Exiting."
exit 0
fi
for d in "${wanted[#]}"; do
cd "$d"
find . -maxdepth 1 -type f -not -name "$md5_filename" -print0 \
| LC_ALL=C sort -z \
| while IFS= read -rd '' f; do
$md5cmd "$f" | tee -a "$md5_filename"
done
cd "$dir"
done
cd "$start_dir"
(This is actually a very simplified version of this "md5dirs" script on GitHub. The original is quite specific and more complex, making it less illustrative as an example and more difficult to adapt to other needs.)
I wanted something similar to calculate the SHA256 of an entire directory, so I wrote this "checksum" script:
#!/bin/sh
cd "$1" || exit 1
find . -type f | LC_ALL=C sort |
(
    while read -r name; do
        sha256sum "$name"
    done
) | sha256sum
Example usage:
patrick@pop-os:~$ checksum tmp
d36bebfa415da8e08cbfae8d9e74f6606e86d9af9505c1993f5b949e2befeef0 -
In an earlier version I was feeding the file names to "xargs", but that wasn't working when file names had spaces.
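A null-delimited pipeline would also have coped with spaces; a sketch of that variant (assuming GNU find, sort and xargs):
#!/bin/sh
# checksum, null-delimited variant: spaces and other odd characters in names are safe
cd "$1" || exit 1
find . -type f -print0 | LC_ALL=C sort -z | xargs -0 sha256sum | sha256sum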

command line find first file in a directory

My directory structure is as follows
Directory1\file1.jpg
          \file2.jpg
          \file3.jpg
Directory2\anotherfile1.jpg
          \anotherfile2.jpg
          \anotherfile3.jpg
Directory3\yetanotherfile1.jpg
          \yetanotherfile2.jpg
          \yetanotherfile3.jpg
I'm trying to use the command line in a bash shell on ubuntu to take the first file from each directory and rename it to the directory name and move it up one level so it sits alongside the directory.
In the above example:
file1.jpg would be renamed to Directory1.jpg and placed alongside the folder Directory1
anotherfile1.jpg would be renamed to Directory2.jpg and placed alongside the folder Directory2
yetanotherfile1.jpg would be renamed to Directory3.jpg and placed alongside the folder Directory3
I've tried using:
find . -name "*.jpg"
but it does not list the files in sequential order (I need the first file).
This line:
find . -name "*.jpg" -type f -exec ls "{}" +;
lists the files in the correct order but how do I pick just the first file in each directory and move it up one level?
Any help would be appreciated!
Edit: When I refer to the first file what I mean is each jpg is numbered from 0 to however many files in that folder - for example: file1, file2...... file34, file35 etc... Another thing to mention is the format of the files is random, so the numbering might start at 0 or 1a or 1b etc...
You can go inside each dir and run:
$ mv `ls | head -n 1` ..
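Applied to every subdirectory, and renaming the file to the directory name as asked, that idea might look like this (a sketch; it assumes plain ls order is acceptable, that every directory contains at least one file, and that names contain no newlines):
for d in */; do
    ( cd "$d" && mv "$(ls | head -n 1)" "../${d%/}.jpg" )
done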
If first means whatever the shell glob finds first (lexical, but probably affected by LC_COLLATE), then this should work:
for dir in */; do
    for file in "$dir"*.jpg; do
        echo mv "$file" "${file%/*}.jpg" # If it does what you want, remove the echo
        break 1
    done
done
Proof of concept:
$ mkdir dir{1,2,3} && touch dir{1,2,3}/file{1,2,3}.jpg
$ for dir in */; do for file in "$dir"*.jpg; do echo mv "$file" "${file%/*}.jpg"; break 1; done; done
mv dir1/file1.jpg dir1.jpg
mv dir2/file1.jpg dir2.jpg
mv dir3/file1.jpg dir3.jpg
Look for all first-level directories, identify the first file in each directory and then move it one level up:
find . -type d \! -name . -prune | while read -r d; do
    f=$(ls "$d" | head -1)
    mv "$d/$f" .
done
Building on the top answer, here is a general use bash function that simply returns the first path that resolves to a file within the given directory:
getFirstFile() {
    for dir in "$1"; do
        for file in "$dir"*; do
            if [ -f "$file" ]; then
                echo "$file"
                break 1
            fi
        done
    done
}
Usage:
# don't forget the trailing slash
getFirstFile ~/documents/
NOTE: it will silently return nothing if you pass it an invalid path.
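If you would rather fail loudly, a variant could validate the argument first (a sketch keeping the same trailing-slash convention):
getFirstFile() {
    [ -d "$1" ] || { printf 'getFirstFile: %s is not a directory\n' "$1" >&2; return 1; }
    for file in "$1"*; do
        if [ -f "$file" ]; then
            echo "$file"
            break
        fi
    done
}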

List directories not containing certain files?

I used this command to find all the directories containing .mp3 in the current directory, and filtered out only the directory names:
find . -iname "*.mp3" | sed -e 's!/[^/]*$!!' -e 's!^\./!!' | sort -u
I now want the opposite, but I found it a little harder. I can't just add a '!' to the find command, since it'll only exclude .mp3 files when printing them, not find directories that do not contain any .mp3 files.
I googled this and searched on stackoverflow and unix.stackexchange.com.
I have tried this script so far and it returns this error:
#!/bin/bash
find . -type d | while read dir
do
if [[! -f $dir/*.mp3 ]]
then
echo $dir
fi
done
/home/user/bin/try.sh: line 5: [[!: command not found
#!/bin/bash
find . -type d | while read dir
do
if [! -f $dir/*.mp3 ]
then
echo $dir
fi
done
/home/user/bin/try.sh: line 5: [!: command not found
#!/bin/bash
find . -type d | while read dir
do
if [[! -f "$dir/*.mp3" ]]
then
echo $dir
fi
done
/home/user/bin/try.sh: line 5: [!: command not found
I'm thinking it has to do with multiple arguments for the test command.
Since I'm testing all the directories the variable is going to change, and I use a wildcard for the filenames.
Any help is much appreciated. Thank You.
[ "$(echo $dir/*.mp3)" = "$dir/*.mp3" ]
should work.
Or simply add a space between '[' and '!'
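Put together with the loop from the question, that test might be used like this (a sketch; it relies on the glob staying literal when nothing matches):
find . -type d | while read -r dir; do
    if [ "$(echo "$dir"/*.mp3)" = "$dir/*.mp3" ]; then
        echo "$dir"    # no .mp3 files directly inside $dir
    fi
done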
A method that is probably significantly faster (using GNU find's -print -quit to stop at the first match) is:
if [ -n "$(find "$dir" -name '*.mp3' -print -quit)" ]; then
    : # there are mp3 files in there
else
    : # no mp3s
fi
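Folded into the directory walk from the question, that could look like this (a sketch assuming GNU find; -maxdepth 1 restricts the test to files directly inside each directory, matching the $dir/*.mp3 glob):
find . -type d -print0 | while IFS= read -r -d '' dir; do
    if [ -z "$(find "$dir" -maxdepth 1 -type f -name '*.mp3' -print -quit)" ]; then
        printf '%s\n' "$dir"    # no mp3 files directly inside
    fi
done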
Okay, I solved my own question by using a counter.
I don't know how efficient it is, but it works. I know it can be made better. Please feel free to critique.
find . -type d | while read -r dir
do
    count=`ls -1 "$dir"/*.mp3 2>/dev/null | wc -l`
    if [ "$count" = 0 ]
    then
        echo "$dir"
    fi
done
This prints all directories not containing MP3s. It also shows sub-directories, thanks to the find command printing directories recursively.
I ran a script to automatically download cover art for my mp3 collection. It put a file called "cover.jpg" in the directory for each album for which it could retrieve the art. I needed to check for which albums the script had failed - i.e. which CDs (directories) did not contain a file called cover.jpg. This was my effort:
find . -maxdepth 1 -mindepth 1 -type d | while read dir; do [[ ! -f $dir/cover.jpg ]] && echo "$dir has no cover art"; done
The -maxdepth 1 stops the find command from descending into a hidden directory which my WD My Cloud NAS server had created for each album and in which it had placed a default generic disc image. (This got cleared during the next scan.)
Edit: cd to the MP3 directory and run it from there, or change the . in the command above to the path to point to it.

Bash rename extension recursive

I know there are a lot of things like this around, but either they don't work recursively or they are huge.
This is what I got:
find . -name "*.so" -exec mv {} `echo {} | sed s/.so/.dylib/` \;
When I just run the find part it gives me a list of files. When I run the sed part it replaces any .so with .dylib. When I run them together they don't work.
I replaced mv with echo to see what happened:
./AI/Interfaces/C/0.1/libAIInterface.so ./AI/Interfaces/C/0.1/libAIInterface.so
Nothing is replaced at all!
What is wrong?
This will do everything correctly:
find -L . -type f -name "*.so" -print0 | while IFS= read -r -d '' FNAME; do
    mv -- "$FNAME" "${FNAME%.so}.dylib"
done
By correctly, we mean:
1) It will rename just the file extension (due to use of ${FNAME%.so}.dylib). All the other solutions using ${X/.so/.dylib} are incorrect as they wrongly rename the first occurrence of .so in the filename (e.g. x.so.so is renamed to x.dylib.so, or worse, ./libraries/libTemp.so-1.9.3/libTemp.so is renamed to ./libraries/libTemp.dylib-1.9.3/libTemp.so - an error).
2) It will handle spaces and any other special characters in filenames (except double quotes).
3) It will not change directories or other special files.
4) It will follow symbolic links into subdirectories and links to target files and rename the target file, not the link itself (the default behaviour of find is to process the symbolic link itself, not the file pointed to by the link).
for X in `find . -name "*.so"`
do
mv $X ${X/.so/.dylib}
done
A bash script to rename file extensions generally
#!/bin/bash
find -L . -type f -name "*.$1" -print0 | while IFS= read -r -d '' file; do
    echo "renaming $file to $(basename "${file%.$1}.$2")"
    mv -- "$file" "${file%.$1}.$2"
done
Credits to aps2012.
Usage
Create a file e.g. called ext-rename (no extension, so you can run it like a command) in e.g. /usr/bin (make sure /usr/bin is added to your $PATH)
run ext-rename [ext1] [ext2] anywhere in terminal, where [ext1] is renaming from and [ext2] is renaming to. An example use would be: ext-rename so dylib, which will rename any file with extension .so to same name but with extension .dylib.
What is wrong is that
echo {} | sed s/.so/.dylib/
is only executed once, before find is launched: sed is given the literal {} on its input, which doesn't match /.so/ and is left unchanged, so your resulting command line is
find . -name "*.so" -exec mv {} {}
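One way to keep it a single find command is to let find start a shell that does the substitution at run time (a sketch; the %.so suffix removal also avoids the first-occurrence problem noted in another answer above):
find . -type f -name '*.so' -exec sh -c '
    for f do
        mv -- "$f" "${f%.so}.dylib"
    done
' sh {} +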
If you have Bash 4:
#!/bin/bash
shopt -s globstar
shopt -s nullglob
for file in /path/**/*.so
do
    echo mv "$file" "${file/%.so}.dylib"
done
He needs recursion:
#!/bin/bash
function walk_tree {
    local directory="$1"
    local i
    for i in "$directory"/*; do
        if [ "$i" = . -o "$i" = .. ]; then
            continue
        elif [ -d "$i" ]; then
            walk_tree "$i"
        elif [ "${i##*.}" = "so" ]; then
            echo mv "$i" "${i%.*}.dylib"
        else
            continue
        fi
    done
}
walk_tree "."
