How to write a UNIX script to check if directories contain specified number of files - shell

I have a base directory that contains 4 directories: Dir1, Dir2, Dir3, and Dir4. Each of these directories has files in the format "Sometext_YYYYMMDD". I'm writing a UNIX script to search through the files in all these directories for a particular string, say "20151215", and print the matches on the console.
find . -name "*20151215" -print
Example of files: File1_20151215 (this will be printed);
File2_20151214 (this will not be printed)
I want to write a script that runs through these directories and checks if Dir1 contains 4 files with string "20151215", Dir2 contains 3 files with string "20151215" and Dir3 & Dir4 contains 4 files with string "20151215". If the directories don't contain that number of files with that string, then I want to print those directories.
How do I do that? Please help!
UPDATE: I have an addition to this: there are also some files that are not in the format "Sometext_YYYYMMDD". For those, I used something like:
find . -name "FILENAME*" -mtime -1 -exec ls -ltr '{}' \;
to extract the timestamp when the file was created. But how do I add that to the script, so that if the timestamp is 15 Dec 2015, the file is also counted in the search?

You've got the find part; now you need to count how many files match the pattern. Since find prints one line per match, you can use "wc -l" to count the lines. Assign that to a variable you can use in a comparison, and you're 90% of the way there. E.g.
d1=$(find ./dir1 -name '*20151215*' | wc -l)
if [ "$d1" -ne 4 ]; then echo "dir1" ; fi
For extra credit, you can imagine turning this into a function with inputs of
Directory to search
Filename pattern to match on
How many matches to expect
Which would look like:
check_dir () {
    d1=$(find "$1" -name "*$2*" | wc -l)
    if [ "$d1" -ne "$3" ]; then echo "$1" ; fi
}
check_dir ./dir1 20151215 4
check_dir ./dir2 20151215 3
Update: with the new requirement to find files based either on the name of the file or on the last modification time (creation time isn't available), here are two approaches:
The first uses a fairly modern feature of find that isn't available in all versions, newermt:
check_dir () {
    d1=$2
    d2=$((d1+1))
    n=$(find "$1" \( -name "*$d1*" \) -o \( -newermt "$d1" ! -newermt "$d2" \) | wc -l)
    if [ "$n" -ne "$3" ]; then echo "$1" ; fi
}
check_dir ./dir1 20151215 4
check_dir ./dir2 20151215 3
Which looks a little confusing, but break it down into small steps and it makes sense:
d1=$2 # So d1=20151215
d2=$((d1+1)) # d2=20151216 (lucky you're specifying the date format this way!)
The find command now has two predicates, to match based on the filename or the modification time:
\( -name "*$d1*" \) # Matches filenames that contain 20151215
-o # Or
\( -newermt $d1 ! -newermt $d2 \)
The modification time is greater than midnight on the first day, and not greater than midnight on the next day
The second approach uses a couple of temp files and sets their timestamps with the -d option of the touch command:
#!/bin/bash
check_dir () {
    d1=$2
    d2=$((d1+1))
    f1=$(mktemp)
    f2=$(mktemp)
    touch -d "$d1" "$f1"
    touch -d "$d2" "$f2"
    n=$(find "$1" \( -name "*$d1*" \) -o \( -newer "$f1" ! -newer "$f2" \) | wc -l)
    if [ "$n" -ne "$3" ]; then echo "$1 = $n" ; fi
    rm -f "$f1" "$f2"
}
Again, it's lucky that the date is in YYYYMMDD form, since that works directly with touch's -d option. If not, you would need to do some string manipulation to get the date into the correct format for "touch -t".
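As a sketch of that manipulation (assuming GNU date; the input format "15 Dec 2015" and the file name ref_file are made up for illustration):

```shell
# Hypothetical input date in a non-YYYYMMDD format
d="15 Dec 2015"
# Reshape it into the [[CC]YY]MMDDhhmm form that "touch -t" expects (GNU date)
stamp=$(date -d "$d" +%Y%m%d0000)
echo "$stamp"        # 201512150000
touch -t "$stamp" ref_file && rm -f ref_file
```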

Related

Usage of for loop and if statement in bash

I am using the following code, but the final 'echo $dirname' gives empty output on the console:
for folderpath in $(find /u01/app/SrcFiles/commercial/ngdw/* -name "IQ*");
do
folder_count=$(ls -d "$folderpath"/* | wc -l)
echo -e "Total Date folder created : $folder_count in $folderpath \n"
if [ $folder_count -ne 0 ];
then
for dirpath in `find $folderpath/* -name "2*" `;
do
dirname=${dirpath##*/}
(( dirname <= 20210106 )) || continue
echo $dirname
done
fi
done
First I would calculate the date 3 months ago with the date command:
# with GNU date (for example on Linux)
mindate=$(date -d -3month +%Y%m%d)
# with BSD date (for example on macOS)
mindate=$(date -v -3m +%Y%m%d)
Then I would use a shell arithmetic comparison for determining the directories to remove:
# for dirpath in "$directory"/*
for dirpath in "$directory"/{20220310,20220304,20220210,20220203,20210403,20210405}
do
dirname=${dirpath##*/}
(( dirname <= mindate )) || continue
echo "$dirpath"
# rm -rf "$dirpath"
done
== doesn't do wildcard matching. You should do that in the for statement itself.
There's also no need to put * at the beginning of the wildcard, since the year is at the beginning of the directory name, not in the middle.
for i in "$directory"/202104*; do
if [ -d "$i" ]; then
echo "$i"
rm -rf "$i"
fi
done
The if statement serves two purposes:
If there are no matching directories, the wildcard expands to itself (unless you set the nullglob option), and you'll try to remove this nonexistent directory.
In case there are matching files rather than directories, they're skipped.
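A minimal sketch of the first point: with nullglob unset (the default), a non-matching wildcard is passed through as literal text, and the -d test is what skips it (the directory name here is made up):

```shell
for i in /no-such-dir-xyz/202104*; do
    if [ -d "$i" ]; then
        echo "would remove $i"
    else
        echo "skipping non-directory: $i"   # the unexpanded pattern itself
    fi
done
```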
Suggesting to find the directories that were created 90 days ago or earlier with the find command (strictly, -ctime is the inode change time, which is the closest find gets to a creation time):
find . -type d -ctime +90
If you want the directories created between 90 and 100 days ago:
find . -type d -ctime -100 -ctime +90
Once you have the correct folder list, feed it to the rm command:
rm -rf $(find . -type d -ctime +90)
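One caveat with the $(find ...) form: the unquoted expansion word-splits paths that contain spaces. A tiny demonstration (the directory name is made up), followed by a splitting-proof alternative using find's own -exec:

```shell
# An unquoted $(find ...) splits a path containing a space into two words
mkdir -p "old dir"
set -- $(find . -maxdepth 1 -type d -name 'old dir')
echo "$#"          # 2 words, not 1 path
rm -rf "./old dir"

# Splitting-proof alternative: let find invoke rm itself
# find . -type d -ctime +90 -exec rm -rf {} +
```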

Find and count the results, delete if it is less than x

I would like to search a directory and all its subdirectories for files that are structured like this: ABC.001.XYZ, ABC.001.DEF, ABC.002.XYZ and so forth.
It should search for all files beginning with ABC.001, count the results, and if it is less than x, delete all files beginning with that. Then move on to ABC.002 and so on.
dir=X
counter=1
while [ $counter -le 500 ]
do
if [ $(find ${dir} -type f -name 'ABC*' | wc -l) -eq 5 ]
then
for file in $(find ${dir} -type f -name 'ABC*')
do
/bin/rm -i ${file}
done
fi
((counter++))
done
My questions are:
I. How do I plug the counter variable into -name 'ABC*' so it increments (like a string placeholder)?
II. How do I make it so that if the counter is less than 10 or 100, I place 00 or 0 before it, so it actually searches for ABC001* instead of ABC1*?
You can use printf to print formatted numbers as in most languages:
printf "ABC%03d" "$counter"
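For a few counter values the padding looks like this (printf reuses the format string for each extra argument):

```shell
printf 'ABC%03d\n' 1 10 100
# ABC001
# ABC010
# ABC100
```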
Simple substitution can put this into the arguments to find. It's also worth mentioning that find can delete files directly, and (just personal preference) a for loop is probably neater:
#!/bin/bash
dir=X
for counter in $(seq 1 500); do
    if [[ $(find "$dir" -type f -name "$(printf 'ABC%03d*' "$counter")" | wc -l) -eq 5 ]]; then
        find "$dir" -type f -name "$(printf 'ABC%03d*' "$counter")" -delete
    fi
done

find emitting unexpected ".", making wc -l list more contents than expected

I'm trying to use the newer command as follows:
touch $HOME/mark.start -d "$d1"
touch $HOME/mark.end -d "$d2"
SF=$HOME/mark.start
EF=$HOME/mark.end
find . -newer $SF ! -newer $EF
But this gives me an output like this:
.
./File5
and counts it as 2 files; however, that directory only has 1 file, i.e., File5. Why is this happening, and how do I solve it?
UPDATE:
I'm actually trying to run the following script:
#!/bin/bash
check_dir () {
    d1=$2
    d2=$((d1+1))
    f1=$(mktemp)
    f2=$(mktemp)
    touch -d "$d1" "$f1"
    touch -d "$d2" "$f2"
    n=$(find "$1" \( -name "*$d1*" \) -o \( -newer "$f1" ! -newer "$f2" \) | wc -l)
    if [ "$n" -ne "$3" ]; then echo "$1 = $n" ; fi
    rm -f "$f1" "$f2"
}
That checks whether the directory has files that either have a particular date in YYYYMMDD format in their name, or were last modified on that date.
check_dir ./dir1 20151215 4
check_dir ./dir2 20151215 3
where in dir1 there should be 4 such files and if it is not true then it will print the actual number of files that is there.
So when the directory only has files with dates in their names, it checks them fine, but when the -newer check is involved, it always counts 1 extra file (which isn't even in the directory). Why is this happening?
The question asks why there's an extra . in the results from find, even when no file or directory by that name appears in a plain listing. The answer is simple: . always exists, even though it's hidden. Use ls -a to show hidden entries, and you'll see that it's present.
Your existing find command doesn't exempt the target directory itself -- . -- from being a legitimate result, which is why you're getting more results than you expect.
Add the following filter:
-mindepth 1 # only include content **under** the file or directory specified
...or, if you only want to count files, use...
-type f # only include regular files
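A minimal reproduction of the difference the filter makes (directory and file names are made up):

```shell
mkdir -p demo_dir && touch demo_dir/File5
find demo_dir | wc -l               # 2: demo_dir itself plus File5
find demo_dir -mindepth 1 | wc -l   # 1: only File5
rm -rf demo_dir
```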
Assuming GNU find, by the way, this all can be made far more efficient:
check_dir() {
    local d1 d2  # otherwise these variables leak into global scope
    d1=$2
    d2=$(gdate -d "+ 1 day $d1" '+%Y%m%d')  # assuming GNU date is installed as gdate
    n=$(find "$1" -mindepth 1 \
            '(' -name "*${d1}*" -o \
                '(' -newermt "$d1" '!' -newermt "$d2" ')' ')' \
            -printf '\n' | wc -l)
    if (( n != $3 )); then
        echo "$1 = $n"
    fi
}

How to locate the directory where the sum of the number of lines of regular file is greatest (in bash)

Hi, I'm new to Unix and bash, and I'd like to ask how I can do this:
The specified directory is given as arguments. Locate the directory
where the sum of the number of lines of regular file is greatest.
Browse all specific directories and their subdirectories. Amounts
count only for files that are directly in the directory.
I tried something, but it's not working properly.
while [ $# -ne 0 ];
do case "$1" in
-h) show_help ;;
-*) echo "Error: Wrong arguments" 1>&2; exit 1 ;;
*) directories=("$@"); break ;;
esac
shift
done
IFS='
'
amount=0
for direct in "${directories[@]}"; do
for subdirect in `find $direct -type d `; do
temp=`find "$subdirect" -type f -exec cat {} \; | wc -l | tr -s " "`
if [ $amount -lt $temp ]; then
amount=$temp
subdirect2=$subdirect
fi
done
echo Output: "'"$subdirect2 $amount"'"
done
The problem appears when I use this directory as the argument (just an example): /home/usr/first, which contains these files:
/home/usr/first/tmp/first.txt (50 lines)
/home/usr/first/tmp/second.txt (30 lines)
/home/usr/first/tmp1/one.txt (20 lines)
it gives me the output /home/usr/first/tmp1 100, which is wrong; it should be /home/usr/first/tmp 80.
I'd like to scan all directories and all their subdirectories in depth. Also, if multiple directories tie for the maximum, it should list them all.
Given your sample files, I'm going to assume you only want to look at the immediate subdirectories, not recurse down several levels:
max=-1
# the trailing slash limits the wildcard to directories only
for dir in */; do
count=0
for file in "$dir"/*; do
[[ -f "$file" ]] && (( count += $(wc -l < "$file") ))
done
if (( count > max )); then
max=$count
maxdir="$dir"
fi
done
echo "files in $maxdir have $max lines"
files in tmp/ have 80 lines
In the spirit of Unix (cough), here's an absolutely disgusting chain of pipes that I personally hate, but it's a lot of fun to construct :):
find . -mindepth 1 -maxdepth 1 -type d -exec sh -c 'find "$1" -maxdepth 1 -type f -print0 | wc -l --files0-from=- | tail -1 | { read a _ && echo "$a $1"; }' _ {} \; | sort -nr | head -1
Of course, don't use this unless you're mentally ill, use glenn jackman's nice answer instead.
You can have great control on find's unlimited filtering possibilities, too. Yay. But use glenn's answer!

Unix to verify file has no content and empty lines

How do I verify that a file has absolutely no content? [ -s $file ] tells me whether the file is non-empty (more than zero bytes), but how do I know if the file is truly empty, with no data at all, including empty lines?
$ cat sample.text
$ ls -lrt sample.text
-rw-r--r-- 1 testuser userstest 1 Jul 31 16:38 sample.text
When I "vi" the file, the bottom shows this: "sample.text" 1L, 1C
Your file might contain a newline character only.
Try this check:
[[ $(tr -d "\r\n" < file | wc -c) -eq 0 ]] && echo "File has no content"
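To see why this catches the newline-only case (the file name is made up):

```shell
printf '\n' > nl_only.txt
wc -c < nl_only.txt                  # 1: a plain size test would say "not empty"
tr -d '\r\n' < nl_only.txt | wc -c   # 0: empty once the newline is stripped
rm -f nl_only.txt
```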
A file of 0 size by definition has nothing in it, so you are good to go. However, you probably want to use:
if [ ! -s "$file" ]; then echo "Zero-sized and completely empty"; fi
Have fun!
Blank lines add data to the file and will therefore increase the file size, which means that just checking whether the file is 0 bytes is sufficient.
For a single file, you can use the built-in -s test (with test, [ or [[; [[ makes dealing with ! less awkward, but is bash-specific):
fn="file"
if [[ -f "$fn" && ! -s "$fn" ]]; then # -f is needed since -s will return false on missing files as well
echo "File '$fn' is empty"
fi
A (more) POSIX-shell-compatible way (the escaping of exclamation marks can be shell-dependent):
fn="file"
if test -f "$fn" && test \! -s "$fn"; then
echo "File '$fn' is empty"
fi
For multiple files, find is a better method.
For a single file you can do the following (it will print the filename if the file is empty):
find "$PWD" -maxdepth 1 -type f -name 'file' -size 0 -print
For multiple files matching the glob glob* (it will print the filenames if empty):
find "$PWD" -maxdepth 1 -type f -name 'glob*' -size 0 -print
To allow subdirectories:
find "$PWD" -type f -name 'glob*' -size 0 -print
Some find implementations do not require a directory as the first parameter (some, like the Solaris one, do). On most implementations the -print parameter can be omitted; if it is not specified, find defaults to printing matching files.
