Counting the number of files in a directory in bash

I have a bash script where I'm trying to find out the number of files in a directory and perform an addition operation on it as well.
But while doing the same I'm getting the error as follows:
admin> ./fileCount.sh
1
./fileCount.sh: line 6: 22 + : syntax error: operand expected (error token is " ")
My script is as shown:
#!/usr/bin/bash
Var1=22
Var2= ls /stud_data/Input_Data/test3 | grep ".txt" | wc -l
Var3= $(($Var1 + $Var2))
echo $Var3
Can anyone point out where the error is?

A little aside
As @devnull has already answered the question ("point out where the error is"),
just some more ideas:
General unix
For this kind of browsing there is a very powerful command, find, that lets you recursively find exactly what you're searching for:
Var2=`find /stud_data/Input_Data/test3 -name '*.txt' | wc -l`
If you don't want this to be recursive:
Var2=`find /stud_data/Input_Data/test3 -maxdepth 1 -name '*.txt' | wc -l`
If you want regular files only (meaning no symlinks or directories):
Var2=`find /stud_data/Input_Data/test3 -maxdepth 1 -type f -name '*.txt' | wc -l`
And so on... Please read the man page: man find.
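As an aside, wc -l counts lines, so a filename containing a newline would be counted twice. If that matters, a minimal sketch with GNU find (the -printf action is a GNU extension) prints one dot per file and counts characters instead:
# one dot per matching file, so newlines inside names cannot skew the count
Var2=`find /stud_data/Input_Data/test3 -maxdepth 1 -type f -name '*.txt' -printf '.' | wc -c`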
Particular bash solutions
As your question is about bash, there are some bashisms you could use to make this a lot quicker:
#!/bin/bash
Var1=22
VarLs=(/stud_data/Input_Data/test3/*.txt)
[ -e "${VarLs[0]}" ] && Var2=${#VarLs[@]} || Var2=0
Var3=$(( Var1 + Var2 ))
echo $Var3
# Uncomment next line to see more about current environment
# set | grep ^Var
Where bash expansion will translate /path/*.txt into an array containing all filenames matching the glob pattern.
If there is no file matching the pattern, VarLs will only contain the pattern itself.
So the test -e corrects for this: if the first file of the returned list exists, then assign the number of elements in the list (${#VarLs[@]}) to Var2; else assign 0 to Var2.
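Alternatively (a small sketch, not from the original answer), enabling nullglob makes a non-matching pattern expand to an empty list, so the -e test is no longer needed:
#!/bin/bash
shopt -s nullglob                 # non-matching globs expand to nothing
VarLs=(/stud_data/Input_Data/test3/*.txt)
Var2=${#VarLs[@]}                 # 0 when nothing matched
Var3=$(( 22 + Var2 ))
echo $Var3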

Can anyone point out where the error is?
You shouldn't have spaces around =.
You probably wanted to use command substitution to capture the result in Var2.
Try:
Var1=22
Var2=$(ls /stud_data/Input_Data/test3 | grep ".txt" | wc -l)
Var3=$(($Var1 + $Var2))
echo $Var3
Moreover, you could also say
Var3=$((Var1 + Var2))

Related

How to get list of certain strings in a list of files using bash?

The title is maybe not really descriptive, but I couldn't find a more concise way to describe the problem.
I have a directory containing different files which have a name that e.g. looks like this:
{some text}2019Q2{some text}.pdf
So the filenames have somewhere in the name a year followed by a capital Q and then another number. The other text can be anything, but it won't contain anything matching the format year-Q-number. There will also be no numbers directly before or after this format.
I can work something out to get this from one filename, but I actually need a 'list' so I can do a for-loop over this in bash.
So, if my directory contains the files:
costumerA_2019Q2_something.pdf
costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerB_2019Q3_something.pdf
costumerC_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerD2020Q2something.pdf
I want a for loop that goes over 2019Q2, 2019Q3, 2020Q1, and 2020Q2.
EDIT:
This is what I have so far. It is able to extract the substrings, but it still produces duplicates, and since I'm already inside the loop I don't see how I can remove them.
find original/*.pdf -type f -print0 | while IFS= read -r -d '' line; do
    echo $line | grep -oP '[0-9]{4}Q[0-9]'
done
# list all _filenames_ that end with .pdf from the folder original
find original -maxdepth 1 -name '*.pdf' -type f -printf '%f\n' |
# extract the pattern
sed 's/.*\([0-9]\{4\}Q[0-9]\).*/\1/' |
# iterate
while IFS= read -r file; do
    echo "$file"
done
I used -printf '%f\n' to print just the filename instead of the full path. GNU sed has a -z option that you can use with -print0 (or -printf '%f\0').
With how you wanted to do this, if your files have no newlines in their names, there is no need to loop over the list in bash (as a rule of thumb, try to avoid while read line, it's very slow):
find original -maxdepth 1 -name '*.pdf' -type f | grep -oP '[0-9]{4}Q[0-9]'
or with a zero-separated stream:
find original -maxdepth 1 -name '*.pdf' -type f -print0 |
grep -zoP '[0-9]{4}Q[0-9]' | tr '\0' '\n'
If you want to remove duplicate elements from the list, pipe it to sort -u.
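Putting that together, a possible end-to-end pipeline (same assumptions as above: a flat directory named original, no newlines in the names) would be:
find original -maxdepth 1 -name '*.pdf' -type f |
    grep -oP '[0-9]{4}Q[0-9]' |
    sort -u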
Try this, in bash:
~ > $ ls
costumerA_2019Q2_something.pdf costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf other.pdf
costumerA_2020Q1_something.pdf someother.file.txt
~ > $ for x in `(ls)`; do [[ ${x} =~ [0-9]Q[1-4] ]] && echo $x; done;
costumerA_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerB_2019Q2_something.pdf
~ > $ (for x in *; do [[ ${x} =~ ([0-9]{4}Q[1-4]).+pdf ]] && echo ${BASH_REMATCH[1]}; done;) | sort -u
2019Q2
2019Q3
2020Q1
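If you then need the for loop the question asks for, one sketch (bash 4+ for mapfile; names here are illustrative) collects the unique quarters into an array first:
mapfile -t quarters < <(
    for x in *.pdf; do
        [[ $x =~ ([0-9]{4}Q[1-4]) ]] && echo "${BASH_REMATCH[1]}"
    done | sort -u
)
for q in "${quarters[@]}"; do
    echo "processing quarter $q"    # do the real work here
done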

Finding longest length of filename using wc and assigning variable

cd subdir1/subdir2
ineedthis=$(find subdir3/ -name "*.csv" | tr ' ' '_') ## assigning name of the file into this variable
echo -n $ineedthis | wc -c
I wanted to see the length of each filename by assigning a variable called ineedthis and replacing spaces so that I don't have any spaces between the names. Then I tried to use echo -n to print the names only and count the characters to find the length of each name. However, when I use wc -c on the echo statement, it gives me the total number of characters of the whole chunk instead of the length of each filename.
What I was hoping was:
# numbers indicating the length of filename
9 subdir3/saying/hello.csv
6 subdir3/saying/hi.csv
9 subdir3/nay/noway.csv
12 subdir3/nay/nomethod.csv
16 subdir3/nay/you_dont_say.csv
find subdir3/ -name "*.csv" |
while IFS= read -r path; do
    file=$(basename "$path")
    len=$(echo -n "$file" | wc -c)
    echo $len "$path"
done
while loops over each path found by find
basename strips off everything up to the final / (optionally a suffix can also be removed; bash provides builtins like ${path##*/} and ${path%%.csv} that are similar)
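A pure-bash variant of the same loop (a sketch, assuming no newlines in the paths) uses ${#...} for the length instead of spawning echo and wc per file:
find subdir3/ -name "*.csv" |
while IFS= read -r path; do
    file=${path##*/}        # like basename: strip everything up to the final /
    echo "${#file} $path"   # ${#file} is the character length of the filename
done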

Find files with identical content [duplicate]

This question already has answers here:
When to wrap quotes around a shell variable?
(5 answers)
Compare files with each other within the same directory
(6 answers)
Closed 4 years ago.
Answer to my question using Kubator command line :
#Function that shows the files having the same content in the current directory
showDuplicates (){
    last_file=''
    while read -r f1_hash f1_name; do
        if [ "$last_file" != "$f1_hash" ]; then
            echo "The following files have the exact same content :"
            echo "$f1_name"
            while read -r f2_hash f2_name; do
                if [ "$f1_hash" == "$f2_hash" ] && [ "$f1_name" != "$f2_name" ]; then
                    echo "$f2_name"
                fi
            done < <(find ./ -maxdepth 1 -type f -print0 | xargs -0 md5sum | sort -k1,32 | uniq -w32 -D)
        fi
        last_file="$f1_hash"
    done < <(find ./ -maxdepth 1 -type f -print0 | xargs -0 md5sum | sort -k1,32 | uniq -w32 -D)
}
Original question :
I've seen some discussions about what I'm about to ask, but I have trouble understanding the mechanics behind the proposed solutions and I have not been able to solve my problem, which follows.
I want to make a function to compare files; for that, naively, I've tried the following:
#somewhere I use that to get the files paths
files_to_compare=$(find $base_path -maxdepth 1 -type f)
files_to_compare=( $files_to_compare )
#then I pass files_to_compare as an argument to the following function
showDuplicates (){
    files_to_compare=${1}
    n_files=$(( ${#files_to_compare[@]} ))
    for (( i=0; i < $n_files ; i=i+1 )); do
        for (( j=i+1; j < $n_files ; j=j+1 )); do
            sameContent "${files_to_compare[i]}" "${files_to_compare[j]}"
            r=$?
            if [ $r -eq 1 ]; then
                echo "The following files have the same content :"
                echo ${files_to_compare[i]}
                echo ${files_to_compare[j]}
            fi
        done
    done
}
The function 'sameContent' takes the absolute paths of two files and makes use of different commands (du, wc, diff) to return 1 or 0 depending on whether the files have the same content.
The incorrectness of that code showed up with file names containing spaces, but I've since read that this is not the way to manipulate files in bash.
On https://unix.stackexchange.com/questions/392393/bash-moving-files-with-spaces and some other pages I've read that the correct way to go is to use code that looks like this:
$ while IFS= read -r file; do echo "$file"; done < files
I can't seem to understand what lies behind that bit of code or how I could use it to solve my problem, particularly because I want/need to use intricate loops.
I'm new to bash and it seems to be a common problem, but still, if someone was kind enough to give me some insight into how that works, that would be wonderful.
p.s.: please excuse the probable grammar mistakes
How about using md5sum to compare the content of the files in your folder instead? That's safer and the standard way. Then you would need only something like this:
find ./ -type f -print0 | xargs -0 md5sum | sort -k1,32 | uniq -w32 -D
What it does:
find finds all files (-type f) in the current folder (./) and separates the output with null bytes (-print0), which is needed for special characters like spaces in filenames (like the moving-files-with-spaces question you mention)
xargs takes the output from find separated by null bytes (-0) and runs md5sum on the files
sort sorts the output by positions 1-32 (which is the md5 hash): -k1,32
uniq makes the output unique by the first 32 characters (the md5 hash) with -w32 and keeps only duplicated lines with -D
Output example:
7a2e203cec88aeffc6be497af9f4891f ./file1.txt
7a2e203cec88aeffc6be497af9f4891f ./folder1/copy_of_file1.txt
e97130900329ccfb32516c0e176a32d5 ./test.log
e97130900329ccfb32516c0e176a32d5 ./test_copy.log
If performance is crucial, this can be tuned to group files by size first and only then compare md5sums, or to call mv, rm, etc. on the duplicates.
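A rough sketch of that size-first tuning (assuming GNU find/xargs and filenames without newlines; the temporary file is only there for the two-pass awk):
sizes=$(mktemp)
# pass 1: record "size<TAB>path" for every file
find ./ -maxdepth 1 -type f -printf '%s\t%p\n' > "$sizes"
# pass 2: hash only files whose size occurs more than once
awk -F'\t' 'NR==FNR {count[$1]++; next} count[$1] > 1 {print $2}' "$sizes" "$sizes" |
    xargs -r -d '\n' md5sum | sort -k1,32 | uniq -w32 -D
rm -f "$sizes"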

Bash error: Division by zero error when using directory path

I'm writing a bash script that checks the number of files in a directory and, if it's over a certain number, does something. The code is here:
DIR=/home/inventory
waiting="$((ls ${DIR}/waiting/ -1 | wc -l))"
echo $waiting
if [ $waiting -gt 3 ]
then
(DO SOME STUFF HERE)
fi
The error I am getting points at this line:
waiting="$((ls ${DIR}/waiting/ -1 | wc -l))"
Specifically, the error is:
division by 0 (error token is "/home/inventory/waiting/ -1 | wc -l")
I thought trying to put the number of files in this directory into a variable would work using $(()).
Does anyone have an idea why this is failing?
Many TIA.....
Jane
Use single parentheses:
waiting="$(ls ${DIR}/waiting/ -1 | wc -l)"
$(( ... )) is used to perform arithmetic calculations.
From the man page:
((expression))
The expression is evaluated according to the rules described below under ARITHMETIC EVALUATION. If the value of the expression is non-zero, the return status is 0; otherwise the return status is 1. This is exactly equivalent to let "expression".
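A tiny illustration of the difference between the two expansions (the values here are just an example):
now=$(date +%s)        # $( )  : command substitution, captures the command's output
later=$(( now + 60 ))  # $(( )): arithmetic evaluation on numbers and variables
echo "$now -> $later"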
It is not recommended to use the output of ls to count the number of files/directories in a directory, as file names may contain newlines or glob characters.
Here is one example of doing it safely using gnu find:
dir=/home/inventory
waiting=$(find "$dir" -mindepth 1 -maxdepth 1 -printf '.' | wc -c)
If you don't have gnu find then use:
waiting=$(find "$dir" -mindepth 1 -maxdepth 1 -exec printf '.' \; | wc -c)
Another alternative is to pipe the waiting variable through awk to check for a value greater than 3, like so:
DIR=/home/inventory
waiting="$(ls ${DIR}/waiting/ -1 | wc -l)"
echo $waiting
res=$(echo $waiting | awk '{ if ( $0 > 3 ) print "OK" }')
if [ "$res" = "OK" ]
then
(DO SOME STUFF HERE)
fi
In case someone has a similar problem even without using the compound command ((..)):
be sure your variables are not defined using the let keyword, as that would force shell arithmetic evaluation too.
From the reference:
The shell allows arithmetic expressions to be evaluated, as one of the shell expansions or by using the (( compound command, the let builtin, or the -i option to the declare builtin.
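For illustration (a hedged sketch reusing the question's path), declaring the variable as an integer reproduces the same class of error, because the string is then arithmetic-evaluated and /home/... is parsed as a division:
declare -i waiting                                 # -i forces arithmetic evaluation on assignment
waiting='ls /home/inventory/waiting/ -1 | wc -l'   # -> a "division by 0" style error, not a command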

Recursively count specific files BASH

My goal is to write a script to recursively search through the current working directory and its subdirectories and print out a count of the number of ordinary files, a count of the directories, a count of block special files, a count of character special files, a count of FIFOs, and a count of symbolic links. I have to use condition tests with [[ ]]. Problem is I am not quite sure how to even start.
I tried something like the following to search for all ordinary files, but I'm not sure how recursion exactly works in bash scripting:
function searchFiles(){
if [[ -f /* ]]; then
return 1
fi
}
searchFiles
echo "Number of ordinary files $?"
but I get 0 as a result. Can anyone help with this?
Why would you not use find?
$ # Files
$ find . -type f | wc -l
327
$ # Directories
$ find . -type d | wc -l
64
$ # Block special
$ find . -type b | wc -l
0
$ # Character special
$ find . -type c | wc -l
0
$ # named pipe
$ find . -type p | wc -l
0
$ # symlink
$ find . -type l | wc -l
0
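As an aside, with GNU find you can get all six counts in a single pass (%y prints the file-type letter: f, d, b, c, p, l):
$ find . -printf '%y\n' | sort | uniq -c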
Something to get you started:
#!/bin/bash
directory=0
file=0
total=0
for a in *
do
    if test -d "$a"; then
        directory=$(($directory+1))
    else
        file=$(($file+1))
    fi
    total=$(($total+1))
    echo $a
done
echo Total directories: $directory
echo Total files: $file
echo Total: $total
No recursion here though; for that you could resort to ls -lR or similar. But then again, if you are going to use an external program, you should use find, since that's what it's designed for.
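Since the assignment requires [[ ]] tests and recursion, here is a minimal sketch of how a recursive shell function could look (names are illustrative; hidden entries are caught by the second glob, except ..-prefixed ones):
#!/bin/bash
files=0 dirs=0 blocks=0 chars=0 fifos=0 links=0

countEntries() {
    local entry
    for entry in "$1"/* "$1"/.[!.]*; do
        [[ -e $entry || -L $entry ]] || continue        # skip literal globs when nothing matched
        if   [[ -L $entry ]]; then ((links++))          # test -L before -d: -d follows symlinks
        elif [[ -d $entry ]]; then ((dirs++)); countEntries "$entry"
        elif [[ -b $entry ]]; then ((blocks++))
        elif [[ -c $entry ]]; then ((chars++))
        elif [[ -p $entry ]]; then ((fifos++))
        elif [[ -f $entry ]]; then ((files++))
        fi
    done
}

countEntries .
echo "files=$files dirs=$dirs block=$blocks char=$chars fifo=$fifos symlink=$links"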
