I'm practicing writing small bash scripts. I have a habit of accidentally typing "teh" instead of "the". So now I want to start at a directory and go through all the files there to find how many occurrences of "teh" there are.
Here's what I have:
#!/bin/bash
for file in `find` ; do
if [ -f "${file}" ] ; then
for word in `cat "${file}"` ; do
if [ "${word}" = "teh" -o "${word}" = "Teh" ] ; then
echo "found one"
fi
done
fi
done
When I run it, I get
found one
found one
found one
found one
found one
found one
./findTeh: line 6: [: too many arguments
Why am I getting too many arguments. Am I not doing the if statement properly?
Thanks
The behavior of test with more than three arguments is, per POSIX, not well-defined and deprecated. That's because variable arguments can be treated as logical constructs with meaning to the test command itself in that case.
Instead, use the following, which will work on all POSIX shells, not only bash:
if [ "$word" = teh ] || [ "$word" = Teh ]; then
echo "Found one"
fi
...or, similarly, a case statement:
case $word in
[Tt]eh) echo "Found one" ;;
esac
Now, let's look at the actual standard underlying the test command:
>4 arguments:
The results are unspecified.
[OB XSI] [Option Start] On XSI-conformant systems, combinations of primaries and operators shall be evaluated using the precedence and associativity rules described previously. In addition, the string comparison binary primaries '=' and "!=" shall have a higher precedence than any unary primary. [Option End]
Note the OB flag: The use of a single test command with more than four arguments is obsolete, and is an optional part of the standard regardless (which not all shells are required to implement).
Finally, consider the following revision to your script, with various other bugs fixed (albeit using various bashisms, and thus not portable to all POSIX shells):
#!/bin/bash
# read a filename from find on FD 3
while IFS= read -r -d '' filename <&3; do
# read words from a single line, as long as there are more lines
while read -a words; do
# compare each word in the most-recently-read line with the target
for word in "${words[#]}"; do
[[ $word = [Tt]eh ]] && echo "Found one"
done
done <"$filename"
done 3< <(find . -type f -print0)
Some of the details:
By delimiting filenames with NULs, this works correctly with all possible filenames, including files with spaces or newlines in their names, which for file in $(find) does not.
Quoted array expansion, ie. for word in "${words[#]}", avoids glob expansion; with the old code, if * were a word in a file, then the code would subsequently be iterating over filenames in the current directory rather than over words in a file.
Using while read -a to read in a single line at a time both avoids the aforementioned glob expansion behavior, and also acts to constrain memory usage (if very large files are in play).
A common idiom to avoid this sort of problem is to add "x" on both sides of comparisons:
if [ "x${word}" = "xteh" -o "x${word}" = "xTeh" ] ; then
Related
for i in $(ls);do
if [ $i = '*.java' ];then
echo "I do something with the file $i"
fi
done
I want to loop through each file in the current folder and check if it matches a specific extension. The code above doesn't work, do you know why?
No fancy tricks needed:
for i in *.java; do
[ -f "$i" ] || break
...
done
The guard ensures that if there are no matching files, the loop will exit without trying to process a non-existent file name *.java.
In bash (or shells supporting something similar), you can use the nullglob option
to simply ignore a failed match and not enter the body of the loop.
shopt -s nullglob
for i in *.java; do
...
done
Some more detail on the break-vs-continue discussion in the comments. I consider it somewhat out of scope whether you use break or continue, because what the first loop is trying to do is distinguish between two cases:
*.java had no matches, and so is treated as literal text.
*.java had at least one match, and that match might have included an entry named *.java.
In case #1, break is fine, because there are no other values of $i forthcoming, and break and continue would be equivalent (though I find break more explicit; you're exiting the loop, not just waiting for the loop to exit passively).
In case #2, you still have to do whatever filtering is necessary on any possible matches. As such, the choice of break or continue is less relevant than which test (-f, -d, -e, etc) you apply to $i, which IMO is the wrong way to determine if you entered the loop "incorrectly" in the first place.
That is, I don't want to be in the position of examining the value of $i at all in case #1, and in case #2 what you do with the value has more to do with your business logic for each file, rather than the logic of selecting files to process in the first place. I would prefer to leave that logic to the individual user, rather than express one choice or the other in the question.
As an aside, zsh provides a way to do this kind of filtering in the glob itself. You can match only regular files ending with .java (and disable the default behavior of treating unmatched patterns as an error, rather than as literal text) with
for f in *.java(.N); do
...
done
With the above, you are guaranteed that if you reach the body of the loop, then $f expands to the name of a regular file. The . makes *.java match only regular files, and the N causes a failed match to expand to nothing instead of producing an error.
There are also other such glob qualifiers for doing all sorts of filtering on filename expansions. (I like to joke that zsh's glob expansion replaces the need to use find at all.)
Recursively add subfolders,
for i in `find . -name "*.java" -type f`; do
echo "$i"
done
Loop through all files ending with: .img, .bin, .txt suffix, and print the file name:
for i in *.img *.bin *.txt;
do
echo "$i"
done
Or in a recursive manner (find also in all subdirectories):
for i in `find . -type f -name "*.img" -o -name "*.bin" -o -name "*.txt"`;
do
echo "$i"
done
the correct answer is #chepner's
EXT=java
for i in *.${EXT}; do
...
done
however, here's a small trick to check whether a filename has a given extensions:
EXT=java
for i in *; do
if [ "${i}" != "${i%.${EXT}}" ];then
echo "I do something with the file $i"
fi
done
as #chepner says in his comment you are comparing $i to a fixed string.
To expand and rectify the situation you should use [[ ]] with the regex operator =~
eg:
for i in $(ls);do
if [[ $i =~ .*\.java$ ]];then
echo "I want to do something with the file $i"
fi
done
the regex to the right of =~ is tested against the value of the left hand operator and should not be quoted, ( quoted will not error but will compare against a fixed string and so will most likely fail"
but #chepner 's answer above using glob is a much more efficient mechanism.
I agree withe the other answers regarding the correct way to loop through the files. However the OP asked:
The code above doesn't work, do you know why?
Yes!
An excellent article What is the difference between test, [ and [[ ?] explains in detail that among other differences, you cannot use expression matching or pattern matching within the test command (which is shorthand for [ )
Feature new test [[ old test [ Example
Pattern matching = (or ==) (not available) [[ $name = a* ]] || echo "name does not start with an 'a': $name"
Regular Expression =~ (not available) [[ $(date) =~ ^Fri\ ...\ 13 ]] && echo "It's Friday the 13th!"
matching
So this is the reason your script fails. If the OP is interested in an answer with the [[ syntax (which has the disadvantage of not being supported on as many platforms as the [ command), I would be happy to edit my answer to include it.
EDIT: Any protips for how to format the data in the answer as a table would be helpful!
I found this solution to be quite handy. It uses the -or option in find:
find . -name \*.tex -or -name "*.png" -or -name "*.pdf"
It will find the files with extension tex, png, and pdf.
I have this bash script that looks like this
#!/bin/bash
for f in src/styles/*.less src/styles/brand/*.less
do
filename=$(basename "$f")
if [ ${filename:0:1} != '_' ]; then
lessc --no-color -x "${f}" "$(sed 's|^src/styles/|httpdocs/css/|;s/.less$/.css/' <<< $f)"
fi
done
It is supposed to compile a series of .less files in the style and brand directories, I think. When I run it though, I get this:
./compile_less.sh: line 6: [: too many arguments
Any idea what is going wrong here? I'm a newb with bash scripts and am using this one from a former co-worker.
Thanks!
There's an existing answer that addresses best practices, but doesn't explain why it's a too-many-arguments error.
"${filename:0:1}" needs double quotes if you want to ensure that it will always be one word, and not zero words or several words -- if it comes out empty, it's zero words, so you run [ != _ ].
Leaving out the quotes is even worse if the first character is a * -- then it expands to one word per file in the current directory, and that's far too many operands.
[ != _ ] is too many arguments because there's nothing in a position to be a binary operator, so [ expects to see only a single operand.
[ all your filenames here != _ ] is too many arguments for a far less subtle reason. :)
The problem with using
for f in src/styles/*.less src/styles/brand/*.less
is that if no file matches the then f will be src/styles/*.less src/styles/brand/*.less and filename will be *.less. This causes problem in the if statement.
You can check if file exists and continue:
...
[[ ! -e "${f}" ]] && continue # or any other appropriate action
filename=$(basename "$f")
...
Here is a small[but complete] part of my bash script that finds and outputs all files in mydir if the have the prefix from a stored array. Strange thing I notice is that this script works perfectly if I take out the "-maxdepth 1 -name" from the script else it only gives me the files with the prefix of the first element in the array.
It would be of great help if someone explained this to me. Sorry in advance if there is some thing obviously silly that I'm doing. I'm relatively new to scripting.
#!/bin/sh
DIS_ARRAY=(A B C D)
echo "Array is : "
echo ${DIS_ARRAY[*]}
for dis in $DIS_ARRAY
do
IN_FILES=`find /mydir -maxdepth 1 -name "$dis*.xml"`
for file in $IN_FILES
do
echo $file
done
done
Output:
/mydir/Abc.xml
/mydir/Ab.xml
/mydir/Ac.xml
Expected Output:
/mydir/Abc.xml
/mydir/Ab.xml
/mydir/Ac.xml
/mydir/Bc.xml
/mydir/Cb.xml
/mydir/Dc.xml
The loop is broken either way. The reason why
IN_FILES=`find mydir -maxdepth 1 -name "$dis*.xml"`
works, whereas
IN_FILES=`find mydir "$dis*.xml"`
doesn't is because in the first one, you have specified -name. In the second one, find is listing all the files in mydir. If you change the second one to
IN_FILES=`find mydir -name "$dis*.xml"`
you will see that the loop isn't working.
As mentioned in the comments, the syntax that you are currently using $DIS_ARRAY will only give you the first element of the array.
Try changing your loop to this:
for dis in "${DIS_ARRAY[#]}"
The double quotes around the expansion aren't strictly necessary in your specific case, but required if the elements in your array contained spaces, as demonstrated in the following test:
#!/bin/bash
arr=("a a" "b b")
echo using '$arr'
for i in $arr; do echo $i; done
echo using '${arr[#]}'
for i in ${arr[#]}; do echo $i; done
echo using '"${arr[#]}"'
for i in "${arr[#]}"; do echo $i; done
output:
using $arr
a
a
using ${arr[#]}
a
a
b
b
using "${arr[#]}"
a a
b b
See this related question for further details.
#TomFenech's answer solves your problem, but let me suggest other improvements:
#!/usr/bin/env bash
DIS_ARRAY=(A B C D)
echo "Array is : "
echo ${DIS_ARRAY[*]}
for dis in "${DIS_ARRAY[#]}"
do
for file in "/mydir/$dis"*.xml
do
if [ -f "$file" ]; then
echo "$file"
fi
done
done
Your shebang line references sh, but your question is tagged bash - unless you need POSIX compliance, use a bash shebang line to take advantage of all that bash has to offer
To match files located directly in a given directory (i.e., if you don't need to traverse an entire subtree), use a glob (filename pattern) and rely on pathname expansion as in my code above - no need for find and command substitution.
Note that the wildcard char. * is UNquoted to ensure pathname expansion.
Caveat: if no matching files are found, the glob is left untouched (assuming the nullglob shell option is OFF, which it is by default), so the loop is entered once, with an invalid filename (the unexpanded glob) - hence the [ -f "$file" ] conditional to ensure that an actual match was found (as an aside: using bashisms, you could use [[ -f $file ]] instead).
for i in $(ls);do
if [ $i = '*.java' ];then
echo "I do something with the file $i"
fi
done
I want to loop through each file in the current folder and check if it matches a specific extension. The code above doesn't work, do you know why?
No fancy tricks needed:
for i in *.java; do
[ -f "$i" ] || break
...
done
The guard ensures that if there are no matching files, the loop will exit without trying to process a non-existent file name *.java.
In bash (or shells supporting something similar), you can use the nullglob option
to simply ignore a failed match and not enter the body of the loop.
shopt -s nullglob
for i in *.java; do
...
done
Some more detail on the break-vs-continue discussion in the comments. I consider it somewhat out of scope whether you use break or continue, because what the first loop is trying to do is distinguish between two cases:
*.java had no matches, and so is treated as literal text.
*.java had at least one match, and that match might have included an entry named *.java.
In case #1, break is fine, because there are no other values of $i forthcoming, and break and continue would be equivalent (though I find break more explicit; you're exiting the loop, not just waiting for the loop to exit passively).
In case #2, you still have to do whatever filtering is necessary on any possible matches. As such, the choice of break or continue is less relevant than which test (-f, -d, -e, etc) you apply to $i, which IMO is the wrong way to determine if you entered the loop "incorrectly" in the first place.
That is, I don't want to be in the position of examining the value of $i at all in case #1, and in case #2 what you do with the value has more to do with your business logic for each file, rather than the logic of selecting files to process in the first place. I would prefer to leave that logic to the individual user, rather than express one choice or the other in the question.
As an aside, zsh provides a way to do this kind of filtering in the glob itself. You can match only regular files ending with .java (and disable the default behavior of treating unmatched patterns as an error, rather than as literal text) with
for f in *.java(.N); do
...
done
With the above, you are guaranteed that if you reach the body of the loop, then $f expands to the name of a regular file. The . makes *.java match only regular files, and the N causes a failed match to expand to nothing instead of producing an error.
There are also other such glob qualifiers for doing all sorts of filtering on filename expansions. (I like to joke that zsh's glob expansion replaces the need to use find at all.)
Recursively add subfolders,
for i in `find . -name "*.java" -type f`; do
echo "$i"
done
Loop through all files ending with: .img, .bin, .txt suffix, and print the file name:
for i in *.img *.bin *.txt;
do
echo "$i"
done
Or in a recursive manner (find also in all subdirectories):
for i in `find . -type f -name "*.img" -o -name "*.bin" -o -name "*.txt"`;
do
echo "$i"
done
the correct answer is #chepner's
EXT=java
for i in *.${EXT}; do
...
done
however, here's a small trick to check whether a filename has a given extensions:
EXT=java
for i in *; do
if [ "${i}" != "${i%.${EXT}}" ];then
echo "I do something with the file $i"
fi
done
as #chepner says in his comment you are comparing $i to a fixed string.
To expand and rectify the situation you should use [[ ]] with the regex operator =~
eg:
for i in $(ls);do
if [[ $i =~ .*\.java$ ]];then
echo "I want to do something with the file $i"
fi
done
the regex to the right of =~ is tested against the value of the left hand operator and should not be quoted, ( quoted will not error but will compare against a fixed string and so will most likely fail"
but #chepner 's answer above using glob is a much more efficient mechanism.
I agree withe the other answers regarding the correct way to loop through the files. However the OP asked:
The code above doesn't work, do you know why?
Yes!
An excellent article What is the difference between test, [ and [[ ?] explains in detail that among other differences, you cannot use expression matching or pattern matching within the test command (which is shorthand for [ )
Feature new test [[ old test [ Example
Pattern matching = (or ==) (not available) [[ $name = a* ]] || echo "name does not start with an 'a': $name"
Regular Expression =~ (not available) [[ $(date) =~ ^Fri\ ...\ 13 ]] && echo "It's Friday the 13th!"
matching
So this is the reason your script fails. If the OP is interested in an answer with the [[ syntax (which has the disadvantage of not being supported on as many platforms as the [ command), I would be happy to edit my answer to include it.
EDIT: Any protips for how to format the data in the answer as a table would be helpful!
I found this solution to be quite handy. It uses the -or option in find:
find . -name \*.tex -or -name "*.png" -or -name "*.pdf"
It will find the files with extension tex, png, and pdf.
I have an assignment to write a bash program which if I type in the following:
-bash-4.1$ ./sample.sh path regex keyword
that will result something like that:
path/sample.txt:12
path/sample.txt:34
path/dir/sample1.txt:56
path/dir/sample2.txt:78
The numbers are the line number of the search results. I have absolutely no idea how can I achieve this in bash, without using find or grep -r. I am allowed to use grep, sed, awk, …
Break the problem into parts.
First, you need to obtain the file names to search in. How can you list the files in a directory and its subdirectories? (Hint: there's a glob pattern for that.)
You need to iterate over the files. What form of loop should this be?
For each file, you need to read each line from the file in turn. There's a builtin for that.
For each line, you need to test whether the line matches the specified regexp. There's a construct for that.
You need to maintain a counter of the number of lines read in a file to be able to print the line number.
Search for globstar in the bash manual.
See https://unix.stackexchange.com/questions/18886/why-is-while-ifs-read-used-so-often-instead-of-ifs-while-read/18936#18936 regarding while read loops.
shopt -s globstar # to enable **/
GLOBIGNORE=.:.. # to match dot files
dir=$1; regex=$2
for file in "$dir"/**/*; do
[[ -f $file ]] || continue
n=1
while IFS= read -r line; do
if [[ $line =~ $regex ]]; then
echo "$file:$n"
fi
((++n))
done <"$file"
done
It's possible that your teacher didn't intend you to use the globstar feature, which is a relatively recent addition to bash (appeared in version 4.0). If so, you'll need to write a recursive function to recurse into subdirectories.
traverse_directory () {
for x in "$1"/*; do
if [ -d "$x" ]; then
traverse_directory "$x"
elif [ -f "$x" ]; then
grep "$regexp" "$x"
fi
done
}
Putting this into practice:
#!/bin/sh
regexp="$2"
traverse_directory "$1"
Follow-up exercise: the glob pattern * omits files whose name begins with a . (dot files). You can easily match dot files as well by adding looping over .* as well, i.e. for x in .* *; do …. However, this throws the function into an infinite loop as it recurses forever into . (and also ..). How can you change the function to work with dot files as well?
while read
do
[[ $REPLY =~ foo ]] && echo $REPLY
done < file.txt