Loop through all the files with a specific extension - bash

for i in $(ls);do
if [ $i = '*.java' ];then
echo "I do something with the file $i"
fi
done
I want to loop through each file in the current folder and check if it matches a specific extension. The code above doesn't work, do you know why?

No fancy tricks needed:
for i in *.java; do
[ -f "$i" ] || break
...
done
The guard ensures that if there are no matching files, the loop will exit without trying to process a non-existent file name *.java.
In bash (or shells supporting something similar), you can use the nullglob option
to simply ignore a failed match and not enter the body of the loop.
shopt -s nullglob
for i in *.java; do
...
done
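As a minimal sketch of how the nullglob option and the -f guard can be combined, the loop can be wrapped in a small function. The name count_java is hypothetical, and running the body in a subshell (so that nullglob and the cd do not leak into the caller) is an assumption made for illustration, not part of the answer above.

```shell
# count_java DIR: print how many regular *.java files DIR contains.
# Hypothetical helper; the ( ) body runs in a subshell, so neither the cd
# nor the nullglob setting affects the calling shell.
count_java() (
    cd "$1" || exit 1
    shopt -s nullglob             # an unmatched *.java now expands to nothing
    n=0
    for f in *.java; do
        [ -f "$f" ] || continue   # skip directories that happen to end in .java
        n=$((n + 1))
    done
    echo "$n"
)
```

Calling count_java . prints the count for the current directory; with no matching files the loop body never runs and the function prints 0.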
Some more detail on the break-vs-continue discussion in the comments. I consider it somewhat out of scope whether you use break or continue, because what the first loop is trying to do is distinguish between two cases:
1. *.java had no matches, and so is treated as literal text.
2. *.java had at least one match, and that match might have included an entry named *.java.
In case #1, break is fine, because there are no other values of $i forthcoming, and break and continue would be equivalent (though I find break more explicit; you're exiting the loop, not just waiting for the loop to exit passively).
In case #2, you still have to do whatever filtering is necessary on any possible matches. As such, the choice of break or continue is less relevant than which test (-f, -d, -e, etc) you apply to $i, which IMO is the wrong way to determine if you entered the loop "incorrectly" in the first place.
That is, I don't want to be in the position of examining the value of $i at all in case #1, and in case #2 what you do with the value has more to do with your business logic for each file, rather than the logic of selecting files to process in the first place. I would prefer to leave that logic to the individual user, rather than express one choice or the other in the question.
As an aside, zsh provides a way to do this kind of filtering in the glob itself. You can match only regular files ending with .java (and disable the default behavior of treating unmatched patterns as an error, rather than as literal text) with
for f in *.java(.N); do
...
done
With the above, you are guaranteed that if you reach the body of the loop, then $f expands to the name of a regular file. The . makes *.java match only regular files, and the N causes a failed match to expand to nothing instead of producing an error.
There are also other such glob qualifiers for doing all sorts of filtering on filename expansions. (I like to joke that zsh's glob expansion replaces the need to use find at all.)

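To recurse into subfolders as well, use find:
# -print0 and read -d '' keep file names containing spaces or newlines intact
find . -name "*.java" -type f -print0 | while IFS= read -r -d '' i; do
echo "$i"
done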

Loop through all files ending with: .img, .bin, .txt suffix, and print the file name:
for i in *.img *.bin *.txt;
do
echo "$i"
done
Or recursively (find also searches all subdirectories):
# the escaped parentheses make -type f apply to all three -name tests
find . -type f \( -name "*.img" -o -name "*.bin" -o -name "*.txt" \) -print0 |
while IFS= read -r -d '' i;
do
echo "$i"
done

The correct answer is @chepner's:
EXT=java
for i in *.${EXT}; do
...
done
However, here's a small trick to check whether a filename has a given extension:
EXT=java
for i in *; do
if [ "${i}" != "${i%.${EXT}}" ];then
echo "I do something with the file $i"
fi
done
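The trick relies on ${var%suffix}, which removes a trailing match: if stripping the suffix changes the name, the name ended in that suffix. A minimal sketch of the same test as a reusable function follows; the name has_ext and the sample file names are hypothetical.

```shell
# has_ext NAME EXT: succeed if NAME ends in ".EXT".
# Hypothetical helper built on ${var%suffix}; quoting "$2" inside the
# expansion makes the extension match literally rather than as a pattern.
has_ext() {
    [ "$1" != "${1%."$2"}" ]
}

# Sample names, chosen only for illustration.
for name in Main.java notes.txt java archive.java.bak; do
    if has_ext "$name" java; then
        echo "$name has the extension"
    fi
done
```

Of the four sample names only Main.java passes: "java" has no dot before the suffix, and archive.java.bak does not end in .java.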

As @chepner says in his comment, you are comparing $i to a fixed string.
To rectify this, you can use [[ ]] with the regex operator =~, e.g.:
for i in *; do
if [[ $i =~ \.java$ ]]; then
echo "I want to do something with the file $i"
fi
done
The regex to the right of =~ is tested against the value of the left-hand operand and should not be quoted (quoting it does not cause an error, but makes the shell compare against a fixed string, so the match will most likely fail).
But @chepner's answer above, using a glob, is a much more efficient mechanism.

I agree with the other answers regarding the correct way to loop through the files. However, the OP asked:
The code above doesn't work, do you know why?
Yes!
An excellent article, "What is the difference between test, [ and [[?", explains in detail that, among other differences, you cannot use pattern matching or regular-expression matching within the test command (for which [ is a synonym).
Feature                      new test [[   old test [        Example
Pattern matching             = (or ==)     (not available)   [[ $name = a* ]] || echo "name does not start with an 'a': $name"
Regular expression matching  =~            (not available)   [[ $(date) =~ ^Fri\ ...\ 13 ]] && echo "It's Friday the 13th!"
So this is the reason your script fails. If the OP is interested in an answer with the [[ syntax (which has the disadvantage of not being supported on as many platforms as the [ command), I would be happy to edit my answer to include it.
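To make the difference concrete, here is a small sketch that runs the same name through the three kinds of test. The variable names are hypothetical, and the [[ ]] form assumes bash (or another shell that supports it).

```shell
name="Main.java"

# Inside [ ], the quoted right-hand side is an ordinary string, so this
# asks whether the name is literally "*.java".
if [ "$name" = '*.java' ]; then literal=yes; else literal=no; fi

# Inside [[ ]], an unquoted right-hand side of = is a glob pattern.
if [[ $name = *.java ]]; then pattern=yes; else pattern=no; fi

# case also does pattern matching, and works in any POSIX shell.
case $name in
    *.java) portable=yes ;;
    *)      portable=no ;;
esac

echo "literal=$literal pattern=$pattern portable=$portable"
```

This prints literal=no pattern=yes portable=yes, which is why the original [ $i = '*.java' ] test never succeeds for a real file name.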

I found this solution to be quite handy. It uses the -or option in find:
find . -name \*.tex -or -name "*.png" -or -name "*.pdf"
It will find the files with extension tex, png, and pdf.
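One thing to watch when combining these -name tests with another test such as -type f: -o binds less tightly than the implicit "and" between tests, so the alternatives need to be grouped. A minimal sketch, assuming a hypothetical helper name list_by_ext:

```shell
# list_by_ext DIR: print regular files under DIR ending in .tex, .png or .pdf.
# Hypothetical helper. The escaped parentheses group the -name alternatives so
# that -type f applies to all of them; -o is the POSIX spelling of -or.
list_by_ext() {
    find "$1" -type f \( -name '*.tex' -o -name '*.png' -o -name '*.pdf' \)
}
```

Without the parentheses, -type f would constrain only the first -name test, and a directory named, say, figures.png would be listed as well.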

Related

List all the files with prefixes from a for loop using Bash

Here is a small (but complete) part of my bash script that finds and outputs all files in mydir if they have a prefix from a stored array. The strange thing I notice is that this script works perfectly if I take out the "-maxdepth 1 -name" from the script; otherwise it only gives me the files with the prefix of the first element in the array.
It would be of great help if someone could explain this to me. Sorry in advance if there is something obviously silly that I'm doing. I'm relatively new to scripting.
#!/bin/sh
DIS_ARRAY=(A B C D)
echo "Array is : "
echo ${DIS_ARRAY[*]}
for dis in $DIS_ARRAY
do
IN_FILES=`find /mydir -maxdepth 1 -name "$dis*.xml"`
for file in $IN_FILES
do
echo $file
done
done
Output:
/mydir/Abc.xml
/mydir/Ab.xml
/mydir/Ac.xml
Expected Output:
/mydir/Abc.xml
/mydir/Ab.xml
/mydir/Ac.xml
/mydir/Bc.xml
/mydir/Cb.xml
/mydir/Dc.xml
The loop is broken either way. The reason why
IN_FILES=`find mydir -maxdepth 1 -name "$dis*.xml"`
works, whereas
IN_FILES=`find mydir "$dis*.xml"`
doesn't, is that in the first one you have specified -name. In the second one, find treats "$dis*.xml" as another starting path rather than as a filter, so it simply lists all the files in mydir. If you change the second one to
IN_FILES=`find mydir -name "$dis*.xml"`
you will see that the loop isn't working.
As mentioned in the comments, the syntax that you are currently using, $DIS_ARRAY, will only give you the first element of the array.
Try changing your loop to this:
for dis in "${DIS_ARRAY[@]}"
The double quotes around the expansion aren't strictly necessary in your specific case, but they are required if the elements in your array contain spaces, as demonstrated in the following test:
#!/bin/bash
arr=("a a" "b b")
echo using '$arr'
for i in $arr; do echo $i; done
echo using '${arr[@]}'
for i in ${arr[@]}; do echo $i; done
echo using '"${arr[@]}"'
for i in "${arr[@]}"; do echo $i; done
output:
using $arr
a
a
using ${arr[@]}
a
a
b
b
using "${arr[@]}"
a a
b b
See this related question for further details.
@TomFenech's answer solves your problem, but let me suggest other improvements:
#!/usr/bin/env bash
DIS_ARRAY=(A B C D)
echo "Array is : "
echo "${DIS_ARRAY[*]}"
for dis in "${DIS_ARRAY[@]}"
do
for file in "/mydir/$dis"*.xml
do
if [ -f "$file" ]; then
echo "$file"
fi
done
done
Your shebang line references sh, but your question is tagged bash. Unless you need POSIX compliance, use a bash shebang line to take advantage of all that bash has to offer.
To match files located directly in a given directory (i.e., if you don't need to traverse an entire subtree), use a glob (filename pattern) and rely on pathname expansion as in my code above - no need for find and command substitution.
Note that the wildcard char. * is UNquoted to ensure pathname expansion.
Caveat: if no matching files are found, the glob is left untouched (assuming the nullglob shell option is OFF, which it is by default), so the loop is entered once, with an invalid filename (the unexpanded glob) - hence the [ -f "$file" ] conditional to ensure that an actual match was found (as an aside: using bashisms, you could use [[ -f $file ]] instead).
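The same idea can be packaged so that the prefixes are passed as arguments. This is only a sketch: the function name files_with_prefixes is hypothetical, and it turns nullglob on inside a subshell instead of relying solely on the -f check.

```shell
# files_with_prefixes DIR PREFIX...: print each regular *.xml file in DIR
# whose name starts with one of the given prefixes. Hypothetical helper;
# the ( ) body is a subshell, so nullglob stays local to the function.
files_with_prefixes() (
    dir=$1
    shift
    shopt -s nullglob    # unmatched globs vanish instead of staying literal
    for prefix in "$@"; do
        for file in "$dir/$prefix"*.xml; do
            if [ -f "$file" ]; then
                echo "$file"
            fi
        done
    done
)
```

With the array from the question it would be called as files_with_prefixes /mydir "${DIS_ARRAY[@]}", where the quoted [@] expansion passes every element as its own argument.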

bash filename start matching

I've got a simple enough question, but no guidance yet through the forums or bash. The question is as follows:
I want to add a prefix string to each filename in a directory that matches *.h or *.cpp. HOWEVER, if the prefix has already been applied to the filename, do NOT apply it again.
Why the following doesn't work is something that has yet to be figured out:
for i in *.{h,cpp}
do
if [[ $i!="$pattern*" ]]
then mv $i $pattern$i
fi
done
You can try this:
for i in *.{h,cpp}
do
if ! ( echo "$i" | grep -q "^$pattern" )
# if the file does not begin with $pattern, rename it.
then mv "$i" "$pattern$i"
fi
done
Others have shown replacement comparisons that work; I'll take a stab at why the original version didn't. There are two problems with the original prefix test: you need spaces between the comparison operator (!=) and its operands, and the asterisk was in quotes (meaning it gets matched literally, rather than as a wildcard). Fix these, and (at least in my tests) it works as expected:
if [[ $i != "$pattern"* ]]
#!/bin/sh
pattern=testpattern_
for i in *.h *.cpp; do
case "$i" in
"$pattern"*)
continue;;
*)
mv "$i" "$pattern$i";;
esac
done
This script will run in any POSIX shell, not just bash. (I wasn't sure if your question was "why isn't this working" or "how do I make this work", so I guessed it was the second.)
for i in *.{h,cpp}; do
[ "${i#prefix}" = "$i" ] && mv "$i" "prefix$i"
done
Not exactly conforming to your script, but it should work. The check returns true if there is no prefix (i.e. if $i, with the prefix "prefix" removed, equals $i).
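Putting the pieces from these answers together, a rename that is safe to run more than once might look like the sketch below. The function name add_prefix is hypothetical, and it assumes the files live directly in the given directory.

```shell
# add_prefix DIR PREFIX: prepend PREFIX to every *.h and *.cpp file in DIR
# that does not already start with it. Hypothetical helper; re-running it
# leaves already-prefixed files untouched.
add_prefix() (
    cd "$1" || exit 1
    prefix=$2
    for f in *.h *.cpp; do
        [ -f "$f" ] || continue      # an unmatched glob stays literal: skip it
        case $f in
            "$prefix"*) ;;           # already prefixed, leave it alone
            *) mv -- "$f" "$prefix$f" ;;
        esac
    done
)
```

Quoting "$prefix" inside the case pattern makes it match literally even if the prefix contains wildcard characters, while the trailing * remains a wildcard.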

Bash Compound Conditional, With Wildcards and File Existence Check

I've mastered the basics of Bash compound conditionals and have read a few different ways to check for file existence of a wildcard file, but this one is eluding me, so I figured I'd ask for help...
I need to:
1.) Check if some file matching a pattern exists
AND
2.) Check that text in a different file exists.
I know there's lots of ways to do this, but I don't really have the knowledge to prioritize them (if you have that knowledge I'd be interested in reading about that as well).
The first thing that came to mind is to use find for #1 and grep for #2.
So something like
if [ `grep -q "OUTPUT FILE AT STEP 1000" ../log/minimize.log` ] \
&& [ `find -name "jobscript_minim\*cmd\*o\*"` ]; then
echo "Both passed! (1)"
fi
That fails, though curiously:
if `grep -q "OUTPUT FILE AT STEP 1000" ../log/minimize.log` ;then
echo "Text passed!"
fi
if `find -name "jobscript_minim\*cmd\*o\*"` ;then
echo "File passed!"
fi
both pass...
I've done a bit of reading and have seen people talking about the problem of multiple filenames matching wildcards within an if statement. What's the best solution to this? (In answering my question, I assume you'd take a crack at that question as well.)
Any ideas/solutions/suggestions?
Let's tackle why your attempt failed first:
if [ `grep -q …` ];
This runs the grep command between backticks, and interpolates the output inside the conditional command. Since grep -q doesn't produce any output, it's as if you wrote if [ ];
The conditional is supposed to test the return code of grep, not anything about its output. Therefore it should be simply written as
if grep -q …;
The find command returns 0 (i.e. true) even if it finds nothing, so this technique won't work. What will work is testing whether its output is empty, by collecting its output and comparing it to the empty string:
if [ "$(find …)" != "" ];
(An equivalent test is if [ -n "$(find …)" ].)
Notice two things here:
I used $(…) rather than backticks. They're equivalent, except that backticks require strange quoting inside them (especially if you try to nest them), whereas $(…) is simple and reliable. Just use $(…) and forget about backticks (except that you need to write \` inside double quotes).
There are double quotes around $(…). This is really important. Without the quotes, the shell would break the output of the find command into words. If find prints, say, two lines dir/file and dir/otherfile, we want if [ "dir/file dir/otherfile" = "" ]; to be executed, not if [ dir/file dir/otherfile = "" ]; which is a syntax error. This is a general rule of shell programming: always put double quotes around a variable or command substitution. (A variable substitution is $foo or ${foo}; a command substitution is $(command).)
Now let's see your requirements.
Check if some file matching a pattern exists
If you're looking for files in the current directory or in any directory below it recursively, then find -name "PATTERN" is right. However, if the directory tree can get large, it's inefficient, because it can spend a lot of time printing all the matches when we only care about one. An easy optimization is to only retain the first line by piping into head -n 1; find will stop searching once it realizes that head is no longer interested in what it has to say.
if [ "$(find -name "jobscript_minim*cmd*o*" | head -n 1)" != "" ];
(Note that the double quotes already protect the wildcards from expansion.)
If you're only looking for files in the current directory, assuming you have GNU find (which is the case on Linux, Cygwin and Gnuwin32), a simple solution is to tell it not to recurse deeper than the current directory.
if [ "$(find -maxdepth 1 -name "jobscript_minim*cmd*o*")" != "" ];
There are other solutions that are more portable, but they're more complicated to write.
Check that text in a different file exists.
You've already got a correct grep command. Note that if you want to search for a literal string, you should use grep -F; if you're looking for a regexp, grep -E has a saner syntax than plain grep.
Putting it all together:
if grep -q -F "OUTPUT FILE AT STEP 1000" ../log/minimize.log &&
[ "$(find -name "jobscript_minim*cmd*o*")" != "" ]; then
echo "Both passed! (1)"
fi
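For the current-directory case, one of the more portable alternatives to find -maxdepth can be sketched with a plain glob. The function name any_match is hypothetical; it expects to be called with an unquoted pattern so that the shell expands it before the function runs.

```shell
# any_match FILE...: succeed if at least one argument names an existing file.
# Hypothetical helper: pass it an UNQUOTED glob. If nothing matches, the shell
# hands over the literal pattern, which fails the -e test.
any_match() {
    for f in "$@"; do
        [ -e "$f" ] && return 0
    done
    return 1
}
```

Used as grep -q -F "OUTPUT FILE AT STEP 1000" ../log/minimize.log && any_match jobscript_minim*cmd*o*, it replaces the find-based test when only the current directory needs to be checked, and it stops at the first existing match.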
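With bash 4 or later:
# nullglob makes an unmatched pattern expand to nothing instead of staying literal
shopt -s globstar nullglob
files=(**/jobscript_minim*cmd*o*)
if grep -q "pattern" file && (( ${#files[@]} > 0 )); then echo "passed"; fi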
for i in filename*; do FOUND=$i; break; done
if [ "$FOUND" = 'filename*' ]; then
echo "No files found matching wildcard."
else
echo "Files found matching wildcard."
fi

Errors from if and else statements in shell

I am just new to programming in Unix and have a small issue that I am unsure of how to solve. The objective of this piece of my script is to offer the user various options as to the type of scan they would like to use. This scan detects duplicate files with specified variables depending on the option chosen.
I am unable to get it working at all and am unsure why?
Also could you please offer me advice on how I could better display the selection screen if possible. I have only pasted part of my code as I would like to figure out the rest of my objective myself.
#!/bin/bash
same_name="1"
filesize="2"
md5sum="3"
different_name="4"
echo "The list of choices are, same_name=1, filesize=2, md5sum=3 and different name=4"
echo "Search for files with the:"
read choice
if [$choice == "$same_name" ];then
find /home/user/OSN -type f -exec basename '{}' \; | sort > filelist.txt
find /home/user/OSN -type f -exec basename '{}' \; | sort | uniq -d > repeatlist.txt
else
ls -al /home/user/OSN >2filelist.txt
fi
The shell command [, also known as test, needs a space after it for the shell to parse it correctly. For example:
if [ "x$choice" == x"$same_name" ] ; then
is equivalent to
if test "x$choice" == "x$same_name" ; then
Prepending "x" to the variables is an idiom to prevent test from seeing too few arguments. test would complain if called as test 5 ==, so with the prefix the call to test remains syntactically correct even if $choice and $same_name are empty.
You can also use the construct ${choice:-default} or ${choice:=default} to guard against unset or null shell variables.
It would help if you included the error messages you were receiving. When I tried this, I got an error of:
./foo: line 9: [1: command not found
This makes the problem fairly clear. The [ operator in the if statement is, in Unix's "never use something complicated when some simple hack will work" style, just another program. (See ls /bin/[ for proof!) As such, it needs to be treated like any other program with command-line options; you separate it from its options with whitespace. Otherwise, bash will think that "[$choice", concatenated, is the name of a program to execute and will try to execute it. Thus, that line needs to be:
if [ $choice == "$same_name" ];then
After I changed that, it worked.
Also, as a style suggestion, I'd note that the case construct is a much easier way to write this code than using if statements, when you've got more than one test. And, as noted in other answers, you should put " marks around $choice, to guard against the case where the user input is empty or contains spaces -- $choice, unquoted, will expand to a list of zero or more tokens separated by whitespace, whereas "$choice" always expands to a single token.
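The effect of the missing quotes is easy to reproduce. The sketch below simulates an empty answer from the user and records the exit status of both forms; the variable names unquoted and quoted are hypothetical, and the exact status of the failing form is what bash reports.

```shell
choice=""        # simulate the user pressing Enter without typing anything
same_name="1"

# Unquoted: $choice expands to no word at all, so [ receives only `= 1`
# and fails with a usage error (its message is discarded here).
unquoted=0
[ $choice = "$same_name" ] 2>/dev/null || unquoted=$?

# Quoted: "$choice" is always exactly one word, so this is a clean
# comparison that is simply false.
quoted=0
[ "$choice" = "$same_name" ] || quoted=$?

echo "unquoted exit status: $unquoted, quoted exit status: $quoted"
```

In bash the unquoted test exits with status 2, which signals an error, while the quoted one exits with status 1, an ordinary "not equal".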
Can't believe nobody's picked up this error: if you use [ (or test), the standard operator for string equality is =, not ==. (bash's [ accepts == as an extension, but POSIX specifies only =.)
you can do it like this.
while true
do
cat <<EOF
The list of choices are:
1) same name
2) filesize
3) md5sum
4) different name
5) exit
EOF
read -r -p "Enter your choice: " choice
case "$choice" in
1)
find /home/user/OSN -type f -exec basename '{}' \; | sort > filelist.txt
find /home/user/OSN -type f -exec basename '{}' \; | sort | uniq -d > repeatlist.txt
;;
5) exit ;;
*) ls -al /home/user/OSN >2filelist.txt ;;
esac
done
Bash's double square brackets are much more forgiving of quoting and null or unset variables.
if [[ $choice == "$same_name" ]]; then
You should take a look at Bash's select and case statements:
choices="same_name filesize md5sum different_name exit"
PS3="Make a selection: " # this is the prompt that the select statement will display
select choice in $choices
do
case $choice in
same_name)
find ...
;;
filesize)
do_something
;;
.
.
.
exit)
break
;;
esac
done
