Fastest way to find text in folder - full-text-search

I was using the return value of fgrep -s 'text' /folder/*.txt to find if 'text' is in any .txt file in /folder/. It works, but I find it too slow for what I need, like if it searches for 'text' in all the files before giving me an answer.
I need something that quickly gives me a yes/no answer when it finds at least one file with the 'text'. Probably some awk script.

If I understand your question correctly, you want:
fgrep -m1
Which stops after one match.

You can use this to shorten your search if it's the kind that would be based on mopoke's answer. This stops after the first match in the first file in which it's found:
# found=$(false)
found=1 # false
text="text"
for file in /folder/*.txt
do
if fgrep -m1 "$text" "$file" > /dev/null
then
found=$?
# echo "$text found in $file"
break
fi
done
# if [[ ! $found ]]
# then
# echo "$text not found"
# fi
echo $found # or use exit $found
Edit: commented out some lines and made a couple of other changes.
If there is a large number of files, then the for could fail and you should use a while loop with a find piped into it.
If all you want to do is find out whether there is any .txt file in a folder regardless of the file's contents, then use something like this all by itself:
find /folder -name "*.txt"

Building on the answers above, this should do it I think ?
find /folder -name \*.txt -exec grep "text" {}\;
But I'm not sure I fully understand the problem : is 'fgrep' doing a full depth recursion before it starts outputting or something ? The find should report as-it-finds so might be better for what you need, dunno.
[edit, not tested]: Change the 'grep' above for a shell-script that does something like:
grep "text" && exit 0 || exit 1
To get the true|false thing you need to work (you'll need to play around with this , haven't tested exactly the steps here - no access to Unix at the moment :-( )

Related

How to call a command over every .json file in a directory with different file extensions? [duplicate]

for i in $(ls);do
if [ $i = '*.java' ];then
echo "I do something with the file $i"
fi
done
I want to loop through each file in the current folder and check if it matches a specific extension. The code above doesn't work, do you know why?
No fancy tricks needed:
for i in *.java; do
[ -f "$i" ] || break
...
done
The guard ensures that if there are no matching files, the loop will exit without trying to process a non-existent file name *.java.
In bash (or shells supporting something similar), you can use the nullglob option
to simply ignore a failed match and not enter the body of the loop.
shopt -s nullglob
for i in *.java; do
...
done
Some more detail on the break-vs-continue discussion in the comments. I consider it somewhat out of scope whether you use break or continue, because what the first loop is trying to do is distinguish between two cases:
*.java had no matches, and so is treated as literal text.
*.java had at least one match, and that match might have included an entry named *.java.
In case #1, break is fine, because there are no other values of $i forthcoming, and break and continue would be equivalent (though I find break more explicit; you're exiting the loop, not just waiting for the loop to exit passively).
In case #2, you still have to do whatever filtering is necessary on any possible matches. As such, the choice of break or continue is less relevant than which test (-f, -d, -e, etc) you apply to $i, which IMO is the wrong way to determine if you entered the loop "incorrectly" in the first place.
That is, I don't want to be in the position of examining the value of $i at all in case #1, and in case #2 what you do with the value has more to do with your business logic for each file, rather than the logic of selecting files to process in the first place. I would prefer to leave that logic to the individual user, rather than express one choice or the other in the question.
As an aside, zsh provides a way to do this kind of filtering in the glob itself. You can match only regular files ending with .java (and disable the default behavior of treating unmatched patterns as an error, rather than as literal text) with
for f in *.java(.N); do
...
done
With the above, you are guaranteed that if you reach the body of the loop, then $f expands to the name of a regular file. The . makes *.java match only regular files, and the N causes a failed match to expand to nothing instead of producing an error.
There are also other such glob qualifiers for doing all sorts of filtering on filename expansions. (I like to joke that zsh's glob expansion replaces the need to use find at all.)
Recursively add subfolders,
for i in `find . -name "*.java" -type f`; do
echo "$i"
done
Loop through all files ending with: .img, .bin, .txt suffix, and print the file name:
for i in *.img *.bin *.txt;
do
echo "$i"
done
Or in a recursive manner (find also in all subdirectories):
for i in `find . -type f -name "*.img" -o -name "*.bin" -o -name "*.txt"`;
do
echo "$i"
done
the correct answer is #chepner's
EXT=java
for i in *.${EXT}; do
...
done
however, here's a small trick to check whether a filename has a given extensions:
EXT=java
for i in *; do
if [ "${i}" != "${i%.${EXT}}" ];then
echo "I do something with the file $i"
fi
done
as #chepner says in his comment you are comparing $i to a fixed string.
To expand and rectify the situation you should use [[ ]] with the regex operator =~
eg:
for i in $(ls);do
if [[ $i =~ .*\.java$ ]];then
echo "I want to do something with the file $i"
fi
done
the regex to the right of =~ is tested against the value of the left hand operator and should not be quoted, ( quoted will not error but will compare against a fixed string and so will most likely fail"
but #chepner 's answer above using glob is a much more efficient mechanism.
I agree withe the other answers regarding the correct way to loop through the files. However the OP asked:
The code above doesn't work, do you know why?
Yes!
An excellent article What is the difference between test, [ and [[ ?] explains in detail that among other differences, you cannot use expression matching or pattern matching within the test command (which is shorthand for [ )
Feature new test [[ old test [ Example
Pattern matching = (or ==) (not available) [[ $name = a* ]] || echo "name does not start with an 'a': $name"
Regular Expression =~ (not available) [[ $(date) =~ ^Fri\ ...\ 13 ]] && echo "It's Friday the 13th!"
matching
So this is the reason your script fails. If the OP is interested in an answer with the [[ syntax (which has the disadvantage of not being supported on as many platforms as the [ command), I would be happy to edit my answer to include it.
EDIT: Any protips for how to format the data in the answer as a table would be helpful!
I found this solution to be quite handy. It uses the -or option in find:
find . -name \*.tex -or -name "*.png" -or -name "*.pdf"
It will find the files with extension tex, png, and pdf.

Searching within txt files using bash

I am looking trivial solution for the trivial task.
My bash script is looping several folders producing in which some log.txt file. If some operation in each case has performed successfully in each of the log the string with sentence "The unit is OK" should somewhere in the log.txt appeared, however its actual position (precise number of string) in each log.txt is differs!
I need to put in my loop some condition (probably using IF ) to check whether that sentence is actually present somewhere in the log file and if so - to print "Everything is OK" within the terminal where my script is executed in moment of looping of particular folder, and otherwise (if the string is absent in the log) to print smth like "Bad news"!
Will be thankful for the different solutions especially how to find the strings of selected phrases in the given log file.
Thanks!!
Gleb
To apply the test to an entire directory, a simple for loop can be use. If you need to process directories recursively, you can feed a while read -r... loop with the results of find. In either case the search with be similar. Here is an example searching a single directory of log files:
$ for i in dirname/*; do grep -q 'The unit is OK' "$i" && \
echo "$i - Everything is OK" || echo "$i - Bad news"; done
Searching where dirname is my test dat directory, example results would be:
dat/test.properties - Bad news
dat/test_pph_s.txt - Bad news
dat/testlog-1.txt - Everything is OK
dat/testlog-2.txt - Everything is OK
dat/testlog-3.txt - Everything is OK
Where the testlog-X.txt files contain, for example:
$ cat dat/testlog-1.txt
The unit is OK
Without more info (examples of strings you are looking for, potential file names, whatever bash script you already have) it is hard to provide concrete advice.
Still, if you have a known set of potential strings you could check for their existence in a file while you're looping
if [ `grep -E (pattern1|pattern2|pattern3) $FILE | wc -l` -gt 0 ]; then
# String was found in file
fi
Scanning files with some pattern (eg: log.txt)
for file in $(find /path/to/logs -name "log.txt")
do
grep -q "The unit is OK" $file && echo "$file: Everything is OK" || echo "$file: Bad news"
done

Loop through all the files with a specific extension

for i in $(ls);do
if [ $i = '*.java' ];then
echo "I do something with the file $i"
fi
done
I want to loop through each file in the current folder and check if it matches a specific extension. The code above doesn't work, do you know why?
No fancy tricks needed:
for i in *.java; do
[ -f "$i" ] || break
...
done
The guard ensures that if there are no matching files, the loop will exit without trying to process a non-existent file name *.java.
In bash (or shells supporting something similar), you can use the nullglob option
to simply ignore a failed match and not enter the body of the loop.
shopt -s nullglob
for i in *.java; do
...
done
Some more detail on the break-vs-continue discussion in the comments. I consider it somewhat out of scope whether you use break or continue, because what the first loop is trying to do is distinguish between two cases:
*.java had no matches, and so is treated as literal text.
*.java had at least one match, and that match might have included an entry named *.java.
In case #1, break is fine, because there are no other values of $i forthcoming, and break and continue would be equivalent (though I find break more explicit; you're exiting the loop, not just waiting for the loop to exit passively).
In case #2, you still have to do whatever filtering is necessary on any possible matches. As such, the choice of break or continue is less relevant than which test (-f, -d, -e, etc) you apply to $i, which IMO is the wrong way to determine if you entered the loop "incorrectly" in the first place.
That is, I don't want to be in the position of examining the value of $i at all in case #1, and in case #2 what you do with the value has more to do with your business logic for each file, rather than the logic of selecting files to process in the first place. I would prefer to leave that logic to the individual user, rather than express one choice or the other in the question.
As an aside, zsh provides a way to do this kind of filtering in the glob itself. You can match only regular files ending with .java (and disable the default behavior of treating unmatched patterns as an error, rather than as literal text) with
for f in *.java(.N); do
...
done
With the above, you are guaranteed that if you reach the body of the loop, then $f expands to the name of a regular file. The . makes *.java match only regular files, and the N causes a failed match to expand to nothing instead of producing an error.
There are also other such glob qualifiers for doing all sorts of filtering on filename expansions. (I like to joke that zsh's glob expansion replaces the need to use find at all.)
Recursively add subfolders,
for i in `find . -name "*.java" -type f`; do
echo "$i"
done
Loop through all files ending with: .img, .bin, .txt suffix, and print the file name:
for i in *.img *.bin *.txt;
do
echo "$i"
done
Or in a recursive manner (find also in all subdirectories):
for i in `find . -type f -name "*.img" -o -name "*.bin" -o -name "*.txt"`;
do
echo "$i"
done
the correct answer is #chepner's
EXT=java
for i in *.${EXT}; do
...
done
however, here's a small trick to check whether a filename has a given extensions:
EXT=java
for i in *; do
if [ "${i}" != "${i%.${EXT}}" ];then
echo "I do something with the file $i"
fi
done
as #chepner says in his comment you are comparing $i to a fixed string.
To expand and rectify the situation you should use [[ ]] with the regex operator =~
eg:
for i in $(ls);do
if [[ $i =~ .*\.java$ ]];then
echo "I want to do something with the file $i"
fi
done
the regex to the right of =~ is tested against the value of the left hand operator and should not be quoted, ( quoted will not error but will compare against a fixed string and so will most likely fail"
but #chepner 's answer above using glob is a much more efficient mechanism.
I agree withe the other answers regarding the correct way to loop through the files. However the OP asked:
The code above doesn't work, do you know why?
Yes!
An excellent article What is the difference between test, [ and [[ ?] explains in detail that among other differences, you cannot use expression matching or pattern matching within the test command (which is shorthand for [ )
Feature new test [[ old test [ Example
Pattern matching = (or ==) (not available) [[ $name = a* ]] || echo "name does not start with an 'a': $name"
Regular Expression =~ (not available) [[ $(date) =~ ^Fri\ ...\ 13 ]] && echo "It's Friday the 13th!"
matching
So this is the reason your script fails. If the OP is interested in an answer with the [[ syntax (which has the disadvantage of not being supported on as many platforms as the [ command), I would be happy to edit my answer to include it.
EDIT: Any protips for how to format the data in the answer as a table would be helpful!
I found this solution to be quite handy. It uses the -or option in find:
find . -name \*.tex -or -name "*.png" -or -name "*.pdf"
It will find the files with extension tex, png, and pdf.

bash to print certain file names to text

I have spent a lot of time the past few weeks and posting on here. I finally think I am much closer with learning bash but I have one problem with my code I cannot for the life of me figure out why it will not run. I can run each line in the terminal and it returns a result but for some reason when I point it to run, it will do nothing. I get a a syntax error: word unexpected (expecting "do").
#!/bin/bash
image="/Home/Desktop/epubs/images"
for f in $(ls "$image"*.jpg); do
fsize=$(stat --printf= '%s' "$f");
if [ "$fsize" -eq "40318" ]; then
echo "$(basename $f)" >> results.txt
fi
done
What am I missing???
The problem might be in line endings. Make sure your script file has unix line endings, not the Windows ones.
Also, do not iterate over output of ls. Use globbing right in the shell:
for f in "$file"/*.jpg ; do
Your for loop appears to be missing a list of values to iterate over:
image="/Home/Desktop/epubs/images"
for f in $(ls "$image"*.jpg); do
Because $image does not end with a /, your ls command expands to
for f in $(ls /Home/Desktop/epubs/images*.jpg); do
which probably results in
for f in ; do
causing the syntax error. The simplest fix is
for f in $(ls "$image"/*.jpg); do
but you should take the advice in the other answers and skip ls:
for f in "$image"/*.jpg; do
Here's how I would do that.
#!/bin/bash -e
image="/Home/Desktop/epubs/images"
(cd "$image"
for f in *.jpg; do
let fsize=$(stat -c %s "$f")
if (( fsize == 40318 )); then
echo "$f"
fi
done) >results.txt
The -e means the script will exit if anything goes wrong (can't cd into the directory, for instance). Saves a lot of error checking when you're happy with that behavior.
The parentheses mean that the cd command is in a subshell; the surrounding script (including the redirection into results.txt) is still in whatever directory you started in.
Now that we're in the directory, we can just look for *.jpg, no directory prefix, and no need to call basename on anything.
Using let and (( == )) treats the size value as a number instead of a string, so we won't get tripped up by any wonkiness in the way stat chooses to format the value.
We just redirect the output the entire loop into the result file instead of appending every time through; it's more efficient. If you have existing contents in results.txt that you want to keep, you can just change the > back to a >>, but leaving it around the whole loop is still more efficient than opening the file and appending to it on every iteration.

Bash Compound Conditional, With Wildcards and File Existence Check

I've mastered the basics of Bash compound conditionals and have read a few different ways to check for file existence of a wildcard file, but this one is eluding me, so I figured I'd ask for help...
I need to:
1.) Check if some file matching a pattern exists
AND
2.) Check that text in a different file exists.
I know there's lots of ways to do this, but I don't really have the knowledge to prioritize them (if you have that knowledge I'd be interested in reading about that as well).
First things that came to mind is to use find for #1 and grep for #2
So something like
if [ `grep -q "OUTPUT FILE AT STEP 1000" ../log/minimize.log` ] \
&& [ `find -name "jobscript_minim\*cmd\*o\*"` ]; then
echo "Both passed! (1)"
fi
That fails, though curiously:
if `grep -q "OUTPUT FILE AT STEP 1000" ../log/minimize.log` ;then
echo "Text passed!"
fi
if `find -name "jobscript_minim\*cmd\*o\*"` ;then
echo "File passed!"
fi
both pass...
I've done a bit of reading and have seen people talking about the problem of multiple filenames matching wildcards within an if statement. What's the best solution to this? (in answer my question, I'd assumed you take a crack at that question, as well, in the process)
Any ideas/solutions/suggestions?
Let's tackle why your attempt failed first:
if [ `grep -q …` ];
This runs the grep command between backticks, and interpolates the output inside the conditional command. Since grep -q doesn't produce any output, it's as if you wrote if [ ];
The conditional is supposed to test the return code of grep, not anything about its output. Therefore it should be simply written as
if grep -q …;
The find command returns 0 (i.e. true) even if it finds nothing, so this technique won't work. What will work is testing whether its output is empty, by collecting its output any comparing it to the empty string:
if [ "$(find …)" != "" ];
(An equivalent test is if [ -n "$(find …)" ].)
Notice two things here:
I used $(…) rather than backticks. They're equivalent, except that backticks require strange quoting inside them (especially if you try to nest them), whereas $(…) is simple and reliable. Just use $(…) and forget about backticks (except that you need to write \` inside double quotes).
There are double quotes around $(…). This is really important. Without the quotes, the shell would break the output of the find command into words. If find prints, say, two lines dir/file and dir/otherfile, we want if [ "dir/file dir/otherfile" = "" ]; to be executed, not if [ dir/file dir/otherfile = "" ]; which is a syntax error. This is a general rule of shell programming: always put double quotes around a variable or command substitution. (A variable substitution is $foo or ${foo}; a command substitution is $(command).)
Now let's see your requirements.
Check if some file matching a pattern exists
If you're looking for files in the current directory or in any directory below it recursively, then find -name "PATTERN" is right. However, if the directory tree can get large, it's inefficient, because it can spend a lot of time printing all the matches when we only care about one. An easy optimization is to only retain the first line by piping into head -n 1; find will stop searching once it realizes that head is no longer interested in what it has to say.
if [ "$(find -name "jobscript_minimcmdo" | head -n 1)" != "" ];
(Note that the double quotes already protect the wildcards from expansion.)
If you're only looking for files in the current directory, assuming you have GNU find (which is the case on Linux, Cygwin and Gnuwin32), a simple solution is to tell it not to recurse deeper than the current directory.
if [ "$(find -maxdepth 1 -name "jobscript_minim*cmd*o*")" != "" ];
There are other solutions that are more portable, but they're more complicated to write.
Check that text in a different file exists.
You've already got a correct grep command. Note that if you want to search for a literal string, you should use grep -F; if you're looking for a regexp, grep -E has a saner syntax than plain grep.
Putting it all together:
if grep -q -F "OUTPUT FILE AT STEP 1000" ../log/minimize.log &&
[ "$(find -name "jobscript_minim*cmd*o*")" != "" ]; then
echo "Both passed! (1)"
fi
bash 4
shopt -s globstar
files=$(echo **/jobscript_minim*cmd*o*)
if grep -q "pattern" file && [[ ! -z $files ]];then echo "passed"; fi
for i in filename*; do FOUND=$i;break;done
if [ $FOUND == 'filename*' ]; then
echo “No files found matching wildcard.”
else
echo “Files found matching wildcard.”
fi

Resources