shell script: find a string by searching inside all the files in a folder? [duplicate] - bash

How do I find a string contained in (possibly multiple) files in a folder including hidden files and subfolders?
I tried this command:
find . -maxdepth 1 -name "tes1t" -print0 | sed 's,\.\/,,g'
But this yielded no results.

Use grep -Hnr PATTERN . if your grep supports -r (recursive; equivalent to -d recurse). Note that there is then no limit on recursion depth.
Or try grep -d skip -Hn PATTERN {,.[!.]}*{,/{,.[!.]}*}; this should work since grep accepts multiple file arguments. Just drop the -d skip part if your version of grep doesn't support it. For shells without brace expansion, use the manually expanded form * */* */.[!.]* .[!.]* .[!.]*/* .[!.]*/.[!.]*.
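If you want to check what that brace pattern covers before running grep on it, echo will print the shell's expansion of it:
echo {,.[!.]}*{,/{,.[!.]}*}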

First of all, your maxdepth should have been 2 instead of 1; with -maxdepth 1, your find command won't descend into subdirectories. Furthermore, you can simply grep for your pattern in the files that find returns. This can be achieved as follows:
find . -maxdepth 2 -type f -exec grep 'pattern here' '{}' \;
Explanation:
find . execute find in current directory.
-maxdepth 2 descend into subdirectories, but no further.
-type f match regular files only, not directories.
-exec grep 'pattern' '{}' execute a grep statement with a certain pattern, the {} contains the filename for each file found.
Add options to grep for color highlighting, outputting line numbers and/or the file name.
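For example (a sketch assuming GNU grep: --color highlights matches, -n prints line numbers, -H prints the file name):
find . -maxdepth 2 -type f -exec grep --color -nH 'pattern here' '{}' \;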
For more information see man find and man grep.


Loop through all files in a directory and subdirectories using Bash [duplicate]

I know how to loop through all the files in a directory, for example:
for i in *
do
<some command>
done
But I would like to go through all the files in a directory, including (particularly!) all the ones in the subdirectories. Is there a simple way of doing this?
The find command is very useful for that kind of thing, provided you don't have white space or other special characters in the file names:
For example:
for i in $(find . -type f -print)
do
stuff
done
The command generates path names relative to the start of the search (the first argument).
As pointed out, this will fail if your filenames contain spaces or some other characters.
You can also use the -exec option which avoids the problem with spaces in file names. It executes the given command for each file found. The braces are a placeholder for the filename:
find . -type f -exec command {} \;
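If you prefer a shell loop to -exec, a space-safe sketch (assuming bash and a find that supports -print0) reads NUL-delimited names:
find . -type f -print0 | while IFS= read -r -d '' i
do
    stuff "$i"   # "stuff" stands for your command, as in the loop above
done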
find and xargs are great tools for recursively processing the contents of directories and sub-directories. For example
find . -type f -print0 | xargs -0 command
will run command on batches of files from the current directory and its sub-directories. The -print0 and -0 arguments avoid the usual problems with filenames that contain spaces, quotes or other metacharacters.
If command just takes one argument, you can limit the number of files passed to it with -L1.
find . -type f -print0 | xargs -0 -L1 command
And as suggested by alexgirao, xargs can also name arguments, using -I, which gives some flexibility if command takes options. -I implies -L1.
find . -type f -print0 | xargs -0 -Iarg command arg --option
recurse() {
    path=$1
    if [ -d "$path" ] ; then
        for i in "$path/"*
        do
            recurse "$i"
        done
    elif [ -f "$path" ] ; then
        do-something
    fi
}
Call recurse with the directory path you want to start from as its first positional parameter.
Ex: recurse /path

How to find the particular files in a directory in a shell script?

I'm trying to find particular files in a directory using a find command pattern in a shell script.
Files are created in the directory "/data/output" in the following format every time:
PO_ABCLOAD0626201807383269.txt
PO_DEF 0626201811383639.txt
So I need to check whether txt files starting with "PO_ABCLOAD" and "PO_DEF" have been created or not; if none has been created for four hours, I need to write to the logs.
I have written a script, but I am stuck on finding the "PO_ABCLOAD" and "PO_DEF" format text files in the script below.
Please help with this. What changes do I need to make to the find command?
My script is:
file_path=/data/output
PO_count='find ${file_path}/PO/*.txt -mtime +4 -exec ls -ltr {} + | wc -l'
if [ $PO_count == 0 ]
then
find ${file_path}/PO/*.xml -mtime +4 -exec ls -ltr {} + >
/logs/test/PO_list.txt
fi
Thanks in advance
Welcome to the forum. To search for files which match the names you are looking for you could try the -iname or -name predicates. However, there are other issues with your script.
Modification times
Firstly, I think that find's -mtime test works in a different way than you expect. From the manual:
-mtime n
File's data was last modified n*24 hours ago.
So if, for example, you run
find . -mtime +4
you are searching for files which are more than four days old. To search for files that are more than four hours old, I think you need to use the -mmin option instead; this will search for files which were modified a certain number of minutes ago.
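For example, to find files modified more than four hours (240 minutes) ago:
find . -mmin +240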
Command substitution syntax
Secondly, using ' for command substitution in Bash will not work: you need to use backticks instead - as in
PO_COUNT=`find ...`
instead of
PO_COUNT='find ...'
Alternatively - even better (as codeforester pointed out in a comment) - use $(...) - as in
PO_COUNT=$(find ...)
Redundant options
Thirdly, using -exec ls -ltr {} + is redundant in this context - since all you are doing is determining the number of lines in the output.
So the relevant line in your script might become something like
PO_COUNT=$(find $FILE_PATH/PO/ -mmin +240 -a -name 'PO_*' | wc -l)
or
PO_COUNT=$(find $FILE_PATH/PO/PO_* -mmin +240 | wc -l)
If you wanted tighter matching of filenames, try (as per codeforester's suggestion) something like
PO_COUNT=$(find $file_path/PO/PO_* -mmin +240 -a \( -name 'PO_DEF*' -o -name 'PO_ABCLOAD*' \) | wc -l)
Alternative file-name matching in Bash
One last thing ...
If using bash, you can use brace expansion to match filenames, as in
PO_COUNT=$(find $file_path/PO/PO_{ABCLOAD,DEF}* -mmin +240 | wc -l)
Although this is slightly more concise, I don't think it is compatible with all shells.
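Putting the pieces together, here is a minimal sketch of the original script with the above fixes applied ($(...) command substitution, -mmin instead of -mtime, and -name tests for the two prefixes); the paths and variable names are the OP's:
file_path=/data/output
# count matching files modified more than four hours (240 minutes) ago
PO_count=$(find "$file_path"/PO/ -mmin +240 \( -name 'PO_ABCLOAD*.txt' -o -name 'PO_DEF*.txt' \) | wc -l)
if [ "$PO_count" -eq 0 ]
then
    find "$file_path"/PO/ -mmin +240 -name '*.txt' -exec ls -ltr {} + > /logs/test/PO_list.txt
fi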

how to grep a large number of files?

I am trying to grep 40k files in the current directory and I am getting this error.
for i in $(cat A01/genes.txt); do grep $i *.kaks; done > A01/A01.result.txt
-bash: /usr/bin/grep: Argument list too long
How does one normally grep thousands of files?
Thanks
Upendra
This makes David sad...
Everyone so far is wrong (except for anubhava).
Shell scripting is not like any other programming language because much of the interpretation of lines comes from the power of the shell interpolating them before the command is actually executed.
Let's take something simple:
$ set -x
$ ls
+ ls
bar.txt foo.txt fubar.log
$ echo The text files are *.txt
+ echo The text files are bar.txt foo.txt
The text files are bar.txt foo.txt
$ set +x
$
The set -x allows you to see how the shell actually interpolates the glob and then passes the result to the command. The lines starting with + show what is actually being executed.
You can see that the echo command isn't interpreting the *. Instead, the shell grabs the * and replaces it with the names of the matching files. Then, and only then, does the echo command actually execute.
When you have 40K plus files, and you do grep *, you're expanding that * to the names of those 40,000 plus files before grep even has a chance to execute, and that's where the error message /usr/bin/grep: Argument list too long is coming from.
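That limit is imposed by the operating system when a program is executed; on most Unix-like systems you can inspect it with getconf:
getconf ARG_MAX   # maximum length, in bytes, of the argument list (plus environment) for exec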
Fortunately, Unix has a way around this dilemma:
$ find . -name "*.kaks" -type f -maxdepth 1 | xargs grep -f A01/genes.txt
The find . -name "*.kaks" -type f -maxdepth 1 will find all of your *.kaks files, and the -maxdepth 1 will only include files in the current directory. The -type f makes sure you only pick up files and not directories.
The find command pipes the names of the files into xargs, and xargs appends the names of the files to the grep -f A01/genes.txt command. However, xargs has a trick up its sleeve. It knows how long the command line buffer is, and will execute the grep when the command line buffer is full, then pass in another batch of files to grep. This way, grep gets executed maybe three or ten times (depending upon the size of the command line buffer), and all of our files are used.
Unfortunately, xargs uses whitespace as a separator for the file names. If your files contain spaces or tabs, you'll have trouble with xargs. Fortunately, there's another fix:
$ find . -name "*.kaks" -type f -maxdepth 1 -print0 | xargs -0 grep -f A01/genes.txt
The -print0 will cause find to print out the names of the files separated not by newlines, but by the NUL character. The -0 parameter to xargs tells xargs that the file separator isn't whitespace, but the NUL character. This fixes the issue.
You could also do this too:
$ find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This will execute grep for each and every file found, instead of what xargs does: running grep on all the files it can fit on the command line at once. The advantage of this is that it avoids shell interference entirely. However, it may or may not be less efficient.
What would be interesting is to experiment and see which one is more efficient. You can use time to see:
$ time find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This will execute the command and then tell you how long it took. Try it with the -exec and with xargs and see which is faster. Let us know what you find.
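For the xargs side of the comparison, the equivalent timing would be:
$ time find . -name "*.kaks" -type f -maxdepth 1 -print0 | xargs -0 grep -f A01/genes.txt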
You can combine find with grep like this:
find . -maxdepth 1 -name '*.kaks' -exec grep -H -f A01/genes.txt '{}' \; > A01/A01.result.txt
You can use the recursive feature of grep:
for i in $(cat A01/genes.txt); do
    grep -r "$i" .
done > A01/A01.result.txt
though if you want to select only .kaks files:
for i in $(cat A01/genes.txt); do
    find . -iregex '.*\.kaks$' -exec grep "$i" {} \;
done > A01/A01.result.txt
Put another for loop inside your outer one:
for f in *.kaks; do
    grep -H "$i" "$f"
done
By the way, are you interested in finding EVERY occurrence in each file, or merely whether the search string exists in there one or more times? If it is "good enough" to know the string occurs one or more times, you can specify -m 1 to grep and it will not bother reading/searching the rest of the file after finding the first match, which could potentially save lots of time.
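For example, combined with the nested loop above (GNU grep assumed for -m):
for i in $(cat A01/genes.txt); do
    for f in *.kaks; do
        grep -m 1 -H "$i" "$f"   # stop reading this file after its first match
    done
done > A01/A01.result.txt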
The following solution has worked for me:
Problem:
grep -r "example\.com" *
-bash: /bin/grep: Argument list too long
Solution:
grep -r "example\.com" .
["In newer versions of grep you can omit the “.“, as the current directory is implied."]
Source:
Reinlick, J. https://www.saotn.org/bash-grep-through-large-number-files-argument-list-too-long/

grep returns "Too many argument specified on command" [duplicate]

I am trying to list all the files we received in one month.
The filename pattern will be
20110101000000.txt
YYYYMMDDHHIISS.txt
The entire directory contains millions of files.
For one month there can be a minimum of 50,000 files.
The idea of using subdirectories is still pending.
Is there any way to list a huge number of files with almost-similar file names?
grep -l 20110101*
I am trying this and it returns an error.
I tried PHP and it took a huge amount of time; that's why I'm using a shell script. I don't understand why the shell is also not giving a result.
Any suggestions, please!
$ find ./ -name '20110101*' -type f -print0 | xargs -0 grep -l "search_pattern"
You can use find and xargs. xargs will run grep for each batch of files found by find. You can use -P to run multiple greps in parallel and -n to limit how many files are passed to each grep invocation. The -print0 argument to find separates each filename with a NUL character, to avoid confusion caused by any spaces in file names; note that it must come after the tests, or find will print entries before they are filtered. If you are sure there will not be any spaces, you can remove the -print0 and -0 arguments.
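For example, a sketch assuming GNU xargs, running four greps in parallel with at most 100 files each:
find ./ -name '20110101*' -type f -print0 | xargs -0 -P 4 -n 100 grep -l "search_pattern"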
This should be the faster way:
find . -name "20110101*" -exec grep -l "search_pattern" {} +
Should you want to avoid the leading dot:
find . -name "20110101*" -exec grep -l "search_pattern" {} + | sed 's/^.\///'
or better thanks to adl:
find . -name "20110101*" -exec grep -l "search_pattern" {} + | cut -c3-
The 20110101* is getting expanded by your shell before getting passed to the command, so you're getting one argument passed for every file in the dir that starts with 20110101.
If you just want a list of matching files you can use find:
find . -name "20110101*"
(note that this will search every subdirectory also)
Some in-depth information is available here, along with another work-around: for FILE in 20110101*; do grep foo ${FILE}; done. Most people will go with xargs, and more seasoned admins with -exec {} +, which accomplishes exactly the same thing, except it is shorter to type. One would use the inline shell for construct when running more processes is less important than seeing the results. With the for construct you may end up running grep thousands of times, but you see each match in real time, while with find and/or xargs you see batched results; however, grep is run significantly fewer times.
You need to put in a search term, so:
grep -l "search term" 20110101*
If you want to just find the files, use ls 20110101*.
Just pipe the output of ls to grep: ls | grep '^20110101'

using output of grep command to find command

I have a problem related to searching a pattern among several files.
I want to search for the "Logger." pattern in jsp files, so I used the command
grep -ir Logger. * | find . -name *.jsp
Now the problem I am facing is that this command lists all the jsp files; it doesn't search for the pattern "Logger." in the jsp files and list only those.
I just want the jsp files in which "Logger." is present.
Start like this. You want to search in jsp files:
find . -name "*.jsp"
The above will output all the jsp files recursively from the current directory, like below:
1/2/ahbd.jsp
befwej/dg/wefwefw/wefwefwe/ijn.jsp
And now you want to find the string in just these files.
grep -ir Logger. (output of find)
so the actual complete command becomes:
find . -name "*.jsp" | xargs grep -ir 'Logger.'
The magic here is done by xargs, which takes the output of find and passes it to grep as file arguments.
If you removed xargs, grep would search its standard input (the list of file names itself) rather than the contents of those files.
There are several other ways to do this, but I feel more comfortable using this one regularly.
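If file names may contain spaces, a NUL-delimited sketch of the same pipeline (assuming GNU find and xargs; the dot is escaped so it matches literally, as discussed below) is safer:
find . -name "*.jsp" -print0 | xargs -0 grep -il 'Logger\.'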
To recursively find all *.jsp files containing the string Logger. you can do:
find . -type f -name '*.jsp' -exec grep -l "Logger\." {} \;
grep -l means to print only the file name if the file contains the string.
The -exec switch of find will execute the given command for each file matching the other criteria (-type f and -name '*.jsp'). The string {} is substituted by the filename. Some versions of find also support ending the command with + instead of \;, which feeds several file names to the command at once (like xargs does) rather than one at a time, e.g.:
find . -type f -name '*.jsp' -exec grep -l "Logger\." {} +
You can just use grep for that, here's a command that should give you the results:
grep -ir "Logger\." * | grep ".jsp"
The problem is that grep will bail out if you use "*.jsp" instead of "*" and there is not at least one .jsp file in your root directory, so we have to tell it to look at every file.
Since you give grep the -r (recursive) argument, it will walk the subdirectories to find the pattern "Logger.", and the second grep will then only display the .jsp files. Note that -i tells grep not to care about letter case, which may not be what you want.
edit: following John's answer: we have to escape the . to prevent it from being taken as a regexp.
re-edit: actually, I think that using find is better, since it will filter the jsp files directly instead of grepping all the files:
find . -name "*.jsp" -exec grep -i "Logger\." {} \;
(You don't need the -r anymore, since find takes care of the recursion.)
If you have bash 4+
shopt -s globstar
shopt -s nullglob
for file in **/*.jsp
do
    if grep -q "Logger\." "$file"; then
        echo "found in $file"
    fi
    # or just grep -l "Logger\." "$file"
done
