using output of grep command to find command - shell

I have a problem searching for a pattern across several files.
I want to search for the "Logger." pattern in jsp files, so I used the command
grep -ir Logger. * | find . -name *.jsp
The problem is that this command lists all the jsp files; it does not search for the pattern "Logger." in them and list only the matches.
I just want the jsp files that contain "Logger.".

Start like this.
You want to search in jsp files:
find . -name "*.jsp"
The above will output all the jsp files, recursively, from the current directory, like below:
./1/2/ahbd.jsp
./befwej/dg/wefwefw/wefwefwe/ijn.jsp
And now you want to find the string in just these files:
grep -i 'Logger\.' (output of find)
So the actual complete command becomes:
find . -name "*.jsp" | xargs grep -i 'Logger\.'
The magic here is done by xargs.
It reads the output of find from its standard input and passes the file names to grep as command-line arguments.
If you remove xargs, grep receives the file names on standard input as plain text rather than as files to open, so the contents of the jsp files are never searched.
There are several other ways to do this, but I feel more comfortable using this one regularly.
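The find-plus-xargs pipeline can be tried on a throwaway directory tree. This is a minimal sketch: the directory layout and file names are made up for illustration, and it adds grep's -l (names only) and -F (literal string) options, which the answer above does not use.

```shell
# Build a small sample tree (hypothetical file names, for illustration only).
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
printf 'Logger.info("hi");\n' > "$demo/a/b/uses.jsp"
printf '<p>no logging</p>\n'  > "$demo/a/plain.jsp"
printf 'Logger.info("hi");\n' > "$demo/a/Other.java"

# find lists the .jsp files; xargs turns that list into arguments for grep.
# -l prints only the names of matching files; -F treats "Logger." literally.
matches=$(cd "$demo" && find . -name '*.jsp' | xargs grep -lF 'Logger.')
echo "$matches"

rm -rf "$demo"
```

Only the jsp file that contains the string is reported; the .java file is never handed to grep because find filtered it out.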

To recursively find all *.jsp files containing the string Logger. you can do:
find . -type f -name '*.jsp' -exec grep -l "Logger\." {} \;
grep -l means to print only the file name if the file contains the string.
The -exec switch of find will execute the given command for each file matching the other criteria (-type f and -name '*.jsp'). The string {} is substituted by the file name. Most versions of find also support ending the command with + instead of \; to feed several file names to the command at once (like xargs does) rather than one per invocation, e.g.:
find . -type f -name '*.jsp' -exec grep -l "Logger\." {} +
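The difference between the two -exec terminators can be made visible by counting how many times the command runs. A rough sketch, assuming a find that supports the + terminator; echo stands in for grep, and the three file names are invented for the demo.

```shell
demo=$(mktemp -d)
touch "$demo/one.jsp" "$demo/two.jsp" "$demo/three.jsp"

# With \; the command runs once per file, so echo prints one line per file.
per_file=$(find "$demo" -type f -name '*.jsp' -exec echo run {} \; | wc -l | tr -d ' ')

# With + the file names are batched into a single invocation here,
# so echo prints a single line containing all three names.
batched=$(find "$demo" -type f -name '*.jsp' -exec echo run {} + | wc -l | tr -d ' ')

printf 'invocations with \\;: %s, with +: %s\n' "$per_file" "$batched"
rm -rf "$demo"
```

With many thousands of files, find may still split the + form into several invocations, much as xargs does.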

You can just use grep for that; here's a command that should give you the results:
grep -ir "Logger\." * | grep "\.jsp"
The problem is that grep will bail out if you use "*.jsp" instead of "*" and there is not at least one .jsp file in your root directory, so we have to tell it to look at every file.
Since you give grep the -r (recursive) argument, it will walk the subdirectories to find the pattern "Logger.", then the second grep will only display the .jsp files. Note that -i tells grep to ignore letter case, which may not be what you want.
edit: following John's answer: we have to escape the . to prevent it from being treated as a regexp metacharacter.
re-edit: actually, I think that using find is better, since it will filter the jsp files directly instead of grepping all the files:
find . -name "*.jsp" -exec grep -il "Logger\." {} \;
(You don't need the -r anymore since find takes care of recursion; -l prints just the names of the matching files.)

If you have bash 4+
shopt -s globstar
shopt -s nullglob
for file in **/*.jsp
do
    # -F treats "Logger." as a literal string, so the dot is not a wildcard
    if grep -qF "Logger." "$file"; then
        echo "found in $file"
    fi
    # or just: grep -lF "Logger." "$file"
done

Related

Get all occurrences of a string within a directory(including subdirectories) in .gz file using bash?

I want to find all the occurrences of "getId" inside a directory which has subdirectories as follows:
*/*/*/*/*/*/myfile.gz
I tried this: find -name *myfile.gz -print0 | xargs -0 zgrep -i "getId", but it didn't work. Can anyone tell me the best and simplest approach to get this?
find ./ -name '*gz' -exec zgrep -aiH 'getSorById' {} \;
find allows you to execute a command on each file using "-exec"; it replaces "{}" with the file name, and you terminate the command with "\;".
I added "-H" to zgrep so it also prints out the file path when it has a match, as it's helpful. "-a" treats binary files as text (since you might get tar-ed gzipped files).
Lastly, it's best to quote your strings in case bash starts globbing them.
https://linux.die.net/man/1/grep
https://linux.die.net/man/1/find
Use the following find approach:
find . -name '*myfile.gz' -exec zgrep -ai 'getSORByID' {} \;
This will print all lines containing the getSORByID substring (matched case-insensitively because of -i).
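To check the quoting advice on real data, the find/zgrep combination can be run against a couple of small compressed files. This is only a sketch: it assumes gzip and zgrep are installed, and the directory names, file contents and the search string getId (taken from the question) are used purely for illustration.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
printf 'int getId() { return id; }\n' | gzip > "$demo/a/b/myfile.gz"
printf 'nothing relevant\n'           | gzip > "$demo/a/myfile.gz"

# Quote the pattern so the shell does not expand it before find sees it.
# -H prefixes each match with its file name; -i ignores case.
hits=$(cd "$demo" && find . -type f -name '*myfile.gz' -exec zgrep -iH 'getId' {} \;)
echo "$hits"

rm -rf "$demo"
```

Only the archive whose uncompressed content contains the string is reported, prefixed by its path.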

how to grep large number of files?

I am trying to grep 40k files in the current directory and I am getting this error:
for i in $(cat A01/genes.txt); do grep $i *.kaks; done > A01/A01.result.txt
-bash: /usr/bin/grep: Argument list too long
How does one normally grep thousands of files?
Thanks
Upendra
This makes David sad...
Everyone so far is wrong (except for anubhava).
Shell scripting is not like any other programming language because much of the interpretation of lines comes from the power of the shell interpolating them before the command is actually executed.
Let's take something simple:
$ set -x
$ ls
+ ls
bar.txt foo.txt fubar.log
$ echo The text files are *.txt
+ echo The text files are bar.txt foo.txt
The text files are bar.txt foo.txt
$ set +x
+ set +x
$
The set -x allows you to see how the shell actually interpolates the glob and then passes the result to the command as input. The line starting with + shows the command as it is actually executed, after expansion.
You can see that the echo command isn't interpreting the *. Instead, the shell grabs the * and replaces it with the names of the matching files. Then and only then does the echo command actually run.
When you have 40K plus files, and you do grep *, you're expanding that * to the names of those 40,000 plus files before grep even has a chance to execute, and that's where the error message /usr/bin/grep: Argument list too long is coming from.
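The size of that limit can be inspected directly. A small sketch, assuming a system that provides getconf; the two .kaks files are stand-ins created just for the measurement.

```shell
# Upper bound, in bytes, on the arguments plus environment of a new process.
limit=$(getconf ARG_MAX)

demo=$(mktemp -d)
touch "$demo/a.kaks" "$demo/b.kaks"

# printf is a shell builtin in bash and dash, so expanding the glob here does
# not start a new process and is not subject to the limit. Counting the bytes
# gives a rough idea of how much argument space the expansion would need.
needed=$(cd "$demo" && printf '%s\n' *.kaks | wc -c | tr -d ' ')

printf 'ARG_MAX: %s bytes, glob expansion: about %s bytes\n' "$limit" "$needed"
rm -rf "$demo"
```

When the second number approaches the first, passing the glob to an external command such as grep fails with "Argument list too long".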
Fortunately, Unix has a way around this dilemma:
$ find . -maxdepth 1 -type f -name "*.kaks" | xargs grep -f A01/genes.txt
The find . -maxdepth 1 -type f -name "*.kaks" will find all of your *.kaks files, and the -maxdepth 1 will only include files in the current directory. The -type f makes sure you only pick up files and not directories.
The find command pipes the names of the files into xargs, and xargs appends the file names to the grep -f A01/genes.txt command. However, xargs has a trick up its sleeve. It knows how long the command line buffer is, and will execute grep when the command line buffer is full, then pass another series of files to the next grep. This way, grep gets executed maybe three or ten times (depending upon the size of the command line buffer), and all of our files are used.
Unfortunately, xargs uses whitespace as a separator for the file names. If your files contain spaces or tabs, you'll have trouble with xargs. Fortunately, there's another fix:
$ find . -maxdepth 1 -type f -name "*.kaks" -print0 | xargs -0 grep -f A01/genes.txt
The -print0 will cause find to print out the names of the files separated not by newlines, but by the NUL character. The -0 parameter tells xargs that the file separator isn't whitespace, but the NUL character. That fixes the issue.
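The NUL-separated form can be checked against a file name that contains a space. A minimal sketch, assuming find and xargs support -print0 and -0; the gene names and file names are invented, and -l is added so that only the matching file names are printed.

```shell
demo=$(mktemp -d)
printf 'geneA\n'     > "$demo/genes.txt"
printf 'geneA 0.5\n' > "$demo/with space.kaks"
printf 'geneB 0.7\n' > "$demo/plain.kaks"

# NUL-separated names survive the space in "with space.kaks" intact.
found=$(cd "$demo" && find . -maxdepth 1 -type f -name '*.kaks' -print0 |
        xargs -0 grep -l -f genes.txt)
echo "$found"

rm -rf "$demo"
```

Without -print0 and -0, xargs would split that name into two arguments and grep would report that neither exists.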
You could also do this:
$ find . -maxdepth 1 -type f -name "*.kaks" -exec grep -f A01/genes.txt {} \;
This runs grep once for each and every file found, whereas xargs runs grep only as many times as it takes to fit all the file names on the command line. The advantage of this is that it avoids shell interference entirely. However, it may or may not be less efficient.
What would be interesting is to experiment and see which one is more efficient. You can use time to see:
$ time find . -maxdepth 1 -type f -name "*.kaks" -exec grep -f A01/genes.txt {} \;
This will execute the command and then tell you how long it took. Try it with the -exec and with xargs and see which is faster. Let us know what you find.
You can combine find with grep like this:
find . -maxdepth 1 -name '*.kaks' -exec grep -H -f A01/genes.txt '{}' \; > A01/A01.result.txt
You can use the recursive feature of grep:
for i in $(cat A01/genes.txt); do
    grep -r "$i" .
done > A01/A01.result.txt
though if you want to select only kaks files:
for i in $(cat A01/genes.txt); do
    find . -iregex '.*\.kaks$' -exec grep "$i" {} \;
done > A01/A01.result.txt
Put another for loop inside your outer one:
for f in *.kaks; do
    grep -H "$i" "$f"
done
By the way, are you interested in finding EVERY occurrence in each file, or merely whether the search string exists in there one or more times? If it is "good enough" to know the string occurs one or more times, you can specify "-m 1" to grep and it will not bother reading/searching the rest of the file after finding the first match, which could potentially save lots of time.
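In GNU and BSD grep, the option that stops reading after a given number of matching lines is -m (--max-count); -n only adds line numbers to the output. A short sketch of the effect, using an invented sample file:

```shell
demo=$(mktemp -d)
printf 'geneA\ngeneA\ngeneA\n' > "$demo/sample.kaks"

all=$(grep -c 'geneA' "$demo/sample.kaks")         # counts every matching line
first=$(grep -m 1 -c 'geneA' "$demo/sample.kaks")  # stops after the first match

printf 'all matches: %s, with -m 1: %s\n' "$all" "$first"
rm -rf "$demo"
```

grep -l and grep -q also stop at the first match, which is enough when only the file name or the exit status matters.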
The following solution has worked for me:
Problem:
grep -r "example\.com" *
-bash: /bin/grep: Argument list too long
Solution:
grep -r "example\.com" .
["In newer versions of grep you can omit the “.“, as the current directory is implied."]
Source:
Reinlick, J. https://www.saotn.org/bash-grep-through-large-number-files-argument-list-too-long/

Copying list of files to a directory

I want to make a search for all .fits files that contain a certain text in their name and then copy them to a directory.
I can use a command called fetchKeys to list the files that contain say 'foo'
The command looks like this : fetchKeys -t 'foo' -F | grep .fits
This returns a list of .fits files that contain 'foo'. Great! Now I want to copy all of these to a directory /path/to/dir. There are too many files to do individually; I need to copy them all using one command.
I'm thinking something like:
fetchKeys -t 'foo' -F | grep .fits > /path/to/dir
or
cp fetchKeys -t 'foo' -F | grep .fits /path/to/dir
but of course neither of these works. Any other ideas?
If this is on Linux/Unix, can you use the find command? That seems very much like fetchKeys.
$ find . -name "*foo*.fits" -type f -print0 | while read -r -d $'\0' file
do
    basename=$(basename "$file")
    cp "$file" "$fits_dir/$basename"
done
The find command will find all files that match *foo*.fits in their name. The -type f says they have to be files and not directories. The -print0 means print out the files found, but separate them with the NUL character. Normally, the find command will simply return a file on each line, but what if the file name contains spaces, tabs, new lines, or even other strange characters?
The -print0 will separate out files with nulls (\0), and the read -d $'\0' file means to read in each file separating by these null characters. If your files don't contain whitespace or strange characters, you could do this:
$ find . -name "*foo*.fits" -type f | while read file
do
    basename=$(basename "$file")
    cp "$file" "$fits_dir/$basename"
done
Basically, you read each file found with your find command into the shell variable file. Then, you can use that to copy that file into your $fits_dir or where ever you want.
Again, maybe there's a reason to use fetchKeys, and it is possible to replace that find with fetchKeys, but I don't know that fetchKeys command.
Copy all files with the name containing foo to a certain directory:
find . -name "*foo*.fits" -type f -exec cp {} "/path/to/dir/" \;
Copy all files themselves containing foo to a certain directory (solution without xargs):
for f in `find . -type f -exec grep -l foo {} \;`; do cp "$f" /path/to/dir/; done
The find command has very useful arguments -exec, -print, -delete. They are very robust and eliminate the need to manually process the file names. The syntax for -exec is: -exec (what to do) \;. The name of the file currently processed will be substituted instead of the placeholder {}.
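The backtick loop above splits the output of find on whitespace, so it can misbehave when a file name contains a space. One possible alternative, sketched here under the assumption that chaining two -exec actions is acceptable, lets find do both the test and the copy; the directory and file names are made up for the demo.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dest"
printf 'has foo inside\n' > "$demo/src/my data.fits"
printf 'nothing here\n'   > "$demo/src/other.fits"

# The second -exec runs only for files where grep -q succeeded,
# and find passes each name as a single argument, spaces included.
find "$demo/src" -type f -exec grep -q foo {} \; -exec cp {} "$demo/dest/" \;

copied=$(ls "$demo/dest")
echo "$copied"
rm -rf "$demo"
```

Because -exec with \; is true only when the command exits successfully, the first -exec acts as a filter for the second.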
Other commands that are very useful for such tasks are sed and awk.
The xargs tool can run a command on the lines it reads from stdin. This time, we run a cp command:
fetchKeys -t 'foo' -F | grep '\.fits' | xargs -n 500 cp -vfa -t /path/to/dir
xargs is a very useful tool, although its options are not really trivial. This command reads the .fits file names in groups of up to 500 and calls a single cp for each group; -t (a GNU cp option) names the target directory first so that the file names can be appended at the end. Combining -n with --replace would not batch anything, because --replace makes xargs run one command per input line. I haven't tested it thoroughly; if it doesn't work, leave a comment.

Removing files with a double quote in their name

I am trying to remove files within a directory. Some of the files have double-quotes around their name while others do not. An example of these files would be:
"DDD344".csv
D2DW.csv
Both these files are located in sub-directories within the directory YM.
To find such files and remove them, I invoke find like so:
find YM -name "*.csv" -print | xargs rm
The above command results in a lot of No such file or directory errors.
I tried using sed in the following way:
find yum/yum_hyd -name "\"*\".csv" | sed 's/"/\"/g' | xargs rm
but to no avail. How do I remove the files?
The problem is that you're using xargs. xargs is a horribly broken program that should never be used for anything except in conjunction with the nonstandard -0 option. Even so, I can't think of any advantages to doing that in this case. You should just execute rm directly from find.
find . -type f -name '"*".csv' -exec rm -f -- {} +
Will work. If you have GNU find, you may also use -delete.
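The effect of handing the names straight from find to rm can be verified on a scratch copy of the layout from the question. A minimal sketch: the YM directory and the two csv names follow the question, while the extra keep.txt file is added only to show what survives. The pattern here is '*.csv', as in the question, so both the quoted and the unquoted csv files are removed.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/YM/sub"
touch "$demo/YM/sub/\"DDD344\".csv" "$demo/YM/sub/D2DW.csv" "$demo/YM/sub/keep.txt"

# find passes every name to rm as one intact argument, quotes and all.
find "$demo/YM" -type f -name '*.csv' -exec rm -f -- {} +

left=$(ls "$demo/YM/sub")
echo "$left"
rm -rf "$demo"
```

No quoting or sed escaping is needed because the file names never pass through a shell or through the xargs parser.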
try this:
find yum/yum_hyd -name "\"*\".csv" |sed 's/"/\\"/g'|xargs rm
explanation:
you want to replace " with \". but if you write \" directly, sed considers it as plain ", you have to escape the backslash. so \\" works.
I wasn't aware of this option until recently but you can list the inode of the file in the following way:
$ ls -il
In the output you will see that the first column contains the inode value. You can then use that value to find -inum the offending files and remove them.
Output
2616366 -rw-r--r-- 1 etc etc
$ find . -inum 2616366 -exec rm -f {} \;
This will remove the file with that specific inum.
As a test you can run the following to locate your files.
ls -il \"* | awk '{print $1}' | xargs -n1 -I {} find -inum {}
Replace the final portion of this command (the "find -inum {}") with the "rm" command once you are satisfied.
This is also similar to the question on SuperUser

How do I grab the filename of the file containing a certain string when there are hundreds of files?

I have a folder with 200 files in it. We can say that the files are named "abc0" to "abc199". Five of these files contain the string "ez123" but I don't know which ones. My current attempt to find the file names of the files that contain the string is:
#!/bin/sh
while read FILES
do
cat $FILES | egrep "ez123"
done
I have a file that contains the filenames of all files in the directory. So I then execute:
./script < filenames
This verifies for me that files containing the string exist, but I still don't have the names of the files. Are there any ideas concerning the best way to accomplish this?
Thanks
you can try
grep -l "ez123" abc*
find /directory -maxdepth 1 -type f -exec fgrep -l 'ez123' \{\} \;
(-maxdepth 1 is only necessary if you only want to search the directory and not the tree recursively (if there's any)).
fgrep (the same as grep -F) searches for a fixed string rather than a regular expression, so it can be a bit faster than grep. -l lists the matched file names only.
Try
find -type f -exec grep -qs "ez123" {} \; -print
This will use find to find all real files in current directory (and subdirectories), execute grep on them ({} will be replaced by file name, -qs tells it to be silent and just set an exit code), -print will print out the names of the files that grep found a matching line in.
What about:
xargs egrep -l ez123
That reads filenames from stdin and prints out the filenames with matches.
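Applied to the setup in the question, the list of names would be redirected into xargs. A small sketch, assuming the names in the list contain no whitespace or quotes; the two data files and their contents are invented, and plain grep is used in place of egrep since the pattern is a simple string.

```shell
demo=$(mktemp -d)
printf 'nothing\n'    > "$demo/abc0"
printf 'ez123\n'      > "$demo/abc1"
printf 'abc0\nabc1\n' > "$demo/filenames"

# xargs reads the names from the list and hands them to grep as arguments;
# -l makes grep print only the names of the files that contain the string.
result=$(cd "$demo" && xargs grep -l 'ez123' < filenames)
echo "$result"

rm -rf "$demo"
```

This replaces the whole while/cat loop from the question with a single grep invocation.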

Resources