Grep through the results of a 'find' command - bash

I am trying to do a simple search through files.
Find all files that match a name pattern
Grep through results of step 1 and find only files whose contents have a specific string
I tried,
find . -name rio.yml -exec grep "my pattern" \;
What's the best practice for something like this?

If you just want the paths that contain the match, do:
find . -name rio.yml -type f -exec grep -q "my pattern" {} \; -print
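(Given that you're already filtering on the name, the -type f may be redundant, but I find it helpful when grepping.) Here grep -q prints nothing and acts only as a test, so -print fires only for files that contain the pattern. You could use grep -l instead, but this form makes it easy to switch to -print0 and pipe the results to xargs -0, so it is a good pattern.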
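As a rough sketch of that pipeline (the scratch directory and file contents are invented for the demonstration), swapping -print for -print0 lets xargs -0 receive the matching paths intact even when they contain spaces. Both -print0 and xargs -0 are extensions found in GNU and BSD tools, so treat their availability as an assumption.

```shell
# Build a throwaway tree: two rio.yml files, only one contains the pattern.
demo=$(mktemp -d)
mkdir -p "$demo/with space" "$demo/other"
printf 'name: x\nmy pattern here\nmore\n' > "$demo/with space/rio.yml"
printf 'nothing to see\n' > "$demo/other/rio.yml"

# grep -q is a silent test; only files where it succeeds reach -print0.
# xargs -0 then hands the NUL-separated paths to another command (wc -l here).
line_counts=$(find "$demo" -name rio.yml -type f \
    -exec grep -q "my pattern" {} \; -print0 | xargs -0 wc -l)
printf '%s\n' "$line_counts"

rm -rf "$demo"
```

Any command can stand in for wc -l; the point is only that the filter and the follow-up action stay separate.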

To get the filename which contains some string, you need to use grep -l
find . -name rio.yml -exec grep -l "my pattern" {} \;
To get the full path of each file, use "$(pwd)" in place of the search directory (the . above).
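A minimal sketch of the absolute-path variant, run against a made-up scratch tree (the directory layout and file contents are assumptions for illustration):

```shell
# Throwaway tree with one matching file.
demo=$(mktemp -d)
mkdir -p "$demo/app"
printf 'my pattern\n' > "$demo/app/rio.yml"

# Starting from "$(pwd)" instead of "." makes find emit absolute paths.
# The command substitution runs in a subshell, so the cd does not leak out.
found=$(cd "$demo" && find "$(pwd)" -name rio.yml -exec grep -l "my pattern" {} \;)
printf '%s\n' "$found"

rm -rf "$demo"
```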

Related

Script to find recursively the number of files with a certain extension

We have a highly nested directory structure, where we have a directory, let's call it 'my Dir', appearing many times in our hierarchy. I am interested in counting the number of "*.csv" files in all directories named 'my Dir' (yes, there is a whitespace in the name). How can I go about it?
I tried something like this, but it does not work:
find . -type d -name "my Dir" -exec ls "{}/*.csv" \; | wc -l
If you want the number of files matching the pattern '*.csv' under "my Dir", then:
don't ask for -type d; ask for -type f
don't ask for -name "my Dir" if you really want -name '*.csv'
don't try to ls *.csv on each match, because if there are N csv files in a directory, you would potentially count each one N times
also beware of embedding {} inside a larger -exec argument such as "{}/*.csv": find does not run a shell, so the * is never expanded, and the handling of an embedded {} is not portable
For counting files from find, I like to use a trick I learned from Stéphane Chazelas on U&L; for example, from: Counting files in Linux:
find "my Dir" -type f -name '*.csv' -printf . | wc -c
This requires GNU find, as -printf is a GNU extension to the POSIX standard.
It works by looking within "my Dir" (from the current working directory) for files that match the pattern; for each matching file, it prints a single dot (period); that's all piped to wc, which counts the number of characters (periods) that find produced, i.e. the number of matching files.
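To see why counting dots is more robust than counting printed names, here is a small sketch on an invented scratch tree in which one file name contains a newline. It assumes GNU find, since it relies on -printf.

```shell
# Scratch tree: three .csv files under "my Dir", one with a newline in its name.
demo=$(mktemp -d)
mkdir -p "$demo/my Dir/nested"
newline='
'
touch "$demo/my Dir/one.csv" "$demo/my Dir/notes.txt"
touch "$demo/my Dir/nested/two.csv" "$demo/my Dir/nested/odd${newline}name.csv"

# One dot per matching file, counted as characters (GNU find only).
dots=$(cd "$demo" && find "my Dir" -type f -name '*.csv' -printf . | wc -c)

# Counting lines of printed names over-counts the file with the newline.
lines=$(cd "$demo" && find "my Dir" -type f -name '*.csv' | wc -l)

echo "dot count: $dots, line count: $lines"
rm -rf "$demo"
```

The dot count reports the three real files, while the line count is thrown off by the embedded newline.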
You would exclude all paths that are not under my Dir:
find . -type f -not '(' -not -path '*/my Dir/*' -prune ')' -name '*.csv'
Another solution is to use the -path predicate to select your files.
find . -path '*/my Dir/*.csv'
Counting the number of occurrences could be a simple matter of piping to wc -l, though this will obviously produce the wrong result if some of the files contain newlines in their names. (This is slightly pathological, but definitely something you want to cover in production code.) A common arrangement is to just print a newline for every found file, instead of its name.
find . -path '*/my Dir/*.csv' -printf '.\n' | wc -l
(The -printf predicate is not in POSIX but it's not hard to replace with an -exec or similar.)
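One possible -exec replacement, sketched here on a made-up directory tree, is a tiny inline sh script that prints one fixed line per file it receives. This is only one of several ways to do it, not necessarily what the answer's author had in mind.

```shell
# Scratch tree: two "my Dir" directories with three .csv files in total,
# plus one .csv elsewhere that must not be counted.
demo=$(mktemp -d)
mkdir -p "$demo/x/my Dir" "$demo/y/z/my Dir" "$demo/y/other"
newline='
'
touch "$demo/x/my Dir/a.csv" "$demo/x/my Dir/b${newline}c.csv"
touch "$demo/y/z/my Dir/d.csv" "$demo/y/other/ignored.csv"

# For each batch of matches, the inline script prints one line per file,
# regardless of which characters the file names contain.
count=$(find "$demo" -path '*/my Dir/*.csv' -type f \
    -exec sh -c 'for f in "$@"; do echo x; done' sh {} + | wc -l)
echo "$count"

rm -rf "$demo"
```

The trailing sh after the script fills $0, so the file names land in "$@" where the loop expects them.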

Finding all PHP files within a certain directory containing a string

I'm wondering if someone can help me out.
I'm currently using the following to find all PHP files in a certain directory:
find /home/mywebsite -type f -name "*.php"
How would I extend that to search through those PHP files and get all files containing the string base64_decode?
Any help would be great.
Cheers,
find /home/mywebsite -type f -name '*.php' -exec grep -l base64_decode {} +
The -exec option to find executes a command on the files found. {} is replaced by the file names, and the + means that find appends as many file names as fit onto a single command line, so grep runs once per batch rather than once per file. grep looks for a string in each file, and the -l option tells it to print just the file name when there's a match, not all the matching lines.
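The difference between the two -exec terminators can be observed directly. In this sketch (scratch files invented for the purpose), a small sh script stands in for grep and reports how many file names each invocation received.

```shell
demo=$(mktemp -d)
touch "$demo/a.php" "$demo/b.php" "$demo/c.php"

# With \; the command runs once per file: three lines of output.
runs_each=$(find "$demo" -type f -name '*.php' \
    -exec sh -c 'echo "got $# file(s)"' sh {} \; | wc -l)

# With + the names are batched: a single run receives all three files.
runs_batched=$(find "$demo" -type f -name '*.php' \
    -exec sh -c 'echo "got $# file(s)"' sh {} + | wc -l)

printf 'per-file runs: %s, batched runs: %s\n' "$runs_each" "$runs_batched"
rm -rf "$demo"
```

With very many files, find may split the names across several batches, but the number of runs stays far below one per file.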
If you're getting an error from find, you may have an old version that doesn't support the + feature of -exec. Use this command instead:
find /home/mywebsite -type f -name '*.php' | xargs grep -l base64_decode
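xargs reads items from its standard input and passes them as arguments to the command named on its own command line. By default it splits its input on whitespace, so this form misbehaves on file names that contain spaces.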
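If file names may contain spaces, a NUL-separated variant of the pipeline is a common workaround. The sketch below uses invented file names and assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do).

```shell
demo=$(mktemp -d)
mkdir -p "$demo/old site"
printf '<?php echo base64_decode($x);\n' > "$demo/old site/legacy code.php"
printf '<?php echo "clean";\n' > "$demo/index.php"

# NUL-separated names survive spaces; a plain "find | xargs" would split
# "legacy code.php" into two bogus arguments.
hits=$(find "$demo" -type f -name '*.php' -print0 | xargs -0 grep -l base64_decode)
printf '%s\n' "$hits"

rm -rf "$demo"
```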

Grepping from a text file list

I know I can find specific types of files and then grep them in one shot, i.e.
find . -type f -name "*.log" -exec grep -o "some-pattern" {} \;
But I need to do this in two steps. This is because the find operation is expensive (there are lots of files and subdirectories to search). I'd like to save down the file-list to a text file, and then repeatedly grep for different patterns on this precomputed set of files whenever I need to. The first part is easy:
find . -type f -name "*.log" > my-file-list.txt
Now I have a file that looks like this:
./logs/log1.log
./logs/log2.log
etc
What does the grep look like? I've tried a few combinations but can't get it right.
xargs grep "your pattern" < my-file-list.txt
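The two-step workflow might look like the following sketch. The log contents, the scratch directory, and the helper name search_list are all made up for illustration; converting newlines to NULs is an optional extra that keeps paths with spaces intact (it cannot help with newlines inside names).

```shell
demo=$(mktemp -d)
mkdir -p "$demo/logs/app one"
printf 'ERROR disk full\n' > "$demo/logs/log1.log"
printf 'WARN slow query\nERROR timeout\n' > "$demo/logs/app one/log2.log"

# Expensive step, done once.
find "$demo" -type f -name '*.log' > "$demo/my-file-list.txt"

# Cheap step, repeated per pattern (search_list is a hypothetical helper).
search_list() {
    tr '\n' '\0' < "$demo/my-file-list.txt" | xargs -0 grep -l -- "$1"
}

error_files=$(search_list 'ERROR' | wc -l)
warn_files=$(search_list 'WARN' | wc -l)
echo "files with ERROR: $error_files, files with WARN: $warn_files"

rm -rf "$demo"
```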

How to return the absolute path of recursively matched arguments? (BASH)

OK, so simple enough.. I want to recursively search a directory for files with a specific extension - and then perform an action on those files.
# pwd ENTER
/dir
# ls -R | grep .txt | xargs -I {} open {} ENTER
The file /dir/reallyinsubfolder.txt does not exist. ⬅ fails (bad)
Not output, but succeeds.. /dir/fileinthisfolder.txt ⬅ opens silently (good)
This does find ALL the files I am interested in… but only opens those which happen to be "1-level" deep. In this case, the attempt to open /dir/reallyinsubfolder.txt fails, as reallyinsubfolder.txt is actually /dir/sub/reallyinsubfolder.txt.
I understand that grep is simply returning the matched filename… which then chokes (in this case), the open command, as it fails to reach down to the correct sub-directory to execute the file..
How do I get grep to return the full path of a match?
How about using the find command -
find /path/to/dir -type f -iname "*.txt" -exec action to perform {} \;
find . -name '*.txt' -exec open {} \;
(The pattern is quoted and the semicolon is escaped so that the shell passes both to find unchanged.)
I believe you're asking the wrong question; parsing ls(1) output in this fashion is far more trouble than it is worth.
What would work far more reliably:
find /dir -name '*.txt' -print0 | xargs -0 open
or
find /dir -name '*.txt' -exec open {} \;
find(1) does not mangle names nearly as much as ls(1) and makes executing programs on matched files far more reliable.
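The failure mode from the question can be reproduced with a small sketch. The file names mirror the question, the scratch directory is invented, and a simple existence check stands in for open, which may not be present on every system.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/fileinthisfolder.txt" "$demo/sub/reallyinsubfolder.txt"

# ls -R prints bare names, so the nested file cannot be located from $demo.
missing=$(cd "$demo" && ls -R | grep '\.txt$' | while IFS= read -r name; do
    [ -e "$name" ] || echo "$name"
done)

# find prints each path relative to its starting point, so every path resolves.
resolved=$(find "$demo" -name '*.txt' -exec sh -c \
    'for f in "$@"; do [ -e "$f" ] && echo "$f"; done' sh {} + | wc -l)

echo "unreachable via ls: $missing; reachable via find: $resolved"
rm -rf "$demo"
```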

Find files containing a given text

In bash I want to return file name (and the path to the file) for every file of type .php|.html|.js containing the case-insensitive string "document.cookie" | "setcookie"
How would I do that?
egrep -ir --include=*.{php,html,js} "(document.cookie|setcookie)" .
The r flag means to search recursively (search subdirectories). The i flag means case insensitive.
If you just want file names add the l (lowercase L) flag:
egrep -lir --include=*.{php,html,js} "(document.cookie|setcookie)" .
Try something like grep -r -n -i --include="*.html" --include="*.php" --include="*.js" searchstringhere .
the -i makes it case insensitive
the . at the end means you want to start from your current directory, this could be substituted with any directory.
the -r means do this recursively, right down the directory tree
the -n prints the line number for matches.
the --include restricts the search to files whose names match a glob; give one --include per pattern, because a single quoted string such as "*.html *.php *.js" is treated as one glob
For more info see: http://www.gnu.org/software/grep/
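Putting those options together, a hedged sketch against invented sample files could look like this. It assumes a grep that supports -r and --include (GNU grep does), and it adds -l and -E to list only file names and to allow alternation.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/js"
printf 'var c = Document.Cookie;\n' > "$demo/js/app.js"
printf '<?php SetCookie("a", "b");\n' > "$demo/login.php"
printf 'document.cookie in a text file\n' > "$demo/notes.txt"
printf '<p>nothing here</p>\n' > "$demo/index.html"

# One --include per glob, each quoted so the shell leaves the * alone.
# -E enables the a|b alternation, and \. matches a literal dot.
hits=$(grep -rliE --include='*.php' --include='*.html' --include='*.js' \
    'document\.cookie|setcookie' "$demo" | sort)
printf '%s\n' "$hits"

rm -rf "$demo"
```

The text file is skipped even though it contains the string, because its name matches none of the three globs.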
find them and grep for the string:
This will find all files of your 3 types in /starting/path and grep for the regular expression '(document\.cookie|setcookie)'. Split over 2 lines with the backslash just for readability...
find /starting/path -type f \( -name "*.php" -o -name "*.html" -o -name "*.js" \) | \
xargs egrep -i '(document\.cookie|setcookie)'
Sounds like a perfect job for grep or perhaps ack
Or this wonderful construction:
find . -type f \( -name '*.php' -o -name '*.html' -o -name '*.js' \) -exec grep "document.cookie\|setcookie" /dev/null {} \;
find . -type f \( -name '*.php' -o -name '*.js' -o -name '*.html' \) |\
xargs grep -liE 'document\.cookie|setcookie'
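When several -name tests are joined with -o, grouping matters, because the implicit "and" binds tighter than -o. The sketch below (scratch names invented) shows how an ungrouped -type f applies only to the first -name test.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/vendor.js"          # a directory whose name ends in .js
touch "$demo/index.php" "$demo/vendor.js/app.js"

# Ungrouped: parsed as (-type f -a -name '*.php') -o -name '*.js',
# so the vendor.js directory slips through.
ungrouped=$(find "$demo" -type f -name '*.php' -o -name '*.js' | wc -l)

# Grouped: -type f applies to both name tests.
grouped=$(find "$demo" -type f \( -name '*.php' -o -name '*.js' \) | wc -l)

echo "ungrouped: $ungrouped, grouped: $grouped"
rm -rf "$demo"
```

Passing a directory on to grep would at best produce an "Is a directory" complaint, so the parenthesized form is the safer habit.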
Just to include one more alternative, you could also use this:
find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \;
Where:
-regextype posix-extended tells find what kind of regex to expect
-regex "^.*\.(php|html|js)$" tells find the regex that the whole file path (not just the file name) must match
-exec grep -EH '(document\.cookie|setcookie)' {} \; tells find to run the command (with its options and arguments) specified between the -exec option and the \; for each file it finds, where {} represents where the file path goes in this command.
while
E option tells grep to use extended regex (to support the parentheses) and...
H option tells grep to print file paths before the matches.
And, given this, if you only want file paths, you may use:
find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \; | sed -r 's/(^.*):.*$/\1/' | sort -u
Where
| [pipe] sends the output of find to the next command after this (which is sed, then sort)
r option tells sed to use extended regex.
s/HI/BYE/ tells sed to replace the first occurrence (per line) of "HI" with "BYE" and...
s/(^.*):.*$/\1/ tells it to replace the regex (^.*):.*$ (meaning a group [stuff enclosed by ()] including everything [.* = zero or more of any character] from the beginning of the line [^] up to a ':', followed by anything up to the end of the line [$]) by the first group [\1] of the replaced regex. Because .* is greedy, the group runs to the last ':' on the line, so a matched line that itself contains a ':' leaves part of the match in the output.
u tells sort to remove duplicate entries (take sort -u as optional).
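To illustrate the trade-off between parsing grep -H output and asking grep for the paths directly, here is a sketch on one invented file whose matching line contains a colon. It uses sed -E, which GNU and BSD sed both accept in place of -r.

```shell
demo=$(mktemp -d)
printf 'setcookie("u", "http://example.com");\n' > "$demo/a.php"

# grep -H prefixes each match with "path:", but this line has its own colon.
# The greedy (^.*) runs to the LAST colon, so sed keeps too much.
parsed=$(grep -EH 'document\.cookie|setcookie' "$demo/a.php" |
    sed -E 's/(^.*):.*$/\1/')

# grep -l prints the path itself, so there is nothing to parse.
listed=$(find "$demo" -type f -name '*.php' \
    -exec grep -lE 'document\.cookie|setcookie' {} +)

printf 'sed result:  %s\ngrep -l:     %s\n' "$parsed" "$listed"
rm -rf "$demo"
```

grep -l also prints each file only once, which makes the sort -u step unnecessary for deduplication.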
...FAR from being the most elegant way. As I said, my intention is to increase the range of possibilities (and also to give more complete explanations on some tools you could use).

Resources