Find files containing a given text - bash

In bash I want to return the file name (and the path to the file) for every file of type .php|.html|.js containing the case-insensitive string "document.cookie" or "setcookie"
How would I do that?

egrep -ir --include=*.{php,html,js} "(document.cookie|setcookie)" .
The r flag means to search recursively (search subdirectories). The i flag means case insensitive.
If you just want file names add the l (lowercase L) flag:
egrep -lir --include=*.{php,html,js} "(document.cookie|setcookie)" .
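egrep is simply the traditional name for grep -E, so the same search can also be written as:
grep -Elir --include=*.{php,html,js} "(document.cookie|setcookie)" .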

Try something like grep -r -n -i --include="*.html" --include="*.php" --include="*.js" searchstringhere .
the -i makes it case-insensitive
the . at the end means you want to start from your current directory; this could be substituted with any directory.
the -r means do this recursively, right down the directory tree
the -n prints the line number for matches.
each --include restricts the search to file names matching the given glob pattern (wildcards accepted); repeat it once per extension.
For more info see: http://www.gnu.org/software/grep/

find them and grep for the string:
This will find all files of your 3 types in /starting/path and grep for the regular expression '(document\.cookie|setcookie)'. Note the \( ... \) grouping, which makes -type f apply to all three -name tests rather than just the first. Split over 2 lines with the backslash just for readability...
find /starting/path -type f \( -name "*.php" -o -name "*.html" -o -name "*.js" \) | \
xargs egrep -i '(document\.cookie|setcookie)'
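If any of the paths may contain spaces or other unusual characters, the same idea can be made robust with a NUL-delimited stream (a sketch using find's -print0 and xargs -0):
find /starting/path -type f \( -name "*.php" -o -name "*.html" -o -name "*.js" \) -print0 | \
xargs -0 egrep -i '(document\.cookie|setcookie)'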

Sounds like a perfect job for grep or perhaps ack
Or this wonderful construction:
find . -type f \( -name '*.php' -o -name '*.html' -o -name '*.js' \) -exec grep "document.cookie\|setcookie" /dev/null {} \;
(The extra /dev/null argument makes grep always see more than one file, which forces it to print the file name in front of each match.)
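If you go the ack route instead, something along these lines should do it (a sketch assuming ack's built-in --php/--html/--js type filters; check ack --help-types for the names your version uses):
ack -il --php --html --js '(document\.cookie|setcookie)'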

find . -type f \( -name '*php' -o -name '*js' -o -name '*html' \) |\
xargs grep -liE 'document\.cookie|setcookie'

Just to include one more alternative, you could also use this:
find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \;
Where:
-regextype posix-extended tells find what kind of regex to expect
-regex "^.*\.(php|html|js)$" tells find the regex itself filenames must match
-exec grep -EH '(document\.cookie|setcookie)' {} \; tells find to run the command (with its options and arguments) specified between the -exec option and the \; for each file it finds, where {} represents where the file path goes in this command.
while
the -E option tells grep to use extended regexes (needed to support the parentheses), and...
the -H option tells grep to print the file path before each match.
And, given this, if you only want file paths, you may use:
find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \; | sed -r 's/(^.*):.*$/\1/' | sort -u
Where
| [pipe] sends the output of find to the next command after it (which is sed, then sort).
The -r option tells sed to use extended regexes.
s/HI/BYE/ tells sed to replace the first occurrence (per line) of "HI" with "BYE", and...
s/(^.*):.*$/\1/ tells it to replace the whole regex (^.*):.*$ - that is, a group [the part enclosed by ()] containing everything [.*] from the beginning of the line [^] up to a ':', followed by anything up to the end of the line [$] - with just the first group [\1], leaving only the file path.
The -u option tells sort to remove duplicate entries (take sort -u as optional).
...FAR from being the most elegant way. As I said, my intention is to increase the range of possibilities (and also to give more complete explanations on some tools you could use).
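If you only want the file paths, grep's -l option can also stand in for the whole sed/sort step; roughly:
find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -lE '(document\.cookie|setcookie)' {} +
(-l prints each matching file name once and suppresses the matches themselves, so no deduplication is needed.)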

Related

Bash script to return all elements given an extension, without using print flags

I want to create a shell script that searches inside all folders of the current directory and returns all files that satisfy some condition, but without using any print flag.
(Here the condition is to end with .py)
What I have done:
find . -name '*.py'| sed -n 's/\.py$//p'
The output:
./123
./test
./abc/dfe/test3
./testing
./test2
What I would like to achieve:
123
test
test3
testing
test2
Use -exec:
find . -name '*.py' -exec sh -c 'for f; do f=${f%.py}; echo "${f##*/}"; done' sh {} +
(Inside the inline script, ${f%.py} strips the trailing .py and ${f##*/} strips everything up to the last /, i.e. the directory part.)
If GNU basename is an option, you can simplify this to
find . -name '*.py' -exec basename -s .py {} +
POSIX basename is a little more expensive, as you'll have to call it on every file individually:
find . -name '*.py' -exec basename {} .py \;
Using GNU grep instead of sed:
find . -name '*.py' | grep -oP '[^/]+(?=\.py$)'
(-o prints only the matched part, and the Perl lookahead (?=\.py$) requires a trailing .py without including it in the match, so [^/]+ keeps just the final path component.)
If portability is not a concern, this is a very readable option:
find . -name '*.py' | xargs basename -a
This is also differentiated from chepner's answer in that it retains the .py file ending in the output.
I'm not familiar with the -exec flag, and I'm sure his one-liners can be customized to do the same, but I couldn't do so off the top of my head.
Chepner's version achieves the same with the small modification:
find . -name '*.py' -exec basename {} \;
if you want the literal output from find, i.e. if the dummy names (123, test, etc.) in your question were not actually meant to imply dropping the file endings.
find shows entries relative to where you ask it to search, so you can simply replace the . with a *:
find * -name '*.py'| sed -n 's/\.py$//p'
(Be aware that this skips top level hidden directories)
This might work for you (GNU parallel):
find . -name '*.py' 2>/dev/null | parallel echo "{/.}"
(In GNU parallel, {/.} is the input line with both the directory part and the extension removed.)

Shell command to list file names where the matching is found

I am trying to search through a list of binary files to find some keywords on Mac.
The following works to list out all the matches, but it doesn't show me the list of files where it is being found:
find . -type f -exec strings {} \;|grep "Bv9qtsZRgspQliITY4"
Is there any trick to do this?
Using -exec with a wee ‘script’:
find . -type f \
-exec sh -c 'strings "$1" | grep -q "Bv9qtsZRgspQliITY4"' -- {} \; \
-print
The above will print the paths of all the matching files. If you also want to print the matches you can use:
find . -type f \
-exec sh -c 'strings "$1" | grep "Bv9qtsZRgspQliITY4"' -- {} \; \
-print
This will, however, print the paths after the matches. If this is not desirable, then you can use:
find . -type f \
-print \
-exec sh -c 'strings "$1" | grep "Bv9qtsZRgspQliITY4"' -- {} \;
This, on the other hand, will print all paths, even non-matching ones. To print only matching paths and their matches:
find . -type f \
-exec sh -c 'strings "$1" | grep -q "Bv9qtsZRgspQliITY4"' -- {} \; \
-print \
-exec grep "Bv9qtsZRgspQliITY4" {} \;
This will run grep twice on matching files, which will make it slower. If this is a problem the matches can be stored in a variable, and if there are any the path printed first and then the matches. This is left as an exercise to the reader.*
* Let me know if I should post it here.
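For instance, a minimal sketch of that variable-based approach (assuming a POSIX sh) might look like:
find . -type f -exec sh -c '
  # run strings|grep once per file; grep exits non-zero when there is no match
  matches=$(strings "$1" | grep "Bv9qtsZRgspQliITY4") || exit
  printf "%s\n%s\n" "$1" "$matches"
' -- {} \;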
Try grep -rl "Bv9qtsZRgspQliITY4" .
Explanation of options:
-r: search recursively
-l: don't print the contents of the file, just print the filename.
Optionally, you might want to use -i to search case-insensitively.
The problem with your idea is that you're piping the output of strings into grep. The filename is only passed to strings, meaning that nothing that comes after strings knows the filename.
I'm not quite sure about portability, but if you are using GNU's version of grep, then you can use --files-with-matches
-l, --files-with-matches print only names of FILEs containing matches
Then you can use something like this:
grep --recursive --files-with-matches "Bv9qtsZRgspQliITY4" *
Well, if it's only to print names of files, don't use find but grep.
grep -arn . -e 'soloman'
./testo.txt:1:soloman
-a : Search in binary files
-r : recursive
And keep it simple.
If you don't want to see the words matched in your output simply add -l, --files-with-matches:
user@DESKTOP-RR909JI ~/projects/search
$ grep -arl . -e 'soloman'
./testo.txt
You can use
# this will list all the files containing given text in current directory
# i to ignore case
# l to list files with matches
# R read and process all files in that directory, recursively, following all symbolic links
grep -iRl "your-text-to-find" ./
# for case sensitive search
grep -Rl "your-text-to-find" ./

issue with piping find into sed (find and replace)

Here is my current code; my goal is to find every file in a given directory (recursively), replace "FIND" with "REPLACEWITH", and overwrite the files.
FIND='ALEX'
REPLACEWITH='<strong>ALEX</strong>'
DIRECTORY='/some/directory/'
find $DIRECTORY -type f -name "*.html" -print0 |
LANG=C xargs -0 sed -i "s|$FIND|$REPLACEWITH|g"
The error I am getting is:
sed: 1: "/some/directory ...": command a expects \ followed by text
As given in BashFAQ #21, you can use perl to perform search-and-replace operations with no potential for data being treated as code:
in="$FIND" out="$REPLACEWITH" find "$DIRECTORY" -type f -name '*.html' \
-exec perl -pi -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' '{}' +
If you want to include only files matching the FIND string, find can be told to only pass files which grep flags on to perl:
in="$FIND" out="$REPLACEWITH" find "$DIRECTORY" -type f -name '*.html' \
-exec grep -F -q -e "$FIND" '{}' ';' \
-exec perl -pi -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' '{}' +
Because grep is being used to evaluate individual files, it's necessary to use one grep call per file so its exit status can be evaluated on a per-file basis; thus the use of the less efficient -exec ... {} ';' action. For perl, it's possible to put multiple files to process on one command line, hence the use of -exec ... {} +.
Note that grep -F (fgrep) is line-oriented; if your FIND string contains multiple lines, then files containing any one of those lines will be passed to perl for replacements.
You can have find invoke sed directly although I think all the modification times on your files will be affected (which might matter or not):
find $DIRECTORY -type f -name "*.html" -exec sed -i "s|$FIND|$REPLACEWITH|g" '{}' ';'
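Incidentally, the error message in the question looks like BSD/macOS sed, whose -i option requires an explicit (possibly empty) backup-suffix argument; if that is the sed being used, the -exec form would be spelled roughly:
find "$DIRECTORY" -type f -name "*.html" -exec sed -i '' "s|$FIND|$REPLACEWITH|g" '{}' ';'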

Awk/Sed: How to do a recursive find/replace of a string in files with a certain file extension?

I need to recursively find and replace a string in my .cpp and .hpp files.
Looking at an answer to this question I've found the following command:
find /home/www -type f -print0 | xargs -0 sed -i 's/subdomainA.example.com/subdomainB.example.com/g'
Changing it to include my file type did not work - it did not change a single word:
find /myprojects -type f -name *.cpp -print0 | xargs -0 sed -i 's/previousword/newword/g'
Help appreciated.
Don't bother with xargs; use the -exec primary. (Split across two lines for readability.)
find /home/www -type f -name '*.cpp' \
-exec sed -i 's/previousword/newword/g' '{}' \;
chepner's helpful answer proposes the simpler and more efficient use of find's -exec action instead of piping to xargs.
Unless special xargs features are needed, this change is always worth making, and maps to xargs features as follows:
find ... -exec ... {} \; is equivalent to find ... -print0 | xargs -0 -n 1 ...
find ... -exec ... {} + is equivalent to find ... -print0 | xargs -0 ...
In other words:
the \; terminator invokes the target command once for each matching file/folder.
the + terminator invokes the target command once overall, supplying all matching file/folder paths as a single list of arguments.
Multiple calls happen only if the resulting command line becomes too long, which is rare, especially on Linux, where getconf ARG_MAX, the max. command-line length, is large.
Troubleshooting the OP's command:
Since the OP's xargs command passes all matching file paths at once - and, per xargs defaults, at the end of the command line - the resulting command will effectively look something like this:
sed -i 's/previousword/newword/g' /myprojects/file1.cpp /myprojects/file2.cpp ...
This can easily be verified by prepending echo to sed (note the echo in the command below) - though any quoting of arguments that need it (e.g., paths with embedded spaces) will not show in the echoed output:
find /myprojects -type f -name '*.cpp' -print0 |
xargs -0 echo sed -i 's/previousword/newword/g'
Next, after running the actual command, check whether the last-modified date of the files has changed using stat:
If they have, yet the contents haven't changed, the implication is that sed has processed the files, but the regex in the s function call didn't match anything.
It is conceivable that older GNU sed versions don't work properly when combining -i (in-place editing) with multiple file operands (though I couldn't find anything in the GNU sed release notes).
To rule that out, invoke sed once for each file:
If you still want to use xargs, add -n 1:
find /myprojects -type f -name '*.cpp' -print0 |
xargs -0 -n 1 sed -i 's/previousword/newword/g'
To use find's -exec action, see chepner's answer.
With a GNU sed version that does support updating of multiple files with the -i option - which is the case as of at least v4.2.2 - the best formulation of your command is (note the quoted *.cpp argument to prevent premature expansion by the shell, and the use of terminator + to only invoke sed once):
find /myprojects -type f -name '*.cpp' -exec sed -i 's/previousword/newword/g' '{}' +

How do I write a bash alias/function to grep all files in all subdirectories for a string?

I've been using the following command to grep for a string in all the python source files in and below my current directory:
find . -name '*.py' -exec grep -nHr <string> {} \;
I'd like to simplify things so that I can just type something like
findpy <string>
And get the exact same result. Aliases don't seem sufficient since they only do a string expansion, and the argument I need to specify is not the last argument. It sounds like functions are suitable for the task, so I have several questions:
How do I write it?
Where do I put it?
If you don't want to create an entire script for this, you can do it with just a shell function:
findpy() { find . -name '*.py' -exec grep -nHr "$1" {} \; ; }
...but then you may have to define it in both ~/.bashrc and ~/.bash_profile, so it gets defined for both login and interactive shells (see the INVOCATION section of bash's man page).
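A common way to avoid maintaining the definition in two places is to keep it in ~/.bashrc and have ~/.bash_profile source that file, for example:
# in ~/.bash_profile: make login shells pick up ~/.bashrc definitions too
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi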
All the "find ... -exec" solutions above are OK in the sense that they work, but they are horribly inefficient and will be extremely slow for large trees. The reason is that they launch a new process for every single *.py file. Instead, use xargs(1), and run grep only on files (not directories):
#! /bin/sh
find . -name \*.py -type f | xargs grep -nHr "$1"
For example:
$ time sh -c 'find . -name \*.cpp -type f -exec grep foo {} \; >/dev/null'
real 0m3.747s
$ time sh -c 'find . -name \*.cpp -type f | xargs grep foo >/dev/null'
real 0m0.278s
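That said, find's + terminator (described elsewhere on this page) batches arguments much like xargs does, so an -exec variant along these lines should be comparably fast:
find . -name \*.py -type f -exec grep -nH "$1" {} +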
On a side note, you should take a look at Ack for what you are doing. It is designed as a replacement for grep, written in Perl, and it can filter files based on the target language or ignore .svn directories and the like.
Example (snippet from Trac source):
$ ack --python foo ./mysource
ticket/tests/wikisyntax.py
139:milestone:foo
144:<a class="missing milestone" href="/milestone/foo" rel="nofollow">milestone:foo</a>
ticket/tests/conversion.py
34: ticket['foo'] = 'This is a custom field'
ticket/query.py
239: count_sql = 'SELECT COUNT(*) FROM (' + sql + ') AS foo'
I wanted something similar, and the answer by Idelic reminded me of one of the nice features of xargs: that it puts the command at the end. You see, my problem was that I wanted to write a shell alias that would "accept parameters" (really, one that would expand in such a way as to let me pass parameters to grep).
Here's what I added to my bash_aliases:
alias findpy="find . -type f -name '*.py' | xargs grep"
This way, I could write findpy WORD or findpy -e REGEX or findpy -il WORD - the point being that I could use any grep command-line option.
Put the following three lines in a file named findpy
#!/bin/bash
find . -name '*.py' -exec grep -nHr "$1" {} \;
Then say
chmod u+x findpy
I normally have a directory called bin in my home directory where I put little shell scripts like this. Make sure to add the directory to your PATH.
The script:
#!/bin/bash
find . -name '*.py' -exec grep -nHr "$1" {} ';'
is how I'd do it.
You write it with an editor like vim and put it somewhere on your path. My normal approach is to have a ~/bin directory and make sure my .profile file (or equivalent) contains:
PATH=$PATH:~/bin
Many versions of grep have options to do recursion, specify filename pattern, etc.
grep --perl-regexp --recursive --include='*.py' --regexp="$1" .
This recurses starting from the current directory (.), looks only at files ending in 'py', uses Perl-style regular expressions.
If your version of grep doesn't support --recursive and --include, then you can still use find and xargs, but be sure to allow for pathnames with embedded spaces by using the -print0 argument to find and the --null option to xargs to handle that.
find . -type f -name '*.py' -print0 | xargs --null grep "$1"
should work.
Add the following line to your ~/.bashrc or ~/.bash_profile or ~/.profile
alias findpy='find . -type f -name "*.py" -print0 | xargs -0 grep'
then you can use it like this
findpy def
or with grep options
findpy -i class
the following alias will ignore the version control meta-directory of git and svn
alias findpy='find . -type f -not -path "*/.git/*" -a -not -path "*/.svn/*" -name "*.py" -print0 | xargs -0 grep'
#######################################################################################
#
# Function to search all files (including sub-directories) that match a given file
# extension ($2) looking for an indicated string ($1) - in a case insensitive manner.
#
# For Example:
#
# -> findfile AllowNegativePayments cpp
#
#
#######################################################################################
findfile ()
{
find . -iname "*.$2*" -type f -print0 | xargs -0 grep -i "$1" 2> /dev/null
}
alias _ff='findfile'
