issue with piping find into sed (find and replace) - bash

Here is my current code. My goal is to find every file in a given directory (recursively), replace "FIND" with "REPLACEWITH", and overwrite the files.
FIND='ALEX'
REPLACEWITH='<strong>ALEX</strong>'
DIRECTORY='/some/directory/'
find $DIRECTORY -type f -name "*.html" -print0 |
LANG=C xargs -0 sed -i "s|$FIND|$REPLACEWITH|g"
The error I am getting is:
sed: 1: "/some/directory ...": command a expects \ followed by text

As given in BashFAQ #21, you can use perl to perform search-and-replace operations with no potential for data being treated as code:
in="$FIND" out="$REPLACEWITH" find "$DIRECTORY" -type f -name '*.html' \
-exec perl -pi -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' '{}' +
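To see why the environment-variable approach is safe, here is a small self-contained run against a throwaway directory created with mktemp (the file names and contents are made up for illustration). The FIND string deliberately contains regex metacharacters and REPLACEWITH contains slashes; neither needs escaping, because perl reads both as data from %ENV and \Q quotes the search string.

```shell
# Hypothetical demo directory standing in for "$DIRECTORY".
demo=$(mktemp -d)
printf 'Hello ALEX.\n' > "$demo/page.html"
printf 'Price: $5 (a.b)\n' > "$demo/tricky.html"

# $, (, ) and . would be special in a regex; / would clash with s/// delimiters.
FIND='$5 (a.b)'
REPLACEWITH='<strong>$5</strong>'

# Values travel through the environment, never through the perl source code.
in="$FIND" out="$REPLACEWITH" find "$demo" -type f -name '*.html' \
  -exec perl -pi -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' '{}' +

cat "$demo/tricky.html"   # the literal text was replaced
cat "$demo/page.html"     # no match, content unchanged
```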
If you want to process only the files that actually contain the FIND string, find can be told to pass to perl only the files on which grep reports a match:
in="$FIND" out="$REPLACEWITH" find "$DIRECTORY" -type f -name '*.html' \
-exec grep -F -q -e "$FIND" '{}' ';' \
-exec perl -pi -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' '{}' +
Because grep is being used to test individual files, it has to be called once per file so that its exit status can be evaluated on a per-file basis; hence the less efficient -exec ... {} ';' action. perl, on the other hand, can process many files in one invocation, hence -exec ... {} +.
Note that grep -F (fgrep) is line-oriented: if your FIND string contains multiple lines, files containing any one of those lines will be passed to perl for replacement.

You can also have find invoke sed directly, although sed -i rewrites every file it is given, so the modification time of every .html file found will change even where nothing was replaced (which may or may not matter to you):
find "$DIRECTORY" -type f -name "*.html" -exec sed -i "s|$FIND|$REPLACEWITH|g" '{}' ';'
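The error message in the question appears to come from BSD sed (the version shipped with macOS), where -i requires a backup-suffix argument: the sed script is consumed as that suffix, and the first file path is then parsed as the script. GNU sed instead takes an optional suffix attached directly to -i. A suffix attached without a space, as in -i.bak, is accepted by both, so a portable sketch (assuming you are happy to delete the backups afterwards; the demo directory and file are hypothetical) could look like this:

```shell
# Hypothetical demo directory and file.
demo=$(mktemp -d)
printf 'Hi ALEX\n' > "$demo/index.html"

FIND='ALEX'
REPLACEWITH='<strong>ALEX</strong>'

# -i.bak (suffix attached, no space) works with both GNU and BSD sed.
find "$demo" -type f -name '*.html' \
  -exec sed -i.bak "s|$FIND|$REPLACEWITH|g" '{}' +

# Remove the backup copies once you have checked the results.
find "$demo" -type f -name '*.html.bak' -exec rm -f '{}' +

cat "$demo/index.html"
```

Note that FIND and REPLACEWITH are still spliced into the sed script here, so characters such as |, & or \ in them would be interpreted by sed; the perl approach from BashFAQ #21 avoids that.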

Related

Is there a way to use an if condition inside a find command with the -exec option?

Scenario: there are multiple files in a folder. I'm trying to find a specific set of files, and if a given file contains specific info, I need to grep that information.
Ex:
find /abc/test \( -type f -name 'tst*.txt' -mtime -1 \) -exec grep -Po '(?<type1).*(?=type1|(?<=type2).*(?=type2)' {} \;
I need to include an if condition along with find -exec (if the first grep succeeds, then print the output of the second one):
if grep -q 'case=1' <filename>; then
grep -Po '(?<type1).*(?=type1|(?<=type2).*(?=type2)'
fi
Thanks
You can use -exec in find as a condition: the file matches if the command returns a successful (zero) exit status. So you can write:
find /abc/test -type f -name 'tst*.txt' -mtime -1 -exec grep -q 'case=1' {} \; -exec grep -Po '(?<=type1).*(?=type1)|(?<=type2).*(?=type2)' {} \;
Tests in find are evaluated left-to-right, so the second grep will only be executed if the first one was successful.
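A self-contained sketch of the chained -exec idea, using two made-up files in a temporary directory and a plain grep 'type1' as a simplified stand-in for the PCRE expression (so it also runs where grep -P is unavailable):

```shell
# Hypothetical demo files: only tst_a.txt contains case=1.
demo=$(mktemp -d)
printf 'case=1\ntype1 wanted\n' > "$demo/tst_a.txt"
printf 'case=2\ntype1 skipped\n' > "$demo/tst_b.txt"

# The second -exec runs only for files where the first grep succeeded.
result=$(find "$demo" -type f -name 'tst*.txt' \
  -exec grep -q 'case=1' {} \; \
  -exec grep 'type1' {} \;)

printf '%s\n' "$result"
```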
If your conditions are more complicated, you can put the whole shell code into a script, and execute the script with -exec. E.g. put this in myscript.sh:
#!/bin/sh
if grep -q 'case=1' "$1"; then
    grep -Po '(?<=type1).*(?=type1)|(?<=type2).*(?=type2)' "$1"
fi
then make it executable (chmod +x myscript.sh) and run:
find /abc/test -type f -name 'tst*.txt' -mtime -1 -exec ./myscript.sh {} \;
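If you would rather not keep a separate script file, the same if logic can be passed inline to sh -c. The file name is handed over as a positional parameter ("$1") rather than pasted into the shell code, which keeps unusual file names from being interpreted as shell syntax. Again a sketch with hypothetical demo files and a simplified grep:

```shell
# Hypothetical demo files: only tst_a.txt contains case=1.
demo=$(mktemp -d)
printf 'case=1\ntype1 wanted\n' > "$demo/tst_a.txt"
printf 'case=2\ntype1 skipped\n' > "$demo/tst_b.txt"

# find substitutes {} as a separate argument; inside the script it is "$1".
# The extra "sh" fills $0, which the shell uses in error messages.
result=$(find "$demo" -type f -name 'tst*.txt' -exec sh -c '
  if grep -q "case=1" "$1"; then
    grep "type1" "$1"
  fi
' sh {} \;)

printf '%s\n' "$result"
```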
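Since you're using the PCRE option -P in grep, you can also combine both searches into a single grep using a lookahead. Note that grep matches line by line, so the (?=.*case=1) lookahead only succeeds when case=1 appears on the same line as the extracted text, not merely somewhere else in the file: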
find /abc/test -type f -name 'tst*.txt' -mtime -1 -exec grep -Po '(?=.*case=1).*\K((?<=type1).*(?=type1)|(?<=type2).*(?=type2))' {} +
By the way, the regex shown in your question is invalid; I've tried to correct it here.

Awk/Sed: How to do a recursive find/replace of a string in files with a certain file extension?

I need to recursively find and replace a string in my .cpp and .hpp files.
Looking at an answer to this question I've found the following command:
find /home/www -type f -print0 | xargs -0 sed -i 's/subdomainA.example.com/subdomainB.example.com/g'
Changing it to include my file type did not work; it did not change a single word:
find /myprojects -type f -name *.cpp -print0 | xargs -0 sed -i 's/previousword/newword/g'
Help appreciated.
Don't bother with xargs; use the -exec primary. (Split across two lines for readability.)
find /home/www -type f -name '*.cpp' \
-exec sed -i 's/previousword/newword/g' '{}' \;
chepner's helpful answer proposes the simpler and more efficient use of find's -exec action instead of piping to xargs.
Unless special xargs features are needed, this change is always worth making, and maps to xargs features as follows:
find ... -exec ... {} \; is equivalent to find ... -print0 | xargs -0 -n 1 ...
find ... -exec ... {} + is equivalent to find ... -print0 | xargs -0 ...
In other words:
the \; terminator invokes the target command once for each matching file/folder.
the + terminator invokes the target command once overall, supplying all matching file/folder paths as a single list of arguments.
Multiple calls happen only if the resulting command line would become too long, which is rare, especially on Linux, where the maximum command-line length (getconf ARG_MAX) is large.
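The difference between the two terminators is easy to observe by running a tiny sh -c that only reports how many file arguments it received ($#). The three empty .cpp files are made up for the demonstration:

```shell
# Hypothetical demo files.
demo=$(mktemp -d)
touch "$demo/a.cpp" "$demo/b.cpp" "$demo/c.cpp"

# Each sh invocation prints the number of file paths it was given.
per_file=$(find "$demo" -type f -name '*.cpp' -exec sh -c 'echo "$#"' sh {} \;)
batched=$(find "$demo" -type f -name '*.cpp' -exec sh -c 'echo "$#"' sh {} +)

echo "with \\;:"; printf '%s\n' "$per_file"   # one line per file
echo "with +:";   printf '%s\n' "$batched"    # a single invocation
```

With \; you should see 1 printed once per file; with + a single invocation receives all three paths (with very many files, find would split them into several batches).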
Troubleshooting the OP's command:
Since the OP's xargs command passes all matching file paths at once (and, per xargs defaults, at the end of the command line), the resulting command will effectively look something like this:
sed -i 's/previousword/newword/g' /myprojects/file1.cpp /myprojects/file2.cpp ...
This can easily be verified by prepending echo to sed, as below, though the output will not show the (conceptual) quoting of arguments that need it, such as paths with embedded spaces:
find /myprojects -type f -name '*.cpp' -print0 |
xargs -0 echo sed -i 's/previousword/newword/g'
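Next, after running the actual command, check whether the last-modified dates of the files have changed, using stat (with GNU stat, for example: find /myprojects -type f -name '*.cpp' -exec stat -c '%y %n' '{}' +).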
If they have, yet the contents haven't changed, the implication is that sed has processed the files, but the regex in the s function call didn't match anything.
It is conceivable that older GNU sed versions don't work properly when combining -i (in-place editing) with multiple file operands (though I couldn't find anything in the GNU sed release notes).
To rule that out, invoke sed once for each file:
If you still want to use xargs, add -n 1:
find /myprojects -type f -name '*.cpp' -print0 |
xargs -0 -n 1 sed -i 's/previousword/newword/g'
To use find's -exec action, see chepner's answer.
With a GNU sed version that does support updating of multiple files with the -i option - which is the case as of at least v4.2.2 - the best formulation of your command is (note the quoted *.cpp argument to prevent premature expansion by the shell, and the use of terminator + to only invoke sed once):
find /myprojects -type f -name '*.cpp' -exec sed -i 's/previousword/newword/g' '{}' +
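The question asks about both .cpp and .hpp files, while the commands above only match *.cpp. One way to cover both in a single find is to group the -name tests with escaped parentheses; without them, -type f would bind only to the first -name. A sketch assuming GNU sed (for -i without a suffix), run against a hypothetical demo tree:

```shell
# Hypothetical demo tree standing in for /myprojects.
demo=$(mktemp -d)
mkdir -p "$demo/src/include"
printf 'int previousword;\n' > "$demo/src/main.cpp"
printf 'extern int previousword;\n' > "$demo/src/include/main.hpp"
printf 'previousword\n' > "$demo/src/notes.txt"   # should stay untouched

# \( ... \) groups the alternatives so -type f applies to both patterns.
# Assumes GNU sed: -i with no backup suffix.
find "$demo" -type f \( -name '*.cpp' -o -name '*.hpp' \) \
  -exec sed -i 's/previousword/newword/g' '{}' +
```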

how do I convert tabs to spaces in many files with bash

How can I convert tabs to spaces in all .js files in a directory with one command?
find . -type f -iname "*.js" -print0 | xargs -0 -I _FILE_ tab2space _FILE_ _FILE_
This would convert tabs to four spaces:
find /path/to/directory -type f -iname '*.js' -exec sed -i -e 's|\t|    |g' '{}' \;
Change the spaces between the last two | in the sed expression to get whatever number of spaces you like.
Another way is to pass all the files to a single sed call with +:
find /path/to/directory -type f -iname '*.js' -exec sed -i -e 's|\t|    |g' '{}' '+'
You don't need to worry about the system's limit on command-line length here: find splits the file list across several sed calls if necessary.
Simpler syntax:
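for F in *.js; do sed -i 's|\t|    |g' "$F"; done
(Caution: this edits files in place, and unlike the find versions it only processes .js files in the current directory, not in subdirectories.) It could be made to write the edited copy under a new name, or wrapped in a function if you do this often.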
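A quick self-contained check of the find/sed approach on a made-up .js file. It assumes GNU sed, which understands \t in a regex (BSD sed does not), and passes -i and -e separately: written together as -ie, GNU sed would treat the e as a backup suffix and leave app.jse files behind.

```shell
# Hypothetical demo file with one tab-indented line.
demo=$(mktemp -d)
printf 'function f() {\n\treturn 1;\n}\n' > "$demo/app.js"

# Assumes GNU sed (\t in the regex, -i without a suffix).
find "$demo" -type f -iname '*.js' -exec sed -i -e 's|\t|    |g' '{}' +

cat "$demo/app.js"
```

Keep in mind that each tab becomes exactly four spaces regardless of its column; the POSIX expand utility (e.g. expand -t 4) instead pads to the next tab stop, which matters for tabs in the middle of a line.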

Find files containing a given text

In bash, I want to return the file name (and the path to the file) for every .php, .html, or .js file containing the case-insensitive string "document.cookie" or "setcookie".
How would I do that?
egrep -ir --include=*.{php,html,js} "(document.cookie|setcookie)" .
The r flag means to search recursively (search subdirectories). The i flag means case insensitive.
If you just want file names add the l (lowercase L) flag:
egrep -lir --include=*.{php,html,js} "(document.cookie|setcookie)" .
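A runnable version of the grep --include approach against a throwaway directory (file names and contents are invented). It uses grep -E in place of egrep, which recent GNU grep releases flag as obsolescent, and escapes the dot so it only matches a literal '.'. In bash, --include='*.'{php,html,js} brace-expands to three separate --include options while keeping the globs away from the shell:

```shell
# Hypothetical demo tree.
demo=$(mktemp -d)
mkdir -p "$demo/lib"
printf 'document.Cookie = "x";\n' > "$demo/lib/a.js"
printf '<?php setcookie("x"); ?>\n' > "$demo/b.php"
printf 'nothing here\n' > "$demo/c.html"
printf 'setcookie\n' > "$demo/d.txt"   # excluded: not .php/.html/.js

# -E extended regex, -l names only, -i case-insensitive, -r recursive.
matches=$(grep -Elir --include='*.'{php,html,js} 'document\.cookie|setcookie' "$demo" | sort)
printf '%s\n' "$matches"
```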
Try something like grep -r -n -i --include='*.html' --include='*.php' --include='*.js' searchstringhere .
the -i makes it case insensitive
the . at the end means you want to start from your current directory, this could be substituted with any directory.
the -r means do this recursively, right down the directory tree
the -n prints the line number for matches.
the --include restricts the search to files whose names match the given pattern (wildcards accepted); repeat it once per pattern
For more info see: http://www.gnu.org/software/grep/
find them and grep for the string:
This will find all files of your 3 types in /starting/path and grep for the regular expression '(document\.cookie|setcookie)'. Split over 2 lines with the backslash just for readability...
find /starting/path -type f \( -name "*.php" -o -name "*.html" -o -name "*.js" \) -print0 | \
xargs -0 egrep -i '(document\.cookie|setcookie)'
Sounds like a perfect job for grep or perhaps ack
Or this wonderful construction:
find . -type f \( -name '*.php' -o -name '*.html' -o -name '*.js' \) -exec grep "document.cookie\|setcookie" /dev/null {} \;
find . -type f \( -name '*.php' -o -name '*.js' -o -name '*.html' \) -print0 |
xargs -0 grep -liE 'document\.cookie|setcookie'
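The find | xargs variants can be exercised the same way. The made-up tree below includes a path with a space and a directory named trap.html: the escaped parentheses keep -type f applied to every -name alternative, so the directory is never handed to grep, and -print0 with xargs -0 keeps the spaced path intact.

```shell
# Hypothetical demo tree.
demo=$(mktemp -d)
mkdir -p "$demo/my dir"
printf 'setcookie("a");\n' > "$demo/my dir/x.php"
printf 'no match\n' > "$demo/y.js"
mkdir "$demo/trap.html"   # a directory whose name matches a pattern

# NUL-separated paths survive spaces; grouping keeps the directory out.
matches=$(find "$demo" -type f \( -name '*.php' -o -name '*.html' -o -name '*.js' \) -print0 |
  xargs -0 grep -liE 'document\.cookie|setcookie')

printf '%s\n' "$matches"
```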
Just to include one more alternative, you could also use this:
find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \;
Where:
-regextype posix-extended tells find what kind of regex to expect
-regex "^.*\.(php|html|js)$" tells find the regex itself filenames must match
-exec grep -EH '(document\.cookie|setcookie)' {} \; tells find to run the command (with its options and arguments) specified between the -exec option and the \; for each file it finds, where {} represents where the file path goes in this command.
while
E option tells grep to use extended regex (to support the parentheses) and...
H option tells grep to print file paths before the matches.
And, given this, if you only want file paths, you may use:
find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \; | sed -r 's/^([^:]*):.*$/\1/' | sort -u
Where
| [pipe] sends the output of find to the next command after it (which is sed, then sort)
r option tells sed to use extended regex.
s/HI/BYE/ tells sed to replace the first occurrence (per line) of "HI" with "BYE" and...
s/^([^:]*):.*$/\1/ tells it to replace the regex ^([^:]*):.*$ (meaning a group [stuff enclosed by ()] containing everything from the beginning of the line [^] up to the first ':' [[^:]* = zero or more characters other than ':'], followed by anything until the end of the line [$]) with the first group [\1] of the matched regex.
u tells sort to remove duplicate entries (take sort -u as optional).
...This is FAR from the most elegant way. As I said, my intention is to widen the range of possibilities (and also to give more complete explanations of some tools you could use).
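A small check of which colon a path-extracting sed expression stops at, using a made-up matching line that itself contains a colon. Because .* is greedy, (^.*): runs to the last colon on the line, whereas ^([^:]*): cannot cross a colon and stops at the first one. The sketch uses sed -E, which both GNU and BSD sed accept.

```shell
# Hypothetical demo file whose matching line contains a colon.
demo=$(mktemp -d)
printf 'var c = { key: document.cookie };\n' > "$demo/a.js"

# -H forces the "path:line" prefix even for a single file.
line=$(grep -EH '(document\.cookie|setcookie)' "$demo/a.js")

# Greedy: the group extends to the LAST colon on the line.
greedy=$(printf '%s\n' "$line" | sed -E 's/(^.*):.*$/\1/')
# [^:]* cannot cross a colon, so the group ends at the FIRST one.
first=$(printf '%s\n' "$line" | sed -E 's/^([^:]*):.*$/\1/')

printf 'greedy: %s\nfirst:  %s\n' "$greedy" "$first"
```

If all you need is the file names, grep -l (as in the first answer) avoids this post-processing entirely; paths that themselves contain a colon would still confuse either sed expression.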

How do I use a pipe in the exec parameter for a find command?

I'm trying to construct a find command to process a bunch of files in a directory using two different executables. Unfortunately, find's -exec doesn't allow a pipe: an unquoted | is interpreted by the shell before find ever sees it, and an escaped \| is simply passed to the command as a literal argument.
Here is specifically what I'm trying to do (which doesn't work because pipe ends the find command):
find /path/to/jpgs -type f -exec jhead -v {} | grep 123 \; -print
Try this
find /path/to/jpgs -type f -exec sh -c 'jhead -v "$1" | grep 123' sh {} \; -print
Alternatively you could try to embed your exec statement inside a sh script and then do:
find -exec some_script {} \;
A slightly different approach would be to use xargs:
find /path/to/jpgs -type f -print0 | xargs -0 jhead -v | grep 123
which I always found a bit easier to understand and to adapt (the -print0 and -0 arguments are necessary to cope with filenames containing blanks)
This might (not tested) be more effective than using -exec because it will pipe the list of files to xargs and xargs makes sure that the jhead commandline does not get too long.
With -exec you can only run a single executable with some arguments, not arbitrary shell commands. To circumvent this, you can use sh -c '<shell command>'.
Do note that -exec ... \; can be quite inefficient: the command is executed once for every file found. It would be more efficient to avoid this where you can (for example, by moving the grep outside the -exec, or by piping the results of find to xargs as suggested by Palmin).
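To reduce the per-file cost while still using a pipe, the sh -c trick can be combined with the + terminator, so that one shell receives a batch of files and loops over them. In this sketch cat stands in for jhead (a hypothetical substitution, since jhead may not be installed), the two .jpg files are plain text made up for the demo, and the names of files whose output contains 123 are printed:

```shell
# Hypothetical demo files (plain text, not real JPEGs).
demo=$(mktemp -d)
printf 'ISO 123\n' > "$demo/one.jpg"
printf 'ISO 400\n' > "$demo/two.jpg"

# With "+", one sh gets many paths; "for f" loops over "$@".
# Replace "cat" with "jhead -v" for the real use case.
result=$(find "$demo" -type f -name '*.jpg' -exec sh -c '
  for f; do
    if cat "$f" | grep -q 123; then
      printf "%s\n" "$f"
    fi
  done
' sh {} +)

printf '%s\n' "$result"
```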
Using find command for this type of a task is maybe not the best alternative. I use the following command frequently to find files that contain the requested information:
for i in dist/*.jar; do echo ">> $i"; jar -tf "$i" | grep BeanException; done
Since this outputs a list, why not simply:
find /path/to/jpgs -type f -exec jhead -v {} \; | grep 123
or
find /path/to/jpgs -type f -print -exec jhead -v {} \; | grep 123
Put your grep on the results of the find -exec.
There is another way you can do it, but it is also pretty hacky.
Using the shell option extquote, you can do something like the following to make find's -exec emit commands and then pipe them to sh:
root#ifrit findtest # find -type f -exec echo ls $"|" cat \;|sh
filename
root#ifrit findtest # find -type f -exec echo ls $"|" cat $"|" xargs cat \;|sh
h
I just figured I'd add that because, at least the way I visualized it, it was closer to the OP's original question of using pipes within -exec.
