Find and replace string and print file directory on change - shell

I am using find and sed to replace a string in multiple files. Here is my script:
find ./ -type f -name "*.html" -maxdepth 1 -exec sed -i '' "s/${REPLACE_STRING}/${STRING}/g" {} \; -print
The -print always prints the file whether or not anything was changed. What I would like is to see which files are changed. Ideally I would like the output to be something like this (as the files are changing):
/path/to/file was changed
- REPLACE STRING line 9 was changed
- REPLACE STRING line 12 was changed
- REPLACE STRING line 26 was changed
/path/to/file2 was changed
- REPLACE STRING line 1 was changed
- REPLACE STRING line 6 was changed
- REPLACE STRING line 36 was changed
Is there any way of doing something like this?

Cool idea. I think -print is a dead end for the reason you mention, so it needs to be done in the exec. I think sed is also a dead end, given the challenge of printing to STDOUT as well as modifying the file. So a natural extension is to wrap some Perl around it.
What if this was your exec statement:
perl -p -i -e '$i=1 if not defined($i); print STDOUT "$ARGV, line $i: $_" if s/REPLACE_STRING/STRING/; $i++' {} \;
-p wraps the Perl statements in a standard while(<>) loop so the file is processed line by line just like sed.
-i does in-place replacement, just like sed.
-e means execute the following Perl statements.
if not defined is a sneaky way of initialising a line count variable, even though it's executed for every line.
STDOUT tells print to output to the console instead of the file.
$ARGV is the current filename, when reading from <>.
$_ is the line being processed.
if means the print only gets executed if a match is found.
For an input file text.txt containing:
line 1
token 2
line 3
token 4
line 5
The statement perl -p -i -e '$i=1 if not defined($i); print STDOUT "$ARGV, line $i: $_" if s/token/sub/; $i++' text.txt gives me:
text.txt, line 2: sub 2
text.txt, line 4: sub 4
Leaving text.txt containing:
line 1
sub 2
line 3
sub 4
line 5
So you don't get your introductory "file was changed" line, but for a one-liner I think it's a pretty good compromise.
Operating on a couple of files it looks like this:
find ./ -type f -name "*.txt" -maxdepth 1 -exec perl -p -i -e '$i=1 if not defined($i); print STDOUT "$ARGV, line $i: $_" if s/token/sub/; $i++' {} \;
.//text1.txt, line 2: sub 2
.//text1.txt, line 4: sub 4
.//text2.txt, line 1: sub 1
.//text2.txt, line 3: sub 3
.//text2.txt, line 5: sub 5
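If you do want the introductory "file was changed" line after all, here is a minimal sketch of the same idea; it uses Perl's built-in line counter $. instead of the hand-rolled $i, and prints the header only on the first match in each file (one perl process per file, so the counters reset between files):
find ./ -type f -name "*.txt" -maxdepth 1 -exec perl -p -i -e '
    if (s/token/sub/) {
        print STDOUT "$ARGV was changed\n" unless $seen++;   # header once per file
        print STDOUT " - token line $. was changed\n";       # $. is the current line number
    }' {} \;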

You could chain -exec actions and take advantage of the exit status. For example:
find . \
    -maxdepth 1 \
    -type f \
    -name '*.html' \
    -exec grep -Hn "$REPLACE_STRING" {} \; \
    -exec sed -i '' "s/${REPLACE_STRING}/${STRING}/g" {} \;
This prints, for each matching file, the path, the line number and the line:
./file1.html:9:contents of line 9
./file1.html:12:contents of line 12
./file1.html:26:contents of line 26
./file2.html:1:contents of line 1
./file2.html:6:contents of line 6
./file2.html:36:contents of line 36
For files without a match, nothing else happens; for files with a match, the sed command will be called.
If you wanted output closer to what you have in your question, you could add a few actions:
find . \
    -maxdepth 1 \
    -type f \
    -name '*.html' \
    -exec grep -q "$REPLACE_STRING" {} \; \
    -printf '%p was changed\n' \
    -exec grep -n "$REPLACE_STRING" {} \; \
    -exec sed -i '' "s/${REPLACE_STRING}/${STRING}/g" {} \; \
    | sed -E "s/^([[:digit:]]+):.*/ - $REPLACE_STRING line \1 was changed/"
This now first checks if the file contains the string, silently, with grep -q, then prints the filename (-printf), then all the matching lines with line numbers (grep -n), then does the substitution with sed and finally modifies the output slightly with sed.
Since you're using sed -i '', I assume you're on macOS; I'm not sure if the stock find on there supports the printf option.
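If it doesn't, a portable substitute for -printf is one more -exec, this time with printf(1); a sketch of the same chain (the trailing | sed decoration from above works unchanged):
find . \
    -maxdepth 1 \
    -type f \
    -name '*.html' \
    -exec grep -q "$REPLACE_STRING" {} \; \
    -exec printf '%s was changed\n' {} \; \
    -exec grep -n "$REPLACE_STRING" {} \; \
    -exec sed -i '' "s/${REPLACE_STRING}/${STRING}/g" {} \;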
By now, we're pretty close to running a complex-ish script on each file that matches, so we might as well do that directly:
shopt -s nullglob
for f in ./*.html; do
    if grep -q "$REPLACE_STRING" "$f"; then
        printf '%s\n' "$f was changed"
        grep -n "$REPLACE_STRING" "$f" \
            | sed -E "s/^([[:digit:]]+):.*/ - $REPLACE_STRING line \1 was changed/"
        sed -i '' "s/${REPLACE_STRING}/${STRING}/g" "$f"
    fi
done

Replace your find+sed command:
find ./ -type f -name "*.html" -maxdepth 1 -exec sed -i '' "s/${REPLACE_STRING}/${STRING}/g" {} \; -print
with this GNU awk command (needs gawk for inplace editing):
gawk -i inplace -v old="$REPLACE_STRING" -v new="$STRING" '
FNR==1 { hdr=FILENAME " was changed\n" }
gsub(old,new) { printf "%s - %s line %d was changed\n", hdr, old, FNR | "cat>&2"; hdr="" }
1' *.html
You could also make it much more robust with awk than with sed if necessary, since awk can work with literal strings while sed can't.
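For instance, here is a sketch of a literal-string variant that uses index() and substr() instead of gsub(), so neither string is ever treated as a regex (assuming non-empty strings without backslashes, which -v would mangle):
gawk -i inplace -v old="$REPLACE_STRING" -v new="$STRING" '
{
    out = ""
    while ( (pos = index($0, old)) > 0 ) {      # next literal occurrence
        out = out substr($0, 1, pos - 1) new    # copy prefix, append replacement
        $0 = substr($0, pos + length(old))      # continue after the match
    }
    print out $0
}' *.html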

Alright, always defer to Ed's awk script for efficiency. But continuing with sed plus a helper script, using a preliminary call to grep to determine whether your file contains the word to replace, you could use a short helper that takes your ${REPLACE_STRING}, ${STRING} and filename as its first three positional parameters, as follows:
Helper Script named helper.sh
#!/bin/sh

test -z "$1" && exit
test -z "$2" && exit
test -z "$3" && exit

findw="$1"
replw="$2"
fname="$3"

grep -q "$findw" "$fname" || exit

echo "$(readlink -f "$fname") was changed"

grep -n "$findw" "$fname" | {
    while IFS= read -r line; do
        printf -- " - REPLACE STRING line %d was changed\n" "${line%%:*}"
    done
}

sed -i "s/$findw/$replw/g" "$fname"
Then your call to find could be, e.g.:
find . -type f -name "f*" -exec ./helper.sh "dog" "cat" '{}' \;
Example Use/Output
Starting with a couple of files named f containing:
$ cat f
my
dog
dog
has
fleas
In a file structure containing the script in the present directory with a subdirectory d1 and multiple copies of f, e.g.
$ tree .
.
├── d1
│   └── f
├── f
└── helper.sh
Running the script results in the following:
$ find . -type f -name "f*" -exec ./helper.sh "dog" "cat" '{}' \;
/tmp/tmp-david/f was changed
- REPLACE STRING line 2 was changed
- REPLACE STRING line 3 was changed
/tmp/tmp-david/d1/f was changed
- REPLACE STRING line 2 was changed
- REPLACE STRING line 3 was changed
and the contents of f are changed accordingly:
$ cat f
my
cat
cat
has
fleas
If there is no search term found in any of the files located by find, the modification times on those files are left unchanged.
Now with all that in mind, if you have gawk available, follow Ed's advice, but you can do it with sed and a helper :)

Perl is easy to install for free. Define your own strings in the bash shell, export them so the one-liner can read them from the environment, and test here:
export STRING=
export REPLACE=
perl -e 'for (`find . -maxdepth 1 -type f -iname "*.html"`) { chomp; open my $ih, "<", $_ or die "Error: $!"; print "Processing: $_\n"; my $t = 0; while (my $line = <$ih>) { my $s = $line; if ($line =~ s/$ENV{REPLACE}/$ENV{STRING}/) { $t = 1; print "$s --> $line" } } print "Nothing replaced\n" unless $t }'
This only reports what would change ($ENV{REPLACE} and $ENV{STRING} are used because the single quotes keep the shell from expanding the variables). To truly edit the files you would restructure it around perl -p -i, as in the first answer above.

Related

sed to replace string in file only displayed but not executed

I want to find all files with a certain name (Myfile.txt) that do not contain a certain string (my-wished-string), and then use sed to do a replacement in the files found. I tried:
find . -type f -name "Myfile.txt" -exec grep -H -E -L "my-wished-string" {} + | sed 's/similar-to-my-wished-string/my-wished-string/'
But this only displays the files with the wished name that are missing "my-wished-string"; it does not execute the replacement. Am I missing something?
With a for loop and invoking a shell.
find . -type f -name "Myfile.txt" -exec sh -c '
for f; do
grep -H -E -L "my-wished-string" "$f" &&
sed -i "s/similar-to-my-wished-string/my-wished-string/" "$f"
done' sh {} +
You might want to add a -q to grep to silence its output (with -i, sed already writes to the file rather than stdout); note that -q changes the test's logic, as sketched below.
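A sketch of that quieter variant: grep -q succeeds when the string is present, so the && from the -L version flips to ||:
find . -type f -name "Myfile.txt" -exec sh -c '
    for f; do
        grep -q -E "my-wished-string" "$f" ||
            sed -i "s/similar-to-my-wished-string/my-wished-string/" "$f"
    done' sh {} +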
You can do this by constructing two stacks; the first containing the files to search, and the second containing negative hits, which will then be iterated over to perform the replacement.
find . -type f -name "Myfile.txt" > stack1
while read -r line;
do
[ -z $(sed -n '/my-wished-string/p' "${line}") ] && echo "${line}" >> stack2
done < stack1
while read -r line;
do
sed -i "s/similar-to-my-wished-string/my-wished-string/" "${line}"
done < stack2
With some versions of sed, you can use -i to edit the file. But don't pipe the list of names to sed, just execute sed in the find:
find . -type f -name Myfile.txt -not -exec grep -q "my-wished-string" {} \; -exec sed -i 's/similar-to-my-wished-string/my-wished-string/g' {} \;
Note that any file which contains similar-to-my-wished-string also contains the string my-wished-string as a substring, so with these exact strings the command is a no-op, but I suppose your actual strings are different than these.

How to get md5 output but tab separated?

I can use md5 -r foo.txt > md5.txt to create a text file with the md5 of the file, followed by a space and then the local path to that file... but how would I go about getting those two items separated by a TAB character?
For reference and context, the full command I'm using is
find . -type f -exec \
    bash -c '
        md=$(md5 -r "$0")
        siz=$(wc -c <"$0")
        echo -e "${md}\t${siz}"
    ' {} \; \
    > listing.txt
Note that the filepath item of md5 output might also contain spaces, like ./path to file/filename, and these should not be converted to tabs.
sed is another option:
find directory/ -type f -exec md5 -r '{}' '+' | sed 's/ /\t/' > listing.txt
This will replace the first space on each line with a tab, assuming a sed that understands \t (GNU sed does; the stock macOS sed doesn't, see the portable variant below).
(Note that the file you're redirecting output to should not be in the directory tree being searched by find)
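With a sed that lacks \t support, you can splice a real tab character into the expression instead, for example via printf in a command substitution; a sketch:
# portable: command substitution embeds a literal tab in the sed expression
find directory/ -type f -exec md5 -r '{}' '+' | sed "s/ /$(printf '\t')/" > listing.txt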
Try the builtin printf and parameter expansion to split the md variable.
find . -type f -exec sh -c '
    md=$(md5 -r "$0") siz=$(wc -c <"$0")
    printf "%s\t%s\t%s\n" "${md%% *}" "${md#*"${md%% *}"}" "${siz}"
' {} \; > listing.txt
Output
d41d8cd98f00b204e9800998ecf8427e ./bar.txt 0
d41d8cd98f00b204e9800998ecf8427e ./foo.txt 0
d41d8cd98f00b204e9800998ecf8427e ./more.txt 0
d41d8cd98f00b204e9800998ecf8427e ./baz.txt 0
314a1673b94e05ed5d9757b6ee33e3b1 ./qux.txt 0
See the online manual for bash Parameter Expansion, or the local man pages if available: PAGER='less +/^[[:space:]]*parameter\ expansion' man bash
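To see what the two expansions do, here is a small demonstration with a hypothetical md5 -r line whose path contains spaces (note the second expansion keeps the leading space):
# hypothetical md5 -r output line
md='d41d8cd98f00b204e9800998ecf8427e ./path to file/filename'
printf '%s\n' "${md%% *}"            # the hash: everything before the first space
printf '%s\n' "${md#*"${md%% *}"}"   # the rest: " ./path to file/filename" (leading space kept)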
Looks like you are simply left with spaces between the hash and the file name that you don't want. A quick pass through awk can clean that up for you. By default, awk's input delimiter is any amount of whitespace. Simply running through awk and printing the fields with a new OFS (output field separator) is all you need. In fact, it makes the pass through echo pointless.
time find . -type f -exec bash -c 'md=$(md5 -r "$0"); siz=$(wc -c <"$0"); awk -vOFS="\t" "{print \$1,\$2,\$3}" <<< "${md} ${siz}" ' > listing.txt {} \;
Personally, I would have run the output of that find command through a while loop. This is basically the same as above, but a little easier to follow.
time find . -type f | \
    while read -r file; do
        md=$(md5 -r "$file")
        siz=$(wc -c < "$file")
        awk -vOFS="\t" '{print $1,$2,$3}' <<< "${md} ${siz}"
    done > listing.txt

Find single line files and move them to a subfolder

I am using the following bash line to find text files in a subfolder with a given pattern inside them and move them to a subfolder:
find originalFolder/ -maxdepth 1 -type f -exec grep -q 'mySpecificPattern' {} \; -exec mv -i {} destinationFolder/ \;
Now instead of grepping a pattern, I would like to move the files to a subfolder if they consist only of a single line (of text): how can I do that?
You can do it this way:
while IFS= read -r -d '' file; do
    [[ $(wc -l < "$file") -eq 1 ]] && echo mv -i "$file" destinationFolder/
done < <(find originalFolder/ -maxdepth 1 -type f -print0)
Note the use of echo in front of mv so that you can verify the output before actually executing mv. Once you're satisfied with the output, remove the echo.
Using wc as shown above is the most straightforward way, although it reads the entire file to determine the length. It's also possible to do length checks with awk, and the exit function lets you fit that into a find command.
find . -type f -exec awk 'END { exit (NR==1 ? 0 : 1) } NR==2 { exit 1 }' {} \; -print
The command returns status 0 if there has been only 1 input record at end-of-file, and it also exits immediately with status 1 when line 2 is encountered; this should easily outrun wc if large files are a performance concern.
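Plugged into the original command in place of the grep test, that could look like this (a sketch, keeping the asker's mv -i):
find originalFolder/ -maxdepth 1 -type f \
    -exec awk 'END { exit (NR==1 ? 0 : 1) } NR==2 { exit 1 }' {} \; \
    -exec mv -i {} destinationFolder/ \;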

Bash - Multiple replace with sed statement

I'm going mad over a script's performance.
Basically I have to replace 600 strings in more than 35000 files.
I have got something like this:
patterns=(
    oldText1 newText1
    oldText2 newText2
    oldText3 newText3
)
pattern_count=${#patterns[*]}
files=(`find . -name '*.js'`)
files_count=${#files[*]}

for ((i=0; i < $pattern_count; i=i+2)); do
    search=${patterns[i]}
    replace=${patterns[i+1]}
    echo -en "\e[0K\r Status "$progress"%. Iteration: "$i" of "$pattern_count
    for ((j=0; j < $files_count; j++)); do
        sed -i s#$search#$replace#g ${files[j]}
        progress=$(($i*100/$files_count))
        echo -en "\e[0K\r Inside the second loop: "$progress"%. File: "$j" of "$files_count
    done
    progress=$(($i*100/$pattern_count))
    echo -en "\e[0K\r Status "$progress"%. Iteration: "$i" of "$pattern_count
done
But this takes tons of minutes. Is there another solution? Probably using sed just once and not in a double loop?
Thanks a lot.
Create a proper sed script:
s/pattern1/replacement1/g
s/pattern2/replacement2/g
...
Run this script with sed -f script.sed file (or in whatever way is required).
You may create that sed script using your array:
printf 's/%s/%s/g\n' "${patterns[@]}" >script.sed
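For the three example pairs above, script.sed would then contain:
s/oldText1/newText1/g
s/oldText2/newText2/g
s/oldText3/newText3/g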
Applying it to the files:
find . -type f -name '*.js' -exec sed -i -f script.sed {} ';'
I don't quite know how GNU sed (which I assume you're using) is handling multiple files when you use -i, but you may also want to try
find . -type f -name '*.js' -exec sed -i -f script.sed {} +
which may potentially be much more efficient (executing as few sed commands as possible). As always, test on data that you can afford to throw away after testing.
For more information about using -exec with find, see https://unix.stackexchange.com/questions/389705
You don't need to run sed multiple times over one file; you can separate sed commands with ';'.
You can also execute multiple seds in parallel.
For example:
patterns=(
    oldText1 newText1
    oldText2 newText2
    oldText3 newText3
)

# construct the sed argument, such as 's/old/new/g;s/old2/new2/g;...'
sedarg=$(
    for ((i = 0; i < ${#patterns[@]}; i += 2)); do
        echo -n "s/${patterns[i]}/${patterns[i+1]}/g;"
    done
)

# find all files named '*.js' and pass them along with NUL as separator;
# xargs will parse them:
#   -0         use NUL as the separator
#   --verbose  print each command line before executing it (i.e. sed -i ... file)
#   -n1        pass one argument/one line to each sed
#   -P8        run 8 seds simultaneously (experiment with that value; it depends
#              on how fast your CPU and hard drive are)
find . -type f -name '*.js' -print0 | xargs -0 --verbose -n1 -P8 sed -i "$sedarg"
If you really need the progress bar, I guess you can count the lines xargs --verbose returns, or better, use parallel --bar; see this post.
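A sketch of that GNU parallel variant, assuming parallel is installed and reusing the $sedarg built above:
# --bar draws a progress bar on stderr; -0 matches find's -print0
find . -type f -name '*.js' -print0 | parallel -0 --bar sed -i "$sedarg"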

getting the output of a grep command in a loop

I have a shell script that includes this search:
find . -type f -exec grep -iPho "barh(li|mar|ag)" {} \;
I want to capture each string the grep command finds and send it to a function I will create, named "parser":
parser(){
    # do stuff with each single grep result found
}
How can that be done? Is this right?
find . -type f -exec grep -iPho "barh(li|mar|ag)" {parser $1} \;
I do not want to send the entire find command result to the function.
Only a shell can execute a shell function, so you need to use bash -c in your find in order to run it. That is also the reason you need to export your function, so that the new process sees it.
parser() {
    while IFS= read -r line; do
        echo "Processing line: $line"
    done <<< "$1"
}
export -f parser

find . -type f -exec bash -c 'parser "$(grep -iPho "barh(li|mar|ag)" "$1")"' -- {} \;
The code above will send all occurrences from file1, then file2 etc to your function to process. It will not send each line one by one and therefore you need to loop over the lines in your function. If there is no occurrence of your regex in a file, it will still call your function with an empty input!
That might not be the best solution for you, so let's try to add the loop inside the bash -c statement and really process the lines one by one:
parser() {
    echo "Processing line: $1"
}
export -f parser

find . -type f -exec bash -c 'grep -iPho "barh(li|mar|ag)" "$@" | while IFS= read -r line; do parser "$line"; done' -- {} +
EDIT: A very nice and simple solution not using bash -c, suggested by @gniourf_gniourf:
parser() {
    echo "Processing line: $1"
}

find . -type f -exec grep -iPho "barh(li|mar|ag)" {} + | while IFS= read -r line; do parser "$line"; done
This approach works fine and it will process each line one by one. You also do not need to export your function with this approach. But you have to watch out for some things that might surprise you.
Each command in a pipeline is executed in its own subshell, so any variable assignment in your parser function, or in the while loop in general, will be lost after returning from that subshell. If you are writing a script, a simple shopt -s lastpipe will suffice to run the last pipeline command in the current shell environment (a sketch follows below). Or you can use process substitution:
parser() {
    echo "Processing line: $1"
}

while IFS= read -r line; do
    parser "$line"
done < <(find . -type f -exec grep -iPho "barh(li|mar|ag)" {} +)
Note that in the previous bash -c examples, you will experience the same behavior and your variable assignments will be lost as well.
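For completeness, the shopt -s lastpipe variant mentioned above might look like this (a sketch; lastpipe only takes effect when job control is off, as in non-interactive scripts):
#!/bin/bash
shopt -s lastpipe    # run the last pipeline segment in the current shell

parser() {
    echo "Processing line: $1"
    count=$((count + 1))
}

count=0
find . -type f -exec grep -iPho "barh(li|mar|ag)" {} + |
    while IFS= read -r line; do
        parser "$line"
    done
echo "Processed $count matches"    # the assignment survives thanks to lastpipe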
You need to export your function.
You also need to call bash to execute the function.
parser() {
    echo "GOT: $1"
}
export -f parser

find Projects/ -type f -name '*rb' -exec bash -c 'parser "$0"' {} \;
I suggest you use sed; it is a more powerful tool for text processing.
For example, if I want to append the string "myparse" to each line that ends with "ha", I can do this:
# echo "haha" > text1
# echo "hehe" > text2
# echo "heha" > text3
# find . -type f -exec sed '/ha$/s/ha$/ha myparse/' {} \;
haha myparse
heha myparse
hehe
If you really want to replace the file, not just print to stdout, you can do this:
# find . -type f -exec sed -i '/ha$/s/ha$/ha myparse/' {} \;
