I have an input file "test.txt" as below:
In each line, the value before the space is the old value, which needs to be replaced by the new value after the space, recursively in a folder named "test". I am able to do this using the shell script below:
for f in `cat test.txt`
do
OLD=$(echo $f| cut -d ' ' -f 1)
echo "Old = $OLD"
NEW=$(echo $f| cut -d ' ' -f 2)
echo "New = $NEW"
find test -type f | xargs sed -i.bak "s/$OLD/$NEW/g"
done
"sed" replaces the strings on the fly in 100s of files.
Is there a trick or an alternative way to get a report of the changed files, such as the absolute path of each file and the exact lines that got changed?
PS: I understand that sed and other stream editors don't support this functionality out of the box. I don't want to use version control, as it would be overkill for this task.

Let's start with a simple rewrite of your script, to make it a little bit more robust at handling a wider range of replacement values, but also faster:
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -exec sed "s/$(escapeRegex "$old")/$(escapeSubst "$new")/g" -i '{}' \;
done <test.txt
So, we loop over pairs of whitespace-separated fields (old, new) in lines from test.txt and run a standard sed in-place replace on all files found with find.
Pretty similar to your script, but we properly read lines from test.txt (no word splitting, pathname/variable expansion, etc.), we use Bash builtins whenever possible (no need to call external tools like cat, cut or xargs), and we escape sed metacharacters in the old/new values so they can be used as sed's regexp and replacement expressions.
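A quick way to see what the two escaping helpers buy you is to feed them values that would misbehave if used raw. The sketch below (the sample strings are made up for illustration) replaces the literal text a.b*c with x&y/z and leaves alone a line that only matches the unescaped regex:

```shell
#!/usr/bin/env bash
# Sketch: check that escapeRegex/escapeSubst (copied from the answer)
# make sed treat "old" and "new" as plain text.
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }

old='a.b*c'   # . and * would be regex operators if left unescaped
new='x&y/z'   # & and / are special in a sed replacement

# "aXbbbc" matches the raw regex a.b*c, but must survive unchanged.
result=$(printf '%s\n' 'a.b*c' 'aXbbbc' |
  sed "s/$(escapeRegex "$old")/$(escapeSubst "$new")/g")
printf '%s\n' "$result"
```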
Now let's add logging from sed:
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -printf '\n[%p]\n' -exec sed "/$(escapeRegex "$old")/{
h
s//$(escapeSubst "$new")/g
H
x
s/\n/ --> /
w /dev/stdout
x
}" -i '{}' > >(tee -a change.log) \;
done <test.txt
The sed script above changes each old to new, but it also writes an old --> new line to /dev/stdout (a special file name in GNU sed), which the Bash process substitution >(tee -a change.log) appends to the change.log file. The -printf action in find outputs a "header" line with the file name for each file processed.
With this, your "change log" will look something like:
[file1] -->
[file1] -->
[file2] -->
Just for completeness, a quick walk-through of the sed script. We act only on lines containing the old value. For each such line, we store it in the hold space (h), change it to new, and append that new value to the hold space (joined with a newline, H), which now holds old\nnew. We swap the hold and pattern spaces (x), so we can run an s command that converts it to old --> new. After writing that to stdout with w, we move the new line back from the hold space to the pattern space (x), so it gets written (in place) to the file being processed.
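To see the h / H / x / w sequence in isolation, here is a minimal sketch on a throwaway file. It assumes GNU sed, replaces hello with bye, and writes the log with w to a temporary file (instead of /dev/stdout) so the edited text and the log stay apart; the file names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of the hold-space logging trick on a scratch file (GNU sed).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'keep me' 'say hello' 'hello again' > "$tmp/input.txt"

sed "/hello/{
h
s//bye/g
H
x
s/\n/ --> /
w $tmp/change.log
x
}" "$tmp/input.txt" > "$tmp/output.txt"

cat "$tmp/output.txt"   # edited text
cat "$tmp/change.log"   # one "old --> new" line per changed line
```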

From man sed:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
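This can be used to create a backup file when replacing. Note that sed writes a backup for every file it processes, whether or not anything changed, so compare each backup with its edited file (for example with cmp or diff) to find the files that were actually changed and see the changed lines. Once you're done inspecting the diff, simply remove the backup files.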
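A minimal sketch of that workflow, assuming GNU sed (-i.bak with the suffix attached); the scratch directory and the foo -> FOO replacement are made up for the example:

```shell
#!/usr/bin/env bash
# Sketch: report which files really changed, and how, using sed's backups.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'foo\nbar\n' > "$dir/a.txt"
printf 'bar\nbaz\n' > "$dir/b.txt"

# sed writes a .bak for every file it is given, changed or not.
find "$dir" -type f -name '*.txt' -exec sed -i.bak 's/foo/FOO/g' {} +

report=$(
  find "$dir" -type f -name '*.bak' | sort | while IFS= read -r bak; do
    file=${bak%.bak}
    if ! cmp -s "$bak" "$file"; then
      echo "Changed: $file"
      diff "$bak" "$file"   # old lines marked <, new lines marked >
    fi
    rm -- "$bak"            # done inspecting this backup
  done
)
printf '%s\n' "$report"
```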
If you formulate your replacements as sed statements rather than a custom format you can go one further, and use either a sed shebang line or pass the file to -f/--file to do all the replacements in one operation.
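A minimal sketch of that idea, with the replacements written as a sed script file and applied in one pass with -f (the file names and words are hypothetical). It prints to stdout; add -i (GNU sed) or -i '' (BSD sed) to edit the files in place:

```shell
#!/usr/bin/env bash
# Sketch: keep the replacements as sed statements and run them all at once.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

cat > "$work/replacements.sed" <<'EOF'
s/apple/orange/g
s/cat/dog/g
EOF

printf 'apple pie\nthe cat sat\n' > "$work/sample.txt"

# One sed invocation applies every substitution to every line.
sed -f "$work/replacements.sed" "$work/sample.txt"
```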

There are several problems with your script; just replace it all with this (using GNU awk instead of GNU sed for in-place editing):
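mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; print; next }   # test.txt is edited in place too, so write its lines back
{ for (old in map) gsub(old,map[old]) }
1                                       # print every (possibly modified) line back to its file
' test.txt "${files[@]}"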
You'll find that this is orders of magnitude faster than what you were doing, since each file is read and rewritten once instead of once per line of test.txt.
That still shares the problems of your existing script: it fails when the strings in test.txt contain regexp or backreference metacharacters, it can modify previously modified strings, and it doesn't handle partial matches. If that's an issue, let us know, as it's easy to work around with awk (and extremely difficult with sed!).
To get whatever kind of report you want, just tweak the { for ... } line to print it; for example, to print a record of the changes to stderr:
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; print; next }
{
    orig = $0
    for (old in map) {
        gsub(old,map[old])
    }
    if ($0 != orig) {
        printf "File %s, line %d: \"%s\" became \"%s\"\n", FILENAME, FNR, orig, $0 | "cat>&2"
    }
    print
}
' test.txt "${files[@]}"
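As for the metacharacter problem mentioned above, one way around it in awk (a sketch, not necessarily what the answer's author had in mind) is to replace literal strings with index() and substr() instead of gsub(). literal_gsub is a hypothetical helper name, and the sketch does not stop one pair from re-matching text produced by another:

```shell
#!/usr/bin/env bash
# Sketch: literal (non-regex) replacement in awk, so . * & \ in the
# mapping file have no special meaning. literal_gsub is not a built-in.
prog='
function literal_gsub(old, new, str,    out, pos) {
  if (old == "") return str                 # avoid looping on an empty key
  out = ""
  while ((pos = index(str, old)) > 0) {
    out = out substr(str, 1, pos - 1) new   # text before the match, then replacement
    str = substr(str, pos + length(old))    # continue after the match
  }
  return out str
}
NR == FNR { map[$1] = $2; next }            # first file: old/new pairs
{ for (old in map) $0 = literal_gsub(old, map[old], $0); print }
'
pairs=$(mktemp)
trap 'rm -f "$pairs"' EXIT
printf '%s\n' 'a.b x&y' > "$pairs"

printf '%s\n' 'a.b and aXb' | awk "$prog" "$pairs" -
```

The same function and loop could stand in for the gsub() call in the in-place commands above.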


Speed up bash for loop which contains multiple sed commands

My bash for loop looks like this:
for i in read_* ; do
cut -f1 $i | sponge $i
sed -i '1 s/^/>/g' $i
sed -i '3 s/^/>ref\n/g' $i
sed -i '4d' $i
sed -i '1h;2H;1,2d;4G' $i
mv $i $i.fasta
done
Are there any methods of speeding up this process, perhaps using GNU parallel?
EDIT: Added input and expected output.
sampleid 97 stuff 2086 42 213M = 3322 1431
Hopeful output:
I used the sed -i '1h;2H;1,2d;4G' $i command to swap lines 2 and 4.
If I read it right, this should create the same result, though it would probably help a LOT if I could see what your input and expected output look like...
awk '{$0=$1}
FNR==1{hd=">"$0; next}
FNR==3{print ">ref\n"$0 > FILENAME".fasta"}
FNR==5{print hd"\n"$0 > FILENAME".fasta"}
' read_*
My input files:
$: cat read_x
foo x
bar x
baz x
last x
curiosity x
$: cat read_y
and the resulting output files:
$: cat read_x.fasta
$: cat read_y.fasta
This runs in one pass with no loop aside from awk's usual internals, and leaves the originals in place so you can check it first. If all is good, all that's left is to remove the originals. For that, I would use extended globbing.
$: shopt -s extglob; rm read_!(*.fasta)
That will clean up the original inputs but not the new outputs.
Same results, three commands, no loops.
I am, of course, making some assumptions about what you mean to do that might not be accurate. To get this format in a single sed call:
$: sed -e 's/[[:space:]].*//' -e '1{s/^/>/;h;d}' -e '2{H;s/.*/>ref/}' -e '4x' read_x
but that's not the same commands you used, so maybe I'm misreading it.
To use this to edit multiple files in place at once (instead of calling it in a loop on each file), use -si so that the line numbers apply to each file rather than to the stream of records they collectively produce.
DON'T use -is, though: GNU sed would read the s as the backup suffix for -i rather than as a separate option. You could use -i -s instead.
$: sed -s -i -e 's/[[:space:]].*//' -e '1{s/^/>/;h;d}' -e '2{H;s/.*/>ref/}' -e '4x' read_*
This still leaves you with the issue of renaming each, but xargs makes that pretty easy in the given example.
printf "%s\n" read_* | xargs -I# mv # #.fasta
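To see why -s (or -i, which implies it) matters for the line-number addresses above, compare both modes on two tiny files. This is a GNU sed sketch with made-up file contents:

```shell
#!/usr/bin/env bash
# Sketch (GNU sed): without -s, line numbers run across all input files;
# with -s, each file starts again at line 1.
d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT
printf 'a1\na2\n' > "$d/read_a"
printf 'b1\nb2\n' > "$d/read_b"

joined=$(sed '1s/^/>/' "$d/read_a" "$d/read_b")       # only a1 is line 1
separate=$(sed -s '1s/^/>/' "$d/read_a" "$d/read_b")  # a1 and b1 are each line 1
printf 'without -s:\n%s\nwith -s:\n%s\n' "$joined" "$separate"
```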
Using the file you gave in the OP, and assuming every file has the same general structure and exactly 4 lines:
$: cat file_0 # I made files 0 through 7, but with same data
sampleid 97 stuff 2086 42 213M = 3322 1431
$: sed -Esi '1{s/^([^[:space:]]+).*/>\1/;h;s/.*/>ref/}; 3x;' file_?
$: cat file_0 # used a diff on each, worked on all at once
-Esi Extended regular expressions, separate line counts per file, in-place edits
1{...}; Collectively do these commands, in order, only on line 1 of each file
s/^([^[:space:]]+).*/>\1/ add a leading > and strip everything from the first whitespace on
h store the resulting >\1 line in the hold buffer
s/.*/>ref/ then replace the whole line with a literal >ref
3x swap line 3 with the value in the hold buffer from line 1
file_? I used a glob to supply the appropriate list of files all at once.
Doing same with awk:
$: awk 'FNR==1{id=">"$1; print ">ref" >FILENAME".fasta"; next} FNR==3{print id > FILENAME".fasta"; next} {print $0 > FILENAME".fasta"}' file_?
Then you can do file management as above with the xargs/mv for the sed or the shopt/rm for the awk - or we could add a little organizational work in awk if you like. Consider this:
awk 'BEGIN { system(" mkdir -p done ") }
FNR==1 { id=">"$1; print ">ref" > FILENAME".fasta"; next } # skip printing original
FNR==3 { print id > FILENAME".fasta"; next } # skip printing original
{ print $0 > FILENAME".fasta" } # every line NOT skipped
FNR==4 { close(FILENAME); close(FILENAME".fasta");
system("mv " FILENAME " done/")
}' file_?
Then if there are any problems, it's easy to delete the .fasta files, move the originals back, adjust the code, and try again. If everything is OK, it's fast and easy to rm -fr done, yes?
Note that I really only added the mkdir inside a system call in the awk to show that you can, and to keep from having to manually do it separately if you have to run a few iterations or move it all into a wrapper script, etc.
The code in the question runs multiple subprocesses (cut, sponge, sed four times, and mv) for each file that is processed. Running subprocesses is relatively slow, so you can speed up the code significantly by reducing the number of them.
This Shellcheck-clean code is one way to do it:
#! /bin/bash -p
old_files=()
for f in read_* ; do
readarray -t lines <"$f"
printf '>ref\n%s\n>%s\n%s\n' \
"${lines[3]}" "${lines[0]%%[[:space:]]*}" "${lines[1]}" >"$f.fasta"
old_files+=( "$f" )
done
rm -- "${old_files[@]}"
This runs no subprocesses when processing individual files. It just reads the lines of the old file into an array using the built-in readarray command and writes to the new file using the built-in printf.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of the %% in ${lines[0]%%[[:space:]]*}.
To avoid running rm for each file, the code keeps a list of files to be deleted and removes all of them at the end. If you try the code, consider commenting out the rm line until you are very confident that the rest of the code is doing what you want.
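A small illustration of the %% expansion used above, next to its shortest-match sibling %. The sample line is hypothetical, modelled on the sampleid line from the question:

```shell
#!/usr/bin/env bash
# Sketch: % removes the shortest suffix matching the pattern, %% the longest.
# With "[[:space:]]*", %% keeps only the text before the first whitespace.
line=$'sampleid\t97 stuff 2086'   # hypothetical first line of a read_* file
short=${line%[[:space:]]*}        # drops only the last field: "sampleid<TAB>97 stuff"
id=${line%%[[:space:]]*}          # drops everything from the first whitespace on
printf '>%s\n' "$id"
```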

different ways to grep for two distinct strings, appearing in any order, or line, in the same file

I want to return all files that have the strings: "main(" as well as "foo".
This is like using a multi pattern OR grep but with AND instead.
The best I've come up with is:
grep -rl . -e "main("|while read fname; do grep -rl "$fname" -e "foo"; done
It does the job, but ideally I wouldn't have to write a bash script.
int main()
The grep command should return text1.txt, since it contains the strings 'main(' and 'foo'.
Just use awk to match both patterns and print filenames:
awk 'FNR == 1 { m = f = 0 } # reset flags at start of each file
/main\(/ { ++m } /foo/ { ++f } # set flags when patterns match
m && f { print FILENAME; nextfile }' **/*
nextfile is a GNU extension which skips to the next file, rather than the next line. With globstar enabled, ** expands recursively. globstar is off by default in bash, so enable it with shopt -s globstar, both in an interactive shell and in a script.
With non-GNU awk, you can use another flag to skip lines and avoid printing the filename multiple times:
awk 'FNR == 1 { m = f = p = 0 } # reset flags at start of each file
p { next } # skip lines once this filename has been printed
/main\(/ { ++m } /foo/ { ++f }
m && f { print FILENAME; ++p }' **/*
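A quick sketch that runs the portable flag-based awk on three scratch files (the names and contents are invented for the example), only one of which contains both strings:

```shell
#!/usr/bin/env bash
# Sketch: list files containing both "main(" and "foo" with the
# flag-based awk, which does not need nextfile.
d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT
printf 'int main()\n{ foo(); }\n' > "$d/text1.txt"
printf 'int main()\n'             > "$d/text2.txt"
printf 'foo\n'                    > "$d/text3.txt"

both=$(awk 'FNR == 1 { m = f = p = 0 }   # reset flags for each file
p { next }                               # already printed this file
/main\(/ { ++m } /foo/ { ++f }
m && f { print FILENAME; ++p }' "$d"/*.txt)
printf '%s\n' "$both"
```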
grep -rlZ 'main(' | xargs -0 grep -l 'foo'
-Z, --null
Output a zero byte (the ASCII NUL character) instead of the character that normally follows a file
name. For example, grep -lZ outputs a zero byte after each file name instead of the usual newline.
This option makes the output unambiguous, even in the presence of file names containing unusual
characters like newlines. This option can be used with commands like find -print0, perl -0, sort -z,
and xargs -0 to process arbitrary file names, even those that contain newline characters.
The first grep prints the names of all files containing main(, separated by NUL characters. xargs then passes those files to the second grep, which prints the ones that also contain foo.
If the files are small enough and do not contain NUL characters, you can also use:
grep -rlz 'main(.*foo\|foo.*main('
where -z makes grep use NUL as the line separator, effectively slurping each whole file as a single line.
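The NUL-separated grep | xargs -0 pipeline keeps working when file names contain spaces. A sketch assuming GNU grep (for -r and -Z), with hypothetical file names:

```shell
#!/usr/bin/env bash
# Sketch: -Z ends each file name with NUL and xargs -0 splits on NUL only,
# so the space in "has both.c" does not break the name apart.
d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT
printf 'int main()\n{ foo(); }\n' > "$d/has both.c"
printf 'int main()\n'             > "$d/only main.c"

found=$(grep -rlZ 'main(' "$d" | xargs -0 grep -l 'foo')
printf '%s\n' "$found"
```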

How to process tr across all files in a directory and output to a different name in another directory?

mpu3$ echo * | xargs -n 1 -I {} | tr "|" "/n"
This is what I have so far, but I get no output at all. What I really want is to take a list of .txt files, keep each file's name up to the extension, replace every "|" with a line break, and put the new file in another directory as [old-name].ics. Thanks in advance!
You can loop over the files and use sed to process the file:
for i in *.txt; do
sed -e 's/|/\n/g' "$i" > other_directory/"${i%.txt}".ics
done
There's no need for xargs, especially combined with echo, which risks the file names being word-split and globbed, so it could well do the wrong thing.
We then use sed's s command to substitute | with \n; the g flag makes it a global replace. We redirect the result to the other directory you want and use bash's parameter expansion to strip the .txt off the end of the name. Note that \n in the replacement is a GNU sed feature; the BSD sed on macOS does not turn it into a newline.
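Since the question mentions tr: tr translates single characters, so it is a natural fit for turning each | into a newline, and it behaves the same on GNU and BSD systems. A sketch of the same loop using tr, where src and dst are throwaway stand-ins for your two directories:

```shell
#!/usr/bin/env bash
# Sketch: convert every *.txt in src to a .ics file in dst, turning | into newlines.
src=$(mktemp -d)
dst=$(mktemp -d)
trap 'rm -rf "$src" "$dst"' EXIT
printf 'BEGIN:VCALENDAR|VERSION:2.0|END:VCALENDAR\n' > "$src/cal.txt"

for i in "$src"/*.txt; do
  name=${i##*/}                                # file name without the directory
  tr '|' '\n' < "$i" > "$dst/${name%.txt}.ics"
done
cat "$dst/cal.ics"
```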
Here's an awk solution:
$ awk '
FNR==1 {                     # for first record of every file
close(f)                     # close previous file f
f="path_to_dir/" FILENAME    # new filename with path
sub(/txt$/,"ics",f) }        # replace txt with ics
{
gsub(/\|/,"\n")              # replace | with \n
print > f }' *.txt           # print to new file

Looking for a regex pattern, passing that pattern to a script, and replacing the pattern with the output of the script

Every time the pattern shows up (in this example, a 2-digit number), I want to pass the match to a script and replace it with the script's output.
I'm using sed; an example of what it should look like would be:
echo 'siedi87sik65owk55dkd' | sed 's/[0-9][0-9]/.\/'
Right now this returns
But I would like it to return
This is what is in ./
echo "!!!$1!!!"
It has to be replaced with the output. In this example I know I could just use a normal sed substitution but I don't want that as an answer.
sed is for simple substitutions on individual lines, that is all. Anything else, even if it can be done, requires arcane language constructs that became obsolete in the mid-1970s when awk was invented and are used today purely for the mental exercise. Your problem is not a simple substitution so you shouldn't try to use sed to solve it.
You're going to want something like:
awk '{
    head = ""
    tail = $0
    while ( match(tail,/[0-9]{2}/) ) {
        tgt = substr(tail,RSTART,RLENGTH)
        cmd = "./ " tgt
        if ( (cmd | getline line) > 0) {
            tgt = line
        }
        close(cmd)
        head = head substr(tail,1,RSTART-1) tgt
        tail = substr(tail,RSTART+RLENGTH)
    }
    print head tail
}'
e.g. using an echo in place of your command:
$ echo 'siedi87sik65owk55dkd' |
awk '{
    head = ""
    tail = $0
    while ( match(tail,/[0-9]{2}/) ) {
        tgt = substr(tail,RSTART,RLENGTH)
        cmd = "echo !!!" tgt "!!!"
        if ( (cmd | getline line) > 0) {
            tgt = line
        }
        close(cmd)
        head = head substr(tail,1,RSTART-1) tgt
        tail = substr(tail,RSTART+RLENGTH)
    }
    print head tail
}'
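Here is the same loop as a self-contained sketch, with echo standing in for the real script and the regex written as [0-9][0-9] in case your awk lacks {2} intervals. The second test input repeats a number, which is where close(cmd) matters: without it, the second getline on the identical command string would hit end-of-file and leave that number unreplaced:

```shell
#!/usr/bin/env bash
# Sketch: replace each two-digit number with the output of a command
# (echo here, as a stand-in for your script).
prog='{
  head = ""; tail = $0
  while (match(tail, /[0-9][0-9]/)) {
    tgt = substr(tail, RSTART, RLENGTH)
    cmd = "echo !!!" tgt "!!!"
    if ((cmd | getline line) > 0) tgt = line   # use the command output if any
    close(cmd)                                 # allow the same command to run again
    head = head substr(tail, 1, RSTART - 1) tgt
    tail = substr(tail, RSTART + RLENGTH)
  }
  print head tail
}'
echo 'siedi87sik65owk55dkd' | awk "$prog"
echo '11a11' | awk "$prog"
```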
Ed's awk solution is obviously the way to go here.
For fun, I tried to come up with a sed solution, and here is (a convoluted GNU sed) one that takes the pattern and the script to be run as parameters; the input is either read from standard input (i.e., you can pipe to it) or from a file supplied as the third argument.
For your example, we'd have infile with contents
(two lines to demonstrate how this works for multiple lines), then script with contents
echo "!!!${1}!!!"
and finally the solution script itself, named so. Usage is
./so pattern script [input]
where pattern is an extended regular expression as understood by GNU sed (with the -r option), script is the name of the command you want to run for each match, and the optional input is the name of the input file if input is not standard input.
For your example, this would be
./so '[[:digit:]]{2}' script infile
or, as a filter,
cat infile | ./so '[[:digit:]]{2}' script
with output
This is what so looks like:
pat=$1 # The pattern to match
script=$2 # The command to run for each pattern
infile=${3:-/dev/stdin} # Read from standard input if not supplied
# Use sed and have $pattern and $script expand to the supplied parameters
sed -r "
:build_loop # Label to loop back to
h # Copy pattern space to hold space
s/.*($pat).*/.\/\"$script\" \1/ # (1) Extract last match and prepare command
# Replace pattern space with output of command
e
G # (2) Append hold space to pattern space
s/(.*)$pat(.*)/\1~~~\2/ # (3) Replace last match of pattern with ~~~
/\n[^\n]*$pat[^\n]*$/b build_loop # Loop if string contains match
:fill_loop # Label for second loop
s/(.*\n)(.*)\n([^\n]*)~~~([^\n]*)$/\1\3\2\4/ # (4) Replace last ~~~
t fill_loop # Loop if there was a replacement
s/(.*)\n(.*)~~~(.*)$/\2\1\3/ # (5) Final ~~~ replacement
" < "$infile"
The sed command works with two loops. The first one copies the pattern space to the hold space, then removes everything but the last match from the pattern space and prepares the command to be run. After the substitution with (1) in its comment, the pattern space looks like this:
./script 55
The e command (a GNU extension) then replaces the pattern space with the output of this command. After this, G appends the hold space to the pattern space (2). The pattern space now looks like this:
The substitution at (3) replaces the last match with a string hopefully not equal to the pattern and we get
The loop repeats if the last line of the pattern space still has a match for the pattern. After three loops, the pattern space looks like this:
The second loop now replaces the last ~~~ with the second to last line of the pattern space with substitution (4). The command uses lots of "not a newline" ([^\n]) to make sure we're not pulling the wrong replacement for ~~~.
Because of the way command (4) is written, the loop ends with one last substitution to go, so before command (5), we have this pattern space:
Command (5) is a simpler version of command (4), and after it, the output is as desired.
This seems to be fairly robust and can deal with spaces in the name of the script to be run as long as it's properly quoted when calling:
./so '[[:digit:]]{2}' 'my script' infile
This would fail if
The input file contains ~~~ (solvable by replacing all occurrences at the start, putting them back at the end)
The output of script contains ~~~
The pattern contains ~~~
i.e., the solution very much depends on ~~~ being unique.
Because nobody asked: so as a one-liner.
sed -re ":b;h;s/.*($1).*/.\/\"$2\" \1/;e" -e "G;s/(.*)$1(.*)/\1~~~\2/;/\n[^\n]*$1[^\n]*$/bb;:f;s/(.*\n)(.*)\n([^\n]*)~~~([^\n]*)$/\1\3\2\4/;tf;s/(.*)\n(.*)~~~(.*)$/\2\1\3/" < "${3:-/dev/stdin}"
Still works!
A conceptually simpler multi-utility solution:
Using GNU utilities:
echo 'siedi87sik65owk55dkd' |
sed 's|[0-9]\{2\}|$(./ &)|g' |
xargs -d'\n' -I% sh -c 'echo '\"%\"
Using BSD utilities (also works with GNU utilities):
echo 'siedi87sik65owk55dkd' |
sed 's|[0-9]\{2\}|$(./ &)|g' | tr '\n' '\0' |
xargs -0 -I% sh -c 'echo '\"%\"
The idea is to use sed to translate the tokens of interest lexically into a string containing shell command substitutions that invoke the target script with the token, and then pass the result to the shell for evaluation.
Any embedded " and $ characters in the input must be \-escaped.
xargs -d'\n' (GNU) and tr '\n' '\0' / xargs -0 (BSD) are only needed to correctly preserve whitespace in the input; if that is not needed, the following POSIX-compliant solution will do:
echo 'siedi87sik65owk55dkd' |
sed 's|[0-9]\{2\}|$(./ &)|g' |
xargs -I{} sh -c 'printf "%s\n" '\"{}\"

Replace strings in multiple files with corresponding caps using bash on MacOSX

I have multiple .txt files, in which I want to replace the strings
old -> new
Old -> New
The first step is to only replace one string Old->New. Here is my current code, but it does not do the job (the files remain unchanged). The sed line works only if I replace the variables with the actual strings.
sed -i '.bak' 's/$old_string/$new_string/g' *.txt
Also, how do I convert a string to all uppercase and all lowercase?
Thank you very much for your advice!
To complement @merlin2011's helpful answer:
If you wanted to create the case variants dynamically, try this:
# Define search and replacement strings
# as all-lowercase.
old_string='old'
new_string='new'
# Loop 3 times and create the case variants dynamically.
# Build up a _single_ sed command that performs all 3
# replacements.
sedCmd=''
for (( i = 1; i <= 3; i++ )); do
  case $i in
    1) # as defined (all-lowercase)
      old_string_variant=$old_string
      new_string_variant=$new_string
      ;;
    2) # initial capital
      old_string_variant="$(tr '[:lower:]' '[:upper:]' <<<"${old_string:0:1}")${old_string:1}"
      new_string_variant="$(tr '[:lower:]' '[:upper:]' <<<"${new_string:0:1}")${new_string:1}"
      ;;
    3) # all-uppercase
      old_string_variant=$(tr '[:lower:]' '[:upper:]' <<<"$old_string")
      new_string_variant=$(tr '[:lower:]' '[:upper:]' <<<"$new_string")
      ;;
  esac
  # Append to the sed command string. Note the use of _double_ quotes
  # to ensure that variable references are expanded.
  sedCmd+="s/$old_string_variant/$new_string_variant/g; "
done
# Finally, invoke sed.
sed -i '.bak' "$sedCmd" *.txt
Note that bash 4 supports case conversions directly (as part of parameter expansion), but OS X, as of 10.9.3, is still on bash 3.2.51.
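For reference, a sketch of those Bash 4+ case-modification expansions (they will not work in the Bash 3.2 that OS X ships):

```shell
#!/usr/bin/env bash
# Sketch: ^ uppercases the first character, ^^ uppercases all characters,
# ,, lowercases all characters (Bash 4 or later).
s='old'
echo "${s^}"    # initial capital
echo "${s^^}"   # all-uppercase
t='MiXeD'
echo "${t,,}"   # all-lowercase
```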
Alternative solution, using awk to create the case variants and synthesize the sed command:
Aside from being shorter, it is also more robust, because it also correctly handles strings that happen to contain regex metacharacters (characters with special meaning in a regular expression, e.g., *) or characters with special meaning in the replacement-string parameter of sed's s function (e.g., \), through appropriate escaping; without escaping, the sed command would not work as expected.
Caveat: Doesn't support strings with embedded \n chars (though that could be fixed, too).
# Define search and replacement strings as all-lowercase literals.
old_string='old'
new_string='new'
# Synthesize the sed command string, utilizing awk and its tolower() and toupper()
# functions to create the case variants.
# Note the need to escape \ chars to prevent awk from interpreting them.
sedCmd=$(awk \
  -v old_string="${old_string//\\/\\\\}" \
  -v new_string="${new_string//\\/\\\\}" \
  'BEGIN {
    printf "s/%s/%s/g; s/%s/%s/g; s/%s/%s/g",
      old_string, new_string,
      toupper(substr(old_string,1,1)) substr(old_string,2), toupper(substr(new_string,1,1)) substr(new_string,2),
      toupper(old_string), toupper(new_string)
  }')
# Invoke sed with the synthesized command.
# The inner sed command ensures that all regex metacharacters in the strings
# are escaped so that sed treats them as literals.
sed -i '.bak' "$(sed 's#[][(){}^$.*?+\]#\\&#g' <<<"$sedCmd")" *.txt
If you want to do bash variable expansion inside the argument to sed, you need to use double quotes " instead of single quotes '.
sed -i '.bak' "s/$old_string/$new_string/g" *.txt
In terms of getting matches on all three of the literal substitutions, the cleanest solution may be just to run sed three times in a loop like this.
declare -a olds=(old Old OLD)
declare -a news=(new New NEW)
for i in `seq 0 2`; do
sed -i "s/${olds[$i]}/${news[$i]}/g" *.txt
done
Update: The solution above works on Linux, but apparently OS X has different requirements. Additionally, as @mklement0 mentioned, my for loop is silly. Here is an improved version for OS X.
declare -a olds=(old Old OLD)
declare -a news=(new New NEW)
for (( i = 0; i < ${#olds[@]}; i++ )); do
sed -i '.bak' "s/${olds[$i]}/${news[$i]}/g" *.txt
done
Assuming each string is separated by spaces from your other strings and that you don't want partial matches within longer strings and that you don't care about preserving white space on output and assuming that if an "old" string matches on a "new" string after a previous conversion operation, then the string should be changed again:
$ cat tst.awk
BEGIN { split(tolower(old),oldStrs); split(tolower(new),newStrs) }
{
    for (fldNr=1; fldNr<=NF; fldNr++) {
        for (stringNr=1; stringNr in oldStrs; stringNr++) {
            oldStr = oldStrs[stringNr]
            if (tolower($fldNr) == oldStr) {
                newStr = newStrs[stringNr]
                split($fldNr,fldChars,"")
                split(newStr,newChars,"")
                $fldNr = ""
                for (charNr=1; charNr in fldChars; charNr++) {
                    fldChar = fldChars[charNr]
                    newChar = newChars[charNr]
                    $fldNr = $fldNr ( fldChar ~ /[[:lower:]]/ ?
                                newChar : toupper(newChar) )
                }
            }
        }
    }
    print
}
$ cat file
The old Old OLD smOLDering QuICk brown FoX jumped
$ awk -v old="old" -v new="new" -f tst.awk file
The new New NEW smOLDering QuICk brown FoX jumped
Note that the "old" in "smOLDering" did not get changed. Is that desirable?
$ awk -v old="QUIck Fox" -v new="raBid DOG" -f tst.awk file
The old Old OLD smOLDering RaBId brown DoG jumped
$ awk -v old="THE brown Jumped" -v new="FEW dingy TuRnEd" -f tst.awk file
Few old Old OLD smOLDering QuICk dingy FoX turned
Think about whether or not this is your expected output:
$ awk -v old="old new" -v new="new yes" -f tst.awk file
The yes Yes YES smOLDering QuICk brown FoX jumped
A few lines of sample input and expected output in the question would be useful to avoid all the guessing and assumptions.
