Replacing text with sed in shell - bash

Like a number of other questers on here, I'm attempting to use sed to replace text inside of a file. This is in the context of a sh script I'm using to run an analysis code (the executable: "turns2d_over") a number of times with a slight modification to the inputs (contained in the file "steady") after each run. There is a specific line that I'm trying to replace that reads as follows where X.X is a number:
ALFA = X.X EXPLANATION
However...instead of replacing ALFA = X.X with ALFA = Y.Y (a different number), sed either does nothing or, in the case I've put in below (which is a copy of my current sh script), it deletes ALL text in the entire file...which is rather unhelpful.
TOT_VALS=9
ALPHAS[1]=-2.0
ALPHAS[2]=0.0
ALPHAS[1]=2.2
ALPHAS[4]=4.5
ALPHAS[5]=6.3
ALPHAS[6]=8.3
ALPHAS[7]=10.2
ALPHAS[8]=12.2
ALPHAS[9]=14.4
cd ./naca0008
for VAL in $(seq 1 $TOT_VALS); do
sed -n -i 's/ALFA = .*/ALFA = '${ALPHAS[$VAL]}'/g' steady
./turns2d_over<steady
mv 'fort.1' 'grid_in' # rename input grid to save from bulk move
mv 'fort.8' 'fort.q' # rename soln & grid outputs
mv 'fort.9' 'fort.g'
mkdir ./'alfa='${ALPHAS[$VAL]}/; mv 'fort.'* *'.dat' $_ # move files
cp 'steady' $_
mv 'grid_in' 'fort.1' # ready input grid for next run
done
So what's going on?

You either need to remove the -n or add a p at the end of your sed command. The -n means sed won't automatically print the pattern space (and the p will print out the current match).
If you want sed to leave the rest of the file (steady) alone, then remove the -n.
sed -i 's/ALFA = .*/ALFA = '${ALPHAS[$VAL]}'/g' steady
If you want sed to delete the rest of the file (steady) and only print the matching ALFA line, add p like this:
sed -n -i 's/ALFA = .*/ALFA = '${ALPHAS[$VAL]}'/gp' steady

Related

Speed up bash for loop which contains multiple sed commands

my bash for loop looks like:
for i in read_* ; do
cut -f1 $i | sponge $i
sed -i '1 s/^/>/g' $i
sed -i '3 s/^/>ref\n/g' $i
sed -i '4d' $i
sed -i '1h;2H;1,2d;4G' $i
mv $i $i.fasta
done
Are there any methods of speeding up this process, perhaps using GNU parallel?
EDIT: Added input and expected output.
Input:
sampleid 97 stuff 2086 42 213M = 3322 1431
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
Hopeful output:
>ref
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
>sampleid
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
I used the sed -i '1h;2H;1,2d;4G' $i command to swap lines 2 and 4.
If I read it right, this should create the same result, though it would probably help a LOT if I could see what your input and expected output look like...
awk '{$0=$1}
FNR==1{hd=">"$0; next}
FNR==2{hd=hd"\n"$0;next}
FNR==3{print ">ref\n"$0 > FILENAME".fasta"}
FNR==4{next}
FNR==5{print hd"\n"$0 > FILENAME".fasta"}
' read_*
My input files:
$: cat read_x
foo x
bar x
baz x
last x
curiosity x
$: cat read_y
FOO y
BAR y
BAZ y
LAST y
CURIOSITY y
and the resulting output files:
$: cat read_x.fasta
>ref
baz
>foo
bar
curiosity
$: cat read_y.fasta
>ref
BAZ
>FOO
BAR
CURIOSITY
This runs in one pass with no loop aside from awk's usual internals, and leaves the originals in place so you can check it first. If all is good, all that's left is to remove the originals. For that, I would use extended globbing.
$: shopt -s extglob; rm read_!(*.fasta)
That will clean up the original inputs but not the new outputs.
Same results, three commands, no loops.
I am, or course, making some assumptions about what you are meaning to do that might not be accurate. To get this format in a single sed call -
$: sed -e 's/[[:space:]].*//' -e '1{s/^/>/;h;d}' -e '2{H;s/.*/>ref/}' -e '4x' read_x
>ref
baz
>foo
bar
curiosity
but that's not the same commands you used, so maybe I'm misreading it.
To use this to in-place edit multiple files at a time (instead of calling it in a loop on each file), use -si so that the line numbers apply to each file rather than the stream of records they collectively produce.
DON'T use -is, though you could use -i -s.
$: sed -s -i -e 's/[[:space:]].*//' -e '1{s/^/>/;h;d}' -e '2{H;s/.*/>ref/}' -e '4x' read_*
This still leaves you with the issue of renaming each, but xargs makes that pretty easy in the given example.
printf "%s\n" read_* | xargs -I# mv # #.fasta
addendum
Using the file you gave in the OP, assuming every file is the same general structure and exactly 4 lines -
$: cat file_0 # I made files 0 through 7, but with same data
sampleid 97 stuff 2086 42 213M = 3322 1431
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
$: sed -Esi '1{s/^([^[:space:]]+).*/>\1/;h;s/.*/>ref/}; 3x;' file_?
$: cat file_0 # used a diff on each, worked on all at once
>ref
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
>sampleid
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
Breakout:
-Esi Extended pattern matching, separate file linecounts, in-place edits
1{...}; Collectively do these commands, in order, only on every line 1
s/^([^[:space:]]+).*/>\1/ add leading > but strip everything after any whitespace
h store the resulting >\1 line in the hold buffer
s/.*/>ref/ then replace the whole line with a literal >ref
`3x' swap line 3 with the value in the hold buffer from line 1
file_? I used a glob to supply the appropriate list of files all at once.
Doing same with awk:
$: awk 'FNR==1{id=">"$1; print ">ref" >FILENAME".fasta"; next} FNR==3{print id > FILENAME".fasta"; next} {print $0 > FILENAME".fasta"}' file_?
Then you can do file management as above with the xargs/mv for the sed or the shopt/rm for the awk - or we could add a little organizational work in awk if you like. Consider this:
awk 'BEGIN { system(" mkdir -p done ") }
FNR==1 { id=">"$1; print ">ref" > FILENAME".fasta"; next } # skip printing original
FNR==3 { print id > FILENAME".fasta"; next } # skip printing original
{ print $0 > FILENAME".fasta" } # every line NOT skipped
FNR==4 { close(FILENAME); close(FILENAME".fasta");
system("mv " FILENAME " done/")
}' file_?
Then if there are any problems, it's easy to delete the fasta's, move the originals back, adjust the code, and try again. If everything is ok, it's fast and easy to rm -fr done, yes?
Note that I really only added the mkdir inside a system call in the awk to show that you can, and to keep from having to manually do it separately if you have to run a few iterations or move it all into a wrapper script, etc.
The code in the question runs multiple subprocesses (cut, sponge, sed four times, and mv) for each file that is processed. Running subprocesses is relatively slow, so you can speed up the code significantly by reducing the number of them.
This Shellcheck-clean code is one way to do it:
#! /bin/bash -p
old_files=()
for f in read_* ; do
readarray -t lines <"$f"
printf '>ref\n%s\n>%s\n%s\n' \
"${lines[3]}" "${lines[0]%%[[:space:]]*}" "${lines[1]}" >"$f.fasta"
old_files+=( "$f" )
done
rm -- "${old_files[#]}"
This runs no subprocesses when processing individual files. It just reads the lines of the old file into an array using the built-in readarray command and writes to the new file using the built-in printf.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of the %% in ${lines[0]%%[[:space:]]*}.
To avoid running rm for each file, the code keeps a list of files to be deleted and removes all of them at the end. If you try the code, consider commenting the rm line until you are very confident that the rest of the code is doing what you want.

looping with grep over several files

I have multiple files /text-1.txt, /text-2.txt ... /text-20.txt
and what I want to do is to grep for two patterns and stitch them into one file.
For example:
I have
grep "Int_dogs" /text-1.txt > /text-1-dogs.txt
grep "Int_cats" /text-1.txt> /text-1-cats.txt
cat /text-1-dogs.txt /text-1-cats.txt > /text-1-output.txt
I want to repeat this for all 20 files above. Is there an efficient way in bash/awk, etc. to do this ?
#!/bin/sh
count=1
next () {
[[ "${count}" -lt 21 ]] && main
[[ "${count}" -eq 21 ]] && exit 0
}
main () {
file="text-${count}"
grep "Int_dogs" "${file}.txt" > "${file}-dogs.txt"
grep "Int_cats" "${file}.txt" > "${file}-cats.txt"
cat "${file}-dogs.txt" "${file}-cats.txt" > "${file}-output.txt"
count=$((count+1))
next
}
next
grep has some features you seem not to be aware of:
grep can be launched on lists of files, but the output will be different:
For a single file, the output will only contain the filtered line, like in this example:
cat text-1.txt
I have a cat.
I have a dog.
I have a canary.
grep "cat" text-1.txt
I have a cat.
For multiple files, also the filename will be shown in the output: let's add another textfile:
cat text-2.txt
I don't have a dog.
I don't have a cat.
I don't have a canary.
grep "cat" text-*.txt
text-1.txt: I have a cat.
text-2.txt: I don't have a cat.
grep can be extended to search for multiple patterns in files, using the -E switch. The patterns need to be separated using a pipe symbol:
grep -E "cat|dog" text-1.txt
I have a dog.
I have a cat.
(summary of the previous two points + the remark that grep -E equals egrep):
egrep "cat|dog" text-*.txt
text-1.txt:I have a dog.
text-1.txt:I have a cat.
text-2.txt:I don't have a dog.
text-2.txt:I don't have a cat.
So, in order to redirect this to an output file, you can simply say:
egrep "cat|dog" text-*.txt >text-1-output.txt
Assuming you're using bash.
Try this:
for i in $(seq 1 20) ;do rm -f text-${i}-output.txt ; grep -E "Int_dogs|Int_cats" text-${i}.txt >> text-${i}-output.txt ;done
Details
This one-line script does the following:
Original files are intended to have the following name order/syntax:
text-<INTEGER_NUMBER>.txt - Example: text-1.txt, text-2.txt, ... text-100.txt.
Creates a loop starting from 1 to <N> and <N> is the number of files you want to process.
Warn: rm -f text-${i}-output.txt command first will be run and remove the possible outputfile (if there is any), to ensure that a fresh new output file will be only available at the end of the process.
grep -E "Int_dogs|Int_cats" text-${i}.txt will try to match both strings in the original file and by >> text-${i}-output.txt all the matched lines will be redirected to a newly created output file with the relevant number of the original file. Example: if integer number in original file is 5 text-5.txt, then text-5-output.txt file will be created & contain the matched string lines (if any).

Using sed in order to change a specific character in a specific line

I'm a beginner in bash and here is my problem. I have a file just like this one:
Azzzezzzezzzezzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I try in a script to edit this file.ABC letters are unique in all this file and there is only one per line.
I want to replace the first e of each line by a number who can be :
1 in line beginning with an A,
2 in line beginning with a B,
3 in line beginning with a C,
and I'd like to loop this in order to have this type of result
Azzz1zzz5zzz1zzz...
Bzzz2zzz4zzz5zzz...
Czzz3zzz6zzz3zzz...
All the numbers here are random int variables between 0 and 9. I really need to start by replacing 1,2,3 in first exec of my loop, then 5,4,6 then 1,5,3 and so on.
I tried this
sed "0,/e/s/e/$1/;0,/e/s/e/$2/;0,/e/s/e/$3/" /tmp/myfile
But the result was this (because I didn't specify the line)
Azzz1zzz2zzz3zzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I noticed that doing sed -i "/A/ s/$/ezzz/" /tmp/myfile will add ezzz at the end of A line so I tried this
sed -i "/A/ 0,/e/s/e/$1/;/B/ 0,/e/s/e/$2/;/C/ 0,/e/s/e/$3/" /tmp/myfile
but it failed
sed: -e expression #1, char 5: unknown command: `0'
Here I'm lost.
I have in a variable (let's call it number_of_e_per_line) the number of e in either A, B or C line.
Thank you for the time you take for me.
Just apply s command on the line that matches A.
sed '
/^A/{ s/e/$1/; }
/^B/{ s/e/$2/; }
# or shorter
/^C/s/e/$3/
'
s command by default replaces the first occurrence. You can do for example s/s/$1/2 to replace the second occurrence, s/e/$1/g (like "Global") replaces all occurrences.
0,/e/ specifies a range of lines - it filters lines from the first up until a line that matches /e/.
sed is not part of Bash. It is a separate (crude) programming language and is a very standard command. See https://www.grymoire.com/Unix/Sed.html .
Continuing from the comment. sed is a poor choice here unless all your files can only have 3 lines. The reason is sed processes each line and has no way to keep a separate count for the occurrences of 'e'.
Instead, wrapping sed in a script and keeping track of the replacements allows you to handle any file no matter the number of lines. You just loop and handle the lines one at a time, e.g.
#!/bin/bash
[ -z "$1" ] && { ## valiate one argument for filename provided
printf "error: filename argument required.\nusage: %s filename\n" "./$1" >&2
exit 1
}
[ -s "$1" ] || { ## validate file exists and non-empty
printf "error: file not found or empty '%s'.\n" "$1"
exit 1
}
declare -i n=1 ## occurrence counter initialized 1
## loop reading each line
while read -r line || [ -n "$line" ]; do
[[ $line =~ ^.*e.*$ ]] || continue ## line has 'e' or get next
sed "s/e/1/$n" <<< "$line" ## substitute the 'n' occurence of 'e'
((n++)) ## increment counter
done < "$1"
Your data file having "..." at the end of each line suggests your files is larger than the snippet posted. If you have lines beginning 'A' - 'Z', you don't want to have to write 26 separate /match/s/find/replace/ substitutions. And if you have somewhere between 3 and 26 (or more), you don't want to have to rewrite a different sed expression for every new file you are faced with.
That's why I say sed is a poor choice. You really have no way to make the task a generic task with sed. The downside to using a script is it will become a poor choice as the number of records you need to process increase (over 100000 or so just due to efficiency)
Example Use/Output
With the script in replace-e-incremental.sh and your data in file, you would do:
$ bash replace-e-incremental.sh file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...
To Modify file In-Place
Since you make multiple calls to sed here, you need to redirect the output of the file to a temporary file and then replace the original by overwriting it with the temp file, e.g.
$ bash replace-e-incremental.sh file > mytempfile && mv -f mytempfile file
$ cat file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...

how to add a word at the end of a line with ^ without a line break?

I would like to add a string (example: "1565555555") at the end of a particular line in my file.
My file .txt before :
mystrinsdsfssffdfdg
mystrdsfdsfdfffding
mystrsfdsdfsffdfing
mystrdsfdfsdfsffing
Here is my script:
for file in mydirectory/*txt; do
filename=`basename "$file"`
# read each line
while IFS= read -r line
do
old="$IFS"
IFS="^"
set $line
IFS="$old"
count=1
id="2656556655"
sed "s/$line/&^$id/" -i $file #my problem
((count++))
done < "$file"
done
Today, my result :
mystrinsdsfssffdfdg
^2656556655
mystrdsfdsfdfffding
^2656556655
mystrsfdsdfsffdfing
^2656556655
mystrdsfdfsdfsffing
^2656556655
Expected result :
mystrinsdsfssffdfdg^2656556655
mystrdsfdsfdfffding^2656556655
mystrsfdsdfsffdfing^2656556655
mystrdsfdfsdfsffing^2656556655
Assuming the objective is to append a string (^2656556655) on the end of every line in a given file ...
One sample file:
$ cat mystring.txt
mystrinsdsfssffdfdg
mystrdsfdsfdfffding
mystrsfdsdfsffdfing
mystrdsfdfsdfsffing
One sed solution that appends to the end of every line in the file:
$ sed 's/$/^2656556655/g' mystring.txt
mystrinsdsfssffdfdg^2656556655
mystrdsfdsfdfffding^2656556655
mystrsfdsdfsffdfing^2656556655
mystrdsfdfsdfsffing^2656556655
One benefit to this method is that you replace a) the inner looping construct and the repeated sed calls for each line in the file with b) a single sed call and a single pass through the input file. Net result is that you should see a noticeable speed up in the time it takes to process a given file.

Bash - Search and Replace operation with reporting the files and lines that got changed

I have a input file "test.txt" as below -
hostname=abc.com hostname=xyz.com
db-host=abc.com db-host=xyz.com
In each line, the value before space is the old value which needs to be replaced by the new value after the space recursively in a folder named "test". I am able to do this using below shell script.
#!/bin/bash
IFS=$'\n'
for f in `cat test.txt`
do
OLD=$(echo $f| cut -d ' ' -f 1)
echo "Old = $OLD"
NEW=$(echo $f| cut -d ' ' -f 2)
echo "New = $NEW"
find test -type f | xargs sed -i.bak "s/$OLD/$NEW/g"
done
"sed" replaces the strings on the fly in 100s of files.
Is there a trick or an alternative way by which i can get a report of the files changed like absolute path of the file & the exact lines that got changed ?
PS - I understand that sed or stream editors doesn't support this functionality out of the box. I don't want to use versioning as it will be an overkill for this task.
Let's start with a simple rewrite of your script, to make it a little bit more robust at handling a wider range of replacement values, but also faster:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -exec sed "/$(escapeRegex "$old")/$(escapeSubst "$new")/g" -i '{}' \;
done <test.txt
So, we loop over pairs of whitespace-separated fields (old, new) in lines from test.txt and run a standard sed in-place replace on all files found with find.
Pretty similar to your script, but we properly read lines from test.txt (no word splitting, pathname/variable expansion, etc.), we use Bash builtins whenever possible (no need to call external tools like cat, cut, xargs); and we escape sed metacharacters in old/new values for proper use as sed's regexp and replacement expressions.
Now let's add logging from sed:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -printf '\n[%p]\n' -exec sed "/$(escapeRegex "$old")/{
h
s//$(escapeSubst "$new")/g
H
x
s/\n/ --> /
w /dev/stdout
x
}" -i '{}' > >(tee -a change.log) \;
done <test.txt
The sed script above changes each old to new, but it also writes old --> new line to /dev/stdout (Bash-specific), which we in turn append to change.log file. The -printf action in find outputs a "header" line with file name, for each file processed.
With this, your "change log" will look something like:
[file1]
hostname=abc.com --> hostname=xyz.com
[file2]
[file1]
db-host=abc.com --> db-host=xyz.com
[file2]
db-host=abc.com --> db-host=xyz.com
Just for completeness, a quick walk-through the sed script. We act only on lines containing the old value. For each such line, we store it to hold space (h), change it to new, append that new value to the hold space (joined with newline, H) which now holds old\nnew. We swap hold with pattern space (x), so we can run s command that converts it to old --> new. After writing that to the stdout with w, we move the new back from hold to pattern space, so it gets written (in-place) to the file processed.
From man sed:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
This can be used to create a backup file when replacing. You can then look for any backup files, which indicate which files were changed, and diff those with the originals. Once you're done inspecting the diff, simply remove the backup files.
If you formulate your replacements as sed statements rather than a custom format you can go one further, and use either a sed shebang line or pass the file to -f/--file to do all the replacements in one operation.
There's several problems with your script, just replace it all with (using GNU awk instead of GNU sed for inplace editing):
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{ for (old in map) gsub(old,map[old]) }
' test.txt "${files[#]}"
You'll find that is orders of magnitude faster than what you were doing.
That still has the issue your existing script does of failing when the "test.txt" strings contain regexp or backreference metacharacters and modifying previously-modified strings and handling partial matches - if that's an issue let us know as it's easy to work around with awk (and extremely difficult with sed!).
To get whatever kind of report you want you just tweak the { for ... } line to print them, e.g. to print a record of the changes to stderr:
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{
orig = $0
for (old in map) {
gsub(old,map[old])
}
if ($0 != orig) {
printf "File %s, line %d: \"%s\" became \"%s\"\n", FILENAME, FNR, orig, $0 | "cat>&2"
}
}
' test.txt "${files[#]}"

Resources