How to add lines after a pattern using sed - macos

In shell script, how can I add lines after a certain pattern? Say I have the following file and I want to add two lines after block 1 and blk 2.
abc
def
[block 1]
apples = 3
grapes = 4
[blk 2]
banana = 2
apples = 3
[block 1] and [blk 2] will be present in the file.
The output I am expecting is below.
abc
def
[block 1]
oranges = 5
pears = 2
apples = 3
grapes = 4
[blk 2]
oranges = 5
pears = 2
banana = 2
apples = 3
I thought of doing this with sed. I tried the below command but it does not work on my Mac. I checked these posts but I couldn't find what I am doing wrong.
$sed -i '/\[block 1\]/a\n\toranges = 3\n\tpears = 2' sample2.txt
sed: 1: "sample2.txt": unterminated substitute pattern
How can I fix this? Thanks for your help!
[Edit]
I tried the below and these didn't work on my Mac.
$sed -E '/\[block 1\]|\[blk 2\]/r\\n\\toranges = 3\\n\\tpears = 2' sample2.txt
abc
def
[block 1]
apples = 3
grapes = 4
[blk 2]
banana = 2
apples = 3
$sed -E '/\[block 1\]|\[blk 2\]/r\n\toranges = 3\n\tpears = 2' sample2.txt
abc
def
[block 1]
apples = 3
grapes = 4
[blk 2]
banana = 2
apples = 3
Awk attempt:
$awk -v RS= '/\[block 1\]/{$0 = $0 ORS "\toranges = 3" ORS "\tpears = 2" ORS}
/\[blk 2\]/{$0 = $0 ORS "\toranges = 5" ORS "\tpears = 2" ORS} 1' sample2.txt
abc
def
[block 1]
apples = 3
grapes = 4
[blk 2]
banana = 2
apples = 3
oranges = 3
pears = 2
oranges = 5
pears = 2

Note that the text provided to the a command has to be on a separate line:
sed '/\[block 1\]/ {a\
\toranges = 3\n\tpears = 2
}' file
and all embedded newlines have to be escaped. Another way to write it (probably more readable):
sed '/\[block 1\]/ {a\
oranges = 3\
pears = 2
}' file
Also, consider the r command as an alternative to the a command when larger amounts of text have to be inserted (e.g. more than one line). It will read data from a text file provided:
sed '/\[block 1\]/r /path/to/text' file
To handle multiple sections with one sed program, you can use the alternation operator (available in ERE, notice the -E flag):
sed -E '/\[block 1\]|\[blk 2\]/r /path/to/text' file
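Putting the pieces above together, here is a minimal sketch that produces the output requested in the question. It assumes a sed that accepts -E (both BSD and current GNU sed do); the temporary file and the variable names sample and result are only for illustration.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 'abc' 'def' '[block 1]' 'apples = 3' 'grapes = 4' \
    '[blk 2]' 'banana = 2' 'apples = 3' > "$sample"

# a\ must be followed by a newline; every inserted line except the last
# ends with a backslash. -E enables the (block 1|blk 2) alternation.
result=$(sed -E '/^\[(block 1|blk 2)\]$/a\
oranges = 5\
pears = 2
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

For in-place editing on macOS, note that -i takes a backup suffix as a separate argument (sed -i '' ... for no backup). In the command from the question, the script was consumed as that suffix and sample2.txt was parsed as the script, which is where "unterminated substitute pattern" comes from.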

This awk should work with an empty RS, which puts awk in paragraph mode: each blank-line-separated block is read as a single record.
awk -v RS= '/\[block 1\]/{$0 = $0 ORS "\toranges = 3" ORS "\tpears = 2" ORS}
/\[blk 2\]/{$0 = $0 ORS "\toranges = 5" ORS "\tpears = 2" ORS} 1' file
abc
def
[block 1]
apples = 3
grapes = 4
oranges = 3
pears = 2
[blk 2]
banana = 2
apples = 3
oranges = 5
pears = 2
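If the file has no blank lines between blocks, paragraph mode sees the whole file as one record. A line-oriented variant may be closer to the output the question asks for; this is a sketch under that assumption, with sample and result as illustrative names.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 'abc' 'def' '[block 1]' 'apples = 3' 'grapes = 4' \
    '[blk 2]' 'banana = 2' 'apples = 3' > "$sample"

result=$(awk '
    { print }                       # always echo the current line
    /^\[(block 1|blk 2)\]$/ {       # after a header line, add the two new lines
        print "oranges = 5"
        print "pears = 2"
    }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Because the header is printed before the extra lines, the new entries land directly under [block 1] and [blk 2] rather than at the end of each block.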

This might work for you (GNU sed):
sed '/^\[\(block 1\|blk 2\)\]\s*$/{n;h;s/\S.*/oranges = 5/p;s//pears = 2/p;x}' file
Locate the required match, print it and then store the next line in the hold space. Replace the first non-space character to the end of the line with the first required line, repeat for the second required string and then revert to the original line.

Related

Isolate certain parts of a text file with shell script

//unit-translator
#head
<
shell: /bin/bash;
>
#stuffs
<
[~]: ~;
[binary's]: /bin/bash;
[run-as-root]: sudo;
>
#commands
<
make-directory:mkdir;
move-to-directory:cd;
url-download-current-dirrectory:wget;
extract-here-tar:tar;
copy:cp;
remove-directory-+files:rm -R;
enter-root:su;
>
I want to isolate everything after "#commands", between the 2 "<", ">"'s as a string. How do I go about this?
I made the whole file a string:
translator=$(<config.txt)
I want to isolate everything in the commands section and store it in the variable "translatorcommands".
From that point I plan to split each line, and each set of commands something like this:
IFS=';' read -a translatorcommandlines <<< "$translatorcommands"
IFS=':' read -a translatorcommand <<< "$translatorcommandlines"
I'm so clueless, please help me!
If you mean to extract all lines after #commands between < and >, you can use this command (the 0,/regex/ address is a GNU sed extension):
sed '0,/^#command/d' config.txt | sed '/>/q' | grep "^\w"
It deletes everything up to and including the #commands line, prints the remaining lines up to the first >, and keeps only those that start with a word character.
My output for your file is:
make-directory:mkdir;
move-to-directory:cd;
url-download-current-dirrectory:wget;
extract-here-tar:tar;
copy:cp;
remove-directory-+files:rm -R;
enter-root:su;
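For a sed without the 0,/regex/ extension, the same extraction can be sketched in a single awk pass, followed by the name/command split the question mentions. The shortened config, the in_section flag and the pairs variable are assumptions made for this illustration.

```shell
# A shortened stand-in for config.txt from the question.
config=$(mktemp)
cat > "$config" <<'EOF'
#stuffs
<
[run-as-root]: sudo;
>
#commands
<
make-directory:mkdir;
remove-directory-+files:rm -R;
>
EOF

# in_section turns on at "#commands" and off again at the closing ">".
translatorcommands=$(awk '
    /^#commands/        { in_section = 1; next }
    in_section && /^>/  { in_section = 0 }
    in_section && !/^</ { print }
' "$config")

# Split each "name:command;" line at the colon and drop the trailing ";".
pairs=$(printf '%s\n' "$translatorcommands" | while IFS=':' read -r name cmd; do
    printf '%s -> %s\n' "$name" "${cmd%;}"
done)

printf '%s\n' "$pairs"
rm -f "$config"
```

Setting IFS to a colon only keeps the space inside "rm -R" intact.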
The general-purpose text processing tool for UNIX is "awk". You don't show in your question what you want your output to be, so I don't know exactly what you want, but hopefully this is enough for you to figure it out from here:
$ cat tst.awk
BEGIN { RS=">"; FS="\n" }
{ gsub(/^.*<[[:blank:]]*\n|\n[[:blank:]]*$/,"") }
NF {
    for (i=1;i<=NF;i++) {
        print "record", NR, "field", i, "= [" $i "]"
    }
    print "----"
}
$ awk -f tst.awk file
record 1 field 1 = []
record 1 field 2 = [shell: /bin/bash;]
record 1 field 3 = []
----
record 2 field 1 = []
record 2 field 2 = [[~]: ~;]
record 2 field 3 = [[binary's]: /bin/bash;]
record 2 field 4 = [[run-as-root]: sudo;]
record 2 field 5 = []
record 2 field 6 = []
----
record 3 field 1 = []
record 3 field 2 = [make-directory:mkdir;]
record 3 field 3 = [move-to-directory:cd;]
record 3 field 4 = [url-download-current-dirrectory:wget;]
record 3 field 5 = [extract-here-tar:tar;]
record 3 field 6 = [copy:cp;]
record 3 field 7 = [remove-directory-+files:rm -R;]
record 3 field 8 = [enter-root:su;]
record 3 field 9 = []
----

Output the line number when there is a matching value, for each column

Say I've got a file.txt
Position name1 name2 name3
2 A G F
4 G S D
5 L K P
7 G A A
8 O L K
9 E A G
and I need to get the output:
name1 name2 name3
2 2 7
4 7 9
7 9
It outputs each name, and the position numbers where there is an A or G
In file.txt, the name1 column has an A in position 2, G's in positions 4 and 7... therefore in the output file: 2,4,7 is listed under name1
...and so on
Strategy I've devised so far (not very efficient): read each column one at a time and output the position number when a match occurs. Then I'd collect the result for each column and cbind them together using R.
I'm fairly certain there's a better way using awk or bash... ideas appreciated.
$ cat tst.awk
NR==1 {
    for (nameNr=2;nameNr<=NF;nameNr++) {
        printf "%5s%s", $nameNr, (nameNr<NF?OFS:ORS)
    }
    next
}
{
    for (nameNr=2;nameNr<=NF;nameNr++) {
        if ($nameNr ~ /^[AG]$/) {
            hits[nameNr,++numHits[nameNr]] = $1
            maxHits = (numHits[nameNr] > maxHits ? numHits[nameNr] : maxHits)
        }
    }
}
END {
    for (hitNr=1; hitNr<=maxHits; hitNr++) {
        for (nameNr=2;nameNr<=NF;nameNr++) {
            printf "%5s%s", hits[nameNr,hitNr], (nameNr<NF?OFS:ORS)
        }
    }
}
$ awk -f tst.awk file
name1 name2 name3
2 2 7
4 7 9
7 9
Save the script below:
#!/bin/bash
gawk '{if( NR == 1 ) {print $2 >>"name1"; print $3 >>"name2"; print $4>>"name3";}}
{if($2=="A" || $2=="G"){print $1 >> "name1"}}
{if($3=="A" || $3=="G"){print $1 >> "name2"}}
{if($4=="A" || $4=="G"){print $1 >> "name3"}}
END{system("paste name*;rm name*")}' "$1"
as finder. Make finder executable (using chmod) and then run:
./finder file.txt
Note: I have used three temporary files, name1, name2 and name3. You can change the file names at your convenience. These files are deleted at the end.
Edit: Removed the BEGIN part of the gawk script.

How can I read first n and last n lines from a file?

How can I read the first n lines and the last n lines of a file?
For n=2, I read online that (head -n2 && tail -n2) would work, but it doesn't.
$ cat x
1
2
3
4
5
$ cat x | (head -n2 && tail -n2)
1
2
The expected output for n=2 would be:
1
2
4
5
head -n2 file && tail -n2 file
Chances are you're going to want something like:
... | awk -v OFS='\n' '{a[NR]=$0} END{print a[1], a[2], a[NR-1], a[NR]}'
or, if you need to specify a number, and taking into account @Wintermute's astute observation that you don't need to buffer the whole file, something like this is what you really want:
... | awk -v n=2 'NR<=n{print;next} {buf[((NR-1)%n)+1]=$0}
END{for (i=1;i<=n;i++) print buf[((NR+i-1)%n)+1]}'
I think the math is correct on that - hopefully you get the idea to use a rotating buffer indexed by the NR modded by the size of the buffer and adjusted to use indices in the range 1-n instead of 0-(n-1).
To help with comprehension of the modulus operator used in the indexing above, here is an example with intermediate print statements to show the logic as it executes:
$ cat file
1
2
3
4
5
6
7
8
.
$ cat tst.awk
BEGIN {
    print "Populating array by index ((NR-1)%n)+1:"
}
{
    buf[((NR-1)%n)+1] = $0
    printf "NR=%d, n=%d: ((NR-1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
        NR, n, NR-1, (NR-1)%n, ((NR-1)%n)+1, ((NR-1)%n)+1, buf[((NR-1)%n)+1]
}
END {
    print "\nAccessing array by index ((NR+i-1)%n)+1:"
    for (i=1;i<=n;i++) {
        printf "NR=%d, i=%d, n=%d: (((NR+i = %d) - 1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
            NR, i, n, NR+i, NR+i-1, (NR+i-1)%n, ((NR+i-1)%n)+1, ((NR+i-1)%n)+1, buf[((NR+i-1)%n)+1]
    }
}
$
$ awk -v n=3 -f tst.awk file
Populating array by index ((NR-1)%n)+1:
NR=1, n=3: ((NR-1 = 0) %n = 0) +1 = 1 -> buf[1] = 1
NR=2, n=3: ((NR-1 = 1) %n = 1) +1 = 2 -> buf[2] = 2
NR=3, n=3: ((NR-1 = 2) %n = 2) +1 = 3 -> buf[3] = 3
NR=4, n=3: ((NR-1 = 3) %n = 0) +1 = 1 -> buf[1] = 4
NR=5, n=3: ((NR-1 = 4) %n = 1) +1 = 2 -> buf[2] = 5
NR=6, n=3: ((NR-1 = 5) %n = 2) +1 = 3 -> buf[3] = 6
NR=7, n=3: ((NR-1 = 6) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, n=3: ((NR-1 = 7) %n = 1) +1 = 2 -> buf[2] = 8
Accessing array by index ((NR+i-1)%n)+1:
NR=8, i=1, n=3: (((NR+i = 9) - 1 = 8) %n = 2) +1 = 3 -> buf[3] = 6
NR=8, i=2, n=3: (((NR+i = 10) - 1 = 9) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, i=3, n=3: (((NR+i = 11) - 1 = 10) %n = 1) +1 = 2 -> buf[2] = 8
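When the input has fewer than 2n lines, the rotating-buffer one-liner above reads slots that were never filled and prints empty lines. The variant below is a sketch of one way to handle that case, not the answer's own code: the function name first_last is hypothetical, it indexes the buffer by NR % n, and it assumes n is at least 1.

```shell
# first_last N: print the first N and last N lines of stdin,
# without repeating lines when the two ranges overlap.
first_last() {
    awk -v n="$1" '
        NR <= n { print; next }              # head: print immediately
        { buf[NR % n] = $0 }                 # everything else goes into a rotating buffer
        END {
            start = NR - n + 1               # first line number of the tail
            if (start <= n) start = n + 1    # skip lines already printed as head
            for (i = start; i <= NR; i++) print buf[i % n]
        }
    '
}

long=$(seq 1 10 | first_last 2)
short=$(seq 1 3 | first_last 2)
printf '%s\n' "$long"
printf '%s\n' "$short"
```

With ten input lines and n=2 this prints 1, 2, 9, 10; with three input lines it prints 1, 2, 3 once each.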
This might work for you (GNU sed):
sed -n ':a;N;s/[^\n]*/&/2;Ta;2p;$p;D' file
This keeps a window of 2 (replace the 2's for n) lines and then prints the first 2 lines and at end of file prints the window i.e. the last 2 lines.
Here's a GNU sed one-liner that prints the first 10 and last 10 lines:
gsed -ne'1,10{p;b};:a;$p;N;21,$D;ba'
If you want to print a '--' separator between them:
gsed -ne'1,9{p;b};10{x;s/$/--/;x;G;p;b};:a;$p;N;21,$D;ba'
If you're on a Mac and don't have GNU sed, you can't condense as much:
sed -ne'1,9{' -e'p;b' -e'}' -e'10{' -e'x;s/$/--/;x;G;p;b' -e'}' -e':a' -e'$p;N;21,$D;ba'
Explanation
gsed -ne'                     invoke sed without automatically printing the pattern space
-e'1,9{p;b}'                  print the first 9 lines
-e'10{x;s/$/--/;x;G;p;b}'     print line 10 with an appended '--' separator
-e':a;$p;N;21,$D;ba'          print the last 10 lines
If you are using a shell that supports process substitution, another way to accomplish this is to write to multiple processes, one for head and one for tail. Suppose for this example your input comes from a pipe feeding you content of unknown length. You want to use just the first 5 lines and the last 10 lines and pass them on to another pipe:
cat | { tee >(head -5) >(tail -10) 1>/dev/null; } | cat
The use of {} collects the output from inside the group (there will be two different programs writing to stdout inside the process substitutions). The 1>/dev/null gets rid of the extra copy that tee would otherwise write to its own stdout.
That demonstrates the concept and all the moving parts, but it can be simplified a little in practice by using the STDOUT stream of tee instead of discarding it. Note the command grouping is still necessary here to pass the output on through the next pipe!
cat | { tee >(head -5) | tail -10; } | cat
Obviously, replace cat in the pipeline with whatever you are actually doing. If your command can write the same content to multiple files, you could eliminate the use of tee entirely, as well as the monkeying with STDOUT. Say you have a command that accepts multiple -o output file name flags:
{ mycommand -o >(head -5) -o >(tail -10); } | cat
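Where process substitution is not available (plain POSIX sh), a more pedestrian sketch is to spool the piped input to a temporary file and run head and tail on that file. The helper name head_tail is hypothetical, and the approach assumes the input fits on disk.

```shell
# head_tail N: print the first N and last N lines of stdin.
head_tail() {
    n=$1
    tmp=$(mktemp) || return 1
    cat > "$tmp"            # buffer stdin so it can be read twice
    head -n "$n" "$tmp"
    tail -n "$n" "$tmp"
    rm -f "$tmp"
}

result=$(seq 1 5 | head_tail 2)
printf '%s\n' "$result"
```

Running the two commands one after the other also fixes the order of the two parts, which two concurrently running process substitutions do not promise.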
awk -v n=4 'NR<=n; {b = b "\n" $0} NR>=n {sub(/[^\n]*\n/,"",b)} END {print b}'
The first n lines are covered by NR<=n;. For the last n lines, we just keep track of a buffer holding the latest n lines, repeatedly adding one to the end and removing one from the front (after the first n).
It's possible to do it more efficiently, with an array of lines instead of a single buffer, but even with gigabytes of input, you'd probably waste more in brain time writing it out than you'd save in computer time by running it.
ETA: Because the above timing estimate provoked some discussion in (now deleted) comments, I'll add anecdata from having tried that out.
With a huge file (100M lines, 3.9 GiB, n=5) it took 454 seconds, compared to @EdMorton's lined-buffer solution, which executed in only 30 seconds. With more modest inputs ("mere" millions of lines) the ratio is similar: 4.7 seconds vs. 0.53 seconds.
Almost all of that additional time in this solution seems to be spent in the sub() function; a tiny fraction also does come from string concatenation being slower than just replacing an array member.
Use GNU parallel. To print the first three lines and the last three lines:
parallel {} -n 3 file ::: head tail
Based on dcaswell's answer, the following sed script prints the first and last 10 lines of a file:
# Make a test file first
testit=$(mktemp -u)
seq 1 100 > $testit
# This sed script:
sed -n ':a;1,10h;N;${x;p;i\
-----
;x;p};11,$D;ba' $testit
rm $testit
Yields this:
1
2
3
4
5
6
7
8
9
10
-----
90
91
92
93
94
95
96
97
98
99
100
Here is another AWK script. Assuming there might be overlap of head and tail.
File script.awk
BEGIN {range = 3}                # Define the head and tail range
NR <= range {print}              # Output the head; for the first lines in range
{ arr[NR % range] = $0}          # Store the current line in a rotating array
END {                            # Last line reached
    for (row = NR - range + 1; row <= NR; row++) { # Reread the last range lines from array
        print arr[row % range];
    }
}
Running the script
seq 1 7 | awk -f script.awk
Output
1
2
3
5
6
7
For overlapping head and tail:
seq 1 5 |awk -f script.awk
1
2
3
3
4
5
Print the first and last n lines
For n=1:
seq 1 10 | sed '1p;$!d'
Output:
1
10
For n=2:
seq 1 10 | sed '1,2P;$!N;$!D'
Output:
1
2
9
10
For n>=3, use the generic script, replacing (n+1) and (n*2) with their computed values:
':a;$q;N;(n+1),(n*2)P;(n+1),$D;ba'
For n=3:
seq 1 10 | sed ':a;$q;N;4,6P;4,$D;ba'
Output:
1
2
3
8
9
10
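The substitution of n into the generic script can be left to the shell's arithmetic expansion. This is a sketch that assumes GNU sed (the one-line ":a;...;ba" label syntax is not accepted by every sed); the function name first_last_sed is made up for the example.

```shell
# first_last_sed N: fill N into ':a;$q;N;(n+1),(n*2)P;(n+1),$D;ba'.
first_last_sed() {
    n=$1
    # \$ keeps sed's "last line" address away from shell expansion.
    sed ":a;\$q;N;$((n + 1)),$((n * 2))P;$((n + 1)),\$D;ba"
}

result=$(seq 1 10 | first_last_sed 3)
printf '%s\n' "$result"
```

For n=3 the function runs exactly the script shown above for that case.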

Usage of sed in shell

I need to use sed in shell to do the following things in a file:
1) Suppose I have a line:
#listen_tcp = 1
I want to delete first character #
2) Suppose I have a line:
#listen_tcp = 1
I want to change the last character 1 to 0
3) Suppose I have a line:
#listen_tcp
I want to append it, to #listen_tcp = 1
4) Suppose I have a line:
libvirtd_opts="-d"
I want to insert something, to libvirtd_opts="-d -l"
5) Suppose I have a line:
tcp_port = "16059"
I just want to change it, to tcp_port = "16509"
How can I use sed to do all of them in a text file? I only know how to replace a whole line. For example, I open the file and note the line that contains the words, then use sed s/a/b/g to replace a with b. That's the best of my knowledge; I wonder if there is a better way to achieve this. For instance, can I search for the line by keyword and replace only part of it?
For 1)
echo "#listen_tcp = 1" | sed '/listen/s/#//g'
listen_tcp = 1
For 2)
echo "#listen_tcp = 1" | sed '/listen/s/1/0/g'
#listen_tcp = 0
For 3)
echo "#listen_tcp" | sed 's/#listen_tcp/& = 1/g'
#listen_tcp = 1
For 4)
echo 'libvirtd_opts="-d"' | sed 's/libvirtd_opts="-d/& -l/g'
libvirtd_opts="-d -l"
For 5)
echo 'tcp_port = "16059"' | sed '/tcp_port/s/16059/16509/g'
tcp_port = "16509"
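Several of these edits can be applied to one file in a single pass by giving sed one -e expression per change. The sketch below covers uncommenting, inserting the -l option and fixing the port; the three-line stand-in file is an assumption, and the patterns are anchored with ^ so they only touch the intended lines.

```shell
# A small stand-in for the real configuration file.
conf=$(mktemp)
printf '%s\n' '#listen_tcp = 1' 'libvirtd_opts="-d"' 'tcp_port = "16059"' > "$conf"

# One -e per edit; \( \) captures the text to keep and \1 puts it back.
result=$(sed \
    -e 's/^#\(listen_tcp\)/\1/' \
    -e 's/^\(libvirtd_opts="-d\)"/\1 -l"/' \
    -e '/^tcp_port/s/"16059"/"16509"/' \
    "$conf")

printf '%s\n' "$result"
rm -f "$conf"
```

The /^tcp_port/ address in the last expression restricts the substitution to lines that start with that keyword, which is the "search the line by keyword, replace part of it" pattern asked about.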

Bash using first lines counted output to add whitespace to following lines

I have this code here
printf '$request1 = "select * from whatever where this = that and active = 1 order by something asc";\n' \
| perl -pe 's/select/SELECT/gi ; s/from/\n FROM/gi ; s/where/\n WHERE/gi ; s/and/\n AND/gi ; s/order by/\n ORDER BY/gi ; s/asc/ASC/gi ; s/desc/DESC/gi ;' \
| awk '{gsub(/\r/,"");printf "%s\n%d",$0,length($0)}'
It currently produces output like this:
$request1 = "SELECT *
22 FROM whatever
17 WHERE this = that
24 AND active = 1
21 ORDER BY something ASC";
I would like to take the count of the first line (22) and add that amount of whitespace to each additional line.
Assuming that you don't want to print the numbers, change your AWK command to:
awk 'NR == 1 {pad = length($0); print} NR > 1 {gsub(/\r/,""); printf "%*s%s\n", pad, " ", $0}'
Output:
$request1 = "SELECT *
FROM whatever
WHERE this = that
AND active = 1
ORDER BY something ASC";
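An alternative sketch builds the padding string once with sprintf and prepends it to every later line, which avoids relying on the %*s dynamic width. The shortened three-line input and the names pad, result and line2 are assumptions for this example.

```shell
# The first line sets the pad width; later lines are prefixed with that many spaces.
result=$(printf '%s\n' 'q = "SELECT *' ' FROM t' ' WHERE a = 1";' |
    awk 'NR == 1 { pad = sprintf("%" length($0) "s", ""); print; next }
         { print pad $0 }')

printf '%s\n' "$result"

# Second output line, kept separately to inspect the indentation.
line2=$(printf '%s\n' "$result" | sed -n '2p')
```

Here the first line is 13 characters long, so each following line is shifted right by 13 spaces.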
