awk replace string with newline - shell

I have never used awk or sed. I am trying to replace
aaa
{
with
aaa
{
bbb
I tried different solutions using sed/awk, but couldn't figure it out.
awk '{gsub("aaa\n{", "aaa\n{\tbbb")}1' file.txt
Could you please help me do this?

With awk:
awk '{print} /^aaa$/{i=NR} /^{$/ && NR==i+1 {print "\tbbb"}' File
Output:
aaa
{
bbb
sdjdhsjdhdsd
ds
ddsdsdsd
aaa
{
bbb

This might work for you (GNU sed):
sed '/^aaa$/!b;n;/^{$/a\\tbbb' file

awk with getline
awk '/aaa/ {print;getline} /{/ { print ;print "\tbbb";next}1 ' file
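The answers above rely on regex matches against "{", which some awk implementations treat differently. As a minimal sketch (not taken from any of the answers), the same idea can be written with plain string comparisons and a variable holding the previous line. The temporary sample file and the variable name prev are assumptions made for illustration.

```shell
# Build a small sample file; the content is made up for this sketch.
tmp=$(mktemp)
printf 'aaa\n{\nxyz\n{\naaa\n{\n' > "$tmp"

# Print every line. If the current line is exactly "{" and the previous
# line was exactly "aaa", print an extra tab-indented "bbb" line.
# prev is updated last so it always holds the line before the current one.
result=$(awk '{ print }
prev == "aaa" && $0 == "{" { print "\tbbb" }
{ prev = $0 }' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Only the two "{" lines that directly follow an "aaa" line get a "bbb" line; the "{" after "xyz" is left alone.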


How to get lines from the last match to the end of file?

Need to print lines after the last match to the end of file. The number of matches could be anything and not definite. I have some text as shown below.
MARKER
aaa
bbb
ccc
MARKER
ddd
eee
fff
MARKER
ggg
hhh
iii
MARKER
jjj
kkk
lll
Output desired is
jjj
kkk
lll
Do I use awk with RS and FS to get the desired output?
You can actually do it with awk (gawk) without using any pipe.
$ awk -v RS='(^|\n)MARKER\n' 'END{printf "%s", $0}' file
jjj
kkk
lll
Explanations:
RS='(^|\n)MARKER\n' sets the record separator to the regex (^|\n)MARKER\n; by default it is the newline character.
'END{printf "%s", $0}' => at the end of the input, print the last record. Because records are split on MARKER lines rather than on newlines, $0 holds every line from the last MARKER to EOF.
Another option is to use grep (GNU):
$ grep -zoP '(?<=MARKER\n)(?:(?!MARKER)[^\0])+\Z' file
jjj
kkk
lll
Explanations:
-z to use the ASCII NUL character as delimiter
-o to print only the matching part
-P to enable Perl-compatible regular expressions (PCRE)
PCRE regex: (?<=MARKER\n)(?:(?!MARKER)[^\0])+\Z explained here https://regex101.com/r/RpQBUV/2/
Last but not least, the following sed approach can also be used:
sed -n '/^MARKER$/{n;h;b};H;${x;p}' file
jjj
kkk
lll
Explanations:
n reads the next line into the pattern space
h replaces the hold space with the current line
H does the same but appends instead of replacing
${x;p} at the end of the file exchange (x) hold space and pattern space and print (p)
With tac, this can be shortened to:
tac file | sed -n '/^MARKER$/q;p' | tac
Could you please try the following.
tac file | awk '/MARKER/{print val;exit} {val=(val?val ORS:"")$0}' | tac
The benefit of this approach is that awk reads only the last block of the input file (which is the first block awk sees after tac reverses it) and then exits.
Explanation:
tac file |                    ## Print the input file in reverse line order.
awk '
/MARKER/{                     ## The last MARKER of the original file is the first one seen here.
  print val                   ## Print the lines collected so far.
  exit                        ## Stop reading; the rest of the input is not needed.
}
{
  val=(val?val ORS:"")$0      ## Append the current line to val, separated by a newline.
}
' |                           ## Pipe the awk output to tac again,
tac                           ## which restores the original line order.
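Where tac is not available, a two-pass awk is one possible alternative. This is only a sketch under the assumption that the input is a regular file that can be read twice (it will not work on a pipe); the temporary file and the variable name last are made up for illustration.

```shell
# Sample input with several MARKER lines; content is illustrative only.
tmp=$(mktemp)
printf 'MARKER\naaa\nbbb\nMARKER\njjj\nkkk\nlll\n' > "$tmp"

# The same file is passed twice.
# Pass 1 (NR == FNR): remember the line number of the last MARKER line.
# Pass 2: print only the lines whose number is greater than that.
result=$(awk 'NR == FNR { if ($0 == "MARKER") last = FNR; next }
FNR > last' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Unlike the tac version, this reads the whole file twice, so it trades the early exit for portability.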
You can try Perl as well
$ perl -0777 -ne ' /.*MARKER(.*)/s and print $1 ' input.txt
jjj
kkk
lll
$
This might work for you (GNU sed):
sed -nz 's/.*MARKER.//p' file
This uses greed to delete all lines up to and including the last occurrence of MARKER.
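The greedy behaviour is easier to see on a single line, without the GNU-only -z flag. A minimal sketch with a made-up input string:

```shell
# .* is greedy, so it extends to the LAST occurrence of MARKER on the line.
# Everything up to and including that occurrence is deleted.
result=$(printf 'xMARKERyMARKERz\n' | sed 's/.*MARKER//')
printf '%s\n' "$result"
```

With -z, GNU sed treats the whole file as one such "line" (newlines become ordinary characters), which is why the same substitution removes everything up to the last MARKER in the file.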
Simplest to remember:
tac fun.log | sed "/MARKER/Q" | tac
This awk solution would work with any version of awk on any OS:
awk '/^MARKER$/ {s=""; next} {s = s $0 RS} END {printf "%s", s}' file
jjj
kkk
lll

Print all lines between two patterns, exclusive, first instance only (in sed, AWK or Perl) [duplicate]

This question already has answers here:
How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?
Using sed, AWK (or Perl), how do you print all lines between (the first instance of) two patterns, exclusive of the patterns? [1]
That is, given as input:
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
Or possibly even:
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj
I would expect, in both cases:
bbb
ccc
ddd
[1] A number of users voted to close this question as a duplicate of this one. In the end, I provided a gist that proves they are different. The question is also superficially similar to a number of others, but there is no exact match and none of them are of high quality. Since I believe this specific problem is the one most commonly faced, it deserves a clear formulation and a set of correct, clear answers.
If you have GNU sed (tested using version 4.7 on Mac OS X), the simplest solution could be:
sed '0,/PATTERN1/d;/PATTERN2/Q'
Explanation:
The d command deletes from line 1 to the line matching /PATTERN1/ inclusive.
The Q command then exits without printing on the first line matching /PATTERN2/.
If the file has only one instance of the pattern, or if you don't mind extracting all of them, and you want a solution that doesn't depend on a GNU extension, this works:
sed -n '/PATTERN1/,/PATTERN2/{//!p}'
Explanation:
Note that the empty regular expression // repeats the last regular expression match.
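To make the "all of them" caveat concrete, here is a small sketch that runs the same command on an input with two blocks. The temporary file and its content are assumptions for illustration; the ; before } is added only because some non-GNU seds require it.

```shell
# Sample input containing two PATTERN1..PATTERN2 blocks.
tmp=$(mktemp)
printf 'aaa\nPATTERN1\nbbb\nccc\nPATTERN2\neee\nPATTERN1\nggg\nhhh\nPATTERN2\njjj\n' > "$tmp"

# Inside the range, // reuses whichever regex was applied most recently
# (the start or the end delimiter), so //!p prints every line of the
# range except the two delimiter lines themselves.
result=$(sed -n '/PATTERN1/,/PATTERN2/{//!p;}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Both blocks are printed (bbb, ccc, ggg, hhh), which is why this form only answers the question when the pattern pair occurs once.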
With awk (assumes that PATTERN1 and PATTERN2 are always present in pairs and that neither occurs inside a pair)
$ cat ip.txt
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj
$ awk '/PATTERN2/{exit} f; /PATTERN1/{f=1}' ip.txt
bbb
ccc
ddd
/PATTERN1/{f=1} set flag if /PATTERN1/ is matched
/PATTERN2/{exit} exit if /PATTERN2/ is matched
f; print input line if flag is set
Generic solution, where the block required can be specified
$ awk -v b=1 '/PATTERN2/ && c==b{exit} c==b; /PATTERN1/{c++}' ip.txt
bbb
ccc
ddd
$ awk -v b=2 '/PATTERN2/ && c==b{exit} c==b; /PATTERN1/{c++}' ip.txt
ggg
hhh
iii
This might work for you (GNU sed):
sed -n '/PATTERN1/{:a;n;/PATTERN2/q;p;$!ba}' file
This prints only the lines between the first set of delimiters, or if the second delimiter does not exist, to the end of the file.
I attempted to answer twice, but the questions switched hold/duplicate statuses.
Borrowing input from @Sundeep and adding the answer which I shared in the question comments.
Using awk
awk -v x=0 -v y=1 ' /PATTERN1/&&y { x=1;next } /PATTERN2/&&y { x=0;y=0; next } x ' file
with Perl
perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if $x++ <1 } '
Results:
$ cat ip.txt
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
PATTERN1
2
46
PATTERN2
xyz
$
$ awk -v x=0 -v y=1 ' /PATTERN1/&&y { x=1;next } /PATTERN2/&&y { x=0;y=0; next } x ' ip.txt
bbb
ccc
ddd
$ perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if $x++ <1 } ' ip.txt
bbb
ccc
ddd
$
To make it generic
With awk, y selects which block to print:
awk -v x=0 -v y=2 ' /PATTERN1/ { x++;next } /PATTERN2/ { if(x==y) exit } x==y ' ip.txt
2
46
With Perl, compare ++$x against the wanted occurrence; here it is 2:
perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if ++$x==2 } ' ip.txt
2
46
Adding a few more possible solutions, just for fun; I am not claiming they are better than the usual ones. All were written and tested in GNU awk, and only against the given examples.
1st Solution:
awk -v RS="" -v FS="PATTERN2" -v ORS="" '$1 ~ /\nPATTERN1\n/{sub(/.*PATTERN1\n/,"",$1);print $1}' Input_file
2nd solution:
awk -v RS="" -v ORS="" 'match($0,/PATTERN1[^(PATTERN2)]*/){val=substr($0,RSTART,RLENGTH);gsub(/^PATTERN1\n|^$\n/,"",val);print val}' Input_file
3rd solution:
awk -v RS="" -v OFS="\n" -v ORS="" 'sub(/PATTERN2.*/,"") && sub(/.*PATTERN1/,"PATTERN1"){$1=$1;sub(/^PATTERN1\n/,"")} 1' Input_file
In all above codes output will be as follows.
bbb
ccc
ddd
Using GNU sed:
sed -nE '/PATTERN1/{:s n;/PATTERN2/q;p;bs}'
-n suppresses automatic printing, so only the lines explicitly printed by the p command are output.
An address applies only to the single command that follows it, so the commands have to be grouped with {}.
The n command replaces PATTERN1 with the next line. If that line matches the first PATTERN2, sed quits immediately; otherwise the line is printed and the loop continues with the following line.

Splitting content of file and make it in order

I have a file like so:
{A{AAA} B{BBB} test {CCC CCC
}}
{E{EEE} F{FFF} test {GGG GGG
}}
{H{HHH} I{III} test {JJJ -JJJ
}}
{K{KKK} L{LLL} test {MMM
}}
I want to use Linux commands in order to have the following output:
AAA:BBB:CCC CCC
EEE:FFF:GGG GGG
HHH:III:JJJ -JJJ
KKK:LLL:MMM
Using gnu-awk you can do this:
awk -v RS='}}' -v FPAT='{[^{}]+(}|\n)' -v OFS=':' '{for (i=1; i<=NF; i++) {
gsub(/[{}]|\n/, "", $i); printf "%s%s", $i, (i<NF)?OFS:ORS}}' file
AAA:BBB:CCC CCC
EEE:FFF:GGG GGG
HHH:III:JJJ -JJJ
KKK:LLL:MMM
-v RS='}}' will break each record using }} text
-v FPAT='{[^{}]+(}|\n)' will split field using given regex. Regex matches each field that starts with { and matches anything but { and } followed by } or a newline.
-v OFS=':' sets output field separator as :
gsub(/[{}]|\n/, "", $i) removes { or } or newline from each field
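FPAT and a multi-character RS are gawk features. For other awks, a rough equivalent can be sketched with a match() loop; this is an illustration under the assumption that every wanted field starts with "{" followed by at least two non-brace characters, as in the sample. The temporary file and the variable names line, field and out are made up.

```shell
# Two records from the sample input.
tmp=$(mktemp)
printf '{A{AAA} B{BBB} test {CCC CCC\n}}\n{K{KKK} L{LLL} test {MMM\n}}\n' > "$tmp"

# For each line, repeatedly find "{" followed by two or more non-brace
# characters, strip the leading "{", and join the pieces with ":".
# Lines without such a match (the "}}" lines) print nothing.
result=$(awk '{
    out = ""
    line = $0
    while (match(line, /[{][^{}][^{}]+/)) {
        field = substr(line, RSTART + 1, RLENGTH - 1)
        out = (out == "" ? field : out ":" field)
        line = substr(line, RSTART + RLENGTH)
    }
    if (out != "") print out
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

The bracket form [{] is used instead of \{ to avoid differences in how awk implementations treat braces in regular expressions.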
Shorter command (thanks to JoseRicardo):
awk -v RS='}}' -v FPAT='{[^{}]+(}|\n)' -v OFS=':' '{$1=$1} gsub(/[{}]|\n/, "")' file
or even this:
awk -v FPAT='{[^{}]{2,}' -v OFS=':' '{$1=$1} gsub(/[{}]/, "")' file
Perl solution
perl -nwe 'print join ":", /{([^{}]{2,})/g' file
The regular expression extracts groups of 2 or more non-brace characters following an opening curly brace; they are then printed separated by colons.
For this specific format:
sed -n 's/...//;s/}[^{]*//g;s/{/:/gp' YourFile

Unix Bash - print field values matching pattern

Say I have this in file, (FIX Message)
35=D|11=ABC|52=123456|33=AA|44=BB|17=CC
35=D|33=ABC|11=123456|44=ZZ|17=EE|66=YY
I want to grep and print only the values after 11= and 17=, output like this.
ABC|CC
123456|EE
How do I achieve this?
Whenever there's name=value pairs in the input I find it useful for clarity, future enhancements, etc. to create a name2value array and then use that to print the values by name:
$ cat tst.awk
BEGIN { FS="[|=]"; OFS="|" }
{
    delete n2v
    for (i=1; i<=NF; i+=2) {
        n2v[$i] = $(i+1)
    }
    print n2v[11], n2v[17]
}
$ awk -f tst.awk file
ABC|CC
123456|EE
Through sed,
$ sed 's/.*\b11=\([^|]*\).*\b17=\([^\|]*\).*/\1|\2/g' file
ABC|CC
123456|EE
Through grep and paste.
$ grep -oP '\b11=\K[^|]*|\b17=\K[^|]*' file | paste -d'|' - -
ABC|CC
123456|EE
Here is another awk
awk -F"11=|17=" '{for (i=2;i<NF;i++) {split($i,a,"|");printf "%s|",a[1]}split($i,a,"|");print a[1]}' file
ABC|CC
123456|EE

How can I extract a subset from a column/field using awk?

I wondered how can I extract a subset from a column/field using awk?
Here is the input file test.txt:
aaa bbb ccc=0.7707;ddd=0.21
I would like to be able to extract figure "0.21" from the 3rd column, and output it with the 1st and 2nd columns:
aaa bbb 0.21
I have tried and used the code below but failed:
awk 'BEGIN { OFS = "\t" } { $4 = /^ddd=(+\d)/ ; print $1,$2,$4 }' test.txt
Please help!
Many thanks,
TP
You can specify multiple delimiters using the -F flag or setting FS in the BEGIN block. For example:
echo "aaa bbb ccc=0.7707;ddd=0.21" | awk -F "[ =]" '{ print $1, $2, $NF }'
Results:
aaa bbb 0.21
You could use gsub:
awk 'BEGIN { OFS = "\t" } { gsub(/.*=/, "", $3); print $1,$2,$3 }' test.txt
For your input, it'd give:
aaa bbb 0.21
Another awk
awk '{split($3,a,"=");print $1,$2,a[3]}' test.txt
aaa bbb 0.21
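All of the answers above depend on ddd being the last key in the third column. If the position can vary, one option is to look the value up by key name. This is a minimal sketch, assuming the third column is always a ;-separated list of key=value pairs; the array names pairs and kv are made up for illustration.

```shell
# Split column 3 into key=value pairs, then split each pair on "="
# and print the value whose key is exactly "ddd".
result=$(printf 'aaa bbb ccc=0.7707;ddd=0.21\n' | awk '{
    n = split($3, pairs, ";")
    for (i = 1; i <= n; i++) {
        split(pairs[i], kv, "=")
        if (kv[1] == "ddd") print $1, $2, kv[2]
    }
}')

printf '%s\n' "$result"
```

Changing the string "ddd" to another key (for example "ccc") selects that value instead, regardless of where it sits in the column.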
