I have never used awk or sed. I am trying to replace
aaa
{
with
aaa
{
bbb
I tried different solutions using sed/awk, but couldn't figure it out.
awk '{gsub("aaa\n{", "aaa\n{\tbbb")}1' file.txt
Could you please help me on how to do it.
With awk:
awk '{print} /^aaa$/{i=NR} /^{$/ && NR==i+1 {print "\tbbb"}' File
Output:
aaa
{
bbb
sdjdhsjdhdsd
ds
ddsdsdsd
aaa
{
bbb
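A variation on the same idea, in case it helps: instead of remembering the line number, remember the previous line. This is only a sketch using plain POSIX awk; the function name insert_after_pair is made up for illustration, and it assumes "aaa" and "{" each sit alone on their line, as in the question.

```shell
# Hypothetical helper: prints its input, adding a tab-indented "bbb" line
# after every "{" line that directly follows an "aaa" line.
insert_after_pair() {
  awk '
    { print }                                     # echo every input line
    prev == "aaa" && $0 == "{" { print "\tbbb" }  # pair seen: add the new line
    { prev = $0 }                                 # remember this line for the next one
  ' "$@"
}
```

It can be called as insert_after_pair file.txt, or used at the end of a pipe.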
This might work for you (GNU sed):
sed '/^aaa$/!b;n;/^{$/a\\tbbb' file
awk with getline
awk '/^aaa$/ {print; if ((getline) <= 0) exit; if (/^{$/) {print; print "\tbbb"; next}} 1' file
I need to print the lines after the last match to the end of the file. The number of matches is not fixed. I have some text as shown below.
MARKER
aaa
bbb
ccc
MARKER
ddd
eee
fff
MARKER
ggg
hhh
iii
MARKER
jjj
kkk
lll
Output desired is
jjj
kkk
lll
Do I use awk with RS and FS to get the desired output?
You can actually do it with awk (gawk) without using any pipe.
$ awk -v RS='(^|\n)MARKER\n' 'END{printf "%s", $0}' file
jjj
kkk
lll
Explanations:
You define the record separator as (^|\n)MARKER\n via RS='(^|\n)MARKER\n'; by default it is the newline character.
'END{printf "%s", $0}' => in the END block, $0 still holds the last record read. Because records are split on (^|\n)MARKER\n, that last record contains every line after the final MARKER up to EOF.
Another option is to use grep (GNU):
$ grep -zoP '(?<=MARKER\n)(?:(?!MARKER)[^\0])+\Z' file
jjj
kkk
lll
Explanations:
-z to use the ASCII NUL character as delimiter
-o to print only the matching
-P to activate the perl mode
PCRE regex: (?<=MARKER\n)(?:(?!MARKER)[^\0])+\Z explained here https://regex101.com/r/RpQBUV/2/
Last but not least, the following sed approach can also be used:
sed -n '/^MARKER$/{n;h;b};H;${x;p}' file
jjj
kkk
lll
Explanations:
n jump to next line
h replace the hold space with the current line
H do the same but instead of replacing, append
${x;p} at the end of the file exchange (x) hold space and pattern space and print (p)
that can be turned into:
tac file | sed -n '/^MARKER$/q;p' | tac
if we use tac.
Could you please try following.
tac file | awk '/MARKER/{print val;exit} {val=(val?val ORS:"")$0}' | tac
The benefit of this approach is that awk reads only the last block of the input file (which becomes the first block once tac has reversed it) and exits right after that.
Explanation:
tac file | ##Print the input file in reverse line order.
awk '
/MARKER/{ ##The first MARKER in the reversed input is the last MARKER of the original file.
print val ##Print the lines collected so far.
exit ##Stop reading; the rest of the input is not needed.
}
{
val=(val?val ORS:"")$0 ##Append the current line to val, separating lines with a newline.
}
' | ##Send the output of awk to tac again, since the lines are still in reverse order.
tac ##Restore the original line order.
You can try Perl as well
$ perl -0777 -ne ' /.*MARKER\n(.*)/s and print $1 ' input.txt
jjj
kkk
lll
$
This might work for you (GNU sed):
sed -nz 's/.*MARKER.//p' file
This relies on greedy matching to delete everything up to and including the last occurrence of MARKER.
Simplest to remember:
tac fun.log | sed "/MARKER/Q" | tac
This awk solution would work with any version of awk on any OS:
awk '/^MARKER$/ {s=""; next} {s = s $0 RS} END {printf "%s", s}' file
jjj
kkk
lll
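If the marker text varies, the same portable approach can be wrapped in a small shell function. This is just a sketch: the name after_last_marker is made up, and it assumes the marker occupies a whole line. Note that awk -v interprets backslash escapes in the value, so a marker containing backslashes would need extra care.

```shell
# Hypothetical helper: after_last_marker MARKER [file...]
after_last_marker() {
  marker=$1
  shift
  awk -v m="$marker" '
    $0 == m { buf = ""; next }   # a marker line starts a fresh buffer
    { buf = buf $0 "\n" }        # collect every other line
    END { printf "%s", buf }     # what is left follows the last marker
  ' "$@"
}
```

For the sample above, after_last_marker MARKER file should print the same three lines.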
Using sed, AWK (or Perl), how do you print all lines between (the first instance of) two patterns, exclusive of the patterns? [1]
That is, given as input:
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
Or possibly even:
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj
I would expect, in both cases:
bbb
ccc
ddd
[1] A number of users voted to close this question as a duplicate of another one. In the end, I provided a gist that proves they are different. The question is also superficially similar to a number of others, but none is an exact match and none is of high quality. Because I believe this specific problem is the one most commonly faced, it deserves a clear formulation and a set of correct, clear answers.
If you have GNU sed (tested using version 4.7 on Mac OS X), the simplest solution could be:
sed '0,/PATTERN1/d;/PATTERN2/Q'
Explanation:
The d command deletes from line 1 to the line matching /PATTERN1/ inclusive.
The Q command then exits without printing on the first line matching /PATTERN2/.
If the file has only one instance of the pattern, or if you don't mind extracting all of them, and you want a solution that doesn't depend on a GNU extension, this works:
sed -n '/PATTERN1/,/PATTERN2/{//!p}'
Explanation:
Note that the empty regular expression // repeats the last regular expression match.
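For comparison, the "extract all of them" behaviour of the range command above can also be sketched in POSIX awk with a flag. The function name between_all and the variable name inside are made up for this illustration; the patterns are hard-coded as in the question.

```shell
# Hypothetical helper: print the lines of every PATTERN1..PATTERN2 block,
# leaving out the delimiter lines themselves.
between_all() {
  awk '
    /PATTERN2/ { inside = 0 }   # closing pattern: stop printing (this line is skipped)
    inside                      # print the line while inside a block
    /PATTERN1/ { inside = 1 }   # opening pattern: start printing from the next line
  ' "$@"
}
```

The order of the three rules is what keeps both delimiter lines out of the output.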
With awk (assumes that PATTERN1 and PATTERN2 are always present in pairs and that neither of them occurs inside a pair)
$ cat ip.txt
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj
$ awk '/PATTERN2/{exit} f; /PATTERN1/{f=1}' ip.txt
bbb
ccc
ddd
/PATTERN1/{f=1} set flag if /PATTERN1/ is matched
/PATTERN2/{exit} exit if /PATTERN2/ is matched
f; print input line if flag is set
Generic solution, where the block required can be specified
$ awk -v b=1 '/PATTERN2/ && c==b{exit} c==b; /PATTERN1/{c++}' ip.txt
bbb
ccc
ddd
$ awk -v b=2 '/PATTERN2/ && c==b{exit} c==b; /PATTERN1/{c++}' ip.txt
ggg
hhh
iii
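The generic command can be wrapped so that the block number and both patterns are parameters. This is only a sketch under the same assumptions as above (the patterns come in pairs and are not nested); the function name nth_block and its argument order are invented here, and the patterns are treated as awk regular expressions.

```shell
# Hypothetical helper: nth_block N START_REGEX END_REGEX [file...]
nth_block() {
  n=$1
  start=$2
  end=$3
  shift 3
  awk -v n="$n" -v s="$start" -v e="$end" '
    $0 ~ e && c == n { exit }   # end of the wanted block: stop reading
    c == n                      # print lines while inside the wanted block
    $0 ~ s { c++ }              # count each start pattern seen so far
  ' "$@"
}
```

For example, nth_block 2 PATTERN1 PATTERN2 ip.txt corresponds to the b=2 command above.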
This might work for you (GNU sed):
sed -n '/PATTERN1/{:a;n;/PATTERN2/q;p;$!ba}' file
This prints only the lines between the first set of delimiters, or if the second delimiter does not exist, to the end of the file.
I attempted to answer twice, but the question kept switching between on-hold and duplicate status.
Borrowing the input from @Sundeep and adding the answer which I shared in the question comments.
Using awk
awk -v x=0 -v y=1 ' /PATTERN1/&&y { x=1;next } /PATTERN2/&&y { x=0;y=0; next } x ' file
with Perl
perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if $x++ <1 } '
Results:
$ cat ip.txt
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
PATTERN1
2
46
PATTERN2
xyz
$
$ awk -v x=0 -v y=1 ' /PATTERN1/&&y { x=1;next } /PATTERN2/&&y { x=0;y=0; next } x ' ip.txt
bbb
ccc
ddd
$ perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if $x++ <1 } ' ip.txt
bbb
ccc
ddd
$
To make it generic
With awk, where y is the number of the block to print:
awk -v x=0 -v y=2 ' /PATTERN1/ { x++;next } /PATTERN2/ { if(x==y) exit } x==y ' ip.txt
2
46
With Perl, compare ++$x against the wanted occurrence; here it is 2:
perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if ++$x==2 } ' ip.txt
2
46
Adding a few more possible solutions, just for fun; I am not claiming they are better than the usual ones. All were written and tested in GNU awk, and only against the given examples.
1st Solution:
awk -v RS="" -v FS="PATTERN2" -v ORS="" '$1 ~ /\nPATTERN1\n/{sub(/.*PATTERN1\n/,"",$1);print $1}' Input_file
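2nd solution (note that [^(PATTERN2)] is a bracket expression excluding those individual characters, not the string PATTERN2, so this only works while the block contains none of them):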
awk -v RS="" -v ORS="" 'match($0,/PATTERN1[^(PATTERN2)]*/){val=substr($0,RSTART,RLENGTH);gsub(/^PATTERN1\n|^$\n/,"",val);print val}' Input_file
3rd solution:
awk -v RS="" -v OFS="\n" -v ORS="" 'sub(/PATTERN2.*/,"") && sub(/.*PATTERN1/,"PATTERN1"){$1=$1;sub(/^PATTERN1\n/,"")} 1' Input_file
All of the above commands produce the following output.
bbb
ccc
ddd
Using GNU sed:
sed -nE '/PATTERN1/{:s n;/PATTERN2/q;p;bs}'
-n suppresses automatic printing, so only the lines explicitly printed by the p command appear in the output.
A sed address applies only to the single command that follows it, so the {} grouping is required to run several commands on a match.
The n command (next) drops PATTERN1 by reading the next line. If that line matches the first PATTERN2, sed quits immediately; otherwise it prints the line and branches back to label s to continue with the following line.
I have a file like so:
{A{AAA} B{BBB} test {CCC CCC
}}
{E{EEE} F{FFF} test {GGG GGG
}}
{H{HHH} I{III} test {JJJ -JJJ
}}
{K{KKK} L{LLL} test {MMM
}}
I want to use Linux commands in order to have the following output:
AAA:BBB:CCC CCC
EEE:FFF:GGG GGG
HHH:III:JJJ -JJJ
KKK:LLL:MMM
Using gnu-awk you can do this:
awk -v RS='}}' -v FPAT='{[^{}]+(}|\n)' -v OFS=':' '{for (i=1; i<=NF; i++) {
gsub(/[{}]|\n/, "", $i); printf "%s%s", $i, (i<NF)?OFS:ORS}}' file
AAA:BBB:CCC CCC
EEE:FFF:GGG GGG
HHH:III:JJJ -JJJ
KKK:LLL:MMM
-v RS='}}' will break each record using }} text
-v FPAT='{[^{}]+(}|\n)' will split field using given regex. Regex matches each field that starts with { and matches anything but { and } followed by } or a newline.
-v OFS=':' sets output field separator as :
gsub(/[{}]|\n/, "", $i) removes { or } or newline from each field
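FPAT is specific to gawk. If only a POSIX awk is available, a similar extraction can be sketched with a match() loop that works line by line. This is an illustration under the assumption that every wanted value is at least two characters long and directly follows a "{" (the same assumption the shorter variants below rely on); the function name brace_fields is made up.

```shell
# Hypothetical helper: join the brace-delimited values of each line with ":".
brace_fields() {
  awk '
  {
    out = ""
    rest = $0
    # Find "{" followed by two or more characters that are not braces.
    while (match(rest, /[{][^{}][^{}]+/)) {
      field = substr(rest, RSTART + 1, RLENGTH - 1)   # drop the leading "{"
      out = (out == "" ? field : out ":" field)
      rest = substr(rest, RSTART + RLENGTH)           # continue after this match
    }
    if (out != "") print out                          # lines such as "}}" yield nothing
  }' "$@"
}
```

Lines that contain no such value, like the closing "}}" lines, produce no output.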
Shorter command (thanks to JoseRicardo):
awk -v RS='}}' -v FPAT='{[^{}]+(}|\n)' -v OFS=':' '{$1=$1} gsub(/[{}]|\n/, "")' file
or even this:
awk -v FPAT='{[^{}]{2,}' -v OFS=':' '{$1=$1} gsub(/[{}]/, "")' file
Perl solution
perl -nwe 'print join ":", /{([^{}]{2,})/g' file
The regular expression extracts groups of 2 or more non-brace characters following a curly brace; the groups are then printed separated by colons.
For this specific format:
sed -n 's/...//;s/}[^{]*//g;s/{/:/gp' YourFile
Say I have this in a file (FIX messages):
35=D|11=ABC|52=123456|33=AA|44=BB|17=CC
35=D|33=ABC|11=123456|44=ZZ|17=EE|66=YY
I want to grep and print only the values after 11= and 17=, output like this.
ABC|CC
123456|EE
How do I achieve this?
Whenever there's name=value pairs in the input I find it useful for clarity, future enhancements, etc. to create a name2value array and then use that to print the values by name:
$ cat tst.awk
BEGIN { FS="[|=]"; OFS="|" }
{
delete n2v
for (i=1; i<=NF; i+=2) {
n2v[$i] = $(i+1)
}
print n2v[11], n2v[17]
}
$ awk -f tst.awk file
ABC|CC
123456|EE
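One enhancement that the name2value array makes easy is passing the wanted tags on the command line. The following is only a sketch of that idea: the function name fix_values and the comma-separated tag list are assumptions made for this example, not part of the script above. It uses split("", n2v) to clear the array, which should work in any awk.

```shell
# Hypothetical helper: fix_values TAG1,TAG2,... [file...]
fix_values() {
  tags=$1
  shift
  awk -v tags="$tags" '
    BEGIN { FS = "[|=]"; n = split(tags, want, ",") }
    {
      split("", n2v)                        # clear the array for this record
      for (i = 1; i < NF; i += 2)
        n2v[$i] = $(i + 1)                  # map each tag name to its value
      line = ""
      for (j = 1; j <= n; j++)
        line = line (j > 1 ? "|" : "") n2v[want[j]]
      print line                            # values in the requested order
    }' "$@"
}
```

With the sample file, fix_values 11,17 file should give the same two lines as above; a tag missing from a record comes out as an empty value.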
Through sed,
$ sed 's/.*\b11=\([^|]*\).*\b17=\([^\|]*\).*/\1|\2/g' file
ABC|CC
123456|EE
Through grep and paste.
$ grep -oP '\b11=\K[^|]*|\b17=\K[^|]*' file | paste -d'|' - -
ABC|CC
123456|EE
Here is another awk
awk -F"11=|17=" '{for (i=2;i<NF;i++) {split($i,a,"|");printf "%s|",a[1]}split($i,a,"|");print a[1]}' file
ABC|CC
123456|EE
How can I extract a subset from a column/field using awk?
Here is the input file test.txt:
aaa bbb ccc=0.7707;ddd=0.21
I would like to be able to extract figure "0.21" from the 3rd column, and output it with the 1st and 2nd columns:
aaa bbb 0.21
I have tried and used the code below but failed:
awk 'BEGIN { OFS = "\t" } { $4 = /^ddd=(+\d)/ ; print $1,$2,$4 }' test.txt
Please help!
Many thanks,
TP
You can specify multiple delimiters using the -F flag or setting FS in the BEGIN block. For example:
echo "aaa bbb ccc=0.7707;ddd=0.21" | awk -F "[ =]" '{ print $1, $2, $NF }'
Results:
aaa bbb 0.21
You could use gsub:
awk 'BEGIN { OFS = "\t" } { gsub(/.*=/, "", $3); print $1,$2,$3 }' test.txt
For your input, it'd give:
aaa bbb 0.21
Another awk
awk '{split($3,a,"=");print $1,$2,a[3]}' test.txt
aaa bbb 0.21
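The answers above all depend on ddd being the last key in the third column. If the position may vary, the value can be looked up by name instead. A minimal sketch, assuming the third column is a ";"-separated list of key=value pairs; the function name get_key is made up for illustration.

```shell
# Hypothetical helper: get_key KEY [file...]
# Prints columns 1 and 2 followed by the value of KEY from column 3.
get_key() {
  key=$1
  shift
  awk -v key="$key" '{
    n = split($3, pairs, ";")               # break column 3 into key=value pairs
    val = ""
    for (i = 1; i <= n; i++) {
      eq = index(pairs[i], "=")             # position of the "=" sign, 0 if absent
      if (eq && substr(pairs[i], 1, eq - 1) == key)
        val = substr(pairs[i], eq + 1)      # keep the text after the "="
    }
    print $1, $2, val
  }' "$@"
}
```

For the sample input, get_key ddd test.txt should print aaa bbb 0.21, and get_key ccc test.txt would print the other value.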