Splitting content of file and make it in order - bash

I have a file like so:
{A{AAA} B{BBB} test {CCC CCC
}}
{E{EEE} F{FFF} test {GGG GGG
}}
{H{HHH} I{III} test {JJJ -JJJ
}}
{K{KKK} L{LLL} test {MMM
}}
I want to use Linux commands to produce the following output:
AAA:BBB:CCC CCC
EEE:FFF:GGG GGG
HHH:III:JJJ -JJJ
KKK:LLL:MMM

Using gnu-awk you can do this:
awk -v RS='}}' -v FPAT='{[^{}]+(}|\n)' -v OFS=':' '{for (i=1; i<=NF; i++) {
gsub(/[{}]|\n/, "", $i); printf "%s%s", $i, (i<NF)?OFS:ORS}}' file
AAA:BBB:CCC CCC
EEE:FFF:GGG GGG
HHH:III:JJJ -JJJ
KKK:LLL:MMM
-v RS='}}' splits the input into records at each }}.
-v FPAT='{[^{}]+(}|\n)' defines what a field looks like instead of what separates fields: a {, then one or more characters other than { and }, then either } or a newline.
-v OFS=':' sets the output field separator to :.
gsub(/[{}]|\n/, "", $i) removes every {, } and newline from each field.
Shorter command (thanks to JoseRicardo):
awk -v RS='}}' -v FPAT='{[^{}]+(}|\n)' -v OFS=':' '{$1=$1} gsub(/[{}]|\n/, "")' file
or even this:
awk -v FPAT='{[^{}]{2,}' -v OFS=':' '{$1=$1} gsub(/[{}]/, "")' file
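The commands above rely on FPAT, which only gawk provides. As a rough sketch for other awks, assuming each record's values sit on a single line as in the sample, the same rule (a { followed by two or more non-brace characters) can be applied with match() in a loop. The function name extract_fields is made up for illustration.

```shell
# Hypothetical helper: print the brace-delimited values of each line, joined by ":".
extract_fields() {
  awk '{
    line = $0; out = ""
    # Find "{" followed by at least two characters that are not braces.
    while (match(line, /[{][^{}][^{}]+/)) {
      field = substr(line, RSTART + 1, RLENGTH - 1)   # drop the leading "{"
      out = (out == "" ? field : out ":" field)
      line = substr(line, RSTART + RLENGTH)           # continue after this match
    }
    if (out != "") print out                          # lines such as "}}" yield nothing
  }' "$@"
}

# Two records copied from the question, fed on standard input.
printf '%s\n' '{A{AAA} B{BBB} test {CCC CCC' '}}' '{K{KKK} L{LLL} test {MMM' '}}' |
  extract_fields
```

Called as extract_fields file, it reads the file instead of standard input.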

Perl solution
perl -nwe 'print join ":", /{([^{}]{2,})/g' file
The regular expression captures each run of two or more non-brace characters that follows an opening curly brace; the captured groups are then printed, joined with colons.

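For this specific format, sed is enough: drop the first three characters, delete each } together with everything up to the next {, then turn each remaining { into a colon and print only the lines where that last substitution succeeded: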
sed -n 's/...//;s/}[^{]*//g;s/{/:/gp' YourFile

Related

sed - extract data between 2 words from different parts of .txt by combining 3 sed commands

I have multiple .txt files with content like this:
"commercial_name":"THE OUTBACK","contact_name":"JEFF","contact_person":"MANAGER","working_place"
There is a lot of garbage before and after that fragment.
I want to get results like this:
THE OUTBACK,JEFF,MANAGER
All on one line for each .txt file, with a new line for the next .txt file.
I am currently using three separate sed commands:
sed -n 's:.*"commercial_name"\(.*\)"contact_name".*:\1:p' *.txt
sed -n 's:.*"contact_name"\(.*\)"contact_person".*:\1:p' *.txt
sed -n 's:.*"contact_person"\(.*\)"working_place".*:\1:p' *.txt
even if I combine these 3, the result is:
:"THE OUTBACK",
-all commercial names 1 line for each .txt
:"JEFF",
-all contact names 1 line for each .txt
:"MANAGER",
-all contact person 1 line for each .txt
I want to extract all the info in the same line:
THE OUTBACK,JEFF,MANAGER
then the info for the next .txt in the next line
and so on.
You may use this awk:
awk 'BEGIN {
FS=OFS=","
}
{
gsub(/"/, "")
for(i=1; i<=NF; ++i) {
if (split($i, entry, ":") == 2)
map[entry[1]] = entry[2]
}
print map["commercial_name"], map["contact_name"], map["contact_person"]
}' file
THE OUTBACK,JEFF,MANAGER
With awk we set FS and OFS separately:
awk -v FS=',|:' -v OFS=',' '{print $2,$4,$6}' file
"THE OUTBACK","JEFF","MANAGER"
and gsub for removing double quotes:
awk -v FS=',|:' -v OFS=',' '{gsub(/"/, "")} {print $2,$4,$6}' file
THE OUTBACK,JEFF,MANAGER
Why print $2, $4 and $6?
Ed Morton gives a detailed explanation here:
converting regex to sed or grep regex
Using Ed's code, you can list the fields with a for loop:
awk -v FS=',|:' -v OFS=',' '{gsub(/"/, "")} {for (i=1; i<=NF;i++) print "Record", NR, "Field", i, ": " $i;}{print RT}' file
Record,1,Field,1,: commercial_name
Record,1,Field,2,: THE OUTBACK
Record,1,Field,3,: contact_name
Record,1,Field,4,: JEFF
Record,1,Field,5,: contact_person
Record,1,Field,6,: MANAGER
Record,1,Field,7,: working_place
In this case, we are interested in fields 2, 4 and 6:
{print $2,$4,$6}
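Since the question started from sed, the three captures can also be combined into one substitution. This is only a sketch: it assumes the three keys appear next to each other in exactly this order and that the values contain no double quotes. The function name extract_contact is made up for illustration.

```shell
# Hypothetical helper: pull the three values out of each matching line.
extract_contact() {
  sed -n 's/.*"commercial_name":"\([^"]*\)","contact_name":"\([^"]*\)","contact_person":"\([^"]*\)".*/\1,\2,\3/p' "$@"
}

# One sample line with junk before and after the part of interest.
printf '%s\n' 'junk,"commercial_name":"THE OUTBACK","contact_name":"JEFF","contact_person":"MANAGER","working_place":"x",junk' |
  extract_contact
```

Running extract_contact *.txt prints one line per matching input line, so one line per file when each file holds its data on a single line.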

Need to use awk to get a specific word or value after another specific word?

I need to use awk to get a specific word or value that follows another specific word. I have tried some awk commands already, but only in combination with other filters such as grep and sed. The file I need to read contains the same kind of line more than once, like this one:
Configuration: number=6 model=MSA SNT=4 IC=8 SIZE=16384MB NRF=24 meas=2.00
To get 24 I used
grep IC file | awk 'NF>1{print $NF}'
To get 16384MB I used
grep IC file | awk -F'SIZE=' '{ print $2 }'|awk '{ print $1 }'
Can any value on that line be extracted using awk alone? What I used gets what is needed, but I would like a more compact awk command.
I am sure a single awk command can extract the needed value from one line.
sed -r 's/.*SIZE=([^ ]+).*/\1/' input
16384MB
sed -r 's/.*NRF=([^ ]+).*/\1/' input
24
grep way:
grep -oP 'SIZE=\K[^ ]+' input
16384MB
awk way:
awk '{for(i=1;i<=NF;i++) if($i ~ /^SIZE=/) {split($i,a,"="); print a[2]}}' input
You could use awk with a field separator that matches any of several characters (:, = or space), as below. Loop through the fields, match the name you need, and print the next field, which holds its value.
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
Examples:
match="number"
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
6
match="model"
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
MSA
match="meas"
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
2.00
Here is a more general approach:
... | awk -v k=NRF '{for(i=2;i<=NF;i++) {split($i,a,"="); m[a[1]]=a[2]} print m[k]}'
The code stays the same; just change the key k.
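The key lookup can be wrapped in a small shell function so the key becomes an argument. The sketch below compares the name before the = sign exactly instead of using a regex match, so a short key such as IC cannot match a longer name by accident. The function name get_value is hypothetical, not something from the answers above.

```shell
# Hypothetical helper: print the value of key=value for the given key.
get_value() {
  key=$1; shift
  awk -v key="$key" '{
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")                        # position of "=" in this field, 0 if none
      if (eq && substr($i, 1, eq - 1) == key) {  # exact comparison of the name
        print substr($i, eq + 1)                 # everything after the "="
        next                                     # first hit on the line is enough
      }
    }
  }' "$@"
}

line='Configuration: number=6 model=MSA SNT=4 IC=8 SIZE=16384MB NRF=24 meas=2.00'
printf '%s\n' "$line" | get_value SIZE
printf '%s\n' "$line" | get_value NRF
```

With a file argument, get_value meas file prints one value per matching line.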
If you have GNU awk you could use the third parameter of match:
$ awk 'match($0,/( IC=)([^ ]*)/,a)&& $0=a[2]' file
8
Or get the meas:
$ awk 'match($0,/( meas=)([^ ]*)/,a)&& $0=a[2]' file
2.00
Should you use some other awk, you could use this combination of split, substr and match:
$ awk 'split(substr($0,match($0,/ IC=[^ ]*/),RLENGTH),a,"=") && $0=a[2]' file
8

Unix Bash - print field values matching pattern

Say I have this in a file (FIX messages):
35=D|11=ABC|52=123456|33=AA|44=BB|17=CC
35=D|33=ABC|11=123456|44=ZZ|17=EE|66=YY
I want to grep and print only the values after 11= and 17=, output like this.
ABC|CC
123456|EE
How do I achieve this?
Whenever there are name=value pairs in the input, I find it useful for clarity, future enhancements, etc. to create a name2value array and then use that to print the values by name:
$ cat tst.awk
BEGIN { FS="[|=]"; OFS="|" }
{
delete n2v
for (i=1; i<=NF; i+=2) {
n2v[$i] = $(i+1)
}
print n2v[11], n2v[17]
}
$ awk -f tst.awk file
ABC|CC
123456|EE
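One possible future enhancement of the name2value idea is to choose the tags at run time. The following is a sketch under that assumption; the function name pick_tags and its comma-separated first argument are invented for this example.

```shell
# Hypothetical helper: pick_tags 11,17 file prints the values of tags 11 and 17.
pick_tags() {
  tags=$1; shift
  awk -v tags="$tags" '
    BEGIN { FS = "[|=]"; n = split(tags, want, ",") }
    {
      split("", n2v)                          # portable way to empty the array
      for (i = 1; i < NF; i += 2)             # odd fields are names, even fields are values
        n2v[$i] = $(i + 1)
      out = ""
      for (j = 1; j <= n; j++)                # emit the values in the requested order
        out = out (j > 1 ? "|" : "") n2v[want[j]]
      print out
    }' "$@"
}

printf '%s\n' '35=D|11=ABC|52=123456|33=AA|44=BB|17=CC' \
              '35=D|33=ABC|11=123456|44=ZZ|17=EE|66=YY' | pick_tags 11,17
```

A tag that is absent from a line simply yields an empty value in that position.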
Through sed,
$ sed 's/.*\b11=\([^|]*\).*\b17=\([^|]*\).*/\1|\2/g' file
ABC|CC
123456|EE
Through grep and paste.
$ grep -oP '\b11=\K[^|]*|\b17=\K[^|]*' file | paste -d'|' - -
ABC|CC
123456|EE
Here is another awk
awk -F"11=|17=" '{for (i=2;i<NF;i++) {split($i,a,"|");printf "%s|",a[1]}split($i,a,"|");print a[1]}' file
ABC|CC
123456|EE

awk OFS not producing expected value

I have a file
[root#nmk~]# cat file
abc>
sssd>
were>
I run both these variations of the awk commands
[root#nmk~]# cat file | awk -F\> ' { print $1}' OFS=','
abc
sssd
were
[root#nmk~]# cat file | awk -F\> ' BEGIN { OFS=","} { print $1}'
abc
sssd
were
[root#nmk~]#
But my expected output is
abc,sssd,were
What's missing in my commands ?
You're just a bit confused about the meaning/use of FS, OFS, RS and ORS. Take another look at the man page. I think this is what you were trying to do:
$ awk -F'>' -v ORS=',' '{print $1}' file
abc,sssd,were,$
but this is probably closer to the output you really want:
$ awk -F'>' '{rec = rec (NR>1?",":"") $1} END{print rec}' file
abc,sssd,were
or if you don't want to buffer the whole output as a string:
$ awk -F'>' '{printf "%s%s", (NR>1?",":""), $1} END{print ""}' file
abc,sssd,were
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1}' file
to print newline at the end:
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1} END{print "\n"}' file
output:
abc,sssd,were
Each line of input in awk is a record, so what you want to set is the Output Record Separator, ORS. The OFS variable holds the Output Field Separator, which is used to separate different parts of each line.
Since you are setting the input field separator, FS, to >, and OFS to ,, an easy way to see how these work is to add something on each line of your file after the >:
awk 'BEGIN { FS=">"; OFS=","} {$1=$1} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc,def
sssd,dsss
were,wolf
So you want to set the ORS. The default record separator is newline, so whatever you set ORS to effectively replaces the newlines in the input. But that means that if the last line of input ends with a newline, which is usually the case, that last line will also get a copy of your new ORS:
awk 'BEGIN { FS=">"; ORS=","} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc>def,sssd>dsss,were>wolf,
It also won't get a newline at all, because that newline was interpreted as an input record separator and turned into the output record separator - it became the final comma.
So you have to be a little more explicit about what you're trying to do:
awk 'BEGIN { FS=">" } # split input on >
(NR>1) { printf "," } # if not the first line, print a ,
{ printf "%s", $1 } # print the first field (everything up to the first >)
END { printf "\n" } # add a newline at the end
' <<<$'abc>\nsssd>\nwere>'
Which outputs this:
abc,sssd,were
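The same join can be sketched without tracking NR by hand, swapping awk for the standard cut and paste utilities: cut keeps everything before the first >, and paste -s merges all lines into one, separated by commas. The function name join_first_fields is made up for illustration.

```shell
# Hypothetical helper: first ">"-delimited field of every line, joined by commas.
join_first_fields() {
  cut -d '>' -f 1 "$@" | paste -s -d , -
}

printf 'abc>\nsssd>\nwere>\n' | join_first_fields
```

Because paste terminates its output with a newline and adds no trailing delimiter, no END block or special case for the last line is needed.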
Through sed,
$ sed ':a;N;$!ba;s/>\n/,/g;s/>$//' file
abc,sssd,were
Through Perl,
$ perl -00pe 's/>\n(?=.)/,/g;s/>$//' file
abc,sssd,were

How to include variable to output filename using awk

I have a command that writes a range of values from a CSV file to another file:
date1var="mm/dd/yyyy hh:mm:ss"
date2var="mm/dd/yyyy hh:mm:ss"
awk -F, -v d1var="$date1var" -v d2var="$date2var" '$1 > d1var && $1 <= d2var {print $0 }' OFS=, plot_data.csv > graph1.csv
I'm wondering whether it's possible to include my variables in the output filename.
Final name of the file should be similar to:
graph_d1var-d2var.csv
Any ideas?
You can redirect the output of print command to a file name, like:
awk -F, -v d1var="$date1var" -v d2var="$date2var" '
$1 > d1var && $1 <= d2var {
print > ("graph_" d1var "-" d2var ".csv")
}' OFS=, plot_data.csv
This uses the values of d1var and d2var to build the output file name. If you want the literal names d1var and d2var in the file name instead, put the whole name inside one pair of double quotes.
Let the shell handle it; you're starting with shell variables after all:
date1var="mm/dd/yyyy hh:mm:ss"
date2var="mm/dd/yyyy hh:mm:ss"
awk -F, -v OFS=, -v d1var="$date1var" \
-v d2var="$date2var" \
'
# awk script is unchanged
' plot_data.csv > "graph1_${date1var}-${date2var}.csv"
#!/bin/bash
date1var="1234"
date2var="5678"
awk -F, -v d1="$date1var" -v d2="$date2var" '{print > ("graph" d1 "-" d2 ".txt")}' OFS=, plot_data.csv
Note that you can't compare date strings in awk the way you are trying to: awk compares them character by character as strings, so mm/dd/yyyy values do not sort chronologically once the dates span more than one year (for example, 12/31/2019 compares greater than 01/01/2020). Also watch the variable names: the original command used date1_var with an underscore in one place and date1var without it elsewhere.
I guess the short answer is that you can print to a named file with print > "filename", and that you can concatenate (join) strings by placing them next to each other, like this: string2 = string1 "and" string3
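One practical catch with date values in a file name: they contain / and spaces, and / is the path separator, so awk would try to open a file inside directories that do not exist. A minimal sketch, assuming it is acceptable to replace every run of non-alphanumeric characters with an underscore, is shown below. The sample rows, the scratch directory and the short names d1, d2, dir, s1, s2 and out are all invented for the demo. The range test is still the plain string comparison from the question; it only orders these sample rows correctly because they all fall in the same year.

```shell
workdir=$(mktemp -d)            # scratch directory for the demo
printf '%s\n' \
  '01/01/2020 00:00:00,1' \
  '01/02/2020 12:00:00,2' \
  '01/03/2020 00:00:00,3' > "$workdir/plot_data.csv"

date1var='01/01/2020 00:00:00'
date2var='01/02/2020 23:59:59'

awk -F, -v d1="$date1var" -v d2="$date2var" -v dir="$workdir" '
  BEGIN {
    s1 = d1; s2 = d2
    gsub(/[^0-9A-Za-z]+/, "_", s1)          # make the dates safe for a file name
    gsub(/[^0-9A-Za-z]+/, "_", s2)
    out = dir "/graph_" s1 "-" s2 ".csv"    # build the name once, by concatenation
  }
  $1 > d1 && $1 <= d2 { print > out }       # string comparison, as in the question
' "$workdir/plot_data.csv"

ls "$workdir"
```

Building the name once in BEGIN keeps the main rule short and guarantees every matching row goes to the same file.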

Resources