Unix Bash - print field values matching pattern

Say I have this in a file (a FIX message):
35=D|11=ABC|52=123456|33=AA|44=BB|17=CC
35=D|33=ABC|11=123456|44=ZZ|17=EE|66=YY
I want to grep and print only the values after 11= and 17=, with output like this:
ABC|CC
123456|EE
How do I achieve this?

Whenever there are name=value pairs in the input, I find it useful (for clarity, future enhancements, etc.) to create a name2value array and then use that to print the values by name:
$ cat tst.awk
BEGIN { FS="[|=]"; OFS="|" }
{
    delete n2v
    for (i=1; i<=NF; i+=2) {
        n2v[$i] = $(i+1)
    }
    print n2v[11], n2v[17]
}
$ awk -f tst.awk file
ABC|CC
123456|EE
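To try this end-to-end, the same logic can be run against a copy of the sample input. The file name fix.txt below is made up, and note that `delete n2v` on a whole array is a widespread extension (gawk, mawk, BWK awk) rather than strict POSIX:

```shell
# Recreate the sample input from the question (fix.txt is an arbitrary name)
cat > fix.txt <<'EOF'
35=D|11=ABC|52=123456|33=AA|44=BB|17=CC
35=D|33=ABC|11=123456|44=ZZ|17=EE|66=YY
EOF

# Splitting on both | and = makes odd-numbered fields the tags and
# even-numbered fields the values, so n2v[tag] holds each value.
awk 'BEGIN { FS = "[|=]"; OFS = "|" }
{
    delete n2v
    for (i = 1; i <= NF; i += 2) n2v[$i] = $(i + 1)
    print n2v[11], n2v[17]
}' fix.txt
# ABC|CC
# 123456|EE
```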

Through sed,
$ sed 's/.*\b11=\([^|]*\).*\b17=\([^|]*\).*/\1|\2/' file
ABC|CC
123456|EE
Through grep and paste.
$ grep -oP '\b11=\K[^|]*|\b17=\K[^|]*' file | paste -d'|' - -
ABC|CC
123456|EE
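In the grep variant, \K discards everything matched before it, so only the value after each tag is reported, and paste -d'|' - - joins every two consecutive matches with |. This pairing is only correct when each line contains exactly one 11= and one 17=, in that order, and -P requires GNU grep built with PCRE support. A sketch against the same made-up fix.txt:

```shell
cat > fix.txt <<'EOF'
35=D|11=ABC|52=123456|33=AA|44=BB|17=CC
35=D|33=ABC|11=123456|44=ZZ|17=EE|66=YY
EOF

# -o prints one match per line; \K drops the "11="/"17=" prefix from
# the reported match.  paste then glues the matches together pairwise.
grep -oP '\b11=\K[^|]*|\b17=\K[^|]*' fix.txt | paste -d'|' - -
# ABC|CC
# 123456|EE
```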

Here is another awk:
awk -F"11=|17=" '{for (i=2;i<NF;i++) {split($i,a,"|");printf "%s|",a[1]}split($i,a,"|");print a[1]}' file
ABC|CC
123456|EE

awk output to file based on filter

I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that data is ,data, no spaces.
action,action_type,Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected, but I am basically traversing my original dataset twice. My original dataset is about 5 GB and I have 30 action_type categories. I need to do this every day, so I need to script the thing to run on its own efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, not even errors. Which sort of tells me that my code is simply not matching anything, or my print/redirection statement is wrong.
You may try this awk to do it in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' dataset.csv
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
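A quick sanity check of the portable variant, using the sample rows from the question: the header is written once into each per-type file, and close() keeps the number of simultaneously open files bounded.

```shell
cat > dataset.csv <<'EOF'
action,action_type,Result
up,1,stringA
down,1,strinB
left,2,stringC
EOF

rm -f 1_dataset.csv 2_dataset.csv   # start clean; >> appends across runs
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv

cat 2_dataset.csv
# action,action_type,Result
# left,2,stringC
```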
As currently coded, the input field separator has not been defined, and the output file names are not quoted: awk expects a string expression after >>, such as "1_dataset.csv", so an unquoted name like 1_dataset.csv is not treated as a file name.
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
There are a couple of ways to address this:
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> "1_dataset.csv";
else if (action_type=="2") print $0 >> "2_dataset.csv";
}
$ awk -f myfilter.awk dataset.csv

sed - extract data between 2 words from different parts of .txt by combining 3 sed commands

I have multiple .txt files with info like this:
"commercial_name":"THE OUTBACK","contact_name":"JEFF","contact_person":"MANAGER","working_place"
There is a lot of garbage before and after the given text.
I want to get results like this:
THE OUTBACK,JEFF,MANAGER
All on the same line for each .txt file, with a line break before the next .txt file's data.
I am doing it with 3 different sed commands:
sed -n 's:.*"commercial_name"\(.*\)"contact_name".*:\1:p' *.txt
sed -n 's:.*"contact_name"\(.*\)"contact_person".*:\1:p' *.txt
sed -n 's:.*"contact_person"\(.*\)"working_place".*:\1:p' *.txt
Even if I combine these 3, the result is:
:"THE OUTBACK",
(all commercial names, one line for each .txt)
:"JEFF",
(all contact names, one line for each .txt)
:"MANAGER",
(all contact persons, one line for each .txt)
I want to extract all the info on the same line:
THE OUTBACK,JEFF,MANAGER
then the info for the next .txt file on the next line, and so on.
You may use this awk:
awk 'BEGIN {
    FS = OFS = ","
}
{
    gsub(/"/, "")
    for (i=1; i<=NF; ++i) {
        if (split($i, entry, ":") == 2)
            map[entry[1]] = entry[2]
    }
    print map["commercial_name"], map["contact_name"], map["contact_person"]
}' file
THE OUTBACK,JEFF,MANAGER
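Assuming each .txt file holds one such record surrounded by junk (the file name info.txt and the junk values below are made up), the map-based script can be exercised like this. Fields that don't have a name:value shape simply fail the split test and are skipped:

```shell
cat > info.txt <<'EOF'
junk,"commercial_name":"THE OUTBACK","contact_name":"JEFF","contact_person":"MANAGER","working_place":"X",more junk
EOF

awk 'BEGIN { FS = OFS = "," }
{
    gsub(/"/, "")                      # strip all double quotes
    for (i = 1; i <= NF; ++i)          # keep only name:value fields
        if (split($i, entry, ":") == 2)
            map[entry[1]] = entry[2]
    print map["commercial_name"], map["contact_name"], map["contact_person"]
}' info.txt
# THE OUTBACK,JEFF,MANAGER
```

Note that map is never cleared between records, so when several records are processed in one run, a value missing from a later record is silently inherited from an earlier one.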
With awk, we set FS and OFS separately:
awk -v FS=',|:' -v OFS=',' '{print $2,$4,$6}' file
"THE OUTBACK","JEFF","MANAGER"
and use gsub to remove the double quotes:
awk -v FS=',|:' -v OFS=',' '{gsub(/"/, "")} {print $2,$4,$6}' file
THE OUTBACK,JEFF,MANAGER
Why print $2, $4, $6?
Ed Morton gives a detailed explanation here:
converting regex to sed or grep regex
Using Ed's code, you can see the fields with a for loop:
awk -v FS=',|:' -v OFS=',' '{gsub(/"/, "")} {for (i=1; i<=NF; i++) print "Record", NR, "Field", i, ": " $i}' file
Record,1,Field,1,: commercial_name
Record,1,Field,2,: THE OUTBACK
Record,1,Field,3,: contact_name
Record,1,Field,4,: JEFF
Record,1,Field,5,: contact_person
Record,1,Field,6,: MANAGER
Record,1,Field,7,: working_place
In this case, we are interested in fields 2, 4 and 6:
{print $2,$4,$6}

delete all lines after a specific date

I have a lot of *.csv files. I want to delete the content after a specific line: all lines after the 20031231 entries should be removed.
How do I solve this problem with a few lines of a shell script?
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
Test,20040101,000100,0.73342,0.744318
Quick and dirty, without any other info about constraints, just keep the matching lines:
sed -n '/20031231/p' YourFile
If you want to use a shell script, the best tool is awk. This will do the trick:
awk 'BEGIN {FS=","} {if ($2 == "20031231") print $0}' input.csv > output.csv
This writes only the lines that contain 20031231 to a different file.
This ignores empty lines and unmatched data.
The awk file:
$ cat awk.awk
{
if($2<="20031231" && $0!=""){
print $0
}else{
next
}
}
execution:
$ awk -F',' -f awk.awk input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
one liner:
$ awk -F',' '{if($2<="20031231" && $0!=""){print $0}else{next}}' input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
with Miller (http://johnkerl.org/miller/doc/)
mlr --nidx --fs "," filter '$2>20031231' input
gives you
Test,20040101,000100,0.73342,0.744318
With awk please try:
awk -F, '$2<=20031231' input.csv
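The awk variants above rely on awk comparing $2 numerically when the field looks like a number. Against the sample rows (quotes.csv is a made-up file name), the short form keeps exactly the 20031231 rows:

```shell
cat > quotes.csv <<'EOF'
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
Test,20040101,000100,0.73342,0.744318
EOF

# A bare pattern with no action prints matching lines by default;
# only the final 20040101 row is dropped.
awk -F, '$2 <= 20031231' quotes.csv
```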

shell script to get key value

I have content like
key1="value1" key2="value2"
key1="value11" key2="value22"
key1="value111" key2="value222"
I want to output like
value1
value11
value111
i.e. basically the values for key1.
But when I grep, the entire line is shown. I tried using cut but still could not get the expected result. Can someone help me write a script for this, please?
Using awk you can search for given key like this:
awk -v s="key1" -F '[= "]+' '{for (i=1; i<NF; i+=2) if ($i==s) print $(i+1)}' file
value1
value11
value111
awk -v s="key2" -F '[= "]+' '{for (i=1; i<NF; i+=2) if ($i==s) print $(i+1)}' file
value2
value22
value222
Using cut itself:
cut -d \" -f 2 < File
Set " as the delimiter and extract the 2nd field. Hope it helps.
Another similar solution with awk:
awk -F\" '{print $2}' File
Using grep -P:
$ grep -oP '(?<=key1=")[^"]*' file
value1
value11
value111
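The field-splitting trick in the first answer generalizes to any key name: splitting on runs of =, space, and double quote leaves keys in odd-numbered fields and their values in the even-numbered fields that follow. A sketch wrapping it in a small function (pairs.txt and lookup are made-up names):

```shell
cat > pairs.txt <<'EOF'
key1="value1" key2="value2"
key1="value11" key2="value22"
key1="value111" key2="value222"
EOF

# -F '[= "]+' splits on any run of =, space or ", so the fields
# alternate key, value, key, value, ... and any key can be looked up.
lookup() {
    awk -v s="$1" -F '[= "]+' '
        { for (i = 1; i < NF; i += 2) if ($i == s) print $(i + 1) }
    ' pairs.txt
}

lookup key2
# value2
# value22
# value222
```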

awk OFS not producing expected value

I have a file
[root@nmk ~]# cat file
abc>
sssd>
were>
I run both of these variations of the awk command:
[root@nmk ~]# cat file | awk -F\> ' { print $1}' OFS=','
abc
sssd
were
[root@nmk ~]# cat file | awk -F\> ' BEGIN { OFS=","} { print $1}'
abc
sssd
were
[root@nmk ~]#
But my expected output is
abc,sssd,were
What's missing in my commands?
You're just a bit confused about the meaning/use of FS, OFS, RS and ORS. Take another look at the man page. I think this is what you were trying to do:
$ awk -F'>' -v ORS=',' '{print $1}' file
abc,sssd,were,$
but this is probably closer to the output you really want:
$ awk -F'>' '{rec = rec (NR>1?",":"") $1} END{print rec}' file
abc,sssd,were
or if you don't want to buffer the whole output as a string:
$ awk -F'>' '{printf "%s%s", (NR>1?",":""), $1} END{print ""}' file
abc,sssd,were
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1}' file
To print a newline at the end:
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1} END{print "\n"}' file
output:
abc,sssd,were
Each line of input in awk is a record, so what you want to set is the Output Record Separator, ORS. The OFS variable holds the Output Field Separator, which is used to separate different parts of each line.
Since you are setting the input field separator, FS, to >, and OFS to ,, an easy way to see how these work is to add something on each line of your file after the >:
awk 'BEGIN { FS=">"; OFS=","} {$1=$1} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc,def
sssd,dsss
were,wolf
So you want to set the ORS. The default record separator is newline, so whatever you set ORS to effectively replaces the newlines in the input. But that means that if the last line of input has a newline - which is usually the case - that last line will also get a copy of your new ORS:
awk 'BEGIN { FS=">"; ORS=","} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc>def,sssd>dsss,were>wolf,
The output also won't end with a newline, because the final input newline was interpreted as an input record separator and turned into the output record separator - it became the final comma.
So you have to be a little more explicit about what you're trying to do:
awk 'BEGIN { FS=">" } # split input on >
(NR>1) { printf "," } # if not the first line, print a ,
{ printf "%s", $1 } # print the first field (everything up to the first >)
END { printf "\n" } # add a newline at the end
' <<<$'abc>\nsssd>\nwere>'
Which outputs this:
abc,sssd,were
Through sed,
$ sed ':a;N;$!ba;s/>\n/,/g;s/>$//' file
abc,sssd,were
Through Perl,
$ perl -00pe 's/>\n(?=.)/,/g;s/>$//' file
abc,sssd,were
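For comparison, the same join can be done without any awk or sed state by stripping the trailing > from each line and letting paste -s fold all lines into one record (both GNU and BSD paste support -s and -d):

```shell
printf 'abc>\nsssd>\nwere>\n' > file

# Delete the trailing > on every line, then join all lines with commas;
# paste -s emits a single line ending in a proper newline.
sed 's/>$//' file | paste -sd, -
# abc,sssd,were
```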
