print multiple fields if multiple patterns match - bash

I have a comma-delimited file like the one below:
0,category=a,type=b,value=1
1,category=c,type=b,.....,original_value=0
2,category=b,type=c,....,original_value=1,....,corrected_value=3
A line in the file can contain:
(1) only 'value'
(2) only 'original_value'
(3) both 'original_value' and 'corrected_value'
The values can be in any column.
The following awk command I wrote prints only one matching field per output line.
cat file | awk -F, 'BEGIN{OFS=","} /value/ { for (x=1;x<=NF;x++) if ($x~"value") {print $2,$3,$(x)} }' | sort -u
Current Output:
category=a,type=b,value=1
category=b,type=c,corrected_value=3
category=b,type=c,original_value=1
category=c,type=b,original_value=0
How do I print two fields (columns) of a line when two pattern matches occur, i.e. when both original_value and corrected_value exist on the same line?
Expected Output:
category=a,type=b,value=1
category=b,type=c,original_value=1,corrected_value=3
category=c,type=b,original_value=0
Bash Version: 4.3.11

You can use this awk command, which prints $2 and $3 unconditionally and then appends every later field that contains value:
awk 'BEGIN{FS=OFS=","} {
    printf "%s%s%s", $2, OFS, $3
    for (i=4; i<=NF; i++)
        if ($i ~ /value/) printf "%s%s", OFS, $i
    print ""
}' file
category=a,type=b,value=1
category=c,type=b,original_value=0
category=b,type=c,original_value=1,corrected_value=3

Similar to @anubhava's answer, but this does not rely on the category or type being in a particular column:
awk -F, '
BEGIN { pattern = "^(category|type|value|original_value|corrected_value)" }
{
    sep = ""
    for (i=1; i<=NF; i++) {
        if ($i ~ pattern) {
            printf "%s%s", sep, $i
            sep = ","
        }
    }
    print ""
}
' file
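Run against the sample input, this should reproduce the expected output (in file order):
category=a,type=b,value=1
category=c,type=b,original_value=0
category=b,type=c,original_value=1,corrected_value=3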

Related

Get substring of regex from string

I am using a bash script to parse logs and need to extract a substring.
My string is something like this:
TestingmkJHSBD,MFV FROM image/something/docker:2.6.1.566-978.7 testing
How can I extract the substring starting with image using bash?
Would you please try the following:
awk '
/FROM image/ {
    if (match($0, /image[^[:space:]]+/))
        print(substr($0, RSTART, RLENGTH))
}
' logfile
The regex image[^[:space:]]+ matches a substring which starts with
image and is followed by non-space characters.
match() then sets the awk variables RSTART and RLENGTH to the position
and the length of the matched substring.
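Since the question asks for bash, here is a minimal pure-bash sketch of the same extraction using [[ ... =~ ... ]] and BASH_REMATCH (the line variable is just for illustration):
# pure bash: a successful =~ match is stored in BASH_REMATCH[0]
line='TestingmkJHSBD,MFV FROM image/something/docker:2.6.1.566-978.7 testing'
if [[ $line =~ image[^[:space:]]+ ]]; then
    echo "${BASH_REMATCH[0]}"    # image/something/docker:2.6.1.566-978.7
fi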
Another awk option:
awk '{if ($0 ~ /FROM image/) {for (i=1; i<=NF; i++) if ($i ~ /^image/) {print $i} }}' <<<"TestingmkJHSBD,MFV FROM image/something/docker:2.6.1.566-978.7 testing"
Output:
image/something/docker:2.6.1.566-978.7
Or using different log string:
awk '{if ($0 ~ /FROM image/) {for (i=1; i<=NF; i++) if ($i ~ /^image/) {print $i} }}' <<<"bdkjf asfjkklsdfsg FROM image/something/docker:2.6.1.566-978.7 testing"
Output:
image/something/docker:2.6.1.566-978.7
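If awk is not a requirement, the same extraction is often done with grep -o, assuming your grep supports that flag (GNU and BSD grep both do):
grep -o 'image[^[:space:]]*' logfile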

Define field with FPAT

I am trying to split data into fields in awk, but I can't come up with the right regex using FPAT.
I have tried:
echo 'C002 2019-06-28;16:03;approved;content=L1-34,EE;not taken;;1024 ' | awk 'BEGIN {FPAT = "([^ ]+) +[^ ]+|;"} {print "f1:"$1;print "f2:"$2;print "f3:"$3;print "f6:"$6;print "f7:"$7}'
Expected result:
f1:C002
f2:2019-06-28
f3:16:03
f6:not taken
f7:
There is no simple way to tell a separating space from a space that is part of a field (like the one in not taken), which is why a single FPAT pattern won't work here.
You need to do as David writes: separate on ; and then split the first field on whitespace.
echo 'C002 2019-06-28;16:03;approved;content=L1-34,EE;not taken;;1024 ' |
awk -F";" '{split($1,a,"[ \t]+"); print "a[1]---"a[1]"\na[2]---"a[2]; for (i=1;i<=NF;i++) print i"---"$i}'
a[1]---C002
a[2]---2019-06-28
1---C002 2019-06-28
2---16:03
3---approved
4---content=L1-34,EE
5---not taken
6---
7---1024
Somewhat similar to Jotne's answer, but you could write a function to split the record according to your wishes:
echo 'C002 2019-06-28;16:03;approved;content=L1-34,EE;not taken;;1024 ' |
awk 'function split_record(string, f,    t, n, m) {
    # split on ";" first, then split the leading chunk on whitespace
    n = split(string, t, ";")
    m = split(t[1], f, "[ \t]+")
    for (i=2; i<=n; ++i) f[m+i-1] = t[i]
    return m+n-1
}
{ split_record($0, f) }
{ print "f1:"f[1]; print "f2:"f[2]; print "f3:"f[3]; print "f6:"f[6]; print "f7:"f[7] }'
This returns:
f1:C002
f2:2019-06-28
f3:16:03
f6:not taken
f7:
You can adapt the split_record function in any way you like.
awk '
BEGIN { FS=OFS=";" }
{
    split($1, a, /[[:space:]]+/)
    $1 = ""                        # drop the original first field
    $0 = a[1] FS a[2] $0           # rebuild the record; assigning to $0 resplits it on FS
    for (i=1; i<=NF; i++) {
        print "f" i ":" $i
    }
}
' file
f1:C002
f2:2019-06-28
f3:16:03
f4:approved
f5:content=L1-34,EE
f6:not taken
f7:
f8:1024

Remove duplicates from CSV using bash / awk

I have a CSV file with the format:
"id-1"|"A"
"id-2"|"C"
"id-1"|"B"
"id-1"|"D"
"id-2"|"B"
"id-3"|"A"
"id-3"|"A"
"id-1"|"B"
I want to group by the unique ids in the first column and concatenate the types into a single row, like this:
"id-1"|"A:B:D"
"id-2"|"B:C"
"id-3"|"A"
I found awk does a great job handling such scenarios, but all I could achieve is this:
"id-1"|"A":"B":"D":"B"
"id-2"|"B":"C"
"id-3"|"A":"A"
I used this command:
awk -F "|" '{if(a[$1])a[$1]=a[$1]":"$2; else a[$1]=$2;}END{for (i in a)print i, a[i];}' OFS="|" file
How can I remove the duplicates and also handle the formatting of the second column types?
quick fix:
$ awk -F "|" '!seen[$0]++{if(a[$1])a[$1]=a[$1]":"$2; else a[$1]=$2;}END{for (i in a)print i, a[i];}' OFS="|" file
"id-1"|"A":"B":"D"
"id-2"|"C":"B"
"id-3"|"A"
!seen[$0]++ is true only if the line has not been seen before.
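As an aside, !seen[$0]++ on its own is the classic awk one-liner for dropping duplicate lines while preserving input order:
awk '!seen[$0]++' file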
If the whole second column should sit inside a single pair of double quotes:
$ awk -v dq='"' 'BEGIN{FS=OFS="|"}
!seen[$0]++{a[$1]=a[$1] ? a[$1]":"$2 : $2}
END{for (i in a){gsub(dq,"",a[i]); print i, dq a[i] dq}}' file
"id-1"|"A:B:D"
"id-2"|"C:B"
"id-3"|"A"
With GNU awk for true multi-dimensional arrays and gensub() and sorted_in:
$ awk -F'|' '
{ a[$1][gensub(/"/,"","g",$2)] }
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (i in a) {
        c = 0
        for (j in a[i]) {
            printf "%s%s", (c++ ? ":" : i "|\""), j
        }
        print "\""
    }
}
' file
"id-1"|"A:B:D"
"id-2"|"B:C"
"id-3"|"A"
The output rows and columns will both be string-sorted (i.e. alphabetically by characters) in ascending order.
Short GNU datamash + tr solution (-s pre-sorts the input, -t'|' sets the delimiter, -g1 groups on the first column, and unique 2 emits each group's unique second-column values comma-separated; tr then turns the commas into colons):
datamash -st'|' -g1 unique 2 <file | tr ',' ':'
The output:
"id-1"|"A":"B":"D"
"id-2"|"B":"C"
"id-3"|"A"
If the double quotes between items should be eliminated, use the following alternative:
datamash -st'|' -g1 unique 2 <file | sed 's/","/:/g'
The output:
"id-1"|"A:B:D"
"id-2"|"B:C"
"id-3"|"A"
For the sample input, the one-liners below will also work, but the output is unsorted.
# using two arrays (recommended)
awk 'BEGIN{FS=OFS="|"}!seen[$1,$2]++{a[$1] = ($1 in a ? a[$1] ":" : "") $2}END{for(i in a)print i,a[i]}' infile
# using a regexp to test whether $2 is already in the list
awk 'BEGIN{FS=OFS="|"}{ a[$1] = $1 in a ? ( a[$1] ~ ("(^|:)"$2"(:|$)") ? a[$1] : a[$1]":"$2 ) : $2}END{for(i in a)print i,a[i]}' infile
Test Results:
$ cat infile
"id-1"|"A"
"id-2"|"C"
"id-1"|"B"
"id-1"|"D"
"id-2"|"B"
"id-3"|"A"
"id-3"|"A"
"id-1"|"B"
$ awk 'BEGIN{FS=OFS="|"}!seen[$1,$2]++{a[$1] = ($1 in a ? a[$1] ":" : "") $2}END{for(i in a)print i,a[i]}' infile
"id-1"|"A":"B":"D"
"id-2"|"C":"B"
"id-3"|"A"
$ awk 'BEGIN{FS=OFS="|"}{ a[$1] = $1 in a ? ( a[$1] ~ ("(^|:)"$2"(:|$)") ? a[$1] : a[$1]":"$2 ) : $2}END{for(i in a)print i,a[i]}' infile
"id-1"|"A":"B":"D"
"id-2"|"C":"B"
"id-3"|"A"
More readable:
Using a regexp:
awk 'BEGIN{
    FS=OFS="|"
}
{
    # append $2 only if it is not already present in the list a[$1]
    a[$1] = $1 in a ? (a[$1] ~ ("(^|:)"$2"(:|$)") ? a[$1] : a[$1]":"$2) : $2
}
END{
    for(i in a)
        print i,a[i]
}
' infile
Using two arrays:
awk 'BEGIN{
    FS=OFS="|"
}
!seen[$1,$2]++{
    # first occurrence of this ($1,$2) pair: append $2 to the list for $1
    a[$1] = ($1 in a ? a[$1] ":" : "") $2
}
END{
    for(i in a)
        print i,a[i]
}' infile
Note: you can also use !seen[$0]++, which uses the entire line as the index; if in your real data you would rather key on particular columns, prefer !seen[$1,$2]++, which indexes on columns 1 and 2.
awk + sort solution:
awk -F'|' '{ gsub(/"/,"",$2); a[$1]=b[$1]++? a[$1]":"$2:$2 }
END{ for(i in a) printf "%s|\"%s\"\n",i,a[i] }' <(sort -u file)
The output:
"id-1"|"A:B:D"
"id-2"|"B:C"
"id-3"|"A"

How to convert space-separated key-value data into CSV format in bash?

I am working on some data files where the data consists of key-value pairs separated by spaces.
The data in the files is inconsistent: the keys and values are not always all present, but the keys will always be Table, count, and size.
The example below has table_name, count, and size information:
cat sample1.txt
Table SCOTT.TABLE1 count 3889 size 300
Table SCOTT.TABLE2 count 7744
Table SCOTT.TABLE3 count 2622
Table SCOTT.TABLE4 size 2773 count 22
Table SCOTT.TABLE5 size 21
The file below has just the table_name and no count or size data:
cat sample2.txt
Table SCOTT.TABLE1
Table SCOTT.TABLE2
Table SCOTT.TABLE3
Table SCOTT.TABLE4
Table SCOTT.TABLE5
So I am trying to convert these files into CSV format using the following:
cat <file_name> | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS="," } NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'
cat sample1.txt | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS="," }
NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }
{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'
Table,Count,Size
SCOTT.TABLE1,3889,300
,,
,,
,,
And for the second sample
cat sample2.txt | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS="," } NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'
Table,Count,Size
SCOTT.TABLE1,,
,,
,,
,,
But I expected the following:
For sample1.txt
TABLE,count,size
SCOTT.TABLE1,3889,300
SCOTT.TABLE2,7744,
SCOTT.TABLE3,2622,
SCOTT.TABLE4,22,2773
SCOTT.TABLE5,,21
For sample2.txt
Table,Count,Size
SCOTT.TABLE1,,
SCOTT.TABLE2,,
SCOTT.TABLE3,,
SCOTT.TABLE4,,
SCOTT.TABLE5,,
Thanks in advance.
awk to the rescue!
$ awk -v OFS=',' '
    { for (i=1; i<NF; i+=2) {
          if (!($i in c)) { c[$i]; cols[++k]=$i }   # record each key in order of first appearance
          v[NR,$i] = $(i+1)                         # value for this row and key
      } }
    END {
        for (i=1; i<=k; i++) printf "%s", cols[i] OFS
        print ""
        for (i=1; i<=NR; i++) {
            for (j=1; j<=k; j++) printf "%s", v[i,cols[j]] OFS
            print ""
        }
    }' file
Table,count,size,
SCOTT.TABLE1,3889,300,
SCOTT.TABLE2,7744,,
SCOTT.TABLE3,2622,,
SCOTT.TABLE4,22,2773,
SCOTT.TABLE5,,21,
If you have gawk you can simplify it further with PROCINFO["sorted_in"].
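A minimal sketch of that idea, assuming GNU awk 4.0 or later; note the header order becomes string-sorted, which here happens to match Table,count,size:
gawk -v OFS=',' '
    { for (i=1; i<NF; i+=2) { keys[$i]; v[NR,$i] = $(i+1) } }
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"   # for-in now traverses indices in sorted order
        for (key in keys) printf "%s", key OFS
        print ""
        for (r=1; r<=NR; r++) {
            for (key in keys) printf "%s", v[r,key] OFS
            print ""
        }
    }' sample1.txt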
UPDATE: For the revised question, the header needs to be known in advance since the keys might be completely missing. This simplifies the problem, and the following script should do the trick.
$ awk -v header='Table,count,size' '
    BEGIN { OFS=","; n=split(header,h,OFS); print header }
    { for (i=1; i<NF; i+=2) v[NR,$i] = $(i+1) }
    END {
        for (i=1; i<=NR; i++) {
            printf "%s", v[i,h[1]]
            for (j=2; j<=n; j++) printf "%s", OFS v[i,h[j]]
            print ""
        }
    }' file
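With sample2.txt as input, this should print the expected all-empty count/size rows (the header capitalization follows the header variable):
Table,count,size
SCOTT.TABLE1,,
SCOTT.TABLE2,,
SCOTT.TABLE3,,
SCOTT.TABLE4,,
SCOTT.TABLE5,,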
Here is an inelegant but fast and comprehensible solution:
awk 'BEGIN{OFS=","; print "TABLE,count,size"}
{
    t = $2
    if ($3 == "count") {    # count comes first, size (if present) second
        c = $4
        s = $6
    } else {                # size comes first, count (if present) second
        s = $4
        c = $6
    }
    print t, c, s
}' 1.txt
output:
TABLE,count,size
SCOTT.TABLE1,3889,300
SCOTT.TABLE2,7744,
SCOTT.TABLE3,2622,
SCOTT.TABLE4,22,2773
SCOTT.TABLE5,,21

How to print a pattern using AWK?

I need to find a word in a file that matches a regex pattern.
So if a line contains:
00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}
awk -F' ' '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER/ && /GW_CHANNEL/){print $5 " " $i} } }'
It prints only:
register GW_USER=BB
I want to get:
register GW_USER=BB GW_CHANNEL=AA
How do I print both the GW_USER and GW_CHANNEL columns?
Your if condition isn't right; you can use regex alternation:
awk '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER|GW_CHANNEL/) print $5, $i } }' file
There is no need to use -F" " or " " in print, as space is the default input field separator and the default OFS.
Your condition:
if($i ~ /GW_USER/ && /GW_CHANNEL/)
will match GW_USER against $i but GW_CHANNEL against the whole line.
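If you do want to require that both keys be present before printing anything at all, one sketch (not the only way) is to test the whole line first and collect the matching fields:
awk '/GW_USER/ && /GW_CHANNEL/ {
    out = $5
    for (i=1; i<=NF; i++)
        if ($i ~ /GW_USER|GW_CHANNEL/) out = out " " $i
    print out
}' file
As with the answer above, any punctuation glued to the fields (the { and trailing commas from the log format) comes along with them.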
Whenever you have name=value pairs in your input, it's a good idea to create an array that maps the names to the values and then print by name:
$ cat tst.awk
match($0,/{[^}]+/) {
    str = substr($0,RSTART+1,RLENGTH-1)    # the text inside the braces
    split(str,arr,/[ ,=]+/)                # alternating name, value entries
    delete n2v
    for (i=1; i in arr; i+=2) {
        n2v[arr[i]] = arr[i+1]
    }
    print $5, fmt("GW_USER"), fmt("GW_CHANNEL")
}
function fmt(name) { return (name "=" n2v[name]) }
$
$ awk -f tst.awk file
register GW_USER=BB GW_CHANNEL=AA
That way you can trivially print, or do anything else you want with, any other field in the future.
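For example, to also print the NUM field from the sample line you would just extend the print call:
print $5, fmt("GW_USER"), fmt("GW_CHANNEL"), fmt("NUM")
which should output: register GW_USER=BB GW_CHANNEL=AA NUM=3996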
