How to print a pattern using AWK? - bash

I need to find the words in a file that match a regex pattern.
So if a line contains:
00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}
awk -F' ' '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER/ && /GW_CHANNEL/){print $5 " " $i} } }'
This prints only:
register GW_USER=BB
I want to get:
register GW_USER=BB GW_CHANNEL=AA
How can I print both the GW_USER and GW_CHANNEL columns?

Your if condition isn't right; you can use regex alternation instead:
awk '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER|GW_CHANNEL/) print $5, $i } }' file
There is no need for -F" " or for " " in print: a space is already the default field separator, and a comma in print inserts the default output field separator.
Your condition:
if($i ~ /GW_USER/ && /GW_CHANNEL/)
This matches GW_USER against $i, but matches GW_CHANNEL against the whole line ($0).
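To get both matches on a single line, as the question asks, the matching fields can be accumulated and printed once per record. This is only a minimal sketch, assuming the sample log line from the question; the variable names out and field and the gsub() cleanup of braces and commas are additions for illustration, not part of the answer above. The fields come out in input order, so GW_CHANNEL appears before GW_USER:

```shell
line='00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}'

result=$(printf '%s\n' "$line" | awk '{
    out = $5                              # start with the action name
    for (i = 1; i <= NF; i++) {
        if ($i ~ /GW_USER|GW_CHANNEL/) {
            field = $i
            gsub(/[{},]/, "", field)      # drop braces and the trailing comma
            out = out " " field
        }
    }
    print out                             # one line per input record
}')
printf '%s\n' "$result"
```

On the sample line this prints register GW_CHANNEL=AA GW_USER=BB. SYSTEM_USER=OS is not picked up because it does not contain the string GW_USER.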

Whenever you have name=value pairs in your input, it's a good idea to create an array that maps the names to the values and then print by name:
$ cat tst.awk
match($0,/{[^}]+/) {
    str = substr($0,RSTART+1,RLENGTH-1)
    split(str,arr,/[ ,=]+/)
    delete n2v
    for (i=1; i in arr; i+=2) {
        n2v[arr[i]] = arr[i+1]
    }
    print $5, fmt("GW_USER"), fmt("GW_CHANNEL")
}
function fmt(name) { return (name "=" n2v[name]) }
$
$ awk -f tst.awk file
register GW_USER=BB GW_CHANNEL=AA
That way you can trivially print, or do anything else you want with, any other field in the future.

Related

awk output to file based on filter

I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that data is ,data, no spaces.
action,action_type, Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected, but I am basically traversing my original dataset twice. My original dataset is about 5GB and I have 30 action_type categories. I need to do this every day, so I need to script it to run on its own efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, not even errors. That suggests either my code is simply not matching anything or my print / redirection statement is wrong.
You may try this awk to do this in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
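For a quick end-to-end check, the single-pass split can be tried on the sample rows in a scratch directory. This is a sketch under assumptions: the workdir variable and the mktemp directory are made up for the test, the third sample row is spelled stringB here, and the awk body is a slightly more defensive variant of the "any awk" version above that closes the file right after writing the header:

```shell
workdir=$(mktemp -d)            # scratch directory, assumed disposable
cd "$workdir" || exit 1

printf '%s\n' 'action,action_type,Result' \
    'up,1,stringA' 'down,1,stringB' 'left,2,stringC' > dataset.csv

awk -F',' '
    NR == 1 { hdr = $0; next }            # remember the header, do not route it
    {
        out = $2 "_dataset.csv"
        if (!($2 in seen)) {              # first row for this action_type
            seen[$2]
            print hdr > out               # create/truncate and write the header
            close(out)
        }
        print >> out                      # append the data row
        close(out)                        # keep at most one output file open
    }
' dataset.csv

cat 1_dataset.csv
```

Opening and closing the output file for every row keeps the number of open files at one, at the cost of extra system calls; on a 5GB input with only 30 categories, leaving the files open may well be faster.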
As currently coded the input field separator has not been defined.
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
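There are a couple of ways to address this. Note that in either case the output file names also need to be quoted (for example "1_dataset.csv"), otherwise awk does not treat them as string literals: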
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> "1_dataset.csv";       # file names must be quoted strings
else if (action_type=="2") print $0 >> "2_dataset.csv";
}
$ awk -f myfilter.awk dataset.csv
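The effect of the missing field separator is easy to see on a single row. A small sketch, assuming one of the sample rows from the question; the variable names are illustrative only:

```shell
row='up,1,stringA'

# Default FS splits on whitespace, so the whole row is $1 and $2 is empty.
without_fs=$(printf '%s\n' "$row" | awk '{ print "[" $2 "]" }')

# With FS set to a comma, $2 is the action_type column.
with_fs=$(printf '%s\n' "$row" | awk -F',' '{ print "[" $2 "]" }')

printf 'default FS: %s\n' "$without_fs"
printf 'comma FS:   %s\n' "$with_fs"
```

The first line shows [] and the second shows [1], which is why the comparison against "1" never succeeded in the original script.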

Define field with FPAT

I am trying to split data into fields in awk, but I can't come up with the right regex using FPAT.
I have tried:
echo 'C002 2019-06-28;16:03;approved;content=L1-34,EE;not taken;;1024 ' | awk 'BEGIN {FPAT = "([^ ]+) +[^ ]+|;"} {print "f1:"$1;print "f2:"$2;print "f3:"$3;print "f6:"$6;print "f7:"$7}'
Expected result:
f1:C002
f2:2019-06-28
f3:16:03
f6:not taken
f7:
There is no simple way to tell a space that separates fields from a space inside a field.
You need to do as David writes: separate on ; and then split the first field on whitespace.
awk -F";" '{split($1,a,"[ \t]+");print "a[1]---"a[1]"\na[2]---"a[2];for (i=1;i<=NF;i++) print i"---"$i}'
a[1]---C002
a[2]---2019-06-28
1---C002 2019-06-28
2---16:03
3---approved
4---content=L1-34,EE
5---not taken
6---
7---1024
A bit similar to the answer of Jotne, but you could write a function to split the record according to your wishes:
awk 'function split_record(string,f, t,n,m) {
n=split(string,t,";"); m=split(t[1],f,"[ \t]+")
for(i=2;i<=n;++i) f[m+i-1]=t[i]
return m+n-1
}
{ split_record($0,f) }
{print "f1:"f[1];print "f2:"f[2];print "f3:"f[3];print "f6:"f[6];print "f7:"f[7]}'
This returns:
f1:C002
f2:2019-06-28
f3:16:03
f6:not taken
f7:
You can update the split record in any way you like.
awk '
BEGIN { FS=OFS=";" }
{
    split($1,a,/[[:space:]]+/)
    $1 = ""
    $0 = a[1] FS a[2] $0
    for (i=1; i<=NF; i++) {
        print "f" i ":" $i
    }
}
' file
f1:C002
f2:2019-06-28
f3:16:03
f4:approved
f5:content=L1-34,EE
f6:not taken
f7:
f8:1024
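Another way to get the same field layout, shown here only as a sketch: turn the first run of whitespace into a ; and let awk re-split the record. This assumes the record never starts with whitespace and that the first whitespace in the line is always the one between the ID and the date, which holds for the sample but is not guaranteed by the question:

```shell
record='C002 2019-06-28;16:03;approved;content=L1-34,EE;not taken;;1024 '

result=$(printf '%s\n' "$record" | awk -F';' '{
    sub(/[ \t]+/, ";")          # first whitespace run becomes ";", $0 is re-split
    print "f1:" $1
    print "f2:" $2
    print "f3:" $3
    print "f6:" $6
    print "f7:" $7
}')
printf '%s\n' "$result"
```

Because sub() replaces only the first match, the space inside "not taken" is left alone.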

Increment a value regarding a pattern in a file

I have a file like this :
"A";"1"
"A";""
"A";""
"B";"1"
"C";"1"
"C";""
"C";""
When the first part of the current line matches the first part of the previous line, I want to increment the second part of the line,
like this:
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"
In other words, if the second part is empty, I take the value from the previous line and increment it.
Do you have any idea how I can do this with a shell script or maybe with awk or sed command?
With perl:
$ perl -F';' -lane 'if ($F[1] =~ /"(\d+)"/) { $saved = $1; } else { $saved++; $F[1] = qq/"$saved"/; }
print join(";", @F)' example.txt
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"
With awk:
$ awk -F';' -v OFS=';' '
$2 ~ /"[0-9]+"/ { saved = substr($2, 2, length($2) - 2) }
$2 == "\"\"" { $2 = "\"" ++saved "\"" }
{ print }' example.txt
"A";"1"
"A";"2"
"A";"3"
"B";"1"
"C";"1"
"C";"2"
"C";"3"

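Both answers above rely on the first line of every group carrying a number. If that is not guaranteed, the key in the first column can be compared with the previous line as well. The following is a sketch under that assumption; the input is a modified version of the sample (group B starts empty, group C starts at 5) made up to exercise each branch, and prev and count are illustrative names:

```shell
result=$(printf '%s\n' '"A";"1"' '"A";""' '"A";""' '"B";""' '"C";"5"' '"C";""' |
awk -F';' -v OFS=';' '{
    if ($2 ~ /^"[0-9]+"$/)
        count = substr($2, 2, length($2) - 2) + 0   # explicit number: adopt it
    else if ($1 == prev)
        count++                                     # same key as the previous line
    else
        count = 1                                   # new key without a number
    prev = $1
    $2 = "\"" count "\""
    print
}')
printf '%s\n' "$result"
```

With this input, the B line is numbered 1 and the second C line is numbered 6.
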
print multiple fields if multiple pattern matches

I have a comma delimited file like below
0,category=a,type=b,value=1
1,category=c,type=b,.....,original_value=0
2,category=b,type=c,....,original_value=1,....,corrected_value=3
A line in the file can contain
(1)only 'value'
(2)only 'original_value'
(3)both 'original_value' and 'corrected_value'
The values can be in any column.
The following awk command I wrote can only print one field after pattern match.
cat file | awk -F, 'BEGIN{OFS=","} /value/ { for (x=1;x<=NF;x++) if ($x~"value") {print $2,$3,$(x)} }' | sort -u
Current Output:
category=a,type=b,value=1
category=b,type=c,corrected_value=3
category=b,type=c,original_value=1
category=c,type=b,original_value=0
How do I print two fields (columns) of a line if two pattern matches occur? In this case, if both original_value and corrected_value exist.
Expected Output:
category=a,type=b,value=1
category=b,type=c,original_value=1,corrected_value=3
category=c,type=b,original_value=0
Bash Version: 4.3.11
You can use this awk command:
awk 'BEGIN{FS=OFS=","} {printf "%s%s%s", $2,OFS,$3; for(i=4; i<=NF; i++)
if ($i ~ /value/) printf "%s%s", OFS,$i; print ""}' file
category=a,type=b,value=1
category=c,type=b,original_value=0
category=b,type=c,original_value=1,corrected_value=3
Similar to @anubhava's answer, but does not rely on the category or type being in a particular column:
awk -F, '
BEGIN { pattern = "^(category|type|value|original_value|corrected_value)" }
{
    sep = ""
    for (i=1; i<=NF; i++) {
        if ($i ~ pattern) {
            printf "%s%s", sep, $i
            sep = ","
        }
    }
    print ""
}
' file
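To see the column independence in action, here is a sketch run on two lines, the second of which is invented for illustration with the columns shuffled and an unrelated note=x field added. The pattern is also tightened with a trailing = so that only whole key names match; that is a change from the answer above, not part of it:

```shell
result=$(printf '%s\n' \
    '0,category=a,type=b,value=1' \
    'type=c,2,original_value=1,note=x,corrected_value=3,category=b' |
awk -F',' '
    BEGIN { pattern = "^(category|type|value|original_value|corrected_value)=" }
    {
        sep = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ pattern) {           # keep only the wanted name=value fields
                printf "%s%s", sep, $i
                sep = ","
            }
        }
        print ""                          # end the output line
    }
')
printf '%s\n' "$result"
```

The selected fields are printed in the order they appear in each input line, so the second line comes out as type=c,original_value=1,corrected_value=3,category=b.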

Use AWK to print FILENAME to CSV

I have a little script to compare some columns inside a bunch of CSV files.
It's working fine, but there are some things that are bugging me.
Here is the code:
FILES=./*
for f in $FILES
do
cat -v $f | sed "s/\^A/,/g" > op_tmp.csv
awk -F, -vOFS=, 'NR == 1{next} $9=="T"{t[$8]+=$7;n[$8]} $9=="A"{a[$8]+=$7;n[$8]} $9=="C"{c[$8]+=$7;n[$8]} $9=="R"{r[$8]+=$7;n[$8]} $9=="P"{p[$8]+=$7;n[$8]} END{ for (i in n){print i "|" "A" "|" a[i]; print i "|" "C" "|" c[i]; print i "|" "R" "|" r[i]; print i "|" "P" "|" p[i]; print i "|" "T" "|" t[i] "|" (t[i]==a[i]+c[i]+r[i]+p[i] ? "ERROR" : "MATCHED")} }' op_tmp.csv >> output.csv
rm op_tmp.csv
done
Just to explain:
I get all the files in the directory, then I use cat -v and sed to replace the ^A separator with a comma.
Then I use the awk one-liner to compare the columns I need and print the result to output.csv.
But now I want to print the filename before the output of each loop iteration.
I tried combining cat, sed and awk in a single pipeline and printing $FILENAME, but it doesn't work:
cat -v $f | sed "s/\^A/,/g" | awk -F, -vOFS=, 'NR == 1{next} $9=="T"{t[$8]+=$7;n[$8]} $9=="A"{a[$8]+=$7;n[$8]} $9=="C"{c[$8]+=$7;n[$8]} $9=="R"{r[$8]+=$7;n[$8]} $9=="P"{p[$8]+=$7;n[$8]} END{ for (i in n){print i "|" "A" "|" a[i]; print i "|" "C" "|" c[i]; print i "|" "R" "|" r[i]; print i "|" "P" "|" p[i]; print i "|" "T" "|" t[i] "|" (t[i]==a[i]+c[i]+r[i]+p[i] ? "ERROR" : "MATCHED")} }' > output.csv
Can anyone help?
The whole script could be rewritten more cleanly, but assuming it does what you want for now, just add
echo "$f" >> output.csv
before the awk call.
If you want to add the filename to every awk output line, you have to pass it in as a variable, i.e.
awk ... -v fname="$f" '{...; print fname... etc
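A complete, minimal version of that idea might look like the following. It is a sketch only: the two-column sample file, its name and the workdir variable are made up, and ${f##*/} is used to strip the directory part so that only the base name is passed in as fname:

```shell
workdir=$(mktemp -d)
printf '%s\n' 'a,1' 'b,2' > "$workdir/sample one.csv"   # hypothetical file name with a space

f="$workdir/sample one.csv"
# -v assigns the shell value to the awk variable fname before any input is read.
result=$(awk -F',' -v OFS=',' -v fname="${f##*/}" '{ print fname, $1, $2 }' "$f")
printf '%s\n' "$result"
```

Note that awk interprets backslash escape sequences in -v assignments, so a file name containing a backslash would need different handling.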
A rewrite:
for f in ./*; do
    awk -F '\x01' -v OFS="|" '
        BEGIN {
            letter[1]="A"; letter[2]="C"; letter[3]="R"; letter[4]="P"; letter[5]="T"
            letters["A"] = letters["C"] = letters["R"] = letters["P"] = letters["T"] = 1
        }
        NR == 1 {next}
        $9 in letters {
            count[$9,$8] += $7
            seen[$8]
        }
        END {
            print FILENAME
            for (i in seen) {
                sum = 0
                for (j=1; j<=4; j++) {
                    print i, letter[j], count[letter[j],i]
                    sum += count[letter[j],i]
                }
                print i, "T", count["T",i], (count["T",i] == sum ? "ERROR" : "MATCHED")
            }
        }
    ' "$f"
done > output.csv
Notes:
your method of iterating over files will break as soon as you have a filename with a space in it
try to reduce duplication as much as possible.
newlines are free, use them to improve readability
improve your variable names i, n, etc -- here "letter" and "letters" could use improvement to hold some meaning about those symbols.
awk has a FILENAME variable (here's the actual answer to your question)
awk understands \x01 to be a Ctrl-A -- I assume that's the field separator in your input files
define an Output Field Separator that you'll actually use
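Several of these notes can be tried together on throwaway data. The sketch below assumes two small Ctrl-A-delimited files invented for the test (one with a space in its name) and sums the second column per file; it writes the separator as the octal escape \001 rather than \x01, on the assumption that octal escapes are the more portable spelling across awk implementations:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id\001amount\nx\0015\ny\0017\n' > 'a 1.dat'     # name contains a space
printf 'id\001amount\nz\0012\n'         > 'b2.dat'

result=$(
    for f in ./*; do
        [ -f "$f" ] || continue                 # skip anything that is not a regular file
        awk -F '\001' -v OFS='|' '
            NR == 1 { next }                    # skip the header row
            { total += $2 }
            END { print FILENAME, total }       # FILENAME still names the file just read
        ' "$f"                                  # quoted, so spaces in names are safe
    done
)
printf '%s\n' "$result"
```

Quoting "$f" is what keeps the file with a space in its name from being split into two arguments.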
If you have GNU awk (version ???) you can use the ENDFILE block and do away with the shell for loop altogether:
gawk -F '\x01' -v OFS="|" '
    BEGIN {...}
    FNR == 1 {next}
    $9 in letters {...}
    ENDFILE {
        print FILENAME
        for ...
        # clean up the counters for the next file
        delete count
        delete seen
    }
' ./* > output.csv
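Where gawk is not available, a similar per-file summary can be approximated in any awk by reporting whenever FNR resets to 1 and once more in END. This is a sketch, not a drop-in replacement for ENDFILE: the report() function, the prev_file and rows variables and the sample files are all made up for illustration, and empty input files are silently skipped because they never produce a record with FNR == 1:

```shell
workdir=$(mktemp -d)
printf 'h\nx\ny\n' > "$workdir/one.txt"
printf 'h\nz\n'    > "$workdir/two.txt"

result=$(cd "$workdir" && awk '
    function report() { print prev_file, rows; rows = 0 }   # emit and reset per-file state
    FNR == 1 {
        if (NR > 1) report()        # a new file started: flush the previous one
        prev_file = FILENAME
        next                        # skip the header row
    }
    { rows++ }
    END { if (NR > 0) report() }    # flush the last file
' ./one.txt ./two.txt)
printf '%s\n' "$result"
```

Here the per-file state is just a row count; in the script above, the count and seen arrays would be reset in report() instead.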

Resources