awk: using escape characters in for statement - for-loop

I have the following awk command which works perfectly:
awk 'BEGIN { ORS = " " } {print 22, $1, NR; for(i=2;i<=NF;++i) print $i}{print "\n"}' file
What it does is insert two columns: the value 22 as the first column and the row number as the third (see this question).
I tried running this command in a for loop as follows:
for p in 20 21 22
do
awk 'BEGIN { ORS = " " } {print $p, \$1, NR; for(i=2;i<=NF;++i) print \$i}{print "\n"}' file > file$p
done
So instead of the 22 in the first column, 3 files are made where each file has 20, 21 or 22 as the first column. Unfortunately, this doesn't work and I get the message: ^ backslash not last character on line.
It probably has something to do with how I escaped the $ characters, but I don't know how else to do it... any ideas?

You can pass a shell variable into awk as an awk variable using
-v parameter=value
i.e. use
awk -v p=$p ...
Like this:
for p in 20 21 22
do
awk -v p=$p 'BEGIN { ORS = " " } [TOO LONG LINE]' file > file$p
done
Complete command:
for p in 20 21 22; do awk -v p=$p 'BEGIN { ORS = " " } {print p, $1, NR; for(i=2;i<=NF;++i) print $i}{print "\n"}' file > file$p; done

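You are on the right track. Because your awk script is within ' (single quotes), bash doesn't replace $p with its value in the script (see Advanced Bash-Scripting Guide - Quoting). For the same reason the backslashes in \$1 and \$i are passed to awk literally, which is what triggers the "backslash not last character on line" error.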
You need to do one of the following:
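1 (Recommended): pass the value with -v and use it as the awk variable p. The $ of the awk fields must not be escaped, because the shell expands nothing inside single quotes:
for p in 20 21 22
do
awk -v p="$p" 'BEGIN { ORS = " " } {print p, $1, NR; for(i=2;i<=NF;++i) print $i}{print "\n"}' file > "file$p"
done
2: Use " instead of ' around the awk script, and inside it escape each " as \" and each awk $ as \$, so that the shell expands only $p.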
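A minimal sketch of option 2, assuming a two-line stand-in for the question's file that is fed on standard input (the sample rows are invented for illustration). Because the script is in double quotes, the shell substitutes $p, while \$1, \$i and \" reach awk as $1, $i and ":

```shell
# Stand-in data for "file"; these two rows are made up for the demo.
p=20
out=$(printf 'a b c\nd e f\n' |
  awk "BEGIN { ORS = \" \" } {print $p, \$1, NR; for(i=2;i<=NF;++i) print \$i}{print \"\n\"}")
printf '%s\n' "$out"
```

Every awk $ and every " needs its own backslash here, which is why passing the value with -v is usually the less error-prone choice.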

Related

AWK: increment a field based on values from previous line

Given the following input for AWK:
10;20;20
8;41;41
15;52;52
How could I increase/decrease the values so that:
$1 = remains unchanged
$2 = $2 of previous line + $1 of previous line + 1
$3 = $3 of previous line + $1 of previous line + 1
So the desired output would be:
10;20;20
8;31;31
15;40;40
I need to auto-increment and loop over the lines,
using associative arrays, but it's confusing for me.
Surely, this doesn't work as desired:
#!/bin/awk -f
BEGIN { FS = ";" }
{
print ln, st, of
ln=$1
st=$2 + ln + 1
of=$3 + ln + 1
}
With awk:
awk -F";" -v OFS=";" 'NR!=1{ $2=a[2]+a[1]+1; $3=a[3]+a[1]+1 } { split($0,a,FS) } 1' file
Split each line into an array after it has been updated, so that the stored (already adjusted) values are available when the next line is processed.
Test:
10;20;20
8;31;31
15;40;40
The following awk may also help.
awk -F";" '
FNR==1{
val=$1;
val1=$2;
val2=$3;
print;
next
}
{
$2=val+val1+1;
$3=val+val2+1;
print;
val=$1;
val1=$2;
val2=$3;
}' OFS=";" Input_file
For your given Input_file, output will be as follows.
10;20;20
8;31;31
15;40;40
awk 'BEGIN{
FS = OFS = ";"
}
FNR>1{
$2 = p2 + p1 + 1
$3 = p3 + p1 + 1
}
{
p1=$1; p2=$2; p3=$3
}1
' infile
Input:
$ cat infile
10;20;20
8;41;41
15;52;52
Output:
awk 'BEGIN{FS=OFS=";"}FNR>1{$2=p2+p1+1; $3=p3+p1+1 }{p1=$1; p2=$2; p3=$3}1' infile
10;20;20
8;31;31
15;40;40
Or store only the fields you are interested in:
awk -v myfields="2,3" '
BEGIN{
FS=OFS=";";
split(myfields,t,/,/)
}
{
for(i in t)
{
if(FNR>1)
{
$(t[i]) = a[t[i]] + a[1] + 1
}
a[t[i]] = $(t[i])
}
a[1] = $1
}1' infile

AWK to display a column based on Column name and remove header and last delimiter

Id,responseId,name,test1,test2,bcid,stype
213,A_123456,abc,test,zzz,987654321,alpha
412,A_234566,xyz,test,xxx,897564322,gama
125,A_456314,ttt,qa,yyy,786950473,delta
222,A_243445,hds,test,fff,643528290,alpha
456,A_466875,sed,test,hhh,543819101,beta
I want to extract the columns responseId and bcid from the data above. I found an answer which is really close:
awk -F ',' -v cols=responseID,bcid '(NR==1){n=split(cols,cs,",");for(c=1;c<=n;c++){for(i=1;i<=NF;i++)if($(i)==cs[c])ci[c]=i}}{for(i=1;i<=n;i++)printf "%s" FS,$(ci[i]);printf "\n"}' <file_name>
However, it prints a trailing "," and the header, as shown below.
responseId,bcid,
A_123456,987654321,
A_234566,897564322,
A_456314,786950473,
A_243445,643528290,
A_466875,543819101,
How can I make it skip the header and omit the "," after bcid?
Input
$ cat infile
Id,responseId,name,test1,test2,bcid,stype
213, A_123456, abc, test, zzz, 987654321, alpha
412, A_234566, xyz, test, xxx, 897564322, gama
125, A_456314, ttt, qa, yyy, 786950473, delta
222, A_243445, hds, test, fff, 643528290, alpha
456, A_466875, sed, test, hhh, 543819101, beta
Script
$ cat byname.awk
FNR==1{
split(header,h,/,/);
for(i=1; i in h; i++)
{
for(j=1; j<=NF; j++)
{
if(tolower(h[i])==tolower($j)){ d[i]=j; break }
}
}
next
}
{
for(i=1; i in h; i++)
printf("%s%s",i>1 ? OFS:"", i in d ?$(d[i]):"");
print "";
}
How to execute:
$ awk -v FS=, -v OFS=, -v header="responseID,bcid" -f byname.awk infile
A_123456, 987654321
A_234566, 897564322
A_456314, 786950473
A_243445, 643528290
A_466875, 543819101
One-liner
$ awk -v FS=, -v OFS=, -v header="responseID,bcid" 'FNR==1{split(header,h,/,/);for(i=1; i in h; i++){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){ d[i]=j; break }}}next}{for(i=1; i in h; i++)printf("%s%s",i>1 ? OFS:"", i in d ?$(d[i]):"");print "";}' infile
A_123456, 987654321
A_234566, 897564322
A_456314, 786950473
A_243445, 643528290
A_466875, 543819101
Try:
awk '{NR==1?FS=",":FS=", ";$0=$0} NR>1{print $2 OFS $(NF-1)}' OFS=, Input_file
On the first line the field separator is set to ","; on every other line it is set to ", ". Assigning $0=$0 re-splits the current line with the new separator. For every line after the header (NR>1), the second field and the second-to-last field are then printed, with OFS (the output field separator) set to ",".
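As a further sketch, the one-liner from the question can be adapted directly: skip the header with next once the column positions are known, and print the separator only between fields. This assumes the comma-only input from the question (first three rows shown) and exact, case-sensitive column names; the array names want and idx are chosen here for illustration.

```shell
out=$(awk -F, -v cols=responseId,bcid '
NR == 1 {
    # Map each requested column name to its position in the header.
    n = split(cols, want, ",")
    for (c = 1; c <= n; c++)
        for (i = 1; i <= NF; i++)
            if ($i == want[c]) idx[c] = i
    next    # do not print the header line
}
{
    # Separator between fields, newline after the last one.
    for (c = 1; c <= n; c++)
        printf "%s%s", $(idx[c]), (c < n ? FS : "\n")
}' <<'EOF'
Id,responseId,name,test1,test2,bcid,stype
213,A_123456,abc,test,zzz,987654321,alpha
412,A_234566,xyz,test,xxx,897564322,gama
EOF
)
printf '%s\n' "$out"
```

Choosing the terminator per field avoids having to strip a trailing delimiter afterwards.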

Use AWK to print FILENAME to CSV

I have a little script to compare some columns inside a bunch of CSV files.
It's working fine, but there are some things that are bugging me.
Here is the code:
FILES=./*
for f in $FILES
do
cat -v $f | sed "s/\^A/,/g" > op_tmp.csv
awk -F, -vOFS=, 'NR == 1{next} $9=="T"{t[$8]+=$7;n[$8]} $9=="A"{a[$8]+=$7;n[$8]} $9=="C"{c[$8]+=$7;n[$8]} $9=="R"{r[$8]+=$7;n[$8]} $9=="P"{p[$8]+=$7;n[$8]} END{ for (i in n){print i "|" "A" "|" a[i]; print i "|" "C" "|" c[i]; print i "|" "R" "|" r[i]; print i "|" "P" "|" p[i]; print i "|" "T" "|" t[i] "|" (t[i]==a[i]+c[i]+r[i]+p[i] ? "ERROR" : "MATCHED")} }' op_tmp.csv >> output.csv
rm op_tmp.csv
done
Just to explain:
I loop over all files in the directory, then use cat -v and sed to replace the ^A delimiter with a comma.
Then I use the awk one-liner to compare the columns I need and append the result to output.csv.
But now I want to print the filename before the output of each loop iteration.
I tried chaining cat, sed and awk on the same line and printing $FILENAME, but it doesn't work:
cat -v $f | sed "s/\^A/,/g" | awk -F, -vOFS=, 'NR == 1{next} $9=="T"{t[$8]+=$7;n[$8]} $9=="A"{a[$8]+=$7;n[$8]} $9=="C"{c[$8]+=$7;n[$8]} $9=="R"{r[$8]+=$7;n[$8]} $9=="P"{p[$8]+=$7;n[$8]} END{ for (i in n){print i "|" "A" "|" a[i]; print i "|" "C" "|" c[i]; print i "|" "R" "|" r[i]; print i "|" "P" "|" p[i]; print i "|" "T" "|" t[i] "|" (t[i]==a[i]+c[i]+r[i]+p[i] ? "ERROR" : "MATCHED")} }' > output.csv
Can anyone help?
You could rewrite the whole script more cleanly, but assuming it does what you want for now, just add
echo "$f" >> output.csv
before the awk call.
If you want the filename on every awk output line, you have to pass it in as a variable, e.g.
awk ... -v fname="$f" '{...; print fname... etc
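A minimal sketch of the -v fname idea, assuming two throwaway files in a temporary directory (the file names and contents are invented for the demo). Each output line is prefixed with the name of the file it came from:

```shell
workdir=$(mktemp -d)
printf 'x 1\ny 2\n' > "$workdir/first.txt"
printf 'z 3\n' > "$workdir/second.txt"

out=$(
  cd "$workdir" || exit 1
  for f in ./*; do
      # fname is an awk variable holding the current shell loop value.
      awk -v fname="$f" '{ print fname, $0 }' "$f"
  done
)
printf '%s\n' "$out"
rm -r "$workdir"
```

When awk reads the file directly, as it does here, the built-in FILENAME variable holds the same value; passing fname explicitly is mainly needed when awk reads from a pipe.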
A rewrite:
for f in ./*; do
awk -F '\x01' -v OFS="|" '
BEGIN {
letter[1]="A"; letter[2]="C"; letter[3]="R"; letter[4]="P"; letter[5]="T"
letters["A"] = letters["C"] = letters["R"] = letters["P"] = letters["T"] = 1
}
NR == 1 {next}
$9 in letters {
count[$9,$8] += $7
seen[$8]
}
END {
print FILENAME
for (i in seen) {
sum = 0
for (j=1; j<=4; j++) {
print i, letter[j], count[letter[j],i]
sum += count[letter[j],i]
}
print i, "T", count["T",i], (count["T",i] == sum ? "ERROR" : "MATCHED")
}
}
' "$f"
done > output.csv
Notes:
your method of iterating over files (an unquoted $FILES) will break as soon as a filename contains a space
try to reduce duplication as much as possible
newlines are free, use them to improve readability
improve your variable names (i, n, etc.); here "letter" and "letters" could also be renamed to say what those symbols mean
awk has a FILENAME variable (here's the actual answer to your question)
awk understands \x01 to be a Ctrl-A, which I assume is the field separator in your input files (this holds for GNU awk; POSIX only specifies octal escapes, so \001 is the portable spelling)
define an Output Field Separator that you'll actually use
If you have GNU awk (4.0 or later, where ENDFILE was introduced) you can use the ENDFILE block and do away with the shell for loop altogether:
gawk -F '\x01' -v OFS="|" '
BEGIN {...}
FNR == 1 {next}
$9 in letters {...}
ENDFILE {
print FILENAME
for ...
# clean up the counters for the next file
delete count
delete seen
}
' ./* > output.csv
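Without GNU awk, a similar per-file report can be sketched by detecting the start of each new file with FNR == 1. This is a simplified stand-in, not the full script: it only sums the second column per key, and the file names, data, and the report function are assumptions made for the demo.

```shell
workdir=$(mktemp -d)
printf 'header\nx 1\nx 2\n' > "$workdir/a.txt"
printf 'header\ny 5\n' > "$workdir/b.txt"

out=$(
  cd "$workdir" || exit 1
  awk '
  # Print the finished file name and its totals, then reset the totals.
  function report(   k) {
      print prev
      for (k in sum) print k, sum[k]
      split("", sum)          # portable way to empty an array
  }
  FNR == 1 {                  # first line of each input file
      if (NR > 1) report()    # a previous file has just ended
      prev = FILENAME
      next                    # skip the header line
  }
  { sum[$1] += $2 }
  END { if (NR > 0) report() }
  ' a.txt b.txt
)
printf '%s\n' "$out"
rm -r "$workdir"
```

The name of the previous file has to be remembered in prev, because FILENAME already refers to the new file by the time FNR == 1 fires.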

using awk to print every second field

I have the following input file and I wish to print every second field:
A=1=B=2=C=3
To get the following output:
1 2 3
I have tried:
awk 'BEGIN {FS="="; OFS=" "} {for (i=2; i<=NF; i+=2); print ($i) }' input_file
and it clearly doesn't work. I think I have the for loop portion correct, but something is wrong with my print portion.
Thank you.
$ awk -v RS== -v ORS=" " '0==NR%2' input_file
1 2 3
How it works
-v RS==
Set the input record separator to =.
-v ORS=" "
Set the output record separator to a space.
0==NR%2
Print every other record.
NR is the record number; with RS set to =, each =-separated piece of the input is one record. NR%2 is the record number modulo 2. Thus, the condition 0==NR%2 is true on every second record. When the condition is true, the action is performed. Since no action is specified, the default action is performed, which is to print the record.
Alternative
The key issue in the original code was a misplaced semicolon. Consider:
for (i=2; i<=NF; i+=2); print ($i)
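In this case, the semicolon right after the closing parenthesis gives the for loop an empty body, so the print command is executed only once, after the loop exits. By then i is greater than NF, so $i is empty.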
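A quick check of this, using the sample line from the question: printing i and NF instead of $i shows where the loop counter ends up once the empty-bodied loop has finished.

```shell
# The stray semicolon makes the loop body empty; print runs once afterwards.
out=$(printf 'A=1=B=2=C=3\n' |
  awk 'BEGIN {FS="="} {for (i=2; i<=NF; i+=2); print i, NF}')
printf '%s\n' "$out"    # prints: 8 6
```

With six fields, i stops at 8, so the original print ($i) refers to a field that does not exist and outputs an empty line.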
Try:
$ awk 'BEGIN {FS="="; OFS=" "} {for (i=2; i<=NF; i+=2)print $i }' input_file
1
2
3
Or, if you want the output on one line:
$ awk 'BEGIN {FS="="} {for (i=2; i<=NF; i+=2)printf "%s ", $i; print "" }' input_file
1 2 3

Multiple condition in nawk command

I have a nawk command that needs to mask data based on its length. I always need to keep the first 6 digits and the last 4 digits and replace the middle with x characters. Can you help fine-tune the script below?
#!/bin/bash
FILES=/export/home/input.txt
cat $FILES | nawk -F '|' '{
if (length($3) >= 13 )
print $1 "|" $2 "|" substr($3,1,6) "xxxxxx" substr($3,13,4) "|" $4"|" $5
else
print $1 "|" $2 "|" $3 "|" $4 "|" $5
}' > output.txt
input.txt
"2"|"X"|"A"|"ST"|"245552544555201"|"1111-11-11"|75.00
"6"|"Y"|"D"|"VT"|"245652544555200"|"1111-11-11"|95.00
"5"|"X"|"G"|"ST"|"3445625445552023"|"1111-11-11"|75.00
"3"|"Y"|"S"|"VT"|"24532254455524"|"1111-11-11"|95.00
output.txt
"X"|"ST"|"245552544555201"|"245552xxxxx5201"
"Y"|"VT"|"245652544555200"|"245652xxxxx5200"
"X"|"ST"|"3445625445552023"|"344562xxxxxx2023"
"Y"|"VT"|"24532254455524"|"245322xxxx5524"
Try this:
$ awk '
BEGIN {FS = OFS = "|"}
length($5)>=13 {
fld5=$5
start = substr($5,1,7)
end = substr($5,length($5)-4)
gsub(/./,"x",fld5)
sub(/^......./,start,fld5)
sub(/.....$/,end,fld5)
$1=$2; $2=$4; $3=$5; $4=fld5; NF-=3;
}1' file
"X"|"ST"|"245552544555201"|"245552xxxxx5201"
"Y"|"VT"|"245652544555200"|"245652xxxxx5200"
"X"|"ST"|"3445625445552023"|"344562xxxxxx2023"
"Y"|"VT"|"24532254455524"|"245322xxxx5524"

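The masking rule itself (keep the first 6 and last 4 digits, replace everything in between with x) can also be sketched with substr alone. This is a different technique from the sub-based answer above, and mask_digits is a hypothetical helper name; it assumes the bare digits without the surrounding quotes and reuses the question's length threshold of 13.

```shell
# mask_digits is a made-up helper; it expects digits only, without quotes.
mask_digits() {
  printf '%s\n' "$1" | awk '{
      n = length($0)
      if (n < 13) { print; next }            # short values are left untouched
      middle = substr($0, 7, n - 10)         # everything between the first 6 and last 4
      gsub(/./, "x", middle)
      print substr($0, 1, 6) middle substr($0, n - 3)
  }'
}

mask_digits 245552544555201     # prints: 245552xxxxx5201
mask_digits 3445625445552023    # prints: 344562xxxxxx2023
```

Computing the middle from the total length keeps the number of x characters equal to the number of hidden digits, whatever the input length.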
Resources