Using the output of one awk command in another awk command - bash

I have a CSV file (exported from Excel) whose columns are not fixed (they change from file to file), and I need to get values from a couple of particular columns. I'm able to find the column numbers with one awk command and then print the matching rows with a second awk command using those numbers. Is there any way to combine the two into one?
awk -F',' ' {for(i=1;i < 9;i++) {if($i ~ /CLIENT_ID/) {print i}}} {for(s=1;s < 2;s++) {if($s ~ /SEC_DESC/) {print s}}} ' <file.csv> | awk -F "," '!($5~/...[0-9]L/ && $21~/FUT /) {print $0}' <file.csv>
This gives me 5 and 9 as the column numbers for CLIENT_ID and SEC_DESC (these numbers change with different files).
Using those column numbers, I get the desired output as follows:
awk -F "," '!($5~/...[0-9]L/ && $21~/FUT /) {print $0}' <file.csv>
How can I combine these into one command? Pass a variable from the first to the second?
Input (a CSV file with a varying set of columns; I'm interested in the following two):
CLIENT_ID SEC_DESC
USZ256 FUT DEC 16 U.S.
USZ256L FUT DEC 16 U.S. BONDS
WNZ256 FUT DEC 16 CBX
WNZ256L FUT DEC 16 CBX BONDS
The output gives me rows 2 and 4, the ones matched by my regex patterns in the second awk command (which uses column numbers 5 and 21). Those column numbers change per file, so I first have to find them with the first awk command and then pass them to the second.

I think I got it.
awk -F',' '
NR == 1 {
    for (i = 1; i <= NF; ++i) {
        if ($i == "CLIENT_ID") cl_col = i
        if ($i == "SEC_DESC") sec_col = i
    }
}
NR > 1 && !($cl_col ~ /...[0-9]L/ && $sec_col ~ /FUT /) {print $0}
' RED_FUT_TST.csv

To solve your problem, test whether you're processing the first row and put the logic that discovers the column numbers there. Then, when you're processing the data rows, use the column numbers found in that first step.
(NR is an awk built-in variable containing the number of the record currently being processed; NF is the number of fields in that record.)
E.g.:
$ cat red.awk
NR == 1 {
    for (i = 1; i <= NF; ++i) {
        if ($i == "CLIENT_ID") cl_col = i;
        if ($i == "SEC_DESC") sec_col = i;
    }
}
NR > 1 && $cl_col ~ /...[0-9]L/ && $sec_col ~ /FUT /
$ awk -F'\t' -f red.awk RED_FUT_TST.csv
USZ256L FUT DEC 16 U.S. BONDS
WNZ256L FUT DEC 16 CBX BONDS
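If you would rather keep two separate awk invocations and literally pass the column numbers from the first to the second, as the question asks, another option is to capture the numbers in shell variables and hand them to the second awk with -v. A minimal sketch, assuming the file really is comma-separated as in the original commands:
cols=$(awk -F',' 'NR == 1 {
    for (i = 1; i <= NF; ++i) {
        if ($i == "CLIENT_ID") c = i
        if ($i == "SEC_DESC") s = i
    }
    print c, s
    exit
}' RED_FUT_TST.csv)
read cl_col sec_col <<< "$cols"    # e.g. "5 9"
awk -F',' -v cl="$cl_col" -v sec="$sec_col" \
    'NR > 1 && !($cl ~ /...[0-9]L/ && $sec ~ /FUT /)' RED_FUT_TST.csv
The single-script version above is simpler, but this form is useful when the two awk programs have to stay separate.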

Related

awk script to find the average of a particular row

Is it possible with awk to find the average of a certain row?
For example the txt file (average.txt) contains:
2 5 10
1 5 5
1 5 10
So I want to find only the first row's average: 5.666667.
I tried to do it this way:
awk 'NR==1 {sum+=NF} END {print(sum/NF)}' average.txt
but the output is wrong: 1
I want to explain what your code is actually doing. The NF built-in variable holds the number of fields in the current line, so for the file average.txt
2 5 10
1 5 5
1 5 10
the code
awk 'NR==1 {sum+=NF} END {print(sum/NF)}' average.txt
increases sum by the number of fields for the first line (3 for the provided file), and then, after processing all lines, prints that value divided by the number of fields in the last line. In other words, your code computes the ratio of the number of fields in the first line to the number of fields in the last line. If you want to know more about NF, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
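For example, printing NF for every line of average.txt shows that each line has three fields:
$ awk '{print NR, NF}' average.txt
1 3
2 3
3 3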
Like this:
$ awk 'NR==1{for (i=1; i<=NF; i++) sum+=$i}END{print sum/NF}' file
5.66667
You can loop through all fields and sum up all the values.
If you only want to process the first record, you can print the value directly and then exit awk.
To prevent a division-by-zero error, you can check whether the number of fields is > 0:
awk 'NR==1 && NF>0 {for (i=1; i<=NF; i++) sum+=$i; print sum/NF; exit}' average.txt
Yet another solution. It seems wasteful to me to process every line when you are only interested in the first one.
Instead, just head -1 the file containing your data, like so (here, the file is called test):
head -1 test | awk '{for (i=1; i<=NF; i++) sum+=$i; print sum/NF}'
Here, the awk command is basically copy/pasted from the other answers, but without the NR == 1 condition.

Loop to create a DF from values in bash

I'm creating various text files from a file like this:
Chrom_x,Pos,Ref,Alt,RawScore,PHRED,ID,Chrom_y
10,113934,A,C,0.18943,5.682,rs10904494,10
10,126070,C,T,0.030435000000000007,3.102,rs11591988,10
10,135656,T,G,0.128584,4.732,rs10904561,10
10,135853,A,G,0.264891,6.755,rs7906287,10
10,148325,A,G,0.175257,5.4670000000000005,rs9419557,10
10,151997,T,C,-0.21169,0.664,rs9286070,10
10,158202,C,T,-0.30357,0.35700000000000004,rs9419478,10
10,158946,C,T,2.03221,19.99,rs11253562,10
10,159076,G,A,1.403107,15.73,rs4881551,10
What I am trying to do is extract, in bash, all values between two bounds:
gawk '$6>=0 && $NF<=5 {print $0}' file.csv > 0_5.txt
And then create files for 6 to 10, 11 to 15, ... up to 95 to 100. I was thinking of creating a loop for this with something like
#!/usr/bin/env bash
n=( 0,5,6,10...)
if i in n:
gawk '$6>=n && $NF<=n+1 {print $0}' file.csv > n_n+1.txt
and so on.
How can I convert this into a loop and create files with these specific values?
While you could use a shell loop to provide inputs to an awk script, you could also just use awk to natively split the values into buckets and write the lines to those "bucket" files itself:
awk -F, 'NR > 1 {
    i = int(($6 - 1) / 5)
    fname = (i * 5) "_" (i + 1) * 5 ".txt"
    print $0 > fname
}' < input
The code skips the header line (NR > 1) and then computes a "bucket index" by subtracting one from the value in column six, dividing by five and truncating. The filename is then constructed by multiplying that index (and its increment) by five. The whole line is then printed to that filename.
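For example, the row with PHRED 19.99 gets i = int((19.99 - 1) / 5) = int(3.798) = 3, so it is written to 15_20.txt.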
To use a shell loop (and call awk 20 times on the input), you could use something like this:
for ((i = 0; i <= 19; i++))
do
    floor=$((i * 5))
    ceiling=$(( (i + 1) * 5 ))
    awk -F, -v floor="$floor" -v ceiling="$ceiling" \
        'NR > 1 && $6 >= floor && $6 < ceiling { print }' < input \
        > "${floor}_${ceiling}.txt"
done
The basic idea is the same; here, we're creating the bucket index with the outer loop and then passing the range into awk as the floor and ceiling variables. We're only asking awk to print the matching lines; the output from awk is captured by the shell as a redirection into the appropriate file.

Match column 1 of CSV, and then check if column 2 matches

I currently have a Bash script that scrapes particular info from access logs and writes them to a CSV in the following format:
0004F2426702,75.214.224.151,16/Apr/2020
0004F2426702,75.214.224.151,17/Apr/2020
0004F2426702,75.214.224.151,18/Apr/2020
0004F2426702,80.111.224.252,18/Apr/2020
00085D19F072,75.214.224.151,16/Apr/2020
00085D20A469,75.214.224.151,16/Apr/2020
0018B9FFDD58,75.214.224.151,16/Apr/2020
64167F801BF5,81.97.142.178,16/Apr/2020
64167F801BF5,95.97.142.178,18/Apr/2020
0004F2426702,80.111.224.252,19/Apr/2020
But, now I am stuck!
I want to match on column 1 (the MAC address), and then check to see if column two matches. If not, print all the lines where column 1 matched.
The purpose of this script is to spot if the source IP has changed.
Using my favorite tool, GNU datamash to do most of the work of grouping and counting the data:
$ datamash -st, -g1,2 unique 3 countunique 3 < input.csv | awk 'BEGIN {FS=OFS=","} $NF > 1 { NF--; print }'
0004F2426702,75.214.224.151,16/Apr/2020,17/Apr/2020,18/Apr/2020
0004F2426702,80.111.224.252,18/Apr/2020,19/Apr/2020
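(datamash -s sorts the input first, -t, sets the comma delimiter, and -g1,2 groups on the first two columns; unique 3 and countunique 3 emit the distinct dates and how many there are. The trailing awk keeps only groups with more than one date and drops the count column with NF--.)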
Pure awk:
$ awk 'BEGIN { FS = OFS = SUBSEP = "," }
{ if (++seen[$1,$2] == 1) dates[$1,$2] = $3; else dates[$1,$2] = dates[$1,$2] "," $3 }
END { for (macip in seen) if (seen[macip] > 1) print macip, dates[macip] }' input.csv
0004F2426702,75.214.224.151,16/Apr/2020,17/Apr/2020,18/Apr/2020
0004F2426702,80.111.224.252,18/Apr/2020,19/Apr/2020
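If you want the original, unmodified lines for every MAC that shows up with more than one IP, which is the literal ask in the question, a two-pass awk sketch could look like this (assuming the file, called input.csv here, is small enough to read twice):
awk -F, '
NR == FNR {                                  # first pass: remember each MAC address's first IP
    if (!($1 in ip)) ip[$1] = $2
    else if (ip[$1] != $2) changed[$1] = 1   # MAC seen with a different IP
    next
}
changed[$1]                                  # second pass: print every line for flagged MACs
' input.csv input.csv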

Conditional replacement of field value in csv file

I am converting a lot of CSV files with bash scripts. They all have the same structure and the same header names. The values in the columns are variable of course. Col4 is always an integer.
Source file:
Col1;Col2;Col3;Col4
Name1;Street1;City1;2
Name2;Street2;City2;12
Name3;Street3;City3;15
Name4;Street4;City4;10
Name5;Street5;City5;3
Now when Col4 contains a certain value, for example "10", the value has to be changed to "10 pcs" and the complete line has to be duplicated.
For every 5 pcs one line.
So you could say that the number of duplicates is the value of Col4 divided by 5 and then rounded up.
So if Col4 = 10 I need 2 duplicates and if Col4 = 12, I need 3 duplicates.
Result file:
Col1;Col2;Col3;Col4
Name1;Street1;City1;2
Name2;Street2;City2;... of 12
Name2;Street2;City2;... of 12
Name2;Street2;City2;... of 12
Name3;Street3;City3;... of 15
Name3;Street3;City3;... of 15
Name3;Street3;City3;... of 15
Name4;Street4;City4;... of 10
Name4;Street4;City4;... of 10
Name5;Street5;City5;3
Can anyone help me to put this in a script. Something with bash, sed, awk. These are the languages I'm familiar with. Although I'm interested in other solutions too.
Here is the awk code assuming that the input is in a file called /tmp/input
awk -F\; '$4 < 5 {print}; $4 > 5 {for (i = 0; i < ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}' /tmp/input
Explanation:
There are two rules.
The first rule prints any row where $4 is less than 5. (Note that it does not actually match the header row: "Col4" is compared against 5 as a string, so the header is not printed.)
$4 < 5 {print}
The second rule prints if $4 is greater than 5. The loop runs $4/5 times, rounded up:
$4 > 5 {for (i=0; i< ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}
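For example, with $4 = 12 the condition is i < 12/5 = 2.4, so the body runs for i = 0, 1 and 2, i.e. three times, which is the rounded-up count asked for; with $4 = 10 it runs exactly twice.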
Output:
Col1;Col2;Col3;Col4
Name1;Street1;City1;2
Name2;Street2;City2;...of 12
Name2;Street2;City2;...of 12
Name2;Street2;City2;...of 12
Name3;Street3;City3;...of 15
Name3;Street3;City3;...of 15
Name3;Street3;City3;...of 15
Name4;Street4;City4;...of 10
Name4;Street4;City4;...of 10
Name5;Street5;City5;3
The code does not handle the case where $4 == 5. You can handle that by adding a third rule; I did not add that, but I think you get the idea.
Thanks Jay! This was just what I needed.
Here is the final awk code I'm using now:
awk -F\; '$4 == "Col4" {print}; $4 < 5 {print}; $4 == 5 {print}; $4 > 5 {for (i = 0; i < ($4/5); i++) printf "%s;%s;%s;...of %s\n",$1,$2,$3,$4}' /tmp/input
I added the rule below to print the header, because it wasn't printed
$4 == "Col4" {print}
I added the rule below to print the lines where the value is equal to 5
$4 == 5 {print}
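For what it's worth, the same output can also be produced with fewer rules; here is a sketch (assuming, as above, that the header is the first line and the input is /tmp/input):
awk -F\; 'NR == 1 || $4 <= 5 { print; next }
          { for (i = 0; i < $4/5; i++) printf "%s;%s;%s;...of %s\n", $1, $2, $3, $4 }' /tmp/input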

Use AWK to filter files that have 0 value

My file contains 36 tab-delimited columns; the first 4 contain names and identification info, and the remaining 32 contain floating-point numbers (doubles or 0). I want to filter out those rows that have 0 values in the last 32 columns (the output should also be tab-delimited).
I am thinking about using this :
if ($5 != 0 && $6 !=0 && ..... $36 != 0) {print $0}
But this looks ugly, and I guess the efficiency is not high given that there are 32 conditions in the if statement. Is there a more efficient way to get the job done? Thank you.
Use a for loop:
awk '{ for (i=5; i<=NF; i++) { if ($i != 0) { print; next } } }' infile
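If the goal is instead the literal && chain from the question, i.e. keep a row only when none of the last 32 columns is 0, the loop can skip the row as soon as it sees a zero; a sketch:
awk '{ for (i = 5; i <= NF; i++) if ($i == 0) next; print }' infile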
Here is a short awk for you.
awk '/[[:space:]].*[1-9]/' file
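This prints any line that contains a non-zero digit somewhere after a whitespace character. It assumes the first four name/ID columns never put a digit 1-9 after a separator; otherwise such rows would match even when all 32 numeric columns are 0.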
