Bash Script with AWK output to CSV - bash

I had amazing help on an AWK script here and thought to myself it would be really cool to have the exact same output I am monitoring on the CLI also go to a CSV file. I did some research and found a great answer here; it basically showed code like this:
awk '{print $1","$2","$3","$4","$5}' < /tmp/file.txt > /tmp/file.csv
The first issue I have is that /tmp/file.txt is not needed, as my code is already producing the string with separated values. I don't know if my variables would work without running all new AWK commands, so I would prefer to just tag it onto the end of the previous AWK command if possible. But I don't know how to implement the same concept within the actual script I am using. Could anyone show me the formatting schema I would need to tag this onto the end of my script?
My ever-evolving script looks like this:
#!/bin/bash
CURRENT_DATE=$(date +%Y-%m-%d)
tail -fn0 /var/log/pi-star/MMDVM-"$CURRENT_DATE".log | gawk '
match($0, /received.*voice header from ([[:alnum:]]+) to ([[:alnum:]]+ [0-9]+)/, a) {
    in_record = 1
    call_sign = a[1]
    channel = a[2]
}
in_record && match($0, /DMR ID: ([0-9]+)/, a) {
    dmr_id = a[1]
}
in_record && match($0, /([0-9.]+) seconds, ([0-9]+)% packet loss, BER: ([0-9.]+)%/, a) {
    in_record = 0
    print call_sign, channel, dmr_id, a[1], a[2], a[3]
}
' OFS=,
I still want to monitor via the terminal; I just think the appended output to a CSV would be the icing on the cake. Am I overthinking it? Should it just be a separate script? If so, how?

After posting the question with a better description on another thread, someone responded with the correct answer. He said that basically what I was seeing is awk buffering its output when it's going to a pipeline (since that's lower-overhead), but writing it immediately when it's going to a TTY. He went on to offer a solution: call fflush() from the awk program.
"Call fflush(), after your print command, add an extra command fflush()."
That fixed it. Thank you all for your efforts.
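For reference, a minimal sketch of how that fix and the CSV copy might fit together: the fflush() is the answerer's fix, while piping through tee and the /tmp/file.csv path (borrowed from the snippet at the top) are my assumptions about how you would wire up the terminal-plus-CSV output.
#!/bin/bash
CURRENT_DATE=$(date +%Y-%m-%d)
tail -fn0 /var/log/pi-star/MMDVM-"$CURRENT_DATE".log | gawk '
match($0, /received.*voice header from ([[:alnum:]]+) to ([[:alnum:]]+ [0-9]+)/, a) {
    in_record = 1
    call_sign = a[1]
    channel = a[2]
}
in_record && match($0, /DMR ID: ([0-9]+)/, a) {
    dmr_id = a[1]
}
in_record && match($0, /([0-9.]+) seconds, ([0-9]+)% packet loss, BER: ([0-9.]+)%/, a) {
    in_record = 0
    print call_sign, channel, dmr_id, a[1], a[2], a[3]
    fflush()    # flush each record immediately; awk otherwise buffers when writing to a pipe
}
' OFS=, | tee -a /tmp/file.csv    # tee keeps the terminal view and appends a copy to the CSV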

Related

Is there a way to take an input that behaves like a file in bash?

I have a task where I'm given an input of the format:
4
A CS 22 M
B ECE 23 M
C CS 23 F
D CS 22 F
as the user input from the command line. From this, we have to perform tasks like determining the number of male and female students, determining which department has the most students, etc. I have done this using awk with the input as a file. Is there any way to do this with user input instead of a file?
Example of a command I used for a file (where the content in the file is in the same format):
numberofmales=$(awk -F ' ' '{print $4}' file.txt | grep M | wc -l) #list number of males
Not Reproducible
It works fine for me; your problem can't be reproduced with either GNU or BSD awk under Bash 5.0.18(1). With your posted code and sample file:
$ numberofmales=$(awk -F ' ' '{print $4}' file.txt | grep M | wc -l)
$ echo $numberofmales
2
Check to make sure you don't have problems in your input file, or elsewhere in your code.
Also, note that if you call awk without a file argument or input from a pipe, it tries to collect data from standard input. It may not actually be hanging; it's probably just waiting on end-of-file, which you can trigger with CTRL+D.
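For example (a quick sketch), the same pipeline reads from standard input just as happily if you feed it a here-document with the sample data instead of a file name:
awk -F ' ' '{print $4}' <<'EOF' | grep M | wc -l
4
A CS 22 M
B ECE 23 M
C CS 23 F
D CS 22 F
EOF
which prints 2.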
Recommended Improvements
Even if your code works, it can be improved. Consider the following, which skips the unnecessary field-separator definition and performs all the actions of your pipeline within awk.
males=$(
awk 'tolower($4)=="m" {count++}; END {print count}' file.txt
)
echo "$males"
Fewer moving parts are often easier to debug, and can often be more performant on large datasets. However, your mileage may vary.
User Input
If you want to use user input rather than a file, you can use standard input to collect your data, and then pass it as a quoted argument to a function. For example:
count_males () {
    awk 'tolower($4)=="m" {count++}; END {print count}' <<< "$*"
}

echo "Enter data (CTRL-D when done):"
data=$(cat -)
# If at command prompt, wait until EOF above before
# pasting this line. Won't matter in scripts.
males=$(count_males "$data")
The result is now stored in males, and you can echo "$males" or make use of the variable in whatever other way you like.
Bash indeed does not care whether a file handle is connected to standard input or to a file, and neither does Awk.
However, if you want to pass the same input to multiple Awk instances, it really does make sense to store it in a temporary file.
A better overall solution is to write a better Awk script so you only need to read the input once.
awk 'NF > 1 { ++a[$4] } END { for (g in a) print g, a[g] }'
Demo: https://ideone.com/0ML7Xk
The NF > 1 condition is to skip the silly first line. Probably don't put that information there in the first place and let Awk figure out how many lines there are; it's probably better at counting than you are anyway.
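For instance, with the question's sample data fed on standard input (a quick sketch; the order of the two output lines depends on gawk's array iteration order):
printf '%s\n' '4' 'A CS 22 M' 'B ECE 23 M' 'C CS 23 F' 'D CS 22 F' |
awk 'NF > 1 { ++a[$4] } END { for (g in a) print g, a[g] }'
# prints, in some order:
# M 2
# F 2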

Prevent awk from interpreting variable contents?

I'm attempting to parse a make -n output to make sure only the programs I want to call are being called. However, awk tries to interpret the contents of the output and run (?) it. Errors are something like awk: fatal: Cannot find file 'make'. I have gotten around this by saving the output as a temporary file and then reading that into awk. However, I'm sure there's a better way; any suggestions?
EDIT: I'm using the output later in my script and would like to avoid saving a file to increase speed if possible.
Here's what isn't working:
my_input=$(make -n file)
my_lines=$(echo $my_input | awk '/bin/ { print $1 }') #also tried printf and cat
Here's what works but obviously takes longer than it has to because of writing the file:
make -n file > temp
my_lines=$(awk '/bin/ { print $1 }' temp)
Many thanks for your help!
You can parse the output directly as it is generated by the following command and save the result in a file:
make -n file | grep bin > result.out
If you really want to go for an overkill awk solution, change your second line in the following way:
my_lines="$(awk '/bin/ { print }' temp)"

fast alternative to grep file multiple times?

I currently use long piped bash commands to extract data from text files like this, where $f is my file:
result=$(grep "entry t $t " $f | cut -d ' ' -f 5,19 | \
sort -nk2 | tail -n 1 | cut -d ' ' -f 1)
I use a script that might do hundreds of similar searches of $f, sorting selected lines in various ways depending on what I'm pulling out. I like one-line bash strings with a bunch of pipes because it's compact and easy, but it can take forever. Can anyone suggest a faster alternative? Maybe something that loads the whole file into memory first?
Thanks
You might get a boost by doing the whole pipeline with gawk, or another awk that has asorti, like this:
contents="$(cat "$f")"
result="$(awk -vpattern="entry t $t" '$0 ~ pattern {matches[$5]=$19} END {asorti(matches,inds); print inds[1]}' <<<"$contents")"
This will read "$f" into a variable then we'll use a single awk command (well, gawk anyway) to do all the rest of the work. Here's how that works:
-vpattern="entry t $t": defines an awk variable named pattern that contains the shell variable t
$0 ~ pattern matches the current line against the pattern, if it matches we'll do the part in the braces, otherwise we skip it
matches[$5]=$19 adds an entry to an array (and creates the array if needed) where the key is the 5th field and the value is the 19th
END do the following function after all the input has been processed
asorti(matches,inds) sort the entries of matches such that the inds is an array holding the order of the keys in matches to get the values in sorted order
print inds[1] prints the index in matches (i.e., a $5 from before) associated with the lowest 19th field
<<<"$contents" have awk work on the value in the shell variable contents as though it were a file it was reading
Then you can just update the pattern for each, not have to read the file from disk each time and not need so many extra processes for all the pipes.
You'll have to benchmark to see if it's really faster or not though, and if performance is important you really should think about moving to a "proper" language instead of shell scripting.
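A rough way to compare the two, hedged as a quick check rather than a proper benchmark: wrap each variant in a loop under bash's time keyword (the repeat count of 100 is arbitrary, and $t, $f, and $contents are assumed to be set as above).
# original pipeline, repeated 100 times
time for i in {1..100}; do
    grep "entry t $t " "$f" | cut -d ' ' -f 5,19 | sort -nk2 | tail -n 1 | cut -d ' ' -f 1 > /dev/null
done

# single-awk version working on the in-memory copy, repeated 100 times
contents="$(cat "$f")"
time for i in {1..100}; do
    awk -vpattern="entry t $t" '$0 ~ pattern {matches[$5]=$19} END {asorti(matches,inds); print inds[1]}' <<<"$contents" > /dev/null
done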
Since you haven't provided sample input/output, this is just a guess, and I only post it because there are other answers already posted that you should not use. This may be what you want instead of that one-liner:
result=$(awk -v t="$t" '
BEGIN { regexp = "entry t " t " " }
$0 ~ regexp {
if ( ($6 > maxKey) || (maxKey == "") ) {
maxKey = $6
maxVal = $5
}
}
END { print maxVal }
' "$f")
I suspect your real performance issue, however, isn't that script but that you are running it and maybe others inside a loop that you haven't shown us. If so, see why-is-using-a-shell-loop-to-process-text-considered-bad-practice and post a better example so we can help you.

how can I supply bash variables as fields for print in awk

I currently am trying to use awk to rearrange a .csv file that is similar to the following:
stack,over,flow,dot,com
and the output would be:
over,com,stack,flow,dot
(or any other order, just using this as an example)
and when it comes time to rearrange the csv file, I have been trying to use the following:
first='$2'
second='$5'
third='$1'
fourth='$3'
fifth='$4'
awk -v a=$first -v b=$second -v c=$third -v d=$fourth -v e=$fifth -F '^|,|$' '{print $a,$b,$c,$d,$e}' somefile.csv
with the intent of awk/print interpreting the $a, $b, $c, etc. as field numbers, so it would come out to the following:
{print $2,$5,$1,$3,$4}
and print out the fields of the csv file in that order. Unfortunately, I have not been able to get this to work correctly yet; I've tried several different methods, with this one seeming the most promising. Having said that, I was wondering if anyone could give any suggestions or point out my flaw, as I am stumped at this point. Any help would be much appreciated, thanks!
Use simple numbers:
first='2'
second='5'
third='1'
fourth='3'
fifth='4'
awk -v a=$first -v b=$second -v c=$third -v d=$fourth -v e=$fifth -F '^|,|$' \
'{print $a, $b, $c, $d, $e}' somefile.csv
Another way with a shorter example:
aa='$2'
bb='$1'
cc='$3'
awk -F '^|,|$' "{print $aa,$bb,$cc}" somefile.csv
You already got the answer to your specific question but have you considered just specifying the order as a string instead of each individual field? For example:
order="2 5 1 3 4"
awk -v order="$order" '
BEGIN{ FS=OFS=","; n=split(order,a," ") }
{ for (i=1;i<n;i++) printf "%s%s",$(a[i]),OFS; print $(a[i]) }
' somefile.csv
That way if you want to add/delete fields or change the order you just trivially rearrange the numbers in the first line instead of having to mess with a bunch of hard-coded variables, etc.
Note that I changed your FS, as there was no need for it to be that complicated. Also, you don't need the shell variable "order"; you could just populate the awk variable of the same name explicitly. I only started with a shell variable since you had started with shell variables, so maybe you have a reason.
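For example, with the sample line from the question and the order string above (a quick sketch), the output matches the requested rearrangement:
echo 'stack,over,flow,dot,com' | awk -v order="2 5 1 3 4" '
BEGIN{ FS=OFS=","; n=split(order,a," ") }
{ for (i=1;i<n;i++) printf "%s%s",$(a[i]),OFS; print $(a[i]) }
'
# prints: over,com,stack,flow,dot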

IP address and Country on same line with AWK

I'm looking for a one-liner that, based on a list of IPs, will append the country where each IP is based.
So if I have this as an input:
87.229.123.33
98.12.33.46
192.34.55.123
I'd like to produce this:
87.229.123.33 - GB
98.12.33.46 - DE
192.34.55.123 - US
I've already got a script that returns the country for an IP, but I need to glue it all together with awk. So far this is what I came up with:
$ get_ips | nawk '{ print $1; system("ip2country " $1) }'
This is all cool, but the IP and the country are not displayed on the same line. How can I merge the system output and the IP on one line?
If you have a better way of doing this, I'm open to suggestions.
You can use printf instead of print:
{ printf("%s - ", $1); system("ip2country " $1); }
The proper one-liner solution in awk is:
awk '{printf("%s - ", $1) ; system("ip2country \"" $1 "\"")}' < inputfile
However, I think it would be much faster if you used a Python program that looks like this:
#!/usr/bin/python
# 'apt-get install python-geoip' if needed
import GeoIP
gi = GeoIP.new(GeoIP.GEOIP_MEMORY_CACHE)
for line in file("ips.txt", "r"):
    line = line[:-1]  # strip the trailing newline from the line
    print line, "-", gi.country_code_by_addr(line)
As you can see, the GeoIP object is initialized only once and then reused for all queries. See the Python binding for GeoIP. Also be aware that your awk solution forks two new processes per line!
I don't know how many entries you need to process, but if there are a lot of them, you should consider something that doesn't fork and keeps the GeoIP database in memory.
I'll answer with a perl one-liner because I know that syntax better than awk. The "chomp" will cut off the newline that is bothering you.
get_ips | perl -ne 'chomp; print; print `ip2country $_`'
