IP address and Country on same line with AWK - bash

I'm looking for a one-liner that, given a list of IPs, will append the country each IP is based in.
So if I have this as an input:
87.229.123.33
98.12.33.46
192.34.55.123
I'd like to produce this:
87.229.123.33 - GB
98.12.33.46 - DE
192.34.55.123 - US
I've already got a script that returns the country for an IP, but I need to glue it all together with awk. So far this is what I came up with:
$ get_ips | nawk '{ print $1; system("ip2country " $1) }'
This is all cool, but the IP and the country are not displayed on the same line. How can I merge the system() output and the IP onto one line?
If you have a better way of doing this, I'm open to suggestions.

You can use printf instead of print:
{ printf("%s - ", $1); system("ip2country " $1); }
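Another option is to capture the command's output inside awk with getline instead of calling system(), so awk builds the whole line itself. A minimal sketch, assuming ip2country prints a single country code followed by a newline:
{
    cmd = "ip2country \"" $1 "\""   # the ip2country helper from the question
    cmd | getline country           # read its one line of output
    close(cmd)                      # close so the command is re-run for the next IP
    print $1 " - " country
}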

The proper one-liner solution in awk is:
awk '{printf("%s - ", $1) ; system("ip2country \"" $1 "\"")}' < inputfile
However, I think it would be much faster if you used a Python program like this:
#!/usr/bin/python
# 'apt-get install python-geoip' if needed
import GeoIP
gi = GeoIP.new(GeoIP.GEOIP_MEMORY_CACHE)
for line in file("ips.txt", "r"):
    line = line[:-1]  # strip the trailing newline
    print line, "-", gi.country_code_by_addr(line)
As you can see, the GeoIP object is initialized only once and then reused for all queries. See the Python binding for GeoIP. Also be aware that your awk solution forks two new processes per line!
I don't know how many entries you need to process, but if there are a lot of them, you should consider something that doesn't fork and keeps the GeoIP database in memory.

I'll answer with a perl one-liner because I know that syntax better than awk. The "chomp" will cut off the newline that is bothering you.
get_ips | perl -ne 'chomp; print; print `ip2country $_`'

Related

Split sentences into separate lines

I'm trying to split sentences in a file into separate lines using a shell script.
The file I want to read from is my_text.txt and it contains:
you want to learn shell script? First, you want to learn Linux command! then. you can learn shell script.
Now I would like to split the string on "!", "?" or ".". The output should look like this:
you want to learn shell script
First, you want to learn Linux command
then
you can learn shell script
I used this script:
while read p
do
echo $p | tr "? ! ." "\n "
done < my_text.txt
But the output is:
you want to learn shell script
First, you want to learn Linux command then you can learn shell script
Can somebody help?
This could be done in a single awk using its global substitution function, written and tested on the shown samples only in GNU awk. It simply substitutes every ?, ! or . globally with ORS (the output record separator, a newline by default).
awk '{gsub(/\?|!|\./,ORS)} 1' Input_file
$ sed 's/[!?.]/\n/g' file
you want to learn shell script
First, you want to learn Linux command
then
you can learn shell script
You can call three tr commands to split on ?, ! and .:
cat test_string.txt | tr "!" "\n" | tr "?" "\n" | tr "." "\n"
Awk is ideal for this:
awk -F '[?.!]' '{ for (i=1;i<=NF;i++) { print $i } }' file
Set the field delimiters to ? or . or ! and then loop through each field and print the entry.
That's not how you use tr. Both arguments should be of the same length; otherwise the second one is extended to the length of the first by repeating its last character* (in this case, a space) to make one-by-one transliteration possible. In other words, given "? ! ." and "\n " as arguments, tr will replace ? with a line feed, and !, . and the spaces with a space. What you're looking for is, I guess:
$ tr '?!.' '\n' <file
you want to learn shell script
First, you want to learn Linux command
then
you can learn shell script
Or, more portably:
tr '?!.' '[\n*]' <file
*This is what GNU tr does, POSIX leaves the behavior unspecified when arguments aren't of the same length.
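To see that padding rule in action with GNU tr, compare the question's argument sets directly:
# SET1 '? ! .' has five characters, SET2 '\n ' only two, so GNU tr pads SET2
# with spaces: only '?' maps to a newline; '!', '.' and the spaces map to spaces.
echo 'learn shell? yes! done.' | tr '? ! .' '\n '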
In gnu-awk we can get it with the gensub() function:
awk '{print gensub(/([.?!]\s*)/, "\n", "g", $0)}' file
you want to learn shell script
First, you want to learn Linux command
then
you can learn shell script
Why limit yourself to the newline \n being the RS? Maybe something like this:
\056 is the period and \040 is the space. I'll add the + in case there has been the legacy practice of typing two spaces after each sentence and you want to standardize it.
I presume the question mark \077 is more frequent than the exclamation mark \041. The only reason I'm using octal throughout is that these are all characters that can wreak havoc on a terminal when there's even a slight chance of not quoting and escaping properly.
Unlike FS or RS, OFS/ORS are constant strings (aren't they?), so typing the characters in literally is safe.
The periods are taken care of by RS, so they need no special processing. If a row contains neither ? nor !, just print it as is and move on (that already handles the ". \n" case).
mawk 'BEGIN { RS = "[\056][\040]+"; ORS = ". \n"
              FS = "[\077][\040]+"; OFS = "? \n" }
      ($0 !~ /[\041\077]/) { print; next }
      /[\041]/ { gsub("[\041][\040]+", "\041 \n") }
      (NF == 1) || ($1 = $1)'
As fast as mawk is, a gsub() or $1=$1 still costs money, so skip the costly parts unless the line actually contains a ? or ! mark.
The last line is the fun trick, done outside the braces. The ! marks have already been handled on the previous line, so if no ? is found (i.e. NF is 1), the first test evaluates to true, awk short-circuits without evaluating the second part, and simply prints.
But if any ? marks were found, the assignment $1=$1 rebuilds the record in the new order, and because it's an assignment rather than an equality comparison, it evaluates to the assigned (non-empty) value, which also serves as the always-true condition that prints the record at the end.
Awk's record separator variable RS should do the trick.
echo 'you want to learn shell script? First, you want to learn Linux command! then. you can learn shell script.' |
awk 'BEGIN{RS="[?.!] "}1'

Bash Script with AWK output to CSV

I had amazing help on an AWK script here and thought to myself it would be really cool to have the exact same output I am monitoring on the CLI also go to a CSV file. I did some research and found a great answer that basically showed code like this:
awk '{print $1","$2","$3","$4","$5}' < /tmp/file.txt > /tmp/file.csv
The first issue is that /tmp/file.txt is not needed, as my code is already producing a string of separated values. I don't know whether my variables would still work if I ran a whole new AWK command, so I would prefer to just tack it onto the end of the previous AWK command if possible, but I don't know how to implement the same concept within the actual script I am using. Could anyone show me the formatting I would need to tag this onto the end of my script?
My ever-evolving script looks like this:
#!/bin/bash
CURRENT_DATE=`date +%Y-%m-%d`
tail -fn0 /var/log/pi-star/MMDVM-"$CURRENT_DATE".log | gawk '
match($0, /received.*voice header from ([[:alnum:]]+) to ([[:alnum:]]+ [0-9]+)/, a) {
    in_record = 1
    call_sign = a[1]
    channel = a[2]
}
in_record && match($0, /DMR ID: ([0-9]+)/, a) {
    dmr_id = a[1]
}
in_record && match($0, /([0-9.]+) seconds, ([0-9]+)% packet loss, BER: ([0-9.]+)%/, a) {
    in_record = 0
    print call_sign, channel, dmr_id, a[1], a[2], a[3]
}
' OFS=,
I still want to monitor via the terminal; I just think appending the output to a CSV would be the icing on the cake. Am I overthinking it? Should it just be a separate script? If so, how?
After posting the question with a better description on another thread, someone responded with the correct answer. He said that what I was seeing was awk buffering its output when it goes to a pipeline (since that's lower-overhead) but writing it immediately when it goes to a TTY. He went on to offer a solution: call fflush() from the awk program.
"Call fflush(): after your print command, add an extra fflush() command."
That fixed it. Thank you all for your efforts.
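For anyone gluing the two ideas together, here is a rough sketch of that fflush() fix combined with tee, so records still appear on the terminal while also being appended to a CSV. The log path and pattern below are placeholders, not the real MMDVM format:
tail -fn0 /var/log/example.log | gawk '
/voice header/ {
    print $1, $2, $3     # whatever fields you are collecting
    fflush()             # flush awk stdout so each record leaves the pipe immediately
}
' OFS=, | tee -a /tmp/records.csv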

fast alternative to grep file multiple times?

I currently use long piped bash commands to extract data from text files like this, where $f is my file:
result=$(grep "entry t $t " $f | cut -d ' ' -f 5,19 | \
sort -nk2 | tail -n 1 | cut -d ' ' -f 1)
I use a script that might do hundreds of similar searches of $f, sorting selected lines in various ways depending on what I'm pulling out. I like one-line bash strings with a bunch of pipes because they're compact and easy, but they can take forever. Can anyone suggest a faster alternative? Maybe something that loads the whole file into memory first?
Thanks
You might get a boost by doing the whole pipe with gawk, or another awk that has asorti(), like this:
contents="$(cat "$f")"
result="$(awk -vpattern="entry t $t" '$0 ~ pattern {matches[$5]=$19} END {asorti(matches,inds); print inds[1]}' <<<"$contents")"
This will read "$f" into a variable then we'll use a single awk command (well, gawk anyway) to do all the rest of the work. Here's how that works:
-vpattern="entry t $t": defines an awk variable named pattern that contains the shell variable t
$0 ~ pattern matches the current line against the pattern, if it matches we'll do the part in the braces, otherwise we skip it
matches[$5]=$19 adds an entry to an array (and creates the array if needed) where the key is the 5th field and the value is the 19th
END runs the following block after all the input has been processed
asorti(matches,inds) sorts the keys of matches, so inds ends up holding those keys (the $5 values) in sorted order
print inds[1] prints the first key in that sorted order (i.e., one of the $5 values from before)
<<<"$contents" have awk work on the value in the shell variable contents as though it were a file it was reading
Then you can just update the pattern for each query, avoid reading the file from disk each time, and avoid spawning so many extra processes for all the pipes (see the usage sketch below).
You'll have to benchmark to see if it's really faster or not though, and if performance is important you really should think about moving to a "proper" language instead of shell scripting.
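As a usage sketch of that idea, each query then reuses the in-memory copy instead of re-reading the file (the patterns below are just illustrative):
contents="$(cat "$f")"
# first query
r1="$(awk -v pattern="entry t 1 " '$0 ~ pattern {m[$5]=$19} END {asorti(m,i); print i[1]}' <<<"$contents")"
# second query against the same in-memory data, no extra disk read
r2="$(awk -v pattern="entry t 2 " '$0 ~ pattern {m[$5]=$19} END {asorti(m,i); print i[1]}' <<<"$contents")"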
Since you haven't provided sample input/output, this is just a guess; I only post it because other answers have already been posted that you should not use. This may be what you want instead of that one-liner:
result=$(awk -v t="$t" '
BEGIN { regexp = "entry t " t " " }
$0 ~ regexp {
if ( ($6 > maxKey) || (maxKey == "") ) {
maxKey = $6
maxVal = $5
}
}
END { print maxVal }
' "$f")
I suspect your real performance issue, however, isn't that script but that you are running it and maybe others inside a loop that you haven't shown us. If so, see why-is-using-a-shell-loop-to-process-text-considered-bad-practice and post a better example so we can help you.

how can I supply bash variables as fields for print in awk

I currently am trying to use awk to rearrange a .csv file that is similar to the following:
stack,over,flow,dot,com
and the output would be:
over,com,stack,flow,dot
(or any other order, just using this as an example)
and when it comes time to rearrange the csv file, I have been trying to use the following:
first='$2'
second='$5'
third='$1'
fourth='$3'
fifth='$4'
awk -v a=$first -v b=$second -v c=$third -v d=$fourth -v e=$fifth -F '^|,|$' '{print $a,$b,$c,$d,$e}' somefile.csv
with the intent that awk/print would interpret $a, $b, $c, etc. as field numbers, so it would come out to the following:
{print $2,$5,$1,$3,$4}
and print out the fields of the CSV file in that order. Unfortunately, I have not been able to get this to work correctly yet. I've tried several different methods, this one seeming the most promising, but no solution has worked so far. Could anyone give any suggestions or point out my flaw? I'm stumped at this point, and any help would be much appreciated. Thanks!
Use simple numbers:
first='2'
second='5'
third='1'
fourth='3'
fifth='4'
awk -v a=$first -v b=$second -v c=$third -v d=$fourth -v e=$fifth -F '^|,|$' \
'{print $a, $b, $c, $d, $e}' somefile.csv
Another way with a shorter example:
aa='$2'
bb='$1'
cc='$3'
awk -F '^|,|$' "{print $aa,$bb,$cc}" somefile.csv
You already got the answer to your specific question, but have you considered just specifying the order as a string instead of as individual fields? For example:
order="2 5 1 3 4"
awk -v order="$order" '
BEGIN{ FS=OFS=","; n=split(order,a," ") }
{ for (i=1;i<n;i++) printf "%s%s",$(a[i]),OFS; print $(a[i]) }
' somefile.csv
That way if you want to add/delete fields or change the order you just trivially rearrange the numbers in the first line instead of having to mess with a bunch of hard-coded variables, etc.
Note that I changed your FS, as there was no need for it to be that complicated. Also, you don't need the shell variable "order"; you could just populate the awk variable of the same name explicitly. I only started with a shell variable because you had started with shell variables, so maybe you have a reason.
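For example, against the sample line from the question, that command prints the reordered record:
$ echo 'stack,over,flow,dot,com' | awk -v order="2 5 1 3 4" '
  BEGIN{ FS=OFS=","; n=split(order,a," ") }
  { for (i=1;i<n;i++) printf "%s%s",$(a[i]),OFS; print $(a[i]) }'
over,com,stack,flow,dot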

How to parse a CSV in a Bash script?

I am trying to parse a CSV containing potentially 100k+ lines. Here are the criteria I have:
The index of the identifier
The identifier value
I would like to retrieve all lines in the CSV that have the given value in the given index (delimited by commas).
Any ideas, taking in special consideration for performance?
As an alternative to cut- or awk-based one-liners, you could use the specialized csvtool aka ocaml-csv:
$ csvtool -t ',' col "$index" - < csvfile | grep "$value"
According to the docs, it handles escaping, quoting, etc.
See this youtube video: BASH scripting lesson 10 working with CSV files
CSV file:
Bob Brown;Manager;16581;Main
Sally Seaforth;Director;4678;HOME
Bash script:
#!/bin/bash
OLDIFS=$IFS
IFS=";"
while read user job uid location
do
echo -e "$user \
======================\n\
Role :\t $job\n\
ID :\t $uid\n\
SITE :\t $location\n"
done < $1
IFS=$OLDIFS
Output:
Bob Brown ======================
Role : Manager
ID : 16581
SITE : Main
Sally Seaforth ======================
Role : Director
ID : 4678
SITE : HOME
First prototype using plain old grep and cut:
grep "${VALUE}" inputfile.csv | cut -d, -f"${INDEX}"
If that's fast enough and gives the proper output, you're done.
CSV isn't quite that simple. Depending on the limits of the data you have, you might have to worry about quoted values (which may contain commas and newlines) and escaping quotes.
So if your data are restricted enough to get away with simple comma-splitting, a shell script can do that easily. If, on the other hand, you need to parse CSV ‘properly’, bash would not be my first choice. Instead I'd look at a higher-level scripting language, for example Python with its csv.reader.
In a CSV file, each field is separated by a comma. The problem is, a field itself might have an embedded comma:
Name,Phone
"Woo, John",425-555-1212
You really need a library package that offers robust CSV support instead of relying on the comma as a field separator. I know that scripting languages such as Python have such support. However, I am comfortable with the Tcl scripting language, so that is what I use. Here is a simple Tcl script which does what you are asking for:
#!/usr/bin/env tclsh
package require csv
package require Tclx
# Parse the command line parameters
lassign $argv fileName columnNumber expectedValue
# Subtract 1 from columnNumber because Tcl's list index starts with a
# zero instead of a one
incr columnNumber -1
for_file line $fileName {
    set columns [csv::split $line]
    set columnValue [lindex $columns $columnNumber]
    if {$columnValue == $expectedValue} {
        puts $line
    }
}
Save this script to a file called csv.tcl and invoke it as:
$ tclsh csv.tcl filename indexNumber expectedValue
Explanation
The script reads the CSV file line by line and stores each line in the variable $line, then splits the line into a list of columns (variable $columns). Next, it picks out the specified column and assigns it to the $columnValue variable. If there is a match, it prints out the original line.
Using awk:
export INDEX=2
export VALUE=bar
awk -F, '$'$INDEX' ~ /^'$VALUE'$/ {print}' inputfile.csv
Edit: As per Dennis Williamson's excellent comment, this could be much more cleanly (and safely) written by defining awk variables using the -v switch:
awk -F, -v idx="$INDEX" -v val="$VALUE" '$idx == val {print}' inputfile.csv
Jeez...with variables, and everything, awk is almost a real programming language...
For situations where the data does not contain any special characters, the solution suggested by Nate Kohl and ghostdog74 is good.
If the data contains commas or newlines inside the fields, awk may not properly count the field numbers and you'll get incorrect results.
You can still use awk, with some help from a program I wrote called csvquote (available at https://github.com/dbro/csvquote):
csvquote inputfile.csv | awk -F, -v idx="$INDEX" -v val="$VALUE" '$idx == val {print}' | csvquote -u
This program finds special characters inside quoted fields, and temporarily replaces them with nonprinting characters which won't confuse awk. Then they get restored after awk is done.
index=1
value=2
awk -F"," -v i=$index -v v=$value '$(i)==v' file
I was looking for an elegant solution that supports quoting and wouldn't require installing anything fancy on my VMware vMA appliance. It turns out this simple Python script does the trick! (I named the script csv2tsv.py, since it converts CSV into tab-separated values, TSV.)
#!/usr/bin/env python
import sys, csv
with sys.stdin as f:
    reader = csv.reader(f)
    for row in reader:
        for col in row:
            print col+'\t',
        print
Tab-separated values can be split easily with the cut command (no delimiter needs to be specified, tab is the default). Here's a sample usage/output:
> esxcli -h $VI_HOST --formatter=csv network vswitch standard list |csv2tsv.py|cut -f12
Uplinks
vmnic4,vmnic0,
vmnic5,vmnic1,
vmnic6,vmnic2,
In my scripts I'm actually going to parse tsv output line by line and use read or cut to get the fields I need.
Parsing CSV with primitive text-processing tools will fail on many types of CSV input.
xsv is a lovely and fast tool for doing this properly. To search for all records that contain the string "foo" in the third column:
cat file.csv | xsv search -s 3 foo
A sed or awk solution would probably be shorter, but here's one for Perl:
perl -F/,/ -ane 'print if $F[<INDEX>] eq "<VALUE>"'
where <INDEX> is 0-based (0 for first column, 1 for 2nd column, etc.)
Awk (gawk) actually provides extensions, one of which is CSV processing.
Assuming that extension is installed, you can use awk to show all lines where a specific csv field matches 123.
Assuming test.csv contains the following:
Name,Phone
"Woo, John",425-555-1212
"James T. Kirk",123
The following will print all lines where the Phone (aka the second field) is equal to 123:
gawk -l csv 'csvsplit($0,a) && a[2] == 123 {print $0}' test.csv
The output is:
"James T. Kirk",123
How does it work?
-l csv asks gawk to load the csv extension by looking for it in $AWKLIBPATH;
csvsplit($0, a) splits the current line, and stores each field into a new array named a
&& a[2] == 123 checks that the second field is 123
if both conditions are true, it runs { print $0 }, i.e. prints the full line, as requested.
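To tie this back to the question's index/value form, a parameterized sketch along the same lines (assuming the same gawk csv extension is available; idx and val are just awk variable names chosen here):
gawk -l csv -v idx="$INDEX" -v val="$VALUE" 'csvsplit($0, a) && a[idx] == val' inputfile.csv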
