Use awk to convert GPS Position to Latitude & Longitude - bash

I am writing a bash script that renames files based on EXIF headers. exiftool returns the following GPS Position string, which I need to format into Latitude/Longitude coordinates for use with Google Maps API.
GPS Position : 40 deg 44' 49.36" N, 73 deg 56' 28.18" W
Google Maps: 40.7470444,-073.9411611
This is my code:
awk -v FS="[ \t]" '{print $0,substr($1,length($1),1)substr($2,length($2),1)}' $1 \
| sed 's/\xc2\xb0\([0-9]\{1,2\}\).\([NEWS]\)/ \1 0 \2/g;s/\xc2\xb0\([NEWS]\)/ 0 0 \1/g;s/[^0-9NEWS]/ /g' \
| awk '{if ($9=="NE") {printf ("%.4f\t%.4f\n",$1+$2/60+$3/3600,$5+$6/60+$7/3600)} \
else if ($9=="NW") {printf ("%.4f\t%.4f\n",$1+$2/60+$3/3600,-($5+$6/60+$7/3600))} \
else if ($9=="SE") {printf ("%.4f\t%.4f\n",-($1+$2/60+$3/3600),$5+$6/60+$7/3600)} \
else if ($9=="SW") {printf ("%.4f\t%.4f\n",-($1+$2/60+$3/3600),-($5+$6/60+$7/3600))}}'
I’m getting this error:
sed: RE error: illegal byte sequence
What I need is a valid awk command to strip the “deg” and NSEW text, and divide by 3600 and 60 per this post:
how to convert gps north and gps west to lat/long in objective c
40 deg 44' 49.36" N, 73 deg 56' 28.18" W > 40.7470444,-073.9411611
Please help!

For this particular case, I would write it as below if special characters were a problem. Of course, it has the disadvantage that the error checks are not as stringent.
# remove the string till the colon and characters other than numbers, dots, spaces and NSEW
sed 's!^.*:\|[^0-9\. NSEW]!!g' filename |
# calculate the latitude and longitude with some error checks
awk '/^\s*([0-9.]+\s+){3}[NS]\s+([0-9.]+\s+){3}[EW]\s*$/ {
lat=($1+$2/60+$3/3600); if ($4 == "S") lat=-lat;
lon=($5+$6/60+$7/3600); if ($8 == "W") lon=-lon;
printf("%.4f,%.4f\n", lat, lon); next }
{ print "Error on line " NR; exit 1 }'
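The same conversion can also be sketched with awk alone. This assumes every input line looks like the exiftool sample in the question; the function name dms_to_decimal is made up for the example. Running awk under LC_ALL=C makes it treat the input as plain bytes, which is a common way to sidestep "illegal byte sequence" complaints about non-UTF-8 input.

```shell
# Hypothetical helper: reads exiftool-style "GPS Position" lines on stdin
# and prints "latitude,longitude" in decimal degrees.
dms_to_decimal() {
  LC_ALL=C awk '
    {
      line = $0
      sub(/^[^:]*:/, "", line)           # drop the "GPS Position :" label
      gsub(/[^0-9. NSEW]/, " ", line)    # keep digits, dots, spaces and hemisphere letters
      n = split(line, f, " ")
      if (n != 8) { print "Error on line " NR > "/dev/stderr"; next }
      lat = f[1] + f[2]/60 + f[3]/3600; if (f[4] == "S") lat = -lat
      lon = f[5] + f[6]/60 + f[7]/3600; if (f[8] == "W") lon = -lon
      printf "%.7f,%.7f\n", lat, lon
    }'
}
```

Used as exiftool ... | dms_to_decimal, the sample line from the question gives 40.7470444,-73.9411611. South and west come out negative; the leading zero shown in the question is not reproduced.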

Here it is in PHP:
// split on runs of whitespace so that removing "deg" leaves no empty elements
$parts = preg_split('/\s+/', trim(str_replace(array("deg",",","'","\""),"",$argv[1])));
$lat_deg = $parts[0];
$lat_min = $parts[1];
$lat_sec = $parts[2];
$lat_dir = $parts[3];
$lon_deg = $parts[4];
$lon_min = $parts[5];
$lon_sec = $parts[6];
$lon_dir = $parts[7];
// southern latitudes and western longitudes are negative
if ($lat_dir == "N") {
    $lat_sin = "+";
} else {
    $lat_sin = "-";
}
if ($lon_dir == "E") {
    $lon_sin = "+";
} else {
    $lon_sin = "-";
}
$latitude = $lat_sin.($lat_deg+($lat_min/60)+($lat_sec/3600));
$longitude = $lon_sin.($lon_deg+($lon_min/60)+($lon_sec/3600));
echo substr($latitude,0,-5).",".substr($longitude,0,-5);


Textdeskew bad array subscript

I am getting the above error when using Fred's ImageMagick textdeskew script. The error looks like this:
awk: line 38: syntax error at or near *
/home/work/textdeskew: line 468: regression_Arr: bad array subscript
/home/work/textdeskew: line 474: regression_Arr: bad array subscript
The lines the errors fall on look like this:
angle=`echo ${regression_Arr[rnum-1]} | cut -d: -f2` # line 468
# set rotation to be correct for -90<=angle<90 (+90 will be upside down)
rotation=`convert xc: -format "%[fx:$angle<0?-($angle+90):-($angle-90)]" info:`
rotation=`convert xc: -format "%[fx:abs($rotation)<0.00001?0:$rotation]" info:`
# remove outliers, if res_ave > res_thresh
res_ave=`echo ${regression_Arr[rnum-7]} | cut -d: -f2` # line 474
I'm assuming the error occurs because rnum is 0, but I'm unsure how to read and debug the script to resolve it, and this may not even be the cause. Here is where rnum and regression_Arr are declared:
regression_Arr=(`echo "${Arr[*]}" | awk \
'BEGIN { FS = ","; RS = " "; pi = atan2(0, -1); }
NF == 2 { x_sum += $1
    y_sum += $2
    xy_sum += $1*$2
    x2_sum += $1*$1
    y2_sum += $2*$2
    num += 1
    x[NR] = $1
    y[NR] = $2
}
END { mean_x = x_sum / num
    mean_y = y_sum / num
    for (i = 1; i <= num; i++) {
        delx = (x[i]-mean_x)
        dely = (y[i]-mean_y)
        numerator += delx*dely
        denominator += dely*dely - delx*delx
    }
    phi = 0.5*atan2(-2*numerator,denominator)
    r = mean_x*cos(phi)+mean_y*sin(phi)
    if ( sqrt(phi*phi) < 0.0001 ) {
        angle = -90
    }
    else {
        slope = -cos(phi)/sin(phi)
        inter = r/sin(phi)
        angle = (180/pi)*atan2(slope,1)
    }
    for (j = 1; j <= num; j++) {
        delr = (x[j]*cos(phi)+y[j]*sin(phi)-r)
        res_sq = delr*delr
        sum_res_sq += res_sq
        res = sqrt(delr*delr)
        sum_res += res
        print "Residual"j":"res
    }
    res_ave = sum_res/num
    res_std = sqrt((sum_res_sq/num)-(sum_res/num)**2)
    print "res_ave:"res_ave
    print "res_std:"res_std
    print "phi:"phi
    print "r:"r
    print "Slope:"slope
    print "Intercept:"inter
    print "Angle:"angle
}'`)
rnum=${#regression_Arr[*]}
if $debug; then
    echo ""
    echo "rnum=$rnum;"
    # list regression data
    for ((ii=0; ii<rnum; ii++)); do
        echo "${regression_Arr[$ii]}"
    done
fi
I wonder if this script used to work and now doesn't due to updates in the code?
This was fixed by installing gawk and then, following a new error, bc. Huge thanks to #fmw42 for helping me through this on the ImageMagick forums.
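For anyone hitting the same message: POSIX awk defines ^ for exponentiation, while ** is an extension that some awk implementations (mawk among them, as far as I know) reject. That would explain both the "syntax error at or near *" and why installing gawk helped. When the awk program fails, regression_Arr presumably stays empty, rnum is 0, and the negative subscripts rnum-1 and rnum-7 then trigger "bad array subscript". Below is a minimal sketch of the residual standard deviation written with ^ only; res_std is a made-up helper name, not part of the textdeskew script.

```shell
# Hypothetical helper: reads one residual per line on stdin and prints the
# population standard deviation, using the POSIX "^" operator instead of "**".
res_std() {
  awk '
    { sum += $1; sum_sq += $1 * $1; n++ }
    END {
      if (n == 0) exit 1                  # no data: fail instead of dividing by zero
      printf "%.4f\n", sqrt(sum_sq / n - (sum / n) ^ 2)
    }'
}
```

For example, printf '1\n2\n3\n' | res_std prints 0.8165.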

AWK performance tune for writing multiple files

gawk -v ff=${fileB} '
/^1017/ { print $0 >> ff; next; }
!(/^#/||/^1016/||/^1018/||/^1013/||/^1014/||/^1013/||/^1014/) {
    f=substr($0,11,2)".csv"; print $0 >>"../../" f;
}
' ${csvfiles}
The big file contains about 20 million lines, and we have to read each one. If a line starts with 1017, it is written to fileB regardless of its content.
If it does not start with one of the skip-list prefixes above (1016, 1013, ...), it is written to a file whose name is taken from the line content. For example, the line
1010,abcdefg,123453,343,3434, is written to fg.csv: we take a substring of the second column, here fg.
The problem is that throughput is only about 35k lines per second. Is it possible to make it faster?
sample input
Exclusion List 1016 1013 ..
Include line number 1010,1017...
sample output
Try this:
gawk -v ff="$fileB" '
!/^(#|10(1[6834]|24|55))/{ print > (/^1017/ ? ff : "../../" substr($0,20,2) ".csv") }
' "$csvfiles"
This MAY speed things up if all the time is being spent on file opens/closes:
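awk '!/^(#|10(1[6834]|24|55))/{print substr($0,20,2), $0}' "$csvfiles" |
sort -t ' ' |
awk -v ff="$fileB" '
{
    curr = substr($0,1,2)   # two-character key added by the first awk
    str = substr($0,4)      # original line, after the key and the separating space
    if ( index(str,"1017") == 1 ) {
        print str > ff      # 1017 lines always go to fileB
        next
    }
    if ( curr != prev ) {
        if ( out != "" ) close(out)   # keep only one per-key file open at a time
        out = "../../" curr ".csv"
        prev = curr
    }
    print str > out
}
'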
I'm really not sure if it'll be any faster, but it might be due to the simpler regexp, and at least it's concise.
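Before timing anything on 20 million lines, it may help to check the routing rule on a handful of lines. The sketch below assumes the skip list from the question (#, 1013, 1014, 1016, 1018) and the substr($0,11,2) key from the question's own code; route_lines and its two parameters are made up for the example.

```shell
# Hypothetical helper: $1 = file that receives 1017 lines,
# $2 = directory that receives the per-key .csv files; input comes from stdin.
route_lines() {
  awk -v ff="$1" -v dir="$2" '
    /^1017/          { print > ff; next }   # always routed to the fixed file
    /^(#|101[3468])/ { next }               # skip list: comments, 1013, 1014, 1016, 1018
                     { print > (dir "/" substr($0, 11, 2) ".csv") }
  '
}
```

For instance, route_lines /tmp/b.txt /tmp/out < sample.txt should put the line 1010,abcdefg,123453,343,3434, into /tmp/out/fg.csv, provided /tmp/out already exists.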

How to use awk or anything else to count the number of shared x values of 2 different y values in a csv file that consists of columns a and b?

Let me be specific. We have a csv file consisting of 2 columns, x and y, like this:
And we want to count how many shared x values two y values have, which means we want to get this:
And b6,a2,2 should not show up. Does anyone know how to do this by awk? Or anything else?
Thx ahead!
Try this executable awk script:
#!/usr/bin/awk -f
BEGIN { FS=OFS="," }
NR==1 { print "y1" OFS "y2" OFS "share" }
# collect the y values seen for each x as a comma-separated list
NR>1 {last=a[$1]; a[$1]=(last!=""?last",":"")$2}
END {
    for(x in a) {
        cnt = split(a[x], arr, FS)
        if( cnt>1 ) {
            for(k=1;k<cnt;k++) {
                for(i=k+1;i<=cnt;i++) {
                    if( arr[k] != arr[i] ) {
                        key=arr[k] OFS arr[i]
                        if(out[key]=="") {order[++ocnt]=key}
                        out[key]++
                    }
                }
            }
        }
    }
    for(i=1;i<=ocnt;i++) {
        print order[i] OFS out[order[i]]
    }
}
When put into a file called awko and made executable, running it like awko data yields:
I'm assuming the file is sorted by the y values in the second column, as in the question (after the header). If it works for you, I'll add some explanations tomorrow.
Additionally for anyone who wants more test data, here's a silly executable awk script for generating some data similar to what's in the question. Makes about 10K lines when run like gen.awk.
#!/usr/bin/awk -f
function randInt(max) {
    return( int(rand()*max)+1 )
}
BEGIN {
    a[1]="a"; a[2]="b"; a[3]="c"; a[4]="d"; a[5]="e"; a[6]="f"
    a[7]="g"; a[8]="h"; a[9]="i"; a[10]="j"; a[11]="k"; a[12]="l"
    a[13]="m"; a[14]="n"; a[15]="o"; a[16]="p"; a[17]="q"; a[18]="r"
    a[19]="s"; a[20]="t"; a[21]="u"; a[22]="v"; a[23]="w"; a[24]="x"
    a[25]="y"; a[26]="z"
    print "x,y"
    for(i=1;i<=26;i++) {
        amultiplier = randInt(1000) # vary this to change the output size
        r = randInt(amultiplier)
        anum = 1
        for(j=1;j<=amultiplier;j++) {
            if( j == r ) { anum++; r = randInt(amultiplier) }
            print a[randInt(26)] randInt(5) "," a[i] anum
        }
    }
}
I think if you can get the input into a form like this, it's easy:
1h a2 b6
2e a2
4f a2 b6 c9
7v a2 c9
In fact, you don't even need the x value. You can convert this:
a2 b6
a2 b6 c9
a2 c9
Into this:
a2,b6
a2,b6
a2,c9
b6,c9
a2,c9
That output can be sorted and piped to uniq -c to get approximately the output you want, so we only need to think much about how to get from your input to the first and second states. Once we have those, the final step is easy.
Step one:
sort /tmp/values.csv \
| awk '
BEGIN { FS="," }
{
    if (x != $1) {
        if (x) print values
        x = $1
        values = $2
    } else {
        values = values " " $2
    }
}
END { print values }
' \
Step two:
| awk '
{
    for (i = 1; i < NF; ++i) {
        for (j = i+1; j <= NF; ++j) {
            print $i "," $j
        }
    }
}
' \
Step three:
| sort | awk '
BEGIN {
    combination = $0
    print "y1,y2,share"
}
{
    if (combination == $0) {
        count = count + 1
    } else {
        if (count) print combination "," count
        count = 1
        combination = $0
    }
}
END { print combination "," count }
'
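Put together, the three steps can be sketched as one function. This version assumes a header line x,y, skips it with tail, sorts on both columns so that the y values of each x arrive in a fixed order, and uses uniq -c for the counting step mentioned above. The function name count_shared is made up, and the sample rows in the usage note are reconstructed from the intermediate form shown earlier (1h a2 b6, 2e a2, 4f a2 b6 c9, 7v a2 c9), so treat them as an assumption about the original data.

```shell
# Hypothetical helper: stdin is a CSV with header "x,y";
# stdout is "y1,y2,share" followed by one row per pair of y values.
count_shared() {
  tail -n +2 | sort -t, -k1,1 -k2,2 | awk -F, '
    # step one: one output line per x, listing its y values
    $1 != x { if (NR > 1) print values; x = $1; values = $2; next }
            { values = values " " $2 }
    END     { if (NR > 0) print values }
  ' | awk '
    # step two: every unordered pair of y values on a line
    { for (i = 1; i < NF; ++i) for (j = i + 1; j <= NF; ++j) print $i "," $j }
  ' | sort | uniq -c | awk '
    # step three: move the count produced by uniq -c to the last column
    BEGIN { print "y1,y2,share" }
          { print $2 "," $1 }
  '
}
```

With the eight rows 1h,a2 1h,b6 2e,a2 4f,a2 4f,b6 4f,c9 7v,a2 7v,c9 under an x,y header, this prints a2,b6,2 then a2,c9,2 then b6,c9,1. Because the y values are sorted within each x, a reversed pair such as b6,a2 never appears.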
This awk script does the job:
BEGIN { FS=OFS="," }
NR==1 { print "y1","y2","share" }
NR>1 { ++seen[$1,$2]; ++x[$1]; ++y[$2] }
END {
    for (y1 in y) {
        for (y2 in y) {
            if (y1 != y2 && !(y2 SUBSEP y1 in c)) {
                for (i in x) {
                    if (seen[i,y1] && seen[i,y2]) {
                        ++c[y1,y2]
                    }
                }
            }
        }
    }
    for (key in c) {
        split(key, a, SUBSEP)
        print a[1],a[2],c[key]
    }
}
Loop through the input, recording both the original elements and the combinations. Once the file has been processed, look at each pair of y values. The if statement does two things: it prevents equal y values from being compared and it saves looping through the x values twice for every pair. Shared values are stored in c.
Once the shared values have been aggregated, the final output is printed.
This sed script does the trick:
echo y1,y2,share
x=$(wc -l < file)
b=$(echo "$x -2" | bc)
for i in $(eval echo "{2..$b}")
var_x_1=$(sed -n ''"$i"p'' file | sed 's/,.*//')
var_y_1=$(sed -n ''"$i"p'' file | sed 's/.*,//')
a=$(echo "$i + 1" | bc)
for j in $(eval echo "{$a..$x}")
var_x_2=$(sed -n ''"$j"p'' file | sed 's/,.*//')
var_y_2=$(sed -n ''"$j"p'' file | sed 's/.*,//')
if [ "$var_x_1" = "$var_x_2" ] ; then
index=$(echo "$index + 1" | bc)
for (( k=1; k<$index; k++ ))
if [ ${array[k]} = ${array[k-1]} ] ; then
counter=$(echo "$counter + 1" | bc)
echo ${array[k-1]},$counter
if [ "$k" = $(echo "$index-1"|bc) ] && [ $counter = 1 ]; then
echo ${array[k]},$counter

Parsing iw wlan0 scan output

I wrote a WLAN manager script to handle open/ad-hoc/WEP/WPA2 networks. Now I'm trying to parse the output of iw wlan0 scan to add a nice scan feature to my script. My goal is to get output like this:
SSID channel signal encryption
wlan-ap 6 70% wpa2-psk
test 1 55% wep
What I have achieved so far is output like this:
$ iw wlan0 scan | grep 'SSID\|freq\|signal\|capability' | tac
SSID: Koti783
signal: -82.00 dBm
capability: ESS Privacy ShortPreamble SpectrumMgmt ShortSlotTime (0x0531)
freq: 2437
I have been studying bash/sed/awk but haven't yet found a way to achieve this. What is a good way to do it?
Here is my final solution based on Sudo_O's answer:
$1 == "BSS" {
    MAC = $2
    wifi[MAC]["enc"] = "Open"
}
$1 == "SSID:" {
    wifi[MAC]["SSID"] = $2
}
$1 == "freq:" {
    wifi[MAC]["freq"] = $NF
}
$1 == "signal:" {
    wifi[MAC]["sig"] = $2 " " $3
}
$1 == "WPA:" {
    wifi[MAC]["enc"] = "WPA"
}
$1 == "WEP:" {
    wifi[MAC]["enc"] = "WEP"
}
END {
    printf "%s\t\t%s\t%s\t\t%s\n","SSID","Frequency","Signal","Encryption"
    for (w in wifi) {
        printf "%s\t\t%s\t\t%s\t%s\n",wifi[w]["SSID"],wifi[w]["freq"],wifi[w]["sig"],wifi[w]["enc"]
    }
}
$ sudo iw wlan0 scan | awk -f scan.awk
SSID Frequency Signal Encryption
netti 2437 -31.00 dBm Open
Koti783 2437 -84.00 dBm WPA
WLAN-AP 2462 -85.00 dBm WPA
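The output above still shows a frequency and a dBm value rather than the channel and percentage asked for in the question. A rough sketch of the two conversions follows. The channel formula is the usual 2.4 GHz and 5 GHz numbering; the percentage is only a rule of thumb that maps -100 dBm to 0% and -50 dBm to 100%, which is an assumption and not something iw reports. Both function names are made up.

```shell
# Hypothetical helper: $1 = centre frequency in MHz, as on iw's "freq:" line.
freq_to_channel() {
  awk -v f="$1" 'BEGIN {
    if (f == 2484)                    ch = 14
    else if (f >= 2412 && f <= 2472)  ch = (f - 2407) / 5
    else if (f >= 5160 && f <= 5885)  ch = (f - 5000) / 5
    else                              ch = "?"     # band not covered by this sketch
    print ch
  }'
}

# Hypothetical helper: $1 = signal level in dBm, e.g. -82.00.
# Assumed linear scale: -100 dBm -> 0%, -50 dBm -> 100%, clamped at both ends.
dbm_to_percent() {
  awk -v d="$1" 'BEGIN {
    p = 2 * (d + 100)
    if (p < 0)   p = 0
    if (p > 100) p = 100
    printf "%d%%\n", p
  }'
}
```

For the sample values in the question, freq_to_channel 2437 prints 6 and dbm_to_percent -82.00 prints 36%.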
It's generally bad practice to parse the complex output of programs that is intended for humans to read (rather than for machines to parse).
E.g. the output of iw might change depending on the language settings of the system and/or the version of iw, leaving you with a "manager" that only works on your development machine.
Instead, you might use the same interface that iw uses to get its information: the library backend libnl.
You might also want to have a look at the wireless-tools (iwconfig, iwlist, ...) that use the libiw library.
Here is a GNU awk script to get you going that grabs the SSIDs and the channel for each unique BSS:
/^BSS / {
    MAC = $2
}
/SSID/ {
    wifi[MAC]["SSID"] = $2
}
/primary channel/ {
    wifi[MAC]["channel"] = $NF
}
# Insert new block here
END {
    printf "%s\t\t%s\n","SSID","channel"
    for (w in wifi) {
        printf "%s\t\t%s\n",wifi[w]["SSID"],wifi[w]["channel"]
    }
}
It should be easy for you to add the new blocks for signal and encryption considering all the studying you have been doing.
Save the script to file such as wifi.awk and run like:
$ sudo iw wlan0 scan | awk -f wifi.awk
The output will be in the requested format:
SSID channel
wlan-ap 6
test 1
Here is a simple Bash function which uses exclusively Bash internals and spawns only one sub-shell:
function iwScan() {
    # disable globbing to avoid surprises
    set -o noglob
    # make temporary variables local to our function
    local AP S
    # read stdin of the function into AP variable
    while read -r AP; do
        ## print lines only containing needed fields
        [[ "${AP//'SSID: '*}" == '' ]] && printf '%b' "${AP/'SSID: '}\n"
        [[ "${AP//'signal: '*}" == '' ]] && ( S=( ${AP/'signal: '} ); printf '%b' "${S[0]},";)
    done
    set +o noglob
}
iwScan <<< "$(iw wlan0 scan)"
-66.00,FRITZ!Box 7312
The function can be easily modified to provide additional fields by adding a necessary filter into the while read -r AP while-loop, eg:
[[ "${AP//'last seen: '*}" == '' ]] && ( S=( ${AP/'last seen: '} ); printf '%b' "${S[0]},";)
-64.00,1000,FRITZ!Box 7312
I am using this solution on OpenWrt:
sudo iw dev wlan0 scan | awk -f wlan_scan.awk | sort
/^BSS/ {
    mac = gensub ( /^BSS[[:space:]]*([0-9a-fA-F:]+).*?$/, "\\1", "g", $0 );
}
/^[[:space:]]*signal:/ {
    signal = gensub ( /^[[:space:]]*signal:[[:space:]]*(\-?[0-9.]+).*?$/, "\\1", "g", $0 );
}
/^[[:space:]]*SSID:/ {
    ssid = gensub ( /^[[:space:]]*SSID:[[:space:]]*([^\n]*).*?$/, "\\1", "g", $0 );
    printf ( "%s %s %s\n", signal, mac, ssid );
}
-62.00 c8:64:c7:54:d9:05 a
-72.00 70:72:3c:1c:af:17 b
-81.00 78:f5:fd:be:33:cb c
There is a bug in the awk script above.
The following code will not work if the SSID has spaces in the name. The received result will be the first token of the SSID name only.
$1 == "SSID:" {
    wifi[MAC]["SSID"] = $2
}
When printing $0, $1, $2:
$0: SSID: DIRECT-82-HP OfficeJet 8700
$1: SSID:
$2: DIRECT-82-HP
One possible solution is to take a substr of $0, which contains leading spaces, the token "SSID: " and the provided multi-token network name.
Any other suggestions?
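One way to do that, sketched in plain awk without gawk's arrays of arrays: copy $0, strip everything up to and including "SSID: " with sub(), and keep the rest as the name. The function name parse_scan and the tab-separated output format are made up for the example, and the input layout is assumed to match the iw lines quoted in this thread.

```shell
# Hypothetical helper: reads "iw ... scan" output on stdin and prints
# SSID, frequency and signal level, tab-separated, one line per BSS.
parse_scan() {
  awk '
    # a new BSS starts here; the "BSS Load:" information line is not one
    $1 == "BSS" && $2 !~ /^Load:/ { mac = $2; sub(/\(.*/, "", mac); order[++n] = mac }
    $1 == "freq:"   { freq[mac] = $2 }
    $1 == "signal:" { sig[mac] = $2 }
    # keep the whole remainder of the line so that spaces in the SSID survive
    $1 == "SSID:"   { s = $0; sub(/^[ \t]*SSID: ?/, "", s); ssid[mac] = s }
    END {
      for (i = 1; i <= n; i++) {
        m = order[i]
        printf "%s\t%s\t%s\n", ssid[m], freq[m], sig[m]
      }
    }'
}
```

Run as sudo iw wlan0 scan | parse_scan; an SSID such as DIRECT-82-HP OfficeJet 8700 is then printed in full.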
I've taken the awk code from Ari Malinen and reworked it a bit, because the iw output is not stable and changes, and there are other issues such as spaces in the SSID. I put it on GitHub in case I change it in the future.
#!/usr/bin/env awk -f
$1 ~ /^BSS/ {
    if($2 !~ /Load:/) { #< Escape "BSS Load:" line
        gsub("(\\(.*|:)", "", $2)
        MAC = toupper($2)
        wifi[MAC]["enc"] = "OPEN"
        wifi[MAC]["WPS"] = "no"
        wifi[MAC]["wpa1"] = ""
        wifi[MAC]["wpa2"] = ""
        wifi[MAC]["wep"] = ""
    }
}
$1 == "SSID:" {
    # Workaround spaces in SSID
    FS=":" #< Changing field separator on ":", it should be
           # forbidden character for SSID name
    $0=$0  #< re-split the current line with the new FS
    sub(" ", "", $2) #< remove first whitespace
    wifi[MAC]["SSID"] = $2
    FS=" "
    $0=$0  #< back to whitespace-separated fields
}
$1 == "capability:" {
    for(i=2; i<=NF; i++) {
        if($i ~ /0x[0-9]{4}/) {
            gsub("(\\(|\\))", "", $i)
            if (and(strtonum($i), 0x10))
                wifi[MAC]["wep"] = "WEP"
        }
    }
}
$1 == "WPA:" {
    wifi[MAC]["wpa1"] = "WPA1"
}
$1 == "RSN:" {
    wifi[MAC]["wpa2"] = "WPA2"
}
$1 == "WPS:" {
    wifi[MAC]["WPS"] = "yes"
}
$1 == "DS" {
    wifi[MAC]["Ch"] = $5
}
$1 == "signal:" {
    match($2, /-([0-9]{2})\.00/, m)
    wifi[MAC]["Sig"] = m[1]
}
$1 == "TSF:" {
    gsub("(\\(|d|,)", "", $4)
    match($5, /([0-9]{2}):([0-9]{2}):/, m)
    day = $4
    hour = m[1]
    min = m[2]
    wifi[MAC]["TSF"] = day"d"hour"h"min"m"
}
END {
    for (w in wifi) {
        if (wifi[w]["wep"]) {
            if (wifi[w]["wpa1"] || wifi[w]["wpa2"])
                wifi[w]["enc"] = wifi[w]["wpa1"]wifi[w]["wpa2"]
            else
                wifi[w]["enc"] = "WEP"
        }
        printf "%s:%s:%s:%s:%s:%s:%s\n", w, wifi[w]["SSID"], wifi[w]["enc"], \
            wifi[w]["WPS"], wifi[w]["Ch"], wifi[w]["Sig"], wifi[w]["TSF"]
    }
}
If you wonder what if($2 !~ /Load:/) does: on some routers the output contains a "BSS Load:" line, which must not be treated as the start of a new BSS.

Extracting multiple parts of a string using bash

I have a caret delimited (key=value) input and would like to extract multiple tokens of interest from it.
For example: Given the following input
$ echo -e "1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2"
I would like the following output
I have tried the following
$ echo -e "1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2"|egrep -o "35=[^/^]*\^|150=[^/^]*\^"
My problem is that egrep returns each match on a separate line. Is it possible to get one line of output for one line of input? Please note that due to the constraints of the larger script, I cannot simply do a blind replace of all the \n characters in the output.
Thank you for any suggestions. This script is for bash 3.2.25. Any egrep alternatives are welcome. Please note that the tokens of interest (35 and 150) may change and I am already generating the egrep pattern in the script. Hence a one-liner (if possible) would be great.
You have two options. Option 1 is to change the "white space character" and use set --:
IFS="^ "
line="1=A00^35=D^150=1^33=1"
set -- $line # No quotes here!! Only an unquoted expansion is split on IFS
Now you have your values in $1, $2, etc.
Or you can use an array:
tmp=$(echo "1=A00^35=D^150=1^33=1" | sed -e 's:\([0-9]\+\)=: [\1]=:g' -e 's:\^ : :g')
eval value=($tmp)
echo "35=${value[35]}^150=${value[150]}"
To get rid of the newline, you can just echo it again:
$ echo $(echo "1=A00^35=D^150=1^33=1"|egrep -o "35=[^/^]*\^|150=[^/^]*\^")
35=D^ 150=1^
If that's not satisfactory (I think it may give you one line for the whole input file), you can use awk:
pax> echo '1=A00^35=D^150=1^33=1
1=B000^35=D^150=2^33=2
' | awk -vLIST=35,150 -F^ ' {
    sep = "";
    split (LIST, srch, ",");
    for (i = 1; i <= NF; i++) {
        for (idx in srch) {
            split ($i, arr, "=");
            if (arr[1] == srch[idx]) {
                printf sep "" arr[1] "=" arr[2];
                sep = "^";
            }
        }
    }
    if (sep != "") {
        print sep;
    }
}'
pax> echo '1=A00^35=D^150=1^33=1
1=B000^35=D^150=2^33=2
' | awk -vLIST=1,33 -F^ ' {
    sep = "";
    split (LIST, srch, ",");
    for (i = 1; i <= NF; i++) {
        for (idx in srch) {
            split ($i, arr, "=");
            if (arr[1] == srch[idx]) {
                printf sep "" arr[1] "=" arr[2];
                sep = "^";
            }
        }
    }
    if (sep != "") {
        print sep;
    }
}'
This one allows you to use a single awk script and all you need to do is to provide a comma-separated list of keys to print out.
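The same idea can be wrapped in a small function, sketched here under the assumption that every token has the form key=value. Unlike the loop above, it first builds a per-line lookup table and then emits the keys in the order they were requested, which avoids the nested loop over the key list. The name extract_keys is made up for the example.

```shell
# Hypothetical helper: $1 = comma-separated list of keys, e.g. "35,150";
# caret-delimited key=value lines are read from stdin.
extract_keys() {
  awk -F'^' -v list="$1" '
    BEGIN { nkeys = split(list, want, ",") }
    {
      split("", val)                          # clear the per-line lookup table
      for (i = 1; i <= NF; i++) {
        eq = index($i, "=")
        if (eq > 0) val[substr($i, 1, eq - 1)] = substr($i, eq + 1)
      }
      out = ""
      for (k = 1; k <= nkeys; k++)
        if (want[k] in val) out = out want[k] "=" val[want[k]] "^"
      if (out != "") print out                # one output line per input line
    }'
}
```

For the input in the question, extract_keys 35,150 prints 35=D^150=1^ and 35=D^150=2^, one per input line.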
And here's the one-liner version :-)
echo '1=A00^35=D^150=1^33=1
1=B000^35=D^150=2^33=2
' | awk -vLST=1,33 -F^ '{s="";split(LST,k,",");for(i=1;i<=NF;i++){for(j in k){split($i,arr,"=");if(arr[1]==k[j]){printf s""arr[1]"="arr[2];s="^";}}}if(s!=""){print s;}}'
Given a file 'in' containing your strings:
$ for i in $(cut -d^ -f2,3 < in);do echo $i^;done
