Textdeskew bad array subscript - bash

I am getting the above error when using Fred's ImageMagick textdeskew script. The error looks like this:
awk: line 38: syntax error at or near *
/home/work/textdeskew: line 468: regression_Arr: bad array subscript
/home/work/textdeskew: line 474: regression_Arr: bad array subscript
The lines the errors fall on look like this:
angle=`echo ${regression_Arr[rnum-1]} | cut -d: -f2`    # line 468
# set rotation to be correct for -90<=angle<90 (+90 will be upside downs)
rotation=`convert xc: -format "%[fx:$angle<0?-($angle+90):-($angle-90)]" info:`
rotation=`convert xc: -format "%[fx:abs($rotation)<0.00001?0:$rotation]" info:`
# remove outliers, if res_ave > res_thresh
res_ave=`echo ${regression_Arr[rnum-7]} | cut -d: -f2`    # line 474
I'm assuming the error is because rnum is 0, but I'm unsure how to read and debug the script to confirm that, as this may not even be the case. Here is where rnum and regression_Arr are declared:
linearRegression()
{
Arr="$1"
regression_Arr=(`echo "${Arr[*]}" | awk \
'BEGIN { FS = ","; RS = " "; pi = atan2(0, -1); }
NF == 2 { x_sum += $1
y_sum += $2
xy_sum += $1*$2
x2_sum += $1*$1
y2_sum += $2*$2
num += 1
x[NR] = $1
y[NR] = $2
}
END { mean_x = x_sum / num
mean_y = y_sum / num
for (i = 1; i <= num; i++) {
delx = (x[i]-mean_x)
dely = (y[i]-mean_y)
numerator += delx*dely
denominator += dely*dely - delx*delx
}
phi = 0.5*atan2(-2*numerator,denominator)
r = mean_x*cos(phi)+mean_y*sin(phi)
if ( sqrt(phi*phi) < 0.0001 ) {
angle = -90
}
else {
slope = -cos(phi)/sin(phi)
inter = r/sin(phi)
angle = (180/pi)*atan2(slope,1)
}
for (j = 1; j <= num; j++) {
delr = (x[j]*cos(phi)+y[j]*sin(phi)-r)
res_sq = delr*delr
sum_res_sq += res_sq
res = sqrt(delr*delr)
sum_res += res
print "Residual"j":"res
}
res_ave = sum_res/num
res_std = sqrt((sum_res_sq/num)-(sum_res/num)**2)
print "res_ave:"res_ave
print "res_std:"res_std
print "phi:"phi
print "r:"r
print "Slope:"slope
print "Intercept:"inter
print "Angle:"angle
}'`)
rnum=${#regression_Arr[*]}
if $debug; then
echo ""
echo "rnum=$rnum;"
# list regression data
for ((ii=0; ii<rnum; ii++)); do
echo "${regression_Arr[$ii]}"
done
fi
}
I wonder if this script used to work and now doesn't due to updates in the code?

This was fixed by installing gawk and then, after a new error appeared, bc. Huge thanks to fmw42 for helping through this on the ImageMagick forums.
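For anyone who lands here with the same messages: line 38 of the embedded awk program is the res_std line that uses the ** exponentiation operator, which is a non-POSIX extension that some awks (notably mawk) reject. When that awk program dies, regression_Arr comes back empty, rnum is 0, and rnum-1 / rnum-7 become negative subscripts, which would explain the two "bad array subscript" lines. A quick check plus the portable spelling, sketched here rather than taken from the script's author:

# Does the installed awk understand the ** operator?
awk 'BEGIN { print 2**3 }'   # gawk prints 8; an awk without the extension errors near *

# Portable alternative inside the awk program: POSIX awk spells exponentiation with ^
#   res_std = sqrt((sum_res_sq/num)-(sum_res/num)^2)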

Related

How to get n random "paragraphs" (groups of ordered lines) from a file

I have a file (originally compressed) with a known structure: every 4 lines, the first line starts with the character "#" and defines an ordered group of 4 lines. I want to randomly select n groups of lines (half of them) in the most efficient way, preferably in bash or another Unix tool.
My suggestion in python is:
import random
import subprocess

path = "origin.txt.gz"
unzipped_path = "origin_unzipped.txt"
new_path = "/home/labs/amit/diklag/subset.txt"
subprocess.getoutput("""gunzip -c %s > %s """ % (path, unzipped_path))
with open(unzipped_path) as f:
lines = f.readlines()
subset_size = round((len(lines)/4) * 0.5)
l = random.sample(list(range(0, len(lines), 4)),subset_size)
selected_lines = [line for i in l for line in list(range(i,i+4))]
new_lines = [lines[i] for i in selected_lines]
with open(new_path,'w+') as f2:
f2.writelines(new_lines)
Can you help me find another (and faster) way to do it?
Right now it takes ~10 seconds to run this code.
The following scripts might be helpful. They are, however, untested, as we do not have an example file:
Attempt 1 (awk and shuf):
#!/usr/bin/env bash
count=30
path="origin.txt.gz"
new_path="subset.txt"
nrec=$(gunzip -c "$path" | awk '/^#/{c++} END{print c}')
awk '(NR==FNR){a[$1]=1;next}
!/^#/{next}
((++c) in a) { print; for(i=1;i<=3;i++) { getline; print } }' \
<(shuf -i 1-$nrec -n $count) <(gunzip -c $path) > $new_path
Attempt 2 (sed and shuf):
#!/usr/bin/env bash
count=30
path="origin.txt.gz"
new_path="subset.txt"
gunzip -c $path | sed ':a;N;$!ba;s/\n/__END_LINE__/g;s/__END_LINE__#/\n#/g' \
| shuf -n $count | sed 's/__END_LINE__/\n/g' > $new_path
In this example, the sed line will replace all newlines with the string __END_LINE__, except if it is followed by #. The shuf command will then pick $count random samples out of that list. Afterwards we replace the string __END_LINE__ again by \n.
Attempt 3 (awk):
Create a file called subset.awk containing:
# Uniform(m) :: returns a random integer such that
# 1 <= Uniform(m) <= m
function Uniform(m) { return 1+int(m * rand()) }
# KnuthShuffle(m) :: creates a random permutation of the range [1,m]
function KnuthShuffle(m, i,j,k) {
for (i = 1; i <= m ; i++) { permutation[i] = i }
for (i = m; i >= 2; i--) {
j = Uniform(i)
k = permutation[i]
permutation[i] = permutation[j]
permutation[j] = k
}
}
BEGIN{RS="\n#"; srand() }
{a[NR]=$0}
END{ KnuthShuffle(NR);
sub("#","",a[1])
for(r = 1; r <= count; r++) {
print "#"a[permutation[r]]
}
}
And then you can run:
$ gunzip -c <file.gz> | awk -v count=30 -f subset.awk > <output.txt>
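If every group really is exactly four lines and the data never contains a literal tab character (assumptions on my part, since we have no example file), a paste/shuf/tr pipeline is another compact sketch along the same lines:

count=30
gunzip -c origin.txt.gz |
paste - - - - |             # join each 4-line group into one tab-separated record
shuf -n "$count" |          # sample $count random groups
tr '\t' '\n' > subset.txt   # split each sampled group back into its 4 lines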

Use awk to convert GPS Position to Latitude & Longitude

I am writing a bash script that renames files based on EXIF headers. exiftool returns the following GPS Position string, which I need to format into Latitude/Longitude coordinates for use with Google Maps API.
GPS Position : 40 deg 44' 49.36" N, 73 deg 56' 28.18" W
Google Maps:
https://maps.googleapis.com/maps/api/geocode/json?latlng=40.7470444,-073.9411611
This is my code
awk -v FS="[ \t]" '{print $0,substr($1,length($1),1)substr($2,length($2),1)}' $1 \
| sed 's/\xc2\xb0\([0-9]\{1,2\}\).\([NEWS]\)/ \1 0 \2/g;s/\xc2\xb0\([NEWS]\)/ 0 0 \1/g;s/[^0-9NEWS]/ /g' \
| awk '{if ($9=="NE") {printf ("%.4f\t%.4f\n",$1+$2/60+$3/3600,$5+$6/60+$7/3600)} \
else if ($9=="NW") {printf ("%.4f\t%.4f\n",$1+$2/60+$3/3600,-($5+$6/60+$7/3600))} \
else if ($9=="SE") {printf ("%.4f\t%.4f\n",-($1+$2/60+$3/3600),$5+$6/60+$7/3600)} \
else if ($9=="SW") {printf ("%.4f\t%.4f\n",-($1+$2/60+$3/3600),-($5+$6/60+$7/3600))}}'
I’m getting this error:
sed: RE error: illegal byte sequence
What I need is a valid awk command to strip the “deg” and NSEW text, and divide by 3600 and 60 per this post:
how to convert gps north and gps west to lat/long in objective c
40 deg 44' 49.36" N, 73 deg 56' 28.18" W > 40.7470444,-073.9411611
Please help!
For this particular case, I would write it as below if special characters were a problem. Of course, it has the disadvantage that the error checks are not as stringent.
# remove the string till the colon and characters other than numbers, dots, spaces and NSEW
sed 's!^.*:\|[^0-9\. NSEW]!!g' filename |
# calculate the latitude and longitude with some error checks
awk '/^\s*([0-9.]+\s+){3}[NS]\s+([0-9.]+\s+){3}[EW]\s*$/ {
lat=($1+$2/60+$3/3600); if ($4 == "S") lat=-lat;
lon=($5+$6/60+$7/3600); if ($8 == "W") lon=-lon;
printf("%.4f,%.4f\n", lat, lon); next }
{ print "Error on line " NR; exit 1 }'
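To sanity-check it against the sample line (assuming GNU sed and gawk, since the expressions rely on \| and \s, and noting this has not been run against real exiftool output), the pipeline should print 40.7470,-73.9412:

printf '%s\n' "GPS Position : 40 deg 44' 49.36\" N, 73 deg 56' 28.18\" W" |
sed 's!^.*:\|[^0-9\. NSEW]!!g' |
awk '/^\s*([0-9.]+\s+){3}[NS]\s+([0-9.]+\s+){3}[EW]\s*$/ {
lat=($1+$2/60+$3/3600); if ($4 == "S") lat=-lat;
lon=($5+$6/60+$7/3600); if ($8 == "W") lon=-lon;
printf("%.4f,%.4f\n", lat, lon) }'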
Here it is in PHP:
$parts = preg_split('/\s+/', trim(str_replace(array("deg",",","'","\""), "", $argv[1])));
print_r($parts);
$lat_deg = $parts[0];
$lat_min = $parts[1];
$lat_sec = $parts[2];
$lat_dir = $parts[3];
$lon_deg = $parts[4];
$lon_min = $parts[5];
$lon_sec = $parts[6];
$lon_dir = $parts[7];
if ($lat_dir == "N") {
$lat_sin = "+";
} else {
$lat_sin = "-";
}
if ($lon_dir == "E") {
$lon_sin = "+";
} else {
$lon_sin = "-";
}
$latitude = $lat_sin.($lat_deg+($lat_min/60)+($lat_sec/3600));
$longitude = $lon_sin.($lon_deg+($lon_min/60)+($lon_sec/3600));
echo substr($latitude,0,-5).",".substr($longitude,0,-5);
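Assuming the snippet is saved as, say, gps2latlng.php (a name invented here) and the whole GPS string is passed as one quoted argument, the call would look like the line below; with the print_r debug line removed it should print something close to +40.7470444,-73.9411611, give or take PHP's float-to-string precision:

php gps2latlng.php "40 deg 44' 49.36\" N, 73 deg 56' 28.18\" W"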

How to use awk or anything else to count the number of shared x values between 2 different y values in a csv file consisting of columns x and y?

Let me be specific. We have a csv file consisting of 2 columns, x and y, like this:
x,y
1h,a2
2e,a2
4f,a2
7v,a2
1h,b6
4f,b6
4f,c9
7v,c9
...
And we want to count how many shared x values two y values have, which means we want to get this:
y1,y2,share
a2,b6,2
a2,c9,2
b6,c9,1
And b6,a2,2 should not show up. Does anyone know how to do this with awk? Or anything else?
Thanks in advance!
Try this executable awk script:
#!/usr/bin/awk -f
BEGIN {FS=OFS=","}
NR==1 { print "y1" OFS "y2" OFS "share" }
NR>1 {last=a[$1]; a[$1]=(last!=""?last",":"")$2}
END {
for(i in a) {
cnt = split(a[i], arr, FS)
if( cnt>1 ) {
for(k=1;k<cnt;k++) {
for(j=k+1;j<=cnt;j++) {
if( arr[k] != arr[j] ) {
key=arr[k] OFS arr[j]
if(out[key]=="") {order[++ocnt]=key}
out[key]++
}
}
}
}
}
for(i=1;i<=ocnt;i++) {
print order[i] OFS out[order[i]]
}
}
When put into a file called awko and made executable, running it like awko data yields:
y1,y2,share
a2,b6,2
a2,c9,2
b6,c9,1
I'm assuming the file is sorted by y values in the second column as in the question (after the header). If it works for you, I'll add some explanations tomorrow.
Additionally, for anyone who wants more test data, here's a silly executable awk script for generating some data similar to what's in the question. It makes about 10K lines when run like gen.awk.
#!/usr/bin/awk -f
function randInt(max) {
return( int(rand()*max)+1 )
}
BEGIN {
a[1]="a"; a[2]="b"; a[3]="c"; a[4]="d"; a[5]="e"; a[6]="f"
a[7]="g"; a[8]="h"; a[9]="i"; a[10]="j"; a[11]="k"; a[12]="l"
a[13]="m"; a[14]="n"; a[15]="o"; a[16]="p"; a[17]="q"; a[18]="r"
a[19]="s"; a[20]="t"; a[21]="u"; a[22]="v"; a[23]="w"; a[24]="x"
a[25]="y"; a[26]="z"
print "x,y"
for(i=1;i<=26;i++) {
amultiplier = randInt(1000) # vary this to change the output size
r = randInt(amultiplier)
anum = 1
for(j=1;j<=amultiplier;j++) {
if( j == r ) { anum++; r = randInt(amultiplier) }
print a[randInt(26)] randInt(5) "," a[i] anum
}
}
}
I think if you can get the input into a form like this, it's easy:
1h a2 b6
2e a2
4f a2 b6 c9
7v a2 c9
In fact, you don't even need the x value. You can convert this:
a2 b6
a2
a2 b6 c9
a2 c9
Into this:
a2,b6
a2,b6
a2,c9
a2,c9
That output can be sorted and piped to uniq -c to get approximately the output you want, so the only real work is getting from your input to the first and second forms above. Once we have those, the final step is easy.
Step one:
sort /tmp/values.csv \
| awk '
BEGIN { FS="," }
{
if (x != $1) {
if (x) print values
x = $1
values = $2
} else {
values = values " " $2
}
}
END { print values }
'
Step two:
| awk '
{
for (i = 1; i < NF; ++i) {
for (j = i+1; j <= NF; ++j) {
print $i "," $j
}
}
}
'
Step three:
| sort | awk '
BEGIN {
print "y1,y2,share"
}
{
if (combination == $0) {
count = count + 1
} else {
if (count) print combination "," count
count = 1
combination = $0
}
}
END { print combination "," count }
'
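As mentioned above, step three can instead lean on uniq -c; reshaping its output into the csv form would be something like this (same caveat: untested):

| sort | uniq -c | awk 'BEGIN { print "y1,y2,share" } { print $2 "," $1 }'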
This awk script does the job:
BEGIN { FS=OFS="," }
NR==1 { print "y1","y2","share" }
NR>1 { ++seen[$1,$2]; ++x[$1]; ++y[$2] }
END {
for (y1 in y) {
for (y2 in y) {
if (y1 != y2 && !(y2 SUBSEP y1 in c)) {
for (i in x) {
if (seen[i,y1] && seen[i,y2]) {
++c[y1,y2]
}
}
}
}
}
for (key in c) {
split(key, a, SUBSEP)
print a[1],a[2],c[key]
}
}
Loop through the input, recording both the original elements and the combinations. Once the file has been processed, look at each pair of y values. The if statement does two things: it prevents equal y values from being compared and it saves looping through the x values twice for every pair. Shared values are stored in c.
Once the shared values have been aggregated, the final output is printed.
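If the script is saved as, say, shared.awk (both file names below are placeholders, not from the question), it can be run directly on the csv; note that the rows may come out in a different order, since awk's for-in iteration order is unspecified:

awk -f shared.awk input.csv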
This bash script, built around sed, does the trick:
#!/bin/bash
echo y1,y2,share
x=$(wc -l < file)
b=$(echo "$x -2" | bc)
index=0
for i in $(eval echo "{2..$b}")
do
var_x_1=$(sed -n ''"$i"p'' file | sed 's/,.*//')
var_y_1=$(sed -n ''"$i"p'' file | sed 's/.*,//')
a=$(echo "$i + 1" | bc)
for j in $(eval echo "{$a..$x}")
do
var_x_2=$(sed -n ''"$j"p'' file | sed 's/,.*//')
var_y_2=$(sed -n ''"$j"p'' file | sed 's/.*,//')
if [ "$var_x_1" = "$var_x_2" ] ; then
array[$index]=$var_y_1,$var_y_2
index=$(echo "$index + 1" | bc)
fi
done
done
counter=1
for (( k=1; k<$index; k++ ))
do
if [ ${array[k]} = ${array[k-1]} ] ; then
counter=$(echo "$counter + 1" | bc)
else
echo ${array[k-1]},$counter
counter=1
fi
if [ "$k" = $(echo "$index-1"|bc) ] && [ $counter = 1 ]; then
echo ${array[k]},$counter
fi
done
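If the above is saved as, say, count_shared.sh (an illustrative name) next to the input, which it expects to be a file literally named file, it can be run as below; be warned that it launches several sed processes per pair of lines, so it will be far slower than the awk answers on large inputs:

bash count_shared.sh > shared.csv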

awk assign field string to variable not working

Just wondering why this is not working?
This is my awk code, converting "hh:mm:ss" format to seconds:
a.awk
BEGIN {
FS=":";
}

{
retval = 0;
for (i = 1; i <= NF; i++) {
retval += $i * 60 ** (NF-i);
}
print $retval;
}
input.txt
59:22:40
$ cat input.txt | awk -f a.awk
(no output)
$
However, when I try it on the command line, it works:
$ echo "00:59:30" | awk 'BEGIN { FS=":" } { retval = 0; for (i = 1; i <= NF; i++) { retval += $i * 60 ** (NF-i); } print retval;}'
3570
What's wrong with a.awk?
Update, just for clarification:
$ awk --version
GNU Awk 4.0.1
Copyright (C) 1989, 1991-2012 Free Software Foundation.
Since your question has already been answered by the other 2 posts, here's something cute you can do with date to accomplish the same conversion from hh:mm:ss to time in seconds:
# GNU date
string_time="01:01:01"
string_time_in_seconds=$(date -u -d "1970-01-01 ${string_time}" +"%s")
echo ${string_time_in_seconds}
3661
That for loop is cute, but this seems more direct and easier to understand.
BEGIN {
FS=":";
}
{
retval = 0;
in_hours = $1
in_minutes = $2;
in_seconds = $3;
retval = (in_hours * 3600) + (in_minutes * 60) + in_seconds
print retval;
}
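If that is saved as, say, hms2sec.awk (a made-up name), running it on the value from the question should give:

$ echo "59:22:40" | awk -f hms2sec.awk
213760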
I think the problem with your loop is in the exponentiation. My version, at least, doesn't support any ** operator. This might work better for you. Also, be careful with your dollar signs. You need them for fields; you don't need them for variables.
for (i = 1; i <= NF; i++) {
retval += $i * (60^(NF-i));
}
it was a typo
a.awk
BEGIN {
FS=":";
}

{
retval = 0;
for (i = 1; i <= NF; i++) {
retval += $i * 60 ** (NF-i);
}
print retval; # <<<< notice here: retval, not $retval
}
Or, using bash only:
IFS=: read -a a < input.txt
((retval=${a[0]}*3600+${a[1]}*60+${a[2]}))
echo "$retval"
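One small caveat: inside the arithmetic expansion, a component with a leading zero (say 08 or 09) is read as octal and makes bash complain, so forcing base 10 is slightly safer:

IFS=: read -a a < input.txt
((retval=10#${a[0]}*3600+10#${a[1]}*60+10#${a[2]}))
echo "$retval"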

bash/sed/awk: change first alphabetic character in string to uppercase

Let's say I have this list:
39dd809b7a36
d83f42ab46a9
9664e29ac67c
66cf165f7e32
51b9394bc3f0
I want to convert the first alphabetic character to uppercase, for example:
39dd809b7a36 -> 39Dd809b7a36
bash/awk/sed solution should be ok.
Thanks for the help.
GNU sed can do it
printf "%s\n" 39dd809b7a36 d83f42ab46a9 9664e29ac67c 66cf165f7e32 51b9394bc3f0 |
sed 's/[[:alpha:]]/\U&/'
gives
39Dd809b7a36
D83f42ab46a9
9664E29ac67c
66Cf165f7e32
51B9394bc3f0
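The \U in the replacement is a GNU sed extension; where only a POSIX toolchain is available, an awk sketch of the same idea (my untested equivalent, not part of the sed answer) would be:

printf "%s\n" 39dd809b7a36 d83f42ab46a9 9664e29ac67c 66cf165f7e32 51b9394bc3f0 |
awk 'match($0, /[a-z]/) {
$0 = substr($0, 1, RSTART-1) toupper(substr($0, RSTART, 1)) substr($0, RSTART+1)
}
{ print }'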
Pure Bash 4.0+ using parameter substitution:
string=( "39dd809b7a36" "d83f42ab46a9"
"9664e29ac67c" "66cf165f7e32" "51b9394bc3f0" )
for str in "${string[@]}"; do
# get the leading digits by removing everything
# starting from the first letter:
head="${str%%[a-z]*}"
# and the rest of the string starting with the first letter
tail="${str:${#head}}"
# compose result : head + tail with 1. letter to upper case
result="$head${tail^}"
echo -e "$str\n$result\n"
done
Result:
39dd809b7a36
39Dd809b7a36
d83f42ab46a9
D83f42ab46a9
9664e29ac67c
9664E29ac67c
66cf165f7e32
66Cf165f7e32
51b9394bc3f0
51B9394bc3f0
I can't think of any clever way to do this with the basic software tools, but the brute-force solution isn't too bad.
In the One True awk(1) or in gawk:
{ n = split($0, a, "")
for(i = 1; i <= n; ++i) {
s = a[i]
t = toupper(s)
if (s != t) {
a[i] = t
break
}
}
r = ""
for(i = 1; i <= n; ++i) {
r = r a[i]
}
print r
}
It's not too bad in Ruby:
ruby -p -e '$_.sub!(/[a-z]/) { |c| c.upcase }'
Here is my solution. It will not handle input that already contains the ### marker pattern, but it can be tweaked as needed.
A=$(cat); B=$(echo $A | sed 's/\([a-z]\)/###\1###/' | sed 's/.*###\(.\)###.*/\1/' | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/') ; C="echo $A | sed 's/[a-z]/$B/'" ; eval $C
