Optimizing a Vectorized Matlab Function - performance

When I run the profiler, it tells me that the most time-consuming code is the function vdist. It is a program that measures the distance between two points on Earth, treating the Earth as an ellipsoid. The code looks standard and I don't know where or how it can be improved. The initial comments say it has already been vectorized. Is there a counterpart to it in some other language which could be used as a MEX file? All I want is an improvement in time efficiency. Here is a link to the code from the MATLAB File Exchange.
http://www.mathworks.com/matlabcentral/fileexchange/8607-vectorized-geodetic-distance-and-azimuth-on-the-wgs84-earth-ellipsoid/content/vdist.m
The function is called from within a loop as follows (you can spot the call easily: it's the most time-consuming line here):
109 for i=1:polySize
110 % find the two vectors needed
11755 111 if i~=1
0.02 11503 112 if i<polySize
0.02 11251 113 p0=Polygon(i,:); p1=Polygon(i-1,:); p2=Polygon(i+1,:);
252 114 else
252 115 p0=Polygon(i,:); p1=Polygon(i-1,:); p2=Polygon(1,:); %special case for i=polySize
252 116 end
252 117 else
252 118 p0=Polygon(i,:); p1=Polygon(polySize,:); p2=Polygon(i+1,:); %special case for i=1
252 119 end
0.02 11755 120 Vector1=(p0-p1); Vector2=(p0-p2);
0.06 11755 121 if ~(isequal(Vector1,Vector2) || isequal(Vector1,ZeroVec) || isequal(Vector2,ZeroVec));
122 %determine normals and normalise and
0.17 11755 123 NV1=rotateVector(Vector1, pi./2); NV2=rotateVector(Vector2, -pi./2);
0.21 11755 124 NormV1=normaliseVector(NV1); NormV2=normaliseVector(NV2);
125 %determine rotation by means of the atan2 (because sign matters!)
11755 126 totalRotation = vectorAngle(NormV2, NormV1); % determine the angle totalRotation between the normalized vectors
11755 127 if totalRotation<10
11755 128 totalRotation=totalRotation*50;
11755 129 end
0.01 11755 130 for res=1:6
0.07 70530 131 U_neu=p0+NV1;
17.01 70530 132 [pos,a12] = vdist(p0(:,2),p0(:,1),U_neu(:,2),U_neu(:,1));
0.02 70530 133 a12=a12+1/6.*res*totalRotation;
70530 134 ddist=1852*safety_distance;
4.88 70530 135 [lat2,lon2] = vreckon(p0(:,2),p0(:,1),ddist, a12);
0.15 70530 136 extendedPoly(f,:)=[lon2,lat2];f=f+1;
< 0.01 70530 137 end
11755 138 end
11755 139 end

No matter how hard I study the code that's been posted, I can't see why the call to vdist is made inside the loop.
When I'm trying to optimise a block of code inside a loop, one of the things I look for is statements which are invariant, that is, which are the same at each iteration, and which can therefore be lifted out of the loop.
Looking at
130 for res=1:6
131 U_neu=p0+NV1;
132 [pos,a12] = vdist(p0(:,2),p0(:,1),U_neu(:,2),U_neu(:,1));
133 a12=a12+1/6.*res*totalRotation;
134 ddist=1852*safety_distance;
135 [lat2,lon2] = vreckon(p0(:,2),p0(:,1),ddist, a12);
136 extendedPoly(f,:)=[lon2,lat2];f=f+1;
137 end
I see
in l131 the variables p0, NV1 appear only on the rhs, and they only appear on the rhs elsewhere inside the loop, so this statement is loop-invariant and can be lifted out of the loop; only a small time saving perhaps;
in l134 again, I see another loop-invariant statement, which can again be lifted out of the loop for another small time saving;
but then I started to look very closely, and I can't see why l132, where the call to vdist is made, is inside the loop either. None of the values on the rhs of that assignment are modified in the loop (other than U_neu but I've already lifted that out of the loop).
Tidying up what was left a bit, this is what I ended up with:
% loop-invariant statements hoisted out of the inner loop
U_neu = p0 + NV1;
[pos, a12] = vdist(p0(:,2), p0(:,1), U_neu(:,2), U_neu(:,1));
ddist = 1852*safety_distance;
for res = 1:6
    % only the azimuth offset changes from one iteration to the next
    extendedPoly(f,:) = vreckon(p0(:,2), p0(:,1), ddist, a12 + 1/6.*res*totalRotation);
    f = f + 1;
end

Another option would be to rewrite this FEX file so that it can run on GPUs. A smooth way into that, for example, is a toolbox called Jacket.

Related

Testing for a sudoku

I'm writing (well, was writing) a program which tests whether a given sudoku is a valid sudoku. It works. My only problem is that I don't know whether it checks all the required criteria. I am testing that every number appears 9 times in the sudoku and that it appears only once in every 3x3 box. In addition, I'm testing that every vertical and horizontal line sums to the same result (45), which implies that the overall sum is 405. My question is: do I also need to test that every number appears only once in every horizontal and vertical line, or is that not needed?
Counterexample:
453 456 189
126 789 723
789 123 456
231 564 897
564 897 231
897 231 564
312 645 978
645 978 312
978 312 645
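This grid passes every test listed in the question: each digit appears 9 times, each 3x3 box contains each digit exactly once, every row and column sums to 45, and the total is 405. Yet the first two rows contain repeated digits, so it is not a valid sudoku; in other words, yes, you still need the per-row and per-column uniqueness check. A small Python check (my own sketch, not part of the question's code) verifying this:

grid = [
    [4,5,3, 4,5,6, 1,8,9],
    [1,2,6, 7,8,9, 7,2,3],
    [7,8,9, 1,2,3, 4,5,6],
    [2,3,1, 5,6,4, 8,9,7],
    [5,6,4, 8,9,7, 2,3,1],
    [8,9,7, 2,3,1, 5,6,4],
    [3,1,2, 6,4,5, 9,7,8],
    [6,4,5, 9,7,8, 3,1,2],
    [9,7,8, 3,1,2, 6,4,5],
]

# every digit appears exactly 9 times overall
counts = {d: sum(row.count(d) for row in grid) for d in range(1, 10)}
print(all(c == 9 for c in counts.values()))                             # True

# every digit appears exactly once in every 3x3 box
boxes_ok = all(
    len({grid[r][c] for r in range(br, br+3) for c in range(bc, bc+3)}) == 9
    for br in (0, 3, 6) for bc in (0, 3, 6)
)
print(boxes_ok)                                                         # True

# every row and every column sums to 45 (so the total is 405)
print(all(sum(row) == 45 for row in grid))                              # True
print(all(sum(grid[r][c] for r in range(9)) == 45 for c in range(9)))   # True

# ...but the first row repeats 4 and 5, so it is not a valid sudoku
print(len(set(grid[0])) == 9)                                           # False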

Joining two matrices, one with numbers and the other percentages

I have two matrices, cases and percentages. I want to combine them with the columns alternating between the two, i.e. cases[c1], percent[c1], cases[c2], percent[c2], ...
tab year region if sex==1, matcell(cases)
tab year region, matcell(total)
mata:st_matrix("percent", 100 * st_matrix("cases"):/st_matrix("total"))
matrix list cases
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 1313 1289 1121 1176 1176 1150 1190 1184 1042 940
r2 340 359 357 366 383 332 406 367 352 272
r3 260 246 266 265 270 259 309 306 266 283
r4 271 267 293 277 317 312 296 285 265 253
r5 218 249 246 213 264 255 247 221 229 220
r6 215 202 157 202 200 204 220 183 176 180
r7 178 193 218 199 194 195 201 187 172 159
r8 127 111 107 130 133 99 142 143 131 114
r9 64 68 85 74 70 60 59 70 76 61
. matrix list percent, format(%2.1f)
percent[9,10]
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 70.1 71.2 67.3 67.2 66.9 71.5 72.6 72.5 74.9 73.2
r2 65.3 65.2 69.1 64.4 68.0 70.5 72.0 64.8 66.4 64.9
r3 74.7 73.7 74.7 69.2 68.9 67.6 70.5 72.3 79.4 80.9
r4 66.3 72.6 72.9 74.9 72.7 73.8 72.2 73.3 74.9 71.7
r5 68.8 67.1 66.0 63.6 67.2 67.1 65.2 67.4 68.6 73.8
r6 73.1 72.9 69.2 63.7 67.6 68.0 72.4 68.8 74.9 78.9
r7 64.5 60.3 69.9 70.6 69.3 78.3 72.3 65.8 71.4 71.3
r8 66.1 64.2 63.3 74.7 69.3 56.9 70.6 70.1 63.9 57.9
r9 77.1 73.9 70.2 74.0 71.4 73.2 81.9 72.9 87.4 74.4
How do I combine both the matrices?
Currently I have tried: matrix final=cases, percent but it just puts one matrix beside the other. I want each column to alternate between cases and percent.
I will then use putexcel command to put them into an already formatted table with columns of cases and percentages.
Let me start by supporting Nick Cox's comments.
The problem is that there is no simple solution for combining matrices as you desire. Nevertheless, it is simple to achieve the results you want by taking a rather different path from the one you outlined. It's no fun to write an essay describing the technique in natural language; it's much simpler to demonstrate it using code, as I do below, and as I expect Nick might have been inclined to do.
By not providing a Minimal, Complete, and Verifiable example, as described in the link Nick provided to you, you've discouraged others from showing you where you've gone off the tracks.
// create a minimal amount of sample data hopefully similar to actual data
clear
input year region sex
2001 1 1
2001 1 2
2001 1 2
2002 1 1
2002 1 2
2001 2 1
2002 2 1
2002 2 2
end
list, clean noobs
// use collapse to generate summaries equivalent to two tabs
generate male = sex==1
collapse (count) total=male (sum) cases=male, by(year region)
list, clean noobs
generate percent = 100*cases/total
keep year region total percent
// flatten and interleave the columns
reshape wide total percent, i(year) j(region)
drop year
list, clean noobs
// now use export excel to output,
// or use mkmat to load into a matrix and use putexcel to output

How to calculate classification error rate

Alright. Now this question is pretty hard. I am going to give you an example.
Here, the left numbers are my algorithm's classifications and the right numbers are the original class numbers:
177 86
177 86
177 86
177 86
177 86
177 86
177 86
177 86
177 86
177 89
177 89
177 89
177 89
177 89
177 89
177 89
So here my algorithm merged 2 different classes into 1; as you can see, it merged classes 86 and 89 into one class. So what would be the error in the above example?
Or here is another example:
203 7
203 7
203 7
203 7
16 7
203 7
17 7
16 7
203 7
In the above example, the left numbers are my algorithm's classifications and the right numbers are the original class ids. As can be seen, it misclassified 3 products (I am classifying identical commercial products). So what would be the error rate in this example, and how would you calculate it?
This question is pretty hard and complex. We have finished the classification but we are not able to find a correct algorithm for calculating the success rate :D
Here's a longish example: a real confusion matrix with 10 input classes "0" - "9" (handwritten digits) and 10 output clusters labelled A - J.
Confusion matrix for 5620 optdigits:
True 0 - 9 down, clusters A - J across
-----------------------------------------------------
A B C D E F G H I J
-----------------------------------------------------
0: 2 4 1 546 1
1: 71 249 11 1 6 228 5
2: 13 5 64 1 13 1 460
3: 29 2 507 20 5 9
4: 33 483 4 38 5 3 2
5: 1 1 2 58 3 480 13
6: 2 1 2 294 1 1 257
7: 1 5 1 546 6 7
8: 415 15 2 5 3 12 13 87 2
9: 46 72 2 357 35 1 47 2
----------------------------------------------------
580 383 496 1002 307 670 549 557 810 266 estimates in each cluster
y class sizes: [554 571 557 572 568 558 558 566 554 562]
kmeans cluster sizes: [ 580 383 496 1002 307 670 549 557 810 266]
For example, cluster A has 580 data points, 415 of which are "8"s; cluster B has 383 data points, 249 of which are "1"s; and so on.
The problem is that the output classes are scrambled (permuted); they correspond in this order, with counts:
A B C D E F G H I J
8 1 4 3 6 7 0 5 2 6
415 249 483 507 294 546 546 480 460 257
One could say that the "success rate" is
75 % = (415 + 249 + 483 + 507 + 294 + 546 + 546 + 480 + 460 + 257) / 5620
but this throws away useful information: here, that E and J both say "6", and that no cluster says "9".
So, add up the biggest numbers in each column of the confusion matrix and divide by the total. But how do you count overlapping / missing clusters, like the two "6"s and the missing "9" here? I don't know of a commonly agreed-upon way (and I doubt that the Hungarian algorithm is used in practice).
Bottom line: don't throw away information; look at the whole confusion matrix.
NB: such a "success rate" will be optimistic for new data! It's customary to split the data into, say, 2/3 "training set" and 1/3 "test set", train e.g. k-means on the 2/3 alone, then measure confusion / success rate on the test set; this is generally worse than on the training set alone. Much more can be said; see e.g. cross-validation.
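To make the column-maximum calculation concrete, here is a small Python sketch (NumPy/SciPy; the confusion matrix below is made up, not the optdigits one). It also shows the Hungarian matching mentioned above, via scipy.optimize.linear_sum_assignment, which forces a one-to-one cluster-to-class assignment:

import numpy as np
from scipy.optimize import linear_sum_assignment

# toy confusion matrix: rows = true classes, columns = clusters
cm = np.array([[50,  2,  3],
               [ 4, 40,  1],
               [ 0,  5, 45]])

# "biggest number in each column / total" (purity-style success rate)
purity = cm.max(axis=0).sum() / cm.sum()

# Hungarian matching: one cluster per class, maximizing the matched counts
rows, cols = linear_sum_assignment(-cm)
matched = cm[rows, cols].sum() / cm.sum()

print(purity, matched)   # 0.9 0.9 for this toy matrix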
You have to define the error criterion if you want to evaluate the performance of an algorithm, so I'm not sure exactly what you're asking. In some clustering and machine-learning algorithms you define the error metric and the algorithm minimizes it.
Take a look at https://en.wikipedia.org/wiki/Confusion_matrix to get some ideas.
You have to define an error metric yourself. In your case, a simple method would be to define a properties mapping for your products as
p = properties(id)
where id is the product id and p is likely a vector with one entry per property. Then you can define the error function e (or distance) between two products as
e = d(p1, p2)
Of course, each property must be evaluated to a number in this function. This error function can then be used in the classification algorithm and in learning.
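For concreteness, d could be as simple as a Euclidean distance over numeric property vectors (just one possible choice; the names here are illustrative, not from the question):

def d(p1, p2):
    # Euclidean distance between two equal-length property vectors
    return sum((a - b) ** 2 for a, b in zip(p1, p2)) ** 0.5

print(d([1.0, 2.0], [4.0, 6.0]))   # 5.0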
In your second example, it seems that you treat the pair (203, 7) as a successful classification, so I think you already have a metric of your own. You may want to be more specific to get a better answer.
Classification Error Rate (CER) is 1 - Purity (http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html)
ClusterPurity <- function(clusters, classes) {
  sum(apply(table(classes, clusters), 2, max)) / length(clusters)
}
(code from @john-colby)
Or
CER <- function(clusters, classes) {
  1 - sum(apply(table(classes, clusters), 2, max)) / length(clusters)
}

Suggest optimal algorithm to find min number of days to purchase all toys

Note: I am still looking for a fast solution. Two of the solutions below are wrong and the third one is terribly slow.
I have N toys, numbered 1...N. Each toy has an associated cost. You have to go on a shopping spree such that on a particular day, if you buy toy i, then the next toy you buy on the same day must be i+1 or greater. Moreover, the absolute cost difference between any two consecutively bought toys must be greater than or equal to k. What is the minimum number of days in which I can buy all the toys?
I tried a greedy approach by starting with toy 1 first and then seeing how many toys I can buy on day 1. Then I find the smallest i that I have not bought and start again from there.
Example:
Toys : 1 2 3 4
Cost : 5 4 10 15
let k be 5
On day 1, buy 1,3, and 4
on day 2, buy toy 2
Thus, I can buy all toys in 2 days
Note that greedy does not work for the example below: N = 151 and k = 42
the costs of the toys 1...N in that order are :
383 453 942 43 27 308 252 721 926 116 607 200 195 898 568 426 185 604 739 476 354 533 515 244 484 38 734 706 608 136 99 991 589 392 33 615 700 636 687 625 104 293 176 298 542 743 75 726 698 813 201 403 345 715 646 180 105 732 237 712 867 335 54 455 727 439 421 778 426 107 402 529 751 929 178 292 24 253 369 721 65 570 124 762 636 121 941 92 852 178 156 719 864 209 525 942 999 298 719 425 756 472 953 507 401 131 150 424 383 519 496 799 440 971 560 427 92 853 519 295 382 674 365 245 234 890 187 233 539 257 9 294 729 313 152 481 443 302 256 177 820 751 328 611 722 887 37 165 739 555 811
You can find the optimal solution by solving the asymmetric Travelling Salesman Problem (ATSP).
Consider each toy as a node and build the complete directed graph (that is, add an edge between each pair of nodes). An edge has cost 1 (you have to continue on the next day) if the target's index is smaller or the cost of the target node is less than k (5 in the example) plus the cost of the source node, and cost 0 otherwise. Now find the shortest path covering this graph without visiting a node twice, i.e. solve the Travelling Salesman Problem.
This idea is not very fast (the problem is NP-hard), but it should quickly give you a reference implementation.
This is not as difficult as ATSP. All you need to do is look for increasing subsequences.
Being a mathematician, the way I would solve the problem is to apply RSK to get a pair of Young tableaux, then the answer for how many days is the height of the tableau and the rows of the second tableau tell you what to purchase on which day.
The idea is to do Schensted insertion on the cost sequence c. For the example you gave, c = (5, 4, 10, 15), the insertion goes like this:
Step 1: Insert c[1] = 5
P = 5
Step 2: Insert c[2] = 4
5
P = 4
Step 3: Insert c[3] = 10
5
P = 4 10
Step 4: Insert c[4] = 15
5
P = 4 10 15
The idea is that you insert the entries of c into P one at a time. When inserting c[i] into row j:
if c[i] is bigger than the largest element in that row, add it to the end of the row;
otherwise, find the leftmost entry in row j that is larger than c[i], call it x, replace x with c[i], and then insert x into row j+1.
P is an array in which the lengths of the rows are weakly decreasing and the entries in each row of P (these are the costs) weakly increase. The number of rows is the number of days it will take.
For a more elaborate example (made by generating 9 random numbers)
1 2 3 4 5 6 7 8 9
c = [ 5 4 16 7 11 4 13 6 5]
16
7
5 6 11
P = 4 4 5 13
So the best possible solution takes 4 days, buying 4 items on day 1, 3 on day 2, 1 on day 3, and 1 on day 4.
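To make the insertion concrete, here is a rough Python sketch of the plain Schensted row insertion described above (usual ordering; adapting it to the constrained order means replacing the two comparisons as described in the next paragraph). It reproduces the tableau of the nine-number example:

def schensted_rows(costs):
    """Plain Schensted row insertion (usual ordering), as described above.
    Returns the rows of P; the number of rows is the number of days."""
    rows = []                              # rows[0] is the bottom row of P
    for c in costs:
        x = c
        for row in rows:
            if x >= row[-1]:               # bigger than everything in the row: extend it
                row.append(x)
                x = None
                break
            # otherwise bump the leftmost entry that is strictly larger than x
            j = next(i for i, y in enumerate(row) if y > x)
            row[j], x = x, row[j]          # the bumped value continues into the next row
        if x is not None:
            rows.append([x])               # start a new row
    return rows

print(schensted_rows([5, 4, 16, 7, 11, 4, 13, 6, 5]))
# [[4, 4, 5, 13], [5, 6, 11], [7], [16]]  -> 4 days: 4, 3, 1 and 1 items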
To handle the additional constraint that consecutive costs must increase by at least k, redefine the (partial) order on costs: say that c[i] <k< c[j] if and only if c[j] - c[i] >= k in the usual ordering on numbers. The above algorithm works for partial orders as well as total orders.
I somewhat feel that a greedy approach would give a fairly good result.
I think your approach is not optimal just because you always pick toy 1 to start while you should really pick the least expensive toy. Doing so would give you the most room to move to the next toy.
Each move being the least expensive one, it is just a DFS problem where you always follow the least expensive path constrained by k.

Algorithm to find average of group of numbers

I have a quite small list of numbers (a few hundred max) like for example this one:
117 99 91 93 95 95 91 97 89 99 89 99
91 95 89 99 89 99 89 95 95 95 89 948
189 99 89 189 189 95 186 95 93 189 95
189 89 193 189 93 91 193 89 193 185 95
89 194 185 99 89 189 95 189 189 95 89
189 189 95 189 95 89 193 101 180 189
95 89 195 185 95 89 193 89 193 185 99
185 95 189 95 89 193 91 190 94 190 185
99 89 189 95 189 189 95 185 95 185 99
89 189 95 189 186 99 89 189 191 95 185
99 89 189 189 96 89 193 189 95 185 95
89 193 95 189 185 95 93 189 189 95 186
97 185 95 189 95 185 99 185 95 185 99
185 95 190 95 185 95 95 189 185 95 189
2451
If you create a graph with X=the number and Y=number of times we see the number, we'll have something like this:
What I want is to know the average of each group of numbers. In the example, there are 4 groups, and the resulting numbers are 92, 187, 948 and 2451.
The number of groups is not known in advance.
Do you have any idea how to create a (simple, if possible) algorithm to extract these resulting numbers (if possible in C, pseudo code, or English :)
What you want to do is called clustering. If the data you've shown is typical, a greedy approach, such as neighbor joining, should be sufficient. So the procedure is:
1) Apply neighbor joining
2) Apply an (empirically identified) threshold to define the clusters
3) Calculate average of each cluster
Using a package that already has clustering algorithms, such as R, would probably be the easiest course, though neighbor joining is not a particularly hard algorithm.
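For 1-D data like this, the simplest version of steps 1-3 is to sort the numbers and cut wherever the gap between consecutive values exceeds an empirically chosen threshold (this is single-linkage clustering on a line rather than neighbor joining proper). A Python sketch, run on an excerpt of the list above; on the full list the first two averages should come out near the 92 and 187 mentioned in the question:

def cluster_1d(values, gap=40):
    """Cut the sorted values wherever consecutive numbers differ by more than
    `gap`, then return the average of each resulting cluster.
    (Single-linkage on a line; the threshold is picked empirically.)"""
    xs = sorted(values)
    clusters = [[xs[0]]]
    for a, b in zip(xs, xs[1:]):
        if b - a > gap:
            clusters.append([])        # start a new cluster at a big jump
        clusters[-1].append(b)
    return [sum(c) / len(c) for c in clusters]

data = [117, 99, 91, 93, 95, 948, 189, 95, 2451, 185, 89, 190]   # excerpt of the list
print(cluster_1d(data))               # [97.0, 188.0, 948.0, 2451.0]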
I think std::map<int,int> can easily solve this problem. The key of the map would be the number, and value would be the times/frequency the number occurs.
So the average can be calculated as,
int average = (m[key] * key) / count;
Where count is total number of numbers, so it calculates the average of each group over all numbers, as you didn't clearly mention what you mean by average. I'm also assuming that each distinct number forms its own group!
Here's a way:
Decide what width your bins will be. Let's say 10 (e.g. numbers > -5 and <= 5 go into bin 0, numbers > 5 and <= 15 go into bin 1, ...).
Create a list which holds lists of the numbers in each bin. I'd go with something like map<unsigned int, vector<unsigned int> *> in C++.
Now iterate over the numbers and decide which bin each one belongs to. Check if there's already a vector for this bin in your map; if not, create one. Add the number to the vector.
After iterating over all the numbers, simply calculate the average of each vector.
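Here is a small Python sketch of the same idea (a dictionary of lists standing in for the C++ map of vectors; the width and bin convention are the ones described above):

from collections import defaultdict

def bin_averages(numbers, width=10):
    """Fixed-width binning as described above: with width 10, numbers in (-5, 5]
    go to bin 0, (5, 15] to bin 1, and so on. Returns {bin index: average}."""
    bins = defaultdict(list)                         # bin index -> numbers in that bin
    for n in numbers:
        bins[(n + width // 2 - 1) // width].append(n)
    return {i: sum(v) / len(v) for i, v in sorted(bins.items())}

print(bin_averages([117, 99, 91, 93, 95, 948, 189, 185, 190, 2451]))
# {9: 93.0, 10: 99.0, 12: 117.0, 18: 185.0, 19: 189.5, 95: 948.0, 245: 2451.0}

Note that fixed bins can split a group that straddles a bin boundary (the values around 90-100 land in two bins here), so the bin width has to be chosen with the data in mind.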
So you are looking for "spikes" in the graph. I'm guessing you are interested in the size and position of each group?
You might use something like this:
Sort the numbers
Loop:
Take the highest number you have
Investigate more numbers until you find a number that is too small to belong to the group (maybe 5% smaller)
Calculate the average of the selected numbers
Let the discarded number be the last number
End loop
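Rendered as Python, taking "5% smaller" to mean 5% below the previous value in the sorted order (my reading; the original leaves the reference point open):

def spike_averages(numbers, drop=0.05):
    """Walk the numbers from highest to lowest and start a new group whenever a
    value falls more than `drop` (5%) below the previous one; average each group."""
    xs = sorted(numbers, reverse=True)
    groups = [[xs[0]]]
    for prev, cur in zip(xs, xs[1:]):
        if cur < prev * (1 - drop):        # too small to belong to the current group
            groups.append([])
        groups[-1].append(cur)
    return [sum(g) / len(g) for g in groups]

print(spike_averages([117, 99, 91, 93, 95, 948, 189, 185, 190, 2451]))
# [2451.0, 948.0, 188.0, 117.0, 94.5]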
In PHP you could do it like this:
$array = array( /* an array of numbers */ );
$average = array_sum($array) / count($array);
With multiple groups of numbers you can do something like:
$array = array(
    array( /* numbers in group 1 */ ),
    array( /* numbers in group 2 */ ),
    // etc.
);
foreach ($array as $numbers)
{
    $average[] = array_sum($numbers) / count($numbers);
}
Unless you're looking for the median or mode.
Ah, I see what you're asking now: you're not asking how to find the average, you're asking how to group the numbers and find the average of each group.
Let's see, you'd have to find the mode: $counts = array_count_values($array); array_keys($counts, max($counts)); will do that, and the keys in $counts will be the values of the original array, with the values in $counts being the number of times each number shows up. Then you need to figure out where the bigger gaps in the keys of $counts are. You could also array_unique() the original array and find the gaps in the values.
Wish my statistics teacher had done a bit more than play poker with us, or I could probably figure out the exact statistical method to determine how big the range used to define the groups should be.
