Speeding up Nested Loop; Can it be vectorized? - performance

I am trying to match records in what could be a fairly large data set, and even on a medium-sized data set it is taking too long.
The task I am performing is to take a mechanical problem, then go back 6 months and look for procedural problems (failures on the part of individual employees). I match first on machine and location, so I want to match the same place with the same machine. Then I require that the procedural error come before the mechanical one, since a later error would be in the future relative to it. Finally, I limit the window to 180 days to keep things comparable.
In the data construction phase, I limit the mechanical issues to exclude the first 6 months, so I have the same 180-day block for each.
I have read a fair bit on optimizing loops. I know that you want to create a storage variable outside of the loop and then just fill it in, but I don't actually know how many matches there will be, so initially I had been using rbind inside of the loop. I know the upper bound on the storage variables is the number of mechanical issues * the number of procedural issues, but this is gigantic and I can't allocate a vector that large. The code I have posted here uses the maximum-sized storage approach, but I think I will have to go back to something like this:
if (counter == 1) {
  pro = procedural[i, ]
  other = mechanical[j, ]
} else {
  pro = rbind(pro, procedural[i, ])
  other = rbind(other, mechanical[j, ])
}
I have also read a fair bit about vectorization, but I have never actually managed to get it to work. I have tried a few different things on the vectorization front, but I think I must be doing something wrong.
I also tried removing the second loop and just using the which command, but that doesn't seem to work with a full column of data (from the procedural data) being compared to a single value (from the mechanical data).
Here is the code I have currently. It works for small sets of data fine, but for anything remotely large it takes forever.
maxval = mechrow * prorow
pro = matrix(nrow = maxval, ncol = ncol(procedural))
other = matrix(nrow = maxval, ncol = ncol(mechanical))
numprocissues = matrix(nrow = mechrow, ncol = 1)
counter = 1
for (j in 1:mechrow) {
  for (i in 1:prorow) {
    # same location/machine (columns 16 and 2), procedural date (column 17)
    # strictly before the mechanical date and less than 180 days earlier
    if (procedural[i, 16] == mechanical[j, 16] &&
        procedural[i, 17] < mechanical[j, 17] &&
        procedural[i, 2] == mechanical[j, 2] &&
        abs(procedural[i, 17] - mechanical[j, 17]) < 180) {
      pro[counter, ] = procedural[i, ]
      other[counter, ] = mechanical[j, ]
      counter = counter + 1
    }
  }
  numprocissues[j, 1] = counter
}
The places where I imagine improvements can be made are my storage variable, potential vectorization, the conditions in the if statement, or maybe a clever which statement to remove a loop.
Any advice would be greatly appreciated!
Thank you.

Untested...
xy <- expand.grid(mech=1:mechrow, pro=1:prorow)
ok <- (procedural[xy$pro, 16] == mechanical[xy$mech, 16] &
procedural[xy$pro, 17] < mechanical[xy$mech, 17] &
procedural[xy$pro, 2] == mechanical[xy$mech, 2] &
abs(procedural[xy$pro, 17] - mechanical[xy$mech, 17]) < 180)
pro <- procedural[xy$pro[ok],]
other <- mechanical[xy$mech[ok],]
numprocissues <- tapply(ok, xy$mech, sum)
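The expand.grid answer above still materializes all mechrow * prorow candidate pairs. A different technique, sketched here in Python rather than R, avoids that: group the procedural records by (location, machine), sort each group by date once, and binary-search the 180-day window for each mechanical record. The tuple layout `(location, machine, day)` and the name `match_issues` are hypothetical stand-ins for columns 2, 16 and 17 of the R data, and dates are assumed to be numeric day counts.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def match_issues(mechanical, procedural, window=180):
    """Pair each mechanical issue with procedural issues at the same
    (location, machine) whose day lies in (mech_day - window, mech_day),
    i.e. proc_day < mech_day and abs(proc_day - mech_day) < window.

    mechanical, procedural: sequences of (location, machine, day) tuples
    (hypothetical layout). Returns a list of (mech_index, proc_index) pairs.
    """
    # Group procedural issues by key and sort each group by day, once.
    groups = defaultdict(list)
    for p_idx, (loc, machine, day) in enumerate(procedural):
        groups[(loc, machine)].append((day, p_idx))
    days_by_key = {}
    for key, rows in groups.items():
        rows.sort()
        days_by_key[key] = [day for day, _ in rows]

    pairs = []
    for m_idx, (loc, machine, day) in enumerate(mechanical):
        key = (loc, machine)
        if key not in groups:  # membership test does not create an entry
            continue
        days = days_by_key[key]
        lo = bisect_right(days, day - window)  # first day > day - window
        hi = bisect_left(days, day)            # first day >= day
        pairs.extend((m_idx, p_idx) for _, p_idx in groups[key][lo:hi])
    return pairs
```

The cost is roughly O((n + m) log n) plus the number of matches instead of O(n * m); counting how often each mech_index appears in the result gives the per-issue totals that numprocissues holds.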

Related

Optimizing matrix multiplication with varying sizes

Suppose I have the following data generating process
using Random
using StatsBase
m_1 = [1.0 2.0]
m_2 = [1.0 2.0; 3.0 4.0]
DD = []
y = zeros(2,200)
for i in 1:100
rand!(m_1)
rand!(m_2)
push!(DD, m_1)
push!(DD, m_2)
end
idxs = sample(1:200,10)
for i in idxs
DD[i] = DD[1]
end
And suppose that, given the data, I have the following function:
function test(y, DD, n)
v_1 = [1 2]
v_2 = [3 4]
for j in 1:n
for i in 1:size(DD,1)
if size(DD[i],1) == 1
y[1:size(DD[i],1),i] .= (v_1 * DD[i]')[1]
else
y[1:size(DD[i],1),i] = (v_2 * DD[i]')'
end
end
end
end
I'm struggling to optimize the speed of test. In particular, memory allocation increases as I increase n. However, I'm not really allocating anything new.
The data generating process captures the fact that I don't know for sure the size of DD[i] beforehand. That is, the first time I call test, DD[1] could be a 2x2 matrix. The second time I call test, DD[1] could be a 1x2 matrix. I think this could be part of the issue with memory allocation: Julia doesn't know the sizes beforehand.
I'm completely stuck. I've tried @inbounds but that didn't help. Is there a way to improve this?
One important thing to check for performance is that Julia can infer the types. You can check this by running @code_warntype test(y, DD, 1); the output makes it clear that DD is of type Any[] (since you declared it that way). Working with Any can incur quite a performance penalty, so declaring DD = Matrix{Float64}[] cuts the time to a third in my testing.
I'm not sure how close this example is to the actual code you want to write but in this particular case the size(DD[i],1) == 1 branch can be replaced by a call to LinearAlgebra.dot:
y[1:size(DD[i],1),i] .= dot(v_1, DD[i])
this cuts the time by another 50% for me. Finally you can squeeze out just a tiny bit more by using mul! to perform the other multiplication in place:
mul!(view(y, 1:size(DD[i],1),i:i), DD[i], v_2')
Full example:
using Random
using LinearAlgebra
DD = [rand(i,2) for _ in 1:100 for i in 1:2]
y = zeros(2,200)
shuffle!(DD)
function test(y, DD, n)
v_1 = [1 2]
v_2 = [3 4]'
for j in 1:n
for i in 1:size(DD,1)
if size(DD[i],1) == 1
y[1:size(DD[i],1),i] .= dot(v_1, DD[i])
else
mul!(view(y, 1:size(DD[i],1),i:i), DD[i], v_2)
end
end
end
end
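For reference, the two rewrites rely on the following identities, where d is the 1×2 matrix DD[i] in the first branch, D is the 2×2 matrix DD[i] in the second, and v_1 = [1 2], v_2 = [3 4] are the row vectors from the question (the full example stores v_2 already transposed, so mul!(..., DD[i], v_2) computes D v_2^T directly):

```latex
v_1 d^{\mathsf T}
  = \begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} d_1 \\ d_2 \end{bmatrix}
  = d_1 + 2 d_2 = \langle v_1, d \rangle ,
\qquad
\left( v_2 D^{\mathsf T} \right)^{\mathsf T} = D\, v_2^{\mathsf T}
  = \begin{bmatrix} 3 D_{11} + 4 D_{12} \\ 3 D_{21} + 4 D_{22} \end{bmatrix} .
```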

A simple Increasing Mathematical Algorithm

I actually tried to search for this. I'm sure this basic algorithm is everywhere on the internet, in CS textbooks, etc., but I cannot find the right words to search for it.
What I want this algorithm to do is write "A" and "B" with the run length always increasing by 2: A 3 times, then B 5 times, then A 7 times, then B 9 times, and so on. I plan to have 100 elements in total.
For example: AAABBBBBAAAAAAABBBBBBBBB...
I only want to use a single "for" loop over all 100 elements, from 1 to 100, and just choose between "A" and "B" with if / else if / else.
I'm just asking for the basic mathematical algorithm behind it; an implementation in any programming language would be even better, and pointing me to the relevant topic would also be fine.
You can do something like this:
There might be shorter answers, but I find this one easy to understand.
Basically, you keep a bool variable that tells you whether it's A's turn or B's. Then we keep a variable switch that tells us when we should switch between them. times holds the number of times the next character must be printed.
A_B = true
times = 3   // 3, 5, 7, 9, ...
switch = 3  // 3, 8, 15, 24, ...
for (i from 1 to 100)
    if (A_B)
        print 'A'
    else
        print 'B'
    if (i == switch)
        times += 2
        switch += times
        A_B = !A_B
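A minimal Python transcription of the pseudocode above, for anyone who wants to run it; the function name `alternating_runs` and its `total` parameter are illustrative only.

```python
def alternating_runs(total=100):
    """Single loop over 1..total, choosing 'A' or 'B' with if/else,
    producing runs of length 3, 5, 7, 9, ..."""
    out = []
    a_turn = True   # True while we are printing 'A'
    times = 3       # length of the current run: 3, 5, 7, 9, ...
    switch = 3      # position where the current run ends: 3, 8, 15, 24, ...
    for i in range(1, total + 1):
        if a_turn:
            out.append("A")
        else:
            out.append("B")
        if i == switch:      # last letter of this run: set up the next one
            times += 2
            switch += times
            a_turn = not a_turn
    return "".join(out)


print(alternating_runs(24))  # AAABBBBBAAAAAAABBBBBBBBB
```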
Python:
from math import sqrt
for n in range(1, 101):
    print("BA"[int(sqrt(n)) % 2], end="")
The parity of the integer square roots of the integers follows that pattern. (Note that (n+1)² - n² = 2n + 1.)
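Spelled out, the observation is that each value k of the integer square root covers a block of exactly 2k + 1 consecutive integers, so k = 1, 2, 3, ... gives runs of length 3, 5, 7, ..., and the parity of k flips from one run to the next:

```latex
\lfloor \sqrt{n} \rfloor = k \iff k^2 \le n \le (k+1)^2 - 1 ,
\qquad
\#\{\, n : \lfloor \sqrt{n} \rfloor = k \,\} = (k+1)^2 - k^2 = 2k + 1 .
```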
If you prefer to avoid the square root, it suffices to use an extra variable that tracks the integer square root (offset by one) and keep it updated:
r = 1
for n in range(1, 101):
    if r * r <= n:
        r += 1
    print("AB"[r % 2], end="")
Here is a JavaScript snippet that builds about 500 letters in total; you can change the limit to 100. It is quite flexible: by changing the constants you can produce many different strings following the same pattern.
var toRepeat = ['A', 'B'];
var result = '', j, i = 3; // i is the length of the current run: 3, 5, 7, ...
var sum = 0; // number of letters produced so far
var counter = 0;
while (sum < 500) {
  j = counter % 2; // alternate between 'A' and 'B'
  result = result + toRepeat[j].repeat(i);
  sum = sum + i;
  i = i + 2;
  counter++;
}
console.log(result);
If you want exactly 500 (or 100) letters, trim the extra letters from the end with a substring function, e.g. result.substring(0, 500).
To get 100 groups of A and B with increasing lengths of 3, 5, 7 and so on, you can run this Python code:
''.join(('B' if i % 2 else 'A') * (2 * i + 3) for i in range(100))
The output is a string of 10200 characters.
If you want the output to have only 100 characters, you can use:
import math
''.join(('B' if math.ceil(math.sqrt(i)) % 2 else 'A') for i in range(2, 102))
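In JS you can start with something like this:
var res = "";
var count2 = 0; // number of runs written so far
for (var i = 3; res.length < 100; i = i + 2) { // run lengths 3, 5, 7, ...
  var count = 0;
  while (count < i) {
    res = res.concat(String.fromCharCode(65 + count2 % 2)); // 65 is 'A', 66 is 'B'
    count++;
  }
  count2++;
}
console.log(res.substring(0, 100));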

My loops are slow. Is that because of if statements?

I read this post and realized that loops are faster in Julia, so I decided to change my vectorized code into loops. However, I had to use a few if statements in my loops, and the loops slowed down as I added more of them.
Consider this excerpt, which I directly copied from the post:
function devectorized()
a = [1.0, 1.0]
b = [2.0, 2.0]
x = [NaN, NaN]
for i in 1:1000000
for index in 1:2
x[index] = a[index] + b[index]
end
end
return
end
function time(N)
timings = Vector{Float64}(undef, N)
# Force compilation
devectorized()
for itr in 1:N
timings[itr] = @elapsed devectorized()
end
return timings
end
I then added a few if statements to test the speed:
function devectorized2()
a = [1.0, 1.0]
b = [2.0, 2.0]
x = [NaN, NaN]
for i in 1:1000000
for index in 1:2
####repeat this 6 times
if index * i < 20
x[index] = a[index] - b[index]
else
x[index] = a[index] + b[index]
end
####
end
end
return
end
I repeated this block six times:
if index * i < 20
x[index] = a[index] - b[index]
else
x[index] = a[index] + b[index]
end
For the sake of conciseness, I'm not repeating this block in my sample code. After repeating the if statements 6 times, devectorized2() took 3 times as long.
I have two questions:
Are there better ways to implement if statements?
Why are if statements so slow? I know that Julia is trying to do loops in a way that matches C. Is Julia providing better "translation" between Julia and C and these if statements just made the translation process more difficult?
Firstly, I don't think the performance here is very odd, since you're adding a lot of work to your function.
Secondly, you should actually return x here, otherwise the compiler might decide that you're not using x, and just skip the whole computation, which would thoroughly confuse the timings.
Thirdly, to answer your question 1: You can implement it like this:
x[index] = a[index] + ifelse(index * i < 20, -1, 1) * b[index]
This can be faster in some cases, but not necessarily in your case, where the branch is very easy to predict. Sometimes you can also get speedups by using Bools, for example like this:
x[index] = a[index] + (2*(index * i >= 20)-1) * b[index]
Again, in your example this doesn't help much, but there are cases when this approach can give you a decent speedup.
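A quick way to convince yourself that the two branchless rewrites compute the same thing as the original if/else is to compare them over the loop's index range. This is a Python sketch of the arithmetic only (Python will not reproduce Julia's performance behavior), and the function names are hypothetical.

```python
def branchy(a, b, index, i):
    # the original if/else from the question
    if index * i < 20:
        return a - b
    return a + b


def with_ifelse(a, b, index, i):
    # a + ifelse(index * i < 20, -1, 1) * b
    return a + (-1 if index * i < 20 else 1) * b


def with_bool(a, b, index, i):
    # a + (2 * (index * i >= 20) - 1) * b; True/False behave as 1/0
    return a + (2 * (index * i >= 20) - 1) * b


for i in range(1, 50):
    for index in (1, 2):
        assert branchy(1.0, 2.0, index, i) == with_ifelse(1.0, 2.0, index, i) == with_bool(1.0, 2.0, index, i)
```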
BTW: It is no longer necessarily true that loops are always preferable to vectorized code. The post you linked to is quite old. Take a look at this blog post, which shows how vectorized code can achieve similar performance to loopy code. In many cases, though, a loop is the clearest, easiest and fastest way to accomplish your goal.

Poisson Solver using Mathematica

I am looking for some help with a Poisson Solver I am writing in Mathematica. The code is quite long with Arrays plugged in, but the full details can be found at http://pastebin.com/uSrSDcW6
I am calculating voltages from charge densities using the central difference method derived from Poisson's equation. After calculating the voltage, I test the data set for convergence. I am setting convergence thresholds on the order of 10^-1000 or smaller. As a fail-safe, I have the loop set up to kick out after 10000 iterations in case something goes awry, and I have a loop counter in place for sanity. The program seems to run fine as long as the convergence threshold is set to 10^-100.
My question is this: no matter what I set the threshold to (e.g. 10^-100, 10^-150), the computation stops after 633 iterations and kicks out of the loop. I would appreciate any help with this; I am completely stuck. I've added comments to the program that should be self-explanatory for anyone on this forum. Again, I know this description is limited, so please see http://pastebin.com/uSrSDcW6 for the full program.
Update (10/9/12): I've isolated my issue to the 16-digit machine precision. I need to open that up to my machine's maximum precision of 10^309. Mathematica Help is sparse on how to do this, e.g. "N[MachinePrecision, 50]". Where would I set this in my program so that it applies to all computations? I'll paste the loop here in case that helps:
Vnew / Vold / RHO are 10x10x34 Matrices
Epsilon is a constant
(* Initialize ConvergenceLoop to 0. This serves as a fail-safe to kick out of the loop if necessary. *)
ConvergenceLoop = 0;
(* Initialize Convergence to zero *)
Convergence = 0;
While[Convergence == 0 && ConvergenceLoop < 10000,
(* Run through all i, j, k elements, calculating new voltage values *)
Do[Vnew[[i]][[j]][[k]] = (1/(2/deltaX^2 + 2/deltaY^2 +
2/deltaZ^2)) *(((Vold[[i + 1]][[j]][[k]] +
Vold[[i - 1]][[j]][[k]])/(deltaX^2)) + ((Vold[[i]][[j + 1]][[k]] +
Vold[[i]][[j - 1]][[k]])/(deltaY^2)) + ((Vold[[i]][[j]][[k + 1]] +
Vold[[i]][[j]][[k - 1]])/(deltaZ^2)) + ((Rho[[i]][[j]][[k]]/Epsilon))), {i, 2, 9}, {j, 2,9}, {k, 2, 33}];
(* Assume converged, so the loop is re-triggered when the test hits the first value exceeding the defined convergence threshold *)
Convergence = 1;
(* This is the convergence test. User-defined convergence threshold *)
Do[If[Vold[[i]][[j]][[k]] == 0, Null,
If[Abs[(Vnew[[i]][[j]][[k]] - Vold[[i]][[j]][[k]])/Vold[[i]][[j]][[k]]] > .0000001, Convergence = 0;
(*This is purely diagnostic. I added a Tracker point to follow the convergence along.
user defined at any element*)
If[i == 5 && j == 5 && k == 10,
Print[ "Tracker Point" (Vnew[[i]][[j]][[k]] -
Vold[[i]][[j]][[k]])/Vold[[i]][[j]][[k]]], Null],Null]], {i, 2, 9}, {j, 2, 9}, {k, 2, 33}];
(* Ignore the first iteration until Vnew and Vold are nonzero *)
If[ConvergenceLoop < 2, Convergence = 0, Null];
(* Force Vold to evolve with Vnew *)
Vold = Vnew;
ConvergenceLoop ++;]
(* Added SessionTime for future planning purposes *)
If[ConvergenceLoop == 10000,
Print["Convergence Loop Limit Reached. " (SessionTime[]/3600) ],
Print["Convergence Loop Limit Not Reached."]];
(* We broke out of the While loop, meaning our data converged, so print the converged values *)
If[Convergence == 1,
Print[ ConvergenceLoop "Congratulations Converged!" MatrixForm [Vnew]], Print["Did Not Converge!"]];
Since, based on the comments above, you have narrowed this down to a precision problem, as I suspected, please read these:
Funny behaviour when plotting a polynomial of high degree and large coefficients
Global precision setting
Confused by (apparent) inconsistent precision
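To see why the threshold stops mattering in machine precision, here is a small Python illustration, assuming the loop runs in IEEE 754 double precision (Python 3.9+ for math.ulp). The relative gap between two adjacent doubles is never smaller than about 1.1e-16 (for normal numbers), so a relative-change test against 10^-100 or 10^-150 can only be satisfied by a change of exactly zero. The helper name `relative_ulp` is illustrative only.

```python
import math
import sys

# Machine epsilon for IEEE 754 double precision: 2**-52, about 2.2e-16.
EPS = sys.float_info.epsilon


def relative_ulp(x):
    """Relative spacing of doubles at a nonzero, normal float x: ulp(x) / |x|."""
    return math.ulp(x) / abs(x)


if __name__ == "__main__":
    print(EPS)                  # 2.220446049250313e-16
    print(1.0 + 1e-100 == 1.0)  # True: the increment is far below the spacing at 1.0
    for v in (1.0, 1.9, 3.7, 1e-5, 12345.678):
        # always in the interval (2**-53, 2**-52] for normal floats
        print(v, relative_ulp(v))
```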

Identify important minima and maxima in time-series w/ Mathematica

I need a way to identify local minima and maxima in time series data with Mathematica. This seems like it should be an easy thing to do, but it gets tricky. I posted this on the MathForum, but thought I might get some additional eyes on it here.
You can find a paper that discusses the problem at: http://www.cs.cmu.edu/~eugene/research/full/compress-series.pdf
I've tried this so far…
Get and format some data:
data = FinancialData["SPY", {"May 1, 2006", "Jan. 21, 2011"}][[All, 2]];
data = data/First@data;
data = Transpose[{Range[Length@data], data}];
Define 2 functions:
First method:
findMinimaMaxima[data_, window_] := With[{k = window},
data[[k + Flatten@Position[Partition[data[[All, 2]], 2 k + 1, 1], x_List /; x[[k + 1]] < Min[Delete[x, k + 1]] || x[[k + 1]] > Max[Delete[x, k + 1]]]]]]
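For readers less fluent in Mathematica, here is a rough Python transcription of findMinimaMaxima[] (a sketch, not the author's code): a point is kept when it is strictly below every other value in the 2k + 1 window centered on it, or strictly above all of them. It works on a plain list of values with 0-based indices, and the name `window_extrema` is hypothetical.

```python
def window_extrema(values, k):
    """0-based indices c whose value is a strict min or strict max of the
    window values[c - k : c + k + 1] (the point itself excluded from the
    comparison), mirroring Partition[..., 2 k + 1, 1] in findMinimaMaxima."""
    out = []
    for c in range(k, len(values) - k):
        others = values[c - k:c] + values[c + 1:c + k + 1]
        if values[c] < min(others) or values[c] > max(others):
            out.append(c)
    return out
```

For example, window_extrema([1, 3, 2, 5, 4, 0, 6], 1) returns [1, 2, 3, 5], while widening the window to k = 2 leaves only [3].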
Now another approach, although not as flexible:
findMinimaMaxima2[data_] := data[[Accumulate@(Length[#] & /@ Split[Prepend[Sign[Rest@data[[All, 2]] - Most@data[[All, 2]]], 0]])]]
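The same idea as findMinimaMaxima2[], sketched in Python under the assumption of a plain list of values: compute the sign of each successive difference (with a 0 prepended), split it into runs of equal sign, and keep the last index of each run. That yields the first point, every turning point, and the last point. The name `turning_points` is hypothetical and indices are 0-based.

```python
def turning_points(values):
    """0-based indices where each run of equal-sign successive differences
    ends, mirroring Accumulate[Length /@ Split[Prepend[Sign[diffs], 0]]]."""
    if not values:
        return []
    # sign of each step, with a leading 0 as in Prepend[..., 0]
    signs = [0] + [(b > a) - (b < a) for a, b in zip(values, values[1:])]
    ends = []
    for i in range(len(signs)):
        # a run ends at the last element or where the next sign differs
        if i == len(signs) - 1 or signs[i + 1] != signs[i]:
            ends.append(i)
    return ends
```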
Look at what each of the functions does, starting with findMinimaMaxima2[]:
minmax = findMinimaMaxima2[data];
{Length@data, Length@minmax}
ListLinePlot@minmax
This selects all minima and maxima and results (in this instance) in about a 49% data compression, but it doesn't have the flexibility of expanding the window.
This other method does. A window of 2 yields fewer and arguably more important extrema:
minmax2 = findMinimaMaxima[data, 2];
{Length@data, Length@minmax2}
ListLinePlot@minmax2
But look at what happens when we expand the window to 60:
minmax2 = findMinimaMaxima[data, 60];
ListLinePlot[{data, minmax2}]
Some of the minima and maxima no longer alternate.
Applying findMinimaMaxima2[] to the output of findMinimaMaxima[] gives a workaround...
minmax3 = findMinimaMaxima2[minmax2];
ListLinePlot[{data, minmax2, minmax3}]
But this seems like a clumsy way to address the problem.
So, the idea of using a fixed window to look left and right doesn't quite do everything one would like. I began thinking about an alternative that could use a range value R (e.g. a percent move up or down) that the series would need to meet or exceed to set the next minimum or maximum. Here's my first try:
findMinimaMaxima3[data_, R_] := Module[{d, n, positions},
d = data[[All, 2]];
n = Transpose[{data[[All, 1]], Rest@FoldList[If[(#2 <= #1 + #1*R && #2 >= #1) || (#2 >= #1 - #1* R && #2 <= #1), #1, #2] &, d[[1]], d]}];
n = Sign[Rest@n[[All, 2]] - Most@n[[All, 2]]];
positions = Flatten@Rest[Most[Position[n, Except[0]]]];
data[[positions]]
]
minmax4 = findMinimaMaxima3[data, 0.1];
ListLinePlot[{data, minmax4}]
This too benefits from post-processing with findMinimaMaxima2[]:
ListLinePlot[{data, findMinimaMaxima2[minmax4]}]
But if you look closely, you see that it misses the extremes if they go beyond the R value in several positions - including the chart's absolute minimum and maximum as well as along the big moves up and down. Changing the R value shows how it misses the top and bottoms even more:
minmax4 = findMinimaMaxima3[data, 0.15];
ListLinePlot[{data, minmax4}]
So, I need to reconsider. Anyone can look at a plot of the data and easily identify the important minima and maxima. It seems harder to get an algorithm to do it. A window and/or an R value seem important to the solution, but neither on its own seems enough (at least not in the approaches above).
Can anyone extend any of the approaches shown or suggest an alternative to identifying the important minima and maxima?
Happy to forward a notebook with all of this code and discussion in it. Let me know if anyone needs it.
Thank you,
Jagra
I suggest using an iterative approach. The following functions are taken from this post, and while they can be written more concisely without Compile, they'll do the job:
localMinPositionsC =
Compile[{{pts, _Real, 1}},
Module[{result = Table[0, {Length[pts]}], i = 1, ctr = 0},
For[i = 2, i < Length[pts], i++,
If[pts[[i - 1]] > pts[[i]] && pts[[i + 1]] > pts[[i]],
result[[++ctr]] = i]];
Take[result, ctr]]];
localMaxPositionsC =
Compile[{{pts, _Real, 1}},
Module[{result = Table[0, {Length[pts]}], i = 1, ctr = 0},
For[i = 2, i < Length[pts], i++,
If[pts[[i - 1]] < pts[[i]] && pts[[i + 1]] < pts[[i]],
result[[++ctr]] = i]];
Take[result, ctr]]];
Here is your data plot:
dplot = ListLinePlot[data]
Here we plot the mins, which are obtained after 3 iterations:
mins = ListPlot[Nest[#[[localMinPositionsC[#[[All, 2]]]]] &, data, 3],
PlotStyle -> Directive[PointSize[0.015], Red]]
The same for maxima:
maxs = ListPlot[Nest[#[[localMaxPositionsC[#[[All, 2]]]]] &, data, 3],
PlotStyle -> Directive[PointSize[0.015], Green]]
And the resulting plot:
Show[{dplot, mins, maxs}]
You may vary the number of iterations, to get more coarse-grained or finer minima/maxima.
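A rough Python transcription of this Nest approach, for experimenting outside Mathematica (a sketch under the assumption that data is a list of (t, value) pairs; the function names are hypothetical and indices are 0-based). Each iteration keeps only the strict interior local minima (or maxima) of the previous iteration's points.

```python
def local_min_positions(pts):
    """Interior indices i with pts[i - 1] > pts[i] < pts[i + 1]."""
    return [i for i in range(1, len(pts) - 1) if pts[i - 1] > pts[i] < pts[i + 1]]


def local_max_positions(pts):
    """Interior indices i with pts[i - 1] < pts[i] > pts[i + 1]."""
    return [i for i in range(1, len(pts) - 1) if pts[i - 1] < pts[i] > pts[i + 1]]


def nested_extrema(data, finder, iterations):
    """Equivalent of Nest[#[[finder[#[[All, 2]]]]] &, data, iterations]."""
    for _ in range(iterations):
        values = [v for _, v in data]
        data = [data[i] for i in finder(values)]
    return data
```

With data = list(enumerate([5, 3, 4, 2, 6, 1, 7, 4, 8])), one iteration of minima keeps [(1, 3), (3, 2), (5, 1), (7, 4)] and a second keeps only [(5, 1)], which shows how each pass moves to a coarser scale.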
Edit:
Actually, I just noticed that a couple of points were still missed by this method, both for the minima and the maxima. So I suggest it as a starting point, not as a complete solution. Perhaps you could analyze the minima/maxima coming from different iterations, and sometimes include those from a "previous", more fine-grained one. Also, the only "physical reason" this works at all is that financial data appear to be fractal-like, with several distinctly different scales. Each iteration in the above Nest calls targets a particular scale. This would not work so well for an arbitrary signal.
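The range-value idea from the question (a move of at least R before the next extreme is set) is usually implemented as a zigzag, or percentage-reversal, filter. That is a different technique from both the question's FoldList attempt and the iterated Nest above; the Python sketch below is hypothetical and assumes strictly positive values such as prices. Unlike findMinimaMaxima3, it keeps tracking the running extreme of the current leg, so the true top or bottom of a big move is not skipped, and the results alternate between minima and maxima by construction.

```python
def zigzag_pivots(values, r):
    """0-based indices of alternating swing lows/highs. An extreme is only
    confirmed once the series reverses from it by at least the fraction r.
    Assumes strictly positive values; the last index returned is the
    still-unconfirmed extreme of the final leg."""
    if not values:
        return []
    lo = hi = 0      # running min / max until the first leg is established
    trend = 0        # +1 rising leg, -1 falling leg, 0 undecided
    cand = 0         # index of the running extreme of the current leg
    pivots = []
    for i in range(1, len(values)):
        v = values[i]
        if trend == 0:
            if v < values[lo]:
                lo = i
            if v > values[hi]:
                hi = i
            if v >= values[lo] * (1 + r):      # rose at least r from the low
                pivots, trend, cand = [lo], 1, i
            elif v <= values[hi] * (1 - r):    # fell at least r from the high
                pivots, trend, cand = [hi], -1, i
        elif trend == 1:
            if v > values[cand]:
                cand = i                       # new high on the rising leg
            elif v <= values[cand] * (1 - r):
                pivots.append(cand)            # high confirmed by a drop of r
                trend, cand = -1, i
        else:
            if v < values[cand]:
                cand = i                       # new low on the falling leg
            elif v >= values[cand] * (1 + r):
                pivots.append(cand)            # low confirmed by a rise of r
                trend, cand = 1, i
    if trend != 0:
        pivots.append(cand)
    return pivots
```

For example, zigzag_pivots([10, 11, 12.5, 9, 8, 8.5, 12, 13, 11], 0.2) returns [0, 2, 4, 7]: lows at indices 0 and 4, highs at 2 and 7. Raising r to 0.6 leaves only [4, 7].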

Resources