Speeding up a Monte Carlo simulation in MATLAB

I'm trying to speed up the following Monte Carlo simulation in MATLAB:
http://pastebin.com/nS0K7XXa
and this is the full result of the MATLAB profiler:
http://i.imgur.com/bGFY5e7.png
I am pretty new at using MATLAB, but I've already spent a good deal of time on this, and I think I'm missing something somewhere, because I have the feeling that this should run much faster.
I'm concerned about the lines the profiler shows in red, of course... let's start with these:
time calls line code
37.59 19932184 54 radselec = fix(rand(1)*nr) + 1;
4.54 19932184 55 nm = nm - 1;
45.35 19932184 56 Rad2(radselec) = Rad2(radselec) + 1;
I have a very large vector (Rad2) which holds positive integer values; initially they are all zero, but the vector fills up as the simulation progresses.
Line 54 picks a random element of that vector. Every time I add a value to that vector I also increment the variable nr, so nr is basically numel(Rad2), and fix(rand(1)*nr) + 1 picks a random integer between 1 and nr.
Question 1: Is there a better way of doing this? rand(1) alone seems to take a long time, as you can see from line 26:
31.50 20540616 26 r = rand(1);
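One idea I've been wondering about (a minimal, untested sketch; the pool size is arbitrary) is to draw the random numbers in one vectorized call and consume them from a pool, so the per-call overhead of rand(1) is paid once per batch:

poolSize = 1e6;
randPool = rand(1, poolSize);   % one vectorized call instead of millions of rand(1)
poolIdx = 0;
for step = 1:2e7
    poolIdx = poolIdx + 1;
    if poolIdx > poolSize       % refill the pool when it runs dry
        randPool = rand(1, poolSize);
        poolIdx = 1;
    end
    r = randPool(poolIdx);      % stands in for r = rand(1)
end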
Question 2: line 56 also caught my attention... once I have a value for radselec, I need to add 1 to the value of Rad2(radselec).
I thought that Rad2(radselec) = Rad2(radselec) + 1; would be just as fast as nm = nm - 1 (or + 1, for that matter)... but the profiler shows that adding 1 to an element of a vector is 10 times slower.
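If the indexing itself is the bottleneck, one thing I might try (again an untested sketch, and it assumes nr stays fixed over a chunk of steps, which it doesn't in my full simulation, so it could only be applied per chunk) is to tally a whole batch of increments at once with accumarray:

nr = 1000;                                   % example size (placeholder)
Rad2 = zeros(1, nr);
nSteps = 1e6;
picks = fix(rand(1, nSteps) * nr) + 1;       % a whole batch of random indices
counts = accumarray(picks(:), 1, [nr, 1]);   % how often each index was picked
Rad2 = Rad2 + counts.';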
Question 3:
31.50 20540616 26 r = rand(1);
27
22.72 20540616 28 if r > R1/Rt
3.39 20220062 29 reacselec = 2;
10.80 20220062 30 if r > (R1+R2)/Rt
rand(1) seems to be slow as it is... by definition I need that random number between 0 and 1. So I can't think of another way of speeding that line up.
Now... how come line 28 is twice as slow as line 30??? I mean... they are practically the same line with the same calculation... if anything, line 30 should be slightly slower for having R1+R2 in the numerator instead of just R1.
What's happening there?
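For what it's worth, I could hoist both divisions so each branch is a plain comparison (a sketch of the same branch structure; R1, R2 and Rt are the rate variables from my code, and the values below are placeholders):

R1 = 0.2; R2 = 0.5; Rt = 1.0;   % placeholder rates
r = rand(1);
reacselec = 1;
t1 = R1 / Rt;                   % hoist the divisions out of the comparisons
t2 = (R1 + R2) / Rt;
if r > t1
    reacselec = 2;
    if r > t2
        reacselec = 3;          % inner branch guessed from the pattern
    end
end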
And finally,
24.26 20540616 79 end
why is that end statement taking so much time? How can I fix that?
Thank you for your time, and sorry if these questions are too basic. I just started programming a few months ago, and I do not have a computer science background. I'm thinking of taking some courses, but that's not a priority.
Any help will be much appreciated.

Related

Wanted: an algorithm or SAS function for local high density of ordered binary (many 1's in a row)

Goal: Given a series of binary digits, find where there's a long run of 1's (even if there are a few 0's mixed in).
Background: I'm programming a Monte Carlo simulation in SAS, and have (say) 100k variables, each 0 or 1. I want to see if there is a cluster somewhere that is pretty dense. I don't think 5 ones in a row would be sufficiently dense (00011111010...), but maybe 100 ones in a row (01111...11111) would be great. So, I guess I want a localized cluster.
The variables are coded so that n1, n2, etc. are each either 0 or 1:
array var_of_binary{*} n1 - n100000;
Am I unknowingly asking for a solution to CNF-SAT, which is "a classic problem that is known to be NP-complete"? (PS: I don't understand what that is, but I know it's considered intractable.)
I think making multiple passes computing density of length 21, 22, 23, ..., 1000 would work (this is pseudo-code, which I did not try to run):
static_max_of_1k = 1000; /* check window lengths up to 1k, perhaps less */
do i = 100 to static_max_of_1k;
    do j = 1 to 100000 - i;
        s = 0;
        do k = j to j + i - 1;   /* sum(of ...) cannot take an array slice */
            s = s + var_of_binary{k};
        end;
        density1 = s / i;
        /* save value of density1, probably in an array */
    end;
end;
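A cheaper variant of the same idea (also untested) would slide the window and update the sum incrementally, instead of re-adding the whole window at every position:

i = 100;                        /* fixed window length for one pass */
s = 0;
do k = 1 to i;                  /* sum over the first window */
    s = s + var_of_binary{k};
end;
do j = 1 to 100000 - i;
    density1 = s / i;
    /* save density1 here */
    s = s + var_of_binary{j + i} - var_of_binary{j};   /* slide by one */
end;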
Note 1: I don't want a C++ solution (unless it works immediately in SAS as a subroutine without alteration).
Note 2: I don't want recursive code, since if it blows up, I wouldn't have a clue how to debug it. (I know my limitations.)
Note 3: I guess I'm after a 1-dimensional variation of "Detect high density pixel areas in a binary image", which is sort of cool (and has a nice photo), but (again) beyond me. I appreciated from afar the metacode of the SimpleBlobDetector Class Reference. I think I'm in over my head.
Maybe this will help. Generate data and then determine the runs and their sizes. _FREQ_ is the size of the run, J is the ID within REP, and OBS is the ID of the first observation in the run.
data simulate;
    do rep = 1 to 1e1;
        do j = 1 to 1e1;
            y = rand('BINOMIAL', .5, 1);
            output;
        end;
    end;
run;
proc summary data=simulate;
    by rep y notsorted;
    output out=runs(drop=_type_) idgroup(obs out[1](j)=);
run;
proc print;
run;
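From there, picking out the dense stretches of 1's is a simple filter (a sketch on top of the RUNS data set above; the minimum length of 5 is just a placeholder):

data long_runs;
    set runs;
    if y = 1 and _freq_ >= 5;   /* keep only sufficiently long runs of 1's */
run;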

algorithm to find blocks of trends

Let's say I have 24 lines of data, such that each line represents an hour in the day. What I want to achieve is to implement an algorithm that can detect trends in the data and divide it into 2 blocks - a "good" block and a "bad" block. For example, in the attached image you can see that a good block begins at line 6 and ends at line 19. Line 0 also has a good score, but it is not part of a block, so the algorithm should know how to handle this situation.
I think it's about clustering but couldn't find something simple enough that fits our needs.
Looking forward to any advice.
start = -1
Append a below-threshold value to the end of the data array x[]
For i from 1 to n+1:                # n+1 so the sentinel flushes a trailing run
    If x[i] >= thresholdValue:
        If start == -1:
            start = i
    Else:
        If start != -1 and i - start >= thresholdLength:
            ReportGoodBlock(start, i-1)
        start = -1
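For concreteness, here is the same logic as runnable Python (a sketch; the function name, thresholds, and example scores are all made up):

def report_good_blocks(x, threshold_value, threshold_length):
    """Yield (start, end) index pairs of sufficiently long above-threshold runs."""
    x = list(x) + [threshold_value - 1]  # sentinel flushes a trailing run
    start = -1
    for i, v in enumerate(x):
        if v >= threshold_value:
            if start == -1:
                start = i
        else:
            if start != -1 and i - start >= threshold_length:
                yield (start, i - 1)
            start = -1

# Example: hour 0 scores well but is isolated; hours 6-19 form a good block.
scores = [8, 2, 1, 1, 1, 1, 9, 9, 8, 9, 9, 9, 9, 8, 9, 9, 9, 8, 9, 9, 2, 1, 1, 1]
print(list(report_good_blocks(scores, 7, 5)))   # [(6, 19)]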

Subtract a number's digits from the number until it reaches 0

Can anyone help me with an algorithm for this problem?
We have a big number (19 digits) and, in a loop, we subtract one of the digits of that number from the number itself.
We continue to do this until the number reaches zero. We want to calculate the minimum number of subtractions that makes a given number reach zero.
The algorithm must respond fast, for a 19-digit number (up to 10^19), within two seconds. As an example, an input of 36 gives 7:
1. 36 - 6 = 30
2. 30 - 3 = 27
3. 27 - 7 = 20
4. 20 - 2 = 18
5. 18 - 8 = 10
6. 10 - 1 = 9
7. 9 - 9 = 0
Thank you.
The requirement for the minimum number of subtractions makes this, I suspect, a very thorny problem, one that will require a great deal of backtracking over potential solutions, making it possibly too expensive for your time limitations.
But the first thing you should do is a sanity check. Since the largest digit is a 9, a 19-digit number will require about 10^18 subtractions to reach zero. Code up a simple program to continuously subtract 9 from 10^19 until it becomes less than ten. If you can't do that within the two seconds, you're in trouble.
By way of example, the following program (a):
#include <stdio.h>
#include <stdlib.h>   /* for strtoull */

int main (int argc, char *argv[]) {
    unsigned long long x = strtoull(argv[1], NULL, 10);
    x /= 1000000000;          /* scale down by 10^9 to keep the test tractable */
    while (x > 9)
        x -= 9;
    return x;
}
when run with the argument 10000000000000000000 (10^19), takes a second and a half of clock time (and CPU time, since it's all calculation) even at gcc's insane optimisation level of -O3:
real 0m1.531s
user 0m1.528s
sys 0m0.000s
And that's with the one-billion divisor just before the while loop, meaning the full number of iterations would take about 48 years.
So a brute force method isn't going to help here; what you need is some serious mathematical analysis, which probably means you should post a similar question over at https://math.stackexchange.com/ and let the math geniuses have a shot.
(a) If you're wondering why I'm getting the value from the user rather than using a constant of 10000000000000000000ULL, it's to prevent gcc from calculating it at compile time and turning it into something like:
mov $1, %eax
Ditto for the return x, which prevents it noticing that I don't use the final value of x and hence optimising the loop out of existence altogether.
I don't have a solution that can solve 19 digit numbers in 2 seconds. Not even close. But I did implement a couple of algorithms (including a dynamic programming algorithm that solves for the optimum), and gained some insight that I believe is interesting.
Greedy Algorithm
As a baseline, I implemented a greedy algorithm that simply picks the largest digit in each step:
#include <stdint.h>

uint64_t countGreedy(uint64_t inputVal) {
    uint64_t remVal = inputVal;
    uint64_t nStep = 0;
    while (remVal > 0) {
        /* find the largest digit of the remaining value */
        uint64_t digitVal = remVal;
        uint_fast8_t maxDigit = 0;
        while (digitVal > 0) {
            uint64_t nextDigitVal = digitVal / 10;
            uint_fast8_t digit = digitVal - nextDigitVal * 10;
            if (digit > maxDigit) {
                maxDigit = digit;
            }
            digitVal = nextDigitVal;
        }
        remVal -= maxDigit;   /* greedy: always subtract the largest digit */
        ++nStep;
    }
    return nStep;
}
Dynamic Programming Algorithm
The idea for this is that we can calculate the optimum incrementally. For a given value, we pick a digit, which adds one step to the optimum number of steps for the value with the digit subtracted.
With the target function (optimum number of steps) for a given value named optSteps(val), and the digits of the value named d_i, the following relationship holds:
optSteps(val) = 1 + min(optSteps(val - d_i))
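For example, the digits of 36 are 3 and 6, so optSteps(36) = 1 + min(optSteps(33), optSteps(30)).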
This can be implemented with a dynamic programming algorithm. Since d_i is at most 9, we only need the previous 9 values to build on. In my implementation, I keep a circular buffer of 10 values:
#include <stdbool.h>
#include <stdint.h>

static uint64_t countDynamic(uint64_t inputVal) {
    /* circular buffer of optimum step counts for the last 10 values,
       seeded with the answers for values 1..9 (one step each); the slot
       for value 0 is overwritten at val = 10 before it is ever read */
    uint64_t minSteps[10] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
    uint_fast8_t digit0 = 0;
    for (uint64_t val = 10; val <= inputVal; ++val) {
        digit0 = val % 10;
        uint64_t digitVal = val;
        uint64_t minPrevStep = 0;
        bool prevStepSet = false;
        while (digitVal > 0) {
            uint64_t nextDigitVal = digitVal / 10;
            uint_fast8_t digit = digitVal - nextDigitVal * 10;
            if (digit > 0) {
                /* look up optSteps(val - digit) in the circular buffer */
                uint64_t prevStep = 0;
                if (digit > digit0) {
                    prevStep = minSteps[10 + digit0 - digit];
                } else {
                    prevStep = minSteps[digit0 - digit];
                }
                if (!prevStepSet || prevStep < minPrevStep) {
                    minPrevStep = prevStep;
                    prevStepSet = true;
                }
            }
            digitVal = nextDigitVal;
        }
        minSteps[digit0] = minPrevStep + 1;   /* optSteps(val) */
    }
    return minSteps[digit0];
}
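A minimal harness to run the two against each other might look like this (my sketch, not part of the measured code; it assumes both functions above are in the same file):

#include <inttypes.h>
#include <stdio.h>

int main(void) {
    uint64_t v = 1000000;   /* arbitrary test value */
    printf("greedy:  %" PRIu64 "\n", countGreedy(v));
    printf("dynamic: %" PRIu64 "\n", countDynamic(v));
    return 0;
}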
Comparison of Results
This may be considered a surprise: I ran both algorithms on all values up to 1,000,000. The results are absolutely identical. This suggests that the greedy algorithm actually calculates the optimum.
I don't have a formal proof that this is indeed true for all possible values, but it intuitively kind of makes sense to me. If in any given step you choose a smaller digit than the maximum, you sacrifice some immediate progress in the hope of reaching a more favorable situation that lets you catch up and pass the greedy approach. But in all the scenarios I thought about, the situation after taking a sub-optimal step just does not get significantly more favorable. It might make the next step bigger, but that is at most enough to get even again.
Complexity
While both algorithms look linear in the size of the value, they also loop over all digits in the value. Since the number of digits corresponds to log(n), I believe the complexity is O(n * log(n)).
I think it's possible to make it linear by keeping counts of the frequency of each digit, and modifying them incrementally. But I doubt it would actually be faster. It requires more logic, and turns a loop over all digits in the value (which is in the range of 2-19 for the values we are looking at) into a fixed loop over 10 possible digits.
Runtimes
Not surprisingly, the greedy algorithm is faster to calculate a single value. For example, for value 1,000,000,000, the runtimes on my MacBook Pro are:
greedy: 3 seconds
dynamic: 36 seconds
On the other hand, the dynamic programming approach is obviously much faster at calculating all the values, since its incremental approach needs them as intermediate results anyway. For calculating all values from 10 to 1,000,000:
greedy: 19 minutes
dynamic: 0.03 seconds
As already shown in the runtimes above, the greedy algorithm only gets to about 9-digit input values within the targeted runtime of 2 seconds. The implementations aren't really tuned, and it's certainly possible to squeeze out some more time, but the improvements would be fractional.
Ideas
As already explored in another answer, there's no chance of getting the result for 19 digit numbers in 2 seconds by subtracting digits one by one. Since we subtract at most 9 in each step, completing this for a value of 10^19 needs more than 10^18 steps. We mostly use computers that perform in the rough range of 10^9 operations/second, which suggests that it would take about 10^9 seconds.
Therefore, we need something that can take shortcuts. I can think of scenarios where that's possible, but haven't been able to generalize it to a full strategy so far.
For example, if your current value is 9999, you know that you can keep subtracting 9 all the way down to 9000 (and once more at 9000 itself, since a 9 is still present). So you can calculate in a few operations that you will make 112 consecutive steps ((9999 - 9000) / 9 + 1) where you subtract 9.
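Encoded directly (a trivial sketch of just that computation; the names are mine):

#include <stdint.h>

/* number of consecutive subtract-9 steps starting at `from`, where
   `floor9` is the last value on the way down that still contains a 9 */
uint64_t batchedNines(uint64_t from, uint64_t floor9) {
    return (from - floor9) / 9 + 1;   /* batchedNines(9999, 9000) == 112 */
}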
As said in comments already, and agreeing with @paxdiablo's other answer, I'm not sure there is an algorithm to find the ideal solution without some backtracking; and the size of the number and the time constraint might be tough as well.
A general consideration though: You might want to find a way to decide between always subtracting the highest digit (which will decrease your current number by the largest possible amount, obviously), and by looking at your current digits and subtracting which of those will give you the largest “new” digit.
Say, your current number only consists of digits between 0 and 5 – then you might be tempted to subtract the 5 to decrease your number by the highest possible value, and continue with the next step. If the last digit of your current number is 3 however, then you might want to subtract 4 instead – since that will give you 9 as new digit at the end of the number, instead of “only” 8 you would be getting if you subtracted 5.
Whereas if you already have a 2 and two 9s among your digits, and the last digit is a 1 – then you might want to subtract the 9 anyway, since you will be left with the second 9 in the result (at least in most cases; in some edge cases it might get obliterated from the result as well). Subtracting the 2 instead would not have the advantage of giving you a "high" 9 that you would otherwise not have in the next step, and would have the disadvantage of not lowering your number by as much as subtracting the 9 would …
But every digit you subtract will not only affect the next step directly, but the following steps indirectly – so again, I doubt there is a way to always choose the ideal digit for the current step without any backtracking or similar measures.

problem with power.roc.test in R

I am analysing several different ROC analyses in my article, and therefore I am investigating whether my sample size is appropriate. I have created a data frame which contains all combinations of possible sample sizes for the ROC analysis.
str(auc)
'data.frame': 93 obs. of 2 variables:
$ cases : int 10 11 12 13 14 15 16 17 18 19 ...
$ controls: int 102 101 100 99 98 97 96 95 94 93 ...
My aim is to create a line plot of cases/controls (i.e. kappa) versus optimal AUC.
Hence I would like to create a third variable, using power.roc.test to calculate the optimal AUC.
I ran into the problem below; where does the problem lie?
auc$auc<-power.roc.test(sig.level=.05,power=.8,ncases=auc$cases,ncontrols=auc$controls)$auc
Error in value[[3L]](cond) : AUC could not be solved:
Error in uniroot(power.roc.test.optimize.auc.function, interval = c(0.5, : invalid function value in 'zeroin'
In addition: Warning messages:
1: In if (is.na(f.lower)) stop("f.lower = f(lower) is NA") :
the condition has length > 1 and only the first element will be used
2: In if (is.na(f.upper)) stop("f.upper = f(upper) is NA") :
the condition has length > 1 and only the first element will be used
3: In if (f.lower * f.upper > 0) stop("f() values at end points not of opposite sign") :
the condition has length > 1 and only the first element will be used
I believe you are using the pROC package. The error message is not especially helpful here, but you basically need to pass scalar values, including to ncases and ncontrols.
power.roc.test(sig.level=.05,power=.8,ncases=10, ncontrols=102)
You can wrap that in some apply loop:
auc$auc <- apply(auc, 1, function(line) {
    power.roc.test(sig.level=.05, power=.8, ncases=line[["cases"]], ncontrols=line[["controls"]])$auc
})
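Equivalently, mapply iterates over the two columns in parallel (a sketch under the same assumptions about the column names):

auc$auc <- mapply(function(nca, nco) {
    power.roc.test(sig.level=.05, power=.8, ncases=nca, ncontrols=nco)$auc
}, auc$cases, auc$controls)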
Then you will be able to plot this however you want:
plot(auc$cases / auc$controls, auc$auc, type = "l")
Note that the AUC here is not the "optimal AUC", but the AUC at which you can expect the given power at the given significance level with the given sample size for a test of the significance of the AUC (H0: AUC = 0.5). Note that you won't be able to perform this test with pROC anyway.

What's the reasoning behind odd integers rounding down when divided by 2?

Wouldn't it be better to see an error when dividing an odd integer by 2 than an incorrect calculation?
Example in Ruby (I'm guessing it's the same in other languages, because ints and floats are common datatypes):
39 / 2 => 19
I get that the output isn't 19.5 because we're asking for the value of an integer divided by an integer, not a float (39.0) divided by an integer. My question is, if the limits of these datatypes inhibit it from calculating the correct value, why output the least correct value?
Correct = 19.5
Correct-ish = 20 (rounded up)
Least correct = 19
Wouldn't it be better to see an error?
Throwing an error would usually be extremely counter-productive, and computationally inefficient in most languages.
And consider that this is often useful behaviour:
total_minutes = 563
hours = total_minutes / 60    # => 9
minutes = total_minutes % 60  # => 23
Correct = 19.5
Correct-ish = 20 (rounded up)
Least correct = 19
Who said that 20 is more correct than 19?
Among other reasons: to keep the following very useful relationship between the sibling operators of division and modulo.
Quotient: a / b = Q
Remainder: a % b = R
Awesome relationship: a = b*Q + R.
Also so that integer division by two returns the same result as a right shift by one bit and lots of other nice relationships.
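You can check both relationships directly in Ruby (a quick sketch):

a, b = 39, 2
puts(a == b * (a / b) + a % b)   # => true: 2 * 19 + 1 == 39
puts((a / b) == (a >> 1))        # => true: both are 19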
But the secret, main reason is that C did it this way, and you simply don't argue with C!
If you divide by 2.0 instead, you get the exact result in Ruby:
39 / 2.0 => 19.5
