Attempt to "go back" without goto statement - performance

The code examples are gonna be in Lua, but the question is rather general - it's just an example.
for k=0,100 do
    ::again::
    local X = math.random(100)
    if X <= 30
    then
        -- do something
    else
        goto again
    end
end
This code generates 101 pseudorandom numbers between 1 and 30 (the loop runs from 0 to 100 inclusive). Each number is drawn from 1-100, but the loop is not allowed to go on while the drawn number is larger than 30.
I am trying to do the same task without a goto statement.
for k=0,100 do
    local X = 100 -- may be put behind "for", in some cases, the matter is that we need an 'X' variable
    while X >= 30 do --IMPORTANT! it's the opposite operation of the "if" condition above!
        X = math.random(100)
    end
    -- do the same "something" as in the condition above
end
Instead, this program keeps generating random numbers until I get a desired value. In general, I put everything that was between the start of the main loop and the condition in the first example into the while loop.
In theory, it does the same as the first example, only without gotos. However, I'm not sure about that.
Main question: are these two programs equivalent? Do they do the same thing? If yes, which is faster (= better optimized)? If not, what's the difference?

It is bad practice to use Goto. Please see http://xkcd.com/292/
Anyway, I'm not much into Lua, but this looks simple enough:
For your first code: you start a loop that repeats 101 times (k runs from 0 to 100 inclusive). In the loop you draw a random integer between 1 and 100. If this number is less than or equal to 30, you do something with it. If it is greater than 30, you throw it away and draw another one. This continues until you have 101 random numbers, ALL of which are less than or equal to thirty.
The second code says: start a loop from 0 to 100. Set X to 100. Then start another loop with this condition: as long as X is greater than or equal to 30, keep randomizing X. Only when X is less than 30 does the inner loop exit and the action run. When the action has been performed 101 times, the program ends.
Both codes do essentially the same thing. The only behavioral difference is that the second one also rejects 30 itself, because the exact opposite of X <= 30 is X > 30, not X >= 30. The first one uses a goto, which is bad practice regardless of efficiency.
The second code uses loops, but is still not efficient: there are two levels of loops, and the inner one depends on pseudo-random generation, which can be extremely inefficient (what if the generator only produces numbers between 30 and 100 for a trillion iterations?). Then things get very slow. But this is also true of your first piece of code: it has a 'loop' that depends on pseudo-random number generation too.
TL;DR: strictly speaking about efficiency, I do not see either of them being more efficient than the other. I could be wrong, but it seems the same thing is going on in both.
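To make the comparison concrete, here is a minimal Python sketch (Python rather than Lua, purely for illustration) of the two control flows. The function names `draw_with_retry` and `draw_with_while` are made up for this example, and `rng.randint(1, 100)` stands in for `math.random(100)`. With the loop condition written as the exact negation of the acceptance test, both versions consume the same draws and return the same values; each accepted value costs about 100/30, roughly 3.3, draws on average.

```python
import random

def draw_with_retry(rng, limit=30, upper=100):
    """Mirror of the goto version: redraw until the value is accepted."""
    while True:
        x = rng.randint(1, upper)   # 1..upper inclusive, like math.random(100)
        if x <= limit:
            return x                # accepted: "do something" would go here
        # rejected: loop around and draw again (the "goto again" branch)

def draw_with_while(rng, limit=30, upper=100):
    """Mirror of the while version, using the exact negation of x <= limit."""
    x = upper                       # any starting value that fails the test
    while x > limit:                # note: > rather than >=
        x = rng.randint(1, upper)
    return x

# Two generators with the same seed produce the same stream of raw draws.
rng_a = random.Random(42)
rng_b = random.Random(42)
a = [draw_with_retry(rng_a) for _ in range(101)]
b = [draw_with_while(rng_b) for _ in range(101)]
print(a == b)
```

If the condition is written as `x >= limit` instead, the value 30 is never accepted and the two sequences can diverge.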

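You can directly use math.random(lower, upper):
for k=0,100 do
    local X = math.random(1, 30) -- 1..30 is the range the filtered math.random(100) can produce
end
This is even faster, since no draws are thrown away.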

As far as I can see, these pieces of code do the same thing, but using goto is rarely the best choice (in any programming language). For Lua, see details here

Related

Give an example of algorithm that is output sensitive, but not input sensitive

Could this be an example?
1. x = randomNumber()
2. If x < 100, add x to the list and go to step 1.
3. Print list
Wikipedia says: "An output-sensitive algorithm is an algorithm whose running time depends on the size of the output." So, that's pretty easy:
Choose a random number in some range, say 10-1000.
Output that many dots.
The more dots, the longer it takes for your program to run. To make it more dramatic, you can pause for a second between dots.
Your example works too, since the longer the program runs (when the numbers happen to be small), the larger the output will be.
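The dots idea can be sketched in a few lines of Python. This is only an illustration: the helper name `emit_dots` and the step counter are assumptions added here to make the amount of work visible. The input is a single number, yet the work done grows with the number of dots produced.

```python
import random

def emit_dots(count):
    """Build a string of `count` dots one at a time, counting the steps taken."""
    steps = 0
    pieces = []
    for _ in range(count):
        pieces.append(".")
        steps += 1          # one unit of work per dot of output
    return "".join(pieces), steps

n = random.randint(10, 1000)    # random output size in the range 10-1000
text, steps = emit_dots(n)
print(len(text), steps)         # both equal n: work is proportional to output size
```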

Random number generation from 1 to 7

I was going through Google interview questions and found one that asks you to implement random number generation from 1 to 7.
I wrote some simple code, and I would like to understand: if this question were asked in an interview and I wrote the code below, would it be acceptable or not?
import time
def generate_rand():
    ret = str(time.time()) # time in second like, 12345.1234
    ret = int(ret[-1])
    if ret == 0 or ret == 1:
        return 1
    elif ret > 7:
        ret = ret - 7
        return ret
    return ret
while 1:
    print(generate_rand())
    time.sleep(1) # Just to see the output in the STDOUT
(Since the question seems to ask for analysis of issues in the code and not a solution, I am not providing one. )
The answer is unacceptable because:
1. You need to wait for a second for each random number. Many applications need a few hundred at a time. (If the sleep is just for convenience, note that even microsecond granularity will not yield truly random numbers, as the last microsecond digit increases monotonically until 10us are reached. You may get more than a few calls done in a span of 10us, and the result will be a run of monotonically increasing pseudo-random numbers.)
2. Random numbers should have a uniform distribution: each element should have the same probability in theory. In this case, the result is skewed towards 1 (returned for the digits 0, 1 and 8, so three times the probability) and towards 2 (returned for the digits 2 and 9, so twice the probability) compared to the others in the range 3-7.
Typically answers to this sort of question will try to get a large range of numbers and distribute the ranges evenly from 1-7. For example, the above method would have worked fine if you had wanted randomness from 1-5, as 10 is evenly divisible by 5. Note that this will only solve (2) above.
For (1), there are other sources of randomness, such as /dev/random on a Linux OS.
You haven't really specified the constraints of the problem you're trying to solve, but if it's from a collection of interview questions it seems likely that it might be something like this.
In any case, the answer shown would not be acceptable for the following reasons:
The distribution of the results is not uniform, even if the samples you read from time.time() are uniform.
The results from time.time() will probably not be uniform. The result depends on the time at which you make the call, and if your calls are not uniformly distributed in time then the results will probably not be uniformly distributed either. In the worst case, if you're trying to randomise an array on a very fast processor then you might complete the entire operation before the time changes, so the whole array would be filled with the same value. Or at least large chunks of it would be.
The changes to the random value are highly predictable and can be inferred from the speed at which your program runs. In the very-fast-computer case you'll get a bunch of x followed by a bunch of x+1, but even if the computer is much slower or the clock is more precise, you're likely to get aliasing patterns which behave in a similarly predictable way.
Since you take the time value in decimal, it's likely that the least significant digit doesn't visit all possible values uniformly. It's most likely a conversion from binary to some arbitrary number of decimal digits, and the distribution of the least significant digit can be quite uneven when that happens.
The code should be much simpler. It's a complicated solution with many special cases, which reflects a piecemeal approach to the problem rather than an understanding of the relevant principles. An ideal solution would make the behaviour self-evident without having to consider each case individually.
The last one would probably end the interview, I'm afraid. Perhaps not if you could tell a good story about how you got there.
You need to understand the pigeonhole principle to begin to develop a solution. It looks like you're reducing the time to its least significant decimal digit for possible values 0 to 9. Legal results are 1 to 7. If you have seven pigeonholes and ten pigeons then you can start by putting your first seven pigeons into one hole each, but then you have three pigeons left. There's nowhere that you can put the remaining three pigeons (provided you only use whole pigeons) such that every hole has the same number of pigeons.
The problem is that if you pick a pigeon at random and ask what hole it's in, the answer is more likely to be a hole with two pigeons than a hole with one. This is what's called "non-uniform", and it causes all sorts of problems, depending on what you need your random numbers for.
You would either need to figure out how to ensure that all holes are filled equally, or you would have to come up with an explanation for why it doesn't matter.
Typically the "doesn't matter" answer is that each hole has either a million or a million and one pigeons in it, and for the scale of problem you're working with the bias would be undetectable.
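One common way to "fill all holes equally" is rejection sampling: discard the three surplus pigeons and draw again. The sketch below is only an illustration of that idea, not the expected interview answer. The name `uniform_1_to_7` is made up, and `next_digit` is a hypothetical callable assumed to return uniformly distributed digits 0-9 (here backed by Python's `random` module rather than the clock).

```python
import random
from collections import Counter

def uniform_1_to_7(next_digit):
    """Map uniform digits 0..9 to uniform values 1..7 by rejecting 7, 8 and 9."""
    while True:
        d = next_digit()
        if d < 7:            # seven accepted digits, one per pigeonhole
            return d + 1     # shift 0..6 to 1..7
        # digits 7, 8, 9 are thrown away and a new digit is drawn

rng = random.Random(0)                      # seeded only to make runs repeatable
next_digit = lambda: rng.randrange(10)      # assumed uniform source of digits
counts = Counter(uniform_1_to_7(next_digit) for _ in range(70000))
print(sorted(counts.items()))               # each value should appear roughly 10000 times
```

The price is a variable number of draws per result: on average 10/7 digits are consumed for each value returned.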
Using the same general architecture you've created, I would do something like this:
import time
def generate_rand():
    ret = int(time.time() * 1000) # current time in whole milliseconds (str % 8 would raise a TypeError)
    ret = ret % 8 # will return pseudorandom numbers 0-7
    if ret == 0:
        return 1 # makes 1 twice as likely; or you could also return the result of another call to generate_rand()
    return ret
while 1:
    print(generate_rand())
    time.sleep(1)

Faster way of testing a condition in MATLAB

I need to run many, many tests of the form v<0, where v is a vector (a relatively short one). I am currently doing it with
all(v<0)
Is there a faster way?
Not sure which one will be faster (that may depend on the machine and Matlab version), but here are some alternatives to all(v<0):
~any(v>=0)
nnz(v>=0)==0 %// Or ~nnz(v>=0)
sum(v>=0)==0 %// Or ~sum(v>=0)
isempty(find(v>=0, 1)) %// Or isempty(find(v>=0))
I think the issue is that the conditional is executed on all elements of the array first, then the condition is tested... That is, for the test "any(v<0)", matlab does the following I believe:
Step 1: compute v<0 for every element of v
Step 2: search through the results of step 1 for a true value
So even if the first element of v is less than zero, the conditional was first computed for all elements, hence wasting a lot of time. I think this is also true for any of the alternative solutions offered above.
I don't know of a faster way to do it easily, but wish I did. In some cases, breaking the array v up into smaller chunks and testing incrementally could speed things up, particularly if the condition is common. For example:
function result = anyLessThanZero(v)
    w = v(:);
    result = true;
    for i=1:numel(w)
        if ( w(i) < 0 )
            return;
        end
    end
    result = false;
end
but that can be very inefficient if the condition is rare. (If you were to really do this, there is probably a better way than I illustrate above to handle any condition, not just <0, but I show it this way to make it clear).
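The early-exit idea can also be shown for the original all(v<0) test. The following Python sketch (Python is used here only for illustration; the helper name `all_negative` and the `examined` counter are made up) stops at the first element that fails the test and reports how many elements it looked at.

```python
def all_negative(values):
    """Return (result, examined): whether every value is < 0, and how many were checked."""
    examined = 0
    for x in values:
        examined += 1
        if x >= 0:                  # first counterexample: stop immediately
            return False, examined
    return True, examined           # no counterexample found (also true for empty input)

print(all_negative([-1, 2, -3, -4]))   # (False, 2): stops after the second element
print(all_negative([-1, -2, -3]))      # (True, 3): has to look at every element
```

When most vectors pass the test, every element has to be examined anyway, which is why an explicit loop brings little benefit in that case.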

Lua string to number parsing speed optimization

I am trying to make a speedtest using Lua as one of the languages and I just wanted some advice on how I could make my code a bit faster if possible. It is important that I do my own speedtest, since I am looking at very specific parameters.
The code reads from a file which looks something like this, but the numbers are randomly generated and range from 1 to 1 000 000. There are between 100 and 10 000 numbers in one list:
type
(123,124,364,5867,...)
type
(14224,234646,5686,...)
...
The type is meant for another language, so it can be ignored. I just put this here so you know why I am not parsing every line. This is my Lua code:
incr = 1
for line in io.lines(arg[1]) do
    incr = incr + 1
    if incr % 3 == 0 then
        line:gsub('([%d]+),?', function(n) tonumber(n) end)
    end
end
Now, the code works and does exactly what I want it to do. This is not about getting it to work, this is simply about speed. I need ideas and advice to make the code work at optimal speed.
Thanks in advance for any answers.
IMHO, this tonumber() benchmarking is rather strange. Most of the CPU time would be spent on other tasks (pattern matching, file reading, ...).
Instead of converting to a number and ignoring the result, it would be more logical to calculate the sum of all the numbers in the input file:
local gmatch, s = string.gmatch, 0
for line in io.lines(arg[1]) do
    for n in gmatch(line, '%d+') do
        s = s + n -- converting string to number is automatic here
    end
end
print(s)

for loop and if statement positioning efficiency

I'm doing some simple logic with a for loop and an if statement, and I was wondering which of the following two arrangements is better, or whether there is a significant performance difference between the two.
Case 1:
if condition-is-true:
    for loop of length n:
        common code
        do this
else:
    another for loop of length n:
        common code
        do that
Case 2:
for loop of length n:
    common code
    if condition-is-true:
        do this
    else:
        do that
Basically, I have a for loop that needs to be executed slightly differently based on a condition, but there is certain stuff that needs to happen in the for loop no matter what. I would prefer the second one because I don't have to repeat the common code twice, but I'm wondering if case 1 would perform significantly better?
I know that in terms of big-O notation it doesn't really matter, because the if-else statement is constant-time anyway, but I'm wondering whether, realistically, the two cases make a difference on a dataset that is not too big (maybe n = a few thousand).
Thank you!
The first one is the better one, because the condition is checked only once, whereas in the second case it has to be checked on every iteration. The price is that your code gets longer. If code size matters, put the common code into a method and call that method instead of repeating the block of common code.
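As a rough illustration, the two layouts can be written in Python with the common code factored into a function. All names here (`common`, `do_this`, `do_that`, `case1`, `case2`) and the arithmetic inside them are placeholders invented for the sketch; only the loop structure matters.

```python
def common(x):
    return x + 1        # stands in for the shared work done on every item

def do_this(y):
    return y * 2        # stands in for the "condition is true" branch

def do_that(y):
    return y - 3        # stands in for the "condition is false" branch

def case1(items, condition):
    """Condition tested once, outside the loop; the loop is written twice."""
    out = []
    if condition:
        for x in items:
            out.append(do_this(common(x)))
    else:
        for x in items:
            out.append(do_that(common(x)))
    return out

def case2(items, condition):
    """Single loop; the condition is tested on every iteration."""
    out = []
    for x in items:
        y = common(x)
        if condition:
            out.append(do_this(y))
        else:
            out.append(do_that(y))
    return out

data = list(range(5))
print(case1(data, True) == case2(data, True))     # same results either way
print(case1(data, False) == case2(data, False))
```

Both functions return identical results, so the choice is between one extra boolean test per iteration and a duplicated loop. For n in the low thousands, measuring with a profiler or `timeit` is the safest way to see whether the difference is noticeable at all.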

Resources