Give an example of an algorithm that is output-sensitive but not input-sensitive

Could this be an example?
1. x = randomNumber()
2. If x < 100, add x to the list and go to step 1.
3. Print list

Wikipedia says: "An output-sensitive algorithm is an algorithm whose running time depends on the size of the output." So, that's pretty easy:
Choose a random number in some range, say 10-1000.
Output that many dots.
The more dots, the longer it takes for your program to run. To make it more dramatic, you can pause for a second between dots.
Your example works too, since the longer the program runs (when the numbers happen to be small), the larger the output will be.
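A minimal Python sketch of the dots idea. The function name emit_dots and the use of random.randint are illustrative choices, not part of the answer above. The loop body runs once per dot, so the running time grows with the size of the output, while the program reads no input at all.

```python
import random


def emit_dots(count):
    """Build the output one dot at a time; the work is proportional to count."""
    dots = []
    for _ in range(count):
        dots.append(".")
    return "".join(dots)


if __name__ == "__main__":
    # The output size is chosen at run time; nothing is supplied as input.
    n = random.randint(10, 1000)
    print(emit_dots(n))
```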

Related

Random number generation from 1 to 7

I was going through Google interview questions, one of which is to implement random number generation from 1 to 7.
I wrote some simple code. If this question were asked in an interview and I wrote the code below, would it be acceptable or not?
import time
def generate_rand():
    ret = str(time.time())  # time in seconds, like 12345.1234
    ret = int(ret[-1])
    if ret == 0 or ret == 1:
        return 1
    elif ret > 7:
        ret = ret - 7
        return ret
    return ret
while 1:
    print(generate_rand())
    time.sleep(1)  # Just to see the output in the STDOUT
(Since the question seems to ask for an analysis of the issues in the code and not for a solution, I am not providing one.)
The answer is unacceptable because:
1. You need to wait a second for each random number. Many applications need a few hundred at a time. (If the sleep is just for convenience, note that even microsecond granularity will not yield truly random numbers: the last digit of the clock increases monotonically until it wraps around after 10 µs. You may well get more than a few calls done within 10 µs, and those calls will return a monotonically increasing run of pseudo-random numbers.)
2. Random numbers should have a uniform distribution: in theory, each element has the same probability. Here the results are skewed: 1 is three times as likely as it should be (digits 0, 1 and 8 all map to it) and 2 is twice as likely (digits 2 and 9), compared with 3 to 7, which each come from a single digit.
Typically, answers to this sort of question take a large range of numbers and divide it evenly among 1-7. For example, the above method would have worked fine if you had wanted randomness from 1-5, since 10 is evenly divisible by 5. Note that this only solves (2) above.
For (1), there are other sources of randomness, such as /dev/random on Linux.
You haven't really specified the constraints of the problem you're trying to solve, but if it's from a collection of interview questions it seems likely that it might be something like this.
In any case, the answer shown would not be acceptable for the following reasons:
The distribution of the results is not uniform, even if the samples you read from time.time() are uniform.
The results from time.time() will probably not be uniform. The result depends on the time at which you make the call, and if your calls are not uniformly distributed in time then the results will probably not be uniformly distributed either. In the worst case, if you're trying to randomise an array on a very fast processor then you might complete the entire operation before the time changes, so the whole array would be filled with the same value. Or at least large chunks of it would be.
The changes to the random value are highly predictable and can be inferred from the speed at which your program runs. In the very-fast-computer case you'll get a bunch of x followed by a bunch of x+1, but even if the computer is much slower or the clock is more precise, you're likely to get aliasing patterns which behave in a similarly predictable way.
Since you take the time value in decimal, it's likely that the least significant digit doesn't visit all possible values uniformly. It's most likely a conversion from binary to some arbitrary number of decimal digits, and the distribution of the least significant digit can be quite uneven when that happens.
The code should be much simpler. It's a complicated solution with many special cases, which reflects a piecemeal approach to the problem rather than an understanding of the relevant principles. An ideal solution would make the behaviour self-evident without having to consider each case individually.
The last one would probably end the interview, I'm afraid. Perhaps not if you could tell a good story about how you got there.
You need to understand the pigeonhole principle to begin to develop a solution. It looks like you're reducing the time to its least significant decimal digit for possible values 0 to 9. Legal results are 1 to 7. If you have seven pigeonholes and ten pigeons then you can start by putting your first seven pigeons into one hole each, but then you have three pigeons left. There's nowhere that you can put the remaining three pigeons (provided you only use whole pigeons) such that every hole has the same number of pigeons.
The problem is that if you pick a pigeon at random and ask what hole it's in, the answer is more likely to be a hole with two pigeons than a hole with one. This is what's called "non-uniform", and it causes all sorts of problems, depending on what you need your random numbers for.
You would either need to figure out how to ensure that all holes are filled equally, or you would have to come up with an explanation for why it doesn't matter.
Typically the "doesn't matter" answer is that each hole has either a million or a million and one pigeons in it, and for the scale of problem you're working with the bias would be undetectable.
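To make the pigeon counting concrete, here is a small Python sketch. The helper names map_digit, holes and accepted are made up for illustration. map_digit reproduces the digit-to-result mapping from the question's generate_rand(), and holes counts how many of the ten digits land on each result. The accepted helper shows one common way to fill every hole equally, rejection (discard digits that do not fit and draw again); that technique is my addition, not something the answer above prescribes.

```python
from collections import Counter


def map_digit(digit):
    """The mapping used in the question's generate_rand(), applied to one digit."""
    if digit == 0 or digit == 1:
        return 1
    if digit > 7:
        return digit - 7
    return digit


def holes(mapping, pigeons=range(10)):
    """Count how many of the ten digits (pigeons) land on each result (hole)."""
    return Counter(mapping(d) for d in pigeons)


def accepted(digit):
    """Rejection variant: keep 1-7, return None to signal 'draw again'."""
    return digit if 1 <= digit <= 7 else None


if __name__ == "__main__":
    print(sorted(holes(map_digit).items()))
    kept = [accepted(d) for d in range(10) if accepted(d) is not None]
    print(sorted(Counter(kept).items()))
```

With rejection, every surviving digit maps to exactly one hole, at the cost of sometimes needing more than one draw.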
Using the same general architecture you've created, I would do something like this:
import time
def generate_rand():
    ret = int(time.time() * 10000)  # time as an integer, e.g. 12345.1234 -> 123451234 (a str cannot be reduced with % 8)
    ret = ret % 8  # will return pseudorandom numbers 0-7
    if ret == 0:
        return 1  # or you could also return the result of another call to generate_rand()
    return ret
while 1:
    print(generate_rand())
    time.sleep(1)

Are pseudo random number generators less likely to repeat?

So they say if you flip a coin 50 times and get heads all 50 times, you're still 50/50 the next flip and 1/4 for the next two. Do you think/know if this same principle applies to computer pseudo-random number generators? I theorize they're less likely to repeat the same number for long stretches.
I ran this a few times and the results are believable, but I'm wondering how many times I'd have to run it to get an anomalous output.
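import random
ITERATIONS = 1000000  # assumed value; the original call did not pass one
def genString(iterations):
    mystring = ''
    for _ in range(iterations):
        mystring += str(random.randint(0, 9))
    return mystring
def repeatMax(mystring):
    tempchar = ''
    count = 0  # repeats of the current character beyond its first occurrence
    max_count = 0
    for char in mystring:
        if char == tempchar:
            count += 1
            if count > max_count:
                max_count = count
        else:
            count = 0
            tempchar = char
    return max_count
for _ in range(10):
    stringer = genString(ITERATIONS)
    print(repeatMax(stringer))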
I got all 7's and a couple 6's. If I run this 1000 times, will it approximate a normal distribution or should I expect it to stay relatively predictable? I'm trying to understand the predictability of pseudo random number generation.
Failure to produce specific patterns is a typical weakness of PRNGs, but the probability of hitting a substantial run of repeated digits at random is so small it's hard to demonstrate that weakness.
It's perfectly reasonable for a PRNG to use only a 32-bit state, which (traditionally) means producing a sequence of four billion numbers and then repeating from the start again. In that case your sequence of 50 coin-flips coming out the same is probably never going to happen (four billion tries at something that has a one in a quadrillion chance is unlikely to succeed); but if it does, then it's going to appear way too often.
Superficially you're looking for k-dimensional equidistribution as a test for whether or not you can expect to find a prescribed pattern in the output without deeper analysis of the specific generator. If your generator claims at least 50-dimensional equidistribution then you're guaranteed to see the 50-heads state at least once.
However, if your generator emits 32-bit results but you only test whether each result maps to heads or tails, you have some chance at success even if the generator fails the k-dimension test, and that chance depends on the specifics of the generator and the mapping function.
If you adjust the implementation of your generator to return just one bit at a time, then you have an opportunity to try to squeeze 50 heads out of just 50 bits of state (or potentially as few as 18, but that generator would probably be faulty). Provided the generator visits all 2**50 possible states, one of those states will produce 50 heads in a row. You may get a few more heads when adjacent states start or end with more zeroes.
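A toy illustration of a generator that returns one bit at a time from a tiny state. The 4-bit linear feedback shift register below is my own choice of example, not a generator named in the answer, and it is far too small for real use. It cycles through 15 of its 16 possible states and never enters the all-zero state, so a run of four 1s turns up once per period while a run of four 0s never appears, however long it runs.

```python
def lfsr_bits(count, state=0b0001):
    """Yield `count` output bits from a 4-bit linear feedback shift register."""
    for _ in range(count):
        yield state & 1
        feedback = (state ^ (state >> 1)) & 1
        state = (state >> 1) | (feedback << 3)


def longest_run(bits, value):
    """Length of the longest run of `value` in the bit sequence."""
    best = current = 0
    for bit in bits:
        current = current + 1 if bit == value else 0
        best = max(best, current)
    return best


if __name__ == "__main__":
    stream = list(lfsr_bits(150))  # ten full periods of length 15
    print(longest_run(stream, 1), longest_run(stream, 0))
```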

Attempt to "go back" without goto statement

The code examples are gonna be in Lua, but the question is rather general - it's just an example.
for k=0,100 do
    ::again::
    local X = math.random(100)
    if X <= 30
    then
        -- do something
    else
        goto again
    end
end
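This code generates pseudorandom numbers between 1 and 30, one for each of the 101 passes of the loop (k runs from 0 to 100 inclusive). math.random(100) draws from 1 to 100, but the loop is not allowed to go on while the number is larger than 30.
I am trying to do this task without a goto statement.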
for k=0,100 do
    local X = 100 -- may be put behind "for", in some cases, the matter is that we need an 'X' variable
    while X >= 30 do --IMPORTANT! it's the opposite operation of the "if" condition above!
        X = math.random(100)
    end
    -- do the same "something" as in the condition above
end
Instead, this program keeps generating random numbers until I get a desired value. In general, I put here all the code that was between the main loop and the condition in the first example.
Theoretically, it does the same as the first example, only without gotos. However, I'm not sure about that.
Main question: are these two programs equivalent? Do they do the same thing? If yes, which is faster (= more optimized)? If no, what's the difference?
It is bad practice to use Goto. Please see http://xkcd.com/292/
Anyway, I'm not much into Lua, but this looks simple enough.
For your first code: you start a loop that repeats 101 times (k runs from 0 to 100 inclusive). In the loop you generate a random number between 1 and 100. If this number is less than or equal to 30, you do something with it. If it is greater than 30, you throw it away and get another random number. This continues until you have handled 101 random numbers, ALL of which are less than or equal to thirty.
The second code says: start a loop from 0 to 100 and set X to 100. Then start another loop with this condition: as long as X is greater than or equal to 30, keep randomizing X. Only when X is less than 30 does the inner loop exit and your code perform some action. When it has performed that action once per pass of the outer loop, the program ends. Note that this version also rejects 30 itself, which the first version accepts; the exact opposite of X <= 30 is X > 30.
Apart from that boundary case, both codes do the same thing, but the first one uses a goto, which is bad practice regardless of efficiency.
The second code uses loops, but is still not efficient: there are two levels of loops, and one depends on pseudo-random generation, which can be extremely inefficient (what if the generator produces only numbers between 30 and 100 for a trillion iterations?). Then things get very slow. But this is also true for your first piece of code: it has a 'loop' that depends on pseudo-random number generation.
TL;DR: strictly speaking about efficiency, I do not see one of those being more efficient than the other. I could be wrong, but it seems the same thing is going on.
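You can directly use math.random(lower, upper). Since math.random(100) returns integers from 1 to 100, the values that pass the check are 1 to 30:
for k=0,100 do
    local X = math.random(1, 30)
end
This is even faster.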
As far as I can see, these pieces of code do the same thing, but using goto isn't always the best choice (in any programming language). For Lua, see details here
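For comparison, here is the same pair of approaches sketched in Python, which has no goto at all. The function names draw_with_retry and draw_directly are hypothetical, and random.randint(1, upper) stands in for Lua's math.random(upper). The retry loop uses the exact opposite of the acceptance test, so 30 is kept, and both functions produce values from 1 to 30.

```python
import random


def draw_with_retry(rng, limit=30, upper=100):
    """Redraw until the value passes the check (the goto/while pattern)."""
    while True:
        x = rng.randint(1, upper)  # 1..upper inclusive, like math.random(upper)
        if x <= limit:
            return x


def draw_directly(rng, limit=30):
    """Ask for the target range up front, as in math.random(1, 30)."""
    return rng.randint(1, limit)


if __name__ == "__main__":
    rng = random.Random(0)
    print([draw_with_retry(rng) for _ in range(5)])
    print([draw_directly(rng) for _ in range(5)])
```

The retry version needs on average 100/30 draws per accepted value, which is why asking for the range directly is cheaper.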

Writing a program that writes a program

It's well known in theoretical computer science that the "Hello world tester" problem is undecidable. (Here is a link to what I mean by hello world tester.)
My question is: since, given a program as input, we can't say what the program will do, can we solve the reverse problem?
Given a set of inputs and outputs, is there an algorithm for writing a program that writes a program to achieve a one-to-one mapping between the given inputs and outputs?
I know about metaprogramming, but my question is more of theoretical interest: something that can apply to a generic case.
With these kinds of musings one has to be very careful. A lot of confusion arises from not clearly distinguishing between a particular program x for which proposition P(x) holds and any program x for which proposition P(x) holds. As long as the set of programs for which P(x) holds is finite, there is always a proof of their correctness (although this proof may not be known).
At this point you also have to distinguish between programs that are and can be known, and programs that can only be shown to exist by full enumeration of all possibilities. Let's take an example:
Take 10 programs, which take no input and may or may not terminate and produce "hello World". Then there is a program which decides exactly which of these programs are correct and which aren't. Let's call these programs (x_1,...,x_10). Then take the programs (X_0,...,X_{2^10-1}), where X_i outputs true for program x_j if the j-th bit in the binary representation of i is set. One of these programs has to be the one which decides correctly for all ten x_j; there just might not be any way to ever figure out which one of these 1024 X_i is the correct one (a meta-problem at this point).
This goes to show that when considering finite sets of programs and input/output pairs, one can always resort to full enumeration, and all halting-problem-type paradoxes instantly disappear. In your case the set of generated programs for each input is of size one and the set of input/output pairs is of finite size (because you have to supply it to the meta-program). Hence full enumeration solves your problem very simply, and you can also easily prove both the correctness of the generated program and the correctness of the meta-program.
Note: Since the set of generated programs is infinite, this is one of the few cases where you can prove P(x) for an infinite set of programs (actually you can prove P(x,input,output) for this set). This shows that the set being infinite is only a necessary, not a sufficient, condition for this type of paradox to appear.
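The enumeration argument can be sketched in Python on a smaller scale. The names decider and matching_deciders are invented for this sketch, and the truth list is simply assumed to be known here, which is exactly the part that cannot be computed in general. Among the 2^n candidate deciders, exactly one agrees with the truth on all n programs.

```python
def decider(i):
    """X_i: answers True for program x_j exactly when bit j of i is set."""
    return lambda j: bool((i >> j) & 1)


def matching_deciders(truth):
    """Indices i whose decider X_i agrees with `truth` on every program."""
    n = len(truth)
    return [i for i in range(2 ** n)
            if all(decider(i)(j) == truth[j] for j in range(n))]


if __name__ == "__main__":
    # Hypothetical ground truth for three programs: x_0 and x_2 are correct.
    print(matching_deciders([True, False, True]))
```

The sketch shows existence only: enumerating the candidates is easy, but picking the right one requires the truth values that the enumeration was supposed to supply.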
Your question is ambiguously phrased.
How would you specify "what a program should do"?
Any precise, complete, and machine-readable specification of a program's functionality is already a program.
Thus, the answer to your question is, a compiler.
Now, you're asking how to find a function based on a sample of its input and output.
That is a question about statistical analysis that I cannot answer.
Sounds like you want to generate a state machine that learns by being given an input sequence and then updates itself to produce the appropriate output sequence. Assuming your output sequences are always the same for the same input sequence it should be simple enough to write. If your output is not deterministic, such as changing the output depending on the time of day, then you cannot automatically generate a state machine.
Depends on what you mean by "one to one mapping". (And also, I suppose, "input" and "output".)
My guess is that you're asking whether, given an example of inputs and outputs for a given arbitrary program, can one devise an algorithm to write an equivalent program? If so, the answer is no. Eg, you could have a program with the inputs/outputs of 1/1, 2/2, 3/3, 4/4, and yet, if the next input value was 5, the output would be 3782. There's no way to know, from a given set of results, what the next result might be.
The question is underspecified since you did not say how the input and output are presented. For finite lists, the answer is "yes", as in this Python code:
def f(inputs, outputs):
    print("def g():")
    print("    x = {" + ",".join(repr(x) + ":" + repr(y) for x, y in zip(inputs, outputs)) + "}")
    print("    print(x[input()])")
>>> f(['2','3','4'],['a','b','x'])
def g():
    x = {'2':'a','3':'b','4':'x'}
    print(x[input()])
>>> def g():
...     x = {'2':'a','3':'b','4':'x'}
...     print(x[input()])
...
>>> g()
3
b
For infinite sets, how are you going to present them? If you show only a small sample of the input, this does not specify the whole algorithm. Guessing the best fit is undecidable. If you have a "magic blackbox", then there are continuum many mappings but only countably many programs, so it's impossible.
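For the finite case, a variant of the same idea that also runs the generated program might look like the sketch below. The names make_lookup_source and load_function are hypothetical, and using exec to load the generated source is my choice for the demonstration, not part of the answer above.

```python
def make_lookup_source(inputs, outputs, name="g"):
    """Return Python source for a function mapping each input to its output."""
    table = dict(zip(inputs, outputs))
    return "def {}(key):\n    return {!r}[key]\n".format(name, table)


def load_function(source, name="g"):
    """Execute generated source in a fresh namespace and return the function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


if __name__ == "__main__":
    src = make_lookup_source(["2", "3", "4"], ["a", "b", "x"])
    print(src)
    g = load_function(src)
    print(g("3"))
```

The generated program is only a lookup table, so it reproduces the given pairs and nothing more; any input outside the table raises a KeyError.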
I think I agree with SLaks, but taking things from a different angle, what does a compiler do?
(EDIT: I see SLaks edited his original answer, which was essentially 'you're describing the identity function').
It takes a program in one source language that describes the intended behaviour of a program, and "writes" another program in a target language that exhibits that behaviour.
We could also think of this in terms of things like process refinement --- given an abstract specification, we can construct a refinement mapping to some "more concrete" (read: less non-deterministic, usually) implementation.
But based on your question, it's really very difficult to tell which of these you meant, if any.

Print 1 followed by googolplex number of zeros

Assuming we are not concerned about the running time of the program (which is practically infinite for human mortals) and are using a limited amount of memory (2^64 bytes), we want to print out in base 10 the exact value of 10^(googolplex), one digit at a time on screen (mostly zeros).
Describe an algorithm (which can be coded on current-day computers), or write a program to do this.
Since we cannot practically check the output, we will rely on collective opinion on the correctness of the program.
NOTE: I do not know the solution, or whether a solution exists or not. The problem is my own invention. To those readers who are quick to mark this off-topic... kindly reconsider. This is difficult and a bit theoretical, but definitely CS.
This is impossible. There are more states (10^(10^100)) in the program than there are electrons in the universe (~10^80). Therefore, in our universe, there can be no such realization of a machine capable of executing the task.
First of all, we note that 10^(10^100) is equivalent to ((((10^10)^10)^...)^10), 100 times.
Or 10↑↑↑↑↑↑↑↑↑↑10.
This gives rise to the following solution:
print 1
for i in A(10, 100)
    print 0
in bash:
printf 1
while true; do
    printf 0
done
... close enough.
Here's an algorithm that solves this:
print 1
for i from 1 to 10^(10^100)
    print 0
One can trivially prove correctness using Hoare logic:
There are no preconditions.
The postcondition is that a one followed by 10^(10^100) zeros has been printed.
The loop invariant is that, after the i-th iteration, the number of zeros printed so far is equal to i.
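The same loop, written as a runnable Python sketch. The function name one_then_zeros is made up, and a small count stands in for 10^(10^100), since no real run with the full value could ever finish. The comment on the counter restates the loop invariant.

```python
def one_then_zeros(zero_count):
    """Yield '1' followed by zero_count '0' characters, one digit at a time."""
    yield "1"
    printed = 0  # loop invariant: `printed` zeros have been yielded so far
    while printed < zero_count:
        yield "0"
        printed += 1


if __name__ == "__main__":
    # 10**3 stands in for 10**(10**100) in this demonstration.
    print("".join(one_then_zeros(10 ** 3)))
```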
EDIT: A machine to solve the problem needs the ability to distinguish between one googolplex of distinct states: each state is the result of printing one more zero than the previous one. The amount of memory needed to do this is the same as that needed to store the number one googolplex. If there isn't that much memory available, this problem cannot be solved.
This does not mean it isn't a computable problem: it can be solved by a Turing machine because a Turing machine has a limitless amount of memory.
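A rough check of the memory claim, as a Python sketch. The variable names are illustrative. A counter that reaches a googolplex needs about log2(10^(10^100)) = 10^100 * log2(10) bits, while the question allows 2^64 bytes.

```python
import math

# Memory budget from the question: 2**64 bytes, expressed in bits.
available_bits = 8 * 2 ** 64

# Bits needed to count up to a googolplex: 10**100 * log2(10).
needed_bits = 10 ** 100 * math.log2(10)

if __name__ == "__main__":
    print("available bits: about 10^%.1f" % math.log10(available_bits))
    print("needed bits:    about 10^%.1f" % math.log10(needed_bits))
```

The shortfall is roughly eighty orders of magnitude, which is why the counting loop cannot be realized within the stated memory limit.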
There definitely is a solution to this problem in theory, assuming of course you have a machine that is capable of producing that sort of output. I'm pretty sure that a googolplex is larger than the number of atoms in the universe, at least according to what the physicists tell us, so I don't think that any physically realizable model of computation could print it out. However, mathematically speaking, you could define a Turing machine capable of printing out the value by just giving it a googolplex-ish number of states and having each write a zero and then move to the next lower state.
Consider the following:
The console window to which you are printing the output will have a maximum buffer size.
When this buffer size is exceeded, anything printed earlier is discarded, and the user will not be able to scroll back to see it.
The maximum buffer size will be minuscule compared to a googolplex.
Therefore, if you want to mimic the user experience of your program running to completion, find the maximum buffer size of the console you will print to and print that many zeroes.
Hurray laziness!

Resources