Algorithm to find an arbitrarily large number

Here's something I've been thinking about: suppose you have a number, x, that can be infinitely large, and you have to find out what it is. All you know is whether another number, y, is larger or smaller than x. What would be the fastest/best way to find x?
An evil adversary chooses a really large number somehow ... say:
int x = 9^9^9^9^9^9^9^9^9^9^9^9^9^9^9
and provides isX, isBiggerThanX, and isSmallerThanX functions. Example code might look something like this:
int c = 2
int y = 2
while (true)
    if (isX(y)) return true
    if (isBiggerThanX(y)) fn()
    else y = y^c
where fn() is a function that, once a number y has been found (one that's bigger than x), does something to determine x (like dividing the number in half and comparing that, then repeating). The thing is, since x is arbitrarily large, it seems like a bad idea to me to use a constant to increase y.
This is just something that I've been wondering about for a while now; I'd like to hear what other people think.

Use a binary search as in the usual "try to guess my number" game. But since there is no finite upper end point, we do a first phase to find a suitable one:
Initially set the upper end point arbitrarily (e.g. 1000000, though 1 or 10^100 would also work -- given the infinite space to work in, all finite values are equally disproportionate).
Compare the mystery number X with the upper end point.
If it's not big enough, double it, and try again.
Once the upper end point is bigger than the mystery number, proceed with a normal binary search.
The first phase is itself similar to a binary search. The difference is that instead of halving the search space with each step, it's doubling it! The cost of each phase is O(log X). A small improvement is to set the lower end point at each doubling step: we know X is at least as high as the previous upper end point, so we can reuse it as the lower end point. The size of the search space still doubles at each step, but in the end it will be half as large as it would otherwise have been. The cost of the binary search is reduced by only one step, so its overall complexity remains the same.
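Here is a minimal sketch of the whole procedure in Python (my own illustration, not from the original answer; is_smaller_than_x stands in for the adversary's isSmallerThanX oracle, and X is assumed to be at least 1):

def find_x(is_smaller_than_x):
    """Phase 1 doubles an upper end point until it passes X;
    phase 2 is an ordinary binary search, reusing the old bound."""
    low, high = 0, 1
    while is_smaller_than_x(high):   # high < X, so keep doubling
        low = high                   # reuse the old upper end as the lower end
        high *= 2
    # Invariant: low < X <= high. Now halve the search space each step.
    while low + 1 < high:
        mid = (low + high) // 2
        if is_smaller_than_x(mid):
            low = mid
        else:
            high = mid
    return high

For example, find_x(lambda y: y < 600) returns 600 after roughly 2 * log2(600), or about 20, oracle calls.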
A couple of notes in response to other comments:
It's an interesting question, and computer science is not just about what can be done on physical machines. As long as the question can be defined properly, it's worth asking and thinking about.
The range of numbers is infinite, but any possible mystery number is finite. So the above method will eventually find it. "Eventually" is defined such that, for any possible finite input, the algorithm will terminate within a finite number of steps. However, since the input is unbounded, the number of steps is also unbounded (it's just that, in every particular case, it will "eventually" terminate).

If I understand your question correctly (advise if I do not), you're asking about how to solve "pick a number from 1 to 10", except that instead of 10, the upper bound is infinity.
If your number space is truly infinite, the following are true:
The value will never be held in an int (or any other data type) on any physical hardware
You will NEVER find your number
If the space is immensely large but bound, I think the best you can do is a binary search. Start at the middle of the number range. If the desired number turns out to be higher or lower, divide that half of the number space, and repeat until the desired number is found.
In your suggested implementation you raise y to the power c. However, no matter how large c is chosen to be, it will not even move the needle in an infinite space.

Infinity isn't a number. Thus you can't find it, even with a computer.

That's funny. I've wondered the same thing for years, though I've never heard anyone else ask the question.
As simple as your scenario is, it still seems to provide insufficient information to allow the choice of an optimal strategy. All one can choose is a suitable heuristic. My heuristic had been to double y, but I think that I like yours better. Yours doubles log(y).
The beauty of your heuristic is that, so long as the integer fits in the computer's memory, it finds a suitable y in logarithmic time.
Counter-question. Once you find y, how do you proceed?

I agree with using binary search, though I believe that a ONE-SIDED binary search would be more suitable, since here the complexity would NOT be O(log n) [where n is the range of allowable numbers] but O(log k), where k is the number selected by your adversary.
This would work as follows (pseudocode):
k = 1;
while (isSmallerThanX(k))
{
    k = k * 2;
}
// At this point, once the loop is exited, k is at least x
// Now do a normal binary search over the range [k/2, k] to find your number :)
So even if the allowable range is infinity, as long as your number is finite, you should be able to find it :)

Your method of tetration is guaranteed to take longer than the age of the universe to find an answer, if the opponent merely uses a paradigm which is better (for example, pentation). This is how you should do it:
You can only do this with symbolic representations of numbers, because it is trivial to name a number your computer cannot store in floating-point representation, even if it used arbitrary-precision arithmetic and all its memory.
Required reading: http://www.scottaaronson.com/writings/bignumbers.html - that pretty much sums it up
How do you represent a number then? You represent it by a program which will, if run to completion, print out that number. Even then, your computer is incapable of computing BusyBeaver(10^100) (if you dictated a program 1 terabyte in size, that is well over the maximum number of finite clock cycles it could run without looping forever). You can see that we could easily have the computer print out 1 0 0... each clock cycle, so the maximum number it could say (if we waited nearly an eternity) would be 10^BusyBeaver(10^100). If you allowed it to say more complicated expressions like eval(someprogram), power-towers, Ackermann's function, whatever -- then I believe that would be no better than increasing the original 10^100 by some constant proportional to the complexity of what you described (plus some logarithmic interpreter factor; see Kolmogorov complexity).
So let's frame this another way:
Your opponent picks a finite computable number, and gives you a function that tells you whether the number is smaller/larger/equal by computing it. He also gives you a representation for the output (in a sane world this would be "you can only print numbers like 99999", but he can make it more complicated; it actually doesn't matter). Proceed to measure the size of this function in bits.
Now, answer with your own function, which is twice the size of his function (in bits), and prints out the largest number it can while keeping the code to less than 2N bits in length. (You use the same representation he chose: In a world where you can only print out numbers like "99999", that's what you do. If you can define functions, it gets slightly more complicated.)

I do not understand the purpose here, but this is what I thought of:
Reading your comments, I suppose you aren't looking for an infinitely large number, but a "super large number" instead. And whatever the number is, it will have a large number of digits. How you got them isn't the concern. Keeping this in mind:
No complex computation is required. Just type random keys on your numeric keypad to get a super large number, and then have a program randomly add/remove/modify digits of that number. You get a list of very large numbers: select any one of them.
e.g: 3672036025039629036790672927305060260103610831569252706723680972067397267209
and keep modifying/adding digits to get more numbers
PS: If you state the purpose in your question clearly, we might be able to give better answers.

Related

Does O(1) mean an algorithm takes one step to execute a required task?

I thought it meant it takes a constant amount of time to run. Is that different than one step?
O(1) is a class of functions. Namely, it includes functions bounded by a constant.
We say that an algorithm has the complexity of O(1) iff the number of steps it takes, as a function of the size of the input, is bounded by a(n arbitrary) constant. This function can be a constant, or it can grow, or behave chaotically, or undulate as a sine wave. As long as it never exceeds some predefined constant, it's O(1).
For more information, see Big O notation.
It means that even if you increase the size of whatever the algorithm is operating on, the number of calculations required to run remains the same.
More specifically it means that the number of calculations doesn't get larger than some constant no matter how big the input gets.
In contrast, O(N) means that if the size of the input is N, the number of steps required is at most a constant times N, no matter how big N gets.
So for example (in python code since that's probably easy for most to interpret):
def f(L, index):  # L a list, index an integer
    x = L[index]
    y = 2 * L[index]
    return x + y
then even though f has several calculations within it, the time taken to run is the same regardless of how long the list L is. However,
def g(L):  # L a list
    return sum(L)
This will be O(N) where N is the length of list L. Even though there is only a single calculation written, the system has to add all N entries together. So it has to do at least one step for each entry. So as N increases, the number of steps increases proportional to N.
As everyone has already tried to answer it, it simply means:
No matter how many mangoes you've got in a box, it'll always take you the same amount of time to eat one mango. How you plan on eating it is irrelevant: there may be a single step, or you might go through multiple steps and slice it nicely before consuming it.

A puzzle about the definition of time complexity

Wikipedia defines time complexity as
In computer science, the time complexity of an algorithm quantifies
the amount of time taken by an algorithm to run as a function of the
length of the string representing the input.
What does the bold part mean?
I know an algorithm may be treated as a function, but why must its input be "the length of the string representing the input"?
The function in the bold part means the time complexity of the algorithm, not the algorithm itself. An algorithm may be implemented in a programming language that has a function keyword, but that's something else.
Algorithm MergeSort has as input a list of 32m bits (assuming m 32-bit values). Its time complexity T(n) is a function of n = 32m, the input size, and in the worst case is bounded from above by O(n log n). MergeSort could be implemented as a function in C or JavaScript.
The definition is derived from the context of Turing machines, where you define different states. Every function which you can compute with a computer is also computable with a Turing machine (I would say that a computer computes a function on the basis of a Turing machine).
Every function is just a mapping from one domain to another domain (or the same one).
Before going to Turing machines, look at the concept of finite automata. A finite automaton has finitely many states. If your input is of length n, it may need only two states, but it has to visit those states n times, where n is the length of the string.
(The original answer included a sketch of the automaton here.) Our final state is C, meaning that if we end in C, the string is accepted.
Say we want to check whether the string 010101010 gets accepted by our automaton.
When we read a 0 from A we move to B, and if we read another 0 we move to C; if we end on C our string gets accepted, otherwise we move back to A.
In a computer you represent numbers as strings of length n, and in order to compute with them you have to visit each character of the string.
Turing machines work the same way, but finite automata are limited to regular languages. (There is a big theory behind this.)
Did you ever try to think about how a computer computes a function like 2*x, where x is your input?
It's fun :D. Suppose I want to double a number, and I represent it in the unary numeral system because it's easy: a number n is written as n ones, so 4 is written 1111. You can think of a Turing machine, or a (non-advanced) computer, as a system with linear memory.
Suppose empty spots in your memory are represented with #.
With your input, memory looks like ###1111###, where # means an empty slot. The head repeatedly takes the rightmost unmarked 1, replaces it with * (which is just a helper symbol), and writes a fresh 1 to the right of the block; once every original 1 has been marked, it changes all the *s back to 1s, and you have 2*x, where x was your input.
The point is that the only thing these machines remember is the state. Here is the trace:
#####1111#######
#####111*1######
#####11**11#####
#####1***111####
#####****1111###
#####11111111###
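The same marking trick is easy to emulate in Python (a sketch of my own, not part of the original answer), treating the tape as a list of cells:

def double_unary(block):
    """Double a unary number given as a block of 1s,
    e.g. '1111' -> '11111111' (the surrounding # cells are omitted)."""
    cells = list(block)
    n = len(cells)
    for i in range(n - 1, -1, -1):   # rightmost original 1 first, as in the trace
        cells[i] = '*'               # mark it with the helper symbol
        cells.append('1')            # write a fresh 1 on the right
    return ''.join('1' if c == '*' else c for c in cells)

assert double_unary('1111') == '11111111'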
Summary
If there is some input, it is expressed as a string. So you have a length of that string. Then you have a function F that maps the length of the input (as a string) to the time needed by A to compute this input (in the worst case).
We call this F time complexity.
Say we have an algorithm A. What is its time complexity?
The very easy case
If A has constant complexity, the input doesn't matter. The input could be a single value, or a list, or a map from strings to lists of lists. The algorithm will run for the same amount of time: 3 seconds, or 1000 ticks, or a million years, or whatever. Constant time, not depending on the input.
Not much complexity at all to be honest.
Adding complexity
Now let's say for example A is an algorithm for sorting a list of integer numbers. It's clear that the time needed by A now depends on the length of the list. A list of length 0 is sorted in essentially no time (save for checking the length of the list), but this changes as the length of the input list grows.
You could say there exists a function F that maps the list length to the seconds needed by A to sort a list of that length. But wait! What if the list is already sorted? So for simplicity let's always assume a worst case scenario: F maps list length to the maximum of seconds needed by A to sort a list of that length.
You could measure in seconds, CPU cycles, ticks, or whatever. It doesn't depend on the units.
Generalizing a bit
What about all the other algorithms? How do you measure time complexity for an algorithm that cooks me a nice meal?
If you cannot define any input parameter then we're back in the easy case: constant time. If there is some input it is expressed as a string. So you have a length of that string. And - similar to what has been said above - then you have a function F that maps the length of the input (as a string) to the time needed by A to compute this input (in the worst case).
We call this F time complexity.
That's too simple
Yeah, I know. There is the average case and the best case, there is the big O notation and asymptotic complexity. But for explaining the bold part in the original question this is sufficient, I think.

Guessing a number knowing only if the number proposed is lower or higher?

I need to guess a number. I can only see if the number I'm proposing is lower or higher. Performance matters a whole lot, so I thought of the following algorithm:
Let's say the number I'm trying to guess is 600.
I start out with the number 1000 (or for even higher performance, the average result of previous numbers).
I then check if 1000 is higher or lower than 600. It is higher.
I then divide the number by 2 (so that it is now 500), and check if it is lower or higher than 600. It is lower.
I then find the difference and divide it by 2 in the following way to retrieve a new number: (1000 + 500) / 2. The result is 750. I then check that number.
And so on.
Is this the best approach or is there a smarter way of doing this? For my case, every guess takes approximately 500 milliseconds, and I need to guess quite a lot of numbers in as low time as possible.
I can roughly assume that the average result of previous guesses is close to the upcoming numbers too, so there's a pattern there which I can use for my own advantage.
Yes, binary search is the most effective way of doing this. Binary search is what you described. For a number between 1 and N, binary search runs in O(log(n)) time.
So here is the algorithm to find a number between 1-N
int a = 1, b = n, guess = average of previous answers;
while (guess is wrong) {
    if (guess lower than answer)       { a = guess; }
    else if (guess higher than answer) { b = guess; }
    guess = (a + b) / 2;
} // Go back to while
Well, you're taking the best possible approach without the extra information - it's a binary search, basically.
Exactly how you use the "average result of previous guesses" is up to you; I would suggest biasing the guesses towards that average, but you'd need to analyse just how indicative previous results are in order to work out the best approach. Don't just use the average: use the complete distribution.
For example, if all the results have been in the range 600-700 (even though the hypothetical range is up to 1000) with an average of 670, you might start with 670 but if it says "guess higher" then you would probably want to choose a value between 670 and 700 as your next guess, rather than 835 (which is very likely to be higher than the real result).
I suggest you log all the results from previous enquiries, so you can then use that as test data for alternative approaches.
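As a rough sketch of that idea (my own illustration; compare is a hypothetical oracle returning 1 if the secret number is higher than the guess, -1 if lower, and 0 if equal):

def guess_secret(compare, history, lo=1, hi=1000):
    """Seed the first guess with the median of previous results,
    then fall back to ordinary bisection."""
    if history:
        guess = sorted(history)[len(history) // 2]   # empirical median
        guess = min(max(guess, lo), hi)              # keep it inside the range
    else:
        guess = (lo + hi) // 2
    while True:
        c = compare(guess)
        if c == 0:
            return guess
        if c > 0:            # the secret is higher than the guess
            lo = guess + 1
        else:                # the secret is lower than the guess
            hi = guess - 1
        guess = (lo + hi) // 2

With history [630, 650, 620, 660] the first guess is 650 rather than 500, which pays off whenever the secret really is near the previous results.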
In general, binary search starting at the middle point of the range is the optimal strategy. However, you have additional specific information which may make this a suboptimal strategy. This depends critically on what exactly "close to the average of the previous results" means.
If numbers are close to the previous average then dividing by 2 in the second step is not optimal.
Example: Previous numbers 630, 650, 620, 660. You start with 640.
Suppose your number is actually close to the previous results; imagine that it is 634.
The number is lower than 640. If in the second step you divide by 2, you get 320, thus losing any advantage from knowing the previous numbers.
You should analyze the behaviour further. It may be optimal, in your specific case, to start at the mean of the N previous numbers and then add or subtract some quantity related to the standard deviation of the previous numbers.
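A small sketch of that suggestion (mine, not the answer's; statistics.pstdev is the population standard deviation):

import statistics

def initial_bracket(history):
    """Start near the mean of previous numbers and bracket the search
    by a couple of standard deviations before bisecting."""
    mu = statistics.mean(history)
    sigma = statistics.pstdev(history) or 1   # guard against zero spread
    return round(mu - 2 * sigma), round(mu + 2 * sigma)

For the numbers above, initial_bracket([630, 650, 620, 660]) gives (608, 672), a far tighter starting range than (0, 1000).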
Yes, binary search (your algorithm) is correct here. However there is one thing missing in the standard binary search:
For binary search you normally need to know the maximum and minimum between which you are searching. In case you do not know this, you have to iteratively find the maximum in the beginning, like so:
Start with zero
if it is higher than the number searched, zero is your maximum and you have to find a minimum
if it is lower than the number searched, zero is your minimum and you have to find a maximum
You can search for your maximum/minimum by starting at 1 or -1 and always multiplying by two until you find a number which is greater/smaller
When you always multiply by two, you will be much faster than when you search linearly; a sketch of this bracketing step follows below.
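A possible sketch of that bracketing step (my own; is_lower(g) is a hypothetical oracle that is true when g is lower than the searched number), handling both signs:

def find_bounds(is_lower):
    """Return (lo, hi) with lo < secret <= hi, doubling outward from zero."""
    if is_lower(0):             # the secret is positive
        hi = 1
        while is_lower(hi):
            hi *= 2
        return 0, hi
    lo = -1                     # the secret is zero or negative
    while not is_lower(lo):
        lo *= 2
    return lo, 0

After roughly log2(|secret|) calls you have a bracket, and an ordinary binary search between lo and hi finishes the job.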
Do you know the range of possible values? If yes, always start in the middle and do exactly what you describe.
A standard binary search between 0 and N (N is the given number) will give you the answer in log N time.
int a = 1, b = n + 1, guess = average of previous answers;
while (guess is wrong) {
    if (guess lower than answer)       { a = guess; }
    else if (guess higher than answer) { b = guess; }
    guess = (a + b) / 2;
} // Go back to while
You have to add 1 to n; otherwise you can never reach n, since guess is an int and (a+b)/2 truncates (with a = n-1 and b = n, the guess stays at n-1 forever).
I gave an answer to a similar question "Optimal algorithm to guess any random integer without limits?"
Actually, the algorithm provided there doesn't just search for the number you thought of: it estimates the median of that number's distribution, so you may even re-think the number at each step! And the number could even be from the real domain ;)

Is there a Sorting Algorithm that sorts in O(∞) permutations?

After reading this question and through the various Phone Book sorting scenarios put forth in the answer, I found the concept of the BOGO sort to be quite interesting. Certainly there is no use for this type of sorting algorithm, but it did raise an interesting question in my mind: could there be a sorting algorithm that is impossible to ever complete?
In other words, is there a process where one could attempt to compare and re-order a fixed set of data and can yet never achieve an actual sorted list?
This is much more of a theoretical/philosophical question than a practical one and if I was more of a mathematician I'd probably be able to prove/disprove such a possibility. Has anyone asked this question before and if so, what can be said about it?
[edit:] no deterministic process with a finite amount of state takes "O(infinity)" since the slowest it can be is to progress through all possible states. this includes sorting.
[earlier, more specific answer:]
no. for a list of size n you only have state space of size n! in which to store progress (assuming that the entire state of the sort is stored in the ordering of the elements and it really is "doing something," deterministically).
so the worst possible behaviour would cycle through all available states before terminating and take time proportional to n! (at the risk of confusing matters, there must be a single path through the state - since that is "all the state" you cannot have a process move from state X to Y, and then later from state X to Z, since that requires additional state, or is non-deterministic)
Idea 1:
function sort( int[] arr ) {
    int[] sorted = quicksort( arr ); // compare and reorder data
    while(true); // where'd this come from???
    return sorted; // return answer
}
Idea 2
How do you define O(infinity)? The formal definition of Big-O merely states that f(x)=O(g(x)) implies that M*g(x) is an upper bound of f(x) given sufficiently large x and some constant M.
Typically when you're talking about "infinity", you are talking about some sort of unbounded limit. So in this case, the only reasonable definition is saying that O(infinity) is O(function that's larger than every function). Obviously a function that's larger than every function is an upper bound. Thus technically everything is "O(infinity)".
Idea 3
Assuming you mean theta notation (tight bound)...
If you impose the additional restriction that the algorithm is smart (returns when it finds a sorted permutation) and every permutation of the list must be visited in a finite amount of time, then the answer is no. There are only N! permutations of a list. The upper bound for such a sorting algorithm is then a finite sum of finite numbers, which is finite.
Your question doesn't really have much to do with sorting. An algorithm which is guaranteed never to complete would be pretty dull. Indeed, even an algorithm which might or might not ever complete would be pretty dull. Much more interesting would be an algorithm which is guaranteed to complete, eventually, but whose worst-case computation time with respect to the size of the input is not expressible as O(F(N)) for any function F that could itself be computed in bounded time. My hunch would be that such an algorithm could be devised, but I'm not sure how.
How about this one:
Start at the first item.
Flip a coin.
If it's heads, switch it with the next item.
If it's tails, don't switch them.
If list is sorted, stop.
If not, move onto the next pair ...
It's a sorting algorithm -- the kind a monkey might do. Is there any guarantee that you'll arrive at a sorted list? I don't think so!
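In Python the monkey's procedure might look like this (a sketch of my own; it does reach a sorted list with probability 1, but there is no deterministic guarantee and no finite worst-case bound):

import random

def coin_flip_sort(a):
    """Randomly swap adjacent pairs until the list happens to be sorted."""
    target = sorted(a)
    while a != target:
        for i in range(len(a) - 1):
            if random.random() < 0.5:        # heads: swap with the next item
                a[i], a[i + 1] = a[i + 1], a[i]
            if a == target:                  # stop as soon as it is sorted
                return a
    return a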
Yes -
SortNumbers(collectionOfNumbers)
{
    If IsSorted(collectionOfNumbers) {
        reverse(collectionOfNumbers(1:end/2))
    }
    return SortNumbers(collectionOfNumbers)
}
Input: A[1..n] : n unique integers in arbitrary order
Output: A'[1..n] : reordering of the elements of A
such that A'[i] R(A') A'[j] if i < j.
Comparator: a R(A') b iff A'[i] = a, A'[j] = b and i > j
More generally, make the comparator something that's either (a) impossible to reconcile with the output specification, so that no solution can exist, or (b) uncomputable (e.g., sort these (input, turing machine) pairs in order of the number of steps needed for the machine to halt on the input).
Even more generally, if you have a procedure that fails to halt on a valid input, the procedure is not an algorithm which solves the problem on that input/output domain... which means you don't have an algorithm at all, or that what you have is only an algorithm if you appropriately restrict the domain.
Let's suppose that you have a random coin flipper, infinite arithmetic, and infinite rationals. Then the answer is yes. You can write a sorting algorithm which has 100% chance of successfully sorting your data (so it really is a sorting function), but which on average will take infinite time to do so.
Here is an emulation of this in Python.
# We'll pretend that these are true random numbers.
import random
import fractions

def flip():
    return 0.5 < random.random()

# This tests whether a number is less than an infinite precision number in the
# range [0, 1]. It has a 100% probability of returning an answer.
def number_less_than_rand(x):
    high = fractions.Fraction(1, 1)
    low = fractions.Fraction(0, 1)
    while low < x and x < high:
        if flip():
            low = (low + high) / 2
        else:
            high = (low + high) / 2
    return high < x

def slow_sort(some_array):
    n = fractions.Fraction(100, 1)
    # This loop has a 100% chance of finishing, but its average time to complete
    # is also infinite. If you haven't studied infinite series and products,
    # you'll just have to take this on faith. Otherwise proving that is a fun
    # exercise.
    while not number_less_than_rand(1 / n):
        n += 1
    print(n)
    some_array.sort()
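A quick way to see the infinite expectation (my note, not the answer's): the loop continues at step n with probability 1 - 1/n, so the chance it is still running after step n is (99/100)(100/101)...((n-1)/n) = 99/n. That tends to 0, so the loop finishes with probability 1, but the expected number of iterations is the sum of 99/n over all n >= 100, which diverges.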

How to design an efficient algorithm for least upper bound search

Let's say you have some set of numbers with a known lower bound and unknown upper bound, i.e. 0, 1, 2, 3, ... 78 where 78 is the unknown. Assume for the moment there are no gaps in between numbers. There is a time-expensive function test() that tests if a number is in the set.
What is an efficient way (requiring a low amount of test() calls) to find the highest number in the set?
What if you have the added knowledge that the upper bound is 75 +/- 25?
What if there are random gaps between numbers in the set, i.e. 0, 1, 3, 4, 7, ... 78?
For the "no gaps case":
I assume that this is a fixed size of number, e.g. a 32 bit int
We wish to find x such that test(x) == true, test(x+1) == false, right?
You basically do a binary chop between the lowest known "not in set" (e.g. the biggest 32 bit int) and the highest known "in set" (starting with the known lower bound) by testing the middle value in the range each time and adjusting the boundaries accordingly. This would give an O(log N) solution (in terms of the number of calls to test()), where N is the size of the potential set, not the actual set. This will be slower than just trying 1, 2, 3... for small sets, but much faster for large ones.
All of this falls down if there can be gaps, at which point I don't think there's any feasible solution beyond "start with the absolute highest possible number and work down until test(x) == true at which point that's the highest number". Any other strategy will fail or be more expensive as far as I can see.
Your best bet is to simply run through the set with O(n) complexity, which is not bad.
Take into consideration that the set is not sorted (it is a set, after all, and that is the given); each isInSet(n) operation takes O(n) as well, bringing you to O(n^2) for the entire operation if you choose any algorithm for prodding the set at certain places...
A much better solution, if the set is in your control, would be to simply keep a max value of the set and update it on each insertion to the set. This will be O(1) for all cases.
1. Set Step to 1
2. Set Upper to Lower + Step
3. If test(Upper) is true, then set Lower to Upper, multiply Step by 2, and go to point 2
4. At this point you know that Lower is in your set while Upper is not. You can now do a binary search between Lower and Upper to find the limit.
This looks like O(log n) calls to test(), so O(log n) times the cost of test() overall.
If you know that Upper is between 50 and 100, Do a binary search between these two values.
If you have random gaps and you know that the upper bound is 100 at maximum, I suspect you cannot do better than starting from there and testing every number one by one until test() finds a value in your set.
If you have random gaps and you do not know an upper limit then you can never be sure you found the upper bound.
Maybe you should just traverse through it? That would be O(n). I think there is no other way to do this.
Do you know the set size, before hand?
Actually, I guess you probably don't - otherwise the first problem would be trivial.
It would help if you had some idea how big the set was though.
Take a guess at the top value
Test - if in then increment value by some amount
If not in then decrease value by some amount
Once you have upper and lower bounds for largest value, binary search till you find it (to required precision).
For the gaps you've no such ability: you can't even tell when you've found the largest element. (Unless you know the maximum gap size.)
If there are no gaps, then you are probably best off with a binary search.
If we use the second assumption, that the top is 75 +/- 25, then our low end is 50 and our high end is 100, and our first test case is 75. If it is present, then the low end becomes 75, the high end stays 100, and our next test case is 87. That should yield results in O(log N) time (where here N would be 50).
If we can't assume a possible upper range, we just have to make an educated guess at what it might be. If a value is not found, it becomes the high end. If it is found, it becomes the low end, and we double it to find the high end.
If there are gaps, the only way I can see of doing it is a linear search -- but even then you'll need a way of knowing when you've reached the end, rather than just a big gap.
If your set happens to be the set of prime numbers, let me know when you find the biggest one. I'm sure we can work something out. ;)
But seriously, I'm guessing you know for a fact that the set does indeed have a largest value. Or, you're chopping it to a 32-bit integer.
A couple of suggestions:
1) Think of every check you can that would quickly establish test(x) == false, so you can go on to the next candidate. If the time you spend going through all of the quick-rejection cases is far less than going through the full test, then you'll come out ahead.
2) Can you gain any information from each test? For example, does test(x) == false imply that test(x+5679) == false as well?
