Generate a valid expression that computes to given N - algorithm

This was asked to me in an interview,
Given a list of integer numbers, a list of symbols [+,-,*,/] and a target number N,
provide an expression which evaluates to N or return False if that is not possible.
e.g. let the list of numbers be [1,5,5] and the target number is 9, one possible
solution could be 5+5-1.
Now, my solution was a brute-force recursive one that runs through all possible numbers and all possible operations; the recursion terminated either when the running value equalled N or exceeded it.
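For concreteness, here is a minimal Python sketch of that kind of brute-force search, under assumptions the question leaves open (each number used at most once, left-to-right evaluation with no parentheses, division only when exact, and without the early-termination pruning); the function names are purely illustrative:

from itertools import permutations

def find_expression(numbers, target):
    # Try every ordering of the numbers, extending an expression one
    # operation at a time, left to right.
    for perm in permutations(numbers):
        found = extend(perm[0], str(perm[0]), perm[1:], target)
        if found:
            return found
    return False

def extend(value, expr, rest, target):
    if not rest:
        return expr if value == target else None
    head, tail = rest[0], rest[1:]
    candidates = [(value + head, f"{expr}+{head}"),
                  (value - head, f"{expr}-{head}"),
                  (value * head, f"{expr}*{head}")]
    if head != 0 and value % head == 0:   # keep everything an integer
        candidates.append((value // head, f"{expr}/{head}"))
    for v, e in candidates:
        found = extend(v, e, tail, target)
        if found:
            return found
    return None

print(find_expression([1, 5, 5], 9))   # -> 5-1+5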
This got me wondering if there was a better, more refined solution. Any thoughts on this? I was thinking some kind of reverse construction of an expression tree.

I'm gonna go ahead and say this interview question cannot be about anything more than trying to narrow the problem down by asking questions. There is an extremely large list of questions you haven't covered that could be important to the solution, for example:
Do the numbers stay integers when you divide them? That is, is 1/5 a float, 0, or a big decimal?
Can the numbers and operators repeat even if only one of them is in the input? If so, there seems to be no way to terminate when you can't find a solution.
Can you use parentheses, or can the input contain parentheses?
Can the numbers be negative?
Can you just print true or false, or do you have to produce a valid expression?
One thing I notice from those questions is that if division works by rounding and you have a + and / in the operator list, you can always divide until the result rounds to 1 and then just add. Also, if operators can repeat, multiplication is essentially irrelevant because it can be replaced by repeated addition.
The reason I am sure that your interviewer wanted you to ask more clarifying questions is because even the small set of questions that I thought of change the problem in a big way.
One last thing to consider: this problem generalizes the subset-sum/knapsack problem, which is known to be NP-complete, so there is no known polynomial-time solution.

Related

String analysis

Given a sequence of operations:
a*b*a*b*a*a*b*a*b
is there a way to find the optimal subdivision that enables reuse of substrings,
making
a*b*a*b*a*a*b*a*b => c*a*c, where c = a*b*a*b
and then seeing that
a*b*a*b => d*d, where d = a*b
all in all reducing the 8 initial operations into the 4 described here?
(c = (d = a*b)*d)*a*c
The goal of course is to minimize the number of operations
I'm considering a suffix tree of sorts.
I'm especially interested in linear time heuristics or solutions.
The '*' operations are actually matrix multiplications.
This whole problem is known as "Common Subexpression Elimination" or CSE. It is a slightly smaller version of the problem called "Graph Reduction" faced by the implementer of compilers for functional programming languages. Googling "Common Subexpression elimination algorithm" gives lots of solutions, though none that I can see especially for the constraints given by matrix multiplication.
The pages linked to give a lot of references.
My old answer is below. However, having researched a bit more, the solution is simply building a suffix tree. This can be done in O(N) time (lots of references on the wikipedia page). Having done this, the sub-expressions (c, d etc. in your question) are just nodes in the suffix tree - just pull them out.
However, I think MarcoS is on to something with the suggestion of Longest repeating Substring, as graph reduction precedence might not allow optimisations that can be allowed here.
sketch of algorithm:
optimise(s):
    if s has no repeating substring of length > 1: return s
    sub = longestRepeatingSubstring(s)
    optimisedSub = optimise(sub)
    return s with each occurrence of sub replaced by a new symbol bound to optimisedSub
Each run of longest repeating substring takes time N. You can probably re-use the suffix tree you build to solve the whole thing in time N.
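In case it helps to experiment, here is a small Python sketch of longestRepeatingSubstring. It uses a sorted suffix list rather than a real suffix tree, so it is roughly O(N^2 log N) rather than the O(N) discussed above, but it is enough to try the optimise() idea on short operation strings:

def longest_repeating_substring(s):
    # Sort all suffixes; the longest repeated substring is the longest
    # common prefix of some pair of adjacent suffixes in sorted order.
    suffixes = sorted(s[i:] for i in range(len(s)))
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        if k > len(best):
            best = a[:k]
    return best

print(longest_repeating_substring("ababaabab"))   # -> "abab"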
edit: The orders-of-growth in this answer are needed in addition to the accepted answer in order to run CSE or matrix-chain multiplication
Interestingly, a compression algorithm may be what you want: a compression algorithm seeks to reduce the size of what it's compressing, and if the only way it can do that is substitution, you can trace it and obtain the necessary subcomponents for your algorithm. This may not give nice results though for small inputs.
What subsets of your operations are commutative will be an important consideration in choosing such an algorithm. [edit: OP says no operations are commutative in his/her situation]
We can also define an optimal solution, if we ignore effects such as caching:
input: [some product of matrices to compute]
given that multiplying two NxN matrices is O(N^2.376)
given we can visualize the product as follows:
[[AxB][BxC][CxD][DxE]...]
we must for example perform O(max(A,B,C)^2.376) or so operations in order to combine
[AxB][BxC] -> [AxC]
The max(...) is an estimate based on how fast it is to multiply two square matrices; a better estimate of cost(A,B,C) for multiplying an AxB * BxC matrix can be gotten from actually looking at the algorithm, or from running benchmarks if you don't know the algorithm used.
However, note that multiplying the same matrix with itself, i.e. calculating a power, can be much more efficient, and we also need to take that into account. At worst, it takes log_2(power) multiplies, each of O(N^2.376), but this could be made more efficient by diagonalizing the matrix first.
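To make the power case concrete, here is a hedged Python sketch of exponentiation by repeated squaring, which gives the log_2(power) multiply count mentioned above; matmul and identity are assumed parameters standing in for whatever multiplication routine (and its cost) you actually use:

def mat_pow(m, power, matmul, identity):
    # Repeated squaring: uses O(log2(power)) calls to matmul instead of
    # power - 1 naive multiplications.
    result = identity
    while power > 0:
        if power & 1:        # this bit of the exponent is set
            result = matmul(result, m)
        m = matmul(m, m)
        power >>= 1
    return result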
There is the question of whether a greedy approach is feasible or not: whether one SHOULD compress repeating substrings at each step. This may not be the case, e.g.
aaaaabaab
compressing 'aa' to a new symbol c gives ccabcb, and compressing 'aab' is now impossible
However I have a hunch that, if we try all orders of compressing substrings, we will probably not run into this issue too often.
Thus, having written down what we want (the costs) and considered possible issues, we already have a brute-force algorithm which can do this, and it will run for very small numbers of matrices:
# pseudocode
def compress(problem, substring):
    x = new Problem(problem)
    x.string.replaceAll(substring, newSymbol)
    x.subcomputations += Subcomputation(newSymbol = substring)
    return x

def bestCompression(problem):
    candidateCompressions = [compress(problem, substring) for each substring in problem.string]
    # etc., recursively return the problem with minimum cost
    # dynamic programming may help make this more efficient, but one must
    # watch out for the note above about how it may be hard to be greedy
Note: according to another answer by Asgeir, this is known as the Matrix Chain Multiplication optimization problem. Nick Fortescue notes this is also known more generally as http://en.wikipedia.org/wiki/Common_subexpression_elimination -- thus one could find any generic CSE or Matrix-Chain-Multiplication algorithm/library from the literature, and plug in the cost orders-of-magnitude I mentioned earlier (you will need those no matter which solution you use). Note that the cost of the above calculations (multiplication, exponentiation, etc.) assumes that they are being done efficiently with state-of-the-art algorithms; if this is not the case, replace the exponents with appropriate values which correspond to the way the operations will be carried out.
If you want to use the fewest arithmetic operations then you should have a look at matrix chain multiplication, which can be solved in O(n log n) time.
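For reference, the classic O(n^3) dynamic-programming solution is much simpler than the O(n log n) Hu-Shing algorithm and can be sketched in a few lines of Python; here dims[i] and dims[i+1] are assumed to be the dimensions of matrix i in the chain:

def matrix_chain_cost(dims):
    n = len(dims) - 1                    # number of matrices in the chain
    cost = [[0] * n for _ in range(n)]   # cost[i][j]: best cost of chain i..j
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j))
    return cost[0][n - 1]                # minimal scalar multiplications

print(matrix_chain_cost([10, 30, 5, 60]))   # -> 4500, i.e. (A*B)*C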
Off the top of my head the problem seems NP-hard to me. Depending on the substitutions you are doing, other substitutions will become possible or impossible. For example, for the string
d*e*a*b*c*d*e*a*b*c*d*e*a there are several possibilities.
If you take the longest repeating substring it will be:
f = d*e*a*b*c, and you could substitute to get f*f*d*e*a, leaving you with four multiplications at the end and four intermediate ones (total eight).
If you instead substitute the following way:
f = d*e*a you get f*b*c*f*b*c*f, which you can further substitute using g = f*b*c to
get g*g*f for a total of six multiplications.
There are other possible substitutions in this problem, but I do not have the time to count them all right now.
I am guessing that for a complete minimal substitution it is not only necessary to figure out the longest common substring but also the number of times each substring repeats, which probably means you have to track all substitutions so far and do backtracking. Still, it might be faster than doing the actual multiplications.
Isn't this the Longest repeated substring problem?

Algorithm question: I don't understand the quoted sentence. Please help me out

Question from Programming Pearls, 2nd edition:
Given a sequential file containing 4,300,000,000 32-bit integers, how can you find one that appears at least twice?
Solution provided in the book:
Binary search finds an element that occurs at least twice by recursively searching the subinterval that contains more than half of the integers. My original solution did not guarantee that the number of integers is halved in each iteration, so the worst case run time of its log2 n passes was proportional to n·log n. Jim Saxe reduced that to linear time by observing that the search can avoid carrying too many duplicates.
When his search knows that a duplicate must be in a current range of m integers, it will only store m+1 integers on its current work tape; if more integers would have gone on the tape, his program discards them. Although his method frequently ignores input variables, its strategy is conservative enough to ensure that it finds at least one duplicate.
Above is content from the book. I don't understand the sentences quoted. How exactly can it be implemented? I mean, how can he know that "a duplicate must be in the current range of m integers"?
Thanks for your help!
2^32 = 4,294,967,296. You have a file with 4,300,000,000 integers, so the pigeonhole principle guarantees at least one duplicate.
Split the range at the midpoint 2^31 = 2,147,483,648 and count how many integers fall below it. If the count exceeds 2^31, a duplicate must be in the lower half; if not, a duplicate must be in the upper half.
Split the chosen half at its midpoint again (2^30 = 1,073,741,824 for the lower half) and count again...
Repeat until the remaining range pins down a duplicate.
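In code, the counting version of that binary search looks roughly like this (a Python sketch; values must be something you can scan repeatedly, standing in for re-reading the file on each pass):

def find_duplicate(values):
    # Invariant: the range [lo, hi] holds more values than it has distinct
    # integers, so by the pigeonhole principle it contains a duplicate.
    lo, hi = 0, 2**32 - 1
    while lo < hi:
        mid = (lo + hi) // 2
        below = sum(1 for v in values if lo <= v <= mid)
        if below > mid - lo + 1:   # lower half is over-full
            hi = mid
        else:                      # then the upper half must be over-full
            lo = mid + 1
    return lo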
I think it refers to the pigeonhole principle: if you have m integers whose values span fewer than m distinct possibilities (that is, max - min + 1 < m), there must be a duplicate among them.
And you can check this as you're building your subsets and stop building them as soon as you're certain a duplicate has to exist in that subset.
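As a tiny Python illustration, the check while building a subset is just:

def must_contain_duplicate(values):
    # More values than the range has distinct integers => duplicate exists.
    return len(values) > max(values) - min(values) + 1

print(must_contain_duplicate([3, 5, 4, 4]))   # -> True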
Wow. I think that book might be a bit old. This is a basic binary search problem.
And I think the book's wording is kind of awkward. Maybe try Wikipedia:
http://en.wikipedia.org/wiki/Binary_search_algorithm

Programming Logic: Finding the smallest equation to a large number

I do not know a whole lot about math, so I don't know how to begin to google what I am looking for, so I rely on the intelligence of experts to help me understand what I am after...
I am trying to find the smallest string of equations for a particular large number. For example given the number
"39402006196394479212279040100143613805079739270465446667948293404245721771497210611414266254884915640806627990306816"
The smallest equation is 64^64 (that I know of). It contains only 5 bytes.
Basically the program would reverse the math, instead of taking an expression and finding an answer, it takes an answer and finds the most simplistic expression. Simplistic is this case means smallest string, not really simple math.
Has this already been created? If so where can I find it? I am looking to take extremely HUGE numbers (10^10000000) and break them down to hopefully expressions that will be like 100 characters in length. Is this even possible? are modern CPUs/GPUs not capable of doing such big calculations?
Edit:
Ok. So finding the smallest equation takes WAY too much time, judging by the answers. Is there any way to brute-force this and return the smallest found so far?
For example, given a number that is super large, sometimes taking the square root of the number will result in an expression smaller than the number itself.
As far as what expressions it would start off with, it would naturally try expressions that make the result the smallest. I am sure there are tons of math things I don't know, but one of the ways to make a number a lot smaller is powers.
Just to throw another keyword in your Google hopper, see Kolmogorov Complexity. The Kolmogorov complexity of a string is the size of the smallest Turing machine that outputs the string, given an empty input. This is one way to formalize what you seem to be after. However, calculating the Kolmogorov complexity of a given string is known to be an undecidable problem :)
Hope this helps,
TJ
There's a good program to do that here:
http://mrob.com/pub/ries/index.html
I asked the question "what's the point of doing this", as I don't know if you're looking at this question from a mathematics point of view, or a large number factoring point of view.
As other answers have considered the factoring point of view, I'll look at the maths angle. In particular, the problem you are describing is a compressibility problem. This is where you have a number, and want to describe it in the smallest algorithm. Highly random numbers have very poor compressibility, as to describe them you either have to write out all of the digits, or describe a deterministic algorithm which is only slightly smaller than the number itself.
There is currently no general mathematical theorem which can determine if a representation of a number is the smallest possible for that number (although a lower bound can be discovered by understanding Shannon's information theory). (I said general theorem, as special cases do exist.)
As you said you don't know a whole lot of math, this is perhaps not a useful answer for you...
You're doing a form of lossless compression, and lossless compression doesn't work on random data. Suppose, to the contrary, that you had a way of compressing N-bit numbers into (N-1)-bit numbers. In that case, you'd have 2^N values to compress into 2^(N-1) designations, which is an average of 2 values per designation, so your average designation couldn't be decompressed unambiguously. Lossless compression works well on relatively structured data, where the data we're likely to get compresses small, and the data we aren't going to get actually grows some.
It's a little more complicated than that, since you're compressing partly by allowing more information per character. (There are a greater number of N-character sequences involving digits and operators than digits alone.) Still, you're not going to get lossless compression that, on the average, is better than just writing the whole numbers in binary.
It looks like you're basically wanting to do factoring on an arbitrarily large number. That is such a difficult problem that it actually serves as the cornerstone of modern-day cryptography.
This really appears to be a mathematics problem, and not programming or computer science problem. You should ask this on https://math.stackexchange.com/
While your question remains unclear, perhaps integer relation finding is what you are after.
EDIT:
There is some speculation that finding a "short" form is somehow related to the factoring problem. I don't believe that is true unless your definition requires a product as the answer. Consider the following pseudo-algorithm, which is just a sketch and for which no optimization is attempted.
If "shortest" is a well-defined concept, then in general you get "short" expressions by using small integers to large powers. If N is my integer, then I can find an integer nearby that is 0 mod 4. How close? Within +/- 2. I can find an integer within +/- 4 that is 0 mod 8. And so on. Now that's just the powers of 2. I can perform the same exercise with 3, 5, 7, etc. We can, for example, easily find the nearest integer that is simultaneously the product of powers of 2, 3, 5, 7, 11, 13, and 17, call it N_1. Now compute N-N_1, call it d_1. Maybe d_1 is "short". If so, then N_1 (expressed as power of the prime) + d_1 is the answer. If not, recurse to find a "short" expression for d_1.
We can also pick integers that are maybe farther away than our first choice; even though the difference d_1 is larger, it might have a shorter form.
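To make the sketch slightly more concrete, here is a toy Python version that uses only nearby powers of 2 (a stand-in for the full search over products of small prime powers described above) and recurses on the difference; the cutoff for a "short enough" literal is arbitrary:

import math

def short_expr(n):
    # Toy recursion: peel off the nearest power of 2 and recurse on the
    # difference. Parentheses keep the output unambiguous, at the cost of
    # some extra length.
    if n < 0:
        return f"-({short_expr(-n)})"
    if n < 100:                      # arbitrary base case: small literal
        return str(n)
    exp = round(math.log2(n))        # nearest power of 2
    d = n - 2 ** exp
    if d == 0:
        return f"2^{exp}"
    sign = "+" if d > 0 else "-"
    return f"2^{exp} {sign} ({short_expr(abs(d))})"

print(short_expr(2**384 + 5))   # -> 2^384 + (5)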
The existence of an infinite number of primes means that there will always be numbers that cannot be simplified by factoring. What you're asking for is not possible, sorry.

Recombine Number to Equal Math Formula

I've been thinking about a math/algorithm problem and would appreciate your input on how to solve it!
If I have a number (e.g. 479), I would like to recombine its digits or combinations of them into a math formula that matches the original number. All digits should be used in their original order, but may be combined into numbers (hence, 479 allows for 4, 7, 9, 47, 79), but each digit may only be used once, so you cannot have something like 4x47x9, as now the digit 4 was used twice.
Now an example just to demonstrate on how I think of it. The example is mathematically incorrect because I couldn't come up with a good example that actually works, but it demonstrates input and expected output.
Example Input: 29485235
Example Output: 2x9+48/523^5
As I said, my example does not add up (2x9+48/523^5 doesn't result in 29485235) but I wondered if there is an algorithm that would actually allow me to find such a formula consisting of the source number's digits in their original order which would upon calculation yield the original number.
On the type of math used, I'd say parentheses () and Add/Sub/Mul/Div/Pow/Sqrt.
Any ideas on how to do this? My thought was on simply brute forcing it by chopping the number apart by random and doing calculations hoping for a matching result. There's gotta be a better way though?
Edit: If it's any easier in non-original order, or you have an idea to solve this while ignoring some of the 'conditions' described above, it would still help tremendously to understand how to go about solving such a problem.
For numbers up to about 6 digits or so, I'd say brute-force it according to the following scheme:
1) Split your initial value into a list (array, whatever, according to language) of numbers. Initially, these are the digits.
2) For each pair of numbers, combine them together using one of the operators. If the result is the target number, then return success (and print out all the operations performed on your way out). Otherwise if it's an integer, recurse on the new, smaller list consisting of the number you just calculated, and the numbers you didn't use. Or you might want to allow non-integer intermediate results, which will make the search space somewhat bigger. The binary operations are:
add
subtract
multiply
divide
power
concatenate (which may only be used on numbers which are either original digits, or have been produced by concatenation).
3) Allowing square root bloats the search space to infinity, since it's a unary operator. So you will need a way to limit the number of times it can be applied, and I'm not sure what that will be (loss of precision as the answer approaches 1, maybe?). This is another reason to allow only integer intermediate values.
4) Exponentiation will rapidly cause overflows. 2^(9^(4^8)) is far too large to store all the digits directly [although in base 2 it's pretty obvious what they are ;-)]. So you'll either have to accept that you might miss solutions with large intermediate values, or else you'll have to write a bunch of code to do your arithmetic in terms of factors. These obviously don't interact very well with addition, so you might have to do some estimation. For example, just by looking at the magnitude of the number of factors we see that 2^(9^(4^8)) is nowhere near (2^35), so there's no need to calculate (2^(9^(4^8)) + 5) / (2^35). It can't possibly be 29485235, even if it were an integer (which it certainly isn't - another way to rule out this particular example). I think handling these numbers is harder than the rest of the problem put together, so perhaps you should limit yourself to single-digit powers to begin with, and perhaps to results which fit in a 64bit integer, depending what language you are using.
5) I forgot to exclude the trivial solution for any input, of just concatenating all the digits. That's pretty easy to handle, though, just maintain a parameter through the recursion which tells you whether you have performed any non-concatenation operations on the route to your current sub-problem. If you haven't, then ignore the false match.
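Here is a hedged Python sketch of steps 1 and 2 (plus the flag from step 5), assuming integer-only intermediate results and combining only adjacent entries so the original digit order is preserved; square root and the full overflow handling of steps 3-4 are left out, and all names are illustrative:

def solve(entries, target):
    # Each entry is (value, expression, digits_only); the digits_only flag
    # implements both the concatenation rule of step 2 and the
    # trivial-solution check of step 5.
    if len(entries) == 1:
        value, expr, digits_only = entries[0]
        return expr if value == target and not digits_only else None
    for i in range(len(entries) - 1):
        (a, ea, da), (b, eb, db) = entries[i], entries[i + 1]
        cands = [(a + b, f"({ea}+{eb})", False),
                 (a - b, f"({ea}-{eb})", False),
                 (a * b, f"({ea}*{eb})", False)]
        if b != 0 and a % b == 0:              # keep results integral
            cands.append((a // b, f"({ea}/{eb})", False))
        if 0 <= b <= 9 and abs(a) < 10**6:     # crude overflow guard (step 4)
            cands.append((a ** b, f"({ea}^{eb})", False))
        if da and db:                          # concatenate digit runs only
            cands.append((int(ea + eb), ea + eb, True))
        for v, e, flag in cands:
            found = solve(entries[:i] + [(v, e, flag)] + entries[i + 2:], target)
            if found:
                return found
    return None

print(solve([(int(d), d, True) for d in "479"], 56))   # -> (47+9)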
My estimate of 6 digits is based on the fact that it's fairly easy to write a Countdown solver that runs in a fraction of a second even when there's no solution. This problem is different in that the digits have to be used in order, but there are more operations (Countdown does not permit exponentiation, square root, or concatenation, or non-integer intermediate results). Overall I think this problem is comparable, provided you resolve the square root and overflow issues. If you can solve one case in a fraction of a second, then you can brute force your way through a million candidates in reasonable time (assuming you don't mind leaving your PC on).
By 10 digits, brute force appears impossible, because you have to consider 10 billion cases, each with a significant amount of recursion required. So I guess you'll hit the limit of brute force somewhere between the two.
Note also that my simple algorithm at the top still has a lot of redundancy - it doesn't stop you doing (4,7,9,1) -> (47,9,1) -> (47,91), and then later also doing (4,7,9,1) -> (4,7,91) -> (47,91). So unless you work out where those duplicates are going to occur and avoid them, you'll attempt (47,91) twice. Obviously that's not much work when there's only 2 numbers in the list, but when there are 7 numbers in the list, you probably do not want to e.g. add 4 of them together in 6 different ways and then solve the resulting 4-number problem 6 times. Cleverness here is not required for the Countdown game, but for all I know in this problem it might make the difference between brute-forcing 8 digits, and brute-forcing 9 digits, which is quite significant.
Numbers like that, as I recall, are exceedingly rare, if extant. Some numbers can be expressed by their component digits in a different order, such as, say, 25 (5²).
Also, trying to brute-force solutions is hopeless, at best, given that the number of permutations increases extremely rapidly as the numbers grow in digits.
EDIT: Partial solution.
A partial solution solving some cases would be to factorize the number into its prime factors. If its prime factors are all the same, and the exponent and factor are both present in the digits of the number (such as is the case with 25) you have a specific solution.
Most numbers that do fall into these kinds of patterns will do so either with multiplication or pow() as their major driving force; addition simply doesn't grow the value fast enough.
Short of building a neural network that replicates Carol Vorderman, I can't see anything but brute force working - humans are quite smart at seeing patterns in problems such as this, but encoding such insight is really tough.

Guessing an unbounded integer

If I say to you:
"I am thinking of a number between 0 and n, and I will tell you if your guess is high or low", then you will immediately reach for binary search.
What if I remove the upper bound? i.e. I am thinking of a positive integer, and you need to guess it.
One possible method would be for you to guess 2, 4, 8, ..., until you guess 2**k for some k and I say "lower". Then you can apply binary search.
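In Python, that strategy is a few lines (a sketch assuming a hypothetical oracle is_lower(g) that answers whether the target is lower than g):

def guess(is_lower):
    hi = 1
    while not is_lower(hi):       # exponential phase: 1, 2, 4, 8, ...
        hi *= 2
    lo = hi // 2                  # now lo <= target < hi
    while lo < hi - 1:            # ordinary binary search
        mid = (lo + hi) // 2
        if is_lower(mid):
            hi = mid
        else:
            lo = mid
    return lo

print(guess(lambda g: 37 < g))   # -> 37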
Is there a quicker method?
EDIT:
Clearly, any solution is going to take time proportional to the size of the target number. If I chuck Graham's number through the Ackermann function, we'll be waiting a while whatever strategy you pursue.
I could offer this algorithm too: Guess each integer in turn, starting from 1.
It's guaranteed to finish in a finite amount of time, but yet it's clearly much worse than my "powers of 2" strategy. If I can find a worse algorithm (and know that it is worse), then maybe I could find a better one?
For example, instead of powers of 2, maybe I can use powers of 10. Then I find the upper bound in log_10(n) steps, instead of log_2(n) steps. But I then have to binary search a bigger interval. Say k = ceil(log_10(n)); the binary search takes log_2(10**k - 10**(k-1)) steps, which works out to roughly 3.3(k-1) + 3.2, i.e. about log_2(n). For powers of 2, the binary search over (2**(k-1), 2**k] also takes roughly log_2(n) steps, but the first phase cost log_2(n) steps rather than log_10(n). Which wins?
What if I search upwards using n**n? Or some other sequence? Does the prize go to whoever can find the sequence that grows the fastest? Is this a problem with an answer?
Thank you for your thoughts. And my apologies to those of you suggesting I start at MAX_INT or 2**32-1, since I'm clearly drifting away from the bounds of practicality here.
FINAL EDIT:
Hi all,
Thank you for your responses. I accepted the answer by Norman Ramsey (and commenter onebyone) for what I understood to be the following argument: for a target number n, any strategy must be capable of distinguishing between (at least) the numbers from 0..n, which means you need at least on the order of log(n) comparisons, i.e. Ω(log n).
However several of you also pointed out that the problem is not well-defined in the first place, because it's not possible to pick a "random positive integer" under the uniform probability distribution (or, rather, a uniform probability distribution cannot exist over a countably infinite set). And once I give you a nonuniform distribution, you can split it in half and apply binary search as normal.
This is a problem that I've often pondered as I walk around, so I'm pleased to have two conclusive answers for it.
If there truly is no upper bound, and all numbers all the way to infinity are equally likely, then there is no optimum way to do this. For any finite guess G, the probability that the number is lower than G is zero and the probability that it is higher is 1 - so there is no finite guess that has an expectation of being higher than the number.
RESPONSE TO JOHN'S EDIT:
By the same reasoning that powers of 10 are expected to be better than powers of 2 (there's only a finite number of possible Ns for which powers of 2 are better, and an infinite number where powers of 10 are better), powers of 20 can be shown to be better than powers of 10.
So basically, yes, the prize goes to fastest-growing sequence (and for the same sequence, the highest starting point) - for any given sequence, it can be shown that a faster growing sequence will win in infinitely more cases. And since for any sequence you name, I can name one that grows faster, and for any integer you name, I can name one higher, there's no answer that can't be bettered. (And every algorithm that will eventually give the correct answer has an expected number of guesses that is infinite, anyway).
People (who have never studied probability) tend to think that "pick a number from 1 to N" means "with equal probability of each", and they act according to their intuitive understanding of probability.
Then when you say "pick any positive integer", they still think it means "with equal probability of each".
This is of course impossible - there exists no discrete probability distribution with domain the positive integers, where p(n) == p(m) for all n, m.
So, the person picking the number must have used some other probability distribution. If you know anything at all about that distribution, then you must base your guessing scheme on that knowledge in order to have the "fastest" solution.
The only way to calculate how "fast" a given guessing scheme is, is to calculate its expected number of guesses to find the answer. You can only do this by assuming a probability distribution for the target number. For example, if they have picked n with probability (1/2) ^ n, then I think your best guessing scheme is "1", "2", "3",... (average 2 guesses). I haven't proved it, though, maybe it's some other sequence of guesses. Certainly the guesses should start small and grow slowly. If they have picked 4 with probability 1 and all other numbers with probability 0, then your best guessing scheme is "4" (average 1 guess). If they have picked a number from 1 to a trillion with uniform distribution, then you should binary search (average about 40 guesses).
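(A quick numeric sanity check of that first claim in Python: if the target is n with probability (1/2)**n and you guess 1, 2, 3, ... in order, you use n guesses when the target is n, so the expectation is sum over n of n * (1/2)**n = 2.)

expected = sum(n * 0.5**n for n in range(1, 200))
print(expected)   # -> 2.0 (up to floating-point rounding)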
I say the only way to define "fast" - you could look at worst case. You have to assume a bound on the target, to prevent all schemes having the exact same speed, namely "no bound on the worst case". But you don't have to assume a distribution, and the answer for the "fastest" algorithm under this definition is obvious - binary search starting at the bound you selected. So I'm not sure this definition is terribly interesting...
In practice, you don't know the distribution, but can make a few educated guesses based on the fact that the picker is a human being, and what numbers humans are capable of conceiving. As someone says, if the number they picked is the Ackermann function for Graham's number, then you're probably in trouble. But if you know that they are capable of representing their chosen number in digits, then that actually puts an upper limit on the number they could have chosen. But it still depends what techniques they might have used to generate and record the number, and hence what your best knowledge is of the probability of the number being of each particular magnitude.
Worst case, you can find it in time logarithmic in the size of the answer using exactly the methods you describe. You might use Ackermann's function to find an upper bound faster than logarithmic time, but then the binary search between the number guessed and the previous guess will require time logarithmic in the size of the interval, which (if guesses grow very quickly) is close to logarithmic in the size of the answer.
It would be interesting to try to prove that there is no faster algorithm (e.g., O(log log n)), but I have no idea how to do it.
Mathematically speaking:
You cannot ever correctly find this integer. In fact, strictly speaking, the statement "pick any positive integer" is meaningless as it cannot be done: although you as a person may believe you can do it, you are actually picking from a bounded set - you are merely unconscious of the bounds.
Computationally speaking:
Computationally, we never deal with infinites, as we would have no way of storing or checking against any number larger than, say, the theoretical maximum number of electrons in the universe. As such, if you can estimate a maximum based on the number of bits used in a register on the device in question, you can carry out a binary search.
Binary search can be generalized: at each step, the set of possible choices should be divided into two subsets of probability 0.5 each. In this form it is still applicable to infinite sets, but it still requires knowledge of the distribution (for finite sets this requirement is quite often forgotten)...
My main refinement is that I'd start with a higher first guess instead of 2, around the average of what I'd expect them to choose. Starting with 64 would save 5 guesses vs starting with 2 when the number's over 64, at the cost of 1-5 more when it's less. 2 makes sense if you expect the answer to be around 1 or 2 half the time. You could even keep a memory of past answers to decide the best first guess. Another improvement could be to try negatives when they say "lower" on 0.
If this is guessing the upper bound of a number being generated by a computer, I'd start with 2**[number of bits/2], then scale up or down by powers of two. This, at least, gets you the closest to the possible values in the least number of jumps.
However, if this is a purely mathematical number, you can start with any value, since you have an infinite range of values, so your approach would be fine.
Since you do not specify any probability distribution of the numbers (as others have correctly mentioned, there is no uniform distribution over all the positive integers), the No Free Lunch Theorem give the answer: any method (that does not repeat the same number twice) is as good as any other.
Once you start making assumptions about the distribution (f.x. it is a human being or binary computer etc. that chooses the number) this of course changes, but as the problem is stated any algorithm is as good as any other when averaged over all possible distributions.
Use binary search starting with MAX_INT/2, where MAX_INT is the biggest number your platform can handle.
No point in pretending we can actually have infinite possibilities.
UPDATE: Given that you insist on entering the realms of infinity, I'll just vote to close your question as not programming related :-)
The standard default assumption of a uniform distribution for all positive integers doesn't lead to a solution, so you should start by defining the probability distribution of the numbers to guess.
I'd probably start my guessing with Graham's Number.
The practical answer within a computing context would be to start with whatever is the highest number that can (realistically) be represented by the type you are using. In case of some BigInt type you'd probably want to make a judgement call about what is realistic... obviously ultimately the bound in that case is the available memory... but performance-wise something smaller may be more realistic.
Your starting point should be the largest number you can think of plus 1.
There is no 'efficient search' for a number in an infinite range.
EDIT: Just to clarify, for any number you can think of there are still infinitely more numbers that are 'greater' than your number, compared to a finite collection of numbers that are 'less' than your number. Therefore, assuming the chosen number is randomly selected from all positive numbers, you have zero (or approaching-zero) chance of being 'above' the chosen number.
I gave an answer to a similar question "Optimal algorithm to guess any random integer without limits?"
Actually, the algorithm provided there not only searches for the chosen number, it estimates the median of the distribution of a number that you may even re-choose at each step! And the number could even be from the real domain ;)
