Guessing a number knowing only if the number proposed is lower or higher? - algorithm

I need to guess a number. I can only see if the number I'm proposing is lower or higher. Performance matters a whole lot, so I thought of the following algorithm:
Let's say the number I'm trying to guess is 600.
I start out with the number 1000 (or for even higher performance, the average result of previous numbers).
I then check if 1000 is higher or lower than 600. It is higher.
I then divide the number by 2 (so that it is now 500), and check if it is lower or higher than 600. It is lower.
I then find the difference and divide it by 2 in the following way to retrieve a new number: (1000 + 500) / 2. The result is 750. I then check that number.
And so on.
Is this the best approach or is there a smarter way of doing this? For my case, every guess takes approximately 500 milliseconds, and I need to guess quite a lot of numbers in as low time as possible.
I can roughly assume that the average result of previous guesses is close to the upcoming numbers too, so there's a pattern there which I can use for my own advantage.

yes binary search is the most effective way of doing this. Binary Search is what you described. For a number between 1 and N Binary Search runs in O(log(n)) time.
So here is the algorithm to find a number between 1-N
int a = 1, b = n, guess = average of previous answers;
while(guess is wrong) {
if(guess lower than answer) {a = guess;}
else if(guess higher than answer) {b = guess;}
guess = (a+b)/2;
} //Go back to while

Well, you're taking the best possible approach without the extra information - it's a binary search, basically.
Exactly how you use the "average result of previous guesses" is up to you; it would suggest biasing the results towards that average, but you'd need to perform analysis of just how indicative previous results are in order to work out the best approach. Don't just use the average: use the complete distribution.
For example, if all the results have been in the range 600-700 (even though the hypothetical range is up to 1000) with an average of 670, you might start with 670 but if it says "guess higher" then you would probably want to choose a value between 670 and 700 as your next guess, rather than 835 (which is very likely to be higher than the real result).
I suggest you log all the results from previous enquiries, so you can then use that as test data for alternative approaches.

In general, binary search starting at the middle point of the range is the optimal strategy. However, you have additional specific information which may make this a suboptimal strategy. This depends critically in what exactly "close to the average of the previous results" means.
If numbers are close to the previous average then dividing by 2 in the second step is not optimal.
Example: Previous numbers 630, 650, 620, 660. You start with 640.
Your number is actually closer. Imagine that it is 634.
The number is lower. If in the second step you divide by 2, you get 320, thus losing any advantage about the previous average numbers.
You should analyze the behaviour further. It may be optimal, in your specific case, to start at the mean of the N previous numbers and then add or substract some quantity related to the standard deviation of the previous numbers.

Yes, binary search (your algorithm) is correct here. However there is one thing missing in the standard binary search:
For binary search you normally need to know the maximum and minimum between which you are searching. In case you do not know this, you have to iteratively find the maximum in the beginning, like so:
Start with zero
if it is higher than the number searched, zero is your maximum and you have to find a minimum
if it is lower than the number searched, zero is your minimum and you have to find a maximum
You can search for your maximum/minimum by starting at 1 or -1 and always multiplying by two until you find a number which is greater/smaller
When you always multiply by two, you will be much faster than when you search linearly.

Do you know the range of possible values? If yes, always start in the middle and do exactly what you describe.

A standard binary search between 0 and N(N is the given number) will give you the answer in logN time.

int a = 1, b = n+1, guess = average of previous answers;
while(guess is wrong) {
if(guess lower than answer) {a = guess;}
else if(guess higher than answer) {b = guess;}
guess = (a+b)/2;
} //Go back to while
You got to add +1 to n else you can never get n since it's an int.

I gave an answer to a similar question "Optimal algorithm to guess any random integer without limits?"
Actually, provided there algorithm not just searches for the conceived number, but it estimates a median of the distribution of the number that you may re-conceive at each step! And also the number could be even from the real domain ;)

Related

a dynamic program about subarray sum

I've see a problem about dynamic program
like this:
let's say there is a array like this: [600, 500, 300, 220, 210]
I want to find a sub array whose sum is the most closest to 1000 and bigger than it.(>=1000).
how can I write the code? I already understand the 01 backpack problem but still cannot make out this problem
A few things:
First, I think you are referring to "dynamic programming", not "a dynamic program"; read up here if you want to know the difference: https://en.wikipedia.org/wiki/Dynamic_programming
Second, I think you mean "closest to 1000 but NOT bigger than it (< 1000)", since that is the general constraint. If you were allowed to go over 1000, then the problem doesn't make sense because there is no constraint.
Like the backpack problem, this is going to be a non-polynomial (NP) time problem (a problem where the time required to compute increases faster than polynomial growth - usually exponential or faster), where you would normally have to check every possible combination of numbers, which can take a long time for seemingly small set sizes.
I believe that the correct answer from the 5 you provided is 500+220+210, which sums to 930, the largest that you can make without going over 1000.
The basic idea of dynamic programming is to break the problem into smaller similar problems that are more easily computable; for example, if you had a million numbers and wanted to find the subset that is closest to 100000 but not over, you might divide the million into 100,000 subsets of 10 elements, and find the closest to a smaller number of each of those subsets, then use the resulting set of 100,000 sums to repeat with 10,000 sets, etc, until you reduce it to a close-but-not-perfect solution.
In any non-polynomial-time problem, dynamic programming can only be used in building a close approximation, since the solution isn't guaranteed to be optimal.
You can use transaction optimizer from the EmerCoin wallet.
It exacly does, what you're looking for.
An approach to solve this problem can be done in two steps:
define a function which takes a subarray and gives you an evaluation or a score of this subarray so that you can actually compare subarrays and take the best. A function could be simply
if(sum(subarray) < 1000) return INFINITY
else return sum(subarray) - 1000
note that you can also use dynamic programming to compute the sum of subarrays
Assuming that the length of your goal array is N, you will need to solve the problems of size 1 to N. If the array's length is 1 then obviously there is one possibility and it's the best. If size > 1 then we take the solution of the problem with length size - 1 and we compare it with every subarray containing the last element of the array and we take the best subarray as the solution of the problem with length size.
I hope my explanation makes sense

Algorithm to find an arbitrarily large number

Here's something I've been thinking about: suppose you have a number, x, that can be infinitely large, and you have to find out what it is. All you know is if another number, y, is larger or smaller than x. What would be the fastest/best way to find x?
An evil adversary chooses a really large number somehow ... say:
int x = 9^9^9^9^9^9^9^9^9^9^9^9^9^9^9
and provides isX, isBiggerThanX, and isSmallerThanx functions. Example code might look something like this:
int c = 2
int y = 2
while(true)
if isX(y) return true
if(isBiggerThanX(y)) fn()
else y = y^c
where fn() is a function that, once a number y has been found (that's bigger than x) does something to determine x (like divide the number in half and compare that, then repeat). The thing is, since x is arbitrarily large, it seems like a bad idea to me to use a constant to increase y.
This is just something that I've been wondering about for a while now, I'd like to hear what other people think
Use a binary search as in the usual "try to guess my number" game. But since there is no finite upper end point, we do a first phase to find a suitable one:
Initially set the upper end point arbitrarily (e.g. 1000000, though 1 or 1^100 would also work -- given the infinite space to work in, all finite values are equally disproportionate).
Compare the mystery number X with the upper end point.
If it's not big enough, double it, and try again.
Once the upper end point is bigger than the mystery number, proceed with a normal binary search.
The first phase is itself similar to a binary search. The difference is that instead of halving the search space with each step, it's doubling it! The cost for each phase is O(log X). A small improvement would be to set the lower end point at each doubling step: we know X is at least as high as the previous upper end point, so we can reuse it as the lower end point. The size of the search space still doubles at each step, but in the end it will be half as large as would have been. The cost of the binary search will be reduced by only 1 step, so its overall complexity remains the same.
Some notes
A couple of notes in response to other comments:
It's an interesting question, and computer science is not just about what can be done on physical machines. As long as the question can be defined properly, it's worth asking and thinking about.
The range of numbers is infinite, but any possible mystery number is finite. So the above method will eventually find it. Eventually is defined such as that, for any possible finite input, the algorithm will terminate within a finite number of steps. However since the input is unbounded, the number of steps is also unbounded (it's just that, in every particular case, it will "eventually" terminate.)
If I understand your question correctly (advise if I do not), you're asking about how to solve "pick a number from 1 to 10", except that instead of 10, the upper bound is infinity.
If your number space is truly infinite, the following are true:
The value will never be held in an int (or any other data type) on any physical hardware
You will NEVER find your number
If the space is immensely large but bound, I think the best you can do is a binary search. Start at the middle of the number range. If the desired number turns out to be higher or lower, divide that half of the number space, and repeat until the desired number is found.
In your suggested implementation you raise y ^ c. However, no matter how large c is chosen to be, it will not even move the needle in infinite space.
Infinity isn't a number. Thus you can't find it, even with a computer.
That's funny. I've wondered the same thing for years, though I've never heard anyone else ask the question.
As simple as your scenario is, it still seems to provide insufficient information to allow the choice of an optimal strategy. All one can choose is a suitable heuristic. My heuristic had been to double y, but I think that I like yours better. Yours doubles log(y).
The beauty of your heuristic is that, so long as the integer fits in the computer's memory, it finds a suitable y in logarithmic time.
Counter-question. Once you find y, how do you proceed?
I agree with using binary search, though I believe that a ONE-SIDED binary search would be more suitable, since here the complexity would NOT be O( log n ) [ Where n is the range of allowable numbers ], but O( log k ) - where k is the number selected by your adversary.
This would work as follows : ( Pseudocode )
k = 1;
while( isSmallerThanX( k ) )
{
k = k*2;
}
// At this point, once the loop is exited, k is bigger than x
// Now do normal binary search for the range [ k/2, k ] to find your number :)
So even if the allowable range is infinity, as long as your number is finite, you should be able to find it :)
Your method of tetration is guaranteed to take longer than the age of the universe to find an answer, if the opponent merely uses a paradigm which is better (for example, pentation). This is how you should do it:
You can only do this with symbolic representations of numbers, because it is trivial to name a number your computer cannot store in floating-point representation, even if it used arbitrary-precision arithmetic and all its memory.
Required reading: http://www.scottaaronson.com/writings/bignumbers.html - that pretty much sums it up
How do you represent a number then? You represent it by a program which will, if run to completion, print out that number. Even then, your computer is incapable of computing BusyBeaver(10^100) (if you dictated a program 1 terabyte in size, this well over the maximum number of finite clock cycles it could run without looping forever). You can see that we could easily have the computer print out 1 0 0... each clock cycle, making the maximum number it could say (if we waited nearly an eternity) would be 10^BusyBeaver(10^100). If you allowed it to say more complicated expressions like eval(someprogram), power-towers, Ackermann's function, whatever-- then I believe that would be no better than increasing the original 10^100 by some constant proportional to the complexity of what you described (plus some logarithmic interpreter factor, see Kolmogorov complexity).
So let's frame this another way:
Your opponent picks a finite computable number, and gives you a function tells you if the number is smaller/larger/equal by computing it. He also gives you a representation for the output (in a sane world this would be "you can only print numbers like 99999", but he can make it more complicated; it actually doesn't matter). Proceed to measure the size of this function in bits.
Now, answer with your own function, which is twice the size of his function (in bits), and prints out the largest number it can while keeping the code to less than 2N bits in length. (You use the same representation he chose: In a world where you can only print out numbers like "99999", that's what you do. If you can define functions, it gets slightly more complicated.)
I do not understand the purpose here, but I this is what I thought of:
Reading your comments, I suppose you aren't looking for infinitely large number, but a "super large number" instead. And whatever be the number, it will have a large no. of digits. How you got them, isn't the concern. Keeping this in mind:
No complex computation is required. Just type random keys on your numeric keyboard to have a super large number, and then have a program randomly add/remove/modify digits of that number. You get a list of very large numbers - select any one out of them.
e.g: 3672036025039629036790672927305060260103610831569252706723680972067397267209
and keep modifying/adding digits to get more numbers
PS: If you state the purpose in your question clearly, we might be able to give better answers.

Is there a name for this type of binary search?

In writing some code today, I have happened upon a circumstance that has caused me to write a binary search of a kind I have never seen before. Does this binary search have a name, and is it really a "binary" search?
Motivation
First of all, in order to make the search easier to understand, I will explain the use case that spawned its creation.
Say you have a list of ordered numbers. You are asked to find the index of the number in the list that is closest to x.
int findIndexClosestTo(int x);
The calls to findIndexClosestTo() always follow this rule:
If the last result of findIndexClosestTo() was i, then indices closer to i have greater probability of being the result of the current call to findIndexClosestTo().
In other words, the index we need to find this time is more likely to be closer to the last one we found than further from it.
For an example, imagine a simulated boy that walks left and right on the screen. If we are often querying the index of the boy's location, it is likely he is somewhere near the last place we found him.
Algorithm
Given the case above, we know the last result of findIndexClosestTo() was i (if this is actually the first time the function has been called, i defaults to the middle index of the list, for simplicity, although a separate binary search to find the result of the first call would actually be faster), and the function has been called again. Given the new number x, we follow this algorithm to find its index:
interval = 1;
Is the number we're looking for, x, positioned at i? If so, return i;
If not, determine whether x is above or below i. (Remember, the list is sorted.)
Move interval indices in the direction of x.
If we have found x at our new location, return that location.
Double interval. (i.e. interval *= 2)
If we have passed x, go back interval indices, set interval = 1, go to 4.
Given the probability rule stated above (under the Motivation header), this appears to me to be the most efficient way to find the correct index. Do you know of a faster way?
In the worst case, your algorithm is O((log n)^2).
Suppose you start at 0 (with interval = 1), and the value you seek actually resides at location 2^n - 1.
First you will check 1, 2, 4, 8, ..., 2^(n-1), 2^n. Whoops, that overshoots, so go back to 2^(n-1).
Next you check 2^(n-1)+1, 2^(n-1)+2, ..., 2^(n-1)+2^(n-2), 2^(n-1)+2^(n-1). That last term is 2^n, so whoops, that overshot again. Go back to 2^(n-1) + 2^(n-2).
And so on, until you finally reach 2^(n-1) + 2^(n-2) + ... + 1 == 2^n - 1.
The first overshoot took log n steps. The next took (log n)-1 steps. The next took (log n) - 2 steps. And so on.
So, worst case, you took 1 + 2 + 3 + ... + log n == O((log n)^2) steps.
A better idea, I think, is to switch to traditional binary search once you overshoot the first time. That will preserve the O(log n) worst case performance of the algorithm, while tending to be a little faster when the target really is nearby.
I do not know a name for this algorithm, but I do like it. (By a bizarre coincidence, I could have used it yesterday. Really.)
What you are doing is (IMHO) a version of Interpolation search
In a interpolation search you assume numbers are equally distributed, and then you try to guess the location of a number from first and last number and length of the array.
In your case, you are modifying the interpolation-algo such that you assume the Key is very close to the last number you searched.
Also note that your algo is similar to algo where TCP tries to find the optimal packet size. (dont remember the name :( )
Start slow
Double the interval
if Packet fails restart from the last succeeded packet./ Restart
from default packet size.. 3.
Your routine is typical of interpolation routines. You don't lose much if you call it with random numbers (~ standard binary search), but if you call it with slowly increasing numbers, it won't take long to find the correct index.
This is therefore a sensible default behavior for searching an ordered table for interpolation purposes.
This method is discussed with great length in Numerical Recipes 3rd edition, section 3.1.
This is talking off the top of my head, so I've nothing to back it up but gut feeling!
At step 7, if we've passed x, it may be faster to halve interval, and head back towards x - effectively, interval = -(interval / 2), rather than resetting interval to 1.
I'll have to sketch out a few numbers on paper, though...
Edit: Apologies - I'm talking nonsense above: ignore me! (And I'll go away and have a proper think about it this time...)

Bin-packing (or knapsack?) problem

I have a collection of 43 to 50 numbers ranging from 0.133 to 0.005 (but mostly on the small side). I would like to find, if possible, all combinations that have a sum between L and R, which are very close together.*
The brute-force method takes 243 to 250 steps, which isn't feasible. What's a good method to use here?
Edit: The combinations will be used in a calculation and discarded. (If you're writing code, you can assume they're simply output; I'll modify as needed.) The number of combinations will presumably be far too large to hold in memory.
* L = 0.5877866649021190081897311406, R = 0.5918521703507438353981412820.
The basic idea is to convert it to an integer knapsack problem (which is easy).
Choose a small real number e and round numbers in your original problem to ones representable as k*e with integer k. The smaller e, the larger the integers will be (efficiency tradeoff) but the solution of the modified problem will be closer to your original one. An e=d/(4*43) where d is the width of your target interval should be small enough.
If the modified problem has an exact solution summing to the middle (rounded to e) of your target interval, then the original problem has one somewhere within the interval.
You haven't given us enough information. But it sounds like you are in trouble if you actually want to OUTPUT every possible combination. For example, consistent with what you told us are that every number is ~.027. If this is the case, then every collection of half of the elements with satisfy your criterion. But there are 43 Choose 21 such sets, which means you have to output at least 1052049481860 sets. (too many to be feasible)
Certainly the running time will be no better than the length of the required output.
Actually, there is a quicker way around this:
(python)
sums_possible = [(0, [])]
# sums_possible is an array of tuples like this: (number, numbers_that_yield_this_sum_array)
for number in numbers:
sums_possible_for_this_number = []
for sum in sums_possible:
sums_possible_for_this_number.insert((number + sum[0], sum[1] + [number]))
sums_possible = sums_possible + sums_possible_for_this_number
results = [sum[1] for sum in sums_possible if sum[0]>=L and sum[1]<=R]
Also, Aaron is right, so this may or may not be feasible for you

How to design an efficient algorithm for least upper bound search

Let's say you have some set of numbers with a known lower bound and unknown upper bound, i.e. 0, 1, 2, 3, ... 78 where 78 is the unknown. Assume for the moment there are no gaps in between numbers. There is a time-expensive function test() that tests if a number is in the set.
What is an efficient way (requiring a low amount of test() calls) to find the highest number in the set?
What if you have the added knowledge that the upper bound is 75 +/- 25?
What if there are random gaps between numbers in the set, i.e. 0, 1, 3, 4, 7, ... 78?
For the "no gaps case":
I assume that this is a fixed size of number, e.g. a 32 bit int
We wish to find x such that test(x) == true, test(x+1) == false, right?
You basically do a binary chop between the lowest known "not in set" (e.g. the biggest 32 bit int) and the highest known "in set" (starting with the known lower bound) by testing the middle value in the range each time and adjusting the boundaries accordingly. This would give an O(log N) solution (in terms of numbers of calls to test()) where X is the size of the potential set, not the actual set. This will be slower than just trying 1, 2, 3... for small sets, but much faster for large ones.
All of this falls down if there can be gaps, at which point I don't think there's any feasible solution beyond "start with the absolute highest possible number and work down until test(x) == true at which point that's the highest number". Any other strategy will fail or be more expensive as far as I can see.
Your best bet is to simply run through the set with O(n) complexity, which is not bad.
Take into consideration that the set is not sorted (it is a set, after all, and this is the given), each isInSet(n) operation takes O(n) as well, bringing you to O(n^2) for the entire operation, if you choose any algorithm for prodding the set at certain places...
A much better solution, if the set is in your control, would be to simply keep a max value of the set and update it on each insertion to the set. This will be O(1) for all cases.
Set Step to 1
set Upper to Lower + Step
if test(Upper) is true then set Lower to Upper, multiply Step by 2 and go to point 2
at this point you know that Lower is in your set while Upper is not. You can now do a binary search between Lower and Upper to find the limit.
This looks like O(log n * O(test)) complexity.
If you know that Upper is between 50 and 100, Do a binary search between these two values.
If you have random gaps and you know that the upper bound is 100 maximum I suspect you can not do better than starting from there and testing every number one by one until test() finds a value in your set.
If you have random gaps and you do not know an upper limit then you can never be sure you found the upper bound.
Maybe you should just traverse through it? It would be O(n) complex. I think there is no other way to do this.
Do you know the set size, before hand?
Actually, I guess you probably don't - otherwise the first problem would be trivial.
It would help if you had some idea how big the set was though.
Take a guess at the top value
Test - if in then increment value by some amount
If not in then decrease value by some amount
Once you have upper and lower bounds for largest value, binary search till you find it (to required precision).
For the gaps you've no such ability - you can't even tell when you've found the largest element. (Unless you known the maximum gap size)
If there are no gaps, then you are probably best off with a binary search.
If we use the second assumption, that the top is 75 +/- 25, then are Low end is 50 and high end is 100, and our first test case is 75. If it is present, then the low end is 75 and the high end is 100, and our test case is 87. That should yield results in O( ln N) (where here N would be 50).
If we can't assume a possible upper range, we just have to made educated guess at what it might be. If a value is not found, it becomes the high end. If it is found, it's the low end, and we double it to find the high end.
If there are gaps, the only way I can see of doing it is a linear search -- but even then you'll need a way of knowing when you reached the end, rather that just a big gap.
If your set happens to be the set of prime numbers, let me know when you find the biggest one. I'm sure we can work something out. ;)
But seriously, I'm guessing you know for a fact that the set does indeed have a largest value. Or, you're chopping it to a 32-bit integer.
A couple of suggestions:
1) Think of every case you can that would speed a result of test(x) == false. Then you can go on to the next one. If the time you spend going through all of the ejection cases is far less than going through the full test, then you'll come out ahead.
2) Can you gain any information from each test? For example, does test(x) == false imply that test(x+5679) == false as well?

Resources