Number of sets of queries for the game guess the number?

I have asked this problem on math.stackexchange, but I didn't get any answer.
The problem is to solve a two-player number-guessing game. Let the first player be A and the second be B. A chooses a number x between 1 and an upper limit n. B asks A queries of the form (L, R), and A answers Yes/No according to whether x lies in the interval [L, R], both endpoints inclusive.
B needs to ask a set of queries to uniquely determine the number x. So the problem is to find the number of distinct sets of queries that let B uniquely determine x, irrespective of the value of x. A returns the answers to the queries as a whole series -- B gets the yes/no responses in a batch, only after making all of the queries.
For example, let's say n is 2. The sets of possible queries would be,
{(1,1)},
{(2,2)},
{(1,1),(2,2)},
{(1,1),(1,2)},
{(2,2),(1,2)},
{(1,1),(2,2),(1,2)}
Question: How can I determine which of these query sets will uniquely identify any integer that A might choose?
All I could think of was that we basically need to isolate every possible number from 1 to n in some way; otherwise it's not possible to uniquely determine the number. But I have no idea what to do with this observation.

Set #1 to detect a number:
1 set with all single pairs: {{1,1},{2,2},...,{N,N}}
Set #2 to detect a number:
N sets with all single pairs except one: {{1,1},{2,2},...,{x-1,x-1},{x+1,x+1},...,{N,N}}
Pairs that can't help to identify the number:
N-1 pairs of length 2: {1,2},{2,3},...,{N-1,N}
N-2 pairs of length 3: {1,3},{2,4},...,{N-2,N}
...
2 pairs of length N-1: {1,N-1},{2,N}
1 pair of length N: {1,N}
The total number of useless pairs is:
K = (N-1) + (N-2) + ... + 2 + 1 = N*(N-1)/2
Total number of useless sets is:
Z = C(K,0) + C(K, 1) + ... + C(K, K) = 2^K
Number of sets of queries
To find the answer we need to combine every correct set with every set of useless pairs.
ANSWER = (number of sets #1 + number of sets #2) * Z = (1 + N) * (2^K)
UPD: This answer is wrong, see comment below.
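For small n, any proposed formula can be sanity-checked by brute force: enumerate every set of intervals and test whether the batch of yes/no answers is distinct for every x. A sketch (exponential in the number of intervals, so only usable for tiny n; names are mine):

from itertools import combinations

def count_distinguishing_sets(n):
    # All possible interval queries (L, R) with 1 <= L <= R <= n.
    intervals = [(l, r) for l in range(1, n + 1) for r in range(l, n + 1)]
    count = 0
    for size in range(1, len(intervals) + 1):
        for queries in combinations(intervals, size):
            # The batch of yes/no answers A would give for a hidden x.
            def answers(x):
                return tuple(l <= x <= r for (l, r) in queries)
            # The set works iff every x in 1..n yields a distinct batch.
            if len({answers(x) for x in range(1, n + 1)}) == n:
                count += 1
    return count

print(count_distinguishing_sets(2))  # prints 6, matching the example above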

Related

Given a number N and an array A. Check if N can be expressed as a product of one or more array elements

Given a number N (where N <= 10^18) and an array A (consisting of at most 20 elements), I have to tell whether it is possible to form N by multiplying some elements of the array. Note that I can use any element multiple times.
Example: N = 8 and A = {2, 3}. Here, 8 = 2 * 2 * 2. So the answer is YES. But if N = 15, then I can't form 15 as a product of one or more elements using them any number of times. So in this case the answer is NO.
How can I approach this problem?
Simple pseudocode:
def can_form(num, A):
    # Keep only elements that divide num; anything else can never
    # appear in a product equal to num. (Skip 1: it never changes a product.)
    A_divisors = set()
    for x in A:
        if x != 1 and num % x == 0:
            A_divisors.add(x)
    # A single element may already be the answer.
    if num in A_divisors or (num == 1 and 1 in A):
        return True
    candidates = set(A_divisors)
    seen = set(A_divisors)
    while candidates:
        new_candidates = set()
        for y in candidates:
            for x in A_divisors:
                p = x * y
                if num % p == 0 and p not in seen:
                    if p == num:
                        return True
                    new_candidates.add(p)
                    seen.add(p)
        candidates = new_candidates
    return False
Complexity: O(|A| * k * log k), where k is the number of divisors of num. The log k factor is the cost of inserting into and testing membership in the set; with a hash-based set that becomes O(1) on average and can be dropped. I am also assuming the % and * operations are O(1).
Since you show no code or algorithm, I'll just give one idea. If you want more help, please show more of your own work on the problem.
Note that N can be at most 60 bits long. This is small enough that N could be decomposed into its prime factors pretty quickly. So first work up a good factoring algorithm for numbers of that size.
Your algorithm would factor N and each of the elements in your array A. If there is any prime factor of N that does not divide into any element of A then your answer is NO. This is the case in your example of N = 15.
Now you work with the prime factors and their exponents in N and in the elements of A. Now you want to find a subset (or, more properly, a sub-multiset) of A where the exponents for each prime add up to that in N. This greatly reduces the sizes of your numbers thus makes the problem easier.
That last part is not trivial. Work more on this problem and show us some of your work, then we can continue helping you.
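This answer deliberately stops short of a full solution, but here is one possible sketch of the exponent-vector idea. All names are mine, and the naive trial-division factoring is only illustrative; for arbitrary N up to 10^18 you would substitute something like Pollard's rho:

def trial_factor(n):
    # Naive trial division; fine for illustration, but too slow for
    # numbers around 1e18 with large prime factors.
    f = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def can_form_by_factoring(N, A):
    target = trial_factor(N)
    # Keep only elements all of whose prime factors divide N.
    vecs = [v for v in map(trial_factor, A)
            if all(p in target for p in v)]
    # Quick NO: some prime factor of N divides no usable element.
    if any(all(p not in v for v in vecs) for p in target):
        return False
    primes = sorted(target)
    goal = tuple(target[p] for p in primes)
    # BFS over exponent vectors bounded by the goal; elements reusable.
    start = tuple([0] * len(primes))
    frontier, seen = {start}, {start}
    while frontier:
        nxt = set()
        for state in frontier:
            for v in vecs:
                cand = tuple(state[i] + v.get(p, 0)
                             for i, p in enumerate(primes))
                if cand == goal:
                    return True
                if all(c <= g for c, g in zip(cand, goal)) and cand not in seen:
                    seen.add(cand)
                    nxt.add(cand)
        frontier = nxt
    return False

print(can_form_by_factoring(8, [2, 3]))   # True  (8 = 2*2*2)
print(can_form_by_factoring(15, [2, 3]))  # False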
You can follow below approach:
Form 2 queues: Q2 and Q3.
Add 2 in Q2 and 3 in Q3.
Get the minimum of the heads of both queues; let's say it is h. Remove h from the corresponding queue. Check whether it equals the number N. If yes, return true. If it is greater than N, return false.
If it is less than N, then add 2*h in Q2 and 3*h in Q3. Repeat steps 3 to 4.
Please note that when the minimum h comes from Q3, you need not add 2*h into Q2. That is because you have already added that element to Q3 before. (I will leave it to you to deduce why.) Keep doing this procedure until your h is greater than N.
If you have more such numbers, you can form their queues as well. I think this is an optimal solution in case you have more numbers to process.
Can you guess the time and space complexity of this?
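A sketch of this merge for the factors 2 and 3 (the generalized deduplication rule, when h comes from queue i only multiply by factors at position i or later, is my reading of the note above):

from collections import deque

def can_reach(N, factors=(2, 3)):
    # One monotone queue per factor, as in the Hamming-number merge.
    queues = [deque([f]) for f in factors]
    while True:
        # Index of the queue holding the smallest head.
        i = min(range(len(factors)), key=lambda q: queues[q][0])
        h = queues[i].popleft()
        if h == N:
            return True
        if h > N:
            return False
        # Only multiply by factors at position i or later; multiplying
        # by earlier factors would regenerate numbers already enqueued.
        for j in range(i, len(factors)):
            queues[j].append(h * factors[j])

print(can_reach(8))   # True: 8 = 2*2*2
print(can_reach(15))  # False

For the original problem you would seed the queues with the distinct elements of A that divide N.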

Calculating limits in dynamic programming

I found this question on topcoder:
Your friend Lucas gave you a sequence S of positive integers.
For a while, you two played a simple game with S: Lucas would pick a number, and you had to select some elements of S such that the sum of all numbers you selected is the number chosen by Lucas. For example, if S={2,1,2,7} and Lucas chose the number 11, you would answer that 2+2+7 = 11.
Lucas now wants to trick you by choosing a number X such that there will be no valid answer. For example, if S={2,1,2,7}, it is not possible to select elements of S that sum up to 6.
You are given the int[] S. Find the smallest positive integer X that cannot be obtained as the sum of some (possibly all) elements of S.
Constraints:
- S will contain between 1 and 20 elements, inclusive.
- Each element of S will be between 1 and 100,000, inclusive.
But in the editorial solution it has been written:
How about finding the smallest impossible sum? Well, we can try the following naive algorithm: First try with x = 1, if this is not a valid sum (found using the methods in the previous section), then we can return x, else we increment x and try again, and again until we find the smallest number that is not a valid sum.
Let's find an upper bound for the number of iterations, the number of values of x we will need to try before we find a result. First of all, the maximum sum possible in this problem is 100000 * 20 (all numbers at the maximum 100000), which means that 100000 * 20 + 1 will not be a possible value. We can be certain to need at most 2000001 steps.
How good is this upper bound? If we had 100000 in each of the 20 numbers, 1 wouldn't be a possible sum, so we would actually need just one iteration in that case. If we want 1 to be a possible sum, we should have 1 among the initial elements. Then we need a 2 (else we would only need 2 iterations), then a 4 (3 can be found by adding 1+2), then 8 (numbers from 5 to 7 can be found by adding some of the first 3 powers of two), then 16, 32, .... It turns out that with powers of 2, we can easily make inputs that require many iterations. With the first 17 powers of two, we can cover up to the first 262143 integer numbers. That should be a good estimation for the largest number. (We cannot use 2^18 in the input, as it is not smaller than 100000.)
Up to 262143 times, we need to query if a number x is in the set of possible sums. We can just use a boolean array here. It appears that even O(log(n)) data structures should be fast enough, however.
I did understand the first paragraph. But after that, they explain something starting with "How good is this upper bound?...", and I couldn't understand that paragraph. How did they arrive at needing up to 262143 queries of whether a number x is in the set of possible sums?
I am a newbie at dynamic programming and so it would be great if somebody could explain this to me.
Thank you.
The idea is as follows:
If the input sequence contains the first k powers of two, 2^0, 2^1, ..., 2^(k-1), then the subset sums cover every integer between 0 and 2^k - 1, and if any power of two were missing, some smaller sum would already be unreachable. Since each element is at most 100,000, the largest power of two that can appear is 2^16 = 65,536, so the 17 powers 2^0 ... 2^16 cover 0 through 2^17 - 1 = 131,071. (The editorial's 262,143 = 2^18 - 1 also counts 2^17, which slightly exceeds the constraint; it is only meant as a rough estimate.)
However, this leaves out that the sequence may contain a few more numbers (it has at most 20 elements). Each extra number m extends a contiguous range of achievable sums [0, t] to [0, t + m] as long as m <= t + 1, so three further elements of 100,000 each push the worst case up to 131,071 + 300,000 = 431,071 achievable sums. The order of magnitude of the bound stays the same.
You may wonder why we use powers of two and not any other powers. The reason is the binary selection that we perform on the numbers in the input sequence. Either we add a number to the sum or we don't. So, if we represent this selection for number ni as a selection variable si (either 0 or 1), then the possible sum is:
s = s0 * n0 + s1 * n1 + s2 * n2 + ...
Now, if we choose the ni to be powers of two ni = 2^i, then:
s = s0 * 2^0 + s1 * 2^1 + s2 * 2^2 + ...
= sum si * 2^i
This is equivalent to the binary representations of numbers (see Positional Notation). By definition, different choices for the selection variables will produce different sums. Hence, the number of possible sums is maximal by choosing powers of two in the input sequence.
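To make the editorial's naive algorithm concrete, here is a direct sketch: a boolean subset-sum table followed by a linear scan for the first gap (function name is mine):

def smallest_impossible_sum(S):
    # Classic subset-sum DP: possible[t] is True iff some subset of S sums to t.
    total = sum(S)
    possible = [False] * (total + 2)
    possible[0] = True
    for v in S:
        # Iterate downward so each element is used at most once.
        for t in range(total, v - 1, -1):
            if possible[t - v]:
                possible[t] = True
    # Scan upward for the first unreachable positive integer.
    x = 1
    while possible[x]:
        x += 1
    return x

print(smallest_impossible_sum([2, 1, 2, 7]))  # 6, as in the problem statement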

Sample number with equal probability which is not part of a set

I have a number n and a set of numbers S ⊆ {1, ..., n} of size s (which is substantially smaller than n). I want to sample a number k ∈ [1..n] with equal probability, but the number is not allowed to be in the set S.
I am trying to solve the problem in at worst O(log n + s). I am not sure whether it's possible.
A naive approach is creating an array of numbers from 1 to n excluding all numbers in S and then pick one array element. This will run in O(n) and is not an option.
Another approach is to generate random numbers in [1..n] and reject them if they are contained in S. This has no theoretical worst-case bound, as a number in the set could keep being drawn. But on average this might be a practical solution if s is substantially smaller than n.
Say s is sorted. Generate a random number between 1 and n-s, call it k. We've chosen the k'th element of {1,...,n} - s. Now we need to find it.
Use binary search on s to find the count of the elements of s <= k. This takes O(log |s|). Add this to k. In doing so, we may have passed or arrived at additional elements of s. We can adjust for this by incrementing our answer for each such element that we pass, which we find by checking the next larger element of s from the point we found in our binary search.
E.g., n = 100, s = {1,4,5,22}, and our random number is 3. So our approach should return the third element of [2,3,6,7,...,21,23,24,...,100], which is 6. Binary search finds that 1 element is at most 3, so we increment to 4. Now we compare to the next larger element of s, which is 4, so we increment to 5. Repeating this finds 5 in s, so we increment to 6. We check s once more, see that 6 isn't in it, and stop.
E.g., n = 100, s = {1,4,5,22}, and our random number is 4. So our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100] which is 7. Binary search finds that 2 elements are at most 4, so we increment to 6. Now we compare to the next larger element of s which is 5 so increment to 7. We check s once more, see that the next number is > 7, so we stop.
If we assume that "s is substantially smaller than n" means |s| <= log(n), then we will increment at most log(n) times, and in any case at most s times.
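A Python sketch of this, assuming s is given as a sorted list (names are mine):

import random
from bisect import bisect_right

def sample_excluding(n, s_sorted):
    # Pick the k-th element of {1..n} \ s uniformly at random.
    k = random.randint(1, n - len(s_sorted))
    # Count elements of s that are <= k and shift past them.
    i = bisect_right(s_sorted, k)
    k += i
    # Each shift may land on further excluded values; walk past them.
    while i < len(s_sorted) and s_sorted[i] <= k:
        k += 1
        i += 1
    return k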
If s is not sorted then we can do the following. Create an array of bits of size s. Generate k. Parse s and do two things: 1) count the number of elements < k, call this r. At the same time, set the i'th bit to 1 if k+i is in s (0 indexed so if k is in s then the first bit is set).
Now, increment k a number of times equal to r, plus once for each set bit in the array whose index is at most the number of increments made so far.
E.g., n = 100, s = {1,4,5,22}, and our random number is 4. So our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100] which is 7. We parse s and 1) note that 1 element is below 4 (r=1), and 2) set our array to [1, 1, 0, 0]. We increment once for r=1 and an additional two times for the two set bits, ending up at 7.
This is O(s) time, O(s) space.
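A sketch of this unsorted variant (identifiers are mine; the optional k argument is only there to make the worked example reproducible):

import random

def sample_excluding_unsorted(n, S, k=None):
    s = len(S)
    if k is None:
        k = random.randint(1, n - s)
    r = 0
    near = [False] * s          # near[i] is True iff k + i is in S
    for v in S:
        if v < k:
            r += 1
        elif v < k + s:
            near[v - k] = True
    # Increment once per element below k, plus once more for every
    # flagged value whose index we pass while incrementing.
    inc, i = r, 0
    while i < inc:
        if near[i]:
            inc += 1
        i += 1
    return k + inc

print(sample_excluding_unsorted(100, [1, 4, 5, 22], k=4))  # 7, as above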
This is an O(1) solution with O(s) initial setup that works by mapping each non-allowed number > s to an allowed number <= s.
Let S be the set of non-allowed values, S(i), where i = [1 .. s] and s = |S|.
Here's a two part algorithm. The first part constructs a hash table based only on S in O(s) time, the second part finds the random value k ∈ {1..n}, k ∉ S in O(1) time, assuming we can generate a uniform random number in a contiguous range in constant time. The hash table can be reused for new random values and also for new n (assuming S ⊂ { 1 .. n } still holds of course).
To construct the hash, H. First set j = 1. Then iterate over S(i), the elements of S. They do not need to be sorted. If S(i) > s, add the key-value pair (S(i), j) to the hash table, unless j ∈ S, in which case increment j until it is not. Finally, increment j.
To find a random value k, first generate a uniform random value in the range s + 1 to n, inclusive. If k is a key in H, then set k = H(k). I.e., we do at most one hash lookup to ensure k is not in S.
Python code to generate the hash:
def substitute(S):
    H = dict()
    j = 1
    for s in S:
        if s > len(S):
            while j in S:
                j += 1
            H[s] = j
            j += 1
    return H
For the actual implementation to be O(s), one might need to convert S into something like a frozenset to ensure the membership test is O(1), and also hoist the loop-invariant len(S) out of the loop. Assuming the j in S test and the insertion into the hash (H[s] = j) are constant time, this has complexity O(s).
The generation of a random value is simply:
import random

def myrand(n, s, H):
    k = random.randint(s + 1, n)
    return (H[k] if k in H else k)
If one is only interested in a single random value per S, then the algorithm can be optimized to improve the common case, while the worst case remains the same. This still requires S to be stored in a hash table that allows a constant-time "element of" test.
def rand_not_in(n, S):
    k = random.randint(len(S) + 1, n)
    if k not in S:
        return k
    j = 1
    for s in S:
        if s > len(S):
            while j in S:
                j += 1
            if s == k:
                return j
            j += 1
Optimizations are: Only generate the mapping if the random value is in S. Don't save the mapping to a hash table. Short-circuit the mapping generation when the random value is found.
Actually, the rejection method seems like the practical approach.
Generate a number in 1...n and check whether it is forbidden; regenerate until the generated number is not forbidden.
The probability of a single rejection is p = s/n.
Thus the expected number of random number generations is 1 + p + p^2 + p^3 + ... which is 1/(1-p), which in turn is equal to n/(n-s).
Now, if s is much less than n, or even as large as n/2, this expected number is at most 2.
It would take s almost equal to n to make it infeasible in practice.
Multiply the expected time by log s if you use a tree-set to check whether the number is in the set, or by just 1 (expected value again) if it is a hash-set. So the average time is O(1) or O(log s) depending on the set implementation. There is also O(s) memory for storing the set, but unless the set is given in some special way, implicitly and concisely, I don't see how it can be avoided.
(Edit: As per comments, you do this only once for a given set.
If, additionally, we are out of luck, and the set is given as a plain array or list, not some fancier data structure, we get O(s) expected time with this approach, which still fits into the O(log n + s) requirement.)
If attacks against the unbounded algorithm are a concern (and only if they truly are), the method can include a fall-back algorithm for the cases when a certain fixed number of iterations didn't provide the answer.
Similarly to how IntroSort is QuickSort but falls back to HeapSort if the recursion depth gets too high (which is almost certainly a result of an attack resulting in quadratic QuickSort behavior).
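For completeness, the basic method without the fall-back is only a few lines (assuming the forbidden set supports fast membership tests, e.g. a Python set):

import random

def rejection_sample(n, forbidden):
    # Expected number of draws is n / (n - len(forbidden)).
    while True:
        k = random.randint(1, n)
        if k not in forbidden:
            return k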
Find all numbers that are in the forbidden set and less than or equal to n-s; call this array A.
Find all numbers that are not in the forbidden set and greater than n-s; call this array B. This can be done in O(s) if the set is sorted.
Note that the lengths of A and B are equal, and create the mapping map[A[i]] = B[i].
Generate a number t up to n-s. If map[t] exists, return it; otherwise return t.
This takes O(s) insertions into a map plus one lookup, which is either O(s) on average (hash map) or O(s log s) (tree map).
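A sketch of this remapping (identifiers are mine; it assumes O(1) membership tests, e.g. a hash set):

import random

def sample_with_remap(n, S):
    s = len(S)
    forbidden = set(S)
    # Forbidden values that lie inside the sampling range [1, n-s].
    low_forbidden = [v for v in S if v <= n - s]
    # Allowed values in the top band (n-s, n]; same count as above.
    high_allowed = [v for v in range(n - s + 1, n + 1) if v not in forbidden]
    remap = dict(zip(low_forbidden, high_allowed))
    t = random.randint(1, n - s)
    return remap.get(t, t)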

Algorithm for obtaining a sum with minimum number of terms

The problem statement is following:
Given N, we need to find x1, x2, ..., xp such that N = x1 + x2 + ... + xp, with p minimal (p is the number of terms in the sum), and we must also be able to obtain every number from 1 to N-1 as the sum of some subset of (x1, x2, ..., xp). Numbers in the set may be repeated.
For example if N=7.
7 = 1+2+4
And 6 = (2,4), 5 = (4,1), 4 = (4), 3 = (1,2), and so on.
Example 2:
8 = 1+2+4+1
Example 3 (invalid):
8 = 1+2+5
But we can't get 4 from any subset of (1,2,5), so (1,2,5) is not a valid combination.
My approach: if N-1 can be written as a sum of p terms, then N needs either p or p+1 terms. But that approach requires checking all possible combinations that sum to N-1 using p terms. Does anyone have a better solution?
Solution:
Step1:
Assume that our answer set has K entries. From these numbers we can form 2^K different subset sums, because each entry either appears in a sum or does not. And if the number is N, we need to produce every sum from 1 to N. Therefore 2^K - 1 >= N, so K = ceil(log2(N + 1)).
Step2:
After step 1, we know that our answer must include K entries, but what are these entries actually? Assume that our entries are (a1, a2, a3, ..., ak). Then any number P can be written as
P = a1*b1 + a2*b2 + a3*b3 + ... + ak*bk, where each b[i] is 0 or 1. Here we can read P as the value of the binary numeral (bk ... b3 b2 b1), therefore we can take a[i] = 2^(i-1).
You should take the numbers 1, 2, 4, ..., 2^k together with N - (1 + 2 + ... + 2^k) = N - (2^(k+1) - 1), including the last one only if it is not equal to 0.
Proof
First of all, with only k numbers we can form at most 2^k - 1 different sums besides 0. So if N >= 2^k, we need at least k + 1 numbers, which shows that our group of numbers, if correct, is minimal in size (or is one of the minimums).
It's easy to see that we can obtain any number from 0 to 2^(k+1) - 1 using the first numbers alone. What if we need more? We simply include the last number, which is less than 2^(k+1), and make up the difference using the first elements.
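Here is a sketch of the construction both answers describe: greedily take powers of two while they fit, then top up with the remainder. The remainder is automatically less than the next power, so the range of achievable sums stays contiguous up to N (function name is mine):

def min_terms(N):
    terms, power = [], 1
    # Take 1, 2, 4, ... while the running total still fits in N.
    while sum(terms) + power <= N:
        terms.append(power)
        power *= 2
    remainder = N - sum(terms)
    if remainder:                # append only if nonzero
        terms.append(remainder)
    return terms

print(min_terms(7))  # [1, 2, 4]
print(min_terms(8))  # [1, 2, 4, 1]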
I haven't worked out the numbers on this, but you should be very, very interested in the fact that you have listed the first three powers of two.
If I were looking for a better solution, that's where I'd start.

How can I compute the average cost for this solution of the element uniqueness problem?

In the book Introduction to the Design & Analysis of Algorithms, the following solution is proposed to the element uniqueness problem:
ALGORITHM UniqueElements(A[0 .. n-1])
// Determines whether all the elements in a given array are distinct
// Input: An array A[0 .. n-1]
// Output: Returns "true" if all the elements in A are distinct
//         and "false" otherwise.
for i := 0 to n - 2 do
    for j := i + 1 to n - 1 do
        if A[i] = A[j] return false
return true
How can I compute the average cost (i.e. number of comparisons for a given n) for this algorithm? What is a reasonable assumption about the input?
If you don't know anything else about the input, then a reasonable assumption is that it's random. If so, and if the space of possible choices is large (e.g. the set of all real numbers), then the likelihood of two elements being the same is vanishingly small. (Mathematically, we say that the event of two randomly selected real numbers being distinct is almost sure.)
That means that your average case is equal to your worst case: you'll have to compare every pair to be sure that each element is distinct. Then the number of comparisons is n * (n - 1) / 2, i.e., the sum 1 + 2 + ... + (n - 1).
I think it's hard to talk about an average cost. The worst-case cost is O(n²) and happens either when the repeated elements are towards the end of the array, for example something like this:
2 3 4 5 ... 1 1
Or when the array contains nothing but distinct elements.
The best case is when the array starts with two repeated elements, like this:
1 1 ...
In which case the cost is a single comparison. Another good case is when there exists an element near the beginning of the array that repeats at the end of the array, something like this:
2 3 4 1 ... 1
This will be (closer to) O(n).
The fact is the cost depends on the input, so you might as well assume you're going to hit the worst case and try to find a better algorithm, maybe something based on sorting the array or on using hash tables, giving you O(n log n) worst case and O(n) average case respectively.
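For reference, minimal sketches of those two alternatives (sorting makes duplicates adjacent; hashing trades memory for time):

def unique_by_sorting(A):
    # O(n log n): after sorting, any duplicates must be neighbors.
    B = sorted(A)
    return all(B[i] != B[i + 1] for i in range(len(B) - 1))

def unique_by_hashing(A):
    # O(n) on average: a set collapses duplicates.
    return len(set(A)) == len(A)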
Since you iterate over the array in a nested way, the worst-case cost is O(n²).
A closer look shows that, because the second loop starts from the element after the one you are checking, you have
(N-1) + (N-2) + (N-3) + (N-4) + (N-5) + .... + 1
comparisons in the worst case, so when duplicates are rare the exact average cost approaches N*(N-1)/2.
According to your comment, I think you should assume that every element is uniformly chosen from the set of possible values.
This means that A[i] takes any specified value with probability 1/n. Starting from here you can make your estimates:
First of all, pick any element of the array, A[i]. What is the probability that A[i] == A[i+1] for a specified value? It's 1/n², since both elements are random.
What is the probability of having A[i] == A[i+2] as the first match? You get 1/n * ((n-1)/n) * 1/n, because you need, respectively, a specified element, anything except that specified one, and then the specified element again.
You can extend the argument to any element A[k] with k > i; adding up all the probabilities gives the average probability of finding a duplicate of a specified element.
You then extend this further by considering that the starting point can be any A[i] with i = 0..l-1 (l being the array length). Of course, each different i will have different probabilities, because the remaining array gets shorter as i increases.
NOTE: n is the number of different values an element can take, not the array length.
After this you can estimate your average comparison cost.
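If the analysis gets messy, you can also just estimate the average cost empirically. A Monte Carlo sketch (n_values plays the role of n above, the number of possible element values):

import random

def comparisons_used(A):
    # Comparisons performed by UniqueElements on input A.
    count = 0
    for i in range(len(A) - 1):
        for j in range(i + 1, len(A)):
            count += 1
            if A[i] == A[j]:
                return count
    return count

def average_cost(length, n_values, trials=10_000):
    total = sum(
        comparisons_used([random.randrange(n_values) for _ in range(length)])
        for _ in range(trials)
    )
    return total / trials

print(average_cost(10, 1000))  # close to 45 = 10*9/2 when duplicates are rare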
If you need an exact value for a given input length, then this will work (though it is overkill):
ALGORITHM complexity_counter_of_UniqueElements(A[0 .. n-1])
// Determines whether all the elements in a given array are distinct
// Input: An array A[0 .. n-1]
// Output: Returns "true" if all the elements in A are distinct
//         and "false" otherwise.
counter acc = 0;
for i := 0 to n - 2 do
    for j := i + 1 to n - 1 do
        //if A[i] = A[j] return false
        acc := 1 + acc
return acc
It is easy to see that this algorithm is O(n*n), though, which is probably what you're interested in. The algorithm compares every element by every other element. If you created a table with the results of this, the table would need roughly (n*n)/2 entries to hold all of the results.
edit:
I see now what you were really asking.
You need to compute the probability that each comparison may result in a match. This depends on the size of your elements (things that live in A) and what kind of distribution they have.
Assuming a random distribution the chance that any two random A[x] == A[y] where x != y would be 1.0/(number of possible values of element).
P(n)
    total_chance := 0.0
    for i := 0 to n - 2 do
        for j := i + 1 to n - 1 do
            this_chance := 1.0 / (number_of_possible_values_of_element)
            total_chance := total_chance + ((1 - total_chance) * this_chance)
            // This should be the probability of the newly compared pair being
            // equal, weighted to account for the chance that it actually
            // mattered (i.e., that no match had been found earlier)
    return total_chance
This gives O((1 - P(n)) * n * n); since P(n) <= 1, it is at most n * n.
