Minimum steps to reach target, given (1, 0) and operations A = 2A - B or B = 2B - A - algorithm

I had an interview and I was not able to give a best approach for the problem.
A=1, B=0
- Operation L: A=2A-B
- Operation R: B=2B-A
For each step, only one operation(L or R) is taken place.
For a given number N, what is the minimum number of operations required to make A or B equals to N?
The most important one is an efficiency.
Thanks in advance.

In k operations you can get all values of N in [-(2^k)+1, 2^k].
Notice that abs(A) + abs(B) = 2^k for all possible k paths, and that A & B exactly cover the range [-(2^k)+1, 2^k] in the set of paths of length k.
k=0: (1,0)
k=1: (1,-1), (2,0)
k=2: (1,-3), (2,-2), (3,-1), (4,0)
etc...
Given N we can find the minimum k via log. Then we know the final pair is (N, N - 2^k) (or (N-2^k, N) if N <= 0). It's easy to follow the path back up to k=0 because one of the two elements will be out of range for the next smaller k.
E.g., N = 35.
Log2(35) = 5.13, so we use k=6.
2^6 = 64, so our final pair is (35, -29)
(35,-29) -> (3,-29) -> (3, -13) -> (3, -5) -> (3,-1) -> (1,-1) -> (1,0)
Figuring out k is O(1), finding the path is O(k) which is O(log(abs(N)).
It's not likely you need to prove anything in an interview, but if you did, you could use this:
By observation: A - B = 2^k for k steps observed for small k.
Then via induction: we have some valid (A, B) s.t. A-B = 2^k. Then L gets us (2A-B, B), but 2A-B-B = 2A-2B = 2(A-B) = 2^(k+1) as desired. Similarly for R.

It would be a challenging task for an interview but I would start with the recursion of trying to find the origin from the result. Given valid (A', B), where A' is the target we are after,
A' = 2A - B
for some A, which means that
A = (A' + B) / 2
The latter tells us that all (A' + B) in the path must be divisible by 2, and since the path ends (starts) at 1, all the (A' + B) in it are powers of 2.
Another property we can observe, although it may not be relevant to the solution is that once we switch in the first step to (even, even) or (odd, odd), we cannot switch back.

Related

Dividing N items in p groups

You are given N total number of item, P group in which you have to divide the N items.
Condition is the product of number of item held by each group should be max.
example N=10 and P=3 you can divide the 10 item in {3,4,3} since 3x3x4=36 max possible product.
You will want to form P groups of roughly N / P elements. However, this will not always be possible, as N might not be divisible by P, as is the case for your example.
So form groups of floor(N / P) elements initially. For your example, you'd form:
floor(10 / 3) = 3
=> groups = {3, 3, 3}
Now, take the remainder of the division of N by P:
10 mod 3 = 1
This means you have to distribute 1 more item to your groups (you can have up to P - 1 items left to distribute in general):
for i = 0 up to (N mod P) - 1:
groups[i]++
=> groups = {4, 3, 3} for your example
Which is also a valid solution.
For fun I worked out a proof of the fact that it in an optimal solution either all numbers = N/P or the numbers are some combination of floor(N/P) and ceiling(N/P). The proof is somewhat long, but proving optimality in a discrete context is seldom trivial. I would be interested if anybody can shorten the proof.
Lemma: For P = 2 the optimal way to divide N is into {N/2, N/2} if N is even and {floor(N/2), ceiling(N/2)} if N is odd.
This follows since the constraint that the two numbers sum to N means that the two numbers are of the form x, N-x.
The resulting product is (N-x)x = Nx - x^2. This is a parabola that opens down. Its max is at its vertex at x = N/2. If N is even this max is an integer. If N is odd, then x = N/2 is a fraction, but such parabolas are strictly unimodal, so the closer x gets to N/2 the larger the product. x = floor(N/2) (or ceiling, it doesn't matter by symmetry) is the closest an integer can get to N/2, hence {floor(N/2),ceiling(N/2)} is optimal for integers.
General case: First of all, a global max exists since there are only finitely many integer partitions and a finite list of numbers always has a max. Suppose that {x_1, x_2, ..., x_P} is globally optimal. Claim: given and i,j we have
|x_i - x_ j| <= 1
In other words: any two numbers in an optimal solution differ by at most 1. This follows immediately from the P = 2 lemma (applied to N = x_i + x_ j).
From this claim it follows that there are at most two distinct numbers among the x_i. If there is only 1 number, that number is clearly N/P. If there are two numbers, they are of the form a and a+1. Let k = the number of x_i which equal a+1, hence P-k of the x_i = a. Hence
(P-k)a + k(a+1) = N, where k is an integer with 1 <= k < P
But simple algebra yields that a = (N-k)/P = N/P - k/P.
Hence -- a is an integer < N/P which differs from N/P by less than 1 (k/P < 1)
Thus a = floor(N/P) and a+1 = ceiling(N/P).
QED

Algorithm to find minimum from list of pairs

The problem statement:
Given a list of pairs {A|B}
Find the minimum sum where you must take 'm' values from 'A' and 'n' values from 'B'
you may not use the same 'pair' for both A and B
the size of the list will be between 2 and 500 items
the number of items you take (m & n) can also vary
the numbers in the pair (A & B) are ranged 0-9.
There of course can be multiple pair combinations that give you the correct minimum.
For example, given:
1 - {4,5}
2 - {3,2}
3 - {3,1}
4 - {1,0}
and desiring 2 from A, 1 from B
the correct answer is 5
taking 2A(3), 4A(1) and 3B(1).
Another example is:
1 - {5,4}
2 - {2,1}
3 - {6,6}
4 - {2,1}
5 - {5,5}
and desiring 2 from A, 2 from Bthe correct answer is 12
taking 1A(5), 5A(5), 2B(1), 4B(1).
I have solved this using a brute force approach, but of course as the list grows larger, and m/n increase, the performance suffers greatly.
How can I improve on this brute force approach?
What is this class of problem called?
Believe it or not, this is not homework!
It can be formulated as a minimum-cost flow problem. Let the pairs be (a_i, b_i). Create vertices s, t, a, b, u_i and arcs from s to a (capacity m, cost 0), from s to b (capacity n, cost 0), from a to u_i (capacity 1, cost a_i), from b to u_i (capacity 1, cost b_i), from u_i to t (capacity 1, cost 0). Send m + n units of flow as cheaply as possible from s to t. Flow on a->u_i means that a_i is chosen. Flow on b->u_i means that b_i is chosen.
It also can be solved by dynamic programming. Let Cost[i, j, k] be the minimum sum for choosing i A's and j B's from the first k pairs. Then we have recurrence relations
Cost[0, 0, 0] = 0
Cost[i, j, 0] = infinity
(for all i, j such that i > 0 or j > 0)
Cost[i, j, k] = min {Cost[i, j, k-1],
Cost[i-1, j, k-1] + a_k,
Cost[i, j-1, k-1] + b_k}
(for all i, j, k such that k > 0).
Trace back the minimum arguments to reconstruct the optimal choices.

N number of smileys using minimum keystrokes

This is an interview question, i wasn't able to solve myself. Can somebody she some light?
Question: Assume you have chat client, in the text box you have one smiley (it's already there). Now you want that smiley N times and only two operations are allowed Copy and Paste. Result must be N smileys no more no less. Copy is basically ^a + ^c. You always have to copy all the smiley that are there in text box at a given time. Copy is one keystroke and paste is one keystroke. Write a algorithm / program for this problem. What will be the complexity?
Whatever the number of smileys N is, factor N into a product of primes. Then for each prime p in the factorization you need to do one copy and p-1 pastes. This is the minimal number of operations possible if "copy" means that you have to copy the entire string of smileys. The total number of operations is the sum of all the prime factors in the factorization of N.
user2566092's answer is correct, but the statement
This is the minimal number of operations possible
is claimed without proof. Here's a formal justification.
Let n be the target number of smileys. A configuration (r, k) consists of the number r of smileys in the result and the number k of smileys on the clipboard. The initial configuration is (1, 0). From (r, k), the configurations (r, r) [copy] and (r + k, k) [paste] are reachable in one keystroke.
The following lemma implies that, after a copy, the number of smileys in the result divides n evenly.
Lemma Suppose that g divides both r and k (g|r and g|k). Every configuration (r', k') reachable from (r, k) satisfies g|r' and g|k'.
Proof by induction. If the next step is a copy, to (r, r), then g|r and g|r. If the next step is a paste, to (r + k, k), then g|r + k and g|k.
Now we can prove the main claim.
Theorem The minimum number of steps from (1, 0) to some (n, k) where k is arbitrary is p1 + p2 + ... + pm, where n = p1 p2 ... pm is a prime factorization.
Proof user2566092 covered the upper bound: for i from 1 to m, copy once and paste pi - 1 times. The lower bound is obvious when n = 1. Otherwise, the best first action clearly is a copy, to (1, 1). We show by induction that the minimum number of steps from (1, 1) to (n, k) is p1 + p2 + ... + pm - 1.
From (1, 1), by the lemma, all we can do is paste until r divides n. Fix a particular r|n. The best action sequences from (r, r) to (n, n) and from (1, 1) to (n/r, n/r) clearly are the same. Without loss of generality by permuting the primes, assume that r = p1 p2 ... pj. Since a b >= a + b for a, b >= 2, we have r >= p1 + p2 + ... + pj. If r < n, then the total cost, by the inductive hypothesis, is at least r + pj+1 + pj+2 + ... + pm - 1 >= p1 + p2 + ... + pj - 1. If r = n, then the total cost is n - 1 >= p1 + p2 + ... + pm - 1.
2 * 2log(N).
Each time duplicate everything, until you don't need the complete set anymore. Then you could duplicate only part of the list.
E.g. 7 smileys:
1 copy everything
2 copy everything
4 copy only 3
Have 7.
The program for the above problem is given below. The time complexity of this solution is O(n) in worst case. The worst case occurs when the given number is prime.
#include <iostream>
using namespace std;
int main()
{
int smiles;
cin>>smiles;
int ans=0;
for(int i=2;i<=smiles;i++)
{
while(smiles%i==0)
{
ans+=i;
smiles/=i;
}
}
cout<<ans<<"\n";
return 0;
}

Number of Positive Solutions to a1 x1+a2 x2+......+an xn=k (k<=10^18)

The question is Number of solutions to a1 x1+a2 x2+....+an xn=k with constraints: 1)ai>0 and ai<=15 2)n>0 and n<=15 3)xi>=0 I was able to formulate a Dynamic programming solution but it is running too long for n>10^10. Please guide me to get a more efficient soution.
The code
int dp[]=new int[16];
dp[0]=1;
BigInteger seen=new BigInteger("0");
while(true)
{
for(int i=0;i<arr[0];i++)
{
if(dp[0]==0)
break;
dp[arr[i+1]]=(dp[arr[i+1]]+dp[0])%1000000007;
}
for(int i=1;i<15;i++)
dp[i-1]=dp[i];
seen=seen.add(new BigInteger("1"));
if(seen.compareTo(n)==0)
break;
}
System.out.println(dp[0]);
arr is the array containing coefficients and answer should be mod 1000000007 as the number of ways donot fit into an int.
Update for real problem:
The actual problem is much simpler. However, it's hard to be helpful without spoiling it entirely.
Stripping it down to the bare essentials, the problem is
Given k distinct positive integers L1, ... , Lk and a nonnegative integer n, how many different finite sequences (a1, ..., ar) are there such that 1. for all i (1 <= i <= r), ai is one of the Lj, and 2. a1 + ... + ar = n. (In other words, the number of compositions of n using only the given Lj.)
For convenience, you are also told that all the Lj are <= 15 (and hence k <= 15), and n <= 10^18. And, so that the entire computation can be carried out using 64-bit integers (the number of sequences grows exponentially with n, you wouldn't have enough memory to store the exact number for large n), you should only calculate the remainder of the sequence count modulo 1000000007.
To solve such a problem, start by looking at the simplest cases first. The very simplest cases are when only one L is given, then evidently there is one admissible sequence if n is a multiple of L and no admissible sequence if n mod L != 0. That doesn't help yet. So consider the next simplest cases, two L values given. Suppose those are 1 and 2.
0 has one composition, the empty sequence: N(0) = 1
1 has one composition, (1): N(1) = 1
2 has two compositions, (1,1); (2): N(2) = 2
3 has three compositions, (1,1,1);(1,2);(2,1): N(3) = 3
4 has five compositions, (1,1,1,1);(1,1,2);(1,2,1);(2,1,1);(2,2): N(4) = 5
5 has eight compositions, (1,1,1,1,1);(1,1,1,2);(1,1,2,1);(1,2,1,1);(2,1,1,1);(1,2,2);(2,1,2);(2,2,1): N(5) = 8
You may see it now, or need a few more terms, but you'll notice that you get the Fibonacci sequence (shifted by one), N(n) = F(n+1), thus the sequence N(n) satisfies the recurrence relation
N(n) = N(n-1) + N(n-2) (for n >= 2; we have not yet proved that, so far it's a hypothesis based on pattern-spotting). Now, can we see that without calculating many values? Of course, there are two types of admissible sequences, those ending with 1 and those ending with 2. Since that partitioning of the admissible sequences restricts only the last element, the number of ad. seq. summing to n and ending with 1 is N(n-1) and the number of ad. seq. summing to n and ending with 2 is N(n-2).
That reasoning immediately generalises, given L1 < L2 < ... < Lk, for all n >= Lk, we have
N(n) = N(n-L1) + N(n-L2) + ... + N(n-Lk)
with the obvious interpretation if we're only interested in N(n) % m.
Umm, that linear recurrence still leaves calculating N(n) as an O(n) task?
Yes, but researching a few of the mentioned keywords quickly leads to an algorithm needing only O(log n) steps ;)
Algorithm for misinterpreted problem, no longer relevant, but may still be interesting:
The question looks a little SPOJish, so I won't give a complete algorithm (at least, not before I've googled around a bit to check if it's a contest question). I hope no restriction has been omitted in the description, such as that permutations of such representations should only contribute one to the count, that would considerably complicate the matter. So I count 1*3 + 2*4 = 11 and 2*4 + 1*3 = 11 as two different solutions.
Some notations first. For m-tuples of numbers, let < | > denote the canonical bilinear pairing, i.e.
<a|x> = a_1*x_1 + ... + a_m*x_m. For a positive integer B, let A_B = {1, 2, ..., B} be the set of positive integers not exceeding B. Let N denote the set of natural numbers, i.e. of nonnegative integers.
For 0 <= m, k and B > 0, let C(B,m,k) = card { (a,x) \in A_B^m × N^m : <a|x> = k }.
Your problem is then to find \sum_{m = 1}^15 C(15,m,k) (modulo 1000000007).
For completeness, let us mention that C(B,0,k) = if k == 0 then 1 else 0, which can be helpful in theoretical considerations. For the case of a positive number of summands, we easily find the recursion formula
C(B,m+1,k) = \sum_{j = 0}^k C(B,1,j) * C(B,m,k-j)
By induction, C(B,m,_) is the convolution¹ of m factors C(B,1,_). Calculating the convolution of two known functions up to k is O(k^2), so if C(B,1,_) is known, that gives an O(n*k^2) algorithm to compute C(B,m,k), 1 <= m <= n. Okay for small k, but our galaxy won't live to see you calculating C(15,15,10^18) that way. So, can we do better? Well, if you're familiar with the Laplace-transformation, you'll know that an analogous transformation will convert the convolution product to a pointwise product, which is much easier to calculate. However, although the transformation is in this case easy to compute, the inverse is not. Any other idea? Why, yes, let's take a closer look at C(B,1,_).
C(B,1,k) = card { a \in A_B : (k/a) is an integer }
In other words, C(B,1,k) is the number of divisors of k not exceeding B. Let us denote that by d_B(k). It is immediately clear that 1 <= d_B(k) <= B. For B = 2, evidently d_2(k) = 1 if k is odd, 2 if k is even. d_3(k) = 3 if and only if k is divisible by 2 and by 3, hence iff k is a multiple of 6, d_3(k) = 2 if and only if one of 2, 3 divides k but not the other, that is, iff k % 6 \in {2,3,4} and finally, d_3(k) = 1 iff neither 2 nor 3 divides k, i.e. iff gcd(k,6) = 1, iff k % 6 \in {1,5}. So we've seen that d_2 is periodic with period 2, d_3 is periodic with period 6. Generally, like reasoning shows that d_B is periodic for all B, and the minimal positive period divides B!.
Given any positive period P of C(B,1,_) = d_B, we can split the sum in the convolution (k = q*P+r, 0 <= r < P):
C(B,m+1, q*P+r) = \sum_{c = 0}^{q-1} (\sum_{j = 0}^{P-1} d_B(j)*C(B,m,(q-c)*P + (r-j)))
+ \sum_{j = 0}^r d_B(j)*C(B,m,r-j)
The functions C(B,m,_) are no longer periodic for m >= 2, but there are simple formulae to obtain C(B,m,q*P+r) from C(B,m,r). Thus, with C(B,1,_) = d_B and C(B,m,_) known up to P, calculating C(B,m+1,_) up to P is an O(P^2) task², getting the data necessary for calculating C(B,m+1,k) for arbitrarily large k, needs m such convolutions, hence that's O(m*P^2).
Then finding C(B,m,k) for 1 <= m <= n and arbitrarily large k is O(n^2*P^2), in time and O(n^2*P) in space.
For B = 15, we have 15! = 1.307674368 * 10^12, so using that for P isn't feasible. Fortunately, the smallest positive period of d_15 is much smaller, so you get something workable. From a rough estimate, I would still expect the calculation of C(15,15,k) to take time more appropriately measured in hours than seconds, but it's an improvement over O(k) which would take years (for k in the region of 10^18).
¹ The convolution used here is (f \ast g)(k) = \sum_{j = 0}^k f(j)*g(k-j).
² Assuming all arithmetic operations are O(1); if, as in the OP, only the residue modulo some M > 0 is desired, that holds if all intermediate calculations are done modulo M.

Quadratic testing in hash tables

During an assignment, I was asked to show that a hash table of size m (m>3, m is prime) that is less than half full, and that uses quadratic checking (hash(k, i) = (h(k) + i^2) mod m) we will always find a free spot.
I've checked and arrived to the conclusion that the spots that will be found (when h(k)=0) are 0 mod m, 1 mod m, 4 mod m, 9 mod m, ...
My problem is that I can't figure a way to show that it will always find the free spot. I've tested it myself with different values of m, and also have proven myself that if the hash table is more than half full, we might never find a free spot.
Can anyone please hint me towards the way to solve this?
Thanks!
0, 1, 4, ..., ((m-1)/2)^2 are all distinct mod m. Why?
Suppose two numbers from that range, i^2 and j^2, are equivalent mod m.
Then i^2 - j^2 = (i-j)(i+j) = 0 (mod m). Since m is prime, m must divide one of those factors. But the factors are both less than m, so one of them ((i-j)) is 0. That is, i = j.
Since we are starting at 0, more than half the slots that are distinct. If you can only fill less than m/2 of them, at least one remains open.
Let's break the proof down.
Setup
First, some background.
With a hash table, we define a probe sequence P. For any item q, following P will eventually lead to the right item in the hash table. The probe sequence is just a series of functions {h_0, ..., h_M-1} where h_i is a hash function.
To insert an item q into the table, we look at h_0(q), h_1(q), and so on, until we find an empty spot. To find q later, we examine the same sequence of locations.
In general, the probe sequence is of the form h_i(q) = [h(q) + c(i)] mod M, for a hash table of size M, where M is a prime number. The function c(i) is the collision-resolution strategy, which must have two properties:
First, c(0) = 0. This means that the first probe in the sequence must be equal to just performing the hash.
Second, the values {c(0) mod M, ..., c(M-1) mod M} must contain every integer between 0 and M-1. This means that if you keep trying to find empty spots, the probe sequence will eventually probe every array position.
Applying quadratic probing
Okay, we've got the setup of how the hash table works. Let's look at quadratic probing. This just means that for our c(i) we're using a general quadratic equation of the form ai^2 + bi + c, though for most implementations you'll usually just see c(i) = i^2 (that is, b, c = 0).
Does quadratic probing meet the two properties we talked about before? Well, it's certainly true that c(0) = 0 here, since (0)^2 is indeed 0, so it meets the first property. What about the second property?
It turns out that in general, the answer is no.
Theorem. When quadratic probing is used in a hash table of size M, where M is a prime number, only the first floor[M/2] probes in the probe sequence are distinct.
Let's see why this is the case, using a proof by contradiction.
Say that the theorem is wrong. Then that means there are two values a and b such that 0 <= a < b < floor[M/2] that probe the same position.
h_a(q) and h_b(q) must probe the same position, by (1), so h_a(q) = h_b(q).
h_a(q) = h_b(q) ==> h(q) + c(a) = h(q) + c(b), mod M.
The h(q) on both sides cancel. Our c(i) is just c(i) = i^2, so we have a^2 = b^2.
Solving the quadratic equation in (4) gives us a^2 - b^2 = 0, mod M. This is a difference of two squares, so the solution is (a - b)(a + b) = 0, mod M.
But remember, we said M was a prime number. The only way that (a - b)(a + b) can be zero mod M is if [case I] (a - b) is zero, or [case II] (a + b) is zero mod M.
Case I can't be right, because we said that a != b, so a - b must be something other than zero.
The only way for (a + b) to be zero mod M is for a + b to be equal to be a multiple of M or zero. They clearly can't be zero, since they're both bigger than zero. And since they're both less than floor[M/2], their sum must be less than M. So case II can't be right either.
Thus, if the theorem were wrong, one of two quantities must be zero, neither of which can possibly be zero -- a contradiction! QED: quadratic probing doesn't satisfy property two once your table is more than half full and if your table size is a prime number. The proof is complete!
From Wikipedia:
For prime m > 2, most choices of c1 and c2 will make h(k,i) distinct for i in [0,(m − 1) / 2]. Such choices include c1 = c2 = 1/2, c1 = c2 = 1, and c1 = 0,c2 = 1. Because there are only about m/2 distinct probes for a given element, it is difficult to guarantee that insertions will succeed when the load factor is > 1/2.
See the quadratic probing section in Data Structures and Algorithms with Object-Oriented Design Patterns in C++ for a proof that m/2 elements are distinct when m is prime.

Resources