I keep getting these hard interview questions. This one really baffles me.
You're given a function poly that takes and returns an int. It's actually a polynomial with nonnegative integer coefficients, but you don't know what the coefficients are.
You have to write a function that determines the coefficients using as few calls to poly as possible.
My idea is to use recursion knowing that I can get the last coefficient by poly(0). So I want to replace poly with (poly - poly(0))/x, but I don't know how to do this in code, since I can only call poly. ANyone have an idea how to do this?
Here's a neat trick.
int N = poly(1)
Now we know that every coefficient in the polynomial is at most N.
int B = poly(N+1)
Now expand B in base N+1 and you have the coefficients.
Attempted explanation: Algebraically, the polynomial is
poly = p_0 + p_1 * x + p_2 * x^2 + ... + p_k * x^k
If you have a number b and expand it in base n, then you get
b = b_0 + b_1 * n + b_2 * n^2 + ...
where each b_i is uniquely determined and b_i < n.
Related
Let us say we are given a number n.
We need to find the number of values S ^ (S+n) lying in the range [L, R].
(Where S is any non-negative integer and ^ is the bitwise xor operator).
I can easily do this if n is power of two (they have a very useful pattern)
I am not sure how to solve this for any general n.
Any suggestions?
EDIT:
n is also a non-negative integer.
n, L, R are all less than 10^18.
This was a programming question in some practice test which i gave sometime back, i just remembered this seeing a similar question in StackOverflow today.
EDIT 2:
Explaining with an example,
say n = 1.
Then we know that S ^ (S + 1) will always have a binary representation of all ones. eg: 1,3,7,...
So solving this is easy we just have to count the number of such numbers within the Range [L,R] it is quite simple.
For n = any power of 2 similar methods work. But i have no idea what to do if n is not a power of 2.
Let C(n) be the (infinite) set of numbers that can be written as S ^ (S + n) for some S.
We have the following recurrence relations on the sets C(n):
If n = 2k is even, then C(n) = {2x : x in C(k)};
If n = 2k + 1 is odd, then C(n) = {2x + 1 : x in C(k)} union {2x + 1 : x in C(k + 1)}.
An algorithm can be deduced from these relations. More precisely, a pair (C(n), C(n + 1)) can be deduced from (C(n / 2), C(n / 2 + 1)). Note that the union above is really a disjoint union, because every element in C(n) has the same parity as n, hence C(k) and C(k + 1) do not intersect.
Proof of the recurrence relations:
Simply look at the last binary digits of n and S.
I have this formula:
index = (a * k) % M
which maps a number 'k', from an input set K of distinct numbers, into it's position in a hashtable. I was wondering how to write a non-brute force program that finds such 'M' and 'a' so that 'M' is minimal, and there are no collisions for the given set K.
If, instead of a numeric multiplication you could perform a logic computation (and / or /not), I think that the optimal solution (minimum value of M) would be as small as card(K) if you could get a function that related each value of K (once ordered) with its position in the set.
Theoretically, it must be possible to write a truth table for such a relation (bit a bit), and then simplify the minterms through a Karnaugh Table with a proper program. Depending on the desired number of bits, the computational complexity would be affordable... or not.
If a is co-prime to M then a * k = a * k' mod M if, and only if, k = k' mod M, so you might as well use a = 1, which is always co-prime to M. This also covers all the cases in which M is prime, because all the numbers except 0 are then co-prime to M.
If a and M are not co-prime, then they share a common factor, say b, so a = x * b and M = y * b. In this case anything multiplied by a will also be divisible by b mod M, and you might as well by working mod y, not mod M, so there is nothing to be gained by using an a not co-prime to M.
So for the problem you state, you could save some time by leaving a=1 and trying all possible values of M.
If you are e.g. using 32-bit integers and really calculating not (a * k) mod M but ((a * k) mod 2^32) mod M you might be able to find cases where values of a other than 1 do better than a=1 because of what happens in (a * k) mod 2^32.
In my work, I came across the following problem: Given a similarity matrix D, where $d_{i,j} \in \Re$ represents the similarity between objects $i$ and $j$, I would like to select $k$ objects, for $k \in {1, \dots, n}$, in such a way that minimizes the similarity between the selected objects. My first attempt to formally formulate this problem was using the following integer program:
$\minimize$ $d_{1,2}X_1X_2 + d_{1,3}X_1X_3 + \dots + d_{1,n}X_1X_n + d_{2,1}X_2X_1 + \dots + d_{n,n-1}X_nX_{n-1} $
such that $X_1 + X_2 + \dots + X_n = k$ and $X_y \in {0,1}$, for $y=1,\dots,n$
In the above program, $X_y$ indicates whether or not object $y$ was selected. Clearly, the above program is not linear. I tried to make the objective function linear by using variables $X_{1,2} $, which indicates whether or not both objects $X_1$ and $X_2$ were selected. However, I am struggling to formulate the constraint that exactly $k$ objects must be chosen, i.e., the previous constraint $X_1 + X_2 + \dots + X_n = k$.
Since I am not an expert in mathematical programming, I wonder if you could help me with this.
Thank you in advance!
All the best,
Arthur
You were on the right path, just missing one thing:
Let x_i be 1 if object i is chosen and 0 otherwise.
Let y_ij be 1 if objects i & j are both chosen and 0 otherwise
The IP goes as follows:
maximize
sum d_ij y_ij
s.t.
sum x_i = k
x_i + x_j - 1 <= y_ij for all i<j
x & y binary variables
The strange looking linking constraint says that y_ij = 1 iff x_i + x_j =2
Only define one y variable for each pair!
Hope this helps
I am trying to calculate a function F(x,y) using dynamic programming. Functionally:
F(X,Y) = a1 F(X-1,Y)+ a2 F(X-2,Y) ... + ak F(X-k,Y) + b1 F(X,Y-1)+ b2 F(X,Y-2) ... + bk F(X,Y-k)
where k is a small number (k=10).
The problem is, X=1,000,000 and Y=1,000,000. So it is infeasible to calculate F(x,y) for every value between x=1..1000000 and y=1..1000000. Is there an approximate version of DP where I can avoid calculating F(x,y) for a large number of inputs and still get accurate estimate of F(X,Y).
A similar example is string matching algorithms (Levenshtein's distance) for two very long and similar strings (eg. similar DNA sequences). In such cases only the diagonal scores are important and the far-from-diagonal entries do not contribute to the final distance. How do we avoid calculating off-the-diagonal entries?
PS: Ignore the border cases (i.e. when x < k and y < k).
I'm not sure precisely how to adapt the following technique to your problem, but if you were working in just one dimension there is an O(k3 log n) algorithm for computing the nth term of the series. This is called a linear recurrence and can be solved using matrix math, of all things. The idea is to suppose that you have a recurrence defined as
F(1) = x_1
F(2) = x_2
...
F(k) = x_k
F(n + k + 1) = c_1 F(n) + c_2 F(n + 1) + ... + c_k F(n + k)
For example, the Fibonacci sequence is defined as
F(0) = 0
F(1) = 1
F(n + 2) = 1 x F(n) + 1 x F(n + 1)
There is a way to view this computation as working on a matrix. Specifically, suppose that we have the vector x = (x_1, x_2, ..., x_k)^T. We want to find a matrix A such that
Ax = (x_2, x_3, ..., x_k, x_{k + 1})^T
That is, we begin with a vector of terms 1 ... k of the sequence, and then after multiplying by matrix A end up with a vector of terms 2 ... k + 1 of the sequence. If we then multiply that vector by A, we'd like to get
A(x_2, x_3, ..., x_k, x_{k + 1})^T = (x_3, x_4, ..., x_k, x_{k + 1}, x_{k + 2})
In short, given k consecutive terms of the series, multiplying that vector by A gives us the next term of the series.
The trick uses the fact that we can group the multiplications by A. For example, in the above case, we multiplied our original x by A to get x' (terms 2 ... k + 1), then multiplied x' by A to get x'' (terms 3 ... k + 2). However, we could have instead just multiplied x by A2 to get x'' as well, rather than doing two different matrix multiplications. More generally, if we want to get term n of the sequence, we can compute Anx, then inspect the appropriate element of the vector.
Here, we can use the fact that matrix multiplication is associative to compute An efficiently. Specifically, we can use the method of repeated squaring to compute An in a total of O(log n) matrix multiplications. If the matrix is k x k, then each multiplication takes time O(k3) for a total of O(k3 log n) work to compute the nth term.
So all that remains is actually finding this matrix A. Well, we know that we want to map from (x_1, x_2, ..., x_k) to (x_1, x_2, ..., x_k, x_{k + 1}), and we know that x_{k + 1} = c_1 x_1 + c_2 x_2 + ... + c_k x_k, so we get this matrix:
| 0 1 0 0 ... 0 |
| 0 0 1 0 ... 0 |
A = | 0 0 0 1 ... 0 |
| ... |
| c_1 c_2 c_3 c_4 ... c_k |
For more detail on this, see the Wikipedia entry on solving linear recurrences with linear algebra, or my own code that implements the above algorithm.
The only question now is how you adapt this to when you're working in multiple dimensions. It's certainly possible to do so by treating the computation of each row as its own linear recurrence, then to go one row at a time. More specifically, you can compute the nth term of the first k rows each in O(k3 log n) time, for a total of O(k4 log n) time to compute the first k rows. From that point forward, you can compute each successive row in terms of the previous row by reusing the old values. If there are n rows to compute, this gives an O(k4 n log n) algorithm for computing the final value that you care about. If this is small compared to the work you'd be doing before (O(n2 k2), I believe), then this may be an improvement. Since you're saying that n is on the order of one million and k is about ten, this does seem like it should be much faster than the naive approach.
That said, I wouldn't be surprised if there was a much faster way of solving this problem by not proceeding row by row and instead using a similar matrix trick in multiple dimensions.
Hope this helps!
Without knowing more about your specific problem, the general approach is to use a top-down dynamic programming algorithm and memoize the intermediate results. That way you will only calculate the values that will be actually used (while saving the result to avoid repeated calculations).
I have a series
S = i^(m) + i^(2m) + ............... + i^(km) (mod m)
0 <= i < m, k may be very large (up to 100,000,000), m <= 300000
I want to find the sum. I cannot apply the Geometric Progression (GP) formula because then result will have denominator and then I will have to find modular inverse which may not exist (if the denominator and m are not coprime).
So I made an alternate algorithm making an assumption that these powers will make a cycle of length much smaller than k (because it is a modular equation and so I would obtain something like 2,7,9,1,2,7,9,1....) and that cycle will repeat in the above series. So instead of iterating from 0 to k, I would just find the sum of numbers in a cycle and then calculate the number of cycles in the above series and multiply them. So I first found i^m (mod m) and then multiplied this number again and again taking modulo at each step until I reached the first element again.
But when I actually coded the algorithm, for some values of i, I got cycles which were of very large size. And hence took a large amount of time before terminating and hence my assumption is incorrect.
So is there any other pattern we can find out? (Basically I don't want to iterate over k.)
So please give me an idea of an efficient algorithm to find the sum.
This is the algorithm for a similar problem I encountered
You probably know that one can calculate the power of a number in logarithmic time. You can also do so for calculating the sum of the geometric series. Since it holds that
1 + a + a^2 + ... + a^(2*n+1) = (1 + a) * (1 + (a^2) + (a^2)^2 + ... + (a^2)^n),
you can recursively calculate the geometric series on the right hand to get the result.
This way you do not need division, so you can take the remainder of the sum (and of intermediate results) modulo any number you want.
As you've noted, doing the calculation for an arbitrary modulus m is difficult because many values might not have a multiplicative inverse mod m. However, if you can solve it for a carefully selected set of alternate moduli, you can combine them to obtain a solution mod m.
Factor m into p_1, p_2, p_3 ... p_n such that each p_i is a power of a distinct prime
Since each p is a distinct prime power, they are pairwise coprime. If we can calculate the sum of the series with respect to each modulus p_i, we can use the Chinese Remainder Theorem to reassemble them into a solution mod m.
For each prime power modulus, there are two trivial special cases:
If i^m is congruent to 0 mod p_i, the sum is trivially 0.
If i^m is congruent to 1 mod p_i, then the sum is congruent to k mod p_i.
For other values, one can apply the usual formula for the sum of a geometric sequence:
S = sum(j=0 to k, (i^m)^j) = ((i^m)^(k+1) - 1) / (i^m - 1)
TODO: Prove that (i^m - 1) is coprime to p_i or find an alternate solution for when they have a nontrivial GCD. Hopefully the fact that p_i is a prime power and also a divisor of m will be of some use... If p_i is a divisor of i. the condition holds. If p_i is prime (as opposed to a prime power), then either the special case i^m = 1 applies, or (i^m - 1) has a multiplicative inverse.
If the geometric sum formula isn't usable for some p_i, you could rearrange the calculation so you only need to iterate from 1 to p_i instead of 1 to k, taking advantage of the fact that the terms repeat with a period no longer than p_i.
(Since your series doesn't contain a j=0 term, the value you want is actually S-1.)
This yields a set of congruences mod p_i, which satisfy the requirements of the CRT.
The procedure for combining them into a solution mod m is described in the above link, so I won't repeat it here.
This can be done via the method of repeated squaring, which is O(log(k)) time, or O(log(k)log(m)) time, if you consider m a variable.
In general, a[n]=1+b+b^2+... b^(n-1) mod m can be computed by noting that:
a[j+k]==b^{j}a[k]+a[j]
a[2n]==(b^n+1)a[n]
The second just being the corollary for the first.
In your case, b=i^m can be computed in O(log m) time.
The following Python code implements this:
def geometric(n,b,m):
T=1
e=b%m
total = 0
while n>0:
if n&1==1:
total = (e*total + T)%m
T = ((e+1)*T)%m
e = (e*e)%m
n = n/2
//print '{} {} {}'.format(total,T,e)
return total
This bit of magic has a mathematical reason - the operation on pairs defined as
(a,r)#(b,s)=(ab,as+r)
is associative, and the rule 1 basically means that:
(b,1)#(b,1)#... n times ... #(b,1)=(b^n,1+b+b^2+...+b^(n-1))
Repeated squaring always works when operations are associative. In this case, the # operator is O(log(m)) time, so repeated squaring takes O(log(n)log(m)).
One way to look at this is that the matrix exponentiation:
[[b,1],[0,1]]^n == [[b^n,1+b+...+b^(n-1))],[0,1]]
You can use a similar method to compute (a^n-b^n)/(a-b) modulo m because matrix exponentiation gives:
[[b,1],[0,a]]^n == [[b^n,a^(n-1)+a^(n-2)b+...+ab^(n-2)+b^(n-1)],[0,a^n]]
Based on the approach of #braindoper a complete algorithm which calculates
1 + a + a^2 + ... +a^n mod m
looks like this in Mathematica:
geometricSeriesMod[a_, n_, m_] :=
Module[ {q = a, exp = n, factor = 1, sum = 0, temp},
While[And[exp > 0, q != 0],
If[EvenQ[exp],
temp = Mod[factor*PowerMod[q, exp, m], m];
sum = Mod[sum + temp, m];
exp--];
factor = Mod[Mod[1 + q, m]*factor, m];
q = Mod[q*q, m];
exp = Floor[ exp /2];
];
Return [Mod[sum + factor, m]]
]
Parameters:
a is the "ratio" of the series. It can be any integer (including zero and negative values).
n is the highest exponent of the series. Allowed are integers >= 0.
mis the integer modulus != 0
Note: The algorithm performs a Mod operation after every arithmetic operation. This is essential, if you transcribe this algorithm to a language with a limited word length for integers.