Expected value of expression - algorithm

How can I find the expected value for the expression in form P/Q
Given:
N integers
2 Operators, 'Bitwise OR' & '+'
Either of the two operators can be placed, with equal probability, between each pair of consecutive integers to form the expression.
Currently, the solution that I have in mind is to generate all possible expressions using the operators and then use the value of each expression to calculate the expected value.
But as N grows, this approach fails, because the number of expressions doubles with every added integer. Is there an alternative that is efficient in terms of time complexity?
Note: For this question: 'Bitwise OR' has higher priority than '+' operator.
There can be at max 10^5 integers.
Example:
Input
1 2 3
Output
19/4
The different ways are:
1+2+3 = 6
1+2|3 = 4
1|2+3 = 6
1|2|3 = 3
All these ways have probability = 1/4
So expected value will be 19/4
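The brute-force idea described in the question can be sketched as follows. This is only an illustration of the enumeration (the function name `expected_value_bruteforce` is made up here), and it is exponential in N, so it is useful mainly for checking faster solutions on small inputs.

```python
from fractions import Fraction
from itertools import product

def expected_value_bruteforce(numbers):
    """Average the value of every expression; '|' binds tighter than '+'."""
    gaps = len(numbers) - 1
    total = 0
    for ops in product("+|", repeat=gaps):
        value = 0            # sum of the OR-groups that are already closed
        group = numbers[0]   # bitwise OR of the group currently being built
        for op, x in zip(ops, numbers[1:]):
            if op == "|":
                group |= x
            else:            # a '+' closes the current group
                value += group
                group = x
        total += value + group
    return Fraction(total, 2 ** gaps)

print(expected_value_bruteforce([1, 2, 3]))  # 19/4
```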

The important observation is that every + splits its left and right parts into sections that can be processed independently.
Let the array of numbers be a[1…N]. Define f(i) to be the expectation value obtained from a[i…N]. What we want to find is f(1).
Note that the first + sign in [i…N] will appear after the ith element with probability 1/2 and i+1th element with probability 1/4 and so on. Just find bitwise or of the elements till + and add the expectation value of what remains.
Thus we have the recurrence
f(i) = sum_{j = i to N-1} (or(a[i…j]) + f(j+1))/(2^(j-i+1))
+ or(a[i…N])/(2^(N-i))
This should be easy to implement efficiently without errors.
For the example array [1,2,3]:
f(3) = or(a[3…3]) = 3
f(2) = (or(a[2…2])+f(3))/2 + or(a[2…3])/2 = 5/2 + 3/2 = 4
f(1) = (or(a[1…1])+f(2))/2 + (or(a[1…2])+f(3))/4 + or(a[1…3])/4 = 5/2 + 6/4 + 3/4 = 19/4
The answer is found to be 19/4, as expected.
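A direct translation of the recurrence might look like the sketch below. It uses 0-based indices and exact fractions, and it evaluates the sum naively, so it runs in O(N^2) time; the function name `expected_value` is an assumption made for this example.

```python
from fractions import Fraction

def expected_value(a):
    """f[i] is the expected value of the expression built from a[i:]."""
    n = len(a)
    f = [Fraction(0)] * (n + 1)
    for i in range(n - 1, -1, -1):
        acc = 0                      # running OR of a[i..j]
        result = Fraction(0)
        for j in range(i, n - 1):    # first '+' sits right after a[j]
            acc |= a[j]
            result += (acc + f[j + 1]) / 2 ** (j - i + 1)
        acc |= a[n - 1]              # no '+' at all in a[i:]
        result += Fraction(acc, 2 ** (n - 1 - i))
        f[i] = result
    return f[0]

print(expected_value([1, 2, 3]))  # 19/4
```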

First of all, since there are 2ⁿ⁻¹ expressions (two possible operators on each of the n-1 places between numbers) and they are all equally probable, the expected value is the sum of all the expressions divided by 2ⁿ⁻¹. So the problem boils down to calculating the sum of the expressions.
An O(n²) algorithm
Let x_1, x_2, ..., x_n be the input numbers.
Let S_k be the sum of all expressions formed by inserting | or + between every pair of consecutive numbers in the list x_1, x_2, ..., x_k.
Let N_k be the number of all such expressions. N_k = 2 ^ (k - 1).
Let's see how we can use S_1, S_2, ..., S_(k-1) to calculate S_k.
The idea is to divide all possible expressions by the position of the last "+" in them.
The sum of the expressions of the form "... + x_k" is
S_(k-1) + x_k * N_(k-1)
The sum of the expressions of the form "... + x_(k-1) | x_k" is
S_(k-2) + (x_(k-1) | x_k) * N_(k-2)
The sum of the expressions of the form "... + x_(k-2) | x_(k-1) | x_k" is
S_(k-3) + (x_(k-2) | x_(k-1) | x_k) * N_(k-3)
...and so on until the single expression x_1 | x_2 | ... | x_k.
Here is a Python implementation of the algorithm.
    numbers = [1, 2, 3]  # The input numbers.
    totals = [0]  # The partial sums. For every k > 0 totals[k] is S_k.
    for i in range(len(numbers)):  # Processing the numbers one by one.
        new_total = 0
        last_summand = 0  # last_summand is numbers[j] | ... | numbers[i]
        for j in range(i, 0, -1):  # j is the position of the last plus in the expression.
            # On every iteration new_total is increased by the sum of the
            # expressions of the form "... + numbers[j] | ... | numbers[i]".
            last_summand |= numbers[j]
            new_total += totals[j] + last_summand * (2 ** (j - 1))
        last_summand |= numbers[0]
        new_total += last_summand  # Handling the expression with no pluses at all.
        totals.append(new_total)
    # Now the last element in totals is the sum of all expressions.
    print(str(totals[-1]) + '/' + str(2**(len(numbers) - 1)))
Further optimization: O(n*log(M))
The problem has two properties that can be used to create a faster algorithm.
If S_n is the sum of the expressions formed by the numbers x_1, x_2, ..., x_n, then 2*S_n is the sum of the expressions formed by the numbers 2*x_1, 2*x_2, ..., 2*x_n.
If x_1, x_2, ..., x_n and y_1, y_2, ..., y_n are such that x_k & y_m == 0 for any k and m, and SX_n is the sum of the expressions formed by x_1, x_2, ..., x_n, and SY_n is the sum of the expressions formed by y_1, y_2, ..., y_n, then SX_n + SY_n is the sum of the expressions formed by x_1+y_1, x_2+y_2, ..., x_n+y_n.
Which means, the problem can be reduced to finding the sum of the expressions for 1-bit numbers. Every bit position from 0 to 31 can be processed separately, and after the solutions are found we can simply add them.
Let x_1, x_2, ..., x_n be one-bit numbers (every x_i is either 0 or 1).
Let S_k be the sum of the expressions formed by x_1, x_2, ..., x_k.
Let N0_k be the number of such expressions where the last summand equals 0.
Let N1_k be the number of such expressions where the last summand equals 1.
Here is the recurrence relation that lets us find S_k, N0_k and N1_k knowing only x_k, S_(k-1), N0_(k-1) and N1_(k-1):
k = 1, x_1 = 0:
S_1 = 0
N0_1 = 1
N1_1 = 0
k = 1, x_1 = 1:
S_1 = 1
N0_1 = 0
N1_1 = 1
k > 1, x_k = 0:
S_k = S_(k-1) * 2
N0_k = N0_(k-1) * 2 + N1_(k-1)
N1_k = N1_(k-1)
k > 1, x_k = 1:
S_k = S_(k-1) * 2 + N0_(k-1) * 2 + N1_(k-1)
N0_k = 0
N1_k = N0_(k-1) * 2 + N1_(k-1) * 2
Since S_n can be found in O(n) and it needs to be found for every bit position, the time complexity of the whole algorithm is O(n*log(M)), where M is the upper bound on the numbers.
An implementation:
    numbers = [1, 2, 3]
    max_bits_in_number = 31

    def get_bit(x, k):
        return (x >> k) & 1

    total_sum = 0
    for bit_index in range(max_bits_in_number):
        bit = get_bit(numbers[0], bit_index)
        expression_sum = bit
        expression_count = (1 - bit, bit)
        for i in range(1, len(numbers)):
            bit = get_bit(numbers[i], bit_index)
            if bit == 0:
                expression_sum = expression_sum * 2
                expression_count = (expression_count[0] * 2 + expression_count[1], expression_count[1])
            else:
                expression_sum = expression_sum * 2 + expression_count[0] * 2 + expression_count[1]
                expression_count = (0, expression_count[0] * 2 + expression_count[1] * 2)
        total_sum += expression_sum * 2**bit_index
    print(str(total_sum) + '/' + str(2**(len(numbers) - 1)))


Count number of subsequences of A such that every element of the subsequence is divisible by its index (starts from 1)

B is a subsequence of A if and only if we can turn A to B by removing zero or more element(s).
A = [1,2,3,4]
B = [1,4] is a subsequence of A. (Just remove 2 and 3.)
B = [4,1] is not a subsequence of A.
Count all subsequences B of A that satisfy this condition: B[i] % i = 0 for every index i of B.
Note that i starts from 1 not 0.
Example :
Input :
5
2 2 1 22 14
Output:
13
All of these 13 subsequences satisfy B[i]%i = 0 condition.
{2},{2,2},{2,22},{2,14},{2},{2,22},{2,14},{1},{1,22},{1,14},{22},{22,14},{14}
My attempt :
The only solution that I could come up with has O(n^2) complexity.
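For reference, one possible O(n^2) dynamic programme is sketched below. It is an assumption about what such a solution could look like, not necessarily the asker's code: dp[length] counts the valid subsequences of that length found so far, and lengths are visited in decreasing order so that each element is used at most once per subsequence.

```python
def count_valid_subsequences(a):
    """Count non-empty subsequences B of a with B[i] % i == 0 (1-based i)."""
    n = len(a)
    dp = [0] * (n + 1)   # dp[length] = valid subsequences of that length so far
    dp[0] = 1            # the empty subsequence, used only as a starting point
    for i, x in enumerate(a):
        # x can sit at position `length` only if length <= i + 1
        for length in range(i + 1, 0, -1):
            if x % length == 0:
                dp[length] += dp[length - 1]
    return sum(dp[1:])

print(count_valid_subsequences([2, 2, 1, 22, 14]))  # 13
```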
Assuming the maximum element in A is C, the following is an algorithm with time complexity O(n * sqrt(C)):
For every element x in A, find all divisors of x.
For every i from 1 to n, find every j such that A[j] is a multiple of i, using the result of step 1.
For every i from 1 to n and j such that A[j] is a multiple of i (using the result of step 2), find the number of B that has i elements and the last element is A[j] (dynamic programming).
    def find_factors(x):
        """Returns all factors of x"""
        for i in range(1, int(x ** 0.5) + 1):
            if x % i == 0:
                yield i
                if i != x // i:
                    yield x // i

    def solve(a):
        """Returns the answer for a"""
        n = len(a)
        # b[i] contains every j such that a[j] is a multiple of i+1.
        b = [[] for i in range(n)]
        for i, x in enumerate(a):
            for factor in find_factors(x):
                if factor <= n:
                    b[factor - 1].append(i)
        # There are dp[i][j] subsequences of a of length (i+1) ending at index b[i][j]
        dp = [[] for i in range(n)]
        dp[0] = [1] * n
        for i in range(1, n):
            k = x = 0
            for j in b[i]:
                # x accumulates the subsequences of length i that end before index j
                while k < len(b[i - 1]) and b[i - 1][k] < j:
                    x += dp[i - 1][k]
                    k += 1
                dp[i].append(x)
        return sum(sum(dpi) for dpi in dp)
For every divisor d of A[i], where d is greater than 1 and at most i+1, A[i] can be the dth element of the number of subsequences already counted for d-1.
JavaScript code:
    // Returns the divisors of n that are <= max, in descending order.
    function getDivisors(n, max){
      let m = 1;
      const left = [];
      const right = [];
      while (m*m <= n && m <= max){
        if (n % m == 0){
          left.push(m);
          const l = n / m;
          if (l != m && l <= max)
            right.push(l);
        }
        m += 1;
      }
      return right.concat(left.reverse());
    }

    function f(A){
      // dp[d] = number of valid subsequences of length d found so far
      const dp = [1, ...new Array(A.length).fill(0)];
      let result = 0;
      for (let i=0; i<A.length; i++){
        // Descending order ensures dp[d-1] does not yet include A[i].
        for (const d of getDivisors(A[i], i+1)){
          result += dp[d-1];
          dp[d] += dp[d-1];
        }
      }
      return result;
    }

    var A = [2, 2, 1, 22, 14];
    console.log(JSON.stringify(A));
    console.log(f(A));
I believe that for the general case we can't find an algorithm with complexity less than O(n^2).
First, an intuitive explanation:
Let's indicate the elements of the array by a1, a2, a3, ..., a_n.
If the element a1 appears in a subarray, it must be element no. 1.
If the element a2 appears in a subarray, it can be element no. 1 or 2.
If the element a3 appears in a subarray, it can be element no. 1, 2 or 3.
...
If the element a_n appears in a subarray, it can be element no. 1, 2, 3, ..., n.
So to take all the possibilities into account, we have to perform the following tests:
Check if a1 is divisible by 1 (trivial, of course)
Check if a2 is divisible by 1 or 2
Check if a3 is divisible by 1, 2 or 3
...
Check if a_n is divisible by 1, 2, 3, ..., n
All in all we have to perform 1 + 2 + 3 + ... + n = n(n + 1) / 2 tests, which gives a complexity of O(n^2).
Note that the above is somewhat inaccurate, because not all the tests are strictly necessary. For example, if a_i is divisible by 2 and 3 then it must be divisible by 6. Nevertheless, I think this gives a good intuition.
Now for a more formal argument:
Define an array like so:
a1 = 1
a2 = 1× 2
a3 = 1× 2 × 3
...
a_n = 1 × 2 × 3 × ... × n
By the definition, every subarray is valid.
Now let (m, p) be such that m <= n and p <= n, and change a_m to a_m / p. We can now choose one of two paths:
If we restrict p to be prime, then each tuple (m, p) represents a mandatory test, because the corresponding change in the value of a_m changes the number of valid subarrays. But that requires prime factorization of each number between 1 and n. By the known methods, I don't think we can get here a complexity less than O(n^2).
If we omit the above restriction, then we clearly perform n(n + 1) / 2 tests, which gives a complexity of O(n^2).

Computing all infix products for a monoid / semigroup

Introduction: Infix products for a group
Suppose I have a group
G = (G, *)
and a list of elements
A = {0, 1, ..., n} ⊂ ℕ
x : A -> G
If our goal is to implement a function
f : A × A -> G
such that
f(i, j) = x(i) * x(i+1) * ... * x(j)
(and we don't care about what happens if i > j)
then we can do that by pre-computing a table of prefixes
m(-1) = 1
m(i) = m(i-1) * x(i)
(with 1 on the right-hand side denoting the unit of G) and then implementing f as
f(i, j) = m(i-1)⁻¹ * m(j)
This works because
m(i-1) = x(0) * x(1) * ... * x(i-1)
m(j) = x(0) * x(1) * ... * x(i-1) * x(i) * x(i+1) * ... * x(j)
and so
m(i-1)⁻¹ * m(j) = x(i) * x(i+1) * ... * x(j)
after sufficient reassociation.
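As a concrete illustration of the prefix trick, here is a minimal sketch that uses the multiplicative group of non-zero rationals as a stand-in for G. The names `build_prefixes` and `range_product` are made up for this example, and the list m is shifted by one so that m[t] plays the role of m(t-1).

```python
from fractions import Fraction

def build_prefixes(x):
    """m[t] = x[0] * ... * x[t-1], so m[0] is the unit of the group."""
    m = [Fraction(1)]
    for value in x:
        m.append(m[-1] * value)
    return m

def range_product(m, i, j):
    """x[i] * ... * x[j], computed as m(i-1)^-1 * m(j)."""
    return m[j + 1] / m[i]

x = [Fraction(2), Fraction(3), Fraction(5), Fraction(7)]
m = build_prefixes(x)
print(range_product(m, 1, 2))  # 15
```

Because the rationals commute, the order of the two factors does not matter here; for a non-commutative group the inverse has to be applied on the left, as in the formula above.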
My question
Can we rescue this idea, or do something not much worse, if G is only a monoid, not a group?
For my particular problem, can we do something similar if G = ([0, 1] ⊂ ℝ, *), i.e. we have real numbers from the unit interval, and we can't divide by 0?
Yes, if G is ([0, 1] ⊂ ℝ, *), then the idea can be rescued, making it possible to compute ranged products in O(log n) time (or more accurately, O(log z) where z is the number of a in A with x(a) = 0).
For each i, compute the product m(i) = x(0)*x(1)*...*x(i), ignoring any zeros (so these products will always be non-zero). Also, build a sorted array Z of indices for all the zero elements.
Then the product of elements from i to j is 0 if there's a zero in the range [i, j], and m(j) / m(i-1) otherwise.
To find if there's a zero in the range [i, j], one can binary search in Z for the smallest value >= i in Z, and compare it to j. This is where the extra O(log n) time cost appears.
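A minimal sketch of this zero-skipping scheme is shown below. It uses exact fractions instead of floating-point numbers to keep the example deterministic, and the function names are assumptions made for the illustration.

```python
from bisect import bisect_left
from fractions import Fraction

def preprocess(x):
    """Prefix products that skip zeros, plus the sorted indices of the zeros."""
    prefix = [Fraction(1)]   # prefix[t] = product of the non-zero x[0..t-1]
    zeros = []
    for index, value in enumerate(x):
        if value == 0:
            zeros.append(index)
            prefix.append(prefix[-1])
        else:
            prefix.append(prefix[-1] * Fraction(value))
    return prefix, zeros

def range_product(prefix, zeros, i, j):
    """Product of x[i..j], inclusive."""
    pos = bisect_left(zeros, i)            # first zero index >= i
    if pos < len(zeros) and zeros[pos] <= j:
        return Fraction(0)
    return prefix[j + 1] / prefix[i]

x = [Fraction(1, 2), 0, Fraction(1, 4), Fraction(1, 3)]
prefix, zeros = preprocess(x)
print(range_product(prefix, zeros, 2, 3))  # 1/12
```

With floating-point inputs the long prefix products can underflow towards 0, so working with logarithms of the non-zero values may be a safer variant in practice.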
General monoid solution
In the case where G is any monoid, it's possible to precompute at most n products so that an arbitrary range product can be computed in O(log(j-i)) time, although it's a bit fiddlier than the more specific case above.
Rather than precomputing prefix products, compute m(i, j) for all i, j where j-i+1 = 2^k for some k>=0, and 2^k divides i. In fact, for k=0 we don't need to compute anything, since the value of m(i, i) is simply x(i).
So we need to compute n/2 + n/4 + n/8 + ... total products, which is at most n-1 things.
One can construct an arbitrary interval [i, j] from O(log_2(j-i+1)) of these building blocks (and elements of the original array): pick the largest building block contained in the interval and append blocks of decreasing size on either side of it until you get to [i, j]. Then multiply, in left-to-right order, the precomputed products m(x, y) for each of the building blocks.
For example, suppose your array is of size 10. For example's sake, I'll assume the monoid is addition of natural numbers.
    i:   0  1  2  3  4  5  6  7  8  9
    x:   1  3  2  4  2  3  0  8  2  1
    2:   ----  ----  ----  ----  ----
          4     6     5     8     3
    4:   ----------  ----------
             10          13
    8:   ----------------------
                   23
Here, the 2, 4, and 8 rows show sums of aligned intervals of length 2, 4, 8 (ignoring bits left over if the array isn't a power of 2 in length).
Now, suppose we want to calculate x(1) + x(2) + x(3) + ... + x(8).
That's x(1) + m(2, 3) + m(4, 7) + x(8) = 3 + 6 + 13 + 2 = 24.
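The block scheme can be sketched in Python as follows. This is one possible way to organise the aligned blocks (it walks the levels bottom-up, in the style of an iterative segment tree, rather than starting from the largest block), and the names `build_levels` and `range_query` are made up for the example. Left and right pieces are accumulated separately so the order of operands is preserved for non-commutative monoids.

```python
import operator

def build_levels(x, op):
    """levels[k][b] = product of the aligned block x[b*2^k .. (b+1)*2^k - 1]."""
    levels = [list(x)]
    while len(levels[-1]) >= 2:
        prev = levels[-1]
        levels.append([op(prev[2 * b], prev[2 * b + 1])
                       for b in range(len(prev) // 2)])
    return levels

def range_query(levels, op, identity, i, j):
    """Product of x[i..j] (inclusive) using O(log(j - i + 1)) blocks."""
    left, right = identity, identity
    lo, hi, k = i, j + 1, 0          # half-open block range at level k
    while lo < hi:
        if lo & 1:                   # block lo is not the left half of a bigger block
            left = op(left, levels[k][lo])
            lo += 1
        if hi & 1:                   # block hi - 1 is not the right half of a bigger block
            hi -= 1
            right = op(levels[k][hi], right)
        lo >>= 1
        hi >>= 1
        k += 1
    return op(left, right)

x = [1, 3, 2, 4, 2, 3, 0, 8, 2, 1]
levels = build_levels(x, operator.add)
print(range_query(levels, operator.add, 0, 1, 8))  # 24
```

For the query above the sketch combines exactly x(1), m(2, 3), m(4, 7) and x(8), matching the hand computation.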

Sum of products of elements of all subarrays of length k

An array of length n is given. Find the sum of products of elements of the sub-array.
Explanation
Array A = [2, 3, 4] of length 3.
Sub-array of length 2 = [2,3], [3,4], [2,4]
Product of elements in [2, 3] = 6
Product of elements in [3, 4] = 12
Product of elements in [2, 4] = 8
Sum for subarray of length 2 = 6+12+8 = 26
Similarly, for length 3, Sum = 24
Since the products can become very large for longer sub-arrays, compute the sums modulo 1000000007.
What is an efficient way for finding these sums for subarrays of all possible lengths, i.e., 1, 2, 3, ......, n where n is the length of the array.
There is rather simple way:
Construct product of terms (1 + A[i] * x):
P = (1 + A[0] * x) * (1 + A[1] * x) * (1 + A[2] * x)...*(1 + A[n-1] * x)
If we open the brackets, then we'll get polynomial
P = 1 + B[1] * x + B[2] * x^2 + ... + B[n] * x^n
Kth coefficient, B[k], is equal to the sum of products of sets with length K - for example, B[n] = A[0]*A[1]*A[2]*..A[n-1], B[2] = A[0]*A[1] + A[0]*A[2] + ... + A[n-2]*A[n-1] and so on.
So to find sum of products of all possible sets, we have to find value of polynomial P for x = 1, then subtract 1 to remove leading 0th term. If we don't want to take into consideration single-element sets, then subtract B1 = sum of A[i].
Example:
(1+2)(1+3)(1+4) = 60
60 - 1 = 59
59 - (2 + 3 + 4) = 50 = 24 + 26 - as your example shows
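To get the individual sums B[1], ..., B[n] rather than only their total, the product can be expanded one factor at a time. The sketch below does this in O(n^2) time; the function name `subset_product_sums` and the constant `MOD` are assumptions made for the illustration.

```python
MOD = 1_000_000_007

def subset_product_sums(a):
    """Coefficients of prod(1 + a[i] * x); entry k is B[k] modulo MOD."""
    coeffs = [1]
    for value in a:
        nxt = coeffs + [0]           # multiplying by 1 keeps every coefficient
        for k, c in enumerate(coeffs):
            # multiplying by value * x shifts coefficient k to k + 1
            nxt[k + 1] = (nxt[k + 1] + c * value) % MOD
        coeffs = nxt
    return coeffs

print(subset_product_sums([2, 3, 4]))  # [1, 9, 26, 24]
```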
We first create a recursive relation. Let f(n, k) be the sum of the products of all sub-arrays of length k taken from the first n elements of the array a. The base cases are simple:
f(0, k) = 0 for all k > 0
f(n, 0) = 1 for all n
The second rule might seem a little counter-intuitive, but 1 is the identity element of multiplication, so the empty product is 1.
Now we find a recursive relation for f(n+1, k). We want the sum of the products of all subarrays of size k. There are two types of subarrays here: the ones including a[n+1] and the ones not including a[n+1]. The sum of the ones not including a[n+1] is exactly f(n, k). The ones including a[n+1] are exactly all subarrays of length k-1 with a[n+1] added, so their summed product is a[n+1] * f(n, k-1).
This completes our recurrence relation:
f(n, k) = 1 if k = 0
        = 0 if n = 0 and k > 0
        = f(n-1, k) + a[n] * f(n-1, k-1) otherwise
You can use a neat trick to use very limited memory for your dynamic programming, because function f only depends on two earlier values:
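    long[] compute(int[] a) {
        final long MOD = 1000000007L;
        int N = a.length;
        // f[k] holds f(n, k) for the current n; lengths run from 0 to N.
        long[] f = new long[N + 1];
        f[0] = 1;
        for (int n = 1; n <= N; n++) {
            // Go downwards so that f[k-1] still holds f(n-1, k-1).
            for (int k = n; k >= 1; k--) {
                // a is 0-indexed, so the n-th element is a[n-1].
                f[k] = (f[k] + a[n - 1] % MOD * f[k - 1]) % MOD;
            }
        }
        return f;
    }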

Count number of subsequences with given k modulo sum

Given an array a of n integers, count how many subsequences (non-consecutive as well) have sum % k = 0:
1 <= k < 100
1 <= n <= 10^6
1 <= a[i] <= 1000
An O(n^2) solution is easily possible, however a faster way O(n log n) or O(n) is needed.
This is the subset sum problem.
A simple solution is this:
    s = 0
    dp[x] = how many subsequences we can build with sum x
    dp[0] = 1, 0 elsewhere
    for i = 1 to n:
        s += a[i]
        for j = s down to a[i]:
            dp[j] = dp[j] + dp[j - a[i]]
Then you can simply return the sum of all dp[x] such that x % k == 0. This has a high complexity though: about O(n*S), where S is the sum of all of your elements. The dp array must also have size S, which you probably can't even afford to declare for your constraints.
A better solution is to not iterate over sums larger than or equal to k in the first place. To do this, we will use 2 dp arrays:
    dp1, dp2 = arrays of size k
    dp1[0] = dp2[0] = 1, 0 elsewhere
    for i = 1 to n:
        mod_elem = a[i] % k
        for j = 0 to k - 1:
            dp2[j] = dp2[j] + dp1[(j - mod_elem + k) % k]
        copy dp2 into dp1
    return dp1[0]
This runs in O(n*k) time and O(k) memory. Note that dp1[0] also counts the empty subsequence.
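The residue-based dynamic programme can be written compactly in Python. The sketch below builds a fresh list per element instead of copying between two arrays, and, like the pseudocode, it counts the empty subsequence (sum 0) as well; the function name is made up for the example.

```python
def count_subsequences_divisible(a, k):
    """Number of subsequences of a (including the empty one) with sum % k == 0."""
    dp = [0] * k      # dp[r] = subsequences seen so far whose sum is r modulo k
    dp[0] = 1         # the empty subsequence
    for value in a:
        r = value % k
        # either skip the element (dp[j]) or append it to a sum of j - r
        dp = [dp[j] + dp[(j - r) % k] for j in range(k)]
    return dp[0]

print(count_subsequences_divisible([1, 2, 3], 3))  # 4
```

Subtract 1 from the result if the empty subsequence should not be counted. For very large n the counts grow quickly, so a modulus may be needed in languages without arbitrary-precision integers.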
There's an O(n + k^2 lg n)-time algorithm. Compute a histogram c(0), c(1), ..., c(k-1) of the input array mod k (i.e., there are c(r) elements that are r mod k). Then compute
    product_{r=0}^{k-1} (1 + x^r)^c(r)  mod (1 - x^k)
as follows, where the constant term of the reduced polynomial is the answer.
Rather than evaluate each factor with a fast exponentiation method and then multiply, we turn things inside out. If all c(r) are zero, then the answer is 1. Otherwise, recursively evaluate
    P = product_{r=0}^{k-1} (1 + x^r)^(floor(c(r)/2))  mod (1 - x^k)
and then compute
    Q = product_{r=0}^{k-1} (1 + x^r)^(c(r) - 2 floor(c(r)/2))  mod (1 - x^k),
in time O(k^2) for the latter computation by exploiting the sparsity of the factors. The result is P^2 Q mod (1 - x^k), computed in time O(k^2) via naive convolution.
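A minimal sketch of this halving scheme is given below, with polynomials stored as lists of k coefficients and exponents taken modulo k. The function names are assumptions, and the counts are exact Python integers, so the arithmetic cost of very large numbers is ignored here. As with the polynomial itself, the result includes the empty subsequence.

```python
def cyclic_multiply(p, q):
    """Product of two polynomials modulo (1 - x^k), by naive O(k^2) convolution."""
    k = len(p)
    result = [0] * k
    for i, pi in enumerate(p):
        if pi:
            for j, qj in enumerate(q):
                result[(i + j) % k] += pi * qj
    return result

def residue_product(c):
    """prod over r of (1 + x^r)^c[r], reduced modulo (1 - x^k), with k = len(c)."""
    k = len(c)
    if not any(c):
        return [1] + [0] * (k - 1)
    half = residue_product([count // 2 for count in c])   # P
    result = cyclic_multiply(half, half)                   # P^2
    for r, count in enumerate(c):
        if count % 2:                                      # sparse factor (1 + x^r) of Q
            result = [result[j] + result[(j - r) % k] for j in range(k)]
    return result

def count_by_histogram(a, k):
    c = [0] * k
    for value in a:
        c[value % k] += 1
    return residue_product(c)[0]

print(count_by_histogram([1, 2, 3], 3))  # 4
```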
Traverse a and count a[i] mod k; there ought to be k such counts.
Recurse and memoize over the distinct partitions of k, 2*k, 3*k...etc. with parts less than or equal to k, adding the products of the appropriate counts.
For example, if k were 10, some of the partitions would be 1+2+7 and 1+2+3+4; but while memoizing, we would only need to calculate once how many pairs mod k in the array produce (1 + 2).
For example, k = 5, a = {1,4,2,3,5,6}:
counts of a[i] mod k: {1,2,1,1,1}
products of distinct partitions of k:
5 => 1
4,1 => 2
3,2 => 1
products of distinct partitions of 2 * k with parts <= k:
5,4,1 => 2
5,3,2 => 1
4,1,3,2 => 2
products of distinct partitions of 3 * k with parts <= k:
5,4,1,3,2 => 2
answer = 11
{1,4} {4,6} {2,3} {5}
{1,4,2,3} {1,4,5} {4,6,2,3} {4,6,5} {2,3,5}
{1,4,2,3,5} {4,6,2,3,5}

number of integral solutions

Question from a face-to-face interview at MS:
Determine the number of integral solutions of
x1 + x2 + x3 + x4 + x5 = N
where 0 <= xi <= N
So basically we need to find the number of partitions of N in at most 5 parts
Supposed to be solved with paper and pencil. Did not make much headway though, does anybody have a solution for this?
Assume numbers are strictly > 0.
Consider an integer segment [0, N]. The problem is to split it into 5 segments of positive length. Imagine we do that by putting 4 splitter dots between adjacent numbers, that is, at 4 of the N-1 interior integer points. How many ways are there to do that? C(N-1, 4).
Now, some numbers can be 0. Let k be the number of non-zero numbers. We can choose them in C(5, k) ways, and each choice has C(N-1, k-1) splittings. Summing over all k from 1 to 5 (for N >= 1), we get
Sum[ C(5,k) * C(N-1,k-1); k = 1 to 5]
@Grigor Gevorgyan indeed gives the right way to figure out the solution.
Think about the case 1 <= xi first. That is exactly dividing N points into 5 segments, which is equivalent to inserting 4 "splitter dots" into the N-1 possible places (between adjacent points). So the answer is C(N-1,4).
Then what about the case 0 <= xi?
Suppose you have a solution for X+5 points with 1 <= xi; the number of such solutions is C(N-1,4) = C(X+5-1,4) = C(X+4,4). Remove one point from each segment and you have a solution for X points with 0 <= xi. This correspondence is one-to-one, which means the answer for X points with 0 <= xi is exactly C(X+4,4).
Topcoder tutorials
Look for the section "Combination with repetition": the specific case is explained under that section with a diagrammatic illustration. (A picture is worth quite a few words!)
You have the answer here.
It is classical problem -
Number of options to put N balls in M boxes = c(M+N-1,N).
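The closed form can be checked against a brute-force count for small N. The sketch below assumes Python 3.8+ for `math.comb`, and the function names are made up for the example.

```python
from itertools import product
from math import comb

def count_formula(n, parts=5):
    """Stars and bars: n balls in `parts` boxes gives C(parts + n - 1, n)."""
    return comb(parts + n - 1, n)

def count_bruteforce(n, parts=5):
    """Enumerate every tuple with 0 <= xi <= n and keep those summing to n."""
    return sum(1 for xs in product(range(n + 1), repeat=parts) if sum(xs) == n)

for n in range(6):
    assert count_formula(n) == count_bruteforce(n)
print(count_formula(3))  # 35
```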
The combinations solution is more appropriate if a pen and paper solution was asked. It's also the classic solution. Here is a dynamic programming solution.
Let dp[i, N] = number of solutions of x1 + x2 + ... +xi = N.
Let's take x1 + x2 = N:
We have the solutions:
0 + N = N
1 + N - 1 = N
...
N + 0 = N
So dp[2, N] = N + 1 solutions.
Let's take x1 + x2 + x3 = N:
We have the solutions:
0 + (0 + N) = N
0 + (1 + N - 1) = N
...
0 + (N + 0) = N
...
Notice that there are N + 1 solutions thus far. Moving on:
1 + (0 + N - 1) = N
1 + (1 + N - 2) = N
...
1 + (N - 1 + 0) = N
...
Notice that there are another N solutions. Moving on:
...
N - 1 + (0 + 1) = N
N - 1 + (1 + 0) = N
=> +2 solutions
N + (0 + 0) = N
=> +1 solution
So we have dp[3, N] = dp[2, N] + dp[2, N - 1] + dp[2, N - 2] + ... + dp[2, 0].
Also notice that dp[k, 0] = 1
If every entry is computed by summing up to N + 1 entries of the previous row, a single row costs O(N^2). To keep the cost of each row at O(N), store s[i] = sum of the first i elements of the previous row. The complexity of computing dp[k, N] is then O(k*N), which is just as much as would be needed for the combinatorics solution. The memory used can also be reduced to O(N).
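A minimal sketch of this dynamic programme with running prefix sums is shown below. Only one row is kept in memory at a time, and the function name `count_solutions_dp` is an assumption made for the example.

```python
from math import comb

def count_solutions_dp(parts, n):
    """Number of solutions of x1 + ... + x_parts = n with every xi >= 0."""
    row = [1] + [0] * n        # zero variables: only the sum 0 is reachable
    for _ in range(parts):
        running = 0
        next_row = []
        for total in range(n + 1):
            running += row[total]      # row[0] + ... + row[total]
            next_row.append(running)
        row = next_row
    return row[n]

print(count_solutions_dp(5, 3))  # 35
```

Each new row is simply the prefix-sum array of the previous one, which is the recurrence dp[i, N] = dp[i-1, N] + dp[i-1, N-1] + ... + dp[i-1, 0] evaluated in O(N) per row.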
