Counting number of quadruples of integers - algorithm

I saw this question today, where we need to count the number of quadruples of integers (X1, X2, X3, X4) such that Li ≤ Xi ≤ Ri for i = 1, 2, 3, 4 and X1 ≠ X2, X2 ≠ X3, X3 ≠ X4, X4 ≠ X1.
input:
Li Ri
1 4
1 3
1 2
4 4
output:
8
1 2 1 4
1 3 1 4
1 3 2 4
2 1 2 4
2 3 1 4
2 3 2 4
3 1 2 4
3 2 1 4
My initial thought was to use the Principle of Inclusion-Exclusion. I was able to find the number of unrestricted quadruples, but I can't figure out how to handle the remaining conditions to reach the final answer. I also heard that this question can be solved using DFS.
How can we solve this question with Inclusion-Exclusion or DFS?

Inclusion/Exclusion will give you the number of quadruples, but won't give you the quadruples themselves.
Let Ai be the set of quadruples satisfying Lj<=Xj<=Rj for all j, with Xi=X(i+1) (where the indices are cyclic, so X5 means X1). In the example you provided,
A1 = { (1114), (1124), (2214), (2224), (3314), (3324) }
A2 = { (1114), (2114), (3114), (4114), (1224), (2224), (3224), (4224) }
A3 = { } (empty set)
A4 = { (4114), (4214), (4314), (4124), (4224), (4324) }
We also need the intersections of pairs of sets:
A1 cap A2 = { (1114), (2224) } (note first three numbers identical)
A1 cap A3 = { }
A1 cap A4 = { } (can't have X4=X1=X2)
A2 cap A3 = { }
A2 cap A4 = { (4114), (4224) }
A3 cap A4 = { }
Intersections of triples of sets:
A1 cap A2 cap A3 = { }
A1 cap A2 cap A4 = { }
A1 cap A3 cap A4 = { }
A2 cap A3 cap A4 = { }
And the intersection of all the sets:
A1 cap A2 cap A3 cap A4 = { }
Inclusion/exclusion in its complementary form tells us that
|intersection of complements of Ai| = |unrestricted quadruples|
- sum of |Ai| + sum of |Ai cap Aj| - sum of |Ai cap Aj cap Ak|
+ sum of |Ai cap Aj cap Ak cap Al|
where none of the indices i,j,k,l are equal. In your example,
|intersection of complements of Ai| = 4x3x2x1 - (6+8+0+6) + (2+0+0+0+2+0) - (0+0+0+0) + 0
= 24 - 20 + 4 - 0 + 0 = 8
In order to find the |Ai| and their intersections, you have to intersect the intervals [Li,Ri] of the variables that are forced to be equal, and multiply the sizes (number of integers) of those intersections by the sizes of the unrestricted intervals. For example,
|A1| = |[1234] cap [123]| x |[12]| x |[4]| = 3 x 2 x 1 = 6
|A2 cap A4| = |[123] cap [12]| x |[4] cap [1234]| = |[12]| x |[4]| = 2 x 1 = 2
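The interval bookkeeping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `count_quadruples` and the representation of the input as a list of `(L, R)` pairs are made up for the example. For every subset of the four equality constraints it merges the variables that are forced to be equal, intersects their intervals, and adds the product of the sizes with sign (-1)^(number of constraints).

```python
from itertools import combinations

def count_quadruples(bounds):
    """Count tuples with L_i <= X_i <= R_i and cyclically adjacent entries distinct.

    bounds is a list of (L, R) pairs (hypothetical input format).
    """
    n = len(bounds)
    # Constraint i is the "bad" event X_i == X_(i+1), indices cyclic.
    edges = [(i, (i + 1) % n) for i in range(n)]
    total = 0
    for r in range(n + 1):
        for chosen in combinations(edges, r):
            # group[i] is the label of the block of variables equal to X_i.
            group = list(range(n))
            for a, b in chosen:
                old, new = group[a], group[b]
                group = [new if g == old else g for g in group]
            # Intersect the intervals inside each block.
            lo, hi = {}, {}
            for i, (left, right) in enumerate(bounds):
                g = group[i]
                lo[g] = max(lo.get(g, left), left)
                hi[g] = min(hi.get(g, right), right)
            term = 1
            for g in lo:
                term *= max(0, hi[g] - lo[g] + 1)
            total += (-1) ** r * term
    return total

print(count_quadruples([(1, 4), (1, 3), (1, 2), (4, 4)]))
```

For the example in the question this reproduces 24 - 20 + 4 - 0 + 0 = 8.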
I don't see what depth first search has to do with it in this approach.
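One plausible reading of the DFS hint (an assumption, since the question does not say how DFS was meant to be used) is a backtracking enumeration: it lists the quadruples themselves rather than just counting them, at a cost proportional to the number of partial tuples visited. The name `enumerate_quadruples` is hypothetical.

```python
def enumerate_quadruples(bounds):
    """List all tuples with L_i <= X_i <= R_i whose cyclic neighbours differ."""
    n = len(bounds)
    found = []
    current = []

    def dfs(i):
        if i == n:
            found.append(tuple(current))
            return
        lo, hi = bounds[i]
        for v in range(lo, hi + 1):
            if i > 0 and v == current[i - 1]:
                continue  # would equal the previous entry
            if i > 0 and i == n - 1 and v == current[0]:
                continue  # last entry would equal the first (cyclic condition)
            current.append(v)
            dfs(i + 1)
            current.pop()

    dfs(0)
    return found

for quad in enumerate_quadruples([(1, 4), (1, 3), (1, 2), (4, 4)]):
    print(*quad)
```

On the sample input this prints the eight quadruples listed in the question, in the same order.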

It depends if the sets are disjoint or share elements. For n = 4, meaning quadruples, as you asked about, I think I got it down to between 1 and 4 iterations if we commit the ends to four types describing if x_1 is a member of X2 and x_4 a member of X3.
Example with three iterations:
input = {1,2,3}{1,2}{1,2,3}{3,4}
2 * (1)(12)(123)(3) = (1)(2)(1)(3) = 2 * 1 // x_1 ∈ X2, x_4 ∈ X3
2 * (1)(12)(123)(4) = (1)(2)(13)(4) = 2 * 2 // x_1 ∈ X2, x_4 ∉ X3
1 * (3)(12)(123)(4) = (3)(12)(12,3)(4) = 1 * (2 + 2) // x_1 ∉ X2, x_4 ∉ X3
Total = 10
Example with one iteration:
input = {1,2,3,4}{1,2,3,4}{1,2,3,4}{1,2,3,4} // x_1 ∈ X2, x_4 ∈ X3
12 * (1)(1234)(1234)(2) = (1)(2,34)(134)(2) = 12 * (3 + 4)
Total = 84

Related

Discussion about how to retrieve an i-th element in the j-th level of a binary tree algorithm

I am solving some problems from a site called codefights, and the last one I solved was about a binary tree. The problem statement was:
Consider a special family of Engineers and Doctors. This family has
the following rules:
Everybody has two children. The first child of an Engineer is an
Engineer and the second child is a Doctor. The first child of a Doctor
is a Doctor and the second child is an Engineer. All generations of
Doctors and Engineers start with an Engineer.
We can represent the situation using this diagram:
               E
           /       \
         E           D
       /   \       /   \
      E     D     D     E
     / \   / \   / \   / \
    E   D D   E D   E E   D
Given the level and position of a person in the ancestor tree above,
find the profession of the person. Note: in this tree first child is
considered as left child, second - as right.
As there are space and time restrictions, the solution cannot be based on actually constructing the tree down to the required level and checking which element is at the requested position. So far so good. My proposed solution, written in Python, was:
def findProfession(level, pos):
    size = 2**(level-1)
    shift = False
    while size > 2:
        if pos <= size // 2:
            size //= 2   # integer division, so size and pos stay ints
        else:
            size //= 2
            pos -= size
            shift = not shift
    if pos == 1 and shift == False:
        return 'Engineer'
    if pos == 1 and shift == True:
        return 'Doctor'
    if pos == 2 and shift == False:
        return 'Doctor'
    if pos == 2 and shift == True:
        return 'Engineer'
As it solved the problem, I got access to the solutions of other users, and I was astonished by this one:
def findProfession(level, pos):
    return ['Engineer', 'Doctor'][bin(pos-1).count("1")%2]
I did not understand the logic behind it, which is what brought me to this question. Could someone explain this algorithm to me?
Let's number the nodes of the tree in the following way:
1) the root has number 1
2) the first child of node x has number 2*x
3) the second child of node x has number 2*x+1
Now, notice that each time you go to the first child, the profession stays the same, and you add a 0 to the binary representation of the node.
And each time you go to the second child, the profession flips and you add a 1 to the binary representation.
Example: Let's find the profession of the 4th node in the 4th level (last level in the diagram you have in the question). First we start at the root with number 1, then we go to the first child with number 2 (10 binary). After that we go to the second child of 2 which is 5 (101 binary). Finally, we go to the second child of 5 which is 11 (1011 binary).
Notice that we started with only one bit equal to 1, then every 1 bit we added to the binary representation flipped the profession. So the number of times we flip a profession is equal to the (number of bits equal to 1) - 1. The parity of this amount decides the profession.
This leads us to the following solution:
X = number of bits equal to 1 in [ 2^(level-1) + pos - 1 ]
Y = (X-1) mod 2
if Y is 0 then the answer is "Engineer"
Otherwise the answer is "Doctor"
Since 2^(level-1) is a power of 2, it has exactly one bit equal to 1, and since pos-1 < 2^(level-1) that bit does not overlap with the bits of pos-1. Therefore you can write:
X = number of bits equal to 1 in [ pos-1 ]
Y = X mod 2
Which is equal to the solution you mentioned in the question.
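The argument can be checked mechanically. The sketch below is illustrative only (the helper names `profession_by_node_number` and `build_level` are made up here): it computes the profession from the node number as described above and compares it with a level built explicitly from the family rules.

```python
def profession_by_node_number(level, pos):
    """Profession from the bit count of the heap-style node number."""
    node = 2 ** (level - 1) + pos - 1      # root is 1, children are 2x and 2x+1
    flips = bin(node).count("1") - 1       # every 1 bit after the leading one is a flip
    return "Engineer" if flips % 2 == 0 else "Doctor"

def build_level(level):
    """Construct one level of the tree directly from the rules (exponential size)."""
    row = ["Engineer"]
    for _ in range(level - 1):
        next_row = []
        for person in row:
            other = "Doctor" if person == "Engineer" else "Engineer"
            next_row.extend([person, other])   # first child same, second child flipped
        row = next_row
    return row

# Compare both approaches on the first few levels.
for level in range(1, 9):
    row = build_level(level)
    for pos in range(1, len(row) + 1):
        assert profession_by_node_number(level, pos) == row[pos - 1]
print(profession_by_node_number(4, 4))
```

The last line looks up the 4th node of the 4th level, the example traced above (node 11, binary 1011).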
This type of sequence is known as the Thue-Morse sequence. Using the same tree, here is a demonstration of why it gives the correct answer:
p is the 0-indexed position
b is the binary representation of p
c is the number of 1's in b
                         p0
                         E
                         b0
                         c0
               /                  \
             p0                    p1
             E                     D
             b0                    b1
             c0                    c1
          /      \              /      \
        p0        p1          p2        p3
        E         D           D         E
        b0        b1          b10       b11
        c0        c1          c1        c2
       /  \      /  \        /  \      /  \
     p0   p1   p2   p3     p4   p5   p6   p7
     E    D    D    E      D    E    E    D
     b0   b1   b10  b11    b100 b101 b110 b111
     c0   c1   c1   c2     c1   c2   c2   c3
c is always even for Engineer and odd for Doctor. Therefore:
index = bin(pos-1).count('1') % 2
return ['Engineer', 'Doctor'][index]

Generate number with equal probability

You are given a function, say bin(), which generates 0 or 1 with equal probability. You are also given a range of contiguous integers, say [a,b] (a and b inclusive).
Write a function, say rand(), that uses bin() to generate numbers within the range [a,b] with equal probability.
The insight you need is that your bin() function returns a single binary digit, or "bit". Invoking it once gives you 0 or 1. If you invoke it twice you get two bits b0 and b1 which can be combined as b1 * 2 + b0, giving you one of 0, 1, 2 or 3 with equal probability. If you invoke it thrice you get three bits b0, b1 and b2. Put them together and you get b2 * 2^2 + b1 * 2 + b0, giving you a member of {0, 1, 2, 3, 4, 5, 6, 7} with equal probability. And so on, as many as you want.
Your range [a, b] has m = b-a+1 values. You just need enough bits to generate a number between 0 and 2^n-1, where n is the smallest value that makes 2^n greater than or equal to m. Then just shift that set to start at a and you're good.
So let's say you are given the range [20, 30]. There are 11 numbers there from 20 to 30 inclusive. 11 is greater than 8 (2^3), but less than 16 (2^4), so you'll need 4 bits. Use bin() to generate four bits b0, b1, b2, and b3. Put them together as x = b3 * 2^3 + b2 * 2^2 + b1 * 2 + b0. You'll get a result, x, between 0 and 15. If x > 10 then discard it and generate another four bits. When x <= 10, your answer is x + 20.
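As a rough sketch of this rejection approach in Python: `fair_bit` is a stand-in for the question's bin() (renamed because `bin` is a Python builtin), and `rand_range` and its `bit` parameter are names invented for the example.

```python
import random

def fair_bit():
    """Stand-in for the question's bin(): returns 0 or 1 with equal probability."""
    return random.getrandbits(1)

def rand_range(a, b, bit=fair_bit):
    """Uniform integer in [a, b] built only from calls to bit()."""
    m = b - a + 1                    # number of values in the range
    n = (m - 1).bit_length()         # smallest n with 2**n >= m
    while True:
        x = 0
        for _ in range(n):
            x = (x << 1) | bit()     # append one random bit
        if x < m:                    # keep 0..m-1, reject the rest and retry
            return a + x

print(rand_range(20, 30))
```

Every accepted x is equally likely, and since more than half of the 2^n candidates are accepted whenever m > 1, the expected number of attempts is below two.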
Help, but no code:
You can easily shift the range [0, 2**n - 1] to [a, a + 2**n - 1]
You can easily produce an equal probability over [0, 2**n - 1]
If the size of your range isn't a power of 2, just generate a number up to 2**n - 1 and re-roll if it exceeds the largest offset you need
Subtract the numbers to work out your range:
Decimal: 20 - 10 = 10
Binary : 10100 - 01010 = 1010
Work out how many bits you need to represent this: 4.
For each of these, generate a random 1 or 0:
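num_bits = 4
rand[num_bits]
for (x = 0; x < num_bits; ++x)
    rand[x] = bin()
Let's say rand[] = [0,1,0,0] after this. If the resulting number is larger than the size of the range (here, larger than 10), discard it and generate the bits again; otherwise add it back to the start of your range.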
Binary: 1010 + 0100 = 1110
Decimal: 10 + 4 = 14
You can always change the range [a,b] to [0,b-a]; denote X = b - a. Then you can define a function rand(X) as follows, and return a + rand(X):
function int rand(X) {
    int i = 1;
    // determine how many bits you need (see above answer for why):
    // the smallest i with 2^i > X
    while (2^i <= X) {
        i++;
    }
    // generate the random numbers
    Boolean cont = true;
    int num = 0;
    while (cont == true) {
        num = 0;  // start afresh on every attempt
        for (j = 0 to i - 1) {
            // this generates num in range [0,2^i -1] with equal prob
            // but we need to discard if num is larger than X
            num = num + bin() * 2^j;
        }
        if (num <= X) { cont = false; }
    }
    return num;
}

A feature ranking algorithm

If I have the following partitions or subsets with the corresponding scores:
{X1,X2} with score C1
{X2,X3} with score C2
{X3,X4} with score C3
{X4,X1} with score C4
I want to write an algorithm that ranks the Xs based on the scores of the subsets they appear in.
One way, for example, would be to do the following:
X1 = (C1 + C4)/2
X2 = (C1 + C2)/2
X3 = (C2 + C3)/2
X4 = (C3 + C4)/2
and then sort the results.
Is there a more efficient or better way to do the ranking?
If you assume that the score of a set is the sum of the scores of its objects, you can write your equations in matrix form as:
C = M * X
where C is a vector of length 4 with components C1, C2, C3, C4, M is the matrix (in your case, as I understand this may vary)
1 1 0 0
0 1 1 0
0 0 1 1
1 0 0 1
and X is the unknown. Provided M is invertible, you can then use Gaussian elimination to determine X and then get the ranking as you suggested.
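Before relying on elimination it is worth checking that M actually determines X. The sketch below is illustrative (the helper `matrix_rank` is written for this example, not taken from a library) and row-reduces M with exact fractions. For the particular cyclic M above, row 1 - row 2 + row 3 - row 4 = 0, so the rank is 3 rather than 4 and X is not uniquely determined; in that case something like the averaging from the question is still needed.

```python
from fractions import Fraction

def matrix_rank(matrix):
    """Rank of a small matrix via Gauss-Jordan elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

M = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1],
     [1, 0, 0, 1]]
print(matrix_rank(M), "of", len(M))
```

With three features scored in the pairs {X1,X2}, {X2,X3}, {X3,X1} the corresponding matrix has full rank, so elimination does give a unique X there.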

Segmented Least Squares

Give an algorithm that takes a sequence of points in the plane (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) and an integer k as input and returns the best piecewise linear function f consisting of at most k pieces that minimizes the sum squared error. You may assume that you have access to an algorithm that computes the sum squared error for one segment through a set of n points in Θ(n) time. The solution should use O(n^2 k) time and O(nk) space.
Can anyone help me with this problem? Thank you so much!
(This is too late for your homework, but hope it helps anyway.)
First is dynamic programming in python / numpy for k = 4 only,
to help you understand how dynamic programming works;
once you understand that, writing a loop for any k should be easy.
Also, Cost[] is a 2d matrix, space O(n^2);
see the notes at the end for getting down to space O(n k)
#!/usr/bin/env python
""" split4.py: min-cost split into 4 pieces, dynamic programming k=4 """
import numpy as np

__version__ = "2014-03-09 mar denis"

#...............................................................................
def split4( Cost, verbose=1 ):
    """ split4.py: min-cost split into 4 pieces, dynamic programming k=4
        min Cost[0:a] + Cost[a:b] + Cost[b:c] + Cost[c:n]

        Cost[a,b] = error in least-squares line fit to xy[a] .. xy[b] *including b*
        or error in lsq horizontal lines, sum (y_j - av y) ^2 for each piece --

        o--
          o-
             o---
                 o----
        | |  |   |
        0 2  5   9

        (Why 4 ? to walk through step by step, then put in a loop)
    """
    # speedup: maxlen 2 n/k or so
    Cost = np.asanyarray(Cost)
    n = Cost.shape[1]
    # C2 C3 ... costs, J2 J3 ... indices of best splits
    J2 = - np.ones(n, dtype=int)  # -1, NaN mark undefined / bug
    C2 = np.ones(n) * np.nan
    J3 = - np.ones(n, dtype=int)
    C3 = np.ones(n) * np.nan

    # best 2-splits of the left 2 3 4 ...
    for nleft in range( 1, n ):
        J2[nleft] = j = np.argmin([ Cost[0,j-1] + Cost[j,nleft] for j in range( 1, nleft+1 )]) + 1
        C2[nleft] = Cost[0,j-1] + Cost[j,nleft]
            # an idiom for argmin j, min value c together

    # best 3-splits of the left 3 4 5 ...
    for nleft in range( 2, n ):
        J3[nleft] = j = np.argmin([ C2[j-1] + Cost[j,nleft] for j in range( 2, nleft+1 )]) + 2
        C3[nleft] = C2[j-1] + Cost[j,nleft]

    # best 4-split of all n --
    j4 = np.argmin([ C3[j-1] + Cost[j,n-1] for j in range( 3, n )]) + 3
    c4 = C3[j4-1] + Cost[j4,n-1]
    # walk back: c4 uses C3[j4-1], which in turn uses C2[j3-1]
    j3 = J3[j4-1]
    j2 = J2[j3-1]
    jsplit = np.array([ 0, j2, j3, j4, n ])
    if verbose:
        print("split4: len %s pos %s cost %.3g" % (np.diff(jsplit), jsplit, c4))
        print("split4: J2 %s C2 %s" % (J2, C2))
        print("split4: J3 %s C3 %s" % (J3, C3))
    return jsplit

#...............................................................................
if __name__ == "__main__":
    import random
    import sys
    import spread

    n = 10
    ncycle = 2
    plot = 0
    seed = 0

    # run this.py a=1 b=None c=[3] 'd = expr' ... in sh or ipython
    for arg in sys.argv[1:]:
        exec( arg )
    np.set_printoptions( 1, threshold=100, edgeitems=10, linewidth=100, suppress=True )
    np.random.seed(seed)
    random.seed(seed)

    print("\n" + 80 * "-")
    title = "Dynamic programming least-square horizontal lines %s n %d seed %d" % (
        __file__, n, seed)
    print(title)
    x = np.arange( n + 0. )
    y = np.sin( 2*np.pi * x * ncycle / n )
        # synthetic time series ?
    print("y: %s av %.3g variance %.3g" % (y, y.mean(), np.var(y)))

    print("Cost[j,k] = sum (y - av y)^2 --")  # len * var y[j:k+1]
    Cost = spread.spreads_allij( y )
    print(Cost)  # .round().astype(int)

    jsplit = split4( Cost )
        # split4: len [3 2 3 2] pos [ 0 3 5 8 10]

    if plot:
        import matplotlib.pyplot as pl
        title += "\n lengths: %s" % np.diff(jsplit)
        pl.title( title )
        pl.plot( y )
        for js, js1 in zip( jsplit[:-1], jsplit[1:] ):
            if js1 <= js: continue
            yav = y[js:js1].mean() * np.ones( js1 - js + 1 )
            pl.plot( np.arange( js, js1 + 1 ), yav )
        # pl.legend()
        pl.show()
Then, the following code does Cost[] for horizontal lines only, slope 0;
extending it to line segments of any slope, in time O(n), is left as an exercise.
""" spreads( all y[:j] ) in time O(n)
define spread( y[] ) = sum (y - average y)^2
e.g. spread of 24 hourly temperatures y[0:24] i.e. y[0] .. y[23]
around a horizontal line at the average temperature
(spread = 0 for constant temperature,
24 c^2 for constant + [c -c c -c ...],
24 * variance(y) )
How fast can one compute all 24 spreads
1 hour (midnight to 1 am), 2 hours ... all 24 ?
A simpler problem: compute all 24 averages in time O(n):
N = np.arange( 1, len(y)+1 )
allav = np.cumsum(y) / N
= [ y0, (y0 + y1) / 2, (y0 + y1 + y2) / 3 ...]
An identity:
spread(y) = sum(y^2) - n * (av y)^2
Voila: the code below, all spreads() in time O(n).
Exercise: extend this to spreads around least-squares lines
fit to [ y0, [y0 y1], [y0 y1 y2] ... ], not just horizontal lines.
"""
from __future__ import division
import sys
import numpy as np
#...............................................................................
def spreads( y ):
    """ [ spread y[:1], spread y[:2] ... spread y ] in time O(n)
        where spread( y[] ) = sum (y - average y )^2
            = n * variance(y)
    """
    N = np.arange( 1, len(y)+1 )
    return np.cumsum( y**2 ) - np.cumsum( y )**2 / N

def spreads_allij( y ):
    """ -> A[i,j] = sum (y - av y)^2, spread of y around its average
        for all y[i:j+1]
        time, space O(n^2)
    """
    y = np.asanyarray( y, dtype=float )
    n = len(y)
    A = np.zeros((n,n))
    for i in range(n):
        A[i,i:] = spreads( y[i:] )
    return A
So far we have an n x n cost matrix, space O(n^2).
To get down to space O( n k ),
look closely at the pattern of Cost[i,j] accesses in the dyn-prog code:
for nleft .. to n:
    Cost_nleft = Cost[j,nleft ] -- time nleft or nleft^2
    for k in 3 4 5 ...:
        min [ C[k-1, j-1] + Cost_nleft[j] for j .. to nleft ]
Here Cost_nleft is one row of the full n x n cost matrix, ~ n segments, generated as needed.
This can be done in time O(n) for line segments.
But if "error for one segment through a set of n points takes O(n) time",
it seems we're up to time O(n^3). Comments anyone ?
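For what it's worth, here is a sketch of the "loop for any k" in plain Python. It is not the code above generalized line by line. It assumes the horizontal-line cost (spread around the mean), computed in O(1) per segment from prefix sums, so the whole thing runs in O(n^2 k) time with O(n k) tables. The names `segment_costs` and `best_split` are invented for the example, and fitting sloped lines would need additional prefix sums.

```python
def segment_costs(y):
    """Return cost(i, j): squared error of y[i..j] (inclusive) around its mean."""
    s1, s2 = [0.0], [0.0]                  # prefix sums of y and y^2
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        total = s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - total * total / (j - i + 1)

    return cost

def best_split(y, k):
    """Best split of y into at most k contiguous pieces: (cost, boundaries)."""
    n = len(y)
    cost = segment_costs(y)
    inf = float("inf")
    # best[p][j]: min cost of covering y[0..j] with exactly p pieces
    # back[p][j]: start index of the last piece in that optimum
    best = [[inf] * n for _ in range(k + 1)]
    back = [[-1] * n for _ in range(k + 1)]
    for j in range(n):
        best[1][j], back[1][j] = cost(0, j), 0
    for p in range(2, k + 1):
        for j in range(p - 1, n):
            for i in range(p - 1, j + 1):  # last piece is y[i..j]
                c = best[p - 1][i - 1] + cost(i, j)
                if c < best[p][j]:
                    best[p][j], back[p][j] = c, i
    # "At most k": take the cheapest piece count (the fewest pieces on ties).
    p = min(range(1, min(k, n) + 1), key=lambda q: best[q][n - 1])
    total, bounds, j = best[p][n - 1], [n], n - 1
    while p >= 1:                          # walk the back pointers
        i = back[p][j]
        bounds.append(i)
        j, p = i - 1, p - 1
    return total, bounds[::-1]

print(best_split([1, 1, 1, 5, 5, 9, 9, 9, 9], 3))
```

The returned boundaries have the same meaning as jsplit above: consecutive entries delimit the half-open index ranges of the pieces.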
If you can do least squares for some segment in O(n^2), it's easy to do what you want in O(n^2 k^2) with dynamic programming. You might be able to optimize that down to a single factor of k.

How to decompose an integer in two for grid creation

Given an integer N I want to find two integers A and B that satisfy A × B ≥ N with the following conditions:
The difference between A × B and N is as low as possible.
The difference between A and B is as low as possible (to approach a square).
Example: 23. Possible solutions 3 × 8, 6 × 4, 5 × 5. 6 × 4 is the best since it leaves just one empty space in the grid and is "less" rectangular than 3 × 8.
Another example: 21. Solutions 3 × 7 and 4 × 6. 3 × 7 is the desired one.
A brute force solution is easy. I would like to see if a clever solution is possible.
Easy.
In pseudocode
a = b = floor(sqrt(N))
if (a * b >= N) return (a, b)
a += 1
if (a * b >= N) return (a, b)
return (a, b+1)
It always terminates, and the difference between a and b is at most 1.
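The pseudocode translates almost directly into Python. The sketch below uses `math.isqrt` (Python 3.8+) to avoid floating-point square roots; the function name `near_square` is made up for the example.

```python
from math import isqrt

def near_square(n):
    """Return (a, b) with a * b >= n and a - b in {0, 1}."""
    a = b = isqrt(n)          # floor(sqrt(n)), exact for integers
    if a * b >= n:            # n is a perfect square
        return a, b
    a += 1
    if a * b >= n:            # s * (s + 1) is already enough
        return a, b
    return a, b + 1           # otherwise (s + 1) * (s + 1) always is

for n in (16, 20, 21, 23):
    print(n, near_square(n))
```

Note that this favours squareness over a tight fit: for 23 it returns 5 × 5, not the 6 × 4 preferred in the question.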
It will be much harder if you relax second constraint, but that's another question.
Edit: as it seems that the first condition is more important, you have to attack the problem a bit differently. You have to specify some way to measure the badness of not being square enough (the 2nd condition), because even prime numbers can be factorized as 1*number, which fulfills the first condition. Assume we have such a criterion (say a >= b && a <= 2 * b), then factorize N and try different combinations to find the best one. If none is good enough, try with N+1 and so on.
Edit2: after thinking a bit more I came up with this solution, in Python:
from math import sqrt

def isok(a, b):
    """accept difference of five - 2nd rule"""
    return a <= b + 5

def improve(a, b, N):
    """improve result:
    if a == b:
        (a+1)*(b-1) = a^2 - 1 < a*a
    otherwise (a - 1 >= b as a is always larger)
        (a+1)*(b-1) = a*b - a + b - 1 <= a*b
    On each iteration the new a*b will be smaller;
    continue while we can and the 2nd condition is still met
    """
    while (a+1) * (b-1) >= N and isok(a+1, b-1):
        a, b = a + 1, b - 1
    return (a, b)

def decomposite(N):
    a = int(sqrt(N))
    b = a
    # N is square, result is ok
    if a * b >= N:
        return (a, b)
    a += 1
    if a * b >= N:
        return improve(a, b, N)
    return improve(a, b+1, N)

def test(N):
    (a, b) = decomposite(N)
    print("%d decomposed as %d * %d = %d" % (N, a, b, a*b))

for x in [99, 100, 101, 20, 21, 22, 23]:
    test(x)
which outputs
99 decomposed as 11 * 9 = 99
100 decomposed as 10 * 10 = 100
101 decomposed as 13 * 8 = 104
20 decomposed as 5 * 4 = 20
21 decomposed as 7 * 3 = 21
22 decomposed as 6 * 4 = 24
23 decomposed as 6 * 4 = 24
I think this may work (your conditions are somewhat ambiguous). This solution is somewhat similar to the other one; it basically produces a rectangular matrix which is almost square.
You may need to prove that A+2 is not the optimal choice.
A0 = B0 = ceil (sqrt N)
A1 = A0+1
B1 = B0-1
if A0*B0-N > A1*B1-N: return (A1,B1)
return (A0,B0)
This is the solution if the first condition is dominant (and the second condition is not used):
A0 = B0 = ceil (sqrt N)
if A0*B0==N: return (A0,B0)
return (N,1)
Other conditions variations will be in between
A = B = ceil (sqrt N)
