Find rooted sub-tree containing predefined set of values - algorithm

Given a tree of nodes find a rooted sub-tree that contains a set of predefined values. The nodes in the tree are unique but their associated values may be repeated.
Ideally the most shallow sub-tree is returned. The sub-tree may also be returned simply as an array of nodes (or their unique IDs).
Here are several atomic test cases with the ideal resulting sub-tree:
#1 [A, A, B, C]
A1 Answer: A1
/ | \ / \
B1 D1 C1 B1 C1
/ \ \ /
A2 B2 B3 A2
#2 [A, B, B, A]
A1 Answer(s): A1 A1 A1 the first solution is preferred
/ \ / \ / \ as it is most shallow
B1 B2 B1 B2 B1 B2
/ \ \ / / \ \
A2 B3 B4 A2 A2 B3 B4
\ \
A3 A3
#3 [A, A, B, C]
A1 Answer: Not possible as only one B can be matched
/ \
B1 B2
/ \
A2 C1
#4 [B, B]
A1 Answer: Not possible as the root 'A' is not in the set
/ \
B1 B2
My approach was broken into two steps:
Breadth-first scan until all nodes in the set are found returning a tree that definitely contains the desired sub-tree.
Use backtracking to search the resulting sub-tree (essentially all permutations of the sub-tree) to find the exact nodes that satisfy the set.
However, this solution is not very efficient. It seems like I should be able to find the desired sub-tree simply by using a modified breadth-first search. I've also been unable to make this work in practice.

Related

Discussion about how to retrieve an i-th element in the j-th level of a binary tree algorithm

I am solving some problems from a site called codefights and the last one solved was about a binary tree in which are:
Consider a special family of Engineers and Doctors. This family has
the following rules:
Everybody has two children. The first child of an Engineer is an
Engineer and the second child is a Doctor. The first child of a Doctor
is a Doctor and the second child is an Engineer. All generations of
Doctors and Engineers start with an Engineer.
We can represent the situation using this diagram:
E
/ \
E D
/ \ / \
E D D E
/ \ / \ / \ / \
E D D E D E E D
Given the level and position of a person in the ancestor tree above,
find the profession of the person. Note: in this tree first child is
considered as left child, second - as right.
As there is some space and time restrictions, the solution can not be based on actually constructing the tree until the level required and check which element is in the position asked. So far so good. My proposed solution written in python was:
def findProfession(level, pos):
size = 2**(level-1)
shift = False
while size > 2:
if pos <= size/2:
size /= 2
else:
size /= 2
pos -= size
shift = not shift
if pos == 1 and shift == False:
return 'Engineer'
if pos == 1 and shift == True:
return 'Doctor'
if pos == 2 and shift == False:
return 'Doctor'
if pos == 2 and shift == True:
return 'Engineer'
As it solved the problem, I got access to the solutions of other used and I was astonished by this one:
def findProfession(level, pos):
return ['Engineer', 'Doctor'][bin(pos-1).count("1")%2]
Even more, I did not understand the logic behind it and so we arrived to this question. Someone could explain to me this algorithm?
Let's number the nodes of the tree in the following way:
1) the root has number 1
2) the first child of node x has number 2*x
3) the second child of node x has number 2*x+1
Now, notice that each time you go to the first child, the profession stays the same, and you add a 0 to the binary representation of the node.
And each time you go to the second child, the profession flips and you add a 1 to the binary representation.
Example: Let's find the profession of the 4th node in the 4th level (last level in the diagram you have in the question). First we start at the root with number 1, then we go to the first child with number 2 (10 binary). After that we go to the second child of 2 which is 5 (101 binary). Finally, we go to the second child of 5 which is 11 (1011 binary).
Notice that we started with only one bit equal to 1, then every 1 bit we added to the binary representation flipped the profession. So the number of times we flip a profession is equal to the (number of bits equal to 1) - 1. The parity of this amount decides the profession.
This leads us to the following solution:
X = number of bits equal to 1 in [ 2^(level-1) + pos - 1 ]
Y = (X-1) mod 2
if Y is 0 then the answer is "Engineer"
Otherwise the answer is "Doctor"
since 2^(level-1) is a power of 2, it has exactly one bit equal to 1, therefore you can write:
X = number of bits equal to 1 in [ pos-1 ]
Y = X mod 2
Which is equal to the solution you mentioned in the question.
This type of sequence is known as the Thue-Morse sequence. Using the same tree, here is a demonstration of why it gives the correct answer:
p is the 0-indexed position
b is the binary representation of p
c is the number of 1's in b
p0
E
b0
c0
/ \
p0 p1
E D
b0 b1
c0 c1
/ \ / \
p0 p1 p2 p3
E D D E
b0 b1 b10 b11
c0 c1 c1 c2
/ \ / \ / \ / \
p0 p1 p2 p3 p4 p5 p6 p7
E D D E D E E D
b0 b1 b10 b11 b100 b101 b110 b111
c0 c1 c1 c2 c1 c2 c2 c3
c is always even for Engineer and odd for Doctor. Therefore:
index = bin(pos-1).count('1') % 2
return ['Engineer', 'Doctor'][index]

Avoid accuracy problems while computing the permanent using the Ryser formula

Task
I want to calculate the permanent P of a NxN matrix for N up to 100. I can make use of the fact that the matrix features only M=4 (or slightly more) different rows and cols. The matrix might look like
A1 ... A1 B1 ... B1 C1 ... C1 D1 ... D1 |
... | r1 identical rows
A1 ... A1 B1 ... B1 C1 ... C1 D1 ... D1 |
A2 ... A2 B2 ... B2 C2 ... C2 D2 ... D2
...
A2 ... A2 B2 ... B2 C2 ... C2 D2 ... D2
A3 ... A3 B3 ... B2 C2 ... C2 D2 ... D2
...
A3 ... A3 B3 ... B3 C3 ... C3 D3 ... D3
A4 ... A4 B4 ... B4 C4 ... C4 D4 ... D4
...
A4 ... A4 B4 ... B4 C4 ... C4 D4 ... D4
---------
c1 identical cols
and c and r are the multiplicities of cols and rows. All values in the matrix are laying between 0 and 1 and are encoded as double precision floating-point numbers.
Algorithm
I tried to use the Ryser formula to calculate the permanent. For the formula, one needs to first calculate the sum of each row and multiply all the row sums. For the matrix above this yields
S0 = (c1 * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* (c1 * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
As a next step the same is done with col 1 deleted
S1 = ((c1-1) * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* ((c1-1) * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
and this number is subtracted from S0.
The algorithm continues with all possible ways to delete single and group of cols and the products of the row sums of the remaining matrix are added (even number of cols deleted) and subtracted (odd number of cols deleted).
The task can be solved relative efficiently if one makes use of the identical cols (for example the result S1 will pop up exactly c1 times).
Problem
Even if the final result is small the values of the intermediate results S0, S1, ... can reach values up to N^N. A double can hold this number but the absolute precision for such big numbers is below or on the order of the expected overall result. The expected result P is on the order of c1!*c2!*c3!*c4! (actually I am interested in P/(c1!*c2!*c3!*c4!) which should lay between 0 and 1).
I tried to arrange the additions and subtractions of the values S in a way that the sums of the intermediate results are around 0. This helps in the sense that I can avoid intermediate results that are exceeding N^N, but this improves things only a little bit. I also thought about using logarithms for the intermediate results to keep the absolute numbers down - but the relative accuracy of the encoded numbers will be still bounded by the encoding as floating point number and I think I will run into the same problem. If possible, I want to avoid the usage of data types that are implementing a variable-precision arithmetic for performance reasons (currently I am using matlab).

Uniqueness in Permutation and Combination

I am trying to create some pseudocode to generate possible outcomes for this scenario:
There is a tournament taking place, where each round all players in the tournament are in a group with other players of different teams.
Given x amount of teams, each team has exactly n amount of players. What are the possible outcomes for groups of size r where you can only have one player of each team AND the player must have not played with any of the other players already in previous rounds.
Example: 4 teams (A-D), 4 players each team, 4 players each grouping.
Possible groupings are: (correct team constraint)
A1, B1, C1, D1
A1, B3, C1, D2
But not: (violates same team constraint)
A1, A3, C2, D2
B3, C2, D4, B1
However, the uniqueness constraint comes into play in this grouping
A1, B1, C1, D1
A1, B3, C1, D2
While it does follow the constraints of playing with different teams, it has broken the rule of uniqueness of playing with different players. In this case A1 is grouped up twice with C1
At the end of the day the pseudocode should be able to create something like the following
Round 1 Round 2 Round 3 Round 4
a1 b1 a1 d4 a1 c2 a1 c4
c1 d1 b2 c3 b4 d3 d2 b3
a2 b2 a2 d1 a2 c3 a2 c1
c2 d2 b3 c4 b1 d4 d3 b4
a3 b3 a3 d2 a3 c4 a3 c2
c3 d3 b4 c1 b2 d1 d4 b1
a4 b4 a4 d3 a4 c1 a4 c3
c4 d4 b1 c2 b3 d2 d1 b2
In the example you see that in each round no player has been grouped up with another previous player.
If the number of players on a team is a prime power (2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, etc.), then here's an algorithm that creates a schedule with the maximum number of rounds, based on a finite affine plane.
We work in the finite field GF(n), where n is the number of players on a team. GF(n) has its own notion of multiplication; when n is a prime, it's multiplication mod n, and when n is higher power of some prime, it's multiplication of univariate polynomials mod some irreducible polynomial of the appropriate degree. Each team is identified by a nonzero element of GF(n); let the set of team identifiers be T. Each team member is identified by a pair in T×GF(n). For each nonzero element r of GF(n), the groups for round r are
{{(t, r*t + c) | t in T} | c in GF(n)},
where * and + denote multiplication and addition respectively in GF(n).
Implementation in Python 3
This problem is very closely related to the Social Golfer Problem. The Social Golfer Problem asks, given n players who each play once a day in g groups of size s (n = g×s), how many days can they be scheduled such that no player plays with any other player more than once?
The algorithms for finding solutions to instances of Social Golfer problems are a patchwork of constraint solvers and mathematical constructions, which together don't address very many cases satisfactorily. If the number of players on a team is equal to the group size, then solutions to this problem can be derived by interpreting the first day's schedule as the team assignments and then using the rest of the schedule. There may be other constructions.

Find the best set among the many sets based on it's item's cost

I have items in sets as a below example. Each item contains particular cost.
I have a max budget. I need to do combination in such a way that in each combination I need at least one item from each set and sum of the costs should be equal to my budget.
Example
A = [a1, a2, a3, a4, ... , a10]
B = [b1, b2, b3, b4, ... , b10]
C = [c1, c2, c3, c4, ... , c10] may be upto G
Max budget = 10
cost of a1 = 2
a2 = 8
b1 = 1
b2 = 7
c1 = 3
c2 = 1
etc
Output can be
[a1, b2, c2] i,e 2+7+1 = 10
[a2, b1, c2] i,e 8+1+1 = 10
[a1, b1, c1] i,e 2+1+3 = 6 Eliminated (since 6 != 10)
goes on
I can have max of 7 sets and 10 items in each. So maximum combinations will be 10^7. Is there any algorithm to achieve this easily. I followed brute force method and it is too expensive.
Thank you.

Finding the "expanded to factored" algorithm

This question is about algorithms and thus language-independent.
Given the following rows:
A1, B1, C1, D1 (1)
A1, B2, C1, D1 (2)
A2, B1, C1, D1 (3)
A2, B2, C1, D1 (4)
A3, B1, C1, D1 (5)
A3, B2, C1, D1 (6)
A1, B1, C2, D1 (7)
They can be factored as follow:
+----+----+----+----+
| A1 | B1 | C1 | D1 |
| A2 | B2 | | |
| A3 | | | |
+----+----+----+----+
| A1 | B1 | C2 | D1 |
+----+----+----+----+
The following objects can store those data:
class ExpandedRow {
String a;
String b;
String c;
String d;
}
class FactoredRow {
List<String> as;
List<String> bs;
List<String> cs;
List<String> ds;
}
Concerning the transformations algorithms, the factored --> expanded one is quite easy:
List<FactoredRow> factoredRows = fill();
List<ExpandedRow> expandedRows = empty();
for each factoredRow in factoredRows {
for each a in factoredRow.as {
for each b in factoredRow.bs {
for each c in factoredRow.cs {
for each d in factoredRow.ds {
expandedRows.add(new ExpandedRow(a, b, c, d));
}
}
}
}
}
But I'm lost concerning the expanded --> factored one. How can I factorize a List<ExpandedRow> into a List<FactoredRow>?
In other words, I have the factored table as input. I expand it using the provided algorithm and store it in its expanded state. The question is: how to retrieve the initial factored state after having expanding it?
I thought that if two expanded rows have only one attribute that differs, they can be factored, for example A1, B1, C1, D1 (1) and A1, B1, C2, D1 (2). But if we factorize those two rows together, we will end with:
+----+----+----+----+
| A1 | B1 | C1 | D1 |
| | | C2 | |
+----+----+----+----+
| A1 | B2 | C1 | D1 |
| A2 | | | |
| A3 | | | |
+----+----+----+----+
| A2 | B1 | C1 | D1 |
| A3 | | | |
+----+----+----+----+
Which is less factored than the initial table.
It's seems that there are many factored solutions, and the main issue is to define and to find the most factored one.
This problem seems something like a graph partitioning problem. I suspect it's NP-hard but I haven't been able to prove it yet.
Let's take a simpler example to see what's going on. Consider the pairs (A1,B1), (A2,B1), (A3,B1), (A2,B2). We represent the points as points in 2D-space, and connect points if it is possible to move from one to the other by a translation parallel to the x- or y-axis:
(A2,B2)
|
(A1,B1) -- (A2,B1) -- (A3,B1)
The idea is to partition the graph by lines parallel to the axes, and repartition each partition, and so on, until we get pieces that are complete rectangles, line segments, or points.
There are two esssentially different ways of partitioning the graph above. We can draw a vertical line at position x=1.5:
(A2,B2)
|
(A1,B1) (A2,B1) -- (A3,B1)
after which the right-side piece needs to be further partitioned (by a vertical or horizontal line, let's take horizontal):
(A2,B2)
(A1,B1) (A2,B1) -- (A3,B1)
We have now factored the original list into
A1 B1
-----
A2 B2
-----
A2 B1
A3
On the other hand, if we had made our initial partition with a horizontal line at position y=1.5, we would have
(A2,B2)
(A1,B1) -- (A2,B1) -- (A3,B1)
which is already nicely factored into a point and a line segment:
A2 B2
-----
A1 B1
A2
A3
In higher dimensions (4D for letters A, B, C, D) we have a similar problem, except that there are correspondingly more choices for initial cuts, and the allowed final pieces are higher-dimesional (not just points, line segments, and rectangles but also 3D and 4D boxes).
The problem feels NP-hard to me, just like many other graph partitioning problems, but there are probably reasonably fast approximation algorithms.

Resources