Finding the "expanded to factored" algorithm - algorithm

This question is about algorithms and thus language-independent.
Given the following rows:
A1, B1, C1, D1 (1)
A1, B2, C1, D1 (2)
A2, B1, C1, D1 (3)
A2, B2, C1, D1 (4)
A3, B1, C1, D1 (5)
A3, B2, C1, D1 (6)
A1, B1, C2, D1 (7)
They can be factored as follows:
+----+----+----+----+
| A1 | B1 | C1 | D1 |
| A2 | B2 |    |    |
| A3 |    |    |    |
+----+----+----+----+
| A1 | B1 | C2 | D1 |
+----+----+----+----+
The following objects can store those data:
class ExpandedRow {
    String a;
    String b;
    String c;
    String d;
}
class FactoredRow {
    List<String> as;
    List<String> bs;
    List<String> cs;
    List<String> ds;
}
As for the transformation algorithms, the factored --> expanded one is quite easy:
List<FactoredRow> factoredRows = fill();
List<ExpandedRow> expandedRows = empty();
for each factoredRow in factoredRows {
    for each a in factoredRow.as {
        for each b in factoredRow.bs {
            for each c in factoredRow.cs {
                for each d in factoredRow.ds {
                    expandedRows.add(new ExpandedRow(a, b, c, d));
                }
            }
        }
    }
}
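As a concrete illustration of the expansion step, here is a minimal Python sketch. It assumes each factored row is simply a tuple of value lists (one list per column); the name `expand` is made up for this example.

```python
from itertools import product

def expand(factored_rows):
    """Expand each factored row (a tuple of value lists) into plain rows."""
    expanded = []
    for columns in factored_rows:
        # product(*columns) yields one tuple per combination of column values,
        # which is what the four nested loops do for four columns.
        expanded.extend(product(*columns))
    return expanded

# The factored table from the question.
factored = [
    (["A1", "A2", "A3"], ["B1", "B2"], ["C1"], ["D1"]),
    (["A1"], ["B1"], ["C2"], ["D1"]),
]
rows = expand(factored)
```

Using `itertools.product` keeps the sketch independent of the number of columns.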
But I'm lost when it comes to the expanded --> factored one. How can I factorize a List<ExpandedRow> into a List<FactoredRow>?
In other words, I have the factored table as input. I expand it using the provided algorithm and store it in its expanded state. The question is: how can I recover the initial factored state after having expanded it?
I thought that if two expanded rows differ in only one attribute, they can be factored together, for example A1, B1, C1, D1 (1) and A1, B1, C2, D1 (7). But if we factorize those two rows together, we will end up with:
+----+----+----+----+
| A1 | B1 | C1 | D1 |
|    |    | C2 |    |
+----+----+----+----+
| A1 | B2 | C1 | D1 |
| A2 |    |    |    |
| A3 |    |    |    |
+----+----+----+----+
| A2 | B1 | C1 | D1 |
| A3 |    |    |    |
+----+----+----+----+
This is less factored than the initial table.
It seems that there are many factored solutions, and the main issue is to define and to find the most factored one.

This problem seems something like a graph partitioning problem. I suspect it's NP-hard but I haven't been able to prove it yet.
Let's take a simpler example to see what's going on. Consider the pairs (A1,B1), (A2,B1), (A3,B1), (A2,B2). We represent the pairs as points in 2D space, and connect two points if it is possible to move from one to the other by a translation parallel to the x- or y-axis:
           (A2,B2)
              |
(A1,B1) -- (A2,B1) -- (A3,B1)
The idea is to partition the graph by lines parallel to the axes, and repartition each partition, and so on, until we get pieces that are complete rectangles, line segments, or points.
There are two essentially different ways of partitioning the graph above. We can draw a vertical line at position x=1.5:
               (A2,B2)
                  |
(A1,B1)        (A2,B1) -- (A3,B1)
after which the right-side piece needs to be further partitioned (by a vertical or horizontal line, let's take horizontal):
               (A2,B2)
(A1,B1)        (A2,B1) -- (A3,B1)
We have now factored the original list into
A1 B1
-----
A2 B2
-----
A2 B1
A3
On the other hand, if we had made our initial partition with a horizontal line at position y=1.5, we would have
           (A2,B2)
(A1,B1) -- (A2,B1) -- (A3,B1)
which is already nicely factored into a point and a line segment:
A2 B2
-----
A1 B1
A2
A3
In higher dimensions (4D for letters A, B, C, D) we have a similar problem, except that there are correspondingly more choices for initial cuts, and the allowed final pieces are higher-dimensional (not just points, line segments, and rectangles but also 3D and 4D boxes).
The problem feels NP-hard to me, just like many other graph partitioning problems, but there are probably reasonably fast approximation algorithms.
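One simple heuristic, built on the pairwise-merge idea from the question, is to keep merging any two factored rows that differ in exactly one column. The Python sketch below is only that: a greedy heuristic under the assumption that rows are tuples of hashable values. The names `greedy_factor` and `_find_mergeable` are invented here, and the result depends on the order in which pairs are found, so it is not guaranteed to reach the most factored form.

```python
def _find_mergeable(factored):
    """Return (i, j, k) for the first pair of rows differing only in column k."""
    for i in range(len(factored)):
        for j in range(i + 1, len(factored)):
            differing = [
                k for k, (x, y) in enumerate(zip(factored[i], factored[j]))
                if x != y
            ]
            if len(differing) == 1:
                return i, j, differing[0]
    return None

def greedy_factor(rows):
    """Greedily merge rows; each factored row is a tuple of frozensets."""
    # Start with one single-value factored row per distinct expanded row.
    factored = [tuple(frozenset([v]) for v in row) for row in dict.fromkeys(rows)]
    while True:
        pair = _find_mergeable(factored)
        if pair is None:
            return factored
        i, j, k = pair
        a, b = factored[i], factored[j]
        # Two rows equal in every column but k cover a box whose k-th side
        # is the union of the two value sets.
        factored[i] = a[:k] + (a[k] | b[k],) + a[k + 1:]
        del factored[j]

rows = [
    ("A1", "B1", "C1", "D1"), ("A1", "B2", "C1", "D1"),
    ("A2", "B1", "C1", "D1"), ("A2", "B2", "C1", "D1"),
    ("A3", "B1", "C1", "D1"), ("A3", "B2", "C1", "D1"),
    ("A1", "B1", "C2", "D1"),
]
result = greedy_factor(rows)
```

On the seven rows from the question, taken in this order, the heuristic happens to end with the two-row factorization. A different row order (for example, one that merges rows (1) and (7) first) can leave it stuck in a less factored state, which is exactly the difficulty described above.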

Related

Find rooted sub-tree containing predefined set of values

Given a tree of nodes find a rooted sub-tree that contains a set of predefined values. The nodes in the tree are unique but their associated values may be repeated.
Ideally the most shallow sub-tree is returned. The sub-tree may also be returned simply as an array of nodes (or their unique IDs).
Here are several atomic test cases with the ideal resulting sub-tree:
#1 [A, A, B, C]
A1 Answer: A1
/ | \ / \
B1 D1 C1 B1 C1
/ \ \ /
A2 B2 B3 A2
#2 [A, B, B, A]
A1 Answer(s): A1 A1 A1 the first solution is preferred
/ \ / \ / \ as it is most shallow
B1 B2 B1 B2 B1 B2
/ \ \ / / \ \
A2 B3 B4 A2 A2 B3 B4
\ \
A3 A3
#3 [A, A, B, C]
      A1          Answer: Not possible as only one B can be matched
     /  \
   B1    B2
  /        \
A2          C1
#4 [B, B]
     A1           Answer: Not possible as the root 'A' is not in the set
    /  \
  B1    B2
My approach was broken into two steps:
Breadth-first scan until all nodes in the set are found returning a tree that definitely contains the desired sub-tree.
Use backtracking to search the resulting sub-tree (essentially all permutations of the sub-tree) to find the exact nodes that satisfy the set.
However, this solution is not very efficient. It seems like I should be able to find the desired sub-tree simply by using a modified breadth-first search. I've also been unable to make this work in practice.

Avoid accuracy problems while computing the permanent using the Ryser formula

Task
I want to calculate the permanent P of an NxN matrix for N up to 100. I can make use of the fact that the matrix has only M=4 (or slightly more) different rows and cols. The matrix might look like
A1 ... A1  B1 ... B1  C1 ... C1  D1 ... D1  |
...                                         | r1 identical rows
A1 ... A1  B1 ... B1  C1 ... C1  D1 ... D1  |
A2 ... A2  B2 ... B2  C2 ... C2  D2 ... D2
...
A2 ... A2  B2 ... B2  C2 ... C2  D2 ... D2
A3 ... A3  B3 ... B3  C3 ... C3  D3 ... D3
...
A3 ... A3  B3 ... B3  C3 ... C3  D3 ... D3
A4 ... A4  B4 ... B4  C4 ... C4  D4 ... D4
...
A4 ... A4  B4 ... B4  C4 ... C4  D4 ... D4
---------
c1 identical cols
where c and r are the multiplicities of the cols and rows. All values in the matrix lie between 0 and 1 and are encoded as double-precision floating-point numbers.
Algorithm
I tried to use the Ryser formula to calculate the permanent. For the formula, one needs to first calculate the sum of each row and multiply all the row sums. For the matrix above this yields
S0 = (c1 * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* (c1 * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
As a next step the same is done with col 1 deleted
S1 = ((c1-1) * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* ((c1-1) * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
and this number is subtracted from S0.
The algorithm continues with all possible ways to delete single cols and groups of cols; the products of the row sums of the remaining matrix are added (even number of cols deleted) or subtracted (odd number of cols deleted).
The task can be solved relatively efficiently if one makes use of the identical cols (for example, the result S1 will pop up exactly c1 times).
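To make the grouping concrete, here is a small Python sketch of the Ryser sum organized by column types: instead of enumerating every subset of cols, it enumerates how many cols of each type are kept, and weights each term by the number of subsets with those counts (a product of binomial coefficients). The function name and argument layout are assumptions made for this example. It only illustrates the bookkeeping; with doubles it suffers from the same cancellation as any other ordering of the sum.

```python
from itertools import product
from math import comb, prod

def grouped_ryser_permanent(values, row_mult, col_mult):
    """Permanent of a matrix with repeated rows and cols.

    values[i][j] is the entry shared by row type i and col type j;
    row_mult[i] and col_mult[j] are the multiplicities r_i and c_j.
    """
    n = sum(col_mult)
    if sum(row_mult) != n:
        raise ValueError("the expanded matrix must be square")
    total = 0
    # k[j] = number of cols of type j kept in the current subset.
    for k in product(*(range(c + 1) for c in col_mult)):
        # Number of col subsets that have exactly these counts.
        weight = prod(comb(c, kj) for c, kj in zip(col_mult, k))
        # Product of row sums; identical rows contribute a power.
        row_product = prod(
            sum(kj * v for kj, v in zip(k, row)) ** r
            for row, r in zip(values, row_mult)
        )
        # Sign depends on the parity of the number of deleted cols.
        sign = -1 if (n - sum(k)) % 2 else 1
        total += sign * weight * row_product
    return total
```

Because the function only uses `+`, `*` and `**`, it also accepts `fractions.Fraction` or integer entries, which can be handy for checking small cases exactly before switching back to doubles.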
Problem
Even if the final result is small, the intermediate results S0, S1, ... can reach values up to N^N. A double can hold such a number, but the absolute rounding error for numbers this large is on the order of, or larger than, the expected overall result. The expected result P is on the order of c1!*c2!*c3!*c4! (actually I am interested in P/(c1!*c2!*c3!*c4!), which should lie between 0 and 1).
I tried to arrange the additions and subtractions of the values S so that the partial sums stay around 0. This helps in the sense that I can avoid intermediate results exceeding N^N, but it improves things only a little. I also thought about using logarithms for the intermediate results to keep the absolute numbers down, but the relative accuracy of the encoded numbers will still be bounded by the floating-point encoding, and I think I will run into the same problem. If possible, I want to avoid data types that implement variable-precision arithmetic, for performance reasons (currently I am using MATLAB).

Uniqueness in Permutation and Combination

I am trying to create some pseudocode to generate possible outcomes for this scenario:
There is a tournament taking place, where each round all players in the tournament are in a group with other players of different teams.
Given x teams, each team has exactly n players. What are the possible outcomes for groups of size r where you can only have one player of each team AND the player must not have played with any of the other players already in previous rounds?
Example: 4 teams (A-D), 4 players each team, 4 players each grouping.
Possible groupings are: (correct team constraint)
A1, B1, C1, D1
A1, B3, C1, D2
But not: (violates same team constraint)
A1, A3, C2, D2
B3, C2, D4, B1
However, the uniqueness constraint comes into play in this grouping
A1, B1, C1, D1
A1, B3, C1, D2
While it does follow the constraint of playing with different teams, it breaks the rule of uniqueness of playing with different players. In this case A1 is grouped up twice with C1.
At the end of the day the pseudocode should be able to create something like the following
Round 1 Round 2 Round 3 Round 4
a1 b1 a1 d4 a1 c2 a1 c4
c1 d1 b2 c3 b4 d3 d2 b3
a2 b2 a2 d1 a2 c3 a2 c1
c2 d2 b3 c4 b1 d4 d3 b4
a3 b3 a3 d2 a3 c4 a3 c2
c3 d3 b4 c1 b2 d1 d4 b1
a4 b4 a4 d3 a4 c1 a4 c3
c4 d4 b1 c2 b3 d2 d1 b2
In the example you see that in each round no player has been grouped up with another previous player.
If the number of players on a team is a prime power (2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, etc.), then here's an algorithm that creates a schedule with the maximum number of rounds, based on a finite affine plane.
We work in the finite field GF(n), where n is the number of players on a team. GF(n) has its own notion of multiplication; when n is a prime, it's multiplication mod n, and when n is a higher power of some prime, it's multiplication of univariate polynomials mod some irreducible polynomial of the appropriate degree. Each team is identified by a nonzero element of GF(n); let the set of team identifiers be T. Each team member is identified by a pair in T×GF(n). For each nonzero element r of GF(n), the groups for round r are
{{(t, r*t + c) | t in T} | c in GF(n)},
where * and + denote multiplication and addition respectively in GF(n).
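For the special case where n is a prime, so that GF(n) arithmetic is just arithmetic mod n, the construction above can be sketched in a few lines of Python. This is an illustration only: the function name is made up, the prime-power case (which needs polynomial arithmetic) is not handled, and team identifiers and round indices are taken to be the nonzero residues 1..n-1, as described above.

```python
def affine_schedule(n):
    """Rounds of groups for prime n; a player is a (team, member) pair."""
    if n < 2 or any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        raise ValueError("this sketch only handles prime n")
    teams = range(1, n)  # nonzero elements of GF(n)
    rounds = []
    for r in range(1, n):  # one round per nonzero r
        # Group c is the line {(t, r*t + c)}: one player from each team.
        groups = [[(t, (r * t + c) % n) for t in teams] for c in range(n)]
        rounds.append(groups)
    return rounds

rounds = affine_schedule(5)
```

Two players from different teams lie on exactly one line with a given slope r, so they can share a group in at most one round; that is the property the schedule relies on.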
Implementation in Python 3
This problem is very closely related to the Social Golfer Problem. The Social Golfer Problem asks, given n players who each play once a day in g groups of size s (n = g×s), how many days can they be scheduled such that no player plays with any other player more than once?
The algorithms for finding solutions to instances of Social Golfer problems are a patchwork of constraint solvers and mathematical constructions, which together don't address very many cases satisfactorily. If the number of players on a team is equal to the group size, then solutions to this problem can be derived by interpreting the first day's schedule as the team assignments and then using the rest of the schedule. There may be other constructions.

scheduling algorithm shortest job first

I am trying to understand how the shortest job first algorithm works. Am I doing this the right way? Please help.
+------+---------+--------+
| Proc | Burst1  | Burst2 |
+------+---------+--------+
| A    | 10      | 5      |
| B    | 3       | 9      |
| C    | 8       | 11     |
+------+---------+--------+
B1->3->C1->11->B2->20->A1->30->A2->35->C2->46
"Shortest job first" is not really an algorithm, but a strategy: among the jobs ready to execute, always choose the job with the shortest execution time. Your sequence looks OK. In the beginning the following jobs are ready for execution (with execution time in parentheses):
A1(10), B1(3), C1(8)
So B1 is chosen, after which also job B2 is ready to execute, so here is the updated list of ready jobs:
A1(10), B2(9), C1(8)
Now C1 is chosen, and so on.
There are variants of the strategy "shortest job first", where the total time over all bursts, i.e. A1 + A2, B1 + B2, ..., is taken into account. Then the chosen sequence would be:
B1, B2, A1, A2, C1, C2
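The basic strategy (pick the shortest ready burst, then make that process's next burst ready) can be sketched with a priority queue. The sketch assumes that all processes arrive at time 0, that a process's next burst becomes ready as soon as its previous burst finishes, and that ties are broken by process name; the function name is hypothetical.

```python
import heapq

def sjf_schedule(bursts):
    """Return [(burst label, finish time), ...] for non-preemptive SJF."""
    # Ready queue entries: (burst length, process name, burst index).
    ready = [(times[0], name, 0) for name, times in bursts.items()]
    heapq.heapify(ready)
    clock = 0
    timeline = []
    while ready:
        duration, name, index = heapq.heappop(ready)  # shortest ready burst
        clock += duration
        timeline.append((f"{name}{index + 1}", clock))
        if index + 1 < len(bursts[name]):
            # The process's next burst becomes ready immediately.
            heapq.heappush(ready, (bursts[name][index + 1], name, index + 1))
    return timeline

timeline = sjf_schedule({"A": [10, 5], "B": [3, 9], "C": [8, 11]})
# timeline reproduces the sequence from the question:
# B1 (3), C1 (11), B2 (20), A1 (30), A2 (35), C2 (46)
```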

Find a node in the tree based on some selection criteria

       [BASE]
      /   \   \
    C1    C2   C3
   /  \          \
 C4    C5         C6
I have a tree like the one above. It is an N-ary tree that is not balanced. I need to select one of the nodes based on some conditions, like:
Select C1 when k1 = a
Select C4 when k1 = a and k2 = b and k3 = c
Select C5 when k1 = a and k' = z
Select C2 when k'' = b
Select C3 when k5 = 9
Select C6 when k5 = 9 and k6 = 10
The input to the program is an arbitrary-length list of key-value pairs. For example, if the input is k1=a,k2=b,k3=c,k8=10, I should select C4, as that is the best match.
My first idea was to traverse the tree and match each node's selection criteria against the input set. But I soon realized that the tree can be huge: the base node can have tens of thousands of child nodes. So going node by node might not be a good idea. If there is a way to select the nodes more efficiently, I would love to know it.
It looks like your k's point into a directory structure, and the leaf of this structure (exactly one leaf for each directory) is the node you are looking for. You can keep this string in the node as another value. What is not clear in the question is how the k's are related to the tree.
For example:
a->c1
a/b/c->c4
I have found a workable solution like this one
----------------------------------------
|rowId|param1|param2|param3|param4|node|
----------------------------------------
|10 | a | | | | C1 |
----------------------------------------
|14 | a | b | c | | C4 |
----------------------------------------
|18 | a | b | | | C5 |
----------------------------------------
Let's call it a condition table. Each column represents one input key (k), and for each combination of values there is a node to be selected. This table can be thought of as an in-memory data structure or as a real table in an RDBMS.
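An in-memory version of such a condition table could be sketched in Python as follows. The sketch indexes every (key, value) condition, so a lookup only touches nodes that share at least one pair with the input instead of scanning all nodes. It assumes that "best match" means the node with the most conditions among those whose conditions are all satisfied; the function names and the sample conditions (taken from the question) are illustrative only.

```python
from collections import defaultdict

def build_index(conditions):
    """Map each (key, value) pair to the nodes that require it."""
    index = defaultdict(list)
    for node, required in conditions.items():
        for pair in required.items():
            index[pair].append(node)
    return index

def select_node(conditions, index, query):
    """Return the fully matched node with the most conditions, or None."""
    hits = defaultdict(int)
    for pair in query.items():
        for node in index.get(pair, ()):
            hits[node] += 1
    # Keep only nodes whose every condition appears in the query.
    matched = [n for n, count in hits.items() if count == len(conditions[n])]
    return max(matched, key=lambda n: len(conditions[n]), default=None)

conditions = {
    "C1": {"k1": "a"},
    "C4": {"k1": "a", "k2": "b", "k3": "c"},
    "C5": {"k1": "a", "k'": "z"},
    "C2": {"k''": "b"},
    "C3": {"k5": "9"},
    "C6": {"k5": "9", "k6": "10"},
}
index = build_index(conditions)
best = select_node(conditions, index, {"k1": "a", "k2": "b", "k3": "c", "k8": "10"})
```

For the sample input from the question, both C1 and C4 are fully matched, and C4 wins because it has more conditions.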

Resources