Let's say I have package A version 1 and package A version 2; I will call them A1 and A2 respectively.
Suppose I have a pool of packages: A1, A2, B1, B2, C1, C2, D1, D2.
A1 depends on B1, which I will represent as (A1, (B1)).
A1 also depends on any version of package C (either C1 or C2 satisfies A1), which I will represent as (A1, (C1, C2)).
Combining A1's dependencies, the A1 data structure becomes: (A1, (B1), (C1, C2))
B1 also depends on D1: (B1, (D1))
So the A1 structure becomes: (A1, ((B1, (D1))), (C1, C2))
Similarly, the A2 structure is (A2, ((B2, (D2))), (C1, C2))
My question is: how can I select the best candidate of package A based on a condition (for example, that the package does not conflict with the currently installed packages)?
Combining A1 and A2 gives: ((A1, ((B1, (D1))), (C1, C2)), (A2, ((B2, (D2))), (C1, C2)))
How can I traverse this data structure?
That is: start with A1; if it doesn't conflict, check B1; if that doesn't conflict, check D1; if that doesn't conflict, check (C1, C2) and take only one of C1 or C2.
With this I end up selecting (A1, B1, D1, C1).
If A1 or any of its dependencies does not meet the condition (for example, if B1 conflicts with the installed packages), then drop A1 entirely and move on to check A2, ending up with (A2, B2, D2, C1).
What kind of traversal would that be?
I have been reading about in-order, pre-order, and post-order traversal, and I'm wondering whether I need something similar here.
Assuming you are asking about traversal for a more general problem rather than this particular instance, I don't think such a traversal exists.
Note that in-order traversal only applies to BINARY trees; other kinds of trees have no in-order traversal. If your general problem has B1, B2, B3, then there apparently wouldn't be a binary-tree representation.
One property of traversal is that the tree contains all of the information itself: when you traverse a tree, you never worry about "external information". In your case, the tree is incomplete: you depend on external information to check for a conflict, e.g. whether B1 is installed, and that information is never in the tree.
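The selection procedure described in the question can be sketched as a pre-order depth-first search with backtracking, where the conflict check is passed in from outside the tree as a predicate, in line with the point above about external information. This is a minimal sketch under stated assumptions: the (name, slots) tuple encoding, the names `resolve` and `choose`, and the conflict set are hypothetical, and it assumes choices in different branches are independent (it does not check selected packages against each other).

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical encoding: a node is (name, slots). Each slot is a list of
# alternative nodes, any one of which satisfies that dependency (e.g. [C1, C2]).
Node = Tuple[str, list]

def resolve(node: Node, ok: Callable[[str], bool]) -> Optional[List[str]]:
    """Pre-order: check the node itself, then each dependency slot; None on failure."""
    name, slots = node
    if not ok(name):
        return None
    picked = [name]
    for slot in slots:
        sub = choose(slot, ok)
        if sub is None:
            return None  # one unsatisfiable slot rejects this whole candidate
        picked.extend(sub)
    return picked

def choose(alternatives: List[Node], ok: Callable[[str], bool]) -> Optional[List[str]]:
    """Backtracking: return the first alternative whose whole subtree resolves."""
    for alt in alternatives:
        result = resolve(alt, ok)
        if result is not None:
            return result
    return None

# The pool from the question.
D1, D2 = ("D1", []), ("D2", [])
C1, C2 = ("C1", []), ("C2", [])
B1, B2 = ("B1", [[D1]]), ("B2", [[D2]])
A1 = ("A1", [[B1], [C1, C2]])
A2 = ("A2", [[B2], [C1, C2]])

conflicts = {"B1"}  # hypothetical: B1 conflicts with an installed package
print(choose([A1, A2], lambda p: p not in conflicts))
```

With B1 in conflict this prints ['A2', 'B2', 'D2', 'C1']; with no conflicts it selects ['A1', 'B1', 'D1', 'C1'], matching the two outcomes in the question.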
You can use an adjacency list to represent the data.
Suppose the packages are A1, A2, B1, B2, C1, C2,
where A1 depends on B1 and C2, and A2 depends on B1, C1, and C2.
This data can be represented as:
[A1] -> [B1, C2]
[A2] -> [B1, C1, C2]
Then use topological sorting to get the order of the dependencies.
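A minimal sketch of that suggestion using Python's standard-library `graphlib` (Python 3.9+). `TopologicalSorter` takes a mapping from each node to its predecessors, so feeding it the adjacency list above makes every dependency come out before the package that needs it, i.e. a valid install order. The variable names are illustrative.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Adjacency list from the answer: package -> packages it depends on.
deps = {
    "A1": ["B1", "C2"],
    "A2": ["B1", "C1", "C2"],
}

# graphlib treats the listed values as predecessors, so each dependency
# is emitted before any package that depends on it.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Several orders are valid here (B1, C1, and C2 have no constraints among themselves), so the exact sequence printed should not be relied on; only the dependency-before-dependent property is guaranteed. `TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle.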
I have an intermediate Pig structure like:
(A, B, (n number of Cs))
Example:
(a1,b1, (c11,c12))
(a2,b2, (c21))
(a3,b3, (c31,c32, c33))
Now I want the data in this format:
(a1, b1, c11)
(a1, b1, c12)
(a2, b2, c21) etc.
How do I go about doing it?
Essentially, I want to get the size of the inner tuple and then use that size to run a nested for loop.
Can you try the approach below?
Input:
a1 b1 (c11,c12)
a2 b2 (c21)
a3 b3 (c31,c32,c33)
Pig script:
A = LOAD 'input' AS(f1,f2,T:(f3:chararray));
B = FOREACH A GENERATE f1,f2,FLATTEN(T);
C = FOREACH B GENERATE f1,f2,FLATTEN(TOKENIZE(T::f3));
DUMP C;
Output:
(a1,b1,c11)
(a1,b1,c12)
(a2,b2,c21)
(a3,b3,c31)
(a3,b3,c32)
(a3,b3,c33)
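For comparison outside Pig, the same reshaping in plain Python: each row's scalar fields are paired with every element of its inner tuple, so the tuple's size never has to be computed explicitly. This is only an illustration of what FLATTEN achieves here, with the input rows hard-coded as an assumption, not a replacement for the Pig script.

```python
# Rows from the question: (f1, f2, inner tuple of Cs).
rows = [
    ("a1", "b1", ("c11", "c12")),
    ("a2", "b2", ("c21",)),
    ("a3", "b3", ("c31", "c32", "c33")),
]

# Emit one output row per element of the inner tuple, mirroring FLATTEN;
# the comprehension's inner loop replaces the explicit size-based loop.
flat = [(f1, f2, c) for f1, f2, cs in rows for c in cs]
for row in flat:
    print(row)
```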
Is there any special design pattern or algorithm available for this problem?
There are multiple items (A1, A2, A3 ... An) and I want to arrange them; some of them are related to others and can only come before or after them.
For example, A2 can be placed only after A4, and “An” can only be placed at the end of the set. But the point is that the order of some items is interchangeable, and depending on the order, some items should not be in the set.
For example, consider this scenario.
There are 6 items:
A1, A2, A3, A4, A5, A6
And the rules are:
A1 must be in the first place (always),
A2 can only come after A4,
A5 can be in the set only if A3 appears before it,
A6 comes at the end of the set; it is a mandatory member, but it can only be there if all the other valid items have appeared before it!
Valid sets look like this:
A1, A4, A3, A2, A5, A6
A1, A4, A2, A6
Invalid sets:
A4, A3, A2, A5, A6 (A1 is missing)
A1, A4, A3, A2, A5 (A6 is missing)
A1, A3, A2, A6 (A2 can only come after A4)
Note: I have to validate the input, and the input can be in any order! I don't want to sort the items; I want to validate an input set from the user.
For example, based on my example above, the following sets are all valid:
{A1, A4, A3, A2, A5, A6}
{A1, A4, A2, A3, A5, A6}
{A1, A3, A4, A2, A5, A6}
{A1, A3, A5, A4, A2, A6}
So the user might enter any of these as input, and all of them are valid under the defined conditions!
Is there any special design pattern or algorithm that can be applied to this problem? The number of items or the rules might change in the future!
“BalusC” removed my “Design pattern” tag! But so far I think the best way to handle this problem might be the command pattern. I consider each item a command from the user and define a validation process (“canExecute”) for each command. I am going to code it in C#, and since the ICommand interface in .NET has a CanExecute method, I think I will use it to validate each command against the conditions (the Execute method just adds the item to the result set). I have not coded it yet, so I am not sure how complicated the validation process might be. I thought someone might have an idea of how to combine the command pattern with a validation algorithm to achieve these goals.
I might be wrong, so any idea or suggestion would be helpful. Thanks.
This should be solvable with a variation of topological sort.
Basically, you build a directed acyclic graph with an edge from Ai to Aj if Ai must come before Aj in the result. A topological sort then gives you a valid order for the A's.
This does not handle the rule that some items may be missing, but that should be simple to layer on top.
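Since the question asks to validate a user-entered order rather than produce one, a topological sort is not strictly needed: it is enough to check that the given sequence respects the same precedence edges, i.e. that every item appears only after all of its required predecessors. A minimal sketch, assuming a hypothetical rule encoding (`FIRST`, `LAST`, `REQUIRES_BEFORE`) of the example rules and treating A1, A4, A2, A6 as valid, as the question states. Keeping the rules as data means items and rules can change without touching the validator.

```python
from typing import Dict, List, Set

# Hypothetical encoding of the question's rules as data.
FIRST = "A1"   # must be in the first position
LAST = "A6"    # mandatory, must be in the last position
REQUIRES_BEFORE: Dict[str, Set[str]] = {
    "A2": {"A4"},  # A2 may appear only after A4
    "A5": {"A3"},  # A5 may appear only if A3 appeared before it
}

def is_valid(seq: List[str]) -> bool:
    """Check a user-entered order against the precedence rules without sorting it."""
    if not seq or seq[0] != FIRST or seq[-1] != LAST:
        return False
    if len(set(seq)) != len(seq):
        return False  # each item may appear at most once
    seen: Set[str] = set()
    for item in seq:
        # Every edge u -> item in the precedence graph must already be satisfied.
        if not REQUIRES_BEFORE.get(item, set()) <= seen:
            return False
        seen.add(item)
    return True

print(is_valid(["A1", "A3", "A5", "A4", "A2", "A6"]))
print(is_valid(["A1", "A3", "A2", "A6"]))
```

The first call accepts one of the question's valid inputs; the second rejects A1, A3, A2, A6 because A2 appears without A4 before it.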
A1, A4, A2, A6 cannot be a valid set, because of this rule:
“A6 comes at the end of the set and it is a mandatory member but it only can be there if all other valid items have been in the set before it”
I am looking for an efficient algorithm to synchronize two arrays. Let's say a1 and a2 are the two input arrays.
a1 - C, C++, Java, C#, Perl
a2 - C++, Python, Java, Cw, Haskell
Output: two arrays
Output A1: C, C++, Java
Output A2: Cw, Haskell, Python
Output A1:
1) items common to both arrays
2) items only in A1 and not in A2
Output A2:
items only in a2
Thanks in advance.
Raj
Sort both arrays with an efficient sorting algorithm, with complexity O(n log n).
Start with both output arrays empty.
Compare the first element a1 of sorted A1 with the first element a2 of sorted A2:
If they are equal, the item is in both arrays: put a1 into OutputA1 and advance to the next element in both sorted arrays.
If a1 < a2, then a1 is only in A1: put a1 into OutputA1, and a1 becomes the next element of sorted A1.
Otherwise a2 < a1, so a2 is only in A2: put a2 into OutputA2, and a2 becomes the next element of sorted A2.
Repeat until all elements of the sorted arrays have been processed; once one array is exhausted, the remaining elements of the other go into its own output array. This merge step is O(n).
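A minimal sketch of the sort-and-merge steps above, assuming neither input array contains duplicates (duplicates within one array would be copied to the output as-is). The function name `sync` is illustrative.

```python
from typing import List, Tuple

def sync(a1: List[str], a2: List[str]) -> Tuple[List[str], List[str]]:
    """out1 = items in both arrays plus items only in a1; out2 = items only in a2."""
    s1, s2 = sorted(a1), sorted(a2)  # O(n log n)
    out1: List[str] = []
    out2: List[str] = []
    i = j = 0
    while i < len(s1) and j < len(s2):  # single merge pass, O(n)
        if s1[i] == s2[j]:
            out1.append(s1[i])  # in both arrays
            i += 1
            j += 1
        elif s1[i] < s2[j]:
            out1.append(s1[i])  # only in a1
            i += 1
        else:
            out2.append(s2[j])  # only in a2
            j += 1
    out1.extend(s1[i:])  # leftovers of a1 are only in a1
    out2.extend(s2[j:])  # leftovers of a2 are only in a2
    return out1, out2

out1, out2 = sync(["C", "C++", "Java", "C#", "Perl"],
                  ["C++", "Python", "Java", "Cw", "Haskell"])
print(out1)
print(out2)
```

On the question's data, out2 is ['Cw', 'Haskell', 'Python'], as expected; out1 is ['C', 'C#', 'C++', 'Java', 'Perl'], since by the stated rules C# and Perl are items only in a1 and therefore belong in Output A1 as well.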
Given a DAG, in which each node belongs to a category, how can this graph be transformed into a table with a column for each category? The transformation doesn't have to be reversible, but should preserve useful information about the structure of the graph; and should be a 'natural' transformation, in the sense that a person looking at the graph and the table should not be surprised by any of the rows. It should also be compact, i.e. have few rows.
For example given a graph of nodes a1,b1,b2,c1 with edges a1->b1, a1->b2, b1->c1, b2->c1 (i.e. a diamond-shaped graph) I would expect to see the following table:
a b c
--------
a1 b1 c1
a1 b2 c1
I've thought about this problem quite a bit, but I'm having trouble coming up with an algorithm that gives intuitive results on certain graphs. Consider the graph a1,b1,c1 with edges a1->c1, b1->c1. I'd like the algorithm to produce this table:
a b c
--------
a1 b1 c1
But maybe it should produce this instead:
a b c
--------
a1    c1
a1 b1
I'm looking for creative ideas and insights into the problem. Feel free to vary, simplify, or constrain the problem if you think it will help.
Brainstorm away!
Edit:
The transformation should always produce the same set of rows, although the order of rows does not matter.
The table should behave nicely when sorting and filtering in, e.g., Excel. This means that multiple nodes cannot be packed into a single cell of the table: only one node per cell.
What you need is a variation of topological sorting. This algorithm "sorts" graph vertices as if an a---->b edge meant a > b. Since the graph is a DAG, it has no cycles and this > relation is transitive, so at least one sorting order exists.
For your diamond-shaped graph two topological orders exist:
a1 b1 b2 c1
a1 b2 b1 c1
The b1 and b2 items are not connected, even indirectly, so they may be placed in either order.
After sorting the graph, you have an approximation of the order. My proposal is to fill the table in a straightforward way (one vertex per row) and then "compact" it. Perform the sort and take the sequence you get as output. Fill the table from top to bottom, assigning each vertex to its own column:
a  b  c
--------
a1
   b2
   b1
      c1
Now compact the table by walking from top to bottom (and then making a similar pass from bottom to top). On each iteration, look closely at the "current" row (marked =>) and the "next" row.
If the current and next rows hold different nodes in a column, do nothing for that column:
from ---->         to
   X  b  c            X  b  c
   --------           --------
=> X1 .  .            X1 .  .
   X2 .  .         => X2 .  .
If, in column X, the next row has no vertex (the cell is empty) and the current row has vertex X1, then you sometimes should fill this empty cell with the vertex from the current row. But not always: you want your table to be logical, don't you? So copy the vertex if and only if there's no edge b--->X1, c--->X1, etc., for all vertices in the current row.
from --->          to
   X  b  c            X  b  c
   --------           --------
=> X1 b  c            X1 b  c
      b1 c1        => X1 b1 c1
(Edit:) After the first (forward) and second (backward) passes, you'll have these tables:
first              second
a  b  c            a  b  c
--------           --------
a1                 a1 b2 c1
a1 b2              a1 b2 c1
a1 b1              a1 b1 c1
a1 b1 c1           a1 b1 c1
Then just remove duplicate rows and you're done:
a b c
--------
a1 b2 c1
a1 b1 c1
And you should get a nice table, in O(n^2) time.
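A sketch of the sort-fill-compact procedure, under explicit assumptions: a node's column is the first letter of its name, and the copy condition is read as "copy X1 into the empty cell only if X1 is an ancestor or descendant of every vertex already in the next row". That reading is an assumption, since the answer phrases the condition in terms of edges; it is chosen because it reproduces the first/second tables shown for the diamond graph. Function names are hypothetical; the topological sort uses Python's standard `graphlib` (3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def reach(edges):
    """Map each vertex to the set of vertices reachable from it."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    memo = {}
    def dfs(n):
        if n not in memo:
            memo[n] = set()
            for m in succ[n]:
                memo[n] |= {m} | dfs(m)
        return memo[n]
    for n in succ:
        dfs(n)
    return memo

def compact_by_passes(edges, cols):
    r = reach(edges)
    comparable = lambda x, y: y in r[x] or x in r[y]
    preds = {}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    order = list(TopologicalSorter(preds).static_order())
    rows = [{v[0]: v} for v in order]  # one vertex per row; column = first letter (assumption)

    def sweep(seq):
        # Rows are shared dicts, so a value copied into `nxt` carries on to later steps.
        for cur, nxt in zip(seq, seq[1:]):
            for col in cols:
                x1 = cur.get(col)
                if x1 and col not in nxt and all(comparable(x1, v) for v in nxt.values()):
                    nxt[col] = x1

    sweep(rows)        # first pass: top to bottom
    sweep(rows[::-1])  # second pass: bottom to top
    return {tuple(row.get(c, "") for c in cols) for row in rows}  # duplicates removed

diamond = [("a1", "b1"), ("a1", "b2"), ("b1", "c1"), ("b2", "c1")]
print(sorted(compact_by_passes(diamond, ["a", "b", "c"])))
```

For the diamond graph this yields the rows a1 b1 c1 and a1 b2 c1, whichever of the two topological orders `graphlib` picks. Empty cells come back as empty strings.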
How about compacting all the nodes reachable from one node together into one cell? For example, your first DAG would look like:
a   b        c
---------------
a1  [b1,b2]
    b1       c1
    b2       c1
It sounds like a train system map with stations within zones (a, b, c).
You could be generating a table of all possible routes in one direction, in which case "a1, b1, c1" would seem to imply a1->b1, so don't format it that way if you only have a1->c1 and b1->c1.
You could decide to produce the table by listing the longest routes starting in zone a,
using each edge only once, and ending with the short leftover routes. Or allow edges to be reused only if they connect unused edges or extend a route.
In other words, do a depth-first search, trying not to reuse edges (reject any path that doesn't include unused edges, and optionally trim used edges at the endpoints).
Here's what I ended up doing:
1) Find all paths starting from a node without in-edges. (This could be expensive for some graphs, but it works for mine.)
2) Traverse each path to collect a row of values.
3) Compact the rows.
Compacting the rows is done as follows.
For each pair of columns x, y:
1) Construct a map from every value of x to its possible values of y.
2) Create another map for the entries that have only one distinct value of y, mapping the value of x to its single value of y.
3) Fill in the blanks using these maps. When filling in a value, check for related blanks that can now be filled.
This gives a very compact output and seems to meet all my requirements.
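A sketch of the approach described above, under stated assumptions: a node's category (column) is the first letter of its name, and "check for related blanks" is implemented by repeating the column-pair pass until nothing changes. The function names are hypothetical. As the author notes, enumerating all paths can be exponential for some graphs.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

def all_paths(edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Every path from a node without in-edges to a node without out-edges."""
    children: Dict[str, List[str]] = {}
    has_parent: Set[str] = set()
    for u, v in edges:
        children.setdefault(u, []).append(v)
        children.setdefault(v, [])
        has_parent.add(v)
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        path = path + [node]
        if not children[node]:
            paths.append(path)
        for child in children[node]:
            walk(child, path)

    for node in children:
        if node not in has_parent:
            walk(node, [])
    return paths

def compact_rows(rows: List[Dict[str, str]], cols: List[str]) -> Set[Tuple[str, ...]]:
    """Fill a blank in column y whenever the row's value in column x implies a unique y."""
    changed = True
    while changed:  # repeat so newly filled values can fill related blanks
        changed = False
        for x, y in permutations(cols, 2):
            seen: Dict[str, Set[str]] = {}
            for r in rows:
                if x in r and y in r:
                    seen.setdefault(r[x], set()).add(r[y])
            unique = {k: next(iter(v)) for k, v in seen.items() if len(v) == 1}
            for r in rows:
                if x in r and y not in r and r[x] in unique:
                    r[y] = unique[r[x]]
                    changed = True
    return {tuple(r.get(c, "") for c in cols) for r in rows}  # identical rows merged

def dag_to_table(edges: List[Tuple[str, str]], cols: List[str]) -> Set[Tuple[str, ...]]:
    # Assumption: a node's column is the first letter of its name.
    rows = [{node[0]: node for node in path} for path in all_paths(edges)]
    return compact_rows(rows, cols)

print(dag_to_table([("a1", "c1"), ("b1", "c1")], ["a", "b", "c"]))
```

For the graph with edges a1->c1 and b1->c1, the two paths a1-c1 and b1-c1 fill each other's blanks through column c, giving the single row a1 b1 c1 that the question originally hoped for; the diamond graph yields a1 b1 c1 and a1 b2 c1.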