Is it possible to collapse multiple paths, as shown in the picture, with Graphviz? If so, any suggestions on how this could be achieved?
I'm used to the weeds.
Comment out the unwanted nodes and edges and add 3 nodes with shape=point. You might also create invisible edges to/from the new nodes; I did not.
This:
digraph ellipsis {
  rankdir=RL
  nodesep=.1
  node [shape=circle]
  {
    rank=same
    B1
    B2
    B3
    /* comment out unwanted nodes
    B4
    B5
    ...
    */
    e4 [shape=point]
    e5 [shape=point]
    e6 [shape=point]
    B99
    edge [style=invis]
    B1 -> B2 -> B3 -> e4 -> e5 -> e6 -> B99
  }
  C -> B1
  C -> B2
  C -> B3
  /* comment out unwanted edges
  C -> B4
  */
  C -> B99
  B1 -> A
  B2 -> A
  B3 -> A
  /* comment out more unwanted edges
  B4 -> A
  */
  B99 -> A
}
Produces this:
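If the list of middle nodes is long, the same DOT text can be generated rather than edited by hand. The following is a minimal Python sketch, not part of the original answer: the function name `collapsed_dot`, its parameters, and the placeholder names `e1`, `e2`, ... are all assumptions for illustration, and it assumes the list has more entries than `keep_first + 1` and that no real node is named like a placeholder.

```python
def collapsed_dot(middle, keep_first=3, dots=3, source="C", sink="A"):
    """Build DOT text showing the first `keep_first` and the last node of
    `middle`, with `dots` point-shaped placeholders for the omitted ones."""
    shown = middle[:keep_first] + middle[-1:]
    # Hypothetical placeholder names; they must not clash with real nodes.
    placeholders = [f"e{i}" for i in range(1, dots + 1)]
    chain = middle[:keep_first] + placeholders + middle[-1:]

    lines = [
        "digraph ellipsis {",
        "  rankdir=RL",
        "  nodesep=.1",
        "  node [shape=circle]",
        "  {",
        "    rank=same",
    ]
    lines += [f"    {p} [shape=point]" for p in placeholders]
    # Invisible edges keep the shown nodes and the dots in one ordered row.
    lines += ["    edge [style=invis]", "    " + " -> ".join(chain), "  }"]
    # Only the nodes that remain visible get real edges.
    for node in shown:
        lines.append(f"  {source} -> {node}")
        lines.append(f"  {node} -> {sink}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(collapsed_dot([f"B{i}" for i in range(1, 100)]))
```

The printed text can be saved to a file and rendered with the usual dot command.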
Given a tree of nodes find a rooted sub-tree that contains a set of predefined values. The nodes in the tree are unique but their associated values may be repeated.
Ideally the most shallow sub-tree is returned. The sub-tree may also be returned simply as an array of nodes (or their unique IDs).
Here are several atomic test cases with the ideal resulting sub-tree:
#1 [A, A, B, C]
A1 Answer: A1
/ | \ / \
B1 D1 C1 B1 C1
/ \ \ /
A2 B2 B3 A2
#2 [A, B, B, A]
A1 Answer(s): A1 A1 A1 the first solution is preferred
/ \ / \ / \ as it is most shallow
B1 B2 B1 B2 B1 B2
/ \ \ / / \ \
A2 B3 B4 A2 A2 B3 B4
\ \
A3 A3
#3 [A, A, B, C]
A1 Answer: Not possible as only one B can be matched
/ \
B1 B2
/ \
A2 C1
#4 [B, B]
A1 Answer: Not possible as the root 'A' is not in the set
/ \
B1 B2
My approach was broken into two steps:
Breadth-first scan until all nodes in the set are found returning a tree that definitely contains the desired sub-tree.
Use backtracking to search the resulting sub-tree (essentially all permutations of the sub-tree) to find the exact nodes that satisfy the set.
However, this solution is not very efficient. It seems like I should be able to find the desired sub-tree simply by using a modified breadth-first search. I've also been unable to make this work in practice.
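For reference, the two steps can be folded into one depth-limited backtracking search. This is only a sketch under assumptions that the question does not state explicitly: the sub-tree's values must match the wanted list exactly (as a multiset), and "most shallow" is taken to mean the smallest maximum depth. The names `find_subtree`, `children`, and `value` are made up for the example, and the running time is still exponential in the worst case.

```python
from collections import Counter


def find_subtree(children, value, root, wanted):
    """Return node ids of a rooted sub-tree whose values match `wanted`
    exactly, preferring the smallest maximum depth, or None if impossible.

    children: {node: [child, ...]}, value: {node: value}.
    """
    need = Counter(wanted)

    def levels(node):
        return 1 + max((levels(c) for c in children.get(node, [])), default=0)

    def search(frontier, chosen, limit):
        if len(chosen) == len(wanted):
            return list(chosen)
        if not frontier:
            return None
        (node, depth), rest = frontier[0], frontier[1:]
        v = value[node]
        # Option 1: take this node (its children become candidates).
        if depth <= limit and need[v] > 0:
            need[v] -= 1
            chosen.append(node)
            kids = [(c, depth + 1) for c in children.get(node, [])]
            found = search(rest + kids, chosen, limit)
            chosen.pop()
            need[v] += 1
            if found:
                return found
        # Option 2: skip this node, and with it its whole sub-tree.
        return search(rest, chosen, limit)

    # Try increasingly deep limits so the shallowest match is found first.
    for limit in range(levels(root)):
        found = search([(root, 0)], [], limit)
        if found:
            return found
    return None


if __name__ == "__main__":
    # Test case #1, as read from the drawing above.
    children = {"A1": ["B1", "D1", "C1"], "B1": ["A2", "B2"], "C1": ["B3"]}
    value = {n: n[0] for n in ["A1", "B1", "D1", "C1", "A2", "B2", "B3"]}
    print(find_subtree(children, value, "A1", ["A", "A", "B", "C"]))
```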
I have a problem with clustering clients.
I have a dataset with columns such as name, address, email, phone, etc. (A, B, C in the example). Each row has a unique identifier (ID). I need to assign a CLUSTER_ID (X) to each row. Within one cluster, every row shares at least one attribute value with some other row of that cluster. So if clients with ID=1,2,3 have the same A attribute and clients with ID=3,10 have the same B attribute, then ID=1,2,3,10 should all be in the same cluster.
How can I solve this problem using SQL?
If that's not possible, how would I write the algorithm (pseudocode)?
Performance is very important, because the dataset contains millions of rows.
Sample Input:
ID A B C
1 A1 B3 C1
2 A1 B2 C5
3 A1 B10 C10
4 A2 B1 C5
5 A2 B8 C1
6 A3 B1 C4
7 A4 B6 C3
8 A4 B3 C5
9 A5 B7 C2
10 A6 B10 C3
11 A8 B5 C4
Sample Output:
ID A B C X
1 A1 B3 C1 1
2 A1 B2 C5 1
3 A1 B10 C10 1
4 A2 B1 C5 1
5 A2 B8 C1 1
6 A3 B1 C4 1
7 A4 B6 C3 1
8 A4 B3 C5 1
9 A5 B7 C2 2
10 A6 B10 C3 1
11 A8 B5 C4 1
Thanks for any help.
A possible way is to repeat updates on the rows that still have an empty X.
Start with cluster_id 1.
For example, by using a variable.
SET @CurrentClusterID = 1
Take the top 1 record, and update its X to 1.
Now loop an update for all records with an empty X
that can be linked to a record with X = 1 by having the same A or B or C.
Disclaimer:
The statement will vary depending on the RDBMS.
This is just intended as pseudo-code.
WHILE (<<some check to see if there were records updated>>)
BEGIN
    UPDATE yourtable t
    SET t.X = @CurrentClusterID
    WHERE t.X IS NULL
      AND EXISTS (
        SELECT 1 FROM yourtable d
        WHERE d.X = @CurrentClusterID
          AND (d.A = t.A OR d.B = t.B OR d.C = t.C)
      );
END
Loop that until it updates 0 records.
Now repeat the method for the other clusters, until there are no more empty X in the table.
1) Increase @CurrentClusterID by 1
2) Update the next top 1 record with an empty X to the new @CurrentClusterID
3) Loop the update until no more updates are done.
An example test on db<>fiddle here for MS Sql Server.
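Outside the database, the same grouping can be computed in a single pass with a union-find (disjoint-set) structure. Note that this is a different technique from the repeated UPDATE above, shown only as a sketch of the "algorithm" variant the question asks about. The function name `cluster_rows` and the tuple layout are assumptions, and missing values (None) are skipped so that they never link rows, mirroring how `=` treats NULL in SQL.

```python
def cluster_rows(rows):
    """rows: iterable of (row_id, attr1, attr2, ...) tuples.
    Returns {row_id: cluster_id}; rows sharing any attribute value in the
    same column, directly or through other rows, get the same cluster id."""
    rows = list(rows)
    parent = {}

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited node straight at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_owner = {}  # (column index, value) -> first row id seen with it
    for row_id, *attrs in rows:
        parent[row_id] = row_id
        for col, val in enumerate(attrs):
            if val is None:
                continue
            key = (col, val)
            if key in first_owner:
                union(first_owner[key], row_id)
            else:
                first_owner[key] = row_id

    # Number the clusters 1, 2, ... in order of first appearance.
    cluster_ids = {}
    return {
        row_id: cluster_ids.setdefault(find(row_id), len(cluster_ids) + 1)
        for row_id, *_ in rows
    }


if __name__ == "__main__":
    sample = [
        (1, "A1", "B3", "C1"), (2, "A1", "B2", "C5"), (3, "A1", "B10", "C10"),
        (4, "A2", "B1", "C5"), (5, "A2", "B8", "C1"), (6, "A3", "B1", "C4"),
        (7, "A4", "B6", "C3"), (8, "A4", "B3", "C5"), (9, "A5", "B7", "C2"),
        (10, "A6", "B10", "C3"), (11, "A8", "B5", "C4"),
    ]
    print(cluster_rows(sample))
```

On the sample input from the question, row 9 ends up alone in cluster 2 and every other row lands in cluster 1, matching the sample output.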
Task
I want to calculate the permanent P of a NxN matrix for N up to 100. I can make use of the fact that the matrix features only M=4 (or slightly more) different rows and cols. The matrix might look like
A1 ... A1 B1 ... B1 C1 ... C1 D1 ... D1 |
... | r1 identical rows
A1 ... A1 B1 ... B1 C1 ... C1 D1 ... D1 |
A2 ... A2 B2 ... B2 C2 ... C2 D2 ... D2
...
A2 ... A2 B2 ... B2 C2 ... C2 D2 ... D2
A3 ... A3 B3 ... B3 C3 ... C3 D3 ... D3
...
A3 ... A3 B3 ... B3 C3 ... C3 D3 ... D3
A4 ... A4 B4 ... B4 C4 ... C4 D4 ... D4
...
A4 ... A4 B4 ... B4 C4 ... C4 D4 ... D4
---------
c1 identical cols
where c and r are the multiplicities of the cols and rows. All values in the matrix lie between 0 and 1 and are encoded as double-precision floating-point numbers.
Algorithm
I tried to use the Ryser formula to calculate the permanent. For the formula, one needs to first calculate the sum of each row and multiply all the row sums. For the matrix above this yields
S0 = (c1 * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* (c1 * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
As a next step the same is done with col 1 deleted
S1 = ((c1-1) * A1 + c2 * B1 + c3 * C1 + c4 * D1)^r1 * ...
* ((c1-1) * A4 + c2 * B4 + c3 * C4 + c4 * D4)^r4
and this number is subtracted from S0.
The algorithm continues with all possible ways to delete single cols and groups of cols; the products of the row sums of the remaining matrix are added (even number of cols deleted) or subtracted (odd number of cols deleted).
The task can be solved relatively efficiently if one makes use of the identical cols (for example, the result S1 will pop up exactly c1 times).
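Written out, the grouped sum runs over the number k_j of cols deleted from each col group, with a binomial factor counting how often each product occurs. The following Python sketch illustrates only that bookkeeping (the post itself uses MATLAB). The names `grouped_ryser_permanent`, `values`, `row_mult`, and `col_mult` are assumptions, the multiplicities are assumed to be at least 1 with equal totals for rows and cols, and the code uses plain doubles, so it does nothing about the cancellation issue described under Problem.

```python
from itertools import product
from math import comb


def grouped_ryser_permanent(values, row_mult, col_mult):
    """Permanent of a matrix built from repeated rows and cols.

    values[i][j] is the entry shared by row group i and col group j;
    row_mult[i] and col_mult[j] are the multiplicities r_i and c_j.
    """
    total = 0.0
    # k[j] = number of cols deleted from col group j.
    for k in product(*(range(c + 1) for c in col_mult)):
        sign = -1 if sum(k) % 2 else 1  # odd number deleted: subtract
        ways = 1
        for c, kj in zip(col_mult, k):
            ways *= comb(c, kj)  # how many col subsets give this k
        term = 1.0
        for row, r in zip(values, row_mult):
            row_sum = sum((c - kj) * a for a, c, kj in zip(row, col_mult, k))
            term *= row_sum ** r
        total += sign * ways * term
    return total


if __name__ == "__main__":
    # 3x3 all-ones matrix: one row group, one col group.
    print(grouped_ryser_permanent([[1.0]], [3], [3]))
```

With M col groups the loop visits (c1+1)*...*(cM+1) combinations instead of 2^N col subsets.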
Problem
Even if the final result is small, the intermediate results S0, S1, ... can reach values up to N^N. A double can hold such a number, but the absolute precision for numbers this large is below or on the order of the expected overall result. The expected result P is on the order of c1!*c2!*c3!*c4! (I am actually interested in P/(c1!*c2!*c3!*c4!), which should lie between 0 and 1).
I tried to arrange the additions and subtractions of the values S so that the sums of the intermediate results stay around 0. This helps in the sense that I can avoid intermediate results exceeding N^N, but it improves things only a little. I also thought about using logarithms for the intermediate results to keep the absolute numbers down, but the relative accuracy of the encoded numbers would still be bounded by the floating-point encoding, so I think I would run into the same problem. If possible, I want to avoid data types that implement variable-precision arithmetic, for performance reasons (currently I am using MATLAB).
I want, for example, Mathematica to generate 7 + 5f if I write the expression (2+f) (3+f). I always want f^2 to be computed as 1 (or any other value I assign to it), but f itself should remain a special undefined symbol. If I define f^2:=1, I get a "Tag Power is protected" error message.
I am a Mathematica newbie, self-taught, so please try to answer this in as elementary a fashion as possible.
For the record, I am trying to define Clifford algebra operations in n-dimensional space-time and being able to make an assignment like this would tremendously simplify the task.
Generalized to all symbols e1,e2,e3,...,en
x = (a + a1 e1 + a2 e2 + a3 e3 + a4 e1 e2 - a5 e1 e3 + a6 e2 e3 +
a7 e1 e2 e3);
y = (b + b1 e1 + b2 e2 + b3 e3 + b4 e1 e2 - b5 e1 e3 + b6 e2 e3 +
b7 e1 e2 e3);
ReplaceAll[
Expand[x y],
Power[e_, 2] /; First[Characters[ToString[e]]] === "e" -> 1
]
This way, which I have just learned from @Edmund, is more elegant:
Expand[(2 + e1)(3 + e2)] /. Power[s_Symbol, 2] /; StringStartsQ["e"]@SymbolName[s] -> 1
6 + 3 e1 + 2 e2 + e1 e2
ReplaceAll[Expand[(2 + f) (3 + f)], Power[f, 2] -> 1]
7 + 5 f
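To see where 7 + 5 f comes from: expanding gives 6 + 5 f + f^2, and replacing f^2 by 1 leaves 7 + 5 f. The same arithmetic can be checked with a small Python sketch (not Mathematica) that stores a + b f as a pair (a, b). The function name `multiply` and the parameter `f_squared` are made up for the illustration.

```python
def multiply(x, y, f_squared=1):
    """Multiply (a + b*f) by (c + d*f), replacing f*f by `f_squared`.

    Each operand is a pair (constant part, coefficient of f).
    """
    a, b = x
    c, d = y
    # (a + b f)(c + d f) = a c + (a d + b c) f + b d f^2
    return (a * c + b * d * f_squared, a * d + b * c)


if __name__ == "__main__":
    const, coeff = multiply((2, 1), (3, 1))
    print(f"{const} + {coeff} f")
```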
I have the following tree:
digraph G {
  subgraph cluster0{
    37[label="+"];
    42[label="a"];
    44[label="b"];
    47[label="*"];
    46[label="c"];
    49[label="d"];
    51[label="e"];
    53[label="f"];
    55[label="g"];
    57[label="h"];
    61[label="*"];
    60[label="i"];
    63[label="j"];
    37 -> 42[label="c"];
    37 -> 44[label="c"];
    37 -> 47[label="c"];
    37 -> 61[label="c"];
    42 -> 37[label="p"];
    44 -> 37[label="p"];
    47 -> 37[label="p"];
    47 -> 46[label="c"];
    47 -> 49[label="c"];
    47 -> 51[label="c"];
    47 -> 53[label="c"];
    47 -> 55[label="c"];
    47 -> 57[label="c"];
    46 -> 47[label="p"];
    49 -> 47[label="p"];
    51 -> 47[label="p"];
    53 -> 47[label="p"];
    55 -> 47[label="p"];
    57 -> 47[label="p"];
    61 -> 37[label="p"];
    61 -> 60[label="c"];
    61 -> 63[label="c"];
    60 -> 61[label="p"];
    63 -> 61[label="p"];
  }
}
Output is here: http://i.imgur.com/q1qXkCT.png
Order of children in first * subtree is: G H C D E F, but it should be C D E F G H.
I have noticed that if I delete subgraph cluster0{ the order is right, but I can't do it this way.
Can you suggest any other solution?
Graphviz attempts to retain the lexical ordering of nodes when there is no other constraint. However, the edge labels can affect placement as they take up space and can push nodes around.
If you have a specific order that is essential, then try something like
{ rank = same;
46 -> 49 -> 51 -> 53 -> 55 -> 57 [style = invis];
}
to introduce the additional ordering constraint into the graph.
You need to be careful with this though, as it can distort more complex graphs in ways that are very difficult to predict.
Clusters further complicate matters in larger graphs as they implicitly attempt to make the subgraph more compact and introduce a bounding box that non-cluster members cannot cross.