When I run Graphviz on a specific graph, I get
aromanov#ws:~/IdeaProjects/scalan$ dot -v -O -Tpng myfile.dot
dot - graphviz version 2.26.3 (20100126.1600)
Activated plugin library: libgvplugin_pango.so.6
Using textlayout: textlayout:cairo
Activated plugin library: libgvplugin_dot_layout.so.6
Using layout: dot:dot_layout
Using render: cairo:cairo
Using device: png:cairo:cairo
The plugin configuration file:
/usr/lib/graphviz/config6
was successfully loaded.
render : cairo dot fig gd map ps svg tk vml vrml xdot
layout : circo dot fdp neato nop nop1 nop2 osage patchwork sfdp twopi
textlayout : textlayout
device : canon cmap cmapx cmapx_np dot eps fig gd gd2 gif gv imap imap_np ismap jpe jpeg jpg pdf plain plain-ext png ps ps2 svg svgz tk vml vmlz vrml wbmp x11 xdot xlib
loadimage : (lib) eps gd gd2 gif jpe jpeg jpg png ps svg
fontname: "Times-Roman" resolved to: (ps:pango Times Roman,) (PangoCairoFcFont) "DejaVu Sans 14"
network simplex: 605 nodes 1434 edges maxiter=2147483647 balance=1
network simplex: 100
network simplex: 605 nodes 1434 edges 131 iter 0.01 sec
mincross: pass 0 iter 0 trying 0 cur_cross 28683 best_cross 28683
mincross: pass 0 iter 1 trying 0 cur_cross 21867 best_cross 21867
mincross: pass 0 iter 2 trying 0 cur_cross 11534 best_cross 11534
mincross: pass 0 iter 3 trying 0 cur_cross 8949 best_cross 8949
mincross: pass 1 iter 0 trying 0 cur_cross 8701 best_cross 6900
mincross: pass 1 iter 1 trying 1 cur_cross 14055 best_cross 6900
mincross: pass 1 iter 2 trying 2 cur_cross 11429 best_cross 6900
mincross: pass 1 iter 3 trying 3 cur_cross 7558 best_cross 6900
mincross: pass 2 iter 0 trying 0 cur_cross 6190 best_cross 6190
mincross: pass 2 iter 1 trying 1 cur_cross 11316 best_cross 6190
mincross: pass 2 iter 2 trying 2 cur_cross 11511 best_cross 6190
mincross: pass 2 iter 3 trying 3 cur_cross 7098 best_cross 6190
mincross: pass 2 iter 4 trying 4 cur_cross 6628 best_cross 6190
mincross: pass 2 iter 5 trying 5 cur_cross 13131 best_cross 6190
mincross: pass 2 iter 6 trying 6 cur_cross 11633 best_cross 6190
mincross: pass 2 iter 7 trying 7 cur_cross 7562 best_cross 6190
mincross: pass 2 iter 8 trying 8 cur_cross 6800 best_cross 6190
merge2: graph G, rank 5 has only 52 < 53 nodes
merge2: graph G, rank 9 has only 82 < 83 nodes
merge2: graph G, rank 30 has only 123 < 124 nodes
merge2: graph G, rank 38 has only 141 < 142 nodes
merge2: graph G, rank 42 has only 148 < 149 nodes
merge2: graph G, rank 59 has only 172 < 173 nodes
merge2: graph G, rank 60 has only 177 < 178 nodes
merge2: graph G, rank 61 has only 179 < 180 nodes
merge2: graph G, rank 62 has only 185 < 187 nodes
merge2: graph G, rank 63 has only 187 < 189 nodes
merge2: graph G, rank 64 has only 188 < 190 nodes
merge2: graph G, rank 65 has only 186 < 188 nodes
merge2: graph G, rank 66 has only 189 < 190 nodes
merge2: graph G, rank 74 has only 207 < 208 nodes
merge2: graph G, rank 80 has only 222 < 223 nodes
merge2: graph G, rank 81 has only 226 < 227 nodes
merge2: graph G, rank 82 has only 226 < 227 nodes
merge2: graph G, rank 83 has only 228 < 229 nodes
merge2: graph G, rank 84 has only 230 < 231 nodes
merge2: graph G, rank 85 has only 232 < 233 nodes
merge2: graph G, rank 86 has only 232 < 233 nodes
merge2: graph G, rank 87 has only 232 < 233 nodes
merge2: graph G, rank 88 has only 236 < 237 nodes
merge2: graph G, rank 89 has only 239 < 240 nodes
merge2: graph G, rank 90 has only 244 < 245 nodes
merge2: graph G, rank 91 has only 246 < 247 nodes
merge2: graph G, rank 118 has only 177 < 178 nodes
mincross G: 6189 crossings, 8.40 secs.
network simplex: 57721 nodes 86837 edges maxiter=2147483647 balance=2
network simplex: 100 200 300 400 500 600 700 800 900 1000
network simplex: 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000
network simplex: 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000
network simplex: 3100 3200 3300 3400 3500 3600 3700 3800 3900 4000
network simplex: 4100 4200 4300 4400 4500 4600 4700 4800 4900 5000
network simplex: 5100 5200 5300 5400 5500 5600 5700 5800 5900 6000
And so on. For some reason it is attempting to work with 57721 nodes, when the graph only has 605 (as the beginning of the output says). Is there a way to tell it to stop, perhaps at the cost of a worse layout? I've tried other layouts as well: neato and twopi produced a complete mess with everything overlapping, fdp is somewhat better but still very bad, and circo seems to hang as well. The Graphviz version is 2.26.3, which is unfortunately the latest available for Debian stable.
We fixed that. Debian should get the latest release; version 2.26 is over 4 years old. Try installing it from http://www.graphviz.org/Download_linux_ubuntu.php
Note that the number of nodes reported includes the virtual ("dummy") nodes created to route edges across the levels of a ranked graph: an edge spanning r ranks contributes about r-1 virtual nodes, which is how 605 real nodes become 57721 here. You can get quadratic blowup if the graph has a lot of "long" edges. This is not a bug.
For 605 nodes, I'd suggest neato -Goverlap=false, or -Elen=2 or 3, or sfdp (it ignores edge lengths but seems better at avoiding overlaps).
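For example, with the same file as above, that would be something like:
neato -Goverlap=false -Elen=2 -Tpng -O myfile.dot
sfdp -Tpng -O myfile.dot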
With |E| ~= 3|V| your graph is not necessarily too dense or difficult to lay out.
Stephen North
I have a 2D grid where some of the tiles are obstacles (walls). I want to find the shortest path that goes around the grid and lets you see all of the other tiles in the map, given a radius of view. Here is a pixel-art example (black tiles are the obstacles, gray is an arbitrary path).
Set R to be the "radius of view"
Create an orthogonal point grid with separation 2 * R
Remove grid points that collide with obstacles
Add connections from each point to the 9 or fewer closest points. Do not connect across obstacles
Calculate the minimum spanning tree of the remaining points (a rough sketch of these steps follows below)
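To make the steps concrete, here is a rough Python sketch of them (my own illustration, not the PathFinder code; it assumes obstacles is a set of blocked (x, y) tiles and uses a crude midpoint test for "do not connect across obstacles"):

import math
from itertools import combinations

def coverage_tree(width, height, R, obstacles):
    """Grid sampling at separation 2*R, then Kruskal's MST over the links."""
    step = 2 * R
    points = [(x, y) for x in range(0, width, step)
                     for y in range(0, height, step)
                     if (x, y) not in obstacles]
    edges = []
    for p, q in combinations(points, 2):
        dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
        if dx <= step and dy <= step:                  # neighbours on the coarse grid
            mid = ((p[0] + q[0]) // 2, (p[1] + q[1]) // 2)
            if mid in obstacles:                       # crude "across an obstacle" test
                continue
            edges.append((math.hypot(dx, dy), p, q))
    parent = {p: p for p in points}                    # union-find for Kruskal
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    tree = []
    for w, p, q in sorted(edges):                      # cheapest links first
        rp, rq = find(p), find(q)
        if rp != rq:                                   # joins two components
            parent[rp] = rq
            tree.append((p, q, w))
    return tree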
Specify an "obstacle course" with a 20 by 20 gris, a view radius of 2 and three groups of abstacles
20 20 2
4 4
5 5
6 6
14 9
14 10
14 11
14 12
14 13
14 14
14 15
14 16
10 16
11 16
12 16
13 16
4 14
4 15
5 15
It looks like this: [image of the obstacle course]
Adjacency list ( link id, node1 id, node2 id, distance )
l 42 47 5
l 42 142 5
l 47 52 5
l 47 152 7.07107
l 52 57 5
l 52 147 7.07107
l 52 152 5
l 52 157 7.07107
l 57 152 7.07107
l 57 157 5
l 142 242 5
l 142 247 7.07107
l 147 152 5
l 147 242 7.07107
l 147 247 5
l 147 252 7.07107
l 152 247 7.07107
l 152 252 5
l 157 257 5
l 242 247 5
l 242 342 5
l 247 252 5
l 247 347 5
l 257 357 5
l 342 347 5
l 347 352 5
l 352 357 5
Pass this graph into your favorite graph theory library to get the minimum spanning tree.
Here is the result when I use the PathFinder application: [image of the resulting spanning tree]
The spanning tree is useful if, for example, the robot needs to return to a base frequently to recharge. However, if your robot does not need to do this, then it involves a lot of unnecessary backtracking.
A tour of the nodes that visits all of them with minimal backtracking can be calculated in various ways (see the travelling salesman problem). I use a depth-first search of the spanning tree, plus the Dijkstra algorithm to find the nearest unvisited node when the robot gets trapped at a leaf of the spanning tree. This quickly gives a reasonably effective result.
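A Python sketch of that tour construction (my reading of the description; tree and graph are adjacency dicts mapping each node to its neighbours, and nearest_unvisited is a hypothetical helper that runs a standard Dijkstra and returns the closest unvisited node together with the path to it):

def plan_tour(tree, graph, start, nearest_unvisited):
    tour, visited, stack = [start], {start}, [start]
    while len(visited) < len(tree):
        if stack:
            node = stack[-1]
            nxt = next((n for n in tree[node] if n not in visited), None)
            if nxt is None:
                stack.pop()                   # backtrack one step up the tree
                continue
        else:
            # trapped at a leaf with nothing left behind us: hop to the
            # nearest unvisited node via the full graph
            nxt, path = nearest_unvisited(graph, tour[-1], visited)
            tour.extend(path[1:-1])           # pass through intermediate nodes
        visited.add(nxt)
        tour.append(nxt)
        stack.append(nxt)
    return tour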
Application coded in C++, available at https://github.com/JamesBremner/obstacles
I want to understand the engine log of IBM ILOG CPLEX Studio for an ILP model. I have checked their documentation as well, but was not able to get a clear idea.
Example of an engine log:
Version identifier: 22.1.0.0 | 2022-03-09 | 1a383f8ce
Legacy callback pi
Tried aggregator 2 times.
MIP Presolve eliminated 139 rows and 37 columns.
MIP Presolve modified 156 coefficients.
Aggregator did 11 substitutions.
Reduced MIP has 286 rows, 533 columns, and 3479 nonzeros.
Reduced MIP has 403 binaries, 0 generals, 0 SOSs, and 129 indicators.
Presolve time = 0.05 sec. (6.16 ticks)
Found incumbent of value 233.000000 after 0.07 sec. (9.40 ticks)
Probing time = 0.00 sec. (1.47 ticks)
Tried aggregator 2 times.
Detecting symmetries...
Aggregator did 2 substitutions.
Reduced MIP has 284 rows, 531 columns, and 3473 nonzeros.
Reduced MIP has 402 binaries, 129 generals, 0 SOSs, and 129 indicators.
Presolve time = 0.01 sec. (2.87 ticks)
Probing time = 0.00 sec. (1.45 ticks)
Clique table members: 69.
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: deterministic, using up to 8 threads.
Root relaxation solution time = 0.00 sec. (0.50 ticks)
Nodes Cuts/
Node Left Objective IInf Best Integer Best Bound ItCnt Gap
* 0+ 0 233.0000 18.0000 92.27%
* 0+ 0 178.0000 18.0000 89.89%
* 0+ 0 39.0000 18.0000 53.85%
0 0 22.3333 117 39.0000 22.3333 4 42.74%
0 0 28.6956 222 39.0000 Cuts: 171 153 26.42%
0 0 31.1543 218 39.0000 Cuts: 123 251 20.12%
0 0 32.1544 226 39.0000 Cuts: 104 360 17.55%
0 0 32.6832 212 39.0000 Cuts: 102 456 16.20%
0 0 33.1524 190 39.0000 Cuts: 65 521 14.99%
Detecting symmetries...
0 0 33.3350 188 39.0000 Cuts: 66 566 14.53%
0 0 33.4914 200 39.0000 Cuts: 55 614 14.12%
0 0 33.6315 197 39.0000 Cuts: 47 673 13.77%
0 0 33.6500 207 39.0000 Cuts: 61 787 13.72%
0 0 33.7989 206 39.0000 Cuts: 91 882 13.34%
* 0+ 0 38.0000 33.7989 11.06%
0 0 33.9781 209 38.0000 Cuts: 74 989 10.58%
0 0 34.0074 209 38.0000 Cuts: 65 1043 10.51%
0 0 34.2041 220 38.0000 Cuts: 63 1124 9.99%
0 0 34.2594 211 38.0000 Cuts: 96 1210 9.84%
0 0 34.3032 216 38.0000 Cuts: 86 1274 9.73%
0 0 34.3411 211 38.0000 Cuts: 114 1353 9.63%
0 0 34.3420 220 38.0000 Cuts: 82 1402 9.63%
0 0 34.3709 218 38.0000 Cuts: 80 1462 9.55%
0 0 34.4494 228 38.0000 Cuts: 87 1530 9.34%
0 0 34.4882 229 38.0000 Cuts: 97 1616 9.24%
0 0 34.5173 217 38.0000 Cuts: 72 1663 9.16%
0 0 34.5545 194 38.0000 Cuts: 67 1731 9.07%
0 0 34.5918 194 38.0000 Cuts: 76 1786 8.97%
0 0 34.6094 199 38.0000 Cuts: 73 1840 8.92%
0 0 34.6226 206 38.0000 Cuts: 77 1883 8.89%
0 0 34.6421 206 38.0000 Cuts: 53 1928 8.84%
0 0 34.6427 213 38.0000 Cuts: 84 1982 8.83%
Detecting symmetries...
0 2 34.6427 213 38.0000 34.6478 1982 8.82%
Elapsed time = 0.44 sec. (235.86 ticks, tree = 0.02 MB, solutions = 4)
GUB cover cuts applied: 32
Cover cuts applied: 328
Implied bound cuts applied: 205
Flow cuts applied: 11
Mixed integer rounding cuts applied: 17
Zero-half cuts applied: 35
Gomory fractional cuts applied: 1
Root node processing (before b&c):
Real time = 0.43 sec. (235.61 ticks)
Parallel b&c, 8 threads:
Real time = 0.27 sec. (234.23 ticks)
Sync time (average) = 0.11 sec.
Wait time (average) = 0.00 sec.
------------
Total (root+branch&cut) = 0.71 sec. (469.84 ticks)
Mainly I want to understand what Nodes, Left, Gap, root node processing and parallel b&c are.
I hope one of you can give a resource or explain it clearly, so that it can be helpful when someone starts using IBM ILOG CPLEX Studio in the future.
Thanks a lot in advance.
I am expecting someone to fill the knowledge gaps regarding the engine log of IBM's ILOG CPLEX Studio.
I recommend
Progress reports: interpreting the node log
https://www.ibm.com/docs/en/icos/12.8.0.0?topic=mip-progress-reports-interpreting-node-log
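In short: Node is the branch-and-bound node currently being processed and Left is the number of open nodes remaining; Best Integer is the objective of the best feasible (incumbent) solution found so far and Best Bound is the best proven bound; Gap is their relative difference, |Best Integer - Best Bound| / |Best Integer|. For example, in the first full row of the log above, (39.0000 - 22.3333) / 39.0000 ≈ 42.74%. The final "Root node processing" and "Parallel b&c" lines split the total running time between the root relaxation/cutting phase and the parallel tree search (0.43 + 0.27 ≈ 0.71 sec here).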
I downloaded Stanford NLP 3.5.2 and ran sentiment analysis with the default configuration (i.e. I did not change anything, just unzipped and ran):
java -cp "*" edu.stanford.nlp.sentiment.Evaluate -model edu/stanford/nlp/models/sentiment/sentiment.ser.gz -treebank test.txt
EVALUATION SUMMARY
Tested 82600 labels
66258 correct
16342 incorrect
0.802155 accuracy
Tested 2210 roots
976 correct
1234 incorrect
0.441629 accuracy
Label confusion matrix
Guess/Gold 0 1 2 3 4 Marg. (Guess)
0 323 161 27 3 3 517
1 1294 5498 2245 652 148 9837
2 292 2993 51972 2868 282 58407
3 99 602 2283 7247 2140 12371
4 0 1 21 228 1218 1468
Marg. (Gold) 2008 9255 56548 10998 3791
0 prec=0.62476, recall=0.16086, spec=0.99759, f1=0.25584
1 prec=0.55891, recall=0.59406, spec=0.94084, f1=0.57595
2 prec=0.88982, recall=0.91908, spec=0.75299, f1=0.90421
3 prec=0.58581, recall=0.65894, spec=0.92844, f1=0.62022
4 prec=0.8297, recall=0.32129, spec=0.99683, f1=0.46321
Root label confusion matrix
Guess/Gold 0 1 2 3 4 Marg. (Guess)
0 44 39 9 0 0 92
1 193 451 190 131 36 1001
2 23 62 82 30 8 205
3 19 81 101 299 255 755
4 0 0 7 50 100 157
Marg. (Gold) 279 633 389 510 399
0 prec=0.47826, recall=0.15771, spec=0.97514, f1=0.2372
1 prec=0.45055, recall=0.71248, spec=0.65124, f1=0.55202
2 prec=0.4, recall=0.2108, spec=0.93245, f1=0.27609
3 prec=0.39603, recall=0.58627, spec=0.73176, f1=0.47273
4 prec=0.63694, recall=0.25063, spec=0.96853, f1=0.35971
Approximate Negative label accuracy: 0.646009
Approximate Positive label accuracy: 0.732504
Combined approximate label accuracy: 0.695110
Approximate Negative root label accuracy: 0.797149
Approximate Positive root label accuracy: 0.774477
Combined approximate root label accuracy: 0.785832
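(For reference, the "approximate" accuracies appear to collapse the five labels into negative (0 and 1) versus positive (3 and 4), ignoring the neutral label 2. For example, for negative labels: (323 + 161 + 1294 + 5498) / (2008 + 9255) = 7276 / 11263 ≈ 0.646009, which matches the figure above.)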
The test.txt file was downloaded from http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip (which contains train.txt, dev.txt and test.txt). The download link is given at http://nlp.stanford.edu/sentiment/code.html
However, in the paper the sentiment analysis tool is based on (Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y. and Potts, C., 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1631-1642.), the authors reported an accuracy of 0.807 when classifying 5 classes.
Are the results I obtained normal?
I get the same results when I run it out of the box. It would not surprise me if the version of their system they made for Stanford CoreNLP differs slightly from the version in the paper.
Alright, this question is pretty hard, so I am going to give you an example.
The left numbers are my algorithm's classifications and the right numbers are the original class numbers:
177 86
177 86
177 86
177 86
177 86
177 86
177 86
177 86
177 86
177 89
177 89
177 89
177 89
177 89
177 89
177 89
So here my algorithm merged 2 different classes into 1: as you can see, it merged classes 86 and 89 into one class. What would the error be in the above example?
Or here is another example:
203 7
203 7
203 7
203 7
16 7
203 7
17 7
16 7
203 7
In the above example the left numbers are my algorithm's classifications and the right numbers are the original class ids. As can be seen, it misclassified 3 products (I am classifying commercial products). So what would the error rate be in this example, and how would you calculate it?
This question is pretty hard and complex. We have finished the classification, but we are not able to find the correct algorithm for calculating the success rate :D
Here's a longish example: a real confusion matrix with 10 input classes "0" - "9" (handwritten digits), and 10 output clusters labelled A - J.
Confusion matrix for 5620 optdigits:
True 0 - 9 down, clusters A - J across
-----------------------------------------------------
A B C D E F G H I J
-----------------------------------------------------
0: 2 4 1 546 1
1: 71 249 11 1 6 228 5
2: 13 5 64 1 13 1 460
3: 29 2 507 20 5 9
4: 33 483 4 38 5 3 2
5: 1 1 2 58 3 480 13
6: 2 1 2 294 1 1 257
7: 1 5 1 546 6 7
8: 415 15 2 5 3 12 13 87 2
9: 46 72 2 357 35 1 47 2
----------------------------------------------------
580 383 496 1002 307 670 549 557 810 266 estimates in each cluster
y class sizes: [554 571 557 572 568 558 558 566 554 562]
kmeans cluster sizes: [ 580 383 496 1002 307 670 549 557 810 266]
For example, cluster A has 580 data points, 415 of which are "8"s;
cluster B has 383 data points, 249 of which are "1"s; and so on.
The problem is that the output classes are scrambled, permuted;
they correspond in this order, with counts:
A B C D E F G H I J
8 1 4 3 6 7 0 5 2 6
415 249 483 507 294 546 546 480 460 257
One could say that the "success rate" is
75 % = (415 + 249 + 483 + 507 + 294 + 546 + 546 + 480 + 460 + 257) / 5620
but this throws away useful information: here, that E and J both say "6", and no cluster says "9".
So, add up the biggest numbers in each column of the confusion matrix and divide by the total. But how do you count overlapping / missing clusters, like the two "6"s and no "9"s here? I don't know of a commonly agreed-upon way (and doubt that the Hungarian algorithm is used in practice).
Bottom line: don't throw away information; look at the whole confusion matrix.
NB: such a "success rate" will be optimistic for new data! It's customary to split the data into, say, 2/3 "training set" and 1/3 "test set", train e.g. k-means on the 2/3 alone, then measure the confusion / success rate on the test set; it will generally be worse than on the training set alone. Much more can be said; see e.g. cross-validation.
You have to define the error criteria if you want to evaluate the performance of an algorithm, so I'm not sure exactly what you're asking. In some clustering and machine learning algorithms you define the error metric and it minimizes it.
Take a look at https://en.wikipedia.org/wiki/Confusion_matrix to get some ideas.
You have to define an error metric to measure yourself. In your case, a simple method would be to find a property mapping for your products,
p = properties(id)
where id is the product id and p is likely a vector with one entry per property. Then you can define the error function e (or distance) between two products as
e = d(p1, p2)
Of course, each property must be evaluated to a number in this function. This error function can then be used in the classification algorithm and in learning.
In your second example, it seems that you treat the pair (203 7) as a successful classification, so I think you already have a metric yourself. You may want to be more specific to get a better answer.
Classification Error Rate (CER) is 1 - Purity (http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html)
ClusterPurity <- function(clusters, classes) {
  # for each cluster, count its most common class, then divide by N
  sum(apply(table(classes, clusters), 2, max)) / length(clusters)
}
(code from John Colby's answer)
Or
CER <- function(clusters, classes) {
  1 - sum(apply(table(classes, clusters), 2, max)) / length(clusters)
}
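For example, with the question's first case (classes 86 and 89 both merged into cluster 177), a minimal check:
clusters <- rep(177, 16)
classes  <- c(rep(86, 9), rep(89, 7))
CER(clusters, classes)  # 1 - 9/16 = 0.4375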
Note: I am still looking for a fast solution. Two of the solutions below are wrong and the third one is terribly slow.
I have N toys, numbered 1...N. Each toy has an associated cost. You have to go on a shopping spree such that on a particular day, if you buy toy i, then the next toy you buy on the same day must be i+1 or greater. Moreover, the absolute cost difference between any two consecutively bought toys must be greater than or equal to k. What is the minimum number of days in which I can buy all the toys?
I tried a greedy approach: start with toy 1 and see how many toys can be bought on day 1; then find the smallest i that has not been bought and start again from there.
Example:
Toys : 1 2 3 4
Cost : 5 4 10 15
let k be 5
On day 1, buy toys 1, 3 and 4.
On day 2, buy toy 2.
Thus, I can buy all toys in 2 days.
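In code, the greedy looks something like this (a Python sketch of the description above, using the absolute cost difference per the problem statement):

def greedy_days(costs, k):
    """Repeatedly sweep left to right, buying every toy whose cost differs
    from the last purchase of the day by at least k."""
    n = len(costs)
    bought = [False] * n
    days = 0
    while not all(bought):
        days += 1
        last = None                              # last cost bought today
        for i in range(n):
            if bought[i]:
                continue
            if last is None or abs(costs[i] - last) >= k:
                bought[i] = True
                last = costs[i]
    return days

print(greedy_days([5, 4, 10, 15], 5))   # -> 2, matching the example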
Note that greedy does not work for the example below: N = 151 and k = 42.
The costs of the toys 1...N, in that order, are:
383 453 942 43 27 308 252 721 926 116 607 200 195 898 568 426 185 604 739 476 354 533 515 244 484 38 734 706 608 136 99 991 589 392 33 615 700 636 687 625 104 293 176 298 542 743 75 726 698 813 201 403 345 715 646 180 105 732 237 712 867 335 54 455 727 439 421 778 426 107 402 529 751 929 178 292 24 253 369 721 65 570 124 762 636 121 941 92 852 178 156 719 864 209 525 942 999 298 719 425 756 472 953 507 401 131 150 424 383 519 496 799 440 971 560 427 92 853 519 295 382 674 365 245 234 890 187 233 539 257 9 294 729 313 152 481 443 302 256 177 820 751 328 611 722 887 37 165 739 555 811
You can find the optimal solution by solving the asymmetric Travelling Salesman.
Consider each toy as a node and build the complete directed graph (that is, add an edge between each pair of nodes). An edge has cost 1 (meaning the shopper has to continue on the next day) if the target's index is smaller, or if the cost of the target node is less than k plus the cost of the source node (k = 5 in your example); otherwise it has cost 0. Now find the shortest path covering this graph without visiting a node twice, i.e., solve the Travelling Salesman.
This idea is not very fast (the problem is NP-hard), but it should quickly give you a reference implementation.
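A direct Python transcription of that edge-cost rule (with k = 5 hard-coded as in the example; a sketch only, since solving the TSP itself is left to a library):

def atsp_costs(c, k=5):
    # cost[i][j] = 1 if buying toy j right after toy i forces a new day
    n = len(c)
    cost = [[None] * n for _ in range(n)]    # None on the diagonal
    for i in range(n):
        for j in range(n):
            if i != j:
                new_day = j < i or c[j] < c[i] + k
                cost[i][j] = 1 if new_day else 0
    return cost

Minimising the tour cost then minimises the number of day changes.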
This is not as difficult as ATSP. All you need to do is look for increasing subsequences.
Being a mathematician, the way I would solve the problem is to apply RSK to get a pair of Young tableaux; then the answer for how many days is the height of the tableau, and the rows of the second tableau tell you what to purchase on which day.
The idea is to do Schensted insertion on the cost sequence c. For the example you gave, c = (5, 4, 10, 15), the insertion goes like this:
Step 1: Insert c[1] = 5
P = 5
Step 2: Insert c[2] = 4
5
P = 4
Step 3: Insert c[3] = 10
5
P = 4 10
Step 4: Insert c[4] = 15
5
P = 4 10 15
The idea is that you insert the entries of c into P one at a time. When inserting c[i] into row j:
if c[i] is bigger than the largest element in the row, add it to the end of the row;
otherwise, find the leftmost entry in row j that is larger than c[i], call it y, and replace y with c[i], then insert y into row j+1.
P is an array where the lengths of the rows are weakly decreasing, and the entries in each row of P (these are the costs) weakly increase. The number of rows is the number of days it will take.
For a more elaborate example (made by generating 9 random numbers)
1 2 3 4 5 6 7 8 9
c = [ 5 4 16 7 11 4 13 6 5]
16
7
5 6 11
P = 4 4 5 13
So the best possible solution takes 4 days, buying 4 items on day 1, 3 on day 2, 1 on day 3, and 1 on day 4.
Handling the additional constraint that consecutive costs must increase by at least k involves redefining the (partial) order on costs: say that c[i] <k< c[j] if and only if c[j] - c[i] >= k in the usual ordering on numbers. The above algorithm works for partial orders as well as total orders; a sketch with this modified order follows.
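A Python sketch of the insertion with the modified order (my transcription of the description above, not verified beyond the two worked examples; note it treats the constraint as increasing-with-gap-k rather than as an absolute difference, and recall the question's warning that some answers here may be wrong):

def schensted_days(costs, k):
    # Row-insert each cost; "x can follow y in a row" iff x - y >= k.
    # With k = 0 this is classical Schensted insertion and reproduces the
    # nine-number example above; with k = 5 it reproduces the first example.
    P = []
    for x in costs:
        for row in P:
            if x - row[-1] >= k:          # x extends this row's sequence
                row.append(x)
                break
            # bump the leftmost entry that x cannot follow into the next row
            i = next(j for j, y in enumerate(row) if x - y < k)
            row[i], x = x, row[i]
        else:                             # fell past the last row: new day
            P.append([x])
    return P

print(schensted_days([5, 4, 10, 15], 5))  # [[4, 10, 15], [5]] -> 2 days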
I suspect a greedy approach would give a fairly good result.
I think your approach is not optimal simply because you always pick toy 1 to start, while you should really pick the least expensive toy; doing so gives you the most room to move to the next toy.
With each move being the least expensive one, it is just a DFS problem where you always follow the least expensive path, constrained by k.