Is there a way to calculate how long a dot graph will take in graphviz?

Is there any way to calculate how long a dot graph will likely take to render?
I am currently running with 25,000 nodes and 42,000 edges. It's taking some hours, and I'd like to know whether to give up, or to get an estimate of when it will complete.

Short answer: nope.
Longer answer: There are things you can do to better understand what dot is doing and make it run faster.
Adding -v4 (https://www.graphviz.org/doc/info/command.html#-v) to the command line will help show progress; this is not well documented, and the 4 is arbitrary.
There are 5 or more attributes that you can use to improve performance (mclimit, nslimit, nslimit1, remincross and searchsize), again not well documented. As a guess, try setting nslimit and nslimit1 both to 2 (see the example command at the end of this answer).
Here are some links to more performance info:
(Linux:) Logging w/ timestamp: https://forum.graphviz.org/t/how-to-timestamp-dot-fdp-neato-twopi-circo-v-output/654
https://forum.graphviz.org/t/where-does-generating-the-graph-take-most-of-the-time/668/3
https://forum.graphviz.org/t/dot-command-seems-to-never-end/958/4
You might be having a "footprint" problem with all those nodes (too many square inches of node space). Minimally, set the output format to svg.
Some "innocent" attributes can also be killers, like splines=ortho (https://forum.graphviz.org/t/creating-a-dot-graph-with-thousands-of-nodes/1092).
(future) https://gitlab.com/graphviz/graphviz/-/issues/2135
25,000 nodes is a fair number of nodes, but 42,000 edges is not that many edges per node.
With that many nodes, you might also try some of the other Graphviz engines (neato, fdp, circo, twopi). Often, one or more will run much faster.
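For reference, here is a minimal sketch of how those suggestions might be combined, wrapped in Python's subprocess (the -G options pass graph attributes on the command line; "graph.dot" and "graph.svg" are placeholders for your own files):

    import subprocess

    # -v prints progress messages (-v4 for more detail), -Gname=value sets a
    # graph attribute from the command line, -Tsvg keeps the output light.
    subprocess.run(
        [
            "dot",
            "-v",
            "-Gnslimit=2",
            "-Gnslimit1=2",
            "-Tsvg",
            "graph.dot",
            "-o", "graph.svg",
        ],
        check=True,
    )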

Related

Find best production order

I run embroidery machines that make custom bags for wholesale. We have two, and each can hold up to 12 colors of thread. One of the most time-intensive tasks is changing threads. Normally I pick the next pattern that has all or some of the same colors as the last.
If I had a sheet with every design and all the thread colors (by number) it requires, how would I even begin to write a program to order the list so that it shows the best order to make the bags in, minimizing thread changes?
I would think it would need to see how many patterns have overlapping colors, and which colors are used more and which aren't.
I don't even know if this can be done, or even how I would code it.
Some advice would be welcome.
EDIT:
So, a little more info to make things clearer. When we get an order, it will be a list of designs and what kind of bags those will go on. Each design has between 1 and 7 colors in it. Some colors are shared between designs. I can easily have a reference sheet listing the colors needed for each pattern. What I am trying to figure out is how to evaluate all the designs in an order to find the best production order, with the most overlap of colors, to minimize changeover.
This is basically the Multiple Travelling Salesman Problem, but instead of two salesmen's travel times between cities, you have the thread-changing time of two machines. Instead of having to visit each city, each pattern needs to be produced. The cost ("distance") function is the time needed for the switch between two patterns.
For a more realistic model, I think the cost should actually be the total time for a pattern, starting from switching threads but also including the embroidery time for the pattern batch. Otherwise you could end up with the thread-changing times being balanced while one machine gets all the jobs for 3 bags each and the other all the jobs for 1000 bags each, if the embroidery time is not included in the cost function.
For example, I believe you can adapt this solution. The only difference is that you would be starting from a distance matrix (computed by evaluating the cost function for each pair of patterns); you would not need to calculate it from coordinates (because you don't have coordinates).
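As a rough illustration of the cost-matrix idea (not the linked mTSP solution itself), here is a sketch that scores a switch by how many new thread colours it needs and then orders the designs with a simple nearest-neighbour greedy pass; the pattern names and colour numbers are invented:

    # Toy example; a real schedule for two machines would feed the same
    # cost function into an mTSP solver instead of this greedy pass.
    patterns = {
        "bear":   {101, 204, 305},
        "eagle":  {101, 204, 450, 512},
        "rose":   {204, 305, 512},
        "anchor": {610, 622},
    }

    def change_cost(current, nxt):
        # Threads that must be swapped in when moving to the next design
        # (colours already on the machine are free).
        return len(nxt - current)

    def greedy_order(patterns, start):
        # Nearest-neighbour baseline: always pick the design that needs
        # the fewest new threads.
        order = [start]
        remaining = set(patterns) - {start}
        while remaining:
            last = patterns[order[-1]]
            nxt = min(remaining, key=lambda p: change_cost(last, patterns[p]))
            order.append(nxt)
            remaining.remove(nxt)
        return order

    print(greedy_order(patterns, start="bear"))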

FlowField Pathfinding on Large RTS Maps

When building a large-map RTS game, my team is experiencing some performance issues with pathfinding.
A* on its own is inefficient, not only because of janky pathfinding, but because of the processing cost when large groups of units move at the same time.
After research, the obvious solution would be to use FlowField pathfinding, the industry standard for RTS games as it stands.
The issue we are now having after creating the base algorithm is that the map is quite large requiring a grid of around 766 x 485. This creates a noticeable processing freeze or lag when computing the flowfield for the units to follow.
Has anybody experienced this before or have any solutions on how to make the flowfields more efficient? I have tried the following:
Adding each flowfield to a list when it is created and referencing it later (works once it has been created, but obviously lags on creation).
Processing flowfields before the game starts and referencing the list (due to the sheer number of cells, this simply doesn't work).
Creating a grid based on the distance between the furthest selected unit and the destination point (works for short distances, but not when moving from one end of the map to the other).
I was thinking about maybe splitting up the map into multiple flowfields, but I'm trying to work out how I would make them move from field to field.
Any advice on this?
Thanks in advance!
Maybe this is a bit of a late answer. Since you have mentioned that this is a large RTS game, the computation should not be limited to one CPU core. Here is some advice for using flowfields more efficiently (a small code sketch of the basic flowfield build follows at the end of this answer).
1. Use multiple threads to compute new flowfields for each unit movement command.
2. Group units, so that all units in the same command group share the same flowfield.
3. Partition the flowfield grid, so that you only have to update a partition whose path has changed (a new building, moving units).
4. Pre-bake the flowfield grid cell costs: pre-bake the basic cost of each cell, based on the environment or other static values that won't change during the game.
To divide, e.g. for your 766 x 485 map, pad it to 800 x 500 and split it into 100 partitions of 80 x 50 cells each, as stated in advice 3.
That gives you a coarse grid of 10 x 10 = 100 slots. Create a directed graph (https://en.wikipedia.org/wiki/Graph_theory) from the initial flowfield map (without considering any game units), and run the A* algorithm over it before the game begins, so that you know all the connections between partitions.
5. For each new flowfield, build the flowfield only for the partitions marked by a simple A* search in that graph. Then use an alternative route if one node of the route given by A* is completely blocked from reaching the next one (mark the node as blocked and run A* again on the graph).
6. Cache: save the flowfield result from step 5 for further use (the same unit spawning at home and going to the enemy base reuses the same partition route). Invalidate the cache if there is any change along the path, but invalidate only the cache of the changed partition first and check whether that partition still connects to the other sides; then only a minor change needs to be made within that partition.
7. Late-update the units' command at runtime. If the map is large enough, move the units immediately towards the next partition without using the flowfield (use A* on the 10 x 10 graph first to get the next partition). During this movement, build the flowfield in the background using steps 1-6 (in fact the calculation only needs a few milliseconds if it is optimized properly), then have the units change their route accordingly. Most of the time there is no difference and the player won't notice a thing. In the worst case, where you finally have to search all partitions to find the only possible route, there is a delay only the first time, and the cache minimises it afterwards, since that is the only route and the cache is used repeatedly.
8. Re-do the build process above once every few seconds for each command group (in the background), just in case anything changes along the way.
I could get this working with a much larger random map (2000 x 2000) with no fps drop at all.
Hope this helps anyone in the future.
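As mentioned above, here is a minimal sketch of the basic flowfield build for a single partition: a Dijkstra "integration field" from the target cell, then one direction per cell. The function name and toy grid are mine, and the coarse 10 x 10 partition graph plus A* from steps 3-5 would sit on top of this.

    import heapq

    def build_flowfield(cost, target):
        # Dijkstra "integration field" from the target cell, then one flow
        # direction per cell pointing towards its cheapest neighbour.
        # cost[y][x] is the pre-baked traversal cost of a cell (None = wall).
        h, w = len(cost), len(cost[0])
        dist = [[float("inf")] * w for _ in range(h)]
        flow = [[None] * w for _ in range(h)]   # (dx, dy) to follow per cell
        tx, ty = target
        dist[ty][tx] = 0
        pq = [(0, tx, ty)]
        while pq:
            d, x, y = heapq.heappop(pq)
            if d > dist[y][x]:
                continue
            for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h and cost[ny][nx] is not None:
                    nd = d + cost[ny][nx]
                    if nd < dist[ny][nx]:
                        dist[ny][nx] = nd
                        flow[ny][nx] = (-dx, -dy)   # step back towards (x, y)
                        heapq.heappush(pq, (nd, nx, ny))
        return flow

    # Toy grid: 1 = open ground, 5 = rough terrain, None = wall.
    grid = [
        [1, 1, 1, 1],
        [1, None, 5, 1],
        [1, 1, 1, 1],
    ]
    field = build_flowfield(grid, target=(3, 0))
    print(field[2][0])   # direction the bottom-left cell should move in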

Statistics/Algorithm: How do I compare a weekly graph with its own history to see when in the past it was almost the same?

I’ve got a statistical/mathematical problem I’m stumped on and I was really hoping to get some help. I’m working on research where I need to compare a weekly graph with its own history to see when in the past it was almost the same. Think of this as “finding the closest match”. The information is displayed as a line graph, but it’s readily available as raw data:
Date...................Result
08/10/18......52.5
08/07/18......60.2
08/06/18......58.5
08/05/18......55.4
08/04/18......55.2
and so on...
What I really want is the output to be a form of correlation between the current data points and other sets of 5 consecutive data points in history. So, something like:
Date range.....................Correlation
07/10/18-07/15/18....0.98
We’ll be getting a code written in Python for the software to do this automatically (so that as new data is added, it automatically runs and finds the closest set of numbers to match the current one).
Here’s where the difficulty sets in: Since numbers are on a general upward trend over time, we don’t want it to compare the absolute value (since the numbers might never really match). One suggestion has been to compare the delta (rate of change as a percentage over the previous day), or using a log scale.
I’m wondering: how do I go about this? What kind of calculation can I use to get the desired results? I’ve looked at the different kinds of correlation equations, but they don’t account for the “shape” of the data; they generally just average it out. The shape of the line chart is the important thing.
Thanks very much in advance!
I would simply divide the data of each week by their average (i.e., normalize them to an average of 1), then sum the squares of the differences of each day of each pair of weeks. This sum is what you want to minimize.
If you don't care about how much a graph oscillates relative to its mean, you can normalize also the variance. For each week, calculate mean and variance, then subtract the mean and divide by the root of the variance. Each week will have mean 0 and variance 1. Then minimize the sum of squares of differences like before.
If the normalization of data is all you can change in your workflow, just leave out the sum of squares of differences minimization part.
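A small sketch of the second variant (mean/variance normalization, then minimizing the sum of squared differences over every historical window); the function name and sample values are mine, not from the question:

    import numpy as np

    def best_match(history, window=5):
        # Z-score-normalize the latest `window` points and every earlier
        # window of the same length, then minimize the sum of squared
        # differences. Returns (start_index, score) of the closest match.
        history = np.asarray(history, dtype=float)

        def normalize(w):
            return (w - w.mean()) / w.std()   # assumes the window is not flat

        current = normalize(history[-window:])
        best = None
        for start in range(len(history) - 2 * window + 1):
            candidate = normalize(history[start:start + window])
            score = float(np.sum((current - candidate) ** 2))
            if best is None or score < best[1]:
                best = (start, score)
        return best

    data = [50.1, 51.0, 52.3, 51.8, 53.0, 54.2, 55.2, 55.4, 58.5, 60.2, 52.5]
    print(best_match(data, window=5))   # (index of the best window, distance)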

GraphViz Dot very long duration of generation

I have a tree structure I want to be generated by Dot. Each node has 4 edges to another 4 nodes. In sum there are about 1,000 nodes. If I try to generate it with Dot, it takes a very long time (once I let it run for about an hour; CPU usage was at 100% the whole time but it didn't finish). Is there a way to accelerate this? Maybe by lowering the quality? Or by using other (faster?) visualization software? I've attached my Dot file for you to test it on your own machine.
Thank you.
Dot File: http://lh.rs/3fmsfjmbvRw2
Check this link: laying out a large graph with graphviz
sfdp -x -Goverlap=scale -Tpng data.dot > data.png
You may want to try setting the nslimit or nslimit1 attributes as mentioned here:
https://web.archive.org/web/20170421065851/http://www.graphviz.org:80/content/dot-performance-issues
https://graphviz.org/doc/info/attrs.html#d:nslimit (or the original:
https://web.archive.org/web/20170421065851/http://www.graphviz.org:80/content/attrs#dnslimit)
You may also tune the maxiter, mclimit and splines attributes, especially splines=line gave me a huge speedup (albeit being somewhat ugly).
Also, as E-man suggested, dot is really slower than e.g. circo or twopi, so you may consider using one of those, if they look OK for your graph.
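If you want to find out quickly which engine is worth waiting for, a rough timing harness along these lines can help (it assumes the engines are on your PATH and reuses the data.dot name from the sfdp example above):

    import subprocess
    import time

    # Try each layout engine on the same file and report how long it takes,
    # giving up after ten minutes per engine.
    for engine in ["dot", "neato", "fdp", "sfdp", "twopi", "circo"]:
        start = time.time()
        try:
            subprocess.run(
                [engine, "-Tsvg", "data.dot", "-o", f"data-{engine}.svg"],
                check=True,
                timeout=600,
            )
            print(f"{engine}: {time.time() - start:.1f}s")
        except subprocess.TimeoutExpired:
            print(f"{engine}: gave up after 600s")
        except subprocess.CalledProcessError:
            print(f"{engine}: failed")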
Importing a large .dot file into Gephi (https://gephi.org) is really fast.

Graph plotting: only keeping most relevant data

In order to save bandwidth, and so as not to have to generate pictures/graphs ourselves, I plan on using Google's charting API:
http://code.google.com/apis/chart/
which works by simply issuing a (potentially long) GET (or a POST); Google then generates and serves the graph itself.
As of now I've got graphs made of about two thousand entries and I'd like to trim this down to some arbitrary number of entries (e.g. by keeping only 50% of the original entries, or 10% of the original entries).
How can I decide which entries I should keep so as to have my new graph the closest to the original graph?
Is this some kind of curve-fitting problem?
Note that I know that I can do POST to Google's chart API with up to 16K of data and this may be enough for my needs, but I'm still curious
The flot-downsample plugin for the Flot JavaScript graphing library could do what you are looking for, up to a point.
The purpose is to try to retain the visual characteristics of the original line using considerably fewer data points.
The research behind this algorithm is documented in the author's thesis.
Note that it doesn't work for any kind of series, and won't give meaningful results when you want a downsampling factor beyond 10, in my experience.
The problem is that it cuts the series into windows of equal size and then keeps one point per window. Since you may have denser data in some windows than in others, the result is not necessarily optimal. But it's efficient (it runs in linear time).
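For reference, here is a rough pure-Python sketch of the bucketing idea behind the plugin (Largest-Triangle-Three-Buckets, as described in the thesis); this is my own reconstruction, not the plugin's code:

    def lttb(points, n_out):
        # points is a list of (x, y) pairs; returns roughly n_out of them.
        if n_out >= len(points) or n_out < 3:
            return list(points)
        sampled = [points[0]]
        # Split the interior points into n_out - 2 equal-size buckets.
        bucket_size = (len(points) - 2) / (n_out - 2)
        a = 0   # index of the last selected point
        for i in range(n_out - 2):
            start = int(i * bucket_size) + 1
            end = int((i + 1) * bucket_size) + 1
            next_end = min(int((i + 2) * bucket_size) + 1, len(points))
            nxt = points[end:next_end] or [points[-1]]
            # The average of the *next* bucket is the third triangle corner.
            avg_x = sum(p[0] for p in nxt) / len(nxt)
            avg_y = sum(p[1] for p in nxt) / len(nxt)
            ax, ay = points[a]
            best, best_area = None, -1.0
            for j in range(start, end):
                bx, by = points[j]
                # Triangle area formed by the last kept point, this candidate,
                # and the next bucket's average; keep the largest.
                area = abs((bx - ax) * (avg_y - ay) - (by - ay) * (avg_x - ax)) / 2
                if area > best_area:
                    best, best_area = j, area
            sampled.append(points[best])
            a = best
        sampled.append(points[-1])
        return sampled

    series = [(i, (i % 7) * 0.5 + i * 0.01) for i in range(2000)]
    print(len(lttb(series, 200)))   # ~200 points that keep the overall shape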
What you are looking to do is known as downsampling or decimation. Essentially you filter the data and then drop N - 1 out of every N samples (decimation or down-sampling by factor of N). A crude filter is just taking a local moving average. E.g. if you want to decimate by a factor of N = 10 then replace every 10 points by the average of those 10 points.
Note that with the above scheme you may lose some high frequency data from your plot (since you are effectively low pass filtering the data) - if it's important to see short term variability then an alternative approach is to plot every N points as a single vertical bar which represents the range (i.e. min..max) of those N points.
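A minimal sketch of both suggestions (block averaging, and a min/max bar per block); the helper names are mine:

    import statistics

    def decimate(samples, n=10):
        # Crude down-sampling by a factor of n: replace each block of n
        # samples with its average (a moving-average low-pass filter).
        return [
            statistics.mean(samples[i:i + n])
            for i in range(0, len(samples), n)
        ]

    def min_max_bars(samples, n=10):
        # Alternative that keeps short-term variability: one (min, max)
        # range per block of n samples, to plot as a vertical bar.
        return [
            (min(samples[i:i + n]), max(samples[i:i + n]))
            for i in range(0, len(samples), n)
        ]

    data = list(range(100))          # stand-in for the ~2000 real entries
    print(decimate(data, n=10))      # 10 averaged points
    print(min_max_bars(data, n=10))  # 10 (min, max) bars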
Graph (time-series data) summarization is a very hard problem. It's like deciding, in a text, which "relevant" parts to keep in an automatic summary of it. I suggest you use one of the most respected libraries for finding "patterns of interest" in time-series data, by Eamonn Keogh.
