How can I visually compare refactoring options? - d3.js

I have an existing system model, representing services calling APIs backed by one or more implementations in perhaps other services, in the form of a DAG. I've been rendering this by GraphViz dot files' digraph which is fairly reasonable, if sometimes difficult to enforce layering.
While considering various possible refactorings of services and APIs, I'd like to be able to chart alternative routes towards an end goal. Each refactoring step would yield a different DAG – represented in terms of diffs from the previous DAG (e.g., convert service A's use of API x in service B to API y in service C) – and similarly renderable.
What tools are there to be able to create such "refactoring paths" and then visualize flows between them, determine dependencies and parallelism? Extra bonus points for goal seeking (e.g., no dependencies from any service other than service A on service C; cheapest path based on weights) and providing a loose ordering of refactorings that demonstrate their (presumably) monotonically increasing system value.
I am picturing two UI components:
a DAG diff that shows visually the nodes/vectors that got replaced in the source graph with nodes/vectors in the second
a controlling display that acts like D3's force layout that lets you navigate the loosely connected DAG of refactorings and select which refactoring you'd like to see the before/after picture in the DAG diff.
That said, I'm totally up to other tooling, formats, etc. Just would like to be able to produce these and display them to other people as to why what we're doing is valuable (goal assertion) and taking as long as it does (dependencies and Gantt charts).

Would it be feasible for you to create a separate tool that, given two similar DAGs, outputs the "merge" of them? If that's possible, then visualizing the merge DAG will probably tell you a lot about both DAGs. You can color-code the nodes by whether they appear in both DAGs or in either.
That's how we originally designed the visual diffs of workflow graphs in VisTrails, see here.
If you insist on showing the two DAGs side by side, creating the merge DAG might still be the right idea, because then dot can lay the merge graph out, and you can simply hide the appropriate nodes for each subgraph. In this way, the shared structure will be laid out consistently by construction.

Related

Is it possible to rearrange nodes in the NDepend dependency graph?

If a graph is simple, it's easy enough to just look through. For more complex graphs though, it's hard to make sense of if it's not arranged in a way that resembles how developers conceptualize a class or method hierarchy. Understandably, NDepend wouldn't be able to do this automatically.
Can I move graph nodes around by hand? Or alternatively, is there another program that I can export the graph to and rearrange the nodes there?
No so far NDepend' graph' nodes cannot be re-arranged and cannot be exported to another tool that lets achieve this. Did you try to tick or untick the cluster settings that will provoque re-arranging complex graph or sub-graph?
Same if you tinker with layout settings:
Also a screenshot of a graph that is not in a way that resembles how developers conceptualize a class or method hierarchy would be welcome.

Representing parallel operations in flowcharts

I need to do a flowchart of a hydraulic system featuring a temperature regulation module. However, temperature regulation is only activated during one part of the cycle. During this part, the system continues to perform other operations.
I want to represent that in my flowchart diagram. I need a way to tell when the parallel algorithm begins and when it ends within the main flowchart. How do I do that ?
Add two new flowchart nodes/operators fork and join.
Fork takes one control flow coming in, and produces two going out, similar to the diamond decision node in regular flowcharts. Unlike the decision node, the fork node passes control to both of its children, meaning that "parallel execution starts here".
Join takes two control flows in, and produces one control flow out. It means,
"wait for execution on both input branches to complete, then pass control to the output branch". This means "parallel execution stops here".
Now your flowchart can have serial and parallel sections; you can even have the parallel sections generate additional, nested parallelism.
Now you need a graphical representation of fork and join.
We represent the control flow graph of real computer programs with essentially a flowchart ("actions" and "decisions"). Because of the similarity of "fork" to "decision" (one input, two outputs) we chose to draw "fork" as an upward facing triangle (one input at the top, two outputs at the triangle base endpoints), and "join" as a downward facing triangle, with two inputs at the triangle base endpoints and one output at the peak of the downward facing triangle.
You can see automatically generated examples of this for C and COBOL programs. You might not think these contain parallelism... but, in fact, the langauge semantics of many languages is often unspecified, allowing parallel execution in effect (C function arguments according to the standard can be evaluated in arbitrary order, for example). We model this nondeterminism as "parallelism" because the net effect is the same.
People building industrial control software also need to express this. They
simply split the control flow line going from one flow graph item to another. See Sequential function charts in this document.
The most general "control flow" notation I know are Colored Petri Nets. These can model not just control flow, but data flow, synchronization, and arithmetic, which means you can model systems with very complex flows. You can model a regular flowchart directly with a CPN. CPNs also generalize finite state machines. What that means for programmers, is that if you don't know about CPNs, you should learn about these now. And you discover that "flowcharts" (as CPNs) are now useful again in discussing systems whose parts run asynchronously.

Algorithm for slicing a dynamic graph

I am currently working on a project based on graph and I am searching for an algorithm for slicing an dynamic graph. I have already done some research but most algorithms that I have found works only for a static graph. In my environment, the graph is dynamic, it means that users add/delete elements, create/delete dependences at runtime.
(In reality I am working with UML models but UML models can be also represented by typed graphs, wich are composed of typed Vertices and edges)
I also search for the terms graph fragmentation but I did not find anything. And I would like to know if exist such algorithm for slicing a dynamic graph?
[UPDATE]
Sorry for not being clear and I am updating my question.Let me first expose the context.
In MDE (Model Driven Engineering), large-scale industrial systems involve nowadays hundreds of developpers working on hundreds of models representing pars of the whole system specification. In a such context, the approach commonly adopted is to use a central repository. The solution I provide for my project (I am currently working on a research lab), is a solution which is peer-to-peer oriented, that means that every developper has his own replication of the system specification.
My main problem is how to replicate this data, the models.
For instance, imagine Alice and Bob working on this UML diagram and Alice has the whole diagram in his repository. Bob wants to have the elements {FeedOrEntry, Entry}, how can I slice this diagram UML?
I search for the terms of "model Slicing".I have found one paper which gives an approach for slicing UML Class Diagrams but the problem with this algorithm is it only works for a static graph. In our context, developpers add/update/remove elements constantly and the shared elements should be consistent with the other replicas.
Since UML Models can also be seen as a graph, I also search for the terms for "graph slicing" or "graph fragment" but I have found nothing useful.
And I would like to know if exist such algorithm for slicing a dynamic graph
If you make slicing atomic, I see no problem with using algorithm shown in paper you linked.
However, for your consistency constraints, I believe that your p2p approach is incompatible. Alternative is merge operation, but I have no idea how would that operation work. It probably, at least partially, would have to be done manually.
Sounds like maybe you need a NoSQL graph database such as Neo4J or FlockDB. They can store billions of vertexes and edges.
What about to normalize the graph to an adjacent tree model? Then you can use a DFS or BFS to slice the graph?

Techniques for visualising change over time in graphs

I'm looking to display a graph (network diagram, not a chart) and show its changes over time. Is there a standard or best way to do this, or any kind of 'network diff' tool?
I'm looking for an overview of the general layout decisions involved, i.e. a list of options and trade-offs to be made, and best-practice guidelines where these exist.
Wow. Not an easy question! I'm curious if anyone can come up with some authoritative resources for you.
I haven't found any standard or best practice documented anywhere from a design standpoint, nor do I know of any tool specifically designed for determining and displaying the changes, but I have some ideas.
First, a few technical notes. There's GraphML, which you can use (and extend) to represent your graph in a standard format, and there are some parsers available, and it works with Prefuse and probably other display libraries. It's just XML, though - nothing too special. Creating the "diff" by comparing two GraphML files should be pretty simple.
The really interesting part is how to communicate the differences to the user.
In all cases, you should have a visual indicator for nodes and edges that are added or removed. You may use color, showing existing nodes as something neutral, say gray, new nodes as green, and removed nodes as red. There are lots of options.
You might find this slideshow interesting.
It's probably obvious, but, over time, the nodes should not move more than necessary to adapt to the new state of the graph - the layout should evolve, not start from scratch for every state. This is crucial for comparing the states.
Side-by-side before/after comparison. Present before and after snapshots of the same graph side-by-side. If your graph is very large and complicated, a side-by-side layout may be impractical. You could try overlaying one graph over the other, though that is likely to be disorienting.
Side-by-side series comparison. AKA small multiples. Same as above but showing as many points in time as is useful. Even more restrictive than before-after in terms of how much space required, and difficult for.
Animate a single graph. I think the most intuitive method is to smoothly animate the graph changes, though a choppy slideshow could work if the changes between slides are not too drastic.
Showing details. If useful, you can spell out the change event details in a few different ways.
Show labels on the graph node (could be interactive if there are too many to show at once)
Show a list in a sidebar / legend. Nice if reading the progression of changes is useful, but harder to connect to the visual.
Show a timeline instead of a list. This shows the 'real' progression of events better than a simple list, which gives the impression that all the events are evenly spaced over time.
What you actually choose to do would depend largely on the nature of your dataset and your goals. A simple graph of a few dozen nodes and a few changes is a much different challenge than a huge network, like say every constellation in the night sky!
Here is an interesting study: http://publik.tuwien.ac.at/files/PubDat_198995.pdf
This paper presents a prototype, and user tests will be published soon in:
P. Federico, W. Aigner, S. Miksch, F. Windhager, M. Smuc:
"Vertigo zoom: combining relational and temporal perspectives on
dynamic networks";
accepted as talk for: 11th International Working Conference on
Advanced Visual Interfaces (AVI2012), Capri Island; 2012-05-21 -
2012-05-25; in: "Proceedings of the 11th International Working
Conference on Advanced Visual Interfaces (AVI2012)", ACM, (2012),
ISBN: 978-1-4503-1287-5.
http://ieg.ifs.tuwien.ac.at/~federico/pub.php
Your question is kind of general, I'm not clear exactly what kinds of analysis you are aiming for. The are several network analysis packages that have some dynamics capacity. Gephi is one. The networkDynamic and ndtv R packages provide tools for representing and visualizing dynamics as animations and static layouts (disclaimer: I'm a maintainer)

What are good examples of problems that graphs can solve better than the alternative? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
After reading Stevey Yegge's Get That Job At Google article, I found this little quote interesting:
Whenever someone gives you a problem, think graphs. They are the most fundamental and flexible way of representing any kind of a relationship, so it's about a 50–50 shot that any interesting design problem has a graph involved in it. Make absolutely sure you can't think of a way to solve it using graphs before moving on to other solution types. This tip is important!
What are some examples of problems that are best represented and/or solved by graph data structures/algorithms?
One example I can think of: navigation units (ala Garmin, TomTom), that supply road directions from your current location to another, utilize graphs and advanced pathing algorithms.
What are some others?
Computer Networks: Graphs model intuitively model computer networks and the Internet. Often nodes will represent end-systems or routers, while edges represent connections between these systems.
Data Structures: Any data structure that makes use of pointers to link data together is making use of a graph of some kind. This includes tree structures and linked lists which are used all the time.
Pathing and Maps: Trying to find shortest or longest paths from some location to a destination makes use of graphs. This can include pathing like you see in an application like Google maps, or calculating paths for AI characters to take in a video game, and many other similar problems.
Constraint Satisfaction: A common problem in AI is to find some goal that satisfies a list of constraints. For example, for a University to set it's course schedules, it needs to make sure that certain courses don't conflict, that a professor isn't teaching two courses at the same time, that the lectures occur during certain timeslots, and so on. Constraint satisfaction problems like this are often modeled and solved using graphs.
Molecules: Graphs can be used to model atoms and molecules for studying their interaction and structure among other things.
I am very very interested in graph theory and ive used it solved so many different kinds of problem. You can solve a lot of Path related problem, matching problem, structure problems using graph.
Path problems have a lot of applications.
This was in a career cup's interview question.
Say you want to find the longest sum of a sub array. For example, [1, 2, 3, -1] has the longest sum of 6. Model it as a Directed Acyclic Graph (DAG), add a dummy source, dummy destination. Connect each node with an edge which has a weight corresponding to the number. Now use the Longest Path algorithm in the DAG to solve this problem.
Similarly, Arbitrage problems in financial world or even geometry problems of finding the longest overlapping structure is a similar path problem.
Some obvious ones would be the network problems (where your network could have computers people, organisation charts, etc).
You can glean a lot of structural information like
which point breaks the graph into two pieces
what is the best way to connect them
what is the best way to reach one place to another
is there a way to reach one place from another, etc.
I've solved a lot of project management related problems using graphs. A sequence of events can be pictured as a directed graph (if you don't have cycles then thats even better). So, now you can
sort the events according to their priority
you can find the event that is the most crucial (that is would free a lot of other projects)
you can find the duration needed to solve the total project (path problem), etc.
A lot of matching problems can be solved by graph. For example, if you need to match processors to the work load or match workers to their jobs. In my final exam, I had to match people to tables in restaurants. It follows the same principle (bipartite matching -> network flow algorithms). Its simple yet powerful.
A special graph, a tree, has numerous applications in the computer science world. For example, in the syntax of a programming language, or in a database indexing structure.
Most recently, I also used graphs in compiler optimization problems. I am using Morgan's Book, which is teaching me fascinating techniques.
The list really goes on and on and on. Graphs are a beautiful math abstraction for relation. You really can do wonders, if you can model it correctly. And since the graph theory has found so many applications, there are many active researches in the field. And because of numerous researches, we are seeing even more applications which is fuelling back researches.
If you want to get started on graph theory, get a good beginner discrete math book (Rosen comes to my mind), and you can buy books from authors like Fould or Even. CLRS also has good graph algorithms.
Your source code is tree structured, and a tree is a type of graph. Whenever you hear people talking about an AST (Abstract Syntax Tree), they're talking about a kind of graph.
Pointers form graph structures. Anything that walks pointers is doing some kind of graph manipulation.
The web is a huge directed graph. Google's key insight, that led them to dominate in search, is that the graph structure of the web is of comparable or greater importance than the textual content of the pages.
State machines are graphs. State machines are used in network protocols, regular expressions, games, and all kinds of other fields.
It's rather hard to think of anything you do that does not involve some sort of graph structure.
An example most people are familiar: build systems. Make is the typical example, but almost any good build system relies on a Directed Acyclic Graph. The basic idea is that the direction models the dependency between a source and a target, and you should "walk" the graph in a certain order to build things correctly -> this is an example of topological sort.
Another example is source control system: again based on a DAG. It is used for merging, for example, to find common parent.
Well, many program optimization algorithms that compilers use are based on graphs (e.g., figure out call graph, flow control, lots of static analysis).
Many optimization problems are based on graph. Since many problems are reducable to graph colouring and similar problems, then many other problems are also graph based.
I'm not sure I agree that graphs are the best way to represent every relation and I certainly try to avoid these "got a nail, let's find a hammer" approaches. Graphs often have poor memory representations and many algorithms are actually more efficient (in practice) when implemented with matrices, bitsets, and other things.
OCR. Picture a page of text scanned at an angle, with some noise in the image, where you must find the space between lines of text. One way is to make a graph of pixels, and find the shortest path from one side of the page to the other, where the difference in brightness is the distance between pixels.
This example is from the Algorithm Design Manual, which has lots of other real world examples of graph problems.
One popular example is garbage collection.
The collector starts with a set of references, then traverses all the objects they reference, then all the objects referenced there and so on. Everything it finds is added into a graph of reachable objects. All other objects are unreachable and collected.
To find out if two molecules can fit together. When developing drugs one is often interested in seeing if the drug molecules can fit into larger molecules in the body. The problem with determining whether this is possible is that molecules are not static. Different parts of the molecule can rotate around their chemical bindings so that a molecule can change into quite a lot of different shapes.
Each shape can be said to represent a point in a space consisting of shapes. Solving this problem involves finding a path through this space. You can do that by creating a roadmap through space, which is essentially a graph consisting of legal shapes and saying which shape a shape can turn into. By using a A* graph search algorithm through this roadmap you can find a solution.
Okay that was a lot of babble that perhaps wasn't very understandable or clear. But my point was that graphs pop up in all kinds of problems.
Graphs are not data structures. They are mathematical representation of relations. Yes, you can think and theoretize about problems using graphs, and there is a large body of theory about it. But when you need to implement an algorithm, you are choosing data structures to best represent the problem, not graphs. There are many data structures that represent general graphs, and even more for special kinds of graphs.
In your question, you mix these two things. The same theoretical solution may be in terms of graph, but practical solutions may use different data structures to represent the graph.
The following are based on graph theory:
Binary trees and other trees such as Red-black trees, splay trees, etc.
Linked lists
Anything that's modelled as a state machine (GUIs, network stacks, CPUs, etc)
Decision trees (used in AI and other applications)
Complex class inheritance
IMHO most of the domain models we use in normal applications are in some respect graphs. Already if you look at the UML diagrams you would notice that with a directed, labeled graph you could easily translate them directly into a persistence model. There are some examples of that over at Neo4j
Cheers
/peter
Social connections between people make an interesting graph example. I've tried to model these connections at the database level using a traditional RDMS but found it way too hard. I ended up choosing a graph database and it was a great choice because it makes it easy to follow connections (edges) between people (nodes).
Graphs are great for managing dependencies.
I recently started to use the Castle Windsor Container, after inspecting the Kernel I found a GraphNodes property. Castle Windsor uses a graph to represent the dependencies between objects so that injection will work correctly. Check out this article.
I have also used simple graph theory to develop a plugin framework, each graph node represent a plugin, once the dependencies have been defined I can traverse the graph to create a plugin load order.
I am planning on changing the algorithm to implement Dijkstra's algorithm so that each plugin is weighted with a specific version, thus a simple change will only load the latest version of the plugin.
I with I had discovered this sooner. I like that quote "Whenever someone gives you a problem, think graphs." I definitely think that's true.
Profiling and/or Benchmarking algorithms and implementations in code.
Anything that can be modelled as a foreign key in a relational database is essentially an edge and nodes in a graph.
Maybe that will help you think of examples, since most things are readily modelled in a RDBMS.
You could take a look at some of the examples in the Neo4j wiki,
http://wiki.neo4j.org/content/Domain_Modeling_Gallery
and the projects that Neo4j is used in (the known ones)
http://wiki.neo4j.org/content/Neo4j_In_The_Wild .
Otherwise, Recommender Algorithms are a good use for Graphs, see for instance PageRank, and other stuff at
https://github.com/tinkerpop/gremlin/wiki/pagerank
Analysing transaction serialisability in database theory.
You can utilise graphs anywhere you can define the problem domain objects into nodes and the solution as the flow of control and/or data amongst the nodes.
Considering the fact that trees are indeed connected-acyclic graphs, there are even more areas you can use the graph theory.
Basically nearlly all common data structures like trees, lists, queues, etc, can be thought as type of graph, some with different type of constraint.
To my experiences, I have used graph intensively in network flow problems, which is used in lots of areas like telecommunication network routing and optimisation, workload assignment, matching, supply chain optimisation and public transport planning.
Another interesting area is social network modelling as previous answer mentioned.
There are far more, like integrated circuit optimisation, etc.

Resources