Graphviz Sharing Attributes between Nodes or Edges - graphviz

I'm using Graphviz (namely Dot) to draw up a state machine for a Hypermedia API I'm planning on building. In my graph, nodes represent states, while edges represent links. What I'm trying to do is have edges (links) of the same "type" (i.e., those using the same verb or the same rel) share attributes like color.
I know you can define "global" attributes that apply to all nodes/edges, but I need something I can apply more generally to several different "types". The closest analogy I can come up with for what I want is HTML classes. I don't need multiple "classes" for my edges (although that would be nice) but repeating attributes like color=red, style=bold is cumbersome.
Is there a way in Dot to declare something like this? Or at least some way I don't have to repeat myself so often?

I've done this in two different ways:
Option (A): Write the dot file from another script. This is particularly useful when I'm using a script (in, say, Python or Perl) to rework the input data into dot format for drawing. In that case, as well as having the script write the data into dot format, I can also have it write the attributes for each node and edge into the dot file. An example is shown below (not runnable, because I've extracted it from a larger script that interprets the input data, but you can see how the Perl is writing the dot code).
print "graph G {\n graph [overlap = scale, size = \"10,10\"]; node [fontname = \"Helvetica\", fontsize = 9]\n";
for ($j = 0; $j <= $#sectionList; $j++) {
print "n$j [label = \"$sectionList[$j]\", style = filled, fillcolor = $groupColour{$group{$sectionList[$j]}} ]\n";
}
for ($j = 0; $j <= $#sectionList; $j++) {
for ($i = $j+1; $i <= $#sectionList; $i++) {
$wt = ($collab{$sectionList[$j]}{$sectionList[$i]}+0)/
($collab{$sectionList[$j]}{$sectionList[$j]}+0);
if ($wt > 0.01) {
print "n$j -- n$i [weight = $wt, ";
if ($wt > 0.15) {
print "style = bold]\n";
}
elsif ($wt > 0.04) {
print "]\n";
} else {
print "style = dotted]\n";
}
}
}
print "\n";
}
print "}\n";
Option (B): If I'm writing the dot script by hand, I'll use a macro processor to define common elements. For example, given the file polygon.dot.m4, which uses the m4 macro define() as follows:
define(SHAPE1,square)
define(SHAPE2,triangle)
digraph G {
a -> b -> c;
b -> d;
a [shape=SHAPE1];
b [shape=SHAPE2];
d [shape=SHAPE1];
e [shape=SHAPE2];
}
... the command m4 <polygon.dot.m4 | dot -Tjpg -opolygon.jpg expands the macros and renders the result to polygon.jpg.
Changing the definitions of SHAPE1 and SHAPE2 at the top of the file will change the shapes drawn for each of the relevant nodes.
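For the edge "types" in the question, Option (A) could look roughly like the Python sketch below. It keeps one attribute dictionary per edge type (a bit like a CSS class) and writes the attributes out on every edge of that type. The names EDGE_CLASSES, LINKS and render, as well as the states and colours, are made up for illustration; they are not part of Graphviz.

```python
# Hypothetical style "classes": the names and attribute values are illustrative only.
EDGE_CLASSES = {
    "get": {"color": "blue"},
    "post": {"color": "red", "style": "bold"},
}

# (source state, target state, edge class) triples; made-up example data.
LINKS = [
    ("home", "orders", "get"),
    ("orders", "order", "post"),
    ("order", "home", "get"),
]


def attrs_to_dot(attrs):
    """Format a dict as a dot attribute list body, e.g. color="red", style="bold"."""
    return ", ".join(f'{key}="{value}"' for key, value in sorted(attrs.items()))


def render(links, classes):
    """Return dot source in which every edge carries the attributes of its class."""
    lines = ["digraph G {"]
    for src, dst, cls in links:
        # Copy the shared attributes and add a per-edge label naming the class.
        attrs = dict(classes[cls], label=cls)
        lines.append(f"    {src} -> {dst} [{attrs_to_dot(attrs)}];")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(LINKS, EDGE_CLASSES))
```

Changing a colour in EDGE_CLASSES then changes every edge of that type, and the printed text can be piped into dot in the usual way.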

Related

Fuzzy string record search algorithm (supporting word transpose and character transpose)

I am trying to find the best algorithm for my particular application. I have searched around on SO, Google, read various articles about Levenshtein distances, etc. but honestly it's a bit out of my area of expertise. And most seem to find how similar two input strings are, like a Hamming distance between strings.
What I'm looking for is different, more of a fuzzy record search (and I'm sure there is a name for it, that I don't know to Google). I am sure someone has solved this problem before and I'm looking for a recommendation to point me in the right direction for my further research.
In my case I am needing a fuzzy search of a database of entries of music artists and their albums. As you can imagine, the database will have millions of entries so an algorithm that scales well is crucial. It's not important to my question that Artist and Album are in different columns, the database could just store all words in one column if that helped the search.
The database to search:
|-------------------|---------------------|
| Artist | Album |
|-------------------|---------------------|
| Alanis Morissette | Jagged Little Pill |
| Moby | Everything is Wrong |
| Air | Moon Safari |
| Pearl Jam | Ten |
| Nirvana | Nevermind |
| Radiohead | OK Computer |
| Beck | Odelay |
|-------------------|---------------------|
The query text will contain from just one word in the entire Artist_Album concatenation up to the entire thing. The query text is coming from OCR and is likely to have single character transpositions but the most likely thing is the words are not guaranteed to have the right order. Additionally, there could be extra words in the search that aren't a part of the album (like cover art text). For example, "OK Computer" might be at the top of the album and "Radiohead" below it, or some albums have text arranged in columns which intermixes the word orders.
Possible search strings:
C0mputer Rad1ohead
Pearl Ten Jan
Alanis Jagged Morisse11e Litt1e Pi11
Air Moon Virgin Records
Moby Everything
Note that with OCR, some letters will look like numbers, or the wrong letter completely (Jan instead of Jam). And in the case of Radiohead's OK Computer and Moby's Everything Is Wrong, the query text doesn't even have all of the words. In the case of Air's Moon Safari, the extra words Virgin Records are searched, but Safari is missing.
Is there a general algorithm that could return the single likeliest result from the database, and if none meet some "likeliness" score threshold, it returns nothing? I'm actually developing this in Python, but that's just a bonus, I'm looking more for where to get started researching.
Let's break the problem down in two parts.
First, you want to define some measure of likeness (this is called a metric). This metric should return a small number if the query text closely matches the artist/album record, and return a larger number otherwise.
Second, you want a data structure that speeds up this process. Obviously, you don't want to calculate this metric against every record each time a query is run.
part 1: the metric
You already mentioned Levenshtein distance, which is a great place to start.
Think outside the box though.
LD makes certain assumptions (each character replacement is equally likely, deletion is equally likely as insertion, etc). You can obviously improve the accuracy of this metric by taking into account which errors OCR is likely to introduce.
E.g. turning a '1' into an 'i' should not be penalized as harshly as turning a '0' into an '_'.
I would implement the metric in two stages. For any given two strings:
split both strings in tokens (assume space as the separator)
look for the most similar words (using a modified version of LD)
assign a final score based on 'matching words', 'missing words' and 'added words' (preferably weighted)
This is an example implementation (fiddle around with the constants). Note that it returns a similarity score, so a higher value means a closer match:
static double m(String a, String b){
String[] aParts = a.split(" ");
String[] bParts = b.split(" ");
boolean[] bUsed = new boolean[bParts.length];
int matchedTokens = 0;
int tokensInANotInB = 0;
int tokensInBNotInA = 0;
for(int i=0;i<aParts.length;i++){
String a0 = aParts[i];
boolean wasMatched = false;
for(int j=0;j<bParts.length;j++){
String b0 = bParts[j];
double d = levenshtein(a0, b0);
/* If we match the token a0 with a token from b0
* update the number of matchedTokens
* escape the loop
*/
if(!bUsed[j] && d < 2){ // skip tokens of b that were already matched
bUsed[j]=true;
wasMatched = true;
matchedTokens++;
break;
}
}
if(!wasMatched){
tokensInANotInB++;
}
}
for(boolean partUsed : bUsed){
if(!partUsed){
tokensInBNotInA++;
}
}
return (matchedTokens
+ tokensInANotInB * -0.3 // the query is allowed to contain extra words at minimal cost
+ tokensInBNotInA * -0.5 // the album title should not contain too many extra words
) / java.lang.Math.max(aParts.length, bParts.length);
}
This function uses a modified levenshtein function:
static double levenshtein(String x, String y) {
double[][] dp = new double[x.length() + 1][y.length() + 1];
for (int i = 0; i <= x.length(); i++) {
for (int j = 0; j <= y.length(); j++) {
if (i == 0) {
dp[i][j] = j;
}
else if (j == 0) {
dp[i][j] = i;
}
else {
dp[i][j] = Math.min(dp[i - 1][j - 1]
+ costOfSubstitution(x.charAt(i - 1), y.charAt(j - 1)),
Math.min(dp[i - 1][j] + 1,
dp[i][j - 1] + 1));
}
}
}
return dp[x.length()][y.length()];
}
This in turn uses the function costOfSubstitution, which makes the typical OCR confusions described above cheaper than other substitutions:
static double costOfSubstitution(char a, char b){
if(a == b)
return 0.0;
else{
// 1 and i
if(a == '1' && b == 'i')
return 0.5;
if(a == 'i' && b == '1')
return 0.5;
// 0 and O
if(a == '0' && b == 'o')
return 0.5;
if(a == 'o' && b == '0')
return 0.5;
if(a == '0' && b == 'O')
return 0.5;
if(a == 'O' && b == '0')
return 0.5;
// default
return 1.0;
}
}
I only included a couple of examples (turning '1' into 'i' or '0' into 'o').
But I'm sure you get the idea.
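Since the question mentions Python, here is a rough Python rendering of the same two-stage idea, plus the "return nothing below a threshold" behaviour the question asks for. It is a sketch under assumptions, not a tuned implementation: the CONFUSABLE table, the constants 0.3 and 0.5, max_dist and the threshold of 0.2 are all guesses to be adjusted, the ("1", "l") pair is my own addition, and unlike the Java above it picks the closest unused record token rather than the first one under the limit.

```python
# OCR look-alike pairs; an illustrative, incomplete table (assumption).
CONFUSABLE = {frozenset(pair) for pair in [("1", "i"), ("1", "l"), ("0", "o")]}

RECORDS = [
    "Alanis Morissette Jagged Little Pill",
    "Moby Everything is Wrong",
    "Air Moon Safari",
    "Pearl Jam Ten",
    "Nirvana Nevermind",
    "Radiohead OK Computer",
    "Beck Odelay",
]


def sub_cost(a, b):
    """Substitution cost: free if equal, cheap for OCR look-alikes, else 1."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in CONFUSABLE else 1.0


def weighted_levenshtein(x, y):
    """Levenshtein distance with OCR-aware substitution costs (row by row)."""
    prev = [float(j) for j in range(len(y) + 1)]
    for i, cx in enumerate(x, start=1):
        cur = [float(i)]
        for j, cy in enumerate(y, start=1):
            cur.append(min(prev[j - 1] + sub_cost(cx, cy),  # substitute
                           prev[j] + 1.0,                   # delete from x
                           cur[j - 1] + 1.0))               # insert into x
        prev = cur
    return prev[-1]


def score(query, record, max_dist=2.0):
    """Similarity score: higher is better, word order is ignored."""
    q_tokens = query.lower().split()
    r_tokens = record.lower().split()
    used = [False] * len(r_tokens)
    matched = 0
    for q in q_tokens:
        best_j, best_d = None, max_dist
        for j, r in enumerate(r_tokens):
            if used[j]:
                continue
            d = weighted_levenshtein(q, r)
            if d < best_d:  # strictly closer than anything seen so far
                best_j, best_d = j, d
        if best_j is not None:
            used[best_j] = True
            matched += 1
    extra_in_query = len(q_tokens) - matched
    missing_from_query = used.count(False)
    raw = matched - 0.3 * extra_in_query - 0.5 * missing_from_query
    return raw / max(len(q_tokens), len(r_tokens))


def best_match(query, records, threshold=0.2):
    """Return the likeliest record, or None if nothing reaches the threshold."""
    top_score, top_record = max((score(query, rec), rec) for rec in records)
    return top_record if top_score >= threshold else None
```

For example, best_match("C0mputer Rad1ohead", RECORDS) picks "Radiohead OK Computer", while a query with no plausible tokens gives None. This still scans every record, which is what part 2 is about.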
part 2: the data structure
Look into BK-trees. They are a specific data structure to hold metric information. Your metric needs to be a genuine metric (in the mathematical sense of the word). But that's easily arranged.
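To give an idea of what a BK-tree does, here is a minimal Python sketch. It uses the plain integer Levenshtein distance, because that is a genuine metric with integer values; whether a weighted variant still satisfies the triangle inequality would have to be checked separately. The class name BKTree and its methods are my own naming, and the tree indexes single words only, so mapping the words found back to artist/album records is left out.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance (integer valued, a true metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + (ca != cb),  # substitute
                           prev[j] + 1,               # delete
                           cur[j - 1] + 1))           # insert
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is (word, children), where children maps distance -> child node."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, radius):
        """Return sorted (distance, word) pairs within `radius` of `word`."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= radius:
                results.append((d, candidate))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - radius, d + radius] can contain matches.
            for k, child in children.items():
                if d - radius <= k <= d + radius:
                    stack.append(child)
        return sorted(results)


tree = BKTree(levenshtein)
for w in ["radiohead", "computer", "nirvana", "nevermind", "pearl", "jam", "ten"]:
    tree.add(w)
```

A lookup such as tree.search("rad1ohead", 1) then only computes distances for the subtrees that can still contain a match, instead of for every word in the database.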

Mata error 3204

I am unsure why I am getting an error.
I think it may stem from a misunderstanding around the structure syntax, but I am not certain if this is the issue (it would be unsurprising if there are multiple issues).
I am emulating code (from William Gould's The Mata Book) in which the input is a scalar, but the input for the program I am writing is a colvector.
The objective of this exercise is to create a square matrix from a column vector (according to some rules) and once created, multiply this square matrix by itself.
The code is the following:
*! spatial_lag version 1.0.0
version 15
set matastrict on
//--------------------------------------------------------------
local SL struct laginfo
local RS real scalar
local RC real colvector
local RM real matrix
//--------------------------------------------------------------
mata
`SL'
{
//-------------------inputs:
`RC' v
//-------------------derived:
`RM' W
`RM' W2
`RS' n
}
void lagset(`RC' v)
{
`SL' scalar r
// Input:
r.v = v
//I set the derived variables to missing:
r.W = .z
r.W2 = .z
r.n = .z // length of vector V
}
`RM' w_mat(`SL' scalar r)
{
if (r.W == .z) {
real scalar row, i
real scalar col, j
r.W = J(r.n,r.n,0)
for (i=1; i<=r.n; i++) {
for (j=1; j<=r.n; j++) {
if (j!=i) {
if (r.v[j]==r.v[i]) {
r.W[i,j] = 1
}
}
}
}
}
return(r.W)
}
`RS' wlength(`SL' scalar r)
{
if (r.n == .z) {
r.n = length(r.v)
}
return(r.n)
}
`RM' w2mat(`SL' scalar r)
{
if (r.W2 == .z) {
r.W2 = r.W * r.W
}
return(r.W2)
}
end
This compiles without a problem, but it gives an error when I attempt to use it interactively as follows:
y=(1\1\1\2\2\2)
q = lagset(y)
w_mat(q)
w2mat(q)
The first two lines run fine, but when I run the last two of those lines, I get:
w_mat(): 3204 q[0,0] found where scalar required
<istmt>: - function returned error
What am I misunderstanding?
This particular error is unrelated to structures. Stata simply complains because the lagset() function is void. That is, it does not return anything. Thus, q ends up being empty, and when it is passed to w_mat(), which expects a scalar structure, you get the q[0,0] reference. To fix this, declare lagset() with a return type of `SL' scalar instead of void and end it with return(r).

Force some connections to be horizontal

I'm using DOT to visualize a lisp AST, and the picture that is generated currently looks like this:
Currently, the vertical lines are specified normally like parent -> child;, and the skewed ones are specified using constraint like so: parent -> child [constraint=false];.
This kind of works, but what I'm really looking for is a way to make the vertical connections stay the same where each connection puts the child one row downwards, but make the horizontal connections be actually horizontal. This would create something that looks more like this:
Is this possible?
You may be making it too complicated - this simple basic code does the job:
digraph so
{
# nodes
A[ label = "list" ];
B[ label = "ident: +" ];
C[ label = "literal: 1" ];
D[ label = "list" ];
E[ label = "ident: *" ];
F[ label = "literal: 3" ];
G[ label = "literal: 2" ];
# layout
{ rank = same; B C D }
{ rank = same; E F G }
# edges
A -> B;
B -> C -> D;
D -> E;
E -> F -> G;
}
Compiled with dot -T png -o so.png so.dot, this yields the layout you want.

How can I transform the code I wrote down below?

I am supposed to code the snake game in Java with Processing for IT classes, and since I had no idea how to do it I searched for a YouTube tutorial. I did find one, but it used the keys 'w','s','d','a' to move the snake around, whereas I want to use the arrow keys. Could someone explain to me how I transform this code:
if (keyPressed == true) {
int newdir = key=='s' ? 0 : (key=='w' ? 1 : (key=='d' ? 2 : (key=='a' ? 3 : -1)));
if(newdir != -1 && (x.size() <= 1 || !(x.get(1) ==x.get(0) + dx[newdir] && y.get (1) == y.get(0) + dy[newdir]))) dir = newdir;
}
into something like this:
void keyPressed () {
if (key == CODED) {
if (keyCode == UP) {}
else if (keyCode == RIGHT) {}
else if (keyCode == DOWN) {}
else if (keyCode == LEFT) {}
}
}
This is my entire coding so far:
ArrayList<Integer> x = new ArrayList<Integer> (), y = new ArrayList<Integer> ();
int w = 900, h = 900, bs = 20, dir = 1; // w = width ; h = height ; bs = blocksize ; dir = 1 --> so that the snake goes up when it starts
int[] dx = {0,0,1,-1} , dy = {1,-1,0,0};// down, up, right, left
void setup () {
size (900,900); // the 'playing field' is going to be 900x900px big
// the snake starts off on x = 5 and y = 30
x.add(5);
y.add(30);
}
void draw() {
//white background
background (255);
//
// grid
// vertical lines ; the lines are only drawn if they are smaller than 'w'
// the operator ++ increases the value 'l = 0' by 1
//
for(int l = 0 ; l < w; l++) line (l*bs, 0, l*bs, height);
//
// horizontal lines ; the lines are only drawn if they are smaller than 'h'
// the operator ++ increases the value 'l = 0' by 1
//
for(int l = 0 ; l < h; l++) line (0, l*bs, width, l*bs);
//
// snake
for (int l = 0 ; l < x.size() ; l++) {
fill (0,255,0); // the snake is going to be green
rect (x.get(l)*bs, y.get(l)*bs, bs, bs);
}
if(frameCount%5==0) { // will check it every 1/12 of a second -- will check it every 5 frames at a frameRate = 60
// adding points
x.add (0,x.get(0) + dx[dir]); // will add a new point x in the chosen direction
y.add (0,y.get(0) + dy[dir]); // will add a new point y in the chosen direction
// removing points
x.remove(x.size()-1); // will remove the previous point x
y.remove(y.size()-1); // will remove the previous point y
}
}
It's hard to answer general "how do I do this" type questions. Stack Overflow is designed for more specific "I tried X, expected Y, but got Z instead" type questions. That being said, I'll try to answer in a general sense:
You're going to have a very difficult time trying to take random code you find on the internet and trying to make it work in your sketch. That's not a very good way to proceed.
Instead, you need to take a step back and really think about what you want to happen. Instead of taking on your entire end goal at one time, try breaking your problem down into smaller steps and taking on those steps one at a time.
Step 1: Can you store the state of your game in variables? You might store things like the direction the snake is traveling, the location of the snake, etc.
Step 2: Can you write code that just prints something to the console when you press the arrow keys? You might do this in a separate example sketch instead of trying to add it directly to your full sketch.
Step 3: Can you combine those two steps and change the state of your sketch when an arrow key is pressed? Maybe you change the direction the snake is traveling.
The point is that you need to try something instead of trying to copy-paste random code without really understanding it. Break your problem down into small steps, and then post an MCVE of that specific step if you get stuck. Good luck.
You should take a look into Java API KeyEvent VK_LEFT.
And as pczeus already told you, you need to implement capturing of the keystrokes!

How do I convert a Stan model file to a Graphviz DOT file or another graphical representation?

I have a Stan file describing a hierarchical model. I would like to visualize this hierarchy with all parameters by converting the Stan code to a Graphviz DOT file. Another graphical representation will do fine as well.
Consider the following small example:
data {
int<lower=0> J; // number of items
int<lower=0> y[J]; // number of successes for j
int<lower=0> n[J]; // number of trials for j
}
parameters {
real<lower=0,upper=1> theta[J]; // chance of success for j
real<lower=0,upper=1> lambda; // prior mean chance of success
real<lower=0.1> kappa; // prior count
}
transformed parameters {
real<lower=0> alpha; // prior success count
real<lower=0> beta; // prior failure count
alpha <- lambda * kappa;
beta <- (1 - lambda) * kappa;
}
model {
lambda ~ uniform(0,1); // hyperprior
kappa ~ pareto(0.1,1.5); // hyperprior
theta ~ beta(alpha,beta); // prior
y ~ binomial(n,theta); // likelihood
}
generated quantities {
real<lower=0,upper=1> avg; // avg success
int<lower=0,upper=1> above_avg[J]; // true if j is above avg
int<lower=1,upper=J> rnk[J]; // rank of j
int<lower=0,upper=1> highest[J]; // true if j is highest rank
avg <- mean(theta);
for (j in 1:J)
above_avg[j] <- (theta[j] > avg);
for (j in 1:J) {
rnk[j] <- rank(theta,j) + 1;
highest[j] <- rnk[j] == 1;
}
}
Is there a way to parse this and convert it into a DOT language like file that I can draw to visualize the hierarchy?
I googled around a lot, and the closest thing I could find to a parser was inside the http://gephi.github.io/ project. Not sure if that helps.
What I want to end up with is something similar to this:
There is no tool for that in the Stan repository. Part of the reason is that, unlike in the BUGS family, such a graph is not necessary for Stan to operate. But such graphs are nice visualization tools, so if you wrote a converter I'm sure there would be interest in using it. My guess is the path of least resistance would involve converting the .stan file to the format expected by PyMC and using their graphing capabilities.
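If you do want to try writing a converter yourself, a very naive starting point could look like the Python sketch below. To be clear about the assumptions: this is not a Stan parser and not part of Stan or PyMC. It only pattern-matches unindexed statements of the form x ~ dist(...) and x <- expr; (or x = expr;) with regular expressions, one statement per line, and draws an edge from every identifier on the right-hand side to the variable on the left. Loops, indexed variables such as theta[j], and multi-line statements are ignored. The function names dependencies and to_dot are made up.

```python
import re

# A few statements taken from the example model in the question.
STAN_SNIPPET = """
alpha <- lambda * kappa;
beta <- (1 - lambda) * kappa;
lambda ~ uniform(0,1); // hyperprior
kappa ~ pareto(0.1,1.5); // hyperprior
theta ~ beta(alpha,beta); // prior
y ~ binomial(n,theta); // likelihood
"""

# x ~ dist(args);  captures x and the argument list (not the distribution name)
SAMPLING = re.compile(r"^\s*(\w+)\s*~\s*\w+\s*\((.*)\)\s*;")
# x <- expr;  or  x = expr;
ASSIGN = re.compile(r"^\s*(\w+)\s*(?:<-|=)\s*(.*);")
# Identifiers that are not immediately followed by "(" (i.e. not function calls).
IDENT = re.compile(r"\b[A-Za-z_]\w*\b(?!\s*\()")


def dependencies(stan_code):
    """Return sorted (parent, child) pairs found by naive pattern matching."""
    edges = set()
    for line in stan_code.splitlines():
        line = line.split("//")[0]  # drop trailing comments
        match = SAMPLING.match(line) or ASSIGN.match(line)
        if not match:
            continue
        target, rhs = match.groups()
        for name in IDENT.findall(rhs):
            if name != target:
                edges.add((name, target))
    return sorted(edges)


def to_dot(edges):
    """Render the dependency pairs as a DOT digraph."""
    lines = ["digraph model {"]
    lines += [f"    {src} -> {dst};" for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(dependencies(STAN_SNIPPET)))
```

The output can be fed to dot directly. Anything beyond toy models would need a real parser for the Stan grammar rather than regular expressions.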
