How to add this type of node description in Mermaid?

This is a flowchart pattern that I really like to use and I currently use drawio to draw it:
Notice that there are two kinds of descriptions in the flowchart:
description1: How does A get to B
description2: Some properties of B
I know Mermaid can implement description1 with:
graph TB
A --->|"description1:<br>How does A get to B"| B
But description2 is also very important to me, is there any way to achieve it?
The current workaround:
I use the heading of subgraph instead of description2:
graph TB
A --->|"description1:<br>How does A get to B"| B
subgraph description2:<br>Some properties of B
B
end
But I have to say it's a very ugly temporary solution, so I am asking here.

While some types of Mermaid diagrams explicitly support notes (e.g. sequence diagrams), flowcharts do not.
I believe the closest you're going to get is to connect B to itself with an invisible link (~~~):
graph TB
A --->|"description1:<br>How does A get to B"| B
B ~~~|"description2:<br>Some properties of B"| B

Related

Aligning Clusters within Graphviz

I am using Graphviz to automatically create an architecture diagram. I am having the following two problems and was hoping to get assistance.
I am using UUID to uniquely identify a component (example: "a5320de8-a320-11ea-bb37-0242ac130002" [label="Component A"]). When mapping A -> B, I'll get "Component A" -> b0c5e47c. Which is strange. The only way that I've been able to map UUID to UUID is to put quotes around them. Any suggestions?
I want to align clusters in a specific manner and specific direction. I've tried {rank=same; cluster_B, cluster_C, cluster_D}; and "9653369c-a322-11ea-bb37-0242ac130002" -> "aa31adb9-9621-40c2-855c-621832dd8c61" [style=invis] But neither work.
I have three sections within my dot file, they are:
Components (within this section, I list out all 100+ components and color code them based on a specific rule).
Clusters (within this section, I cluster the components into specific 'groupings')
Diagram or mapping (within this section, I then map the different components and clusters).
Here is a sample of my DOT file.
digraph architecture {
#graph [rankdir=LR]
compound=true;
#Compliant
node[fillcolor="#013220" style="filled" shape=square fontcolor="white"];
"a5320de8-a320-11ea-bb37-0242ac130002" [label="Component A"]
"b0c5e47c-a320-11ea-bb37-0242ac130002" [label="Component B"]
#Clusters
#Customer-facing client application cluster
subgraph cluster_A{
label="Client Apps";
"f7b3915d-6b3d-4d4c-bef0-bdabda915c03";
"9912de2b-739a-4c5c-834e-e0c3d09d70d1";
"16bb2066-9293-470e-99ec-c59d8426c0ab";
"641a6601-f4f6-4c06-baa6-e5e232f8abed";
"c5e92b09-a470-4fb6-af5c-e5f7dbeff919";
}
#Diagrams
"f7b3915d-6b3d-4d4c-bef0-bdabda915c03" -> {"35305026-d285-458c-85ad-7eae4e785e84", "76e0e679-42a6-47f0-9164-abc223da07fe"};
"76e0e679-42a6-47f0-9164-abc223da07fe" -> "35305026-d285-458c-85ad-7eae4e785108";
}
I get something like:
However, I want to arrange the clusters in a specific way, like:
As you found out, hyphens are not legal characters in a node ID unless the string is quoted. If you want more info: https://www.graphviz.org/doc/info/lang.html
There is no straightforward way to align clusters. Sometimes you can force the desired alignment by embedding multiple clusters within another cluster to "shrink-wrap" them. For example, embed B3 and B4 within cluster B34. But there are no guarantees.
You can use gvpr to reposition clusters (and their contents) but that can get pretty complex.
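Since the DOT file in the question is generated automatically, the quoting rule can be applied in the generator instead of by hand. Below is a minimal Python sketch of that idea. The helper names quote_dot_id and edge are made up for illustration, and the check is deliberately conservative: it quotes anything that is not a plain identifier or numeral, including DOT keywords.

```python
import re

# Bare DOT IDs: letters, digits and underscores, not starting with a digit.
_PLAIN_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")
# Numerals such as 42, -3.5 or .5 are also legal without quotes.
_NUMERAL = re.compile(r"-?(\.[0-9]+|[0-9]+(\.[0-9]*)?)\Z")
# DOT keywords are case-insensitive and must be quoted to be used as IDs.
_KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}


def quote_dot_id(name: str) -> str:
    """Return name unchanged if it is a legal bare DOT ID, else double-quote it."""
    is_bare = _PLAIN_ID.match(name) or _NUMERAL.match(name)
    if is_bare and name.lower() not in _KEYWORDS:
        return name
    # Inside a quoted DOT string only the double quote needs escaping.
    return '"' + name.replace('"', '\\"') + '"'


def edge(src: str, dst: str) -> str:
    """Format one directed edge statement with safely quoted endpoints."""
    return f"{quote_dot_id(src)} -> {quote_dot_id(dst)};"
```

With this, a UUID such as a5320de8-a320-11ea-bb37-0242ac130002 is always emitted in quotes, while an ID such as cluster_A is left bare.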

How do I manage missing data due to logic question SPSS?

I had a question on my questionnaire that asked living arrangements and if participants selected option B or C they got asked another question. All data on SPSS for those who selected A is coming up as a '.' as they obviously did not get to see the next question.
Should I just write all missing answers as a number e.g. 3 and then label the number 3 as 'ignore' or 'NA' in variable view or is there something else I can do?
Please note I'm a 3rd year university student who is awful at SPSS so I don't know all the technical terms! Hope someone can help.
Thanks.
As mentioned in the comments, it really depends on how you intend on using your data. That said, one approach is to assign values to your system missings to increase flexibility. I'll riff an example based on your description.
Q1) Please describe your living arrangements.
a. I live alone.
b. I live with family/roommates and am the head of household.
c. I live with family/roommates, but am not the head of household.
[If Q1=a Then skip]
Q2) How many other members, excluding yourself, do you live with?
a. 1 b. 2 c. 3 d. 4 or more
Q3) Are any of these [Q2 count] members non-family roommates?
a. Yes
b. No
Since Q2 is a count of people in household (excluding self), you might recode those living on their own to actually answer the question.
RECODE Q2 (SYSMIS=0).
EXE .
Or, if there is a need to differentiate (either because not all SYSMIS should be 0 or because you want to keep track of why they are 0), assign a distinct value and declare it user-missing. You can then toggle those missing values on or off as needed.
IF (Q1='a') Q2=0.001 .
EXE .
ADD VALUE LABELS Q2 .001 'Lives Alone' .
MISSING VALUES Q2 (.001) .
For Q3, it really is more of an N/A. In that case you might choose an arbitrary value (-1, 99, etc) for tracking purposes and always keep those missing values set.
IF (Q1='a' AND SYSMIS(Q3)) Q3=99 .
EXE .
ADD VALUE LABELS Q3 99 'N/A, lives alone' .
MISSING VALUES Q3 (99) .

Force Graphviz to complain for duplicate nodes

I just noticed that duplicate node names (even if labeled uniquely) get processed without complaint by Graphviz. For example, consider the following simple graph as rendered (with circo) in the image below:
graph {
a [label="a1"]
a [label="a2"]
b
c
d
e
a -- b;
b -- c;
a -- c;
d -- c;
e -- c;
e -- a;
}
I want the above graph to have two nodes: a1 and a2. So I know I should instantiate them with unique names (different than what I did above). But in a large graph, I may not notice that I mistakenly instantiated two different nodes with identical names. So if I do something like this, I'd like to force Graphviz to complain about it or bring it to my attention somehow, maybe with a warning or an error message.
How do I accomplish that?
All the graphviz programs silently merge nodes with duplicate names and I cannot find any way to have them produce a warning when they do that. Since we only have to find the cases where nodes are declared by themselves, however, rather than nodes that are implicitly declared when an edge is declared (in which case duplication is normal and expected), we just have to find all the node names and identify the duplicates.
If no more than one node is ever declared on a line, this could be done with the following script:
#!/bin/sh
sed -n 's/^[\t ][\t ]*\([_a-zA-Z][_a-zA-Z0-9]*\) *\(\[.*\)*;*$/\1/ p' | \
sort | uniq -c | awk '$1>1'
If we call this script findDupNodes, we can run it as follows:
$ findDupNodes <duplicates.gv
2 a
The script finds node names that are either declared by themselves or with a list of attributes that starts with [, sorts them, counts how many times each is declared (with uniq -c) and filters out the ones that are declared only once.
Multiple nodes can be declared on a single line (e.g. a; b; c; d;) but this script does not handle that case, or (probably) some other cases, most of which would probably require a full-blown DOT language parser.
Nevertheless, this script should find many of the duplicate node names that might find their way into hand-written graphviz scripts.
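For comparison, here is a rough Python sketch of the same idea that also copes with several statements on one line. It is not a DOT parser: it assumes unquoted node names, splits naively on semicolons (so a semicolon inside a label would confuse it), and skips the keywords node, edge and graph, which is an addition not present in the shell script. The function name find_duplicate_nodes is made up for this example.

```python
import re
from collections import Counter

# A standalone node statement: a bare ID, optionally followed by an attribute list.
NODE_STMT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(\[.*\])?\s*$")
# Keywords that look like node names but set defaults or open a graph.
KEYWORDS = {"graph", "digraph", "subgraph", "node", "edge", "strict"}


def find_duplicate_nodes(dot_text: str) -> dict[str, int]:
    """Return {node_name: count} for nodes declared on their own more than once."""
    counts = Counter()
    for line in dot_text.splitlines():
        # Split "a; b; c;" into separate statements (naive, see caveats above).
        for stmt in line.split(";"):
            match = NODE_STMT.match(stmt)
            if match and match.group(1).lower() not in KEYWORDS:
                counts[match.group(1)] += 1
    # Edge statements such as "a -- b" never match, so implicit declarations are ignored.
    return {name: n for name, n in counts.items() if n > 1}
```

Run on the example graph from the question, it reports that a is declared twice and nothing else.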

SPSS: Can I generate graphs for multiple variables using a single syntax input with the GRAPH command?

I was wondering if it was possible to create graphs for multiple variables in a single syntax command in SPSS:
GRAPH
/HISTOGRAM(NORMAL)=
As it is, I'm creating multiple graphs as such:
GRAPH
/HISTOGRAM(NORMAL)=CO
GRAPH
/HISTOGRAM(NORMAL)=Min_last
GRAPH
/HISTOGRAM(NORMAL)=Day_abs
etc etc.
If I would do something along the lines of:
GRAPH
/HISTOGRAM(NORMAL)=CO Min_last Day_abs
and it would generate a graph for each variable, I'd be pretty happy.
Anyways, let me know if you think it's possible or if I need to provide more info. Thanks for reading!
If you just want to save typing and want an independent set of graphs, you can define a macro like this.
define !H (!positional !cmdend)
!do !i !in (!1)
graph /histogram(normal)=!i.
!doend
!enddefine.
and invoke it with a list of variables.
!H salary salbegin.
The way I like to do it is to reshape the data so all three variables are stacked in a single column using VARSTOCASES, and then either panel the charts in small multiples (if you want the axes to be the same) or use SPLIT FILE to produce separate charts. An example of the split-file approach is below:
*Making fake data.
INPUT PROGRAM.
LOOP #i = 1 TO 100.
COMPUTE CO = RV.NORMAL(0,1).
COMPUTE Min_last = RV.UNIFORM(0,1).
COMPUTE Days_abs = RV.POISSON(5).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
*Reshaping to long.
VARSTOCASES /MAKE V FROM CO Min_last Days_abs /INDEX VLab (V).
*Split file and build separate charts.
SORT CASES BY VLab.
SPLIT FILE BY VLab.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=V
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: V=col(source(s), name("V"))
GUIDE: axis(dim(1), label("Value"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(V))), shape.interior(shape.square))
END GPL.
SPLIT FILE OFF.

What's the best way to inherit properties in a tree-based structure?

I have a simple CMS that has a simple tree hierarchy:
We have pages A through E that have the following hierarchy:
A -> B -> C -> D -> E
All the pages are the same class, and have a parent-child relationship.
Now, let's say I have a property I want inherited among the pages. Let's say A is red:
A (red) -> B -> C -> D -> E
In this case, B through E would inherit "red".
Or a more complex scenario:
A (red) -> B -> C (blue) -> D -> E
B would inherit red, and D/E would both be blue.
What would be the best way to solve something like this? I have a tree structure with over 6,000 leaves, and about 100 of those leaves have inheritable properties. Those 100 or so leaves have their properties saved in the database. For leaves without explicit properties, I look up the ancestors and use memcached to save the properties. Then there are overly complex algorithms to handle expiring those caches. It's terribly convoluted and I'd like to refactor to a cleaner solution / data structure.
Does anybody have any ideas?
Thanks!
There is a data model that lets you express this kind of information well: RDF/RDFS. RDF is a W3C standard for modelling data as triples (subject, predicate, object) identified by URIs, and RDFS, among other things, lets you describe class hierarchies and property hierarchies. The good thing is that there are many libraries out there that help you create and query this type of data.
For instance, if I want to say that a specific document, doc:lion, is of class class:mamal and that doc:programmer is of class class:geek, I could say:
doc:lion rdf:type class:mamal .
doc:programmer rdf:type class:geek .
Now I could declare a hierarchy of classes, and say that every mamal is an animal and every animal is a living thing.
class:mamal rdfs:subClassOf class:animal .
class:animal rdfs:subClassOf class:LivingThing .
And, that every geek is a human and that every human is living thing:
class:geek rdfs:subClassOf class:human .
class:human rdfs:subClassOf class:LivingThing .
There is a language, similar to SQL, called SPARQL to query this kind of data, so for instance if I issue the query:
SELECT * WHERE {
?doc rdf:type class:LivingThing .
}
Here ?doc is a variable that binds to anything of type class:LivingThing. The result of this query would be doc:lion and doc:programmer, because the database follows the semantics of RDFS: by computing the closure of the class hierarchy, it knows that doc:lion and doc:programmer are both of type class:LivingThing.
In the same way the query:
SELECT * WHERE {
doc:lion rdf:type ?class .
}
will tell me that doc:lion has rdf:type class:mamal, class:animal and class:LivingThing.
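To make the "closure of classes" step concrete, here is a small pure-Python sketch of what an RDFS-aware store does conceptually when answering the two queries above. It is only an illustration, not how any particular RDF library is implemented. The dictionaries are a hand-made copy of the example triples (using the lower-case class:geek spelling from the subclass triples), and the function names all_types and instances_of are invented for this example.

```python
# Hand-made copy of the rdfs:subClassOf triples from the example above.
SUBCLASS_OF = {
    "class:mamal": {"class:animal"},
    "class:animal": {"class:LivingThing"},
    "class:geek": {"class:human"},
    "class:human": {"class:LivingThing"},
}
# Hand-made copy of the rdf:type triples.
TYPE_OF = {"doc:lion": "class:mamal", "doc:programmer": "class:geek"}


def all_types(doc: str) -> set[str]:
    """Return the declared class of doc plus every superclass reachable from it."""
    seen = set()
    stack = [TYPE_OF[doc]]
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            # Follow subClassOf links upwards; classes without parents end the walk.
            stack.extend(SUBCLASS_OF.get(cls, ()))
    return seen


def instances_of(cls: str) -> set[str]:
    """Return every document whose inferred types include cls."""
    return {doc for doc in TYPE_OF if cls in all_types(doc)}
```

Here instances_of("class:LivingThing") corresponds to the first query and all_types("doc:lion") to the second.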
In the same way, RDFS lets you create hierarchies of properties. For example, you can say:
doc:programmer doc:studies doc:computerscience .
doc:lion doc:instint doc:hunting .
And we can say that both properties doc:studies and doc:instint are sub-properties of doc:knows:
doc:studies rdfs:subPropertyOf doc:knows .
doc:instint rdfs:subPropertyOf doc:knows .
With the query:
SELECT * WHERE {
?s doc:knows ?o .
}
We will get that a lion knows how to hunt and programmers know computer science.
Most RDF/RDFS databases can easily deal with the number of elements you mentioned in your question, and there are many options to start with. If you are a Java person you could have a look at Jena; there are also frameworks for .NET like this one, or RDFLib for Python.
But most importantly, have a look at the documentation of your CMS, because there may be plugins to export metadata as RDF. Drupal, for instance, is quite advanced in this respect (see http://drupal.org/project/rdf).
If your problem is performance-related...
I assume you'd want to save on memory of all these inheritable properties (or perhaps you have a lot of them), otherwise this can be trivially solved with virtual properties.
If you need sparse inheritable properties, say if you are modelling how HTML DOM properties or CSS properties propagate, you'll need to:
Keep a pointer to the parent node (for walking upwards)
Use a hash dictionary to store the properties inside each class (or each instance, depending on your needs), keyed by name
If the properties don't vary by instance, use a class-static dictionary
If the properties can be overridden instance-by-instance, add an instance dictionary on top
When accessing a property, start finding it at the leaf, look in the instance dictionary first, then the class-static dictionary, then walk up the tree
Of course you can add more functionalities on top of this. This is similar to how Windows Presentation Foundation solves this problem via DependencyProperty.
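As a rough illustration of the lookup order described above, here is a minimal Python sketch. The class name Page, the class_props dictionary and the get method are made up for this example and do not come from any particular CMS.

```python
class Page:
    # Class-static dictionary: properties shared by every instance of the class.
    class_props: dict = {}

    def __init__(self, name, parent=None, **props):
        self.name = name
        self.parent = parent      # pointer to the parent node, None for the root
        self.props = dict(props)  # sparse per-instance overrides, keyed by name

    def get(self, key, default=None):
        """Look up key at this node, then walk up the tree until it is found."""
        node = self
        while node is not None:
            if key in node.props:                # instance dictionary first
                return node.props[key]
            if key in type(node).class_props:    # then the class-static dictionary
                return type(node).class_props[key]
            node = node.parent                   # then the parent
        return default


# The A (red) -> B -> C (blue) -> D -> E example from the question.
a = Page("A", color="red")
b = Page("B", parent=a)
c = Page("C", parent=b, color="blue")
d = Page("D", parent=c)
e = Page("E", parent=d)
```

With this setup b.get("color") resolves to "red", while d.get("color") and e.get("color") resolve to "blue". Only the pages with explicit properties store anything, so there is no cache to expire when a property changes.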
If your problem is database-related...
If instead your problem is to avoid reading the database to walk up the tree (i.e. loading the parents to find inherited properties), you'll need to do some sort of caching for the parent values. Or alternatively, when you load a leaf from the database, you can load all its parents and create a master merged properties dictionary in memory.
If you want to avoid multiple database lookups to find each parent, one trick is to encode the path to each node into a text field, e.g. "1.2.1.3.4" for a leaf on the 6th level. Then load only the nodes whose paths are prefixes of that path. This retrieves the entire chain of parents in one SQL query.
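A possible sketch of the path trick using Python's built-in sqlite3 module is shown below. The table name page_props and its columns are hypothetical, and the sketch assumes each stored path includes the node's own id as its last component, so adjust it if your paths only list the ancestors.

```python
import sqlite3


def ancestor_paths(path: str) -> list[str]:
    """Return every proper prefix of a dotted path, e.g. '1.2.1' -> ['1', '1.2']."""
    parts = path.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def load_inherited(conn: sqlite3.Connection, path: str) -> dict:
    """Fetch the properties of a node and all its ancestors in one query and merge them."""
    paths = ancestor_paths(path) + [path]
    placeholders = ",".join("?" * len(paths))
    rows = conn.execute(
        # page_props(path, key, value) is a hypothetical table of explicit properties.
        f"SELECT path, key, value FROM page_props WHERE path IN ({placeholders}) "
        "ORDER BY length(path)",
        paths,
    ).fetchall()
    merged = {}
    for _path, key, value in rows:
        # Rows arrive root first, so deeper nodes override their ancestors.
        merged[key] = value
    return merged
```

Ordering by path length works because every ancestor path is a strictly shorter prefix of the node's path, so the rows come back from the root down to the node itself.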

Resources