I've been googling around for the last 15 minutes trying to find an answer to this. But I can't seem to figure it out.
I was tasked with building some small flowcharts for some applications I've developed at work. They don't need anything fancy, because they are going to convert them into their preferred format in Visio. They even said we could do it with pen and paper. So I figured I would play around with graphviz/dot.
They have 6 pre-defined shapes/colors that they like to use, so I figured I would use them. I've already built them all in dot, but if I plan to re-use them many times, I'd like to find a way to save them as a sort of template.
Is that possible?
For example, these are the predefined shapes:
digraph G {
node [color="#4271C6"]
process [
shape=Mrecord,
style=filled, fillcolor="#E1F4FF",
label="{1. Process\l | Description}"];
subprocess [
shape=record,
style=filled, color="#FFFFFF", fillcolor="#A5A5A5",
label="| Sub-Process |"];
database [
shape=cylinder, color="#18589A",
label="Database"];
inputoutput [
shape=polygon,
style=filled, fontcolor=white,
fixedsize=true, skew=0.3, margin=0,
width=2, label="Input / Output"];
file [
shape=folder,
label="File"];
external [
shape=box3d,
label="External entity"];
}
Unfortunately, there is no way to define macros or objects and reuse them, especially across multiple graphs. However, there are ways using other tools. Some folks use m4 (the macro language) or cpp (the C pre-processor). Both work, but there are potential OS issues. Python, awk, ... would also work.
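For illustration, here is a minimal sketch of that pre-processing idea using Scala as the generating language (any of the tools above would do just as well); the object name, style strings and node names are made up for this example:

object FlowchartGen {
  // Reusable "style" strings; their contents are illustrative, not a fixed convention.
  val process    = "shape=Mrecord, style=filled, fillcolor=\"#E1F4FF\""
  val subprocess = "shape=record, style=filled, color=\"#FFFFFF\", fillcolor=\"#A5A5A5\""
  val database   = "shape=cylinder, color=\"#18589A\""

  // Build one node statement from a reusable style string.
  def node(name: String, style: String, label: String): String =
    s"""  $name [$style, label="$label"];"""

  def main(args: Array[String]): Unit = {
    val dot = Seq(
      "digraph G {",
      "  node [color=\"#4271C6\"];",
      node("createfile", process, "{1. Process\\l | Create file}"),
      node("exportfile", subprocess, "|Export data|"),
      node("db", database, "Database"),
      "  createfile -> exportfile -> db;",
      "}"
    ).mkString("\n")
    println(dot) // pipe the output into: dot -Tpng > flowchart.png
  }
}

The point is simply that a general-purpose language gives you named, reusable style definitions, which DOT itself lacks.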
Here is a gvpr program (gvpr is part of the Graphviz package) that also does what you want (I think). First, the input graph:
digraph pre{
a [_type=process label="{1. Process\l | Something}"]
b [_type=process label="{2. Process\l | Something else}"]
c [_type=subprocess label="do it"]
d [_type=database label="lots of data"]
e [_type=database label="a bit of data"]
f [_type=inputoutput label="inOut"]
g [_type=file label="nail file"]
h [_type=external label="outside"]
a->b->c->d->e->f->g->h
}
The gvpr program:
BEG_G{
$G.newrank="true";
}
N{
$.color="#4271C6"; // default
}
N[_type=="process"]{
$.shape="Mrecord";
$.style="filled";
$.fillcolor="#E1F4FF";
// maybe redo $.label
}
N[_type=="subprocess"]{
$.shape="record";
$.style="filled";
$.color="#FFFFFF";
$.fillcolor="#A5A5A5";
$.label=sprintf("|%s|", $.label); // embed in pipes
}
N[_type=="database"]{
$.shape="cylinder";
$.color="#18589A";
}
N[_type=="inputoutput"]{
$.shape="polygon";
$.style="filled";
$.fontcolor="white";
$.fixedsize="true";
$.skew="0.3";
$.margin="0";
$.width="2";
}
N[_type=="file"]{
$.shape="folder";
}
N[_type=="external"]{
$.shape="box3d";
}
Produces:
There may currently be problems with gvpr on Windows, but I know the development team is working on it.
Here is the command line:
gvpr -c -f predefined.gvpr predefined2.gv | dot -Tpng > predefined2.png
Okay, so I figured it out. I didn't realize you could do this, but apparently you can break up a node definition into multiple parts, so this is what I came up with, which solves my problem.
I have a "Styles" section that goes at the top. Here I can define each node style. I use comments as a way of naming them. And I don't need to copy-paste, because I can just define multiple nodes as a comma-separated list.
I also found that you can put them into subgraphs as well, like subgraph style_file {...}. But it seemed simpler to just use a comment as a way to name the style.
digraph G {
newrank=true;
///////////////////////////////////////////////////////////
// Styles
///////////////////////////////////////////////////////////
node [color="#4271C6"];
edge [color="#4271C6"];
//process
createfile, uploadfile
[shape=Mrecord, style=filled, fillcolor="#E1F4FF"];
//subprocess
exportfile, wait
[shape=record, style=filled, color="#FFFFFF", fillcolor="#A5A5A5"];
//external
ftp
[shape=box3d];
//datastore
database
[shape=cylinder, color="#18589A"];
//io
exportproc
[shape=polygon, style=filled, fontcolor=white, margin=0, width=3.1, fixedsize=true, skew=0.3];
//file
workfile
[shape=folder];
///////////////////////////////////////////////////////////
// Clusters
///////////////////////////////////////////////////////////
subgraph cluster_0 {
createfile [label="{1. Process\l | Create file}"];
exportfile [label="|Export Data\nfrom DB|"];
database [label="Database"];
exportproc [label="Export Data"];
workfile [label="Generated file\n(Archived on server)"];
}
subgraph cluster_1 {
uploadfile [label="{2. Process\l | Upload file}"];
ftp [label="FTP Server"];
wait [label="|Wait for\nresponse file|"];
}
///////////////////////////////////////////////////////////
// Ranks
///////////////////////////////////////////////////////////
{
rank=same;
createfile;
uploadfile;
}
///////////////////////////////////////////////////////////
// Relationships
///////////////////////////////////////////////////////////
# cluster_0
createfile -> exportfile;
exportfile -> database;
database -> exportproc;
exportproc -> workfile [style=dashed];
workfile -> uploadfile;
# cluster_1
uploadfile -> ftp [style=dashed];
ftp -> wait;
}
Which produces this:
No affiliation, but the Excel to Graphviz application can create re-usable styles as can be seen in this screenshot:
This question already has an answer here:
Distribute nodes on the same rank of a wide graph to different lines
I have the following graph:
digraph G {
user1 -> SuperUser
user2 -> SuperUser
user3 -> SuperUser
user4 -> SuperUser
user5 -> SuperUser
user6 -> SuperUser
user7 -> SuperUser
user8 -> SuperUser
user9 -> SuperUser
user10 -> SuperUser
user11 -> SuperUser
user12 -> SuperUser
user13 -> SuperUser
}
And I render it using:
$ dot -Tpng test_dot -o test_dot.png
Is there a way to avoid a rendering that is so horizontal?
I know that I could use rankdir = LR, but I think my problem would be the same.
I want the nodes organised on more than one level; is that possible?
Edit: tk421's answer is good, but I forgot to add that my graph is pretty big and has an unpredictable size, so the solution can't be "manual".
Yes. You can use rank and invisible links (style = invis) to create levels like so:
digraph G {
user1 -> SuperUser
user2 -> SuperUser
user3 -> SuperUser
user4 -> SuperUser
user5 -> SuperUser
user6 -> SuperUser
user7 -> SuperUser
user8 -> SuperUser
user9 -> SuperUser
user10 -> SuperUser
user11 -> SuperUser
user12 -> SuperUser
user13 -> SuperUser
user5 -> user4 [ style = invis ];
user9 -> user10 [ style = invis ];
{ rank = same; user1; user2; user3; user4 }
{ rank = same; user5; user6; user7; user8; user9 }
{ rank = same; user10; user11; user12; user13}
}
This would produce:
Of course, you could play around with this to get it to look how you want.
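Since the question mentions that the graph is big and generated automatically, the same rank/invis trick can be emitted by a small script instead of by hand. Here is a rough sketch in Scala (the node names and the choice of four nodes per row are arbitrary assumptions):

object WrapRanks {
  def main(args: Array[String]): Unit = {
    val users = (1 to 13).map(i => s"user$i")
    val rows  = users.grouped(4).toList

    val edges = users.map(u => s"  $u -> SuperUser")
    // Invisible edges from the last node of one row to the first of the next
    // keep the rows stacked vertically.
    val invis = rows.sliding(2).collect {
      case Seq(a, b) => s"  ${a.last} -> ${b.head} [ style = invis ];"
    }.toList
    val ranks = rows.map(r => s"  { rank = same; ${r.mkString("; ")} }")

    println(("digraph G {" +: (edges ++ invis ++ ranks) :+ "}").mkString("\n"))
  }
}

Piping the printed DOT through dot then gives the same stacked layout without hand-written constraints.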
There are also other layout tools as part of the Graphviz package.
For example, if you want a more circular graph, you can use twopi instead of dot.
$ twopi -Granksep=2 sample.dot -o twopi.png
Refer to Graphviz's Documentation for more information.
I'm using Spark 2.1 on a YARN cluster. I have an RDD that contains data I would like to complete based on other RDDs (which correspond to different Mongo databases that I get through https://github.com/mongodb/mongo-hadoop/wiki/Spark-Usage, but I don't think that is important, I just mention it in case).
My problem is that the RDD I have to use to complete the data depends on the data itself, because the data contains the database to use. Here is a simplified example of what I have to do:
/*
* The RDD which needs information from databases
*/
val RDDtoDevelop = sc.parallelize(Array(
Map("dbName" -> "A", "id" -> "id1", "other data" -> "some data"),
Map("dbName" -> "C", "id" -> "id6", "other data" -> "some other data"),
Map("dbName" -> "A", "id" -> "id8", "other data" -> "some other other data")))
.cache()
/*
* Artificial databases for the example. Actually, mongo-hadoop is used. https://github.com/mongodb/mongo-hadoop/wiki/Spark-Usage
* This means that generating these RDDs is costly, so we don't want to generate all possible RDDs, only the ones that are needed
*/
val A = sc.parallelize(Array(
Map("id" -> "id1", "data" -> "data1"),
Map("id" -> "id8", "data" -> "data8")
))
val B = sc.parallelize(Array(
Map("id" -> "id1", "data" -> "data1bis"),
Map("id" -> "id5", "data" -> "data5")
))
val C = sc.parallelize(Array(
Map("id" -> "id2", "data" -> "data2"),
Map("id" -> "id6", "data" -> "data6")
))
val generateRDDfromdbName = Map("A" -> A, "B" -> B, "C" -> C)
and the desired output is:
Map(dbName -> A, id -> id8, other data -> some other other data, new data -> data8)
Map(dbName -> A, id -> id1, other data -> some data, new data -> data1)
Map(dbName -> C, id -> id6, other data -> some other data, new data -> data6)
Since nested RDDs are not possible, I would like to find the best way to make use of Spark's parallelism. I thought of two solutions.
The first is to create a collection with the contents of the needed databases, then convert it to an RDD to benefit from RDD scalability (if the collection doesn't fit into driver memory, I could do it in several passes). Finally, do a join and filter the content on id.
The second is to get the RDDs from all needed databases, key them by dbName and id, and then do the join.
Here is the Scala code:
Solution 1
// Get all needed DB
val dbList = RDDtoDevelop.map(map => map("dbName")).distinct().collect()
// Fill a list with key value pairs as (dbName,db content)
var dbContents = List[(String,Array[Map[String,String]])]()
dbList.foreach(dbName => dbContents = (dbName,generateRDDfromdbName(dbName).collect()) :: dbContents)
// Generate a RDD from this list to benefit to advantages of RDD
val RDDdbs = sc.parallelize(dbContents)
// Key the initial RDD by dbName and join with the contents of dbs
val joinedRDD = RDDtoDevelop.keyBy(map => map("dbName")).join(RDDdbs)
// Check for matched ids between RDD data to develop and dbContents
val result = joinedRDD.map({ case (s,(maptoDeveleop,content)) => maptoDeveleop + ("new data" -> content.find(mapContent => mapContent("id") == maptoDeveleop("id")).get("data"))})
Solution 2
val dbList = RDDtoDevelop.map(map => map("dbName")).distinct().collect()
// Create the list of the database RDDs keyed by (dbName, id)
var dbRDDList = List[RDD[((String,String),Map[String,String])]]()
dbList.foreach(dbName => dbRDDList = generateRDDfromdbName(dbName).keyBy(map => (dbName,map("id"))) :: dbRDDList)
// Create a RDD containing all dbRDD
val RDDdbs = sc.union(dbRDDList)
// Join the initial RDD based on the key with the dbRDDs
val joinedRDD = RDDtoDevelop.keyBy(map => (map("dbName"), map("id"))).join(RDDdbs)
// Reformate the result
val result = joinedRDD.map({ case ((dbName,id),(maptoDevelop,dbmap)) => maptoDevelop + ("new data" -> dbmap("data"))})
Both of them give the desired output. To my mind, the second one seems better, since the matching of the db and of the id uses Spark's parallelism, but I'm not sure about that. Could you please help me choose the best one, or, even better, give me clues for a better solution than mine?
Any other comment is appreciated (it's my first question on the site ;) ).
Thanks in advance,
Matt
I would suggest you convert your RDDs to DataFrames; then joins, distinct and the other functions you want to apply to the data become very easy.
DataFrames are distributed, and in addition to the DataFrame APIs, SQL queries can be used. More information can be found in the Spark SQL, DataFrames and Datasets Guide and in Introducing DataFrames in Apache Spark for Large Scale Data Science. Moreover, the foreach and collect calls that make your code run slow won't be needed.
An example of converting RDDtoDevelop to a DataFrame is below.
val RDDtoDevelop = sc.parallelize(Array(
Map("dbName" -> "A", "id" -> "id1", "other data" -> "some data"),
Map("dbName" -> "C", "id" -> "id6", "other data" -> "some other data"),
Map("dbName" -> "A", "id" -> "id8", "other data" -> "some other other data")))
.cache()
Converting the above RDD to a DataFrame:
// toDF needs the implicit encoders; assuming a SparkSession named spark (as in spark-shell):
import spark.implicits._
val developColumns=RDDtoDevelop.take(1).flatMap(map=>map.keys)
val developDF = RDDtoDevelop.map{value=>
val list=value.values.toList
(list(0),list(1),list(2))
}.toDF(developColumns:_*)
And the DataFrame looks as below:
+------+---+---------------------+
|dbName|id |other data |
+------+---+---------------------+
|A |id1|some data |
|C |id6|some other data |
|A |id8|some other other data|
+------+---+---------------------+
Converting your A RDD to a DataFrame is as below.
Source code for A:
val A = sc.parallelize(Array(
Map("id" -> "id1", "data" -> "data1"),
Map("id" -> "id8", "data" -> "data8")
))
DataFrame code for A:
val aColumns=A.take(1).flatMap(map=>map.keys)
val aDF = A.map{value =>
val list=value.values.toList
(list(0),list(1))
}.toDF(aColumns:_*).withColumn("name", lit("A"))
A new column, name, containing the database name is added, so that the final join with developDF matches on the correct database.
Output for DataFrame A:
+---+-----+----+
|id |data |name|
+---+-----+----+
|id1|data1|A |
|id8|data8|A |
+---+-----+----+
You can convert B and C in similar ways.
Source for B:
val B = sc.parallelize(Array(
Map("id" -> "id1", "data" -> "data1bis"),
Map("id" -> "id5", "data" -> "data5")
))
DataFrame for B:
val bColumns=B.take(1).flatMap(map=>map.keys)
val bDF = B.map{value =>
val list=value.values.toList
(list(0),list(1))
}.toDF(bColumns:_*).withColumn("name", lit("B"))
Output for B:
+---+--------+----+
|id |data |name|
+---+--------+----+
|id1|data1bis|B |
|id5|data5 |B |
+---+--------+----+
Source for C:
val C = sc.parallelize(Array(
Map("id" -> "id2", "data" -> "data2"),
Map("id" -> "id6", "data" -> "data6")
))
DataFrame code for C:
val cColumns=C.take(1).flatMap(map=>map.keys)
val cDF = C.map{value =>
val list=value.values.toList
(list(0),list(1))
}.toDF(cColumns:_*).withColumn("name", lit("C"))
Output for C:
+---+-----+----+
|id |data |name|
+---+-----+----+
|id2|data2|C |
|id6|data6|C |
+---+-----+----+
After the conversion, A, B and C can be merged using union
var unionDF = aDF.union(bDF).union(cDF)
Which would be
+---+--------+----+
|id |data |name|
+---+--------+----+
|id1|data1 |A |
|id8|data8 |A |
|id1|data1bis|B |
|id5|data5 |B |
|id2|data2 |C |
|id6|data6 |C |
+---+--------+----+
Then it's just a matter of joining developDF and unionDF, after renaming the id column of unionDF so that it can be dropped later on.
unionDF = unionDF.withColumnRenamed("id", "id1")
unionDF = developDF.join(unionDF, developDF("id") === unionDF("id1") && developDF("dbName") === unionDF("name"), "left").drop("id1", "name")
Finally we have
+------+---+---------------------+-----+
|dbName|id |other data |data |
+------+---+---------------------+-----+
|A |id1|some data |data1|
|C |id6|some other data |data6|
|A |id8|some other other data|data8|
+------+---+---------------------+-----+
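As mentioned at the start, the same join can also be expressed as an SQL query over temporary views. A brief sketch (the view names are arbitrary, and a SparkSession named spark is assumed, as in spark-shell):

// Sketch of the SQL route: register temp views and express the join as a query.
// Uses the DataFrames built above, before the final join; view names are arbitrary.
val dbsDF = aDF.union(bDF).union(cDF)
developDF.createOrReplaceTempView("develop")
dbsDF.createOrReplaceTempView("dbs")

val sqlResult = spark.sql(
  """SELECT d.*, dbs.data
    |FROM develop d
    |LEFT JOIN dbs ON d.id = dbs.id AND d.dbName = dbs.name""".stripMargin)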
You can do whatever you need with the result after that.
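For example, if you want the Map-based RDD shape from the question back, one possible sketch (assuming the final DataFrame has the columns shown above):

// Sketch: map the joined DataFrame rows back to the question's Map[String, String] shape.
// Assumes the final DataFrame (unionDF above) has columns dbName, id, "other data", data.
val resultRDD = unionDF.rdd.map { row =>
  Map(
    "dbName"     -> row.getAs[String]("dbName"),
    "id"         -> row.getAs[String]("id"),
    "other data" -> row.getAs[String]("other data"),
    "new data"   -> row.getAs[String]("data")
  )
}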
Note: the lit function requires the following import:
import org.apache.spark.sql.functions._
I'm trying to redefine an option of the PlotLegends package after having loaded it, but I get, for example:
Needs["PlotLegends`"]
SetOptions[ListPlot,LegendPosition->{0,0.5}]
=> SetOptions::optnf: LegendPosition is not a known option for ListPlot.
I expected something like this, since the options in the PlotLegends package aren't built into Plot and ListPlot.
Is there a way to redefine the default options of the PlotLegends package?
The problem is not really in the defaults for PlotLegends`. To see it, you should inspect the ListPlot implementation:
In[28]:= Needs["PlotLegends`"]
In[50]:= DownValues[ListPlot]
Out[50]=
{HoldPattern[ListPlot[PlotLegends`Private`a:PatternSequence[___,
Except[_?OptionQ]]|PatternSequence[],PlotLegends`Private`opts__?OptionQ]]:>
PlotLegends`Private`legendListPlot[ListPlot,PlotLegends`Private`a,
PlotLegend/.Flatten[{PlotLegends`Private`opts}],PlotLegends`Private`opts]
/;!FreeQ[Flatten[{PlotLegends`Private`opts}],PlotLegend]}
What you see from here is that options must be passed explicitly for it to work, and moreover, the PlotLegend option must be present.
One way to achieve what you want is to use my option configuration manager, which imitates global options by passing local ones. Here is a version where option-filtering is made optional:
ClearAll[setOptionConfiguration, getOptionConfiguration, withOptionConfiguration];
SetAttributes[withOptionConfiguration, HoldFirst];
Module[{optionConfiguration}, optionConfiguration[_][_] = {};
setOptionConfiguration[f_, tag_, {opts___?OptionQ}, filterQ : (True | False) : True] :=
optionConfiguration[f][tag] =
If[filterQ, FilterRules[{opts}, Options[f]], {opts}];
getOptionConfiguration[f_, tag_] := optionConfiguration[f][tag];
withOptionConfiguration[f_[args___], tag_] :=
f[args, Sequence @@ optionConfiguration[f][tag]];
];
To use this, first define your configuration and a short-cut macro, as follows:
setOptionConfiguration[ListPlot,"myConfig", {LegendPosition -> {0.8, -0.8}}, False];
withMyConfig = Function[code, withOptionConfiguration[code, "myConfig"], HoldAll];
Now, here you go:
withMyConfig[
ListPlot[{#, Sin[#]} & /@ Range[0, 2 Pi, 0.1], PlotLegend -> {"sine"}]
]
LegendPosition works in ListPlot without problems (for me at least). You don't happen to have forgotten to load the package by using Needs["PlotLegends`"]?
@Leonid, I added the possibility for setOptionConfiguration to set default options for f without having to use a short-cut macro.
I use the trick exposed by Alexey Popkov in What is in your Mathematica tool bag?
Example:
Needs["PlotLegends`"];
setOptionConfiguration[ListPlot, "myConfig", {LegendPosition -> {0.8, -0.8}},SetAsDefault -> True]
ListPlot[{#, Sin[#]} & /@ Range[0, 2 Pi, 0.1], PlotLegend -> {"sine"}]
Here is the implementation
Options[setOptionConfiguration] = {FilterQ -> False, SetAsDefault -> False};
setOptionConfiguration[f_, tag_, {opts___?OptionQ}, OptionsPattern[]] :=
Module[{protectedFunction},
optionConfiguration[f][tag] =
If[OptionValue@FilterQ, FilterRules[{opts},
Options[f]]
,
{opts}
];
If[OptionValue@SetAsDefault,
If[(protectedFunction = MemberQ[Attributes[f], Protected]),
Unprotect[f];
];
DownValues[f] =
Union[
{
(*I want this new rule to be the first in the DownValues of f*)
HoldPattern[f[args___]] :>
Block[{$inF = True},
withOptionConfiguration[f[args], tag]
] /; ! TrueQ[$inF]
}
,
DownValues[f]
];
If[protectedFunction,
Protect[f];
];
];
];
How can I create a Button which is displayed only when the value of some global FrontEnd setting is False, and which, when pressed, sets this value to True and self-destructs together with its entire row of the Column?
I need something like this:
Column[{"Item 1", "Item 2",
Dynamic[If[
Last@Last@Options[$FrontEnd, "VersionedPreferences"] === False,
Button["Press me!",
SetOptions[$FrontEnd, "VersionedPreferences" -> True]],
Sequence @@ {}]]}]
But with this code the Button does not disappear after pressing it. Is it possible to make it self-destructive?
The final solution based on ideas by belisarius and mikuszefski:
PreemptProtect[SetOptions[$FrontEnd, "VersionedPreferences" -> False];
b = True];
Dynamic[Column[
Join[{"Item 1", "Item 2"},
If[Last@Last@Options[$FrontEnd, "VersionedPreferences"] === False &&
b == True, {Button[
Pane[Style[
"This FrontEnd uses shared preferences file. Press this \
button to set FrontEnd to use versioned preferences file (all the \
FrontEnd settings will be reset to defaults).", Red], 300],
AbortProtect[
SetOptions[$FrontEnd, "VersionedPreferences" -> True];
b = False]]}, {}]], Alignment -> Center],
Initialization :>
If[! Last@Last@Options[$FrontEnd, "VersionedPreferences"], b = True,
b = False]]
The key points are:
introducing an additional dynamic variable b and binding it to the value of Options[$FrontEnd, "VersionedPreferences"],
wrapping the entire Column construct with Dynamic instead of using Dynamic inside the Column.
Perhaps
PreemptProtect[SetOptions[$FrontEnd, "VersionedPreferences" -> False]; b = True];
Column[{"Item 1", "Item 2", Dynamic[
If[Last@Last@Options[$FrontEnd, "VersionedPreferences"]===False && b == True,
Button["Here!", SetOptions[$FrontEnd, "VersionedPreferences"->True];b=False],
"Done"]]}]
Edit
Answering your comment. Please try the following. Encompassing the Column[ ] with Dynamic[ ] allows resizing it:
PreemptProtect[SetOptions[$FrontEnd, "VersionedPreferences" -> False]; b = True];
Dynamic[
Column[{
"Item 1",
"Item 2",
If[Last@Last@Options[$FrontEnd, "VersionedPreferences"] === False && b == True,
Button["Press me!", SetOptions[$FrontEnd, "VersionedPreferences" -> True]; b=False],
Sequence @@ {}]}]]
Hmm, dunno if I get it right, but maybe this:
x = True;
Dynamic[Column[{Button["reset", x = True],
If[x, Button["Press me", x = False]]}]
]