graphviz dot flow chart table-labels place and sometimes truncated - graphviz

I've created a quite big flow diagram. Some of the edge-labels (rendered as tables) have these problems:
the text in some table cells ends up outside the table cell
the table sometimes crosses the edge
when the flow diagram is rendered as a PNG image (which is my desired output),
then some parts of these tables are outside the image area
The idea of this graph is to have a horizontal timeline, with "column nodes" happening at the same time (or close together in the timeline). So to enforce this "time flow" I ended up using rankdir="LR"; along with {rank=same; my_first_node; my_second_node; }.
How do I make those "table labels" a bit better rendered? Like not crossing the edges, having the text completely inside their table cell, seeing the full graph when exporting to PNG?
I generate the PNG output image with this command: dot -Tpng foo.dot -o foo.png, see below the "table label" issues:
digraph my_flow {
// global graph conf
rankdir="LR"; // horizontal
nodesep=0.9;
// shared conf
edge [ fontname="Courier New", fontsize=20];
node [ fontname=Helvetica, fontsize=26, style="rounded,filled", nojustify=true];
// many different node "classes"
node[shape=doublecircle, color=navajowhite]
my_first_node; my_second_node;
node[shape=rect, color=aquamarine2]
first_std_horiz_node; second_std_horiz_node;
// custom configuration for each node
first_std_horiz_node[label="First \l std \l horizontal \l node"]
second_std_horiz_node[label="Second \l std \l horizontal \l node"]
my_first_node[label="My \l first \l node"]
my_second_node[label="My \l second \l node"]
// sets of nodes in the same "column"
{rank=same; my_first_node; my_second_node; }
first_std_horiz_node -> second_std_horiz_node
second_std_horiz_node -> my_first_node
my_first_node -> my_second_node [label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
<TR><TD BGCOLOR="gray">action type 1</TD></TR>
<TR><TD>action 1 very very very very long description</TD></TR>
<TR><TD BGCOLOR="gray">action type 2</TD></TR>
<TR><TD>action X</TD></TR>
<TR><TD>action Y</TD></TR>
<TR><TD BGCOLOR="gray">action type 3</TD></TR>
<TR><TD>action A</TD></TR>
<TR><TD>action B</TD></TR>
<TR><TD>action C</TD></TR>
<TR><TD BGCOLOR="gray">action type 4</TD></TR>
<TR><TD>action Q</TD></TR>
<TR><TD>action W</TD></TR>
</TABLE>>];
}

If you put your table in a node rather than an edge label, things look better; and using the HTML Tag <BR/>, you can break lines in the table. Editing your code accordingly, I come up with
digraph my_flow {
// global graph conf
rankdir="LR"; // horizontal
nodesep=0.9;
// shared conf
node [ fontname=Helvetica, fontsize=26, style="rounded,filled", nojustify=true];
// node instead of edge label
my_table[ shape=none, margin=0, fontname="Courier New", fontsize=20, label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
<TR><TD BGCOLOR="gray">action type 1</TD></TR>
<TR><TD BGCOLOR="white">action 1<BR/>very very very very<BR/>long description</TD></TR>
<TR><TD BGCOLOR="gray">action type 2</TD></TR>
<TR><TD BGCOLOR="white">action X</TD></TR>
<TR><TD BGCOLOR="white">action Y</TD></TR>
<TR><TD BGCOLOR="gray">action type 3</TD></TR>
<TR><TD BGCOLOR="white">action A</TD></TR>
<TR><TD BGCOLOR="white">action B</TD></TR>
<TR><TD BGCOLOR="white">action C</TD></TR>
<TR><TD BGCOLOR="gray">action type 4</TD></TR>
<TR><TD BGCOLOR="white">action Q</TD></TR>
<TR><TD BGCOLOR="white">action W</TD></TR>
</TABLE>> ]
// many different node "classes"
node[shape=doublecircle, color=navajowhite]
my_first_node; my_second_node;
node[shape=rect, color=aquamarine2]
first_std_horiz_node; second_std_horiz_node;
// custom configuration for each node
first_std_horiz_node[label="First \l std \l horizontal \l node"]
second_std_horiz_node[label="Second \l std \l horizontal \l node"]
my_first_node[label="My \l first \l node"]
my_second_node[label="My \l second \l node"]
// sets of nodes in the same "column"
{rank=same; my_first_node; my_table; my_second_node; }
first_std_horiz_node -> second_std_horiz_node -> my_first_node;
my_first_node -> my_table[ dir = none ];
my_table -> my_second_node;
}
which yields
EDIT
After the revisions in the table code, it is also possible to use the table as an edge label; for easier reference, here is the full code again:
digraph my_flow {
// global graph conf
rankdir="LR"; // horizontal
nodesep=0.9;
// shared conf
node [ fontname=Helvetica, fontsize=26, style="rounded,filled", nojustify=true];
// the table is attached to the edge as a label (see the last statement)
// many different node "classes"
node[shape=doublecircle, color=navajowhite]
my_first_node; my_second_node;
node[shape=rect, color=aquamarine2]
first_std_horiz_node; second_std_horiz_node;
// custom configuration for each node
first_std_horiz_node[label="First \l std \l horizontal \l node"]
second_std_horiz_node[label="Second \l std \l horizontal \l node"]
my_first_node[label="My \l first \l node"]
my_second_node[label="My \l second \l node"]
// sets of nodes in the same "column"
{rank=same; my_first_node; my_second_node; }
first_std_horiz_node -> second_std_horiz_node -> my_first_node;
my_first_node -> my_second_node[ fontname="Courier New", fontsize=20, label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
<TR><TD BGCOLOR="gray">action type 1</TD></TR>
<TR><TD BGCOLOR="white">action 1<BR/>very very very very<BR/>long description</TD></TR>
<TR><TD BGCOLOR="gray">action type 2</TD></TR>
<TR><TD BGCOLOR="white">action X</TD></TR>
<TR><TD BGCOLOR="white">action Y</TD></TR>
<TR><TD BGCOLOR="gray">action type 3</TD></TR>
<TR><TD BGCOLOR="white">action A</TD></TR>
<TR><TD BGCOLOR="white">action B</TD></TR>
<TR><TD BGCOLOR="white">action C</TD></TR>
<TR><TD BGCOLOR="gray">action type 4</TD></TR>
<TR><TD BGCOLOR="white">action Q</TD></TR>
<TR><TD BGCOLOR="white">action W</TD></TR>
</TABLE>> ];
}
which yields
In the given context I find the node solution preferable and cleaner, as it makes it clearer where the information in the table belongs. But if there is more to it, the edge-label approach will also work.
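If the table is generated rather than written by hand, the manual <BR/> breaks can be produced automatically. The following is a minimal sketch, not part of the original answer: `table_label`, its `width` parameter and the `sections` data are hypothetical names chosen for illustration, and the wrap width of 20 characters is an arbitrary assumption.

```python
import html
import textwrap


def table_label(sections, width=20):
    """Build a Graphviz HTML-like table label from (header, [rows]) pairs.

    Long row text is wrapped at `width` characters and the pieces are
    joined with <BR/>, the same manual line break used in the answer.
    """
    lines = ['<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">']
    for header, rows in sections:
        lines.append('<TR><TD BGCOLOR="gray">%s</TD></TR>' % html.escape(header))
        for row in rows:
            # Wrap first, then escape each piece so <BR/> stays real markup.
            pieces = textwrap.wrap(row, width=width) or [""]
            cell = "<BR/>".join(html.escape(p) for p in pieces)
            lines.append('<TR><TD BGCOLOR="white">%s</TD></TR>' % cell)
    lines.append("</TABLE>")
    return "\n".join(lines)


# Hypothetical data mirroring the first two sections of the table above.
sections = [
    ("action type 1", ["action 1 very very very very long description"]),
    ("action type 2", ["action X", "action Y"]),
]
label = table_label(sections)
print("my_table [shape=none, margin=0, label=<%s>];" % label)
```

The printed statement can be pasted into the digraph in place of the hand-written my_table node; the same label string should also work as an edge label.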

Related

XPATH start scraping after certain word

I am trying to get the location from this HTML using XPath. What I want to say is, in human terms, "when you see Location: grab the next piece of text, then stop".
<td width="670">
<h1>Accor Vacation Club - SOLD</h1>
<h2>All Australia, Australia</h2>
<p class="property_number">Property ref: 002</p>
<h3 class="cl2">Description</h3><p class="xh-highlight">Resort: Accor Vacation Club. <br>Location: Australia. <br>Type of Ownership: Points. <br>Season: All. <br>Size of Unit: Studio. <br>Price: SOLD</p><p class="xh-highlight"> </p><p class="xh-highlight"><span style="font-size: 16pt">SOLD</span> </p>
<table width="100%" border="0" cellspacing="0" cellpadding="0" id="photorealestate">
<tbody><tr>
I got this far but can't seem to isolate that word:
//p[./preceding-sibling::h3[contains(., 'Description')]]
//p/text()[./preceding-sibling::h3[contains(., 'Description')]]
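If you need to get "Australia" as output, you can use the expression below:
substring-after(//text()[starts-with(., 'Location')], 'Location: ')
This selects the text node that starts with the word "Location" and returns the substring that follows "Location: ". With the HTML above the result is "Australia. ", so the trailing period and space still need to be trimmed.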
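To see what the expression does step by step, here is a rough Python sketch using only the standard library. It is an illustration of the logic, not an XPath engine: `TextNodes` and `substring_after` are hypothetical helpers, and the snippet is a shortened copy of the paragraph from the question.

```python
from html.parser import HTMLParser


class TextNodes(HTMLParser):
    """Collect text nodes, roughly what //text() would select."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        self.nodes.append(data)


def substring_after(text, marker):
    """Mimic XPath substring-after(): '' when the marker is absent."""
    head, sep, tail = text.partition(marker)
    return tail if sep else ""


snippet = ('<p>Resort: Accor Vacation Club. <br>Location: Australia. '
           '<br>Type of Ownership: Points. <br>Season: All.</p>')
parser = TextNodes()
parser.feed(snippet)
parser.close()

# XPath string functions use the first node of a node-set; next() does the same.
first = next((n for n in parser.nodes if n.startswith("Location")), "")
raw = substring_after(first, "Location: ")

# The raw value still carries the sentence punctuation, so trim it.
location = raw.strip().rstrip(".")
print(repr(raw), repr(location))
```

The trimming step is done in the host language here; whether to do it in XPath or afterwards is a matter of taste.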

xpath selecting text after a certain element or between elements

I'm trying to use xpath to select all the text within the elements:
between the h3 elements "Hay Point" and "Dalrymple Bay"
after the h3 element "Dalrymple Bay"
I've got this xpath syntax working which selects all the text within the td tags after <h3>Dalrymple Bay Coal Terminal</h3>.
.//h3[2]/following::td/text()
But I'm having trouble figuring out how to select all the text between the tags that fall between <h3>Hay Point Coal Terminal</h3> and <h3>Dalrymple Bay Coal Terminal</h3>
A sample of the structure of the html is below:
<h3>Hay Point Coal Terminal</h3>
<tr role="row" class="odd"><td headers="table06762r1c1" tabindex="0">July
</td><td style="text-align: left;"
headers="table06762r1c2">4,517,445</td>
<td headers="table06762r1c3">4,261,253</td>
<td headers="table06762r1c4">4,057,239</td>
<td headers="table06762r1c5">3,535,507</td>
</tr>
<h3>Dalrymple Bay Coal Terminal</h3>
<tr><td headers="table06762r1c1">July</td><td style="text-align: left;"
headers="table06762r1c2">5,462,591</td>
<td headers="table06762r1c3">5,625,700</td>
<td headers="table06762r1c4">5,816,977</td>
<td headers="table06762r1c5">5,396,644</td>
</tr>
If I understand your question correctly and given the html in the question, in order to get text nodes related to the <h3>Hay Point Coal Terminal</h3> node, try:
//h3[1]/following-sibling::tr[1]/td/text()
Output:
July
4,517,445
4,261,253
4,057,239
3,535,507
To get those related to the <h3>Dalrymple Bay Coal Terminal</h3> node, use:
//h3[2]/following-sibling::tr[1]/td/text()
or just
//h3[2]/following-sibling::tr/td/text()
Output:
July
5,462,591
5,625,700
5,816,977
5,396,644
To get both:
//h3/following-sibling::tr/td/text()
Assuming you want to group those you would do something like:
# inside a spider callback such as parse(self, response)
for h3 in response.css('h3'):
    item = {
        "h3": h3.css('*::text').extract()[0],
        "tds": h3.css('* + tr td::text').extract()
    }
    yield item
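The grouping idea can also be tried without Scrapy. The sketch below is an assumption-laden illustration using Python's standard html.parser: `GroupByH3` is a hypothetical class, and the sample is a trimmed version of the HTML from the question. It assigns every td text to the most recent h3, which is one way to get "the text between two h3 elements".

```python
from html.parser import HTMLParser


class GroupByH3(HTMLParser):
    """Group <td> text under the most recent <h3> heading."""

    def __init__(self):
        super().__init__()
        self.groups = {}      # heading text -> list of cell texts
        self.current = None   # heading the parser is currently under
        self.in_tag = None    # 'h3' or 'td' while inside one of them

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "td"):
            self.in_tag = tag

    def handle_endtag(self, tag):
        if tag == self.in_tag:
            self.in_tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_tag == "h3":
            self.current = text
            self.groups[text] = []
        elif self.in_tag == "td" and self.current is not None:
            self.groups[self.current].append(text)


sample = """<h3>Hay Point Coal Terminal</h3>
<tr><td>July
</td><td>4,517,445</td><td>4,261,253</td></tr>
<h3>Dalrymple Bay Coal Terminal</h3>
<tr><td>July</td><td>5,462,591</td><td>5,625,700</td></tr>"""

parser = GroupByH3()
parser.feed(sample)
parser.close()
for heading, cells in parser.groups.items():
    print(heading, cells)
```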

Html Horizontal line error graphviz

I need to create a subgraph cluster whose label is separated from the nodes by a horizontal line.
subgraph cluster_0{
label=< <B>process #1</B> <HR/> >
node [shape=none]
t1 [label="label1"]
t2 [label="label2"]
t3 [label="label 3"]
node [shape=box group=a style=filled fillcolor="red;.5:white" height=.2 label = "" ]
A [ fillcolor="red;0.3:white" ]
B [fillcolor="red;.9:white"]
C
node [shape=none fillcolor=white]
t11 [label="label1"]
t21 [label="label2"]
t31 [label="label 3"]
edge[style=invis];
A->B->C
t1->t2->t3
t11->t21->t31
}
Then I get a syntax error.
Error stack:
pydot.InvocationException: Program terminated with status: 1. stderr follows: Error: syntax error in line 1
... <HR/> ...
in label of graph cluster_0
My graphviz version is
dot - graphviz version 2.36.0 (20140111.2315)
On the graphviz web site, the page called "Node Shapes" contains a grammar (about half-way down) for html-like labels:
For <HR/>, it says:
rows : row
| rows row
| rows <HR/> row
This means that <HR/> is only allowed in between two rows. And rows are only allowed within a <TABLE>, so you'll have to wrap everything in a table and then it may work.
Depending on what exactly you'd like to achieve, another possible solution might be to simply underline the label using <U>text</U>.
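Since the question builds the graph from Python (pydot), the table wrapping can be sketched as a small string helper. This is only an illustration of the grammar rule quoted above: `cluster_label` is a hypothetical function, the second row text "steps" is a placeholder, and the output has not been checked against Graphviz 2.36 specifically.

```python
import html


def cluster_label(title, rows):
    """Return an HTML-like label: bold title row, <HR/>, then the rows.

    <HR/> is only emitted between two rows, as the grammar requires,
    so at least one row must follow the title.
    """
    if not rows:
        raise ValueError("<HR/> needs a row after it; pass at least one row")
    parts = ['<TABLE BORDER="0" CELLBORDER="0" CELLSPACING="0">']
    parts.append("<TR><TD><B>%s</B></TD></TR>" % html.escape(title))
    parts.append("<HR/>")
    for row in rows:
        parts.append("<TR><TD>%s</TD></TR>" % html.escape(row))
    parts.append("</TABLE>")
    # The outer < > pair marks the value as an HTML-like label in DOT.
    return "<%s>" % "".join(parts)


print("subgraph cluster_0 {")
print("  label=%s" % cluster_label("process #1", ["steps"]))
print("}")
```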

"<i>" interrupting correct node selection

I'm trying to select a table field with the following structure:
<td class='postac'>proszek do sporz. roztworu do wlewu <I>i.v.</I>
1,5 g
1 fiol. typu Monovial
</td>
After using the XPath expression sel.xpath("//table[@class='table-postaci']/tbody/tr/td[2]/text()").extract() I get two values instead of one:
u'proszek do sporz. roztworu do wlewu ',
u'\r\n 1,5 g\r\n 1 fiol. typu Monovial\r\n '
Is there a clean XPath way to get this "td" field as a single value? I know I could get the field with //table[@class='table-postaci']/tbody/tr/td[2] and then strip the tags in the Scrapy pipeline. However, I'm looking for a simpler solution. Thank you
You can loop over each table row tr and for each row join all text node descendants of the 2nd td cell:
In [13]: from scrapy.selector import Selector
In [14]: selector = Selector(text="""<table class='table-postaci'>
....: <thead><th>Nazwa preparatu</th><th>Postać i dawka</th><th>Producent</th><th>Cena 100%</th>
....: <th>Odpłatność po refundacji</th>
....: </thead>
....: <tbody>
....:
....: <tr>
....: <td class='postac'>Zinacef </td>
....: <td class='postac'>proszek do sporz. roztworu do wlewu <I>i.v.</I>
....: 1,5 g
....: 1 fiol. typu Monovial
....: </td>
....: <td>GlaxoSmithKline – Wielka Brytania</td>
....: <td class='cena'> b/d </td>
....: <td>
....: </td>
....: </tr>
....: <tr>
....: <td class='postac'>Zinacef </td>
....: <td class='postac'>proszek do sporz. roztworu do wlewu <I>i.v.</I>
....: 750 mg
....: 1 fiol. typu Monovial
....: </td>
....: <td>GlaxoSmithKline – Wielka Brytania</td>
....: <td class='cena'> b/d </td>
....: <td>
....: </td>
....: </tr>
....: </tbody>
....: </table>""")
In [15]: selector.xpath('//table/tr')
Out[15]: []
In [16]: selector.xpath('//table//tr')
Out[16]:
[<Selector xpath='//table//tr' data=u'<tr><td class="postac">Zinacef </td>\n\t\t<'>,
<Selector xpath='//table//tr' data=u'<tr><td class="postac">Zinacef </td>\n\t\t<'>]
In [17]: for row in selector.xpath('//table//tr'):
....: print row.xpath('td[2]//text()').extract()
....:
[u'proszek do sporz. roztworu do wlewu ', u'i.v.', u'\n 1,5 g\n 1 fiol. typu Monovial\n ']
[u'proszek do sporz. roztworu do wlewu ', u'i.v.', u'\n 750 mg\n 1 fiol. typu Monovial\n ']
In [18]: [u''.join(row.xpath('td[2]//text()').extract()) for row in selector.xpath('//table//tr')]
Out[18]:
[u'proszek do sporz. roztworu do wlewu i.v.\n 1,5 g\n 1 fiol. typu Monovial\n ',
u'proszek do sporz. roztworu do wlewu i.v.\n 750 mg\n 1 fiol. typu Monovial\n ']
In [19]:
You should avoid /text() for exactly this reason. Usually you don't want the individual text nodes, you want the string value of the element, which you can get with the string() function. It's not clear what programming language you are calling XPath from, or whether it's XPath 1.0 or 2.0 - that will affect the detail, e.g. whether to get the string value of the element in the XPath expression or in the host language.
The td node in your question has three child nodes – first a text node with the contents:
proszek do sporz. roztworu do wlewu
second an I element node that has its own child text node, and last another text node with the contents:
\n 1,5 g\n 1 fiol. typu Monovial\n
Your query, the end of which looks like td[2]/text(), only selects the immediate text node children of the td element, so it doesn’t select the I element node or its text node child. The result is the two text nodes that you are seeing.
You could select all text node descendants of the td element using td[2]//text() (note the double slash //). This will return three text nodes in the result – the two as above and a third containing i.v. in between them. You could then join them outside XPath (I’m not familiar with scrapy so I can’t tell you how to do that in this case).
As far as I know XPath 1.0 has no function for joining a list of nodes with a separator, though string(td[2]) or normalize-space(td[2]) does return the concatenated text of a single element; XPath 2.0 adds string-join() for the general case.
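The difference between the immediate text children and the element's full string value can be reproduced with Python's standard library alone. This is a minimal sketch, assuming the cell from the question is parsed as a standalone fragment; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

cell = ET.fromstring(
    "<td class='postac'>proszek do sporz. roztworu do wlewu <I>i.v.</I>\n"
    "1,5 g\n"
    "1 fiol. typu Monovial\n"
    "</td>"
)

# cell.text is only the text before the first child element,
# which is why the <I> element appears to cut the value short.
leading = cell.text

# itertext() walks every descendant text node in document order;
# joining them gives the element's string value, like string(td).
string_value = "".join(cell.itertext())

# Collapsing whitespace runs approximates XPath normalize-space().
normalized = " ".join(string_value.split())

print(repr(leading))
print(normalized)
```

In Scrapy the same join can be done on the list returned by td[2]//text(), as the transcript above shows.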

how do i create random letters in selenium ide?

I'm not an expert in Selenium IDE, I want to declare an array in Selenium IDE HTML and call it in the next line.
<tr>
<td>storeEval</td>
<td>new Array('en','de','da','cs','fi','fr','it','ja','ko','nl','no','pl','pt','ru','sv','tr')</td>
<td>myArray</td>
</tr>
<tr>
<td>type</td>
<td>FieldName</td>
<td>${myArray}</td>
</tr>
Thanks
The code below randomly selects an item from the array and types it into the element with id=FieldName:
<tr>
<td>storeEval</td>
<td>var chars = 'en de da cs fi fr it ja ko nl no pl pt ru sv tr'.split(' '); str = chars[Math.floor(Math.random() * chars.length)];</td>
<td>item</td>
</tr>
<tr>
<td>type</td>
<td>FieldName</td>
<td>${item}</td>
</tr>
To access an item from your initial array (let's say the third item, at index 2), you can add one more command:
<tr>
<td>storeEval</td>
<td>new Array('en','de','da','cs','fi','fr','it','ja','ko','nl','no','pl','pt','ru','sv','tr')</td>
<td>myArray</td>
</tr>
<tr>
<td>getEval</td>
<td>storedVars['item'] = storedVars['myArray'][2]</td>
<td></td>
</tr>
<tr>
<td>type</td>
<td>FieldName</td>
<td>${item}</td>
</tr>
You can pass a random integer in the range [0 .. length_of_array - 1] as storedVars['myArray'][randomInt] to retrieve values randomly.
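The index arithmetic is the same in any language. As a small illustration outside Selenium IDE, here is a Python sketch of the floor(random * length) step; `random_index` and its `rnd` parameter are hypothetical names introduced for this example.

```python
import random

codes = "en de da cs fi fr it ja ko nl no pl pt ru sv tr".split(" ")


def random_index(length, rnd=random.random):
    """Same arithmetic as Math.floor(Math.random() * length)."""
    # rnd() returns a float in [0, 1), so the result never reaches length.
    return int(rnd() * length)


print(codes[random_index(len(codes))])
```

Because the random factor is strictly below 1, the largest possible index is the array length minus one, so the lookup never runs past the end of the array.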
