XQuery - Retrieve only a part of information - filter

I use XQuery to compute statistics. I have a document like this:
<tr>
<td>Element 1</td>
<td>100</td>
</tr>
<tr>
<td>Element 2</td>
<td>80</td>
</tr>
<tr>
<td>Element 3</td>
<td>40</td>
</tr>
<tr>
<td>Element 4</td>
<td>12</td>
</tr>
<tr>
<td>Element 5</td>
<td>8</td>
</tr>
I want to retrieve only part of this document: the leading elements whose counts add up to 80% of the total (a Pareto cut-off, in fact).
In this case the total is 240, so I want the output to contain the first elements until their sum reaches 192 (240*80/100).
In this example, the ideal output would contain only the first three elements, like this:
<tr>
<td>Element 1</td>
<td>100</td>
</tr>
<tr>
<td>Element 2</td>
<td>80</td>
</tr>
<tr>
<td>Element 3</td>
<td>40</td>
</tr>
I hope this is clear. I have been searching for a long time without success and can't work out how to do it.
Thanks so much.

Use:
for $total in sum(/*/*/td[2]),
$pareto in $total*80 div 100,
$i in 1 to count(/*/*)
return
if(sum(/*/*[position() le $i]/td[2]) ge $pareto
and
sum(/*/*[position() lt $i]/td[2]) lt $pareto
)
then /*/*[position() le $i]
else ()
When this XPath expression (yes, this is an XQuery expression that is also an XPath 2.0 expression) is evaluated against the provided XML (wrapped in a single top element to make it a well-formed XML document):
<table>
<tr>
<td>Element 1</td>
<td>100</td>
</tr>
<tr>
<td>Element 2</td>
<td>80</td>
</tr>
<tr>
<td>Element 3</td>
<td>40</td>
</tr>
<tr>
<td>Element 4</td>
<td>12</td>
</tr>
<tr>
<td>Element 5</td>
<td>8</td>
</tr>
</table>
the wanted, correct result is produced:
<tr>
<td>Element 1</td>
<td>100</td>
</tr>
<tr>
<td>Element 2</td>
<td>80</td>
</tr>
<tr>
<td>Element 3</td>
<td>40</td>
</tr>
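For readers who want to check the cut-off logic outside an XQuery processor, here is a minimal Python sketch of the same idea, assuming the wrapped document above. The function name pareto_prefix and the percent parameter are made up for illustration; the comparison is done in integers, mirroring $total*80 div 100.

```python
import xml.etree.ElementTree as ET

XML = """<table>
<tr><td>Element 1</td><td>100</td></tr>
<tr><td>Element 2</td><td>80</td></tr>
<tr><td>Element 3</td><td>40</td></tr>
<tr><td>Element 4</td><td>12</td></tr>
<tr><td>Element 5</td><td>8</td></tr>
</table>"""


def pareto_prefix(table, percent=80):
    """Return the leading <tr> rows whose counts first reach `percent` of the total."""
    rows = table.findall("tr")
    counts = [int(row.findall("td")[1].text) for row in rows]
    total = sum(counts)
    running = 0
    kept = []
    for row, count in zip(rows, counts):
        kept.append(row)
        running += count
        # Integer form of: running >= total * percent / 100
        if running * 100 >= total * percent:
            break
    return kept


kept = pareto_prefix(ET.fromstring(XML))
names = [row.findall("td")[0].text for row in kept]
print(names)
```

Like the XQuery expression, this keeps the row that crosses the threshold, so the kept rows sum to at least 80% of the total rather than exactly 192.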

Related

Correct mrtg cfgmaker file

mrtg cfgmaker reads incorrect values over SNMP v1 and v2, and I need to correct the resulting file.
I would like to run a script after the file is created, using sed if possible.
The lines that need to be corrected in my case are for LAGs and normal ports:
MaxBytes[switch01_lag_26]: 125000000 should become MaxBytes[switch01_lag_26]: 250000000
(the name ranges from switch01_lag_1 to switch01_lag_26)
MaxBytes[switch01_g1]: 12500000 should become MaxBytes[switch01_g1]: 125000000
(the name ranges from switch01_g1 to switch01_g16)
Which sed patterns do I need to detect whether the name in the square brackets is a LAG or a port, and then replace the number after the colon?
The HTML part should show the correct speed too, if possible. This is the original for port g1:
<h1>Traffic Analysis for g1-- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>1-Gigabit---Level </td>
</tr>
<tr>
<td>ifType:</td>
<td>ethernetCsmacd (6)</td>
</tr>
<tr>
<td>ifName:</td>
<td>g1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>12.5 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
and it should end up like this (the line below "Max Speed:" is changed):
<h1>Traffic Analysis for g1-- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>1-Gigabit---Level </td>
</tr>
<tr>
<td>ifType:</td>
<td>ethernetCsmacd (6)</td>
</tr>
<tr>
<td>ifName:</td>
<td>g1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>125.0 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
This is the original for LAG 1:
<h1>Traffic Analysis for lag 1 -- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>lag-1 </td>
</tr>
<tr>
<td>ifType:</td>
<td>IEEE 802.3ad Link Aggregate (161)</td>
</tr>
<tr>
<td>ifName:</td>
<td>lag 1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>125.0 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
which should end up like this (the line below "Max Speed:" is changed):
<h1>Traffic Analysis for lag 1 -- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>lag-1 </td>
</tr>
<tr>
<td>ifType:</td>
<td>IEEE 802.3ad Link Aggregate (161)</td>
</tr>
<tr>
<td>ifName:</td>
<td>lag 1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>250.0 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
I can change all speeds in the HTML using sed -i 's/\([0-9.]\+\) MBytes/125.0 MBytes/' /switch01.cfg, but this changes the LAGs too. How can I detect whether an HTML part belongs to a LAG?
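As a rough illustration of the MaxBytes part only, here is a minimal Python sketch (not sed) that decides between LAG and port from the name inside the square brackets. It assumes the names look exactly like the examples above; the function names and the two constants are made up for illustration, and the HTML "Max Speed" rows are not handled.

```python
import re

# Target values taken from the question; adjust them if your hardware differs.
LAG_MAX_BYTES = 250000000
PORT_MAX_BYTES = 125000000

# Matches e.g. "MaxBytes[switch01_lag_26]: 125000000" or "MaxBytes[switch01_g1]: 12500000".
MAXBYTES_LINE = re.compile(r"^(MaxBytes\[switch01_(lag_\d+|g\d+)\]:\s*)\d+\s*$")


def fix_maxbytes(line):
    """Return the line with its MaxBytes value corrected, or unchanged if it does not match."""
    match = MAXBYTES_LINE.match(line)
    if match is None:
        return line
    is_lag = match.group(2).startswith("lag_")
    return match.group(1) + str(LAG_MAX_BYTES if is_lag else PORT_MAX_BYTES)


def fix_config(text):
    """Apply fix_maxbytes to every line of a cfgmaker output string."""
    return "\n".join(fix_maxbytes(line) for line in text.split("\n"))


print(fix_maxbytes("MaxBytes[switch01_lag_26]: 125000000"))
print(fix_maxbytes("MaxBytes[switch01_g1]: 12500000"))
```

The value is chosen from the bracket name rather than from the old number, so running the script twice on the same file gives the same result.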

Using contains returns too many results

In the HTML below, I'm trying to get the two nodes that hold the values for shipment number, but instead I get 6 <td> nodes. Why? Doesn't contains limit the nodes to only those that match the text value? If so, the statement below should return only two, not six.
In Chrome dev console:
$x("//tr//td[contains(.,'shipment number')]/following::td[1]")
html:
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/15/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_123_florida-45</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0430</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>bob smith</td>
</tr>
<tr>
<td>box type</td>
<td>square</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>23.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>17.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>17.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>199.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/16/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_222_florida-35</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0630</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>sue smith</td>
</tr>
<tr>
<td>box type</td>
<td>rect</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>33.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>1.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>27.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>299.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
You need
//tr//td[contains(text(),'shipment number')]/following::td[1]
That's because contains(., '...') converts . to a string by concatenating all of its descendant text nodes, not just its child text nodes.
I'm adding this answer because the text() node test might conflict with other requirements, mainly those dealing with inline markup.
The reason you are getting six td elements is that six td elements have "shipment number" as part of their string value (the concatenation of all descendant text nodes). That is because you have nested tables, and thus nested td elements. So you want a td element that has no descendant td element.
The expression:
//tr//td[not(.//td)][contains(.,'shipment number')]/following::td[1]
It selects:
<td>abc_123_florida-45</td>
<td>abc_222_florida-35</td>
Check in http://www.xpathtester.com/xpath/37bd889231ad68bb7bfa377433aeca00
Do note that your input sample has a default namespace declaration with the namespace URI http://www.w3.org/1999/xhtml. Because neither your code sample nor your selected answer uses namespaces, I assume you know how to work with them.
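To see where the six matches come from, here is a small Python sketch using only the standard library. It assumes a reduced, namespace-free version of the sample (two three-level nested tables built by a made-up helper, nested) and uses "".join(el.itertext()) as a stand-in for the XPath string value.

```python
import xml.etree.ElementTree as ET


def nested(shipment):
    """Build one three-level nested table, like each shipment block in the sample."""
    return (
        "<table><tr><td>"
        "<table><tr><td>"
        "<table><tr><td>shipment number</td><td>" + shipment + "</td></tr></table>"
        "</td></tr></table>"
        "</td></tr></table>"
    )


doc = ET.fromstring(
    "<body>" + nested("abc_123_florida-45") + nested("abc_222_florida-35") + "</body>"
)


def string_value(element):
    """Concatenate all descendant text nodes, as contains(., ...) effectively does."""
    return "".join(element.itertext())


# Every td whose string value contains the label, including the outer wrapper cells.
all_matches = [td for td in doc.iter("td") if "shipment number" in string_value(td)]
# Only the innermost cells: td elements with no descendant td.
leaf_matches = [td for td in all_matches if td.find(".//td") is None]

# In this reduced sample the following td is simply the next sibling cell.
values = []
for tr in doc.iter("tr"):
    cells = tr.findall("td")
    for label, value in zip(cells, cells[1:]):
        if label in leaf_matches:
            values.append(value.text)

print(len(all_matches), len(leaf_matches), values)
```

Each shipment block contributes three matching td elements (two wrappers plus the label cell), which is why the unfiltered expression returns six nodes.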

Select input field according to TH and TR

I have the following table with TH headers:
<table cellspacing="1" border="1" id="FinancialsGrid">
<thead>
<tr>
<th>Product</th>
<th>Slice</th>
<th>Units</th>
<th>Accrual Rate</th>
<th>Trend Factor</th>
<th>Base Units</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="3">Lorem Ipsum</td>
<td>Previous</td>
<td>6,866</td>
<td>0.00 %</td>
<td>0.00 %</td>
<td>6,866</td>
</tr>
<tr>
<td>Current</td>
<td>6,866</td>
<td>0.00 %</td>
<td>0.00 %</td>
<td>6,866</td>
</tr>
<tr>
<td>Proposed</td>
<td>6,866</td>
<td><input type="text" style="width:60px;" value="0.00 %"></td>
<td style="width:60px;"><input type="text" style="width:60px;" value="0.00 %"></td>
<td style="width:60px;"><input type="text" style="width:90px;" value="6,866"></td>
</tr>
</tbody>
</table>
I need an XPath that selects the input field for a given column, e.g. "Trend Factor".
The variant I wrote doesn't work:
//table[@id='FinancialsGrid']/tbody/tr/td/input[count(//table/thead/tr/th[.='Trend Factor'])]
Here is the XPath:
(//table[@id='FinancialsGrid']//input)[count(//tr/th)-index-of(//tr/th, //tr/th[text()='Trend Factor']) + 1]
or another one without the index-of() function (for Firebug):
(//table[@id='FinancialsGrid']//input)[count(//tr/th)-count(//tr/th[text()='Trend Factor']/preceding-sibling::*)]
Either one grabs the required input node, where:
count(//tr/th) is the number of columns
index-of(//tr/th, //tr/th[text()='Trend Factor']) is the position of the target column
(//table[@id='FinancialsGrid']//input) is the sequence of input nodes.
So we calculate the position of the input node and grab that node from the sequence.
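A minimal Python sketch of a header-driven lookup, assuming a reduced, well-formed copy of the table (inputs self-closed, styles dropped). The function input_for_column and its parameters are made up for illustration. As far as I can tell, the position arithmetic above lands on the right input for "Trend Factor" because it is the middle of the three input columns, so this sketch instead counts cells from the right-hand end of the row, which also sidesteps the shift caused by rowspan="3" in the first column.

```python
import xml.etree.ElementTree as ET

# Reduced, well-formed version of the table from the question.
TABLE = """<table id="FinancialsGrid">
<thead><tr>
<th>Product</th><th>Slice</th><th>Units</th>
<th>Accrual Rate</th><th>Trend Factor</th><th>Base Units</th>
</tr></thead>
<tbody>
<tr><td rowspan="3">Lorem Ipsum</td><td>Previous</td><td>6,866</td>
<td>0.00 %</td><td>0.00 %</td><td>6,866</td></tr>
<tr><td>Current</td><td>6,866</td><td>0.00 %</td><td>0.00 %</td><td>6,866</td></tr>
<tr><td>Proposed</td><td>6,866</td>
<td><input type="text" value="0.00 %"/></td>
<td><input type="text" value="0.00 %"/></td>
<td><input type="text" value="6,866"/></td></tr>
</tbody>
</table>"""


def input_for_column(table, row_label, column):
    """Find the <input> in the row containing `row_label`, under the header `column`."""
    headers = [th.text for th in table.findall("./thead/tr/th")]
    # Distance of the column from the right edge (1 = last column).
    from_end = len(headers) - headers.index(column)
    for row in table.findall("./tbody/tr"):
        cells = row.findall("td")
        if any(cell.text == row_label for cell in cells):
            return cells[-from_end].find("input")
    return None


table = ET.fromstring(TABLE)
trend_input = input_for_column(table, "Proposed", "Trend Factor")
print(trend_input.attrib)
```

Counting from the right only works while the spanned cells sit to the left of the target column, as they do here.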

how to avoid double borders in HTML graphviz

I have the following simple node in a graph:
digraph "graph.svg" {
graph [bgcolor="#333333" fontcolor=white fontname=Helvetica fontsize=16 label="Title" rankdir=TB]
0 [label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="2" BGCOLOR="#006699">
<TR>
<TD COLSPAN="2">Node Titel</TD>
</TR>
<TR>
<TD COLSPAN="2">Sieve</TD>
</TR>
<TR>
<TD CELLPADDING="0">
<TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" BGCOLOR="#006699">
<TR>
<TD BORDER="1">in 1</TD>
</TR>
<TR>
<TD BORDER="1">in 2</TD>
</TR>
</TABLE>
</TD>
<TD CELLPADDING="0">
<TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" BGCOLOR="#006699">
<TR>
<TD BORDER="1">out 1</TD>
</TR>
<TR>
<TD BORDER="1">out 2</TD>
</TR>
<TR>
<TD BORDER="1">out 3</TD>
</TR>
</TABLE>
</TD>
</TR>
</TABLE>> shape=plaintext]
}
Which produces this output:
How can I make the borders align such that no double borders appear anywhere between the nested tables?
I managed to fiddle around with the CELLSPADING=-1
but I don't think that is the way to go?
I cannot use the COLSPAN option because the inputs and outputs ports are variable in size, that's why I solved this with a nested table for both input and output cells.
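You were nearly there. Set BORDER="0" on the two outer TD cells that wrap the nested tables, so that only the inner cells draw their borders: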
digraph "graph.svg" {
graph [bgcolor="#333333" fontcolor=white fontname=Helvetica fontsize=16 label="Title" rankdir=TB]
0 [label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="2" BGCOLOR="#006699">
<TR>
<TD COLSPAN="2">Node Titel</TD>
</TR>
<TR>
<TD COLSPAN="2">Sieve</TD>
</TR>
<TR>
<TD CELLPADDING="0" BORDER="0">
<TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" BGCOLOR="#006699">
<TR>
<TD BORDER="1">in 1</TD>
</TR>
<TR>
<TD BORDER="1">in 2</TD>
</TR>
</TABLE>
</TD>
<TD CELLPADDING="0" BORDER="0">
<TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" BGCOLOR="#006699">
<TR>
<TD BORDER="1">out 1</TD>
</TR>
<TR>
<TD BORDER="1">out 2</TD>
</TR>
<TR>
<TD BORDER="1">out 3</TD>
</TR>
</TABLE>
</TD>
</TR>
</TABLE>> shape=plaintext]
}

Trying to find XPath for multiple TDs

I want to extract the Address for specific Numbers (the first TD) of this table. The only unique identifier for the table is the H3.
Here is the code for the table:
<table width="95%" cellpadding=5 cellspacing=0 border=1>
<tr><td colspan="4"><h3>The list</td></tr>
<tr>
<td>Number</td><td>First Name</td>
<td>Last Name</td><td>Address</td>
</tr>
I have tried:
//table[#h3=’See this now’]/’tr/td[87] and td[107] and td[116]
I am new to xpath, and programming in general. It's pretty fun, but would love to be able to figure this one out!! Appreciate any help :D
First, your HTML is wrong.
You did not close your Table element.
You did not close your H3 element.
You must enclose your attribute values in quotes.
<table width="95%" cellpadding="5" cellspacing="0" border="1">
<tr>
<td colspan="4">
<h3>The list</h3>
</td>
</tr>
<tr>
<td>Number</td>
<td>First Name</td>
<td>Last Name</td>
<td>Address</td>
</tr>
</table>
Once you have fixed the formatting of your XHTML, you can traverse the document tree.
XPATH
Any table, with any td that has a h3.
//table//td/h3
Will return
<h3>The list</h3>
For the number
//table//tr[2]/td[1]
(any table, the second tr element in that table, the first td in that second tr)
Will return
<td>Number</td>
If we add multiple tables to a document and you want one result per table, this is quite simple. Say we have an XHTML document with many tables inside a parent element, for example a 'root' element.
<root>
<table width="95%" cellpadding="5" cellspacing="0" border="1">
<tr>
<td colspan="4">
<h3>The list</h3>
</td>
</tr>
<tr>
<td>123</td>
<td>First Name</td>
<td>Last Name</td>
<td>Address</td>
</tr>
</table>
<table width="95%" cellpadding="5" cellspacing="0" border="1">
<tr>
<td colspan="4">
<h3>The list</h3>
</td>
</tr>
<tr>
<td>456</td>
<td>First Name</td>
<td>Last Name</td>
<td>Address</td>
</tr>
</table>
<table width="95%" cellpadding="5" cellspacing="0" border="1">
<tr>
<td colspan="4">
<h3>The list</h3>
</td>
</tr>
<tr>
<td>789</td>
<td>First Name</td>
<td>Last Name</td>
<td>Address</td>
</tr>
</table>
</root>
We can extract the first td of the second row in every table using the following XPath expression:
//table/tr[2]/td[1]
This will give us the result of
<td>123</td>
-----------------------
<td>456</td>
-----------------------
<td>789</td>
Now, say we have several tables but only one matters to us: the one that contains an H3 element. If a table has an H3 element, we want to extract the first td of its second row.
<root>
<table width="95%" cellpadding="5" cellspacing="0" border="1">
<tr>
<td colspan="4">
<h4>Ignore me!</h4>
</td>
</tr>
<tr>
<td>1164961564896</td>
<td>First Name</td>
<td>Last Name</td>
<td>Address</td>
</tr>
</table>
<table width="95%" cellpadding="5" cellspacing="0" border="1">
<tr>
<td colspan="4">
<h1>I'm not interesting</h1>
</td>
</tr>
<tr>
<td>456456466465</td>
<td>First Name</td>
<td>Last Name</td>
<td>Address</td>
</tr>
</table>
<table width="95%" cellpadding="5" cellspacing="0" border="1">
<tr>
<td colspan="4">
<h3>IM THE IMPORTANT TABLE!</h3>
</td>
</tr>
<tr>
<td>123456789</td>
<td>First Name</td>
<td>Last Name</td>
<td>Address</td>
</tr>
</table>
</root>
We can accomplish this by finding the H3 element, traversing back up the tree to its table, and then selecting the first td of each row in that table:
//table//h3/../../../tr/td[1]
Will return
<td colspan="4">
<h3>IM THE IMPORTANT TABLE!</h3>
</td>
-----------------------
<td>123456789</td>
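The expression above returns the first td of every row in the H3 table, including the header cell. If only the second row is wanted, a predicate such as //table[.//h3]/tr[2]/td[1] should narrow it down. Here is a minimal Python sketch of that narrower selection, assuming a trimmed copy of the three-table document above (attributes dropped); the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<root>
<table>
<tr><td colspan="4"><h4>Ignore me!</h4></td></tr>
<tr><td>1164961564896</td><td>First Name</td><td>Last Name</td><td>Address</td></tr>
</table>
<table>
<tr><td colspan="4"><h1>I'm not interesting</h1></td></tr>
<tr><td>456456466465</td><td>First Name</td><td>Last Name</td><td>Address</td></tr>
</table>
<table>
<tr><td colspan="4"><h3>IM THE IMPORTANT TABLE!</h3></td></tr>
<tr><td>123456789</td><td>First Name</td><td>Last Name</td><td>Address</td></tr>
</table>
</root>"""


def second_row_first_cells(root):
    """For every table that contains an h3, return the text of row 2, cell 1."""
    result = []
    for table in root.iter("table"):
        if table.find(".//h3") is None:
            continue  # skip tables without an h3 anywhere inside
        rows = table.findall("tr")
        if len(rows) >= 2:
            result.append(rows[1].findall("td")[0].text)
    return result


numbers = second_row_first_cells(ET.fromstring(XML))
print(numbers)
```

The check is done in Python rather than in the path string because the XPath subset in xml.etree.ElementTree is limited.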
