Using contains returns too many results - xpath

In the html below, I'm trying to get the two nodes that contain values for shipment_number, but instead I get 6 <td> nodes - why? Doesn't contains limit the nodes to only those that match the text value? If so the statement below should only return two, not six?
In Chrome dev console:
$x("//tr//td[contains(.,'shipment number')]/following::td[1]")
html:
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/15/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_123_florida-45</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0430</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>bob smith</td>
</tr>
<tr>
<td>box type</td>
<td>square</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>23.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>17.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>17.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>199.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/16/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_222_florida-35</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0630</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>sue smith</td>
</tr>
<tr>
<td>box type</td>
<td>rect</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>33.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>1.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>27.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>299.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>

You need
//tr//td[contains(text(),'shipment number')]/following::td[1]
That's because contains(., '...') converts . to string by expanding all its text descendants, not just children.
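The difference between the two predicates can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration under assumptions: the markup is a trimmed, single-shipment version of the question's tables, and ElementTree's itertext() and .text stand in for the XPath string value and the first text() child respectively.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the question's nested tables (one shipment only).
SAMPLE = """
<table><tr><td>
  <table><tr><td>
    <table>
      <tr><td>Date</td><td>11/15/2019</td></tr>
      <tr><td>shipment number</td><td>abc_123_florida-45</td></tr>
    </table>
  </td></tr></table>
</td></tr></table>
"""

root = ET.fromstring(SAMPLE)
cells = list(root.iter("td"))

# The XPath string value of an element is the concatenation of ALL of its
# descendant text nodes, so every td that wraps the label also "contains" it.
by_string_value = [td for td in cells
                   if "shipment number" in "".join(td.itertext())]

# text() only sees the element's own text children; td.text is the first one.
by_own_text = [td for td in cells if "shipment number" in (td.text or "")]

print(len(by_string_value), len(by_own_text))
```

With one shipment, three td elements match on string value (the label cell and its two td ancestors) and one matches on its own text. The question's document has two shipments, which is where the six results come from.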

I'm adding this answer because the text() node test might conflict with other requirements, mainly those dealing with inline markup.
The reason you are getting six td elements is that there are six td elements with "shipment number" as part of their string value (the concatenation of all descendant text nodes). That is because you have nested tables, and thus nested td elements. So you want a td element that has no descendant td element.
The expression:
//tr//td[not(.//td)][contains(.,'shipment number')]/following::td[1]
It selects:
<td>abc_123_florida-45</td>
<td>abc_222_florida-35</td>
Check in http://www.xpathtester.com/xpath/37bd889231ad68bb7bfa377433aeca00
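The same "innermost td only" idea can be sketched in Python with xml.etree.ElementTree. This is a rough equivalent, not the XPath engine itself: the markup is a shortened two-shipment version of the question's tables without the namespace declaration, and following::td[1] is approximated by taking the next td in document order, which is valid here only because the matched cell has no td descendants.

```python
import xml.etree.ElementTree as ET

# Shortened two-shipment sample; nesting depth matches the question.
SAMPLE = """
<body>
<table><tr><td><table><tr><td><table>
  <tr><td>shipment number</td><td>abc_123_florida-45</td></tr>
</table></td></tr></table></td></tr></table>
<table><tr><td><table><tr><td><table>
  <tr><td>shipment number</td><td>abc_222_florida-35</td></tr>
</table></td></tr></table></td></tr></table>
</body>
"""

body = ET.fromstring(SAMPLE)
cells = list(body.iter("td"))  # all td elements in document order

values = []
for i, td in enumerate(cells):
    is_leaf = td.find(".//td") is None            # mirrors not(.//td)
    has_label = "shipment number" in "".join(td.itertext())
    if is_leaf and has_label and i + 1 < len(cells):
        values.append(cells[i + 1].text)          # stands in for following::td[1]

print(values)
```

Unlike the text() version, this filter still works when the label cell contains inline markup such as a b or span element, because it tests the whole string value of the innermost cell.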
Do note that your input sample has a default namespace declaration with the namespace URI http://www.w3.org/1999/xhtml. Because neither your code sample nor your selected answer uses namespaces, I assume you know how to work with them.

Related

Correct mrtg cfgmaker file

mrtg cfgmaker reads incorrect values over SNMP v1 and v2, so I need to correct the resulting file.
I would like to run a script after the file is created, using sed if possible.
The lines that need to be corrected in my case are for LAGs and normal ports:
MaxBytes[switch01_lag_26]: 125000000 should go to MaxBytes[switch01_lag_26]: 250000000
(switch01_lag_26 can be switch01_lag_1 until switch01_lag_26)
MaxBytes[switch01_g1]: 12500000 should go to MaxBytes[switch01_g1]: 125000000
(switch01_g1 can be switch01_g1 until switch01_g16)
What sed patterns do I have to use to detect whether the name in the square brackets is a LAG or a port, and then replace the number after the colon?
The html part should show the correct speed if possible too, this is original for port g1:
<h1>Traffic Analysis for g1-- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>1-Gigabit---Level </td>
</tr>
<tr>
<td>ifType:</td>
<td>ethernetCsmacd (6)</td>
</tr>
<tr>
<td>ifName:</td>
<td>g1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>12.5 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
and should read at the end (Line below "Max Speed" is changed):
<h1>Traffic Analysis for g1-- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>1-Gigabit---Level </td>
</tr>
<tr>
<td>ifType:</td>
<td>ethernetCsmacd (6)</td>
</tr>
<tr>
<td>ifName:</td>
<td>g1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>125.0 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
This is original for LAG 1:
<h1>Traffic Analysis for lag 1 -- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>lag-1 </td>
</tr>
<tr>
<td>ifType:</td>
<td>IEEE 802.3ad Link Aggregate (161)</td>
</tr>
<tr>
<td>ifName:</td>
<td>lag 1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>125.0 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
which should read at the end (Line below "Max Speed" is changed):
<h1>Traffic Analysis for lag 1 -- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>lag-1 </td>
</tr>
<tr>
<td>ifType:</td>
<td>IEEE 802.3ad Link Aggregate (161)</td>
</tr>
<tr>
<td>ifName:</td>
<td>lag 1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>250.0 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
I can change all speeds in HTML using sed -i 's/\([0-9.]\+\) MBytes/125.0 MBytes/' /switch01.cfg but this changes for LAG's too. How to detect if the HTML part belongs to a LAG?
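One way to tell the two cases apart is to process the file line by line and remember which kind of interface the current HTML block belongs to. The following is a minimal Python sketch rather than a sed solution; fix_cfg is a hypothetical helper name, and it assumes, as in the examples above, that every lag should end up at 250000000 (250.0 MBytes/s), every g port at 125000000 (125.0 MBytes/s), and that the speed cell sits on the line directly after the "Max Speed:" cell.

```python
import re

# Matches e.g. "MaxBytes[switch01_lag_26]: 125000000" or "MaxBytes[switch01_g1]: 12500000".
MAXBYTES = re.compile(r"(MaxBytes\[switch01_(lag_\d+|g\d+)\]:\s*)\d+\s*$")


def fix_cfg(text: str) -> str:
    """Hypothetical helper: rewrite MaxBytes values and the HTML 'Max Speed' cells."""
    out = []
    is_lag = False           # does the current HTML block describe a LAG?
    after_max_speed = False  # True on the line right after "<td>Max Speed:</td>"
    for line in text.splitlines():
        m = MAXBYTES.match(line)
        if after_max_speed:
            speed = "250.0" if is_lag else "125.0"
            line = re.sub(r"[0-9.]+ MBytes/s", speed + " MBytes/s", line)
            after_max_speed = False
        elif m:
            new_value = "250000000" if m.group(2).startswith("lag_") else "125000000"
            line = m.group(1) + new_value
        elif "<h1>Traffic Analysis for" in line:
            is_lag = re.search(r"Traffic Analysis for lag\b", line) is not None
        elif "<td>Max Speed:</td>" in line:
            after_max_speed = True
        out.append(line)
    return "\n".join(out)


sample = "\n".join([
    "MaxBytes[switch01_lag_26]: 125000000",
    "MaxBytes[switch01_g1]: 12500000",
    "<h1>Traffic Analysis for lag 1 -- switch01</h1>",
    "<td>Max Speed:</td>",
    "<td>125.0 MBytes/s</td>",
    "<h1>Traffic Analysis for g1-- switch01</h1>",
    "<td>Max Speed:</td>",
    "<td>12.5 MBytes/s</td>",
])
fixed = fix_cfg(sample).splitlines()
print(fixed)
```

The LAG detection relies on the h1 heading starting with "Traffic Analysis for lag"; if the real file labels LAGs differently, that test would need to change.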

Get a cell that is in a table before the current table

See html below. Have a series of tables that include rows with a name attribute name="laneStop". I can select those rows like this in the Chrome dev console
$x("/html[1]/body[1]//TR[@name='laneStop']")
However, I also need to get the 2nd cell of the 2nd row of the 1st table ABOVE these rows, e.g. the value
abc_123_florida-45
Here is the html. What's a way to refer to this value above, knowing that I'm getting the "laneStop" rows first?
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/15/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_123_florida-45</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0430</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>bob smith</td>
</tr>
<tr>
<td>box type</td>
<td>square</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>23.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>17.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>17.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>199.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
Try the following xpath.
//td[text()='shipment number']/following::td[1]
If you want to travel from your current node (i.e., the "laneStop" rows), one way to do that is to use this xpath expression:
./preceding-sibling::*/ancestor::*[6]/preceding-sibling::table[1]//tr[1]/td[1]/table[1]//td[1]//tr[2]/td[2]
I'm curious to see if it works for you.

How to get the table immediately previous to current table row

Say I get a list of rows like this
var table_stop_rows = (from r in doc.Descendants("TR").Cast<HtmlNode>()
where r.Attributes["name"]?.Value == "laneStop"
select r).ToList();
Now, for each of those "laneStop" rows, I want to refer back to the smaller table containing the "shipment_number" field and read its corresponding node value, e.g. "abc_123_florida-45". However, I can't simply get a list of all rows where there is a shipment_number; each one has to be in a table that precedes the "laneStop" row in the row collection I'm getting.
I suppose my question then is - if I have a collection of rows, can I then use an xpath statement relative to each row to get back to this shipment_number field in the table preceding?
Here is the html doc, note there would be dozens of these "table pairs". Since I can't control the structure of these files, I need a way to extract the data from the existing structure
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/15/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_123_florida-45</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0430</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>bob smith</td>
</tr>
<tr>
<td>box type</td>
<td>square</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>23.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>17.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>17.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>199.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
Try this xpath expression:
(//tr[@name="laneStop"]/ancestor::table/preceding-sibling::table//tr[2]/td[2])[1]
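When there are many such table pairs, an alternative to navigating backwards from each row is to walk the top-level tables in document order and remember the most recent shipment number. The sketch below shows that idea in Python with xml.etree.ElementTree; it is a different technique from the XPath above, the markup is a shortened two-pair sample without the namespace declaration, and the pairing rule (each "laneStop" row belongs to the last shipment number seen before it) is an assumption based on the structure shown in the question.

```python
import xml.etree.ElementTree as ET

# Shortened sample: two "table pairs", no xmlns so plain tag names work.
DOC = """
<body>
<table><tr><td><table><tr><td><table>
  <tr><td>Date</td><td>11/15/2019</td></tr>
  <tr><td>shipment number</td><td>abc_123_florida-45</td></tr>
</table></td></tr></table></td></tr></table>
<table><tr><td><table>
  <tr name="laneStop"><td>box1</td><td>23.45</td></tr>
  <tr name="laneStop"><td>box2</td><td>17.14</td></tr>
</table></td></tr></table>
<table><tr><td><table><tr><td><table>
  <tr><td>Date</td><td>11/16/2019</td></tr>
  <tr><td>shipment number</td><td>abc_222_florida-35</td></tr>
</table></td></tr></table></td></tr></table>
<table><tr><td><table>
  <tr name="laneStop"><td>box1</td><td>33.45</td></tr>
</table></td></tr></table>
</body>
"""

body = ET.fromstring(DOC)
pairs = []               # (shipment number, box, value) per laneStop row
current_shipment = None  # last shipment number seen in document order

for table in body.findall("table"):      # top-level tables, in order
    for row in table.iter("tr"):         # includes rows of nested tables
        cells = row.findall("td")        # direct td children only
        label = (cells[0].text or "").strip() if cells else ""
        if len(cells) >= 2 and label == "shipment number":
            current_shipment = cells[1].text
        elif row.get("name") == "laneStop":
            pairs.append((current_shipment, cells[0].text, cells[1].text))

print(pairs)
```

The same loop translates directly to HtmlAgilityPack: iterate the top-level tables, keep the last shipment number in a variable, and attach it to each "laneStop" row as it is encountered.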

Why Xpath 3.0 works, but Xquery 3.0 doesn't work with the same expression

I ran an XPath expression in oXygen. With XPath 3.0 it finds what I need, but with XQuery 3.0 it finds nothing.
This is my XPath expression:
//table[tbody/tr/th/p[contains(text(), 'All Water System Contacts')]]/tbody/tr[3]/td[1]
This is my XML.
I have included only part of it.
<table border="1" cellpadding="1" cellspacing="1" summary="." width="640">
<tbody>
<tr>
<th colspan="3">
<p>All Water System Contacts </p></th>
</tr>
<tr>
<th>Type</th>
<th>Contact</th>
<th>Communication</th>
</tr>
<tr>
<td align="center">AC - Administrative Contact - GENERAL MANAGER </td>
<td align="center">GRANT, JOHN, W <br/> PO BOX 869<br/> BIG SPRING, TX 79721-0869 </td>
<td align="center">
<table border="1" cellpadding="0" cellspacing="0" style="border-collapse: collapse"
width="100%">
<tbody>
<tr>
<th><b>Electronic Type</b></th>
<th><b>Value</b></th>
</tr>
</tbody>
</table>
<table border="1" cellpadding="0" cellspacing="0" style="border-collapse: collapse"
width="100%">
<tbody>
<tr>
<th><b>Phone Type</b></th>
<th><b>Value</b></th>
</tr>
<tr>
<td align="center">BUS - Business</td>
<td align="center">432-267-6341 </td>
</tr>
<tr>
<td align="center">FAX - Facsimile</td>
<td align="center">432-267-3121 </td>
</tr>
<tr>
<td align="center">BUS - Business</td>
<td align="center">432-267-6070 </td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td align="center">OW - Owner </td>
<td align="center">COLORADO RIVER MUNICIPAL WATER DISTRICT <br/> PO BOX 869<br/> BIG
SPRING, TX 79721-0869 </td>
<td align="center"> </td>
</tr>
</tbody>
</table>
I tried different functions.
I don't know why it doesn't work or what the difference is.
Please help me.
I suspect your real, complete input has an XHTML default namespace declaration xmlns="http://www.w3.org/1999/xhtml", and that in oXygen you have the XPath setting "use the default namespace of the root element" enabled. That is why your path works with XPath out of the box, while for XQuery you need to explicitly set
declare default element namespace 'http://www.w3.org/1999/xhtml';
in the prolog of your XQuery file or code sample.
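The effect of a default namespace on unprefixed names can be reproduced outside oXygen. Here is a small Python sketch with xml.etree.ElementTree; the tiny document and the prefix x are made up for illustration, and only the namespace URI comes from the answer above.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><table><tr><td>x</td></tr></table></body></html>"
)

# An unprefixed name means "td in no namespace", which matches nothing here.
unqualified = doc.findall(".//td")

# Binding a prefix to the XHTML namespace makes the same path match.
qualified = doc.findall(".//x:td", {"x": XHTML})

print(len(unqualified), len(qualified))
```

Declaring a default element namespace in the XQuery prolog plays the same role as the prefix mapping here: it tells the processor which namespace the unprefixed element names in the path refer to.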

problems with xpath evaluation

If I want to extract the hrefs only under Type1 (that is, 1.htm, 2.htm, 3.htm and 4.htm, but not 5.htm), how do I do that?
What I have for now is: //table[@class='leftnav']//a
Thanks!
<table width="240" border="0" cellpadding="0" cellspacing="0" class="leftnav">
<tr class="leftnav">
<th>Type1</th>
</tr>
<tr class="leftnav">
<td>2013</td>
</tr>
<tr class="leftnav">
<td>2012</td>
</tr>
<tr class="leftnav">
<td>2011</td>
</tr>
<tr class="leftnav">
<td>2010</td>
</tr>
<tr class="leftnav">
<th>Type2</th>
</tr>
<tr class="leftnav">
<td>2013</td>
</tr>
</table>
Try this xpath:
//tr[(preceding-sibling::tr/th)[last()]="Type1"]/td/a
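The logic of that expression (each row belongs to the nearest preceding header row) can be sketched procedurally in Python with xml.etree.ElementTree. Note the assumption: the a elements are missing from the table as posted, so the links below (1.htm to 5.htm, named after the files mentioned in the question) are reconstructed for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the <a href> elements are reconstructed, not from the post.
NAV = """
<table class="leftnav">
  <tr class="leftnav"><th>Type1</th></tr>
  <tr class="leftnav"><td><a href="1.htm">2013</a></td></tr>
  <tr class="leftnav"><td><a href="2.htm">2012</a></td></tr>
  <tr class="leftnav"><td><a href="3.htm">2011</a></td></tr>
  <tr class="leftnav"><td><a href="4.htm">2010</a></td></tr>
  <tr class="leftnav"><th>Type2</th></tr>
  <tr class="leftnav"><td><a href="5.htm">2013</a></td></tr>
</table>
"""

table = ET.fromstring(NAV)
hrefs = []
heading = None  # text of the most recent header row

for row in table.findall("tr"):
    th = row.find("th")
    if th is not None:
        heading = th.text            # a new section starts here
    elif heading == "Type1":
        hrefs.extend(a.get("href") for a in row.findall("td/a"))

print(hrefs)
```

Tracking the latest header while iterating is the procedural counterpart of (preceding-sibling::tr/th)[last()] in the XPath.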
