How to get the table immediately preceding the current table row - XPath

Say I get a list of rows like this:
var table_stop_rows = (from r in doc.Descendants("TR").Cast<HtmlNode>()
where r.Attributes["name"]?.Value == "laneStop"
select r).ToList();
Now, for each of those "laneStop" rows, I want to refer back to the smaller table containing the "shipment number" field and read its corresponding node value, e.g. "abc_123_florida-45". However, I can't simply get a list of all rows that have a shipment number; each one has to be in the table that precedes the "laneStop" row in the row collection I'm getting.
I suppose my question then is: if I have a collection of rows, can I use an XPath expression relative to each row to get back to the shipment number field in the preceding table?
Here is the HTML document. Note that there would be dozens of these "table pairs". Since I can't control the structure of these files, I need a way to extract the data from the existing structure:
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/15/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_123_florida-45</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0430</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>bob smith</td>
</tr>
<tr>
<td>box type</td>
<td>square</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>23.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>17.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>17.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>199.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>

Try this xpath expression:
(//tr[@name="laneStop"]/ancestor::table/preceding-sibling::table//tr[2]/td[2])[1]
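For the "relative to each row" part of the question, one option is to walk up from every laneStop row to its outermost table, step to the table just before it, and read the cell next to the "shipment number" label. The sketch below does that in Python with only the standard library. The sample markup is a trimmed, namespace-free stand-in for the real file, and the helper name shipment_for_row is made up for illustration. xml.etree.ElementTree has no ancestor or preceding-sibling axis, so the navigation is done by hand with a parent map.

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free stand-in for the real file (an assumption; the
# original markup declares the XHTML namespace and nests one level deeper).
SAMPLE = """
<body>
  <table border="1"><tbody><tr><td>
    <table><tbody>
      <tr><td>Date</td><td>11/15/2019</td></tr>
      <tr><td>shipment number</td><td>abc_123_florida-45</td></tr>
    </tbody></table>
  </td></tr></tbody></table>
  <table border="1"><tbody><tr><td>
    <table><tbody>
      <tr name="laneStop"><td>box1</td><td>23.45</td></tr>
      <tr name="laneStop"><td>box2</td><td>17.14</td></tr>
    </tbody></table>
  </td></tr></tbody></table>
  <table border="1"><tbody><tr><td>
    <table><tbody>
      <tr><td>Date</td><td>11/16/2019</td></tr>
      <tr><td>shipment number</td><td>abc_222_florida-35</td></tr>
    </tbody></table>
  </td></tr></tbody></table>
  <table border="1"><tbody><tr><td>
    <table><tbody>
      <tr name="laneStop"><td>box1</td><td>33.45</td></tr>
    </tbody></table>
  </td></tr></tbody></table>
</body>
"""

root = ET.fromstring(SAMPLE.strip())
# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def shipment_for_row(row):
    """Return the shipment number from the table just before the row's outer table."""
    # Climb until the node is a direct child of <body>, i.e. the outer table.
    node = row
    while parent_of[node] is not root:
        node = parent_of[node]
    siblings = list(root)
    idx = siblings.index(node)
    if idx == 0:
        return None  # no preceding table
    header_table = siblings[idx - 1]
    # Look for the row whose first cell is the "shipment number" label.
    for tr in header_table.iter("tr"):
        cells = tr.findall("td")
        if len(cells) >= 2 and (cells[0].text or "").strip() == "shipment number":
            return (cells[1].text or "").strip()
    return None


rows = root.findall(".//tr[@name='laneStop']")
results = [(row.find("td").text, shipment_for_row(row)) for row in rows]
for box, shipment in results:
    print(box, shipment)
```

With the sample above, the first two rows map to abc_123_florida-45 and the third to abc_222_florida-35. For a file that keeps the xmlns declaration, ElementTree tag names carry the namespace in braces, so the lookups would need to be adjusted accordingly.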

Related

Outlook can't parse multiple tables from HTML with pywin32

I am trying to convert an HTML file to a .msg file, but the conversion stops when it reaches the third table tag in the HTML.
I searched for this problem but found nothing; it seems I am the only one who has run into it.
This is the example HTML:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
</head>
<body>
<h1>This Email is using for understand outlook mail synthesis</h1>
<h2>0001</h2>
<table>
<tr>
<td>This is image001.jpg</td>
</tr>
<tr>
<td>
// stop parsing after parse this table
<table>
<tr>
<td>
<img src="cid:image001.jpg" alt="">
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>This is image002.jpg</td>
</tr>
<tr>
<td>
<table>
<tr>
<td>
<img src="cid:image002.jpg" alt="">
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>This is image003.jpg</td>
</tr>
<tr>
<td>
<table>
<tr>
<td>
<img src="cid:image003.jpg" alt="">
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>This is image004.jpg</td>
</tr>
<tr>
<table>
<tr>
<td>
<img src="cid:image004.png" alt="">
</td>
</tr>
</table>
</tr>
</table>
<h1>No!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!</h1>
</body>
</html>
And this is my python code:
import os
from win32com import client as win32
outlook = win32.Dispatch("outlook.application")
mail = outlook.CreateItem(0)  # 0 = olMailItem
mail.Subject = "This is a subject"
# Load the HTML body from disk.
with open(".\\mix.html", "r", encoding="utf-8") as f:
    html = f.read()
mail.HTMLBody = html
current_path = os.getcwd()
# Attach each image and set its content ID (MAPI property 0x3712001F)
# so the "cid:" references in the HTML resolve to the attachment.
for name in ("image001.jpg", "image002.jpg", "image003.jpg", "image004.png"):
    at = mail.Attachments.Add(current_path + "\\" + name)
    at.PropertyAccessor.SetProperty("http://schemas.microsoft.com/mapi/proptag/0x3712001F", name)
mail.SaveAs(current_path + "\\rst.msg")
This is what I see when I open the "rst.msg" file:
Parsing stops after the table.
I deleted the table in the second tr tag and ran the Python script again. This is what I get:
Parsing stops again after the table.
This is the HTML with the table in the second tr tag deleted:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
</head>
<body>
<h1>This Email is using for understand outlook mail synthesis</h1>
<h2>0001</h2>
<table>
<tr>
<td>This is image001.jpg</td>
</tr>
<tr>
<td>
</td>
</tr>
<tr>
<td>This is image002.jpg</td>
</tr>
<tr>
<td>
// stop parsing after parse this table
<table>
<tr>
<td>
<img src="cid:image002.jpg" alt="">
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>This is image003.jpg</td>
</tr>
<tr>
<td>
<table>
<tr>
<td>
<img src="cid:image003.jpg" alt="">
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>This is image004.jpg</td>
</tr>
<tr>
<table>
<tr>
<td>
<img src="cid:image004.png" alt="">
</td>
</tr>
</table>
</tr>
</table>
<h1>No!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!</h1>
</body>
</html>
Hope you can help me! Thank you very much!
This problem was solved by upgrading Outlook. In the new version, Outlook generates the email as expected. However, if you add pictures to the email used for the HTML and then save it, the pictures are added to the email as attachments. If you send the email after generating it and before saving it, the recipient doesn't see those attachments, which is probably what you want.

Correct mrtg cfgmaker file

mrtg cfgmaker reads incorrect values over SNMP v1 and v2, and I need to correct the resulting file.
I would like to run a script after the file is created, using sed if possible.
The lines that need to be corrected in my case are for LAGs and normal ports:
MaxBytes[switch01_lag_26]: 125000000 should become MaxBytes[switch01_lag_26]: 250000000
(the name can range from switch01_lag_1 to switch01_lag_26)
MaxBytes[switch01_g1]: 12500000 should become MaxBytes[switch01_g1]: 125000000
(the name can range from switch01_g1 to switch01_g16)
Which sed patterns do I have to use to tell whether the name in the square brackets is a LAG or a port, and then replace the number after the colon?
The HTML part should show the correct speed too, if possible. This is the original for port g1:
<h1>Traffic Analysis for g1-- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>1-Gigabit---Level </td>
</tr>
<tr>
<td>ifType:</td>
<td>ethernetCsmacd (6)</td>
</tr>
<tr>
<td>ifName:</td>
<td>g1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>12.5 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
and it should end up like this (the line below "Max Speed" is changed):
<h1>Traffic Analysis for g1-- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>1-Gigabit---Level </td>
</tr>
<tr>
<td>ifType:</td>
<td>ethernetCsmacd (6)</td>
</tr>
<tr>
<td>ifName:</td>
<td>g1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>125.0 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
This is original for LAG 1:
<h1>Traffic Analysis for lag 1 -- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>lag-1 </td>
</tr>
<tr>
<td>ifType:</td>
<td>IEEE 802.3ad Link Aggregate (161)</td>
</tr>
<tr>
<td>ifName:</td>
<td>lag 1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>125.0 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
which should end up like this (the line below "Max Speed" is changed):
<h1>Traffic Analysis for lag 1 -- switch01</h1>
<div id="sysdetails">
<table>
<tr>
<td>System:</td>
<td>switch01</td>
</tr>
<tr>
<td>Maintainer:</td>
<td></td>
</tr>
<tr>
<td>Description:</td>
<td>lag-1 </td>
</tr>
<tr>
<td>ifType:</td>
<td>IEEE 802.3ad Link Aggregate (161)</td>
</tr>
<tr>
<td>ifName:</td>
<td>lag 1</td>
</tr>
<tr>
<td>Max Speed:</td>
<td>250.0 MBytes/s</td>
</tr>
<tr>
<td>Ip:</td>
<td>No Ip (No DNS name)</td>
</tr>
</table>
</div>
I can change all speeds in the HTML using sed -i 's/\([0-9.]\+\) MBytes/125.0 MBytes/' /switch01.cfg, but this changes the LAGs too. How can I detect whether an HTML part belongs to a LAG?
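Both parts of the question come down to deciding "LAG or port" from a name: the target name in the square brackets for MaxBytes, and the interface named in the nearest preceding "Traffic Analysis for ..." heading for the HTML. Below is a minimal sketch of that logic in Python rather than sed. The function name fix_cfg and the constants are made up for illustration, the corrected values are the ones given in the question, and it assumes each "Max Speed" value follows the heading of the interface it belongs to.

```python
import re

# Corrected values taken from the question; the names are illustrative.
LAG_BYTES, PORT_BYTES = 250000000, 125000000

# Group 1: everything up to the number, group 2: "lag_N" or "gN".
MAXBYTES = re.compile(r"^(MaxBytes\[[^\]]*?_(lag_\d+|g\d+)\]:\s*)\d+")
HEADING = re.compile(r"<h1>Traffic Analysis for\s+(\S+)")
SPEED = re.compile(r"[0-9.]+ MBytes/s")


def fix_cfg(text):
    """Rewrite MaxBytes and 'MBytes/s' values for LAGs and g-ports."""
    is_lag = False  # kind of interface named by the most recent <h1> heading
    out = []
    for line in text.splitlines():
        m = MAXBYTES.match(line)
        if m:
            value = LAG_BYTES if m.group(2).startswith("lag") else PORT_BYTES
            line = m.group(1) + str(value) + line[m.end():]
        h = HEADING.search(line)
        if h:
            is_lag = h.group(1).lower().startswith("lag")
        elif SPEED.search(line):
            # Note: this rewrites every "N MBytes/s" value, not only the
            # cell directly below "Max Speed:".
            value = LAG_BYTES if is_lag else PORT_BYTES
            line = SPEED.sub(f"{value / 1e6:.1f} MBytes/s", line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "\n".join([
        "MaxBytes[switch01_lag_26]: 125000000",
        "MaxBytes[switch01_g1]: 12500000",
        "<h1>Traffic Analysis for lag 1 -- switch01</h1>",
        "<td>125.0 MBytes/s</td>",
    ])
    print(fix_cfg(sample))
```

The same idea should carry over to sed by using two address patterns, one matching \[.*_lag_[0-9]*\] and one matching \[.*_g[0-9]*\], although the HTML part needs state across lines, which is why a small script may be easier to maintain.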

Get a cell that is in a table before the current table

See the HTML below. I have a series of tables that include rows with the attribute name="laneStop". I can select those rows like this in the Chrome dev console:
$x("/html[1]/body[1]//TR[@name='laneStop']")
However, I also need to get the 2nd cell of the 2nd row of the 1st table ABOVE these rows, e.g. the value
abc_123_florida-45
Here is the HTML. What is a way to refer to the value above, given that I'm getting the "laneStop" rows first?
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/15/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_123_florida-45</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0430</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>bob smith</td>
</tr>
<tr>
<td>box type</td>
<td>square</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>23.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>17.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>17.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>199.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
Try the following xpath.
//td[text()='shipment number']/following::td[1]
If you want to travel from your current node (i.e., the "laneStop" rows), one way to do that is to use this xpath expression:
./preceding-sibling::*/ancestor::*[6]/preceding-sibling::table[1]//tr[1]/td[1]/table[1]//td[1]//tr[2]/td[2]
I'm curious to see if it works for you.

Using contains returns too many results

In the HTML below, I'm trying to get the two nodes that contain values for shipment_number, but instead I get 6 <td> nodes. Why? Doesn't contains limit the nodes to only those that match the text value? If so, the statement below should return only two, not six.
In Chrome dev console:
$x("//tr//td[contains(.,'shipment number')]/following::td[1]")
html:
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/15/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_123_florida-45</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0430</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>bob smith</td>
</tr>
<tr>
<td>box type</td>
<td>square</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>23.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>17.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>17.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>199.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/16/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_222_florida-35</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0630</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>sue smith</td>
</tr>
<tr>
<td>box type</td>
<td>rect</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>33.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>1.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>27.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>299.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
You need
//tr//td[contains(text(),'shipment number')]/following::td[1]
That's because contains(., '...') converts . to string by expanding all its text descendants, not just children.
I'm adding this answer because the text() node test might conflict with other requirements, mainly those dealing with inline markup.
The reason you are getting six td elements is that there are six td elements with "shipment number" as part of their string value (the concatenation of all descendant text nodes). That is because you have nested tables, and thus nested td elements. So you want a td element that has no descendant td element.
The expression:
//tr//td[not(.//td)][contains(.,'shipment number')]/following::td[1]
It selects:
<td>abc_123_florida-45</td>
<td>abc_222_florida-35</td>
Check in http://www.xpathtester.com/xpath/37bd889231ad68bb7bfa377433aeca00
Do note that your input sample has a default namespace declaration with the namespace URI http://www.w3.org/1999/xhtml. Because neither your code sample nor your selected answer uses namespaces, I assume you know how to work with them.
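The difference between matching on the string value and matching only leaf cells can be reproduced outside XPath. The following is a rough Python sketch using only the standard library. The markup is a small nested-table stand-in for the question's document (an assumption), with one shipment instead of two, so the counts are 3 and 1 rather than 6 and 2.

```python
import xml.etree.ElementTree as ET

# Small nested-table stand-in for the question's markup (assumption).
SAMPLE = """
<table><tr><td>
  <table><tr><td>
    <table>
      <tr><td>shipment number</td><td>abc_123_florida-45</td></tr>
    </table>
  </td></tr></table>
</td></tr></table>
"""

root = ET.fromstring(SAMPLE.strip())
cells = list(root.iter("td"))

# Like contains(., '...'): test the string value, i.e. all descendant
# text joined together, so every enclosing td matches as well.
by_string_value = [
    td for td in cells if "shipment number" in "".join(td.itertext())
]

# Like [not(.//td)][contains(., '...')]: keep only cells that do not
# contain another td.
leaf_only = [td for td in by_string_value if td.find(".//td") is None]

print(len(by_string_value), len(leaf_only))
```

Each level of table nesting adds one more enclosing td whose string value includes the label, which is where the extra matches in the question come from.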

Page.GetRouteUrl in LayoutTemplate of listview?

I created a table structure with a header, body, and footer in a ListView, which works fine.
But when I add the code below to the footer, which is in the LayoutTemplate, I get an error.
<LayoutTemplate>
<table class="sampletable" cellpadding="0" cellspacing="0">
<thead class="tableheader">
<tr>
<th>
<a>Samples </a>
</th>
</tr>
</thead>
<tbody class="tablebody">
<tr id="itemplaceHolder" runat="server">
</tr>
</tbody>
<tfoot class="tablefooter">
<tr>
<td>
<a href='<%:Page.GetRouteUrl("samplelist",null) %>'>more sample</a>
</td>
</tr>
</tfoot>
</table>
</LayoutTemplate>
Is it not allowed to place this in the LayoutTemplate?
The error is
"The Controls collection cannot be modified because the control contains code blocks (i.e. <% ... %>)."
Use the RouteUrlExpressionBuilder. It is properly documented on MSDN.
