Selecting a cell based on multiple criteria on the parent's siblings - XPath

I would like to know whether a single XPath expression can combine two conditions: a preceding sibling of a certain class containing a certain text, and a sibling at the same level containing a certain text.
For example I would like to find the following cells:
<td class="sdawatt_booknow">Book</td>
by looking up a sibling of class sdawatt_classname containing the text Spin, in a row preceded by a td of class sdawatt_banner with the text Monday - 16 September 2013.
Or the following td:
<td class="sdawatt_booknow">Book</td>
if we look for the date 'Friday - 13 September 2013'.
Is this doable in XPath?
<table cellspacing="0" cellpadding="0" border="0" style="border-collapse:collapse;" class="sdawatt_outer">
<tbody><tr>
<td class="sdawatt_hdrcell">Time</td>
<td class="sdawatt_hdrcell">Class</td>
<td class="sdawatt_hdrcell">Level</td>
<td class="sdawatt_hdrcell">Spaces</td>
<td class="sdawatt_hdrcell">Location</td>
<td class="sdawatt_hdrcell">Instructors</td>
<td class="sdawatt_hdrcell">Tags</td>
<td class="sdawatt_hdrcell">Info</td>
<td class="sdawatt_hdrcell">Book</td>
</tr><tr>
<td colspan="9" class="sdawatt_banner">Friday - 13 September 2013</td>
</tr><tr class="sdawatt_classrow">
<td class="sdawatt_time">07:45-08:15</td>
<td class="sdawatt_classname">Boxing</td>
<td class="sdawatt_level"> </td>
<td class="sdawatt_spaces">14 Left</td>
<td class="sdawatt_location">Main Studio</td>
<td class="sdawatt_resources"> Darren</td>
<td class=" sdawatt_infotags"></td>
<td class="sdawatt_info"><img src="https://v4.fitnessandlifestylecentre.com/webaccess/TimetableView/information.gif" class="tiptip" /></td>
<td class="sdawatt_booknow">Book</td>
</tr><tr class="sdawatt_classrow">
<td class="sdawatt_time">12:00-12:45</td>
<td class="sdawatt_classname">Spin</td>
<td class="sdawatt_level"> </td>
<td class="sdawatt_spaces">8 Left</td>
<td class="sdawatt_location">Main Studio</td>
<td class="sdawatt_resources"> Matt</td>
<td class=" sdawatt_infotags"></td>
<td class="sdawatt_info"><img src="https://v4.fitnessandlifestylecentre.com/webaccess/TimetableView/information.gif" class="tiptip" /></td>
<td class="sdawatt_booknow">Book</td>
</tr><tr>
<td colspan="9" class="sdawatt_banner">Monday - 16 September 2013</td>
</tr><tr class="sdawatt_classrow">
<td class="sdawatt_time">13:00-13:45</td>
<td class="sdawatt_classname">Spin</td>
<td class="sdawatt_level"> </td>
<td class="sdawatt_spaces">12 Left</td>
<td class="sdawatt_location">Main Studio</td>
<td class="sdawatt_resources"> Marzena</td>
<td class=" sdawatt_infotags"></td>
<td class="sdawatt_info">
<img src="https://v4.fitnessandlifestylecentre.com/webaccess/TimetableView/information.gif" class="tiptip" /></td>
<td class="sdawatt_booknow">Book</td>
</tr>
</tbody></table>

//tr[
contains(
td[@class="sdawatt_banner"],
"Monday - 16 September 2013")
]
/following-sibling::tr[
contains(
td[@class="sdawatt_classname"],
"Spin")
]/td[@class="sdawatt_booknow"]
yields
<td class="sdawatt_booknow">
Book
</td>
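
One caveat with the expression above: following-sibling::tr reaches every later row in the table, so with the Friday banner it would also return the Spin cell listed under Monday. A minimal sketch of scoping the lookup to a single day, written in Python with the standard library's xml.etree.ElementTree and walking the rows in a loop instead of using XPath axes. The helper name book_cells and the trimmed, well-formed copy of the table are assumptions made for illustration; a real page would likely need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the timetable (only the cells the lookup needs).
HTML = """
<table><tbody>
<tr><td class="sdawatt_banner">Friday - 13 September 2013</td></tr>
<tr class="sdawatt_classrow"><td class="sdawatt_classname">Boxing</td><td class="sdawatt_booknow">Book</td></tr>
<tr class="sdawatt_classrow"><td class="sdawatt_classname">Spin</td><td class="sdawatt_booknow">Book</td></tr>
<tr><td class="sdawatt_banner">Monday - 16 September 2013</td></tr>
<tr class="sdawatt_classrow"><td class="sdawatt_classname">Spin</td><td class="sdawatt_booknow">Book</td></tr>
</tbody></table>
"""

def book_cells(root, day, class_name):
    """Hypothetical helper: booknow cells for class_name in rows under the day banner."""
    found = []
    in_day = False
    for row in root.iter("tr"):
        banner = row.find("td[@class='sdawatt_banner']")
        if banner is not None:
            # A banner row opens a new day; only rows under the wanted day count.
            in_day = day in (banner.text or "")
            continue
        name = row.find("td[@class='sdawatt_classname']")
        if in_day and name is not None and class_name in (name.text or ""):
            found.extend(row.findall("td[@class='sdawatt_booknow']"))
    return found

root = ET.fromstring(HTML)
friday = book_cells(root, "Friday - 13 September 2013", "Spin")
monday = book_cells(root, "Monday - 16 September 2013", "Spin")
print(len(friday), len(monday))
```

Each call returns only the cell from its own day, because the next banner row switches the in_day flag off again.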

Related

Joining text from parsing a complex table structure in ruby nokogiri

I have an HTML table and I want to get the text from some td's. Sometimes the text is in a single td, but sometimes it is spread across multiple td's. How can I join the text when it is spread across multiple td's? Here is the HTML code:
<table class="detailRecordTable">
<tbody>
<tr><td class="detailSeperator" colspan="6"> </td></tr>
<tr>
<td valign="top" style="width: 11% " class="detailData"><b>02/03/2016</b></td> <td style="width: 3%" class="detailLabels" valign="top"> </td>
<td style="width: 85%" class="detailData alignData" colspan="3"> <b>Disposed- Pet for Writ Denied</b> </td>
<td style="width: 1%" class="detailData"> </td>
</tr>
<tr>
<td colspan="2" style="width: 14% " class="detailLabels" valign="top"> </td>
<td style="width: 86% " class="detailData" colspan="2">ORDER ISSUED: PETITION FOR WRIT OF MANDAMUS DENIED. MANDATE AVAILABLE TO COUNSEL OF RECORD VIA SECURE CASE.NET.</td>
</tr>
<tr><td class="detailSeperator" colspan="6"> </td></tr>
<tr>
<td valign="top" style="width: 11% " class="detailData"><b>01/29/2016</b></td>
<td style="width: 3%" class="detailLabels" valign="top"> </td>
<td style="width: 85%" class="detailData alignData" colspan="3">
<b>Suggestions in Opposition</b></td>
<td style="width: 1%" class="detailData"> </td>
</tr>
<tr>
<td colspan="2" style="width: 14% " class="detailLabels" valign="top"> </td>
<td style="width: 86% " class="detailData" colspan="2">SUGGESTIONS IN OPPOSITION TO RELATORS PETITION FOR WRIT OF MANDAMUS; Electronic Filing Certificate of Service.</td>
</tr>
<tr>
<td colspan="2" style="width: 14%" class="detailLabels"> </td>
<td style="width: 86%" class="detailData" colspan="2"> <b>Filed By:</b>JOHN RICHARD SHANK JR
</td>
</tr><tr>
<td style="width: 14%" class="detailLabels" colspan="2"></td>
<td style="width: 86%" class="detailData" colspan="2"> <b>On Behalf Of:</b>ELIZABETH DAVIS
</td>
</tr>
<tr>
<td class="detailSeperator" colspan="6"> </td></tr>
<tr><td valign="top" style="width: 11% " class="detailData"><b>01/22/2016</b></td><td style="width: 3%" class="detailLabels" valign="top"> </td>
<td style="width: 85%" class="detailData alignData" colspan="3"><b>Court Order Issued</b></td>
<td style="width: 1%" class="detailData"> </td>
</tr>
<tr><td colspan="2" style="width: 14% " class="detailLabels" valign="top"> </td>
<td style="width: 86% " class="detailData" colspan="2">ORDER ISSUED: RESPONDENT REQUESTED TO FILE SUGGESTIONS IN OPPOSITION ON OR BEFORE 2:00 P.M. ON JANUARY 29, 2016.</td>
</tr>
</tbody></table>
I want output like this; I put asterisks around the text that should be joined:
["ORDER ISSUED: PETITION FOR WRIT OF MANDAMUS DENIED. MANDATE AVAILABLE TO COUNSEL OF RECORD VIA SECURE CASE.NET." , "**SUGGESTIONS IN OPPOSITION TO RELATORS PETITION FOR WRIT OF MANDAMUS; Electronic Filing Certificate of Service. Filed By:JOHN RICHARD SHANK JR On Behalf Of:ELIZABETH DAVIS**" , "ORDER ISSUED: RESPONDENT REQUESTED TO FILE SUGGESTIONS IN OPPOSITION ON OR BEFORE 2:00 P.M. ON JANUARY 29, 2016"]
I have tried this, but it does not join the text; I get each piece as a separate item, especially the text surrounded by asterisks:
if !tr.css('td.detailData').empty?
ac_desc = tr.css('td.detailData')[0].text.strip.gsub("\n", '').gsub("\t", '')
end
if ac_desc != ""
acc_descs << ac_desc
end
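
One possible way to think about the grouping, sketched in Python with the standard library rather than Ruby/Nokogiri: treat each detailSeperator row as the start of a new entry, skip the row that carries the date and title, and join the detailData text of the remaining rows. The function name join_descriptions and the trimmed, well-formed sample are assumptions for illustration; the same loop structure should carry over to tr.css calls in Nokogiri.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of two entries from the table above.
HTML = """
<table class="detailRecordTable"><tbody>
<tr><td class="detailSeperator" colspan="6"> </td></tr>
<tr><td class="detailData"><b>01/29/2016</b></td><td class="detailData alignData"><b>Suggestions in Opposition</b></td></tr>
<tr><td class="detailLabels"> </td><td class="detailData">SUGGESTIONS IN OPPOSITION TO RELATORS PETITION FOR WRIT OF MANDAMUS; Electronic Filing Certificate of Service.</td></tr>
<tr><td class="detailLabels"> </td><td class="detailData"> <b>Filed By:</b>JOHN RICHARD SHANK JR
</td></tr>
<tr><td class="detailLabels"></td><td class="detailData"> <b>On Behalf Of:</b>ELIZABETH DAVIS
</td></tr>
<tr><td class="detailSeperator" colspan="6"> </td></tr>
<tr><td class="detailData"><b>01/22/2016</b></td><td class="detailData alignData"><b>Court Order Issued</b></td></tr>
<tr><td class="detailLabels"> </td><td class="detailData">ORDER ISSUED: RESPONDENT REQUESTED TO FILE SUGGESTIONS IN OPPOSITION ON OR BEFORE 2:00 P.M. ON JANUARY 29, 2016.</td></tr>
</tbody></table>
"""

def classes(td):
    """Class attribute split into tokens, so 'detailData alignData' still matches."""
    return (td.get("class") or "").split()

def join_descriptions(root):
    """Hypothetical helper: one joined description string per separator-delimited entry."""
    groups, current = [], None
    for row in root.iter("tr"):
        cells = row.findall("td")
        if not cells:
            continue
        first = classes(cells[0])
        if "detailSeperator" in first:
            # Separator row: flush the running entry and start a new one.
            if current:
                groups.append(" ".join(current))
            current = []
        elif "detailLabels" in first and current is not None:
            # Detail rows start with a label cell; the date/title row does not.
            for td in cells:
                if "detailData" in classes(td):
                    text = " ".join("".join(td.itertext()).split())
                    if text:
                        current.append(text)
    if current:
        groups.append(" ".join(current))
    return groups

root = ET.fromstring(HTML)
result = join_descriptions(root)
print(result)
```

The key difference from the attempt above is that text is accumulated per entry and appended once, instead of being appended once per row.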

Graphviz draw tile with different cell colour and length

How to represent this in a dot file?
digraph structs {
node1 [shape=plaintext,
label = <<table border="0" cellspacing="0">
<tr>
<td width="20">0</td>
<td width="20">1</td>
<td width="20">2</td>
<td width="20">3</td>
<td width="20">4</td>
<td width="20">5</td>
<td width="20">6</td>
<td width="20">7</td>
<td width="20">8</td>
<td width="20">9</td>
<td width="20">10</td>
<td width="20">11</td>
<td width="20">12</td>
<td width="20">13</td>
<td width="20">14</td>
</tr>
<tr>
<td border="1" colspan="3" bgcolor="yellow">A</td>
<td border="1" colspan="1" bgcolor="white"></td>
<td border="1" colspan="1" bgcolor="white"></td>
<td border="1" colspan="1" bgcolor="white"></td>
<td border="1" colspan="2" bgcolor="pink">B</td>
<td border="1" colspan="1" bgcolor="white"></td>
<td border="1" colspan="2" bgcolor="green">C</td>
<td border="1" colspan="4" bgcolor="#40e0d0">D</td>
</tr>
</table>>
];
}

Optimal XPath Query for processing the sample HTML fragment

I have a feed that outputs HTML. The following segment is part of the output
<div class="leftnav">
<table border="0" cols="2">
<tr>
<td colspan="2" class="topline"><span style="font-size: 1px"> </span></td>
</tr>
<tr>
<td colspan="2"><span class="bold">Article Cat1 </span></td>
</tr>
<tr>
<td class="date" colspan="2">
ArticleTitle1</td>
</tr>
<tr>
<td width="20"></td>
<td class="date">
ArticleLink1
</td>
</tr>
<tr>
<td colspan="2" class="topline"><span style="font-size: 1px"> </span></td>
</tr>
<tr>
<td colspan="2"><span class="bold">Article Cat2 </span></td>
</tr>
<tr>
<td class="date" colspan="2">
ArticleTitle2</td>
</tr>
<tr>
<td width="20"></td>
<td class="date">
ArticleLink2
</td>
</tr>
</table>
</div>
I want to process the above segment using XPath so that the output looks like this:
Article Cat1
ArticleTitle1
ArticleLink1
Article Cat2
ArticleTitle2
ArticleLink2
What is the optimal XPath that will produce the desired output? I tried //div[@class="leftnav"]/table/tr but this gives all the TR elements. I want to skip the first TR element so that I can get the output in the format I described above.
//div[@class="leftnav"]/table/tr[position() > 1]
Try the above
Stupid simple way:
substring-after(normalize-space(string(//*:div)), normalize-space(string(//*:div/*:table/*[1])))
Result: "Article Cat1 ArticleTitle1 ArticleLink1 nbsp Article Cat2 ArticleTitle2 ArticleLink2"
I don't know why, but (position() > 1) doesn't work in my environment, so I've used strings instead.
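
For environments where the position() filter is not available, the same "skip the first row" idea can be applied after selecting all rows. A minimal sketch in Python using the standard library's xml.etree.ElementTree, where a list slice stands in for the predicate. The helper row_text and the assumption that the fragment parses as well-formed XML (with the non-breaking spaces replaced by plain spaces) are mine, not part of the original feed.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, compacted; spacer cells hold a plain space here.
HTML = """<div class="leftnav"><table border="0" cols="2">
<tr><td colspan="2" class="topline"><span style="font-size: 1px"> </span></td></tr>
<tr><td colspan="2"><span class="bold">Article Cat1 </span></td></tr>
<tr><td class="date" colspan="2">
ArticleTitle1</td></tr>
<tr><td width="20"></td><td class="date">
ArticleLink1
</td></tr>
<tr><td colspan="2" class="topline"><span style="font-size: 1px"> </span></td></tr>
<tr><td colspan="2"><span class="bold">Article Cat2 </span></td></tr>
<tr><td class="date" colspan="2">
ArticleTitle2</td></tr>
<tr><td width="20"></td><td class="date">
ArticleLink2
</td></tr>
</table></div>"""

def row_text(row):
    """Hypothetical helper: all text inside a row with whitespace collapsed."""
    return " ".join("".join(row.itertext()).split())

root = ET.fromstring(HTML)       # root is the div.leftnav element
rows = root.findall("table/tr")  # every tr, like //div[@class="leftnav"]/table/tr
lines = [row_text(row) for row in rows[1:]]  # slice drops the first tr
print("\n".join(lines))
```

The second topline row has no text, so it comes out as an empty line between the two article groups.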

How to create a two column email newsletter

I am trying to create a two column email flyer but I'm having trouble with the coding as Outlook hates CSS.
I'm using tables to keep it as simple as possible but I want two separate tables on the left and the right so I can add data into it as I wish.
I tried using float left and right on the two tables but Outlook ignores this style.
I know the two grey tables at the bottom are each in their own separate "holder" tables but this is so I can duplicate the grey "data" tables for when I add new articles.
<table class="all" width="auto" height="auto" border="0" cellspacing="0"><tr><td height="504">
<table width="750" height="140" border="0" cellspacing="0">
<tr>
<td width="200" valign="bottom" bgcolor="#E6E6E6"> </td>
<td width="345" align="center" valign="bottom" bgcolor="#E6E6E6"> </td>
<td width="152" align="center" valign="bottom" bgcolor="#E6E6E6"> </td>
<td width="45" align="center" valign="bottom" bgcolor="#E6E6E6"> </td>
</tr>
<tr>
<td width="200" valign="bottom" bgcolor="#E6E6E6"> </td>
<td align="center" valign="bottom" bgcolor="#E6E6E6"><font color="#111111" face="Arial Narrow" size="+2">DECEMBER NEWSLETTER</font></td>
<td width="152" align="center" valign="bottom" bgcolor="#E6E6E6"><font size="2"><strong>#4 - <span class="orange">04.12.13</span></strong></font></td>
<td width="45" align="center" valign="bottom" bgcolor="#E6E6E6"> </td>
</tr>
</table>
<table width="750" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="75" height="50" bgcolor="#E6E6E6" scope="row"> </td>
<td width="600" rowspan="2" scope="row"><img src="http://placehold.it/600x200"/></td>
<td width="75" bgcolor="#E6E6E6" scope="row"> </td>
</tr>
<tr>
<td width="75" height="81" scope="row"> </td>
<td scope="row"> </td>
</tr>
</table>
<table class="holder" width="750" border="0" cellspacing="0" cellpadding="0">
<tr>
<td valign="top" scope="row">
<table class="inlinetableleft" width="360">
<tr>
<td width="371" align="left">
<!------------LEFT COLUMN------------------>
<table width="360" border="0" cellspacing="0" cellpadding="0">
<tr>
<th height="103" colspan="4" align="left" valign="middle" bgcolor="#CCCCCC" scope="row"> </th>
</tr>
</table>
<!--------------LEFT COLUMN END------------->
</td>
</tr>
</table>
<table class="inlinetableright" width="360">
<tr>
<td align="left">
<!------------RIGHT COLUMN------------------>
<table width="360" border="0" cellspacing="0" cellpadding="0">
<tr>
<td height="106" align="left" bgcolor="#CCCCCC" scope="row"> </td>
</tr>
</table>
<!-----------RIGHT COLUMN END-------------->
</td></tr>
</table>
</td>
</tr>
</table>
Here is a fiddle of my newsletter so far. It's the bottom two grey tables that I want to be side by side.
Fiddle
For HTML emails, nested tables are your friend :)
JSFiddle
Note: the border around the table is just to show you where the tables are.
<table border="0" width="600" cellpadding="0" cellspacing="0" align="center">
<tr>
<td colspan="2">
header content here
</td>
</tr>
<tr>
<td width="300">
<table border="0" width="300" cellpadding="1" cellspacing="0" align="left">
<tr>
<td>Left Content</td>
</tr>
</table>
</td>
<td width="300">
<table border="0" width="300" cellpadding="1" cellspacing="0" align="left">
<tr>
<td>Right content</td>
</tr>
</table>
</td>
</tr>
</table>

Need query for XPath that finds all <tr> elements that contain 7 <td> elements

Hello and hopefully thanks for the help.
Honestly I am not very experienced at XPath and I am hoping a guru out there will have a quick answer for me.
I am scraping a web page for data. The defining aspect of the data I want is that it is contained in a row <tr> that has 7 <td> elements. Each <td> element has one of the pieces of data I need to import. I am using the HTML Agility Pack on CodePlex to grab the data, but I can't seem to figure out how to define the query.
Contained in the web page is a section like this:
<table border="0" cellpadding="3" cellspacing="1" width="100%">
<tr class="bgWhite" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<td class="dataHdrText02" valign="top" width="50" align="center"><nobr>SYMBOL</nobr></td>
<td class="dataHdrText02" valign="top" align="center">PERIOD</td>
<td class="dataHdrText02" valign="top" align="center" width="*">EVENT TITLE</td>
<td class="dataHdrText02" valign="top" align="center">EPS ESTIMATE</td>
<td class="dataHdrText02" valign="top" align="center">EPS ACTUAL</td>
<td class="dataHdrText02" valign="top" align="center">PREV. YEAR ACTUAL</td>
<td class="dataHdrText02" valign="top" align="center"><nobr>DATE/TIME (ET)</nobr></td>
</tr>
<tr class="bgWhite">
<td align="center" width="50"><nobr>CSCO </nobr></td>
<td align="center">Q4 2011</td>
<td align="left" width="*">Q4 2011 CISCO Systems Inc Earnings Release</td>
<td align="center">$ 0.38 </td>
<td align="center">n/a </td>
<td align="center">$ 0.43 </td>
<td align="center"><nobr>10-Aug-11</nobr></td>
</tr>
<tr class="bgWhite">
<td align="center" width="50"><nobr>CSCO  </nobr></td>
<td align="center">Q3 2011</td>
<td align="left" width="*">Q3 2011 Cisco Systems Earnings Release</td>
<td align="center">$ 0.37 </td>
<td align="center">$ 0.42 </td>
<td align="center">$ 0.42 </td>
<td align="center"><nobr>11-May-11 AMC</nobr></td>
</tr>
<tr class="bgWhite" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<td align="center" colspan="7"><img src="/format/cb/images/spacer.gif" width="1" height="4"></td>
</tr>
</table>
My goal is to grab the earnings event data and place it into a database for analysis. My original thought was to grab all <tr> elements with 7 <td> elements then work with that data. Any advice or alternative suggestions would be welcome.
This should do it for you.
//tr[count(td)=7]
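
Note that the header row in the sample also has 7 td elements, so it matches too and may need to be skipped. A rough illustration of the same filter in Python with the standard library (not the HTML Agility Pack), counting the direct td children of each row. The variable names and the trimmed, well-formed copy of the table are assumptions for the sketch.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the table: header row, one data row, spacer row.
HTML = """<table border="0" cellpadding="3" cellspacing="1" width="100%">
<tr class="bgWhite"><td><nobr>SYMBOL</nobr></td><td>PERIOD</td><td>EVENT TITLE</td><td>EPS ESTIMATE</td><td>EPS ACTUAL</td><td>PREV. YEAR ACTUAL</td><td><nobr>DATE/TIME (ET)</nobr></td></tr>
<tr class="bgWhite"><td><nobr>CSCO </nobr></td><td>Q4 2011</td><td>Q4 2011 CISCO Systems Inc Earnings Release</td><td>$ 0.38 </td><td>n/a </td><td>$ 0.43 </td><td><nobr>10-Aug-11</nobr></td></tr>
<tr class="bgWhite"><td align="center" colspan="7"><img src="/format/cb/images/spacer.gif" width="1" height="4"/></td></tr>
</table>"""

root = ET.fromstring(HTML)

# Equivalent of //tr[count(td)=7]: keep rows with exactly seven td children.
seven_cell_rows = [tr for tr in root.iter("tr") if len(tr.findall("td")) == 7]

# Flatten each kept row into a list of cell strings with whitespace collapsed.
records = [
    [" ".join("".join(td.itertext()).split()) for td in tr.findall("td")]
    for tr in seven_cell_rows
]

header, data = records[0], records[1:]
print(header)
print(data)
```

The spacer row is dropped because its single td only spans seven columns through colspan; it is still one element.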
