I'm hoping to convert the content of 3 adjacent TD elements in a TR using Yahoo Pipes to a comma-delimited list of values. Source: Epic Systems Hospitals.
HTML snippet:
...
<table width="623" cellspacing="0" cellpadding="0" border="0">
<colgroup>
<tbody>
<tr height="20">
<td width="425" height="20">Institution 0</td>
<td width="134">Minneapolis</td>
<td width="64">MN</td>
</tr>
<tr height="20">
<td height="20">Institution 1</td>
<td>Philadelphia</td>
<td>PA</td>
</tr>
...
I've used the "XPath Fetch Page" source to correctly isolate the TR elements using the XPath //tr[@height='20'].
I'm having difficulty getting the TD elements, however. It's not obvious to me which component I should be using, so I chose a Sub-element with the 'special variable substitution' syntax. Unfortunately, ${td.0.content} doesn't work.
What am I not understanding?
** edit **
My goal is to create an XML stream that resembles:
<institutions>
<institution name='Institution 0' city='Minneapolis' region='MN'/>
<institution name='Institution 1' city='Philadelphia' region='PA'/>
...
</institutions>
If you always have 3 td cells, you could use a Loop operator with a String Builder inside, and build a string by concatenating item.td.0, item.td.1, item.td.2.
I created an example of this for you here:
http://pipes.yahoo.com/pipes/pipe.info?_id=3d24486f7c6e8413dc6252ef37c2f086
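Outside of Pipes, the same loop-and-concatenate idea can be sketched in a few lines of Python. This is only an illustration, not what the pipe above does internally: it assumes the table is available as well-formed XML, and SAMPLE, rows_to_csv and rows_to_institutions are names made up for the example.

```python
import xml.etree.ElementTree as ET

# Well-formed sample modelled on the snippet in the question (assumption:
# the real page would need cleaning up before an XML parser accepts it).
SAMPLE = """
<table>
  <tbody>
    <tr height="20">
      <td width="425" height="20">Institution 0</td>
      <td width="134">Minneapolis</td>
      <td width="64">MN</td>
    </tr>
    <tr height="20">
      <td height="20">Institution 1</td>
      <td>Philadelphia</td>
      <td>PA</td>
    </tr>
  </tbody>
</table>
"""

def rows_to_csv(markup):
    """Return one comma-delimited string per tr that has height='20'."""
    root = ET.fromstring(markup.strip())
    lines = []
    for tr in root.iterfind(".//tr[@height='20']"):
        cells = [(td.text or "").strip() for td in tr.findall("td")]
        lines.append(",".join(cells))
    return lines

def rows_to_institutions(markup):
    """Build the <institutions> document; assumes exactly three td cells per row."""
    root = ET.fromstring(markup.strip())
    out = ET.Element("institutions")
    for tr in root.iterfind(".//tr[@height='20']"):
        name, city, region = [(td.text or "").strip() for td in tr.findall("td")]
        ET.SubElement(out, "institution", name=name, city=city, region=region)
    return ET.tostring(out, encoding="unicode")

csv_lines = rows_to_csv(SAMPLE)
institutions_xml = rows_to_institutions(SAMPLE)
print("\n".join(csv_lines))
print(institutions_xml)
```

The unpacking into name, city and region fails loudly if a row does not have exactly three cells, which matches the "always 3 td cells" condition above.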
Related
I am using UFT to write an automation script, and I want to count the tr elements under one tbody element and return the result.
<tbody role="rowgroup">
<tr _calss="01">
<tr _calss="02">
<tr _calss="03">
<tr _calss="04">
<tr _calss="05">
</tbody>
How to write the code in UFT?
Since it is unclear whether there can be more than one tbody, you could use this for the first one.
count((//tbody)[1]/tr)
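UFT scripts are written in VBScript, so the following is not UFT code. It is a small Python sketch of what the expression above computes, assuming well-formed markup; SAMPLE and count_rows_in_first_tbody are made-up names, and the second tbody is added only to show that it is ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two tbody elements, five rows in the first one.
SAMPLE = """
<table>
  <tbody role="rowgroup">
    <tr _calss="01"/>
    <tr _calss="02"/>
    <tr _calss="03"/>
    <tr _calss="04"/>
    <tr _calss="05"/>
  </tbody>
  <tbody role="rowgroup">
    <tr/>
    <tr/>
  </tbody>
</table>
"""

def count_rows_in_first_tbody(markup):
    """Mirror count((//tbody)[1]/tr): tr children of the first tbody only."""
    root = ET.fromstring(markup.strip())
    first_tbody = root.find(".//tbody")  # find() returns the first match in document order
    if first_tbody is None:
        return 0
    return len(first_tbody.findall("tr"))  # direct tr children, not nested ones

row_count = count_rows_in_first_tbody(SAMPLE)
print(row_count)
```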
I have an Xpath like following:
"//<path to some table>/*/td[1]/text()"
and it returns text values of all non-empty tds, for example:
<text1>, <text2>, <text3>
The problem is that between the nodes containing those values there may be some empty td elements.
What I want is a result that marks those empty values in some way, for example:
<text1>,<>, <>, <text2>, <text3>, <>
or
<text1>,<null>, <null>, <text2>, <text3>, <null>
I tried the following:
"//<path to some table>/*/string(td[1]/text())"
but it returns undefined
Of course, I could just get the whole node and then work with it in my code (cutting out all the unnecessary info), but maybe there is a better way?
html example for that case:
<html>
<body>
<table class="tablesorter">
<tbody>
<tr class="tr_class">
<td>text1</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td></td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td>text2</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td>text3</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td></td>
<td>{some text}</td>
</tr>
</tbody>
</table>
</body>
</html>
Simply select the td elements rather than their text() child nodes. With the path changed to //<path to some table>/*/td[1] (or perhaps //<path to some table>/*/td) you get a node-set of td elements, whether they are empty or not. You can then read the string content of each node, either with XPath (evaluate string(.) against each element node) or with a host-environment property such as textContent in the W3C DOM or text in the MSXML DOM. That way the empty strings are included.
If you use XPath 2.0 or XQuery, you can select //<path to some table>/*/td/string(.) directly to get a sequence of string values. A function call in the last step is not supported in XPath 1.0, however; there you select the td element nodes and then read the string value of each in a separate step.
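As a rough illustration of the "select the elements, then read each string value" approach, here is a Python sketch using the standard library's ElementTree, which supports only a small XPath subset. It assumes the table is well-formed; SAMPLE and first_cell_strings are names invented for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the table from the question.
SAMPLE = """
<table class="tablesorter">
  <tbody>
    <tr class="tr_class"><td>text1</td><td>{some text}</td></tr>
    <tr class="tr_class"><td></td><td>{some text}</td></tr>
    <tr class="tr_class"><td>text2</td><td>{some text}</td></tr>
    <tr class="tr_class"><td>text3</td><td>{some text}</td></tr>
    <tr class="tr_class"><td></td><td>{some text}</td></tr>
  </tbody>
</table>
"""

def first_cell_strings(markup):
    """Return the string value of the first td in every row, empty cells included."""
    root = ET.fromstring(markup.strip())
    # Step 1: select the td elements themselves, not their text nodes.
    cells = root.findall(".//tbody/tr/td[1]")
    # Step 2: read each element's string value; an empty td gives "".
    return ["".join(td.itertext()) for td in cells]

values = first_cell_strings(SAMPLE)
print(values)
```

Because the elements are selected first, every row contributes exactly one entry, so the positions of the empty cells are preserved.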
Do you mean you want only the td[1] with text and get rid of ones without text? If so, you can use this xpath
//td[1][string-length(text()) > 0]
I have a table that looks like this
<table cellpadding="1" cellspacing="0" width="100%" border="0">
<tr>
<td colspan="9" class="csoGreen"><b class="white">Bill Statement Detail</b></td>
</tr>
<tr style="background-color: #D8E4F6;vertical-align: top;">
<td nowrap="nowrap"><b>Bill Date</b></td>
<td nowrap="nowrap"><b>Bill Amount</b></td>
<td nowrap="nowrap"><b>Bill Due Date</b></td>
<td nowrap="nowrap"><b>Bill (PDF)</b></td>
</tr>
</table>
I am trying to create the XPath to find this table based on it containing the text Bill Statement Detail. I want the entire table and not just the td.
Here is what I have tried so far:
page.parser.xpath('//table[contains(text(),"Bill")]')
page.parser.xpath('//table/tbody/tr[contains(text(),"Bill Statement Detail")]')
Any Help is appreciated
Thanks!
Your first XPath example is the closest in that you're selecting table. The second example, if it ever matched, would select tr—this one will not work mainly because, according to your example, the text you want is in a b node, not a tr node.
This solution is as vague as I could make it, because of *. If the target text will always be under b, change it to descendant::b:
//table[contains(descendant::*, 'Bill Statement Detail')]
This is as specific, given the example, as I can make:
//table[tr[1]/td/b = 'Bill Statement Detail']
You might want
//table[contains(descendant::text(),"Bill Statement Detail")]
The suggested expressions do not work if the matching text is not in the first row. See the related post "Find a table containing specific text".
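In XPath 1.0, one way to match the text in any row is to test each text node separately, for example //table[.//text()[contains(., 'Bill Statement Detail')]]. The same idea is sketched below in Python with the standard library, assuming well-formed markup. The id attributes, SAMPLE and tables_containing are hypothetical and exist only to make the example checkable.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the phrase sits in the second row of the second table.
SAMPLE = """
<body>
  <table id="other">
    <tr><td><b>Account Summary</b></td></tr>
  </table>
  <table id="bills">
    <tr><td nowrap="nowrap"><b>Bill Date</b></td></tr>
    <tr><td colspan="9" class="csoGreen"><b class="white">Bill Statement Detail</b></td></tr>
  </table>
</body>
"""

def tables_containing(markup, phrase):
    """Return every table element with a descendant text node containing phrase."""
    root = ET.fromstring(markup.strip())
    return [
        table
        for table in root.iter("table")
        # Test each text node on its own, regardless of which row it is in.
        if any(phrase in text for text in table.itertext())
    ]

matches = tables_containing(SAMPLE, "Bill Statement Detail")
print([table.get("id") for table in matches])
```

Note that with nested tables an outer table would match as well, since the phrase is also among its descendants.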
I want to get the values of every table and the href value for every a element within the table given below.
Being new to XPath, I am finding it difficult to write XPath expressions.
However understanding what an xpath expression does lies somewhat in an easier category.
the expected output
http://a.com/ data for a 526735 Z
http://b.com/ data for b 522273 Z
http://c.com/ data for c 513335 Z
<table class = dataTabe>
<tbody>
<tr>
<td>data for a</td>
<td class="numericalColumn">526735</td>
<td class="numericalColumn">Z</td></tr>
<tr>
<td>data for b</td>
<td class="numericalColumn">522273</td>
<td class="numericalColumn">B</td></tr>
<tr>
<td>data for c</td>
<td class="numericalColumn">513335</td>
<td class="numericalColumn">B</td></tr>
</tbody>
</table>
You'll need two things: an XPath query which locates the wanted nodes and a second which outputs the text as you want it. Since you don't give more information about the languages you're using I'm putting together some pseudocode:
foreach node in document.select("//table[@class='dataTable']//tr[td/a/@HREF]")
write node.select("concat(td/a/@HREF,' ',.)")
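The pseudocode above could look roughly like this in Python. The sample in the question shows no links, so this sketch assumes that the first cell of each row wraps its text in an a element with a lowercase href attribute; that markup, SAMPLE and rows_with_links are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the a elements are NOT in the question's sample.
SAMPLE = """
<table class="dataTable">
  <tbody>
    <tr><td><a href="http://a.com/">data for a</a></td>
        <td class="numericalColumn">526735</td>
        <td class="numericalColumn">Z</td></tr>
    <tr><td><a href="http://b.com/">data for b</a></td>
        <td class="numericalColumn">522273</td>
        <td class="numericalColumn">B</td></tr>
    <tr><td><a href="http://c.com/">data for c</a></td>
        <td class="numericalColumn">513335</td>
        <td class="numericalColumn">B</td></tr>
  </tbody>
</table>
"""

def rows_with_links(markup):
    """Return 'href cell1 cell2 ...' for every row that contains a link."""
    root = ET.fromstring(markup.strip())
    lines = []
    for tr in root.iterfind(".//tr"):
        link = tr.find("td/a[@href]")  # first query: locate the link
        if link is None:
            continue  # skip rows without a link
        # Second step: build the output text from the row's cells.
        cells = ["".join(td.itertext()).strip() for td in tr.findall("td")]
        lines.append(" ".join([link.get("href")] + cells))
    return lines

link_lines = rows_with_links(SAMPLE)
print("\n".join(link_lines))
```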
This site has a great free tool for building XPath Expressions (XPath Builder):
http://www.bubasoft.net/
Use this XPath: //tr/td/a/@HREF | //tr//text()
I have an HTML file in which I need to take any td tag and put an align='left' into it.
So given the line :
<td><img alt="" src="oooh.html_files/px" style="width: 20px; height: 1px;"/></td>
I need it to do :
<td align='left'><img alt="" src="oooh.html_files/px" style="width: 20px; height: 1px;"/></td>
If it already specifies an alignment I need it to just leave this. So given the line :
<tr><td width="50%"> </td><td align="center">
I need it to do :
<tr><td width="50%" align='left'> </td><td align="center">
Note it puts an align into the first td, but ignores the second one because that already specifies an alignment.
Is it possible to do this in Ruby with regular expressions?
I know it's not really worth using regular expressions with HTML, but basically I'm just after a quick hack to get around a bug in another library. Hopefully this bug will be fixed soon and I won't need to worry about it! :)
#!/usr/bin/env ruby
require 'nokogiri'
doc = Nokogiri::XML('<tr><td width="50%"> </td><td align="center"></tr>')
(doc / '//td[not(@align)]').each {|td| td['align'] = 'left' }
puts doc
# <?xml version="1.0"?>
# <tr>
# <td width="50%" align="left"/>
# <td align="center"/>
# </tr>
Look, ma! No Regexp!
It's literally a one-liner if you don't bother with Regexp.
Frequently Given Answer: regular expressions are not able to parse HTML; use an HTML parsing library of which there are plenty.
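For comparison, the same parser-based approach can be sketched with Python's standard library. This assumes the fragment is well-formed XML, which real-world HTML often is not (an HTML-aware parser would be needed there); add_default_align is a made-up helper name.

```python
import xml.etree.ElementTree as ET

def add_default_align(markup, value="left"):
    """Set align=value on every td that does not already have an align attribute."""
    root = ET.fromstring(markup)
    for td in root.iter("td"):
        if td.get("align") is None:  # leave existing alignments untouched
            td.set("align", value)
    return ET.tostring(root, encoding="unicode")

# The second td is closed here so that the fragment is well-formed.
aligned = add_default_align('<tr><td width="50%"> </td><td align="center"></td></tr>')
print(aligned)
```

As in the Nokogiri version, the work is done by selecting the td elements without an align attribute, so no pattern matching on the raw markup is involved.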