I'm trying to parse a website to extract people's names and countries.
The page sometimes looks like:
<th>Inventors:</th>
<td align="left" width="90%">
<b>Harvey; John Christopher</b> (New York, NY)<b>, Cuddihy; James William</b> (New York, NY)
</td>
I can get the country using
//th[contains(text(), "Inventors:")]/following-sibling::td/b[contains(text(),";")]/following-sibling::text()
[(New York, NY), (New York, NY)]
Sometimes the page looks like this (<b> added around the country name):
<th>Inventors:</th>
<td align="left" width="90%">
<b>Harvey; John Christopher</b> (New York, <b>NY</b>)<b>, Cuddihy; James William</b> (New York, <b>NY</b>)
</td>
I can get the country with:
//th[contains(text(), "Inventors:")]/following-sibling::td/b[contains(text(),";")]/following-sibling::b
[NY, NY]
Now, I want to be able to get the countries in both cases.
I tried with:
//th[contains(text(), "Inventors:")]/following-sibling::td/b[contains(text(),";")]/following-sibling::*[self::text() or self::b]
but then I get only the "b"s...
I've also tried:
//.../following-sibling::text() | //.../following-sibling::b
but I also get only the "b"s...
Any idea why this does not work as expected? Any solution to get both entries?
You can use
string(//th[.="Inventors:"]/following-sibling::td)
So that you'll select
Harvey; John Christopher (New York, NY), Cuddihy; James William (New York, NY)
in both cases. Then use XPath 2.0 string/regex processing functions, or use those facilities in the calling language if only XPath 1.0 is available.
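As a rough illustration of the "do the string processing in the calling language" route, here is a minimal Python sketch using only the standard library. It assumes the row is well-formed enough to parse as XML (real pages usually need a proper HTML parser), and the names PLAIN, BOLD and inventors are made up for this example. Joining itertext() plays the role of string() on the td, so both page variants give the same text before the regex runs.

```python
import re
import xml.etree.ElementTree as ET

# The two page variants from the question, wrapped in a <tr> so they parse as XML.
PLAIN = ('<tr><th>Inventors:</th><td align="left" width="90%">'
         '<b>Harvey; John Christopher</b> (New York, NY)'
         '<b>, Cuddihy; James William</b> (New York, NY)</td></tr>')
BOLD = ('<tr><th>Inventors:</th><td align="left" width="90%">'
        '<b>Harvey; John Christopher</b> (New York, <b>NY</b>)'
        '<b>, Cuddihy; James William</b> (New York, <b>NY</b>)</td></tr>')


def inventors(row_xml):
    """Return (name, place) pairs from the td next to the Inventors header."""
    td = ET.fromstring(row_xml).find("td")
    # Equivalent of XPath string(): concatenate every text node under the td.
    text = "".join(td.itertext())
    # Each entry looks like "Name (Place)"; later names carry a leading ", ".
    pairs = re.findall(r"([^()]+?)\s*\(([^()]*)\)", text)
    return [(name.strip(" ,"), place) for name, place in pairs]


print(inventors(PLAIN))
print(inventors(BOLD))
```

Because the markup is flattened before matching, the extra <b> around the state no longer matters.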
You may also try something like:
//th[contains(text(), "Inventors:")]
/following-sibling::td/b[contains(text(),";")]
/following-sibling::node()[not(self::b[contains(text(),";")])]
This will select all following-sibling nodes but ignore b nodes containing a ";".
I am writing a CFM file to display a dropdown menu, in the dropdown options I want Var1 [Var2, Var3] to display for each line. I tried both concatenation and commas, tried a mix of brackets and quotes and cannot find the right combination. Using Oracle SQL developer. I am a student taking a database management class. I read through Adobe's site and cannot find an example like this. I also combed stack, but everything is PHP and javascript related. The result should have a dropdown, where each selection shows: cus_code with corresponding [cus_fname, cus_lname] for each line. Below is the query that pulls the data needed and the dropdown menu. The place I am having trouble is in the output, where each line from the dropdown should have 3 variables. Every attempt I make causes an error.
<!--Here is my query: grab customer code, last name, first name from customer table, join invoice table on customer code-->
<CFQUERY NAME="INVOICESEARCH" DATASOURCE="ORCL">
SELECT DISTINCT levine04.CUSTOMER6.CUS_CODE, CUS_LNAME, CUS_FNAME
FROM levine04.CUSTOMER6, levine04.INVOICE6
WHERE levine04.CUSTOMER6.CUS_CODE = levine04.INVOICE6.CUS_CODE;
</CFQUERY>
<!--Here is my dropdown:-->
<TR>
<TD ALIGN="right">INV_NUMBER</TD>
<TD>
<INPUT TYPE ="text" NAME="INV_NUMBER" SIZE="10" MAXLENGTH="10">
</TD>
</TR>
<TR>
<TD ALIGN="right">CUS_CODE, [CUS_FNAME, CUS_LNAME]</TD>
<TD>
<SELECT NAME="CUS_CODE" SIZE=1>
<OPTION SELECTED VALUE="ANY">ANY
<CFOUTPUT QUERY="INVOICESEARCH">
<OPTION VALUE="#INVOICESEARCH.CUS_CODE#" + "#INVOICESEARCH.CUS_LNAME#" +"#INVOICESEARCH.CUS_FNAME#"> #CUS_CODE# + #CUSLNAME# + #CUSFNAME#
</CFOUTPUT>
</SELECT>
</TD>
</TR>
Got it! But maybe this will help someone else..
<OPTION VALUE="#INVOICESEARCH.CUS_CODE#"> #CUS_CODE#[#CUS_LNAME#, #CUS_FNAME#]
I'm using Watir-Webdriver and Ruby. On the page I have a table of records. I need to click on any row, and it should go to the next page. How can I click on the row?
Here's the source code for each record from the table
<tr class="ng-scope" ng-click="clickHandler({rowItem: item})" ng-repeat="item in ngModel">
<td class="ng-bindging">Sometext</td>
<td class="ng-bindging">Sometext1</td>
<td class="ng-bindging">Sometext2</td>
<td class="ng-bindging">Sometext3</td>
<td class="ng-bindging">Sometext4</td>
<td>
<span class="glyphicon glyphicon-play-circle"></span>
</td>
Any suggestions would be appreciated.
Thank you.
As you do not care which tr element you click, you could simply click the first tr element in the table:
record_table = browser.table
record_table.tr.click
If the first row is a header row, you might need to click the last row instead:
record_table = browser.table
record_table.trs.last.click
Note that browser.table will locate the first table on the page. If there are more tables on the page, you will want to be more specific - eg browser.table(id: 'some_id').
The key is to find unique attributes that Watir can access. The first issue is that your Angular app doesn't produce valid HTML5: the ng- attributes would need data- prepended.
I don't know what your other rows look like to know how to click this one versus another one, but if that attribute is unique, you could use css:
browser.element(css: "tr[ng-repeat='item in ngModel']")
If your text is unique you could also do this (even though it is less ideal):
browser.td(text: 'Sometext').parent
I have a table that contains rows like this:
<tr class="premium"><td class="name"><div class="name">John Doe</div>Fancy company name<br />Elmstreet 71<br />454378 Ghostown<br />Tel.: 123 4567 891<br /></td></tr>
<tr class="basic"><td class="name"><div class="name">John Smoe</div>Fancy company name<br />Elmstreet 73<br />456378 Ghostown<br />Tel.: 123 4567 891<br /></td></tr>
I need the xpath to select the company name from rows with the class="premium"
Thanks in advance!
The XPath by itself returns a set of text nodes separated by the <br> tags. You can use the string() function to take the first part:
string(//tr[@class="premium"]/td[@class = "name"]/text())
or, as kjhughes has suggested,
//tr[@class="premium"]/td[@class = "name"]/text()[1]
result
String='Fancy company name'
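For anyone doing the same extraction from a script rather than a pure XPath engine, here is a minimal Python sketch with the standard library's xml.etree.ElementTree. It assumes the rows parse as XML (the sample does, thanks to the self-closing <br />). ElementTree supports only a subset of XPath and has no text() step, so the text right after the </div> is read from the div's tail instead. The wrapping <table> and the function name company_names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The two rows from the question, wrapped in a <table> to give a single root.
ROWS = (
    '<table>'
    '<tr class="premium"><td class="name"><div class="name">John Doe</div>'
    'Fancy company name<br />Elmstreet 71<br />454378 Ghostown<br />'
    'Tel.: 123 4567 891<br /></td></tr>'
    '<tr class="basic"><td class="name"><div class="name">John Smoe</div>'
    'Fancy company name<br />Elmstreet 73<br />456378 Ghostown<br />'
    'Tel.: 123 4567 891<br /></td></tr>'
    '</table>'
)


def company_names(table_xml):
    """Return the text that follows the name div in every premium row."""
    root = ET.fromstring(table_xml)
    divs = root.findall(".//tr[@class='premium']/td[@class='name']/div[@class='name']")
    # .tail is the text between </div> and the next element (the first <br />).
    return [(div.tail or "").strip() for div in divs]


print(company_names(ROWS))
```

Only the premium row is matched, so the basic row's company name is left out.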
Is there any way to display data in table format in Shoes?
<table border ="1">
<tr>
<th>Name</th>
<th> Content</th>
</tr>
@products.each do |product|
<tr>
<td> product.name </td>
<td>product.detail</td>
</tr>
end
</table>
Shoes does not have a native table construct (yet). There is an open issue on shoes4 about adding one but don't expect that soon or count on it being implemented.
You can mix flows with fixed width values to achieve a table-like effect. E.g. put some flows with a fixed width next to each other and put these in a stack. I made a table-like construct here
I’m using HtmlAgilityPack to obtain some Html from a web site.
Here is the received Html:
<table class="table">
<tr>
<td>
<table class="innertable">...</table>
</td>
</tr>
<tr>
<td colspan="2"><strong>Contact</strong></td>
</tr>
<tr>
<td colspan="2">John Doe</td>
</tr>
<tr>
<td colspan="2">Jane Doe</td>
</tr>
<tr>
<td colspan="2"> </td>
</tr>
<tr>
<td><strong>Units</strong></td>
<td>32</td>
</tr>
<tr>
<td><strong>Year</strong></td>
<td>1998</td>
</tr>
</table>
The Context:
I’m using the following code to get the first <table>:
var table = document.DocumentNode.SelectNodes("//table[@class='table']").FirstOrDefault();
I’m using the following code to get the inner <table>:
var innerTable = table.SelectNodes("//table[@class='innertable']").FirstOrDefault();
So far so good!
I need to get some information from the first table and some from the inner table.
Since I begin with the information from the first table I need to skip the first row (which holds the inner table) so I do the following:
var tableCells = table.SelectNodes("tr[position() > 1]/td");
Since I now have all the cells from the first table excluding the inner table, I start doing the following:
string contact1 = HttpUtility.HtmlDecode(tableCells[1].InnerHtml);
string contact2 = HttpUtility.HtmlDecode(tableCells[2].InnerHtml);
string units = HttpUtility.HtmlDecode(tableCells[5].InnerHtml);
string years = HttpUtility.HtmlDecode(tableCells[7].InnerHtml);
The problem:
I’m getting the values I want by hardcoding the index in tableCells[] not thinking the layout would move…unfortunately, it does move.
In some cases I do not have a “Jane Doe” row (as shown in the above Html), this means I may or may not have two contacts.
Because of this, I can’t hardcode the indexes since I might end up having the wrong data in the wrong variables.
So I need to change my approach...
Does anyone know how I could perfect my algorithm so that it can take into account the fact that I may have one or two contacts and perhaps not use hardcoded indexes?
Thanks in advance!
vlince
There is never one unique solution to this kind of problem. Here is an XPath that seems to do the job, though:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(yourHtmlFile);
doc.Save(Console.Out);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//tr[td/strong/text() = 'Contact']/following-sibling::tr/td/text()[. != ' ']"))
{
Console.WriteLine(node.OuterHtml);
}
will display this:
John Doe
Jane Doe
32
1998
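If the values need to land in named variables rather than a flat list, one option is to key on the <strong> labels instead of cell positions. Below is a minimal, language-neutral sketch of that idea in Python with the standard library; it is not HtmlAgilityPack code, and it assumes the table parses as XML. The names HTML and parse_table are made up for the example. The same loop should port to C# by walking the tr nodes and checking for a strong child.

```python
import xml.etree.ElementTree as ET

# The table from the question, compacted onto fewer lines.
HTML = """<table class="table">
<tr><td><table class="innertable">...</table></td></tr>
<tr><td colspan="2"><strong>Contact</strong></td></tr>
<tr><td colspan="2">John Doe</td></tr>
<tr><td colspan="2">Jane Doe</td></tr>
<tr><td colspan="2"> </td></tr>
<tr><td><strong>Units</strong></td><td>32</td></tr>
<tr><td><strong>Year</strong></td><td>1998</td></tr>
</table>"""


def parse_table(xml_text):
    """Return (contacts, fields) without relying on fixed cell indexes."""
    table = ET.fromstring(xml_text)
    contacts, fields = [], {}
    in_contacts = False
    # findall("tr") only sees direct children, so inner-table rows are skipped.
    for row in table.findall("tr"):
        cells = row.findall("td")
        label = cells[0].find("strong") if cells else None
        if label is not None and len(cells) == 2:
            # "Label | value" row such as Units or Year.
            fields[label.text.strip()] = (cells[1].text or "").strip()
            in_contacts = False
        elif label is not None:
            # A heading row; contacts follow only the "Contact" heading.
            in_contacts = label.text.strip() == "Contact"
        elif in_contacts:
            name = (cells[0].text or "").strip()
            if name:  # ignore the blank spacer row
                contacts.append(name)
    return contacts, fields


contacts, fields = parse_table(HTML)
print(contacts, fields)
```

With this shape, a missing "Jane Doe" row simply yields a shorter contacts list, and Units and Year are still found by label.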