How to get rows from table with specific header using Xpath - xpath

I need to get all rows in an HTML table:
<table>
<thead>
<tr>
<th>Name</th>
<th>Location</th>
</tr>
</thead>
<tbody>
<tr>
<td>Dunkin Donuts</td><td>2 York Ave</td>
</tr>
</tbody>
</table>
Since there are many tables in the page I want to get the rows from this specific table.
Here is my Xpath:
table[tr/th/text()="Location"]//tr
I also tried:
table[tr/th[2]/text()="Location"]//tr
No elements are returned. Ideas on how I might get this to work?

Maybe your context node has no table children. You can fix this by globally selecting all table elements with //table. You also did not take the thead and tbody elements into account. Doing so results in the following XPath expression:
//table[thead/tr/th/text()="Location"]/tbody/tr
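To see the same idea run end to end, here is a minimal sketch using only Python's standard library. It assumes the page is well-formed XML. ElementTree supports only a small XPath subset (no path expressions inside predicates), so the header test is done in Python rather than in one expression. The sample page and the helper name rows_for_header are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page with two tables; only the second has a "Location" header.
PAGE = """
<html><body>
<table>
  <thead><tr><th>Name</th><th>Phone</th></tr></thead>
  <tbody><tr><td>Someone</td><td>555-0100</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Name</th><th>Location</th></tr></thead>
  <tbody><tr><td>Dunkin Donuts</td><td>2 York Ave</td></tr></tbody>
</table>
</body></html>
"""


def rows_for_header(root, header_text):
    """Return the tbody rows of every table that has a th equal to header_text."""
    rows = []
    for table in root.iter("table"):
        # Same path the XPath predicate walks: thead/tr/th
        headers = [th.text for th in table.findall("thead/tr/th")]
        if header_text in headers:
            rows.extend(table.findall("tbody/tr"))
    return rows


root = ET.fromstring(PAGE)
rows = rows_for_header(root, "Location")
cells = [[td.text for td in row.findall("td")] for row in rows]
print(cells)
```

With a full XPath 1.0 engine the expression from the answer can be used directly, and the loop is not needed.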

Related

How to select all the rows from a table with specific text in its header with xpath

I'm new to XPath and I'm trying to get all the rows from a specific table on a Wikipedia article that has many tables. Luckily, the table I want has the text "Posición" in one of the th elements inside its header. How can I achieve this?
I am using C# to achieve this, any help and tips will be greatly appreciated :)
<table>
<thead>
<tr>
<th>Something</th>
<th>Posición</th>
<th>Something</th>
</tr>
</thead>
<tbody>
<tr>
<td>info1</td>
<td>info2</td>
<td>info3</td>
</tr>
... more trs
</tbody>
</table>

How to turn a table into a single block of text with scrapy

I am trying to scrape a table which looks like the below.
<table class="table">
<caption>Caption</caption>
<tbody>
<tr>
<th scope="row">Title</th>
<td>Detail</td>
</tr>
<tr>
<th scope="row">Title 2</th>
<td>Detail 2</td>
</tr>
</tbody>
</table>
How would you set up scrapy so my output file generates an output similar to the below?!
Title: Detail
Title2: Detail2
Currently I can get all the text using two css selectors (one for the td's and one for the th's) but I would love to be able to combine these!
Unfortunately the number of rows differs from page to page..
Using xpath:
tabledata = {}
for row in response.xpath("//table[@class='table']//tr"):
    # Map each row's header cell (th) to its data cell (td)
    tabledata[row.xpath("th/text()").extract_first()] = row.xpath("td/text()").extract_first()
Output
{"Title":"Detail", "Title 2":"Detail 2"}
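The question asked for lines of text rather than a dict, so here is a rough sketch of that last step. It uses the standard library's ElementTree instead of scrapy so it runs on its own, and it assumes the table markup is well-formed. The function name table_to_text is made up for this example.

```python
import xml.etree.ElementTree as ET

# The table from the question, written as well-formed markup.
TABLE = """
<table class="table">
  <caption>Caption</caption>
  <tbody>
    <tr><th scope="row">Title</th><td>Detail</td></tr>
    <tr><th scope="row">Title 2</th><td>Detail 2</td></tr>
  </tbody>
</table>
"""


def table_to_text(table):
    """Join each row's th and td into a 'header: value' line."""
    lines = []
    for row in table.iter("tr"):
        th = row.find("th")
        td = row.find("td")
        if th is None or td is None:
            continue  # skip rows that lack either cell
        key = "".join(th.itertext()).strip()
        value = "".join(td.itertext()).strip()
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


text = table_to_text(ET.fromstring(TABLE))
print(text)
```

Because it loops over whatever rows exist, the number of rows can differ from page to page.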

Scraping page with correct xpath using Mechanize and nokogiri

I am trying to access data contained in a table that is itself contained in a table with class ='L1'.
So basically my html structure is like this:
<table class="L1">
<table>
<tr></tr>
<tr>
<td></td>
<td>data</td>
</tr>
<tr>
<td></td>
<td>data</td>
</tr>
...ect...ect
</table>
</table>
I need to get the data contained in all the <a> </a> elements that are in the second <td> of each <tr> </tr>, but only starting with the second <tr> of the table.
So far I came up with that:
html_body = Nokogiri::HTML(body)
links = html_body.css('.L1').xpath("//table/tbody/tr/td[2]/a[1]")
But it seems to me that this doesn't express the fact that I want to start only from the second <tr> (second <tr> included).
What would be the right code to do this?
You can use position() to select the later elements that you want.
html_body = Nokogiri::HTML(body)
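links = html_body.css('.L1').xpath(".//table/tbody/tr[position()>1]/td[2]/a[1]")
A leading // searches from the document root regardless of the .L1 context, so the expression starts with .// to stay inside the matched table.
As the comments on that SO answer say, remember that XPath counts from 1, so position()>1 skips only the first tr.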
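The counting is easy to get wrong, so here is a small sketch of it in Python with the standard library's ElementTree. ElementTree only supports plain index predicates such as td[2], so the position()>1 part is done with a list slice instead. The markup is a well-formed stand-in for the question's structure, and the href values are invented.

```python
import xml.etree.ElementTree as ET

# Stand-in for the nested table from the question; hrefs are hypothetical.
HTML = """
<table class="L1">
  <table>
    <tr></tr>
    <tr><td></td><td><a href="/one">first</a></td></tr>
    <tr><td></td><td><a href="/two">second</a></td></tr>
  </table>
</table>
"""

outer = ET.fromstring(HTML)
inner = outer.find("table")  # the table nested inside .L1
rows = inner.findall("tr")

# XPath positions start at 1, Python indexes at 0:
# tr[position()>1] corresponds to rows[1:], which drops only the first tr.
later_rows = rows[1:]

# td[2]/a[1] is the first link in the second cell of each remaining row.
links = [a for row in later_rows for a in row.findall("td[2]/a[1]")]
hrefs = [a.get("href") for a in links]
print(hrefs)
```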

Ruby - nokogiri - parse only specific html table

I have an HTML doc to parse and read a bunch of stuff from. The problem is that the HTML has multiple tables in it, and I am only interested in one table. Plus, I want to read only the lines that have some useful content. Here is a sample HTML page: there are two tables with no ID, and I want only the second table and only the lines that are useful to humans.
<HTML>
<BODY>
<TABLE>
<TR>
<TD> I don't want this table </TD></TR>
<TR>
<TD></TD>
<TD> No No No <br></TD>
</TR>
....
</TABLE>
<TABLE>
<TR>
<TD>04/13/2012 22:51 I want this table </TD></TR>
<TR>
<TD></TD>
<TD> First - something there <br></TD>
</TR>
<TR>
<TD>04/13/2012 23:23 Update from xyz</TD></TR>
<TR>
<TD></TD>
<TD>Second - something here <br></TD>
</TR>
</TABLE>
</BODY>
</HTML>
I am trying this code, which is obviously not working. The output is not the text I want: it includes both tables, and I only want the second one. Help!
require 'curb'
require 'nokogiri'
c = Curl::Easy.perform("http://server/cgi-bin/page.cgi?id=123456")
html_doc = Nokogiri::HTML(c.body_str.to_s)
puts html_doc.xpath("//table/tr/td")
Have you tried the XPath //table[2]/tr/td to get the second table? If you can change the source of the HTML, the best solution would be to provide id attributes for your tables.
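For comparison, here is a hedged sketch of that selection in Python using only the standard library. It assumes the page has been cleaned up into well-formed markup (the <br> tags are closed here), and it filters out the empty cells to keep only the lines that are useful to humans.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed version of the sample page from the question.
PAGE = """
<html><body>
<table>
  <tr><td>I don't want this table</td></tr>
  <tr><td></td><td>No No No<br/></td></tr>
</table>
<table>
  <tr><td>04/13/2012 22:51 I want this table</td></tr>
  <tr><td></td><td>First - something there<br/></td></tr>
</table>
</body></html>
"""

root = ET.fromstring(PAGE)

# table[2] matches a table that is the second table child of its parent.
cells = root.findall(".//table[2]/tr/td")

# Collect each cell's text (including text around child tags) and drop blanks.
texts = ["".join(td.itertext()).strip() for td in cells]
useful = [t for t in texts if t]
print(useful)
```

Note that //table[2] means "a table that is the second table among its siblings", which works here because both tables sit directly under body. In full XPath 1.0, (//table)[2] picks the second table in the whole document.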

html table max row and ajax navigation

I have a PHP page that returns an HTML table like this:
<table>
<tr>
<td>First Row data</td><td>Second Row data</td><td>Third Row data</td>
</tr>
<tr>
<td>First Row data</td><td>Second Row data</td><td>Third Row data</td>
</tr>
<tr>
<td>First Row data</td><td>Second Row data</td><td>Third Row data</td>
</tr>
<tr>
<td>etc...</td>
</tr>
</table>
What I want to do is add an ajax numerical pagination system (1 2 ... 6) that lets me display at most 3 rows at a time and reach the others through the navigation.
Do you know where I can find a ready-made script that can help solve this problem?
Is this about what you're looking for?
http://www.dynamicdrive.com/dynamicindex17/ajaxpaginate/index.htm
