I need to get all rows in an HTML table:
<table>
<thead>
<tr>
<th>Name</th>
<th>Location</th>
</tr>
</thead>
<tbody>
<tr>
<td>Dunkin Donuts</td><td>2 York Ave</td>
</tr>
</tbody>
</table>
Since there are many tables in the page I want to get the rows from this specific table.
Here is my Xpath:
table[tr/th/text()="Location"]//tr
I also tried:
table[tr/th[2]/text()="Location"]//tr
No elements are returned. Ideas on how I might get this to work?
Maybe your context node has no table children. You can fix this by globally selecting all table elements with //table. You also did not take the thead and tbody elements into account. Doing so results in the following XPath expression:
//table[thead/tr/th/text()="Location"]/tbody/tr
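A minimal sketch of the same selection in Python, assuming the page is well-formed markup. The standard library's xml.etree.ElementTree only implements a subset of XPath and, as far as I know, does not accept a nested path such as thead/tr/th inside a predicate, so the header test is written in plain Python here; with a full XPath engine the expression above can be used directly. The sample page and the function name rows_of_table_with_header are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: two tables, only the second has a "Location" header.
PAGE = """
<html><body>
<table>
  <thead><tr><th>Name</th><th>Price</th></tr></thead>
  <tbody><tr><td>Coffee</td><td>2.00</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Name</th><th>Location</th></tr></thead>
  <tbody><tr><td>Dunkin Donuts</td><td>2 York Ave</td></tr></tbody>
</table>
</body></html>
"""


def rows_of_table_with_header(root, header_text):
    """Return the tbody rows (as lists of cell text) of every table
    whose thead contains a th with exactly the given text."""
    rows = []
    for table in root.iter("table"):
        # Same idea as the predicate [thead/tr/th/text()="Location"].
        headers = [th.text for th in table.findall("thead/tr/th")]
        if header_text in headers:
            # Same idea as the trailing /tbody/tr step.
            for tr in table.findall("tbody/tr"):
                rows.append([td.text for td in tr.findall("td")])
    return rows


root = ET.fromstring(PAGE.strip())
location_rows = rows_of_table_with_header(root, "Location")
print(location_rows)
```

Only the table with the matching header contributes rows, and the header row itself is left out because it lives in thead rather than tbody.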
Related
I'm new to XPath and I'm trying to get all the rows from a specific table in a Wikipedia article that has many tables. Luckily, the table I want has the text "Posición" in one of the th elements inside its header. How can I achieve this?
I am using C# to achieve this, any help and tips will be greatly appreciated :)
<table>
<thead>
<tr>
<th>Something</th>
<th>Posición</th>
<th>Something</th>
</tr>
</thead>
<tbody>
<tr>
<td>info1</td>
<td>info2</td>
<td>info3</td>
</tr>
... more trs
</tbody>
</table>
I am trying to scrape a table which looks like the below.
<table class="table">
<caption>Caption</caption>
<tbody>
<tr>
<th scope="row">Title</th>
<td>Detail</td>
</tr>
<tr>
<th scope="row">Title 2</th>
<td>Detail 2</td>
</tr>
</tbody>
</table>
How would you set up scrapy so my output file generates an output similar to the below?!
Title: Detail
Title2: Detail2
Currently I can get all the text using two css selectors (one for the td's and one for the th's) but I would love to be able to combine these!
Unfortunately the number of rows differs from page to page..
Using xpath:
tabledata = {}
# @class (not #class) is the XPath syntax for an attribute test
for i in response.xpath("//table[@class='table']//tr"):
    tabledata[i.xpath("th/text()").extract_first()] = i.xpath("td/text()").extract_first()
Output
{"Title":"Detail", "Title 2":"Detail 2"}
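To try the row-by-row pairing outside a spider, here is a rough standalone sketch using only the standard library, assuming the table is well-formed markup. It is not Scrapy code: root.findall stands in for response.xpath, and the final loop that prints "Title: Detail" lines is just one possible way to produce the requested output.

```python
import xml.etree.ElementTree as ET

# The table from the question, wrapped in a minimal page.
PAGE = """<html><body>
<table class="table">
  <caption>Caption</caption>
  <tbody>
    <tr><th scope="row">Title</th><td>Detail</td></tr>
    <tr><th scope="row">Title 2</th><td>Detail 2</td></tr>
  </tbody>
</table>
</body></html>"""

root = ET.fromstring(PAGE)

# One dictionary entry per row: th text is the key, td text is the value.
table_data = {}
for row in root.findall(".//table[@class='table']//tr"):
    table_data[row.findtext("th")] = row.findtext("td")

# Format each pair as "Title: Detail".
for title, detail in table_data.items():
    print(f"{title}: {detail}")
```

Because the loop works on one tr at a time, it does not matter how many rows a given page has.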
I am trying to access data contained in a table that is itself contained in a table with class ='L1'.
So basically my html structure is like this:
<table class="L1">
<table>
<tr></tr>
<tr>
<td></td>
<td>data</td>
</tr>
<tr>
<td></td>
<td>data</td>
</tr>
...ect...ect
</table>
</table>
I need to get the data contained in all the <a> </a> elements that are in the second <td> of each <tr> </tr>, but only starting with the second <tr> of the table.
So far I came up with that:
html_body = Nokogiri::HTML(body)
links = html_body.css('.L1').xpath("//table/tbody/tr/td[2]/a[1]")
But it seems to me that this doesn't express the fact that I want to start only from the second <tr> (second <tr> included).
What would be the right code to do this?
You can use position() to select the later elements that you want.
html_body = Nokogiri::HTML(body)
links = html_body.css('.L1').xpath("//table/tbody/tr[position()>1]/td[2]/a[1]")
As the comments on that SO answer say, remember XPath counts from 1, so >1 skips the first tr.
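The off-by-one point can be illustrated with a small Python sketch. It assumes well-formed markup and uses the standard library's xml.etree.ElementTree, which accepts numeric indexes such as td[2] but, as far as I know, not comparisons like position()>1, so a list slice does the skipping instead. The sample rows and link targets are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the first row should be skipped.
PAGE = """<html><body>
<table class="L1">
  <tr><td></td><td><a href="/skip">skip me</a></td></tr>
  <tr><td></td><td><a href="/one">data 1</a></td></tr>
  <tr><td></td><td><a href="/two">data 2</a></td></tr>
</table>
</body></html>"""

root = ET.fromstring(PAGE)
rows = root.findall(".//table[@class='L1']/tr")

# rows[1:] plays the role of tr[position()>1]:
# XPath positions start at 1, Python list indexes at 0.
link_texts = []
for tr in rows[1:]:
    # td[2] is the second td child; a[1] is the first a inside it.
    link = tr.find("td[2]/a[1]")
    if link is not None:
        link_texts.append(link.text)

print(link_texts)
```

The first row's link is dropped and the second row's link is kept, which matches "starting with the second tr, second included".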
I have an HTML doc to parse and read a bunch of stuff from. The problem is that the HTML has multiple tables in it, and I am only interested in one of them. Plus, I want to read only the lines that have some useful content. Here is a sample HTML page: there are two tables with no ID, and I want only the second table and only the lines that are useful to humans.
<HTML>
<BODY>
<TABLE>
<TR>
<TD> I don't want this table </TD></TR>
<TR>
<TD></TD>
<TD> No No No <br></TD>
</TR>
....
</TABLE>
<TABLE>
<TR>
<TD>04/13/2012 22:51 I want this table </TD></TR>
<TR>
<TD></TD>
<TD> First - something there <br></TD>
</TR>
<TR>
<TD>04/13/2012 23:23 Update from xyz</TD></TR>
<TR>
<TD></TD>
<TD>Second - something here <br></TD>
</TR>
</TABLE>
</BODY>
</HTML>
I am trying this code, which is obviously not working. The output is not the text I want: it includes both tables, and I only want the second one. Help!
require 'curb'
require 'nokogiri'
c = Curl::Easy.perform("http://server/cgi-bin/page.cgi?id=123456")
html_doc = Nokogiri::HTML(c.body_str.to_s)
puts html_doc.xpath("//table/tr/td")
Have you tried the XPath //table[2]/tr/td to get the second table? If you can change the source of the HTML, the best solution would be to give your tables id attributes.
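A rough sketch of that idea in Python, including the "only useful lines" filter from the question. Note that //table[2] means "a table that is the second table child of its parent", which works here because both tables sit side by side under BODY. The sketch assumes the page has been made well-formed (the br tags are self-closed) and uses the standard library's xml.etree.ElementTree, which is case-sensitive, so the tag names are upper case as in the sample. Treating "useful" as "non-empty after stripping whitespace" is my assumption.

```python
import xml.etree.ElementTree as ET

# The sample page from the question, with <br> self-closed so it parses as XML.
PAGE = """<HTML><BODY>
<TABLE>
  <TR><TD> I don't want this table </TD></TR>
  <TR><TD></TD><TD> No No No <br/></TD></TR>
</TABLE>
<TABLE>
  <TR><TD>04/13/2012 22:51 I want this table </TD></TR>
  <TR><TD></TD><TD> First - something there <br/></TD></TR>
  <TR><TD>04/13/2012 23:23 Update from xyz</TD></TR>
  <TR><TD></TD><TD>Second - something here <br/></TD></TR>
</TABLE>
</BODY></HTML>"""

root = ET.fromstring(PAGE)

# findall returns matches in document order, so index 1 is the second table.
second_table = root.findall(".//TABLE")[1]

useful_lines = []
for td in second_table.iter("TD"):
    # Join all text inside the cell and trim surrounding whitespace.
    text = "".join(td.itertext()).strip()
    if text:  # skip cells that are empty
        useful_lines.append(text)

for line in useful_lines:
    print(line)
```

Empty cells drop out, and nothing from the first table is visited at all.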
I have a PHP page that returns an HTML table like this:
<table>
<tr>
<td>First Row data</td><td>Second Row data</td><td>Third Row data</td>
</tr>
<tr>
<td>First Row data</td><td>Second Row data</td><td>Third Row data</td>
</tr>
<tr>
<td>First Row data</td><td>Second Row data</td><td>Third Row data</td>
</tr>
<tr>
<td>etc...</td>
</tr>
</table>
What I want to do is add an Ajax numerical pagination system (1 2 ... 6) that lets me show a maximum of 3 rows at a time and reach the others through the navigation.
Do you know where I can find a ready-made script that can help solve this problem?
Is this about what you're looking for?
http://www.dynamicdrive.com/dynamicindex17/ajaxpaginate/index.htm