Html Agility Pack loop through Specific Row - xpath

I have a table like this
<table>
<thead>
<tr>
<th>Name</th>
<th>Department</th>
<th>Gender</th>
</tr>
</thead>
<tbody>
<tr id="data1">
</tr>
<tr>
</tr>
</tbody>
And I want to use Html Agility Pack to parse its specific row i.e i want to display row next to row which has id=data1
below is code I am trying ...
//Selecting Document Node....
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(data);
//Selecting Specific Node...
var tableNodes = doc.DocumentNode.SelectNodes("//table");

Your xpath should be like this:
//table/tbody/tr[#id='data1']/following-sibling::tr

Related

How to turn a table into a single block of text with scrapy

I am trying to scrape a table which looks like the below.
<table class="table">
<caption>Caption</caption>
<tbody>
<tr>
<th scope="row">Title</th>
<td>Detail</td>
</tr>
<tr>
<th scope="row">Title 2</th>
<td>Detail 2</td>
</tr>
</tbody>
</table>
How would you set up scrapy so my output file generates an output similar to the below?!
Title: Detail
Title2: Detail2
Currently I can get all the text using two css selectors (one for the td's and one for the th's) but I would love to be able to combine these!
Unfortunately the number of rows differs from page to page..
Using xpath:
tabledata={}
for i in response.xpath("//table[#class='table']//tr")
tabledata[i.xpath("th/text()").extract_first()] = i.xpath("td/text()").extract_first()
Output
{"Title":"Detail", "Title 2":"Detail 2"}

PDF Layout with MigraDoc

I am trying to achieve following matrix kind of layout:
TABLE1,1 TABLE1,2
CHART2,1 TABLE2,2
TABLE3 --> occupies whole row
CHART4 --> ocupies whole row
CHART5,1 CHART5,2
................. List goes on...
These components may span over multiple pages. What is the best way to have them side by side and still be able to view them in MigraDoc.
CHART5,1 could be a combination of 4 charts in one cell.
In HTML view I can use following analogy:
<TABLE>
<TR>
<TD>TABLE1,1</TD> <TD>TABLE1,2 </TD>
</TR>
<TR>
<TD>CHART2,1</TD> <TD>TABLE2,2 </TD>
</TR>
<TR>
<TD>TABLE3</TD colspan =2>
</TR>
<TR>
<TD>CHART4</TD colspan =2>
</TR>
<TR>
<TD>CHART5,1</TD> <TD>CHART5,2 </TD>
</TR>
</TABLE>
The MigraDoc equivalent for colspan=2 is MergeRight=1. This is a property of the Cell class.

Scraping page with correct xpath using Mechanize and nokogiri

I am trying to access data contained in a table that is itself contained in a table with class ='L1'.
So basically my html structure is like this:
<table class="L1">
<table>
<tr></tr>
<tr>
<td></td>
<td>data</td>
</tr>
<tr>
<td></td>
<td>data</td>
</tr>
...ect...ect
</table>
</table>
I need to catch the data contained in a all <a> </a> that are in the second contained in <tr> </tr> but only starting with the second <tr> of the table.
So far I came up with that:
html_body = Nokogiri::HTML(body)
links = html_body.css('.L1').xpath("//table/tbody/tr/td[2]/a[1]")
But seems to me that this doesn't express the fact that I want to start only after the second <tr> (second <tr> included?
What would be the right code to do this ?
You can use position() to select the later elements that you want.
html_body = Nokogiri::HTML(body)
links = html_body.css('.L1').xpath("//table/tbody/tr[position()>1]/td[2]/a[1]")
As the comments on that SO answer say, remember XPath counts from 1, so >1 skips the first tr.

Dynamically load a table using javascript

I want to load the table dynamically as soon as the page loads.i have tried the following code , but it is not showing 2nd column values found out in javascript.plz help me out.
<html>
<head>
<script type="text\javascript">
var brw=navigator.appCodeName;
var bw=navigator.appName;
var vrs=navigator.appVersion;
var plt=navigator.platform;
document.getElementById('r1').innerHTML = ' '+brw;
document.getElementById('r2').innerHTML = ' '+bw;
document.getElementById('r3').innerHTML = ' '+vrs;
document.getElementById('r4').innerHTML = ' '+plt;
</script>
</head>
<body>
<table border='1'>
<tr><td>Browser code </td><td><div id='r1'></div></td></tr>
<tr><td>Browser </td><td><div id='r2'></div></td></tr>
<tr><td>Browser Version </td><td><div id='r3'></div></td></tr>
<tr><td>Platform </td><td><div id='r4'></div></td></tr>
</table>
</body>
</html>
Your table should be like :
<table>
<thead>
<tr>
<th>Browser code</th>
<th>Browser</th>
<th>Browser version</th>
<th>Platform</th>
</tr>
</thead>
<tbody>
<tr>
<td><div id='r1'></div></td>
<td><div id='r2'></div></td>
<td><div id='r3'></div></td>
<td><div id='r4'></div></td>
</tr>
</tbody>
</table>
This way your table heads are set at top in <thead> balise, then you can add rows in <tbody> balise.
Just put your Javascript code at the end of your page.

Cucumber + Selenium - How to count the number of rows in a table?

Does anyone know a quick way to count the number of entries in a table using Ruby, Cucumber & Selenium?
The table is fairly basic, I want to count the number of rows:
<table id="product_container">
<tr>
<th>Product Name</th>
<th>Qty In Stock</th>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
</table>
You can use:
page.should have_css "#product_container tr", :count => number_of_rows.to_i
The following step definition should work with Capybara.
Then /^I should have (\d+) table rows$/ do |number_of_rows|
actual_number = page.all('#product_container tr').size
actual_order.should == number_of_rows
end
Usage:
Then I should have 10 table rows
The page.all documentation.
I always use getXpathCount() (Selenium method) in such situation and it works fine :)
In PHP:
$rowsCount = $this->getXpathCount("//table[#id='product_container']/tr");
And if you don't want to count header rows, you should edit the table as:
<table id="product_container">
<thead>
<tr>
<th>Product Name</th>
<th>Qty In Stock</th>
</tr>
</thead>
<tbody>
<tr>
<td>...</td>
<td>...</td>
</tr>
</tbody>
</table>
Then you can get the products count:
$rowsCount = $this->getXpathCount("//table[#id='product_container']/tbody/tr");

Resources