XPath: select the nodes that follow a specific node

Based on this HTML:
<table width='300' ......>
  <tbody>
    <tr>
      <td class='wcheader1'> ..... </td>
    </tr>
    <tr>
      <td class='wccontnetbox'>......</td>
    </tr>
    <tr>
      <td class='wccontnetbox'>......</td>
    </tr>
    <tr>
      <td class='wcheader1'> ..... </td>
    </tr>
    <tr>
      <td class='wccontnetbox'>......</td>
    </tr>
    <tr>
      <td class='wccontnetbox'>......</td>
    </tr>
    <tr>
      <td class='wcheader1'> ..... </td>
    </tr>
    <tr>
      <td class='wccontnetbox'>......</td>
    </tr>
    <tr>
      <td class='wccontnetbox'>......</td>
    </tr>
  </tbody>
</table>
I have trouble selecting only the first two <td class='wccontnetbox'> elements after the first <td class='wcheader1'> element. Is there an XPath expression to do this?
UPDATE: those elements are dynamic.

Use the following expression to select the first two wccontnetbox elements after the first wcheader1:
//table/tbody/tr[td[@class='wcheader1']][1]/
following-sibling::tr[td[@class='wccontnetbox']][position()<3]/td
I'm using // because you don't show your full input. It would be better to use a direct path to the table (e.g. /html/body/<etc>/table...).
Use the following expression to select all nodes between the first and second wcheader1 elements:
//table/tbody/tr[td[@class='wcheader1']][1]/following-sibling::tr[
count(.|//table/tbody/tr[td[@class='wcheader1']][2]/preceding-sibling::tr)
=
count(//table/tbody/tr[td[@class='wcheader1']][2]/
preceding-sibling::tr)]/td[@class='wccontnetbox']
Note: This second expression uses the Kayessian node-set intersection formula. In general, use the following expression to find the intersection of $set1 and $set2:
$set1[count(.|$set2)=count($set2)]
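To see why the counting trick works, it can be mimicked on plain Ruby arrays. This is only a sketch: the labels below are made-up stand-ins for the rows of the table, and Ruby's array union `|` plays the role of the XPath `|` operator.

```ruby
# Labels stand in for the <tr> nodes of the table, in document order.
rows = %w[header1 box1 box2 header2 box3 box4 header3 box5 box6]

first_header  = rows.index('header1')
second_header = rows.index('header2')

# $set1: every row after the first header (following-sibling::tr)
set1 = rows[(first_header + 1)..-1]
# $set2: every row before the second header (preceding-sibling::tr)
set2 = rows[0...second_header]

# $set1[count(.|$set2) = count($set2)]:
# a node belongs to $set2 exactly when the union with it does not grow $set2.
between = set1.select { |node| ([node] | set2).size == set2.size }

p between
```

Only the rows that appear in both sets survive, which here are the two rows between the first and second header.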


Ruby Fibonacci Multiplication Table [closed]

This was an interview question.
I recently moved into software development and came across this challenge: how can I write a Fibonacci multiplication table in Ruby? I spent the last couple of days trying to implement it but hit a brick wall, and I failed the interview, although that does not matter now. I would appreciate any kind of help. Thanks a lot.
ApplicationController:
class ApplicationController < Sinatra::Base
  configure do
    set :public_folder, 'public'
    set :views, 'app/views'
  end
  get '/' do
    @time_of_day = Time.now
    erb :index
  end
end
FibonacciController:
class FibonacciController < ApplicationController
  get '/fibonacci' do
    place = params[:place].to_i
    @sequence = fib(place)
    erb :fibonacci
  end
  def fib(place)
    res = []
    a = 0
    b = 1
    while b < place do
      res << b
      a,b = b,a+b
    end
    res
  end
end
Fibonacci.erb
<div class="container">
  <h1> Fibonacci sequence: </h1>
  <div class="sub-container">
    <p> Generated fibonacci sequence: </p>
    <%= @sequence.join(', ') %>
  </div>
</div>
Index.erb
<div class="container">
  Date and time: <%= @time_of_day %>
  <br>
  <h1> Fibonacci Multiplication Table </h1>
  <p> Enter your number below: </p>
  <form method="GET" action="/fibonacci">
    <label for="sequence">
    <input type="integer" name="place" placeholder="Insert your number">
    <input type="submit">
  </form>
</div>
(This is the ultimate goal of this challenge)
FibonacciController code
class FibonacciController < ApplicationController
  get '/fibonacci' do
    place = params[:place].to_i
    sequence = fib(place)
    @table = generate_table(sequence)
    erb :fibonacci
  end
  def fib(place)
    return [] if place <= 0
    a = 0
    b = 1
    res = [a]
    while res.length < place do
      res << b
      a,b = b, a+b
    end
    res
  end
  def generate_table(sequence)
    return [] if sequence.length.zero?
    cols = []
    (sequence.length + 1).times do |row|
      row_data = []
      (sequence.length + 1).times do |col|
        row_data << generate_table_element(row, col, sequence)
      end
      cols << row_data
    end
    cols
  end
  def generate_table_element(row, col, sequence)
    return '_' if row.zero? && col.zero?
    return sequence[col - 1] if row.zero?
    return sequence[row - 1] if col.zero?
    sequence[col - 1] * sequence[row - 1]
  end
end
And in erb file
<p> Generated fibonacci sequence: </p>
<% @table.each do |table_row| %>
  <%= table_row.join(',') %>
  <br/>
<% end %>
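The table logic can also be tried outside Sinatra. The following is a minimal standalone sketch, assuming the same layout as the controller above (a `'_'` corner cell, the sequence as header row and header column, products elsewhere). The name `fib_table` is made up for this example.

```ruby
# First `count` Fibonacci numbers, starting from 0 as in the controller's fib.
def fib(count)
  sequence = []
  a, b = 0, 1
  count.times do
    sequence << a
    a, b = b, a + b
  end
  sequence
end

# Header row first, then one row per Fibonacci number:
# the number itself followed by its product with every number in the sequence.
def fib_table(count)
  sequence = fib(count)
  return [] if sequence.empty?

  [['_', *sequence]] + sequence.map { |x| [x, *sequence.map { |y| x * y }] }
end

# Print the table with right-aligned cells.
fib_table(4).each do |row|
  puts row.map { |cell| cell.to_s.rjust(3) }.join
end
```

Because the sequence starts at 0, the first data row and column contain only zeros. Starting the sequence at 1 instead would be a small change in `fib`.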

Ruby code to display table element details

I have an HTML page that displays the product details in the following way:
<div class="column">
<h3 class="hidden-xs">Product Details</h3>
<table class="table table-striped">
<tbody>
<tr class="header-row hidden-xs">
<th>Product</th>
<th>Duration</th>
<th>Unit Price</th>
<th>Price</th>
</tr>
<tr>
<td>google.com</td>
<td>1 Year</td>
<td class="hidden-xs">$5</td>
<td>$5</td>
</tr>
</tbody>
</table>
<div class="totals text-right">
<p>Subtotal: $5</p>
<p>Total: $5</p>
</div>
</div>
Ruby code is given below:
require 'watir'
browser = Watir::Browser.new(:chrome)
browser.goto('file:///C:/Users/Ashwin/Desktop/text.html')
browser.table(:class, 'table table-striped').trs.each do |tr|
p tr[0].text
p tr[1].text
p tr[2].text
p tr[3].text
end
I am getting the output this way:
"Product"
"Duration"
"Unit Price"
"Price"
"google.com"
"1 Year"
"$5"
"$5"
But I want the details to be displayed as below:
Product : google.com
Duration : 1 Year
Unit Price : $5
Price : $5
Can anyone please help?
The table looks quite simple, so you can use the Table#strings method to convert the table into an array of strings. Then you can output each column header with each row value.
# Get the table you want to work with
table = browser.table(class: 'table table-striped')
# Get the text of the table
rows = table.strings
# Get the column headers and determine the longest one
header = rows.shift
column_width = header.map(&:length).max
# Loop through the data rows and output the header/value
rows.each do |row|
  header.zip(row) do |name, value|
    puts "#{name.ljust(column_width)} : #{value}"
  end
end
#=> Product    : google.com
#=> Duration   : 1 Year
#=> Unit Price : $5
#=> Price      : $5
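The formatting step can be tried without a browser. The sketch below assumes `table.strings` returned the nested array shown. The second data row is invented purely to show how the loop behaves with more than one product.

```ruby
# Hypothetical result of table.strings: header row first, then data rows.
rows = [
  ['Product', 'Duration', 'Unit Price', 'Price'],
  ['google.com', '1 Year', '$5', '$5'],
  ['example.org', '2 Years', '$4', '$8'] # made-up extra row
]

header, *data = rows
width = header.map(&:length).max

# Pair every header with the matching cell of each data row.
lines = data.flat_map do |row|
  header.zip(row).map { |name, value| "#{name.ljust(width)} : #{value}" }
end

puts lines
```

Each data row produces one line per column, so two rows of four columns give eight lines.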
This code is written for the given table, which has two rows (a header row and one data row):
require 'watir'
browser = Watir::Browser.new(:chrome)
browser.goto('file:///C:/Users/Ashwin/Desktop/text.html')
# Declare first_row outside the block so its value survives between iterations
first_row = nil
browser.table(:class, 'table table-striped').rows.each_with_index do |row, index|
  if index.zero?
    first_row = row
    next
  end
  p first_row[0].text + ":" + row[0].text
  p first_row[1].text + ":" + row[1].text
  p first_row[2].text + ":" + row[2].text
  p first_row[3].text + ":" + row[3].text
end

Display Nokogiri children nodes as raw HTML instead of &lt;tag&gt;

I am changing an XML table into an HTML table, and have to do some rearranging of nodes.
To accomplish the transformation, I scrape the XML, put it into a two-dimensional array, and then build the new HTML to output.
But some of the cells contain HTML tags, and after my conversion <su> becomes &lt;su&gt;.
The XML data is:
<BOXHD>
<CHED H="1">Disc diameter, inches (cm)</CHED>
<CHED H="1">One-half or more of disc covered</CHED>
<CHED H="2">Number <SU>1</SU>
</CHED>
<CHED H="2">Exhaust foot <SU>3</SU>/min.</CHED>
<CHED H="1">Disc not covered</CHED>
<CHED H="2">Number <SU>1</SU>
</CHED>
<CHED H="2">Exhaust foot<SU>3</SU>/min.</CHED>
</BOXHD>
The steps I'm taking to convert this to an HTML table are:
class TableCell
  attr_accessor :text, :rowspan, :colspan
  def initialize(text='')
    @text = text
    @rowspan = 1
    @colspan = 1
  end
end
@frag = Nokogiri::HTML(xml)
# make a 2d array to store how the cells should be arranged
column = 0
prev_row = -1
@frag.xpath("boxhd/ched").each do |ched|
  row = ched.xpath("@h").first.value.to_i - 1
  if row <= prev_row
    column += 1
  end
  prev_row = row
  @data[row][column] = TableCell.new(ched.inner_html)
end
# methods to find colspan and rowspan, put them in @data
# ... snip ...
# now build an html table
doc = Nokogiri::HTML::DocumentFragment.parse ""
Nokogiri::HTML::Builder.with(doc) do |html|
  html.table {
    @data.each do |tr|
      html.tr {
        tr.each do |th|
          next if th.nil?
          html.th(:rowspan => th.rowspan, :colspan => th.colspan).table_header th.text
        end
      }
    end
  }
end
This gives the following HTML (notice the superscripts are escaped):
<table>
<tr>
<th rowspan="2" colspan="1" class="table_header">Disc diameter, inches (cm)</th>
<th rowspan="1" colspan="2" class="table_header">One-half or more of disc covered</th>
<th rowspan="1" colspan="2" class="table_header">Disc not covered</th>
</tr>
<tr>
<th rowspan="1" colspan="1" class="table_header">Number &lt;su&gt;1&lt;/su&gt; </th>
<th rowspan="1" colspan="1" class="table_header">Exhaust foot &lt;su&gt;3&lt;/su&gt;/min.</th>
<th rowspan="1" colspan="1" class="table_header">Number &lt;su&gt;1&lt;/su&gt;</th>
<th rowspan="1" colspan="1" class="table_header">Exhaust foot&lt;su&gt;3&lt;/su&gt;/min.</th>
</tr>
</table>
How do I get the raw HTML instead of the entities?
I've tried these with no success:
@data[row][column] = TableCell.new(ched.children)
@data[row][column] = TableCell.new(ched.children.to_s)
@data[row][column] = TableCell.new(ched.to_s)
This might help you understand what's happening:
require 'nokogiri'
doc = Nokogiri::XML('<root><foo></foo></root>')
doc.at('foo').content = '<html><body>bar</body></html>'
doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n  <foo>&lt;html&gt;&lt;body&gt;bar&lt;/body&gt;&lt;/html&gt;</foo>\n</root>\n"
doc.at('foo').children = '<html><body>bar</body></html>'
doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n  <foo>\n    <html>\n      <body>bar</body>\n    </html>\n  </foo>\n</root>\n"
doc.at('foo').children = Nokogiri::XML::Document.new.create_cdata '<html><body>bar</body></html>'
doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n  <foo><![CDATA[<html><body>bar</body></html>]]></foo>\n</root>\n"
I abandoned the builder, and simply built the HTML:
def html_headers()
  rows = Array.new
  @data.each do |row|
    cells = Array.new
    row.each do |cell|
      next if cell.nil?
      cells << "<th rowspan=\"%d\" colspan=\"%d\">%s</th>" %
        [cell.rowspan,
         cell.colspan,
         cell.text]
    end
    rows << "<tr>%s</tr>" % cells.join
  end
  rows.join
end
# call html_headers only after it has been defined
headers = html_headers()
def replace_nodes(headers)
  # ... snip ...
  @frag.xpath("boxhd").each do |old|
    puts "replacing boxhd..."
    old.replace headers
  end
  # ... snip ...
end
I don't understand why, but it appears that the text I replaced the <BOXHD> tags with is parsed and searchable, as I was able to change tag names that came from the data in cell.text.
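The difference between the two approaches comes down to whether the cell text is inserted verbatim or escaped first. Here is a small sketch of that difference using only the standard library. `Cell`, `th_raw` and `th_escaped` are names invented for this example, and `CGI.escapeHTML` merely stands in for the escaping that is applied when markup is passed to the builder as plain text.

```ruby
require 'cgi/escape'

# Minimal stand-in for the TableCell class from the question.
Cell = Struct.new(:text, :rowspan, :colspan)

# String formatting inserts the cell text as-is, so embedded tags survive.
def th_raw(cell)
  format('<th rowspan="%d" colspan="%d">%s</th>', cell.rowspan, cell.colspan, cell.text)
end

# Escaping first turns the embedded tags into entities.
def th_escaped(cell)
  format('<th rowspan="%d" colspan="%d">%s</th>', cell.rowspan, cell.colspan, CGI.escapeHTML(cell.text))
end

cell = Cell.new('Number <su>1</su>', 1, 1)
puts th_raw(cell)
puts th_escaped(cell)
```

The first line keeps `<su>1</su>` as markup, while the second prints it as `&lt;su&gt;1&lt;/su&gt;`, which matches the symptom described in the question.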

Scraping multiple table row siblings with Nokogiri

I’m trying to parse a table with the following markup.
<table>
<tr class="athlete">
<td colspan="2" class="name">Alex</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="run">
<td>5.20</td>
<td>10.50</td>
</tr>
<tr class="end"></tr>
<tr class="athlete">
<td colspan="2" class="name">John</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="end"></tr>
</table>
I need to loop through each .athlete table row and get each sibling .run table row underneath until I reach the .end row. Then repeat for the next athlete and so on. Some .athlete rows have two .run rows, others have one.
Here’s what I have so far. I loop through the athletes:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://myurl.com"
doc = Nokogiri::HTML(URI.open(url))
doc.css(".athlete").each do |athlete|
  puts athlete.at_css(".name").text
  # Loop through the sibling .run rows until I reach the .end row
  # output the value of the td’s in the .run row
end
I can’t figure out how to get each sibling .run row, and stop at the .end row. I feel like it would be easier if the table was better formed, but unfortunately I don’t have control of the markup. Any help would be greatly appreciated!
Voilà
require 'nokogiri'
doc = <<DOC
<table>
<tr class="athlete">
<td colspan="2" class="name">Alex</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="run">
<td>5.20</td>
<td>10.50</td>
</tr>
<tr class="end"></tr>
<tr class="athlete">
<td colspan="2" class="name">John</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="end"></tr>
</table>
DOC
doc = Nokogiri::HTML(doc)
# You can leave out .end if those rows are always empty and not needed
trs = doc.css('.athlete, .run, .end').to_a
# This will return [[athlete, run, ..., end], [athlete, run, ..., end], ...]
athletes = trs.slice_before { |elm| elm.attr('class') == 'athlete' }.to_a
athletes.map! do |athlete|
  {
    name: athlete.shift.at_css('.name').text,
    runs: athlete
      .select { |tr| tr.attr('class') == 'run' }
      # to_f reads only the first number in the row's text
      .map { |run| run.text.to_f }
  }
end
puts athletes.inspect
#[{:name=>"Alex", :runs=>[5.0, 5.2]}, {:name=>"John", :runs=>[5.0]}]
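The grouping idea does not depend on Nokogiri, so it can be tried on plain data first. In this sketch `Row` is a made-up stand-in for a parsed `<tr>`, holding its class and the text of its cells. Unlike the snippet above, it keeps every cell of a run row rather than only the first number.

```ruby
# Hypothetical stand-in for a parsed table row.
Row = Struct.new(:css_class, :cells)

rows = [
  Row.new('athlete', ['Alex']),
  Row.new('run', ['5.00', '10.00']),
  Row.new('run', ['5.20', '10.50']),
  Row.new('end', []),
  Row.new('athlete', ['John']),
  Row.new('run', ['5.00', '10.00']),
  Row.new('end', [])
]

# Start a new group at every athlete row, then split each group
# into its first row (the athlete) and the rows that follow it.
athletes = rows.slice_before { |row| row.css_class == 'athlete' }.map do |head, *rest|
  {
    name: head.cells.first,
    runs: rest.select { |row| row.css_class == 'run' }
              .map { |row| row.cells.map(&:to_f) }
  }
end

p athletes
```

With real Nokogiri rows, `css_class` would correspond to `row['class']` and `cells` to something like `row.css('td').map(&:text)`.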
I would process the table as follows:
Locate the table you want to process:
table = doc.at_css("table")
Get all the immediate rows in the table:
rows = table.css("> tr")
Group the rows with boundary .athlete and .end:
grouped = [[]]
rows.each do |row|
  if row['class'] == 'athlete' and grouped.last.empty?
    grouped.last << row
  elsif row['class'] == 'end' and not grouped.last.empty?
    grouped.last << row
    grouped << []
  elsif not grouped.last.empty?
    grouped.last << row
  end
end
grouped.pop if grouped.last.empty? || grouped.last.last['class'] != 'end'
Process the grouped rows:
grouped.each do |group|
  puts "BEGIN: >> #{group.first.text} <<"
  group[1..-2].each do |row|
    puts " #{row.text.squeeze}"
  end
  puts "END: >> #{group.last.text} <<"
end

finding common ancestor from a group of xpath?

Say I have:
html/body/span/div/p/h1/i/font
html/body/span/div/div/div/div/table/tr/p/h1
html/body/span/p/h1/b
html/body/span/div
How can I get the common ancestor? In this case, the common ancestor of "font, h1, b, div" would be "span".
To find common ancestry between two nodes:
(node1.ancestors & node2.ancestors).first
A more generalized function that works with multiple nodes:
# accepts node objects or selector strings
class Nokogiri::XML::Element
  def common_ancestor(*nodes)
    nodes = nodes.map do |node|
      String === node ? self.document.at(node) : node
    end
    nodes.inject(self.ancestors) do |common, node|
      common & node.ancestors
    end.first
  end
end
# usage:
node1.common_ancestor(node2, '//foo/bar')
# => <ancestor node>
The function common_ancestor below does what you want.
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::XML(DATA)
def common_ancestor *elements
  return nil if elements.empty?
  elements.map! do |e| [ e, [e] ] end #prepare array
  elements.map! do |e| # build array of ancestors for each given element
    e[1].unshift e[0] while e[0].respond_to?(:parent) and e[0] = e[0].parent
    e[1]
  end
  # merge corresponding ancestors and find the last where all ancestors are the same
  elements[0].zip(*elements[1..-1]).select { |e| e.uniq.length == 1 }.flatten.last
end
i = doc.xpath('//*[@id="i"]').first
div = doc.xpath('//*[@id="div"]').first
h1 = doc.xpath('//*[@id="h1"]').first
p common_ancestor i, div, h1 # => gives the p element
__END__
<html>
<body>
<span>
<p id="common-ancestor">
<div>
<p><h1><i id="i"></i></h1></p>
<div id="div"></div>
</div>
<p>
<h1 id="h1"></h1>
</p>
<div></div>
</p>
</span>
</body>
</html>
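Since the question starts from path strings rather than parsed nodes, a string-only sketch may also be useful. The helper name `common_ancestor_path` is made up for this example. It assumes the paths carry no positional predicates, so two different sibling elements with the same name are treated as the same step.

```ruby
# Returns the path of the deepest element that is an ancestor of every
# given path, or nil when the paths share no ancestor.
def common_ancestor_path(paths)
  # Drop the last step of each path so a node is never its own ancestor.
  ancestor_lists = paths.map { |path| path.split('/')[0...-1] }
  first, *others = ancestor_lists
  return nil if first.nil?

  # Walk the first list and stop at the first step the others do not share.
  common = []
  first.each_with_index do |step, i|
    break unless others.all? { |steps| steps[i] == step }
    common << step
  end
  common.empty? ? nil : common.join('/')
end

paths = [
  'html/body/span/div/p/h1/i/font',
  'html/body/span/div/div/div/div/table/tr/p/h1',
  'html/body/span/p/h1/b',
  'html/body/span/div'
]

puts common_ancestor_path(paths)
```

For the four paths from the question, this prints `html/body/span`. When real nodes are available, the ancestor-based answers above are more reliable, because they compare node identity rather than element names.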
