How can I search for a specific text element? - ruby

How can I search for the element containing "Click Here to Enter a New Password" using Nokogiri::HTML?
My HTML structure is like:
<table border="0" cellpadding="20" cellspacing="0" width="100%">
<tbody>
<tr>
<td class="bodyContent" valign="top">
<div>
<strong>Welcome to</strong>
<h2 style="margin-top:0">OddZ</h2>
Click Here
to Enter a New Password
<p>
Click this link to enter a new Password. This link will expire within 24 hours, so don't delay.
<br>
</p>
</div>
</td>
</tr>
</tbody>
</table>
I tried:
doc = Nokogiri::HTML(@inbox_emails.first.body.raw_source)
password_container = doc.search "[text()*='Click Here to Enter a New Password']"
but this did not find a result. When I tried:
password_container = doc.search "[text()*='Click Here']"
I got no result.
I want to search the complete text.
I found that there are many spaces before the text " to Enter a New Password", but I have not added any spaces in the HTML code.

Much of the text you are searching for is outside of the a element.
The best you can do might be:
a = doc.search('a[text()="Click Here"]').find{|a| a.next.text[/to Enter a New Password/]}

You can use a mix of XPath and regex. Nokogiri's XPath support has no matches function yet, so you can implement your own as follows:
class RegexHelper
  # true if any node's content matches the (case-insensitive, multiline) pattern
  def content_matches_regex node_set, regex_string
    ! node_set.select { |node| node.content =~ /#{regex_string}/mi }.empty?
  end
  # treat every run of whitespace in the search string as "anything in between"
  def content_matches node_set, string
    content_matches_regex node_set, string.gsub(/\s+/, ".*?")
  end
end
search_string = "Click Here to Enter a New Password"
matched_nodes = doc.xpath "//*[content_matches(., '#{search_string}')]", RegexHelper.new

You can try using a CSS selector. I've saved your HTML as a file called test.html.
require 'nokogiri'
@doc = Nokogiri::HTML(File.read('test.html'))
puts @result = @doc.css('p').text.gsub(/\n/,'')
It returns:
Click this link to enter a new Password. This link will expire within 24 hours, so don't delay.
There's a good post about Parsing HTML with Nokogiri

You were close. Here's how you find the text's containing element:
doc.search('*[text()*="Click Here"]')
This gives you the <a> tag. Is this what you want? If you actually want the parent element of the <a>, which is the containing <div>, you can modify it like so:
doc.search('//*[text()="Click Here"]/..').text
This selects the containing <div>, the text of which is:
Welcome to
OddZ
Click Here
to Enter a New Password
Click this link to enter a new Password. This link will expire within 24 hours, so don't delay.


XPath of the edit button is //*[@id="edit_1088"]

I need to click on the first edit link in the table. The table has 7 columns, and rows are added dynamically.
HTML Code for first row:
<tr role="row" class="odd">
<td>
<form method="GET" target="_blank" action>...</form>
<td> ALLOCATION CHANGE</td>
<td class="right"></td>
<td> SATTER, KRAIG</td>
<td> CAFEMANAGER1</td>
<td class="sorting_1">03/08/2016 17:00</td>
<td class="edit_icon" id="edit_1088" onclick="on EditClick(1088)">
<span class="view_icon" style="margin-left: 40%;"></span>
</td>
</tr>
Note: the ID of the edit button keeps changing as rows are added.
My code in Cucumber/Ruby/Capybara:
And /^I click on the Expresso image$/ do
find(:xpath, '//*[@id="l1row"]/span').click
find('tr:odd > td:edit_icon [id="edit_"] match: first').click
sleep 10
end
Error Message: invalid selector: An invalid or illegal selector was specified
Update based on the posted HTML -
Note: Your first <td> isn't closed. I'm assuming that's just an error from when you were adding the HTML to the question.
So from the HTML posted you don't actually have an edit link you just have a td you need to click on - the one in the first row with an id beginning with "edit_" so
find('tr:first-child > td[id^="edit_"]').click
The attempt you posted in your question won't work because there are no CSS pseudo-classes named odd or edit_icon, and find needs a valid CSS selector (or XPath, if you specify XPath or set it as your default).
Previous Answer Based on the wording of the question:
If this is a table then you can do what you want with CSS and not worry about XPath.
find('tr:first-child > td:last-child [id^="edit_"]')
will find the element whose id starts with "edit_" in the last td in the first row. If your rows and columns are not actually a table you'll need to provide some example HTML of what you're talking about.

Select previous td and clicking link with Mechanize and Nokogiri

Hi, I am scraping a webpage with Mechanize and Nokogiri. I am selecting a series of links <a></a>:
html_body = Nokogiri::HTML(body)
links = html_body.css('.L1').xpath("//table/tbody/tr/td[2]/a[1]")
Then I need to check if the content of each link (<a>content</a>, not the href) matches some stuff in my db. I am doing this:
links.each do |link|
if link = @tournament.homologation_number
If my condition is met, I need to select the <td></td> that is right before the <td> of the link I checked and click on the link that's in it.
<td></td>
<td>content I check with my condition</td>
How can I achieve this using Mechanize and nokogiri ?
I would iterate the first td's because it's easier to get at following elements than previous ones (with css anyway)
page.search('td[1]').each do |td|
if td.at('+ td a').text == 'foo'
page2 = agent.get td.at('a')[:href]
end
end
First of all you have to select all the <td></td> elements. The XPath //table/tbody/tr/td[2]/a[1] only selects the first <a></a> in the second cell of each row, so you could try something like //table/tbody/tr/td instead, but this depends on the situation.
Once you have your array of <td></td> you can access their links like this:
tds.each do |td|
  link = td.at('a') # first link inside this cell, if any
  next unless link
  # condition_is_matched and goto_url are placeholders for your own logic
  if condition_is_matched(link.text) # compare only the link's text, not the href
    previous_td = td.previous_element # previous_element skips whitespace text nodes
    previous_url = previous_td.at('a')['href']
    goto_url previous_url
  end
end

how to click a link in a table based on the text in a row

Using page-object and watir-webdriver how can I click a link in a table, based on the row text as below:
The table contains 3 rows which have names in the first column, and a corresponding Details link in columns to the right:
DASHBOARD .... Details
EXAMPLE .... Details
and so on.
<div class="basicGridHeader">
<table class="basicGridTable">ALL THE DETAILS:
....soon
</table>
</div>
<div class="basicGridWrapper">
<table class="basicGridTable">
<tbody id="bicFac9" class="ide043">
<tr id="id056">
<td class="bicRowFac10">
<span>
<label class="bicDeco5">
<b>DASHBOARD:</b> ---> Based on this text
</label>
</span>
</td>
<td class="bicRowFac11">
....some element
</td>
<td class="bicRowFac12">
<span>
<a class="bicFacDet5">Details</a> ---> I should be able to click this link
</span>
</td>
</tr>
</tbody>
</table>
</div>
You could locate a cell that contains the specified text, go to the parent row and then find the details link in that row.
Assuming that there might be other detail links you would want to click, I would define a view_details method that accepts the text of the row you want to locate:
class MyPage
include PageObject
table(:grid){ div_element(:class => 'basicGridWrapper')
.table_element(:class => 'basicGridTable') }
def view_details(label)
grid_element.cell_element(:text => /#{label}/)
.parent
.link_element(:text => 'Details')
.click
end
end
You can then click the link with:
page.view_details('DASHBOARD')
Table elements include the Enumerable module, and I find it very useful in cases like these. http://ruby-doc.org/core-2.0.0/Enumerable.html. You could use the find method to locate and return the row that matches the criteria you are looking for. For example:
class MyPage
include PageObject
table(:grid_table, :class => 'basicGridTable')
def click_link_by_row_text(text_value)
matched_row = locate_row_by_text(text_value)
matched_row.link_element.click
#if you want to make sure you click on the link under the 3rd column you can also do this...
#matched_row[2].link_element.click
end
def locate_row_by_text(text_value)
#find the row that matches the text you are looking for
matched_row = grid_table_element.find { |row| row.text.include? text_value }
fail "Could not locate the row with value #{text_value}" if matched_row.nil?
matched_row
end
end
Here, locate_row_by_text looks for the row that includes the text you are looking for and raises an exception if it doesn't find it. Once you have the row, you can drill down to the link and click on it, as shown in the click_link_by_row_text method.
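The find-or-fail pattern itself does not depend on a browser. Here is a minimal plain-Ruby sketch of it, where Row is a hypothetical stand-in for a table row and the link values are made up:

```ruby
# Hypothetical stand-in for a table row: its visible text and a link label.
Row = Struct.new(:text, :link)

rows = [
  Row.new('DASHBOARD: ... Details', 'dashboard-details'),
  Row.new('EXAMPLE: ... Details', 'example-details')
]

# Enumerable#find returns the first row whose block is true, or nil if none match.
def locate_row_by_text(rows, text_value)
  matched_row = rows.find { |row| row.text.include?(text_value) }
  raise "Could not locate the row with value #{text_value}" if matched_row.nil?
  matched_row
end

puts locate_row_by_text(rows, 'EXAMPLE').link
```

Raising early gives a clearer failure than calling a method on nil later in the step.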
Just for posterity, I would like to give an updated answer. It is now possible to traverse through a table using table_element[row_index][column_index].
A little bit more verbose:
row_index could also be the text in a row to be matched - in your case - table_element['DASHBOARD']
Then find the corresponding cell/td element using either the index(zero based) or the header of that column
table_element['DASHBOARD'][2] - Selecting the third element in the selected row.
Since you do not have a header row (<th> element) you can filter the cell element using the link's class attribute. Something like this
table_element['DASHBOARD'].link_element(:class => 'bicFacDet5').click
So the code would look something like this:
class MyPage
include PageObject
def click_link_by_row_text(text_value)
table_element[text_value].link_element(:class => 'bicFacDet5').click
end
end
Let me know if you need more explanation. Happy to help :)

Get last word inside table cell?

I want to scrape data from a table with Ruby and Nokogiri.
There are a lot of <td> elements, but I only need the country which is just text after a <br> element. The problem is, the <td> elements differ. Sometimes there is more than just the country.
For example:
<td>Title1<br>USA</td>
<td>Title2<br>Michael Powell<br>UK</td>
<td>Title3<br>Leopold Lindtberg<br>Ralph Meeker<br>Switzerland</td>
I want to address the element before the closing </td> tag because the country is always the last element.
How can I do that?
I'd use this:
require 'awesome_print'
require 'nokogiri'
html = '
<td>Title1<br>USA</td>
<td>Title2<br>Michael Powell<br>UK</td>
<td>Title3<br>Leopold Lindtberg<br>Ralph Meeker<br>Switzerland</td>
'
doc = Nokogiri::HTML(html)
ap doc.search('td').map{ |td| td.search('text()').last.text }
[
[0] "USA",
[1] "UK",
[2] "Switzerland"
]
The problem is that your HTML being parsed won't have rows of <td> tags, so you'll have to locate the ones you want to parse. Instead, they'll be interspersed between <tr> tags, and maybe even different <table> tags. Because your HTML sample doesn't show the true structure of the document, I can't help you more.
There are a bunch of different solutions. Another one, using only the standard library, is to cut out the parts of the string you don't want.
node_string = <<-STRING
<td>Title1<br>USA</td>
<td>Title2<br>Michael Powell<br>UK</td>
<td>Title3<br>Leopold Lindtberg<br>Ralph Meeker<br>Switzerland</td>
STRING
node_string.split("<td>").collect do |str|
last_str = str.split("<br>").last
last_str.sub(%r{</td>\s*\z}, '').strip unless last_str.nil? # drop only the closing tag, not the letters t and d
end.compact

Watir: Search table cell by bgcolor tag and get column number

Consider the following HTML as an example. It's a scratch sheet I made to practice, but it has a snippet of the real HTML I am trying to work with.
http://www.carbide-red.com/prog/test_table.html
I am trying to locate a column, and the only consistent identifier I can find is the background color (bgcolor).
<tr bgcolor="#ffffcc">
<td bgcolor="yellow" class="date" align=center>Equipment</td>
<td bgcolor="#ccccff" align=center class="date"><font color=black>8/12/12</font></td>
<td bgcolor="#ccccff" align=center class="date"><font color=black>8/19/12</font></td>
<td bgcolor="#ccccff" align=center class="date"><font color=black>8/26/12</font></td>
<td bgcolor="#ccccff" align=center class="date"><font color=black>9/2/12</font></td>
<td bgcolor="red" align=center class="date"><font color=yellow>9/9/12</font></td>
<td bgcolor="#ccffcc" align=center class="date"><font color=black>9/16/12</font></td>
<td bgcolor="#ccffcc" align=center class="date"><font color=black>9/23/12</font></td>
<td bgcolor="#ccffcc" align=center class="date"><font color=black>9/30/12</font></td>
<td bgcolor="#ccffcc" align=center class="date"><font color=black>10/7/12</font></td>
</tr>
I'm trying to find the <td> that has bgcolor=red. I would then like to save the column index of that cell, so that I can then use it to select the same column of the following rows.
But I can't seem to find a way to search for the bgcolor= attribute. And I have not been able to find a way to get Watir to report back the column/row indexes to store in a variable. But if I can find the bgcolor= attribute then I can search for, say, "equipment" and then count until I find the correct cell.
I know the HTML is not ideal because there is no "name" or other unique identifier, but I can't change that.
I am very new to Ruby & Watir. I tried to manipulate a website in Perl and it was not going very well, and I discovered Watir and it did exactly what I needed (and surprisingly easily), but now I am trying to understand Ruby as well as the finer semantics.
Thanks for any help!
To get text of <td bgcolor="red"> try this:
browser.element(:css => "td[bgcolor=red]").text
You should get back "9/9/12". To click the element, replace text with click.
To put its index in the variable index, try this:
index = nil
browser.tds.each_with_index {|td, i| index = i if td.attribute_value("bgcolor") == "red" or td.attribute_value("bgcolor") == "#ff0000"}
index variable should be 5.
I would use nokogiri if I were you:
doc = Nokogiri::HTML @browser.html
td = doc.at('td[@bgcolor="red"]')
index = td.search('./preceding-sibling::td').length
Unless there's tricky javascript on the page you're probably better off with mechanize than watir.
Yes, the webpage I'm dealing with uses JavaScript, which is why I had a very hard time using Mechanize::Firefox under Perl. Watir worked much more smoothly.
Thank you for your suggestion! It didn't work at first, but it helped me with Google searches and I was able to get a working version.
require "watir"
require "nokogiri"
browser = Watir::Browser.new
browser.goto "http://www.carbide-red.com/prog/test_table.html"
doc = Nokogiri::HTML.parse(browser.html)
td = doc.at('td[@bgcolor="red"]')
columnindex = td.search('./preceding-sibling::td').length
puts columnindex
browser.close
This returned "5"
Update:
For the sake of others who may find this while searching and learning. To use columnindex variable to find a specific column within a row use this code:
textvariable = browser.td(:text => "A58004").parent.td(:index => "#{columnindex}").text
puts "Textvariable: #{textvariable}"
This finds a <td> that contains the term "A58004" and then goes to the 5th column (0-5) over and returns the value of that cell. Using the webpage linked in my original question that would be "W=Sa"
