How to get the ID of an element using Watir where a child contains the string I search for

<div class="wrapper">
<div id="minHeightBlock" style="min-height: 430px;">
<div class="borderbox"><div class="standaloneBox">
<div class="sysHeaderContainer clearfix"> … </div>
<div class="notesForGuests"> … </div>
<div class="filterBox clearfix"> … </div>
<div class="resListHeader"> … </div>
<div id="corporaContainer" class="fullList">
<div id="c-a06ffa6a-dc62-4640-9760-dbd661c7ffe8" class="resItem clearfix">
<div class="resTitle">
<span id="filter-empty" class="statBall statFile empty" title="Status: Empty corpus"></span>
<span class="theText">
12321 corpora
</span>
</div>
<div class="resType"> … </div>
<div class="resSize"> … </div>
<div class="resPermission private"> … </div>
<div class="resDomain"> … </div>
<div class="resDescr"> … </div>
<div class="resDetails clearfix" style="display:none;"> … </div>
</div>
<div id="c-b8c0faba-e662-4998-836f-0ee58009b7fa" class="resItem clearfix"> … </div>
<div id="c-9d02b887-4835-4606-ad4b-775b39af9f48" class="resItem clearfix"> … </div>
<div id="c-021d3ba1-db03-4c4e-81a5-294737eb5b54" class="resItem clearfix"> … </div>
This is the code of the web page I'm trying to script using Watir. All I know is what kind of span text the element should contain. There are many of these elements, and I need to collect all of their ID values so I can use them in further actions.
I have commented the places in the above code to show what I know and what I need to get.
So far I have tried this code:
@b.div(:id, "pageHeader").link(:text, "Corpora").click
sleep 5
@b.div(:id, "corporaContainer").spans(:text => /TestAuto\s.*/).each do |span|
  puts span.parent.attribute_value("id")
end
But nothing is printed. Maybe I'm doing something wrong. Help me crack this nut.

Your attempt was close. The problem is that span.parent only goes up to the <div class="resTitle">. You need to go up one more parent:
@b.div(:id, "corporaContainer").spans(:text => /corpora/).each do |span|
  puts span.parent.parent.attribute_value("id")
end
(Note that I changed the text in the locator of the spans since TestAuto\s.* did not match the sample html.)
Alternatively, I sometimes find it better to find the divs that contain the span. This way you do not have to worry about the number of parents changing:
p @b.divs(:class => 'resItem')
  .find_all { |div| div.span(:text => /corpora/).exists? }
  .collect { |div| div.id }
#=> ["c-a06ffa6a-dc62-4640-9760-dbd661c7ffe8"]
Below is a working example. Note two important things:
1. The list of results is loaded asynchronously, so you need to wait for it to finish loading before capturing the results. sleep(5) might work, but you are better off using an actual wait method (since loading seems to take longer than 5 seconds).
2. Make sure the search text actually exists on the page. In the example below, the "12321 corpora" title mentioned in the sample HTML does not exist.
Example:
require 'watir-webdriver'
# Title to search for:
title_text = /UniAdm/
# Go to the Corpora page:
@b = Watir::Browser.new :ff
@b.goto "https://www.letsmt.eu/Corpora.aspx"
# Wait for the results to load:
container = @b.div(:id, "corporaContainer")
container.div(:class => 'resItem').wait_until_present
# Find the matching ids:
p container.divs(:class => 'resItem')
.find_all { |div| div.span(:class => 'theText', :text => title_text).exists? }
.collect { |div| div.id }
#=> ["c-87ee80a9-e529-48b2-92be-bc8d76375478", "c-f139e781-4789-41f9-82e8-914e0e3eff81", "c-e17641d2-9364-4e87-9047-ba35580dc32f"]
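The advice to wait rather than sleep can be illustrated without a browser. The sketch below is a hypothetical wait_until helper in plain Ruby, not Watir's own implementation: it polls a block until the block returns a truthy value or a timeout expires, so the script resumes as soon as the condition holds instead of always pausing for a fixed time. The helper name, the timeout and interval values, and the simulated loading thread are all assumptions made for illustration.

```ruby
# Hypothetical polling helper for illustration; Watir ships its own wait methods.
class WaitTimeout < StandardError; end

def wait_until(timeout: 5, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end

# Simulate a result list that fills in asynchronously after a short delay.
results = []
loader = Thread.new do
  sleep 0.2
  results << "c-a06ffa6a-dc62-4640-9760-dbd661c7ffe8"
end

# Returns as soon as the list is non-empty, well before the 5 second limit.
first_id = wait_until(timeout: 5) { results.first }
loader.join
puts first_id
```

A fixed sleep has to be long enough for the slowest load, whereas a polling wait returns early on fast loads and fails with a clear error on slow ones.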

Related

Watir: How to retrieve all HTML elements that match an attribute? (class, id, title, etc)

I have a page that is dynamically created and displays a list of products with their prices. Since it's dynamic, the same code is reused to create each product's information, so they share the tags and same classes. For instance:
<div class="product">
<div class="name">Product A</div>
<div class="details">
<span class="description">Description A goes here...</span>
<span class="price">$ 180.00</span>
</div>
</div>
<div class="product">
<div class="name">Product B</div>
<div class="details">
<span class="description">Description B goes here...</span>
<span class="price">$ 43.50</span>
</div>
</div>
<div class="product">
<div class="name">Product C</div>
<div class="details">
<span class="description">Description C goes here...</span>
<span class="price">$ 51.85</span>
</div>
</div>
And so on.
What I need to do with Watir is retrieve all the texts inside the spans with class="price", in this example: $ 180.00, $ 43.50 and $ 51.85.
I've been playing around with something like this:
@browser.span(:class, 'price').each do |row| but it is not working.
I'm just starting to use loops in Watir. Your help is appreciated. Thank you!
You can use pluralized methods for retrieving collections - use spans instead of span:
@browser.spans(:class => "price")
This returns a span collection that behaves much like a Ruby array, so you can use #each as you tried, but I would use #map in this situation:
texts = @browser.spans(:class => "price").map do |span|
  span.text
end
puts texts
I would use the Symbol#to_proc trick to shorten that code even more:
texts = @browser.spans(:class => "price").map(&:text)
puts texts
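Once the price texts are collected, they usually need converting to numbers before they can be compared or summed. Here is a minimal sketch, assuming the strings look exactly like the sample ("$ 180.00" and so on). The Span struct is a hypothetical stand-in for Watir's span elements so that the snippet runs without a browser.

```ruby
# Hypothetical stand-in for Watir span elements; only #text is needed here.
Span = Struct.new(:text)
spans = [Span.new("$ 180.00"), Span.new("$ 43.50"), Span.new("$ 51.85")]

# Same Symbol#to_proc shorthand as above: call #text on every element.
texts = spans.map(&:text)

# Drop everything except digits and the decimal point, then convert.
prices = texts.map { |text| Float(text.delete("^0-9.")) }

p texts   #=> ["$ 180.00", "$ 43.50", "$ 51.85"]
p prices  #=> [180.0, 43.5, 51.85]
```

Float raises an ArgumentError on text it cannot parse, which makes unexpected price formats fail loudly rather than silently becoming 0.0.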

Parsing nodes with Nokogiri?

I'm parsing web pages and I want to get the link from the <img src> by finding the <div id="image">.
How do I do this in Nokogiri? I tried walking through the child nodes but it fails.
<div id="image" class="image textbox ">
<div class="">
<img src="img.jpg" alt="" original-title="">
</div>
</div>
This is my code:
doc = Nokogiri::HTML(open("site.com"))
doc.css("div.image").each do |node|
node.children().each do |c|
puts c.attr("src")
end
end
Any ideas?
Try this and let me know if it works for you
require 'nokogiri'
source = <<-HTML
<div id="image" class="image textbox ">
<div class="">
<img src="img.jpg" alt="" original-title="">
</div>
</div>
HTML
doc = Nokogiri::HTML(source)
doc.css('div#image > div > img').each do |image|
puts image.attr('src')
end
Output:
img.jpg
Here is a great resource: http://ruby.bastardsbook.com/chapters/html-parsing/
Modifying an example a bit, I get this:
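require 'nokogiri'
require 'open-uri'
# "site.com" is the question's placeholder; URI.open expects a full URL including the scheme.
doc = Nokogiri::HTML(URI.open("http://site.com"))
doc.css("div.image img").each do |img|
  puts img.attr("src")
end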
That said, you should use the ID selector, #image, rather than the class selector, .image, when you can. An ID is meant to be unique within a page, so the match is more specific and the lookup can be faster.

In ruby when I try mytext.include? (">Model number<") is returning false

In Ruby, mytext.include?(">Model number<") returns false.
But mytext.include?("Model number") returns true.
What is wrong with the first condition?
mytext contains the string "Model number" between ">" and "<".
This is relevant HTML:
<div class="bucket"> <div class="h1"><strong>Product Specifications</strong></div> <div class="content"> <div class="tsSectionHeader">Product Information</div> <div class="tsTable"> <div class="tsRow"><span class="tsLabel">Model number</span><span>516C</span></div> <div class="tsRow"><span class="tsLabel">Maximum weight recommendation</span><span>35 Pounds</span></div> <div class="tsRow"><span class="tsLabel">Material Type</span><span>Wood</span></div> </div> </div> </div>
You have to learn some HTML. The > and < characters are part of the span tags, <span> and </span>, not part of the text.
This is where the text appears:
<span class="tsLabel">Model number</span>
So the span has the text Model number. You can get the text using Watir with this:
browser.span(:class => "tsLabel").text
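Why the two checks disagree can be reproduced in plain Ruby. This sketch assumes mytext was taken from the page's visible text (for example via browser.text) rather than from its HTML source, which the question does not state. The regular expression used to strip tags is only a rough stand-in for what a browser renders.

```ruby
html = '<div class="tsRow"><span class="tsLabel">Model number</span><span>516C</span></div>'

# Crude approximation of visible text: remove anything that looks like a tag.
text = html.gsub(/<[^>]*>/, ' ').squeeze(' ').strip

puts text                              # Model number 516C
puts html.include?(">Model number<")   # true: the source contains the tag brackets
puts text.include?(">Model number<")   # false: the brackets belonged to the tags
puts text.include?("Model number")     # true
```

If the brackets really are needed in the match, the check has to run against the HTML source instead of the rendered text.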

Accessing a div element in an array of li elements

I am trying to access a div in an li array
<ul>
<li class="views-row views-row-1 views-row-odd views-row-first">
<div class="news-item">
</li>
<li class="views-row views-row-2 views-row-even">
<li class="views-row views-row-3 views-row-odd">
<div class="news-item">
<div class="image">
<div class="details with-image">
<h2>
<p class="standfirst">The best two-seat </p>
<div class="meta">
<div class="pub-date">26 April 2012</div>
<div class="topic-bar clearfix">
<div class="topic car_review">review</div>
</div>
</div>
</div>
</div>
</li>
I am trying to access the <div class="topic car_review"> element and get its text.
I need that text specifically because, depending on what it is, the script enters different steps.
Code that I am using is
@topic = @browser.li(:class => /views-row-#{x}/).div(:class,'news-item').div(:class,'details').div(:class,'meta').div(:class,/topic /).text
The script was working fine before and suddenly it has stopped working and is just not able to get the div(:class,'news-item').
The error message I get is
unable to locate element, using {:class=>"news-item", :tag_name=>"div"} (Watir::Exception::UnknownObjectException)
I tried div(:class => /news-/), but it still cannot find that element.
I am really stuck!!!
I assume that when you are doing li(:class => /views-row-#{x}/), the x means you are iterating over all rows? If so, your script will fail on row 2, since that row does not contain the news-item div (resulting in the error that you see).
If there is only one of these 'topic car_review' div tags, you can just do:
@topic = @browser.div(:class, 'topic car_review')
Update - Iterating over each LI:
If you need to iterate over each LI, then you could do:
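@browser.lis.each do |li|
  topic_div = li.div(:class, 'topic car_review')
  # Skip rows that have no topic div (such as row 2 in the sample)
  next unless topic_div.exists?
  @topic = topic_div.text
end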

get div nested in div element using Nokogiri

I want to parse the following HTML with Nokogiri and get this result:
event_name = "folk concert 2"
event_link = "http://www.douban.com/event/12761580/"
event_date = "20th,11,2010"
I know doc.xpath('//div[@class="nof clearfix"]') could get each div element, but how should I proceed to get each attribute such as event_name, and especially the date?
HTML
<div class="nof clearfix">
<h2><a href="http://www.douban.com/event/12761580/">folk concert 2</a> <span class="pl2"> </span></h2>
<div class="pl intro">
Date:25th,11,2010<br/>
</div>
</div>
<div class="nof clearfix">
<h2><a href="http://www.douban.com/event/12761581/">folk concert</a> <span class="pl2"> </span></h2>
<div class="pl intro">
Date:10th,11,2010<br/>
</div>
</div>
I don't know XPath; I prefer CSS selectors, which make more sense to me. This tutorial might be useful for you.
require 'rubygems'
require 'nokogiri'
require 'pp'
Event = Struct.new :name , :link , :date
doc = Nokogiri::HTML DATA
events = doc.css("div.nof.clearfix").map do |eventnode|
name = eventnode.at_css("h2 a").text.strip
link = eventnode.at_css("h2 a")['href']
date = eventnode.at_css("div.pl.intro").text.strip
Event.new name , link , date
end
pp events
__END__
<div class="nof clearfix">
<h2><a href="http://www.douban.com/event/12761580/">folk concert 2</a> <span class="pl2"> </span></h2>
<div class="pl intro">
Date: 25th,11,2010<br/>
</div>
</div>
<div class="nof clearfix">
<h2><a href="http://www.douban.com/event/12761581/">folk concert</a> <span class="pl2"> </span></h2>
<div class="pl intro">
Date: 10th,11,2010<br/>
</div>
</div>
This outputs:
[#<struct Event
name="folk concert 2",
link="http://www.douban.com/event/12761580/",
date="Date: 25th,11,2010">,
#<struct Event
name="folk concert",
link="http://www.douban.com/event/12761581/",
date="Date: 10th,11,2010">]
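The question asked for the bare date (for example "25th,11,2010") without the "Date:" label, while the output above still carries it. Below is a small sketch of one way to strip the label in plain Ruby. The extract_date helper name is hypothetical, and the code assumes the text always starts with "Date:" as in the sample.

```ruby
# Hypothetical helper: remove a leading "Date:" label and surrounding whitespace.
def extract_date(text)
  text.strip.sub(/\ADate:\s*/, '')
end

puts extract_date("Date: 25th,11,2010")      # 25th,11,2010
puts extract_date("\nDate:10th,11,2010\n")   # 10th,11,2010
```

In the code above, the helper could wrap the existing lookup, for example date = extract_date(eventnode.at_css("div.pl.intro").text).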
