Nokogiri only get list items with links first - ruby

I have a document that looks like the following:
<ul>
<li>
<a href="...">Link</a>Content
</li>
<li>
Content <a href="...">Link</a>
</li>
</ul>
I would like to only obtain the list items that start with an <a> tag, i.e. the first <li> would be a hit but the second would not.
I tried getting all list items and regex matching on the html content but it doesn't appear to be working:
list.search('li').each do |item|
  if /^<a href="\/Synergies".*$/.match(item)
    puts link # hit?
  end
end
Any advice would be appreciated!

You can check whether the item's first child is either not text or empty text:
list.search('li').each do |item|
  if !item.children.first.text? || item.children.first.text.strip.empty?
    puts item # hit?
  end
end
If you want to exclude items that don't begin with a link, select the links that are the first element child of their list item and check the parent's first child node in the condition:
list.search('li > a:first-child').each do |item|
  if !item.parent.children.first.text? || item.parent.children.first.text.strip.empty?
    puts item # hit?
  end
end

Related

watir webdriver print the count of a list item

I'm trying to count the list items on a web page and output the result with this:
office_lists = browser_driver.li(:class, 'office')
office_list = browser_driver.li(:class, 'office')
office_list = Hash.new 0
office_list.links.each do |link|
office_list[link] += 1
puts office_list
But I have been unsuccessful. I was hoping someone could help. Say these are all the li elements on the page:
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
I would then like to put these in some sort of array object, get the count, and output the number of items like so:
puts "There are #{count} number of offices in the list"
Any help would be much appreciated.
Also, do I need to require anything such as "pp" or "p", or will watir be enough for this task?
You are overwriting what you declared first.
For example,
office_lists = browser_driver.li(:class, 'office')
office_list = browser_driver.li(:class, 'office')
office_list = Hash.new 0
You are replacing office_list with a Hash, so the element returned by browser_driver.li(:class, 'office') is no longer available in office_list.
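The reassignment problem can be reproduced without a browser. In this sketch, FakeElement is a hypothetical stand-in for the object that browser_driver.li would return; it is not part of watir:

```ruby
# Hypothetical stand-in for a watir element that exposes a links method.
FakeElement = Struct.new(:links)

office_list = FakeElement.new(%w[first second third])
before = office_list.respond_to?(:links)

# Rebinding the variable discards the element; office_list is now a plain Hash.
office_list = Hash.new(0)
after = office_list.respond_to?(:links)

puts before # true
puts after  # false
```

Once the variable points at the Hash, calling office_list.links raises NoMethodError, which is why the original loop cannot work.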
I also can't tell whether you want to count the links under a particular list item or the total number of list items.
If you want to count the links under a particular list item, use:
p browser.li(:class, 'office').links.count
The line above prints the number of links under that list item.
If you want to count the total number of list items, use:
count = browser.lis(:class, 'office').count # it's `lis`, not `li`
puts "There are #{count} number of offices in the list"
Instead of rolling your own method, you can use the built-in lis method, which returns a collection of li elements.
Here's an example that collects the li elements with a class attribute of "office" and then chains count to return the number of elements in the collection:
HTML (in a local file named foo.html):
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
watir snippet:
require 'watir'
b = Watir::Browser.new :chrome
b.goto "file:///C:/foo.html"
count = b.lis(class: 'office').count
puts count
#=> 6
puts "There are #{count} number of offices in the list"
#=> There are 6 number of offices in the list

How to define custom locating strategy for select

I am looking for a proper way to redefine/extend the locating strategy for a select tag in a GWT app.
From the HTML snippet you can see that the select tag is not visible.
So to select an option from the list I need to click on the button tag, and then select the needed li tag from the dropdown.
<div class="form-group">
<select class="bootstrap-select form-control" style="display: none;" locator="gender">
<div class="btn-group">
<button class="dropdown-toggle" type="button" title="Male">
<div class="dropdown-menu open">
<ul class="dropdown-menu inner selectpicker" role="menu">
<li data-original-index="1"> (contains a>span with option text)
.....more options
</ul>
</div>
</div>
</div>
I see a dirty solution: implement a method in the BasePage class. But this approach loses the nice page_object sugar (options, get value, etc.):
def set_nationality(country, nationality='Nationality')
  select = button_element(xpath: "//button[@title='#{nationality}']")
  select.click
  option = span_element(xpath: "//span[.='#{country}']")
  option.when_visible
  option.click
end
Is there a cleaner way to do this? Using PageObject::Widgets, maybe?
UPD: Here is what I expect to get:
def bool_list(name, identifier={:index => 0}, &block)
  define_method("#{name}_btn_element") do
    platform.send('button_for', identifier.clone + "//button")
  end
  define_method("#{name}?") do
    platform.send('button_for', identifier.clone + "//button").exists?
  end
  define_method(name) do
    return platform.select_list_value_for identifier.clone + '/select' unless block_given?
    self.send("#{name}_element").value
  end
  define_method("#{name}=") do |value|
    return platform.select_list_value_set(identifier.clone + '/select', value) unless block_given?
    self.send("#{name}_element").select(value)
  end
  define_method("#{name}_options") do
    element = self.send("#{name}_element")
    (element && element.options) ? element.options.collect(&:text) : []
  end
end
The select list appears to have the most identifying attributes, so I would use it as the base element of the widget. All of the other elements, i.e. the button and list items, would need to be located with respect to the select list. In this case, they all share the same div.form-group ancestor.
The widget could be defined as:
class BoolList < PageObject::Elements::SelectList
  def select(value)
    dropdown_toggle_element.click
    option = span_element(xpath: "./..//span[.='#{value}']")
    option.when_visible
    option.click
  end

  def dropdown_toggle_element
    button_element(xpath: './../div/button')
  end

  def self.accessor_methods(widget, name)
    widget.send('define_method', "#{name}_btn_element") do
      self.send("#{name}_element").dropdown_toggle_element
    end
    widget.send('define_method', "#{name}?") do
      self.send("#{name}_btn_element").exists?
    end
    widget.send('define_method', name) do
      self.send("#{name}_element").value
    end
    widget.send('define_method', "#{name}=") do |value|
      self.send("#{name}_element").select(value)
    end
    widget.send('define_method', "#{name}_options") do
      # Since the element is not displayed, we need to check the inner HTML
      element = self.send("#{name}_element")
      (element && element.options) ? element.options.map { |o| o.element.inner_html } : []
    end
  end
end
PageObject.register_widget :bool_list, BoolList, :select
Notice that all locators are in relation to the select list. As well, notice that we use the accessor_methods to add the extra methods to the page object.
The page object would then use the bool_list accessor method. Note that the identifier is for locating the select element, which we said would be the base element of the widget.
class MyPage
  include PageObject
  bool_list(:gender, title: 'Gender')
  bool_list(:nationality, title: 'Nationality')
end
The page will now be able to call the following methods:
page.gender_btn_element.click
page.gender_btn_element.exists?
page.gender
page.gender = 'Female'
page.gender_options
page.nationality_btn_element.click
page.nationality_btn_element.exists?
page.nationality
page.nationality = 'Barbados'
page.nationality_options
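The generated accessors rely on Ruby's define_method. The following standalone sketch shows only that mechanism; it is not the page-object gem's implementation, and FakeDropdown, DropdownAccessors, and DemoPage are hypothetical names that need no browser:

```ruby
# Hypothetical stand-in for a widget element: stores a value and its options.
class FakeDropdown
  attr_reader :value, :options

  def initialize(options)
    @options = options
    @value = options.first
  end

  def select(value)
    raise ArgumentError, "unknown option: #{value}" unless @options.include?(value)
    @value = value
  end
end

module DropdownAccessors
  # Defines <name>_element, <name>, <name>= and <name>_options on the class.
  def dropdown(name, options)
    define_method("#{name}_element") do
      @dropdowns ||= {}
      @dropdowns[name] ||= FakeDropdown.new(options)
    end
    define_method(name) { send("#{name}_element").value }
    define_method("#{name}=") { |value| send("#{name}_element").select(value) }
    define_method("#{name}_options") { send("#{name}_element").options }
  end
end

class DemoPage
  extend DropdownAccessors
  dropdown :gender, %w[Male Female]
end

page = DemoPage.new
page.gender = 'Female'
puts page.gender            # Female
p page.gender_options       # ["Male", "Female"]
```

Every method delegates to the single <name>_element object, which mirrors how the widget above keeps all behaviour relative to one base element.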

How to get text from list items with Mechanize?

<div class="carstd">
<ul>
<li class="cars">"Car 1"</li>
<li class="cars">"Car 2"</li>
<li class="cars">"Car 3"</li>
<li class="cars">"Car 4"</li>
</ul>
</div>
I want to strip the text from each list item with Mechanize and print it out. I've tried
puts page.at('.cars').text.strip but it only gets the first item. I've also tried
page.links.each do |x|
  puts x.at('.cars').text.strip
end
But I get an error undefined method 'at' for #<Mechanize::Page::Link:0x007fe7ea847810>.
There are no links there. Links are a elements, which Mechanize converts into special Mechanize objects.
You want something like:
page.search('li.cars').text # the text of all the li's mashed together as a string
or
page.search('li.cars').map{|x| x.text} # the text of each `li` as an array of strings

Using Nokogiri to find element before another element

I have a partial HTML document:
<h2>Destinations</h2>
<div>It is nice <b>anywhere</b> but here.</div>
<ul>
<li>Florida</li>
<li>New York</li>
</ul>
<h2>Shopping List</h2>
<ul>
<li>Booze</li>
<li>Bacon</li>
</ul>
For every <li> item, I want to know the category the item is in, i.e., the text in the preceding <h2> tag.
This code does not work, but this is what I'm trying to do:
@page.search('li').each do |li|
  li.previous('h2').text
end
Nokogiri allows you to use xpath expressions to locate an element:
categories = []
doc.xpath("//li").each do |elem|
  categories << elem.parent.xpath("preceding-sibling::h2").last.text
end
categories.uniq!
p categories
The first part looks for all "li" elements; then, for each one, we take the parent (ul, ol) and look for an element before it (preceding-sibling) that is an h2. There can be more than one, so we take the last (i.e., the one closest to the current position).
We need to call "uniq!" because we get the h2 once for each "li" (the "li" is the starting point).
Using your own HTML example, this code outputs:
["Destinations", "Shopping List"]
You are close.
@page.search('li').each do |li|
  # [1] on the reverse preceding-sibling axis picks the nearest preceding h2
  category = li.xpath('../preceding-sibling::h2[1]').text
  puts "#{li.text}: category #{category}"
end
The code:
categories = []
Nokogiri::HTML("your HTML here").css("h2").each do |category|
  categories << category.text
end
The result:
categories = ["Destinations", "Shopping List"]

Match and exclude multiple classes with Watir

I would like to be able to match against a class while excluding certain classes as well.
I can use something like the following to get all li elements that match the specified class, but I'm not sure how I can screen out classes at the same time.
b = Watir::Browser.new
free_boxes = b.lis(:class, /cellGridGameStandard/)
I would like to change this into something that will match all li elements with the cellGridGameStandard class, but excludes all elements that also contain either the notEligible class or the ownAlready class.
Here are a couple of options.
Let us assume that the html is like:
<ul>
<li class="cellGridGameStandard">
Element 1
</li>
<li class="cellGridGameStandard ownAlready">
Element 2
</li>
<li class="cellGridGameStandard notEligible">
Element 3
</li>
<li class="cellGridGameStandard">
Element 4
</li>
</ul>
The first and fourth li elements match the specified criteria.
One option would be to check for lis that do not have the ownAlready or notEligible class:
matching = browser.lis(:class => 'cellGridGameStandard').find_all do |li|
  ['ownAlready', 'notEligible'].none? do |class_name|
    li.class_name.split.include?(class_name)
  end
end
p matching.collect(&:text)
#=> ["Element 1", "Element 4"]
Another option, which is easier to write but sometimes considered harder to read, is to use a css locator:
matching = browser.elements(:css => 'li.cellGridGameStandard:not(.ownAlready):not(.notEligible)')
p matching.collect(&:text)
#=> ["Element 1", "Element 4"]
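The class test used by the first option can be checked in plain Ruby, without a browser. In this sketch, eligible? and EXCLUDED are hypothetical names, and the strings stand in for the values that li.class_name would return:

```ruby
EXCLUDED = %w[ownAlready notEligible].freeze

# True when the class attribute contains the wanted class and none of the excluded ones.
def eligible?(class_attr)
  names = class_attr.split
  names.include?('cellGridGameStandard') && (names & EXCLUDED).empty?
end

samples = [
  'cellGridGameStandard',
  'cellGridGameStandard ownAlready',
  'cellGridGameStandard notEligible',
  'cellGridGameStandard'
]

results = samples.map { |class_attr| eligible?(class_attr) }
p results # [true, false, false, true]
```

Splitting the attribute into whole class names avoids false matches on substrings, such as a class called notEligibleYet.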
