How to get text from list items with Mechanize?

<div class="carstd">
<ul>
<li class="cars">"Car 1"</li>
<li class="cars">"Car 2"</li>
<li class="cars">"Car 3"</li>
<li class="cars">"Car 4"</li>
</ul>
</div>
I want to strip the text from each list item with Mechanize and print it out. I've tried
puts page.at('.cars').text.strip but it only gets the first item. I've also tried
page.links.each do |x|
puts x.at('.cars').text.strip
end
But I get an error undefined method 'at' for #<Mechanize::Page::Link:0x007fe7ea847810>.

There are no links there. Links are a elements, which Mechanize converts into special Mechanize::Page::Link objects; they are not Nokogiri nodes, so they have no at method.
You want something like:
page.search('li.cars').text # the text of all the li's mashed together as a string
or
page.search('li.cars').map{|x| x.text} # the text of each `li` as an array of strings

Related

Nokogiri only get list items with links first

I have a document that looks like the following:
<ul>
<li>
<a href="...">Link</a>Content
</li>
<li>
Content <a href="...">Link</a>
</li>
</ul>
I would like to only obtain the list items that start with an <a> tag, i.e. the first <li> would be a hit but the second would not.
I tried getting all list items and regex matching on the html content but it doesn't appear to be working:
list.search('li').each do |item|
if /^<a href="\/Synergies".*$/.match(item)
puts link # hit?
end
end
Any advice would be appreciated!

You can check whether the item's first child is either not text or empty text:
list.search('li').each do |item|
if !item.children.first.text? || item.children.first.text.strip.empty?
puts item # hit?
end
end
If you want to exclude items that don't begin with a link, you can select each a that is a first element child and check its parent's first child node in the condition:
list.search('li > a:first-child').each do |item|
if !item.parent.children.first.text? || item.parent.children.first.text.strip.empty?
puts item # hit?
end
end

watir webdriver print the count of a list item

I'm trying to count the list items on a web page and output the result:
office_lists = browser_driver.li(:class, 'office')
office_list = browser_driver.li(:class, 'office')
office_list = Hash.new 0
office_list.links.each do |link|
office_list[link] += 1
puts office_list
But I have been unsuccessful. I was hoping someone could help. Say these are all the li elements on the page:
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
I would then like to put these in some sort of array object, get the count, and output the number of items like so:
puts "There are #{count} number of offices in the list"
Any help would be much appreciated.
Also, do I need to require anything such as "pp" or "p", or will Watir be enough for this task?

You are overwriting what you declared first.
For example:
office_lists = browser_driver.li(:class, 'office')
office_list = browser_driver.li(:class, 'office')
office_list = Hash.new 0
You are replacing office_list with a Hash, so the result of browser_driver.li(:class, 'office') is no longer available in office_list.
I also can't tell whether you want to count the links under a particular list item or the total number of list items.
If you want to count the links under a particular list item, use:
p browser.li(:class, 'office').links.count
That line prints the number of links under that list item.
If you want to count the total number of list items, use:
count = browser.lis(:class, 'office').count # it's `lis`, not `li`
puts "There are #{count} number of offices in the list"
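The reassignment problem described at the start of this answer can be reproduced without a browser. In the sketch below, FakeListItem is a hypothetical stand-in for the element that browser_driver.li would return; it is not part of Watir.

```ruby
# Hypothetical stand-in for a Watir li element, so the sketch runs without a browser.
FakeListItem = Struct.new(:links)

office_list = FakeListItem.new(%w[link_a link_b])
had_links_before = office_list.respond_to?(:links)

# Assigning a Hash to the same name discards the element it used to hold.
office_list = Hash.new(0)
has_links_after = office_list.respond_to?(:links)

puts "before: #{had_links_before}, after: #{has_links_after}"
```

Using a separate name for the tally Hash would keep the element reachable.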

Instead of rolling your own method, you can use the built-in lis method, which returns a collection of li elements.
Here's an example that collects the li elements with a class attribute of "office" and then chains count to return the number of elements in the collection:
HTML (in a local file named foo.html):
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
<li class='office'></li>
watir snippet:
require 'watir'
b = Watir::Browser.new :chrome
b.goto "file:///C:/foo.html"
count = b.lis(class: 'office').count
puts count
#=> 6
puts "There are #{count} number of offices in the list"
#=> There are 6 number of offices in the list

Using Nokogiri to find element before another element

I have a partial HTML document:
<h2>Destinations</h2>
<div>It is nice <b>anywhere</b> but here.
<ul>
<li>Florida</li>
<li>New York</li>
</ul>
<h2>Shopping List</h2>
<ul>
<li>Booze</li>
<li>Bacon</li>
</ul>
For every <li> item, I want to know the category the item is in, i.e., the text in the preceding <h2> tag.
This code does not work, but this is what I'm trying to do:
@page.search('li').each do |li|
li.previous('h2').text
end

Nokogiri allows you to use xpath expressions to locate an element:
categories = []
doc.xpath("//li").each do |elem|
categories << elem.parent.xpath("preceding-sibling::h2").last.text
end
categories.uniq!
p categories
The first part looks for all "li" elements. For each one, we go to the parent (ul, ol) and then look for a preceding sibling (preceding-sibling) that is an h2. There can be more than one, so we take the last (i.e., the one closest to the current position).
We need to call "uniq!" because we get the h2 once for each 'li' (the 'li' is the starting point).
Using your own HTML example, this code outputs:
["Destinations", "Shopping List"]

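You are close. Use [1] to take only the nearest preceding h2; without it, .text joins the text of every earlier h2.
@page.search('li').each do |li|
  category = li.xpath('../preceding-sibling::h2[1]').text
  puts "#{li.text}: category #{category}"
end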

The code:
categories = []
Nokogiri::HTML("your HTML here").css("h2").each do |category|
categories << category.text
end
The result:
categories = ["Destinations", "Shopping List"]

Match and exclude multiple classes with Watir

I would like to be able to match against a class while excluding certain classes as well.
I can use something like the following to get all li elements that match the specified class, but I'm not sure how to screen out other classes at the same time.
b = Watir::Browser.new
free_boxes = b.lis(:class, /cellGridGameStandard/)
I would like to change this into something that will match all li elements with the cellGridGameStandard class, but excludes all elements that also contain either the notEligible class or the ownAlready class.

Here are a couple of options.
Let us assume that the html is like:
<ul>
<li class="cellGridGameStandard">
Element 1
</li>
<li class="cellGridGameStandard ownAlready">
Element 2
</li>
<li class="cellGridGameStandard notEligible">
Element 3
</li>
<li class="cellGridGameStandard">
Element 4
</li>
</ul>
The first and fourth li elements match the specified criteria.
One option would be to check for lis that do not have the ownAlready or notEligible class:
excluded = ['ownAlready', 'notEligible']
matching = browser.lis(:class => 'cellGridGameStandard').find_all do |li|
  excluded.none? { |class_name| li.class_name.split.include?(class_name) }
end
p matching.collect(&:text)
#=> ["Element 1", "Element 4"]
Another option, which is easier to write but sometimes considered harder to read, is to use a css locator:
matching = browser.elements(:css => 'li.cellGridGameStandard:not(.ownAlready):not(.notEligible)')
p matching.collect(&:text)
#=> ["Element 1", "Element 4"]

Get element after another elements with Hpricot and Ruby

I have the following HTML:
<ul class="filtering_new" width="50%">
<li class="filter">1</li>
<li class="filter">2</li>
<script>Alert('1');</script>
<li class="filter">3</li>
</ul>
How can I get li with inner_html = 3?
I tried like this:
page.search("//ul.filtering_new").each do |list|
puts list.search("li").size
end
where page is the HTML document.
size = 2, but it should be 3.
I tried to follow the manual at https://github.com/hpricot/hpricot/wiki/hpricot-challenge
but I cannot even find the <script> element.
list.search("script")
returns nothing.

I don't think you can mix XPath and CSS selectors in a single search expression, which is what your example does. Try:
//ul[@class='filtering_new']
or
ul.filtering_new
inside search.

Most XML/HTML parsing in Ruby uses Nokogiri these days, so I'll recommend that parser. However, both Hpricot and Nokogiri support XPath and CSS, so they are fairly interchangeable.
I'd go about it this way:
html = <<EOT
<ul class="filtering_new" width="50%">
<li class="filter">1</li>
<li class="filter">2</li>
<script>Alert('1');</script>
<li class="filter">3</li>
</ul>
EOT
require 'nokogiri'
doc = Nokogiri::HTML(html)
li = doc.search('//li[@class="filter"]').select{ |n| n.text.to_i == 3 }
li # => [#<Nokogiri::XML::Element:0x8053fc84 name="li" attributes=[#<Nokogiri::XML::Attr:0x8053fb6c name="class" value="filter">] children=[#<Nokogiri::XML::Text:0x80546f98 "3">]>]
That finds the candidate nodes, then returns them as a NodeSet to be iterated over, where they are selected/rejected based on the node's text.
li = doc.search('//li[text() = "3"]')
li # => [#<Nokogiri::XML::Element:0x8053fc84 name="li" attributes=[#<Nokogiri::XML::Attr:0x8053fb6c name="class" value="filter">] children=[#<Nokogiri::XML::Text:0x80546f98 "3">]>]
That offloads more of the comparison to the underlying libxml2 library, where it usually runs faster.
