nokogiri not recognising classes with hyphens - ruby

require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.priceangels.com/site-map.html"
doc = Nokogiri::HTML(open(url))
doc.css('.lav1').each do |item|
puts item.text
end
doc.css('.masonry-brick').each do |item|
puts item.text
end
This is my first time using nokogiri. The first each loop behaves as expected. The second each loop fails to find any matches.
Does Nokogiri not recognise class names with dashes (hyphens)?
How do I get nokogiri to find the '.masonry-brick' classes?

doc.css("ul.sitemap-item a").each do |me|
puts me.text
end
Is this what you were looking for?
Also, for a class attribute that contains a space, such as
<div class="hello world">
you can match the whole attribute value:
doc.css("div[class='hello world']")

Related

nokogiri run well but doesn't return anything

Hi and thanks for reading!
I'm learning how to use XPath and Nokogiri, and I followed the same instructions as the tutorial on Engine Yard.
I copied and pasted exactly the same code. It runs in the terminal and exits without any error message, but it prints nothing. It should print all the titles that have a hyperlink, but it just ends as if there were nothing to return.
require 'open-uri'
require 'nokogiri'
doc = Nokogiri::HTML(URI.open("http://www.google.com/search?q=doughnuts"))
doc.xpath('//h3/a').each do |node|
puts node.text
end
# puts doc.class
I tried puts doc.class instead of puts node.text inside the loop, and it did the same thing (ran fine, ended without errors, printed nothing):
doc.xpath('//h3/a').each do |node|
  puts doc.class
end
I also tried puts doc.class on its own, in place of the whole loop, and it correctly printed "Nokogiri::HTML::Document", so the problem comes from my XPath, but I don't know why.
If someone can help me with this, I'd be glad! :)
Looking at the structure of the Google results page, the 'h3' element is inside the 'a' element, so '//h3/a' matches nothing. You can try something like this; I think it will work.
require 'open-uri'
require 'nokogiri'
doc = Nokogiri::HTML(URI.open("http://www.google.com/search?q=doughnuts"))
doc.xpath('//h3').each do |node|
  puts node.text
  # The enclosing <a> is the parent of the <h3>; read its href attribute.
  puts node.parent.xpath('./@href')
end

Extract by <p> between h3 content nokogiri

I am trying to extract only the <p> elements that sit between the Vigentes and Finalizados headings, without success.
require 'nokogiri'
require 'open-uri'
require 'time'
@url = "http://www.caru.org.uy/web/servicios/llamados-a-concurso-publico-para-contratar-personal/"
page = Nokogiri::HTML(open(@url))
div_content = page.css('.contenido')
div_content.each do |item|
  puts item.text
  break if item.css('h3').text == "Finalizados"
end
You should be able to do:
css = 'h3:contains(Vigentes) ~ p:has(~ h3:contains(Finalizados))'
But unfortunately Nokogiri doesn't handle this selector properly, so we'll use XPath:
xpath = "//h3[contains(text(), 'Vigentes')]/following-sibling::p[./following-sibling::h3[contains(text(), 'Finalizados')]]"
page.search(xpath).each do |p|
  # do something
end

How to print XPath search?

I'm trying to parse an XML file with Nokogiri:
require 'nokogiri'
require 'open-uri'
@doc = Nokogiri::XML(open('http://xml.pinnaclesports.com/pinnacleFeed.aspx?sportType=E%20Sports&contest=no'))
@doc.xpath("//event[@league='*LOL*']")
print @doc.text
which works and prints all the events that contain "LOL" in the "league" attribute, but when I create a block, it runs but prints nothing:
@doc.xpath("//event[@league='*LOL*']").each do |league_element|
  puts "\n" + league_element.xpath('league').text
end
require 'nokogiri'
require 'open-uri'
@doc = Nokogiri::XML(open('http://xml.pinnaclesports.com/pinnacleFeed.aspx?sportType=E%20Sports&contest=no'))
events = @doc.xpath("//event[@league='*LOL*']")
puts @doc.children
.children "returns a new NodeSet containing all the children of all the nodes in the NodeSet." You can keep filtering node names and values with children.xpath().
For example:
@doc = Nokogiri::XML(open('http://xml.pinnaclesports.com/pinnacleFeed.aspx?sportType=E%20Sports&contest=no'))
events = @doc.xpath("//event[@league='*LOL*']")
puts @doc.children.xpath('//league').text
=> LOL Cham Kor
=> LOL Cham Kor
=> ....
Or
@doc.children.each do |item|
  puts item.xpath('//league')
end

Search Websites Content

How do you search a website's source code with Ruby? It's hard to explain, but here's the code for doing it in Python:
import urllib2, re
word = "How to ask"
source = urllib2.urlopen("http://stackoverflow.com").read()
if re.search(word,source):
    print "Found it "+word
Here's one way:
require 'open-uri'
word = "How to ask"
open('http://stackoverflow.com') do |f|
puts "Found it #{word}" if f.read =~ /#{word}/
end
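One caveat with interpolating the search word straight into a regex is that characters such as ( or ? in the word are treated as regex syntax. A minimal sketch of two alternatives follows, using a hypothetical inline string instead of a live download; the helper names found? and found_with_regex? are made up for this example.

```ruby
# Hypothetical page source. In a real script this string would come from
# open-uri, e.g. URI.open(url).read (on Ruby 3.0+ plain open no longer
# accepts URLs).
source = '<html><body><a href="/help/how-to-ask">How to ask (a+b)?</a></body></html>'

# Plain substring search: no regex involved, so nothing needs escaping.
def found?(source, word)
  source.include?(word)
end

# Regex search with the word escaped, so it is matched literally.
def found_with_regex?(source, word)
  source.match?(/#{Regexp.escape(word)}/)
end

word = "How to ask"
puts "Found it #{word}" if found?(source, word)
puts "Found it (a+b)?" if found_with_regex?(source, "(a+b)?")
```

For a literal phrase, include? is usually enough; the escaped regex is useful when the literal word needs to be combined with other pattern elements.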
If all you want to do is search, jcrossley3 gave you your answer. If you want to do something more complicated, you should look at an HTML parser that lets you treat the website like a DOM tree. Have a look at why's great hpricot gem to do just that.
require 'hpricot'
require 'open-uri'
doc = open("http://qwantz.com/") { |f| Hpricot(f) }
doc.search("//p[@class='posted']")
(doc/"p/a/img").each do |img|
  puts img.attributes['class']
end

Ruby Regex Help

I want to extract the members' home page links from a site.
The links look like this:
<a href="http://www.ptop.se" target="_blank">
I tested this regex on http://www.rubular.com/:
<a href="(.*?)" target="_blank">
It should output http://www.ptop.se.
Here is the code:
require 'open-uri'
url = "http://itproffs.se/forumv2/showprofile.aspx?memid=2683"
open(url) { |page| content = page.read()
links = content.scan(/<a href="(.*?)" target="_blank">/)
links.each {|link| puts #{link}
}
}
If you run this, it doesn't work. Why not?
I would suggest that you use one of the good ruby HTML/XML parsing libraries e.g. Hpricot or Nokogiri.
If you need to log in on the site you might be interested in a library like WWW::Mechanize.
Code example:
require "open-uri"
require "hpricot"
require "nokogiri"
url = "http://itproffs.se/forumv2"
# Using Hpricot
doc = Hpricot(open(url))
doc.search("//a[@target='_blank']").each { |user| puts "found #{user.inner_html}" }
# Using Nokogiri
doc = Nokogiri::HTML(open(url))
doc.xpath("//a[@target='_blank']").each { |user| puts "found #{user.text}" }
Several issues with your code:
1. It is unclear what puts #{link} is meant to do. Outside a string literal, # starts a comment, so Ruby ignores {link} and the rest of that line. To interpolate the value, wrap it in double quotes, i.e. "#{link}".
2. String#scan accepts a block. Use it to loop through the matches.
3. The page you are trying to access does not return any links that the regex would match anyway.
Here's something that would work:
require 'open-uri'
url = "http://itproffs.se/forumv2/"
open(url) do |page|
  content = page.read()
  content.scan(/<a href="(.*?)" target="_blank">/) do |match|
    match.each { |link| puts link }
  end
end
There're better ways to do it, I am sure. But this should work.
Hope it helps
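To make the scan and interpolation points concrete without hitting the network, here is a small sketch on a hypothetical inline string. The regex is the one from the question; the example.com link and the /internal link are invented for illustration.

```ruby
# Hypothetical page content; only the ptop.se link comes from the question.
content = <<~HTML
  <a href="http://www.ptop.se" target="_blank">ptop</a>
  <a href="/internal">internal</a>
  <a href="http://example.com" target="_blank">example</a>
HTML

# With one capture group, scan returns an array of one-element arrays,
# so flatten turns it into a plain list of URLs.
links = content.scan(/<a href="(.*?)" target="_blank">/).flatten

# Interpolation only happens inside a double-quoted string.
links.each { |link| puts "found #{link}" }
```

This only works while every link has exactly the attribute order and spacing the regex expects, which is why the other answer recommends an HTML parser for anything beyond a quick check.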

Resources