I am working on a CLI project and trying to open a web page using a URL variable that is set in another method.
def self.open_deal_page(input)
  index = input.to_i - 1
  @deals = PopularDeals::NewDeals.new_deals
  @deals.each do |info|
    d = info[index]
    @product_url = "#{d.url}"
  end
  @product_url.to_s
  puts "They got me!"
end
def self.deal_page(product_url)
  #self.open_deal_page(input)
  deal = {}
  html = Nokogiri::HTML(open(@product_url))
  doc = Nokogiri::HTML(html)
  deal[:name] = doc.css(".dealTitle h1").text.strip
  deal[:discription] = doc.css(".textDescription").text.strip
  deal[:purchase] = doc.css("div a.button").attribute("href")
  deal
  #binding.pry
end
But I am receiving this error:
`open': no implicit conversion of nil into String (TypeError)
Is there a possible solution? Thank you so much in advance.
Try returning @product_url from your open_deal_page method. Right now the last expression in the method is puts "They got me!", so the method returns the result of puts, which is nil. Also note that if you use a local product_url that is first assigned inside the each block, it won't be accessible after the block. Create it before the block as an empty string, assign it inside, and then return it.
def open_deal_page(input)
  ...
  # Create the variable
  product_url = ''
  # Assign it the value
  deals.each do |info|
    product_url = "#{info[index].url}"
  end
  # And return it
  product_url
end
In your deal_page method, tell Nokogiri to open the product_url that you're passing as an argument.
def deal_page(product_url)
  ...
  html = Nokogiri::HTML(open(product_url))
  ...
end
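To make the data flow concrete, here is a minimal sketch of one method returning the URL and another receiving it as an argument. The Deal struct, the DEALS list, and the example.com URLs are assumptions made up for illustration; they stand in for whatever PopularDeals::NewDeals.new_deals really returns, and deal_page only builds a string instead of calling Nokogiri.

```ruby
# Hypothetical stand-in for the scraped deals (not the real PopularDeals data).
Deal = Struct.new(:title, :url)

DEALS = [
  Deal.new('Headphones', 'https://example.com/deals/1'),
  Deal.new('Backpack', 'https://example.com/deals/2')
].freeze

# Returns the URL for a 1-based menu choice, or nil when the choice is out of range.
def deal_url(input, deals = DEALS)
  index = input.to_i - 1
  return nil if index.negative?

  deals[index]&.url
end

# Receives the URL as an argument instead of reading shared state.
def deal_page(product_url)
  raise ArgumentError, 'product_url is required' if product_url.nil?

  "would open #{product_url}"
end

puts deal_page(deal_url('2'))
```

Because the URL is the last expression in deal_url, it becomes the return value, and the caller decides what to do when it is nil.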
I'm trying to scrape a website, but I cannot get my while loop to break once it hits a page with no more information:
def scrape_verse_items(keyword)
  pg = 1
  while pg < 1000
    puts "page #{pg}"
    url = "https://www.bible.com/search/bible?page=#{pg}&q=#{keyword}&version_id=1"
    doc = Nokogiri::HTML(open(url))
    items = doc.css("ul.search-result li.reference")
    error = doc.css('div#noresults')
    until error.any? do
      if keyword != ''
        item_hash = {}
        items.each do |item|
          title = item.css("h3").text.strip
          content = item.css("p").text.strip
          item_hash[title] = content
        end
      else
        puts "Please enter a valid search"
      end
      if error.any?
        break
      end
    end
    pg += 1
  end
  item_hash
end
puts scrape_verse_items('joy')
I know this doesn't exactly answer your question, but you might consider a different approach altogether.
Nested while and until loops can get confusing, and they make it easy to lose track of the exit condition.
You could use recursion instead.
I've written a small script that seems to work:
require 'nokogiri'
require 'open-uri'

class MyScrapper
  def call(keyword)
    # Reject nil and empty keywords before hitting the network
    if keyword.nil? || keyword.empty?
      puts "Please enter a valid search"
      return
    end
    scrape({}, keyword, 1)
  end

  private

  # Recursively collects results until a page reports no results
  def scrape(results, keyword, page)
    doc = load_page(keyword, page)
    return results if doc.css('div#noresults').any?
    build_new_items(doc).merge(scrape(results, keyword, page + 1))
  end

  def load_page(keyword, page)
    url = "https://www.bible.com/search/bible?page=#{page}&q=#{keyword}&version_id=1"
    # URI.open comes from open-uri; plain open(url) no longer accepts URLs on Ruby 3.0+
    Nokogiri::HTML(URI.open(url))
  end

  # Builds a { title => content } hash from one page of results
  def build_new_items(doc)
    items = doc.css("ul.search-result li.reference")
    items.reduce({}) do |list, item|
      title = item.css("h3").text.strip
      content = item.css("p").text.strip
      list[title] = content
      list
    end
  end
end
You call it with MyScrapper.new.call("Keyword"). (It might make more sense to have this as a module you include, or to make these class methods, to avoid the need to instantiate the class.)
What this does is call a method named scrape, giving it the starting results, the keyword, and the page. It loads the page, and if there are no results it returns the results it has found so far.
Otherwise it builds a hash from the page it loaded, then calls itself for the next page and merges those results into the hash it just built. It does this until there are no more results.
If you want to limit the number of pages, you can change this line:
return results if doc.css('div#noresults').any?
to this:
return results if doc.css('div#noresults').any? || page > 999
Note: You might want to double-check the results that are being returned are correct. I think they should be but I wrote this quite quickly, so there could always be a small bug hiding somewhere in there.
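For comparison, the same stop condition can be written as a plain loop that breaks on the first empty page. This is only a sketch: FAKE_PAGES and fetch_page are hypothetical stand-ins for the network call and the Nokogiri parsing, so the example runs without touching the site.

```ruby
# Hypothetical in-memory pages: page number => { title => content }.
FAKE_PAGES = {
  1 => { 'Title A' => 'Content A', 'Title B' => 'Content B' },
  2 => { 'Title C' => 'Content C' }
}.freeze

# Stand-in for "load the page and extract the items"; an empty hash means no results.
def fetch_page(page)
  FAKE_PAGES.fetch(page, {})
end

# Walks the pages in order and stops at the first empty one or at max_pages.
def scrape_all(max_pages: 999)
  results = {}
  (1..max_pages).each do |page|
    items = fetch_page(page)
    break if items.empty?

    results.merge!(items)
  end
  results
end

puts scrape_all.keys.inspect
```

The results hash lives outside the loop, so it accumulates across pages, and the break sits directly next to the check that triggers it.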
I'm trying to scrape a website's content to instantiate objects out of the data, and I'm running into a problem with a dead link on the page I'm scraping. I want to figure out how I can simply not iterate over that link and avoid scraping it altogether.
I tried using this, but it didn't work:
name = li.css("strong a").text.strip unless li.nil?
url = li.css("a")[0].attr("href") unless li.nil?
Player.new(name,url)
class HomepageScraper
  BASE_URL = "https://www.nba.com/history/nba-at-50/top-50-players"
  def self.scrape_players
    page = open(BASE_URL)
    parsed_HTML = Nokogiri::HTML(page)
    name_lis = parsed_HTML.css("div.field-item li")
    name_lis.each do |li|
      name = li.css("strong a").text.strip
      url = li.css("a")[0].attr("href")
      Player.new(name,url)
    end
  end
end
I expected the output to be:
@name = "Shaquille o neal", @url = "www.nba..."
But received:
@name = "Shaquille o neal", @url = nil
The error message is:
undefined method `attr' for nil:NilClass (NoMethodError)
If you are running Ruby 2.3 or later, use the safe navigation operator:
url = li.css("a")[0]&.attr("href")
This sets url to nil if the expression to the left of &. is nil, and calls attr otherwise.
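If the goal is to skip the dead entry entirely rather than build a Player with a nil URL, the safe navigation call can be combined with a nil check. The sketch below uses a hypothetical Link struct and hand-written rows in place of Nokogiri nodes, so it runs without the gem; filter_map needs Ruby 2.7 or later.

```ruby
# Hypothetical stand-in for a Nokogiri <a> node; only attr('href') is modelled.
Link = Struct.new(:href) do
  def attr(name)
    name == 'href' ? href : nil
  end
end

# One row has a link, the other simulates the dead entry (no <a> element found).
rows = [
  { name: "Shaquille O'Neal", link: Link.new('https://example.com/shaq') },
  { name: 'Dead Link Player', link: nil }
]

players = rows.filter_map do |row|
  url = row[:link]&.attr('href') # nil when there is no link, no NoMethodError
  next if url.nil?               # skip rows without a usable URL

  [row[:name], url]
end

puts players.inspect
```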
You should use the compact method on Array.
It is a useful method if you need to remove nil values from an array.
For example:
[1, nil, 2, nil].compact # => [1, 2]
In your case:
name_lis.compact.each do |li|
end
I have been trying to use Minitest to test my code (full repo) but am having trouble with one method which downloads a SHA1 hash from a .txt file on a website and returns the value.
Method:
def download_remote_sha1
  @log.info('Downloading Elasticsearch SHA1.')
  @remote_sha1 = ''
  Kernel.open(@verify_url) do |file|
    @remote_sha1 = file.read
  end
  @remote_sha1 = @remote_sha1.split(/\s\s/)[0]
  @remote_sha1
end
You can see that I log what is occurring to the command line, create an object to hold my SHA1 value, and open the URL (e.g. https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.deb.sha1.txt).
I then split the string so that I only have the SHA1 value.
The problem is that during a test, I want to stub the Kernel.open which uses OpenURI to open the URL. I would like to ensure that I'm not actually reaching out to download any file, but rather I'm just passing the block my own mock IO object testing just that it correctly splits stuff.
I attempted it as in the block below, but when @remote_sha1 = file.read runs, file is nil.
@mock_file = Minitest::Mock.new
@mock_file.expect(:read, 'd377e39343e5cc277104beee349e1578dc50f7f8 elasticsearch-1.4.2.deb')
Kernel.stub :open, @mock_file do
  @downloader = ElasticsearchUpdate::Downloader.new(hash, true)
  @downloader.download_remote_sha1.must_equal 'd377e39343e5cc277104beee349e1578dc50f7f8'
end
I was working on this question too, but matt figured it out first. To add to what matt posted:
When you write:
Kernel.stub(:open, @mock_file) do
  #block code
end
...that means when Kernel.open() is called--in any code, anywhere before the stub() block ends--the return value of Kernel.open() will be @mock_file. However, you never use the return value of Kernel.open() in your code:
Kernel.open(@verify_url) do |f|
  @remote_sha1 = f.read
end
If you wanted to use the return value of Kernel.open(), you would have to write:
return_val = Kernel.open(@verify_url) do |f|
  @remote_sha1 = f.read
end
#do something with return_val
Therefore, the return value of Kernel.open() is irrelevant in your code--which means the second argument of stub() is irrelevant.
A careful examination of the source code for stub() reveals that stub() takes a third argument--an argument which will be passed to a block specified after the stubbed method call. You, in fact, have specified a block after your stubbed Kernel.open() method call:
stubbed method call      start of block
       |                       |
       V                       V
Kernel.open(@verify_url) do |f|
  @remote_sha1 = f.read
end
 ^
 |
end of block
So, in order to pass @mock_file to the block you need to specify it as the third argument to Kernel.stub():
Kernel.stub(:open, 'irrelevant', @mock_file) do
end
Here is a full example for future searchers:
require 'minitest/autorun'

class Dog
  def initialize
    @verify_url = 'http://www.google.com'
  end

  def download_remote_sha1
    @remote_sha1 = ''
    Kernel.open(@verify_url) do |f|
      @remote_sha1 = f.read
    end
    #puts @remote_sha1[0..300]
    @remote_sha1 = @remote_sha1.split(" ")[0] # Using a single space for the split() pattern will split on contiguous whitespace.
  end
end

#Dog.new.download_remote_sha1

describe 'downloaded file' do
  it 'should be an sha1 code' do
    @mock_file = Minitest::Mock.new
    @mock_file.expect(:read, 'd377e39343e5cc277104beee349e1578dc50f7f8 elasticsearch-1.4.2.deb')
    Kernel.stub(:open, 'irrelevant', @mock_file) do
      @downloader = Dog.new
      @downloader.download_remote_sha1.must_equal 'd377e39343e5cc277104beee349e1578dc50f7f8'
    end
  end
end
The second argument to stub is the value you want the stubbed method to return for the duration of your test, but the way Kernel.open is used here requires changing the value it yields to the block instead.
You can achieve this by providing a third argument. Try changing the call to Kernel.stub to
Kernel.stub :open, true, @mock_file do
  #...
Note the extra argument true, so that @mock_file is now the third argument and will be yielded to the block. The actual value of the second argument doesn't really matter in this case; you might want to use @mock_file there too, to correspond more closely to how open behaves.
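The difference between the value a method returns and the value it yields can be seen without Minitest at all. The sketch below is a hand-rolled illustration, not Minitest's implementation: fake_open, its keyword arguments, FakeFile, and the example.invalid URL are all made-up names that mimic what the stubbed Kernel.open does with the second and third arguments.

```ruby
# Hand-rolled stand-in for a stubbed open: it yields one value and returns another.
def fake_open(_url, return_value:, yielded_value:)
  yield yielded_value if block_given?
  return_value
end

# Minimal fake IO object; only read is needed by the code under test.
class FakeFile
  def read
    'd377e39343e5cc277104beee349e1578dc50f7f8  elasticsearch-1.4.2.deb'
  end
end

contents = nil
result = fake_open('http://example.invalid/sha1.txt',
                   return_value: 'irrelevant',
                   yielded_value: FakeFile.new) do |f|
  contents = f.read # f is the yielded value, not the return value
end

sha1 = contents.split(' ')[0]
puts sha1
puts result
```

The block only ever sees the yielded object, which is why the mock has to travel in the third position of stub rather than the second.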
I am working on a program that will eventually compare two .csv files and print out any variances between the two. However, at the moment I can't get past a "can't convert nil into String (TypeError)" when reading one of the files.
Here is a sample line from the problematic csv file:
11/13/15,11:31:00,ABCD,4000150097,1321126281700ABCDEF,WR00002440,,,4001,1392,AI,INTERNAL RETURN,INBOUND,,ABCDEF
And here is my code so far:
require 'csv'

class CSVReportCompare
  def initialize(filename_data, filename_compare)
    puts "setting filename_data=", filename_data
    puts "setting compare=", filename_compare
    @filename_data = filename_data
    @filenam_compare = filename_compare
  end

  def printData
    @data = CSV.read(@filename_data)
    puts @data.inspect
  end

  def printCompareData
    @compareData = CSV.read(@filename_compare)
    puts @compareData.inspect
  end

  def compareData
  end
end

c1 = CSVReportCompare.new("data.csv", "compare_data.csv")
c1.printData
c1.printCompareData
Anyways, is there a way to get around the error?
You have a typo in your initialize method:
@filenam_compare = filename_compare
#-------^ missing "e"
So you're setting the wrong instance variable. Reading an instance variable that has never been assigned returns nil, so later, when you access @filename_compare (the correctly spelled name), you get nil, and CSV.read cannot convert nil into a String.
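A small sketch of that behaviour, using a made-up ReportPaths class that repeats the same typo on purpose. Listing instance_variables is a quick way to spot a misspelled name.

```ruby
class ReportPaths
  def initialize(filename_compare)
    @filenam_compare = filename_compare # deliberate typo, as in the question
  end

  def compare_path
    @filename_compare # never assigned, so this reads as nil
  end
end

paths = ReportPaths.new('compare_data.csv')
p paths.compare_path       # nil, because the correctly spelled variable was never set
p paths.instance_variables # shows the misspelled name that was actually set
```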
I'm new to Ruby and I'm having trouble at every step...
Imagine a Ruby script main.rb and a number of unknown script files script1.rb ... scriptN.rb.
Each scriptX.rb contains a unique module with one procedure that needs to be executed:
module X
  def some_procedure(i)
    puts "#{i} Module X procedure executed successfully!"
  end
end
All I need is to:
iterate over all files in current directory
if current file has name like /^script.*?\.rb$/
then load it and execute some_procedure
How can I do it in main.rb ?
Thank you in advance!
Choose from these great answers in SO on loading the files: Best way to require all files from a directory in ruby?
Then in your files, just have them execute on load, rather than on a method call.
The problem might be that when a file is required, it doesn't return the list of modules (or, in general, constants) that it defines. So unless you already know which module a script defines, you will not know where to send your some_procedure message.
As a workaround, you can take the list of defined constants before and after the script is required, compute the difference (i.e. the constants defined during the require), and iterate through them, checking which one implements the method you need.
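A minimal sketch of that before/after comparison. The script_demo.rb file, the DemoPlugin module, and the temporary directory are assumptions made up so the example is self-contained; in the real case the files already exist and would be found with Dir. It also assumes the procedure is defined as a module-level method (def self.some_procedure).

```ruby
require 'tmpdir'

found = []

Dir.mktmpdir do |dir|
  # Write a hypothetical plugin script so there is something to require.
  path = File.join(dir, 'script_demo.rb')
  File.write(path, <<~'RUBY')
    module DemoPlugin
      def self.some_procedure(i)
        "#{i} DemoPlugin executed"
      end
    end
  RUBY

  before = Object.constants
  require path
  new_constants = Object.constants - before # constants defined by the script

  new_constants.each do |name|
    candidate = Object.const_get(name)
    # Only call constants that actually implement the procedure.
    found << candidate.some_procedure(1) if candidate.respond_to?(:some_procedure)
  end
end

puts found
```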
First, we need to impose a restriction:
Every file script_my1.rb must define a module named Script_my1, i.e. the file name with the first letter capitalized and all other letters lowercase.
Create two files script_my1.rb and script_my2.rb as follows:
---script_my1.rb:
module Script_my1
  @value = 0
  def self.some_procedure(i)
    puts "#{i} my1 executed!"
    @value = i
  end
  def self.another_procedure()
    return @value
  end
end
---script_my2.rb:
module Script_my2
  @value = 0
  def self.some_procedure(i)
    puts "#{i} my2 executed!"
    @value = i
  end
  def self.another_procedure()
    return @value
  end
end
Now the main script, which loads each file and executes some_procedure() and then another_procedure() in each module.
Note that each module can have its own variable with the same name, @value.
Moreover, I think every module can be executed in a separate thread and have access to global variables, but I have not tested it yet.
---main.rb:
# Load all files from the current directory
# with name like script_xxx.rb
i = 1
result = nil
Dir['./script_*.rb'].each { |f|
  next if File.directory?(f)
  require(f)
  moduleName = f[2,f.length].rpartition('.rb')[0].capitalize
  eval("#{moduleName}.some_procedure(%d)" % i)
  eval("result = #{moduleName}.another_procedure()")
  puts result
  i = i + 1
}
Output of this program is:
1 my1 executed!
1
2 my2 executed!
2
That is all!
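The two eval calls can be avoided by looking the module up by name and sending it the method. This is a sketch under the same naming rule; the module is defined inline here instead of being loaded from a file, and the file name is just a string used to derive the module name.

```ruby
# Inline copy of one script module so the example runs on its own.
module Script_my1
  @value = 0

  def self.some_procedure(i)
    @value = i
  end

  def self.another_procedure
    @value
  end
end

file_name = './script_my1.rb'

# Derive "Script_my1" from the file name, as in the naming rule above.
module_name = File.basename(file_name, '.rb').capitalize

# Look the module up by name and call its methods without eval.
script_module = Object.const_get(module_name)
script_module.public_send(:some_procedure, 7)
result = script_module.public_send(:another_procedure)

puts result
```

Object.const_get raises NameError for an unknown name, which tends to be easier to diagnose than a failure inside an evaluated string.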
Some improvements can be made to the previous solution. If we want to avoid the special naming rule, we can use a global array to store the procedure names. Each loaded script_xx.rb file registers its own procedures in this global array.
Note that in this case there are two passes:
first we load all script_xx.rb files;
while loading, every file registers its procedures in the $global_procs array;
then we iterate over all entries in $global_procs and execute the registered procedures via eval().
Hope this is a more 'ruby-like' solution!
---script_my1.rb
module My1
  @value = 0
  def self.some_procedure(i)
    puts "#{i} my1 executed!"
    @value = i
  end
  def self.another_procedure()
    return @value
  end
end
$global_procs << { 'module' => 'My1',
  'some_procedure' => 'My1.some_procedure',
  'another_procedure' => 'My1.another_procedure' }
---script_my2.rb
module MMM2
  @value = 0
  def self.some_procedure(i)
    puts "#{i} MMM2 executed!"
    @value = i
  end
  def self.another_procedure()
    return @value
  end
end
$global_procs << { 'module' => 'MMM2',
  'some_procedure' => 'MMM2.some_procedure',
  'another_procedure' => 'MMM2.another_procedure' }
---main.rb
# Create global array for holding module's info
$global_procs = []
Dir['./script_*.rb'].each { |f|
  next if File.directory?(f)
  require(f)
}
i = 1
result = nil
$global_procs.each { |p|
  puts "Module name: " + p['module']
  eval(p['some_procedure']+'(i)')
  result = eval(p['another_procedure']+'()')
  puts result
  i = i + 1
}
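A possible variation on the registry idea is to store Method objects instead of strings, so main.rb can call them directly and eval is not needed. The sketch below keeps the $global_procs name from above but defines the module inline rather than loading it from a file; treat it as an illustration, not a drop-in replacement.

```ruby
# Global registry, as in main.rb above.
$global_procs = []

module My1
  @value = 0

  def self.some_procedure(i)
    @value = i
  end

  def self.another_procedure
    @value
  end
end

# Register callable Method objects rather than strings to be evaluated.
$global_procs << { 'module' => 'My1',
                   'some_procedure' => My1.method(:some_procedure),
                   'another_procedure' => My1.method(:another_procedure) }

results = []
$global_procs.each_with_index do |entry, index|
  entry['some_procedure'].call(index + 1)
  results << entry['another_procedure'].call
end

puts results.inspect
```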