Boolean method not returning in different situations [RUBY] - ruby

I'm building a simple web scraper (scraping jobs from indeed.com) for practice and I'm trying to implement the following method, low_salary?(salary). The aim is for the method to take a minimum (i.e. desired) salary and compare it with the offered salary stored in the job object (@salary):
class Job
  attr_reader :title, :company, :location, :salary, :url
  def initialize(title, company, location, salary, url)
    @title = title
    @company = company
    @location = location
    @salary = salary
    @url = url
  end
  def low_salary?(minimum_salary)
    return if !@salary
    minimum_salary < @salary.split(/[^\d]/)[1..2].join.to_i
  end
end
The method works fine when comparing @salary with the min_salary variable given to it: delete_if deletes the elements for which low_salary? returns true, and the method returns correctly when @salary is nil (Indeed listings don't always include the salary, so my assumption is that there will be some nil values) in the following test program. (Also: I am unsure why minimum_salary < @salary works but @salary < minimum_salary doesn't, but this does exactly what I want it to do.)
require_relative('job_class.rb')
job = Job.new("designer", "company", "location", "£23,000 a year", "url")
job_results = []
job_results.push(job)
min_salary = 50000
print job.low_salary?(min_salary)
job_results.delete_if { |job| job.low_salary?(min_salary) }
print job_results
However, in my scraper program I get a NoMethodError when calling the method: job_class.rb:16:in "low_salary?": undefined method `join' for nil:NilClass (NoMethodError)
require 'nokogiri'
require 'open-uri'
require_relative 'job_class.rb'
class JobSearchTool
  def initialize(job_title, location, salary)
    @document = Nokogiri::HTML(open("https://uk.indeed.com/jobs?q=#{job_title.gsub('-', '+')}&l=#{location}"))
    @job_listings = @document.css('div.mosaic-provider-jobcards > a')
    @salary = salary.to_i
    @job_results = []
  end
  def scrape_jobs
    @job_listings.each do |job_card|
      @job_results.push(Job.new(
        job_card.css('h2 > span').text, # title
        job_card.css('span.companyName').text, # company
        job_card.css('div.companyLocation').text, # location
        job_card.css('span.salary-snippet').text, # salary
        job_card['href']) # url
      )
    end
  end
  def format_jobs
    @job_results.each do |job|
      puts <<~JOB
        #{job.title} - #{job.company} in #{job.location} :#{job.salary}
        Apply at: #{job.url}
        ---------------------------------------------------------------------------------
      JOB
    end
  end
  def check_salary
    @job_results.delete_if { |job| job.low_salary?(@salary) }
  end
  def run
    scrape_jobs
    check_salary
    format_jobs
  end
end
if __FILE__ == $0
  job_search_tool = JobSearchTool.new(ARGV[0], ARGV[1], ARGV[2])
  job_search_tool.run
end
Obviously something in the scraper program is influencing the method somehow, but I can't understand what it could be. I'm using the method in exactly the same way as in the test program, so what difference is causing the method not to return when @salary is nil?

A quick search on the URL you're scraping shows that some job posts don't have a salary. When you read the text of that HTML element and initialize a new Job object, the salary is an empty string rather than nil. An empty string is truthy in Ruby, so the return if !@salary guard never triggers, and since "".split(/[^\d]/)[1..2] returns nil, calling join on it raises the error you get.
You must add a way to handle job posts without a salary:
class Job
  attr_reader :title, :company, :location, :salary, :url
  def initialize(title, company, location, salary, url)
    @title = title
    @company = company
    @location = location
    @salary = salary.to_s # Explicit conversion of nil to string
    @url = url
  end
  def low_salary?(minimum_salary)
    return if parsed_salary.zero? # parsed_salary always returns an integer,
                                  # so you can check whether it is zero,
                                  # and not just whether it is falsy
    minimum_salary < parsed_salary
  end
  private
  def parsed_salary
    salary[/(?<=£)(\d|,)*(?=\s)/]
      .to_s # converts nil to "" if the regex doesn't capture anything
      .tr(",", "") # removes the commas to parse the string as an integer
      .to_i # parses the string to its corresponding integer representation
  end
end
Notice the regex isn't meant to capture every possible format, but it works with the salary as rendered on the website.
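To make the failure mode concrete, here is a minimal sketch of what the original expression does with and without a salary. The parse_salary helper is a hypothetical name used only for illustration (it is not part of the code above), and it assumes the salary text starts its amount with a £ sign, as in the example listing.

```ruby
# What the original expression sees for a listing with a salary...
with_salary = "£23,000 a year".split(/[^\d]/)    # ["", "23", "000"]
# ...and for a listing whose salary element is empty.
without_salary = "".split(/[^\d]/)               # []

# Slicing an empty array from index 1 yields nil, so .join would fail.
slice_of_empty = without_salary[1..2]            # nil

# Hypothetical helper: returns the first £ amount as an Integer, or 0
# when the text is nil, empty, or contains no £ amount.
def parse_salary(text)
  text.to_s[/(?<=£)[\d,]+/].to_s.delete(",").to_i
end

puts parse_salary("£23,000 a year")
puts parse_salary("")
```

Returning 0 for a missing salary keeps the comparison code free of nil checks, at the cost of not distinguishing "no salary listed" from a genuine zero.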

Related

Delete method in plain Ruby is not working

Please see below.
The delete method is not working and I do not know why.
I am trying to delete a customer using plain Ruby, without Rails.
Can you please help?
wrong number of arguments (given 0, expected 1) (ArgumentError)
from /Users/mustafaalomer/code/MustafaAlomer711/fullstack-challenges/02-OOP/05-Food-Delivery-Day-One/01-Food-Delivery/app/repositories/customer_repository.rb:28:in `delete'
from /Users/mustafaalomer/code/MustafaAlomer711/fullstack-challenges/02-OOP/05-Food-Delivery-Day-One/01-Food-Delivery/app/controllers/customers_controller.rb:33:in `destroy'
from /Users/mustafaalomer/code/MustafaAlomer711/fullstack-challenges/02-OOP/05-Food-Delivery-Day-One/01-Food-Delivery/router.rb:36:in `route_action'
from /Users/mustafaalomer/code/MustafaAlomer711/fullstack-challenges/02-OOP/05-Food-Delivery-Day-One/01-Food-Delivery/router.rb:13:in `run'
from app.rb:19:in `<main>'
require_relative "../views/customers_view"
require_relative "../models/customer"
class CustomersController
  def initialize(customer_repository)
    @customer_repository = customer_repository
    @customers_view = CustomersView.new
  end
  def add
    # ask user for a name
    name = @customers_view.ask_user_for(:name)
    # ask user for an address
    address = @customers_view.ask_user_for(:address)
    # make a new instance of a customer
    customer = Customer.new(name: name, address: address)
    # add the customer to the repository
    @customer_repository.create(customer)
    list
  end
  def list
    customers = @customer_repository.all
    @customers_view.display_list(customers)
  end
  def destroy
    # ask user for the id to delete
    list
    id = @customers_view.ask_user_to_delete(:id)
    # customer = @customer_repository.find(id)
    # @customer_repository.delete(customer)
  end
end
require 'csv'
require_relative '../models/customer'
class CustomerRepository
  def initialize(csv_file)
    @csv_file = csv_file
    @customers = []
    @next_id = 1
    load_csv if File.exist?(csv_file)
  end
  def all
    @customers
  end
  def create(customer)
    customer.id = @next_id
    @customers << customer
    @next_id += 1
    save_csv
  end
  def find(id)
    @customers.find { |customer| customer.id == id}
  end
  def delete(id)
    @customers.delete { |customer| customer.id == id}
  end
  private
  def save_csv
    CSV.open(@csv_file, "wb") do |csv|
      csv << %w[id name address]
      @customers.each do |customer|
        csv << [customer.id, customer.name, customer.address]
      end
    end
  end
  def load_csv
    CSV.foreach(@csv_file, headers: :first_row, header_converters: :symbol) do |row|
      row[:id] = row[:id].to_i
      @customers << Customer.new(row)
    end
    @next_id = @customers.last.id + 1 unless @customers.empty?
  end
end
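Array#delete always takes an argument (the object to remove); the block it accepts is only called when no matching element is found.
Array#delete_if takes a block and removes every element for which the block returns true, which seems to be what you're looking for.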
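A minimal sketch of the difference, using a hypothetical Customer struct and made-up names in place of the model from the question:

```ruby
# Hypothetical stand-in for the Customer model, for illustration only.
Customer = Struct.new(:id, :name)

customers = [Customer.new(1, "Ann"), Customer.new(2, "Bob"), Customer.new(3, "Cy")]

# Array#delete_if removes every element for which the block is truthy.
customers.delete_if { |customer| customer.id == 2 }
remaining_ids = customers.map(&:id)

# Array#delete needs the object itself and compares elements with ==.
ann = customers.find { |customer| customer.id == 1 }
customers.delete(ann)

# Calling Array#delete with only a block reproduces the error from the question.
error_class = nil
begin
  customers.delete { |customer| customer.id == 3 }
rescue ArgumentError => e
  error_class = e.class
  puts e.message
end
```

In the repository from the question, either approach would fit: delete_if with the id comparison, or find followed by delete with the found object.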

Adding specific URL to BASE PATH in order to scrape webpage using Nokogiri

I am new to ruby and this site so please bear with me! I have googled endlessly to no fruition.
I am trying to pass in a college object to my class method scrape_college_info that I created in the previous class method scrape_illinois_index_page, so that I may scrape the next level of information for the specific college the user selects using Pry and Nokogiri. Unfortunately, I keep getting an argument error.
I know it isn't the prettiest, but this is my code right now:
class College
  attr_accessor :name, :location, :size, :type, :url
  BASE_PATH = "https://www.collegesimply.com/colleges/illinois/"
  def self.college
    self.scrape_colleges
  end
  def self.scrape_colleges
    colleges = self.scrape_illinois_index_page
    colleges
  end
  def self.scrape_illinois_index_page
    doc = Nokogiri::HTML(open(BASE_PATH))
    # binding.pry
    colleges = []
    doc.xpath("//tr").each do |doc|
      college = self.new
      if doc.css("td")[0] != nil
        college.name = doc.css("td")[0].text.strip
      end
      if doc.css("td")[1] != nil
        college.location = doc.css("td")[1].text.strip
      end
      if doc.css('table.table tbody tr td:nth-child(1) a')[0] != nil
        college.link = doc.css('table.table tbody tr td:nth-child(1) a')[0]['href']
      end
      colleges << college
    end
    colleges
  end
  def self.scrape_college_info(college)
    doc = Nokogiri::HTML(open(BASE_PATH + "#{college.link}"))
  end
end
Try the code below to get college.link:
if doc.css("td")[0] != nil
  college.name = doc.css("td")[0].text.strip
  college.link = doc.css("td")[0].css("a").map{|a| a['href']}[0]
end
Now you can pass the college link like this:
def self.scrape_college_info(college)
  doc = Nokogiri::HTML(open("https://www.collegesimply.com" + "#{college.link}"))
end
I hope this solves your problem. Please let me know if it works for you.
Try using URI.join:
new_url = URI.join(BASE_PATH, college.link).to_s
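A short sketch of how URI.join resolves a link against the base, compared with plain string concatenation. The href values are hypothetical examples, not taken from the actual page:

```ruby
require "uri"

base_path = "https://www.collegesimply.com/colleges/illinois/"

# Hypothetical href values, for illustration only.
root_relative_href = "/colleges/illinois/example-college/"
relative_href      = "example-college/"

# URI.join resolves the href the way a browser would resolve a link.
from_root_relative = URI.join(base_path, root_relative_href).to_s
from_relative      = URI.join(base_path, relative_href).to_s

# Plain concatenation duplicates the path when the href is root-relative.
concatenated = base_path + root_relative_href

puts from_root_relative
puts from_relative
puts concatenated
```

Because both kinds of href resolve to the same address, URI.join avoids having to know in advance which form the site uses.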

API integration error HTTParty

I'm learning how to work with HTTParty and API and I'm having an issue with my code.
Users/admin/.rbenv/versions/2.0.0-p481/lib/ruby/2.0.0/uri/generic.rb:214:in `initialize': the scheme http does not accept registry part: :80 (or bad hostname?)
I've tried using debug_output STDOUT both as an argument to my method and after including HTTParty to have a clue but with no success. Nothing gets displayed:
require 'httparty'
class LolObserver
  include HTTParty
  default_timeout(1) # timeout after 1 second
  attr_reader :api_key, :playerid
  attr_accessor :region
  def initialize(region,playerid,apikey)
    @region = region_server(region)
    @playerid = playerid
    @api_key = apikey
  end
  def region_server(region)
    case region
    when "euw"
      self.class.base_uri "https://euw.api.pvp.net"
      self.region = "EUW1"
    when "na"
      self.class.base_uri "https://na.api.pvp.net"
      self.region = "NA1"
    end
  end
  def handle_timeouts
    begin
      yield
    # Timeout::Error, is raised if a chunk of the response cannot be read within the read_timeout.
    # Timeout::Error, is raised if a connection cannot be created within the open_timeout.
    rescue Net::OpenTimeout, Net::ReadTimeout
      # todo
    end
  end
  def base_path
    "/observer-mode/rest/consumer/getSpectatorGameInfo"
  end
  def current_game_info
    handle_timeouts do
      url = "#{ base_path }/#{region}/#{playerid}?api_key=#{api_key}"
      puts '------------------------------'
      puts url
      HTTParty.get(url,:debug_output => $stdout)
    end
  end
end
I verified my URL which is fine so I'm lost as to where the problem is coming from.
I tested with a static base_uri and it doesn't change anything.
The odd thing is when I do:
HTTParty.get("https://euw.api.pvp.net/observer-mode/rest/consumer/getSpectatorGameInfo/EUW1/randomid?api_key=myapikey")
Everything is working fine and I'm getting a response.
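HTTParty doesn't seem to like the way you set your base_uri. Note that base_uri is a class-level setting that applies to requests made through the including class (for example self.class.get), whereas your code calls HTTParty.get directly with only a path, so no host is supplied.
Unless you need it to be like that, just add another attr_reader called domain, build the full URL yourself, and it will work.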
require 'httparty'
class LolObserver
  include HTTParty
  default_timeout(1) # timeout after 1 second
  attr_reader :api_key, :playerid, :domain
  attr_accessor :region
  def initialize(region,playerid,apikey)
    @region = region_server(region)
    @playerid = playerid
    @api_key = apikey
  end
  def region_server(region)
    case region
    when "euw"
      @domain = "https://euw.api.pvp.net"
      self.region = "EUW1"
    when "na"
      @domain = "https://na.api.pvp.net"
      self.region = "NA1"
    end
  end
  def handle_timeouts
    begin
      yield
    # Timeout::Error, is raised if a chunk of the response cannot be read within the read_timeout.
    # Timeout::Error, is raised if a connection cannot be created within the open_timeout.
    rescue Net::OpenTimeout, Net::ReadTimeout
      # todo
    end
  end
  def base_path
    "/observer-mode/rest/consumer/getSpectatorGameInfo"
  end
  def current_game_info
    handle_timeouts do
      # base_path already starts with "/", so no extra separator is needed
      url = "#{domain}#{base_path}/#{region}/#{playerid}?api_key=#{api_key}"
      puts '------------------------------'
      puts url
      HTTParty.get(url,:debug_output => $stdout)
    end
  end
end

Assert_equal undefined local variable LRTHW ex52

Hi, I made it to the last exercise of Learn Ruby The Hard Way, and I've hit a wall...
Here is the test code:
def test_gothon_map()
  assert_equal(START.go('shoot!'), generic_death)
  assert_equal(START.go('dodge!'), generic_death)
  room = START.go("tell a joke")
  assert_equal(room, laser_weapon_armory)
end
And here is the code of the file it should test:
class Room
  attr_accessor :name, :description, :paths
  def initialize(name, description)
    @name = name
    @description = description
    @paths = {}
  end
  def ==(other)
    self.name==other.name&&self.description==other.description&&self.paths==other.paths
  end
  def go(direction)
    @paths[direction]
  end
  def add_paths(paths)
    @paths.update(paths)
  end
end
generic_death = Room.new("death", "You died.")
And when I try to launch the test file I get an error:
generic_death = Room.new("death", "You died.")
I tried putting generic_death = Room.new("death", "You died.") inside the test_gothon_map method and it worked, but the problem is that the description of the next object is extremely long, so my questions are:
Why doesn't the assertion see the object defined in the other file?
Can it be done in a different way than putting the whole object into the test method, since the description of the next object is extremely long?
The nature of local variables is that they are, well, local. This means that they are not available outside the scope in which they were defined.
That's why Ruby does not know what generic_death means in your test.
You can solve this in a couple of ways:
Define the rooms as constants in the Room class:
class Room
  # ...
  GENERIC_DEATH = Room.new("death", "You died.")
  LASER_WEAPON_ARMORY = Room.new(...)
end
def test_gothon_map()
  assert_equal(Room::START.go('shoot!'), Room::GENERIC_DEATH)
  assert_equal(Room::START.go('dodge!'), Room::GENERIC_DEATH)
  room = Room::START.go("tell a joke")
  assert_equal(room, Room::LASER_WEAPON_ARMORY)
end
Assert the room by its name, or some other identifier:
def test_gothon_map()
  assert_equal(START.go('shoot!').name, "death")
  assert_equal(START.go('dodge!').name, "death")
  room = START.go("tell a joke")
  assert_equal(room.name, "laser weapon armory")
end
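The scoping rule can be seen in isolation with a small sketch. The Room class here is a cut-down stand-in for the one in the question, and the local_visible? and constant_visible? helpers are hypothetical names used only to probe what a method body can see:

```ruby
# Cut-down stand-in for the Room class from the question.
class Room
  attr_reader :name, :description

  def initialize(name, description)
    @name = name
    @description = description
  end
end

generic_death = Room.new("death", "You died.") # top-level local variable
GENERIC_DEATH = Room.new("death", "You died.") # constant

# A method body starts a new scope, so the top-level local is not visible.
def local_visible?
  generic_death
  true
rescue NameError
  false
end

# Constants are looked up lexically and globally, so this works.
def constant_visible?
  GENERIC_DEATH.name == "death"
end

puts local_visible?
puts constant_visible?
```

A test method behaves like local_visible? here, which is why the local variable defined at the top level of the other file cannot be referenced from the test.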

Create dynamic variables from th class name in tables, move td values into that row's array or hash?

I'm an amateur programmer wanting to scrape data from a site that is similar to this site: http://www.highschoolsports.net/massey/ (I have permission to scrape the site, by the way.)
The target site has 'th' classes for each 'th' in row[0] but I want to ensure that each 'TD' I pull from each table is somehow linked to that th's class name, because the tables are inconsistent, for example one table might be:
row[0] - >>th.name, th.place, th.team
row[1] - >>td[0], td[1] , td[2]
while another might be:
row[0] - >>th.place, th.team, th.name
row[1] - >>td[0], td[1] , td[2] etc..
My Question: How do I capture the 'th' class name across many hundreds of tables which are inconsistent(in 'th' class order) and create the 10-14 variables(arrays), then link the 'td' corresponding to that column in the table to that dynamic variable? Please let me know if this is confusing.. there are multiple tables on a given page..
Currently my code is something like:
require 'rubygems'
require 'mechanize'
require 'nokogiri'
require 'uri'
class Result
  def initialize(row)
    @attrs = {}
    @attrs[:raw] = row.text
  end
end
class Race
  def initialize(page, table)
    @handle = page
    @table = table
    @results = []
    @attrs = {}
    parse!
  end
  def parse!
    @attrs[:name] = @handle.css('div.caption').text
    get_rows
  end
  def get_rows
    # get all of the rows ..
    @handle.css('tr').each do |tr|
      @results << RaceResult.new(tr)
    end
  end
end
class Event
  class << self
    def all(index_url)
      events = []
      ourl = Nokogiri::HTML(open(index_url))
      ourl.css('a.event').each do |url|
        abs_url = MAIN + url.attributes["href"]
        events << Event.new(abs_url)
      end
      events
    end
  end
  def initialize(url)
    @url = url
    @handle = nil
    @attrs = {}
    @races = []
    @sub_events = []
    parse!
  end
  def parse!
    @handle = Nokogiri::HTML(open(@url))
    get_page_meta
    if(@handle.css('table.base.event_results').length > 0)
      @handle.search('div.table_container.event_results').each do |table|
        @races << Race.new(@handle, table)
      end
    else
      @handle.css('div.centered a.obvious').each do |ol|
        @sub_events << Event.new(BASE_URL + ol.attributes["href"])
      end
    end
  end
  def get_page_meta
    @attrs[:name] = @handle.css('html body div.content h2 text()')[0] # event name
    @attrs[:date] = @handle.xpath("html/body/div/div/text()[2]").text.strip # date
  end
end
A friend has been helping me with this and I'm just starting to get a grasp on OOP, but I'm only capturing the tables; they're not split into td's and stored in some kind of variable/array/hash. I need help understanding this process or how to do this. The critical piece would be dynamically assigning variable names according to the classes of the data and moving the td's from that column (all td[2]'s, for example) into that dynamic variable name. I can't tell you how amazing it would be if someone could actually help me solve this problem and understand how to make this work. Thank you in advance for any help!
It's easy once you realize that the th contents are the keys of your hash. Example:
@items = []
doc.css('table.masseyStyleTable').each do |table|
  fields = table.css('th').map{|x| x.text.strip}
  table.css('tr').each do |tr|
    item = {}
    fields.each_with_index do |field,i|
      item[field] = tr.css('td')[i].text.strip rescue ''
    end
    @items << item
  end
end
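The same idea can be tried without any HTML at all. The sketch below assumes the header and cell text has already been extracted into plain arrays (the tables data is made up for illustration) and shows that keying each row by its header text makes column order irrelevant:

```ruby
# Made-up header and cell text from two tables with different column order.
tables = [
  { headers: %w[name place team], rows: [%w[Ann 1 Reds], %w[Bob 2 Blues]] },
  { headers: %w[place team name], rows: [%w[1 Greens Cy]] }
]

# Pair each cell with the header of its own column, one hash per row.
items = tables.flat_map do |table|
  table[:rows].map { |cells| table[:headers].zip(cells).to_h }
end

# Every row can now be read by field name, whatever the column order was.
names = items.map { |item| item["name"] }
teams = items.map { |item| item["team"] }

puts names.inspect
puts teams.inspect
```

A hash per row avoids creating variables dynamically: the header text plays the role of the variable name.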
