Ruby uninitialized constant Job (NameError) Scraping and adding to database - ruby

I am building a scraper with Nokogiri and Ruby on Rails. My goal is to scrape jobs from a specific webpage. The code below produces an array of job titles, and that part works fine.
My problem is that I now want to add these titles to my Vacancies table. When I call Vacancy.create(companyname=jobs[0]), it should create a Vacancy from the first job title in the array.
Instead, it gives me an error:
app/services/job_service.rb:18:in `<main>': uninitialized constant Vacancy (NameError)
So it looks like it does not know the class Vacancy.
I therefore required the file vacancy.rb:
require_relative('../models/vacancy.rb')
But then it gives me another error:
uninitialized constant ApplicationRecord (NameError)
So I now think that I am doing something fundamentally wrong here.
Am I putting the scraper file in the wrong folder (should it perhaps go in the rake folder)? All I want is to execute something like Vacancy.create so that it pushes the titles to my database of Vacancies (aka Jobs).
Here is the scraper (job_service.rb):
require 'open-uri'
require 'nokogiri'
url = "https://www.savedroid.com/#karriere-section"
html_file = open(url).read
html_doc = Nokogiri::HTML(html_file)
jobs = []
html_doc.search('.job').each do |element|
jobs << element.text.strip
end
Vacancy.create(companyname=jobs[0])

Make sure that the model exists and that its table has the necessary fields.
Then put your parser code into a Rails service:
class Jobs
  def self.jobs
    return @jobs if @jobs
    require 'open-uri'
    require 'nokogiri'
    url = "https://www.savedroid.com/#karriere-section"
    # URI.open, because Kernel#open no longer fetches URLs on Ruby 3.0+
    html_file = URI.open(url).read
    html_doc = Nokogiri::HTML(html_file)
    jobs = []
    html_doc.search('.job').each do |element|
      jobs << element.text.strip
    end
    @jobs = jobs
  end
end
Then you can call it inside a Rails controller:
class VacancyController < ApplicationController
  def create
    Jobs.jobs.each do |job|
      Vacancy.create(companyname: job)
    end
  end
end
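Note that the question passes companyname=jobs[0] while the answer passes companyname: job. A minimal plain-Ruby sketch of the difference follows; the create method here is a hypothetical stand-in that only returns what it receives, not the real Vacancy.create.

```ruby
# Hypothetical stand-in for a model's `create`: it returns its argument
# so the two call styles can be compared.
def create(attributes = nil)
  attributes
end

jobs = ["Backend Developer", "Designer"]

# `companyname = jobs[0]` is an assignment expression: it sets a local
# variable and passes its value (a String) as a positional argument.
positional = create(companyname = jobs[0])

# `companyname: jobs[0]` builds a Hash keyed by the attribute name.
keyword = create(companyname: jobs[0])

puts positional.class
puts keyword.class
```

With the first form the attribute name never reaches the method, so a model expecting a hash of attributes has nothing to map the value to.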

Related

NameError Exception: undefined local variable or method `products' for Wheyscrapper:Class

I'm building a small web scraper in Ruby and am now refactoring it. Unfortunately, the refactoring has introduced some errors; this is one of them.
The first method, whey_scrapper, calls two other methods, each responsible for scraping a specific item from the webpage. When I run and debug this code with byebug and try to display the products or prices I've scraped, I get an error saying that 'products' or 'prices' is undefined. This is my current code:
require 'open-uri'
require 'nokogiri'
require 'httparty'
require 'byebug'
require 'csv'
class Wheyscrapper
def whey_scrapper
company = 'Body+%26+fit'
url = "https://www.bodyenfitshop.nl/afslanken/afslank-toppers/?manufacturer=#{company}"
unparsed_page = open(url).read
parsed_page = Nokogiri::HTML(unparsed_page)
product_scrapper
prices_scrapper
# csv = CSV.open('wheyprotein.csv', 'wb')
end
def product_scrapper
products = Array.new
product_names = parsed_page.css('div.product-primary')
product_names.each do |product_name|
product = {
name: product_name.css('h2.product-name').text
}
products << product
end
end
def prices_scrapper
prices = Array.new
product_prices = parsed_page.css('div.price-box')
product_prices.each do |product_price|
price = {
amount: product_price.css('span.price').text
}
prices << price
end
end
byebug
whey_scrapper
end
There's a lot going on here, but to make it more Ruby you'd consider making those lazy-initialized and giving them names that reflect that:
require 'open-uri'
require 'nokogiri'
class Wheyscrapper
  URL = "https://www.bodyenfitshop.nl/afslanken/afslank-toppers/?%s"
  def initialize(company:)
    @company = company
    # Use encode_www_form to encode query-string parameters
    @url = URL % URI.encode_www_form(manufacturer: company)
  end
  def document
    # Lazy-initialize a parsed version of the page
    # (URI.open, because Kernel#open no longer fetches URLs on Ruby 3.0+)
    @document ||= Nokogiri::HTML(URI.open(@url).read)
  end
  def products
    document.css('div.product-primary').map do |product_name|
      {
        name: product_name.css('h2.product-name').text
      }
    end
  end
  def prices
    document.css('div.price-box').map do |product_price|
      {
        amount: product_price.css('span.price').text
      }
    end
  end
end
This fixes a lot of the data propagation problems you had in your original. When you declare a variable it's a local variable, meaning it doesn't exist outside of that particular call of that particular method. If you want to persist it for longer you need to use instance variables, as in @products, or you need to define methods that return the data you need.
The above approach combines that, using a lazy-initialized instance variable to persist the parsed document, and exposes that as a method the other methods can use.
Now you can spin this up:
scraper = Wheyscrapper.new(company: "Body & Fit")
After that, everything should be available directly:
scraper.prices
scraper.products
When you learn how to use Ruby effectively you'll often find solutions to your problems that are really minimal. Usually a lot of Ruby code is a sign that it's not being used properly.
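The difference between a method-local variable and a lazily initialized instance variable, as described above, can be sketched in plain Ruby. The Report class and its load_rows helper are hypothetical stand-ins for the scraper and its page fetch.

```ruby
class Report
  attr_reader :loads

  def initialize
    @loads = 0
  end

  # `rows` is a local variable: it disappears when this method returns.
  def build_local
    rows = ["whey", "casein"]
    rows.size
  end

  # Another method cannot see that local, so this raises NameError.
  def read_local
    rows
  end

  # Lazily initialized instance variable: loaded once, then reused.
  def rows_cached
    @rows_cached ||= load_rows
  end

  private

  # Hypothetical stand-in for fetching and parsing a page.
  def load_rows
    @loads += 1
    ["whey", "casein"]
  end
end

report = Report.new
report.build_local
begin
  report.read_local
rescue NameError => e
  puts "NameError: #{e.message}"
end

report.rows_cached
report.rows_cached
puts report.loads # prints 1
```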
This could be refactored further, but based on my comments above it should at least work without a larger refactor:
require 'open-uri'
require 'nokogiri'
require 'httparty'
require 'csv'
class Wheyscrapper
  def whey_scrapper
    company = 'Body+%26+fit'
    url = "https://www.bodyenfitshop.nl/afslanken/afslank-toppers/?manufacturer=#{company}"
    # URI.open, because Kernel#open no longer fetches URLs on Ruby 3.0+
    unparsed_page = URI.open(url).read
    @parsed_page = Nokogiri::HTML(unparsed_page)
    product_scrapper
    prices_scrapper
    # csv = CSV.open('wheyprotein.csv', 'wb')
  end
  def product_scrapper
    @products = Array.new
    product_names = @parsed_page.css('div.product-primary')
    product_names.each do |product_name|
      product = {
        name: product_name.css('h2.product-name').text
      }
      @products << product
    end
  end
  def prices_scrapper
    @prices = Array.new
    @product_prices = @parsed_page.css('div.price-box')
    @product_prices.each do |product_price|
      price = {
        amount: product_price.css('span.price').text
      }
      @prices << price
    end
  end
end
w = Wheyscrapper.new.whey_scrapper

How do I reference a method in a different class from a method in another class?

I have a module and class in a file lib/crawler/page-crawler.rb that looks like this:
require 'oga'
require 'net/http'
require 'pry'
module YPCrawler
  class PageCrawler
    attr_accessor :url
    def initialize(url)
      @url = url
    end
    def get_page_listings
      body = Net::HTTP.get(URI.parse(@url))
      document = Oga.parse_html(body)
      document.css('div.result')
    end
    newpage = PageCrawler.new "http://www.someurl"
    @listings = newpage.get_page_listings
    @listings.each do |listing|
      bizname = YPCrawler::ListingCrawler.new listing['id']
    end
  end
end
Then I have another module & class in another file lib/crawler/listing-crawler.rb that looks like this:
require 'oga'
require 'pry'
module YPCrawler
  class ListingCrawler
    def initialize(id)
      @id = id
    end
    def extract_busines_name
      binding.pry
    end
  end
end
However, when I run the script with ruby lib/yp-crawler.rb (which executes the page-crawler.rb file above, and works if I leave out the YPCrawler::ListingCrawler call), I get this error:
/lib/crawler/page-crawler.rb:23:in `block in <class:PageCrawler>': uninitialized constant YPCrawler::ListingCrawler (NameError)
The issue is on this line:
bizname = YPCrawler::ListingCrawler.new listing['id']
So how do I call that other class from within the iterator in my page-crawler.rb?
Edit 1
When I just do ListingCrawler.new listing['id'], I get the following error:
uninitialized constant YPCrawler::PageCrawler::ListingCrawler (NameError)
Edit 2
Here is the directory structure of my project:
Edit 3
My yp-crawler.rb looks like this:
require_relative "yp-crawler/version"
require_relative "crawler/page-crawler"
require_relative "crawler/listing-crawler"
module YPCrawler
end
In your yp-crawler.rb file, based on the structure that you posted, you should have something like:
require 'yp-crawler/version'
require 'crawler/listing-crawler'
require 'crawler/page-crawler'
Try this, in your yp-crawler.rb add the line:
Dir["#{File.dirname(__FILE__)}/crawler/**/*.rb"].each { |file| load(file) }
That should automatically include all files in your /crawler directory at runtime. Might want to do the same for the other directories.
Let me know if that helps :)
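Why the require order matters here can be sketched in plain Ruby: a constant is looked up when the code referencing it runs, and the loop in the question sits in the class body, so it runs while page-crawler.rb is being loaded. The module and class names below are hypothetical, and both "files" are simulated inside one script.

```ruby
module LoadOrderDemo
  class Page
    # The constant is looked up when this method runs,
    # not when the method is defined.
    def self.build_listing
      LoadOrderDemo::Listing.new
    end
  end
end

# Calling it now mimics class-body code that executes during `require`,
# before the file defining Listing has been loaded.
EARLY_ERROR =
  begin
    LoadOrderDemo::Page.build_listing
    nil
  rescue NameError => e
    e.message
  end
puts "before: #{EARLY_ERROR}"

# Mimics the second file being loaded afterwards.
module LoadOrderDemo
  class Listing; end
end

puts "after: #{LoadOrderDemo::Page.build_listing.class}"
```

Requiring the listing file first, or moving the loop out of the class body into a method that is called after all files are loaded, avoids the early lookup.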

'Error: Cannot open "/home/<...>/billy-bones/=" for reading' while using pry and DataMapper

So, I'm trying to build a quick console program for my development needs, akin to rails console (I'm using Sinatra + DataMapper + pry).
I run it and launch cat = Category.new(name: 'TestCat', type: :referential). It gives me the following error:
Error: Cannot open "/home/art-solopov/Projects/by-language/Ruby/billy-bones/=" for reading.
What could be the cause of the problem?
console:
#!/usr/bin/env ruby
$LOAD_PATH << 'lib'
require 'pry'
require 'config'
binding.pry
lib/config.rb:
# Configuration files and app-wide requires go here
require 'sinatra'
require 'data_mapper'
require 'model/bill'
require 'model/category'
configure :production do
DataMapper::Logger.new('db-log', :debug)
DataMapper.setup(:default,
'postgres://billy-bones:billy@localhost/billy-bones')
DataMapper.finalize
end
configure :development do
DataMapper::Logger.new($stderr, :debug)
DataMapper.setup(:default,
'postgres://billy-bones:billy@localhost/billy-bones-dev')
DataMapper.finalize
DataMapper.auto_upgrade!
end
configure :test do
require 'dm_migrations'
DataMapper::Logger.new($stderr, :debug)
DataMapper.setup(:default,
'postgres://billy-bones:billy@localhost/billy-bones-test')
DataMapper.finalize
DataMapper.auto_migrate!
end
lib/model/category.rb:
require 'data_mapper'
class Category
include DataMapper::Resource
property :id, Serial
property :name, String
property :type, Enum[:referential, :predefined, :computable]
has n, :bills
# has n, :tariffs TODO uncomment when tariff ready
def create_bill(params)
# A bill factory for current category type
case type
when :referential
ReferentialBill.new params
when :predefined
PredefinedBill.new params
when :computable
ComputableBill.new params
end
end
end
If I substitute pry with irb in the console script, it goes fine.
Thank you very much!
P. S.
Okay, yesterday I tried this script again, and it worked perfectly. I didn't change anything. I'm not sure whether I should remove the question now or not.
P. P. S.
Or actually not... Today I've encountered it again. Still completely oblivious to what could cause it.
** SOLVED **
DAMN YOU PRY!
Okay, so here's the difference.
When I tested it the second time, I actually entered a = Category.new(name: 'TestCat', type: :referential) and it worked. It looks like pry treats a line starting with cat as its own cat command (which displays a file's contents), not as an assignment to a variable named cat.
This is not an answer to the pry question; I just generally hate case statements in Ruby.
Why not change:
def create_bill(params)
# A bill factory for current category type
case type
when :referential
ReferentialBill.new params
when :predefined
PredefinedBill.new params
when :computable
ComputableBill.new params
end
end
to:
def create_bill(params)
# A bill factory for current category type
self.send("new_#{type}_bill",params)
end
def new_referential_bill(params)
ReferentialBill.new params
end
def new_predefined_bill(params)
PredefinedBill.new params
end
def new_computable_bill(params)
ComputableBill.new params
end
You could make this more dynamic, although I think that would take away from readability in this case. If you'd like to anyway, this should do the trick in Rails:
def create_bill(params)
if [:referential, :predefined, :computable].include?(type)
"#{type}_bill".classify.constantize.new(params)
else
#Some Kind of Handling for non Defined Bill Types
end
end
Or this, which works inside or outside Rails:
def create_bill(params)
if [:referential, :predefined, :computable].include?(type)
Object.const_get("#{type.to_s.capitalize}Bill").new(params)
else
#Some Kind of Handling for non Defined Bill Types
end
end
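A runnable sketch of the const_get approach outside Rails follows. The Bill base class, its subclasses, and the build_bill helper are hypothetical stand-ins for the real models, and raising ArgumentError is just one possible way to handle an undefined type.

```ruby
# Hypothetical bill classes standing in for the real models.
class Bill
  attr_reader :params

  def initialize(params)
    @params = params
  end
end

class ReferentialBill < Bill; end
class PredefinedBill < Bill; end
class ComputableBill < Bill; end

# Only these types may be turned into a class name.
BILL_TYPES = %i[referential predefined computable].freeze

def build_bill(type, params)
  unless BILL_TYPES.include?(type)
    raise ArgumentError, "unknown bill type: #{type.inspect}"
  end

  # :predefined -> "PredefinedBill" -> the PredefinedBill class
  Object.const_get("#{type.to_s.capitalize}Bill").new(params)
end

bill = build_bill(:predefined, { amount: 10 })
puts bill.class
```

Checking the type against a fixed list before calling const_get keeps arbitrary input from being resolved to an unrelated constant.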

DELAYED_JOB Update_attribute in delayed job is not working, table not updated

I have been trying to use Delayed::Job to fetch fbLikes (I have posted about this a lot on Stack Overflow but still haven't solved the problem). My table in the database has the columns name, id, fbLikes, fbId, url.
Here are the steps of my program:
[Home Page] company list -> Create a Company [insert a company's info] -> Save fbId, name, id, url BUT NOT fbLikes -> redirect to Home Page [after_save: update fbLikes for the company just added]
I'm not sure whether my delayed job is working, because fbLikes in my model is still blank and not updated with the latest value. I'm also not sure whether there is a better way to do this.
When I run "rake jobs:work", no background work is displayed in the console.
[MODEL company.rb]
require "delayed_job"
require "count_job.rb"
after_save :fb_likes
def fb_likes
Delayed::Job.enqueue(CountJob.new(self.id))
end
[lib/count_job.rb]
require 'net/http'
require 'json'
require 'company.rb'
class CountJob < Struct.new(:id)
  def perform
    @company = Company.find(id)
    uri = URI("http://graph.facebook.com/#{@company.fbId}")
    data = Net::HTTP.get(uri)
    @company.fbLikes = JSON.parse(data)['likes']
    @company.save!
  end
end

In Sinatra(Ruby), how should I create global variables which are assigned values only once in the application lifetime?

In Sinatra, I'm unable to create global variables which are assigned values only once in the application lifetime. Am I missing something? My simplified code looks like this:
require 'rubygems' if RUBY_VERSION < "1.9"
require 'sinatra/base'
class WebApp < Sinatra::Base
  @a = 1
  before do
    @b = 2
  end
  get '/' do
    puts @a, @b
    "#{@a}, #{@b}"
  end
end
WebApp.run!
This results in
nil
2
in the terminal and ", 2" in the browser.
If I try to put @a = 1 in the initialize method, I get an error on the WebApp.run! line.
I feel I'm missing something because if I can't have global variables, then how can I load large data during application instantiation?
before do seems to get called every time there is a request from the client side.
class WebApp < Sinatra::Base
configure do
set :my_config_property, 'hello world'
end
get '/' do
"#{settings.my_config_property}"
end
end
Beware that if you use Shotgun, or some other Rack runner tool that reloads the code on each request the value will be recreated each time and it will look as if it's not assigned only once. Run in production mode to disable reloading and you will see that it's only assigned on the first request (you can do this with for example rackup --env production config.ru).
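Why @a came out as nil in the question can be shown in plain Ruby, with no Sinatra involved. An assignment in the class body sets an instance variable on the class object itself, while route blocks run against a separate instance. The ScopeDemo class is hypothetical and only mirrors the shape of the question's code.

```ruby
class ScopeDemo
  # Runs once when the class is defined; the variable belongs to the
  # class object ScopeDemo, not to its instances.
  @a = 1

  def initialize
    # Belongs to each individual instance.
    @b = 2
  end

  # Instance-level view: @a was never set on the instance, so it is nil.
  def values
    [@a, @b]
  end

  # Class-level view: reads the variable set in the class body.
  def self.class_level_a
    @a
  end
end

p ScopeDemo.new.values      # prints [nil, 2]
p ScopeDemo.class_level_a   # prints 1
```

This is why a class-level store such as settings is the natural place for values that are assigned once.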
I ran into a similar issue. I was trying to initialize an instance variable @a in the initialize method but kept getting an exception every time:
class MyApp < Sinatra::Application
  def initialize
    @a = 1
  end
  get '/' do
    puts @a
    'inside get'
  end
end
I finally decided to look into the Sinatra code for initialize:
# File 'lib/sinatra/base.rb', line 877
def initialize(app = nil)
  super()
  @app = app
  @template_cache = Tilt::Cache.new
  yield self if block_given?
end
Looks like it does some necessary bootstrapping and I needed to call super().
def initialize
  super()
  @a = 1
end
This seemed to fix my issue and everything worked as expected.
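The effect of skipping super() can be reproduced in plain Ruby. FrameworkBase below is a hypothetical stand-in for a framework class that sets up its own state in initialize; it is not Sinatra's actual code.

```ruby
# Hypothetical stand-in for a framework base class.
class FrameworkBase
  def initialize(app = nil)
    @app = app
    @cache = {}
  end

  # True only if the base class's setup has run.
  def cache_ready?
    !@cache.nil?
  end
end

class WithoutSuper < FrameworkBase
  def initialize
    # The parent's initialize never runs, so @cache stays unset.
    @a = 1
  end
end

class WithSuper < FrameworkBase
  def initialize
    # Empty parentheses call the parent with no arguments.
    super()
    @a = 1
  end
end

puts WithoutSuper.new.cache_ready? # prints false
puts WithSuper.new.cache_ready?    # prints true
```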
Another option:
helpers do
def a
a ||= 1
end
end
Building on Theo's accepted solution, it is also possible to do:
class App < Sinatra::Application
set :blabla, ''
namespace '/b' do
get '/baby' do
# do something where bouh is assigned a value
settings.blabla = 'bouh'
end
end
namespace '/z' do
get '/human' do
# settings.blabla is available here with newly assigned value
end
end
end
You could use OpenStruct.
require 'rubygems'
require 'sinatra'
require 'ostruct'
configure do
Struct = OpenStruct.new(
:foo => 'bar'
)
end
get '/' do
"#{Struct.foo}" # => bar
end
You can even use the Struct class in views and other loaded files.
