DELAYED_JOB Update_attribute in delayed job is not working, table not updated - ruby

I have been trying to use Delayed::Job to fetch the fbLikes count (I have posted a lot on Stack Overflow but still haven't solved the problem yet). My table in the database has name, id, fbLikes, fbId and url.
Here are the steps of the program:
[Home Page] company list -> Create a Company [insert the company's info] -> save fbId, name, id, url BUT NOT fbLikes -> redirect to the home page [after_save, update fbLikes for the company that was just added]
I am not sure whether my delayed job is working, because fbLikes in my model is still blank and never gets updated with the latest count. I am also not sure whether there is a better way to do this.
When I run "rake jobs:work", no background work is displayed in the console.
[MODEL company.rb]
require "delayed_job"
require "count_job.rb"

class Company < ActiveRecord::Base
  after_save :fb_likes

  def fb_likes
    Delayed::Job.enqueue(CountJob.new(self.id))
  end
end
[lib/count_job.rb]
require 'net/http'
require 'json'
require 'company.rb'

class CountJob < Struct.new(:id)
  def perform
    @company = Company.find(id)
    uri = URI("http://graph.facebook.com/#{@company.fbId}")
    data = Net::HTTP.get(uri)
    @company.fbLikes = JSON.parse(data)['likes']
    @company.save!
  end
end
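One way to check whether the after_save callback actually enqueues anything is to inspect the jobs table from a Rails console and work a job off inline. This is only a debugging sketch (the attribute values are made up; it assumes a standard delayed_job setup), not a fix:
company = Company.create!(name: "Acme", fbId: "acme", url: "http://example.com")
Delayed::Job.count            # should have grown by one if the after_save hook ran
Delayed::Worker.new.work_off  # runs pending jobs inline, returns [successes, failures]
company.reload.fbLikes        # populated only if the job fetched and saved the likes count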

Related

NameError Exception: undefined local variable or method `products' for Wheyscrapper:Class

I'm building a small web scraper using Ruby and now I'm trying to refactor my code. Unfortunately, I'm encountering some errors while refactoring. This is one of them.
Basically, I'm calling two separate methods from the first method, whey_scrapper. Each of these two methods is responsible for scraping a specific item on the webpage. When I run and debug this code with byebug, I try to display the products or prices I've scraped, but I get an error message saying that 'products' or 'prices' is undefined. This is my current code:
require 'open-uri'
require 'nokogiri'
require 'httparty'
require 'byebug'
require 'csv'

class Wheyscrapper
  def whey_scrapper
    company = 'Body+%26+fit'
    url = "https://www.bodyenfitshop.nl/afslanken/afslank-toppers/?manufacturer=#{company}"
    unparsed_page = open(url).read
    parsed_page = Nokogiri::HTML(unparsed_page)

    product_scrapper
    prices_scrapper
    # csv = CSV.open('wheyprotein.csv', 'wb')
  end

  def product_scrapper
    products = Array.new
    product_names = parsed_page.css('div.product-primary')
    product_names.each do |product_name|
      product = {
        name: product_name.css('h2.product-name').text
      }
      products << product
    end
  end

  def prices_scrapper
    prices = Array.new
    product_prices = parsed_page.css('div.price-box')
    product_prices.each do |product_price|
      price = {
        amount: product_price.css('span.price').text
      }
      prices << price
    end
  end

  byebug
  whey_scrapper
end
There's a lot going on here, but to make it more idiomatic Ruby you could consider making those values lazy-initialized and giving them names that reflect that:
require 'open-uri'
require 'nokogiri'

class Wheyscrapper
  URL = "https://www.bodyenfitshop.nl/afslanken/afslank-toppers/?%s"

  def initialize(company:)
    @company = company

    # Use encode_www_form to encode query-string parameters
    @url = URL % URI.encode_www_form(manufacturer: company)
  end

  def document
    # Lazy-initialize a parsed version of the page
    @document ||= Nokogiri::HTML(open(@url).read)
  end

  def products
    document.css('div.product-primary').map do |product_name|
      {
        name: product_name.css('h2.product-name').text
      }
    end
  end

  def prices
    document.css('div.price-box').map do |product_price|
      {
        amount: product_price.css('span.price').text
      }
    end
  end
end
This fixes a lot of the data propagation problems you had in your original. When you declare a variable it's a local variable, meaning it doesn't exist outside of that particular call of that particular method. If you want to persist it for longer you need to use instance variables, as in @products, or you need to define methods that return the data you need.
The above approach combines that, using a lazy-initialized instance variable to persist the parsed document, and exposes that as a method the other methods can use.
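As a tiny, contrived illustration of that difference (not from the original code):
class Example
  def fetch_local
    data = [1, 2, 3]    # local variable: gone as soon as this method returns
  end

  def data
    @data ||= [1, 2, 3] # lazily-initialized instance variable: computed once,
                        # then reused by every other method on this object
  end
end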
Now you can spin this up:
scraper = Wheyscrapper.new(company: "Body & Fit")
Where that should enable everything to be available directly:
scraper.prices
scraper.products
When you learn how to use Ruby effectively you'll often find solutions to your problems that are really minimal. Usually a lot of Ruby code is a sign that it's not being used properly.
This could be refactored further, but based on my comments above it should at least work without a bigger refactor:
require 'open-uri'
require 'nokogiri'
require 'httparty'
require 'csv'

class Wheyscrapper
  def whey_scrapper
    company = 'Body+%26+fit'
    url = "https://www.bodyenfitshop.nl/afslanken/afslank-toppers/?manufacturer=#{company}"
    unparsed_page = open(url).read
    @parsed_page = Nokogiri::HTML(unparsed_page)

    product_scrapper
    prices_scrapper
    # csv = CSV.open('wheyprotein.csv', 'wb')
  end

  def product_scrapper
    @products = Array.new
    product_names = @parsed_page.css('div.product-primary')
    product_names.each do |product_name|
      product = {
        name: product_name.css('h2.product-name').text
      }
      @products << product
    end
  end

  def prices_scrapper
    @prices = Array.new
    @product_prices = @parsed_page.css('div.price-box')
    @product_prices.each do |product_price|
      price = {
        amount: product_price.css('span.price').text
      }
      @prices << price
    end
  end
end
w = Wheyscrapper.new.whey_scrapper

How to test HTTParty API call with Ruby and RSpec

I am using the HTTParty gem to make a call to the GitHub API to access a list of a user's repos.
It is a very simple application using Sinatra that displays a user's favourite programming language, based on the most common language that appears in their repos.
I am a bit stuck on how I can write an RSpec expectation that mocks out the actual API call and instead just checks that JSON data is being returned.
I have a mock .json file but not sure how to use it in my test.
Any ideas?
github_api.rb
require 'httparty'

class GithubApi
  attr_reader :username, :data, :languages

  def initialize(username)
    @username = username
    @response = HTTParty.get("https://api.github.com/users/#{@username}/repos")
    @data = JSON.parse(@response.body)
  end
end
github_api_spec.rb
require './app/models/github_api'
require 'spec_helper'

describe GithubApi do
  let(:github_api) { GithubApi.new('mock_user') }

  it "receives a json response" do
  end
end
Rest of the files for clarity:
results.rb
require 'httparty'
require_relative 'github_api'

class Results
  def initialize(github_api = Github.new(username))
    @github_api = github_api
    @languages = []
  end

  def get_languages
    @github_api.data.each do |repo|
      @languages << repo["language"]
    end
  end

  def favourite_language
    get_languages
    @languages.group_by(&:itself).values.max_by(&:size).first
  end
end
application_controller.rb
require './config/environment'
require 'sinatra/base'
require './app/models/github_api'

class ApplicationController < Sinatra::Base
  configure do
    enable :sessions
    set :session_secret, "#3x!ilt£"
    set :views, 'app/views'
  end

  get "/" do
    erb :index
  end

  post "/user" do
    @github = GithubApi.new(params[:username])
    @results = Results.new(@github)
    @language = @results.favourite_language
    session[:language] = @language
    session[:username] = params[:username]
    redirect '/results'
  end

  get "/results" do
    @language = session[:language]
    @username = session[:username]
    erb :results
  end

  run! if app_file == $0
end
There are multiple ways you could approach this problem.
You could, as @anil suggested, use a library like webmock to mock the underlying HTTP call. You could also do something similar with VCR (https://github.com/vcr/vcr), which records the results of an actual call to the HTTP endpoint and plays back that response on subsequent requests.
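For reference, a minimal VCR setup might look something like this (a sketch; the cassette name and directory are arbitrary, and it assumes the original GithubApi class above):
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = 'spec/cassettes'
  config.hook_into :webmock
end

describe GithubApi do
  it 'records the real GitHub response once and replays it afterwards' do
    VCR.use_cassette('github_repos') do
      api = GithubApi.new('mock_user')
      expect(api.data).to be_an(Array)
    end
  end
end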
But, given your question, I don't see why you couldn't just use an RSpec double. I'll show you how below. First, though, it would be a bit easier to test the code if it were not all in the constructor.
github_api.rb
require 'httparty'

class GithubApi
  attr_reader :username

  def initialize(username)
    @username = username
  end

  def favorite_language
    # method to calculate which language is used most by username
  end

  def languages
    # method to grab languages from repos
  end

  def repos
    @repos ||= begin
      response = HTTParty.get("https://api.github.com/users/#{username}/repos")
      JSON.parse(response.body)
    end
  end
end
Note that you do not need to reference the @username variable in the url because you have an attr_reader.
github_api_spec.rb
require './app/models/github_api'
require 'spec_helper'

describe GithubApi do
  subject(:api) { described_class.new(username) }

  let(:username) { 'username' }

  describe '#repos' do
    let(:github_url) { "https://api.github.com/users/#{username}/repos" }
    let(:github_response) { instance_double(HTTParty::Response, body: github_response_body) }
    let(:github_response_body) { 'response_body' }

    before do
      allow(HTTParty).to receive(:get).and_return(github_response)
      allow(JSON).to receive(:parse)

      api.repos
    end

    it 'fetches the repos from Github api' do
      expect(HTTParty).to have_received(:get).with(github_url)
    end

    it 'parses the Github response' do
      expect(JSON).to have_received(:parse).with(github_response_body)
    end
  end
end
Note that there is no need to actually load or parse any real JSON. What we're testing here is that we made the correct HTTP call and that we called JSON.parse on the response. Once you start testing the languages method you'd need to actually load and parse your test file, like this:
let(:parsed_response) { JSON.parse(File.read('path/to/test/file.json')) }
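For example, a spec for a hypothetical languages method (assuming a fixture at spec/fixtures/repos.json and that languages maps each repo's "language" key) could look like:
describe '#languages' do
  let(:parsed_response) { JSON.parse(File.read('spec/fixtures/repos.json')) }

  before do
    allow(api).to receive(:repos).and_return(parsed_response)
  end

  it 'collects the language of each repo' do
    expect(api.languages).to eq(parsed_response.map { |repo| repo['language'] })
  end
end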
You can mock those API calls with webmock (https://github.com/bblimke/webmock) and send back mock.json in the stubbed response. This post, https://robots.thoughtbot.com/how-to-stub-external-services-in-tests, walks you through the setup of webmock with RSpec (the tests in the post mock a GitHub API call too).
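A minimal webmock stub for the original class might look like this (a sketch; the fixture path is illustrative):
require 'webmock/rspec'

describe GithubApi do
  it 'parses the stubbed GitHub response' do
    body = File.read('spec/fixtures/mock.json')
    stub_request(:get, "https://api.github.com/users/mock_user/repos")
      .to_return(status: 200, body: body, headers: { 'Content-Type' => 'application/json' })

    api = GithubApi.new('mock_user')
    expect(api.data).to eq(JSON.parse(body))
  end
end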

Ruby uninitialized constant Job (NameError) Scraping and adding to database

I am creating a scraper with Nokogiri and Ruby on Rails. My goal is to scrape jobs from a specific webpage. I created the following code, which results in an array of job titles. So this works fine.
My problem is now, that I want to add these titles to my database of Vacancies. When I type in Vacancy.create(companyname=jobs[0]), it should create a Vacancy with the first job-title in the array.
But it gives me an error instead:
app/services/job_service.rb:18:in `': uninitialized constant
Vacancy (NameError)
So it looks like it does not know the class Vacancy.
I therefore required the file vacancy.rb:
require_relative('../models/vacancy.rb')
But then it gives me another error:
uninitialized constant ApplicationRecord (NameError)
So I now think that I am doing something fundamentally wrong here.
Am I putting the whole scraper file in the wrong folder (should I perhaps put it in the rake folder)? All I want is to execute something like Vacancy.create so that it pushes this to my database of Vacancies (aka Jobs).
Here is the scraper (job_service.rb):
require 'open-uri'
require 'nokogiri'

url = "https://www.savedroid.com/#karriere-section"
html_file = open(url).read
html_doc = Nokogiri::HTML(html_file)

jobs = []
html_doc.search('.job').each do |element|
  jobs << element.text.strip
end

Vacancy.create(companyname=jobs[0])
Make sure that the model is created and that the necessary fields exist in the table.
Let's put your parser code into a Rails service:
require 'open-uri'
require 'nokogiri'

class Jobs
  def self.jobs
    return @jobs if @jobs

    url = "https://www.savedroid.com/#karriere-section"
    html_file = open(url).read
    html_doc = Nokogiri::HTML(html_file)

    jobs = []
    html_doc.search('.job').each do |element|
      jobs << element.text.strip
    end
    @jobs = jobs
  end
end
Then you can call it inside a Rails controller:
class VacancyController < ApplicationController
  def create
    Jobs.jobs.each do |job|
      Vacancy.create(companyname: job)
    end
  end
end
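If you would rather run it from a Rake task instead of a controller, as you suggested, a sketch (the file and task names are illustrative) could be:
# lib/tasks/scrape_jobs.rake
namespace :scrape do
  desc 'Scrape job titles and store them as Vacancies'
  task jobs: :environment do
    Jobs.jobs.each do |job|
      Vacancy.create(companyname: job)
    end
  end
end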

Why can't PhantomJS/Poltergeist pull up this website correctly?

I'm using Capybara to navigate a web page. The URL looks like this:
http://www.myapp.com/page/ABCXYZ?page=<x>
The page has a paginated table on it. Passing a page number to it will paginate the table appropriately.
However, when using the poltergeist driver, the page parameter is always ignored.
Using the Selenium driver is not an option because it's a hassle to get it to run headless, and it doesn't want to run more than once (it gives a "connection refused" error on localhost).
This looks like an encoding issue, but I'm not sure where exactly in the stack the issue lies.
class Verifier
  class << self
    include Capybara::DSL

    Capybara.default_driver = :poltergeist
    Capybara.default_wait_time = 10

    def parse_table(header)
      xpath = '//*[@id="products"]/table[3]/tbody/tr/td/div[4]/div/table'
      table = find(:xpath, xpath)
      rows = []
      table.all("tr").each do |row|
        product_hash = {}
        row.all("td").each_with_index do |col, idx|
          product_hash[header[idx]] = col.text
        end
        rows << product_hash
      end
      rows
    end

    def pages
      page.find(".numberofresults").text.gsub(" Products", "").split(" ").last.to_i / 25
    end

    def import(item)
      visit "http://www.myapp.com/page/#{item}"
      header = parse_header
      apps = parse_vehicles(header)
      pages.times do |pagenumber|
        url = "http://www.myapp.com/page/#{item}?page=#{pagenumber+1}" # This is the problem
      end
    end
  end
end
That URL in the last loop? It is processed as if the pagenumber were not present. When I change the driver to :selenium, this whole thing works. So it's not a Capybara issue as far as I can see.
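One way to rule out query-string encoding as the culprit (just a sketch; the post does not confirm whether this changes Poltergeist's behaviour) is to build the URL with an explicitly encoded query string:
require 'uri'

query = URI.encode_www_form(page: pagenumber + 1)
visit "http://www.myapp.com/page/#{item}?#{query}"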

Warming Up Cache Digests Overnight

We have a Rails 3.2 website which is fairly large, with thousands of URLs. We implemented the cache_digests gem for Russian-doll caching. It is working well. We want to further optimize by warming up the cache overnight so that users get a better experience during the day. I have seen the answer to this question: Rails: Scheduled task to warm up the cache?
Could it be modified to warm up a large number of URLs?
To trigger cache hits for many pages with expensive load times, just create a Rake task to iteratively send web requests to all record/URL combinations within your site. (Here is one implementation.)
Iteratively request every site URL/record with Net::HTTP:
To simply visit every page, you can run a nightly Rake task to make sure that early-morning users still get a snappy page with refreshed content.
lib/tasks/visit_every_page.rake:
namespace :visit_every_page do
  include Net
  include Rails.application.routes.url_helpers

  task :specializations => :environment do
    puts "Visiting specializations..."
    Specialization.all.sort{ |a,b| a.id <=> b.id }.each do |s|
      begin
        puts "Specialization #{s.id}"
        City.all.sort{ |a,b| a.id <=> b.id }.each do |c|
          puts "Specialization City #{c.id}"
          Net::HTTP.get( URI("http://#{APP_CONFIG[:domain]}/specialties/#{s.id}/#{s.token}/refresh_city_cache/#{c.id}.js") )
        end
        Division.all.sort{ |a,b| a.id <=> b.id }.each do |d|
          puts "Specialization Division #{d.id}"
          Net::HTTP.get( URI("http://#{APP_CONFIG[:domain]}/specialties/#{s.id}/#{s.token}/refresh_division_cache/#{d.id}.js") )
        end
      end
    end
  end

  # The following methods are defined to fake out the ActionController
  # requirements of the Rails cache
  def cache_store
    ActionController::Base.cache_store
  end

  def self.benchmark( *params )
    yield
  end

  def cache_configured?
    true
  end
end
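To actually run this overnight, you can schedule the task with cron, for example via the whenever gem (a sketch; the time is arbitrary):
# config/schedule.rb (whenever gem)
every 1.day, at: '3:00 am' do
  rake 'visit_every_page:specializations'
end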
(If you want to directly include cache expiration/recaching into this task, check out this implementation.)
via a Custom Controller Action:
If you need to bypass user authentication restrictions to get to your pages, and/or you don't want to screw up (too badly) your website's tracking analytics, you can create a custom controller action for hitting cache digests that uses tokens to bypass authentication:
app/controllers/specializations.rb:
class SpecializationsController < ApplicationController
  ...
  before_filter :check_token, :only => [:refresh_cache, :refresh_city_cache, :refresh_division_cache]
  skip_authorization_check :only => [:refresh_cache, :refresh_city_cache, :refresh_division_cache]
  ...

  def refresh_cache
    @specialization = Specialization.find(params[:id])
    @feedback = FeedbackItem.new
    render :show, :layout => 'ajax'
  end

  def refresh_city_cache
    @specialization = Specialization.find(params[:id])
    @city = City.find(params[:city_id])
    render 'refresh_city.js'
  end

  def refresh_division_cache
    @specialization = Specialization.find(params[:id])
    @division = Division.find(params[:division_id])
    render 'refresh_division.js'
  end
end
Our custom controller actions render the views of other expensive-to-load pages, causing cache hits on those pages. E.g. refresh_cache renders the same view and data as controller#show, so requests to refresh_cache will warm up the same cache digests as controller#show does for those records.
Security Note:
For security reasons, I recommend that before providing access to any custom refresh_cache controller request, you pass in a token and check that it corresponds to a unique token for that record. Matching URL tokens to database records before providing access (as seen above) is trivial, because your Rake task has access to each record's unique token -- just pass the record's token in with each request.
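A sketch of what that check_token filter might look like (hypothetical; it assumes the route captures the token segment as params[:token], and how tokens are stored depends on your models):
def check_token
  record = Specialization.find(params[:id])
  render :nothing => true, :status => :forbidden unless params[:token].present? && params[:token] == record.token
end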
tl;dr:
To trigger thousands of site URLs/cache digests, create a Rake task to iteratively request every record/URL combination in your site. You can bypass your app's user authentication restrictions for this task by creating a custom controller action that authenticates access via tokens instead.
I realize this question is about a year old, but I just worked out my own answer, after scouring a bunch of partial & incorrect solutions.
Hopefully this will help the next person...
Per my own utility class, which can be found here:
https://raw.githubusercontent.com/JayTeeSF/cmd_notes/master/automated_action_runner.rb
You can simply run this (per its .help method) to pre-cache your pages without tying up your own web server in the process.
class AutomatedActionRunner
  class StatusObject
    def initialize(is_valid, error_obj)
      @is_valid = !!is_valid
      @error_obj = error_obj
    end

    def valid?
      @is_valid
    end

    def error
      @error_obj
    end
  end

  def self.help
    puts <<-EOH
      Instead of tying up the frontend of your production site with:
        `curl http://your_production_site.com/some_controller/some_action/1234`
        `curl http://your_production_site.com/some_controller/some_action/4567`
      Try:
        `rails r 'AutomatedActionRunner.run(SomeController, "some_action", [{id: "1234"}, {id: "4567"}])'`
    EOH
  end

  def self.common_env
    {"rack.input" => "", "SCRIPT_NAME" => "", "HTTP_HOST" => "localhost:3000" }
  end
  REQUEST_ENV = common_env.freeze

  def self.run(controller, controller_action, params_ary=[], user_obj=nil)
    success_objects = []
    error_objects = []
    autorunner = new(controller, controller_action, user_obj)
    Rails.logger.warn %Q|[AutomatedAction Kickoff]: Preheating cache for #{params_ary.size} #{autorunner.controller.name}##{controller_action} pages.|
    params_ary.each do |params_hash|
      status = autorunner.run(params_hash)
      if status.valid?
        success_objects << params_hash
      else
        error_objects << status.error
      end
    end
    return process_results(success_objects, error_objects, user_obj.try(:id), autorunner.controller.name, controller_action)
  end

  def self.process_results(success_objects=[], error_objects=[], user_id, controller_name, controller_action)
    message = %Q|AutomatedAction Summary|
    backtrace = (error_objects.first.try(:backtrace)||[]).join("\n\t").inspect
    num_errors = error_objects.size
    num_successes = success_objects.size
    log_message = %Q|[#{message}]: Generated #{num_successes} #{controller_name}##{controller_action} pages; Failed #{num_errors} times; 1st Fail: #{backtrace}|
    Rails.logger.warn log_message
    # all the local variables above are here because I typically call Sentry or something with extra parameters!
  end

  attr_reader :controller

  def initialize(controller, controller_action, user_obj)
    @controller = controller
    @controller = controller.constantize unless controller.respond_to?(:name)
    @controller_instance = @controller.new
    @controller_action = controller_action
    @env_obj = REQUEST_ENV.dup
    @user_obj = user_obj
  end

  def run(params_hash)
    Rails.logger.warn %Q|[AutomatedAction]: #{@controller.name}##{@controller_action}(#{params_hash.inspect})|
    extend_with_autorun unless @controller_instance.respond_to?(:autorun)
    @controller_instance.autorun(@controller_action, params_hash, @env_obj, @user_obj)
  end

  private

  def extend_with_autorun
    def @controller_instance.autorun(action_name, action_params, action_env, current_user_value=nil)
      self.params = action_params # suppress strong parameters exception
      self.request = ActionDispatch::Request.new(action_env)
      self.response = ActionDispatch::Response.new
      define_singleton_method(:current_user, -> { current_user_value })
      send(action_name) # do it
      return StatusObject.new(true, nil)
    rescue Exception => e
      return StatusObject.new(false, e)
    end
  end
end
