Warming Up Cache Digests Overnight - caching

We have a fairly large Rails 3.2 website with thousands of URLs. We implemented the cache_digests gem for Russian Doll caching, and it is working well. We want to optimize further by warming up the cache overnight so that users get a better experience during the day. I have seen the answer to this question: Rails: Scheduled task to warm up the cache?
Could it be modified to warm up a large number of URLs?

To trigger cache hits for many pages with expensive load times, just create a rake task to iteratively send web requests to all record/url combinations within your site. (Here is one implementation)
Iteratively Net::HTTP request all site URL/records:
If you only need to visit every page, you can run a nightly Rake task so that early-morning users still get a snappy page with refreshed content.
lib/tasks/visit_every_page.rake:
namespace :visit_every_page do
include Net
include Rails.application.routes.url_helpers
task :specializations => :environment do
puts "Visiting specializations..."
Specialization.all.sort{ |a,b| a.id <=> b.id }.each do |s|
begin
puts "Specialization #{s.id}"
City.all.sort{ |a,b| a.id <=> b.id }.each do |c|
puts "Specialization City #{c.id}"
Net::HTTP.get( URI("http://#{APP_CONFIG[:domain]}/specialties/#{s.id}/#{s.token}/refresh_city_cache/#{c.id}.js") )
end
Division.all.sort{ |a,b| a.id <=> b.id }.each do |d|
puts "Specialization Division #{d.id}"
Net::HTTP.get( URI("http://#{APP_CONFIG[:domain]}/specialties/#{s.id}/#{s.token}/refresh_division_cache/#{d.id}.js") )
end
end
end
end
# The following methods are defined to fake out the ActionController
# requirements of the Rails cache
def cache_store
ActionController::Base.cache_store
end
def self.benchmark( *params )
yield
end
def cache_configured?
true
end
end
(If you want to directly include cache expiration/recaching into this task, check out this implementation.)
via a Custom Controller Action:
If you need to bypass user authentication restrictions to get to your pages, and/or you don't want to screw up (too badly) your website's tracking analytics, you can create a custom controller action for hitting cache digests that use tokens to bypass authentication:
app/controllers/specializations_controller.rb:
class SpecializationsController < ApplicationController
...
before_filter :check_token, :only => [:refresh_cache, :refresh_city_cache, :refresh_division_cache]
skip_authorization_check :only => [:refresh_cache, :refresh_city_cache, :refresh_division_cache]
...
def refresh_cache
@specialization = Specialization.find(params[:id])
@feedback = FeedbackItem.new
render :show, :layout => 'ajax'
end
def refresh_city_cache
@specialization = Specialization.find(params[:id])
@city = City.find(params[:city_id])
render 'refresh_city.js'
end
def refresh_division_cache
@specialization = Specialization.find(params[:id])
@division = Division.find(params[:division_id])
render 'refresh_division.js'
end
end
Our custom controller action renders the views of other expensive to load pages, causing cache hits to those pages. E.g. refresh_cache renders the same view page & data as controller#show, so requests to refresh_cache will warm up the same cache digests as controller#show for those records.
Security Note:
For security reasons, I recommend before providing access to any custom refresh_cache controller request that you pass in a token and check it to make sure that it corresponds with a unique token for that record. Matching URL tokens to database records before providing access (as seen above) is trivial because your Rake task has access to the unique tokens of each record -- just pass the record's token in with each request.
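The check_token filter is referenced in the controller above but its body is not shown. As a minimal sketch only, here is one way such a filter could compare the token from the URL with the record's stored token. The helper name tokens_match? is hypothetical, and comparing digests byte by byte (rather than with ==) is my own assumption about how to avoid leaking timing information, not something the original answer specifies.

```ruby
require "digest"

# Hypothetical helper: returns true only when both tokens are present and equal.
# Both inputs are hashed first so the byte arrays always have the same length,
# then every byte pair is XOR-ed so the loop does not exit early on a mismatch.
def tokens_match?(given, expected)
  return false if given.nil? || expected.nil?

  a = Digest::SHA256.hexdigest(given.to_s).bytes
  b = Digest::SHA256.hexdigest(expected.to_s).bytes
  a.zip(b).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end
```

Inside the controller, a before_filter could then call something like tokens_match?(params[:token], record.token) and return a 403 when it is false; the parameter and attribute names depend on your routes and model.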
tl;dr:
To trigger thousands of site URLs/cache digests, create a rake task to iteratively request every record/url combination in your site. You can bypass your app's user authentication restrictions for this task by creating a custom controller action that authenticates access via tokens instead.
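For a large number of URLs, requesting them strictly one after another may not finish overnight. The following is a rough sketch, not part of the original task, of how the same requests could be spread over a few worker threads. The method name warm_urls, the concurrency default and the injectable fetcher are all assumptions made for illustration; keep the concurrency low enough that the warm-up does not overload the site it is warming.

```ruby
require "net/http"
require "uri"

# Requests every URL using a small pool of threads and returns a hash of
# url => HTTP status code (or the exception raised while fetching it).
# `fetcher` is injectable so the sketch can be tried without a network.
def warm_urls(urls, concurrency: 4,
              fetcher: ->(url) { Net::HTTP.get_response(URI(url)).code.to_i })
  queue = Queue.new
  urls.each { |url| queue << url }
  results = Queue.new

  workers = Array.new(concurrency) do
    Thread.new do
      loop do
        url = begin
          queue.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break
        end
        status = begin
          fetcher.call(url)
        rescue StandardError => e
          e # record the failure and keep going with the next URL
        end
        results << [url, status]
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }.to_h
end
```

In the rake task, the list of URLs could be built first (for example from the Specialization, City and Division records) and then passed to warm_urls in one call.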

I realize this question is about a year old, but I just worked out my own answer, after scouring a bunch of partial & incorrect solutions.
Hopefully this will help the next person...
Per my own utility class, which can be found here:
https://raw.githubusercontent.com/JayTeeSF/cmd_notes/master/automated_action_runner.rb
You can simply run this (per its .help method) and pre-cache your pages without tying up your own web server in the process.
class AutomatedActionRunner
class StatusObject
def initialize(is_valid, error_obj)
@is_valid = !! is_valid
@error_obj = error_obj
end
def valid?
@is_valid
end
def error
@error_obj
end
end
def self.help
puts <<-EOH
Instead of tying up the frontend of your production site with:
`curl http://your_production_site.com/some_controller/some_action/1234`
`curl http://your_production_site.com/some_controller/some_action/4567`
Try:
`rails r 'AutomatedActionRunner.run(SomeController, "some_action", [{id: "1234"}, {id: "4567"}])'`
EOH
end
def self.common_env
{"rack.input" => "", "SCRIPT_NAME" => "", "HTTP_HOST" => "localhost:3000" }
end
REQUEST_ENV = common_env.freeze
def self.run(controller, controller_action, params_ary=[], user_obj=nil)
success_objects = []
error_objects = []
autorunner = new(controller, controller_action, user_obj)
Rails.logger.warn %Q|[AutomatedAction Kickoff]: Preheating cache for #{params_ary.size} #{autorunner.controller.name}##{controller_action} pages.|
params_ary.each do |params_hash|
status = autorunner.run(params_hash)
if status.valid?
success_objects << params_hash
else
error_objects << status.error
end
end
return process_results(success_objects, error_objects, user_obj.try(:id), autorunner.controller.name, controller_action)
end
def self.process_results(success_objects=[], error_objects=[], user_id, controller_name, controller_action)
message = %Q|AutomatedAction Summary|
backtrace = (error_objects.first.try(:backtrace)||[]).join("\n\t").inspect
num_errors = error_objects.size
num_successes = success_objects.size
log_message = %Q|[#{message}]: Generated #{num_successes} #{controller_name}##{controller_action}, pages; Failed #{num_errors} times; 1st Fail: #{backtrace}|
Rails.logger.warn log_message
# all the local-variables above, are because I typically call Sentry or something with extra parameters!
end
attr_reader :controller
def initialize(controller, controller_action, user_obj)
@controller = controller
@controller = controller.constantize unless controller.respond_to?(:name)
@controller_instance = @controller.new
@controller_action = controller_action
@env_obj = REQUEST_ENV.dup
@user_obj = user_obj
end
def run(params_hash)
Rails.logger.warn %Q|[AutomatedAction]: #{@controller.name}##{@controller_action}(#{params_hash.inspect})|
extend_with_autorun unless @controller_instance.respond_to?(:autorun)
@controller_instance.autorun(@controller_action, params_hash, @env_obj, @user_obj)
end
private
def extend_with_autorun
def @controller_instance.autorun(action_name, action_params, action_env, current_user_value=nil)
self.params = action_params # suppress strong parameters exception
self.request = ActionDispatch::Request.new(action_env)
self.response = ActionDispatch::Response.new
define_singleton_method(:current_user, -> { current_user_value })
send(action_name) # do it
return StatusObject.new(true, nil)
rescue Exception => e
return StatusObject.new(false, e)
end
end
end

Related

How to test Stripe's invoice_pdf property when it keeps changing?

In my Rails 6 app I have a very simple controller that displays download links to a user's Stripe invoice PDFs:
class ReceiptsController < ApplicationController
before_action :signed_in_user
def index
receipts = current_account.receipts
end
def show
receipt = current_account.receipts.find(params[:id])
stripe_invoice = Stripe::Invoice.retrieve(receipt.stripe_invoice_id)
redirect_to stripe_invoice.invoice_pdf
end
end
Since Stripe doesn't provide permanent invoice URLs (please correct me if I am wrong), I am storing each invoice's Stripe ID in the database and then use that ID to lookup the current URL to the invoice PDF from the Stripe API.
The problem is that this works most of the time but not all the time. The spec that I created for the controller show action fails in about 20 % of cases because the two URLs do not match:
describe ReceiptsController, :type => :controller do
before :each do
@account = FactoryBot.create(:activated_account)
@user = @account.users.create(FactoryBot.attributes_for(:user))
sign_in(@user)
end
describe 'GET #show' do
# The implementation details of this block don't really matter
before :each do
Customers::FindOrCreate.call(@account)
stripe_subscription = Subscriptions::CreateRemote.call(@account,:payment_behavior => "default_incomplete")
@stripe_invoice = stripe_subscription.latest_invoice
@receipt = Receipts::Create.call(@stripe_invoice)
end
# This test fails in about 20 % of cases because the redirect does not go to @stripe_invoice.invoice_pdf but a slightly different URL
it "redirects to Stripe invoice PDF" do
get :show, :params => {:id => @receipt}
expect(response).to redirect_to @stripe_invoice.invoice_pdf
end
end
end
How can this be? Does the invoice_pdf property of a Stripe invoice change every few seconds? I've been trying to work this out for days now but can't get my head around it.
Addition:
This is a typical test failure that I get quite often:
Expected response to be a redirect to <https://pay.stripe.com/invoice/acct_105jfm2HzYSlmhv7/test_YWNjdF8xMDJqc20yS3pZUmxzaHc0LF9NMGZONnFzNUpPTjlObVprd0hvdGpIdWFUamJHTTVxLDQ3Njc3MDY30200oOxX3A1/pdf?s=ap> but was a redirect to <https://pay.stripe.com/invoice/acct_105jfm2HzYSlmhv7/test_YWNjdF8xMDJqc20yS3pZUmxzaHc0LF9NMGZONnFzNUpPTjlObVprd0hvdGpIdWFUamJHTTVxLDQ3Njc3MDY402001iYCSUbn/pdf?s=ap>.
Expected "https://pay.stripe.com/invoice/acct_105jfm2HzYSlmhv7/test_YWNjdF8xMDJqc20yS3pZUmxzaHc0LF9NMGZONnFzNUpPTjlObVprd0hvdGpIdWFUamJHTTVxLDQ3Njc3MDY30200oOxX3A1F/pdf?s=ap" to be === "https://pay.stripe.com/invoice/acct_105jfm2HzYSlmhv7/test_YWNjdF8xMDJqc20yS3pZUmxzaHc0LF9NMGZONnFzNUpPTjlObVprd0hvdGpIdWFUamJHTTVxLDQ3Njc3MDY402001iYCSUbn/pdf?s=ap".
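In the failure output above, the two URLs share the same host and account segment and differ only near the end of the long token. I don't know why the token changes, so this is only a sketch of one way to make the spec less brittle under the assumption that the trailing part is not stable between two retrievals: compare the stable prefix instead of the full URL. The helper name stable_invoice_prefix and the shortened example URLs are hypothetical.

```ruby
require "uri"

# Hypothetical helper: reduces an invoice PDF URL to the parts that were
# identical in the failure output (host, "invoice", account id).
def stable_invoice_prefix(url)
  uri = URI(url)
  segments = uri.path.split("/").reject(&:empty?)
  [uri.host, *segments.first(2)]
end

# Shortened, made-up URLs that differ only in the token segment.
first  = "https://pay.stripe.com/invoice/acct_123/test_AAA111/pdf?s=ap"
second = "https://pay.stripe.com/invoice/acct_123/test_AAA222/pdf?s=ap"

puts stable_invoice_prefix(first) == stable_invoice_prefix(second)
```

In the spec this would mean comparing stable_invoice_prefix(response.location) with stable_invoice_prefix(@stripe_invoice.invoice_pdf) instead of using redirect_to with the full URL. The trade-off is a weaker assertion: it no longer proves the redirect points at this exact invoice.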

Metrics/AbcSize Too High: How do I decrease the ABC in this method?

I have recently started using Rubocop to "standardise" my code, and it has helped me optimise a lot of my code, as well as help me learn a lot of Ruby "tricks". I understand that I should use my own judgement and disable Cops where necessary, but I have found myself quite stuck with the below code:
def index
if params[:filters].present?
if params[:filters][:deleted].blank? || params[:filters][:deleted] == "false"
# if owned is true, then we don't need to filter by admin
params[:filters][:admin] = nil if params[:filters][:admin].present? && params[:filters][:owned] == "true"
# if admin is true, then must not filter by owned if false
params[:filters][:owned] = nil if params[:filters][:owned].present? && params[:filters][:admin] == "false"
companies_list =
case params[:filters][:admin]&.to_b
when true
current_user.admin_companies
when false
current_user.non_admin_companies
end
if params[:filters][:owned].present?
companies_list ||= current_user.companies
if params[:filters][:owned].to_b
companies_list = companies_list.where(owner: current_user)
else
companies_list = companies_list.where.not(owner: current_user)
end
end
else
# Filters for deleted companies
companies_list = {}
end
end
companies_list ||= current_user.companies
response = { data: companies_list.alphabetical.as_json(current_user: current_user) }
json_response(response)
end
Among others, the error that I'm getting is the following:
C: Metrics/AbcSize: Assignment Branch Condition size for index is too high. [<13, 57, 16> 60.61/15]
I understand the maths behind it, but I don't know how to simplify this code to achieve the same result.
Could someone please give me some guidance on this?
Thanks in advance.
Well first and foremost, is this code fully tested, including all the myriad conditions? It's so complex that refactoring will surely be disastrous unless the test suite is rigorous. So, write a comprehensive test suite if you don't already have one. If there's already a test suite, make sure it tests all the conditions.
Second, apply the "fat model skinny controller" paradigm. So move all the complexity into a model, let's call it CompanyFilter
def index
companies_list = CompanyFilter.new(current_user, params).list
response = { data: companies_list.alphabetical.as_json(current_user: current_user) }
json_response(response)
end
and move all those if/then/else statements into the CompanyFilter#list method.
Tests still pass? Great. You'll still get the Rubocop warnings, but now they relate to the CompanyFilter class.
Now you need to untangle all the conditions. It's a bit hard for me to understand what's going on, but it looks as if it should be reducible to a single case statement, with 5 possible outcomes. So the CompanyFilter class might look something like this:
class CompanyFilter
attr_accessor :current_user, :params
def initialize(current_user, params)
@current_user = current_user
@params = params
end
def list
case
when no_filter_specified
{}
when user_is_admin
@current_user.admin_companies
when user_is_owned
# etc
when # other condition
# etc
end
end
private
def no_filter_specified
@params[:filters].blank?
end
end
def user_is_admin
# returns boolean based on params hash
end
def user_is_owned
# returns boolean based on params hash
end
end
Tests still passing? Perfect! [Edit] Now you can move most of your controller tests into a model test for the CompanyFilter class.
Finally I would define all the different companies_list queries as scopes on the Company model, e.g.
class Company < ApplicationRecord
# some examples, I don't know what's appropriate in this app
scope :for_user, ->(user){ where("...") }
scope :administered_by, ->(user){ where("...") }
end
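One way to start untangling the conditions is to pull the parameter clean-up out into a small pure function that can be tested on its own. The sketch below is my reading of the two "nil out" rules in the question's index action; the names to_bool and normalize_filters are hypothetical, and to_bool stands in for the to_b call used in the question.

```ruby
# Hypothetical stand-in for the question's `to_b`:
# "true" -> true, "false" -> false, anything else -> nil.
def to_bool(value)
  case value.to_s.downcase
  when "true"  then true
  when "false" then false
  end
end

# Applies the two rules from the question:
# - if owned is true, the admin filter is ignored
# - if admin is false, the owned filter is ignored
def normalize_filters(filters)
  admin = to_bool(filters[:admin])
  owned = to_bool(filters[:owned])
  admin = nil if owned == true
  owned = nil if admin == false
  { admin: admin, owned: owned }
end
```

With the filters normalized up front, the list method only has to branch on three-valued admin and owned flags, which removes most of the repeated params[:filters][...] lookups that inflate the ABC score.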
When composing database scopes ActiveRecord::SpawnMethods#merge is your friend.
Post.where(title: 'How to use .merge')
.merge(Post.where(published: true))
While it doesn't look like much, it lets you programmatically compose scopes without over-relying on mutating assignment and if/else trees. You can, for example, compose an array of conditions and merge them together into a single ActiveRecord::Relation object with Array#reduce:
[Post.where(title: 'foo'), Post.where(author: 'bar')].reduce(&:merge)
# => SELECT "posts".* FROM "posts" WHERE "posts"."title" = $1 AND "posts"."author" = $2 LIMIT $3
So let's combine that with a skinny-controller approach where you handle filtering in a separate object:
class ApplicationFilter
include ActiveModel::Attributes
include ActiveModel::AttributeAssignment
attr_accessor :user
def initialize(**attributes)
super()
assign_attributes(attributes)
end
# A convenience method to both instantiate and apply the filters
def self.call(user, params, scope: model_class.all)
return scope unless params[:filters].present?
scope.merge(
new(
permit_params(params).merge(user: user)
).to_scope
)
end
def to_scope
self.class.filters.map { |filter| apply_filter(filter) } # filters is a class method
.compact
.select {|f| f.respond_to?(:merge) }
.reduce(&:merge)
end
private
# calls a filter_by_foo method if present or
# defaults to where(key => value)
def apply_filter(attribute)
if respond_to?("filter_by_#{attribute}", true) # true: also find private methods
send("filter_by_#{attribute}")
else
self.class.model_class.where(
attribute => send(attribute)
)
end
end
# Convention over Configuration is sexy.
def self.model_class
name.chomp("Filter").constantize
end
# filters the incoming params hash based on the attributes of this filter class
def self.permit_params(params)
params.permit(filters).reject{ |k,v| v.blank? }
end
# provided for modularity
def self.filters
attribute_names
end
end
This uses some of the goodness provided by Rails to set up objects with attributes that will dynamically handle filtering. It looks at the list of attributes you have declared, slices those off the params, and applies a dedicated method for each filter if one is present.
We can then write a concrete implementation:
class CompanyFilter < ApplicationFilter
attribute :admin, :boolean, default: false
attribute :owned, :boolean
private
def filter_by_admin
if admin
user.admin_companies
else
user.non_admin_companies
end
end
# this should be refactored to use an association on User
def filter_by_owned
case owned
when nil
nil
when true
Company.where(owner: user)
when false
Company.where.not(owner: user)
end
end
end
And you can call it with:
# scope is optional
@companies = CompanyFilter.call(current_user, params, scope: current_user.companies)
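The "call filter_by_foo if present, otherwise fall back to a default" convention can be tried out in plain Ruby without Rails. The sketch below uses made-up classes (BaseFilter, DemoFilter) and returns strings and hashes instead of ActiveRecord relations. One detail worth noting: respond_to? ignores private methods unless its second argument is true, which matters when the filter_by_ methods are declared under private.

```ruby
# Plain-Ruby stand-in for the dispatch convention; not the Rails version.
class BaseFilter
  def initialize(values)
    @values = values
  end

  # Applies every filter in turn and drops the ones that returned nil.
  def apply
    @values.keys.map { |name| apply_filter(name) }.compact
  end

  private

  # Calls filter_by_<name> when the subclass defines it (even privately),
  # otherwise falls back to a simple name => value condition.
  def apply_filter(name)
    handler = "filter_by_#{name}"
    if respond_to?(handler, true)
      send(handler)
    else
      { name => @values[name] }
    end
  end
end

class DemoFilter < BaseFilter
  private

  def filter_by_owned
    @values[:owned] ? "owner = me" : "owner != me"
  end
end

p DemoFilter.new(owned: true, name: "Acme").apply
```

Here owned is handled by the custom private method while name falls through to the default condition.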

Setting up rom-http relation for REST CRUD

I'm trying to set up a rom-http relation for basic REST CRUD, but I find the documentation to be pretty scarce for a beginner, and a little too complex when digging in. What I've tried so far is this:
rom = ROM.container(:http, uri: 'http://localhost:8000', handlers: :json) do |conf|
conf.relation(:users) do
schema(:users) do
end
end
end
This queries the URI http://localhost:8000/users, but how do I configure prefixes, parameters and related resources?
What I'd like to accomplish is being able to consume a URI such as http://localhost:8000/users/1/posts?start=0&size=10 where we have
a global prefix (api)
a version prefix (v1, could be part of the global prefix)
a parent resource (users/1)
a child resource (posts)
query parameters (bonus points if they can be chained like .offset(0).limit(10))
Is this possible with the current implementation? The documentation could use a deeper example, without forcing newcomers to dig into the architecture - which is without doubt brilliant, but complex for someone coming from the ease of use (and the pitfalls) of ActiveRecord. :-)
Sorry that nobody has replied to this yet. The current built-in JSON handler is a bit broken at the moment: it builds the URI manually when it should just take it from the dataset. You can achieve what you want with something like the following:
NOTE: I only called .dataset.uri to show an example of the URI that would be queried, as I don't have a compatible API running locally.
NOTE: For anything beyond playing around with the library, you'll probably want to use a custom adapter anyway.
require 'bundler/inline'
require 'json'
require 'net/http'
gemfile(true) do
source 'https://rubygems.org'
gem 'rom'
gem 'rom-http'
end
class MyJSONRequest
def self.call(dataset)
uri = dataset.uri
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true if uri.scheme.eql?('https')
request_class = Net::HTTP.const_get(ROM::Inflector.classify(dataset.request_method))
request = request_class.new(uri.request_uri)
dataset.headers.each_with_object(request) do |(header, value), request|
request[header.to_s] = value
end
http.request(request)
end
end
class MyJSONResponse
# Handle JSON responses
#
# @param [Net::HTTP::Response] response
# @param [Dataset] dataset
#
# @return [Array<Hash>]
#
# @api public
def self.call(response, dataset)
Array([JSON.parse(response.body, symbolize_names: true)]).flatten(1)
end
end
ROM::HTTP::Handlers.register(
:my_json,
request: MyJSONRequest,
response: MyJSONResponse
)
rom = ROM.container(:http, uri: 'http://localhost:8000/api', handlers: :my_json) do |conf|
conf.relation(:users) do
schema('v1/users') do
attribute :id, ROM::Types::Integer.meta(
primary_key: true
)
attribute :name, ROM::Types::String
end
def by_id(id)
append_path(id)
end
def offset(offset)
add_params(start: offset)
end
def limit(limit)
add_params(size: limit)
end
end
conf.relation(:posts) do
schema('v1/posts') do
attribute :id, ROM::Types::Integer.meta(
primary_key: true
)
attribute :name, ROM::Types::String
end
def by_user(user_id)
with_options(
base_path: 'v1/users',
path: "#{user_id}/posts"
)
end
end
end
users = rom.relations[:users]
posts = rom.relations[:posts]
users.offset(0).limit(10).dataset.uri
# => #<URI::HTTP http://localhost:8000/api/v1/users?start=0&size=10>
posts.by_user(1).dataset.uri
# => #<URI::HTTP http://localhost:8000/api/v1/users/1/posts>
Also, for nested resources, ROM can query those automatically, check the (out-dated) example sections below to see how that works.
https://github.com/rom-rb/rom-http/blob/57ca3703bf82bc9d7b2c3304752947de2c6d6dea/examples/repository_with_combine.rb#L49-L51
https://github.com/rom-rb/rom-http/blob/57ca3703bf82bc9d7b2c3304752947de2c6d6dea/examples/repository_with_combine.rb#L66-L71
https://github.com/rom-rb/rom-http/blob/57ca3703bf82bc9d7b2c3304752947de2c6d6dea/examples/repository_with_combine.rb#L81-L83
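To see why methods like offset and limit can be chained, it may help to look at the idea in isolation. The class below is a hypothetical stand-in written with only the standard library. It does not use or reproduce rom-http's internals; it just shows that when each call returns a new object with one more path segment or parameter, the calls compose in any order.

```ruby
require "uri"

# Hypothetical, library-independent sketch of chainable request building.
class RequestSketch
  attr_reader :base, :path, :params

  def initialize(base, path = "", params = {})
    @base = base
    @path = path
    @params = params
  end

  # Returns a copy with one more path segment.
  def append_path(segment)
    new_path = [path, segment.to_s].reject(&:empty?).join("/")
    self.class.new(base, new_path, params)
  end

  # Returns a copy with extra query parameters merged in.
  def add_params(extra)
    self.class.new(base, path, params.merge(extra))
  end

  # Builds the final URI from base, path and query parameters.
  def uri
    root = base.end_with?("/") ? base : "#{base}/"
    result = URI.join(root, path)
    result.query = URI.encode_www_form(params) unless params.empty?
    result
  end
end

users = RequestSketch.new("http://localhost:8000/api", "v1/users")
puts users.add_params(start: 0).add_params(size: 10).uri
puts users.append_path(1).append_path("posts").uri
```

Because nothing is mutated, the users object can be reused as the starting point for several different requests.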

Sidekiq mechanize overwritten instance

I am building a simple web spider using Sidekiq and Mechanize.
When I run this for one domain, it works fine. When I run it for multiple domains, it fails. I believe the reason is that web_page gets overwritten when instantiated by another Sidekiq worker, but I am not sure if that's true or how to fix it.
# my scrape_search controller's create action searches on google.
def create
@scrape = ScrapeSearch.build(keywords: params[:keywords], profession: params[:profession])
agent = Mechanize.new
scrape_search = agent.get('http://google.com/') do |page|
search_result = page.form...
search_result.css("h3.r").map do |link|
result = link.at_css('a')['href'] # Narrowing down to real search results
@domain = Domain.new(some params)
ScrapeDomainWorker.perform_async(@domain.url, @domain.id, remaining_keywords)
end
end
end
I'm creating a Sidekiq job per domain. Most of the domains I'm looking for should contain just a few pages, so there's no need for sub-jobs per page.
This is my worker:
class ScrapeDomainWorker
include Sidekiq::Worker
...
def perform(domain_url, domain_id, keywords)
@domain = Domain.find(domain_id)
@domain_link = @domain.protocol + '://' + domain_url
@keywords = keywords
# First we scrape the homepage and get the first links
@domain.to_parse = ['/'] # to_parse is an array of PATHS to parse for the domain
mechanize_path('/')
@domain.verified << '/' # verified is an Array field containing valid domain paths
get_paths(@web_page) # Now we should have to_scrape populated with homepage links
@domain.scraped = 1 # Loop counter
while @domain.scraped < 100
@domain.to_parse.each do |path|
@domain.to_parse.delete(path)
@domain.scraped += 1
mechanize_path(path) # We create a Nokogiri HTML doc with mechanize for the valid path
...
get_paths(@web_page) # Fire this to repopulate to_scrape !!!
end
end
@domain.save
end
def mechanize_path(path)
agent = Mechanize.new
begin
@web_page = agent.get(@domain_link + path)
rescue Exception => e
puts "Mechanize Exception for #{path} :: #{e.message}"
end
end
def get_paths(web_page)
paths = web_page.links.map {|link| link.href.gsub((@domain.protocol + '://' + @domain.url), "") } ## This works when I scrape a single domain, but fails with ".gsub for nil" when I scrape a few domains.
paths.uniq.each do |path|
@domain.to_parse << path
end
end
end
This works when I scrape a single domain, but fails with .gsub for nil for web_page when I scrape a few domains.
You can wrap your code in another class, and then create an object of that class within your worker:
class ScrapeDomainWrapper
def initialize(domain_url, domain_id, keywords)
# ...
end
def mechanize_path(path)
# ...
end
def get_paths(web_page)
# ...
end
end
And your worker:
class ScrapeDomainWorker
include Sidekiq::Worker
def perform(domain_url, domain_id, keywords)
ScrapeDomainWrapper.new(domain_url, domain_id, keywords)
end
end
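Also, bear in mind that Mechanize::Page::Link#href may be nil (for example, for an anchor tag without an href attribute), which would produce exactly this gsub-on-nil error.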
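The ".gsub for nil" error in get_paths is consistent with some link having no href, since gsub is called directly on link.href. As a small sketch (the helper name relative_paths is hypothetical), the path extraction can be made nil-safe by dropping missing hrefs before stripping the domain prefix:

```ruby
# Hypothetical helper: turns a list of hrefs (some possibly nil) into
# unique paths relative to the given domain root.
def relative_paths(hrefs, domain_root)
  hrefs
    .compact                                  # skip links without an href
    .map { |href| href.sub(domain_root, "") } # strip the domain prefix once
    .uniq
end

p relative_paths(["http://example.com/a", nil, "/b", "http://example.com/a"],
                 "http://example.com")
```

In the worker, the input would be web_page.links.map(&:href), and it is still worth checking that web_page itself is not nil, because mechanize_path rescues exceptions and may leave it unset.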

DELAYED_JOB Update_attribute in delayed job is not working, table not updated

I have been trying to use Delayed::Job to fetch fbLikes (I have posted about this several times on Stack Overflow but still haven't solved the problem). My database table has the columns name, id, fbLikes, fbId and url.
Here are the steps of my program:
[Home page] company list -> Create a company [insert the company info] -> Save fbId, name, id and url, BUT NOT fbLikes -> redirect to home page [after_save: update fbLikes for the company just added]
I am not sure whether my delayed job is working, because fbLikes in my model is still blank and has not been updated with the latest value. I am also not sure whether there is a better way to do this.
When I run "rake jobs:work", no background work is displayed in the console.
[MODEL company.rb]
require "delayed_job"
require "count_job.rb"
after_save :fb_likes
def fb_likes
Delayed::Job.enqueue(CountJob.new(self.id))
end
[lib/count_job.rb]
require 'net/http'
require 'company.rb'
class CountJob < Struct.new(:id)
def perform
@company = Company.find(id)
uri = URI("http://graph.facebook.com/#{@company.fbId}")
data = Net::HTTP.get(uri)
@company.fbLikes= JSON.parse(data)['likes']
@company.save!
end
end
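One thing that can be checked independently of Delayed::Job is what the job does with the HTTP response. If the response body is not valid JSON or has no likes key, fbLikes would stay blank even when the job runs. This is only one possible cause and the question does not confirm it. The sketch below (extract_likes is a hypothetical helper) makes that case explicit so it can be logged instead of silently saving nil.

```ruby
require "json"

# Hypothetical helper: returns the integer like count from a response body,
# or nil when the body is not JSON or does not contain an integer "likes".
def extract_likes(body)
  data = JSON.parse(body)
  likes = data.is_a?(Hash) ? data["likes"] : nil
  likes.is_a?(Integer) ? likes : nil
rescue JSON::ParserError
  nil
end

p extract_likes('{"id": "1", "likes": 42}')
p extract_likes('{"error": {"message": "unsupported request"}}')
```

Inside perform, the job could call extract_likes(data) and only assign and save when the result is not nil.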
