Sorry for such a long question. I do not have much experience with threads and mutexes in Rails.
I have a class, shown below, that different controllers use to get the license for each customer.
Customers and their licenses are added and removed every hour. An API is available to get all customers and their licenses.
I plan to create a rake task that calls update_set_customers_licenses and runs hourly via a cron job.
I have the following questions:
1) Even with a mutex, there is still a potential problem: my rake task could run while an update is in progress. Any idea how to solve this?
2) My design below writes the JSON out to a file. This is done for safety, as the API is not that reliable. As can be seen, the file is never read back, so in essence the file write is useless. I tried to implement a file read, but together with the mutex and the rake task it gets really confusing. Any pointers would help here.
class Customer
  @@customers_to_licenses_hash = nil
  @@last_updated_at = nil
  @@mutex = Mutex.new
  CUSTOMERS_LICENSES_FILE = "#{Rails.root}/tmp/customers_licenses"
  def self.cached_license_with_customer(customer)
    Rails.cache.fetch('customer') {self.license_with_customer(customer)}
  end
  def self.license_with_customer(customer)
    @@mutex.synchronize do
      license = @@customers_to_licenses_hash[customer]
      if license
        return license
      elsif(@@customers_to_licenses_hash.nil? || Time.now.utc - @@last_updated_at > 1.hours)
        updated = self.update_set_customers_licenses
        return @@customers_to_licenses_hash[customer] if updated
      else
        return nil
      end
    end
  end
  def self.update_set_customers_licenses
    updated = nil
    file_write = File.open(CUSTOMERS_LICENSES_FILE, 'w')
    results = self.get_active_customers_licenses
    if results
      @@customers_to_licenses_hash = results
      file_write.print(results.to_json)
      @@last_updated_at = Time.now.utc
      updated = true
    end
    file_write.close
    updated
  end
  def self.get_active_customers_licenses
    #http get thru api
    #return hash of records
  end
end
I'm pretty sure that every time Rails loads, the environment is "fresh" and shares no state with other instances. That is to say, a mutex in one Ruby process (the one serving a request) has no effect on a second Ruby process (another server process or, in this case, a rake task).
If you follow the data upstream, you'll find that the common root of every instance that can be used to synchronize them is the database. You could use transactional blocks or maybe a manual flag you set and unset in the database.
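The answer above points at the database as the shared resource. As a different, lighter-weight illustration of the same idea (a lock that lives outside any single Ruby process), here is a minimal sketch using an advisory file lock via File#flock, which also reads the JSON file back. The file path and the helper names with_cross_process_lock, store_licenses and load_licenses are assumptions made for this example, not part of the original code.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical path; the question uses "#{Rails.root}/tmp/customers_licenses".
LICENSES_FILE = File.join(Dir.tmpdir, 'customers_licenses.json')
LOCK_FILE = "#{LICENSES_FILE}.lock"

# Run the block while holding an exclusive lock that every other process
# (a rake task, another server worker) must also acquire.
def with_cross_process_lock
  File.open(LOCK_FILE, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX) # blocks until no other process holds the lock
    yield
  end # closing the file releases the lock
end

# Writer side (e.g. the hourly rake task): write to a temp file, then rename,
# so a reader never sees a half-written file.
def store_licenses(licenses)
  with_cross_process_lock do
    tmp = "#{LICENSES_FILE}.#{Process.pid}.tmp"
    File.write(tmp, JSON.generate(licenses))
    File.rename(tmp, LICENSES_FILE)
  end
end

# Reader side (e.g. a controller): returns {} if nothing has been stored yet.
def load_licenses
  with_cross_process_lock do
    File.exist?(LICENSES_FILE) ? JSON.parse(File.read(LICENSES_FILE)) : {}
  end
end
```

With something like this, the rake task would only call store_licenses when the API call succeeds, so a failed API call leaves the previous file untouched. Note that flock is advisory: it only protects against processes that take the same lock.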
I am trying to interact with Matlab.Application.Single win32ole objects in my Rails application. The problem I am running into is that while I am developing my application, each separate request reloads my win32ole objects, so I lose the connection to my original Matlab instances and new instances are created. Is there a way to persist live objects between requests in Rails? Or is there a way to reconnect to my Matlab.Application.Single instances?
In production mode I use module variables to store my connection between requests, but in development mode module variables are reloaded on every request.
Here is a snippet of my code:
require 'win32ole'
module Calculator
  @engine2 = nil
  @engine3 = nil
  def self.engine2
    if @engine2.nil?
      @engine2 = WIN32OLE.new("Matlab.Application.Single")
      @engine2.execute("run('setup_path.m')")
    end
    @engine2
  end
  def self.engine3
    if @engine3.nil?
      @engine3 = WIN32OLE.new("Matlab.Application.Single")
      @engine3.execute("run('setup_path.m')")
    end
    @engine3
  end
  def self.load_CT_image(file)
    Calculator.engine2.execute("spm_image('Init','#{file}')")
  end
  def self.load_MR_image(file)
    Calculator.engine3.execute("spm_image('Init','#{file}')")
  end
end
I am then able to use my code in my controllers like this:
Calculator.load_CT_image('Post_Incident_CT.hdr')
Calculator.load_MR_image('Post_Incident_MRI.hdr')
You can keep an app-wide object in a constant that won't be reset for every request. Add this to a new file in config/initializers/:
ENGINE_2 = WIN32OLE.new("Matlab.Application.Single")
You might also need to include the .execute("run('setup_path.m')") line here as well (I'm not familiar with WIN32OLE). You can then assign that object to the instance variables in your Calculator module (just replace the WIN32OLE.new("Matlab.Application.Single") call with ENGINE_2), or simply refer to the constants directly.
I know this is beyond the scope of your question, but you have a lot of duplicated code here, and you might want to think about creating a class or module to manage your Matlab instances -- spinning up new ones as needed, and shutting down old ones that are no longer in use.
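As a rough sketch of what such a managing class could look like: the EngineRegistry name and its fetch/release methods are hypothetical, and the engine is created through an injected factory so the class itself does not depend on WIN32OLE. Only the setup command is taken from the question.

```ruby
# Hypothetical registry: creates each named engine once and reuses it.
class EngineRegistry
  # factory: a block returning a new engine, e.g.
  #   EngineRegistry.new { WIN32OLE.new("Matlab.Application.Single") }
  def initialize(&factory)
    @factory = factory
    @engines = {}
    @mutex = Mutex.new
  end

  # Return the engine registered under `name`, creating and
  # initialising it on first use.
  def fetch(name)
    @mutex.synchronize do
      @engines[name] ||= @factory.call.tap do |engine|
        engine.execute("run('setup_path.m')")
      end
    end
  end

  # Forget the engine stored under `name`. How to actually stop the Matlab
  # instance is left to the caller's block (an assumption of this sketch).
  def release(name)
    engine = @mutex.synchronize { @engines.delete(name) }
    yield engine if engine && block_given?
    engine
  end
end
```

The two duplicated accessors would then collapse into calls such as registry.fetch(:ct) and registry.fetch(:mr), with the registry itself held in a constant created in config/initializers/ so it survives development-mode reloading.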
I have a running environment with a Rails application, Sidekiq and clockwork mod for scheduling purposes.
I have many different workers, filled with logger.debug and logger.info instructions, and I occasionally need to activate debug logging on some of them to know what's going on.
I like the Sidekiq logger, and I would like to use it because it just needs a "logger.debug" instruction in the workers to do its job.
What I miss with my current setup is the possibility to activate the DEBUG level for some workers, while leaving the others in standard INFO.
Now in each of my workers I have this initialize method:
class SendMailOnStart
  include Sidekiq::Worker
  sidekiq_options :retry => false, :queue => :critical
  def initialize
    logger.level = Logger::INFO
  end
  .... ...
But if I change the level in one worker, it will be overwritten by the level specified in the next one; e.g. if two workers are processed together, the second one will "win".
What's an elegant way to achieve this?
Coming from the Java world, I can only think of creating a custom logger and putting it in each worker, copying the output format used by the Sidekiq logger, and adding a logger method to each worker like
def logger
  logger = MyLogger.new
end
and changing the level when I need it in the initialize method.
Is this the best approach in Ruby?
I had a similar question and I found this thread more useful:
Log to different logger based on call from Sidekiq or Rails
You should be able to set the log level for Sidekiq workers specifically in the block mentioned there by altering Rails.logger.
I have no clue if that’s the best approach, but I would do the following. First of all, let’s prepare the function to retrieve the caller’s filename and/or method:
def parse_caller
  # magic number 7 below is the amount of calls
  # on stack to unwind to get your caller
  if /^(?<file>.+?):(?<line>\d+)(?::in `(?<method>.*)')?/ =~ caller(7).first
    file = Regexp.last_match[:file]
    line = Regexp.last_match[:line].to_i
    method = Regexp.last_match[:method]
    [file, line, method]
  end
end
Then I would override the default formatter of Logger instance, compelling it to check the caller:
logger.formatter = lambda do |severity, datetime, progname, msg|
  f,l,m = parse_caller
  # HERE GOES YOUR CHECK
  if f =~ /…/
    …
  end
end
I know it looks like a weird hack, but it works fine for me.
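An alternative closer to the custom-logger idea from the question is to give every worker class its own Logger instance, so that levels no longer overwrite each other. The sketch below uses only Ruby's standard Logger; the PerWorkerLogger module and the worker_logger method are hypothetical names, and it does not reproduce Sidekiq's output format.

```ruby
require 'logger'

# Hypothetical mixin: each including class gets its own Logger instance,
# so changing the level for one worker does not affect the others.
module PerWorkerLogger
  def self.included(base)
    logger = Logger.new($stdout)
    logger.progname = base.name   # tag each line with the worker class name
    logger.level = Logger::INFO   # default level for every worker
    base.instance_variable_set(:@worker_logger, logger)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_reader :worker_logger
  end

  # Instance-level accessor, so worker code keeps calling logger.debug etc.
  def logger
    self.class.worker_logger
  end
end
```

A worker would include this module after Sidekiq::Worker so that this logger method is the one found first, and a single worker can then be switched with, for example, SendMailOnStart.worker_logger.level = Logger::DEBUG.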
I just switched to using Sidekiq on Heroku but I'm getting the following after my jobs run for a while:
2012-12-11T09:53:07+00:00 heroku[worker.1]: Process running mem=1037M(202.6%)
2012-12-11T09:53:07+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-12-11T09:53:28+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-12-11T09:53:28+00:00 heroku[worker.1]: Process running mem=1044M(203.9%)
It keeps growing like that.
For these jobs I'm using Nokogiri and HTTParty to retrieve URLs and parse them. I've tried changing some code but I'm not actually sure what I'm looking for in the first place. How should I go about debugging this?
I tried adding New Relic to my app but unfortunately that doesn't support Sidekiq yet.
Also, after Googling I'm trying to switch to a SAX parser and see if that works but I'm getting stuck. This is what I've done so far:
class LinkParser < Nokogiri::XML::SAX::Document
  def start_element(name, attrs = [])
    if name == 'a'
      puts Hash[attrs]['href']
    end
  end
end
Then I try something like:
page = HTTParty.get("http://site.com")
parser = Nokogiri::XML::SAX::Parser.new(LinkParser.new)
Then I tried using the following methods with the data I retrieved using HTTParty, but haven't been able to get any of these methods to work correctly:
parser.parse(File.read(ARGV[0], 'rb'))
parser.parse_file(filename, encoding = 'UTF-8')
parser.parse_memory(data, encoding = 'UTF-8')
Update
I discovered that the parser wasn't working because I was calling parser.parse(page) instead of parser.parse(page.body). However, when I print out all the HTML tags for various websites using the above script, it prints all the tags for some sites and only a few for others.
If I use Nokogiri::HTML() instead of parser.parse() it works fine.
I was using Nokogiri::XML::SAX::Parser.new() instead of Nokogiri::HTML::SAX::Parser.new() for HTML documents and that's why I was running into trouble.
Code Update
Ok, I've got the following code working now, but can't figure out how to put the data I get into an array which I can use later on...
require 'nokogiri'
class LinkParser < Nokogiri::XML::SAX::Document
  attr_accessor :link
  def initialize
    @link = false
  end
  def start_element(name, attrs = [])
    url = Hash[attrs]
    # String#start_with? is plain Ruby, so no custom helper is needed
    if name == 'a' && url['href'] && url['href'].start_with?("http")
      @link = true
      puts url['href']
      puts url['rel']
    end
  end
  def characters(anchor)
    puts anchor if @link
  end
  def end_element(name)
    @link = false
  end
end
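One possible way to get the data into an array is to have the handler accumulate results in an instance variable instead of printing them. The sketch below is an assumption-laden variant of the class above: LinkCollector and its links reader are made-up names, and the guarded require exists only so the callbacks can be tried without the gem installed.

```ruby
begin
  require 'nokogiri'
  SAX_BASE = Nokogiri::XML::SAX::Document
rescue LoadError
  SAX_BASE = Object # fallback so the callbacks can be exercised without the gem
end

# Collects every absolute http(s) link as a hash of href, rel and anchor text.
class LinkCollector < SAX_BASE
  attr_reader :links

  def initialize
    super
    @links = []
    @current = nil # the link being read, or nil when outside an <a>
  end

  def start_element(name, attrs = [])
    attributes = attrs.to_h
    href = attributes['href']
    return unless name == 'a' && href && href.start_with?('http')
    @current = { href: href, rel: attributes['rel'], text: +'' }
  end

  # May be called several times for one text node, so append.
  def characters(string)
    @current[:text] << string if @current
  end

  def end_element(name)
    return unless name == 'a' && @current
    @links << @current
    @current = nil
  end
end

# Usage with the HTML SAX parser mentioned above:
#   collector = LinkCollector.new
#   Nokogiri::HTML::SAX::Parser.new(collector).parse(page.body)
#   collector.links  # array of hashes, usable after parsing
```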
In the end I discovered that the memory leak is due to the 'Typhoeus' gem which is a dependency for the 'PageRankr' gem that I'm using in part of my code.
I discovered this by running the code locally while monitoring memory usage with watch "ps u -C ruby", and then testing different parts of the code until I could pinpoint where the memory leak came from.
I'm marking this as the accepted answer since in the original question I didn't know how to debug memory leaks but someone told me to do the above and it worked.
Just in case you can't resolve a gem's memory leak:
You can run Sidekiq jobs inside forks, as described in the answer https://stackoverflow.com/a/1076445/3675705
Just add the application helper "do_in_child" and then, inside your worker:
def perform
  do_in_child do
    # some polluted task
  end
end
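For readers who cannot follow the link, a helper along these lines might look like the sketch below. It is an assumption about the general shape (fork, run the block in the child, send the result back over a pipe), not a copy of the linked answer, and it relies on fork being available (so not on Windows).

```ruby
# Hypothetical helper: run the block in a forked child process and return
# its (Marshal-able) result. Memory allocated by the block is returned to
# the OS when the child exits.
def do_in_child
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  pid = fork do
    reader.close
    result = yield
    writer.write(Marshal.dump(result))
    writer.close
    exit!(0) # skip at_exit hooks inherited from the parent
  end

  writer.close
  data = reader.read # read before waiting so a large result cannot deadlock
  reader.close
  Process.wait(pid)
  raise 'child process failed' unless $?.success? && !data.empty?

  Marshal.load(data)
end
```

Keep in mind that only the calling thread survives in the forked child, and shared resources such as database connections usually need to be re-established there.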
Yes, I know it's kind of a dirty solution because Sidekiq is meant to work with threads, but in my case it's the only quick fix for production, because I have slow jobs that parse big XML files with Nokogiri.
The "fast" threading feature gives no advantage here, while the memory leaks give me a 2GB+ main Sidekiq process after 10 minutes of work. After one day, Sidekiq's virtual memory grows to 11GB (all available virtual memory on my server) and all the tasks run extremely slowly.
I am getting this error when running the method below:
LoadError: Expected /home/user/Desktop/Tripurari/myapp/app/models/host.rb to define Host##
But everything is in its place. Can someone tell me what exactly the problem is with the method below?
def self.check_all(keyword)
  memo_mutex = Mutex.new
  memo = {}
  threads = []
  name = keyword.keyword
  SITES.each do |site_and_options|
    threads << Thread.new do
      @host = Host.find_or_create_by_name(site)
      if keyword.unavailable_usernames.find_by_host_id(@host.id)
        memo[@host.name] = true
      else
        memo[@host.name] = false
      end
    end
  end
  threads.each { |t| t.join }
  memo
end
The issue is probably caused by the autoloader. If the Host class is not yet loaded when you first enter the loop that creates the new threads, it is autoloaded, i.e. Rails searches the load path for a file matching the naming conventions and requires it.
This process is not thread-safe. In your case, as you are creating several threads in quick succession, each trying to autoload the same class, you get race conditions and strange things happen. Basically, you have two options for tackling this:
You can explicitly load the model by using require 'host' before starting your loop.
Or you can set config.threadsafe! in an initializer. This will (among other things) preload all your classes when starting your server. This is preferred, as it lets you avoid a truckload of other difficult-to-debug concurrency issues. For more information about config.threadsafe!, please refer to the excellent article by Aaron Patterson arguing it should be removed altogether in Rails 4.
Assuming the code you've quoted above is in a model's .rb file, add require_relative "host" to the top of that file.
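Independently of the autoloading problem, the threaded part of the method can be sketched without Rails as below. The lookup block is a hypothetical stand-in for the Host / unavailable_usernames queries. The sketch keeps per-thread data in local variables instead of a shared instance variable and guards the shared hash with the mutex that the original creates but never uses.

```ruby
# Run one lookup per site in its own thread and collect the results.
# In the Rails app, reference Host once before this loop starts so the
# class is already loaded when the threads run.
def check_all(sites, &lookup)
  memo = {}
  memo_mutex = Mutex.new

  threads = sites.map do |site|
    Thread.new do
      taken = lookup.call(site)                      # local to this thread
      memo_mutex.synchronize { memo[site] = taken }  # guard the shared hash
    end
  end

  threads.each(&:join)
  memo
end
```

Called as check_all(%w[a b c]) { |site| site == 'b' }, this returns a hash mapping each site to the block's result.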
For some time I have been using an old Ruby distribution (I think it was 1.8.6) on which I coded with the socket library. The old library had a method called ready?, which checked whether there was still data to be received without blocking. What would be the best replacement for that in 1.9?
The reason why I require it is as I have a program structured like this:
def handle_socket_messages
  while true
    break unless messages_to_send
    sent_messages
  end
  while @s and @s.ready?
    # read messages
    readStr = @s.recv(0x1024)
    ...
  end
end
I then have another loop which keeps executing the handle_socket_messages method, along with some other methods, and then sleeps so that the loop doesn't spin too fast.
As you can see, I need to check whether there is data to receive using @s.ready? (@s is a socket); otherwise the loop hangs at readStr = @s.recv(0x1024), where the socket keeps waiting for data that the server doesn't send.
What would be the best replacement for this method?
The solution was:
class Socket
  def ready
    not IO.select([self], nil, nil, 0) == nil
  end
end
I've been using the ready? method successfully in Ruby 2.2.2 by requiring io/wait. There is a bit more info in this SO answer: https://stackoverflow.com/a/3983850/2464
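For reference, the IO.select approach can also be written without reopening Socket. The sketch below uses two hypothetical helpers, readable_now? and read_available: a zero timeout makes IO.select return immediately, with nil meaning nothing is ready to read.

```ruby
require 'socket'

# True if a read on `io` would not block right now (data or EOF is pending).
def readable_now?(io)
  !IO.select([io], nil, nil, 0).nil?
end

# Drain whatever is currently available without ever blocking.
def read_available(io, chunk_size = 4096)
  data = String.new
  while readable_now?(io)
    chunk = io.recv(chunk_size)
    break if chunk.nil? || chunk.empty? # peer closed the connection
    data << chunk
  end
  data
end
```

The second loop in handle_socket_messages could then become a single call such as read_available(@s), which returns an empty string when the server has sent nothing.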