I'm trying to create a Stats counter (similar to the one in Ostrich, Twitter's Scala library), but I'm having a hard time making sure that all my threads have access to it. My Stats class is defined like this:
class Stats
  @@counters = {}

  # ... accessors ...

  def self.incr(counter, amt = 1)
    if !@@counters[counter]
      @@counters[counter] = java.util.concurrent.atomic.AtomicInteger.new
    end
    @@counters[counter].getAndAdd(amt)
  end
end
I know there are some issues with the thread-safety of the counters hash itself. If I create threads manually, they seem to be able to access Stats.counters globally. But I'm trying to create a rackup application (Sinatra, embedded in Jetty using jetty-rackup) to show this info, and in that Sinatra application the Stats are empty. Is there a good way to share this counter with other parts of the application, or is Sinatra doing something that clears out the global variable scope?
We discussed this on IRC #jruby, but just to reiterate here for posterity, my best guess is that you are encountering a situation where jetty-rackup creates and pools multiple JRuby runtimes to service requests. Each of these runtimes has the same Ruby code loaded, but they are unaware of each other, much like multiple Ruby processes. You have many options for sharing state between them:
Use a Java class with a singleton instance or static methods/fields (see the sketch after this list)
Use a purpose-built Java in-memory caching library
Use the Java Servlet session
Use an external mechanism (memcached, DB, etc.)
Many more
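A minimal JRuby sketch of the first option. SharedCounters is a hypothetical Java class you would compile onto the classpath; because its static field lives once per JVM, every pooled runtime sees the same map:

# Hypothetical Java class, compiled onto the classpath:
#
#   package com.example;
#   import java.util.concurrent.ConcurrentHashMap;
#   import java.util.concurrent.atomic.AtomicInteger;
#   public class SharedCounters {
#     public static final ConcurrentHashMap<String, AtomicInteger> MAP =
#         new ConcurrentHashMap<String, AtomicInteger>();
#   }

java_import java.util.concurrent.atomic.AtomicInteger
java_import com.example.SharedCounters

class Stats
  def self.incr(counter, amt = 1)
    # putIfAbsent also avoids the check-then-set race in the original code
    SharedCounters::MAP.put_if_absent(counter, AtomicInteger.new)
    SharedCounters::MAP.get(counter).get_and_add(amt)
  end
end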
We aren't using Rails, and it's not a web application either. We have Ruby and JRuby applications that pick messages off a queue and operate on them.
We are wondering whether there is a way to check the performance of such an application (DB timing, slow method calls, etc.) without actually starting Ruby/JRuby in profiling mode.
Currently, we are timing each DB call by diffing timestamps, like this:
start_time = Time.now
# Possible DB call.
total_time_taken = (Time.now - start_time) * 1000
I don't find this method reliable, and we also miss metrics for methods that are potentially slow.
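(Part of the unreliability is that Time.now is a wall clock, which can jump when the system clock is adjusted; a monotonic clock at least removes that, though it still doesn't cover the methods we forgot to wrap:)

start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# Possible DB call.
elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time) * 1000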
So essentially, we are looking for something similar to what New Relic and Skylight.io do for Rails applications, but for plain Ruby and JRuby application performance monitoring.
Scout or New Relic should work if you have a Rack application. Otherwise, you might be able to use Datadog.
There are other profiling gems which might be of use.
memory_profiler
ruby-prof
There are many others
https://github.com/flyerhzm/bullet
https://github.com/tmm1/stackprof
https://github.com/ankane/pghero
For JRuby you may have limited choices; some of the Rack tools or Datadog may work, but also see the JRuby profiling wiki: https://github.com/jruby/jruby/wiki/Profiling-JRuby
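For MRI, a rough sketch of scoping ruby-prof to just the suspect code path (it's a C extension, so it won't help under JRuby):

require "ruby-prof"

result = RubyProf.profile do
  process_message(message) # hypothetical: the handler for one queue message
end

# Flat report of where the time went, worst offenders first
RubyProf::FlatPrinter.new(result).print(STDOUT)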
I have a Puma server configured to use two workers, each with 16 threads, and config.threadsafe! disabled to allow threading via Puma.
Now I have some code that I suspect is not thread-safe, even though I use a Mutex (stored in a constant). I want this code to be executed by only one Puma thread at a time to avoid concurrency issues, and I use the Mutex for that.
Now, my questions are:
Does a Mutex provide thread safety across Puma threads on multiple workers? As I understand it, each worker is a separate process, so the Mutex will not work across them.
If the Mutex doesn't work as per the above, what could be the solution to make that particular code thread-safe?
Code example:
class MyService
  ...
  MUTEX = Mutex.new
  ...

  def initialize
    ...
  end

  def doTask
    MUTEX.synchronize do
      ...
    end
  end
end
The MUTEX approach didn't work for me, so I needed to find another approach. Please see the solution below.
The problem is that different Puma threads (across both workers) make requests to an external remote API at the same time, and sometimes the remote API takes a while to respond.
I wanted to restrict the total number of API requests, but it was not working because of the above issue.
To resolve this:
I created a DB table where I insert a new entry marked as in-progress when a request is sent to the external API.
Once that API responds, I update the entry to processed.
Before making any new request to the external API, I check the total number of requests still in-progress.
This way, I am able to restrict the total number of requests from my system to the external API. A sketch of this pattern follows.
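A minimal sketch of that table-based limiter, assuming ActiveRecord and a hypothetical api_requests table with a status column. The transaction plus row locks serialize concurrent workers on the check-then-insert:

class ApiRequest < ActiveRecord::Base
  MAX_IN_FLIGHT = 5 # hypothetical limit

  def self.acquire_slot
    transaction do
      # Load (and row-lock) the in-progress entries; note that when the
      # table is near-empty, a stricter guard (e.g. a DB advisory lock)
      # is needed to fully close the race between concurrent inserts.
      in_flight = where(status: "in-progress").lock.to_a.size
      in_flight < MAX_IN_FLIGHT ? create!(status: "in-progress") : nil
    end
  end
end

# Usage around the external call:
if (slot = ApiRequest.acquire_slot)
  begin
    call_external_api # hypothetical slow remote call
  ensure
    slot.update!(status: "processed")
  end
end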
I'm using Redis in my application, both for Sidekiq queues, and for model caching.
What is the best way to have a Redis connection available to my models, considering that the models that will be hitting Redis will be called both from my Web application (ran via Puma), and from background jobs inside Sidekiq?
I'm currently doing this in my initializers:
Redis.current = Redis.new(host: 'localhost', port: 6379)
And then simply use Redis.current.get / Redis.current.set (and similar) throughout the code...
This should be thread-safe, as far as I understand, since the Redis Client only runs one command at a time, using a Monitor.
Now, Sidekiq has its own connection pool to Redis, and recommends doing
Sidekiq.redis do |conn|
  conn.get
  conn.set
end
As I understand it, this would be better than the approach of just using Redis.current because you don't have multiple workers on multiple threads waiting on each other on a single connection when they hit Redis.
However, how can I make this connection that I get from Sidekiq.redis available to my models? (without having to pass it around as a parameter in every method call)
I can't set Redis.current inside that block, since it's global, and then I'm back to everyone using the same connection (plus switching between connections at random, which might not even be thread-safe).
Should I store the connection that I get from Sidekiq.redis in a thread-local variable, and use that thread-local variable everywhere?
In that case, what do I do in the "Puma" context? How do I set the thread-local variable?
Any thoughts on this are greatly appreciated.
Thank you!
You use a separate global connection pool for your application code. Put something like this in your redis.rb initializer:
require 'connection_pool'
REDIS = ConnectionPool.new(size: 10) { Redis.new }
Now in your application code anywhere, you can do this:
REDIS.with do |conn|
  # some redis operations
end
You'll have up to 10 connections to share amongst your puma/sidekiq workers. This will lead to better performance since, as you correctly note, you won't have all the threads fighting over a single Redis connection.
All of this is documented here: https://github.com/mperham/sidekiq/wiki/Advanced-Options#connection-pooling
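With that in place, models reference the REDIS constant directly instead of having a connection passed around; a hypothetical caching method might look like this:

class Product < ActiveRecord::Base
  def cached_details
    REDIS.with do |conn|
      # Check the cache first; fall back to computing and storing it
      conn.get(details_cache_key) || begin
        json = expensive_details.to_json # hypothetical slow method
        conn.set(details_cache_key, json)
        json
      end
    end
  end

  private

  def details_cache_key
    "product:#{id}:details"
  end
end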
I need to build a webservice with application state. By this I mean the webservice needs to load and process a lot of data before being ready to answer requests, so a Rails-like approach, where you normally don't keep application-level state between two requests, doesn't look appropriate.
I was wondering if a good approach was a daemon (using Daemon-Kit for instance) embedding a simple web server like Thin. The daemon would load and process the initial data.
But I feel it would be better to use Thin directly (launched with Rack). In this case, how can I initialize and maintain my application state?
EDIT: There will be thousands of requests per second, so reading the app state from files or a DB on each one is not efficient. I need to use global variables, and I am wondering what the cleanest way is to initialize and store them in a Ruby/Thin environment.
You could maintain state a number of ways.
A database, including NoSQL databases like Memcache or Redis
A file, or multiple files
Global variables or class variables, assuming the server never gets restarted/reloaded (see the sketch below)
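For the last option, a minimal config.ru sketch: load the data once at boot, before the server accepts requests, and keep it in a frozen constant (dataset.json and its contents are assumed):

require "json"

# Loaded once per server process, at boot
APP_STATE = JSON.parse(File.read("dataset.json")).freeze

run lambda { |env|
  # Every request reads the in-memory state; nothing is reloaded
  [200, { "Content-Type" => "text/plain" }, ["#{APP_STATE.size} records loaded\n"]]
}

Launched with thin start -R config.ru (or rackup -s thin), the load happens once per process, not per request.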
I'm building a simple recipe search engine with Ruby and Sinatra for an iPhone app, using RabbitMQ for my message queue. I'm looking around and finding a lot of different implementation choices for background processes, but most of them either implement custom message queue algorithms or operate as Rails plugins.
What's out there in terms of high-quality framework-agnostic worker libraries that will play nicely with RabbitMQ?
And are there any best practices I should keep in mind while writing the worker code, beyond the obvious:
# BAD, don't do this!
begin
  # work
rescue Exception
end
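(That is, swallowing every exception. The sensible inverse is to rescue narrowly, log, and re-raise so the queue can retry or dead-letter the message:)

begin
  # work
rescue StandardError => e
  logger.error("job failed: #{e.class}: #{e.message}") # assumes a logger is set up
  raise # let the worker framework decide whether to requeue
end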
I am using Beanstalk and have written my own daemons using the daemons gem. Daemon Kit is a new project, but queue loops are not yet implemented. You can also have a look at Nanite if it fits your needs; it's framework-agnostic.
I ended up writing my own library in a fit of uncontrollable yak-shaving. Daemon Kit was the right general idea, but seriously way too heavyweight for my needs. I don't want what looks like a full Rails app for each of my daemons; I'm going to end up with at least three daemons, and that would be a colossal mess of directories. The daemons gem has a horrible API, and while I was tempted to abstract it away, I realized it was probably easier to just manage the fork myself, so that's what I did.
API looks like this:
require "rubygems"
require "chaingang"
class Worker
def setup
# Set up connections here
end
def teardown
# Tear down connections here
end
def call
# Do some work
sleep 1
end
end
ChainGang.prepare(Worker.new)
And then you just use the included rake task to start/stop/restart or check status. I took a page from the Rack playbook: anything that implements the call method is fair game as an argument to ChainGang.prepare and ChainGang.work methods, so a Proc is a valid worker object.
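For instance, per that contract, this should be an equally valid worker (do_some_work is hypothetical):

ChainGang.prepare(proc { do_some_work })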
It took me longer to build than it would've taken to use something else, but I have a vague suspicion that it'll pay off in the long run.
Check out Nanite (written in Ruby); it's a young project built atop RabbitMQ.
github.com/ezmobius/nanite/tree/master