Reliable fast queueing systems with Sinatra - ruby

We have a requirement to build a small Sinatra app which will capture events from an external API and add them to a queue for processing by a Rails application. We could be receiving hundreds of thousands of events per day.
Given that resque rules itself out by not being able to guarantee that jobs won't get lost, what other options are out there. We've looked at delayed_job and that doesn't play well with Sinatra, so what other alternatives are there for something fast, reliable and scalable.

Have you looked at Beanstalk?
http://kr.github.com/beanstalkd/
http://www.igvita.com/2010/05/20/scalable-work-queues-with-beanstalk/
There's an example Sinatra/Beanstalk app on GitHub:
https://github.com/adamwiggins/clockwork-sinatra-beanstalk
Alternatively you might want to check out RabbitMQ with ruby-amqp, but I think I'd first try the Beanstalk approach (it handles the workload you describe in your post for us):
https://github.com/ruby-amqp/amqp

Related

Should I use rails5 ActiveJob default async adapter for small background job in production?

Rails app which handle and activation of a license using an external service, the external service sometime delays the handling of rails request to over 30s, which will then return an error to front end (I'm running heroku, so max is 30s).
I tried using ActiveJobs and the default rails async adapter (Rails 5), and I can see that is working in Heroku out of the box. I keep reading that I should be using another web process and for example redis, but if the background job should just be performed straight after the request is done and if is just hitting another API outside which may be slower, is it so bad to use the default async?
I can see that this is handle in an in-process thread but I don't see a reason for such small job to be having another web process.
I use the async adapter in production for sending emails. This is a very small job. An email could take up to 3 seconds to send.
The doc said it's a poor fit for production because it will drop pending jobs on restart. If I remember correctly, Heroku restarts dynos once a day.
If your job is pending during the restart, the job will be lost. For my case, a pending email during the restart is pretty slim. So far so good.
But if you have jobs taking 30 seconds, I'll use Resque or DelayedJob.
If for small background job in production, which does not require 100% persistence in case of failure/server restart, whose duration is relatively short and thus separate process would be an overkill, I'd recommend using Sucker Punch.
Sucker Punch gem is designed to handle exactly such case. It prepares execution thread pool for each Job you create, using the concurrent-ruby gem, which is (probably) the most robust concurrency library in Ruby. It also hooks on_exit to finish all the pending tasks, so I guess you can expect this gem to be more reliable than the AsyncJob.
One thing to note is that although Sucker Punch is supported on Active Job, the adapter is not well written. Or, at least, when you use Sucker Punch adapter, it's behavior would be just like that of async adapter. So, I'd recommend using bare Sucker Punch if you wanted something just a little more useful/robust than AsyncJob.

For Airbrake 5, do I need to use Sidekiq?

Environment
I'm installing Airbrake on Heroku for a Ruby web app (not Rails).
So Airbrake#notify for Airbrake version 5 for Ruby sends a notification asynchronously.
My worry is that if I don't use Sidekiq worker + Redis, then it might still be possible that calling Airbrake#notify might still slow down the app's response time depending on how it's used (whether in a Rails-like controller or some other part of the app).
Besides overcoming the potential issue mentioned above, the other advantage of using Sidekiq worker + Redis to call Airbrake#notify I can think of is that Redis has a couple of persistence strategies so if the app crashes I can backtrack and look over the backed up error notifications from the Sidekiq queue.
Whereas if I don't use Sidekiq + Redis and the app crashes, then there will be no backed up data....
Questions
Does that mean I don't need to use Sidekiq + Redis (or some other equivalent database)?
Am I understanding the issue correctly? I don't have a very complete understanding of "pooled connections" and asynchronous processing, so this makes understanding what to do here a bit challenging.
This is the class that sends async notices https://github.com/airbrake/airbrake-ruby/blob/master/lib/airbrake-ruby/async_sender.rb
It's using standard ruby threads to send messages, so no background service should be necessary

Reliable scheduling with sidekiq

I'm building a monitoring service similar to pingdom but monitoring different aspects of a system and using sidekiq to queue the tasks which is working well. What I need to do is to schedule sending out pings every minute, rather than using a cron based system which would require spinning up a new ruby instance every minute I have gone down the route of using sidetiq (notice the different spelling with a "t") which uses sidekiq's own queue to schedule future tasks. This feels like a neat solution, however I am concerned this may not be the most reliable way of scheduling tasks? If there are issues with the system (as there inevitable will be at some point) will this method of scheduling tasks be less reliable than using a cron based method and why?
Thanks
You are giving too short description of your system needs but I'll try to guess how it could be:
In the first place using sidekiq means that you'll also need an instance of redis and also means that you'll need a way to monitor the sidekiq process and restart it in case of failure and possibly redis server.
A method based on cron tasks will have fewer requirements therefore much less possibilities of failing.
cron has been around for a long time and it's battle tested and it's very very reliable, but has it's drawbacks too.
Said that, you can build a system with separate instances of redis in a master/slave configuration and you can also use Redis sentinel to implement a failover in case of the master failure, implement a monitoring/alerting system on this setup (you can use something super simple like this http://contribsys.com/inspeqtor/ from the sidekiq author) and you can also start several instances of sidekiq in different machines.
With all of that, you can have a quite reliable system for running sidekiq with sidetiq.
Hope it helps

ROR + Detect how many ports are running for ruby application

In my ruby on rails project, I have to take pull from sql-server to my mysql database.
When I run my project on port 3000, it makes system busy when I want to take pull.
I want such method or way which system can detect, how many ports are running for ruby application and how to close if it is not in use ?
Thanks in advance.
Hard to understand exactly what you're asking for, but I'm assuming that when you are synchronizing databases, the system becomes busy and you can't serve any pages. This is a perfect example for the use of a background job that allows you to do tasks like this without affecting the rails application. The two gems that come to mind that will allow you to do this is Delayed_job and Resque. An outstanding screencast for doing this type of stuff is listed below as well.
http://github.com/collectiveidea/delayed_job
https://github.com/defunkt/resque/
http://railscasts.com/episodes/171-delayed-job
http://railscasts.com/episodes/271-resque

Heroku: web dyno vs. worker dyno? How many/what ratio do I need?

I was curious as to what the difference between web and worker dynos is on Heroku. They give a one sentence explanation on their pricing page, but this just left me confused. How do I know how many to pick of each? Is there a ratio I should aim for? I'm pretty new to this stuff, so can someone give an in depth explanation, or maybe some sort of way I can calculate how many and which kind of dynos I would need?
Also, I'm confused about what they mean by the amount of hours for each dyno.
http://www.heroku.com/pricing
I also happened upon this article. As one of their suggested solutions, they said to increase the amount of dynos. Which type of dyno are they referring to here?
http://devcenter.heroku.com/articles/backlog-too-deep
Your best indication if you need more dynos (aka processes on Cedar) is your heroku logs. Make sure you upgrade to expanded logging (it's free) so that you can tail your log.
You are looking for the heroku.router entries and the value you are most interested is the queue value - if this is constantly more than 0 then it's a good sign you need to add more dynos. Essentially this means than there are more requests coming in than your process can handle so they are being queued. If they are queued too long without returning any data they will be timed out.
There's no ideal ratio I'm afraid, you could have an app doing 100 requests a second needing many web processes but just doesn't make use of workers. You only need worker processes if you are doing processing in the background like sending emails etc etc.
ps Backlog too deep would be a Dyno web process that would cause it.
UPDATE: On March 26 2013 Heroku removed queue and wait fields from the log out put.
queue and wait fields have been removed from router log messages.
Also, the Heroku router no longer sets X-Heroku-Dynos-In-Use,
X-Heroku-Queue-Depth and X-Heroku-Queue-Wait-Time HTTP headers for
incoming requests.
Dynos are basically processes that run on your instance. With the new Cedar stack, they can be set up to execute any arbitrary shell command. For web applications, you generally have one process called "web" which is responsible for responding to HTTP requests from users. All other processes are what were previously called "workers." These run continuously in the background for things like cron, processing queues, and any heavy computation that you don't want to tie up your web processes. You can also scale each type of process, so that multiple processes of each type will be booted up for additional concurrency. The amount of each that you use really depends on the needs of your application and the load it receives. You can use tools like the New Relic plugin to monitor these things. Take a look at the articles on the Process Model and the Procfile in Heroku's dev center for more details.
A number of people have mentioned that there is no known ratio and that the ratio of web workers to 'background' workers you will want is dependent on how you designed your application - that is correct. However I thought it might be useful to add that as a general rule of thumb, you want your web workers - and thus the controller actions they are serving - to be lightning quick and very lightweight, to reduce latency in response times from browser actions. If there is some browser action that would require more than, say, about a half a second of real time to serve, then you will probably want to architect some sort of system that pushes the bulk of that action on to a queue.
You would then design an offline worker dyno(s) that will service this queue. They can take much longer because there are no HTTP responses pending on their output. Perhaps the page you rendered from the initial browser request that pushed the action will serve some Javascript that starts a thread that checks to see if the request has finished every 5 seconds, or something along those lines.
I still can't give you a ratio to work with for the same reason others have given, but hopefully this helps you decide how to architect your app. (I should also mention this is just one design out of many valid ones.)
https://stackoverflow.com/a/19965981/1233555 - Heroku has gone to random routing, so some dynos can have queues stacking up (while they serve a lengthy request) while other dynos are free. Avoid this by making sure that all requests are handled very quickly in your web dynos. This will reduce the number of web dynos you need, while requiring more worker dynos.
You also need to care about your web app supporting concurrency, which only some Rails configs do - try Unicorn, or carefully-written code (for I/O that doesn't block the EventMachine) with Thin.
You probably have to try, rather than calculate, to see how many dynos of each kind you need. Make sure their New Relic reports the dyno queue - see the above link.
Short answer is that you need as many as you need to keep your queues down.
As John describes, if you start seeing a queue in your logs then you need more dynos. If you start seeing your background queues getting too long (how you get this info is dependant on what you have implemented) then you need more workers.
There is no ratio as it's very much dependent on your application design and usage.

Resources