Rails app which handle and activation of a license using an external service, the external service sometime delays the handling of rails request to over 30s, which will then return an error to front end (I'm running heroku, so max is 30s).
I tried using ActiveJobs and the default rails async adapter (Rails 5), and I can see that is working in Heroku out of the box. I keep reading that I should be using another web process and for example redis, but if the background job should just be performed straight after the request is done and if is just hitting another API outside which may be slower, is it so bad to use the default async?
I can see that this is handle in an in-process thread but I don't see a reason for such small job to be having another web process.
I use the async adapter in production for sending emails. This is a very small job. An email could take up to 3 seconds to send.
The doc said it's a poor fit for production because it will drop pending jobs on restart. If I remember correctly, Heroku restarts dynos once a day.
If your job is pending during the restart, the job will be lost. For my case, a pending email during the restart is pretty slim. So far so good.
But if you have jobs taking 30 seconds, I'll use Resque or DelayedJob.
If for small background job in production, which does not require 100% persistence in case of failure/server restart, whose duration is relatively short and thus separate process would be an overkill, I'd recommend using Sucker Punch.
Sucker Punch gem is designed to handle exactly such case. It prepares execution thread pool for each Job you create, using the concurrent-ruby gem, which is (probably) the most robust concurrency library in Ruby. It also hooks on_exit to finish all the pending tasks, so I guess you can expect this gem to be more reliable than the AsyncJob.
One thing to note is that although Sucker Punch is supported on Active Job, the adapter is not well written. Or, at least, when you use Sucker Punch adapter, it's behavior would be just like that of async adapter. So, I'd recommend using bare Sucker Punch if you wanted something just a little more useful/robust than AsyncJob.
I'm trying to interact with Google's Calendar API. My tests so far show response times of 5-10 seconds to insert a single event, and I may need to export thousands of events at once [don't ask]. This seems likely to spam the heck out of my queues for unreasonable amounts of time. (95% of current jobs in this app finish in <300ms, so this will make it harder to allocate resources appropriately.)
I'm currently using Faraday in this app to call other, faster Google APIs. The Faraday wiki suggests using Typhoeus for parallel HTTP requests; however, using Typhoeus with Sidekiq was deemed "a bad idea" as of 2014.
Is Typhoeus still a bad idea? If so, is it reasonable to spawn N threads in a Sidekiq worker, make an HTTP request within each thread, and then wait for all threads to rejoin? Is there some other way to accomplish this extremely I/O-bound task without throwing more workers at the problem? Should I ask my manager to increase our Sidekiq Enterprise spend? ;) Or should I just throw these jobs in a low-priority queue and tell our users with ridiculous habits that they'll just have to wait?
It's reasonable to use threads within Sidekiq job threads. It's not reasonable to build your own threading infrastructure. You can use a reusable thread pool with the concurrent-ruby or parallel gems, you can use an http client which is thread-safe and allows concurrent requests, etc. HTTP.rb is a good one from Tony Arcieri but plain old net/http will work too:
https://github.com/httprb/http/wiki/Thread-Safety
Just remember that there's a few complexities: the job might be retried, how do you handle errors that the HTTP client raises? If you don't split these requests 1-to-1 with jobs, you might need to track each or idempotency becomes an issue.
And you are always welcome to increase your Sidekiq Enterprise thread count. :-D
I am wanting to design an application where the back end is constantly polling different sensors while the front end (sinatra) allows for this data to be viewed either via json api, or by simply displaying the results in html.
What considerations should I take to develop such an application and how should I structure the application for best scaling and ease of maintenance.
My first thought is to simply let sinatra poll the sensors every time it receives a request to the proper end points, but this seems like it could bog down quiet fast especially seeing how that some sensors only update themselves every couple seconds.
My second thought is to have a background process (or thread) poll the sensors and store the values for sinatra. When a request is received sinatra can then simply poll the background process for a cached value (or pull it from the threaded code) and present it to the client.
I like the second thought more, but I am not sure how I would develop the "background application" so that sinatra could poll it for data to present to the client. The other option would be for sinatra to thread the sensor polling code so that it can simply grab values from it inside the same process rather than requesting it from another process.
Due note that this application will also be responsible for automation of different relays and such based off the sensors and sinatra is only responsible for relaying the status of the sensors to the user. I think separating the backend (automation + sensor information) in a background process/daemon from the frontend (sinatra) would be ideal, but I am not sure how I would fetch the data for sinatra.
Anyone have any input on how I could structure this? If possible I would also appreciate a sample application that simply displays the idea that I could adopt and modify.
Thanks
Edit::
After a bit more research I have discovered drb (distributed ruby http://ruby-doc.org/stdlib-1.9.3/libdoc/drb/rdoc/DRb.html) which allows you to make remote calls on objects over the network. This may be a suitable solution to this problem as the daemon can automate the relays, read the sensors and store the values in class objects, and then present the class objects over drb so that sinatra can call the getters on the remote object to obtain up to date data from the daemon. This is what I initially wanted to attempt to do.
What do you guys think? Is this advisable for such an application?
I have decided to go with Sinatra, DRB, and Daemons to meet the requirements I have stated above.
The web front end will run in its own process and only serve up statistical information via DRB interactions with the backend. This will allow quick response times for the clients and allow me to separate front end code from backend code.
The backend will run in its own process and constantly poll the sensors for updates and store them as class objects with getters so that Sinatra can fetch the information over DRB when required. It will also use the gathered information for automation that is project specific.
Finally the backend and frontend will be wrapped with a Daemons wrapper so that the project will have the capabilities of starting, restarting, stopping, run status, and automatic restarting of the Daemons if it crashes or quits for what ever reason.
Source information:
http://phrogz.net/drb-server-for-long-running-web-processes
http://ruby-doc.org/stdlib-1.9.3/libdoc/drb/rdoc/DRb.html
http://www.sinatrarb.com/
https://github.com/thuehlinger/daemons
We have a requirement to build a small Sinatra app which will capture events from an external API and add them to a queue for processing by a Rails application. We could be receiving hundreds of thousands of events per day.
Given that resque rules itself out by not being able to guarantee that jobs won't get lost, what other options are out there. We've looked at delayed_job and that doesn't play well with Sinatra, so what other alternatives are there for something fast, reliable and scalable.
Have you looked at Beanstalk?
http://kr.github.com/beanstalkd/
http://www.igvita.com/2010/05/20/scalable-work-queues-with-beanstalk/
There's an example Sinatra/Beanstalk app on GitHub:
https://github.com/adamwiggins/clockwork-sinatra-beanstalk
Alternatively you might want to check out RabbitMQ with ruby-amqp, but I think I'd first try the Beanstalk approach (it handles the workload you describe in your post for us):
https://github.com/ruby-amqp/amqp
Is Sinatra multi-threaded? I read else where that "sinatra is multi-threaded by default", what does that imply?
Consider this example
get "/multithread" do
t1 = Thread.new{
puts "sleeping for 10 sec"
sleep 10
# Actually make a call to Third party API using HTTP NET or whatever.
}
t1.join
"multi thread"
end
get "/dummy" do
"dummy"
end
If I access "/multithread" and "/dummy" subsequently in another tab or browser then nothing can be served(in this case for 10 seconds) till "/multithread" request is completed. In case activity freezes application becomes unresponsive.
How can we work around this without spawning another instance of the application?
tl;dr Sinatra works well with Threads, but you will probably have to use a different web server.
Sinatra itself does not impose any concurrency model, it does not even handle concurrency. This is done by the Rack handler (web server), like Thin, WEBrick or Passenger. Sinatra itself is thread-safe, meaning that if your Rack handler uses multiple threads to server requests, it works just fine. However, since Ruby 1.8 only supports green threads and Ruby 1.9 has a global VM lock, threads are not that widely used for concurrency, since on both versions, Threads will not run truly in parallel. The will, however, on JRuby or the upcoming Rubinius 2.0 (both alternative Ruby implementations).
Most existing Rack handlers that use threads will use a thread pool in order to reuse threads instead of actually creating a thread for each incoming request, since thread creation is not for free, esp. on 1.9 where threads map 1:1 to native threads. Green threads have far less overhead, which is why fibers, which are basically cooperatively scheduled green threads, as used by the above mentioned sinatra-synchrony, became so popular recently. You should be aware that any network communication will have to go through EventMachine, so you cannot use the mysql gem, for instance, to talk to your database.
Fibers scale well for network intense processing, but fail miserably for heavy computations. You are less likely to run into race conditions, a common pitfall with concurrency, if you use fibers, as they only do a context switch at clearly defined points (with synchony, whenever you wait for IO). There is a third common concurrency model: Processes. You can use preforking server or fire up multiple processes yourself. While this seems a bad idea at first glance, it has some advantages: On the normal Ruby implementation, this is the only way to use all your CPUs simultaniously. And you avoid shared state, so no race conditions by definition. Also, multiprocess apps scale easily over multiple machines. Keep in mind that you can combine multiple process with other concurrency models (evented, cooperative, preemptive).
The choice is mainly made by the server and middleware you use:
Multi-Process, non-preforking: Mongrel, Thin, WEBrick, Zbatery
Multi-Process, preforking: Unicorn, Rainbows, Passenger
Evented (suited for sinatra-synchrony): Thin, Rainbows, Zbatery
Threaded: Net::HTTP::Server, Threaded Mongrel, Puma, Rainbows, Zbatery, Thin[1], Phusion Passenger Enterprise >= 4
[1] since Sinatra 1.3.0, Thin will be started in threaded mode, if it is started by Sinatra (i.e. with ruby app.rb, but not with the thin command, nor with rackup).
While googling around, found this gem:
sinatra-synchrony
which might help you, because it touches you question.
There is also a benchmark, they did nearly the same thing like you want (external calls).
Conclusion: EventMachine is the answer here!
Thought I might elaborate for people who come across this. Sinatra includes this little chunk of code:
server.threaded = settings.threaded if server.respond_to? :threaded=
Sinatra will detect what gem you have installed for a webserver (aka, thin, puma, whatever.) and if it responds to "threaded" will set it to be threaded if requested. Neat.
After making some changes to code I was able to run padrino/sinatra application on mizuno
. Initially I tried to run Padrino application on jRuby but it was simply too unstable and I did not investigate as to why. I was facing JVM crashes when running on jRuby. I also went through this article, which makes me think why even choose Ruby if deployment can be anything but easy.
Is there any discussion on deployment of applications in ruby? Or can I spawn a new thread :)
I've been getting in to JRuby myself lately and I am extremely surprised how simple it is to switch from MRI to JRuby. It pretty much involves swapping out a few gems (in most cases).
You should take a look at the combination JRuby and Trinidad (App Server). Torquebox also seems to be an interesting all-in-one solution, it comes with a lot more than just an app server.
If you want to have an app server that supports threading, and you're familiar with Mongrel, Thin, Unicorn, etc, then Trinidad is probably the easiest to migrate to since it's practically identical from the users perspective. Loving it so far!