Tell Merb not to time out - ruby

After posting a question related to nginx, I'm a bit further along in my investigation: the problem is that the Merb framework times out after about 30 seconds. Even if I tell the underlying nginx server not to time out, Merb does, and I can't find a way to prevent it; I need to handle requests that take up to several minutes.
Any hints? Thanks a lot.
-- UPDATE --
It seems that Mongrel, running behind Merb, is causing the error. Is there any way to change the Mongrel timeout when running it with Merb?

Perhaps a different approach would yield better results - rather than working around the timeouts, how about maximizing throughput by deferring execution of the task?
Some approaches for long-running tasks are to use run_later or to exec a separate worker process to complete the task:
def run_in_background(r)
  Thread.new do
    # Run the command in a child process and capture its output;
    # the request thread returns immediately while this thread waits.
    response = IO.popen(r) do |f|
      f.read
    end
  end
end
In both cases you should return 202 (Accepted) as the status code and a URL where the calling application can get status updates.
I use this approach to handle requests which cause background batch processes to execute. Each writes its start time, progress, and completion time to a database (you could easily use a file). When the status URL is invoked, I fetch the progress from the database and provide it back to the calling process.
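A minimal Sinatra-flavoured sketch of that pattern (the route names, the in-memory status store, and long_running_command are all illustrative, not from the original answer):
require 'sinatra'
require 'securerandom'

JOBS = {} # in-memory status store; a real app would use a database or file

post '/batch' do
  job_id = SecureRandom.hex(8)
  JOBS[job_id] = 'running'
  Thread.new do
    IO.popen('long_running_command') { |f| f.read } # the slow work
    JOBS[job_id] = 'done'
  end
  status 202                               # Accepted
  headers 'Location' => "/batch/#{job_id}" # where to poll for progress
  "Check /batch/#{job_id} for progress\n"
end

get '/batch/:id' do
  JOBS.fetch(params[:id], 'unknown')
end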

Related

Ruby threading/forking with API (Sinatra)

I am using the Sinatra gem for my API. What I want to do is: when a request is received, process it, return the response, and then start a new long-running task.
I am a newbie to Ruby; I have read about threading but I'm not sure what the best way to accomplish my task is.
Here is my Sinatra endpoint:
post '/items' do
  # Processing data
  # Return response (body ...)
  # Start long-running task
end
I would be grateful for any advice or example.
I believe a better way to do it is to use background jobs. While your web worker executes a long-running task, it is unavailable for new requests; with background jobs, they do the work while your web worker stays free to handle new requests.
You can have a look at the most popular background job gems for Ruby as a starting point: resque, delayed_job, sidekiq.
UPD: the implementation depends on the chosen gem, but the general scheme will be like this:
# Controller
post '/items' do
  # Processing data
  MyAwesomeJob.enqueue # here you put your job into the queue
  head :ok # or whatever
end
In MyAwesomeJob you implement your long-running task.
Next, about Mongoid and background jobs: you should never use complex objects as job arguments. I don't know what kind of task you are implementing, but there is a general answer - use simple objects.
For example, instead of passing your User as an argument, pass user_id and then find the record inside your job. If you do it like that, you can use any DB without problems.
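For instance, a minimal Sidekiq worker following that advice might look like this (class and model names are illustrative):
require 'sidekiq'

class MyAwesomeJob
  include Sidekiq::Worker

  # Take the id, not the object: simple arguments serialize cleanly to JSON.
  def perform(user_id)
    user = User.find(user_id) # re-fetch the record inside the job
    # ... long-running work with user ...
  end
end

# Enqueue from the endpoint with the id only:
# MyAwesomeJob.perform_async(user.id)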
Agree with unkmas.
There are two ways to do this.
Threads or a background job gem like sidekiq.
Threads are perfectly fine if the processing times aren't that high and you don't want to write code for a worker. But there is a strong possibility that you might spin up too many threads if you don't use a thread pool, or if you're expecting bursty HTTP traffic.
The best way to do it is by using Sidekiq or something similar. You could even have a job queue like beanstalkd in between, enqueue the job to it, and return the response. You can have a worker reading from the queue and processing it later on.
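A rough sketch of that queue-in-the-middle idea, assuming the beaneater gem as the beanstalkd client (any client would do):
require 'beaneater' # beanstalkd client gem (an assumption)
require 'json'

# Producer, in the request handler: enqueue and return immediately.
beanstalk = Beaneater.new('localhost:11300')
tube = beanstalk.tubes['long-tasks']
tube.put({ item_id: 42 }.to_json)

# Consumer, in a separate worker process: reserve, process, delete.
loop do
  job = tube.reserve
  data = JSON.parse(job.body)
  # ... long-running processing with data ...
  job.delete # acknowledge so beanstalkd doesn't re-deliver it
end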

sidekiq multiple threads how-to

I have a task that will take a long time, so I split it into 3 parts and want to launch three threads that will work on them concurrently (don't worry, I made sure they don't access any of the same variables; they strictly handle their own datasets).
As far as I can tell, Sidekiq launches a new thread for each worker, so I made three workers, Importer, Importer2, and Importer3, all in app/workers. In one of my controllers I have this code:
Importer.perform_async(arrays[0], date)
Importer2.perform_async(arrays[1], date)
Importer3.perform_async(arrays[2], date)
render json: 1
My question is: Is that the best way to handle this?
It seems odd that (a) the request to the controller takes so long to render the 1, and (b) in the Sidekiq log I can see Importer JID-639e67d2aa20cce885690dc7 INFO: start, as well as the same for Importer2 but not for Importer3, and then Sidekiq just exits with killed.
When I relaunch Sidekiq, I get the Importer3 ... start line, and it is then the only one working (each importer updates a DB value, and Importer3's is the only one changing).
Any ideas why?
Are you sure you have enough memory? Maybe this can be helpful: Debugging Mystery Sidekiq Shutdowns
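As an aside, you usually don't need one worker class per chunk: Sidekiq runs each job on its own thread from a pool, so a single worker class enqueued once per chunk behaves the same way. A sketch (not from the original answer):
class Importer
  include Sidekiq::Worker

  def perform(array, date)
    # process one chunk; concurrent jobs already run on separate threads
  end
end

# In the controller:
# arrays.each { |chunk| Importer.perform_async(chunk, date) }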

How to run multiple threads at the same time in ruby while working with a file?

I've been messing around with Ruby and threading a little bit today. I have a list of proxies that I want to check. Assuming a timeout of 10 seconds, going through a very large list of proxies will take many hours if I write something like this:
proxies.each do |proxy|
  check_proxy(proxy)
end
My first problem with trying to figure out threads is how to START multiple at the same exact time. I found a neat little snippet of code online:
require 'hpricot'  # HTML parser used by the snippet
require 'open-uri' # provides open() for URLs

threads = [] # the snippet assumes this is initialized beforehand
for page in pages
  threads << Thread.new(page) { |myPage|
    puts "Fetching: #{myPage}\n"
    doc = Hpricot(open(myPage.to_s)).to_s
    puts "Got #{myPage}: #{doc.size}"
  }
end
threads.each(&:join) # wait for all fetches to finish
Seems to work nicely as far as starting them all at the same time. So now I can... start checking all 7 thousand records at the same time?
How do I go to a file, take out a line for each thread, run a batch of like 20 and repeat the process?
Can I run a while loop that in turn starts 20 threads at the same time (which remove lines from a file) and keeps going until the file is blank?
I'm a little weak on the logic of what I'm supposed to do.
Thanks guys!
PS.
Another thought: Will there be file access issues if 20 workers are constantly messing with it randomly? What would be a good way around that if this is so?
The keyword you are after is thread pool. You can either try to find one for Ruby (I am sure there are at least a couple on GitHub), or roll your own.
There's a simple implementation here on SO.
Re: the file access, IMO you shouldn't let workers alter the file directly, but do it in your main thread. You don't want to allow simultaneous edits there.
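A minimal roll-your-own sketch along those lines, using a thread-safe Queue so only the main thread ever touches the file (check_proxy and the proxy list are the asker's; everything else is assumed):
require 'thread'

queue = Queue.new # thread-safe; no manual locking needed
File.foreach('proxies.txt') { |line| queue << line.strip }

workers = 20.times.map do
  Thread.new do
    loop do
      proxy = begin
        queue.pop(true)  # non-blocking pop
      rescue ThreadError
        break            # queue drained, this worker is done
      end
      check_proxy(proxy)
    end
  end
end
workers.each(&:join) # wait for the batch to finish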
Try the delayed_job gem:
https://github.com/tobi/delayed_job
You don't need to generate that many threads in order to do this work. In fact, generating a lot of threads can decrease the overall performance of your application. If you handle checking each proxy asynchronously, without blocking, you can get by with far fewer threads.
You'd create a file-manager thread to process the file. Each line gets added as a request to an array (the request queue). On the other end of the request queue you can use EventMachine to send the requests without blocking. EventMachine would also be used to receive the responses and handle the timeout. Each response can then be placed on another array (the response queue), which your file-manager thread polls. The file-manager thread pulls the responses from the response queue and resolves whether each proxy works or not.
This gets you down to just two threads. One issue that you will have is limiting the number of requests in flight, since this model can send out all of the requests in less than a second and flood the nearest router. In my experience you should be able to have around 500 outstanding requests at any one time.
There is more than one way to solve this problem asynchronously but hopefully the above is enough to help get you started with non-blocking I/O.
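For illustration, here is a condensed sketch of the non-blocking approach using the em-http-request gem (an assumption; the answer names only EventMachine). It fires every request at once, so the throttling to roughly 500 in-flight requests described above is omitted:
require 'eventmachine'
require 'em-http-request'

proxies = File.readlines('proxies.txt').map(&:strip)
alive   = []

EM.run do
  pending = proxies.size
  proxies.each do |proxy|
    host, port = proxy.split(':')
    req = EventMachine::HttpRequest.new(
      'http://example.com/',
      proxy: { host: host, port: port.to_i },
      connect_timeout: 10 # the timeout lives in the event loop, not a thread
    ).get
    done = proc { EM.stop if (pending -= 1).zero? }
    req.callback { alive << proxy; done.call }
    req.errback  { done.call } # dead proxy or timeout
  end
end

puts alive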

Using ruby timeout in a thread making a database call

I am using Ruby 1.9.2.
I have a thread running which makes periodic calls to a database. The calls can be quite long, and sometimes (for various reasons) the DB connection disappears. If it does disappear, the thread just silently hangs there forever.
So, I want to wrap it all in a timeout to handle this. The problem is, on the second time through when a timeout should fire (it's always the second time), it still simply hangs; the timeout never takes effect. I know this problem existed in 1.8, but I was led to believe timeout.rb worked in 1.9.
t = Thread.new do
  while true do
    sleep SLEEPTIME
    begin
      Timeout::timeout(TIMEOUTTIME) do
        puts "About to do DB stuff, it will hang here on the second timeout"
        db.do_db_stuff()
        process_db_stuff()
      end
    rescue Timeout::Error
      puts "Timed out"
      # handle stuff here
    end
  end
end
Any idea why this is happening and what I can do about it?
One possibility is that your thread does not hang, it actually dies. Here's what you should do to figure out what's going on. Add this before you create your worker thread:
Thread.abort_on_exception = true
When an exception is raised inside your thread that is never caught, your whole process is terminated, and you can see which exception was raised. Otherwise (and this is the default), your thread is killed.
If this turns out not to be the problem, read on...
Ruby's implementation of timeouts is pretty naive. It sets up a separate thread that sleeps for n seconds, then blindly raises a Timeout exception inside the original thread.
Now, the original code might actually be in the middle of a rescue or ensure block. Raising an exception in such a block will silently abort any kind of cleanup code. This might leave the code that times out in an improper state.
It's quite difficult to tell if this is your problem exactly, but seeing how database handlers might do a fair bit of locking and exception handling, it might be very likely. Here's an article that explains the issue in more depth.
Is there any way you can use your database library's built-in timeout handling? It might be implemented on a lower level, not using Ruby's timeout implementation.
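For example, if you happen to be on MySQL, the mysql2 gem (an assumption; your library may differ) exposes timeouts at the driver level, which work even while a query is blocked:
require 'mysql2'

# Driver-level timeouts: enforced by the MySQL client itself rather than
# by Ruby's timeout.rb, so they fire even mid-query.
db = Mysql2::Client.new(
  host:            'localhost',
  username:        'app',
  database:        'mydb',
  connect_timeout: 5,  # seconds to establish the connection
  read_timeout:    30  # seconds to wait for a query result
)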
A simple alternative is to schedule the database calls in a separate process. You can fork the main process each time you do the heavy database-lifting. Or you could set up a simple cronjob to execute a script that executes it. This will be slightly more difficult if you need to communicate with your main thread. Please leave some more details if you want any advice on which option might suit your needs.
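A sketch of the fork option, enforcing the deadline from the parent so a hung child can simply be killed (how results travel back to the parent is left open, as above):
pid = fork do
  # In practice, open a fresh DB connection here rather than sharing
  # the parent's handle across the fork.
  db.do_db_stuff()  # a hang here only blocks the child
  process_db_stuff()
end

deadline = Time.now + TIMEOUTTIME
loop do
  break if Process.waitpid(pid, Process::WNOHANG) # child finished
  if Time.now > deadline
    Process.kill('KILL', pid) # hard-stop the hung child
    Process.wait(pid)         # reap it
    break
  end
  sleep 0.5
end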
Based on your comments, the thread is dying. This might be a fault in libraries or application code that you may or may not be able to fix. If you wish to trap any arbitrary error that is generated by the database handling code and subsequently retry, you can try something like the following:
t = Thread.new do
  loop do
    sleep INTERVAL
    begin
      # Execute database queries and process data
    rescue StandardError
      # Log error or recover from error situation before retrying
    end
  end
end
You can also use the retry keyword in the rescue block to retry immediately, but you probably should keep a counter to make sure you're not accidentally retrying indefinitely when an unrecoverable error keeps occurring.
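A sketch of that bounded-retry pattern (MAX_ATTEMPTS is illustrative):
attempts = 0
begin
  # Execute database queries and process data
rescue StandardError
  attempts += 1
  retry if attempts < MAX_ATTEMPTS # bounded, so an unrecoverable error
  raise                            # eventually propagates instead of looping
end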

ColdFusion request never times out for LDAP requests

I have an application running on CF8 which often makes calls to external systems like a search engine and LDAP servers. But at times some request never gets a response and shows up permanently in the active request list.
Even though there is a request timeout set in the Administrator, it's not getting applied in these scenarios.
I have around 5 requests that have been pending for the last 20 hours!
My server settings are as below:
Timeout requests after (seconds): 300
Maximum number of simultaneous requests: 20
Maximum number of running JRun threads: 50
Maximum number of running JRun threads: 1000
Timeout requests waiting in queue after 300 seconds
I read through some articles and found there are cases where threads never respond and are never killed. But I don't have a solid solution for how to time these out or kill them automatically.
I'd really appreciate it if you guys have some ideas on this :)
The ColdFusion timeout does not apply to 'third party' connections.
A long-running LDAP query, for example, will take as long as it needs. When the calling template gets the result from the query your timeout will apply.
This often leads to confusion when interpreting errors: you will get an error suggesting that whichever function ran after the long-running request caused the timeout.
Further reading available here
You can (and probably should) set a timeout on the CFLDAP call itself. http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7f97.html
Thanks, Antony, for recommending my blog entry CF911: Lies, Damned Lies, and CF Request Timeouts...What You May Not Realize. This problem of requests not timing out when expected can be very troublesome and a surprise for most.
But Anooj, while that at least explains WHY they don't die (and you can't kill them within CF), one thing to consider is that you may be able to kill them in the REMOTE server being called, in your case, the LDAP server.
You may be able to go to the administrator of THAT server and on showing them that CF has a long-running request, they may be able to spot and resolve the problem. And if they can, that may free the connection from CF and your request then will stop.
I have just added a new section on this idea to the bottom of that blog entry, as "So is there really nothing I can do for the hung requests?"
Hope that helps.
