Run when you can - ruby

In my sinatra web application, I have a route:
get "/" do
temp = MyClass.new("hello",1)
redirect "/home"
end
Where MyClass is:
class MyClass
#instancesArray = []
def initialize(string,id)
#string = string
#id = id
#instancesArray[id] = this
end
def run(id)
puts #instancesArray[id].string
end
end
At some point I would want to run MyClass.run(1), but I wouldn't want it to execute immediately because that would slow down the servers response to some clients. I would want the server to wait to run MyClass.run(temp) until there was some time with a lighter load. How could I tell it to wait until there is an empty/light load, then run MyClass.run(temp)? Can I do that?
Addendum
Here is some sample code for what I would want to do:
$var = 0
get "/" do
$var = $var+1 # each time a request is recieved, it incriments
end
After that I would have a loop that would count requests/minute (so after a minute it would reset $var to 0, and if $var was less than some number, then it would run tasks util the load increased.

As Andrew mentioned (correctly—not sure why he was voted down), Sinatra stops processing a route when it sees a redirect, so any subsequent statements will never execute. As you stated, you don't want to put those statements before the redirect because that will block the request until they complete. You could potentially send the redirect status and header to the client without using the redirect method and then call MyClass#run. This will have the desired effect (from the client's perspective), but the server process (or thread) will block until it completes. This is undesirable because that process (or thread) will not be able to serve any new requests until it unblocks.
You could fork a new process (or spawn a new thread) to handle this background task asynchronously from the main process associated with the request. Unfortunately, this approach has the potential to get messy. You would have to code around different situations like the background task failing, or the fork/spawn failing, or the main request process not ending if it owns a running thread or other process. (Disclaimer: I don't really know enough about IPC in Ruby and Rack under different application servers to understand all of the different scenarios, but I'm confident that here there be dragons.)
The most common solution pattern for this type of problem is to push the task into some kind of work queue to be serviced later by another process. Pushing a task onto the queue is ideally a very quick operation, and won't block the main process for more than a few milliseconds. This introduces a few new challenges (where is the queue? how is the task described so that it can be facilitated at a later time without any context? how do we maintain the worker processes?) but fortunately a lot of the leg work has already been done by other people. :-)
There is the delayed_job gem, which seems to provide a nice all-in-one solution. Unfortunately, it's mostly geared towards Rails and ActiveRecord, and the efforts people have made in the past to make it work with Sinatra look to be unmaintained. The contemporary, framework-agnostic solutions are Resque and Sidekiq. It might take some effort to get up and running with either option, but it would be well worth it if you have several "run when you can" type functions in your application.

MyClass.run(temp) is never actually executing. In your current request to / path you instantiate a new instance of MyClass then it will immediately do a get request to /home. I'm not entirely sure what the question is though. If you want something to execute after the redirect, that functionality needs to exist within the /home route.
get '/home' do
# some code like MyClass.run(some_arg)
end

Related

Prefered way to fork / start subprocesses in Cucumber

Let's say I have this scenario:
Scenario: Test LDAP access
Given that the LDAP dummy server is started
And the LDAP query is executed
...
I wish to start a LDAP server in that step. In my case, I use ruby-ldapserver, so I could, in theory, do this in my step:
args = { ... }
#ldap_pid = fork do
redirect_stdout_stderr_to_logfile()
wait_for_ldap_requests(args)
exit # avoid messing with Cucumber/web driver cleanup
end
...
After do
if #ldap_pid
Process.kill("HUP", #ldap_pid)
Process.wait #ldap_pid
end
end
A totally different approach:
system("some_script_that_starts_ldap_dummy < #{input} >#{tmpfile} 2>&1 &")
This certainly works but is rather unelegant (starting a ruby program from inside ruby - unnecessary process creation, and I need to set up the input parameters for that subprogram as well).
All that said, I'm not too altogether about either approach (the "warm fuzzy feeling" is not there).
What is your standard approach to these things? Is there one to speak of? Does Cucumber bring something to the table that could support me here? Should I run something to tell Cucumber that it has forked and should handle itself like a child process?
Edit: actually, when playing around with the fork approach, I did not notice any problems with the DB at all. I did notice that if I kill the child with SIGINT, it will break the web driver (Poltergeist / PhantomJS) in my case. A functioning workaround for this is to send a SIGHUP, handle it in the child by shutting down gracefully (if needed) but not callingexit; and then, after a few seconds a SIGKILL (which denies the child any chance to close down any protocols and just rips it away). Not nice... and not free of race conditions, say if the CI server should be under load.

Ruby thread dies after redirect

i start a thread in my controller and want to redirect the user immediately after that.
class Profile::GeneralController < ProfileController
def update
startTheThread(profile)
#sleep(5)
redirect_to 'selection_controller'
end
end
class ProfileController < ApplicationController
def startTheThread(profile = nil)
$collector_threads[current_user.id][sector] = Thread.new {
Thread.current['collecting_status'] = { a: 1, c: 0, c_id: -1, r: false }
start_threaded_collector(sector)
}
end
end
When i tell the controller to sleep for - let's say - 5 seconds, the thread finishes like it was supposed to.
The thread is dead as soon as the user changes to another page - why is that and how can i keep threads alive across controllers.
I'm with #meagar here, this is asking for serious trouble. Presume your process will be killed after you finish making the web request, as that's something that happens under some Ruby on Rails process managers when they're pruning off excess instances.
Sharing data between controller instances should also be considered impossible unless you're persisting the data somehow: Database, session, cookies, or arguments via GET or POST.
In a typical system you'll have N Ruby processes on M machines, and the processes will be started and stopped arbitrarily, without warning, if they're not actively processing any requests. There's no way to reliably share data between these without some external IPC.
You probably want a background server process that these controllers can contact for any information they might need, or a process that can dump data into a database or a service like Redis where it can be picked up.
The way you could architect this is by pushing a job into a Redis queue, have another process that's watching that queue for work and pops the job and processes it. This is easily done with the BLPOP command in Redis, where your thread will block waiting for work, then immediately continue when there's something to do.

How to access a python object from a previous HTTP request?

I have some confusion about how to design an asynchronous part of a web app. My setup is simple; a visitor uploads a file, a bunch of computation is done on the file, and the results are returned. Right now I'm doing this all in one request. There is no user model and the file is not stored on disk.
I'd like to change it so that the results are delivered in two parts. The first part comes back with the request response because it's fast. The second part might be heavy computation and a lot of data, so I want it to load asynchronously, whenever it's done. What's a good way to do this?
Here are some things I do know about this. Usually, asynchronicity is done with ajax requests. The request will be to some route, let's say /results. In my controller, there'll be a method written to respond to /results. But this method will no longer have any information from the previous request, because HTTP is stateless. To get around this, people pass info through the request. I could either pass all the data through the request, or I could pass an id which the controller would use to look up the data somewhere else.
My data is a big python object (a pandas DataFrame). I don't want to pass it through the network. If I use an id, the controller will have to look it up somewhere. I'd rather not spin up a database just for these short durations, and I'd also rather not convert it out of python and write to disk. How else can I give the ajax request access to the python object across requests?
My only idea so far is to have the initial request trigger my framework to render a second route, /uuid/slow_results. This would be served until the ajax request hits it. I think this would work, but it feels pretty ad hoc and unnatural.
Is this a reasonable solution? Is there another method I don't know? Or should I bite the bullet and use one of the aforementioned solutions?
(I'm using the web framework Flask, though this question is probably framework agnostic.
PS: I'm trying to get better at writing SO questions, so let me know how to improve it.)
So if your app is only being served by one Python process, you could just have a global object that's a map from ids to DataFrames, but you'd also need some way of expiring them out of the map so you don't leak memory.
So if your app is running on multiple machines, you're screwed. If your app is just running on one machine, it might be sitting behind apache or something and then apache might spawn multiple Python processes and you'd still be screwed? I think you'd find out by doing ps aux and counting instances of python.
Serializing to a temporary file or database are fine choices in general, but if you don't like either in this case and don't want to set up e.g. Celery just for this one thing, then multiprocessing.connection is probably the tool for the job. Copying and lightly modifying from here, the box running your webserver (or another, if you want) would have another process that runs this:
from multiprocessing.connection import Listener
import traceback
RESULTS = dict()
def do_thing(data):
return "your stuff"
def worker_client(conn):
try:
while True:
msg = conn.recv()
if msg['type'] == 'answer': # request for calculated result
answer = RESULTS.get(msg['id'])
conn.send(answer)
if answer:
del RESULTS[msg['id']]
else:
conn.send("doing thing on {}".format(msg['id']))
RESULTS[msg['id']] = do_thing(msg)
except EOFError:
print('Connection closed')
def job_server(address, authkey):
serv = Listener(address, authkey=authkey)
while True:
try:
client = serv.accept()
worker_client(client)
except Exception:
traceback.print_exc()
if __name__ == '__main__':
job_server(('', 25000), authkey=b'Alex Altair')
and then your web app would include:
from multiprocessing.connection import Client
client = Client(('localhost', 25000), authkey=b'Alex Altair')
def respond(request):
client.send(request)
return client.recv()
Design could probably be improved but that's the basic idea.

Basic Sidekiq Questions about Idempotency and functions

I'm using Sidekiq to perform some heavy processing in the background. I looked online but couldn't find the answers to the following questions. I am using:
Class.delay.use_method(listing_id)
And then, inside the class, I have a
self.use_method(listing_id)
listing = Listing.find_by_id listing_id
UserMailer.send_mail(listing)
Class.call_example_function()
Two questions:
How do I make this function idempotent for the UserMailer sendmail? In other words, in case the delayed method runs twice, how do I make sure that it only sends the mail once? Would wrapping it in something like this work?
mail_sent = false
if !mail_sent
UserMailer.send_mail(listing)
mail_sent = true
end
I'm guessing not since the function is tried again and then mail_sent is set to false for the second run through. So how do I make it so that UserMailer is only run once.
Are functions called within the delayed async method also asynchronous? In other words, is Class.call_example_function() executed asynchronously (not part of the response / request cycle?) If not, should I use Class.delay.call_example_function()
Overall, just getting familiar with Sidekiq so any thoughts would be appreciated.
Thanks
I'm coming into this late, but having been around the loop and had this StackOverflow entry appearing prominently via Google, it needs clarification.
The issue of idempotency and the issue of unique jobs are not the same thing. The 'unique' gems look at the parameters of job at the point it is about to be processed. If they find that there was another job with the same parameters which had been submitted within some expiry time window then the job is not actually processed.
The gems are literally what they say they are; they consider whether an enqueued job is unique or not within a certain time window. They do not interfere with the retry mechanism. In the case of the O.P.'s question, the e-mail would still get sent twice if Class.call_example_function() threw an error thus causing a job retry, but the previous line of code had successfully sent the e-mail.
Aside: The sidekiq-unique-jobs gem mentioned in another answer has not been updated for Sidekiq 3 at the time of writing. An alternative is sidekiq-middleware which does much the same thing, but has been updated.
https://github.com/krasnoukhov/sidekiq-middleware
https://github.com/mhenrixon/sidekiq-unique-jobs (as previously mentioned)
There are numerous possible solutions to the O.P.'s email problem and the correct one is something that only the O.P. can assess in the context of their application and execution environment. One would be: If the e-mail is only going to be sent once ("Congratulations, you've signed up!") then a simple flag on the User model wrapped in a transaction should do the trick. Assuming a class User accessible as an association through the Listing via listing.user, and adding in a boolean flag mail_sent to the User model (with migration), then:
listing = Listing.find_by_id(listing_id)
unless listing.user.mail_sent?
User.transaction do
listing.user.mail_sent = true
listing.user.save!
UserMailer.send_mail(listing)
end
end
Class.call_example_function()
...so that if the user mailer throws an exception, the transaction is rolled back and the change to the user's flag setting is undone. If the "call_example_function" code throws an exception, then the job fails and will be retried later, but the user's "e-mail sent" flag was successfully saved on the first try so the e-mail won't be resent.
Regarding idempotency, you can use https://github.com/mhenrixon/sidekiq-unique-jobs gem:
All that is required is that you specifically set the sidekiq option
for unique to true like below:
sidekiq_options unique: true
For jobs scheduled in the future it is possible to set for how long
the job should be unique. The job will be unique for the number of
seconds configured or until the job has been completed.
*If you want the unique job to stick around even after it has been successfully processed then just set the unique_unlock_order to
anything except :before_yield or :after_yield (unique_unlock_order =
:never)
I'm not sure I understand the second part of the question - when you delay a method call, the whole method call is deferred to the sidekiq process. If by 'response / request cycle' you mean that you are running a web server, and you call delay from there, so all the calls within the use_method are called from the sidekiq process, and hence outside of that cycle. They are called synchronously relative to each other though...

Ruby Eventmachine queueing problem

I have a Http client written in Ruby that can make synchronous requests to URLs. However, to quickly execute multiple requests I decided to use Eventmachine. The idea is to
queue all the requests and execute them using eventmachine.
class EventMachineBackend
...
...
def execute(request)
$q ||= EM.Queue.new
$q.push(request)
$q.pop {|request| request.invoke}
EM.run{EM.next_tick {EM.stop}}
end
...
end
Forgive my use of a global queue variable. I will refactor it later. Is what I am doing in EventMachineBackend#execute the right way of using Eventmachine queues?
One problem I see in my implementation is it is essentially synchronous. I push a request, pop and execute the request and wait for it to complete.
Could anyone suggest a better implementation.
Your the request logic has to be asynchronous for it to work with EventMachine, I suggest that you use em-http-request. You can find an example on how to use it here, it shows how to run the requests in parallel. An even better interface for running multiple connections in parallel is the MultiRequest class from the same gem.
If you want to queue requests and only run a fixed number of them in parallel you can do something like this:
EM.run do
urls = [...] # regular array with URLs
active_requests = 0
# this routine will be used as callback and will
# be run when each request finishes
when_done = proc do
active_requests -= 1
if urls.empty? && active_requests == 0
# if there are no more urls, and there are no active
# requests it means we're done, so shut down the reactor
EM.stop
elsif !urls.empty?
# if there are more urls launch a new request
launch_next.call
end
end
# this routine launches a request
launch_next = proc do
# get the next url to fetch
url = urls.pop
# launch the request, and register the callback
request = EM::HttpRequest.new(url).get
request.callback(&when_done)
request.errback(&when_done)
# increment the number of active requests, this
# is important since it will tell us when all requests
# are done
active_requests += 1
end
# launch three requests in parallel, each will launch
# a new requests when done, so there will always be
# three requests active at any one time, unless there
# are no more urls to fetch
3.times do
launch_next.call
end
end
Caveat emptor, there may very well be some detail I've missed in the code above.
If you think it's hard to follow the logic in my example, welcome to the world of evented programming. It's really tricky to write readable evented code. It all goes backwards. Sometimes it helps to start reading from the end.
I've assumed that you don't want to add more requests after you've started downloading, it doesn't look like it from the code in your question, but should you want to you can rewrite my code to use an EM::Queue instead of a regular array, and remove the part that does EM.stop, since you will not be stopping. You can probably remove the code that keeps track of the number of active requests too, since that's not relevant. The important part would look something like this:
launch_next = proc do
urls.pop do |url|
request = EM::HttpRequest.new(url).get
request.callback(&launch_next)
request.errback(&launch_next)
end
end
Also, bear in mind that my code doesn't actually do anything with the response. The response will be passed as an argument to the when_done routine (in the first example). I also do the same thing for success and error, which you may not want to do in a real application.

Resources