EventMachine memory management - Ruby

I'm running an eventmachine process on heroku, and it seems to be hitting their memory limit of 512MB after an hour or so. I start seeing messages like this:
    Error R14 (Memory quota exceeded)
    Process running mem=531M(103.8%)
I'm running a lot of events through the reactor, so I'm thinking maybe the reactor is getting backed up (I'm imagining it as a big queue)? But there could be some other reason; I'm still fairly new to EventMachine.
Are there any good ways to profile EventMachine and get some stats on it? As a simple example, I was hoping to see how many events were scheduled in the queue, to check whether it was getting backed up and keeping them all in memory. But if anyone has other suggestions, I'd really appreciate it.
Thanks!

I use EventMachine extensively and have never run into a memory leak inside the reactor, so my bet is that the leak is in your Ruby code; but without knowing more about your application it is hard to give you a real answer.
The only queue I can think of right now is the thread pool: each time you use the defer method, the block is either given to a free thread or queued while it waits for one. If all your threads are blocked waiting for something, that queue could grow and use all the memory available.
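If you want to check whether that queue is the culprit, you can report its depth from a periodic timer. A minimal sketch; note that @threadqueue is an EventMachine internal rather than public API, so treat this as a debugging aid that may break between versions:

    require 'eventmachine'

    EM.run do
      # EventMachine.defer keeps blocks that are waiting for a free
      # thread in an internal Queue (@threadqueue). Not public API --
      # inspect only, never mutate.
      EM.add_periodic_timer(5) do
        queue = EM.instance_variable_get(:@threadqueue)
        puts "defer queue depth: #{queue ? queue.size : 0}"
      end

      # ... rest of your reactor setup ...
    end

If the reported depth keeps climbing, your defer blocks are not keeping up with the rate at which you schedule them.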

The leak turned out to be in Mongoid's identity map (nothing to do with EventMachine). Setting Mongoid.identity_map_enabled = false at the beginning of the EventMachine process resolved it.
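For reference, the fix is a single line before the reactor starts. A likely explanation (my assumption, not something the poster confirmed) is that the identity map caches every loaded document by id, and unlike a Rails request cycle, a long-running EventMachine process has no boundary at which that cache gets cleared:

    # Disable Mongoid's identity map; in a long-running process the
    # per-id document cache is otherwise never cleared.
    Mongoid.identity_map_enabled = false

    EM.run do
      # ... reactor work that loads Mongoid documents ...
    end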

Related

Unbounded memory growth after Rack::Timeout in a Puma Rails app

We're using Ruby 2.3.3 with a Rails 5.0 app, running on Puma 3.8.2 and rack-timeout 0.4.2.
We get semi-frequent Rack::Timeout exceptions that have no ill effect. Occasionally, though, something bad happens that causes a slew of timeouts, and after that point the Puma worker process seems to just keep generating more and more live objects. GC appears to still be running - at least, New Relic is still reporting time spent in GC - but it doesn't seem to be having much effect.
The worker process keeps serving requests, but gets slower & slower as the heap size goes up, before eventually dying with out-of-memory errors.
I know this is quite a vague question, but I wondered if anyone had run into anything similar. Any tips for diagnosing it further?
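One way to gather more data (a sketch of a diagnostic, not a known fix): log GC statistics per request and watch whether heap_live_slots and old_objects keep climbing after a timeout storm. The middleware name here is hypothetical:

    # Hypothetical Rack middleware that logs GC statistics, so heap
    # growth can be correlated with Rack::Timeout events in the logs.
    class GcStatsLogger
      def initialize(app)
        @app = app
      end

      def call(env)
        response = @app.call(env)
        s = GC.stat
        Rails.logger.info(
          "gc live_slots=#{s[:heap_live_slots]} " \
          "old_objects=#{s[:old_objects]} " \
          "major=#{s[:major_gc_count]} minor=#{s[:minor_gc_count]}"
        )
        response
      end
    end

    # config/application.rb:
    #   config.middleware.use GcStatsLogger

If old_objects grows without bound while major GCs still run, something is holding references alive; the collector itself is probably fine.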

Ruby Threading Performance: Slow Thread.pass

We're running a threaded Ruby server (Puma) and have seen serious performance issues with our Sinatra app. Specifically, something as simple as Thread.pass can take over 2s. How is it possible that a server with 16 threads can take over 2s to return control to a thread? Is the Ruby scheduler that bad, or is there something we can do to fix this?
Details:
Ruby implementation: MRI 2.1
Sinatra App
Running on Heroku 1x dynos
Puma server, running 16 threads, 1 process
Some routes are doing fairly heavy work, but routes doing almost no work are impacted
Over 100MB in free memory
Thanks in advance!
The time that Thread.pass takes is unspecified; it may take 10s, or it might not pass at all (i.e. execution continues immediately).
Thread.pass is more of a hint or a suggestion.
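To see this for yourself, you can measure how long Thread.pass actually takes while other threads are busy. A minimal sketch (assumes Ruby 2.1+ for Process.clock_gettime):

    # Measure Thread.pass latency under load. On MRI the GVL means a
    # pass may return immediately, or only after other threads have
    # had their turn.
    workers = 16.times.map do
      Thread.new { loop { 10_000.times { |i| i * i } } }
    end

    5.times do
      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      Thread.pass
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
      puts "Thread.pass took #{(elapsed * 1000).round(2)} ms"
    end

    workers.each(&:kill)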
Long story short: it's the Heroku virtual machine.
Sometimes your whole virtual machine pauses, so the program (in whatever language) just stops responding for a few seconds. Running on dedicated boxes completely resolved this issue for us. Heroku 1x/2x dynos don't really seem reliable for applications where multi-second pauses are unacceptable. I get that sharing resources is necessary, but completely pausing the world for multiple seconds is too much; it seems like their scheduling could use some work.

Using Boost.Asio: waiting for all tasks to finish

I wanted to know how I can make the io do something like a thread.join(), i.e. wait for all tasks to finish.

    io_type->post(strand->wrap(boost::bind(&somemethod, ptr, parameter)));

In the above code, if 4 threads were initially launched, this would give work to the next available thread. However, I want to know how I could actually wait for all the threads to finish their work, like we do with thread.join().
If this really needs to be done, then you could set up a mutex or critical section to stop your io handlers from processing messages off of the socket; it would need to be activated from another thread. But, more importantly...
Perhaps you should rethink your design. The problem with having the io wait for other threads to finish is that the io would then be unresponsive - in general, not a good idea, and I suspect most developers working on networking software would not even consider it. If you are receiving messages that are not ready to be processed yet because of other work that is going on, consider storing them in a queue and processing them on a different thread once the other threads have signaled that they have completed their work.

Heroku, apparent silent failure of sucker_punch

My app runs on Heroku with Unicorn and uses sucker_punch to send a small number of emails in the background without slowing the web UI. This had been working pretty well for a few weeks.
I then changed the Unicorn config to the Heroku recommended config, which includes an option for the number of Unicorn worker processes, and I upped the number of processes from 2 to 3.
Apparently that was too much. The sucker_punch jobs stopped running. I have log messages that indicate when they are queued and I have messages that indicate when they start processing. The log shows them being queued but the processing never starts.
My theory is that I exceeded memory by going from 2 to 3 unicorns.
I did not find a message anywhere indicating a problem.
Q1: Should I expect to find a failure message somewhere? Something like "attempting to start sucker_punch -- oops, not enough memory"?
Q2: Any suggestions on how I can be notified of a failure like this in the future?
Thanks.
If you are indeed exceeding dyno memory, you should find R14 or R15 errors in your logs. See https://devcenter.heroku.com/articles/error-codes#r14-memory-quota-exceeded
A more likely problem, though, given that you haven't found these errors, is that something within the perform method of your sucker_punch worker is throwing an exception. I've found sucker_punch tasks to be a pain to debug because the lib appears to swallow all exceptions silently. Try instantiating your task and calling perform on it from a Rails console to make sure that it behaves as you expect.
For example, you should be able to do this without causing an exception:
    task = YourTask.new
    task.perform :something, 55
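If the job only fails in production, another option is to rescue and log inside perform yourself, so failures show up in the Heroku logs instead of being swallowed. A sketch, reusing the hypothetical YourTask from above:

    class YourTask
      include SuckerPunch::Job

      def perform(action, count)
        # ... the real work ...
      rescue => e
        # Log and re-raise so the failure is visible in the logs.
        Rails.logger.error("YourTask failed: #{e.class}: #{e.message}")
        raise
      end
    end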

Twisted process is huge

A Twisted app I have was constantly getting killed due to memory problems. The program grew in size, consuming all of the system's memory, before being shut down by the OS. Restart and repeat.
This is on a virtual server, so I doubled the memory and the issue was resolved - the daemon stabilized at around 1.25GB of memory.
Does anyone have advice on how I can best profile this to tell what/where all the memory is getting sucked up into ?
If info on the app helps: I'm using the Twisted reactor and internet.timer.TimerService to poll a database for items to update through three 'services'. The items to process are pushed into a twisted.internet.defer.DeferredList, and their processing occurs in a deferToThread block. In the deferred process there are a handful of blocking operations (fetching web pages, etc.) and a lot of HTML parsing (Beautiful Soup and other libraries). I've suggested the reactor threadpool size to be 10, and each 'service' defers to a thread using a SemaphoreService that has 10 tokens. I really expected this daemon to max out at around 400MB of memory, not 3x that.
This is more of a generic share of how I debug memory leak/usage problems in my Twisted applications.
Twisted has SSH server support, which is something I add to almost all of my projects in development.
The SSH server provides interactive Python interpreter access to the running process, with the Python garbage collector available and a number of helper functions that allow me to a) inspect the instance count of a given class, b) start and stop watching changes in that count over time, and c) get all references to instances of that class. The nice thing about the interactive interpreter is that it allows ad-hoc introspection of offending instances, their relation to other objects, and the state of the process they live in. This has so far always proven a valuable instrument for pinpointing the exact places where I forgot about, or didn't foresee, reference-release problems in my projects.
