If I have a server with 1 core, how many puma workers, threads and what database pool size is appropriate?
What's the general rule of thumb here?
Not an easy answer.
The two main sources of information are:
The Puma GitHub repository (the authors' point of view)
Heroku's documentation (the point of view of its biggest user)
Unfortunately they are somewhat inconsistent, mostly because Heroku uses different deployment metrics and terminology.
So I ended up following the Puma repository guidelines, which say:
One worker per core
Threads to be determined based on available RAM and the application's behavior
Threads = Connection Pool
So the number of threads is mostly a matter of trial and error.
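For concreteness, here is a minimal sketch of what that guidance could look like on a 1-core server. It assumes a Rails app and the common WEB_CONCURRENCY / RAILS_MAX_THREADS environment variables; the default of 5 threads is just an illustrative starting point for the trial-and-error tuning, not a recommendation.

# config/puma.rb -- a minimal sketch for a 1-core server
workers Integer(ENV.fetch("WEB_CONCURRENCY", 1))     # one worker per core
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads max_threads, max_threads                     # tune against available RAM and workload
preload_app!
port ENV.fetch("PORT", 3000)
environment ENV.fetch("RACK_ENV", "production")

To keep threads = connection pool, the Rails database.yml would then read its pool size from the same variable, e.g. pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>.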
Related
I recently enabled the beta Ruby Language Metrics in Heroku (docs)
I have 2 Performance-M web dynos running. The Puma pool metrics are below.
The usage seems unexpectedly low to me over the first 2 hours since enabling this. Am I missing something, or are these numbers expected?
Puma Pool Usage Metrics screenshot
So did I, recently. I believe what you are seeing is simply that there is no data for that time frame; this view matches what I see for the period before the deploy that activated the metrics.
I am not able to understand the values in WAS PMI for ConnectionPoolModule.
In one application I am monitoring, I am getting performance metrics for "AllocateCount", and in another I am getting metrics for "CreateCount".
In the case of AllocateCount, I can see that this value keeps increasing over time, and I am not sure what the effect of this is.
What are the differences between these count types?
What should I be looking for to review connection pools?
Why are these metrics not showing up at the same time?
Should I be concerned about the increase in AllocateCount, or should I correlate it with other metrics to assess the application's state?
Thanks!
With these metrics, an allocate is an application request for a connection, e.g. a DataSource.getConnection(). The WebSphere pool manager either satisfies the request with an already-pooled connection, or creates a new one, and in the latter case the create count gets incremented. So if your allocate and create counts were the same, you'd be doing no pooling, probably a bad thing!
But that's not necessarily the best thing to monitor. Metrics like the average wait time are probably a better starting point.
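To illustrate the relationship between the two counters in a language-agnostic way, here is a toy pool sketched in Ruby (not WebSphere code; the names are made up for illustration): every checkout bumps the allocate count, but the create count only grows when no idle connection can be reused, which is why equal counts would mean no pooling is happening.

# Toy illustration only -- not how WebSphere implements its pool manager.
class ToyConnectionPool
  attr_reader :allocate_count, :create_count

  def initialize
    @idle = []                  # connections returned and available for reuse
    @allocate_count = 0
    @create_count = 0
  end

  def checkout
    @allocate_count += 1        # every application request for a connection
    if @idle.empty?
      @create_count += 1        # only incremented when nothing can be reused
      Object.new                # stand-in for a real connection
    else
      @idle.pop
    end
  end

  def checkin(conn)
    @idle.push(conn)
  end
end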
Let me refer you to some other links to help you monitor WebSphere JDBC connection pool data:
JDBC chapter in WebSphere Application Server performance cookbook
WebSphere Application Server Performance Tuning Toolkit with video
Older but still relevant: slides detailing some connection pool monitoring techniques.
I have a Ruby on Rails application deployed with Phusion Passenger + the Apache web server. Does each request run in its own thread spawned by Phusion Passenger?
Passenger (along with most other application servers) runs no more than one request per thread. Typically there is also only one thread per process. From the Phusion Passenger docs:
Phusion Passenger supports two concurrency models:
process: single-threaded, multi-processed I/O concurrency. Each application process only has a single thread and can only handle 1 request at a time. This is the concurrency model that Ruby applications traditionally used. It has excellent compatibility (can work with applications that are not designed to be thread-safe) but is unsuitable for workloads in which the application has to wait for a lot of external I/O (e.g. HTTP API calls), and uses more memory because each process has a large memory overhead.
thread: multi-threaded, multi-processed I/O concurrency. Each application process has multiple threads (customizable via PassengerThreadCount). This model provides much better I/O concurrency and uses less memory because threads share memory with each other within the same process. However, using this model may cause compatibility problems if the application is not designed to be thread-safe.
(Emphasis my own)
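To make that thread-safety caveat concrete, here is a small hypothetical Ruby sketch (the classes are invented for illustration, not taken from Passenger): under the process model each process handles one request at a time, so shared state like this is harmless, but under the thread model concurrent requests can race on it.

# Not thread-safe: concurrent requests mutate the same class-level Hash.
class StatsCounter
  @counts = Hash.new(0)

  def self.record(path)
    @counts[path] += 1          # read-modify-write; can race across threads
  end
end

# A thread-safe variant guards the shared state with a Mutex.
class SafeStatsCounter
  @counts = Hash.new(0)
  @lock = Mutex.new

  def self.record(path)
    @lock.synchronize { @counts[path] += 1 }
  end
end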
Passenger open source edition only uses one thread per application, as listed in your Apache virtual host files (not sure about Nginx). So you could conceivably have multiple instances of your app running on the same Apache server, but you would have to install your app into multiple directories, point vhost entries at them, and put some kind of load balancer in front of it.
Passenger enterprise enables much more control over concurrency.
EDIT: clarity.
I'm hoping the community can clarify something for me, and that others can benefit.
My understanding is that gunicorn worker processes are essentially virtual replicas of Heroku web dynos. In other words, Gunicorn's worker processes should not be confused with Heroku's worker processes (e.g. Django Celery Tasks).
This is because Gunicorn worker processes are focused on handling web requests (basically scaling up the throughput of the Heroku web dyno), while Heroku worker dynos specialize in long-running background tasks such as remote API calls.
I have a simple Django app that makes decent use of Remote APIs and I want to optimize the resource balance. I am also querying a PostgreSQL database on most requests.
I know that this is very much an oversimplification, but am I thinking about things correctly?
Some relevant info:
https://devcenter.heroku.com/articles/process-model
https://devcenter.heroku.com/articles/background-jobs-queueing
https://devcenter.heroku.com/articles/django#running-a-worker
http://gunicorn.org/configure.html#workers
http://v3.mike.tig.as/blog/2012/02/13/deploying-django-on-heroku/
https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/gunicorn/
Other Quasi-Related Helpful SO Questions for those researching this topic:
Troubleshooting Site Slowness on a Nginx + Gunicorn + Django Stack
Performance degradation for Django with Gunicorn deployed into Heroku
Configuring gunicorn for Django on Heroku
To provide an answer and prevent people from having to search through the comments: a dyno is like an entire computer. Using the Procfile, you give each of your dynos one command to run, and it cranks away on that command, re-running it periodically to refresh it and re-running it when it crashes. As you can imagine, it's rather wasteful to devote an entire computer to running a single-threaded web server, and that's where Gunicorn comes in.
The Gunicorn master process does nothing but act as a proxy server, spawning a given number of copies of your application (workers) and distributing HTTP requests amongst them. It takes advantage of the fact that each dyno actually has multiple cores. As someone mentioned, the number of workers you should choose depends on how much memory your app takes to run.
Contrary to what Bob Spryn said in the last comment, there are other ways of exploiting this opportunity for parallelism to run separate servers on the same dyno. The easiest way is to make a separate sub-Procfile and run the all-Python Foreman equivalent, Honcho, from your main Procfile, following these directions (a sketch of such a setup appears after the next paragraph). Essentially, in this case your single dyno command is a program that manages multiple other commands. It's kind of like being granted one wish from a genie and making that wish be for four more wishes.
The advantage of this is that you get to take full advantage of your dynos' capacity. The disadvantage is that you lose the ability to scale individual parts of your app independently when they're sharing a dyno. When you scale the dyno, it scales everything you've multiplexed onto it, which may not be what you want. You will probably have to use diagnostics to decide when a service should be moved onto its own dedicated dyno.
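For concreteness, here is a sketch of that multiplexing setup (the file names, process names, and commands are assumptions, and the exact Honcho flags may differ between versions): the main Procfile hands the web dyno to Honcho, which in turn starts everything listed in a secondary Procfile.

Procfile (what Heroku runs):
web: honcho start -f Procfile.web

Procfile.web (what Honcho runs inside the single web dyno):
web: gunicorn myproject.wsgi --workers 3
worker: celery -A myproject worker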
I'm running Compojure on Heroku. They have a limit of 100 threads per process, so when I go over that limit, I get: java.lang.OutOfMemoryError: unable to create new native thread.
Compojure is using the Jetty Ring adapter. Is there a way of configuring the server to allow only a hundred threads to the servlet at a time?
The solution comes from Chris Perkins over at the Compojure Google group:
;; assumes run-jetty from ring.adapter.jetty and an import of QueuedThreadPool (org.eclipse.jetty.util.thread.QueuedThreadPool on recent Jetty versions)
(run-jetty app {:configurator #(.setThreadPool % (QueuedThreadPool. 5))})
This assigns a QueuedThreadPool (here capped at five concurrent threads) to the Jetty instance before it starts.