Horizontal vs Vertical Scaling on Heroku NodeJS App

I have a single NodeJS web process that is a simple api server. This process hits memory limits before CPU limits. What is the proper way to scale this app on Heroku? Do I add more dynos or switch to a larger sized dyno?
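For context, assuming the standard Heroku CLI, the two options I'm weighing look roughly like this (dyno type names vary by plan):

    # Horizontal scaling: add more dynos running the web process
    heroku ps:scale web=3

    # Vertical scaling: move the web process to a larger dyno type
    heroku ps:type web=standard-2x

My instinct is that, since the process is memory-bound, a larger dyno raises the per-process memory ceiling, while adding dynos only adds concurrency and leaves each dyno's memory limit unchanged. Is that right?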

Related

Heroku Hobby tier allows up to 10 process types but doesn't allow horizontal scaling?

In the detailed comparison chart, it says the max number of process types is 10 (so an app could have 10 dynos). It also says that the Hobby tier does not allow horizontal scaling (meaning only 1 web dyno, and no additional web or worker dynos). From what I understand, horizontal scaling means adding more web and/or worker processes.
It seems contradictory to me. I tried to go through the documentation, but I still don't understand.
Simple web apps just serve webpages or an API from the web process type.
Advanced web apps may have extra process types, e.g. a worker process (say you have to send 1,000 emails: you put that task in a task queue, and the worker process handles it, so your main web process can respond quickly to the user), a scheduler process (to do cron-style work), etc.
On the Hobby tier you can have up to 10 of these process types, but your main web process (which handles the webpages/API calls) will still be handled by a single, basic dyno. A dyno is akin to one core of a processor (hardware). If you have a lot of load/visitors on your website, this might not be enough; you might want 4 dynos handling your web process alone. That is what you can't do on the Hobby tier.
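As a sketch, these process types are declared in the app's Procfile; only the web name is special, and the entrypoint scripts below are illustrative:

    web: node server.js
    worker: node email-worker.js
    clock: node scheduler.js

The limits apply per process type, which is why the 10-process-type limit and the 1-dyno-per-type limit can both be true at once.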

Heroku worker dyno load with multiple dynos

I'm trying to understand the Dyno Load section of the metrics for my app. This particular app has five worker dynos. Given that, if I see a Load Max or Load Avg of 2 or 2.5, I should be OK, right? With this setup my goal would be to keep the load under five (1 for each dyno)? Is that the correct way to view this?
The load you see in Heroku Metrics is per dyno: each dyno reports its own load, and Load Max is the highest value reported by any single dyno.
So expecting 5 to be a good value because you have 5 dynos isn't right; a Load Max of 2.5 means at least one dyno is carrying a load of 2.5 by itself.
You need to evaluate that value based on the type of dynos you have, as larger dynos get more CPU share and can handle more load.
Heroku recommends keeping Free, Hobby and Standard dynos between 0.5 and 1.0 load.
Performance-M dynos can go to 3.0 or 4.0, and Performance-L can go up to 16.0.
See also dyno sizes and their CPU share: https://devcenter.heroku.com/articles/dyno-types

Preserve Heroku session affinity when scaling up

According to the Heroku documentation, when session affinity is turned on and the number of dynos scales up, existing traffic is evenly redistributed across the new dynos.
This has the effect that clients of 'old' dynos are assigned to dynos they have not communicated with before. That is unavoidable when scaling down, but should not be necessary when scaling up.
Is there a possibility to prevent the Heroku load balancer from assigning existing sessions to the new dynos, and instead keep them with the original ones?

What is the difference between four 1X dynos and two 2X dynos in Heroku?

Does anyone know the answer to this? I spawned two 2X dynos on Heroku, and performance with the free 1X dyno was much better than with the two 2X dynos. Both ran the same Rails app talking to the same database. Boggles my mind!
More dynos give you more concurrency. With a single-threaded web server such as Thin, 4 x 1X dynos can serve four requests at the same time.
With 2 x 2X dynos you can only serve two requests.
2X dynos have more memory (1024 MB) and more CPU share available. This is useful if your application takes up a lot of memory; most apps should not need 2X dynos.
Heroku has recently added PX (performance) dynos as well, which have significantly more power available.
You can read about the different dynos Heroku offers on their website.
The actual numbers have changed since the accepted answer was posted.
The current difference is the memory (512 MB for 1X and 1024 MB for 2X), as well as the CPU share, which is doubled for 2X.
More details can be found on the following Heroku dev center page: https://devcenter.heroku.com/articles/dyno-types
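For reference, assuming the current Heroku CLI, both the dyno count and the dyno type can be set in one command, which makes the two configurations easy to compare:

    # four 1X dynos: four concurrent requests with a single-threaded server
    heroku ps:scale web=4:standard-1x

    # two 2X dynos: more memory and CPU per dyno, but only two concurrent requests
    heroku ps:scale web=2:standard-2x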

Difference Between Gunicorn Worker Processes and Heroku Worker Dynos

I'm hoping the community can clarify something for me, and that others can benefit.
My understanding is that gunicorn worker processes are essentially virtual replicas of Heroku web dynos. In other words, Gunicorn's worker processes should not be confused with Heroku's worker processes (e.g. Django Celery Tasks).
This is because Gunicorn worker processes are focused on handling web requests (essentially scaling up the throughput of the Heroku web dyno), while Heroku worker dynos specialize in long-running background tasks such as remote API calls.
I have a simple Django app that makes decent use of Remote APIs and I want to optimize the resource balance. I am also querying a PostgreSQL database on most requests.
I know that this is very much an oversimplification, but am I thinking about things correctly?
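To make the distinction concrete, here is roughly how I picture the two kinds of 'worker' being declared in a Procfile (the project name and task runner are illustrative):

    # Gunicorn workers live inside the web process type
    web: gunicorn myproject.wsgi --workers 3

    # A Heroku worker dyno is a separate process type, e.g. a Celery consumer
    worker: celery -A myproject worker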
Some relevant info:
https://devcenter.heroku.com/articles/process-model
https://devcenter.heroku.com/articles/background-jobs-queueing
https://devcenter.heroku.com/articles/django#running-a-worker
http://gunicorn.org/configure.html#workers
http://v3.mike.tig.as/blog/2012/02/13/deploying-django-on-heroku/
https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/gunicorn/
Other Quasi-Related Helpful SO Questions for those researching this topic:
Troubleshooting Site Slowness on a Nginx + Gunicorn + Django Stack
Performance degradation for Django with Gunicorn deployed into Heroku
Configuring gunicorn for Django on Heroku
To provide an answer and prevent people from having to search through the comments: a dyno is like an entire computer. Using the Procfile, you give each of your dynos one command to run, and it cranks away on that command, re-running it periodically to refresh it and re-running it when it crashes. As you can imagine, it's rather wasteful to dedicate an entire computer to running a single-threaded webserver, and that's where Gunicorn comes in.
The Gunicorn master process does no request handling itself: it spawns a given number of copies of your application (workers), which share the dyno's listening socket and handle incoming HTTP requests concurrently. This takes advantage of the fact that each dyno actually has multiple cores. As someone mentioned, the number of workers you should choose depends on how much memory your app takes to run.
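As a minimal sketch of how that is wired up (the module path is illustrative), the worker count can be pinned on the command line, or left off, since Gunicorn defaults its worker count to the WEB_CONCURRENCY environment variable when it is set:

    # Procfile
    web: gunicorn myapp.wsgi --workers 3

    # or omit --workers and control the count via config instead
    heroku config:set WEB_CONCURRENCY=3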
Contrary to what Bob Spryn said in the last comment, there are other ways of exploiting this opportunity for parallelism: you can run separate servers on the same dyno. The easiest way is to make a separate sub-Procfile and run the all-Python Foreman equivalent, Honcho, from your main Procfile. Essentially, in this case your single dyno command is a program that manages multiple other commands. It's kind of like being granted one wish from a genie, and making that wish be for 4 more wishes.
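A minimal sketch of that setup, assuming Honcho is installed as a dependency (file names and entrypoints are illustrative):

    # Procfile -- the single dyno command just runs Honcho
    web: honcho start -f Procfile.web

    # Procfile.web -- the sub-Procfile Honcho manages inside the dyno
    web: gunicorn myapp.wsgi
    worker: celery -A myapp worker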
The advantage of this is that you get to use your dynos' full capacity. The disadvantage is that you lose the ability to scale individual parts of your app independently when they share a dyno. When you scale the dyno, it scales everything you've multiplexed onto it, which may not be what you want. You will probably have to use diagnostics to decide when a service should be put on its own dedicated dyno.
