How exactly do dynos/memory/processes work? - heroku

For anyone who has used Heroku (or anyone who has deployed to a PaaS before and has experience with this):
I'm confused about what Heroku means by "dynos", how dynos handle memory, and how users scale. I read that they define dynos as "app containers", which means that the memory/file system of dyno1 can't be accessed by dyno2. Makes sense in theory.
The containers used at Heroku are called “dynos.” Dynos are isolated, virtualized Linux containers that are designed to execute code based on a user-specified command. (https://www.heroku.com/dynos)
Also, if I understand correctly, users can define how many dynos, or "app containers", are instantiated through commands like heroku ps:scale web=1, and so on.
I recently created a web app (a Flask/gunicorn app, if that even matters) where I declare a variable that keeps track of how many users visited a certain route (I know, not the best approach, but that's beside the point). In local testing it appeared to work properly, even for multiple clients.
When I deployed to Heroku with only a single web dyno (heroku ps:scale web=1), I found this was not the case: the variable appeared to have multiple instances that updated independently. I understand that memory isn't shared between different dynos, but I have only one dyno running the server, so I thought there should be only a single instance of this variable/web app. Is the dyno running my server as a single process or multiple processes? If so, how can I limit it?
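For reference, the route looks roughly like this (a minimal sketch; the variable and route names are illustrative):
from flask import Flask

app = Flask(__name__)
visit_count = 0  # module-level counter, incremented on every request

@app.route("/visit")
def visit():
    global visit_count
    visit_count += 1
    return f"visits: {visit_count}"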
Note, this web app does save files on disk, and on each API request I check whether the file exists. Because it does, this tells me that the requests are hitting the same dyno.
Perhaps someone can enlighten me? I'm a beginner to deployment, but willing to learn/understand more!

Is the dyno running my server on single/multiple processes?
Yes, probably:
Gunicorn forks multiple system processes within each dyno to allow a Python app to support multiple concurrent requests without requiring them to be thread-safe. In Gunicorn terminology, these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos).
We recommend setting a configuration variable for this setting. Gunicorn automatically honors the WEB_CONCURRENCY environment variable, if set.
heroku config:set WEB_CONCURRENCY=3
The WEB_CONCURRENCY environment variable is automatically set by Heroku, based on the processes’ Dyno size. This feature is intended to be a sane starting point for your application. We recommend knowing the memory requirements of your processes and setting this configuration variable accordingly.
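To directly answer the "how can I limit it" part: an explicit worker count in the Procfile pins Gunicorn to a single worker (app:app is illustrative):
web: gunicorn app:app --workers 1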
The solution isn't to limit your processes, but to fix your application. Global variables shouldn't be used to store data across processes. Instead, store data in a database or in-memory data store.
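For example, here is a minimal sketch of that counter backed by Redis instead of a module-level variable (it assumes a REDIS_URL config var and the redis package; the key and route names are illustrative):
import os
import redis
from flask import Flask

app = Flask(__name__)
# Each Gunicorn worker gets its own connection, but the count itself lives in Redis.
r = redis.Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))

@app.route("/visit")
def visit():
    # INCR is atomic, so every worker process (and every dyno) sees the same count.
    count = r.incr("visit_count")
    return f"visits: {count}"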
Note, this web app does save files on disk, and on each API request I check whether the file exists. Because it does, this tells me that the requests are hitting the same dyno.
If you're just trying to check which dyno you're on, fine. But you probably don't want to be saving actual data to the dyno's filesystem because it is ephemeral. You'll lose all changes made to the filesystem whenever your dyno restarts. This happens frequently (at least once per day).

Related

Stop Heroku Dyno from cycling

I have a Hobby Dyno that hosts an application on Heroku in which users can upload images.
What I've noticed is that the Dyno restarts during its cycle, causing all uploaded images to be lost.
2018-07-27T16:23:09.914767+00:00 heroku[web.1]: Cycling
2018-07-27T16:23:09.915421+00:00 heroku[web.1]: State changed from up to starting
I am aware of solutions that involve third-party storage or hosting the app on another platform altogether.
I am wondering if there is a way to stop the dyno cycling and never have it restart, so that it is always in the up state?
Thank you.
There is no way to prevent dyno cycling. Heroku does this automatically at least once per day.
Heroku's entire design is based on The 12-Factor App, which states that your app's processes are disposable. Heroku accomplishes this (in part) with its ephemeral file system, which is why you must persist files using an external service.
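For example, a minimal sketch of pushing uploads to S3 with boto3 instead of writing them to the dyno's disk (the bucket config var and function name are illustrative):
import os
import boto3

# Credentials come from the standard AWS_* config vars; the bucket name is read from config.
s3 = boto3.client("s3")
BUCKET = os.environ["S3_BUCKET_NAME"]

def save_upload(fileobj, filename):
    # Streams the uploaded file to S3 so it survives dyno restarts.
    s3.upload_fileobj(fileobj, BUCKET, filename)
    return f"https://{BUCKET}.s3.amazonaws.com/{filename}"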

What is the difference between Process Types and Dynos in Heroku

I subscribed a Hobby plan in Heroku.
The details of the plan specifies that it allows up to 10 Process Types.
So I developed an app with the following Procfile:
backend-dev: node ./backend-dev/backend.js
backend-prod: node ./backend-prod/backend.js
Which represents 2 Process Types, right?
But when I run it with:
heroku ps:scale backend-dev=1
heroku ps:scale backend-prod=1
I end up with two Hobby Dynos...
As the plan also specifies 7€/month/Dyno, I am billed 14€/month.
So my questions are:
What is the difference between Process Types and Dynos?
Can I run 2 Process Types within a single Dyno?
Can I run for instance 1 free Dyno (for backend-dev) and 1 Hobby Dyno (for backend-prod)?
Consider a simple example of a web application with a background worker: it has a web process and a worker process. When such an app receives a lot of web traffic but processes very few background jobs, you can increase the number of dynos for your web process while keeping only one dyno for the worker process (see the commands sketched after this paragraph). It is also possible to have a different dyno size per process: instead of using more dynos, you can use a performance-l dyno for the web process and a standard-1x for the worker process. In other words, Process Types describe different processes that work together within one application. They are not supposed to be different applications, as in your case.
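For instance, scaling the two process types independently might look like this (counts and dyno sizes are illustrative):
heroku ps:scale web=3 worker=1
heroku ps:scale web=2:performance-l worker=1:standard-1x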
No; it works the other way around: you can run one Process Type on multiple dynos, but each dyno runs only a single Process Type.
Technically you can run one process on a free dyno and another on a hobby dyno, but it won't work in your case. When you upgrade to professional dynos, all processes must run on professional dynos.
Your Procfile is all wrong. You must have a Process Type named web to receive web traffic. If you start your current setup, you will be running two processes, but they will never receive any web requests. As described in the Heroku docs, only the web process can receive web traffic, and you can have only one such process. So to run two versions of your app, you need to create two different Heroku applications. Ideally you should make your app configurable via environment variables so you can deploy the same code to both apps.
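A sketch of what each app's Procfile could look like instead (the path and variable name are illustrative), with the environment selected per app through a config var:
web: node ./backend/backend.js
heroku config:set BACKEND_ENV=dev --app my-backend-dev
heroku config:set BACKEND_ENV=prod --app my-backend-prod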

Does running on multiple web dynos mean I have multiple server instances?

I'm a little bit confused about the whole dyno business with Heroku. On their site, they define a dyno as:
A dyno is a lightweight Linux container that runs a single
user-specified command. A dyno can run any command available in its
default environment (what we supply in the Cedar stack) or in your
app’s slug (a compressed and pre-packaged copy of your application and
its dependencies).
This (somewhat?) makes sense in my head, but when I think about dynos in the context of multiple dynos running for one web app, my brain gets twisted.
Let's say I'm building a web app with a server that runs a very important task only once every 3 hours for all users. If I am running multiple web dynos for this site, does that mean there is a separate server instance running on each dyno? And would that very important task that runs every 3 hours be run on each dyno every 3 hours?
Thanks a lot for any clarification!
Each dyno is an LXC container running on one of Heroku's instances.
Depending on your dyno size, there might or might not be other containers on the same instance.
But the rough idea is that yes, each dyno you have running is a different instance, and a task scheduled inside the web process to run every 3 hours will run on all of them.
You may want to look into the heroku scheduler addon. It will run a one-off dyno at a specified interval, allowing you to run cron-like tasks.
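A sketch of how that might look: put the task in a standalone script and let the scheduler invoke it in a one-off dyno, so it runs once per interval instead of once per web dyno (module and function names are illustrative):
# tasks.py -- run by Heroku Scheduler as "python tasks.py"
from myapp.jobs import run_very_important_task  # illustrative import

if __name__ == "__main__":
    run_very_important_task()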

Difference Between Gunicorn Worker Processes and Heroku Worker Dynos

I'm hoping the community can clarify something for me, and that others can benefit.
My understanding is that gunicorn worker processes are essentially virtual replicas of Heroku web dynos. In other words, Gunicorn's worker processes should not be confused with Heroku's worker processes (e.g. Django Celery Tasks).
This is because Gunicorn worker processes are focused on handling web requests (basically throttling up the performance of the Heroku Web Dyno) while Heroku Worker Dynos specialize in Remote API calls, etc that are long-running background tasks.
I have a simple Django app that makes decent use of Remote APIs and I want to optimize the resource balance. I am also querying a PostgreSQL database on most requests.
I know that this is very much an oversimplification, but am I thinking about things correctly?
Some relevant info:
https://devcenter.heroku.com/articles/process-model
https://devcenter.heroku.com/articles/background-jobs-queueing
https://devcenter.heroku.com/articles/django#running-a-worker
http://gunicorn.org/configure.html#workers
http://v3.mike.tig.as/blog/2012/02/13/deploying-django-on-heroku/
https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/gunicorn/
Other Quasi-Related Helpful SO Questions for those researching this topic:
Troubleshooting Site Slowness on a Nginx + Gunicorn + Django Stack
Performance degradation for Django with Gunicorn deployed into Heroku
Configuring gunicorn for Django on Heroku
To provide an answer and prevent people from having to search through the comments: a dyno is like an entire computer. Using the Procfile, you give each of your dynos one command to run, and it cranks away on that command, re-running it periodically to refresh it and re-running it when it crashes. As you can imagine, it's rather wasteful to devote an entire computer to running a single-threaded webserver, and that's where Gunicorn comes in.
The Gunicorn master process does little more than act as a proxy, spawning a given number of copies of your application (workers) and distributing HTTP requests amongst them. It takes advantage of the fact that each dyno actually has multiple cores. As someone mentioned, the number of workers you should choose depends on how much memory your app takes to run.
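For example, a minimal gunicorn.conf.py along those lines might look like this (a sketch; the CPU-based fallback is a common rule of thumb, not a Heroku requirement):
# gunicorn.conf.py
import multiprocessing
import os

# Honor Heroku's WEB_CONCURRENCY if set; otherwise guess from the CPU count.
workers = int(os.environ.get("WEB_CONCURRENCY", multiprocessing.cpu_count() * 2 + 1))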
Contrary to what Bob Spryn said in the last comment, there are other ways of exploiting this opportunity for parallelism to run separate servers on the same dyno. The easiest way is to make a separate sub-procfile and run the all-Python Foreman equivalent, Honcho, from your main Procfile, following these directions. Essentially, in this case your single dyno command is a program that manages multiple single commands. It's kind of like being granted one wish from a genie, and making that wish be for 4 more wishes.
The advantage of this is that you get to take full advantage of your dynos' capacity. The disadvantage is that you lose the ability to scale individual parts of your app independently when they're sharing a dyno. When you scale the dyno, it scales everything you've multiplexed onto it, which may not be what you want. You will probably have to use diagnostics to decide when a service should be moved to its own dedicated dyno.
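A sketch of that layout (file names and commands are illustrative): the main Procfile hands the dyno to Honcho, which then runs a second Procfile containing the processes you want multiplexed onto it:
web: honcho start -f ProcfileHoncho
and in ProcfileHoncho:
web: gunicorn app:app
worker: python worker.py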

How to identify a heroku dyno number from within the app?

Is there a way to identify the heroku dyno name (e.g. web.1, web.2) from within the application? I'd like to be able to generate a unique request id (e.g. to track requests between web and worker dynos for consolidated logging of the entire request stack) and it seems to me that the dyno identifier would make a decent starting point.
If this can't be done, does anyone have a fallback recommendation?
Recently this issue has been addressed by the Heroku team.
The Dyno Manager adds a DYNO environment variable that holds the identifier of your dyno, e.g. web.1, web.2, foo.1, etc. However, the variable is still experimental and subject to change or removal.
I needed that value (actually the instance index, like 1, 2, etc.) to initialize a flake ID generator at instance startup, and this variable worked perfectly fine for me.
You can read more about it in the Local environment variables documentation.
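In a Python app, for example, reading it is a one-liner (a sketch):
import os
dyno_name = os.environ.get("DYNO", "unknown")  # e.g. "web.1", "worker.2"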
I asked this question of Heroku support, and since there are others here who have asked similar questions to mine I figured I should share it. Heroku staff member JD replied with the following:
No, it's not possible to see this information from inside the dyno.
We've reviewed this feature request before and have chosen not to
implement it, as this would introduce a Heroku-specific variable which
we aim to avoid in our stack. As such, we don't have plans to
implement this feature.
You can generate / add to your environment a unique identifier (e.g. a
UUID) on dyno boot to accomplish a similar result, and you can
correlate this to your app's dynos by printing it to your logs at that
time. If you ever need to find it later, you can check your logs for
that line (of course, you'll need to drain your logs using Papertrail,
Loggly, etc, or to your own server).
Unfortunately for my scenario, a UUID is too long (if I wanted such a large piece of data, I would just use a UUID to track things in the first place). But it's still good to have an official answer.
Heroku has a $DYNO environment variable, however there are some big caveats attached to it:
"The $DYNO variable is experimental and subject to change or removal." So they may take it away at any point.
"$DYNO is not guaranteed to be unique within an app." This is the more problematic one, especially if you're looking to implement something like Snowflake IDs.
For the problem you're attempting to solve, the router request ID may be more appropriate. Heroku passes a unique ID to every web request via the X-Request-ID header. You can pass that to the worker and have both the web and worker instance log the request ID anytime they log information for a particular request/bit of work. That will allow you to correlate incidents in the logs.
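A minimal Flask sketch of that idea (X-Request-ID is the header Heroku's router sets; the route and logger names are illustrative):
import logging
from flask import Flask, request

app = Flask(__name__)
log = logging.getLogger(__name__)

@app.route("/work")
def work():
    request_id = request.headers.get("X-Request-ID", "no-request-id")
    log.info("handling request %s", request_id)
    # Pass request_id along when enqueueing the background job so the
    # worker dyno can log the same identifier.
    return "ok"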
This may not exactly answer the question, but you could have a different line in your Procfile for each worker process (using a ps:scale of 1 for each). You could then pass in the worker number as an environment variable from the Procfile.
Two lines from an example procfile might look like:
worker_1: env WORKER_NUMBER=1 node worker
worker_2: env WORKER_NUMBER=2 node worker
The foreman package which heroku local uses seems to have changed the environment variable name again (heroku/7.54.0). You can now get the worker name via $FOREMAN_WORKER_NAME when running locally. It has the same value $DYNO will have when running on Heroku (web.1, web.2, etc.).
The foreman gem still uses $PS, so to access the dyno name and have it work both on Heroku and in development (when using foreman), you can check $PS first and then $DYNO. To handle the case of a local console, check for Rails::Console:
dyno_name = ENV['PS'] || ENV['DYNO'] || (defined?(Rails::Console) ? "console" : "")
It's dangerous to use the DYNO environment variable because its value is not guaranteed to be unique. That means you can have two dynos running at the same time that briefly have the same DYNO variable value. The safe way to do this is to enable dyno metadata and then use the HEROKU_DYNO_ID environment variable. That will better let you generate unique request ids. See: https://devcenter.heroku.com/articles/dyno-metadata
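After enabling dyno metadata (heroku labs:enable runtime-dyno-metadata), reading it is straightforward (a sketch):
import os
dyno_id = os.environ.get("HEROKU_DYNO_ID")  # unique per dyno
dyno_name = os.environ.get("DYNO")          # e.g. "web.1"; not guaranteed unique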
