Is there a way to identify the heroku dyno name (e.g. web.1, web.2) from within the application? I'd like to be able to generate a unique request id (e.g. to track requests between web and worker dynos for consolidated logging of the entire request stack) and it seems to me that the dyno identifier would make a decent starting point.
If this can't be done, does anyone have a fallback recommendation?
Recently this issue has been addressed by the Heroku team.
The Dyno Manager sets a DYNO environment variable that holds the identifier of your dyno, e.g. web.1, web.2, foo.1, etc. However, the variable is still experimental and subject to change or removal.
I needed that value (actually the instance index, like 1, 2, etc.) to initialize a flake ID generator at instance startup, and this variable worked perfectly fine for me.
You can read more about the variables on Local environment variables.
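If all you need is the numeric instance index (for something like a flake ID generator), a minimal Ruby sketch, assuming the experimental DYNO variable is set, could look like this:
# Hedged sketch: pull the numeric index out of DYNO, e.g. "web.2" => 2.
# Falls back to 0 when the variable is absent (e.g. local development).
instance_index = (ENV['DYNO'] || 'local.0').split('.').last.to_i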
I asked this question of Heroku support, and since there are others here who have asked similar questions to mine I figured I should share it. Heroku staff member JD replied with the following:
No, it's not possible to see this information from inside the dyno. We've reviewed this feature request before and have chosen not to implement it, as this would introduce a Heroku-specific variable which we aim to avoid in our stack. As such, we don't have plans to implement this feature.
You can generate / add to your environment a unique identifier (e.g. a UUID) on dyno boot to accomplish a similar result, and you can correlate this to your app's dynos by printing it to your logs at that time. If you ever need to find it later, you can check your logs for that line (of course, you'll need to drain your logs using Papertrail, Loggly, etc., or to your own server).
Unfortunately for my scenario, a UUID is too long (if I wanted such a large piece of data, I would just use a UUID to track things in the first place). But it's still good to have an official answer.
Heroku has a $DYNO environment variable; however, there are some big caveats attached to it:
"The $DYNO variable is experimental and subject to change or removal." So they may take it away at any point.
"$DYNO is not guaranteed to be unique within an app." This is the more problematic one, especially if you're looking to implement something like Snowflake IDs.
For the problem you're attempting to solve, the router request ID may be more appropriate. Heroku passes a unique ID to every web request via the X-Request-ID header. You can pass that to the worker and have both the web and worker instance log the request ID anytime they log information for a particular request/bit of work. That will allow you to correlate incidents in the logs.
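As a rough illustration of that approach, here is a hedged Ruby sketch (assuming Rails and Sidekiq; the controller, worker, and log format are hypothetical) in which the web dyno reads the router's X-Request-ID and hands it to the background job so both sides log the same ID:
require 'sidekiq'

class ReportsController < ApplicationController
  def create
    # X-Request-ID is set by the Heroku router for every web request
    request_id = request.headers['X-Request-ID']
    Rails.logger.info("request_id=#{request_id} enqueueing report")
    ReportWorker.perform_async(params[:report_id], request_id)
    head :accepted
  end
end

class ReportWorker
  include Sidekiq::Worker

  def perform(report_id, request_id)
    # Logging the same request_id lets you correlate web and worker log lines
    logger.info("request_id=#{request_id} building report #{report_id}")
  end
end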
This may not exactly answer the question, but you could have a different line in your Procfile for each worker process (using a ps:scale of 1 for each). You could then pass in the worker number as an environment variable from the Procfile.
Two lines from an example Procfile might look like this:
worker_1: env WORKER_NUMBER=1 node worker
worker_2: env WORKER_NUMBER=2 node worker
The foreman package that heroku local uses seems to have changed the ENV variable name again (heroku/7.54.0). You can now get the worker name via $FOREMAN_WORKER_NAME when running locally. It has the same value $DYNO will have when running on Heroku (web.1, web.2, etc.).
The foreman gem still uses $PS, so to access the dyno name and have it work both on Heroku and in development (when using foreman), you can check $PS first and then $DYNO. To handle the case of a local console, check for Rails::Console:
dyno_name = ENV['PS'] || ENV['DYNO'] || (defined?(Rails::Console) ? "console" : "")
It's dangerous to use the DYNO environment variable because its value is not guaranteed to be unique. That means you can have two dynos running at the same time that briefly have the same DYNO variable value. The safe way to do this is to enable dyno metadata and then use the HEROKU_DYNO_ID environment variable. That will better let you generate unique request ids. See: https://devcenter.heroku.com/articles/dyno-metadata
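For example (a hedged sketch; the app name is a placeholder, and HEROKU_DYNO_ID only exists once the labs feature is enabled):
heroku labs:enable runtime-dyno-metadata -a your-app-name
require 'securerandom'

# Hedged sketch: prefix request IDs with the dyno's unique ID so they can be
# correlated across web and worker logs.
dyno_id = ENV['HEROKU_DYNO_ID'] || 'local'
request_id = "#{dyno_id}-#{SecureRandom.hex(4)}"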
For anyone who has used Heroku (and perhaps anyone else who has deployed to a PaaS before and has experience):
I'm confused on what Heroku means by "dynos", how dynos handle memory, and how users scale. I read that they define dynos as "app containers", which means that the memory/file system of dyno1 can't be accessed by dyno2. Makes sense in theory.
The containers used at Heroku are called “dynos.” Dynos are isolated, virtualized Linux containers that are designed to execute code based on a user-specified command. (https://www.heroku.com/dynos)
Also, users can define how many dynos, or "app containers", are instantiated, if I understand correctly, through commands like heroku ps:scale web=1, etc.
I recently created a web app (a Flask/gunicorn app, if that even matters) where I declare a variable that keeps track of how many users visited a certain route (I know, not the best approach, but that's irrelevant anyway). In local testing, it appeared to be working properly (even for multiple clients).
When I deployed to Heroku, with only a single web dyno (heroku ps:scale web=1), I found this was not the case: there appeared to be multiple instances of the variable, each updating independently. I understand that memory isn't shared between different dynos, but I have only one dyno running the server, so I thought there should only be a single instance of this variable/web app. Is the dyno running my server on single/multiple processes? If so, how can I limit it?
Note, this web app does save files on disk, and through each API request, I check to see if the file does exist. Because it does, this tells me that I am requesting from the same dyno.
Perhaps someone can enlighten me? I'm a beginner to deployment, but willing to learn/understand more!
Is the dyno running my server on single/multiple processes?
Yes, probably:
Gunicorn forks multiple system processes within each dyno to allow a Python app to support multiple concurrent requests without requiring them to be thread-safe. In Gunicorn terminology, these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos).
We recommend setting a configuration variable for this setting. Gunicorn automatically honors the WEB_CONCURRENCY environment variable, if set.
heroku config:set WEB_CONCURRENCY=3
The WEB_CONCURRENCY environment variable is automatically set by Heroku, based on the processes’ Dyno size. This feature is intended to be a sane starting point for your application. We recommend knowing the memory requirements of your processes and setting this configuration variable accordingly.
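For reference, the Procfile entry for a gunicorn app is typically a single line like the one below (app:app is a placeholder for your Flask module and application object); gunicorn picks up WEB_CONCURRENCY on its own, so no extra flags are needed:
web: gunicorn app:app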
The solution isn't to limit your processes, but to fix your application. Global variables shouldn't be used to store data across processes. Instead, store data in a database or in-memory data store.
Note, this web app does save files on disk, and through each API request, I check to see if the file does exist. Because it does, this tells me that I am requesting from the same dyno.
If you're just trying to check which dyno you're on, fine. But you probably don't want to be saving actual data to the dyno's filesystem because it is ephemeral. You'll lose all changes made to the filesystem whenever your dyno restarts. This happens frequently (at least once per day).
I am thinking of a way to pass/set an environment variable to a Cloud Foundry instance when we scale it horizontally. The use case is: we have a producer app using RabbitMQ and a consumer app consuming from the queue. We want the consumer app to scale horizontally. For that, we are planning to use the "consistent-hash-exchange" of RabbitMQ.
The problem is, we are planning to have 3 queues bound to the 3 consumer instances, and we want to pass each queue name to each instance so that there is a one-to-one mapping between them.
1) Is there any way in which we can set environment properties at the individual instance level?
2) Is this the right approach?
Thanks in advance, Sagar
1) Is there any way in which we can set environment properties at the individual instance level?
The cf cli will tell you that you need to restage an app for changes to environment variables to take effect. Normally, you can get away with a restart, unless you're changing an env variable that impacts a buildpack/what happens during staging.
I did a quick test and it seems like at least a restart is required. If I change an env variable and scale up my app, all instances have the original env value. After a restart, all instances have the new value. This is probably a good thing, as you don't really want to have different values for different app instances as that could get really confusing.
2) Is this the right approach?
Probably not for Cloud Foundry. I would suggest doing something where you have a queue name that follows a pattern, like somequeue-<index>. You can then pull the app instance number out of the VCAP_APPLICATION environment variable and use that to get a unique queue name per application instance, like 'somequeue-0', 'somequeue-1', and 'somequeue-2'.
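As a rough sketch (Ruby purely for illustration; VCAP_APPLICATION is the JSON document Cloud Foundry sets on every app instance, and the somequeue prefix is just the pattern suggested above):
require 'json'

# Hedged sketch: derive a per-instance queue name from the Cloud Foundry instance index.
vcap = JSON.parse(ENV['VCAP_APPLICATION'] || '{}')
instance_index = vcap['instance_index'] || ENV['CF_INSTANCE_INDEX'] || 0
queue_name = "somequeue-#{instance_index}"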
Otherwise, you'd need to have the app query the queue name from somewhere like a config server or database. That way it could dynamically load the queue name at start up.
Hope that helps!
We run a Laravel 5.1 API, which in some cases queues commands to provision server instances with various packages and settings. Each instance has a specific id, which is included in the queue item data. That instance id is in turn used as a prefix for Redis cache keys. For example: "instance-0:clients:list" for a list of clients running on instance id 0. The "cache.prefix" config setting is updated through a middleware with the current instance id.
Various create/update/delete endpoints "forget" cache keys, which are then rebuilt when list/show endpoints are called. Everything is fine and dandy up to this point when those actions occur through the API endpoints directly. It also works if I run the queue manually with "artisan queue:work".
BUT...
The regular queue is run as a daemon through supervisord. When the queue runs as a daemon, the cache prefix is never changed because (I'm guessing) it doesn't go through the middleware when it runs a given queue item. This also happens if I run the queue as a daemon manually (not through supervisord).
I've tried force-setting the value through \Config::set('cache.prefix', 'instance-X') as well as putenv('CACHE_PREFIX=instance-X'), but they have no effect on the actual prefix used by the cache store itself. The only way I was able to set it successfully was to set CACHE_PREFIX in the ".env" file, but that does not work with our architecture. We run API and worker instances in Docker containers, and workers are not specific to any given API instance, hence the inclusion of the instance id in the queue item data for later use.
So, I'm kind of stuck as to how I can set the cache prefix on a per-queue item basis. Is that even possible? Any help would be greatly appreciated!
Try the method Cache::setPrefix('instance-X').
It will force the cache prefix to change for the given request. It should work for you, since I had a similar use case where I needed it to manage my cache. It may or may not work: I've not tested this with queues, but since the cache prefix is shared by both the session and queue drivers in Laravel, it should work.
Just to be clear, the method does not affect the config values. If you use config('cache.prefix') to get the cache prefix immediately after running the method, the value will still be the one in your config file.
I have 5+ applications running on my system that use Sidekiq for background processing. How can I identify which Sidekiq process belongs to which application?
I can't give you a "call this Sidekiq method" sort of answer, but I can give you an approach. Using the Sidekiq server middleware, you can create a Redis key (e.g. "Process_") and assign it the name of the app, then it's just a simple matter of looking up the value of the key to determine which app created it. If you want to go the opposite direction, create a key based on the app name (e.g. "application_") as a set and add the process id as a member. There are examples of server middleware use in the Sidekiq Wiki, and you can dig through the Sidekiq code and refer to the Redis documentation to determine how to set keys in Redis.
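A very rough sketch of such a server middleware (the class name, APP_NAME variable, and Redis key layout are all hypothetical choices, not a Sidekiq API):
# Hedged sketch: record which app this Sidekiq process belongs to on every job.
class AppTaggingMiddleware
  def call(worker, job, queue)
    Sidekiq.redis do |conn|
      # Map this process id to its application, and add the pid to the app's set
      conn.set("process_#{::Process.pid}", ENV['APP_NAME'] || 'unknown-app')
      conn.sadd("application_#{ENV['APP_NAME']}", ::Process.pid)
    end
    yield
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add AppTaggingMiddleware
  end
end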
Hope this helps.
I have a Django application on Heroku, and one thing I sometimes need to do that takes a little bit of time is send emails.
This is a typical use case for workers. Heroku offers support for workers, but I have to leave them running all the time (or start and stop them manually), which is annoying.
I would like to use a one-off process to send every email. One possibility I first thought of was using IronWorker, since I thought I could simply add the job to IronWorker's queue and it would be executed with a max of 15 min delay, which is OK for me.
The problem is that with IronWorker, I need to put all the modules and their dependencies in a zip file in order to run the job. So in my email use case, as I use "EmailMultiAlternatives" from "django.core.mail.message", I would need to include the whole Django framework in my zip file to be able to use it.
According to this link, it's possible to add/remove workers from the app. Is it possible to start one-off processes from the app?
Does anyone have a better solution?
Thanks in advance