I'm new to heroku and wondering about their terminology.
I host a project that requires seeding to populate a database with tens of thousands of rows. To do this I employ a web dyno to extract information from APIs across the web.
While my dyno is running I get memory notifications saying that the dyno has exceeded its memory quota (the specific Heroku errors are R14 and R15).
I am not sure whether this merely means that my seeding process (the web dyno) is running too fast and will be throttled, or whether the database itself is too large and must be reduced.
R14 and R15 errors are only thrown on runtime dynos; for reference, Heroku Postgres databases do not run on dynos. If you're hitting R14/R15 errors it means the seed data you're pulling down is likely exhausting your dyno's memory quota. You'll need to either decrease the size of the data or batch it: pull a chunk, write it to Postgres, and release that memory before fetching the next chunk.
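A minimal sketch of that batching approach in Python, assuming a hypothetical fetch_page() helper that returns one page of API results (empty when exhausted) and a made-up items table; DATABASE_URL is the config var Heroku Postgres sets for you. The point is that only one batch is ever held in memory at a time:

import os
import psycopg2
from psycopg2.extras import execute_values

BATCH = 1000  # tune so one batch stays well under the dyno's memory quota

def seed():
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    cur = conn.cursor()
    page = 0
    while True:
        rows = fetch_page(page, limit=BATCH)  # hypothetical API helper
        if not rows:
            break
        execute_values(cur, "INSERT INTO items (name, value) VALUES %s", rows)
        conn.commit()   # flush the batch before fetching the next one
        rows = None     # drop the reference so it can be garbage collected
        page += 1
    cur.close()
    conn.close()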
Related
For anyone who has used Heroku (and perhaps anyone else who has deployed to a PaaS before and has experience):
I'm confused about what Heroku means by "dynos", how dynos handle memory, and how users scale them. I read that Heroku defines dynos as "app containers", which means that the memory/filesystem of dyno1 can't be accessed by dyno2. Makes sense in theory.
The containers used at Heroku are called “dynos.” Dynos are isolated, virtualized Linux containers that are designed to execute code based on a user-specified command. (https://www.heroku.com/dynos)
Also, if I understand correctly, users can define how many dynos, or "app containers", are instantiated through commands like heroku ps:scale web=1, etc.
I recently created a web app (a Flask/gunicorn app, if that even matters) in which I declare a variable that keeps track of how many users visited a certain route (I know, not the best approach, but that's irrelevant here). In local testing it appeared to work properly, even for multiple clients.
When I deployed to Heroku with only a single web dyno (heroku ps:scale web=1), I found this was not the case: the variable appeared to have multiple instances that updated differently. I understand that memory isn't shared between different dynos, but I have only one dyno running the server, so I thought there should be only a single instance of this variable/web app. Is the dyno running my server on single/multiple processes? If so, how can I limit it?
Note: this web app does save files on disk, and on each API request I check whether the file exists. Because it does, this tells me that my requests are hitting the same dyno.
Perhaps someone can enlighten me? I'm a beginner to deployment, but willing to learn/understand more!
Is the dyno running my server on single/multiple processes?
Yes, probably:
Gunicorn forks multiple system processes within each dyno to allow a Python app to support multiple concurrent requests without requiring them to be thread-safe. In Gunicorn terminology, these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos).
We recommend setting a configuration variable for this setting. Gunicorn automatically honors the WEB_CONCURRENCY environment variable, if set.
heroku config:set WEB_CONCURRENCY=3
The WEB_CONCURRENCY environment variable is automatically set by Heroku, based on the processes’ Dyno size. This feature is intended to be a sane starting point for your application. We recommend knowing the memory requirements of your processes and setting this configuration variable accordingly.
The solution isn't to limit your processes, but to fix your application. Global variables shouldn't be used to store data across processes. Instead, store data in a database or in-memory data store.
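For example, here is a minimal sketch of moving that per-route counter out of a Python global and into Redis (the /tracked route name is made up; REDIS_URL is the config var the Heroku Redis add-on sets), so every gunicorn worker on every dyno sees the same value:

import os
import redis
from flask import Flask

app = Flask(__name__)
store = redis.Redis.from_url(os.environ["REDIS_URL"])

@app.route("/tracked")  # hypothetical route
def tracked():
    # INCR is atomic, so concurrent workers and dynos can't lose updates
    count = store.incr("tracked:visits")
    return f"This route has been visited {count} times"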
Note: this web app does save files on disk, and on each API request I check whether the file exists. Because it does, this tells me that my requests are hitting the same dyno.
If you're just trying to check which dyno you're on, fine. But you probably don't want to be saving actual data to the dyno's filesystem because it is ephemeral. You'll lose all changes made to the filesystem whenever your dyno restarts. This happens frequently (at least once per day).
I'm running an RoR app on Heroku which rapidly consumes the available 512 MB. I'm using puma (4.3.5).
I've followed the tutorials here and run the derailed benchmarks on my local machine. The perf:mem_over_time and other benchmarks never raise any issues locally. What is astounding is that, no matter what, memory on my local machine does not increase, whereas when the app is deployed on Heroku it steadily increases.
Any ideas on how to debug the problem on Heroku? Running the derailed benchmarks is not possible on Heroku since it complains that it cannot connect to the Postgres server (User does not have CONNECT privilege).
OK, the problem seemed to be an obvious one: the number of Puma workers in production was set to 5. Each one takes around 80 MB just to start, so even a minor increase in memory triggered R14 not enough memory. I've reduced it to 2 workers and it's fine now.
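If your puma.rb follows Heroku's usual pattern of reading the worker count from the WEB_CONCURRENCY environment variable (an assumption; check your own config), the same reduction can be made as a config change rather than a code change:

heroku config:set WEB_CONCURRENCY=2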
I have a Heroku server with about $250.00 worth of monthly addons (due to upgrades Heroku Postgres and Heroku Redis). I'm no longer using the server for the foreseeable future, but would like to be able to boot the server back up at a later date with the same configuration.
Is there a way to temporarily halt all server functionality to prevent myself from getting billed, with the possibility of rebooting the server at a later date?
Well, you can step down the dynos to the hobby-dev tier if you have fewer than 2 process types, or you can simply shut them down. Just go to https://dashboard.heroku.com/, click on your app and then go to the 'Resources' tab to control the dynos.
Stepping down heroku-redis should be easy too; it's temporary storage anyway, which you can restart/scale up later. Also see this.
The only sticking point might be your Postgres DB. If it has more than 10,000 rows you'll have to pay at least $9 per month, and if you have more than 1 million rows you'll have to pay at least $50 per month. DBs often accumulate a lot of log data, so consider cleaning and compacting the data if that's possible. Or you can take a local database dump, decommission the DB, and upload the dump again when you decide to restart the app (this is a bit of an extreme step though, so be doubly sure that you have everything backed up).
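A rough sketch of the CLI side of that, assuming an app named my-app with web and worker process types (substitute your own names), and taking a backup before touching the database:

heroku ps:scale web=0 worker=0 --app my-app
heroku pg:backups:capture --app my-app
heroku pg:backups:download --app my-app   # saves latest.dump locally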
Trying to get Heroku to run some Puppeteer jobs. Locally, it works. It's slow, but it works. Monitoring the memory in OS X Activity Monitor, it doesn't get above 50 MB. But when I deploy this script to Heroku, I'm getting a Memory quota exceeded error every time, and the memory footprint is much larger.
Looking at the logs, I'm getting the message:
Process running mem=561M(106.5%)
Error R14 (Memory quota exceeded)
Restarting
State changed from up to starting
Either Activity Monitor is not reporting the memory correctly, or something is going wrong only when running the script on Heroku. I can't imagine why a page scrape of 25 pages would be 561M.
Also, since the Puppeteer scripts must be wrapped in try/catch, the memory error is crashing the dyno and restarting it. By the time the dyno restarts, the browser hangs up, so the restarting does little good. Is there a way to catch 'most' errors on Heroku but throw when there is a memory R14 error?
I had a similar issue. What I discovered is that if you are not closing the browser you will immediately get an R14 error. What I recommend:
Make sure you use a single browser instance and multiple contexts instead of multiple browsers.
Make sure you close the contexts after you call pdf.
If you are processing large pages you need to scale up your Heroku dyno; you don't have a choice. Unfortunately, you need to pay $50 for 1 GB of memory on Heroku...
Some ugly code, but it shows that the context is closed only after the pdf call has finished writing:
browser.createIncognitoBrowserContext().then((context) => {
    context.newPage().then((page) => {
        page.setContent(html).then(() => {
            page.pdf(options).then((pdf) => {
                let inputStream = bufferToStream(pdf);
                let outputStream = fs.createWriteStream(path);
                inputStream.pipe(outputStream).on("finish", () => {
                    context.close().then(() => {
                        resolve();
                    }).catch(reject);
                });
            }).catch(reject);
        }).catch(reject);
    }).catch(reject);
}).catch(reject);
I have a web app on Heroku which is constantly using around 300% of the allowed RAM (512 MB). My logs are full of Error R14 (Memory quota exceeded) [an entry every second]. Although in bad shape, my app still works.
Apart from degraded performance, are there any other consequences I should be aware of (like Heroku charging extra for anything related to this issue, scheduled jobs failing, etc.)?
To the best of my knowledge Heroku will not take action even if you continue to exceed the memory quota. However, I don't think the availability of the full 1 GB of overage (out of the 1.5 GB that you are consuming) is guaranteed, or is guaranteed to be physical memory at all times. Also, if you are running close to 1.5 GB, then you risk going over the hard 1.5 GB limit, at which point your dyno will be terminated.
I also get the following every time I run a specific task on my Heroku app and check heroku logs --tail:
Process running mem=626M(121.6%)
Error R14 (Memory quota exceeded)
My solution would be to check out Celery and Heroku's documentation on this.
Celery is an open source asynchronous task queue, or job queue, which makes it very easy to offload work out of the synchronous request lifecycle of a web app onto a pool of task workers to perform jobs asynchronously.
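As a minimal sketch of that approach in Python (the task name, module layout, and a Redis broker via REDIS_URL are assumptions, not from the original question), the memory-heavy work moves out of the web request and onto a worker dyno:

import os
from celery import Celery

# broker assumed to be the Heroku Redis add-on's REDIS_URL
celery_app = Celery("tasks", broker=os.environ["REDIS_URL"])

@celery_app.task
def run_heavy_task(arg):
    # the memory-hungry work happens here, on a worker dyno,
    # not inside the web dyno's request/response cycle
    return do_the_work(arg)  # hypothetical helper

The web app then enqueues the job with run_heavy_task.delay(arg) and returns immediately; a separate worker dyno (for example, a Procfile entry running celery -A tasks worker) picks it up, so the web dyno's memory stays within its quota.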