Heroku Error R14 (Memory quota exceeded) consequences? - heroku

I have a web app on Heroku that is constantly using around 300% of the allowed RAM (512 MB). My logs are full of Error R14 (Memory quota exceeded) [an entry every second]. Although in bad shape, the app still works.
Apart from degraded performance, are there any other consequences I should be aware of (e.g. Heroku charging extra for anything related to this issue, scheduled jobs failing, etc.)?

To the best of my knowledge, Heroku will not take action even if you continue to exceed the memory quota. However, I don't think the availability of the full 1 GB of overage (out of the roughly 1.5 GB you are consuming) is guaranteed, or is guaranteed to be physical memory at all times. Also, if you are running close to 1.5 GB, you risk going over the hard 1.5 GB limit, at which point your dyno will be terminated.
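For reference, a quick back-of-the-envelope check of those figures (using only the numbers from the question, nothing Heroku-specific) looks like this:

quota_mb = 512                      # dyno memory quota from the question
usage_mb = quota_mb * 300 / 100     # ~300% usage reported in the R14 logs -> 1536 MB, about 1.5 GB
overage_mb = usage_mb - quota_mb    # ~1024 MB, i.e. the "1 GB of overage" mentioned above
print(usage_mb, overage_mb)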

I also get the following every time I run a specific task on my Heroku app and check heroku logs --tail:
Process running mem=626M(121.6%)
Error R14 (Memory quota exceeded)
My solution would be to check out Celery and Heroku's documentation on this.
Celery is an open-source asynchronous task queue (or job queue) that makes it easy to offload work out of the synchronous request lifecycle of a web app onto a pool of workers that perform the jobs asynchronously.
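As a rough sketch of what that offloading looks like (the Redis broker URL and the scrape_report task name are placeholders, not from the original answer):

# tasks.py - minimal Celery sketch
from celery import Celery

# On Heroku the broker URL would normally come from a config var such as REDIS_URL.
app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def scrape_report(url):
    # The memory-hungry work runs here, in a separate worker dyno,
    # instead of inside the web dyno's request cycle.
    ...

The web process then only enqueues the job (e.g. scrape_report.delay(url)) and returns immediately, so the heavy memory use moves off the web dyno.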

Related

Puppeteer on Heroku, memory errors

Trying to get Heroku to run some Puppeteer jobs. Locally, it works. It's slow but it works. Monitoring the memory in OS X Activity Monitor, it doesn't get above 50MB. But when I deploy this script to Heroku, I'm getting a Memory quota exceeded every time, and the memory footprint is much larger.
Looking at the logs, I'm getting the message:
Process running mem=561M(106.5%)
Error R14 (Memory quota exceeded)
Restarting
State changed from up to starting
Either Activity Monitor is not reporting the memory correctly, or something is going wrong only when running the script on Heroku. I can't imagine why a page scrape of 25 pages would be 561M.
Also, even though the Puppeteer scripts are contained in try/catch, the memory error is crashing the dyno and restarting it. By the time the dyno restarts, the browser has hung up, so the restart does little good. Is there a way to catch 'most' errors on Heroku but throw when there is a memory R14 error?
I had a similar issue. What I discovered is that if you are not closing the browser you will immediately get an R14 error. What I recommend:
Make sure you use a single browser instance and multiple contexts instead of multiple browsers.
Make sure you close the contexts after you call pdf.
If you are processing large pages you need to scale your Heroku instance; you don't have a choice. Unfortunately, you need to pay $50 for 1 GB of memory on Heroku...
Some rough code, but it shows that the context is closed after calling the pdf function (the snippet assumes a Puppeteer browser plus html, options, and path from the surrounding scope):
const fs = require("fs");
const { Readable } = require("stream");

// Helper referenced below; assumed to turn the PDF Buffer into a readable stream.
const bufferToStream = (buffer) => Readable.from(buffer);

new Promise((resolve, reject) => {
    browser.createIncognitoBrowserContext().then((context) => {
        context.newPage().then((page) => {
            page.setContent(html).then(() => {
                page.pdf(options).then((pdf) => {
                    let inputStream = bufferToStream(pdf);
                    let outputStream = fs.createWriteStream(path);
                    inputStream.pipe(outputStream).on("finish", () => {
                        // Close the incognito context once the PDF is written,
                        // so its pages stop holding memory on the dyno.
                        context.close().then(() => {
                            resolve();
                        }).catch(reject);
                    });
                }).catch(reject);
            }).catch(reject);
        }).catch(reject);
    }).catch(reject);
});

Heroku Dyno Memory versus Database Memory

I'm new to Heroku and wondering about their terminology.
I host a project that requires seeding to populate a database with tens of thousands of rows. To do this I employ a web dyno to extract information from APIs across the web.
As my dyno is running I get memory notifications saying that the dyno has exceeded its memory quota (the specific Heroku errors are R14 and R15).
I am not sure whether this merely means that my seeding process (web dyno) is running too fast and will be throttled, or whether my database itself is too large and must be reduced.
R14 and R15 errors are only thrown by runtime dynos. For reference, Heroku Postgres databases do not run on dynos. If you're hitting R14/R15 errors it means that the seed data you're pulling down is likely exhausting your memory quota. You'll need to either decrease the size of the data, or batch the data, write each batch to Postgres, and then clean it up before proceeding.
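A rough sketch of that batching pattern (the seeds table, its columns, and the fetch_rows generator are placeholders, not part of the original answer):

import os
import psycopg2

BATCH_SIZE = 1000  # keep only a small slice of the seed data in memory at a time

conn = psycopg2.connect(os.environ["DATABASE_URL"])  # Heroku-provided Postgres URL
with conn, conn.cursor() as cur:
    batch = []
    for row in fetch_rows():  # hypothetical generator that streams rows from the source APIs
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            cur.executemany("INSERT INTO seeds (name, value) VALUES (%s, %s)", batch)
            conn.commit()
            batch.clear()  # drop the written rows before pulling more data
    if batch:  # flush whatever is left over
        cur.executemany("INSERT INTO seeds (name, value) VALUES (%s, %s)", batch)
        conn.commit()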

Azure Website Kudu HTMLLog Analysis shows Always On with high response time

We deployed our WebAPI as an Azure website under the standard plan and have turned on Always On. After getting multiple memory and CPU alerts, we decided to check the logs via xyz.scm.azurewebsites.net. It seems Always On has a high response time. Could this be causing the high memory and CPU issues? Sometimes the alerts come when no one is even using the system and auto-resolve within 5 minutes.
The Always On feature only invokes the root of your web app every 5 minutes.
If this is causing high memory or CPU usage, it could be a memory leak within your application, because without Always On your process gets recycled when it idles.
You should check what your app does when it is invoked on the root path and determine why this is causing a high response time.
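As a quick, rough way to see what the root path actually costs (the URL is a placeholder and this assumes the requests package is installed):

import time
import requests

URL = "https://example.azurewebsites.net/"  # placeholder for your site's root path

# Hit the root a few times, the same way Always On does, and time each response.
for _ in range(5):
    start = time.monotonic()
    response = requests.get(URL, timeout=30)
    elapsed = time.monotonic() - start
    print(f"{response.status_code} in {elapsed:.2f}s")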

Laravel Queue Daemon - Memory leak

I have a Laravel app and have several daemon queue workers running on a server consuming messages from SQS. I know memory is a concern here but I'm having trouble figuring out how to free up memory during / between each job. If I leave the workers running long enough, the memory usage continues to build up.
Any suggestions on how to handle this?

Azure in role cache exceptions when service scales

I am using Windows Azure SDK 2.2 and have created an Azure cloud service that uses an in-role cache.
I have 2 instances of the service running under normal conditions.
When the service scales (up to 3 instances, or back down to 2 instances), I get lots of DataCacheExceptions. These are often accompanied by Azure DB connection failures from the processing that goes on behind the cache. (If I don't find the entry I want in the cache, I get it from the DB and put it into the cache. All standard stuff.)
I have implemented retry processes on the cache gets and puts, and use the ReliableSqlConnection object with a retry process for db connection using the Transient Fault Handling application block.
The retry process uses a fixed interval retrying every second for 5 tries.
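Purely as an illustration of that policy (a generic fixed-interval retry sketch, not the Transient Fault Handling block's actual API):

import time

MAX_TRIES = 5
INTERVAL_SECONDS = 1.0

def with_fixed_interval_retry(operation):
    # Run a cache get/put (or DB call) up to 5 times, waiting 1 second between attempts.
    last_error = None
    for _ in range(MAX_TRIES):
        try:
            return operation()
        except Exception as error:  # in the real code this would be the cache/SQL transient exception
            last_error = error
            time.sleep(INTERVAL_SECONDS)
    raise last_error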
The failures are typically:
Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode:SubStatus:There is a temporary failure. Please retry later
Any idea why the scaling might cause these exceptions?
Should I try a less aggressive retry policy?
Any help appreciated.
I have also noticed that I am getting a high cache miss rate (> 70%), and when the system is struggling there is high CPU utilisation (> 80%).
Well, I haven't been able to find out any reason for the errors I am seeing, but I have 'fixed' the problem, sort of!
When looking at the processing stats for the last few days, it is clear the high CPU usage corresponds with the cloud service having 'problems'. I have changed the service to use two medium instances instead of two small instances.
This seems to have solved the problem, and the service has been running quite happily, low cpu usage, low memory usage, no exceptions.
So, whilst I still haven't discovered the source of the problems, I seem to have overcome them by providing a bigger environment for the service to run in.
--Late news!!! I noticed this morning that from about 06:30 the CPU usage started to climb, along with the time the service takes to do its processing. Errors started appearing and I had to restart the service at 10:30 to get things back to 'normal'. Also, when restarting the service, the OnRoleRun process threw loads of DataCacheExceptions before it started running again, 45 minutes later.
Now all seems well again, and I will monitor for the next hours/days...
There seems to be no explanation for this; remote desktop to the instances shows no exceptions in the event log, and other logging is not showing application problems, so I am still stumped.
