"Too Many Open Files" error on Cloudfoundry - spring

I receive the dreaded "Too Many Open Files" error after about 5 minutes of running my application, and it is a showstopper for me. I know there is a 256 open file limit. I ran lsof to track down whether I have a leak and found that many of the open handles are simply connections that Tomcat and other processes must make. The "nginx" process seems to be the only one that fluctuates, but even it only reaches a maximum of around 81, so my application does not appear to leak file descriptors.

I absolutely love Cloud Foundry; it is the first PaaS that hasn't required me to refactor my application to make things work. When is the file limit going to be raised? I use Micro Cloud Foundry for testing but want to run on the hosted Cloud Foundry as soon as possible, and I get this error in both. Is there a way around this? I tried modifying the limit on the Micro Cloud instance but I get errors saying I do not have the rights to make that kind of change. Any help or suggestions on this?
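For anyone who wants to cross-check lsof from inside the JVM itself, the following is a minimal sketch (not part of the original question) that periodically logs the process's own file-descriptor usage so a slow leak shows up in the application logs. It assumes a HotSpot/OpenJDK JVM on Linux, where the operating-system MXBean is a com.sun.management.UnixOperatingSystemMXBean.

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Sketch: periodically log how many file descriptors the JVM has open,
// to cross-check against what lsof reports from outside the process.
public class FdWatcher {

    static void logFdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unixOs =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            System.out.printf("open fds: %d / max %d%n",
                    unixOs.getOpenFileDescriptorCount(),
                    unixOs.getMaxFileDescriptorCount());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        while (true) {              // print every 30 seconds
            logFdUsage();
            Thread.sleep(30_000);
        }
    }
}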

The new file descriptor quota is in the following database migration:
https://github.com/cloudfoundry/cloud_controller_ng/blob/master/db/migrations/20130131184954_new_initial_schema.rb
Line 185.
This particular setting will not take effect on http://cloudfoundry.com until the April time frame, when we emerge from beta and our "Next Generation" components are in production.
If you run your own version of Cloud Foundry, you could run this migration, assuming you are using cloud_controller_ng.
The Micro Cloud Foundry we are using internally for development does have the new cloud controller in it. You can read how we get that running for our own purposes here:
We are in a bit of a transition period as we deprecate the legacy bits and move towards these NG components. Apologies for the hitches, but they will be worth the cost. Thanks for your patience.
Best,
Matt Reider
Product Manager
Cloud Foundry

I logged in as the root user. I modified both the /var/vcap/packages/dea/dea/lib/dea/agent.rb file and the /etc/security/limits.conf file.
For the agent.rb file I followed the instructions here:
http://mdahlman.wordpress.com/2012/04/20/micro_cloud_foundry/
For the limits.conf file I followed the instructions here:
http://myadventuresincoding.wordpress.com/2010/10/09/ubuntu-increasing-the-maximum-number-of-open-files/
I am not sure which fix helped or if it required modifying both files. The application seems to be working now so I am moving on. If someone has a better solution I would be happy to hear it.
This only allows me to run my application on a self-hosted Micro Cloud Foundry. It would be much better if someone had a solution for the hosted version. Unfortunately, I would have to find some way to keep my app under 256 file descriptors, and that is not likely to happen.
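For reference, the limits.conf side of the workaround boils down to a couple of lines in the format "domain type item value"; the wildcard domain and the value below are illustrative, not copied from the linked instructions:

* soft nofile 4096
* hard nofile 4096

New limits only apply to sessions started after the change, so the affected processes (or the Micro Cloud VM) typically need to be restarted before they take effect.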

Related

Why is a simple Spring app on Cloud Run taking so much RAM

A simple (just a sample) Spring Boot application running in Cloud Run, with no files being written by the application, gets terminated with the following error.
Memory limit of 256M exceeded with 257M used. Consider increasing the
memory limit, see
https://cloud.google.com/run/docs/configuring/memory-limits
When I look at the "Deleting temporary files" documentation, it says that the disk storage is in-memory, so if the code is not writing any files, how are these files being written? And how can I find these files using
gsutil ls -h gs://projectName
An interesting question.
Are you confident that your app isn't writing any files? I bet it is.
There's a wrinkle to Google's statement. Anything written to /var/log is not part of the in-memory filesystem quota and is shipped to Cloud Logging.
I think Google should consider providing metrics that help differentiate between the container's process(es)' use of memory and the in-memory filesystem's use. Currently there is no way to disambiguate this usage (Cloud Monitoring metrics for Cloud Run). Perhaps raise a feature request on Google's public issue tracker?
To answer your question, you may want to consider running the container locally. Then you can grab the container process's id (PID) and try e.g. ls -la /proc/${PID}/fd to list the files that the container is producing.
I considered suggesting Google Cloud Profiler but it requires an agent for Java and so it would be cumbersome to deploy to Cloud Run and would not obviously yield an answer to your question.
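Along the same lines, purely as an illustrative sketch and not from the original answer: if you can run the container locally, a small Java helper inside the image can walk a writable directory such as /tmp and total up what has been written there, which is another way to see whether the app is quietly filling the in-memory filesystem.

import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: list every file under a writable directory (default /tmp) with its
// size, and print the total, to spot unexpected writes to the in-memory disk.
public class TmpUsage {
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp");
        AtomicLong total = new AtomicLong();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                total.addAndGet(attrs.size());
                System.out.printf("%10d  %s%n", attrs.size(), file);
                return FileVisitResult.CONTINUE;
            }
        });
        System.out.printf("total bytes under %s: %d%n", root, total.get());
    }
}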
It's a problem with Java, and it's worse with Spring. Java with a standard JVM uses a lot of memory by default (at least 128MB). When you run Spring on top of Java, tons of beans and libraries are loaded into memory, and it easily takes more than 350MB for a simple hello-world app.
I wrote an article on that. You have two ways to mitigate the cold start and the memory (and container) size:
Use raw Java without a heavy framework.
Use native compilation (GraalVM, for instance).
I tried optimizing the JVM itself (a micro JVM) and Spring directly (limiting the beans loaded, using lazy loading, adding JVM parameters). It saved a few seconds at startup (cold start) but not much memory.
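For what it's worth, the lazy-loading tweak mentioned above can be enabled in one line in Spring Boot 2.2+; this is a generic sketch, not taken from the article:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Sketch: enable global lazy initialization so beans are created on first
// use. In practice this mostly shortens cold start rather than shrinking
// steady-state memory, which matches the results described above.
@SpringBootApplication
public class DemoApplication {
    public static void main(String[] args) {
        SpringApplication app = new SpringApplication(DemoApplication.class);
        app.setLazyInitialization(true);
        app.run(args);
    }
}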
I also started to investigate AppCDS with a former Google Cloud Developer Advocate, but he left the company a year ago and I stopped that effort (I'm no longer a Java/Spring developer, but I always liked the concept and enjoyed working on it).
Finally, you can also have a look at newer-generation frameworks, like Micronaut, Quarkus, or Vert.x.

Spring Boot application restarts automatically when Cloud Foundry updates/upgrades

I am using Cloud Foundry and I deployed my Spring Boot application there. Whenever an update/upgrade happens on Cloud Foundry, my application gets restarted, and some requests fail to reach the application because the restart takes time.
Is there any way in CF to keep some instances of the application running to process requests while the application is upgraded/restarted?
I also want to know whether CF provides services from different locations/regions. Say my application is deployed in two CF containers in different regions: whenever an update/upgrade is available, the upgrade would proceed in one region while the CF service in the other region stays available, with some application instances still running to serve requests, and vice versa.
-Thank you.
What you're describing is the intended behavior of CF.
If you have two or more instances of your application, they should never all go down at the same time, i.e. one will be taken down, and only after it has restarted successfully will the other be taken down and restarted.
If your operator has configured multiple availability zones for the foundation that you've targeted, then application instances will be distributed across those AZs to help facilitate HA and best possible availability.
If you're not seeing this behavior then you should take a look at the following as these items can affect uptime of your apps:
Do you have more than one application instance? If you only have one, then you can expect to see small windows of downtime when updates are applied to the foundation, and in other scenarios as well. This happens because at times Diego will need to evict applications running on a Diego Cell. It makes an attempt to start your app on another Cell before stopping the current instance, but no guarantees are provided around this. Thus you can end up with some downtime if, for example, your app is slow to start or does not have a good health check configured (e.g. it passes the health check before the app is really up; a sketch of such a health check follows below).
Did your operator set up multiple AZs? As a developer, you cannot really tell; this is abstracted away, so you would need to ask your platform operations team whether there is more than one AZ and, if so, how many. For the best possible uptime, have at least as many app instances as you have AZs.
The other thing often overlooked: does your application depend on any services? If so, you may also see downtime when those services are being updated. That depends on the services you are using and whether their management and upgrades involve downtime. You may be able to tell if this is the case by looking more closely at your application logs when it fails, checking for connection failures or similar errors. You might also be able to tell by looking at the plan defined in the CF Marketplace; often the description will say whether the plan is or isn't clustered or HA.
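As an example of the health-check point above, here is a minimal sketch (Spring Boot Actuator assumed; the class and method names are illustrative) of a health indicator that only reports UP once the application has genuinely finished warming up, so an instance is not considered healthy before it can really serve traffic:

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Sketch: report DOWN until start-up/warm-up work has completed, so the
// platform's health check does not pass before the app is really up.
@Component
public class WarmupHealthIndicator implements HealthIndicator {

    private volatile boolean warmedUp = false;

    // Call this from wherever your warm-up logic finishes, e.g. an
    // ApplicationRunner or an ApplicationReadyEvent listener.
    public void markWarmedUp() {
        this.warmedUp = true;
    }

    @Override
    public Health health() {
        return warmedUp
                ? Health.up().build()
                : Health.down().withDetail("reason", "still warming up").build();
    }
}

With something like this in place, the app's CF health check can be switched to the http type and pointed at the actuator health endpoint.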
UPDATE
One other thing which can cause downtime:
If your operator has set the "max in flight" value too high for the number of Diego Cells, this can also cause downtime. Essentially, "max in flight" dictates how many Diego Cells will be taken out of service during an upgrade. If this value is too high, you can run into a situation where there is not enough capacity on the remaining Cells to host all of the applications, which results in downtime for app instances that cannot be rescheduled on another Cell in a timely manner. As a developer, I don't think this is something you can troubleshoot yourself; you would need to work with your platform operators to investigate further.
That is probably a theme here. If you are an app developer, you should be talking to your platform operations team to debug this.
Hope that helps!

Is Heroku a replacement for a VPS?

We're currently evaluating Heroku to replace our initial workflow of renting a VPS for a small web app (since we're working in Node.js, cPanel hosting plans aren't enough; ergo, a VPS).
The confusion lies in Heroku's actual usage: even though it's clear it is a platform as a service, there is no disk (HDD/SSD) limit described.
The web app's requirements include file upload capabilities (profile pictures, etc.), so I'm not sure Heroku is what we need. Can I get a clear explanation of this?
Not a Heroku expert, but...
You could always use one of the various add-ons that offer database support for storing your images, until that no longer works.
As the usage of your site scales out, you'd probably want to place static content into a CDN.
I wouldn't consider placing files into Heroku that weren't related to running code and honestly I don't even know if you can.
(I originally just wanted to comment, but need a higher rep :/)
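To make the usual pattern concrete (this is an addition, not part of the answer above): files written to a Heroku dyno's filesystem do not survive dyno restarts, so profile pictures and similar uploads are normally pushed straight to external object storage. A sketch with the AWS SDK for Java v2, where the bucket name and key prefix are made-up placeholders and credentials come from the environment (e.g. Heroku config vars):

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

// Sketch: store an uploaded profile picture in S3 instead of on local disk.
// "my-app-uploads" and the key prefix are hypothetical placeholders.
public class ProfilePictureStore {

    private final S3Client s3 = S3Client.create();

    public void save(String userId, byte[] imageBytes) {
        PutObjectRequest request = PutObjectRequest.builder()
                .bucket("my-app-uploads")
                .key("profile-pictures/" + userId + ".png")
                .build();
        s3.putObject(request, RequestBody.fromBytes(imageBytes));
    }
}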

Communication between Heroku apps

I've built a distributed system consisting of several web services and some web applications consuming them.
They are all hosted on Heroku.
Is there some way for requests between these applications to be made "inside Heroku" without going through the public web?
Something analogous to using localhost.
You may be in luck: such a feature has currently reached the experimental phase.
Let me take a moment to underscore that: this feature may disappear or change at any time. It's not supported, but bug reports are appreciated. Don't build a bank with it. Don't get yourself in a position to be incredibly sad if severe problems are found that render it unshippable and it's aborted.
However, it is still cool, and here it is: containerized-network
You can use, for example, the pub/sub interface of any of the hosted Redis solutions, or any of the message brokers (IronMQ, RabbitMQ), to pass messages.
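To make the Redis suggestion concrete, here is a minimal sketch (an addition, not from the answer above) using the Jedis client, assuming a hosted Redis add-on that exposes its connection string as the REDIS_URL environment variable:

import java.net.URI;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

// Sketch: two Heroku apps exchanging messages over Redis pub/sub instead of
// calling each other over public HTTP.
public class Messaging {

    // App A: publish an event.
    public static void publish(String channel, String message) {
        try (Jedis jedis = new Jedis(URI.create(System.getenv("REDIS_URL")))) {
            jedis.publish(channel, message);
        }
    }

    // App B: block and handle incoming events.
    public static void subscribe(String channel) {
        try (Jedis jedis = new Jedis(URI.create(System.getenv("REDIS_URL")))) {
            jedis.subscribe(new JedisPubSub() {
                @Override
                public void onMessage(String ch, String msg) {
                    System.out.println("received on " + ch + ": " + msg);
                }
            }, channel);
        }
    }
}

Note that messages still travel over the network between dynos; the gain is decoupling, not a localhost-style private link.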

PG::Error: ERROR: out of memory on Heroku

I deployed an application on Heroku. I'm using the free service.
Quite frequently, I get the following error.
PG::Error: ERROR: out of memory
If I refresh the browser, it's ok. But then, it happens again randomly.
Why does this happen?
Thanks.
Sam Kong
If you experience these when running queries, your queries are complicated or inefficient. The free tier has no cache, so you're already working with very little headroom.
If you're getting these errors otherwise, open a support ticket at https://help.heroku.com
Simply running heroku restart helped me, though.
If you are not on the free tier, it may be because you are using too much memory for connections to PG.
Consider an app running on several dynos, with several processes, each with lots of threads: maybe you are filling up the connection pool.
Also, as noted in Heroku's Help Center, maybe you are caching too many statements that won't be used.
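As an addition to the answer above: whatever client library you use, the usual fix when the pool is the problem is to cap the number of connections each process opens. In Java terms, a sketch with HikariCP, where the JDBC URL, environment variable names, and the pool size of 5 are all illustrative; multiply the pool size by dynos x processes x threads to estimate how many connections you actually hold:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Sketch: cap the number of PostgreSQL connections this process can open so
// the total across all dynos stays within the database plan's limits.
public class Db {
    public static HikariDataSource createDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://host:5432/mydb"); // placeholder URL
        config.setUsername(System.getenv("DB_USER"));          // placeholder env vars
        config.setPassword(System.getenv("DB_PASSWORD"));
        config.setMaximumPoolSize(5); // illustrative; size to your plan's connection limit
        return new HikariDataSource(config);
    }
}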
