Why Spring Boot continues to consume a lot of memory even after the load test is done - spring-boot

My website is a Spring Boot application deployed on a Linux server. I enabled JMX so that I can monitor the website with JVisualVM. I run it as follows:
nohup java -Djava.rmi.server.hostname=10.25.161.45 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=22222 -jar sms-web.jar &
I start the load test and I can see that memory consumption escalates very fast, as the red rectangle shows here. This is understandable, because a lot of TCP connections are established and disposed of during the load test. But after the load test is done, it still consumes a lot of memory, sometimes up to 800MB, as you can see in the green rectangle. What happens under the covers? Why does it consume so much memory?
Edit: Is there any way to make the JVM do a thorough GC and release a lot of memory?

This is quite normal behaviour for any Java application. As long as your application is running, objects will be created, even when it's not actively being used (thread pools being checked/refreshed, ...).
That means it's only normal that heap memory usage goes up over time. However, when necessary, the garbage collector will run and destroy any objects that are no longer in use. That's why you can see a clear drop in memory usage at certain times.
Now, the garbage collector will only run when necessary. This often happens when the memory in use (blue line) gets close to the memory that is allowed to be used (orange line).
This explains two things:
It explains the zigzag lines during your load test, which mean that many objects were being created and then destroyed by the garbage collector.
It explains why your application can use more than 800MB of memory before freeing anything, because there was still more memory it was allowed to consume.
Now, if you think it's problematic that your application is allowed to consume about 1GB (or more) of memory, you'll have to play around with your JVM settings, and perhaps reduce the -Xmx parameter to a more reasonable amount. Make sure to run some load tests afterwards to see how your application behaves with a lower maximum, as it could have an impact on performance, since the garbage collector will have to run more often.
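Regarding the edit: you can request a full collection with System.gc(), but the JVM treats it strictly as a hint and is free to ignore it, and even after a collection the JVM may not return freed heap to the OS right away. A minimal sketch of the hint (the 64 MB buffer size is just an illustrative figure):

```java
// Demonstrates that System.gc() is only a hint: allocate a large buffer,
// drop the reference, request a collection, and print heap usage.
// The JVM may or may not actually reclaim the memory at this point.
public class GcHintDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        byte[] big = new byte[64 * 1024 * 1024]; // ~64 MB kept alive by 'big'
        System.out.println("used with buffer:   " + usedMb(rt) + " MB");
        big = null;                              // drop the only reference
        System.gc();                             // a hint, not a command
        System.out.println("used after gc hint: " + usedMb(rt) + " MB");
    }

    private static long usedMb(Runtime rt) {
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }
}
```

From outside the process, `jcmd <pid> GC.run` issues the same kind of request, which is handier for a deployed jar like yours.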

Related

Java buildpack memory calculation

The Java buildpack memory calculator, with a Spring Boot application inside a Docker container with 1GB of memory, calculates memory as the documentation says: it takes the entire available memory, and these are the calculated JVM options:
Calculated JVM Memory Configuration: -XX:MaxDirectMemorySize=10M -Xmx747490K -XX:MaxMetaspaceSize=157725K -Xss1M (Total Memory: 1G, Thread Count: 50, Loaded Class Count: 25433, Headroom: 0%)
The question is: why does it take the entire available memory and give it to the JVM? It should leave some memory for the Java process outside of the JVM. This can lead to OOM, because the JVM thinks it has 1GB for itself (747490K for the heap), while in reality it has less, because some of its memory is used as native memory, outside of the JVM.
Should I avoid this calculator and set the JVM configuration myself, or can I reconfigure it somehow?
The question is: why does it take the entire available memory and give it to the JVM?
The assumption is that the only thing running in your container is your Java application, so the calculator assigns it all of the available memory.
If you do things like shell out to run other processes in the container, you need to tell the memory calculator so it can take that into account.
This can lead to OOM, because the JVM thinks it has 1GB for itself (747490K for the heap), while in reality it has less, because some of its memory is used as native memory, outside of the JVM.
The memory calculator takes into consideration the major memory regions within a Java process, not just the heap. That said, it cannot 100% guarantee that you will never go over your memory limit. That's impossible with a Java app.
There are things you can do as an application developer, like creating 10,000 threads or using JNI, that cannot be restricted and could potentially consume a whole ton of memory. If you do that, your app will exceed its container memory limit and crash.
The memory calculator attempts to give you a reasonable memory configuration for most common Java workloads. Running a web app, running a microservice, running some batch jobs, etc...
If you are doing something that doesn't fit within that pattern, then you can simply tell the memory calculator and it'll adjust things accordingly.
Should I avoid this calculator and set the JVM configuration myself, or can I reconfigure it somehow?
Even if you need to customize what the calculator is doing, it can be helpful. It's additional toil to calculate these values manually, especially when it's so easy to change the memory limits. If your ops team increases the memory limit of the container, you want your application to adjust to that configuration automatically (as well as it can).
Beyond that, the memory calculator is also good at detecting problems early. If you configure the JVM manually and you mess it up, let's say you over-allocate memory, the JVM won't necessarily care until it tries to get more memory and can't. At some point down the road you're going to have a problem, but it's not clear when (probably at 3am on a Saturday, lol).
With the memory calculator, the math is done when your container first starts, to make sure the memory settings are sane. If there's something off with the configuration, it will fail fast and let you know.
TIPS:
You can override a memory calculator-defined value by simply setting that JVM option in the JAVA_TOOL_OPTIONS env variable. For example, if I want to allow for more direct memory, I would set JAVA_TOOL_OPTIONS='-XX:MaxDirectMemorySize=50M'. Then when you restart the container, the memory calculator will shift memory around to accommodate that.
The one thing you don't want to set is -Xmx. The memory calculator should always set this, because it sets it to whatever is left after the other regions have been accounted for. You can think of it as HEAP = CONTAINER_MEMORY_LIMIT - (all static memory regions).
If you were to set -Xmx yourself, you'd have to get it exactly right. If it's too low, you're wasting memory. If it's too high, you could exceed the container memory limit and get crashes.
In short, if you think you want to set -Xmx, you should instead either increase the container memory limit or decrease one of the static memory regions.
If you run other things in the container, you need to set the headroom. This is done with the BPL_JVM_HEAD_ROOM env variable. Give it a percentage of the total container memory limit. Ex: BPL_JVM_HEAD_ROOM=20 would use 80% of the container's memory limit for Java and 20% for other stuff.
Setting some headroom can be useful in other cases as well, like when you're troubleshooting a container crash and want a little extra room, or when you don't like operating at 100% of the memory limit. You can leave 5 or 10% unused to match your comfort level.
If you have an application that uses a lot of threads, you'll need to adjust the thread count as well. The default is 250 threads, which works well for many web/servlet-based applications (thread-per-request model). It is automatically lowered to 50 threads if you're specifically using Spring WebFlux, which does not need as many threads.
For other cases, it's up to you to configure this. For example, if you have a batch application that only needs a thread pool of 10, you could still set this to 40 or 50. That seems odd at first, but the JVM creates a number of its own threads, and you need to account for those in addition to application-specific threads. When in doubt, look at a thread dump.
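The budgeting described in these tips boils down to simple arithmetic. The sketch below is only an illustration: the region sizes (metaspace, code cache, etc.) are made-up round numbers, not the buildpack's actual derivations (the real calculator derives metaspace from the loaded class count, for example):

```java
// Rough sketch of a buildpack-style heap budget:
// HEAP = CONTAINER_MEMORY_LIMIT - headroom - (all static memory regions).
// All region sizes below are illustrative assumptions, not calculator output.
public class MemoryBudget {
    public static void main(String[] args) {
        long containerLimitMb = 1024; // container memory limit (1G)
        long headroomPct = 0;         // BPL_JVM_HEAD_ROOM, percent of the limit
        long threadCount = 50;        // estimated thread count
        long stackMbPerThread = 1;    // -Xss1M
        long metaspaceMb = 154;       // assumed; really derived from class count
        long directMemoryMb = 10;     // -XX:MaxDirectMemorySize=10M
        long codeCacheMb = 240;       // assumed reserved code cache

        long headroomMb = containerLimitMb * headroomPct / 100;
        long staticRegionsMb = threadCount * stackMbPerThread
                + metaspaceMb + directMemoryMb + codeCacheMb;
        long heapMb = containerLimitMb - headroomMb - staticRegionsMb;
        System.out.println("-Xmx" + heapMb + "M"); // prints -Xmx570M
    }
}
```

Setting headroomPct to 20 in this sketch would carve about 204MB off the top before the heap is computed, mirroring what BPL_JVM_HEAD_ROOM=20 does.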

neo4j getting slow & stuck on amazon ec2

I have neo4j running on an EC2 instance (large, Ubuntu), and I'm running some scripts on it that do lots of writes.
I noticed that after those scripts have been running for a while (after they have written a couple of thousand nodes), the server starts to run very slowly, sometimes to the point where it gets absolutely stuck. Another weird part: restarting the instance from this situation usually ends with the server taking much longer than usual to initialize.
At first I suspected that neo4j was using up all the RAM and this was a paging problem, but I've read that neo4j dynamically calculates the heap size and stack size limits. I also checked memory usage with top, and it looked like most of the RAM was unused, except for a Java process occasionally popping up, taking a few GBs and then disappearing quickly, which I assumed was neo4j.
Anyway, here are my questions: do I need to configure the neo4j server and/or wrapper, or should I let neo4j calculate the settings dynamically on its own? And has anyone encountered something like what I described, with any idea what could cause it?
thanks!
It's been my experience that you definitely need to tweak the memory settings to your needs. The neo4j manual has a whole section on it:
http://neo4j.com/docs/stable/configuration.html
I've not really heard of neo4j automatically adjusting to your server's memory capabilities, though just last night I did run across what seemed like a new configuration variable in conf/neo4j.properties:
# The amount of memory to use for mapping the store files, either in bytes or
# as a percentage of available memory. This will be clipped at the amount of
# free memory observed when the database starts, and automatically be rounded
# down to the nearest whole page. For example, if "500MB" is configured, but
# only 450MB of memory is free when the database starts, then the database will
# map at most 450MB. If "50%" is configured, and the system has a capacity of
# 4GB, then at most 2GB of memory will be mapped, unless the database observes
# that less than 2GB of memory is free when it starts.
#mapped_memory_total_size=50%

GC in .NET 4.0 not affecting Working Set in task manager

OK, just to be clear, I understand that the Task Manager is never a good way to monitor memory consumption of a program. Now that I've cleared the air...
I've used SciTech's .Net memory profiler for a while now, but almost exclusively on the .Net 3.5 version of our app. Typically, I'd run an operation, collect a baseline snapshot, run the operation again and collect a comparison snapshot and attack leaks from there. Generally, the task manager would mimic the rise and fall of memory (within a range of accuracy and within a certain period of time).
In our .NET 4.0 app, our testing department reported a memory leak when performing a certain set of operations (which I'll call operation A). Within the profiler, I'd see a sizable change in live bytes (usually denoting a leak). If I immediately collected another snapshot, the memory was reclaimed (regardless of how long I waited to collect the initial comparison snapshot). I thought the finalizer might be getting stuck, so I manually injected the following calls:
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
When I do that, I don't see the initial leak in the profiler, but my working set in Task Manager is still ridiculously high (operation A involves loading images). I thought the issue might be a stuck finalizer (and that SciTech was able to do some voodoo magic when the profiler collects its snapshots), but for all the hours I spent in WinDbg and MDbg, I could never find anything suggesting the finalizer was stuck. If I just let my app sit for hours, the working set never decreased. However, if I proceeded to perform other operations, the memory would drop substantially at random points.
MY QUESTION
I know the GC changed substantially with CLR 4.0, but did it affect how the OS sees allocated memory? My computer has 12GB of RAM, so when I run my app and ramp up memory usage, I still have tons free, so my hypothesis is that the runtime just doesn't bother to reduce what it has already allocated (as portrayed in Task Manager), even if the memory has been "collected". When I run this same operation on a machine with 1GB of RAM, I never get an out-of-memory exception, suggesting that I'm really not leaking memory (which is what the profiler also suggests).
The only reason I care what Task Manager shows is that it's how our customers monitor our memory usage. If there is a change in the GC that would affect this, I just want to be able to show them documentation that says it's Microsoft's fault, not ours :)
In my due diligence, I've searched a bunch of other SO threads for an answer, but all I've found are articles about the concurrency of the generational cleanup and other unrelated, yet useful, facts.
You cannot expect to see changes in managed heap usage immediately reflected in process memory. The CLR essentially acts as a memory manager on behalf of your application, allocating and freeing segments as needed. Because allocating and freeing segments is an expensive operation, the CLR tries to optimize it: it will typically not free an empty segment immediately, since the segment could be used to serve new managed allocations. If you want to monitor managed memory usage, you should rely on the .NET-specific performance counters instead.

Help w/ memory management...allocations shows no leaks but program leaks like crazy

So I have autoreleased/released every object that I alloc/init/copy, and the Allocations instrument shows minimal leaks. However, my program's memory usage does not stop increasing. I have included a screenshot of my Allocations run (I have run Allocations for longer, but it remains relatively constant; it certainly does not compare to the amount the program gains when actually running). When running normally, my program doubles in memory over the course of about 10 hours. The memory increases drastically in the first 5 minutes, however (2-3MB), and just keeps going. I don't understand why Allocations would remain constant in Instruments while my program keeps gaining memory when actually run.
Since I can't post images yet...here is the link to the screenshot:
allocations run
UPDATE: Here are some screenshots from my memory heapshot analysis. I am not allocating these objects explicitly and don't really know where they are coming from. Almost all of them trace their source to something similar to the second screenshot's details on the right (lots of HTTP and URL entries in the call tree). Anybody know where these are coming from? I know I've read about some NSURLConnection leaks, but I have tried all of the cache clearing those threads suggest, to no avail. Thanks for all the help so far!
memory heap analysis 1
memory heap analysis 2
Try heapshots.
Are you running with different environment variables in the different environments?
For example, you could have NSZombie enabled when you launch your app directly (causing none of your objects to ever be freed) but not when you run under Instruments.
Just as a sanity check: how are you determining memory usage? You say that memory usage keeps going up, but not when you run in Instruments. Given that Instruments is a reliable way of measuring memory usage (the most reliable way?), this sounds a little odd - a bit like saying memory keeps going up except when I try to measure it.
If you are creating autoreleased objects (like [NSString stringWithFormat:]) in a loop, the pool won't be drained until that loop exits and the program returns to the main event loop, at which point the autorelease pool is drained and a new one is instantiated.
If you have code like this, the solution is to instantiate a new autorelease pool before entering your loop and drain it periodically during the loop (re-instantiating it after each drain).
You can use Instruments to find out the location of where your allocations are originating. When running Instruments in the Allocation mode:
Move your mouse over the Category field in the Object Summary
Click on the Grey Circle with an arrow which appears next to the field name
This will bring up a list of locations where the objects in that category have been instantiated from, and the stats of how many allocations each have made.
If your memory usage is rising (but not leaking) you should be able to see where that memory was created, and then track down why it is hanging around.
This tool is also very useful in reducing your memory profile for mobile applications.

Garbage Collection in Biztalk, What would be the wise approach?

Our BizTalk 2006 application contains two orchestrations which are invoked frequently (approx. 15 requests per second). We identified possible memory leaks in our application by making certain throttling threshold changes on the host. When we disabled memory-based throttling, the process memory kept increasing until it reached 1400 MB, after which we started to experience out-of-memory exceptions.
We are forced to restart the host instances when this situation occurs.
We were wondering whether explicitly calling GC.Collect from the orchestration would be fruitful in such a case, and what the cons of this approach might be.
Thanks.
Out-of-memory exceptions occur only if the garbage collector was unable to free enough memory to perform a requested allocation. This can happen if you have a memory leak, which on a garbage-collected platform means some object references are kept longer than they need to be. Frequent causes of leaks are objects that hold global data (static variables), such as a singleton, a cache or a pool that keeps references for too long.
If you explicitly call GC.Collect, it will fail to free the memory for the same reasons the implicit collection failed. So the explicit GC.Collect call would only slow down the orchestration.
If you are calling .Net classes from your orchestrations, I suggest trying to isolate the problem by calling the same classes from a pure .Net application (no BizTalk involved)
It's also possible that there's no leak, but that each instance consumes too much memory at the same time. BizTalk can usually dehydrate orchestrations when it finds it necessary, but it may be prevented from doing so if a step in the orchestration (or a large atomic scope) takes too long to execute.
1400 MB also looks large for only 15 concurrent instances. Are you manipulating large messages in the orchestration? In that case you can greatly reduce memory usage by avoiding operations that force the whole message to be loaded into memory, and instead manipulate the message using streaming.
Not knowing BizTalk, my answer may be way off…
I am assuming that the more orchestration instances run in a process, the longer a single orchestration instance takes to complete. Then, as you increase the number of orchestration instances you let run at the same time, at some point the time they take to complete will be long enough that the combined size of the running orchestration instances is too great for your RAM.
I think you need to throttle based on the number of running orchestration instances. If you graph "rate of completion" against "number of running orchestration instances", you will likely see a big flat zone in the middle of the graph; choose your throttling to keep you in the middle of this stable zone.
I agree with the poster above. Trying to clear the memory or resetting the host instance is not the solution, just a band-aid. You need to find where you are leaking memory. I would look at the whole application and not just the orchestration; it is possible that the ports are also causing your memory leak. Do you use custom functoids in your maps? How about inline code? Custom XSLT?
I would also look at custom pipelines if you are using them.
If it's possible, I would try isolating the different components and putting them under stress and volume tests individually; somehow I don't think your orchestration itself is the problem, but rather a map or a custom component.
Doing a garbage collection isn't going to free leaked memory, as it is still (in error) referenced by your application somehow. You would only call GC.Collect if you generated a lot of short-lived objects and knew you were at a good point to free them.
You must identify and fix the leaking code!
I completely agree with most of what the others said - you should look into where your leak is and fix it; calling the GC directly will not help you, and in any case is very unlikely to be a reasonable way forward.
I would add, though, that throttling exists to protect your environment from grinding to a halt should you encounter a sudden rise in resource consumption. Without throttling, it is possible for BizTalk (like any other server) to reach a point where it cannot continue processing and effectively gets stuck; throttling allows it to slow down in order to ensure processing continues until resource consumption (hopefully) returns to normal levels.
For that reason I would also suggest that you keep some throttling configured for your environment, with values tweaked to suit your scenario.
