IBM HTTP Server Memory Limit Recommended - websphere

I have some instances of IHS running at more than 75% memory usage. Is there any official recommendation for this maximum memory usage?

Is it high or growing unbounded? Are you measuring the RSS of each process or some system-wide metric?
If it's just high, you can make sure you aren't creating far more threads/processes than needed by looking at mod_mpmstats output over time, then reducing settings like MinSpareThreads. Additionally, MaxMemFree may help.
If you have unbounded growth, you can use MaxSpareThreads and MaxRequestsPerChild to recycle processes regularly.
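Putting those directives together, a worker-MPM tuning block in httpd.conf might look like the sketch below. The values are purely illustrative assumptions, not recommendations; size them from your own mod_mpmstats data.

```apache
# Hypothetical worker-MPM tuning; values are illustrative only --
# derive real numbers from mod_mpmstats output over time.
ThreadsPerChild      25
MinSpareThreads      25     # lower this if stats show many idle threads
MaxSpareThreads      75     # surplus idle threads above this are reclaimed
MaxRequestsPerChild  10000  # recycle processes to bound unbounded growth
MaxMemFree           2048   # KB of free memory each allocator may hold
```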

Related

Spring Boot High Heap Usage

We have a Spring Boot application that runs on 20 servers, and we have a load balancer that redirects requests to our servers.
Since last week we have been having huge problems with CPU usage (100% in all VMs), almost simultaneously, without any noticeable increase in incoming requests.
Before that, we had 8 VMs without any issue.
In peak hours we have 3-4 thousand users with 15-20k requests per 15 minutes.
I am sure it has something to do with heap usage, since all the CPU usage comes from the GC trying to free up some memory.
At the moment, we have isolated some requests that we thought might cause the problem to specific VMs, and proved that those VMs are stable even though there is traffic. (In the picture below you can see that the old-gen memory is stable in those VMs.)
The Heap memory looked something like this
The memory continues to increase, and there are two scenarios: either it reaches a point and, after 10 minutes, drops on its own to 500 MB, or it stays there, causes 100% CPU usage, and never recovers.
From the heap dumps that we have taken, it seems that most of the memory has been allocated in char[] instances.
We have used a variety of tools (VisualVM, JProfiler etc) in order to try to debug the issue without any luck.
I don't know if I am missing something obvious.
I also tried changing the GC algorithm from the default to G1, and disabling the Hibernate query plan cache, since a lot of our queries use IN parameters for filtering.
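For reference, the query-plan-cache tuning mentioned above is usually done through Hibernate properties like the following. This is a hedged sketch: the values are illustrative assumptions, and `in_clause_parameter_padding` requires Hibernate 5.2.17 or later.

```properties
# Illustrative application.properties tuning -- values are assumptions.
# Cap the query plan cache (default 2048 plans) so IN-clause variants
# cannot accumulate large cached char[] data in the old gen:
spring.jpa.properties.hibernate.query.plan_cache_max_size=512
# Pad IN-clause parameter counts to powers of two so fewer distinct
# plans are generated and cached (Hibernate 5.2.17+):
spring.jpa.properties.hibernate.query.in_clause_parameter_padding=true
```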
UPDATE
We managed to reduce the number of requests to our most heavily used API call, and the old gen looks like this now. Is that normal?

Poorly configured lambda

Recently I created my first serverless service. Nothing fancy, just a script that uses FFmpeg to encode some CCTV footage and reduce its quality.
After a couple of days, I realized that much of my footage was not there. After staring for some time at the AWS Lambda metrics panel, I assumed that the cause of the problem was too little time and too little memory, so I cranked up the timeout (to 8 minutes) and max memory (to 380 MB). Then I let it work for a couple of days to see if it would get better.
Fast forward to today: I logged in to AWS and noticed that I was never using more than 95 MB of memory (as seen in the image).
Last logs
Is this right?
Also, looking at the graph, I notice that I still get some errors. Is increasing the timeout the solution?
aws graphs
Sorry for the poor quality of the question, I really tried to look this up first.
Available RAM is not the only thing that's influenced by the Memory setting, which is a bit counterintuitive at first. The memory setting in a Lambda function influences:
The amount of compute (vCPUs) that's available to the function
The amount of network throughput that's available
The amount of system memory / RAM that's available
All of these scale based on the memory. You can think of it like this: if you provision 128 MB of RAM, you also get about 1/8 of a vCPU; if you go to 256 MB, you get a quarter of a vCPU. These are not exact numbers, but it's a useful mental model. Somewhere between 1024 and 1280 MB you get a full vCPU, and afterwards a second vCPU is added.
Your workload seems CPU-intensive (since there's no GPU to offload it to), so I'd try increasing the memory to give Lambda more compute power and see how it behaves.
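If you want to try that, the memory (and with it the CPU share) can be raised from the AWS CLI. The function name and values below are hypothetical; 480 seconds corresponds to the 8-minute timeout mentioned in the question.

```shell
# Raise the memory setting (and, implicitly, the vCPU share) of a
# hypothetical function; 1024 MB is an illustrative starting point.
aws lambda update-function-configuration \
    --function-name cctv-encoder \
    --memory-size 1024 \
    --timeout 480
```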

Java 8 Concurrent Hash Map get Performance/Alternative

I have a high-throughput, low-latency application (3000 requests/sec, 100 ms per request), and we heavily use Java 8 ConcurrentHashMap for performing lookups. Usually these maps are updated by a single background thread, and multiple threads read from them.
I am seeing a performance bottleneck, and on profiling I find ConcurrentHashMap.get as being the hotspot and taking majority of the time.
In another case, I see ConcurrentHashMap.computeIfAbsent being the hotspot, although the mapping function has very small latency and the profile shows computeIfAbsent spending 90% of the time executing itself, and much less time executing the mapping function.
My question: is there any way I could improve the performance? I have around 80 threads concurrently reading from the CHM.
I have around 80 threads concurrently reading from CHM.
The simplest things to do are
If you have a CPU-bound process, don't have more active threads than you have CPUs; otherwise this will only add overhead, and if those threads hold a lock while not running because there are too many of them, it will really not help.
Increase the number of partitions. You will want at least 4x as many segments/partitions as you have threads accessing a single map. However, you will get strange behaviour in CHM if you access it with more than 40 threads, due to the way cache coherency works. I suggest using a more efficient data structure for higher degrees of concurrency. In Java 8 the concurrencyLevel is only a hint, but it is better than leaving the default initial size of 16.
Don't spend so much time in the CHM. Find a way to do useful work without hitting a shared resource, and your threads will run much more efficiently.
If you have any latencies you can see in a low latency system, you have a problem IMHO.
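The pre-sizing advice above can be sketched with the three-argument ConcurrentHashMap constructor. The thread count and capacity are hypothetical values for illustration; in Java 8 the concurrencyLevel argument is only a sizing hint, as noted above.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ChmSizing {
    public static void main(String[] args) {
        int readers = 80;  // hypothetical reader-thread count
        // Pre-size well beyond the default of 16 and pass a
        // concurrencyLevel hint of ~4x the accessing threads; the hint
        // mainly influences the initial table sizing in Java 8.
        ConcurrentHashMap<String, Long> lookups =
                new ConcurrentHashMap<>(1 << 16, 0.75f, readers * 4);
        lookups.put("order-42", 100L);
        System.out.println(lookups.get("order-42")); // prints 100
    }
}
```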

Reaching limits of Apache Storm

We are trying to implement a web application with Apache Storm.
The application receives a huge load of ad requests (100 TPS, a hundred transactions per second), makes some simple calculations on them, and then stores the results in a NoSQL database with a maximum latency of 10 ms.
We are using Cassandra as a sink for its writing capabilities.
However, we have already exceeded the 8 ms requirement; we are at 100 ms.
We tried to minimize the size of the buffers (Disruptor buffers) and to balance the topology well, using the parallelism of bolts.
But we are still at 20 ms.
With 4 workers (8 cores / 16GB) we are at 20k TPS, which is still very low.
Are there any suggestions for optimization, or are we just reaching the limits of Apache Storm (the limits of Java)?
I don't know the platform you're using, but in C++ 10 ms is an eternity. I would think you are using the wrong tools for the job.
Using C++, serving some local query should take under a microsecond.
Non-local queries that touch multiple memory locations and/or have to wait for disk or network I/O have no choice but to take more time. In this case parallelism is your best friend.
You have to find the bottleneck.
Is it I/O?
Is it CPU?
Is it memory bandwidth?
Is it memory access time?
After you've found the bottleneck, you can either improve it, async it and/or multiply (=parallelize) it.
There's a trade-off between low latency and high throughput.
If you really need high throughput, you should rely on batching: adjust the buffer sizes upward, or use Trident.
Trying to avoid transmitting tuples to other workers helps with low latency (localOrShuffleGrouping).
Please don't forget to monitor GC, which causes stop-the-world pauses. If you need low latency, they should be minimized.
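To make those pauses visible, GC logging can be enabled on the worker JVMs. These flags are standard HotSpot options; the log file name is an arbitrary example.

```
# JDK 8 and earlier:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime
# JDK 9+ unified logging equivalent:
-Xlog:gc*,safepoint:file=gc.log
```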

CPU Utilization and threads

We have a transaction-intensive process at one customer site running on a quad-core server with four processors. The process is designed to take advantage of every core available. So in this installation, we take an input queue, divide it into 16ths, and allocate each fraction of the queue to a core. It works well and keeps up with the transaction volume on the box.
Looking at the CPU utilization on the box, it never seems to go above 33%. Now we have a new customer with at least double the volume of the existing customer. Some of us are arguing that since CPU usage is way below maximum utilization, that we should go with the same configuration.
Others claim that there is no direct correlation between CPU utilization and transaction processing speed, and since the logic of the underlying software module is based on the number of available cores, it makes sense to obtain a box with proportionately more cores for the new client to accommodate the increased traffic volume.
Does anyone have a sense as to who is right in this instance?
Thank you,
To determine the optimum configuration for your new customer, understanding the reason for low CPU usage is paramount.
Very likely, the reason is one of the following:
Your process is limited by memory bandwidth. In this case, faster RAM will help if supported by the motherboard. If possible, a redesign to limit the amount of data accessed during processing will improve performance. Adding more CPU cores will, on its own, do nothing to improve performance.
Your process is limited by disk I/O. Using faster disk connections (SATA etc.) and/or upgrading to a SSD might help, but more CPU power will not.
Your process is limited by synchronization contention. In this case, adding more threads for more cores might even be counterproductive. Redesigning your algorithm might help here.
Having said this, I have also seen situations where processes that are definitely CPU-bound fail to achieve 100% CPU usage on modern processors (Core i7 etc.), because in certain Turbo Boost-related cases, Task Manager will show less than 100%.
As 9000 said, you need to find out what your bottlenecks are when under load. Perfmon might provide enough data to find out.
Another afterthought: You could limit your process on the existing machine to part of the cores (but still at least 30% so that theoretically, CPU doesn't become a bottleneck due to this limitation) and check if overall throughput degrades. If it does not, adding more cores will not improve performance.
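The afterthought above can be turned into a simple experiment: run the same workload with the full and a reduced worker count and compare how much gets done. This is a hedged skeleton, not the customer's actual module; `processTransaction`, the counts, and the pool sizes are all hypothetical stand-ins, and real timing measurement is left out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class CoreLimitExperiment {
    // Hypothetical stand-in for one transaction's work.
    static void processTransaction(AtomicLong counter) {
        counter.incrementAndGet();
    }

    static long runWith(int workers, int transactions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicLong done = new AtomicLong();
        for (int i = 0; i < transactions; i++) {
            pool.submit(() -> processTransaction(done));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws Exception {
        // Time each run externally; if throughput with fewer workers
        // matches the full count, CPU cores are not the bottleneck.
        System.out.println(runWith(16, 10_000)); // prints 10000
        System.out.println(runWith(8, 10_000));  // prints 10000
    }
}
```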
