Why is the latency of requests to Jetty on EC2 Linux high? - performance

I'm running jetty-distribution-9.3.0.v20150612 on Java(TM) SE Runtime Environment (build 1.8.0_51-b16) on an AWS EC2 m1.small Linux machine.
It communicates with mobile apps at a mean rate of 36 hits per minute, with about 60% of traffic using HTTP/2. Mean CPU utilisation is ~15% at peak and network I/O is around 5 MB per minute, so traffic is not exhausting any resource.
Jetty's AsyncNCSARequestLog latency logging shows an average latency of around 2000 ms. As explained in this post, latency is calculated as (now - request.getTimeStamp()), so it does not separate the time Jetty took to handle the request from the time it took to create the HTTP connection.
How do I analyse the request latency in order to find the bottleneck?
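One way to start separating the two, sketched here in Python with only the standard library, is to time the TCP connect and the wait for the first response byte separately from a client. This is plain HTTP/1.1 without TLS, so it does not reproduce the HTTP/2 traffic described above. time_phases and its parameters are names made up for this sketch and are not part of Jetty.

```python
import socket
import time


def time_phases(host, port, path="/", timeout=10.0):
    """Return connect time and time-to-first-byte (seconds) for one plain-HTTP GET."""
    t0 = time.perf_counter()
    # Phase 1: TCP connection setup only (no TLS in this sketch).
    sock = socket.create_connection((host, port), timeout=timeout)
    t1 = time.perf_counter()
    try:
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Phase 2: wait until the server sends the first byte of its response.
        first = sock.recv(1)
        t2 = time.perf_counter()
    finally:
        sock.close()
    return {
        "connect_s": t1 - t0,
        "first_byte_s": t2 - t1,
        "got_response": bool(first),
    }
```

Calling it from a host close to the server and again from a slower network, then comparing connect_s with first_byte_s, gives a rough idea of whether the time goes into connection setup or into request handling.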

Related

Tomcat maxthreads not increasing after 300, CPU under-utilised for high loads

I have a microservice using Spring Boot 2.7.0 with embedded NIO Tomcat. The application receives requests, and for each request it makes 6 parallel remote calls and waits at most 2 seconds for a response from any of the 6.
While performance testing this microservice using JMeter I observed that the CPU remains under-utilised at around 14-15% but the microservice's response time increases to more than a minute. Typically it shouldn't be more than 2-3 seconds.
There are 3 thread configurations in my microservice:
Tomcat threads: here I tried various configurations of max-threads, max-connections and accept-count, (5000,30000,2000), (500,10000,2000), (200,5000,2000), but the CPU is always under-utilised. Here are the properties I am changing:
server.tomcat.max-threads=200
server.tomcat.max-connections=5000
server.tomcat.accept-count=2000
server.connection-timeout=3000
For each request received we create a ForkJoinPool with parallelism 6 to make the 6 remote calls. We also tried an ExecutorService with different configurations such as newSingleThreadExecutor, newCachedThreadPool and newWorkStealingPool. We also increased the pool size to about the same as Tomcat's maxThreads and beyond, but the result was the same: CPU still under-utilised and the microservice taking more than a minute to respond.
On logging the active thread count, we saw that no matter how much we increased the thread pool size or Tomcat's maxThreads, the active thread count went up to 300 and then started declining. We tried a 4-core 8GB system and an 8-core 16GB system; the results were exactly the same.
For making remote calls we use Spring RestTemplate with maxConnTotal and maxConnTotalPerRoute set to the same value as Tomcat's maxThreads. maxConnTotal and maxConnTotalPerRoute are the same because all 6 remote calls go to the same server.
Here are the JMeter parameters used: -GTHREADS=1000 -GRAMP_UP=180 -GDURATION=300
There are 3 instances of this microservice running. Roughly 2-2.5 minutes after JMeter starts, the response time of all 3 instances goes beyond a minute for all requests while CPU remains at only 14-15%. Could someone please help figure out why the CPU is not spiking? If CPU reached 35%, autoscaling would kick in, but since the CPU is under-utilised no scaling happens.
Use a profiler tool like VisualVM, YourKit or JProfiler to see where your application spends the most time.
CPU is not the only possible bottleneck. Check Tomcat's connection pool utilization, as the requests may be queuing up, and also memory usage, network usage, database pool usage, the DB slow query log and so on. If you don't have better monitoring software or an APM tool in place, you can consider using the JMeter PerfMon Plugin.
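To put rough numbers on the queuing idea, here is a back-of-the-envelope sketch in Python. It assumes the worst case, where every request holds one server thread for the full 2 seconds while waiting on remote calls, and a single instance taking all 1000 JMeter threads. Both are simplifying assumptions, and the function names are made up for illustration.

```python
def max_throughput_rps(worker_threads, blocked_seconds_per_request):
    """Upper bound on requests/second when each request holds one thread while it waits."""
    return worker_threads / blocked_seconds_per_request


def mean_response_time_s(concurrent_clients, throughput_rps):
    """Little's law (L = lambda * W) rearranged to W = L / lambda."""
    return concurrent_clients / throughput_rps


# Worst case from the question: 200 Tomcat threads, each blocked for the full 2 s.
cap = max_throughput_rps(200, 2.0)
# 1000 JMeter threads all waiting on a server that cannot exceed that cap.
wait = mean_response_time_s(1000, cap)
print(cap, wait)  # 100.0 10.0
```

Under these assumptions the threads spend their time waiting rather than computing, so response times can grow with the queue while the CPU stays mostly idle.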
We replaced RestTemplate for remote calls with WebClient and introduced WebFlux Mono to make the complete request non-blocking. The request itself now returns our response wrapped in a Mono. This solved our issue: there is no idle time now, because threads are not blocked on I/O and are instead busy serving other requests.
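The same non-blocking idea can be sketched outside Spring. The example below uses Python's asyncio as a stand-in for WebClient and Mono; it is not the code from this answer. fake_remote_call is a hypothetical placeholder that sleeps instead of making an HTTP call, and the 2-second limit mirrors the timeout from the question.

```python
import asyncio


async def fake_remote_call(i, delay_s):
    """Stand-in for one remote HTTP call; awaiting frees the thread for other work."""
    await asyncio.sleep(delay_s)
    return f"result-{i}"


async def handle_request(delays, timeout_s=2.0):
    """Start all remote calls at once and wait at most timeout_s for them."""
    tasks = [
        asyncio.create_task(fake_remote_call(i, d))
        for i, d in enumerate(delays)
    ]
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    for task in pending:
        task.cancel()  # give up on calls that missed the deadline
    return sorted(t.result() for t in done)


if __name__ == "__main__":
    # Six calls in parallel; the slowest one is dropped by the timeout.
    print(asyncio.run(handle_request([0.1, 0.1, 0.2, 0.2, 0.3, 5.0])))
```

While the six calls are in flight, the single event-loop thread is free to start work for other requests, which is the effect described above.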

How to increase the requests per second on an Amazon EC2 t2.micro instance?

I recently launched an Amazon EC2 instance, a t2.micro. After installing WildFly 8.2.0 Final, I tried to load test the web server. I tested a static page of less than 500 bytes, and a dynamic page that writes to and reads from MySQL. To my surprise, both tests gave similar results of around 1000 RPS. I monitored the system using top -d 1; the CPU hasn't reached its maximum and there is free memory. I think either EC2 has some limitation on concurrent connections, or my setup needs improvement.
My setup is CentOS 7, WildFly/JBoss 8.2.0 Final, MariaDB 5.5. The test tool is JMeter in distributed mode or command-line mode. Tests were performed remotely, on the same subnet, and on localhost. All gave the same result.
Can you please help identify where the bottleneck is? Are there any limitations on Amazon EC2 instances that could affect this? Thanks.
Yes, there are some limitations depending on the EC2 instance type, and one of them is network performance.
Amazon doesn't publish the exact limitations of each type of instance, but in the Instance Types Matrix you can see that t2.micro has a low to moderate network performance. If you need better network performance, you can check on the AWS instance types page where it shows which instances have enhanced networking:
Enhanced Networking
Enhanced Networking enables you to get significantly higher packet per second (PPS) performance, lower network jitter and lower latencies. This feature uses a new network virtualization stack that provides higher I/O performance and lower CPU utilization compared to traditional implementations. In order to take advantage of Enhanced Networking, you should launch an HVM AMI in VPC, and install the appropriate driver. Enhanced Networking is currently supported in C4, C3, R3, I2, M4, and D2 instances. For instructions on how to enable Enhanced Networking on EC2 instances, see the Enhanced Networking on Linux and Enhanced Networking on Windows tutorials. To learn more about this feature, check out the Enhanced Networking FAQ section.
You have more information in these SO and SF questions:
Bandwidth limits for Amazon EC2
Does anyone know the bandwidth available for different EC2 Instances?
EC2 Instance Types's EXACT Network Performance?
You're right that 1000 RPS feels awfully low for Wildfly, given that the Undertow server powering it is one of the fastest in Java land and among the 10 fastest, period.
Starting points to optimize:
Make sure that you do not have request logging on (that could cause an I/O bottleneck), use the latest stable JVM, and it's probably worth using the most recent Wildfly version that your app works with.
With that done, you're almost certainly being bottlenecked by connection creation, not your AWS instance. This could be within JMeter, or within the Wildfly subsystem.
To eliminate JMeter as a culprit, try ApacheBench ("ab") at the same concurrency level, and then try it with the -k option on (to allow connection reuse).
If the first ApacheBench number is much higher than JMeter's, the issue is the thread-based networking model that JMeter uses (another load-testing tool, such as gatling or locust.io, may be needed).
If the second number is much higher than the first, the bottleneck is proven to be connection creation. This may be solved by tuning the Undertow server settings.
As far as WildFly goes, I'd have to see the config.xml, but you may be able to improve performance by tweaking the Undertow subsystem settings. The defaults are usually solid, but you want a very low number of I/O threads (either 1, or the number of CPUs, no more).
I have seen a trivial Wildfly 10 application far exceed the performance you're seeing on a t2.micro instance.
Benchmark results, with Wildfly 10 + docker + Java 8:
Server setup (EC2 t2.micro running latest amazon linux, in US-east-1, different AZs)
sudo yum install docker
sudo service docker start
sudo docker run --rm -it -p 8080:8080 svanoort/jboss-demo-app:0.7-lomem
Client (another t2.micro, minimal load, different AZ):
ab -c 16 -k -n 1000 http://$SERVER_PRIVATE_IP:8080/rest/cached/500
16 concurrent connections with keep-alive, serving 500 bytes of cached randomly pre-generated data
Results over multiple runs:
430 requests per second (RPS), 1171 RPS, 1527 RPS, 1686 RPS, 1977 RPS, 2471 RPS, 3339 RPS, eventually peaking at ~6500 RPS after hundreds of thousands of requests.
Notice how that goes up over time? It's important to prewarm the server before benchmarking, to allow for enough handler threads to be created, and to allow for JIT compilation. 10,000 requests is a good starting point.
If I turn off connection keepalive? It peaks at about 1450 RPS with concurrency 16. BUT WAIT! With a single thread (concurrency 1), it only gives ~340-350 RPS. Increasing concurrency beyond 16 does not give higher performance; it remains fairly stable (even up to 512 concurrent connections).
If I increase the request data size to 2000 bytes, by using http://$SERVER_PRIVATE_IP:8080/rest/cached/2000 then it still hits 1367 RPS, showing that almost all of the time is spent on connection handling.
With very large (300k) requests and connection keep-alive, I hit about 50 MB/s between hosts, but I've seen up to 90 MB/s in optimal situations.
Very impressive performance for JBoss/Wildfly there, I'd say. Note that higher concurrency may be needed if there is more latency between hosts, to allow for the impact of round-trip time on connection creation.
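The keep-alive and no-keep-alive numbers above can be turned into a rough per-request cost. The Python sketch below uses Little's law and assumes all 16 connections were busy for the whole run, which is a simplification; the function name is made up for illustration.

```python
def mean_time_per_request_ms(concurrency, rps):
    """Little's law: mean time in system = in-flight requests / throughput."""
    return concurrency / rps * 1000.0


keepalive_ms = mean_time_per_request_ms(16, 6500)  # connections reused
new_conn_ms = mean_time_per_request_ms(16, 1450)   # new connection per request
overhead_ms = new_conn_ms - keepalive_ms

print(round(keepalive_ms, 2), round(new_conn_ms, 2), round(overhead_ms, 2))
# 2.46 11.03 8.57
```

Under that assumption, opening a fresh connection for every request costs on the order of 8-9 ms more than reusing one, which is consistent with connection handling dominating the no-keep-alive runs.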

Web Api Owin Hosted - Throughput dropping

We've recently copied an api that we have from being IIS hosted into a console app (to be hosted with Owin + TopShelf as a service) and have been performance profiling the two hosting options using JMeter.
We throw 18 threads at the apis and we get differing results back from the IIS hosted vs console hosted, specifically as follows :
Response times through IIS are slower. This isn't surprising as the pipeline in IIS is more involved.
Throughput through IIS is consistent, i.e. we don't see significant increases/decreases in throughput (we achieve 5500 requests/responses per min)
Throughput when hosted in a console app starts off very high (20,000 per min) but degrades quickly to approximately 4,500 per min over a 10 minute period.
We're trying to determine the cause of this throughput drop when hosting as a console app. Why do we start with 20,000 requests per min (presumably calculated from initial response times before it has run for a full minute) but degrade to 4,500?
Other things of note: CPU isn't a concern. It fluctuates at first but settles below 30% available, and memory averages 1.34GB on a machine with 4GB of RAM.
Why might the throughput in IIS be stable, and why does it degrade when hosted using MS Owin hosting through a console app (given stable CPU and memory)?
Incidentally we're trying to isolate pieces of code that could cause the degradation.
Any thoughts on this would be appreciated.

What can interfere with testing a server's performance?

My HTTP server can't take load tests... It gives really high latency when multiple connections are made.
Server Configuration:
5 instances of (CPU 0.5vCore, Memory 512MB, Disk 20GB)
A load balancer
10G shared bandwidth
When I transfer a 3.5 MB zip, it takes about 1 second when there is only one connection. However, when over 30 connections are made, it goes up to 20~50 seconds.
I am testing with JMeter on my laptop. Is there a possibility that my testing environment interferes with the load-testing?
If so, what would be a solution to improve my testing environment?
First of all you need to monitor and pin down the problem(s).
Start off by picking up information on these four layers:
CPU Usage
Memory Usage
Network Usage
I/O Usage
All of them on the OS layer. (Monitoring tools will vary depending on your OS).
Once you have this data and can narrow down the problem (CPU bound, network latency, I/O latency or whatever), an answer will emerge. Doing this (if it is the first time you are testing your app) will also help you get scaling information on your environment and your application in general.
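On the question of whether the laptop itself can interfere, a quick calculation helps before any monitoring. The Python sketch below estimates the minimum time to pull 30 copies of a 3.5 MB file over the client's link. The link speeds are hypothetical, since the question does not state the laptop's bandwidth, and the function name is made up for illustration.

```python
def min_transfer_seconds(file_mb, connections, link_mbit_per_s):
    """Lower bound on the time for all downloads to finish over a shared client link."""
    total_megabits = file_mb * connections * 8  # 1 MB = 8 Mbit
    return total_megabits / link_mbit_per_s


# Hypothetical client link speeds in Mbit/s; measure the real one before drawing conclusions.
for link in (25, 100, 1000):
    print(link, "Mbit/s ->", round(min_transfer_seconds(3.5, 30, link), 2), "s")
```

If the result for the laptop's real link is close to the observed 20~50 seconds, the test client is the limit rather than the server, and running JMeter from a machine with more bandwidth would be the first thing to try.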

Slow Magento Enterprise

We are using Magento Enterprise Edition on Nginx with FPC, on dedicated servers with ample RAM and CPU. Everything runs fine with 60-70 visitors. However, during high traffic, such as over 200 active visitors, we start to have problems. During peak traffic our CPU is still under 10%, with 40% free memory. We have a dedicated App and DB server.
What could be wrong? Could this be a network issue? What are the chances that there is a problem with the App server, or that the code base is not optimized, given that CPU is under 10% with ample RAM?
Steve
Edit:
I am running a 32-core App server with 64GB of RAM, with Nginx, PHP-FPM and FastCGI. Upon checking the logs I found that PHP-FPM has the following errors during peak hours:
[WARNING] [pool www] child 26196, script '/var/www/magento/index.php' execution timed out (600.011284 sec), terminating
I have 32 worker processes along with worker_connections 1024;
CDN is already setup and network is set to use 1G connection.

Resources