VMWare ESXi, RHEL, LUKS and network latency - performance

My company is running into a network performance problem that seemingly has all of the "experts" we're working with (VMWare support, RHEL support, our managed services hosting provider) stumped.
The issue is that network latency between our VMs (even VMs residing on the same physical host) increases--up to 100x or more!--with network throughput. For example, with no network load, latency (measured by ping) might be ~0.1ms. Start transferring a couple of 100MB files and latency grows to 1ms. Initiate a bunch of (~20 or so) concurrent data transfers between two VMs, and the latency between the VMs can increase to upwards of 10ms.
This is a huge problem for us because we have application server VMs hosting processes that might issue a million or so queries against a database server (a different VM) per hour. At that volume, adding even a millisecond or two to each query translates to an extra 1,000-2,000 seconds of cumulative wait per hour, which increases our runtime substantially--sometimes doubling or tripling our expected durations.
We've got what I would think is a pretty standard environment:
ESXi 6.0u2
4 Dell M620 blades with 2x Xeon E5-2650v2 processors and 128GB RAM
SolidFire SAN
And our base VM configuration consists of:
RHEL7, minimal install
Multiple LUNs configured for mount points at /boot, /, /var/log, /var/log/audit, /home, /tmp and swap
All partitions except /boot encrypted with LUKS (over LVM)
Our database server VMs are running Postgres 9.4.
We've already tried the following:
Changing the virtual NIC from VMXNET3 to e1000 and back
Adjusting RHEL ethernet stack settings
Using ESXi's "low latency" option for the VMs
Upgrading our hosts and vCenter from ESXi 5.5 to 6.0u2
Creating bare-bones VMs (set up as above with LUKS, etc., but without any of our production services on them) for testing
Moving the datastore from the SSD SolidFire SAN to local (on-blade) spinning storage
None of these improved network latency. The only test that showed the expected (non-deteriorating) latency was with a second pair of bare-bones VMs set up without LUKS encryption. Unfortunately, we need fully encrypted partitions (for which we manage the keys) because we are dealing with regulated, sensitive data.
I don't see how LUKS--in and of itself--can be to blame here. Rather, I suspect that some combination of LUKS with ESXi, our hosting hardware, and/or our VM hardware configuration is to blame.
I performed a test in a much wimpier environment (MacBook Pro, i5, 8GB RAM, VMware Fusion 6.0, CentOS 7 VMs configured similarly with LUKS on LVM, and the same testing scripts) and was unable to reproduce the latency issue. Regardless of how much network traffic I sent between the VMs, latency remained steady at about 0.4ms. And this was on a laptop with a ton of other things going on!
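For reference, here is a rough sketch of the kind of load-plus-latency test we run between two VMs (hostnames, sizes, and counts below are placeholders, not our actual scripts):

# ping the DB VM continuously while ~20 parallel transfers saturate the link
ping -i 0.2 db-vm > ping_under_load.log &
PING_PID=$!
XFER_PIDS=""
for i in $(seq 1 20); do
  dd if=/dev/zero bs=1M count=100 2>/dev/null | ssh db-vm "cat > /tmp/xfer_$i" &
  XFER_PIDS="$XFER_PIDS $!"
done
wait $XFER_PIDS     # wait for the transfers to finish
kill $PING_PID      # then inspect ping_under_load.log for latency growth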
Any pointers/tips/solutions will be greatly appreciated!

After much scrutiny and comparing the non-performing VMs against the performant VMs, we identified the issue as a bad selection for the advanced "Latency Sensitivity" setting.
For our poorly performing VMs, this was set to "Low". After changing the setting to "Normal" and restarting the VMs, latency dropped by ~100x and throughput (which we hadn't originally noticed was also a problem) increased by ~250x!

Related

Creating Elasticsearch cluster from three servers

We have three physical servers. Each server has 2 CPUs (32 cores), 96 TB HDD, and 768 GB RAM. We would like to use these servers in an Elasticsearch cluster.
Each server will be located in a different data center, connecting each server using a private connection.
How can we optimize our configuration for high performance? Also, how should we best run Elasticsearch on these machines? For example, should we use virtualization to create multiple nodes per machine, or not?
Since you have a huge amount of RAM (768 GB) available on each physical server, and the ES documentation on heap sizing says the heap shouldn't exceed 32 GB, you will have to use virtualization to create multiple nodes per physical server for better utilization of your infrastructure.
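Whether you use VMs, containers, or simply several processes per host, the idea is the same: each node gets a heap of at most ~31 GB and its own data path. A minimal multi-process sketch (the -E flags assume Elasticsearch 5.x or later; paths, node names, ports, and heap sizes are placeholders):

ES_JAVA_OPTS="-Xms31g -Xmx31g" ./bin/elasticsearch -Enode.name=node-a -Epath.data=/data/node-a -Ehttp.port=9200 -d
ES_JAVA_OPTS="-Xms31g -Xmx31g" ./bin/elasticsearch -Enode.name=node-b -Epath.data=/data/node-b -Ehttp.port=9201 -d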
Apart from that, there are various cluster settings and node settings you could optimize, but since you haven't provided them, it's difficult to make recommendations on them.
Another thing to note is that you have huge RAM and disk, but the CPU is not in proportion to them, so if you can increase that as well, it would be good.

How to increase the request per second on amazon EC2 T2.micro instance?

I recently launched an Amazon EC2 instance, a t2.micro. After installing WildFly 8.2.0.Final, I tried to load test the web server. I tested it serving a static page of less than 500 bytes, and a dynamic page that writes to and reads from MySQL. To my surprise, I got similar results: both tests came in at around 1000 RPS. I monitored the system using top -d 1; the CPU never reached its maximum and there was free memory. I think either EC2 has some limitation on concurrent connections, or my setup needs improvement.
My setup is CentOS 7, WildFly/JBoss 8.2.0.Final, and MariaDB 5.5. The test tool is JMeter, in distributed mode or command-line mode. Tests were performed remotely, on the same subnet, and on localhost; all gave the same result.
Can you please help identify where the bottleneck is? Are there any limitations on the Amazon EC2 instance that could affect this? Thanks.
Yes, there are some limitations depending on the EC2 instance type, and one of them is network performance.
Amazon doesn't publish the exact limitations of each instance type, but in the Instance Types Matrix you can see that the t2.micro has low to moderate network performance. If you need better network performance, you can check the AWS instance types page, which shows which instances have enhanced networking:
Enhanced Networking
Enhanced Networking enables you to get significantly higher packet per second (PPS) performance, lower network jitter and lower latencies. This feature uses a new network virtualization stack that provides higher I/O performance and lower CPU utilization compared to traditional implementations. In order to take advantage of Enhanced Networking, you should launch an HVM AMI in VPC, and install the appropriate driver. Enhanced Networking is currently supported in C4, C3, R3, I2, M4, and D2 instances. For instructions on how to enable Enhanced Networking on EC2 instances, see the Enhanced Networking on Linux and Enhanced Networking on Windows tutorials. To learn more about this feature, check out the Enhanced Networking FAQ section.
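To check whether enhanced networking is actually active on a Linux instance, you can look at which driver the interface uses and (with the AWS CLI) whether SR-IOV support is flagged; the instance ID and interface name below are placeholders:

ethtool -i eth0    # driver "ixgbevf" indicates enhanced networking; "vif" means plain Xen networking
aws ec2 describe-instance-attribute --instance-id i-0123456789abcdef0 --attribute sriovNetSupport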
You have more information in these SO and SF questions:
Bandwidth limits for Amazon EC2
Does anyone know the bandwidth available for different EC2 Instances?
EC2 Instance Types's EXACT Network Performance?
You're right that 1000 RPS feels awfully low for Wildfly, given that the Undertow server powering it is one of the fastest in Java land and among the 10 fastest, period.
Starting points to optimize:
Make sure that you do not have request logging on (that could cause an I/O bottleneck), use the latest stable JVM, and it's probably worth using the most recent Wildfly version that your app works with.
With that done, you're almost certainly being bottlenecked by connection creation, not your AWS instance. This could be within JMeter, or within the Wildfly subsystem.
To eliminate JMeter as a culprit, try ApacheBenchmark ("ab") at the same concurrency level, and then try it with the -k option on (to allow connection reuse).
If the first ApacheBenchmark number is much higher than JMeter's, the issue is the thread-based networking model that JMeter uses (another load-testing tool, such as Gatling or locust.io, may be needed).
If the second number is much higher than the first, the bottleneck is proven to be connection creation. That may be solved by tuning the Undertow server settings.
As far as WildFly goes, I'd have to see the config.xml, but you may be able to improve performance by tweaking the Undertow subsystem settings. The defaults are usually solid, but you want a very low number of I/O threads (either 1, or the number of CPUs, no more).
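For instance, a hedged sketch of that tuning via jboss-cli (the attribute names are from the WildFly io subsystem; the values are illustrative only, check them against your config.xml):

# bin/jboss-cli.sh --connect, then:
/subsystem=io/worker=default:write-attribute(name=io-threads, value=2)
/subsystem=io/worker=default:write-attribute(name=task-max-threads, value=64)
:reload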
I have seen a trivial Wildfly 10 application far exceed the performance you're seeing on a t2.micro instance.
Benchmark results, with Wildfly 10 + docker + Java 8:
Server setup (EC2 t2.micro running latest amazon linux, in US-east-1, different AZs)
sudo yum install docker
sudo service docker start
sudo docker run --rm -it -p 8080:8080 svanoort/jboss-demo-app:0.7-lomem
Client (another t2.micro, minimal load, different AZ):
ab -c 16 -k -n 1000 http://$SERVER_PRIVATE_IP:8080/rest/cached/500
16 concurrent connections with keep-alive, serving 500 bytes of cached randomly pre-generated data
Results over multiple runs:
430 requests per second (RPS), 1171 RPS, 1527 RPS, 1686 RPS, 1977 RPS, 2471 RPS, 3339 RPS, eventually peaking at ~6500 RPS after hundreds of thousands of requests.
Notice how that goes up over time? It's important to prewarm the server before benchmarking, to allow for enough handler threads to be created, and to allow for JIT compilation. 10,000 requests is a good starting point.
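For example, a warm-up pass against the same endpoint before the measured runs might look like this (discard its numbers):

ab -c 16 -k -n 10000 http://$SERVER_PRIVATE_IP:8080/rest/cached/500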
If I turn off connection keepalive? Peaks at about ~1450 RPS with concurrency 16. BUT WAIT! With a single thread (concurrency 1), it only gives ~340-350 RPS. Increasing concurrency beyond 16 does not give higher performance, it remains fairly stable (even up to 512 concurrent connections).
If I increase the request data size to 2000 bytes, by using http://$SERVER_PRIVATE_IP:8080/rest/cached/2000 then it still hits 1367 RPS, showing that almost all of the time is spent on connection handling.
With very large (300k) requests and connection keep-alive, I hit about 50 MB/s between hosts, but I've seen up to 90 MB/s in optimal situations.
Very impressive performance for JBoss/Wildfly there, I'd say. Note that higher concurrency may be needed if there is more latency between hosts, to allow for the impact of round-trip time on connection creation.

Server 2008 R2 slowness, GUI lag, slow file access

I have a box with Windows Server 2008 R2 on it that is running extremely slow.
GUI lag is so bad it can take several minutes just to open up an explorer window for example. Explorer crashes. File access is very slow from connected users. Etc.
Specs:
RAM: 16GB
CPU: Intel Xeon
Disks:
C: 1TB (This has the server OS as well as SQL Server.)
D: 1TB (This is two 1TB disks in RAID 1 holding user data.)
Roles:
AD, GP, DHCP, DNS
What I've checked/noticed:
CPU utilization never really gets above 30%. Average over the space of a day is roughly 2%.
RAM is at 55% utilization. Most all of that of course being used by SQL Server.
Network utilization averages <2%
Disk stats for 5 minutes perfmon while server is symptomatic:
C: Avg. Idle %: ~99%
D: Avg. Idle %: ~99%
C: Avg. Disk Queue Length: Between 0 and 0.015.
D: Avg. Disk Queue Length: Between 0 and 0.023.
In Resource Monitor, I cannot pull a list of processes that are running. Nor can I see any disks. It's all just blank white boxes no matter how long I give it to load the data.
Rebooting the box makes the machine run perfectly. It's always when I come in in the morning that it's slowed back down.
Checking the event log doesn't show any obvious tells as to what's causing this. The only thing I spotted was the classic "The winlogon notification subscriber took 464 second(s) to handle the notification event (Logon)." error. But that really only serves to let me know that the server is in fact running slow.
I'm dying over here trying to figure out what's causing this. Any ideas or help would be most appreciated.
Despite just about every metric telling me the disks were running fine, replacing the disks ultimately fixed the issue. I guess I'll never know why they were behaving like that.
I had the same issue, and it came down to TCP auto-tuning. I checked my settings by running this command:
netsh interface tcp show global
If you just want to test by disabling auto-tuning, run this:
netsh interface tcp set global autotuning=disabled
To set it back, run this:
netsh interface tcp set global autotuning=normal
Worth a try, but you could/should reboot the server after making the change. In my case, just disabling it fixed the server's performance.

Best EC2 setup for redis server [closed]

We are deploying a large-scale web application that uses only Redis as a data store. I notice that the benchmark of our Redis master is around 8,000 transactions per second on EC2, far less than the stated benchmarks on dedicated hardware.
I understand that there is a performance penalty for running Redis on a virtual machine like EC2, but I would love some pointers from people who have deployed Redis in production environments on EC2 on what EC2 setup you have found most effective for getting more out of redis.
Thanks.
EC2, being virtualized hardware, is probably not the best environment to run Redis on, but it is a popular one, and there are a number of points to know to get the best from Redis on this platform.
I'm one of the authors of http://redis.io/topics/benchmarks and http://redis.io/topics/latency which cover most of the topics I present below. This is just a summary of the main points.
Virtualization toll
It is not specific to EC2, but Redis is significantly slower when running on a VM (in terms of maximum supported throughput). This is due to the fact that, for basic operations, Redis does not add much overhead to the epoll/read/write system calls required to handle client connections (like memcached, or other efficient key/value stores). System calls are typically more expensive on a VM, and they represent a significant part of Redis activity (especially in benchmarks). In those conditions, a 50% decrease in terms of maximum throughput compared to bare metal is not uncommon.
Of course, it also depends on the quality of the hypervisor. For EC2, Xen is used.
Benchmarking in good conditions
Benchmarking can be tricky, especially on a platform like EC2. One point often forgotten is to ensure a proper configuration for both the benchmark client and server. For instance, do not run redis-benchmark on a CPU starved micro-instance (which will likely be throttled down by Amazon) while targeting your Redis server. Both machines are equally important to get a good maximum throughput.
Actually, to evaluate Redis performance, you need to:
run redis-benchmark locally (on the same machine as the server), assuming you have more than one vCPU core.
run redis-benchmark remotely (from a different VM), on a machine whose QoS configuration is equivalent to the server machine
That way you can evaluate and compare the performance of the machines and the network.
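For example (a minimal sketch; the remote host IP and the request/connection counts are placeholders):

redis-benchmark -h 127.0.0.1 -p 6379 -n 100000 -c 50 -t set,get -q    # local run, on the Redis server itself
redis-benchmark -h 10.0.0.12 -p 6379 -n 100000 -c 50 -t set,get -q    # remote run, from a VM with a comparable QoS profile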
On EC2, you will have the best results with second-generation M3 instances (or high-memory, or cluster compute instances) so you can benefit from HVM (hardware virtualization) instead of relying on slower para-virtualization.
The fork issue
This is not specific to EC2, but to Xen: forking a large process can be really slow on Xen (it looks better with kvm). For Redis this is a big problem if you plan to use persistence: both persistence options (RDB or AOF) require the main thread to fork and launch background save or rewrite processes.
In some cases, fork latency can freeze Redis event loop for several seconds. The more memory managed by the Redis instance, the more latency.
On EC2, be sure to use an HVM-enabled instance (M3, high-memory, cluster); it will mitigate the issue.
Then, if you have large memory requirements, and your application can tolerate it, consider running several smaller Redis instances on the same machine, and shard your data. It can decrease the latency due to fork operations to an acceptable level.
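A rough sketch of that layout (ports, paths, and memory limits below are placeholders): give each instance its own config file with its own port, dir, and maxmemory, and start them side by side.

redis-server /etc/redis/redis-6379.conf    # port 6379, dir /var/lib/redis/6379, maxmemory 8gb
redis-server /etc/redis/redis-6380.conf    # port 6380, dir /var/lib/redis/6380, maxmemory 8gb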
Persistence configuration
This is a key point to get good performance from Redis (both on VM and bare metal). So please take the time to carefully read http://redis.io/topics/persistence
If you use RDB, keep in mind the memory copy-on-write mechanism will start duplicating pages once the background save process has been forked off. So you need to ensure there is enough memory for Redis itself, plus some margin to cope with the COW. The amount of extra memory depends on your workload: the more you write to the instance, the more extra memory you need.
Please note writing a file may also consume some memory (because of the filesystem cache), so during a Redis background save, you need to account for Redis memory, COW overhead, and size of the dump file.
The machine running the Redis server must never swap. If it does, the result will be catastrophic. Contrary to some other stores, Redis is not virtual memory friendly.
With Linux, be sure to set sensible system parameters: vm.overcommit_memory=1 and vm.swappiness=0 (or a very low value anyway). Do not use old kernel versions: they are quite bad at enforcing a low swappiness (resulting in swapping when a large file is written).
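For example (apply at runtime, and persist across reboots in /etc/sysctl.conf):

sysctl -w vm.overcommit_memory=1
sysctl -w vm.swappiness=0
echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf
echo "vm.swappiness = 0" >> /etc/sysctl.conf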
If you use AOF, review the fsync options. It is a tradeoff between raw performance and durability of the write operations. You need to make a choice and define a strategy.
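In redis.conf that choice comes down to the appendfsync directive, for example:

appendonly yes
appendfsync everysec    # good compromise: fsync at most once per second
# appendfsync always    -> safest, slowest (fsync on every write)
# appendfsync no        -> fastest, lets the OS decide when to flush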
You also need to get familiar with the EC2 storage options. On some VM, you have the choice between ephemeral storage and EBS. On some others, you only have EBS.
Ephemeral storage is generally faster, and you will probably get fewer issues than with EBS, but you can easily lose your data in case of disk failure, reboot of the host, etc. You can imagine putting RDB snapshots on ephemeral storage, and then copying the resulting files to EBS directories, as a tradeoff between performance and robustness.
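A hedged sketch of that tradeoff (the paths are placeholders): point Redis's dir at ephemeral storage, then periodically copy the snapshot to an EBS-backed directory, e.g. from cron:

cp /mnt/ephemeral/redis/dump.rdb /data/ebs/redis-backups/dump-$(date +%Y%m%d-%H%M).rdb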
EBS is remote storage: it may eat the standard network bandwidth allocated to the VM, and impact the maximum throughput of Redis. If you plan to use EBS, consider selecting the "EBS-optimized" option to establish a QoS between the standard network and storage links.
Finally, a very common setup for performance demanding instances with EC2 is to deactivate persistence on the master, and only activate it on a slave instance. It is probably less safe for the data, but it may prevent a lot of potential latency issues on the master.

Amazon EC2 virtualization with VMware

http://aws.amazon.com/ec2/#pricing
I can't understand this. What is an instance? ("On-Demand Instances let you pay for compute capacity by the hour with no long-term commitments.")
Does this mean that I can use the whole thing as my VMware server:
(Extra Large Instance)
15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge
For $0.96 per hour?
Or does it mean just one operation or something? What exactly is an instance?
An instance signifies an operating system instance (a virtual machine). By using virtualization, Amazon (and cloud providers in general) offer you a virtualized environment where OS instances are running. You have full control over that operating system inside that environment. Per hour means that you pay that much for using your OS instance resources for a single hour. I believe that page has almost all the details about pricing.
An instance is a virtual machine. For example you can start up an ubuntu instance and then you can SSH into it and do whatever you want.
