How to speed up nagios to monitor hosts over the cloud - amazon-ec2

while using nagios with multiple hosts spread over the network,hosts status shows a recognizable lag and takes a long time to reflect on nagios server cgi.Thus what is the optimal nrpe/nagios configration to speed up the status process for a distributed host environment.
In my case I use nagios core 4.1
nrpe 1.5
server/clients: Amazon ec2

The GUI is usually only updated once each minute (automatically), though clicking refresh can provide you with 'nearly' the latest information. I say nearly because there is a distinct processing loop inside of the Nagios core that causes it to never be real time. NRPE is going to run at the speed of your network connection - it does little else besides sending and receiving tiny amounts of data. About the only delay here is the time it takes to actually perform the check and send back the response - which, of course, has way to many factors to mention. Try looking at the output of
[nagioshome]/bin/nagiostats
There are several entries that tell you:
'Latency' - the time between when the check was scheduled to start, and the actual start time.
'Execution Time' - the amount of time checks are actually taking to run.
These entries will have three numbers, which are; Min / Max / Avg
High latency numbers (in my book that means Avg is greater than 1 second) usually means your Nagios server is over worked. There are a few things you can do to improve latency times, and these are outlined in the 'nagios.cfg' file. This latency has nothing to do with network speed or the speed of NRPE - it is primarily hardware speed. If you're already using the optimal values specified in nagios.cfg, then its time to find some faster hardware.
High execution times (for me an Avg greater than 5 seconds) can be blamed on just about everything except your Nagios system. This can be caused by faulty networks (improper packet routing), over loaded networks, faulty and/or poorly designed checks, slow target systems, ... the list is endless. Nothing you do with the Nagios and/or NRPE configs will help lower these values. Well, you could disable NRPE's encryption to improve wire time; but if you have encryption enabled in the first place, then its not likely you'd want it disabled.

Related

Resources needed for simulating 5000 virtual users, sending every 5 seconds, using Locust

I am trying to simulate 5000 virtual users using Locust, with each user sending a message every 5 seconds. What are the resources needed in terms of EC2 specifications in order to achieve this with some level of concurrency.
Number of users is not so important (in my experience, at least when talking about less than a couple of Users per worker), the only thing that matters is the number of requests per second.
Because the performance depends on exactly what your tests do, it is impossible to give a hard number. But the manual gives some best-case figures on what you can do with FastHttpUser/HttpUser
https://docs.locust.io/en/stable/increase-performance.html#increase-locust-s-performance-with-a-faster-http-client
It is impossible to say what your particular hardware can handle, but in a best case scenario you should be able to do close to 5000 requests per second per core, instead of around 850 for the normal HttpUser (tested on a 2018 MacBook Pro i7 2.6GHz)
If your test plan is reasonably simple then you should be fine running at ~50% of that load.

Records are inserting less in the database when we increase the thread group count from 100 to 200 in Jmeter

Initially i have ran a load test with 100 users for 10 minutes and 1000 records got inserted in the database for the below scenarios.
Employee Creation -- Test script design took 1 minute
Employee Update -- Test script design took 2 minutes
And then I ran the same load test with 200 users for 10 minutes and 1100 records got inserted without any error logs or deadlocks.
My question is when we increase/double the thread group count from 100 to 200, Records insertion also should be double or approximately double. then why is it not happening? Same case with the number requests/samples.
You reached a maximum in your test throughput at about 110 records per min. In other words, you have a bottleneck on client or server, which doesn't allow 200 users to process request concurrently and/or within the same amount of time (either some users wait until they can start processing a request, or each request takes longer, so total number of requests is lower).
Some bottlenecks can be resolved by you (if they are related to script, JMeter configuration or JMeter machine), others have to be resolved on server side (by whoever has access to it), and some cannot be resolved at all (they are true bottlenecks of your app).
Without knowing your application, it's hard to suggest anything beyond general "checklist" items:
Verify JMeter script and check if it has any places where it may wait, take a long time, and so on. For example if your ramp-up period is too high, it may be that "first" user will finish execution, before "last" user even started it. Scriptable samplers, pre- and post-processors may cause delays as well.
Make sure JMeter is configured properly to handle 200 concurrent threads. For example if JMeter heap is set too low, it could be that JMeter is very slow, as it constantly needs to run GC. See this question for how to look at and configure memory (it discusses out of memory error, but even without that error inadequate memory can cause slowness)
Make sure JMeter machine is configured correctly to allow creation of 200+ HTTP connections concurrently. A common issue on both Windows and Linux machine is that people assume that they can have 65535 connections (as maximal number of ports), but in reality, both Windows and Linux limit number of ports they allow by default to be used. Also after the use port may remain in TIME_WAIT or CLOSE_WAIT state for several minutes, which makes it unusable. As a result, running out of ports is quite common. Here's how to monitor and resolve this issue on Windows and Linux.
Check JMeter machine performance as a whole: does it have enough CPU, memory; is it swapping memory, etc.
If none of the above is a problem, you need to look at how requests arrive to the server. If client is capable of sending 200 concurrent requests (which you should have established in previous steps), but server receives them at slower rate, then maybe something in the network slows things down. For example something like slow DNS resolution or slow routing between JMeter and server can cause issues.
Also Item #3 on the client is also applicable to the server.
If requests do arrive to the server at the same speed as they are sent from the client, then probably their processing by the server slows down as number of parallel requests goes up. This is where you are on dev and devOP territory, and probably need to work with them to identify bottlenecks on server side. It could be configuration of the web or application server, application itself, ... anything on app way pretty much.
Performance testing is 10% execution, and 90% analysis and identification of bottlenecks, so here you go.

Socket.io: How to reduce emit delay with many concurrent connections?

Im running a 4-core Amazon EC2 instance(m3.xlarge) with 200.000 concurrent connections with no ressouce problems(each core at 10-20%, memory at 2/14GB). Anyway if i emit a message to all the user connected first on a cpu-core gets it within milliseconds but the last connected user gets it with a delay of 1-3 seconds and each CPU core goes up to 100% for 1-2 seconds. I noticed this problem even at "only" 50k concurrent users(12.5k per core).
How to reduce the delay?
I tried changing redis-adapter to mongo-adapter with no difference.
Im using this code to get sticky sessions on multiple cpu cores:
https://github.com/elad/node-cluster-socket.io
The test was very simple: The clients do just connect and do nothing more. The server only listens for a message and emits to all.
EDIT: I tested single-core without any cluster/adapter logic with 50k clients and the same result.
I published the server, single-core-server, benchmark and html-client in one package: https://github.com/MickL/socket-io-benchmark-kit
OK, let's break this down a bit. 200,000 users on four cores. If perfectly distributed, that's 50,000 users per core. So, if sending a message to a given user takes .1ms each of CPU time, that would take 50,000 * .1ms = 5 seconds to send them all.
If you see CPU utilization go to 100% during this, then a bottleneck probably is CPU and maybe you need more cores on the problem. But, there may be other bottlenecks too such as network bandwidth, network adapters or the redis process. So, one thing to immediately determine is whether your end-to-end time is directly proportional to the number of clusters/CPUs you have? If you drop to 2 cores, does the end-to-end time double? If you go to 8, does it drop in half? If yes for both, that's good news because that means you probably are only running into CPU bottleneck at the moment, not other bottlenecks. If that's the case, then you need to figure out how to make 200,000 emits across multiple clusters more efficient by examining node-cluster-socket.io code and finding ways to optimize your specific situation.
The most optimal the code could be would be to have every CPU do all it's housekeeping to gather exactly what it needs to send to all 50,000 users and then very quickly each CPU does a tight loop sending 50,000 network packets one right after the other. I can't really tell from the redis adapter code whether this is what happens or not.
A much worst case would be where some process gets all 200,000 socket IDs and then goes in a loop to send to each socket ID where in that loop, it has to lookup on redis which server contains that connection and then send a message to that server telling it to send to that socket. That would be a ton less efficient than instructing each server to just send a message to all it's own connected users.
It would be worth trying to figure out (by studying code) where in this spectrum, the socket.io + redis combination is.
Oh, and if you're using an SSL connection for each socket, you are also devoting some CPU to crypto on every send operation. There are ways to offload the SSL processing from your regular CPU (using additional hardware).

Understanding RESTful Web Service stress test results

I'm trying to stress-test my Spring RESTful Web Service.
I run my Tomcat server on a Intel Core 2 Duo notebook, 4 GB of RAM. I know it's not a real server machine, but i've only this and it's only for study purpose.
For the test, I run JMeter on a remote machine and communication is through a private WLAN with a central wireless router. I prefer to test this from wireless connection because it would be accessed from mobile clients. With JMeter i run a group of 50 threads, starting one thread per second, then after 50 seconds all threads are running. Each thread sends repeatedly an HTTP request to the server, containing a small JSON object to be processed, and sleeping on each iteration for an amount of time equals to the sum of a 100 milliseconds constant delay and a random value of gaussian distribution with standard deviation of 100 milliseconds. I use some JMeter plugins for graphs.
Here are the results:
I can't figure out why mi hits per seconds doesn't pass the 100 threshold (in the graph they are multiplied per 10), beacuse with this configuration it should have been higher than this value (50 thread sending at least three times would generate 150 hit/sec). I don't get any error message from server, and all seems to work well. I've tried even more and more configurations, but i can't get more than 100 hit/sec.
Why?
[EDIT] Many time I notice a substantial performance degradation from some point on without any visible cause: no error response messages on client, only ok http response messages, and all seems to work well on the server too, but looking at the reports:
As you can notice, something happens between 01:54 and 02:14: hits per sec decreases, and response time increase, okay it could be a server overload, but what about the cpu decreasing? This is not compatible with the congestion hypothesis.
I want to notice that you've chosen very well which rows to display on Composite Graph. It's enough to make some conclusions:
Make note that Hits Per Second perfectly correlates with CPU usage. This means you have "CPU-bound" system, and the maximum performance is mostly limited by CPU. This is very important to remember: server resources spent by Hits, not active users. You may disable your sleep timers at all and still will receive the same 80-90 Hits/s.
The maximum level of CPU is somewhere at 80%, so I assume you run Windows OS (Win7?) on your machine. I used to see that it's impossible to achieve 100% CPU utilization on Windows machine, it just does not allow to spend the last 20%. And if you achieved the maximum, then you see your installation's capacity limit. It just has not enough CPU resources to serve more requests. To fight this bottleneck you should either give more CPU (use another server with higher level CPU hardware), or configure OS to let you use up to 100% (I don't know if it is applicable), or optimize your system (code, OS settings) to spend less CPU to serve single request.
For the second graph I'd suppose something is downloaded via the router, or something happens on JMeter machine. "Something happens" means some task is running. This may be your friend who just wanted to do some "grep error.log", or some scheduled task is running. To pin this down you should look at the router resources and jmeter machine resources at the degradation situation. There must be a process that swallows CPU/DISK/Network.

How to decide on what hardware to deploy web application

Suppose you have a web application, no specific stack (Java/.NET/LAMP/Django/Rails, all good).
How would you decide on which hardware to deploy it? What rules of thumb exist when determining how many machines you need?
How would you formulate parameters such as concurrent users, simultaneous connections, daily hits and DB read/write ratio to a decision on how much, and which, hardware you need?
Any resources on this issue would be very helpful...
Specifically - any hard numbers from real world experience and case studies would be great.
Capacity Planning is quite a detailed and extensive area. You'll need to accept an iterative model with a "Theoretical Baseline > Load Testing > Tuning & Optimizing" approach.
Theory
The first step is to decide on the Business requirements: how many users are expected for peak usage ? Remember - these numbers are usually inaccurate by some margin.
As an example, let's assume that all the peak traffic (at worst case) will be over 4 hours of the day. So if the website expects 100K hits per day, we dont divide that over 24 hours, but over 4 hours instead. So my site now needs to support a peak traffic of 25K hits per hour.
This breaks down to 417 hits per minute, or 7 hits per second. This is on the front end alone.
Add to this the number of internal transactions such as database operations, any file i/o per user, any batch jobs which might run within the system, reports etc.
Tally all these up to get the number of transactions per second, per minute etc that your system needs to support.
This gets further complicated when you have requirements such as "Avg response time must be 3 seconds etc" which means you have to figure in network latency / firewall / proxy etc
Finally - when it comes to choosing hardware, check out the published datasheets from each manufacturer such as Sun, HP, IBM, Windows etc. These detail the maximum transactions per second under test conditions. We usually accept 50% of those peaks under real conditions :)
But ultimately the choice of the hardware is usually a commercial decision.
Also you need to keep a minimum of 2 servers at each tier : web / app / even db for failover clustering.
Load testing
It's recommended to have a separate reference testing environment throughout the project lifecycle and post-launch so you can come back to run dedicated performance tests on the app. Scale this to be a smaller version of production, so if Prod has 4 servers and Ref has 1, then you test for 25% of the peak transactions etc.
Tuning & Optimizing
Too often, people throw some expensive hardware together and expect it all to work beautifully. You'll need to tune the hardware and OS for various parameters such as TCP timeouts etc - these are published by the software vendors, and these have to be done once the software are finalized. Set these tuning params on the Ref env, test and then decide which ones you need to carry over to Production.
Determine your expected load.
Setup a machine and run some tests against it with a Load testing tool.
How close are you if you only accomplished 10% of the peak load with some margin for error then you know you are going to need some load balancing. Design and implement a solution and test again. Make sure you solution is flexible enough to scale.
Trial and error is pretty much the way to go. It really depends on the individual app and usage patterns.
Test your app with a sample load and measure performance and load metrics. DB queries, disk hits, latency, whatever.
Then get an estimate of the expected load when deployed (go ask the domain expert) (you have to consider average load AND spikes).
Multiply the two and add some just to be sure. That's a really rough idea of what you need.
Then implement it, keeping in mind you usually won't scale linearly and you probably won't get the expected load ;)

Resources