Node.js 100% CPU utilization when running StatsD

I wanted to test StatsD's performance with a stress test I wrote.
Eventually I found that at around 80,000 packets per second, Node.js reaches 100% CPU utilization in my environment.
I know that 80,000 events per second is quite a huge amount of events, but I wonder if anyone knows what the limits of StatsD are with regard to Node.js. What is a normal event rate?
Also, is there something I can do to improve Node.js performance so it won't get to 100% CPU utilization? A minimal sketch of the kind of load generator I mean follows below.
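To make the setup concrete, this is a minimal sketch of such a load generator in Go; it assumes statsd is listening on its default UDP port, the address and metric name are placeholders, and the real test also paced itself and measured the achieved packet rate.

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        // Placeholder address: statsd's default UDP port on localhost.
        conn, err := net.Dial("udp", "127.0.0.1:8125")
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        // Blast counter packets as fast as possible; each write is one
        // UDP datagram in the StatsD "name:value|type" wire format.
        for {
            fmt.Fprint(conn, "stress.test.counter:1|c")
        }
    }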

According to a StatsD contributor, the metrics rate that was tested and measured was between 15,000 and 20,000 metrics per second.
That's good enough and it is what I was looking for.
You can see more details in the issue I opened on StatsD's GitHub project:
https://github.com/etsy/statsd/issues/249

Related

Getting average of CPU utilization from Kibana

Is there a way to get the average CPU utilization over a specific time range from a Kibana graph, instead of calculating it myself?
TL;DR:
You can monitor your cluster with Metricbeat, although by default it will only give you the system load, with no further detail.
Then again, there are modules such as the system module on Linux which offer more information on CPU usage; a sketch of such a configuration follows below.
If none of those offer what you need, you might want to roll out your own solution ^^
Or wait for an answer from someone with better knowledge.
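As a starting point, a minimal Metricbeat configuration enabling the system module might look like the sketch below; the metricsets, period, and Elasticsearch host are assumptions to adapt to your cluster.

    metricbeat.modules:
      - module: system
        metricsets: ["cpu", "load", "memory"]
        period: 10s

    output.elasticsearch:
      hosts: ["localhost:9200"]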

JMeter: response time decreased AND throughput also decreased

I have been running my JMeter script for almost a week and observed an interesting thing today. Below is the scenario:
Overview: I am gradually increasing the load on the application. In my last test I put a load of 100 users on the app, and today I increased the load to 150 users.
Result of the 150-user test:
Response time of the requests decreased compared to the last test (which is a good sign).
Throughput decreased drastically, to half of what I got in the previous test with less load.
Received 225 errors while executing the test.
My questions are:
What could be the possible reason for such strange throughput behavior? Why did throughput decrease instead of increasing with the increasing load?
Did I get good response times only because many of my requests failed?
NOTE: Up to the 100-user test, throughput was increasing with the increasing user load.
Can anyone please help me with this question? I am a newbie in performance testing. Thanks in advance!
Also, I would like to ask if anyone can suggest good articles/sites on finding performance bottlenecks and learning the crucial aspects of performance testing.
Most probably these 225 failed requests returned a failure immediately, which is why the average response time decreased. That is why you should be looking at, for example, the Response Times Over Time chart, and pay more attention to percentiles, as the mean response time can mask the real problem; see the sketch below.
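To illustrate the masking effect (my own sketch, not from the original answer; the numbers are hypothetical), here a batch of near-instant failures drags the mean down while the 95th percentile still exposes the slow successful requests:

    package main

    import (
        "fmt"
        "sort"
    )

    // percentile returns the value at the given percentile (0-100)
    // of an ascending-sorted slice, using a nearest-rank index.
    func percentile(sorted []float64, p float64) float64 {
        idx := int(p / 100 * float64(len(sorted)-1))
        return sorted[idx]
    }

    func main() {
        // Hypothetical response times in ms: 70 slow successful
        // requests and 30 failures that return almost immediately.
        samples := make([]float64, 0, 100)
        for i := 0; i < 70; i++ {
            samples = append(samples, 2000) // slow but successful
        }
        for i := 0; i < 30; i++ {
            samples = append(samples, 10) // near-instant failures
        }

        var sum float64
        for _, s := range samples {
            sum += s
        }
        sort.Float64s(samples)

        fmt.Printf("mean: %.0f ms\n", sum/float64(len(samples))) // ~1403 ms, dragged down
        fmt.Printf("p95:  %.0f ms\n", percentile(samples, 95))   // 2000 ms, the real picture
    }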
With regards to bottleneck discovery, make sure to collect as much information from the server side as you can, e.g.:
CPU, RAM, Network, Disk usage from JMeter PerfMon Plugin
Slow queries log from the database
"heaviest" functions and largest objects from the profiling tool for your application

Performance difference between Raft orderer and orderer with Kafka (latency, throughput, TPS)

Has anyone compared performance (latency, throughput, TPS) between the orderer with Kafka and the Raft orderer?
I can see a considerable difference here in terms of latency, throughput, and TPS.
I tried the same setup with the same resource configuration on two different VMs (the only difference is the ordering system).
Note: used a single orderer in both networks. Fabric version: 1.4.4.
The orderer with Kafka is more efficient than Raft. I am using the default configuration for both Raft and Kafka.
I tried a load generator at a rate of 100 TPS. With Kafka all parameters are fine (latency 0.3 to 2 seconds), whereas with Raft, latency gradually increases from 2 to 15+ seconds, and the transaction failure rate is also high.
What could be the reason for this considerable difference in terms of TPS, throughput, and latency?
Please correct me if I am doing something wrong.
For starters, I would not run performance tests using a single orderer. These fault-tolerance systems are there to handle distribution and consensus in a distributed system, so by running a single orderer you are fundamentally removing the reason they exist. It's as if you were comparing two sports cars on a dirt road and wondering which is the fastest.
Then there are other things that come into play, such as whether you connect the services over TLS, the general network latency, and how many brokers/nodes you are running.
Chris Ferris performed an initial performance analysis of the two systems prior to the release of Raft, and it seemed it was both faster and could handle almost twice as many transactions per second. You can read his blog post here: Does Hyperledger Fabric perform at scale?
You should also be aware of the double-spending problem and the key collisions that can occur if you run a distributed system under high load; these can cause a bottleneck, so you should take the necessary steps to avoid them. See this Medium post about collisions, and Hyperledger Fabric's own documentation on setting up a high throughput network; a sketch of that pattern follows below.
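For illustration only (my own sketch of the pattern from the Fabric high-throughput documentation, not the poster's code; names are hypothetical and the imports assume Fabric 1.4): each transaction writes its increment under a unique composite key instead of read-modify-writing one shared key, which avoids MVCC read/write conflicts under concurrent load.

    package main

    import (
        "fmt"

        "github.com/hyperledger/fabric/core/chaincode/shim"
        pb "github.com/hyperledger/fabric/protos/peer"
    )

    // DeltaCC sketches the high-throughput "delta" pattern: rather
    // than read-modify-write on one shared key (which triggers MVCC
    // conflicts under load), every transaction records its increment
    // under a key that no other transaction can touch.
    type DeltaCC struct{}

    func (c *DeltaCC) Init(stub shim.ChaincodeStubInterface) pb.Response {
        return shim.Success(nil)
    }

    func (c *DeltaCC) Invoke(stub shim.ChaincodeStubInterface) pb.Response {
        fn, args := stub.GetFunctionAndParameters()
        if fn != "add" || len(args) != 2 {
            return shim.Error("expected add(name, delta)")
        }
        // The transaction ID makes the key unique, so concurrent
        // "add" transactions never write to the same key.
        key, err := stub.CreateCompositeKey("delta", []string{args[0], stub.GetTxID()})
        if err != nil {
            return shim.Error(err.Error())
        }
        if err := stub.PutState(key, []byte(args[1])); err != nil {
            return shim.Error(err.Error())
        }
        return shim.Success(nil)
    }

    func main() {
        if err := shim.Start(new(DeltaCC)); err != nil {
            fmt.Printf("error starting chaincode: %s\n", err)
        }
    }

A separate aggregation query would then iterate over the "delta" composite keys with GetStateByPartialCompositeKey and sum them to obtain the current total.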

Golang: high GC pause times on Docker/Kubernetes

I am migrating a web application written in Go from AWS Elastic Beanstalk to Kubernetes, and I noticed that the garbage collector pause times (I am using New Relic to monitor the application) increased about 100-fold when running the application there.
I believe it is related to the CPU limiting that Kubernetes does.
Does anyone have any idea what is really causing it? Is it possible to overcome it?
Below is a small example of this difference.
Elastic Beanstalk: [chart]
Kubernetes: [chart]
After some tests and more research I discovered some interesting things.
The CPU limit on Docker seems to have a great influence on GC time/pauses. After some tests I set the CPU limit to 500m, which means about 1/2 of one CPU on an 8-core machine.
I set GOMAXPROCS=1 and GOGC=1000, and this led to fewer and faster GC pauses, although the average memory usage increased. A sketch of these settings follows below.
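For reference, this is how I understand those settings translate into code (a sketch; both can equally be set through the GOMAXPROCS and GOGC environment variables):

    package main

    import (
        "fmt"
        "runtime"
        "runtime/debug"
    )

    func main() {
        // Equivalent to GOMAXPROCS=1: cap the number of OS threads
        // executing Go code simultaneously, so the runtime stops
        // behaving as if all 8 host cores were available inside a
        // container limited to 500m CPU.
        runtime.GOMAXPROCS(1)

        // Equivalent to GOGC=1000: let the heap grow to roughly 10x
        // the live set between collections, trading memory usage for
        // fewer and shorter GC cycles.
        debug.SetGCPercent(1000)

        fmt.Println("GOMAXPROCS now:", runtime.GOMAXPROCS(0))
    }

There is also a library, uber-go/automaxprocs, that derives GOMAXPROCS automatically from the container's CPU quota, which avoids hard-coding the value.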
Here is a 27-hour overview of Kubernetes and Elastic Beanstalk:
Kubernetes: [chart]
Elastic Beanstalk: [chart]

StatsD and complex systems/applications

StatsD has been around for some years now, thanks to Etsy and Flickr. I have recently stumbled upon it and have been 'playing' with it. There are several reasons that make me love it.
I wonder if somebody is using it alongside large and heavily used systems and has some feedback on it? How is StatsD working out for your cases?
StatsD works well up to about 20k packets/sec (UDP packets per second), but then starts to drop metrics, as it is not fast enough to process more. For some metrics workloads accuracy is required, so sampling is not an option, and it can be pretty easy to eat up this 20k/sec budget; a sketch of what sampling looks like on the wire follows below.
There are various other statsd implementations that have better performance. One of them is https://github.com/github/brubeck, which claims it can process up to 4 million metrics/sec. YMMV, but I've been using brubeck in production and it can handle far more load than statsd can.
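To make the sampling trade-off concrete (my own sketch, not part of the original answer; it assumes a statsd server on the default local port, and the metric name is a placeholder): the client drops most events locally and tags the rest with a "|@rate" suffix so the server can scale the count back up.

    package main

    import (
        "fmt"
        "math/rand"
        "net"
    )

    // sampledCount sends a StatsD counter packet for only a fraction
    // of events; the "|@rate" suffix tells the server to scale the
    // count back up by 1/rate when aggregating.
    func sampledCount(conn net.Conn, metric string, rate float64) {
        if rand.Float64() > rate {
            return // dropped locally: no packet on the wire
        }
        fmt.Fprintf(conn, "%s:1|c|@%g", metric, rate)
    }

    func main() {
        conn, err := net.Dial("udp", "127.0.0.1:8125")
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        // At rate 0.1 only ~10% of events produce a UDP packet,
        // cutting the packet rate tenfold at the cost of accuracy.
        for i := 0; i < 100000; i++ {
            sampledCount(conn, "myapp.requests", 0.1)
        }
    }

This tenfold reduction in packet rate comes at the cost of statistical accuracy, which is exactly the trade-off that is unacceptable for the workloads mentioned above.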
