Big delay in Grafana Loki under high load - grafana-loki

I have a problem. I use Grafana Loki without Promtail (logs are sent to the API by Python scripts). When I send 1,000 messages per second, Loki works fine. When I send 3,000 messages per second, the delay in Grafana grows. I can't tell whether the problem is in the ingester or the distributor.
P.S. The host has 16 cores and 32 GB of RAM.
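For context, this is roughly what a single push to Loki's HTTP API looks like; a minimal sketch assuming the v1 push endpoint and a Loki instance listening on localhost:3100 (the job label and log line are made up, not the asker's actual payload):

# One log line pushed to Loki's push API; the timestamp must be in nanoseconds
curl -s -X POST "http://localhost:3100/loki/api/v1/push" \
  -H "Content-Type: application/json" \
  -d "{\"streams\": [{\"stream\": {\"job\": \"python-script\"}, \"values\": [[\"$(date +%s)000000000\", \"test log line\"]]}]}"

Whether the backlog builds up in the distributor or the ingester is usually easiest to see from each component's own Prometheus /metrics endpoint rather than from the write path itself.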

Related

What's the maximum throughput per instance that can be achieved with IBM MQ Advanced for Developers?

I am currently using an IBM MQ Advanced for Developers server to test our client and was able to achieve around 1,000 messages per second using the sample consumer written in JMS, which seems pretty slow. Is this a limit of the developer server, and if so, what throughput can be achieved using a licensed production IBM MQ server?
There is no artificial limit associated with IBM MQ Advanced for Developers. It is the same as the licensed production version of IBM MQ.
You don't say what type of machine you were using, whether your messages were persistent, what size they were, or any other qualifying criteria.
You say client, but I don't know whether you mean "network attached application" or "driving application". Clearly if your program is running "client-attached" (MQ parlance for network attached), then the network performance will also come into this.
On my Windows laptop, I get 4500 non-persistent msgs/sec, or 2000 persistent msgs/sec using a simple C-language locally bound program. Over client connection (just using localhost, not actually going out over a real network connection) I get 2700 non-persistent msgs/sec, or 1500 persistent msgs/sec.
You should read the MQ Performance Reports for details of the expected rates you can get.
As an ex MQ performance person I would say - it depends.
At one level you can ask - what can one application process in isolation?
For persistent messages this will come down to the rate at which you can write to the log files.
If you have 10 applications in parallel each putting and getting from their own queue, then you will not get 10 times the throughput - you might get 8 or 9 times the throughput.
If they are all processing the same queue, then the throughput may drop a bit more as the queue usage is serialised.
If only one application is writing to the log, the application may see a 1 millisecond response time. If you have 10 applications running concurrently, they may each see a 3 millisecond response time - so individual throughput goes down, but with more threads, the overall throughput goes up.
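To make that arithmetic concrete (purely illustrative numbers taken from the example above, not measurements):

# Per-thread rate is 1/response_time; overall rate is threads * per-thread rate
awk 'BEGIN {
  printf "1 app   @ 1 ms per operation -> %.0f msgs/sec overall\n", 1 / 0.001
  printf "10 apps @ 3 ms per operation -> %.0f msgs/sec overall\n", 10 / 0.003
}'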
If you have requests coming in over the network, you need to add network time, but you can run more clients and so get improved throughput.
If your application has a delay built in - it may only process a low message rate. You can have lots (1000s) of these and get a high overall throughput.
If your application is putting and getting as fast as possible, you may find that you can run 10-100 instances before the throughput plateaus.
Let's say you want to run your box so it is using 75% of the CPU, and the logging is 50% busy.
If you have just MQ on the box, then this can run more messages than if you had DB2 on the box (with DB2 using 50% of the CPU).
If you have an application (DB2) hammering the disk, then the MQ throughput will go down.
If you have lots of applications putting to a server queue - and one server program, you will find the throughput is limited by the rate at which the server can process work. If it is doing DB2 work, it will be slower than no DB2 work. If you find the server queue depth is over 5 then you need more server instances.
As Morag said, see the performance reports, but they are not the clearest reports to understand.

Data of workflow disappeared unexpectedly in Apache Nifi

My flow works correctly, but after one hour the data of the flow disappears. I've reduced and increased the heap size from 100 MB up to 8 GB and it did not help; my CPU usage increased to 500% and then the data of my flow disappeared. I mean the In/Out counters of all processors became zero. I attached my flow. Does anybody have a solution?
my system config:
macOs high sierra
processor 2.3 GHz Intel Core i7
memory 16 GB 1600 MHz DDR3
This is the log of my flow:
[screenshot]
This is my flow after losing data and deleting content:
[screenshot]
I hope this explanation of these basic concepts clears up the confusion.
About NiFi
NiFi is a flow management tool; you can have processors to ingest, process, and egest data.
Typically a message comes in, and goes out once NiFi is done with it.
About statistics
Each processor will keep track of incoming and outgoing messages. These messages are tracked for a while on the processor, and then 'forgotten'. I believe the time period is 5 minutes.
About queues
You can inspect a queue to see the messages in it; if there are no messages, you cannot inspect them, of course. You might be interested in the provenance.
About provenance
You can check the provenance of a message in the queue to see how it developed (content, timestamps) as it passed the processors. I have personally worked mostly with NiFi in HDF, so I'm not sure if this option is available when you run NiFi without a platform around it.
Detecting problems in NiFi
Of course there may be exceptions, but if NiFi is unable to pick up messages, I would expect them to get stuck in a queue. And if NiFi is processing them but failing, you would expect red squares to start appearing in the UI.
So usually it is quite easy to tell if something is going wrong in NiFi.

Elasticsearch speed vs. Cloud (localhost to production)

I have a single ELK stack with a single node running in a Vagrant virtual box on my machine. It has 3 indexes which are 90 MB, 3.6 GB, and 38 GB.
At the same time, I also have a JavaScript application running on the host machine, consuming data from Elasticsearch, which runs with no problem; speed and everything is perfect (locally).
The issue comes when I put my JavaScript application in production, as the Elasticsearch endpoint in the application has to go from localhost:9200 to MyDomainName.com:9200. The application runs fine within the company, but when I access it from home, the speed drops drastically and it often crashes. However, when I go to Kibana from home, running queries there is fine.
The company is using BT broadband with a download speed of 60 Mb and 20 Mb upload. It doesn't use a fixed IP, so I have to update the A record manually whenever the IP changes, but I don't think that is relevant to the problem.
Is the internet speed the main issue that affects the loading speed outside of the company? How do I improve this? Is the cloud (a CDN?) the only option that would make things run faster? If so, how much would it cost to host it in the cloud, assuming I would index a lot of documents the first time but do a daily maximum of 10 MB of indexing afterwards?
UPDATE1: Metrics from sending a request from Home using Chrome > Network
Queued at 32.77s
Started at 32.77s
Resource Scheduling
- Queueing 0.37 ms
Connection Start
- Stalled 38.32s
- DNS Lookup 0.22ms
- Initial Connection
Request/Response
- Request sent 48 μs
- Waiting (TTFB) 436.61 ms
- Content Download 0.58 ms
UPDATE2:
The stalling period seems to be much shorter when I use a VPN?
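One way to see where the time goes is to run the same lightweight request from the office and from home and compare curl's timing breakdown (a sketch using curl's built-in timing variables and the cluster health endpoint; MyDomainName.com is the placeholder domain from the question):

# Compare these numbers from inside the office network and from home
curl -o /dev/null -s \
  -w 'dns=%{time_namelookup}s connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
  "http://MyDomainName.com:9200/_cluster/health"

If connect or ttfb balloons from home while the office numbers stay small, the time is going into the network path (or whatever is causing the 38 s stall), not into the query itself.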

Kibana 4 RAM consumption

I installed Kibana 4.3.0 on my VPS, which has one CPU core and 2 GB of RAM and runs Ubuntu 14.04.3.
Kibana works and my dashboard works as expected, but unfortunately it consumes too much RAM so the VPS begins to swap and has a very high system load.
There is not much data put into ES (about 192 temperature entries per day) so Kibana 4 should not consume too much memory.
Is there any possibility to configure Kibana 4 to consume less RAM, i.e. 256MB at the maximum?
In this thread I found a solution for the memory consumption: https://github.com/elastic/kibana/issues/5170
It seems to be a Node.js problem. Changing the last line of the bin/kibana start script to
exec "${NODE}" --max-old-space-size=100 "${DIR}/src/cli" ${@}
as suggested in the thread helped.
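To verify the cap actually takes effect after restarting Kibana, something like this (assuming the Kibana process shows up under the command name node) will show its resident memory:

# RSS is the resident memory of the process, reported in kilobytes
ps -C node -o pid,rss,cmd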

How to increase the requests per second on an Amazon EC2 t2.micro instance?

I recently launched an Amazon EC2 instance, a t2.micro. After installing WildFly 8.2.0.Final, I tried to do a load test of the web server. I tested the server serving a static page of less than 500 bytes, and a dynamic page that writes to and reads from MySQL. To my surprise, I got similar results: both tests got around 1000 RPS. I monitored the system using top -d 1; the CPU hasn't reached the max and there is free memory. I think either EC2 has some limitation on concurrent connections, or my setup needs improvement.
My setup is CentOS 7, WildFly/JBoss 8.2.0.Final, MariaDB 5.5. The test tool is JMeter in distributed mode or command-line mode. Tests were performed remotely, on the same subnet, and on localhost. All get the same result.
Can you please help identify where the bottleneck is? Are there any limitations on an Amazon EC2 instance that could affect this? Thanks.
Yes, there are some limitations depending on the EC2 instance type, and one of them is network performance.
Amazon doesn't publish the exact limitations of each type of instance, but in the Instance Types Matrix you can see that t2.micro has low to moderate network performance. If you need better network performance, you can check the AWS instance types page, where it shows which instances have enhanced networking:
Enhanced Networking
Enhanced Networking enables you to get significantly higher packet per second (PPS) performance, lower network jitter and lower latencies. This feature uses a new network virtualization stack that provides higher I/O performance and lower CPU utilization compared to traditional implementations. In order to take advantage of Enhanced Networking, you should launch an HVM AMI in VPC, and install the appropriate driver. Enhanced Networking is currently supported in C4, C3, R3, I2, M4, and D2 instances. For instructions on how to enable Enhanced Networking on EC2 instances, see the Enhanced Networking on Linux and Enhanced Networking on Windows tutorials. To learn more about this feature, check out the Enhanced Networking FAQ section.
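As a quick check (my addition, not part of the quoted documentation), the NIC driver reported on a Linux instance hints at whether enhanced networking is active on the instance families listed above:

# On the instance families quoted above, enhanced networking typically shows up as the ixgbevf driver
ethtool -i eth0 | grep '^driver'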
You have more information in these SO and SF questions:
Bandwidth limits for Amazon EC2
Does anyone know the bandwidth available for different EC2 Instances?
EC2 Instance Types's EXACT Network Performance?
You're right that 1000 RPS feels awfully low for Wildfly, given that the Undertow server powering it is one of the fastest in Java land and among the 10 fastest, period.
Starting points to optimize:
Make sure that you do not have request logging on (that could cause an I/O bottleneck), use the latest stable JVM, and it's probably worth using the most recent Wildfly version that your app works with.
With that done, you're almost certainly being bottlenecked by connection creation, not your AWS instance. This could be within JMeter, or within the Wildfly subsystem.
To eliminate JMeter as a culprit, try ApacheBenchmark ("ab") at the same concurrency level, and then try it with the -k option on (to allow connection reuse).
If the first ApacheBenchmark number is much higher than JMeter's, the issue is the thread-based networking model that JMeter uses (another load-testing tool, such as Gatling or locust.io, may be needed).
If the second number is much higher than the first, the bottleneck is proven to be connection creation. This may be solved by tuning the Undertow server settings.
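Concretely, that comparison could look like the following (the concurrency, server address, and URL path are placeholders to adjust to your own test):

# Same test twice: without connection reuse, then with keep-alive (-k)
ab    -n 10000 -c 16 "http://$SERVER:8080/your/page"
ab -k -n 10000 -c 16 "http://$SERVER:8080/your/page"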
As far as WildFly goes, I'd have to see the config.xml, but you may be able to improve performance by tweaking the Undertow subsystem settings. The defaults are usually solid, but you want a very low number of I/O threads (either 1, or the number of CPUs, no more).
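For illustration, capping the I/O threads can be done through the WildFly CLI against the io subsystem's default worker (a sketch; adjust the value to your CPU count and verify the path against your own config.xml):

# Set the XNIO worker used by Undertow to a small, fixed number of I/O threads, then reload
$JBOSS_HOME/bin/jboss-cli.sh --connect --command="/subsystem=io/worker=default:write-attribute(name=io-threads, value=2)"
$JBOSS_HOME/bin/jboss-cli.sh --connect --command=":reload"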
I have seen a trivial Wildfly 10 application far exceed the performance you're seeing on a t2.micro instance.
Benchmark results, with Wildfly 10 + docker + Java 8:
Server setup (EC2 t2.micro running latest amazon linux, in US-east-1, different AZs)
sudo yum install docker
sudo service docker start
sudo docker run --rm -it -p 8080:8080 svanoort/jboss-demo-app:0.7-lomem
Client (another t2.micro, minimal load, different AZ):
ab -c 16 -k -n 1000 http://$SERVER_PRIVATE_IP:8080/rest/cached/500
16 concurrent connections with keep-alive, serving 500 bytes of cached randomly pre-generated data
Results over multiple runs:
430 requests per second (RPS), 1171 RPS, 1527 RPS, 1686 RPS, 1977 RPS, 2471 RPS, 3339 RPS, eventually peaking at ~6500 RPS after hundreds of thousands of requests.
Notice how that goes up over time? It's important to prewarm the server before benchmarking, to allow for enough handler threads to be created, and to allow for JIT compilation. 10,000 requests is a good starting point.
If I turn off connection keepalive? Peaks at about ~1450 RPS with concurrency 16. BUT WAIT! With a single thread (concurrency 1), it only gives ~340-350 RPS. Increasing concurrency beyond 16 does not give higher performance, it remains fairly stable (even up to 512 concurrent connections).
If I increase the request data size to 2000 bytes, by using http://$SERVER_PRIVATE_IP:8080/rest/cached/2000 then it still hits 1367 RPS, showing that almost all of the time is spent on connection handling.
With very large (300k) requests and connection keep-alive, I hit about 50 MB/s between hosts, but I've seen up to 90 MB/s in optimal situations.
Very impressive performance for JBoss/Wildfly there, I'd say. Note that higher concurrency may be needed if there is more latency between hosts, to allow for the impact of round-trip time on connection creation.
