Ubuntu server CPU utilisation increasing very quickly after installing ELK - elasticsearch

I installed Elasticsearch, Logstash, and Kibana on an Ubuntu server. Before starting these services the CPU utilization was less than 5%, and within a minute of starting them it crossed 85%. I don't know why this is happening. Can anyone help me with this issue?
Thanks in advance.

There is not enough information in your question to give you a specific answer, but I will point out a few possible scenarios and how to deal with them.
Did you wait long enough? Sometimes there is a warm-up phase that consumes more CPU until all services are registered and have finished booting. On a fairly small machine this can consume more CPU and take longer to finish.
Folder write permissions. If any of the ELK components fails because of restricted access to directories it needs, whether for logging, creating subfolders for sinceDB files, or anything else, it can go into an infinite loop and retry over and over while consuming high CPU.
Connection issues. ES should be the first component to start; if it fails, Kibana and Logstash will try to connect to ES again and again until the connection succeeds, which can cause high CPU.
Bad Logstash configuration. If Logstash fails to read the input files named in its configuration, or if you have bad or excessive parsing (for example, the first "match" in your filter section is the least common pattern), it can consume high CPU.
For further investigation:
I suggest you do not start all of them together. Start ES first; if everything goes well, start Kibana, and lastly start Logstash (see the sketch below).
Check the logs of all the ELK components for error messages, failures, etc.
For a better answer I will need the YAML config of all three components (ES, Kibana, Logstash).
I will also need the Logstash configuration file.
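A minimal sketch of that investigation, assuming a systemd-based package install on Ubuntu with the default log locations (adjust unit names and paths to match your setup):

```
# Start Elasticsearch alone and confirm it is healthy before anything else
sudo systemctl start elasticsearch
curl -s 'localhost:9200/_cluster/health?pretty'   # wait for status yellow/green

# Only then bring up Kibana, and lastly Logstash
sudo systemctl start kibana
sudo systemctl start logstash

# Check each component's log for errors (default paths for the .deb packages)
sudo tail -n 100 /var/log/elasticsearch/*.log
sudo journalctl -u kibana -n 100 --no-pager
sudo tail -n 100 /var/log/logstash/logstash-plain.log
```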

I would recommend analysing the CPU cycles consumed by each of the Elasticsearch, Logstash, and Kibana processes.
Check specifically which of these processes is consuming the most memory/CPU, for example via the top command (see the sketch below).
Start only ES first and allow it to settle and the node to come up completely before starting Kibana, and perhaps Logstash after that.
Share the logs for each and I can assist if there are any errors.
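A quick, hedged way to see which of the three processes is eating the CPU (process names below are the usual ones for package installs; yours may differ):

```
# Snapshot of the heaviest processes, sorted by CPU
ps -eo pid,user,comm,%cpu,%mem --sort=-%cpu | head -n 15

# Narrow it down to the ELK services (Elasticsearch/Logstash run as java, Kibana as node)
ps -eo pid,comm,args,%cpu,%mem --sort=-%cpu | grep -Ei 'elasticsearch|logstash|kibana' | grep -v grep

# Or watch it live, sorted by CPU
top -o %CPU
```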

Related

Panels to have in Kibana dashboard for troubleshooting applications

What are some good panels to have in a Kibana dashboard for developers to troubleshoot issues in applications? I am trying to create a dashboard that developers could use to pinpoint where the app is having issues, so that they can resolve them. These are a few factors I have considered:
CPU usage of the pod, memory usage of the pod, network in and out, and application logs are the ones I have in mind. Are there any other panels I could add so that developers could get an idea of where to check if something goes wrong in the app?
For example, application slowness could be due to high CPU consumption, the app going down could be due to an OOM kill, requests taking longer could be due to latency or cache issues, etc. Is there anything else I could take into consideration? If yes, please suggest.
So here are a few things that we could add (CLI equivalents of a few of them are sketched after the list):
Number of pods, deployments, daemonsets, and statefulsets present in the cluster
CPU utilised by each pod (pod-wise breakdown)
Memory utilised by each pod (pod-wise breakdown)
Network in/out
Top memory/CPU consuming pods and nodes
Latency
Persistent disk details
Error logs as annotations in TSVB
Log streams to check logs within the dashboard
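For sanity-checking what those panels show, here is a hedged sketch of pulling a few of the same numbers straight from the cluster (this assumes metrics-server is installed so that kubectl top works):

```
# Top CPU- and memory-consuming pods and nodes
kubectl top pods --all-namespaces --sort-by=cpu | head -n 10
kubectl top pods --all-namespaces --sort-by=memory | head -n 10
kubectl top nodes

# Counts of pods, deployments, daemonsets, and statefulsets in the cluster
kubectl get pods --all-namespaces --no-headers | wc -l
kubectl get deployments,daemonsets,statefulsets --all-namespaces
```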

What are the resource requirements to run Logstash in a k8s pod?

I was running an ELK stack on a Raspberry Pi running a Kubernetes cluster, and I noticed that it didn't have the resources to run all three containers. I read that with Kubernetes you can put limits and requests on your CPU and memory resources, and it got me thinking: what are the minimum requirements? To me, applications are greedy, so is there a way to cut down the requirements for Logstash in order to leave more resources for Elasticsearch?
Right now, I am running a Raspberry Pi 4 with 4 GB RAM and a 32 GB disk.
If I can put min and max requirements on the containers, it will better allow me to manage the resources. The thing I noticed, though, is that there is no insight, from what I can tell, into the minimum requirements for the different containers.
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html
The above link, I believe, tells me that the CPU consumption is greedy, but the default memory for Elasticsearch and Kibana is 2Gi and 1Gi respectively. It mentions nothing about Logstash, though, nor whether there is a minimum requirement for CPU.
I wasn't sure if I should set each ELK container to 1 CPU and 1Gi RAM. I can try it and see if it functions, but since it would then be throttled, I'm curious what the happy medium would be.
Logstash is not part of Elastic Cloud; that is why there is no mention of it in the Elastic Cloud on Kubernetes documentation link that you shared.
Logstash is far more CPU-bound than memory-bound, but how much memory it needs depends entirely on your pipelines.
In Logstash the memory depends on the pipelines, the batch size, the filters used, the number of events per second, the queue type, etc. If you are running a dev or lab environment, I think you can try giving Logstash 1 CPU and 512 MB of RAM and see if it fits your use case (see the sketch below).
But I would say that 4 GB is pretty small for the full stack, since you need memory for the applications and still need to have some memory left for the system.
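A hedged sketch of setting that starting point from the command line, assuming Logstash runs as a Deployment named logstash (substitute your own workload name; the numbers are a lab-environment guess, not an official minimum):

```
# Request/limit roughly 1 CPU and 512Mi for the Logstash container
kubectl set resources deployment/logstash \
  --requests=cpu=500m,memory=512Mi \
  --limits=cpu=1,memory=512Mi

# Keep the JVM heap comfortably below the container memory limit,
# otherwise the pod will be OOM-killed before the heap is full
kubectl set env deployment/logstash LS_JAVA_OPTS="-Xms256m -Xmx256m"
```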

Elasticsearch load is not distributed evenly

I am facing a strange issue with Elasticsearch. I have 8 nodes with the same configuration (16 GB RAM and 8-core CPU).
One node, "es53node6", always has a high load, as shown in the screenshot below. Also, 5-6 nodes were getting stopped automatically yesterday, every 3-4 hours.
What could be the reason?
ES version: 5.3
There can be a fair share of reasons. Maybe all data is stored on that node (which should not happen by default), or maybe you are sending all the requests to this single node.
Also, there is no automatic stopping built into Elasticsearch. You can configure Elasticsearch to stop the JVM process when an out-of-memory exception occurs, but this is not enabled by default, as it relies on a more recent JVM.
You can use the hot threads API to check where the CPU time is spent in Elasticsearch (see the sketch below).
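A hedged sketch of both checks, assuming the cluster is reachable on localhost:9200 (adjust the host as needed):

```
# Where is the busy node spending its CPU? Scope the hot threads API to that node
curl -s 'localhost:9200/_nodes/es53node6/hot_threads?threads=5'

# Are shards (and therefore data) spread evenly across the 8 nodes?
curl -s 'localhost:9200/_cat/allocation?v'
curl -s 'localhost:9200/_cat/shards?v'
```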

MarkLogic latency: Document not found

I am working on a clustered MarkLogic environment where we have 10 nodes. All nodes are shared E&D (evaluator and data) nodes.
The problem we are facing:
When a page is written to MarkLogic, it takes some time (up to 3 seconds) for all the nodes in the cluster to get updated, and if during this time I do a read operation to fetch the previously written page, it is not found.
Has anyone experienced this latency issue and looked at eliminating it? If so, please let me know.
Thanks
It's normal for a new document to only appear after the database transaction commits. But it is not normal for a commit to take 3 seconds.
Which version of MarkLogic Server?
Which OS and version?
Can you describe the hardware configuration?
How large are these documents? All other things equal, update time should be proportional to document size.
Can you reproduce this with a standalone host? That should eliminate cluster-related network latency from the transaction, which might tell you something. Possibly your cluster network has problems, or possibly one or more of the hosts has problems.
If you can reproduce the problem with a standalone host, use system monitoring to see what that host is doing at the time. On Linux I favor something like iostat -Mxz 5 and top, but other tools can also help. The problem could be disk I/O, though it would have to be really slow to result in 3-second commits. Or it might be that your servers are low on RAM, so they are paging during the commit phase.
If you can't reproduce it with a standalone host, then I think you'll have to run similar system monitoring on all the hosts in the cluster. That's harder, but for 10 hosts it is just barely manageable.
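A minimal sketch of running that monitoring across all ten hosts in one pass, assuming SSH access and hypothetical hostnames ml-node-01 through ml-node-10:

```
# Grab a quick disk/memory/load snapshot from every host in the cluster
for h in ml-node-{01..10}; do
  echo "===== $h ====="
  ssh "$h" 'iostat -Mxz 5 2 | tail -n 25; free -m; uptime'
done
```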

Best ways to diagnose elasticsearch issues?

The question is a little broad, but I feel there is no one place that helps systematically diagnose Elasticsearch issues. The broad categories could be:
Client
Query errors
Incorrect Query Results
Unexplained behaviors
Server
Setup issues
Performance issues
Critical errors
Unexplained behaviors
An example for 1)a) would be: log the query string on the server (a reference to how to enable logging would be nice), install the inquisitor plugin (link to GitHub), and run the query string yourself, etc.
Your question is very broad and, to be honest, I am not sure I can fully answer it; however, I will tell you how we monitor and manage our cluster.
1 - We log query logs and slow query logs to Graylog2 (it uses ES under the hood) so we can easily see, report, and alert on all logging from our cluster. We can also view slow queries that have occurred (see the sketch after this list for enabling the slow log).
2 - We send ES stats to StatsD and then graph that information in Graphite. This way we can see things like cluster state, query counts, indexing counts, JVM stats, disk I/O, etc., all parsed from the ES stats API and sent to StatsD.
3 - We use Fabric scripts to deploy/upgrade the cluster and manage plugin installation.
4 - We use Jenkins and JMeter to run occasional performance tests against the cluster (are we getting slower over time, does the cluster deployment work?).
5 - We use the bigdesk and head plugins to keep an eye on the cluster and explore how it is doing.
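A hedged sketch of the first two points, assuming the cluster answers on localhost:9200 and using "my-index" as a placeholder index name:

```
# Enable the search slow log on an index so slow queries show up in the logs
curl -s -XPUT 'localhost:9200/my-index/_settings' -H 'Content-Type: application/json' -d '{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.threshold.query.info": "500ms",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}'

# Node-level stats (JVM, indexing, search, OS) that stats shippers typically parse
curl -s 'localhost:9200/_nodes/stats/jvm,indices,os?pretty' | head -n 60

# Cluster-level health: status, shard counts, pending tasks
curl -s 'localhost:9200/_cluster/health?pretty'
```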
