I have been experiencing an issue where my Kibana occasionally stops working, citing a timeout while trying to connect to Elasticsearch as the cause (I have Marvel installed). The error is something like: "plugin:elasticsearch Request Timeout".
Usually these go away by the next day, and occasionally I have been able to regain access to my data by increasing the timeout in Kibana. However, I can't figure out how to troubleshoot this issue. I suspect ES may be storing some extremely large individual documents, but I cannot find them; there are just too many logs to dig through by hand.
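A minimal sketch of one way to narrow this down, assuming the cluster answers on localhost:9200 and that its _cat API accepts format=json (both assumptions here; very old versions only return plain text), is to compare average document size per index and see which indices stand out:

```python
# Sketch: find indices whose average document size is unusually large.
# Assumes ES answers on localhost:9200 and that the _cat API accepts
# ?format=json&bytes=b (on very old versions you would parse the
# plain-text output instead).
import requests

ES = "http://localhost:9200"

rows = requests.get(f"{ES}/_cat/indices",
                    params={"format": "json", "bytes": "b"}).json()

report = []
for row in rows:
    docs = int(row.get("docs.count") or 0)
    store = int(row.get("store.size") or 0)
    if docs:
        report.append((store // docs, row["index"], docs))

# Indices with a huge bytes-per-document ratio are worth digging into first.
for avg, index, docs in sorted(report, reverse=True)[:15]:
    print(f"{avg:>12} bytes/doc  {docs:>12} docs  {index}")
```

An index with an unusually high bytes-per-document ratio would be the first place to look for oversized documents.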
My Elasticsearch cluster is perfectly healthy (green on the health check), even when Kibana cannot access it.
Where can I start troubleshooting why we are getting timeouts here? When I expand the timeout window, Kibana comes back and everything works fine.
Any tips on where to start searching would be enormously appreciated!!
Related
I installed Elasticsearch, Logstash, and Kibana on an Ubuntu server. Before starting these services the CPU utilization is less than 5%, and within a minute of starting them it crosses 85%. I don't know why this is happening. Can anyone help me with this issue?
Thanks in advance.
There is not enough information in your question to give you a specific answer, but I will point out a few possible scenarios and how to deal with them.
Did you wait long enough? Sometimes there is a warm-up phase that consumes more CPU until all services are registered and finish booting. If you have a fairly small machine it may consume more CPU and take longer to finish.
Folder write permissions. If any of the ELK components fails because of restricted access to directories it needs (for logging, for creating subfolders for sincedb files, and so on), it can go into an infinite loop and retry again and again while consuming high CPU.
Connection issues. ES should be the first component to start; if it fails, Kibana and Logstash will try to connect to ES again and again until they succeed, which can cause high CPU.
Bad Logstash configuration. If Logstash fails to read the input file defined in its configuration, or if your parsing is inefficient (for example, the first "match" in your filter section covers the least common pattern, so most events fall through it), it can consume high CPU.
For further investigation:
I suggest you do not start all of them together. Start ES first; if everything goes well, start Kibana and lastly Logstash (a sketch of how to wait for ES before starting the others follows this list).
Check the logs of all the ELK components for error messages, failures, etc.
For a better answer I will need the YAML configuration of all three components (ES, Kibana, Logstash).
I will also need the Logstash configuration file.
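As a rough illustration of the startup ordering above, here is a minimal sketch (assuming ES answers HTTP on localhost:9200, which you should adjust for your setup) that waits for the cluster health endpoint to report yellow or green before you start Kibana and then Logstash:

```python
# Sketch: poll the cluster health endpoint until Elasticsearch reports
# yellow or green before starting Kibana/Logstash. Assumes ES on
# localhost:9200; adjust the host, wait time, and what happens on success.
import time
import requests

HEALTH_URL = "http://localhost:9200/_cluster/health"

def wait_for_es(max_wait=300, interval=5):
    deadline = time.time() + max_wait
    while time.time() < deadline:
        try:
            status = requests.get(HEALTH_URL, timeout=5).json().get("status")
        except requests.RequestException:
            status = None  # ES not answering yet
        print(f"cluster status: {status}")
        if status in ("yellow", "green"):
            return True
        time.sleep(interval)
    return False

if __name__ == "__main__":
    if wait_for_es():
        print("ES is up -- safe to start Kibana and then Logstash")
    else:
        print("ES never came up -- check its logs before starting anything else")
```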
I would recommend analysing the CPU consumed by each of the Elasticsearch, Logstash, and Kibana processes.
Check specifically which of these processes is consuming the most memory/CPU, for example via the top command (a scripted alternative is sketched below).
Start only ES first and allow it to settle and the node to start completely before starting Kibana, and maybe Logstash after that.
Send me the logs for each and I can assist if there are any errors.
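If you prefer a scripted check over reading top by eye, here is a minimal sketch using the third-party psutil package (an assumption here: it matches processes by looking for the service names in their command lines, which may need adjusting on your machine):

```python
# Sketch: report CPU and memory for the Elasticsearch / Logstash / Kibana
# processes. Requires `pip install psutil`; matching on the command line is
# a heuristic, since ES and Logstash actually run as java processes.
import psutil

WANTED = ("elasticsearch", "logstash", "kibana")

for proc in psutil.process_iter(["pid", "cmdline", "memory_percent"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if not any(name in cmdline.lower() for name in WANTED):
        continue
    try:
        cpu = proc.cpu_percent(interval=1.0)  # sample CPU over one second
    except psutil.NoSuchProcess:
        continue  # process exited while we were looking at it
    mem = proc.info["memory_percent"] or 0.0
    print(f"pid={proc.info['pid']:<7} cpu={cpu:5.1f}%  mem={mem:5.1f}%  {cmdline[:80]}")
```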
I'm using Elasticsearch for my website. Everything was normal and I received responses within at most 60 ms, but it suddenly slowed down. Now I'm getting responses in at least 200 ms.
It is likely that your web server is causing this rather than the Elasticsearch service itself. If you can, check the connection logs on your server to see whether it is receiving a lot of requests (one way to tally them is sketched below). If your Elasticsearch instance is exposed on a public-facing website, it is quite possible that one or more people are sending a lot of requests or queries to it, which could be causing it to slow down.
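As a rough sketch of that log check, assuming a common/combined-format access log and a hypothetical path you would need to adjust, you could tally requests per client IP like this:

```python
# Sketch: count requests per client IP in a web server access log to see
# whether a few sources dominate the traffic. The log path is hypothetical,
# and taking the first whitespace-separated field as the IP assumes the
# common/combined Apache or nginx log formats.
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"  # hypothetical path -- adjust

counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        ip = line.split(" ", 1)[0]
        counts[ip] += 1

for ip, n in counts.most_common(10):
    print(f"{n:>8}  {ip}")
```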
It may be a good idea to put your Elasticsearch instance behind Apache2 or a similar reverse proxy to protect against this. That way, you can limit requests to specific IPs and restrict the HTTP methods that can be called against the Elasticsearch cluster.
I have quite a large Elasticsearch cluster with more than 100 nodes, and sometimes the cluster starts returning 504 HTTP codes and timing out on requests to the _cat API, in particular /_cat/indices and /_cat/shards. As a result, KOPF does not load, I guess because it calls the same API under the hood. This happens even when the cluster is green, and it is only resolved when I restart the cluster. Indexing and search, even from Kibana, work OK, as do other APIs like _cluster/health?level=shards and _cat/nodes.
I'm using Elasticsearch 1.7.1. Any idea why this might be happening? I know I have to upgrade, but I would like to understand what is going on here.
Note that this question is similar to "Elasticsearch Not Responding to Certain API Calls / Kibana and Head not loading", but that question hasn't been answered yet.
The exception looks like this.
I am not able to understand the cause of this exception. I tried restarting the server, but it keeps occurring again and again.
This is an ElasticSearch error and not something controlled by Moqui. I have seen this, and based on my limited research it appears to happen when there are multiple ElasticSearch nodes running on the same network (in your case, probably multiple Moqui nodes on the same network); they seem to find the other nodes but do not successfully sync up with them because they are not configured for it.
I haven't seen this cause any problems with anything else, so it seems annoying but can safely be ignored. There may be some ElasticSearch configuration that resolves this.
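If you want to confirm whether stray nodes on the network have actually joined your cluster, one quick check (assuming the embedded ElasticSearch node answers HTTP on localhost:9200, which may not be enabled in your setup) is to list the nodes the cluster currently knows about:

```python
# Sketch: list the nodes currently in the cluster, to check whether
# unexpected machines on the network have joined it. Assumes the node
# answers HTTP on localhost:9200.
import requests

resp = requests.get("http://localhost:9200/_nodes", timeout=10).json()

print("cluster:", resp.get("cluster_name"))
for node_id, node in resp.get("nodes", {}).items():
    name = node.get("name", "?")
    print(f"  {name:<25} host={node.get('host')}  "
          f"transport={node.get('transport_address')}")
```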
I'm having issues with a large number of concurrent connections to an Amazon RDS database, using Propel as the ORM with PHP. The application runs fine during load testing with 20 to 50 connections open at a time, then seems to hit a wall, mushrooms up to maximum connections almost immediately, and everything dies.
I believe Propel is using mysql_pconnect, but I can't find where it designates that, or a simple way to turn it off. I may be chasing a red herring here, but I'm stumped, and there are enough comments on the net regarding pconnect causing problems with too many connections that I thought it would be worth a shot to remove it.
Does anyone know how to do this? I have been searching using various phrases but can't seem to find anything.
As it turns out, the error was being caused by the RDS redo log. There is only one redo log size for all RDS instance sizes. On the larger instance sizes, it's possible to fill the redo log and wrap back around to the beginning before the data has been written out to the database. At that point it does its 'furious flushing' to catch up and does not process any new requests, so they pile up like crazy. This eventually caused our app to crash. More, smaller RDS servers fixed the issue, though I'm not very happy with Amazon over this; they need to make the redo log size configurable.