elasticsearch read timeout, seems to have too many shards? - elasticsearch

I'm using both Elasticsearch 1.4.4 and 2.1.0 with a cluster of 5 hosts on AWS. In the config file, I've set the number of shards per index to 10.
So here comes the strange behavior: I create an index every day, and when there are about 400 shards or more, the whole cluster returns Read Timeout when using the Bulk index API.
If I delete some indices, the timeout error disappears.
Has anyone met a similar problem? This is really a big obstacle to storing more data.
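For reference, the shard count can be checked with the cat APIs (host and port here assume a default local setup):

curl 'localhost:9200/_cat/health?v'   # the "shards" column shows the total number of active shards
curl 'localhost:9200/_cat/shards'     # one line per shard; pipe to wc -l for a quick count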

Related

Elasticsearch warning messages

I have ES running on my local development machine for my Rails app (Using Searchkick). I am getting these error messages:
299 Elasticsearch-6.8.8-2f4c224 "In a future major version, this
request will fail because this action would add [1] total shards, but
this cluster currently has [1972]/[1000] maximum shards open. Before
upgrading, reduce the number of shards in your cluster or adjust the
cluster setting [cluster.max_shards_per_node]."
My config file already has cluster.max_shards_per_node: 2000. Am I missing something here?
299 Elasticsearch-6.8.8-2f4c224 "[types removal] The parameter
include_type_name should be explicitly specified in create index
requests to prepare for 7.0. In 7.0 include_type_name will default to
'false', and requests are expected to omit the type name in mapping
definitions."
I have zero clue where to start looking on this one.
These flood my terminal when I run my re-indexing, and I'm looking to resolve them.
I think this is a dynamic cluster setting and you should use the _cluster/settings API.
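For example, a sketch of bumping the limit at runtime via the cluster settings API (the value just mirrors what your config file already has):

curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{"persistent": {"cluster.max_shards_per_node": 2000}}'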
It is obviously very wrong to have this number of shards on one node. Please read the following article:
https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
You can use the shrink index API. The shrink index API allows you to shrink an existing index into a new index with fewer primary shards.
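A rough sketch of that flow, using made-up index names (my_big_index, my_small_index) and a made-up node name (shrink_node):

# First relocate a copy of every shard to one node and block writes
curl -XPUT 'localhost:9200/my_big_index/_settings' -H 'Content-Type: application/json' -d '
{"settings": {"index.routing.allocation.require._name": "shrink_node", "index.blocks.write": true}}'

# Then shrink into a new index with a single primary shard
curl -XPOST 'localhost:9200/my_big_index/_shrink/my_small_index' -H 'Content-Type: application/json' -d '
{"settings": {"index.number_of_shards": 1, "index.number_of_replicas": 0}}'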

Elasticsearch reindex gets stuck

Context
We have two Elasticsearch clusters with 6 and 3 nodes respectively. The 6-node cluster is the one we use in the production environment, and we use the 3-node one for testing purposes. (We have the same problem in both clusters.) All the nodes have the following characteristics:
Elasticsearch 7.4.2
1TB HDD disk
8 GB RAM
In our case, we need to reindex some of the indexes. Those indexes have billions of documents and a size between 50GB and 250GB.
Problem
Whenever we start reindexing, internally or from a remote source, the task starts working correctly but it reaches a point where it stops reindexing, for no apparent reason. We can't see anything in the logs. The task is not cancelled or anything, it only stops reindexing documents; it looks like the task gets stuck. We tried changing GC strategies, we used CMS and Shenandoah, but nothing changes.
Has anyone run into the same problem?
It's difficult to find the root cause of these issues without debugging them, and with the little information you provided (cluster and index configuration, index slow logs, Elasticsearch error logs and Elasticsearch hot threads are missing, to name a few).
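As a starting point, a couple of generic diagnostic calls (not a confirmed fix) that show whether the reindex task is still making progress and what the nodes are busy with:

# List running reindex tasks with their created/updated/total counters
curl 'localhost:9200/_tasks?detailed=true&actions=*reindex&pretty'

# Dump hot threads to see what the nodes are doing while the task looks stuck
curl 'localhost:9200/_nodes/hot_threads'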

Elastic search indices are getting recreated after deletion

We are running a single-node cluster, as a single instance.
Filebeat is the log forwarder for Logstash.
We have indices like
abc_12.06.2018
abc_13.06.2018
With 5 primary shards and 1 replica shard.
When I delete abc_12.06.2018, it is deleted at that moment, but slowly, after some time, the index is recreated.
The same is happening with replica 0 as well.
Please help.
Looks like Filebeat just writes logs to the index that you deleted and recreates it. The root cause isn't in Elasticsearch.
Do the "recreated" indices have any data?

How does Elasticsearch bring back a node which is down

I was going through Elasticsearch and wanted to get consistent responses from ES clusters.
I read Elasticsearch read and write consistency
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-index_.html
and some other posts, and can conclude that ES returns success for a write operation after completing writes to all shards (primary + replica), irrespective of the consistency param.
Let me know if my understanding is wrong.
I am wondering if anyone knows how Elasticsearch adds a node/shard back into a cluster after it was down transiently. Will it start serving read requests immediately after it is available, or does it ensure it has up-to-date data before serving read requests?
I looked for the answer to the above question but could not find any.
Thanks
Gopal
If a node is removed from the cluster and joins again, Elasticsearch checks whether its data is up to date. If it is not, then it will not be made available for search until it is brought up to date again (which could mean the whole shard gets copied again).
The consistency parameter is just an additional pre-index check that the expected number of shards is available in the cluster (if the index is configured to have 4 replicas, then the primary shard plus two replicas need to be available when set to quorum). However, this parameter never changes the behaviour that a write needs to be written to all available shards before returning to the client.
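For illustration, a sketch of passing that parameter on an ES 2.x index request (index, type and document are made up):

curl -XPUT 'localhost:9200/my_index/my_type/1?consistency=quorum' -d '
{"message": "example event"}'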

Elasticsearch, how many clusters, indexes do I need for 8 applications

I have an ELK Stack set up and accepting log data from 2 of my applications, and everything is working OK. It's been running for 25 days and I have nearly 4GB of data/documents on a 25GB server.
My question
I have 8 applications in total that I would like to hook up to my ELK Stack.
Is one cluster OK for this, or do I need to add more clusters, say a cluster for each application's data? If so, how do I do that without having to re-index my data?
Why does cluster health say "yellow (244 of 488)"?
Should I index each application into its own index rather than the default "logstash-{todays-date}", like my-app-1-{todays-date}, my-app-2-{todays-date} etc.?
your help is greatly appreciated
G
Your cluster is yellow because your logstash-* indices are configured with 1 replica and you probably have a single node. 244 of 488 means that you have 488 shards in all your indices but only 244 are assigned on your single node and 244 remain to be assigned to new nodes. This is not a problem per se, but if your current node were to fail for some reason, you'd probably lose some data, whereas if you had 2+ nodes, the data would be replicated on other nodes, your cluster would be green (and you'd see 488 of 488) and you'd have a lower risk of losing data.
As for your second question, nothing prevents you from storing all the logs from your eight applications in the same daily logstash indices. You just need to make sure that your logstash configuration accounts for each different app and adds a field with the application name (e.g. app: app1, app: app2, etc.) to the indexed log events, so that within Kibana you can then distinguish which app each log event was issued from.
I have only used Elasticsearch and not the complete ELK stack, but I can give some ideas and guess what is going on. 488 = 2 x 244, so I guess there are unassigned replica shards in the single-machine cluster. You can update this setting ad hoc and set it to zero:
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{"index" : {"number_of_replicas" : 0}}'
You should update the logstash index template not to use replicas when you are running just a single machine. Also, your shards seem to be only about 20 MB in size, so I'd recommend each index use just one shard instead of five; each shard consumes extra resources. Having multiple shards increases indexing speed but slows down queries, so you should check whether one is sufficient or not.
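A sketch of such a template, assuming a pre-6.x Elasticsearch where index templates still use the "template" field (on 6.x+ it would be "index_patterns"); the template name is made up:

curl -XPUT 'localhost:9200/_template/logstash_single_node' -d '
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'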
An index per application per day would speed up querying if dashboards are mostly application-specific, and you can create a day-specific alias to be used by cross-application queries.
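For example, a day-specific alias spanning the per-application indices could be created like this (index names, alias name and date are made up for illustration):

curl -XPOST 'localhost:9200/_aliases' -d '
{
  "actions": [
    {"add": {"index": "my-app-1-2016.01.15", "alias": "all-apps-2016.01.15"}},
    {"add": {"index": "my-app-2-2016.01.15", "alias": "all-apps-2016.01.15"}}
  ]
}'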
