Is there any way to check the relocation progress of shards in elasticsearch?

Yesterday I added a node to a production elasticsearch cluster. Once it joined, I could use the /_cat/health API to check the number of relocating shards, and the /_cat/shards API to see which shards are being relocated. However, is there any way or API to check the live progress of shard/data movement to the newly added node? Suppose a shard is 13GB and we've added a node to the cluster: can we check what percentage (or how many GBs, MBs or KBs) has moved so far, so that we can estimate how long the relocation will take?
Could we implement this ourselves, or should we suggest it to elasticsearch? If we can implement it ourselves, how should we proceed and what prerequisites do I need to know?

You have:
GET _cat/recovery?active_only=true&v
GET _cat/recovery?active_only=true&h=index,shard,source_node,target_node,bytes_percent,time
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-recovery.html
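For example, a minimal way to watch the progress from the shell is to poll that endpoint (localhost:9200 and the 10-second interval below are just assumptions for illustration):
# print bytes_percent for every shard currently being recovered/relocated, every 10 seconds
while true; do
  curl -s "localhost:9200/_cat/recovery?active_only=true&h=index,shard,source_node,target_node,bytes_percent,time&v"
  sleep 10
done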

Take a look at the Task Management API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html
The task management API returns information about tasks currently executing on one or more nodes in the cluster.
GET /_tasks
You can also see the reasons for the allocation using the allocation explain API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-allocation-explain.html
GET _cluster/allocation/explain
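If you want the explanation for one specific shard, the API also accepts a request body (the index name below is a placeholder):
curl -s -XGET "localhost:9200/_cluster/allocation/explain" \
  -H 'Content-Type: application/json' \
  -d '{ "index": "my-index", "shard": 0, "primary": false }'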

Related

Elasticsearch - two out of three nodes instant shutdown case

We have a small Elasticsearch cluster of 3 nodes: two in one datacenter and one in another, for disaster recovery reasons. However, if the first two nodes fail simultaneously, the third one won't work either - it will just throw "master not discovered or elected yet".
I understand that this is intended - this is how an Elasticsearch cluster is supposed to work. But is there some additional configuration that I don't know about that would keep the third node working on its own, even if only in read-only mode?
Nope, there's not. As you mentioned, it's designed that way.
You're probably not doing yourselves a lot of favours by running things across datacentres like that. Network issues are not kind to Elasticsearch due to its distributed nature.
Elasticsearch runs in distributed mode by default. Nodes assume that they are, or will be, part of a cluster, and during setup they try to join the cluster automatically.
If you want your Elasticsearch node to be available on its own, without needing to communicate with other Elasticsearch nodes, it can work like a standalone server. To do this, tell Elasticsearch to work in local-only mode (network disabled):
open your elasticsearch/config/elasticsearch.yml and set:
node.local: true
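Note that node.local was removed in later Elasticsearch releases; if you are on a recent version, the closest equivalent (check the docs for your exact version) is:
discovery.type: single-node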

Figure out indexing errors in elasticsearch?

I am using ES 1.x and having trouble finding the errors while indexing some documents.
Some documents are not getting indexed, and all I see are the lines below in the ES logs:
stop throttling indexing: numMergesInFlight=2, maxNumMerges=3
now throttling indexing: numMergesInFlight=4, maxNumMerges=3
I did a quick Google search and understand these messages at a high level, but I would like to understand the following:
Will ES retry the documents that were throttled?
Is there any way to know which documents were throttled, by enabling some detailed logging, and if yes, in which classes?
I don't see any error messages apart from the INFO logs above. Is there a way to enable verbose logging for indexing that shows exactly what is going on?
The throttling messages you see in the logs are not the issue. Throttling happens in the background so that elasticsearch can protect itself against a segment explosion. See the explanation here: https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html#segments-and-merging
Throttling does not drop documents; it just slows down indexing, which creates back pressure on the indexers and external queues.
When indexing fails you should get an error response for the index/bulk request. To tell what the issue is, inspect the responses ES returns for your index/bulk requests. The logs might not tell the full story, since that depends on the log level configuration, which is per module in ES.
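For example, the bulk response carries a per-item result you can check (index/type/field names below are placeholders; on newer versions you also need a Content-Type header):
curl -s -XPOST "localhost:9200/_bulk" --data-binary $'{"index":{"_index":"myindex","_type":"doc"}}\n{"field":"value"}\n'
# the response has a top-level "errors" flag; when it is true, each failed item carries a "status" and an "error" explaining why the document was rejected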
Another possibility is that you are able to index, but the docs don't have the timestamp you think they do. Check _cat/indices to see whether the doc count increases when you index; if it does, the indexed docs are there and you need to refine your searches.
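A quick way to watch that (index name is a placeholder):
curl -s "localhost:9200/_cat/indices/myindex?v"   # docs.count should grow as you index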
Elasticsearch does not retry on its own, to the best of my knowledge; that is up to the client (though I haven't used 1.x in quite some time).
Logstash, for example, retries batches that get 503 and 429 responses exactly for these kinds of reasons: https://github.com/logstash-plugins/logstash-output-elasticsearch/blob/master/lib/logstash/outputs/elasticsearch.rb#L55

How to set up elasticsearch to concurrently handle 20k POST requests?

We are trying to collect performance metrics from about 20k servers and POST the data to elasticsearch using the curl command below, so we can analyse the data further:
curl -XPOST "$ELASTICSEARCH_URL/sariovm/sar/" \
  -H 'Content-Type: application/json' \
  -d '{ "#timestamp" : '\""$DATE3\""', "cpu" : '$cpu', "iowait" : '$iowait', "swapips" : '$swapips', "swapops" : '$swapops', "hostname" : "'$HOSTNAME'" }'
So far we have tested it with 80+ POST requests to elasticsearch, with only a single node set up to handle them. How do we set up elasticsearch to scale to 20K+ POST requests?
Assuming you are tracking metrics from 20k servers, you should plan for 20k requests per second: since there is no exact send frequency in your use case, all 20k servers could send their CPU usage at the same time, why not.
You need to benchmark, and you should start with the default deployment: 3 nodes, 1 master, a green cluster. Read about what the different Elasticsearch node types mean, with special attention to data nodes and ingest nodes. In short, start with the default deployment and benchmark, tune, and keep benchmarking, since every use case is special. Yours looks like one that elasticsearch has built a great product for; read about Beats, Logstash and Kibana.
In my personal opinion, if you don't have much budget and you don't care about hard real-time, there are other ways to handle this, like storing the 20k metrics per second in Kafka, which is great at handling high write throughput, and then using Logstash to push them into elasticsearch at whatever rate your cluster supports. Obviously this adds Kafka to your royal pains - problems we like, because we know there is always a solution and fun times.
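Whichever path you pick, what should reach Elasticsearch is a stream of bulk requests rather than 20k individual POSTs; as a rough sketch (field values below are made-up placeholders), one request can carry many metric documents:
curl -s -XPOST "$ELASTICSEARCH_URL/_bulk" --data-binary $'{"index":{"_index":"sariovm","_type":"sar"}}\n{"#timestamp":"2019-01-01T00:00:00Z","cpu":12.3,"hostname":"server-0001"}\n{"index":{"_index":"sariovm","_type":"sar"}}\n{"#timestamp":"2019-01-01T00:00:00Z","cpu":45.6,"hostname":"server-0002"}\n'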
It really depends.
20K+ posts per what? Per second? Per hour? Per day? You'll need that information.
Also, by using a single node you're ignoring what is, in my opinion, elasticsearch's biggest advantage (which is, of course, its support for scaling out).
It also depends on the size of your posts.
You'll need a lot more information to answer this question, but what I recommend (and what elastic recommends) is to simply try: start with some nodes, start indexing, and add resources until you reach your goal.

How do I figure out the new master node when my master node fails in ElasticSearch?

Let's say I have 3 nodes. 1 of which is the master.
I have an API (running on another machine) which hits the master and gets my search result. This is through a subdomain, say s1.mydomain.com:9200 (assume the others are pointed to by s2.mydomain.com and s3.mydomain.com).
Now my master fails for whatever reason. How would my API recover from such a situation? Can I hit either S2 or S3 instead? How can I figure out what the new master is? Is there a predictable way to know which one would be picked as the new master should the master go down?
I've googled this and found plenty about how, when a master goes down, a failover node is elected as the new master, but I haven't seen anything clarifying how I should handle this from the outside looking in.
The master in ElasticSearch is really only for internal coordination. There are no actions required when a node goes down, other than trying to get it back up to get your full cluster performance back.
You can read/write to any of the remaining nodes and data replication will keep going. When the old master node comes back up, it will re-join the cluster once it has received the updated data. In fact, you never need to worry about whether the node you are writing to is the master node.
There are some advanced configurations to alter these behaviors, but ElasticSearch comes with suitable defaults.
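As a minimal sketch of the "write to any remaining node" point above, your API could simply try each of the hostnames from the question until one answers, instead of being pinned to s1:
# try each node in turn; the first one that responds becomes the endpoint
for node in s1.mydomain.com s2.mydomain.com s3.mydomain.com; do
  if curl -s -o /dev/null --max-time 2 "http://$node:9200"; then ES_HOST="$node"; break; fi
done
curl -s "http://$ES_HOST:9200/_cluster/health?pretty"
Most official client libraries will do this failover for you if you configure them with all three hosts.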

How to kill the thread of a search request on an elasticsearch cluster? Is there some API to do this?

I run an elasticsearch cluster with a lot of data, and clients send search requests to it.
Sometimes the cluster takes a long time to deal with a single request.
My question is: is there any API to kill a specific thread/request that is taking too much time?
I wanted to follow up on this answer now that elasticsearch 1.0.0 has been released. I am happy to announce that new functionality has been introduced that implements some protection for the heap, called the circuit breaker.
With the current implementation, the circuit breaker tries to anticipate how much data is going to be loaded into the field data cache, and if that is greater than the limit (80% by default) it will trip and thereby kill your query.
There are two parameters for you to set if you want to modify them:
indices.fielddata.breaker.limit
indices.fielddata.breaker.overhead
The overhead is the constant that is used to estimate how much data will be loaded into the field cache; this is 1.03 by default.
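For example, the limit can go in elasticsearch.yml, or, since these settings were made dynamic, be changed on a live cluster; the 70% value below is just an illustration:
curl -XPUT "localhost:9200/_cluster/settings" -d '{ "persistent": { "indices.fielddata.breaker.limit": "70%" } }'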
This is an exciting development for elasticsearch and a feature I have been waiting months to see implemented.
This is the pull request, if you are interested in seeing how it was made; thanks to dakrone for getting this done!
https://github.com/elasticsearch/elasticsearch/pull/4261
Hope this helps,
MatthewJ
Currently it is not possible to kill or stop long-running queries, but Elasticsearch is going to add a task management API to do this. The API is likely to be added in Elasticsearch 5.0, maybe in 2016 or later.
See Task management 1 and Task management 2.
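Once that API is available (5.0+), cancelling a long-running search looks roughly like this (the task id below is a placeholder taken from the listing call):
# list currently running search tasks, with their ids
curl -s "localhost:9200/_tasks?actions=*search&detailed"
# cancel one of them by its node_id:task_number id
curl -s -XPOST "localhost:9200/_tasks/node_id:12345/_cancel"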
