Track what documents are coming from which shards in Elasticsearch - elasticsearch

I have enabled routing, and all my sets of documents are going to the same shard. Now I need to hit those machines directly and see whether there is a performance gain. However, I haven't found a mechanism to find out which document went to which shard. Kindly let me know if there is any way to achieve this.

You can use the Search Shards API.
Sample syntax:
GET /index/type/_search_shards?routing={routing_id}
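As a rough illustration, the request above can be built in Java and then handed to any HTTP client. This is a minimal sketch under stated assumptions, not an official client call: the class name `SearchShardsRequest`, the helper `searchShardsUri`, the base URL and the index name are all hypothetical, and the path leaves out the `type` segment on the assumption that the `/{index}/_search_shards` form is accepted by your version.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the Search Shards URL for one routing value.
class SearchShardsRequest {

    // baseUrl and index are placeholders supplied by the caller,
    // e.g. "http://localhost:9200" and "logs".
    static URI searchShardsUri(String baseUrl, String index, String routing) {
        // URL-encode the routing value so spaces and special characters are safe.
        String encoded = URLEncoder.encode(routing, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/" + index + "/_search_shards?routing=" + encoded);
    }
}
```

Sending a GET request to `SearchShardsRequest.searchShardsUri("http://localhost:9200", "logs", "user42")` should return the shards (and the nodes holding them) that a search with that routing value would be executed against.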

Related

What is the way to know from elastic search that all shards have been updated for a given document ID?

We would like to trigger some code to start labeling once all the shards in Elasticsearch have been updated for the document ID. Matcher in Elasticsearch seems like a new feature we could use. However, the documentation does not suggest any way to know whether all the shards have been updated.

elastic search preference setting Custom Value(Java api)

I really need some help with Elasticsearch usage in the Java API.
Let's assume I am using the Java API from ES.
So far, I understand that Elasticsearch can give inconsistent results because primary and replica shards can be inconsistent (deleting a document changes the overall stats, because the document is marked as deleted instead of being removed).
So what I tried is
searchRequest.preference("_primary_first").
This gave me consistent results (since it only uses the primary shard!)
Now what I want to try in my toy example is,
1) using preference=Custom (string) value
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-preference
2) if I have 5 nodes, I want to designate which node we want to use based on the queryText.
For instance,
'''
if (queryRequest.text().equals("red")) {
    // use 1st node
    searchRequest.preference("??????");
} else if (queryRequest.text().equals("blue")) {
    // use 2nd node
    searchRequest.preference("??????");
} else {
    // use any of the 3rd~5th nodes <- but this is not necessary if it is really hard..
    searchRequest.preference("???????");
}
'''
Q1)
I guess I need to use the custom setting "WISELY" to denote which node to use...
Can someone give me a simple Java API example?
Q2)
This is another one, but is there any way to load the status of each node from the searchResponse? (Again, in a Java API friendly way.)
Q3)
Is there any clever way to specify that the 1st node (or a certain node ID??) should be used for a given query text? (Instead of using a hashmap...)
For instance,
let's say I don't know which query text I will receive, but I want to distribute the queries evenly across the nodes (among 5!)
But I want to stick with the first choice.
If the very first query text I see is "red" and I designate this queryRequest to use Node1, then later I also want to use Node1 if I see the query text "red" again. Does someone have an idea?
Thank you guys!
Disclaimer:
I am a non-CS guy and an independent learner who tries to experiment with new things to get out of my comfort zone! :) Please excuse this silly question!
Actually it's not a silly question and the answer has two parts.
You mention nodes, and you want to control which node gets which queries based on an attribute.
Some context:
An Elasticsearch cluster consists of Elasticsearch nodes.
Your documents will be "saved" in an Elasticsearch index, and the queries you perform will run against that index.
An Elasticsearch index is only an abstraction, a layer that hides the complexity of shards (basically Lucene indices).
Now when you save a document, that document will eventually be stored in a shard (there are segments etc., but there is no reason to go any further). You can have primary shards and replica shards. When you save something, it goes to a primary shard and is replicated by Elasticsearch to the replica shards (if any). Your searches can and will be served by both primary and replica shards.
Now, you want to control which node gets what. Most of the time you won't need this. What you can control is which shard gets what, via routing on save and via routing on search.
To control which node gets what, you therefore need to control which node gets which shard. This can be accomplished via shard allocation awareness.
Both of these topics are advanced features, and you need to be sure you know what you are doing when using them, or you will get very unexpected results.
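For Q3 in the question (the same query text should always stick to the same choice), one possible approach is to derive a custom preference string from the query text. This is only a sketch under assumptions: `StickyPreference`, `preferenceFor` and the `bucket-` prefix are hypothetical names, and `String.hashCode()` is just a convenient deterministic hash. Note that a custom preference string makes searches with the same string reuse the same shard copies as long as the cluster state does not change; it does not let you name a particular node.

```java
// Hypothetical helper: maps a query text to one of N stable preference strings.
class StickyPreference {

    static String preferenceFor(String queryText, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        // floorMod keeps the result in [0, buckets) even for negative hash codes.
        int bucket = Math.floorMod(queryText.hashCode(), buckets);
        // Custom preference values must not start with an underscore,
        // because that prefix is reserved for built-in values such as _primary_first.
        return "bucket-" + bucket;
    }
}
```

With the `searchRequest` from the question, this could be used as `searchRequest.preference(StickyPreference.preferenceFor(queryRequest.text(), 5))`, so that "red" maps to the same preference string every time without keeping a hashmap.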

How does Elasticsearch bring back a node which is down

I was reading about Elasticsearch and want to get consistent responses from ES clusters.
I read "Elasticsearch read and write consistency"
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-index_.html
and some other posts, and I conclude that ES returns success for a write operation only after completing the write on all shards (primary + replicas), irrespective of the consistency param.
Let me know if my understanding is wrong.
I am wondering if anyone knows how Elasticsearch adds a node/shard that was transiently down back into a cluster. Will it start serving read requests immediately once it is available, or does it ensure it has up-to-date data before serving read requests?
I looked for the answer to the above question, but could not find any.
Thanks
Gopal
If a node is removed from the cluster and joins again, Elasticsearch checks whether its data is up to date. If it is not, the data will not be made available for search until it is brought up to date again (which could mean the whole shard gets copied again).
The consistency parameter is just an additional pre-index check of whether the expected number of shard copies is available in the cluster (if the index is configured to have 4 replicas, then the primary shard plus two replicas need to be available when it is set to quorum). However, this parameter never changes the behaviour that a write needs to be written to all available shards before returning to the client.
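The quorum arithmetic in that example can be checked with a few lines of Java. This is a sketch of the formula as I understand it from the 2.x documentation, `int((primary + number_of_replicas) / 2) + 1`, together with the documented exception that a quorum is only enforced when there is more than one replica; `WriteQuorum` and `requiredCopies` are hypothetical names, not part of any Elasticsearch API.

```java
// Hypothetical helper: number of shard copies that must be active
// before a write is attempted when consistency is set to quorum.
class WriteQuorum {

    static int requiredCopies(int numberOfReplicas) {
        if (numberOfReplicas < 0) {
            throw new IllegalArgumentException("numberOfReplicas must not be negative");
        }
        // Assumption from the 2.x docs: with 0 or 1 replicas only the primary is required,
        // so that a single-node cluster can still index documents.
        if (numberOfReplicas <= 1) {
            return 1;
        }
        // One primary plus the replicas, integer-divided by two, plus one.
        return (1 + numberOfReplicas) / 2 + 1;
    }
}
```

For the index with 4 replicas mentioned above, `WriteQuorum.requiredCopies(4)` evaluates to `(1 + 4) / 2 + 1`, which is 3 in integer arithmetic: the primary plus two replicas.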

Elastic search replica selection mechanism

Does anyone have information regarding the replica selection mechanism that Elasticsearch uses? I mean the basis on which a particular replica of a shard is selected to serve a query. I have looked at the Elasticsearch documentation but was unable to find this information.
Kindly share any relevant resource.
So, I found the answer via a discussion on the Elasticsearch forums. To put it simply, Elasticsearch uses a round-robin scheme to select replicas to respond to queries from within a replica group (sort of). This replica group, according to my understanding, is selected based on the awareness and preference attributes that have been provided in the configuration.
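To make the round-robin idea concrete, here is a small Java illustration. It is not Elasticsearch's code, only a sketch of the general technique: `RoundRobinSelector` and the shard copy names are made up, and the list passed in stands for the replica group that remains after awareness and preference have been applied.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: cycles through the copies of one shard in a fixed order.
class RoundRobinSelector {
    private final List<String> copies;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(List<String> copies) {
        if (copies.isEmpty()) {
            throw new IllegalArgumentException("at least one shard copy is required");
        }
        this.copies = List.copyOf(copies);
    }

    // Returns the copy that should serve the next query.
    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), copies.size());
        return copies.get(index);
    }
}
```

With three copies in the group, successive calls to `pick()` return the first, second and third copy and then start again with the first, so consecutive queries are spread over all copies.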

Query and allocate data to shards based on tags

I'm running a typical logstash-redis-elasticsearch system to capture all my logs (around 500 GB/day). To my knowledge, Elasticsearch queries every shard in an index and aggregates the results, but due to the volume of logs per day and the response times needed, I want to query only a few shards, which of course should be chosen based on some "tag" in the message. So I'm looking for a way to allocate data to shards based on some tags and query only the relevant shards based on those tags. Any leads, references or solutions on how to achieve this?
I've already looked at shard allocation filtering, but that doesn't cater to this specific requirement.
Routing is the way to go here.
Specifying a routing value while indexing causes the document to be routed to a specific shard. See routing in the index API.
You can also extract the routing value from a field. See routing field.
Don't forget to search with the same routing value. See routing option in search.
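Why this works can be sketched in a few lines of Java. Elasticsearch picks the shard as roughly `hash(routing) % number_of_primary_shards`, so every document indexed with the same routing value lands on the same shard, and a search with that routing value only has to visit that shard. The sketch below is an assumption-laden illustration: `RoutingSketch` and `shardFor` are hypothetical names, and `String.hashCode()` stands in for the Murmur3 hash that Elasticsearch actually uses, so the shard numbers it produces will not match a real cluster.

```java
// Illustrative only: shows the shape of the routing formula, not the real hash.
class RoutingSketch {

    static int shardFor(String routing, int numberOfPrimaryShards) {
        if (numberOfPrimaryShards <= 0) {
            throw new IllegalArgumentException("numberOfPrimaryShards must be positive");
        }
        // Stand-in hash; floorMod keeps the result in [0, numberOfPrimaryShards).
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }
}
```

For example, if the tag of a log message is used as its routing value, `RoutingSketch.shardFor("nginx", 5)` gives the same shard number on every call, which is why indexing and searching must use the same routing value.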
