Does forcing _id in Elasticsearch pile documents up on only one shard?

Let's say I have 2 documents with the following ids:
id0001
id0002
Since I am forcing the IDs of the documents, how does Elasticsearch place them in shards? Will Elasticsearch put all of them in the same shard? In other words, how does Elasticsearch compute which shard a document goes to?

Each document is routed to a specific shard based on its _routing value, which defaults to the document's _id.
routing_value = _routing != null ? _routing : _id
routing_factor = num_routing_shards / num_primary_shards
shard_num = (hash(routing_value) % num_routing_shards) / routing_factor
So shard_num is a direct function of the hash of either an explicit _routing value or, when none is given, the document's _id.
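In your sample, id0001 and id0002 are hashed independently, so they are not forced onto the same shard. Provided your index has more than one primary shard, they may land on different shards or, by chance, on the same one; across many IDs the hash spreads documents roughly evenly.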
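The formula can be tried out with a minimal Python sketch. It makes two assumptions: CRC32 from the standard library stands in for the Murmur3 hash that Elasticsearch uses internally, so the shard numbers it prints will not match a real cluster, and the shard counts (5 primaries, 640 routing shards) are illustrative values only. The function name shard_for is made up for this example.

```python
import zlib


def shard_for(routing_value: str, num_primary_shards: int, num_routing_shards: int) -> int:
    """Apply the routing formula to a routing value (the _id by default)."""
    # Stand-in hash (assumption): Elasticsearch uses Murmur3, not CRC32.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    # num_routing_shards is expected to be a multiple of num_primary_shards.
    routing_factor = num_routing_shards // num_primary_shards
    return (hashed % num_routing_shards) // routing_factor


for doc_id in ("id0001", "id0002"):
    print(doc_id, "->", shard_for(doc_id, num_primary_shards=5, num_routing_shards=640))
```

The same ID always maps to the same shard, and with a single primary shard every ID maps to shard 0.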

Related

Avoid ranking all matching documents in elasticsearch search query

I have an Elasticsearch index with many millions of documents, and I am running the following search query.
POST testIndex/_search?size=200
{
"query": {
"query_string": {
"query": "(title:QA Manager OR title:QA Lead) AND (skills:JIRA OR skills:Software Development OR skills:Test Case)"
}
}
}
Even though we pass the limit size=200, Elasticsearch seems to rank all matching documents and then return the 200 with the highest scores.
Is there a way to limit ranking, meaning rank at most 1000 matching documents only?
Elasticsearch always considers every matching document for scoring; that is how it works. It executes a search in two phases: query, then fetch.
In the query phase, the query runs on every shard. Each shard scores its matching documents and returns the IDs and scores of its own top hits to the coordinating (requesting) node. In your scenario size is 200, so each shard returns up to 200 document IDs.
On the coordinating node, the IDs and scores from all shards are merged and sorted by score, and the top documents are selected according to the size parameter.
In the fetch phase, the selected documents are retrieved by ID from the shards where they reside, and the results are returned to the client.
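The two phases can be imitated in a few lines of plain Python. This is only a toy model of the merge logic: the shard contents, scores and function names are made up, and nothing here talks to Elasticsearch.

```python
import heapq


def query_phase(shard_hits, size):
    """Each shard returns only its own top `size` (doc_id, score) pairs."""
    return heapq.nlargest(size, shard_hits, key=lambda hit: hit[1])


def coordinate(shards, size):
    """Merge the per-shard candidates and keep the global top `size` IDs."""
    candidates = []
    for hits in shards:
        candidates.extend(query_phase(hits, size))
    top = heapq.nlargest(size, candidates, key=lambda hit: hit[1])
    # The fetch phase would now retrieve these IDs from their shards.
    return [doc_id for doc_id, _ in top]


# Hypothetical shard contents: (doc_id, score) for every matching document.
shards = [
    [("a", 0.9), ("b", 0.2), ("c", 0.5)],
    [("d", 0.7), ("e", 0.95)],
]
print(coordinate(shards, size=2))  # ['e', 'a']
```

Every matching document is still scored on its shard; size only limits how many candidates each shard hands back.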
If you don't need some parts of the query to contribute to the score, move them into the filter clause of a bool query; filter clauses still restrict the matches but are not scored.
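As a rough sketch of that rewrite, the request body could be built like this in Python. Which half of the original query goes into filter is an assumption made for illustration (here the skills conditions), and the function name build_query is hypothetical. This does not cap the number of documents that are ranked; it only stops the filtered clauses from contributing to the score.

```python
import json


def build_query(size=200):
    """Build a bool query where only the title clauses affect scoring."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored part: contributes to ranking.
                "must": [
                    {"query_string": {"query": "title:QA Manager OR title:QA Lead"}}
                ],
                # Unscored part (assumption: skills only need to match).
                "filter": [
                    {
                        "query_string": {
                            "query": "skills:JIRA OR skills:Software Development OR skills:Test Case"
                        }
                    }
                ],
            }
        },
    }


print(json.dumps(build_query(), indent=2))
```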

Elastic Index. Enrich document based on aggregated value of a field from the same index

Is it possible to enrich documents in an index based on data from the same index? For example, if the source index has 10000 documents, I need to calculate an aggregated sum for each group of those documents and then use that sum to enrich the same index.
Let me try to explain. My case can be simplified to the one as below:
My elastic index A has documents with 3 fields:
timestamp1 identity_id hours_spent
...
timestamp2 identity_id hours_spent
Every hour I need to check the index and update the documents with an SKU field. If timestamp1 is within [date1:date2] and the total hours_spent for the identity_id is less than a_limit, I need to enrich the document with the additional field sku=A, otherwise with sku=B.
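To make the rule concrete, here is one possible reading of it in plain Python, independent of Elasticsearch. It assumes the total is summed per identity_id over the documents inside the date window, and that any document failing either condition gets sku=B. The field name timestamp, the function name assign_sku and the sample data are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def assign_sku(docs, date1, date2, a_limit):
    """Return copies of docs with an added sku field (A or B)."""
    # Pass 1: total hours per identity_id inside the window (assumption).
    totals = defaultdict(float)
    for doc in docs:
        if date1 <= doc["timestamp"] <= date2:
            totals[doc["identity_id"]] += doc["hours_spent"]
    # Pass 2: tag each document based on its group's total.
    enriched = []
    for doc in docs:
        in_window = date1 <= doc["timestamp"] <= date2
        under_limit = totals[doc["identity_id"]] < a_limit
        enriched.append({**doc, "sku": "A" if in_window and under_limit else "B"})
    return enriched


docs = [
    {"timestamp": datetime(2024, 1, 1, 10), "identity_id": "u1", "hours_spent": 2},
    {"timestamp": datetime(2024, 1, 1, 11), "identity_id": "u1", "hours_spent": 1},
    {"timestamp": datetime(2024, 1, 1, 12), "identity_id": "u2", "hours_spent": 8},
]
enriched = assign_sku(docs, datetime(2024, 1, 1), datetime(2024, 1, 2), a_limit=5)
print([d["sku"] for d in enriched])  # ['A', 'A', 'B']
```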

How are keyword and numeric data types stored in Elasticsearch? Are they stored in the inverted index?

put sana/_mapping/learn { "properties": { "name":{"type":"text"}, "age":{"type":"integer"} } }
POST sana/learn { "name":"rosy", "age":23 }
Quoting the Elasticsearch doc:
Most fields are indexed by default, which makes them searchable. The
inverted index allows queries to look up the search term in unique
sorted list of terms, and from that immediately have access to the
list of documents that contain the term.
Keyword fields are indexed in the inverted index, which makes them searchable. Numeric fields are indexed too, but in recent Elasticsearch versions they use a BKD tree rather than the inverted index. You can turn indexing off for either kind of field by setting "index": false in the mapping. In addition, doc_values are enabled by default on keyword and numeric fields to support sorting, aggregations and similar operations; they are not available on analyzed string (text) fields.
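For illustration, a mapping body that uses these settings could be assembled like this in Python. The name and age fields come from the question; the tag field is a hypothetical keyword field added only to show the third case.

```python
import json

mapping = {
    "properties": {
        # Analyzed text: indexed in the inverted index, no doc_values.
        "name": {"type": "text"},
        # Hypothetical keyword field: indexed, doc_values on by default.
        "tag": {"type": "keyword"},
        # Numeric field with indexing turned off; doc_values stay on by
        # default, so sorting and aggregations on it still work.
        "age": {"type": "integer", "index": False},
    }
}

print(json.dumps(mapping, indent=2))
```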
Hope I answered your question and let me know if you have any doubt.

Kibana match documents based on keys from different document

I am trying to find the unique count of documents matched on keys from different documents.
I have two documents in the same ES index:
{
"appVersion": [301],
"uri": "www.something.com?productTransactionId=IS3243"
}
and
{
"tripId":"IS3243",
"event":"my_event"
}
I want to find the unique count of documents where event is my_event, the tripId of the second document exists in the uri of the first document, and appVersion is 301.
Is it possible in Kibana?

How to update a document using index alias

I have created an index "index-000001" with primary shards = 5 and replica = 1. And I have created two aliases
alias-read -> index-000001
alias-write -> index-000001
for indexing and searching purposes. When I do a rollover on alias-write when it reaches its maximum capacity, it creates a new "index-000002" and updates aliases as
alias-read -> index-000001 and index-000002
alias-write -> index-000002
How do I update or delete a document that exists in index-000001? What if all I know is the document ID, but not which index the document resides in?
Thanks
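You cannot update through alias-read, because it points to more than one index. The usual approach is to look the document up through the alias first, by its ID or by a term, to learn which index holds it, and then update it in that index directly.
GET alias-read/{type}/{doc_id} returns the document if doc_id is known and the alias still points to a single index. Once the alias spans several indices, single-document requests through it are rejected, so search for the ID instead (for example with an ids query).
If doc_id is not known, find the document with a search on a field that uniquely identifies it: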
GET alias-read/_search
{
"query": {
"term" : { "field" : "value" }
}
}
In both cases, you will get a single document as a response.
Once the document is obtained, you can use the "_index" field to get the required index.
PUT {index_name}/{type}/{id}
{
"required_field" : "new_value"
}
This reindexes the document under the same ID, so the new body replaces the old one.
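The lookup step can be sketched in Python as a small helper that reads the concrete index out of a search response. It assumes the usual response shape, with hits.hits[] entries carrying _index and _id. The helper name locate_document and the sample response are made up, and no Elasticsearch client is called.

```python
def locate_document(search_response):
    """Return (index_name, doc_id) for the single hit of a search response."""
    hits = search_response["hits"]["hits"]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one hit, got {len(hits)}")
    hit = hits[0]
    return hit["_index"], hit["_id"]


# Hypothetical response to a search through alias-read.
sample_response = {
    "hits": {
        "hits": [
            {"_index": "index-000001", "_id": "42", "_source": {"field": "value"}}
        ]
    }
}

index_name, doc_id = locate_document(sample_response)
print(index_name, doc_id)  # index-000001 42
```

The returned pair is what goes into the update or delete request against the concrete index.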
