Improving performance of search queries using the index field when working with an alias - Elasticsearch

I am using an alias name when writing data with the Bulk API.
I have 2 questions:
Can I get the index name after writing data using the alias name, maybe as part of the response?
Can I improve performance if I send search queries to specific indexes instead of searching all indexes behind the same alias?

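If you're using an alias name for writes, that alias must resolve to a single write index (either it points to exactly one index, or one of its indexes is marked as the write index), and the name of that index comes back in the bulk response.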
For instance, if test_alias is an alias to the test index, then when sending this bulk command:
POST test_alias/_doc/_bulk
{"index":{}}
{"foo": "bar"}
You will receive this response:
{
"index" : {
"_index" : "test", <---- here is the real index name
"_type" : "_doc",
"_id" : "WtcviYABdf6lG9Jldg0d",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
}
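As a rough sketch of how you might read that value back in client code: the full bulk response wraps the per-action results in an "items" array, so you can walk it and collect each "_index". The function name and the sample dictionary below are made up for illustration, and the sample is trimmed to the fields used here.

```python
from typing import Any, Dict, List


def written_indices(bulk_response: Dict[str, Any]) -> List[str]:
    """Return the concrete index name reported for each bulk item, in order."""
    names = []
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action name ("index", "create", "update", "delete").
        for result in item.values():
            names.append(result["_index"])
    return names


# Hypothetical response body, trimmed to the fields used above.
sample = {
    "errors": False,
    "items": [
        {"index": {"_index": "test", "_id": "WtcviYABdf6lG9Jldg0d", "status": 201}}
    ],
}
print(written_indices(sample))
```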
As a rule, searching a single index is faster than searching an alias that spans several indexes, because fewer shards have to be queried. If the alias only spans a single index, there's no difference.

You can pass several index names when searching. If you search through an alias that covers multiple indices, the query runs against all of them by default. If you only want a subset of the data behind the alias, you can also filter it based on fields in the underlying indices.
You can read the "Filter-based aliases to limit access to data" section in this blog on how to achieve it. Because it queries fewer indices and less data, search performance should be better.
Also, an alias can have only a single write index. You can get its name from the _cat/aliases?v API response, which shows which index is the write index for the alias (the is_write_index column); you can see the sample output here
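To make the two options a bit more concrete, here is a minimal sketch that only builds the request pieces and does not talk to a cluster. The first helper joins index names into a search path; the second builds a body for POST /_aliases that adds a term-filtered alias. The index, alias and field names (test-1, test_alias_eu, region) are assumptions for illustration, not something from the question.

```python
import json
from typing import Any, Dict, List


def search_path(indices: List[str]) -> str:
    """Build a _search path that targets only the given indices."""
    return "/" + ",".join(indices) + "/_search"


def filtered_alias_action(index: str, alias: str, field: str, value: str) -> Dict[str, Any]:
    """Build a body for POST /_aliases that adds an alias with a term filter."""
    return {
        "actions": [
            {"add": {"index": index, "alias": alias, "filter": {"term": {field: value}}}}
        ]
    }


# Hypothetical names, used only to show the shape of the output.
print(search_path(["test-1", "test-2"]))
print(json.dumps(filtered_alias_action("test-1", "test_alias_eu", "region", "eu"), indent=2))
```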

Related

ElasticSearch, Multiple Indices search

Can anyone help me for below use case:
I want to have 3 schools, and each school has multiple students. I want to be able to search the student names, but the search result should tell me which school the matched text belongs to. Here is the solution I am thinking of, along with its problem:
I would have an index for each school and then use multi match to match the entered text against all the indexes. The problem is that I want to know which index each matched result belongs to. Is there a better solution for this use case, or how can I solve the problem above? Thank you all.
BR
When you run a search, you get back a response that contains this:
"hits" : [
{
"_index" : ".async-search",
"_type" : "_doc",
"_id" : "CdT9fKXfQpOEIPuZazz0BA",
"_score" : 1.0,
"_source" : {
So you can use the _index value of each hit to determine which index (school) it came from.
PS: it's Elasticsearch, not ElasticSearch :)
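For what it's worth, grouping the hits by that value on the client side could look roughly like this. It works on an already parsed search response; the index names school_a and school_b and the sample hits are invented for the example.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_hits_by_index(response: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Group the hits of a search response by the index they came from."""
    grouped = defaultdict(list)
    for hit in response["hits"]["hits"]:
        grouped[hit["_index"]].append(hit)
    return dict(grouped)


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "hits": {
        "hits": [
            {"_index": "school_a", "_id": "1", "_source": {"name": "Ann"}},
            {"_index": "school_b", "_id": "7", "_source": {"name": "Anna"}},
            {"_index": "school_a", "_id": "3", "_source": {"name": "Annie"}},
        ]
    }
}
for index_name, hits in group_hits_by_index(sample_response).items():
    print(index_name, [hit["_id"] for hit in hits])
```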

Getting child documents

I have an Elasticsearch index. Each document in that index has a number (e.g. 1, 2, 3, etc.) and an array named ChildDocumentIds. There are additional properties too, but the important part is that each item in this array is the _id of a document that is related to this document.
I have a saved search named "Child Documents". I would like to use the number (e.g. 1, 2, 3, etc.) and get the child documents associated with it.
Is there a way to do this in Elasticsearch? I can't seem to find a way to do a relational-type query in Elasticsearch for this purpose. I know it will be slow, but I'm o.k. with that.
The terms query allows you to do this. If document #1000 had child documents 3, 12, and 15 then the following two queries would return identical results:
"terms" : { "_id" : [3, 12, 15] }
and:
"terms" : {
"_id" : {
"index" : <parent_index>,
"type" : <parent_type>,
"id" : 1000,
"path" : "ChildDocumentIds"
}
}
The reason that it requires you to specify the index and type a second time is that the terms query supports cross-index lookups.
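If it helps, here is a small sketch that builds the lookup form of the query as a Python dictionary. It leaves out the "type" key, which as far as I know recent Elasticsearch versions no longer accept (add it back on an older cluster), and the index name "documents" is just a placeholder.

```python
from typing import Any, Dict


def child_documents_query(parent_index: str, parent_id: Any,
                          path: str = "ChildDocumentIds") -> Dict[str, Any]:
    """Build a terms lookup query that matches the _ids listed in the parent's array."""
    return {
        "query": {
            "terms": {
                "_id": {
                    "index": parent_index,  # index holding the parent document
                    "id": str(parent_id),   # _id of the parent document
                    "path": path,           # field in the parent that lists the child _ids
                }
            }
        }
    }


# "documents" is a hypothetical index name.
print(child_documents_query("documents", 1000))
```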

Elastic Search pipeline search queries

I am looking for a way to pipeline multiple queries into Elasticsearch. My main problem is that when I receive the results, I want to be able to know which query generated each result. In pseudo-code I would like to do something like the following:
query1="James Bond"
query2="Sean Connery"
query3="Charlie Chaplin"
pipeline=new ElasticSearchPipeline()
pipeline.add(query1);pipeline.add(query2);pipeline.add(query3)
pipeline.execute()
jamesBondResults=pipeline.getResultsForQuery(query1)
seanConneryResults=pipeline.getResultsForQuery(query2)
charlieChaplinResults=pipeline.getResultsForQuery(query3)
The key feature is that I want to avoid the overhead of sending multiple requests to the ES server, but still be able to treat the results as if I had sent the queries one by one.
The multi search API is exactly what you're looking for.
You can send many queries and the response will contain an array with the responses to each query in the same order:
curl -XPOST localhost:9200/_msearch -H 'Content-Type: application/x-ndjson' -d '
{"index" : "test1"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"index" : "test2"}
{"query" : {"match_all" : {}}}
'
The response array of the above multi search queries will contain two ES responses with the documents from the first and second queries.
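Since the responses come back in request order, you can get something close to the getResultsForQuery idea by zipping your own labels with the "responses" array. A minimal sketch, with made-up helper names, that only builds the newline-delimited body and pairs up an already parsed response:

```python
import json
from typing import Any, Dict, List, Tuple


def build_msearch_body(searches: List[Tuple[str, Dict[str, Any]]]) -> str:
    """Build the newline-delimited _msearch body from (index, query body) pairs."""
    lines = []
    for index, query in searches:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(query))             # search body line
    # The body has to end with a newline.
    return "\n".join(lines) + "\n"


def pair_results(labels: List[str], msearch_response: Dict[str, Any]) -> Dict[str, Any]:
    """Map each label to its response, relying on responses keeping request order."""
    return dict(zip(labels, msearch_response["responses"]))


body = build_msearch_body([
    ("test1", {"query": {"match_all": {}}, "from": 0, "size": 10}),
    ("test2", {"query": {"match_all": {}}}),
])
print(body)
```

You would send body to the _msearch endpoint with whatever HTTP client you use, then call pair_results with your labels (for example the query strings) and the parsed JSON response.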

Searching multiple types in elasticsearch

I have a use case where there are two different types in the same index. The two types have different structures and mappings.
I need to query both types at the same time using different query DSL.
How can I build my query DSL to simultaneously query more than one type of the same index?
I looked into elasticsearch guide at https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-index-multi-type.html but there is no proper explanation here. According to this even if I set two different types in my request :
/index/type1,type2/_search
I will have to send the same query DSL.
You need to use the multi search API and the _msearch endpoint:
curl -XGET localhost:9200/index/_msearch -d '
{"type": "type1"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"type": "type2"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
'
Note: make sure each line is terminated by a newline, including the last line.
You'll get two responses, in the same order as the requests.

How to retrieve all the document ids from an elasticsearch index

How do I retrieve all the document ids (the internal document '_id') from an Elasticsearch index? If I have 20 million documents in that index, what is the best way to do that?
I would just export the entire index and read off the file system. My experience with size/from and scan/scroll has been a disaster when querying result sets in the millions. It just takes too long.
If you can use a tool like knapsack, you can export the index to the file system and iterate through the directories. Each document is stored under its own directory named after its _id. There's no need to actually open the files; just iterate through the directories.
link to knapsack:
https://github.com/jprante/elasticsearch-knapsack
edit: hopefully you are not doing this often... or this may not be a viable solution
For that number of documents, you probably want to use the scan and scroll API.
Many client libraries have ready-made helpers for this interface. For example, with elasticsearch-py you can do:
import elasticsearch
import elasticsearch.helpers

es = elasticsearch.Elasticsearch(eshost)
# "_source": False keeps the hits small; each one still carries its _id
# (older versions used "fields" for this).
scroll = elasticsearch.helpers.scan(es, query={"_source": False}, index=idxname, scroll='10s')
for res in scroll:
    print(res['_id'])
First you can issue a request to get the full count of records in the index.
curl -X GET 'http://localhost:9200/documents/document/_count?pretty=true'
{
"count" : 1408,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}
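Then you'll want to loop through the set using a combination of the size and from parameters until you reach the total count. Passing an empty fields parameter will return only the metadata (such as _index and _id) that you're interested in.
Find a good page size that you can consume without running out of memory, and increment from on each iteration. Note that newer Elasticsearch versions reject requests where from + size exceeds index.max_result_window (10,000 by default), so this approach only suits smaller result sets.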
curl -X GET 'http://localhost:9200/documents/document/_search?fields=&size=1000&from=5000'
Example item response:
{
"_index" : "documents",
"_type" : "document",
"_id" : "1341",
"_score" : 1.0
},
...
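The paging arithmetic itself can be sketched like this: given the total from _count and a page size, it yields the (from, size) pair for each request. The function name is made up, and the code only computes the offsets; it does not send any requests.

```python
from typing import Iterator, Tuple


def page_windows(total: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield (from, size) pairs that cover `total` documents in pages of `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, total, size):
        # The last page may be shorter than the requested page size.
        yield start, min(size, total - start)


# 1408 is the count from the example response above.
for offset, page_size in page_windows(1408, 1000):
    print("from=%d&size=%d" % (offset, page_size))
```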

Resources