Elasticsearch: is bulk search possible? - elasticsearch

I know there is support for bulk index operations, but is it possible to do the same for search queries? I want to send many different, unrelated queries (to do precision/recall testing), and it would probably be faster using a bulk query.

Yes, you can use the multi search API and the /_msearch endpoint to send as many queries as you wish in one shot.
curl -XPOST localhost:9200/_msearch -d '
{"index" : "test1"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"index" : "test2"}
{"query" : {"match_all" : {}}}
'
You'll get a responses array with the response of each query in the same order as in the request.
Note:
make sure to separate each line by a newline character
make sure to add the extra newline after the last query.
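If you build the request body in code rather than by hand, a small helper takes care of the newline rules for you. This is only a minimal sketch: the function name `build_msearch_body` is made up for illustration, and the index names and queries are the ones from the curl example above.

```python
import json


def build_msearch_body(searches):
    """Serialize (header, body) pairs into the newline-delimited _msearch format."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))  # header line, e.g. {"index": "test1"}
        lines.append(json.dumps(body))    # query line that belongs to that header
    # Terminate every line, including the last one, with a newline.
    return "".join(line + "\n" for line in lines)


searches = [
    ({"index": "test1"}, {"query": {"match_all": {}}, "from": 0, "size": 10}),
    ({"index": "test2"}, {"query": {"match_all": {}}}),
]
payload = build_msearch_body(searches)
```

The resulting `payload` string can be sent as the body of the POST to /_msearch with whatever HTTP client you use. Recent Elasticsearch versions also expect a Content-Type header such as application/x-ndjson on that request.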

Related

improving performance of search query using index field when working with alias

I am using an alias name when writing data with the Bulk API.
I have 2 questions:
Can I get the index name after writing data through the alias, for example as part of the response?
Can I improve performance if I send search queries to specific indexes instead of searching all indexes of the same alias?
If you're using an alias name for writes, that alias must resolve to a single write index, and the name of that index is returned in the bulk response.
For instance, if test_alias is an alias to the test index, then when sending this bulk command:
POST test_alias/_doc/_bulk
{"index":{}}
{"foo": "bar"}
You will receive this response:
{
"index" : {
"_index" : "test", <---- here is the real index name
"_type" : "_doc",
"_id" : "WtcviYABdf6lG9Jldg0d",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
}
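If you want to pick the real index names out of the response in code, something along these lines should work. It is a rough sketch: `written_indices` is a made-up helper name, and it assumes the full _bulk response wraps per-document results like the one above in an `items` array.

```python
def written_indices(bulk_response):
    """Return the concrete index names reported by a _bulk response."""
    names = set()
    for item in bulk_response.get("items", []):
        # Each item has a single key naming the action (index, create, update, delete).
        for result in item.values():
            names.add(result["_index"])
    return names


# Trimmed-down sample shaped like the response above (assumed wrapper fields).
sample_response = {
    "errors": False,
    "items": [
        {"index": {"_index": "test", "_id": "WtcviYABdf6lG9Jldg0d", "status": 201}}
    ],
}
indices = written_indices(sample_response)
```

Here `indices` holds the name of the index behind test_alias rather than the alias itself.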
Common sense has it that searching on a single index is always faster than searching on an alias spanning several indexes, but if the alias only spans a single index, then there's no difference.
You can provide multiple index names when searching. If you search through an alias that has multiple indices, it searches all of them by default, but you can also restrict the search to part of the data with a filter based on fields in the underlying indices.
You can read the "Filter-based aliases to limit access to data" section in this blog to see how to achieve it. Because it queries fewer indices and less data, search performance would be better.
Also, an alias can have only a single writable index. You can get its name from the _cat/aliases?v API response, which shows which index is the write index for the alias; you can see the sample output here.

Elastic Search pipeline search queries

I am looking for a way to pipeline multiple queries into Elasticsearch. My main problem is that when I receive the results, I want to be able to know which query generated each result. In pseudo-code, I would like to do something like the following:
query1="James Bond"
query2="Sean Connery"
query3="Charlie Chaplin"
pipeline=new ElasticSearchPipeline()
pipeline.add(query1);pipeline.add(query2);pipeline.add(query3)
pipeline.execute()
jamesBondResults=pipeline.getResultsForQuery(query1)
seanConneryResults=pipeline.getResultsForQuery(query2)
charlieChaplinResults=pipeline.getResultsForQuery(query3)
The key feature is that I want to avoid the overhead of sending multiple requests to the ES server, but still be able to treat the results as if I had sent the queries one by one.
The multi search API is exactly what you're looking for.
You can send many queries and the response will contain an array with the responses to each query in the same order:
curl -XPOST localhost:9200/_msearch -d '
{"index" : "test1"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"index" : "test2"}
{"query" : {"match_all" : {}}}
'
The response array of the above multi search queries will contain two ES responses with the documents from the first and second queries.
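Since the responses come back in request order, something like the pipeline from the question can be sketched by remembering the order in which queries were added. This is only an illustration: the class name `MultiSearchPipeline` and the `send` callable (which is assumed to POST the body to /_msearch and return the parsed JSON) are hypothetical, not part of any client library.

```python
import json


class MultiSearchPipeline:
    """Collects queries and maps each _msearch response back to its query."""

    def __init__(self, send):
        # `send` is a hypothetical callable: it takes the newline-delimited
        # request body and returns the parsed JSON response of POST /_msearch.
        self._send = send
        self._keys = []
        self._lines = []

    def add(self, key, index, query):
        """Queue one search under a caller-chosen key."""
        self._keys.append(key)
        self._lines.append(json.dumps({"index": index}))
        self._lines.append(json.dumps({"query": query}))

    def execute(self):
        """Send all queued searches at once and return {key: response}."""
        body = "".join(line + "\n" for line in self._lines)
        responses = self._send(body)["responses"]
        # Responses arrive in request order, so position restores the pairing.
        return dict(zip(self._keys, responses))
```

You would call `add` once per query (for example with the query text as the key), then `execute()`, and look up each result by its key, much like `getResultsForQuery` in the pseudo-code.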

Searching multiple types in elasticsearch

I have a use case where there are two different types in the same index. The two types have different structures and mappings.
I need to query both types at the same time using different query DSL.
How can I build my query DSL to simultaneously query more than one type of the same index?
I looked into the Elasticsearch guide at https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-index-multi-type.html but there is no proper explanation there. According to it, even if I set two different types in my request:
/index/type1,type2/_search
I will have to send the same query DSL.
You need to use the multi search API and the _msearch endpoint:
curl -XGET localhost:9200/index/_msearch -d '
{"type": "type1"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"type": "type2"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
'
Note: make sure to separate each line by newlines (including the last line)
You'll get two responses in the same order as the requests

can terms lookup mechanism query be nested

I want to know whether I can nest a terms lookup mechanism query inside another terms lookup mechanism query.
For instance:
curl -XPUT localhost:9200/users/user/2 -d '{
"tweets" : ["1", "3"]
}'
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
"uuid" : "1",
"comments":["1","2","3"]
}'
curl -XPUT localhost:9200/comments/comment/1 -d '{
"uuid" : "1"
}'
As you know, we can use a terms lookup mechanism query to get tweets which belong to the user:
curl -XGET localhost:9200/tweets/tweet/_search -d'{
"query" : {
"terms" : {
"uuid" : {
"index" : "users",
"type" : "user",
"id" : "2",
"path" : "tweets"
}
}
}
}'
But if I want to get the comments, I must run another query.
However, I have so many documents that this is not a good approach.
So I want to nest terms lookup queries in order to get the comments by user id in a single query. Can I?
I would really appreciate it if you could give me some help. Thank you! :)
At the moment, this is not possible as far as I know, because you expect data from three different indices to be returned in one query, which would equate to a JOIN. The terms lookup query sort of implements JOINs between two indices "only" (which is already quite cool considering the fact that ES does not want to support JOINs in the first place).
One way out of this would be to refactor your data model to get rid of the comments index and use either parent/child and/or nested relationships within the tweet mapping type. Since a comment can only belong to a single tweet and there aren't usually hundreds of comments on a tweet (I'm pretty comfortable with the idea that 99% of the time there are fewer than half a dozen comments per tweet, if any at all), you could add comments either as child documents or as nested documents (my preference), instead of just storing their ids in the comments array. That way you'd get your comments right away with your existing query, without the need for a second query.
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
"uuid" : "1",
"comments":[{
"id": 1,
"content": "Nice tweet!"
},{
"id": 2,
"content": "Way to go!"
},{
"id": 3,
"content": "Sucks!"
}]
}'
Or you can wait for this pull request (#3278) (Terms Lookup by Query/Filter (aka. Join Filter)) to be merged, which would effectively let you do what you're asking for, but that PR was created more than 2 years ago and there are still conflicts to be resolved.

How to enable fuzziness for phrase queries in ElasticSearch

We're using ElasticSearch for searching through millions of tags. Our users should be able to include boolean operators (+, -, "xy", AND, OR, brackets). If no hits are returned, we fall back to a spelling suggestion provided by ES and search again. That's our query:
$ curl -XGET 'http://127.0.0.1:9200/my_index/my_type/_search' -d '
{
"query" : {
"query_string" : {
"query" : "some test query +bools -included",
"default_operator" : "AND"
}
},
"suggest" : {
"text" : "some test query +bools -included",
"simple_phrase" : {
"phrase" : {
"field" : "my_tags_field",
"size" : 1
}
}
}
}'
Instead of only providing a fallback to spelling suggestions, we'd like to enable fuzzy matching. If, for example, a user searches for "stackoverfolw", ES should return matches for "stackoverflow".
Additional question: What's the better-performing method for "correcting" spelling errors? As it is now, we have to perform two subsequent requests, first with the original search term, then with the term suggested by ES.
The query_string query does support some fuzziness, but only when using the ~ operator, which I think doesn't fit your use case. I would add a fuzzy query and OR it with the existing query_string. For instance, you can use a bool query with the query_string and the fuzzy query as two should clauses. (If you keep the query_string as a must clause instead, the fuzzy should clause only influences scoring and does not add hits.)
As for your additional question about how to correct spelling mistakes: I would use fuzzy queries to automatically correct them and two subsequent requests if you want the user to select the right correction from a list (e.g. Did you mean), but your approach sounds good too.
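To make the bool idea a bit more concrete, here is a rough sketch of how the request body could be assembled. The helper name `build_fuzzy_fallback_query`, the choice of "AUTO" fuzziness, and passing a single term for the fuzzy clause are my assumptions; the field name is the one from your suggester. Both queries go under should so that a document matches if either one does.

```python
import json


def build_fuzzy_fallback_query(user_query, field, term):
    """Build a bool query that ORs the original query_string with a fuzzy query."""
    return {
        "query": {
            "bool": {
                "should": [
                    # The user's original query, boolean operators included.
                    {"query_string": {"query": user_query, "default_operator": "AND"}},
                    # Fuzzy match on a single term; fuzzy queries are not analyzed.
                    {"fuzzy": {field: {"value": term, "fuzziness": "AUTO"}}},
                ],
                # At least one of the two clauses has to match.
                "minimum_should_match": 1,
            }
        }
    }


body = build_fuzzy_fallback_query("stackoverfolw", "my_tags_field", "stackoverfolw")
request_json = json.dumps(body)
```

`request_json` is what you would pass as the -d payload of your existing _search request. For multi-word input you would probably need one fuzzy clause per term, since the fuzzy query works on a single term.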
