Elastic Search pipeline search queries - elasticsearch

I am looking for a way to pipeline multiple queries into Elastic search. My main problem is that when I receive the results I want to be able to know the which was the query that generated the result. In pseudo-code I would like to do something like following
query1="James Bond"
query2="Sean Connery"
query3="Charlie Chaplin"
pipeline=new ElasticSearchPipeline()
pipeline.add(query1);pipeline.add(query2);pipeline.add(query3)
pipeline.execute()
jamesBondResults=pipeline.getResultsForQuery(query1)
seanConneryResults=pipeline.getResultsForQuery(query2)
charleChaplinResults=pipeline.getResultsForQuery(query3)
The key feature is that I want to send avoid the overhead of sending multiple requests on the ES server, but still be able to treat the results as if I had sent the queries one by one.

The multi search API is exactly what you're looking for.
You can send many queries and the response will contain an array with the responses to each query in the same order:
curl -XPOST localhost:9200/_msearch -d '
{"index" : "test1"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"index" : "test2",}
{"query" : {"match_all" : {}}}
'
The response array of the above multi search queries will contain two ES responses with the documents from the first and second queries.

Related

Searching multiple types in elasticsearch

I have a usecase where there are two different types in the same index. Both the types have different structure and mapping.
I need to query both types at the same time using different query DSL.
How can I build my query DSL to simultaneously query more than one type of the same index.
I looked into elasticsearch guide at https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-index-multi-type.html but there is no proper explanation here. According to this even if I set two different types in my request :
/index/type1,type2/_search
I will have to send the same query DSL.
You need to use multi-search API and the _msearch endpoint
curl -XGET localhost:9200/index/_msearch -d '
{"type": "type1"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"type": "type2"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
'
Note: make sure to separate each line by newlines (including the last line)
You'll get two responses in the same order as the requests

Elasticsearch: is bulk search possible?

i know there is support for bulk index operation. but is it possible to do the same for search queries? i want to send many different unrelated queries (to do precision/recall testing) and it would probably be faster using bulk query
Yes, you can use the multi search API and the /_msearch endpoint to send as many queries as you wish in one shot.
curl -XPOST localhost:9200/_msearch -d '
{"index" : "test1"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"index" : "test2"}
{"query" : {"match_all" : {}}}
'
You'll get a responses array with the response of each query in the same order as in the request.
Note:
make sure to separate each line by a newline character
make sure to add the extra newline after the last query.

How to view the response for multiple indices for a single query

I have created multiple indices in elasticsearch and have passed a single query to all of them. Is there any way to know,how many results came from each index?
Here is the screenshot of my elasticsearch head,showing a single aggregation applied to two indices
screenshot:
Here as in the figure you can see I have done an aggregation named "posted_time" on the indices foodfind and comics (red box 1).
But in the response window,to the right,only the results for the index "comics" is shown. How can I see the results for the other index too?
You can use terms aggregation on the field _index for this.
Lets say you need to run the same on index-a , index-b and index-c.
You need to make the request in this pattern -
curl -XPOST 'http://localhost:9200/index-a,index-b,index-c/_search' -d '{
"aggs" : {
"indexStats" : {
"terms" : {
"field" : "_index"
}
}
}
}'

Queries vs. Filters

I can't see any description of when I should use a query or a filter or some combination of the two. What is the difference between them? Can anyone please explain?
The difference is simple: filters are cached and don't influence the score, therefore faster than queries. Have a look here too. Let's say a query is usually something that the users type and pretty much unpredictable, while filters help users narrowing down the search results , for example using facets.
This is what official documentation says:
As a general rule, filters should be used instead of queries:
for binary yes/no searches
for queries on exact values
As a general rule, queries should be used instead of filters:
for full text search
where the result depends on a relevance score
An example (try it yourself)
Say index myindex contains three documents:
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hello world!" }'
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hello world! I am Sam." }'
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hi Stack Overflow!" }'
Query: How well a document matches the query
Query hello sam (using keyword must)
curl localhost:9200/myindex/_search?pretty -d '
{
"query": { "bool": { "must": { "match": { "msg": "hello sam" }}}}
}'
Document "Hello world! I am Sam." is assigned a higher score than "Hello world!", because the former matches both words in the query. Documents are scored.
"hits" : [
...
"_score" : 0.74487394,
"_source" : {
"name" : "Hello world! I am Sam."
}
...
"_score" : 0.22108285,
"_source" : {
"name" : "Hello world!"
}
...
Filter: Whether a document matches the query
Filter hello sam (using keyword filter)
curl localhost:9200/myindex/_search?pretty -d '
{
"query": { "bool": { "filter": { "match": { "msg": "hello sam" }}}}
}'
Documents that contain either hello or sam are returned. Documents are NOT scored.
"hits" : [
...
"_score" : 0.0,
"_source" : {
"name" : "Hello world!"
}
...
"_score" : 0.0,
"_source" : {
"name" : "Hello world! I am Sam."
}
...
Unless you need full text search or scoring, filters are preferred because frequently used filters will be cached automatically by Elasticsearch, to speed up performance. See Elasticsearch: Query and filter context.
Filters -> Does this document match? a binary yes or no answer
Queries -> Does this document match? How well does it match? uses scoring
Few more addition to the same.
A filter is applied first and then the query is processed over its results. To store the binary true/false match per document , something called a bitSet Array is used.
This BitSet array is in memory and this would be used from second time the filter is queried. This way , using bitset array data-structure , we are able to utilize the cached result.
One more point to note here , the filter cache is created only when the request is executed hence only from the second hit , we actually get the advantage of caching.
But then you can use warmer API , to outgrow this. When you register a query with filter against a warmer API , it will make sure that this is executed against a new segment whenever it comes live. Hence we will get consistent speed from the first execution itself.
Basically, a query is used when you want to perform a search on your documents with scoring.
And filters are used to narrow down the set of results obtained by using query. Filters are boolean.
For example say you have an index of restaurants something like zomato.
Now you want to search for restaurants that serve 'pizza', which is basically your search keyword.
So you will use query to find all the documents containing "pizza" and some results will obtained.
Say now you want list of restaurant that serves pizza and has rating of atleast 4.0.
So what you will have to do is use the keyword "pizza" in your query and apply the filter for rating as 4.0.
What happens is that filters are usually applied on the results obtained by querying your index.
Since version 2 of Elasticsearch, filters and queries have been merged and any query clause can be used as either a filter or a query (depending on the context). As with version 1, filters are cached and should be used if scoring does not matter.
Source: https://logz.io/blog/elasticsearch-queries/
Queries : calculate score; thus they’re able to return results sorted by relevance.
Filters : don’t calculate score, making them faster and easier to cache.

how to include doc urls in result set

In ElasticSearch, I am wondering how I can get back document urls as well in the search result set? Here is what I meant with some example.
Let's say I index a doc using the following curl command:
curl -XPUT 'http://localhost:9200/ads/offers/1234' -d '{
"name": "blah blah",
"Weight":0.0001,
...
}'
Then I run a search and I want to get the document URL itself in the result set. In the above case, the document URL is the following:
http://localhost:9200/ads/offers/1234.
How can I do that? Is there a special field name for this or do I have to create some kind field to store this explicitly?
elasticsearch search response contains all piece that are needed to build this URL on the client. The record for the URL in you example will look like this:
"hits" : [ {
"_index" : "ads",
"_type" : "offers",
"_id" : "1234",
...
If you really need to get this URL from elasticsearch you can use script field to combine these pieces together into a single field on the server side, although I cannot think of a legitimate scenario where it would be needed.

Resources