Can ElasticSearch perform multiple aggregations with different query conditions in a single request? - elasticsearch

I am looking for a way to get one aggregation per field, with a different query condition applied to each aggregation.
I have a collection of products, each with the attributes type, color, and brand.
The user has selected brand=Gap, color=White, and type=Sandal. To display the counts of similar products, each aggregation should ignore the selection on its own field:
Query condition for brand aggregation: color=White and type=Sandal
Query condition for color aggregation: brand=Gap and type=Sandal
Query condition for type aggregation: brand=Gap and color=White
Can this be done in a single ElasticSearch query?

You'd create three aggregations with a filter agg for each and add the queries you'd like in there. I used the simplest one - bool with term - just to show the high level approach:
"aggs": {
"brand_agg": {
"filter": {
"bool": {
"must": [
{
"term": {
"color": "white"
}
},
{
"term": {
"type": "sandal"
}
}
]
}
}
},
"color_agg": {
"filter": {
"bool": {
"must": [
{
"term": {
"brand": "gap"
}
},
{
"term": {
"type": "sandal"
}
}
]
}
}
},
"type_agg": {
"filter": {
"bool": {
"must": [
{
"term": {
"color": "white"
}
},
{
"term": {
"brand": "gap"
}
}
]
}
}
}
}
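The same pattern can be generated from the user's selections rather than written by hand. Below is a minimal Python sketch that only builds the request body as a dict. The helper name `build_facet_aggs` and the nested `values` terms aggregation are assumptions added for illustration (the answer above shows only the filter part). The nested terms aggregation is one way to get a count per value inside each filtered set.

```python
def build_facet_aggs(selected):
    """Build one filter aggregation per field, filtered by the OTHER selections."""
    aggs = {}
    for field in selected:
        # Conditions from every selected field except the one being aggregated.
        others = [{"term": {f: v}} for f, v in selected.items() if f != field]
        aggs[field + "_agg"] = {
            "filter": {"bool": {"must": others}},
            # Assumption: a nested terms aggregation to count each value of this field.
            "aggs": {"values": {"terms": {"field": field}}},
        }
    return aggs


selected = {"brand": "gap", "color": "white", "type": "sandal"}
aggs = build_facet_aggs(selected)
```

The resulting dict can be sent as the "aggs" part of a search body. Each aggregation skips the condition on its own field, so the brand counts stay visible for brands other than the selected one.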

Related

ElasticSearch multimatch substring search

I have to combine two filters to match requirements:
- a specific list of values in r.status field
- one of the multiple text fields contains the value.
The resulting query (built with NEST, though that doesn't matter) looks like:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"bool": {
"should": [
{
"match": {
"r.g.firstName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
},
{
"match": {
"r.g.lastName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
}
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
Also tried with multi_match query:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"multi_match": {
"query": "SUBSTRING_VALUE",
"fields": [
"r.g.firstName",
"r.g.lastName"
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
FirstName and LastName are configured in index mappings as text:
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
}
Elasticsearch offers many full-text search options: multi_match, phrase, wildcards, etc. But all of them fail in my case when looking for a substring in my text fields. (The terms query and the isActive one work well; I checked by running only those.)
What other options do I have, or where did I make a mistake?
UPD: Combined wildcards worked for me, but such a query looks ugly. I'm looking for a more elegant solution.
The Elasticsearch way is to use an ngram tokenizer.
The ngram analyzer splits your terms with a sliding window. For example, the input "Hello World" generates the following terms:
Hel
Hell
Hello
ell
ello
...
Wor
World
orl
...
You can configure the minimum and maximum size of the sliding window (in the example the minimum size is 3). Once the sub-terms are generated, you can use a match query on the subfield.
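To make the sliding window concrete, here is a small pure-Python imitation of it. It is not Elasticsearch's implementation, only a sketch of the idea. It assumes that words are split on whitespace and that the maximum window size is 5 (the text above only states the minimum of 3). The function name `word_ngrams` is made up for this example.

```python
def word_ngrams(text, min_gram=3, max_gram=5):
    """Return the n-grams of each whitespace-separated word, window by window."""
    grams = []
    for word in text.split():
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                end = start + size
                if end > len(word):
                    break  # the window no longer fits inside the word
                grams.append(word[start:end])
    return grams


print(word_ngrams("Hello World"))
```

Because every substring within the size limits becomes its own term at index time, a plain match on the ngram subfield can find "ell" inside "Hello" without wildcards.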
Another point: it is odd to use must within a filter. If you are interested in the score, use must; otherwise use filter. Read this article for a good understanding.

Terrible has_child query performance

The following query has terrible performance.
I'm 100% sure the has_child clauses are the cause: without them the query runs in under 300 ms; with them it takes 9 seconds.
Is there a better way to use the has_child query? It seems like I could query the parents, then the children by id, and join client side to do the has_child check faster than the ES engine does it...
{
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"has_child": {
"type": "status",
"query": {
"term": {
"stage": "s3"
}
}
}
},
{
"has_child": {
"type": "status",
"query": {
"term": {
"stage": "es"
}
}
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"source": "IntegrationTest-2016-03-01T23:31:15.023Z"
}
},
{
"range": {
"eventTimestamp": {
"from": "2016-03-01T20:28:15.028Z",
"to": "2016-03-01T23:33:15.028Z"
}
}
}
]
}
}
}
},
"aggs": {
"digests": {
"terms": {
"field": "digest",
"size": 0
}
}
},
"size": 0
}
Cluster info:
CPU and memory usage are low. It is an AWS ES Service cluster (v1.5.2) with many small documents. Since the version AWS runs is old, doc values aren't on by default. I'm not sure whether that is helping or hurting.
Since "stage" is not analyzed (based on your comment) and you are therefore not interested in scoring the documents that match on that field, you might see slight performance gains by using the has_child filter instead of the has_child query, and a term filter instead of a term query.
In the documentation for has_child, you'll notice:
The has_child filter also accepts a filter instead of a query:
The main performance benefits of using a filter come from the fact that Elasticsearch can skip the scoring phase of the query. Also, filters can be cached which should improve the performance of future searches that use the same filters. Queries, on the other hand, cannot be cached.
Try this instead:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"source": "IntegrationTest-2016-03-01T23:31:15.023Z"
}
},
{
"range": {
"eventTimestamp": {
"from": "2016-03-01T20:28:15.028Z",
"to": "2016-03-01T23:33:15.028Z"
}
}
},
{
"has_child": {
"type": "status",
"filter": {
"term": {
"stage": "s3"
}
}
}
},
{
"has_child": {
"type": "status",
"filter": {
"term": {
"stage": "es"
}
}
}
}
]
}
}
}
},
"aggs": {
"digests": {
"terms": {
"field": "digest",
"size": 0
}
}
},
"size": 0
}
I bit the bullet and just performed the parent:child join in my application. Instead of waiting 7 seconds for the has_child query, I fire off two consecutive term queries and do some post processing: 200ms.
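The post-processing step is not shown above, so the following is only a sketch of what such an application-side join could look like. It assumes the child documents have already been fetched and that each one exposes its parent's id and its stage. The field name `parent_id`, the hit layout, and the function name are hypothetical.

```python
def parents_with_all_stages(child_hits, required_stages):
    """Return the ids of parents that have at least one child in every required stage."""
    stages_by_parent = {}
    for hit in child_hits:
        # Collect the set of stages seen for each parent.
        stages_by_parent.setdefault(hit["parent_id"], set()).add(hit["stage"])
    required = set(required_stages)
    # Keep parents whose children cover all required stages.
    return {pid for pid, stages in stages_by_parent.items() if required <= stages}


# Hypothetical child hits, as they might come back from a term query on "stage".
child_hits = [
    {"parent_id": "p1", "stage": "s3"},
    {"parent_id": "p1", "stage": "es"},
    {"parent_id": "p2", "stage": "s3"},
]
matching = parents_with_all_stages(child_hits, ["s3", "es"])
```

This mirrors the two has_child clauses in the original query: a parent qualifies only if it has a child in stage "s3" and a child in stage "es".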

Filter with match_all VS query

I have two queries. They are logically identical, but I'm not sure whether there is any performance difference between them.
I will be glad if someone can enlighten me.
Using a term query with a range filter:
{
"query": {
"filtered": {
"query": {
"term": {
"user_id": "1234567"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"ephoc_date": {
"lt": 1437033590,
"gte": 1437026390
}
}
}
]
}
}
}
}
}
Using match_all with term and range filters:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"user_id": "1234567"
}
},
{
"range": {
"ephoc_date": {
"lt": 1437033590,
"gte": 1437026390
}
}
}
]
}
}
}
}
}
Looking at your query, it seems you don't care how documents are scored based on the user_id field being "1234567". In other words, if more than one document has user_id set to "1234567", you don't care about their order in the result. If that is the case, the 2nd option is better for performance: the 1st query incurs some computation cost for scoring, while the 2nd does no scoring. By the way, your 2nd query can also be simplified to the following:
{
"filter": {
"bool": {
"must": [
{
"term": {
"user_id": "1234567"
}
},
{
"range": {
"ephoc_date": {
"lt": 1437033590,
"gte": 1437026390
}
}
}
]
}
}
}
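One caveat for readers on newer clusters: the filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0. There, the usual non-scoring form is a bool query with a filter clause. A minimal Python sketch that builds such a body follows. The function name is an assumption, and the field names are taken from the question.

```python
def non_scoring_query(user_id, start, end):
    """Build a search body whose clauses all run in filter (non-scoring) context."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"user_id": user_id}},
                    {"range": {"ephoc_date": {"gte": start, "lt": end}}},
                ]
            }
        }
    }


query = non_scoring_query("1234567", 1437026390, 1437033590)
```

As with the second option in the question, nothing here contributes to the score, so no scoring work is done for the term condition.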

How to do nested AND and OR filters in ElasticSearch?

My filters are grouped together into categories.
I would like to retrieve documents that match any filter within a category; if two (or more) categories are set, the document must match at least one filter in every category.
If written in pseudo-SQL it would be:
SELECT * FROM Documents WHERE (CategoryA = 'A') AND (CategoryB = 'B' OR CategoryB = 'C')
I've tried Nested filters like so:
{
"sort": [{
"orderDate": "desc"
}],
"size": 25,
"query": {
"match_all": {}
},
"filter": {
"and": [{
"nested": {
"path":"hits._source",
"filter": {
"or": [{
"term": {
"progress": "incomplete"
}
}, {
"term": {
"progress": "completed"
}
}]
}
}
}, {
"nested": {
"path":"hits._source",
"filter": {
"or": [{
"term": {
"paid": "yes"
}
}, {
"term": {
"paid": "no"
}
}]
}
}
}]
}
}
But evidently I don't quite understand the ES syntax. Is this on the right track or do I need to use another filter?
This should be it (translated from the given pseudo-SQL):
{
"sort": [
{
"orderDate": "desc"
}
],
"size": 25,
"query":
{
"filtered":
{
"filter":
{
"and":
[
{ "term": { "CategoryA":"A" } },
{
"or":
[
{ "term": { "CategoryB":"B" } },
{ "term": { "CategoryB":"C" } }
]
}
]
}
}
}
}
I realize you're not mentioning facets, but just for the sake of completeness:
You could also use a top-level filter as the basis (like you did) instead of a filtered query (like I did). The resulting JSON is almost identical, with this difference:
a filtered query filters both the main results and the facets
a filter only filters the main results, NOT the facets.
Lastly, nested filters (which you tried using) don't refer to 'nesting filters' as you seemed to believe; they filter on nested documents (objects mapped with the nested type).
Although I have not completely understood your structure, this might be what you need.
You have to think tree-wise. You create a bool whose must clauses (= AND) are the embedded bools. Each embedded bool uses should (= OR) to check that either the field does not exist or the field has one of the values in the list (terms).
I'm not sure whether there is a better way, and I don't know how it performs.
{
"sort": [
{
"orderDate": "desc"
}
],
"size": 25,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"not": {
"exists": {
"field": "progress"
}
}
},
{
"terms": {
"progress": [
"incomplete",
"completed"
]
}
}
]
}
},
{
"bool": {
"should": [
{
"not": {
"exists": {
"field": "paid"
}
}
},
{
"terms": {
"paid": [
"yes",
"no"
]
}
}
]
}
}
]
}
}
}
}
}
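The "AND across categories, OR within a category" rule can also be generated from a plain mapping of selected values. The sketch below only builds the filter part as a Python dict, under the assumption that each category maps directly to one document field. The function name `categories_to_filter` is made up for illustration, and the field names come from the pseudo-SQL in the question.

```python
def categories_to_filter(categories):
    """AND across categories, OR across the values selected within each category."""
    return {
        "bool": {
            "must": [
                # A terms clause matches documents holding ANY of the listed values.
                {"terms": {field: list(values)}}
                for field, values in categories.items()
                if values  # a category with nothing selected adds no constraint
            ]
        }
    }


selected = {"CategoryA": ["A"], "CategoryB": ["B", "C"], "CategoryC": []}
filter_body = categories_to_filter(selected)
```

For the selection above this corresponds to (CategoryA = 'A') AND (CategoryB = 'B' OR CategoryB = 'C'). The empty CategoryC is skipped.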

NOT condition in elasticsearch

I am trying to implement a NOT condition in an Elasticsearch query.
Can I put the filter inside the bool, or do I need to write a separate filter as below? Is there a better solution?
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "fashion"
}
},
{
"term": {
"post_status": "publish"
}
}
]
}
},
"filter": {
"not": {
"filter": {
"term": {
"post_type": "page"
}
}
}
}
}
You can use a must_not clause:
{
"query": {
"bool": {
"must": [
{
"match": {
"_all": "fashion"
}
},
{
"term": {
"post_status": "publish"
}
}
],
"must_not": {
"term": {
"post_type": "page"
}
}
}
}
}
Also, I'd recommend using a match query instead of query_string, as query_string requires the much stricter Lucene syntax (and is therefore more error-prone), whereas match works more like a search box: it automatically transforms a human-readable query into a Lucene query.
