How can I aggregate filtered nested documents in ElasticSearch?

Suppose I have an index with nested documents that look like this:
{
  "id" : 1234,
  "cars" : [
    {
      "id" : 987,
      "name" : "Volkswagen"
    },
    {
      "id" : 988,
      "name" : "Tesla"
    }
  ]
}
I now want a count aggregation of "car" documents that match certain criteria, e.g. a search query. My initial attempt was the following query:
{
  "query" : {
    "nested" : {
      "path" : "cars",
      "query" : {
        "query_string" : {
          "fields" : ["cars.name"],
          "query" : "Tes*"
        }
      }
    }
  },
  "aggregations" : {
    "cars" : {
      "nested" : {
        "path" : "cars"
      },
      "aggs" : {
        "cars" : {
          "terms" : {
            "field" : "cars.id"
          }
        }
      }
    }
  }
}
I was hoping to get an aggregation result containing only the ids of cars whose names begin with "Tes". Instead, the aggregation uses every car in any top-level document that contains at least one matching nested document. In the example above, "Volkswagen" would also be counted, because its top-level document also contains a car that does match.
How can I get an aggregation of just the matching nested documents?

In the meantime I've figured it out. To achieve this, wrap the terms aggregation in a filter aggregation, like so:
"aggregations" : {
  "cars" : {
    "nested" : {
      "path" : "cars"
    },
    "aggs" : {
      "cars-filter" : {
        "filter" : {
          "query" : {
            "query_string" : {
              "fields" : ["cars.name"],
              "query" : "Tes*"
            }
          }
        },
        "aggs" : {
          "cars" : {
            "terms" : {
              "field" : "cars.id"
            }
          }
        }
      }
    }
  }
}
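The following small pure-Python model shows why the filter aggregation is needed. It only illustrates the behaviour described above and does not show how Elasticsearch actually executes the request. The prefix check stands in for the query_string match on "Tes*", which is assumed here to be a simple case-sensitive prefix test. The function names are hypothetical. The nested query decides which top-level documents are hits. The nested aggregation then sees every car inside those hits, unless a filter aggregation applies the criterion again to each nested car.

```python
from collections import Counter

# Toy data mirroring the example document above.
DOCS = [
    {"id": 1234, "cars": [{"id": 987, "name": "Volkswagen"},
                          {"id": 988, "name": "Tesla"}]},
]


def car_matches(car, prefix):
    # Stand-in for query_string "Tes*" (assumption: plain case-sensitive prefix test).
    return car["name"].startswith(prefix)


def nested_terms(docs, prefix, filter_cars):
    """Model of nested -> (optional filter) -> terms on cars.id."""
    # The nested query only selects which top-level documents are hits.
    hits = [d for d in docs if any(car_matches(c, prefix) for c in d["cars"])]
    # The nested aggregation iterates over every car of every hit...
    cars = [c for d in hits for c in d["cars"]]
    # ...unless a filter aggregation re-applies the criterion per nested car.
    if filter_cars:
        cars = [c for c in cars if car_matches(c, prefix)]
    return dict(Counter(c["id"] for c in cars))


print(nested_terms(DOCS, "Tes", filter_cars=False))  # {987: 1, 988: 1}
print(nested_terms(DOCS, "Tes", filter_cars=True))   # {988: 1}
```

The "query" wrapper inside the filter aggregation above reflects the older filter syntax. From Elasticsearch 2.x onward, queries and filters were merged, so the filter aggregation should accept the query_string clause directly, without that wrapper. Check the reference for the version you run.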

Related

ElasticSearch filter by multiple keywords

In ElasticSearch 6.8 I have indexed many documents that contain a collection of tags. The tags are mapped as keyword:
"tags": {
  "type": "keyword"
},
When I run
"query" : {
  "bool" : {
    "must" : { "match" : { "name" : "beach" } },
    "filter" : {
      "terms" : { "tags" : ["games", "cars"] }
    }
  }
}
I get documents that contain at least one of those tags. However, I want to filter out every document that does not contain ALL the given tags.
I tried
"query" : {
  "bool" : {
    "must" : { "match" : { "name" : "beach" } },
    "filter" : {
      "terms" : {
        "tags" : ["games", "cars"],
        "minimum_should_match" : 2
      }
    }
  }
}
But it throws an error: "[terms] query does not support [minimum_should_match]"
What is the correct way to filter out documents that do not contain both tags? Note that the real query may also contain other "should" clauses.
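The thread ends without an answer. The sketch below shows one common approach that is not taken from the thread: put one term clause per tag in the bool "filter" array. Every clause in "filter" must match, so the combination behaves as an AND over the tags. Elasticsearch 6.x also provides a terms_set query, whose minimum_should_match_field or minimum_should_match_script settings can express "all of these terms". The helper names below are hypothetical. The code builds the request body as a Python dict and separately models the AND semantics on local data.

```python
import json


def all_tags_query(name, tags):
    """Build a bool query whose filter requires every tag (one term clause per tag)."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"name": name}},
                # Every entry in the filter array must match, giving an AND over tags.
                "filter": [{"term": {"tags": tag}} for tag in tags],
            }
        }
    }


def has_all_tags(doc_tags, required):
    # Local model of the filter: keep a document only if it carries every required tag.
    return set(required).issubset(doc_tags)


body = all_tags_query("beach", ["games", "cars"])
print(json.dumps(body, indent=2))
print(has_all_tags(["games", "cars", "sun"], ["games", "cars"]))  # True
print(has_all_tags(["games"], ["games", "cars"]))                 # False
```

Extra "should" clauses can sit alongside this filter array. Because the bool query already has must/filter clauses, those should clauses are optional and only influence scoring, unless minimum_should_match is set on the bool query itself.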

ElasticSearch: Get all elements where a parameter is not unique

I know there is an aggregation that counts all unique values of a field.
For example:
{
  "query" : {
    "match_all" : {}
  },
  "aggs" : {
    "type_count" : {
      "cardinality" : {
        "field" : "name"
      }
    }
  },
  "size" : 0
}
This query gives me the count of all the unique names.
What I want instead is the list of all names that appear in the index more than once.
I want all the non-unique names.
What is the best way to achieve that?
You can use the terms aggregation with a min_doc_count of 2, like this:
{
  "query" : {
    "match_all" : {}
  },
  "aggs" : {
    "type_count" : {
      "terms" : {
        "field" : "name",
        "min_doc_count" : 2
      }
    }
  },
  "size" : 0
}
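The effect of min_doc_count can be modelled locally: count documents per name and keep only buckets with at least two documents. One caveat is missing from the answer. The terms aggregation still returns only the top "size" buckets, which is 10 by default. To list every duplicated name, you would need to raise "size". The "name" field should also be a keyword (non-analyzed) field, so that buckets hold whole names. The code below is a pure-Python illustration with hypothetical sample data, not Elasticsearch itself.

```python
from collections import Counter


def non_unique(values, min_doc_count=2, size=10):
    """Model of a terms aggregation with min_doc_count: names seen at least min_doc_count times."""
    counts = Counter(values)
    # Buckets ordered by doc count descending, dropped below min_doc_count,
    # then truncated to `size` like the terms aggregation's default behaviour.
    buckets = [(name, n) for name, n in counts.most_common() if n >= min_doc_count]
    return buckets[:size]


names = ["alice", "bob", "alice", "carol", "bob", "alice"]
print(non_unique(names))  # [('alice', 3), ('bob', 2)]
```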

Multiple filter aggregations vs Returning all buckets

I'm interested in sub-aggregations, but only for specific keyword values:
"aggregations" : {
  "Keyword" : {
    "terms" : {
      "field" : "keyword"
    },
    "aggregations" : {
      "Concept" : {
        "terms" : {
          "field" : "concept"
        }
      }
    }
  }
}
This returns only the top 10 buckets, which do not necessarily contain the values I'm interested in.
I see two main ways of solving my issue:
1. Return all the buckets and then select the ones I'm interested in.
2. Add a filter aggregation for each value I'm interested in. So if I'm interested in 10 keyword values, I will perform 10 filter aggregations.
Which solution is better in terms of performance?
"So if I'm interested in 10 keyword values, I will perform 10 filter aggregations."
This isn't necessarily true.
You can create a single filter that eliminates unwanted keywords and apply it up front, at the query stage:
{
  "size" : 0,
  "query" : {
    "bool" : {
      "filter" : [
        {
          "terms" : {
            "keyword" : [ "abc", "def", "ghi" ]
          }
        }
      ]
    }
  },
  "aggs" : {
    "Keyword" : {
      "terms" : {
        "field" : "keyword"
      },
      "aggs" : {
        "Concept" : {
          "terms" : {
            "field" : "concept"
          }
        }
      }
    }
  }
}
The query stage would filter it down to just abc, def, and ghi. Then the aggregation would work as you expect, but only against documents with those values.
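This answer needs one caveat. The query-stage filter selects documents, not values. If "keyword" is multi-valued, a matching document can still contribute other keywords as buckets. In versions where the terms aggregation's "include" parameter accepts an array of exact values, such as "include" : ["abc", "def", "ghi"], that parameter restricts the buckets themselves. The pure-Python model below uses hypothetical sample data and is not Elasticsearch itself. It shows the difference between the two approaches.

```python
from collections import Counter

WANTED = {"abc", "def", "ghi"}

# Hypothetical documents; the second one has a multi-valued keyword field.
DOCS = [
    {"keyword": ["abc"]},
    {"keyword": ["def", "xyz"]},
    {"keyword": ["xyz"]},
]


def keyword_buckets(docs, wanted, use_include):
    # Query stage: the terms filter keeps docs having at least one wanted keyword.
    hits = [d for d in docs if wanted.intersection(d["keyword"])]
    counts = Counter()
    for d in hits:
        for kw in d["keyword"]:
            # Model of "include": only wanted values may become buckets.
            if use_include and kw not in wanted:
                continue
            counts[kw] += 1
    return dict(counts)


print(keyword_buckets(DOCS, WANTED, use_include=False))  # {'abc': 1, 'def': 1, 'xyz': 1}
print(keyword_buckets(DOCS, WANTED, use_include=True))   # {'abc': 1, 'def': 1}
```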

Error: ClassCastException in Elasticsearch while sorting buckets in an aggregation

Error:
ClassCastException[org.elasticsearch.search.aggregations.support.ValuesSource$Bytes$WithOrdinals$FieldData cannot be cast to org.elasticsearch.search.aggregations.support.ValuesSource$Numeric]}{[vTHdFzpuTEGMGR8MES_b9g]
My query:
GET _search
{
  "size" : 0,
  "query" : {
    "filtered" : {
      "query" : {
        "dis_max" : {
          "tie_breaker" : 0.7,
          "queries" : [ {
            "bool" : {
              "should" : [ {
                "match" : {
                  "post.body" : {
                    "query" : "check",
                    "type" : "boolean"
                  }
                }
              }, {
                "match" : {
                  "post.parentBody" : {
                    "query" : "check",
                    "type" : "boolean",
                    "boost" : 2.0
                  }
                }
              } ]
            }
          } ]
        }
      }
    }
  },
  "aggregations" : {
    "by_parent_id" : {
      "terms" : {
        "field" : "post.parentId",
        "order" : {
          "max_score" : "desc"
        }
      },
      "aggregations" : {
        "max_score" : {
          "max" : {}
        },
        "top_post" : {
          "top_hits" : {
            "size" : 1
          }
        }
      }
    }
  }
}
I want to sort buckets by max_score rather than by doc_count, which is Elasticsearch's default ordering.
I am trying to aggregate posts (each containing body and parentBody) by parentId, sort the buckets by max_score, and get the top_hits in each bucket. However, I get the above error as soon as I sort the buckets by a max_score aggregation. Everything else works if I remove the max_score aggregation. Every post object has parentId, body and parentBody. I used the following references when writing this:
Elasticsearch Aggregation: How to Sort Bucket Order
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html#_field_collapse_example
Can you tell me what I am doing wrong? I have shared the query above.
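The thread shows no answer, so the following is only a likely cause. The max_score aggregation is declared as "max" : {}, with neither a "field" nor a "script". It therefore has no numeric value source of its own. The ValuesSource$Bytes ... cannot be cast to ValuesSource$Numeric error suggests that it ended up with the string values of post.parentId instead. The field-collapse example in the top_hits documentation linked above orders buckets by a max aggregation computed from a "_score" script. The hypothetical helper below builds that aggregation section as a Python dict. It defaults to the script form used in current documentation, {"source": "_score"}. Older releases write scripts as plain strings, so the accepted syntax depends on your version.

```python
import json


def collapse_by_parent(field="post.parentId", script=None):
    """Build a terms aggregation whose buckets are ordered by their best score."""
    if script is None:
        # Current-docs script form; older releases use a plain string (assumption: check your version).
        script = {"source": "_score"}
    return {
        "by_parent_id": {
            "terms": {
                "field": field,
                "order": {"max_score": "desc"},
            },
            "aggregations": {
                # Give max an explicit value source instead of an empty body.
                "max_score": {"max": {"script": script}},
                "top_post": {"top_hits": {"size": 1}},
            },
        }
    }


print(json.dumps({"size": 0, "aggregations": collapse_by_parent()}, indent=2))
```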

Elasticsearch: [filtered] query does not support [highlight]

I am new to Elasticsearch. I have a filtered query as follows
{
  "query" : {
    "filtered" : {
      "query" : {
        "term" : {
          "title" : "crime"
        }
      },
      "highlight" : {
        "fields" : {
          "title" : {}
        }
      },
      "filter" : {
        "term" : { "year" : 1961 }
      }
    }
  }
}
When I tried this query, I got the error:
[filtered] query does not support [highlight]
Does filtered query support highlight? If not, how can I achieve highlight in query with filters? I have to use filters.
Thanks and regards!
The "highlight" parameter should go at the same level as the "query" parameter, not embedded within it. In your case it should look something like this:
{
  "query" : {
    "filtered" : {
      "query" : {
        "term" : {
          "title" : "crime"
        }
      },
      "filter" : {
        "term" : { "year" : 1961 }
      }
    }
  },
  "highlight" : {
    "fields" : {
      "title" : {}
    }
  }
}
Highlighting reference
Highlights problems with a filtered query
