Elasticsearch: Get all elements where a parameter is not unique

I know there is an aggregation to get the count of all unique values for a field.
For example:
{
"query" : {
"match_all" : {}
},
"aggs" : {
"type_count" : {
"cardinality" : {
"field" : "name"
}
}
},
"size":0
}
With this query I get the count of all the unique names.
But what I want is the list of all the names that appear in the index more than once, i.e. all the non-unique names.
What is the best way to achieve that?

You can use the terms aggregation with a min_doc_count of 2, like this:
{
"query" : {
"match_all" : {}
},
"aggs" : {
"type_count" : {
"terms" : {
"field" : "name",
"min_doc_count": 2
}
}
},
"size":0
}
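One thing to keep in mind is that a terms aggregation returns only the 10 largest buckets unless `size` is set, so a long list of duplicates may be cut short. Below is a minimal Python sketch that builds the request body and reads the duplicate names back out of the response. The helper names, the `size` of 1000 and the sample response are illustrative assumptions, not values from the question.

```python
def duplicate_names_query(field="name", max_buckets=1000):
    """Build a search body that keeps only values of `field` seen at least twice."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "query": {"match_all": {}},
        "aggs": {
            "type_count": {
                "terms": {
                    "field": field,
                    "min_doc_count": 2,
                    "size": max_buckets,  # assumption: raise the default of 10
                }
            }
        },
    }


def duplicate_names(response):
    """Extract the non-unique values from a terms-aggregation response."""
    buckets = response["aggregations"]["type_count"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Hypothetical response, shaped like a terms-aggregation result.
sample_response = {
    "aggregations": {
        "type_count": {
            "buckets": [
                {"key": "alice", "doc_count": 3},
                {"key": "bob", "doc_count": 2},
            ]
        }
    }
}
print(duplicate_names(sample_response))
```

The body returned by `duplicate_names_query()` can be sent with whatever client is already in use.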

Related

Multiple filter aggregations vs Returning all buckets

I'm interested in having sub-aggregations, but only for specific keyword values.
"aggregations" : {
"Keyword" : {
"terms" : {
"field" : "keyword"
},
"aggregations" : {
"Concept" : {
"terms" : {
"field" : "concept"
}
}
}
}
}
This query returns only the top 10 buckets, which do not necessarily contain the values I'm interested in.
I see two main ways of solving my issue:
1) Returning all the buckets and then selecting the ones I'm interested in.
2) Adding a filter aggregation for each value I'm interested in. So if I'm interested in 10 keyword values, I will perform 10 filter aggregations.
What is the best solution in terms of performance?
"So if I'm interested in 10 keyword values, I will perform 10 filter aggregations."
This isn't necessarily true.
You can create a single filter that eliminates the unwanted keywords, and you can do this upfront at the query stage:
{
"size" : 0,
"query" : {
"bool" : {
"filter" : [
{
"terms" : {
"keyword" : [ "abc", "def", "ghi" ]
}
}
]
}
},
"aggs" : {
"Keyword" : {
"terms" : {
"field" : "keyword"
},
"aggs" : {
"Concept" : {
"terms" : {
"field" : "concept"
}
}
}
}
}
}
The query stage would filter it down to just abc, def, and ghi. Then the aggregation would work as you expect, but only against documents with those values.
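As a rough sketch, the same request can be built for an arbitrary list of keywords in Python. The function name is hypothetical, and setting the outer terms `size` to the number of keywords is an added assumption, so that no wanted bucket falls outside the default top 10.

```python
def keyword_concept_query(keywords):
    """Filter to the wanted keywords first, then aggregate keyword -> concept."""
    keywords = list(keywords)
    return {
        "size": 0,
        "query": {
            "bool": {
                # Non-scoring filter: only documents with one of these keywords
                "filter": [{"terms": {"keyword": keywords}}]
            }
        },
        "aggs": {
            "Keyword": {
                # Assumption: ask for as many buckets as there are keywords
                "terms": {"field": "keyword", "size": max(len(keywords), 1)},
                "aggs": {"Concept": {"terms": {"field": "concept"}}},
            }
        },
    }


body = keyword_concept_query(["abc", "def", "ghi"])
print(body["aggs"]["Keyword"]["terms"]["size"])
```

If `keyword` is a multi-valued field, a matching document can still contribute buckets for its other values, so the result may need trimming on the client.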

Error: Class cast exception in Elasticsearch while sorting buckets in aggregation

Error:
ClassCastException[org.elasticsearch.search.aggregations.support.ValuesSource$Bytes$WithOrdinals$FieldData cannot be cast to org.elasticsearch.search.aggregations.support.ValuesSource$Numeric]}{[vTHdFzpuTEGMGR8MES_b9g]
My Query:
GET _search
{
"size" : 0,
"query" : {
"filtered" : {
"query" : {
"dis_max" : {
"tie_breaker" : 0.7,
"queries" : [ {
"bool" : {
"should" : [ {
"match" : {
"post.body" : {
"query" : "check",
"type" : "boolean"
}
}
}, {
"match" : {
"post.parentBody" : {
"query" : "check",
"type" : "boolean",
"boost" : 2.0
}
}
} ]
}
} ]
}
}
}
},
"aggregations" : {
"by_parent_id" : {
"terms" : {
"field" : "post.parentId",
"order" : {
"max_score" : "desc"
}
},
"aggregations" : {
"max_score" : {
"max" : {}
},
"top_post" : {
"top_hits" : {
"size" : 1
}
}
}
}
}
}
I want to sort buckets by max_score rather than by doc_count, which is the default behaviour of Elasticsearch.
I am trying to aggregate posts (which contain body and parentBody) by parentId, sort the buckets by max_score, and get the top_hits in each bucket. But I get the above error as soon as I sort the buckets by the max_score aggregation. Everything else works if I remove the max_score aggregation. Every post object has parentId, body and parentBody. I have used the following references for coding this:
Elasticsearch Aggregation: How to Sort Bucket Order
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html#_field_collapse_example
What am I doing wrong? I have shared the query above.
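One possible cause, offered as an assumption rather than a confirmed diagnosis: the field collapse example in the linked top_hits documentation gives its max aggregation the `_score` script as a value source, while the query above leaves `max` empty. A minimal Python sketch of the aggregation with that value source filled in follows. The function name is hypothetical, and the exact script syntax depends on the Elasticsearch version; the plain string form is the one used by older releases, and scripting may need to be enabled.

```python
def by_parent_id_aggregation():
    """Bucket posts by parent id and order buckets by their best hit score."""
    return {
        "by_parent_id": {
            "terms": {
                "field": "post.parentId",
                # Order buckets by the sub-aggregation named "max_score"
                "order": {"max_score": "desc"},
            },
            "aggregations": {
                # Assumption: use the relevance score as the value source
                "max_score": {"max": {"script": "_score"}},
                "top_post": {"top_hits": {"size": 1}},
            },
        }
    }


aggregation = by_parent_id_aggregation()
print(aggregation["by_parent_id"]["aggregations"]["max_score"])
```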

How to get all unique tags from 2 collections in Elasticsearch?

I have a set of tags stored in document.tags and document.fields.articleTags.
This is how I get all the tags from both namespaces, but how can I get the result merged into one array in the response from ES?
{
"query" : {
"match_all" : { }
},
"size": 0,
"aggs" : {
"tags" : {
"terms" : { "field" : "tags" }
},
"articleTags" : {
"terms" : { "field" : "articleTags" }
}
}
}
Result
I get the tags listed in articleTags.buckets and tags.buckets. Is it possible to have the result delivered in one bucket?
{
"aggregations": {
"articleTags": {
"buckets": [
{
"key": "halloween"
}
]
},
"tags": {
"buckets": [
{
"key": "news"
}
]
}
}
}
Yes, you can, by using a single terms aggregation with a script that "joins" the two arrays (i.e. adds them together). It goes like this:
{
"query" : {
"match_all" : { }
},
"size": 0,
"aggs" : {
"all_tags" : {
"terms" : { "script" : "doc.tags.values + doc.articleTags.values" }
}
}
}
Note that you need to make sure to enable dynamic scripting in order for this query to work.
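If dynamic scripting cannot be enabled, another option is to keep the original two-aggregation query and merge the bucket keys on the client. The Python sketch below assumes the response shape shown in the question; the function name is hypothetical. Since each terms aggregation returns only its top 10 buckets by default, `size` may need raising on both to see every tag.

```python
def merge_tag_keys(response, agg_names=("tags", "articleTags")):
    """Collect the unique bucket keys across several terms aggregations."""
    keys = set()
    for name in agg_names:
        for bucket in response["aggregations"][name]["buckets"]:
            keys.add(bucket["key"])
    return sorted(keys)


# Response trimmed to the fields shown in the question.
sample_response = {
    "aggregations": {
        "articleTags": {"buckets": [{"key": "halloween"}]},
        "tags": {"buckets": [{"key": "news"}]},
    }
}
print(merge_tag_keys(sample_response))  # ['halloween', 'news']
```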

How can I aggregate filtered nested documents in ElasticSearch?

Suppose I have an index with nested documents that looks like this:
{
"id" : 1234,
"cars" : [{
"id" : 987,
"name" : "Volkswagen"
}, {
"id": 988,
"name" : "Tesla"
}
]
}
I now want to get a count aggregation of "car" documents that match certain criteria, e.g. that match a search query. My initial attempt was the following query:
{
"query" : {
"nested" : {
"path" : "cars",
"query" : {
"query_string" : {
"fields" : ["cars.name"],
"query" : "Tes*"
}
}
}
},
"aggregations" : {
"cars" :{
"nested" : {
"path" : "cars"
},
"aggs" : {
"cars" : {
"terms" : {
"field" : "cars.id"
}
}
}
}
}
}
I was hoping to get an aggregation result with only the ids of cars whose name begins with "Tes". However, the aggregation instead uses all cars in any top-level document that contains a matching nested document. That is, in the above example "Volkswagen" would also be counted, because its top-level document also contains a car that does match.
How can I get an aggregation of just the matching nested documents?
In the meantime I've figured it out: to achieve this, a filter aggregation should be added around the terms aggregation, like so:
"aggregations" : {
"cars" :{
"nested" : {
"path" : "cars"
},
"aggs" : {
"cars-filter" : {
"filter" : {
"query" : {
"query_string" : {
"fields" : ["cars.name"],
"query" : "Tes*"
}
}
},
"aggs" : {
"cars" : {
"terms" : {
"field" : "cars.id"
}
}
}
}
}
}
}
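With this layout the wanted buckets sit three levels deep in the response, one level per aggregation. A small Python sketch of reading them out follows; the function name and the sample response are hypothetical, shaped after the aggregation names used above.

```python
def matching_car_buckets(response):
    """Walk nested -> filter -> terms and return the terms buckets."""
    nested = response["aggregations"]["cars"]   # nested aggregation
    filtered = nested["cars-filter"]            # filter aggregation
    return filtered["cars"]["buckets"]          # terms aggregation


# Hypothetical response for the example document with two cars.
sample_response = {
    "aggregations": {
        "cars": {
            "doc_count": 2,  # all nested car documents
            "cars-filter": {
                "doc_count": 1,  # only cars matching "Tes*"
                "cars": {"buckets": [{"key": 988, "doc_count": 1}]},
            },
        }
    }
}
print([bucket["key"] for bucket in matching_car_buckets(sample_response)])  # [988]
```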

Elasticsearch has_child query

We have a parent-child (one-to-many) relation in Elasticsearch, and we want to find all parent objects where the child object's attribute (child_attr) has a value.
We are generating JSON queries as below:
1) For the "has value" condition:
{
"has_child" : {
"query" : {
"filtered" : {
"query" : {
"match_all" : { }
},
"filter" : {
"and" : {
"filters" : [ {
"exists" : {
"field" : "child_attr"
}
}, {
"not" : {
"filter" : {
"term" : {
"child_attr" : ""
}
}
}
} ]
}
}
}
},
"type" : "child"
}
}
2) For the "has no value" condition:
{
"has_child" : {
"query" : {
"filtered" : {
"query" : {
"match_all" : { }
},
"filter" : {
"or" : {
"filters" : [ {
"missing" : {
"field" : "child_attr"
}
}, {
"term" : {
"child_attr" : ""
}
} ]
}
}
}
},
"type" : "child"
}
}
These queries return only those parent objects where either all child objects have some value or all child objects have no value for the searched attribute.
They don't return anything where the condition is met only partially, which covers the majority of the data.
I have also tried indexing this child attribute with the keyword analyzer, but no joy.
I look forward to your suggestions.
You are getting unexpected results because the query
"missing" : {
"field" : "child_attr"
}
matches both records that were indexed with an empty string in child_attr and records in which child_attr was missing.
The query
"exists" : {
"field" : "child_attr"
}
is the exact opposite of the first query: it matches all records that were indexed with a non-empty child_attr field.
The query
"term" : {
"child_attr" : ""
}
doesn't match anything.
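Taken together, this suggests the extra term clauses are redundant and each condition could be reduced to a single exists or missing filter. The Python sketch below builds both variants under that assumption. The function name is hypothetical, and it keeps the `filtered` query and `missing` filter from the question, which only exist in the older Elasticsearch versions the question targets.

```python
def has_child_value_query(field="child_attr", child_type="child", has_value=True):
    """Match parents with at least one child that has (or lacks) a value."""
    # Per the explanation above: exists covers non-empty values,
    # missing covers both absent fields and empty strings.
    filter_name = "exists" if has_value else "missing"
    return {
        "has_child": {
            "type": child_type,
            "query": {
                "filtered": {
                    "query": {"match_all": {}},
                    "filter": {filter_name: {"field": field}},
                }
            },
        }
    }


print(has_child_value_query(has_value=True))
print(has_child_value_query(has_value=False))
```

Note that has_child matches a parent as soon as one child matches, so the "no value" variant returns parents with at least one empty child, not parents whose children are all empty.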
