Elasticsearch: has_child query with children aggregation - bucket counts are wrong

I'm attempting to find parents based on matches in their children and retrieve children term aggregations for the matches. For some reason, the bucket counts for the children aggregation are higher than the actual number of results (I would be happy if each bucket showed the count of the parents, or of the children, in that particular children bucket).
The query is similar to the following (NOTE: I use the filtered query as I will later add a filter in addition to the query):
{
"query" : {
"filtered" : {
"query" : {
"has_child" : {
"type" : "blog_tag",
"query" : {
"filtered" : {
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
}
}
}
},
"aggs" : {
"my_children" : {
"children" : {
"type" : "my_child_type"
},
"aggs" : {
"field_name" : {
"terms" : {
"field" : "blog.blog_tag.field_name"
}
}
}
}
}
}
What is the correct way to do this?

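The problem was as noted in the comments. The children aggregation collects all children of the matched parents, not only the children that matched the has_child query, so the solution was to filter the aggregation with the same query: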
"query" : {
"filtered" : {
"query" : {
"has_child" : {
"type" : "blog_tag",
"query" : {
"filtered" : {
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
}
}
}
},
"aggs" : {
"my_children" : {
"children" : {
"type" : "my_child_type"
},
"aggs" : {
"results" : {
"filter" : {
"query" : {
"filtered" : {
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
},
"aggs" : {
"field_name" : {
"terms" : {
"field" : "blog.blog_tag.field_name"
}
}
}
}
}
}
}
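A minimal Python sketch of the same idea, so that the child query is written once and reused in both the has_child clause and the filter aggregation. The function name build_search and its parameters are hypothetical; the type and field names are copied from the query above, and the request body follows the same filtered-query syntax used there.

```python
import json


def build_search(tag,
                 has_child_type="blog_tag",
                 agg_child_type="my_child_type",
                 agg_field="blog.blog_tag.field_name"):
    """Build a search body whose children aggregation is restricted
    to the children that matched the has_child query."""
    # The child query is defined once and reused below.
    child_query = {"filtered": {"query": {"term": {"tag": tag}}}}
    return {
        "query": {
            "filtered": {
                "query": {
                    "has_child": {"type": has_child_type, "query": child_query}
                }
            }
        },
        "aggs": {
            "my_children": {
                "children": {"type": agg_child_type},
                "aggs": {
                    # Filter aggregation: keep only the matching children.
                    "results": {
                        "filter": {"query": child_query},
                        "aggs": {
                            "field_name": {"terms": {"field": agg_field}}
                        },
                    }
                },
            }
        },
    }


body = build_search("something")
print(json.dumps(body, indent=2))
```

Keeping a single child_query object avoids the two copies of the term query drifting apart when the query is changed later.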

Related

Error: ClassCastException in Elasticsearch while sorting buckets in aggregation

Error:
ClassCastException[org.elasticsearch.search.aggregations.support.ValuesSource$Bytes$WithOrdinals$FieldData cannot be cast to org.elasticsearch.search.aggregations.support.ValuesSource$Numeric]}{[vTHdFzpuTEGMGR8MES_b9g]
My Query:
GET _search
{
"size" : 0,
"query" : {
"filtered" : {
"query" : {
"dis_max" : {
"tie_breaker" : 0.7,
"queries" : [ {
"bool" : {
"should" : [ {
"match" : {
"post.body" : {
"query" : "check",
"type" : "boolean"
}
}
}, {
"match" : {
"post.parentBody" : {
"query" : "check",
"type" : "boolean",
"boost" : 2.0
}
}
} ]
}
} ]
}
}
}
},
"aggregations" : {
"by_parent_id" : {
"terms" : {
"field" : "post.parentId",
"order" : {
"max_score" : "desc"
}
},
"aggregations" : {
"max_score" : {
"max" : {}
},
"top_post" : {
"top_hits" : {
"size" : 1
}
}
}
}
}
}
I want to sort buckets by max_score rather than by doc_count, which is the default behaviour of Elasticsearch.
I am trying to aggregate posts (which contain body and parentBody) by parentId, sort the buckets by max_score, and get top_hits in each bucket. I get the above error when I sort the buckets by the max_score aggregation; everything else works if I remove the max_score aggregation. Every post object has parentId, body and parentBody. I used the following references when writing this:
Elasticsearch Aggregation: How to Sort Bucket Order
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html#_field_collapse_example
What am I doing wrong? I have shared the query above.

Elasticsearch: [filtered] query does not support [highlight]

I am new to Elasticsearch. I have a filtered query as follows
{
"query": {
"filtered" : {
"query" : {
"term" : {
"title" : "crime"
}
},
"highlight" : {
"fields" : {
"title" : {}
}
},
"filter" : {
"term" : { "year" : 1961 }
}
}
}
}
When I tried this query, I got the error:
[filtered] query does not support [highlight]
Does filtered query support highlight? If not, how can I achieve highlight in query with filters? I have to use filters.
Thanks and regards!
The "highlight" parameter should go at the same level as the "query" parameter, not embedded within it. In your case it should look something like this:
{
"query": {
"filtered" : {
"query" : {
"term" : {
"title" : "crime"
}
},
"filter" : {
"term" : { "year" : 1961 }
}
}
},
"highlight" : {
"fields" : {
"title" : {}
}
}
}
Highlighting reference
Highlights problems with a filtered query
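As a rough illustration of the placement described in the answer, here is a small Python sketch that builds the request body with "highlight" as a sibling of "query". The helper name build_highlighted_search and its parameters are hypothetical; the field names and values come from the question.

```python
import json


def build_highlighted_search(term_field, term_value, filter_field, filter_value):
    """Build a filtered search with highlighting at the top level."""
    return {
        "query": {
            "filtered": {
                "query": {"term": {term_field: term_value}},
                "filter": {"term": {filter_field: filter_value}},
            }
        },
        # "highlight" sits next to "query", not inside "filtered".
        "highlight": {"fields": {term_field: {}}},
    }


body = build_highlighted_search("title", "crime", "year", 1961)
print(json.dumps(body, indent=2))
```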

filtering on geo_distance in elasticsearch

I'm trying to set a maximum distance from my center location in my Elasticsearch query. There are no problems with the sorting part:
{
"query" : {
"match_all" : {}
},
"sort" : [
{
"_geo_distance" : {
"location" : "56,14",
"order" : "asc",
"unit" : "km"
}
}
]
}
However, when I try adding a filter I get the error "[geo_distance] filter does not support [location]":
{
"query" : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "200m",
"location" : {
"location" : "56,14"
}
}
}
}
},
"sort" : [
{
"_geo_distance" : {
"location" : "56,14",
"order" : "asc",
"unit" : "km"
}
}
]
}
Any ideas what I'm doing wrong?
Use this filter instead:
"filter" : {
"geo_distance" : {
"distance" : "200m",
"location" : "56,14"
}
}
Here "location" is the name of the geo field, so use whatever your field is called. For example, if your field is named loc (the same applies to locator or any other name), the filter would be:
"filter" : {
"geo_distance" : {
"distance" : "200m",
"loc" : "56,14"
}
}
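A short Python sketch of the corrected request, with the geo field name passed in as a parameter so the same point is used for both the filter and the sort. The helper name geo_distance_search is hypothetical; the structure mirrors the query in the question with the filter from the answer.

```python
import json


def geo_distance_search(field, point, distance):
    """Filter by distance from a point and sort by that distance."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    # The field name maps directly to the point,
                    # with no extra nesting level.
                    "geo_distance": {"distance": distance, field: point}
                },
            }
        },
        "sort": [
            {"_geo_distance": {field: point, "order": "asc", "unit": "km"}}
        ],
    }


body = geo_distance_search("location", "56,14", "200m")
print(json.dumps(body, indent=2))
```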

How to exclude a filter from a facet?

I have come from a Solr background and am trying to find the equivalent of "tagging" and "excluding" in Elasticsearch.
In the following example, how can I exclude the price filter from the calculation of the prices facet? In other words, the prices facet should take into account all of the filters except for price.
{
"query" : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"and" : [
{
"term" : {
"colour" : "Red"
}
},
{
"term" : {
"feature" : "Square"
}
},
{
"term" : {
"feature" : "Shiny"
}
},
{
"range" : {
"price" : {
"from" : "10",
"to" : "20"
}
}
}
]
}
}
},
"facets" : {
"colours" : {
"terms" : {
"field" : "colour"
}
},
"features" : {
"terms" : {
"field" : "feature"
}
},
"prices" : {
"statistical" : {
"field" : "price"
}
}
}
}
You can apply the price filter as a top-level filter to your query and add it to all facets except prices as a facet_filter:
{
"query" : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"and" : [
{
"term" : {
"colour" : "Red"
}
},
{
"term" : {
"feature" : "Square"
}
},
{
"term" : {
"feature" : "Shiny"
}
}
]
}
}
},
"facets" : {
"colours" : {
"terms" : {
"field" : "colour"
},
"facet_filter" : {
"range" : { "price" : { "from" : "10", "to" : "20" } }
}
},
"features" : {
"terms" : {
"field" : "feature"
},
"facet_filter" : {
"range" : { "price" : { "from" : "10", "to" : "20" } }
}
},
"prices" : {
"statistical" : {
"field" : "price"
}
}
},
"filter": {
"range" : { "price" : { "from" : "10", "to" : "20" } }
}
}
Note an important change since ES 1.0.0: the top-level filter was renamed to post_filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_search_requests.html#_search_requests). Using filtered queries is still preferred, as described here: http://elasticsearch-users.115913.n3.nabble.com/Filters-vs-Queries-td3219558.html
There is also a global option for facets that avoids filtering by the query filter (elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets.html#_scope).
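The approach in this answer can be sketched in Python as follows: the price filter is kept out of the main query, attached as a facet_filter to every facet except prices, and applied once more as the top-level filter. The function name build_faceted_search and the top_level_key parameter are hypothetical; pass "post_filter" for the renamed key mentioned above.

```python
import json


def build_faceted_search(shared_filters, price_filter, top_level_key="filter"):
    """Exclude the price filter from the 'prices' facet only."""
    facets = {
        "colours": {"terms": {"field": "colour"}},
        "features": {"terms": {"field": "feature"}},
        "prices": {"statistical": {"field": "price"}},
    }
    # Every facet except 'prices' is narrowed by the price filter.
    for name, facet in facets.items():
        if name != "prices":
            facet["facet_filter"] = price_filter
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"and": shared_filters},
            }
        },
        "facets": facets,
        # The same filter restricts the returned hits.
        top_level_key: price_filter,
    }


shared = [
    {"term": {"colour": "Red"}},
    {"term": {"feature": "Square"}},
    {"term": {"feature": "Shiny"}},
]
price = {"range": {"price": {"from": "10", "to": "20"}}}
body = build_faceted_search(shared, price)
print(json.dumps(body, indent=2))
```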

Elastic search has_child query

We have a parent-child (one-to-many) relation in Elasticsearch, and we want to find all parent objects whose child object's attribute (child_attr) has a value.
We are generating JSON queries as below:
1) For the "has value" condition:
{
"has_child" : {
"query" : {
"filtered" : {
"query" : {
"match_all" : { }
},
"filter" : {
"and" : {
"filters" : [ {
"exists" : {
"field" : "child_attr"
}
}, {
"not" : {
"filter" : {
"term" : {
"child_attr" : ""
}
}
}
} ]
}
}
}
},
"type" : "child"
}
}
2) For the "has no value" condition:
{
"has_child" : {
"query" : {
"filtered" : {
"query" : {
"match_all" : { }
},
"filter" : {
"or" : {
"filters" : [ {
"missing" : {
"field" : "child_attr"
}
}, {
"term" : {
"child_attr" : ""
}
} ]
}
}
}
},
"type" : "child"
}
}
These queries return only those parent objects where either all child objects have a value or all child objects have no value for the searched attribute.
They return nothing where the condition is met only partially, which covers the majority of the data.
I have also tried indexing child_attr with the keyword analyzer, but with no success.
I look forward to your suggestions.
You are getting unexpected results because the query
"missing" : {
"field" : "child_attr"
}
matches both records that were indexed with an empty string in child_attr and records in which child_attr was missing.
The query
"exists" : {
"field" : "child_attr"
}
is the exact opposite of the first query: it matches all records that were indexed with a non-empty child_attr field.
The query
"term" : {
"child_attr" : ""
}
doesn't match anything.
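Taking the explanation above at face value, one possible simplification is to drop the term clause on the empty string and rely on exists or missing alone. The following Python sketch builds both variants; the helper name has_child_value_query is hypothetical, and the simplification itself is an assumption drawn from the answer rather than something it states outright.

```python
import json


def has_child_value_query(child_type, field, has_value=True):
    """Build a has_child query on whether a child field holds a value."""
    if has_value:
        # Matches children indexed with a non-empty value.
        child_filter = {"exists": {"field": field}}
    else:
        # Matches children where the field is absent or an empty string.
        child_filter = {"missing": {"field": field}}
    return {
        "has_child": {
            "type": child_type,
            "query": {
                "filtered": {
                    "query": {"match_all": {}},
                    "filter": child_filter,
                }
            },
        }
    }


print(json.dumps(has_child_value_query("child", "child_attr", True), indent=2))
print(json.dumps(has_child_value_query("child", "child_attr", False), indent=2))
```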
