Elasticsearch nested aggregations - method from documentation is not working

I'm new to ES and am struggling with nested aggregations.
Here is my dummy data object: https://i.stack.imgur.com/X7oaM.png. I'm just trying to get the minimum cost out of the "modern" field.
I have read the following posts regarding the problem I'm trying to solve, but none of them helped:
- Elastic Search 6 Nested Query Aggregations
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
- https://madewithlove.be/elasticsearch-aggregations/
- https://iridakos.com/tutorials/2018/10/22/elasticsearch-bucket-aggregations.html
- https://github.com/elastic/elasticsearch/issues/9317
Moreover, I've searched all over Stack Overflow without success (yes, I've tried just about every solution I've come across).
According to the docs and the posts above, among others, a nested aggregation should be run as follows:
GET /loquesea/_search
{
"size": 0,
"aggs": {
"modern_costs": {
"nested": {
"path": "modern"
},
"aggs": {
"min_cost": {
"min": {
"field": "modern.cost1"
}
}
}
}
}
}
However, upon completion, this is what I get:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"modern_costs" : {
"doc_count" : 0
}
}
}
I've spent hours trying to make just a basic nested aggregation work. What am I doing wrong?

Problem solved. It turns out that because I didn't declare "cars" as type "nested" in the mapping, the nested aggregation does not work. In Elasticsearch, a field that is not mapped as "nested" is treated as a plain object, so the nested aggregation finds no nested documents (hence "doc_count" : 0).
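A minimal sketch of the fix, assuming Elasticsearch 7.x typeless mappings (on 6.x, which the question's output suggests, `properties` sits under a mapping type name) and assuming `cost1` holds numbers. The helper names are hypothetical; they only build the JSON bodies for `PUT /loquesea` and `GET /loquesea/_search` and do not talk to a cluster. An existing field cannot be switched from object to nested in place, so in practice this means creating a new index with this mapping and reindexing the data.

```javascript
// Hypothetical helpers: they only build request bodies, they do not call Elasticsearch.

// Body for creating the index (PUT /loquesea). Mapping "modern" as "nested"
// is what lets the nested aggregation see each sub-document.
function buildNestedMapping() {
  return {
    mappings: {
      properties: {
        modern: {
          type: "nested",
          properties: {
            cost1: { type: "float" } // assumption: cost1 is numeric
          }
        }
      }
    }
  };
}

// Body for GET /loquesea/_search: the same aggregation as in the question.
function buildMinCostAggregation() {
  return {
    size: 0,
    aggs: {
      modern_costs: {
        nested: { path: "modern" },
        aggs: {
          min_cost: { min: { field: "modern.cost1" } }
        }
      }
    }
  };
}

console.log(JSON.stringify(buildNestedMapping(), null, 2));
console.log(JSON.stringify(buildMinCostAggregation(), null, 2));
```

Note that the aggregation body itself does not change; only the mapping does. With documents indexed under a nested mapping, the question's query should report a non-zero `doc_count`.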

Related

Elasticsearch Aggregation most common list of integers

I am looking for an Elasticsearch aggregation + mapping that will return the most common list for a certain field.
For example for docs:
{"ToneCurvePV2012": [1,2,3]}
{"ToneCurvePV2012": [1,5,6]}
{"ToneCurvePV2012": [1,7,8]}
{"ToneCurvePV2012": [1,2,3]}
I wish for the aggregation result:
[1,2,3] (since it appears twice).
So far, every aggregation I have tried returns: 1
This is not possible with the default terms aggregation: on an array field it counts each element as a separate term, which is why 1 (present in every document) comes out on top. You need to use a terms aggregation with a script instead. Please note that this might impact your cluster performance.
Here, I have used a script that creates a string from the array and aggregates on that string. So if a document has the array value [1,2,3], the script creates its string representation '[1,2,3]', and that key is used for the aggregation.
Below is sample query you can use to generate aggregation as you expected:
POST index1/_search
{
"size": 0,
"aggs": {
"tone_s": {
"terms": {
"script": {
"source": "def value='['; for(int i=0;i<doc['ToneCurvePV2012'].length;i++){value= value + doc['ToneCurvePV2012'][i] + ',';} value+= ']'; value = value.replace(',]', ']'); return value;"
}
}
}
}
}
Output:
{
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"tone_s" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "[1,2,3]",
"doc_count" : 2
},
{
"key" : "[1,5,6]",
"doc_count" : 1
},
{
"key" : "[1,7,8]",
"doc_count" : 1
}
]
}
}
}
PS: the key will come back as a string, not as an array, in the aggregation response.
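To make the script's behavior concrete, here is a client-side illustration (plain JavaScript, not Elasticsearch code; the function names are hypothetical) that builds the same key per document and counts documents per key, roughly as the terms aggregation does. One assumption worth stating: doc values for a multi-valued numeric field are returned sorted, so the simulation sorts each array first, which means lists that differ only in element order end up with the same key.

```javascript
// Client-side illustration only: mimics the string key the Painless script
// builds for each document, then counts documents per key like a terms aggregation.

// Doc values for a numeric field come back sorted, so sort before joining.
function scriptKey(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return "[" + sorted.join(",") + "]";
}

function mostCommonLists(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const key = scriptKey(doc[field]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Order buckets by doc_count descending, the terms aggregation's default order.
  return [...counts.entries()]
    .map(([key, doc_count]) => ({ key, doc_count }))
    .sort((a, b) => b.doc_count - a.doc_count);
}

// The four documents from the question.
const docs = [
  { ToneCurvePV2012: [1, 2, 3] },
  { ToneCurvePV2012: [1, 5, 6] },
  { ToneCurvePV2012: [1, 7, 8] },
  { ToneCurvePV2012: [1, 2, 3] },
];

console.log(mostCommonLists(docs, "ToneCurvePV2012"));
```

For these four documents the first bucket is `{ key: "[1,2,3]", doc_count: 2 }`, matching the output above.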

How to make Elastic Engine understand a field is not to be analyzed for an exact match?

This question builds on my previous post, where exact search did not work with either Match or MatchPhrasePrefix.
Then I found a similar post in which the search field is set to not_analyzed in the mapping definition (by Russ Cam).
But I am using
package id="Elasticsearch.Net" version="7.6.0" targetFramework="net461"
package id="NEST" version="7.6.0" targetFramework="net461"
and that might be why the solution did not work for me.
If I pass "SOME", it matches both "SOME" and "SOME OTHER LOAN", which should not be the case (see "product value" in my earlier post).
How can I do the same using NEST 7.6.0?
I don't know what your current mapping looks like, and I don't know NEST either, but I will explain how to make Elasticsearch understand that a field is not to be analyzed for an exact match, using an example in the Elasticsearch query DSL.
For an exact (case-sensitive) match, all you need to do is define the field type as keyword. A keyword field is indexed as-is, without applying any analyzer, so it is well suited to exact matching. (Since Elasticsearch 5.0, the keyword type replaces the old not_analyzed string mapping, which is likely why the not_analyzed answer did not work for you.)
PUT test
{
"mappings": {
"properties": {
"field1": {
"type": "keyword"
}
}
}
}
Now let's index some docs:
POST test/_doc/1
{
"field1":"SOME"
}
POST test/_doc/2
{
"field1": "SOME OTHER LOAN"
}
For exact matching we can use a term query. Let's search for "SOME"; we should get only document 1.
GET test/_search
{
"query": {
"term": {
"field1": "SOME"
}
}
}
Output:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931472,
"_source" : {
"field1" : "SOME"
}
}
]
}
}
So the crux is: map the field as keyword and use a term query.
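As a rough illustration of why the two approaches differ, the sketch below simulates both in plain JavaScript (not Elasticsearch code; all names are hypothetical). The tokenizer only approximates the standard analyzer by lowercasing and splitting on non-alphanumeric characters, and the match simulation uses the match query's default "or" behavior, where sharing any one token is enough.

```javascript
// Rough client-side illustration, not Elasticsearch code.
// Approximates the standard analyzer: lowercase, split on non-alphanumerics.
function approxStandardAnalyze(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// The two documents indexed above.
const docs = [
  { id: "1", field1: "SOME" },
  { id: "2", field1: "SOME OTHER LOAN" },
];

// match on a text field: the query is analyzed too, and a document
// matches if it shares at least one token (default "or" operator).
function matchOnText(query) {
  const queryTokens = approxStandardAnalyze(query);
  return docs
    .filter(d => {
      const docTokens = new Set(approxStandardAnalyze(d.field1));
      return queryTokens.some(t => docTokens.has(t));
    })
    .map(d => d.id);
}

// term on a keyword field: the whole value is a single indexed term,
// compared exactly and case-sensitively, with no analysis.
function termOnKeyword(value) {
  return docs.filter(d => d.field1 === value).map(d => d.id);
}

console.log(matchOnText("SOME"));   // [ '1', '2' ]
console.log(termOnKeyword("SOME")); // [ '1' ]
```

The first call mirrors the behavior described in the question; the second mirrors the term query result shown above.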

How to get search hits results when executing aggregation?

As stated in the ElasticSearch documentation:
In Elasticsearch, you have the ability to execute searches returning hits and at the same time return aggregated results separate from the hits all in one response. This is very powerful and efficient in the sense that you can run queries and multiple aggregations and get the results back of both (or either) operations in one shot avoiding network roundtrips using a concise and simplified API.
I want searches to return hits as well as the aggregation results, but I am not sure how to achieve this.
I am using the following query:
curl -XPOST 'localhost:9200/employee/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_domain": {
"terms": {
"field": "domain"
}
}
}
}'
and here is the result I am getting:
{
"took" : 92,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1000,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"group_by_domain" : {
"doc_count_error_upper_bound" : 5,
"sum_other_doc_count" : 744,
"buckets" : [ {
"key" : "finance",
"doc_count" : 30
}]
}
}
}
As you can see, the hits array is empty. How can I get those hits? Any suggestion?
The hits are empty because you set the size of the query to 0:
"size": 0,
You can remove size completely, in which case you get the default of 10 hits, or set the size you want: if you specify 100, you get up to 100 hits in the response. This top-level size applies to the search hits only.
If you also want documents returned inside the aggregation buckets, use the Top Hits aggregation.
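A minimal sketch combining both suggestions, assuming the "domain" field from the question; the helper names, the sub-aggregation name `top_domain_docs` and the sizes are hypothetical. It builds a search body that returns regular hits via the top-level `size` and the top documents of each bucket via a `top_hits` sub-aggregation, plus a small reader for the bucket documents in the response.

```javascript
// Hypothetical helper: builds a search body returning regular hits
// (top-level "size") and the top documents per bucket (top_hits).
function buildSearchWithHits(hitCount, docsPerBucket) {
  return {
    size: hitCount, // number of regular search hits (10 if omitted)
    aggs: {
      group_by_domain: {
        terms: { field: "domain" },
        aggs: {
          top_domain_docs: {
            top_hits: { size: docsPerBucket } // documents returned inside each bucket
          }
        }
      }
    }
  };
}

// Hypothetical helper: maps each bucket key to the _source of its top documents.
function docsPerDomain(response) {
  const result = {};
  for (const bucket of response.aggregations.group_by_domain.buckets) {
    result[bucket.key] = bucket.top_domain_docs.hits.hits.map(hit => hit._source);
  }
  return result;
}

console.log(JSON.stringify(buildSearchWithHits(100, 3), null, 2));
```

Keeping the two sizes separate makes the distinction explicit: the top-level one controls `hits.hits`, the `top_hits` one controls how many documents appear in each bucket.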

Show all Elasticsearch aggregation results/buckets and not just 10

I'm trying to list all buckets on an aggregation, but it seems to be showing only the first 10.
My search:
curl -XPOST "http://localhost:9200/imoveis/_search?pretty=1" -d'
{
"size": 0,
"aggregations": {
"bairro_count": {
"terms": {
"field": "bairro.raw"
}
}
}
}'
Returns:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 16920,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"bairro_count" : {
"buckets" : [ {
"key" : "Barra da Tijuca",
"doc_count" : 5812
}, {
"key" : "Centro",
"doc_count" : 1757
}, {
"key" : "Recreio dos Bandeirantes",
"doc_count" : 1027
}, {
"key" : "Ipanema",
"doc_count" : 927
}, {
"key" : "Copacabana",
"doc_count" : 842
}, {
"key" : "Leblon",
"doc_count" : 833
}, {
"key" : "Botafogo",
"doc_count" : 594
}, {
"key" : "Campo Grande",
"doc_count" : 456
}, {
"key" : "Tijuca",
"doc_count" : 361
}, {
"key" : "Flamengo",
"doc_count" : 328
} ]
}
}
}
I have much more than 10 keys for this aggregation. In this example I'd have 145 keys, and I want the count for each of them. Is there some pagination on buckets? Can I get all of them?
I'm using Elasticsearch 1.1.0
The size parameter belongs inside the terms aggregation, for example:
curl -XPOST "http://localhost:9200/imoveis/_search?pretty=1" -d'
{
"size": 0,
"aggregations": {
"bairro_count": {
"terms": {
"field": "bairro.raw",
"size": 10000
}
}
}
}'
On ES 2.x and earlier, use "size": 0 inside the terms aggregation to get all buckets.
Setting size: 0 is deprecated from 2.x onwards because of the memory issues it can inflict on your cluster with high-cardinality fields; the related GitHub issue has more details.
It is recommended to explicitly set a reasonable value for size, a number between 1 and 2147483647.
How to show all buckets?
{
"size": 0,
"aggs": {
"aggregation_name": {
"terms": {
"field": "your_field",
"size": 10000
}
}
}
}
Note
"size": 10000 (inside terms) returns at most 10000 buckets. The default is 10.
"size": 0 (top level) suppresses the search hits; by default "hits" contains 10 documents, which we don't need here.
By default, the buckets are ordered by the doc_count in decreasing order.
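Whether a given size was large enough can be checked from the response. In versions that report it, `sum_other_doc_count` is the sum of the document counts of the buckets that were not returned, so a value above 0 means some terms were cut off. A minimal sketch with a hypothetical helper name; it returns `null` when the field is absent, as in the 1.1 output above.

```javascript
// Hypothetical helper: did the terms aggregation leave buckets out?
// sum_other_doc_count sums the doc counts of buckets not returned.
function bucketsWereTruncated(termsAggResult) {
  if (typeof termsAggResult.sum_other_doc_count !== "number") {
    return null; // not reported by this version, so unknown
  }
  return termsAggResult.sum_other_doc_count > 0;
}

// The group_by_domain result from the earlier question.
const groupByDomain = {
  doc_count_error_upper_bound: 5,
  sum_other_doc_count: 744,
  buckets: [{ key: "finance", doc_count: 30 }]
};

console.log(bucketsWereTruncated(groupByDomain)); // true
```

If it returns `true`, either raise the terms `size` or switch to paginating with a composite aggregation.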
Why do I get a "Fielddata is disabled on text fields by default" error?
Because fielddata is disabled on text fields by default. If you have not explicitly mapped the field, the default dynamic mapping for strings applies: the field is indexed as text, with a keyword sub-field named keyword.
So, instead of "field": "your_field", you need "field": "your_field.keyword".
If you want to get all unique values without setting a magic number (size: 10000), then use COMPOSITE AGGREGATION (ES 6.5+).
From official documentation:
"If you want to retrieve all terms or all combinations of terms in a nested terms aggregation you should use the COMPOSITE AGGREGATION which allows to paginate over all possible terms rather than setting a size greater than the cardinality of the field in the terms aggregation. The terms aggregation is meant to return the top terms and does not allow pagination."
Implementation example in JavaScript:
// Assumes es.search(body) resolves to the raw search response JSON;
// adapt this call to the client you use.
const ITEMS_PER_PAGE = 1000;
async function getAllUniqueLanguages(es) {
  const body = {
    "size": 0, // Returning only aggregation results: https://www.elastic.co/guide/en/elasticsearch/reference/current/returning-only-agg-results.html
    "aggs": {
      "langs": {
        "composite": {
          "size": ITEMS_PER_PAGE,
          "sources": [
            { "language": { "terms": { "field": "language" } } }
          ]
        }
      }
    }
  };
  const uniqueLanguages = [];
  while (true) {
    const result = await es.search(body);
    const { buckets, after_key: after } = result.aggregations.langs;
    // Each composite bucket key is an object, e.g. { language: "en" }
    uniqueLanguages.push(...buckets.map(bucket => bucket.key.language));
    if (!after || buckets.length === 0) {
      break; // no more pages
    }
    // continue paginating after the last returned key
    body.aggs.langs.composite.after = after;
  }
  return uniqueLanguages;
}
getAllUniqueLanguages(es).then(uniqueLanguages => console.log(uniqueLanguages));
Increase the second size (the one inside the terms aggregation) to 10000 and you will get up to 10000 buckets. By default it is 10.
Also, if you want to see search results, set the first (top-level) size to 1 and you will see one document, since ES supports searching and aggregating in the same request.
curl -XPOST "http://localhost:9200/imoveis/_search?pretty=1" -d'
{
"size": 1,
"aggregations": {
"bairro_count": {
"terms": {
"field": "bairro.raw",
"size": 10000
}
}
}
}'

has_child queries fail if queryName is set

I am using elasticsearch v0.90.5 and trying to set QueryName (_name) for a has_child query as described here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/search-request-named-queries-and-filters.html
But the has_child queries are failing if queryName is set. They also fail if queryName is set on a wrapping query. You can find the curl recreation here: https://gist.github.com/hmrizin/9645816
"query": {
"has_child": {
"query": {
"term": {
"postid": "p1"
}
},
"child_type": "post",
"_name": "somename"
}
}
and
"query": {
"bool": {
"should": {
"has_child": {
"query": {
"term": {
"postid": "p2"
}
},
"child_type": "post"
}
},
"_name": "somename"
}
}
The error that both cause is the same:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 4,
"failed" : 1,
"failures" : [ {
"index" : "twtest",
"shard" : 1,
"status" : 500,
"reason" : "ElasticSearchIllegalStateException[has_child filter hasn't executed properly]"
} ]
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ ]
}
}
What am I missing?
Having just run this with v1.0.1, it worked as-is without any changes.
However, when I downloaded v0.90.5, I got the error you posted unless I removed the _name. It looks like you came across a bug, and in the very next release, v0.90.6, the entire class to blame (HasChildFilter) was refactored into other code.
v0.90.5 package: org.elasticsearch.index.search.child
v0.90.6 package: org.elasticsearch.index.search.child
The bug also exists in v0.90.4 (I verified), which is when the feature was added. Unfortunately, I think the solution to your problem is to upgrade your installation of Elasticsearch to at least v0.90.6. Based on its patch notes, they may not even have been aware of the error that they fixed, but I was able to verify that it was fixed in that release.
