Elasticsearch aggregation on fields with a text type mapping

I am trying to aggregate on a field that has type text.
Mapping setting:
"Group":{"type":"text"}
And query:
{
"query": {
"term": {
"request_id": 22
}
},
"size": 0,
"aggs": {
"sets": {
"terms": {"field": "Group.keyword"}
}
}
}
This gives empty results:
"hits": {
"total": 7463,
"max_score": 0,
"hits": []
},
"aggregations": {
"sets": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
Without .keyword, the query fails with an illegal_argument_exception (reason: ... alternatively use a keyword field instead ...).
Also, the Group field only ever contains the values Grp1 and Grp2.
How can I aggregate sets based on these two values?

Update the mapping to:
"Group": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
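The buckets are empty because Group.keyword does not exist in the current mapping, and a terms aggregation on an unmapped field returns no buckets rather than an error. After making the mapping change above, re-index the documents; then you can aggregate on Group.keyword.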
If you never need full-text search on the values of the Group field, map it as keyword instead:
"Group":{"type":"keyword"}
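In this case you can aggregate on the Group field itself. Changing the type of an existing field requires creating a new index with this mapping and re-indexing into it.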
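To see why the two mappings behave differently, here is a rough Java simulation (not Elasticsearch code) of the terms a terms aggregation would bucket on. A text field is analyzed into lowercase tokens, and Elasticsearch refuses to aggregate on it unless fielddata is enabled, which is the illegal_argument_exception above; a keyword field keeps each value exactly as indexed. The analyze method is a crude, hypothetical stand-in for the standard analyzer, and GroupTermsSketch is an illustrative name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy model, not Elasticsearch code: which terms would a terms aggregation
// bucket on for a "text" field versus a "keyword" field?
class GroupTermsSketch {

    // Crude stand-in for the standard analyzer (an assumption): split on runs
    // of non-alphanumeric characters and lowercase each token.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String token : value.split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A keyword field indexes the whole value, unchanged, as a single term.
    static List<String> keyword(String value) {
        return List.of(value);
    }

    // Bucket key -> doc_count; each document counts once per distinct term.
    static Map<String, Integer> termsAgg(List<String> groupValues, boolean keywordField) {
        Map<String, Integer> buckets = new TreeMap<>();
        for (String value : groupValues) {
            List<String> terms = keywordField ? keyword(value) : analyze(value);
            for (String term : new LinkedHashSet<>(terms)) {
                buckets.merge(term, 1, Integer::sum);
            }
        }
        return buckets;
    }
}
```

With the values Grp1, Grp2, Grp1, the keyword variant yields the buckets Grp1 (2) and Grp2 (1), while the analyzed variant yields the lowercased tokens grp1 and grp2.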

Related

Limiting the maximum number of records searched in an Elasticsearch group-by query

We have a strange issue: one of our customers has a lot of records for a certain field x. When the user triggers a group-by query on field x, the Elasticsearch cluster goes down and restarts with an out-of-memory (OOM) error.
Is there a way to limit the maximum number of records Elasticsearch looks at while aggregating on a field, so that the cluster does not run out of memory?
PS: The group-by can span multiple fields such as x, y, z, and w, and the user searches only the last 30 days of data.
If you want to restrict the number of documents an aggregation (here, a terms aggregation) takes into account, wrap it in a sampler aggregation. The sampler's shard_size sets how many of the top-scoring documents each shard passes to its sub-aggregations.
Index Data:
{
"role": "example",
"number": 1
}
{
"role": "example1",
"number": 2
}
{
"role": "example2",
"number": 3
}
Search Query:
{
"size": 0,
"aggs": {
"sample": {
"sampler": {
"shard_size": 2 // Max documents per shard passed to the sub-aggregations
},
"aggs": {
"unique_roles": {
"terms": {
"field": "role.keyword"
}
}
}
}
}
}
Search Result:
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"sample": {
"doc_count": 2, // Note this
"unique_roles": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "example",
"doc_count": 1
},
{
"key": "example1",
"doc_count": 1
}
]
}
}
}
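A toy, single-shard Java model of what the sampler does: rank the shard's matching documents by score, keep the top shard_size, and run the sub-aggregation over those only. With several shards, each shard contributes up to shard_size documents, so the sampler's doc_count can reach shard_size times the number of shards. All three example documents match with the same score; breaking ties by insertion order is an assumption of this sketch, not a description of how Elasticsearch orders ties, and the class names are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy single-shard model of a sampler aggregation wrapping a terms aggregation.
class SamplerSketch {

    // One matching document: its "role" value and its query score.
    record Doc(String role, double score) {}

    // docCount mirrors the sampler's doc_count; buckets mirror unique_roles.
    record Result(int docCount, Map<String, Integer> buckets) {}

    static Result sample(List<Doc> shardDocs, int shardSize) {
        List<Doc> ranked = new ArrayList<>(shardDocs);
        // Highest score first. List.sort is stable, so equal scores keep
        // their original order (an assumption of this sketch).
        ranked.sort(Comparator.comparingDouble(Doc::score).reversed());
        List<Doc> sampled = ranked.subList(0, Math.min(shardSize, ranked.size()));

        // The terms sub-aggregation only ever sees the sampled documents.
        Map<String, Integer> buckets = new TreeMap<>();
        for (Doc doc : sampled) {
            buckets.merge(doc.role(), 1, Integer::sum);
        }
        return new Result(sampled.size(), buckets);
    }
}
```

Calling sample with the three example documents (all scored 1.0) and a shardSize of 2 gives a docCount of 2 and the buckets example and example1, mirroring the search result above.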

Elasticsearch Unique Field Values

I am trying to get only the unique values of a field in Elasticsearch. I can't figure out why this doesn't behave as expected.
I have gone through many Stack Overflow questions and read the documentation for most of the day. Nothing seems to work for me; below is what I tried last.
Is there any reason someone would want to have the same results returned repeatedly? Maybe for differing versions of a document?
In this example I would like a list of all mfr_id values, together with their mfr_desc. I am running this against a type, to search document field values only. It seems that a terms aggregation is the way to accomplish this; does anyone see anything I am doing wrong?
1: API Call
GET /inventory/item/_search
{
"size": 0,
"_source": ["mfr_id", "mfr_desc"],
"aggs": {
"unique_vals": {
"terms": {
"field": "mfr_id.keyword"
/** I have to use .keyword; it seems my mapping isn't working */
}
}
}
}
2: Mapping File
The mapping I apply after doing a bulk import is quite simple. I read that you should not analyze the keys if you want unique values:
{
"index": "inventory",
"body": {
"settings": {
"number_of_shards": 1
},
"mappings": {
"_default_": {
"properties": {
"mfr_id": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
3: My Results
The aggregation returns ~10 buckets when there are about 100 distinct values. I would really like to be able to get the _source fields, not just the key, if this is possible.
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 49341,
"max_score": 0,
"hits": []
},
"aggregations": {
"unique_vals": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 6815,
"buckets": [
{
"key": "14",
"doc_count": 24292
},
{
"key": "33",
"doc_count": 5508
},
...
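The terms aggregation returns only the top 10 buckets by default (its size parameter defaults to 10) and reports the documents that fall in the remaining buckets as sum_other_doc_count, which is consistent with seeing about 10 keys and a sum_other_doc_count of 6815 above; setting "size" inside the terms aggregation returns more buckets. The toy Java model below shows that trimming on a single shard. It is not Elasticsearch code, the class names are made up, and breaking count ties by key is an assumption made for deterministic output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy single-shard model of how a terms aggregation trims its buckets to `size`.
class TopTermsSketch {

    record Bucket(String key, long docCount) {}

    record TermsResult(List<Bucket> buckets, long sumOtherDocCount) {}

    static TermsResult topTerms(Map<String, Long> docCountPerKey, int size) {
        List<Bucket> all = new ArrayList<>();
        docCountPerKey.forEach((key, count) -> all.add(new Bucket(key, count)));

        // Largest doc_count first; ties broken by key (assumption, for determinism).
        all.sort(Comparator.comparingLong(Bucket::docCount).reversed()
                .thenComparing(Bucket::key));

        int kept = Math.min(size, all.size());
        long other = 0;
        for (Bucket dropped : all.subList(kept, all.size())) {
            other += dropped.docCount(); // documents in buckets that were cut off
        }
        return new TermsResult(List.copyOf(all.subList(0, kept)), other);
    }
}
```

For example, counts of 5, 3, 2 and 1 with a size of 2 keep the two largest buckets and report 3 as the "other" count.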
I would really like to be able to get the _source fields of more than just a key if this is possible.
I think you have only one option; I have faced the same problem. Try this:
{
"aggregations": {
"byId": {
"terms": {
"field": "mfr_id"
},
"aggs": {
"byDesc": {
"terms": {
"field": "mfr_desc"
}
}
}
}
}
}
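Now you will get both the id and the desc when iterating through the results with the Elasticsearch Java API:
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
// Assumes the pre-8.x Elasticsearch Java client, with aAggregations = searchResponse.getAggregations()

Terms aTerms = aAggregations.get("byId");
aTerms.getBuckets().forEach(aBucketById -> {
    Terms aTermsDesc = aBucketById.getAggregations().get("byDesc");
    aTermsDesc.getBuckets().forEach(aBucketByDesc -> {
        String id = aBucketById.getKeyAsString();     // mfr_id bucket key
        String desc = aBucketByDesc.getKeyAsString(); // mfr_desc bucket key
        // store id and desc
    });
});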
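Without a cluster at hand, the bucket nesting can be pictured with plain Java collections: each mfr_id bucket holds its own mfr_desc sub-buckets, so walking outer then inner buckets yields id/desc pairs. This is a toy model rather than the Elasticsearch client, and the class name and sample descriptions used with it are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a terms aggregation on mfr_id with a terms sub-aggregation
// on mfr_desc, using plain collections instead of the Elasticsearch client.
class NestedTermsSketch {

    record Item(String mfrId, String mfrDesc) {}

    // Outer key = byId bucket key, inner key = byDesc bucket key, value = doc_count.
    static Map<String, Map<String, Integer>> byIdThenDesc(List<Item> items) {
        Map<String, Map<String, Integer>> byId = new TreeMap<>();
        for (Item item : items) {
            byId.computeIfAbsent(item.mfrId(), id -> new TreeMap<>())
                .merge(item.mfrDesc(), 1, Integer::sum);
        }
        return byId;
    }

    // Walk outer buckets, then inner buckets, collecting "id -> desc" pairs.
    static List<String> idDescPairs(Map<String, Map<String, Integer>> byId) {
        List<String> pairs = new ArrayList<>();
        byId.forEach((id, descs) ->
                descs.keySet().forEach(desc -> pairs.add(id + " -> " + desc)));
        return pairs;
    }
}
```

Two items with id 14 and one with id 33 produce one inner bucket under each id, and idDescPairs then lists one line per id/desc combination.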
I would use a filter; it performs better than an aggregation.
With an aggregation, you get all of the documents and only then apply the aggregation. With a filter, you get only the documents that match the filter, and filters can also be cached.
{
"query": {
"constant_score": {
"filter": {
"exists": {
"field": "mfr_id"
}
}
}
}
}
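In plain Java terms, a constant_score query around an exists filter keeps the documents that have a value for the field and gives every one of them the same score (the boost, 1.0 by default), so no relevance scoring is computed. The sketch below is a toy model with made-up names; it treats a missing or null value as "does not exist", a simplification of the real exists query's rules. It returns matching documents, so distinct mfr_id values would still have to be collected from the hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of a constant_score query wrapping an exists filter.
class ConstantScoreExistsSketch {

    record Hit(int docId, float score) {}

    // Every document with a non-null value for `field` matches, and every
    // match gets the same score: the boost passed in.
    static List<Hit> search(List<Map<String, String>> docs, String field, float boost) {
        List<Hit> hits = new ArrayList<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            if (docs.get(docId).get(field) != null) {
                hits.add(new Hit(docId, boost));
            }
        }
        return hits;
    }
}
```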

Can I get a field if I disabled the _source and _all fields in Elasticsearch?

Elasticsearch suggested disabling the _source and _all fields in my case. This is my mapping:
{
"template": "mq-body-*",
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0,
"max_result_window": 100,
"codec": "best_compression"
},
"mappings": {
"_default_": {
"_source": {
"enabled": false
},
"_all": {
"enabled": false
}
},
"body": {
"properties": {
"body": {
"type": "string",
"doc_values": true,
"index": "not_analyzed"
}
}
}
}
}
body.body is a very large field (20k-300k). We don't need to index it, we rarely fetch it, and it is acceptable to lose it. But after indexing this document:
PUT /mq-body-local/body/1
{"body":"My body"}
I can't retrieve the body with GET /mq-body-local/body/1?fields=body or with POST /mq-body-local/body/_search -d'{"fields":["body"]}': the search reports one hit but returns no document content. I know that without _source I cannot get it back through get or search, but how can I retrieve my document?
From Elasticsearch's website:
The _source field contains the original JSON document body that was passed at index time. The _source field itself is not indexed (and thus is not searchable), but it is stored so that it can be returned when executing fetch requests, like get or search.
Disabling _source prevents Elasticsearch from returning it in the result set. However, filtering, querying and aggregations are not affected.
So these two requests will not return the actual body:
GET mq-body-local/body/_search
GET mq-body-local/body/1
However, you can run an aggregation like this one, which returns the field's indexed values (not the _source itself):
POST mq-body-local/body/_search
{
"aggs": {
"test": {
"terms": {
"field": "body"
}
}
}
}
Will produce this result set (I've created some test records):
"aggregations": {
"test": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "my body",
"doc_count": 1
},
{
"key": "my body2",
"doc_count": 1
}
]
}
}
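The aggregation still works because a not_analyzed string with doc_values: true is also kept in a columnar doc-values structure, which is what aggregations read; _source is only consulted when returning documents. The toy Java model below (made-up names, not Elasticsearch internals) keeps the two stores side by side. A terms aggregation returns distinct values and their counts, not whole documents, so this is a way to peek at values rather than to fetch document 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Toy model: a per-document _source store next to a doc-values column for "body".
class SourceDisabledSketch {

    static final class ToyIndex {
        private final boolean sourceEnabled;
        private final List<String> sourceStore = new ArrayList<>();
        private final List<String> bodyDocValues = new ArrayList<>();

        ToyIndex(boolean sourceEnabled) {
            this.sourceEnabled = sourceEnabled;
        }

        // Returns the new document's id.
        int index(String body) {
            // Naive JSON string for illustration only (no escaping).
            sourceStore.add(sourceEnabled ? "{\"body\":\"" + body + "\"}" : null);
            // doc_values: true keeps the exact not_analyzed value column-wise.
            bodyDocValues.add(body);
            return sourceStore.size() - 1;
        }

        // Like a GET: nothing comes back when _source is disabled.
        Optional<String> getSource(int docId) {
            return Optional.ofNullable(sourceStore.get(docId));
        }

        // Like a terms aggregation on "body": reads doc values, not _source.
        Map<String, Integer> termsOnBody() {
            Map<String, Integer> buckets = new TreeMap<>();
            for (String value : bodyDocValues) {
                buckets.merge(value, 1, Integer::sum);
            }
            return buckets;
        }
    }
}
```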

ElasticSearch 2.1.0 - Deep 'children' aggregation with 'sum' metric returning empty results

I have a hierarchy of document types two levels deep. The documents are related by parent-child relationships as follows: category > sub_category > item, i.e. each sub_category has a _parent field referring to a category id, and each item has a _parent field referring to a sub_category id.
Each item has a price field. Given a query for categories, which includes conditions for sub-categories and items, I want to calculate a total price for each sub_category.
My query looks something like this:
{
"query": {
"has_child": {
"child_type": "sub_category",
"query": {
"has_child": {
"child_type": "item",
"query": {
"range": {
"price": {
"gte": 100,
"lte": 150
}
}
}
}
}
}
}
}
My aggregation to calculate the price for each sub-category looks like this:
{
"aggs": {
"categories": {
"terms": {
"field": "id"
},
"aggs": {
"sub_categories": {
"children": {
"type": "sub_category"
},
"aggs": {
"sub_category_ids": {
"terms": {
"field": "id"
},
"aggs": {
"items": {
"children": {
"type": "item"
},
"aggs": {
"price": {
"sum": {
"field": "price"
}
}
}
}
}
}
}
}
}
}
}
}
Despite the query response listing matching results, the aggregation response doesn't match any items:
{
"aggregations": {
"categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category1",
"doc_count": 1,
"sub_categories": {
"doc_count": 3,
"sub_category_ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "subcat1",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
},
{
"key": "subcat2",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
},
{
"key": "subcat3",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
}
]
}
}
}]
}
}
}
However, if I omit the sub_category_ids aggregation, the items do appear and the prices are summed at the level of the categories aggregation. I would expect including the sub_category_ids aggregation to simply change the level at which the prices are summed.
Am I misunderstanding how the aggregation is evaluated, and if so how could I modify it to display the summed prices for each sub-category?
I opened issue #15413 about the children aggregation, as I and others were facing similar issues in ES 2.0.
According to ES developer martijnvg, the problem was that:
The children agg makes an assumption (that all segments are being seen by children agg) that was true in 1.x but not in 2.x
PR #15457, again from martijnvg, fixed this issue:
Before we only evaluated segments that yielded matches in parent aggs, which caused us to miss to evaluate child docs in segments we didn't have parent matches for.
The fix for this is stop remember in what segments we have matches for and simply evaluate all segments. This makes the code simpler and we can still quickly see if a segment doesn't hold child docs like we did before
This pull request has been merged and also backported to the 2.x, 2.1 and 2.0 branches.
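The quoted explanation can be reproduced with a toy model: child documents can sit in a different Lucene segment from the parent matches, so skipping segments without parent matches never sees them, which is consistent with the zero doc_count and price in the question. The segments, names and prices below are hypothetical, and the code models the idea only, not the actual ES 2.x implementation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of the children-aggregation behaviour described in #15413 / #15457.
class ChildrenAggSegmentsSketch {

    record ChildDoc(String parentId, double price) {}

    // A hypothetical segment: parents matched in it, plus the child docs it holds.
    record Segment(Set<String> matchedParents, List<ChildDoc> children) {}

    // Sum child prices per matched parent. skipSegmentsWithoutParentMatches = true
    // models the behaviour quoted above (only segments with parent matches are
    // evaluated); false models the fix (every segment is evaluated).
    static Map<String, Double> sumPrices(List<Segment> segments,
                                         boolean skipSegmentsWithoutParentMatches) {
        Set<String> matchedParents = new HashSet<>();
        for (Segment segment : segments) {
            matchedParents.addAll(segment.matchedParents());
        }
        Map<String, Double> sums = new TreeMap<>();
        for (String parent : matchedParents) {
            sums.put(parent, 0.0);
        }
        for (Segment segment : segments) {
            if (skipSegmentsWithoutParentMatches && segment.matchedParents().isEmpty()) {
                continue; // child docs in this segment are never looked at
            }
            for (ChildDoc child : segment.children()) {
                if (matchedParents.contains(child.parentId())) {
                    sums.merge(child.parentId(), child.price(), Double::sum);
                }
            }
        }
        return sums;
    }
}
```

With subcat1 matched in one segment and its only item (price 120) stored in another, the skipping variant reports 0.0 for subcat1 and the evaluate-everything variant reports 120.0.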

ElasticSearch multifield not working

In my documents I have a field, collaboration, on which I would like to run aggregation queries. However, I also want it to be full-text searchable, so I figured I should make it a multi-field. The field may look something like this:
...
"collaboration" : "CMS"
or
"collaboration" : ["ATLAS", "CMS"]
or
"collaboration" : "LHCb"
...
Following the advice in "ElasticSearch term aggregation", I changed the mapping to:
"collaboration": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
And I run a query:
POST /my_index/_search
{
"aggs": {
"collaboration": {
"terms": {
"field": "collaboration.raw"
}
}
}
}
And get nothing:
"hits": {
"total": 5,
"max_score": 1,
"hits": [...]
},
"aggregations": {
"collaboration": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
Even when I try to use this field for searching it doesn't work:
POST /my_index/_search
{
"query": {
"query_string": {
"query": "CMS",
"fields": ["collaboration.raw"]
}
}
}
Should I change the mapping somehow because of the fact that the field is sometimes a list and sometimes a string? My research found that arrays are supposed to be supported out of the box. Any suggestions what might be wrong here?
Solved it thanks to Andrei Stefan!
All the documents needed to be re-indexed after changing the mapping: a small yet painful detail.
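A toy Java model of what happened: adding the raw sub-field to the mapping does not touch documents that are already indexed, so collaboration.raw holds no values and the terms aggregation returns no buckets until the documents are indexed again. It also shows why arrays need no special mapping, since each element becomes its own term. Everything here is a made-up simplification, not Elasticsearch internals.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: why collaboration.raw stays empty until documents are re-indexed.
class MultiFieldReindexSketch {

    static final class ToyIndex {
        private boolean rawSubFieldMapped = false;
        private final List<List<String>> sources = new ArrayList<>();
        private final List<List<String>> rawValues = new ArrayList<>();

        // Mapping change only: documents already indexed are not touched.
        void addRawSubField() {
            rawSubFieldMapped = true;
        }

        // A single string or an array both become a list of values; each
        // value is indexed as its own term in the raw sub-field.
        void index(List<String> collaboration) {
            sources.add(collaboration);
            rawValues.add(rawSubFieldMapped ? collaboration : List.of());
        }

        // Re-index every document from its stored source with the current mapping.
        void reindex() {
            List<List<String>> stored = new ArrayList<>(sources);
            sources.clear();
            rawValues.clear();
            stored.forEach(this::index);
        }

        // terms aggregation on collaboration.raw: one count per document per value.
        Map<String, Integer> termsOnRaw() {
            Map<String, Integer> buckets = new TreeMap<>();
            for (List<String> values : rawValues) {
                for (String value : new LinkedHashSet<>(values)) {
                    buckets.merge(value, 1, Integer::sum);
                }
            }
            return buckets;
        }
    }
}
```

Indexing "CMS", ["ATLAS", "CMS"] and "LHCb" before adding the sub-field leaves termsOnRaw empty; after reindex it reports ATLAS 1, CMS 2 and LHCb 1.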
