Getting empty buckets array in Elasticsearch aggregation

I'm using Elasticsearch 5.4.1.
When I try to perform a group-by (terms/bucket) aggregation, I'm not getting any values in the buckets array.
This is my index:
curl -X PUT localhost:9200/urldata -d '{
"mappings" : {
"components" : {
"properties" : {
"name" : {
"type" : "keyword",
"index" : "not_analyzed"
},
"status" : {
"type" : "keyword",
"index" : "not_analyzed"
},
"timestamp":{
"type":"date",
"index":"not_analyzed"
}
}
}
}
}'
And this is the aggregation query:
curl -XGET 'localhost:9200/urldata/_search?pretty' -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"components": {
"terms": {
"field": "name.keyword"
}
}
}
}
'
Output:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":3,
"max_score":0.0,
"hits":[
]
},
"aggregations":{
"components":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
]
}
}
}
Where am I going wrong?

Try this, it should do it. Since name is already mapped as a keyword field in your index, aggregate on name directly; the name.keyword sub-field only exists when a string is mapped as text with a keyword multi-field, which is what dynamic mapping produces (a sketch of that alternative mapping follows the query below):
{
"size": 0,
"aggs": {
"components": {
"terms": {
"field": "name"
}
}
}
}
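For illustration only, here is a minimal sketch of a mapping under which name.keyword would have been the correct aggregation target; it mirrors what dynamic mapping produces for a plain JSON string (the index and type names simply reuse the question's):
PUT urldata
{
  "mappings": {
    "components": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
With that mapping you would aggregate on name.keyword and run full-text queries on name.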
EDIT:
Here are all the steps to replicate your use case:
PUT test
{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
}
}
}
PUT test/_mapping/people_name
{
"properties":{
"name":{
"type":"keyword",
"index":"not_analyzed"
},
"status":{
"type":"keyword",
"index":"not_analyzed"
},
"timestamp":{
"type":"date",
"index":"not_analyzed"
}
}
}
POST test/people_name
{
"name": "A",
"status": "success",
"created_at": "2017-08-17"
}
POST test/people_name
{
"name": "A",
"status": "success_2",
"created_at": "2017-06-15"
}
POST test/people_name
{
"name": "B",
"status": "success",
"created_at": "2017-09-15"
}
GET test/people_name/_search
{
"size": 0,
"aggs": {
"components": {
"terms": {
"field": "name"
}
}
}
}
The result of the aggregation is:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"components": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "A",
"doc_count": 2
},
{
"key": "B",
"doc_count": 1
}
]
}
}
}
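If in doubt, you can check how the field actually ended up mapped before choosing between name and name.keyword:
GET urldata/_mapping
If name comes back as type keyword, aggregate on name; if it comes back as text with a keyword sub-field, aggregate on name.keyword.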

Related

How to apply custom analyser?

I just discovered an issue with our Elasticsearch setup: it returns nothing when a field value contains '&'. I did some googling and think I need a custom analyser. I've never worked with ES before, so I assume I'm missing something basic here.
This is what I have so far, and it is not working as expected.
PUT custom_analyser
{
"settings": {
"analysis": {
"analyzer": {
"suggest_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [ "lowercase", "my_synonym_filter" ]
}
},
"filter": {
"my_synonym_filter": {
"type": "synonym",
"synonyms": [
"&, and",
"foo, bar" ]
}
}
}
}
}
And I'm trying to use it like this:
GET custom_analyser/_search
{
"aggs": {
"section": {
"terms": {
"field": "section",
"size": 10,
"shard_size": 500,
"include": "jill & jerry" //Not returning anything back for this field using default analyser
}
}
}
}
Output:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
},
"aggregations": {
"section": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
}
Mappings
"_doc":{
"dynamic":"false",
"date_detection":false,
"properties":{
"section":{
"type":"keyword"
}
}
}
GET custom_analyser:
{
"custom_analyser": {
"aliases": {},
"mappings": {},
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "custom_analyser",
"creation_date": "1565971369814",
"analysis": {
"filter": {
"my_synonym_filter": {
"type": "synonym",
"synonyms": [
"&, and",
"foo, bar"
]
}
},
"analyzer": {
"suggest_analyzer": {
"filter": [
"lowercase",
"my_synonym_filter"
],
"type": "custom",
"tokenizer": "whitespace"
}
}
},
"number_of_replicas": "1",
"uuid": "oVMOU5wPQ--vKhE3dDFG2Q",
"version": {
"created": "6030199"
}
}
}
}
}
I think there is a slight confusion here: An analyzer won't help you, because you are (correctly) using a keyword field for the aggregation, but those are not analyzed. You could only use a normalizer on those fields.
For your specific problem: The include (and exclude) are regular expressions — you'll need to escape the & to make this work as expected.
Full example
Mapping and sample data:
PUT test
{
"mappings": {
"properties": {
"section": {
"type": "keyword"
}
}
}
}
PUT test/_doc/1
{
"section": "jill & jerry"
}
PUT test/_doc/2
{
"section": "jill jerry"
}
PUT test/_doc/3
{
"section": "jill"
}
PUT test/_doc/4
{
"section": "jill & jerry"
}
Query — you need a double backslash for the escape to work here (and I'm also excluding the actual documents with "size": 0 to keep the response shorter):
GET test/_search
{
"size": 0,
"aggs": {
"section": {
"terms": {
"field": "section",
"include": "jill \\& jerry"
}
}
}
}
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"section" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "jill & jerry",
"doc_count" : 2
}
]
}
}
}
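As an alternative that avoids regex escaping altogether, include (and exclude) also accept an array of exact values; a sketch against the same test index:
GET test/_search
{
  "size": 0,
  "aggs": {
    "section": {
      "terms": {
        "field": "section",
        "include": ["jill & jerry"]
      }
    }
  }
}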

Elasticsearch aggregation result is 0

The following is my query for Elasticsearch:
GET index/_search
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"and": [
{
"term": {
"id_1": "xx"
}
},
{
"term": {
"level": "level2"
}
},
{
"or": [
{
"term": {
"type": "yyy"
}
},
{
"term": {
"type": "zzzz"
}
}
]
}
]
}
}
},
"aggs": {
"variable": {
"stats": {
"field": "score"
}
}
}
}
But the agg result is as follows:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 68,
"max_score": 0,
"hits": []
},
"aggregations": {
"variable": {
"count": 30,
"min": 0,
"max": 0,
"avg": 0,
"sum": 0
}
}
}
Why are min, max, etc. all 0? There are values for score, like 0.18, 0.25, etc. Also, in the mapping the type for score is long. Please help me solve this. Thanks in advance.
Edit:
value in index:
"score": 0.18
Single document:
{
"_index": "index",
"_type": "ppppp",
"_id": "n0IiTEd2QFCnJUZOSiNu1w",
"_score": 1,
"_source": {
"name_2": "aaa",
"keyid": "bbbb",
"qqq": "cccc",
"level": "level2",
"type": "kkk",
"keytype": "Year",
"org_id": 25,
"tempid": "113",
"id_2": "561",
"name_1": "xxxxx",
"date_obj": [
{
"keyid": "wwwww",
"keytype": "Year",
"value": 21.510617952000004,
"date": "2015",
"id": "ggggggg",
"productid": ""
},
{
"keyid": "rrrrrr",
"keytype": "Year",
"value": 0.13,
"date": "2015",
"id": "iiiiii",
"productid": ""
}
],
"date": "2015",
"ddddd": 21.510617952000004,
"id_1": "29",
"leveltype": "nnnn",
"tttt": 0.13,
"score": 0.13 ------------------->problem
}
}
Mapping:
curl -XPUT ip:9200/index -d '{
"mappings" : {
"places" : {
"properties" : {
"score" : { "type" : "float"}
}
}
}
}'
The fix should be as simple as changing the type of the score field to float (or double) instead of long. long is an integer type and 0.18 will be indexed as 0 under the hood.
"score" : {
"type" : "float",
"null_value" : 0.0
}
Note that you'll need to reindex your data after making the mapping change.
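A rough sketch of that reindex, assuming your cluster is recent enough (2.3+) to have the _reindex API; on older versions you would instead scroll the old index and bulk-index into the new one. The index name index_v2 is only a placeholder for this example, and the type name follows the mapping snippet above:
PUT index_v2
{
  "mappings": {
    "places": {
      "properties": {
        "score": { "type": "float" }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "index" },
  "dest": { "index": "index_v2" }
}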

Not able to aggregate on nested fields in Elasticsearch

I have set a field to nested and now I am not able to aggregate on it.
Sample document -
{
"attributes" : [
{ "name" : "snake" , "type" : "reptile" },
{ "name" : "cow" , "type" : "mamal" }
]
}
The attributes field is nested.
The following terms aggregation does not work on it:
{
"aggs" : {
"terms" : { "field" : "attributes.name" }
}
}
How can I do this aggregation in Elasticsearch?
Use a nested aggregation.
As a simple example, I created an index with a nested property matching what you posted:
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"attributes": {
"type": "nested",
"properties": {
"name": {
"type": "string"
},
"type": {
"type": "string"
}
}
}
}
}
}
}
Then added your document:
PUT /test_index/doc/1
{
"attributes": [
{ "name": "snake", "type": "reptile" },
{ "name": "cow", "type": "mammal" }
]
}
Now I can get "attribute.name" terms as follows:
POST /test_index/_search?search_type=count
{
"aggs": {
"nested_attributes": {
"nested": {
"path": "attributes"
},
"aggs": {
"name_terms": {
"terms": {
"field": "attributes.name"
}
}
}
}
}
}
...
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"nested_attributes": {
"doc_count": 2,
"name_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cow",
"doc_count": 1
},
{
"key": "snake",
"doc_count": 1
}
]
}
}
}
}
Here's the code I used:
http://sense.qbox.io/gist/0e3ed9c700f240e523be08a27551707d4448a9df
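Note that search_type=count has been removed in later Elasticsearch versions; there the equivalent is "size": 0 in the request body (and on 5.x+ you would map name as keyword rather than an analyzed string for this terms aggregation). Roughly:
GET test_index/_search
{
  "size": 0,
  "aggs": {
    "nested_attributes": {
      "nested": {
        "path": "attributes"
      },
      "aggs": {
        "name_terms": {
          "terms": {
            "field": "attributes.name"
          }
        }
      }
    }
  }
}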

Elasticsearch aggregation buckets splitting an email address into two different bucket keys

I have a field stored as "user1#user.com".
Using this aggregation JSON query:
"aggregations": {
"email-terms": {
"terms": {
"field": "l_obj.email",
"size": 0,
"shard_size": 0,
"order": {
"_count": "desc"
}
}
}
}
I am getting this response:
"buckets" : [
{
"key" : "user.com",
"doc_count" : 1
},
{
"key" : "user1",
"doc_count" : 1
}
instead of
"buckets" : [
{
"key" : "user1#user.com",
"doc_count" : 1
}
]
The same issue occurs for strings like user1.user2.user.com; I am doing a terms aggregation.
Am I missing something here?
You need to set "index": "not_analyzed" on the "email" field in your mapping.
If I set up a toy index without specifying an analyzer (or to not use one), the standard analyzer will be used, which will split on whitespace and symbols like "#". So, with this index definition:
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"email": {
"type": "string"
}
}
}
}
}
if I add a single doc:
PUT /test_index/doc/1
{
"email": "user1#user.com"
}
and then ask for a terms aggregation, I get back two terms:
POST /test_index/_search?search_type=count
{
"aggregations": {
"email-terms": {
"terms": {
"field": "email"
}
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"email-terms": {
"buckets": [
{
"key": "user.com",
"doc_count": 1
},
{
"key": "user1",
"doc_count": 1
}
]
}
}
}
But if I rebuild the index with "index": "not_analyzed" in that field, and again index the same document:
DELETE /test_index
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"email": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
PUT /test_index/doc/1
{
"email": "user1#user.com"
}
and run the same terms aggregation, I only get back a single term for that email address:
POST /test_index/_search?search_type=count
{
"aggregations": {
"email-terms": {
"terms": {
"field": "email"
}
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"email-terms": {
"buckets": [
{
"key": "user1#user.com",
"doc_count": 1
}
]
}
}
}
Here is the code I used, altogether:
http://sense.qbox.io/gist/a73a28bf7450b637138b02a371fb15cabf344ab6
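For completeness: on Elasticsearch 5.x and later the string type is gone, and the same effect comes from mapping the field as keyword, which is never analyzed. A minimal sketch (5.x/6.x-style mapping with a type name):
PUT /test_index
{
  "mappings": {
    "doc": {
      "properties": {
        "email": {
          "type": "keyword"
        }
      }
    }
  }
}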
We can also use an index template to predefine field types: http://www.elastic.co/guide/en/elasticsearch/reference/1.3/indices-templates.html
For example, using a REST client or Elasticsearch Sense (note that the template name goes in the URL):
PUT http://escluster:port/_template/testtemplate
{
"testtemplate": {
"aliases": {},
"mappings": {
"test1": {
"_all": {
"enabled": false
},
"_source": {
"enabled": true
},
"properties": {
"email": {
"fielddata": {
"format": "doc_values"
},
"index": "not_analyzed",
"type": "string"
}...

Elasticsearch: generating terms from array using script

Would love an explanation of why this happens and how to correct it.
Here's a snippet of the source document:
{
"created_time":1412988495000,
"tags":{
"items":[
{
"tag_type":"Placement",
"tag_id":"id1"
},
{
"tag_type":"Product",
"tag_id":"id2"
}
]
}
}
The following terms aggregation:
"aggs":{
"tags":{
"terms":{
"script":"doc['tags'].value != null ? doc['tags.items.tag_type'].value + ':' + doc['tags.items.tag_id'].value : ''",
"size":2000,
"exclude":{
"pattern":"null:null"
}
}
}
}
returns:
"buckets":[
{
"key":"Placement:id1",
"doc_count":1
},
{
"key":"Placement:id2",
"doc_count":1
}
]
...when you would expect:
"buckets":[
{
"key":"Placement:id1",
"doc_count":1
},
{
"key":"Product:id2",
"doc_count":1
}
]
The reason for this is that, without a nested mapping, the inner objects are flattened into parallel arrays (tags.items.tag_type: ["Placement", "Product"] and tags.items.tag_id: ["id1", "id2"]), so the association between each tag_type and its tag_id is lost and the script cannot reconstruct the original pairs. I would therefore go with a nested type. I don't know all the details of your setup, but here is a proof of concept, at least. I took out the "items" property because I didn't need that many layers, and just used "tags" as the nested type; it could be added back in if needed.
So I set up an index with a "nested" property:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"doc": {
"properties": {
"created_time": {
"type": "date"
},
"tags": {
"type": "nested",
"properties": {
"tag_type": {
"type": "string",
"index": "not_analyzed"
},
"tag_id": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
Then added a couple of docs (notice that the structure differs slightly from yours):
PUT /test_index/doc/1
{
"created_time": 1412988495000,
"tags": [
{
"tag_type": "Placement",
"tag_id": "id1"
},
{
"tag_type": "Product",
"tag_id": "id2"
}
]
}
PUT /test_index/doc/2
{
"created_time": 1412988475000,
"tags": [
{
"tag_type": "Type3",
"tag_id": "id3"
},
{
"tag_type": "Type4",
"tag_id": "id3"
}
]
}
Now a scripted terms aggregation inside a nested aggregation seems to do the trick:
POST /test_index/_search?search_type=count
{
"query": {
"match_all": {}
},
"aggs": {
"tags": {
"nested": { "path": "tags" },
"aggs":{
"tag_vals": {
"terms": {
"script": "doc['tag_type'].value+':'+doc['tag_id'].value"
}
}
}
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"tags": {
"doc_count": 4,
"tag_vals": {
"buckets": [
{
"key": "Placement:id1",
"doc_count": 1
},
{
"key": "Product:id2",
"doc_count": 1
},
{
"key": "Type3:id3",
"doc_count": 1
},
{
"key": "Type4:id3",
"doc_count": 1
}
]
}
}
}
}
Here is the code I used:
http://sense.qbox.io/gist/4ceaf8693f85ff257c2fd0639ba62295f2e5e8c5
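As an aside, on more recent Elasticsearch versions (6.x+) the same idea is expressed with a Painless script and keyword fields instead of not_analyzed strings; a hedged sketch, using the full dotted path inside the nested aggregation:
POST /test_index/_search
{
  "size": 0,
  "aggs": {
    "tags": {
      "nested": { "path": "tags" },
      "aggs": {
        "tag_vals": {
          "terms": {
            "script": {
              "lang": "painless",
              "source": "doc['tags.tag_type'].value + ':' + doc['tags.tag_id'].value"
            }
          }
        }
      }
    }
  }
}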
