Here is what I have in one of my columns: a set of float values that add up to 1.272. When I try to create a metric visualization for the sum, it shows 0.
Why is it 0? The field is of type number in the index.
Update
So I tried to run this in Sense:
POST indexName/_search
{
  "size": 0,
  "aggs": {
    "sum block": {
      "sum": {
        "field": "blockSize"
      }
    }
  }
}
and I get:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 12,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sum block": {
      "value": 0
    }
  }
}
Why is this happening? Should it not add up the float values? Also, in the index mapping:
"blockSize": {
"type": "long"
}
Shouldn't this be float or double? And if it is long, then why does it store a decimal point with the values?
Probably the first document that was indexed had blockSize: 0, and thus ES chose the long type when dynamically mapping the field. Your float values are still stored in the _source, but what gets indexed is the long value, so something like 0.106 is indexed as 0, and the sum aggregation only sees zeros.
You need to wipe your index, correct the mapping, and re-index your data.
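A minimal sketch of what that could look like in Sense (assuming the index is called indexName; the type name yourType is a placeholder for whatever your actual type is):

DELETE indexName

PUT indexName
{
  "mappings": {
    "yourType": {
      "properties": {
        "blockSize": {
          "type": "double"
        }
      }
    }
  }
}

After re-indexing the documents, the sum aggregation above should return the expected 1.272.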
Related
I have stored some values in an Elasticsearch nested data type (an array) but without using key/value pairs. An example record would be:
{
  "categories": [
    "Category1",
    "Category2"
  ],
  "product_name": "productx"
}
Now I want to run an aggregation query to find the unique list of categories available. But all the examples I've seen point to mappings that use key/value pairs. Is there any way I can use the above schema as-is, or do I need to change my schema to something like the following to run the aggregation query?
{
  "categories": [
    {"name": "Category1"},
    {"name": "Category2"}
  ],
  "product_name": "productx"
}
Regarding the JSON structure, you need to take a step back and figure out whether you want a list or key/value pairs.
Looking at your example, I don't think you need key/value pairs, but it's something you may want to clarify by checking whether your domain would add more properties to categories.
Regarding aggregation, as far as I know, aggregations work on any valid JSON structure.
For the data you've mentioned, you can make use of the aggregation query below. I'm assuming the fields are of type keyword.
Aggregation Query
POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "myaggs": {
      "terms": {
        "size": 100,
        "script": {
          "inline": """
            def list = new ArrayList();
            // Pair every category with the product name to form the bucket keys
            for (int i = 0; i < doc['categories'].size(); i++) {
              list.add(doc['categories'][i] + ", " + doc['product_name'].value);
            }
            return list;
          """
        }
      }
    }
  }
}
Aggregation Response
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "myaggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Category1, productx",
          "doc_count": 1
        },
        {
          "key": "Category2, productx",
          "doc_count": 1
        }
      ]
    }
  }
}
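Note that if you only need the unique list of categories themselves (without pairing them with the product name), a plain terms aggregation on the keyword field is enough, since terms aggregations handle array fields natively. A minimal sketch:

POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "unique_categories": {
      "terms": {
        "field": "categories",
        "size": 100
      }
    }
  }
}

Each distinct category in the array then becomes its own bucket.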
Hope it helps!
I have a collection of posts with their tags imported into Elasticsearch. The mapped fields are:
language - type: keyword
tags (array) - type: keyword
created_at - type: date
A single document looks like this:
{ "language": "en", "tags": ["foo", "bar"], created_at: "..." }
I'm trying to run a significant terms query on my data set using:
GET _search
{
  "aggregations": {
    "significant_tags": {
      "significant_terms": {
        "field": "tags"
      }
    }
  }
}
But the results buckets are always empty:
{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "aggregations": {
    "significant_tags": {
      "doc_count": 2945,
      "bg_count": 2945,
      "buckets": []
    }
  }
}
I can confirm the data is properly imported, as I'm able to run any other aggregation on this dataset and it works fine. Just the significant terms don't want to cooperate. Any ideas on what I'm possibly doing wrong here?
Elasticsearch 6.2.4
Significant terms needs a foreground query or aggregation for it to calculate the difference in term frequencies and produce statistically significant results. Since your search matches all documents, the foreground and background sets are identical (note that doc_count equals bg_count in your response), so nothing stands out. You will need to add an initial query and then your aggregation. See the docs for details: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html
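For example (a sketch; the term filter on language is just an assumed foreground, substitute any query that selects the subset you care about):

GET _search
{
  "query": {
    "term": {
      "language": "en"
    }
  },
  "aggregations": {
    "significant_tags": {
      "significant_terms": {
        "field": "tags"
      }
    }
  }
}

The tags that are unusually frequent in the matching posts, relative to the whole index, will then show up in the buckets.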
Running this search in Kibana:
GET myindex/mytype/_search
{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "field": "myfield",
        "min_doc_count": 2
      }
    }
  }
}
Gives:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 12,
    "successful": 12,
    "failed": 0
  },
  "hits": {
    "total": 46117,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "duplicateCount": {
      "doc_count_error_upper_bound": 12,
      "sum_other_doc_count": 45817,
      "buckets": []
    }
  }
}
I'm not sure how to interpret this result. Per the terms aggregation documentation, sum_other_doc_count means:
when there are lots of unique terms, Elasticsearch only returns the top terms; this number is the sum of the document counts for all buckets that are not part of the response
Since there are no buckets in the response, it seems odd that there are apparently buckets that were not included. Does sum_other_doc_count include the buckets that were excluded by min_doc_count, and can the result therefore be interpreted as meaning there were no documents with duplicates for myfield?
If the latter is the case, and assuming there were buckets returned, is it possible to get a sum_other_doc_count that does not include the buckets excluded by min_doc_count, or a total count of the buckets?
Update:
It seems that I can get some of the information I want via a cardinality aggregation: total records minus the field's cardinality gives the approximate number of documents with a duplicate myfield.
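For reference, a sketch of that cardinality aggregation (the agg name uniqueValues is just a placeholder):

GET myindex/mytype/_search
{
  "size": 0,
  "aggs": {
    "uniqueValues": {
      "cardinality": {
        "field": "myfield"
      }
    }
  }
}

With hits.total at 46117, subtracting uniqueValues.value gives an approximate count of the "extra" documents that share a myfield value with some other document (approximate because cardinality is itself an estimate).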
I have two fields in the search document, salary_from and salary_to, and I want an aggregation of salary ranges such as 0 - 10, 10 - 20, etc.
Is there any way to set multiple fields on the Elastic range aggregation? (I can set one field by using the setField function.)
I just need to get the aggregated count of salary ranges or slabs by considering the two fields salary_from and salary_to.
Please help me.
If I understand your question correctly, below is what you need.
{
  "size": 0,
  "aggs": {
    "salary_ranges": {
      "terms": {
        "script": "doc['salary_from'].value + ' to ' + doc['salary_to'].value",
        "size": 0
      }
    }
  }
}
It basically uses a script for the Terms Aggregation; see the Elasticsearch documentation on terms aggregation scripts for more.
Say you have 3 documents with salary_from set to 3 and salary_to set to 5, and then 2 documents with salary_from set to 10 and salary_to set to 25. The result of the query above will look something like this:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "salary_ranges": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "3 to 5",
          "doc_count": 3
        },
        {
          "key": "10 to 25",
          "doc_count": 2
        }
      ]
    }
  }
}
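If you instead need fixed slabs such as 0 - 10 and 10 - 20 regardless of the exact values stored, one option (a sketch, assuming it's acceptable to bucket each document by the midpoint of its salary range) is a range aggregation driven by a script that combines both fields:

{
  "size": 0,
  "aggs": {
    "salary_slabs": {
      "range": {
        "script": "(doc['salary_from'].value + doc['salary_to'].value) / 2",
        "ranges": [
          { "to": 10 },
          { "from": 10, "to": 20 },
          { "from": 20 }
        ]
      }
    }
  }
}

Each document then lands in exactly one slab based on its midpoint salary.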
I'm using Logstash to insert a location attribute into my logs that are going into ElasticSearch:
"location" : {
"lat":31.07,
"lon":-82.09
}
I'm then setting up a mapping to tell ElasticSearch it's a Geo-Point. I'm not exactly sure how this call should look. This is what I've been using so far:
PUT logstash-*/_mapping/latlon
{
  "latlon": {
    "properties": {
      "location": {
        "type": "geo_point",
        "lat_lon": true,
        "geohash": true
      }
    }
  }
}
When I query for matching records in Kibana 4, the location field is annotated with a small globe. So far, so good.
When I move to the Tile Map visualization, bring up matching records, bucket by Geo Coordinates, select Geohash from the 'Aggregation' drop-down, select location from the Field drop-down, and press 'Apply', no results are returned.
The aggregations part of the request looks like this:
"aggs": {
"2": {
"geohash_grid": {
"field": "location",
"precision": 3
}
}
}
And the response:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 689,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "2": {
      "buckets": []
    }
  }
}
For some reason, ElasticSearch isn't returning results, even though it seems like the Geo-Point mapping is recognized. Any tips for how to troubleshoot from here?
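One way to start troubleshooting (a hedged suggestion, using the standard field-mapping API) is to ask Elasticsearch what mapping the field actually ended up with:

GET logstash-*/_mapping/field/location

If some of the logstash-* indices report a default mapping instead of geo_point, the PUT above only reached the indices that existed at the time it was issued; since Logstash creates a new index per day, an index template would be needed so that newly created indices also map location as a geo_point.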