So I have this mapping:
"employee": {
"properties": {
"DaysOff": {
"type": "nested",
"properties": {
"Date": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"Days": {
"type": "double"
},
"ID": {
"type": "long"
}
}
}
}
}
So basically, an employee can have days off. Each day off is stored in an array under the DaysOff property. Days can be a fraction of a day, so if an employee took half a day off, it would be 0.5.
So I have this search:
{
"size": 45,
"filter": {
"nested": {
"path": "DaysOff",
"filter": {
"range": {
"DaysOff.Date": {
"from": "now-2M",
"to": "now"
}
}
}
}
}
}
This brings back 45 documents, which is correct. I just can't figure out how to apply an aggregation to these documents to get the sum of all the days that have been taken.
Using this resource, I tried the following aggregation, but it didn't give me the correct result:
{
"size": 45,
"filter": {
"nested": {
"path": "DaysOff",
"filter": {
"range": {
"DaysOff.Date": {
"from": "now-2M",
"to": "now"
}
}
}
}
},
"aggs": {
"sum_docs": {
"nested": {
"path": "DaysOff"
},
"aggs": {
"stepped_down": {
"sum": {
"field": "DaysOff.Days"
}
}
}
}
}
}
You need to filter on the nested documents themselves to get the correct results. From the docs:
Because nested documents are indexed as separate documents, they can only be accessed within the scope of the nested query, ...
I created the index like this:
PUT employee
{
"mappings": {
"emp_map": {
"properties": {
"DaysOff": {
"type": "nested",
"properties": {
"Date": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"Days": {
"type": "double"
},
"ID": {
"type": "long"
}
}
},
"name": {
"type": "string"
}
}
}
}
}
Then I indexed a few documents like this:
PUT employee/emp_map/1
{
"name" : "messi",
"DaysOff" : [
{
"Date" : "2015-11-01",
"Days" : 1,
"ID" : 11
},
{
"Date" : "2014-11-01",
"Days" : 2,
"ID" : 11
},
{
"Date" : "2015-12-01",
"Days" : 0.5,
"ID" : 11
}
]
}
PUT employee/emp_map/2
{
"name" : "ronaldo",
"DaysOff" : [
{
"Date" : "2015-10-01",
"Days" : 3,
"ID" : 12
},
{
"Date" : "2014-11-01",
"Days" : 2,
"ID" : 12
},
{
"Date" : "2015-12-01",
"Days" : 0.5,
"ID" : 12
}
]
}
PUT employee/emp_map/3
{
"name" : "suarez",
"DaysOff" : [
{
"Date" : "2015-11-01",
"Days" : 4,
"ID" : 13
},
{
"Date" : "2015-11-09",
"Days" : 2,
"ID" : 13
},
{
"Date" : "2015-12-01",
"Days" : 1.5,
"ID" : 13
}
]
}
This is my query. Notice the filter aggregation inside the nested aggregation: without it, ES gives you the sum of all days taken off, regardless of date.
GET employee/_search
{
"query": {
"bool": {
"filter": {
"nested": {
"path": "DaysOff",
"query": {
"range": {
"DaysOff.Date": {
"from": "now-2M",
"to": "now"
}
}
}
}
}
}
},
"aggs": {
"emp_name": {
"terms": {
"field": "name",
"size": 10
},
"aggs": {
"nesting": {
"nested": {
"path": "DaysOff"
},
"aggs": {
"filter_date": {
"filter": {
"range": {
"DaysOff.Date": {
"from": "now-2M",
"to": "now"
}
}
},
"aggs": {
"sum_taken_off_days": {
"sum": {
"field": "DaysOff.Days"
}
}
}
}
}
}
}
}
},
"size": 0
}
This is the result I get:
"aggregations": {
"emp_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "messi",
"doc_count": 1,
"nesting": {
"doc_count": 3,
"filter_date": {
"doc_count": 2,
"sum_taken_off_days": {
"value": 1.5
}
}
}
},
{
"key": "ronaldo",
"doc_count": 1,
"nesting": {
"doc_count": 3,
"filter_date": {
"doc_count": 1,
"sum_taken_off_days": {
"value": 0.5
}
}
}
},
{
"key": "suarez",
"doc_count": 1,
"nesting": {
"doc_count": 3,
"filter_date": {
"doc_count": 3,
"sum_taken_off_days": {
"value": 7.5
}
}
}
}
]
}
}
P.S.: This is per employee; remove the emp_name terms aggregation to get the sum across all employees.
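To see why the extra filter aggregation matters, here is a small Python sketch that mimics the behaviour on the three example documents. It is not Elasticsearch code. The in-memory data, the helper names, and the fixed "now" of 2015-12-15 are assumptions, chosen so that the output lines up with the result above.

```python
import calendar
from datetime import date

# Hypothetical in-memory copy of the three example documents:
# each tuple is one nested DaysOff entry (Date, Days).
EMPLOYEES = {
    "messi": [("2015-11-01", 1.0), ("2014-11-01", 2.0), ("2015-12-01", 0.5)],
    "ronaldo": [("2015-10-01", 3.0), ("2014-11-01", 2.0), ("2015-12-01", 0.5)],
    "suarez": [("2015-11-01", 4.0), ("2015-11-09", 2.0), ("2015-12-01", 1.5)],
}


def minus_months(d, months):
    """Rough stand-in for ES date math "now-<n>M" (day clamped to the month's end)."""
    year_offset, month0 = divmod(d.month - 1 - months, 12)
    year = d.year + year_offset
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def sum_days_off(employees, now, months=2, filter_nested=True):
    """Per-employee sum of DaysOff.Days for parents matched by the nested range query."""
    start = minus_months(now, months)

    def in_range(day_str):
        return start <= date.fromisoformat(day_str) <= now

    result = {}
    for name, days_off in employees.items():
        # The nested query only decides which PARENT documents match...
        if not any(in_range(d) for d, _ in days_off):
            continue
        # ...but the nested aggregation sees ALL nested entries of each parent,
        # so the date range has to be applied again (the filter aggregation).
        result[name] = sum(days for d, days in days_off
                           if not filter_nested or in_range(d))
    return result


print(sum_days_off(EMPLOYEES, date(2015, 12, 15)))
# {'messi': 1.5, 'ronaldo': 0.5, 'suarez': 7.5}
print(sum_days_off(EMPLOYEES, date(2015, 12, 15), filter_nested=False))
# {'messi': 3.5, 'ronaldo': 5.5, 'suarez': 7.5}
```

The second call corresponds to the first attempt in the question. The nested aggregation without a filter sums every DaysOff entry of the matched employees, including the 2014 ones.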
I need to sort the results of a composite aggregation, but the value to sort by is the sum of a specific field (my index is very large, so I need the composite aggregation to paginate through the values).
When I send this GET:
GET /_search
{
"aggs" : {
"my_buckets": {
"composite" : {
"sources" : [
{ "date": { "date_histogram": { "field": "timestamp", "interval": "1d"} } },
{ "product": { "terms": {"field": "product" } } }
]
},
"aggregations": {
"the_sum": {
"sum": { "field": "price" }  // <-- I want to order the buckets by this sum
}
}
}
}
}
How can I get this response, ordered by the sum of price in each bucket?
{
...
"aggregations": {
"my_buckets": {
"after_key": {
"date": 1494374400000,
"product": "mad max"
},
"buckets": [
{
"key": {
"date": 1494460800000,
"product": "apocalypse now"
},
"doc_count": 1,
"the_sum": {
"value": 10.0
}
},
{
"key": {
"date": 1494288000000,
"product" : "mad max"
},
"doc_count": 2,
"the_sum": {
"value": 22.5
}
},
{
"key": {
"date": 1494374400000,
"product": "mad max"
},
"doc_count": 1,
"the_sum": {
"value": 290.0
}
}
]
}
}
}
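As far as I know, a composite aggregation can only order buckets by its own sources (each source accepts "order": "asc" or "desc"). It cannot order by a metric sub-aggregation such as the_sum. One workaround, if all the buckets fit in memory, is to page through them with after_key and sort on the client. A minimal Python sketch follows. The fetch_page callable is a hypothetical wrapper around whatever client you use.

```python
def collect_and_sort(fetch_page, metric="the_sum"):
    """Walk every page of a composite aggregation, then sort buckets by a metric.

    fetch_page(after_key) is a hypothetical callable: it should send the search
    with "after": after_key added to the composite body (no "after" when
    after_key is None) and return the "my_buckets" object of the response.
    """
    buckets = []
    after_key = None
    while True:
        page = fetch_page(after_key)
        buckets.extend(page["buckets"])
        after_key = page.get("after_key")
        # An empty page (or a missing after_key) means there is nothing left.
        if not page["buckets"] or after_key is None:
            break
    # Client-side ordering: highest sum first.
    return sorted(buckets, key=lambda b: b[metric]["value"], reverse=True)
```

This gives up the memory benefit of paginating. If server-side ordering matters more than pagination, a terms aggregation can order its buckets by a single-value metric sub-aggregation.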
Let's assume I have data structured like this:
{ "id": "120400871755634330808993320",
"name": "Metaalschroef binnenzeskant, DIN 912 RVS A4-80",
"description": "m16x70 cilinderschroef bzk a4-80 din912 klasse 80",
"fullDescription": "Metaalschroef met een binnenzeskant cilinderkop",
"synonyms": [],
"properties": [
{
"name": "draad",
"value": "16",
"sort": 99
},
{
"name": "lengte",
"value": "70",
"sort": 99
},
{
"name": "materiaal",
"value": "roestvaststaal",
"sort": 99
},
{
"name": "kwaliteit (materiaal)",
"value": "A4",
"sort": 99
},
{
"name": "DIN",
"value": "912",
"sort": 99
},
{
"name": "AISI",
"value": "316",
"sort": 99
},
{
"name": "draadsoort",
"value": "metrisch",
"sort": 99
},
{
"name": "Merk",
"value": "Elcee Holland",
"sort": 1
}
]
}
How do I write a boolean query that selects all documents that have a property with name "draad" and value "16" and also a property with name "lengte" and value "70"?
Right now I have this, but it returns 0 results:
"query" : {
"nested" : {
"path" : "properties",
"query" : {
"bool" : {
"must" : [{
"bool" : {
"must" : [{
"term" : {
"properties.name" : "Merk"
}
}, {
"term" : {
"properties.value" : "Facom"
}
}
]
}
}, {
"bool" : {
"must" : [{
"term" : {
"properties.name" : "materiaal"
}
}, {
"term" : {
"properties.value" : "kunststof"
}
}
]
}
}
]
}
}
}
}
Replacing the highest level "must" with "should" returns too many results, which makes sense as it translates to an "or".
When using must, the engine searches for a single nested document with name:Merk and value:Facom that also has name:materiaal and value:kunststof, which can never happen within one nested document.
When using should, as you mentioned, it translates to an OR, which is indeed possible.
The problem is that you then also get the entire parent document with all its nested documents.
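The difference can be modelled in a few lines of Python. This only simulates the matching semantics and is not Elasticsearch code; the trimmed-down document and the function names are illustrative. The sketch also shows an alternative that the answers here do not use: one nested query per name/value pair inside an outer bool.must. That form requires each pair to match some nested object, while letting different pairs match different objects.

```python
# Trimmed-down, hypothetical version of the example document.
DOC = {
    "id": "120400871755634330808993320",
    "properties": [
        {"name": "draad", "value": "16"},
        {"name": "lengte", "value": "70"},
        {"name": "materiaal", "value": "roestvaststaal"},
    ],
}
PAIRS = [("draad", "16"), ("lengte", "70")]


def nested_matches(doc, *conditions):
    """One nested query: a SINGLE nested object must satisfy every condition."""
    return any(all(obj.get(field) == value for field, value in conditions)
               for obj in doc["properties"])


def must_inside_one_nested(doc, pairs):
    """The original query: all name/value pairs ANDed inside one nested query."""
    conditions = [c for name, value in pairs
                  for c in (("name", name), ("value", value))]
    return nested_matches(doc, *conditions)


def must_of_separate_nested(doc, pairs):
    """bool.must with one nested query per pair: pairs may hit different objects."""
    return all(nested_matches(doc, ("name", name), ("value", value))
               for name, value in pairs)


print(must_inside_one_nested(DOC, PAIRS))   # False
print(must_of_separate_nested(DOC, PAIRS))  # True
```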
In my own answer I show the steps to create an index with nested documents (you should map the properties field as a nested type).
After completing those steps, you'll be able to get results with the following query:
{
"_source": [
"id",
"name",
"description"
],
"query": {
"bool": {
"must": [
{
"nested": {
"path": "properties",
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"properties.name": "Merk"
}
},
{
"term": {
"properties.value": "Facom"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"properties.name": "materiaal"
}
},
{
"term": {
"properties.value": "kunststof"
}
}
]
}
}
]
}
},
"inner_hits":{
"size": 10
}
}
}
]
}
}
}
I found a solution that is working very well!
My property object now looks like this:
{
"name": "breedte(mm)",
"value": "1000",
"unit": "mm",
"sort": 99,
"nameSlug": "breedte-mm",
"slug": "breedte-mm-1000"
},
I added a slug (a normalized string combining name and value) and a nameSlug (a normalized string for the name alone).
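The exact normalization isn't shown, so here is one possible slug function in Python. It is a hypothetical sketch that reproduces the example values above (breedte-mm, breedte-mm-1000) and the slugs used in the queries (merk-orbis, materiaal-kunststof).

```python
import re


def slugify(text):
    """Lowercase and collapse every run of non-alphanumeric characters into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def add_slugs(prop):
    """Return a copy of a property dict with hypothetical nameSlug/slug fields added."""
    out = dict(prop)
    out["nameSlug"] = slugify(prop["name"])
    out["slug"] = slugify(f'{prop["name"]} {prop["value"]}')
    return out


print(add_slugs({"name": "breedte(mm)", "value": "1000", "unit": "mm", "sort": 99}))
```

With this scheme, accented characters are treated as separators. If property values can contain them, a real implementation would probably transliterate them first.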
My index is mapped like this:
"properties": {
"type": "nested",
"include_in_parent": true,
"properties": {
"name": {
"type": "keyword"
},
"nameSlug": {
"type": "keyword"
},
"slug": {
"type": "keyword"
},
"sort": {
"type": "long"
},
"unit": {
"type": "text",
"index": false
},
"value": {
"type": "keyword"
}
}
}
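The "include_in_parent" setting is important here: it also copies the nested fields into the parent document as flat fields, which allows me to run the query below without a nested query: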
"query": {
"bool": {
"must": [
{
"terms": {
"properties.slug": [
"merk-orbis",
"merk-bahco"
]
}
},
{
"terms": {
"properties.slug": [
"materiaal-staal",
"materiaal-kunststof"
]
}
}
]
}
},
This query searches for all documents where "merk" is "Orbis" or "Bahco" and where "materiaal" is "staal" or "kunststof".
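Each slug already combines name and value, so flattening the nested objects into the parent does not lose the name/value pairing. The query therefore behaves as an AND across the terms clauses and an OR within each terms list. A small Python model of that matching, with hypothetical documents:

```python
def flattened_slugs(doc):
    """With include_in_parent, the parent also holds every nested slug as one flat array."""
    return {prop["slug"] for prop in doc["properties"]}


def matches(doc, groups):
    """bool.must of terms queries: every group needs at least one matching slug."""
    slugs = flattened_slugs(doc)
    return all(slugs & set(group) for group in groups)


GROUPS = [["merk-orbis", "merk-bahco"], ["materiaal-staal", "materiaal-kunststof"]]
doc_a = {"properties": [{"slug": "merk-orbis"}, {"slug": "materiaal-staal"}]}
doc_b = {"properties": [{"slug": "merk-bahco"}, {"slug": "materiaal-hout"}]}
print(matches(doc_a, GROUPS), matches(doc_b, GROUPS))  # True False
```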
My aggregations look like this:
"merk_query": {
"filter": {
"bool": {
"must": [
{
"terms": {
"properties.slug": [
"materiaal-staal",
"materiaal-kunststof"
]
}
}
]
}
},
"aggs": {
"merk_facets": {
"nested": {
"path": "properties"
},
"aggs": {
"merk_only": {
"filter": {
"term": {
"properties.nameSlug": {
"value": "merk"
}
}
},
"aggs": {
"facets": {
"terms": {
"field": "properties.name",
"size": 1
},
"aggs": {
"facetvalues": {
"terms": {
"field": "properties.value",
"size": 10
}
}
}
}
}
}
}
}
}
},
I run a filter aggregation that keeps only the documents matching the selected facets (except the facet I am currently building).
The result of this aggregation is something like this:
"merk_query": {
"doc_count": 7686,
"merk_facets": {
"doc_count": 68658,
"merk_only": {
"doc_count": 7659,
"facets": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Merk",
"doc_count": 7659,
"facetvalues": {
"doc_count_error_upper_bound": 10,
"sum_other_doc_count": 438,
"buckets": [
{
"key": "Orbis",
"doc_count": 6295
},
{
"key": "DX",
"doc_count": 344
},
{
"key": "AXA",
"doc_count": 176
},
{
"key": "Talen Tools",
"doc_count": 127
},
{
"key": "Nemef",
"doc_count": 73
},
{
"key": "bonfix",
"doc_count": 67
},
{
"key": "Bahco",
"doc_count": 64
},
{
"key": "Henderson",
"doc_count": 27
},
{
"key": "Maasland Groep",
"doc_count": 25
},
{
"key": "SYSTEC",
"doc_count": 23
}
]
}
}
]
}
}
}
}
},
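The faceting logic can be sketched in Python like this: restrict by every selected facet group except the one being counted, then count that facet's nested values. The documents, selections, and function name are hypothetical; the sketch only mirrors the structure of merk_query above.

```python
from collections import Counter


def facet_counts(docs, selected, facet):
    """Count values of one facet, filtering only by the OTHER selected facets.

    docs: parent docs, each with a "properties" list of dicts holding
    "nameSlug", "value" and "slug" (hypothetical in-memory data).
    selected: {nameSlug: [slug, ...]} for the currently ticked facet values.
    """
    counts = Counter()
    for doc in docs:
        slugs = {p["slug"] for p in doc["properties"]}
        # Mirror of the outer filter aggregation: every other facet group must match.
        if not all(slugs & set(wanted)
                   for name, wanted in selected.items() if name != facet):
            continue
        # Mirror of nested + filter(nameSlug) + terms(value): count nested values.
        for p in doc["properties"]:
            if p["nameSlug"] == facet:
                counts[p["value"]] += 1
    return counts
```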
And this is the end result in the browser:
City and home type are two nested objects in the following document mapping:
"mappings" : {
"home_index_doc" : {
"properties" : {
"city" : {
"type" : "nested",
"properties" : {
"country" : {
"type" : "nested",
"properties" : {
"name" : {
"type" : "text"
}
}
},
"name" : {
"type" : "keyword"
}
}
},
"home_type" : {
"type" : "nested",
"properties" : {
"name" : {
"type" : "keyword"
}
}
},
...
}
}
}
I am trying to do the following aggregation:
Take all present documents and show all home_types per city.
I imagine it should look similar to:
"aggregations": {
"all_cities": {
"buckets": [
{
"key": "Tokyo",
"doc_count": 12,
"home_types": {
"buckets": [
{
"key": "apartment",
"doc_count": 5
},
{
"key": "house",
"doc_count": 12
}
]
}
},
{
"key": "New York",
"doc_count": 1,
"home_types": {
"buckets": [
{
"key": "house",
"doc_count": 1
}
]
}
}
]
}
}
After trying a gazillion approaches and combinations, I've gotten this far with Kibana:
GET home-index/home_index_doc/_search
{
"size": 0,
"aggs": {
"all_cities": {
"nested": {
"path": "city"
},
"aggs": {
"city_name": {
"terms": {
"field": "city.name"
}
}
}
},
"aggs": {
"all_home_types": {
"nested": {
"path": "home_type"
},
"aggs": {
"home_type_name": {
"terms": {
"field": "home_type.name"
}
}
}
}
}
}
}
and I get the following exception:
"type": "unknown_named_object_exception",
"reason": "Unknown BaseAggregationBuilder [all_home_types]",
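The exception occurs because your second "aggs" block sits next to all_cities inside the top-level aggs, so Elasticsearch reads "aggs" as an aggregation name and all_home_types as its (unknown) type. You need to use reverse_nested to jump out of the city nested type back to the root level, and then run another nested aggregation on the home_type nested type. Basically, like this: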
{
"size": 0,
"aggs": {
"all_cities": {
"nested": {
"path": "city"
},
"aggs": {
"city_name": {
"terms": {
"field": "city.name"
},
"aggs": {
"by_home_types": {
"reverse_nested": {},
"aggs": {
"all_home_types": {
"nested": {
"path": "home_type"
},
"aggs": {
"home_type_name": {
"terms": {
"field": "home_type.name"
}
}
}
}
}
}
}
}
}
}
}
}
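Here is what that chain computes, modelled in plain Python: nested on city, then terms on city.name, then reverse_nested back to the root, then nested on home_type, then terms on home_type.name. The sample homes and the function name are hypothetical, and this is a model of the semantics, not Elasticsearch code.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents: each home has nested city and home_type objects.
HOMES = [
    {"city": [{"name": "Tokyo"}], "home_type": [{"name": "apartment"}]},
    {"city": [{"name": "Tokyo"}], "home_type": [{"name": "house"}]},
    {"city": [{"name": "New York"}], "home_type": [{"name": "house"}]},
]


def home_types_per_city(homes):
    # nested(city) + terms(city.name): which root docs does each city name come from?
    roots_per_city = defaultdict(set)
    for i, root in enumerate(homes):
        for city in root["city"]:
            roots_per_city[city["name"]].add(i)  # reverse_nested: back to root doc i
    result = {}
    for city_name, root_ids in roots_per_city.items():
        # nested(home_type) + terms(home_type.name), over those root docs only
        result[city_name] = Counter(
            ht["name"] for i in root_ids for ht in homes[i]["home_type"])
    return result


print(home_types_per_city(HOMES))
```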
I have ElasticSearch index with records like:
{
"project" : "A",
"updated" : <date>,
"cost" : 123
},
{
"project" : "A",
"updated" : <date>,
"cost" : 1
},
{
"project" : "B",
"updated" : <date>,
"cost" : 3
},
{
"project" : "B",
"updated" : <date>,
"cost" : 4
},
{
"project" : "C",
"updated" : <date>,
"cost" : 5
}
I'm trying to draw a "cost" chart for selected projects.
Can anyone help me build a query that gets the sum of cost, grouped by project?
For example, I want to select data for projects "A" and "B" and get something like:
date1 ->
projectA -> sum(cost)
projectB -> sum(cost)
date2 ->
projectA -> sum(cost)
projectB -> sum(cost)
I have no idea how to modify this query, which extracts data for a single project:
"query": {
"bool": {
"must": [
{
"match": {
"project": {
"query": <project>,
"type": "phrase"
}
}
},
{
"range": {
"updated": {
"gte": <startDate>,
"format": "epoch_millis"
}
}
}
]
}
},
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"2": {
"sum": {
"field": "cost"
}
}
}
}
}
Update: Thanks, guys! With your help I wrote this query:
{
"query": {
"bool": {
"must": [
{
"range": {
"End_Time": {
"gte": 1485892800000,
"format": "epoch_millis"
}
}
}
],
"should": [
{
"match": {
"Project_Name": {
"query": "A",
"type": "phrase"
}
}
},
{
"match": {
"Project_Name": {
"query": "B",
"type": "phrase"
}
}
}
]
}
},
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"project_agg": {
"terms": {
"field": "Project_ID"
},
"aggs": {
"2": {
"sum": {
"field": "Cost"
}
}
}
}
}
}
}
}
But it returns something strange:
"aggregations": {
"3": {
"buckets": [
{
"key_as_string": "2017-02-01T00:00:00.000-06:00",
"key": 1485928800000,
"doc_count": 17095,
"project_agg": {
"doc_count_error_upper_bound": 36,
"sum_other_doc_count": 3503,
"buckets": [
{
"2": {
"value": 2536.8616891294323
},
"key": 834879987748,
"doc_count": 2243
},
{
"2": {
"value": 3438.766646153458
},
"key": 497952557271,
"doc_count": 1785
},
{
"2": {
"value": 13066.367076588496
},
"key": 1057394416300,
"doc_count": 1736
},
...
There are 10 buckets for each month. I expected to see only 2 per month, one for each project. What's wrong?
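One likely explanation, worth checking against the bool query documentation for your version: when a bool query has a must (or filter) clause, its should clauses default to a minimum_should_match of 0. They then only influence scoring and do not restrict the hits. So the project clauses filter nothing, and the terms aggregation on Project_ID simply returns its default top 10 buckets. Below is a simplified Python model of that rule; the documents and helper names are hypothetical.

```python
def bool_matches(doc, must=(), should=(), minimum_should_match=None):
    """Simplified model of whether a bool query matches a document (scores ignored)."""
    if minimum_should_match is None:
        # Default: should clauses are required only when there is no must clause
        # (filter clauses have the same effect; they are left out of this model).
        minimum_should_match = 1 if should and not must else 0
    if not all(clause(doc) for clause in must):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


DOCS = [{"Project_Name": p, "End_Time": 1485892800001} for p in ("A", "B", "C")]


def after_start(doc):
    return doc["End_Time"] >= 1485892800000


def project_is(name):
    return lambda doc: doc["Project_Name"] == name


SHOULD = [project_is("A"), project_is("B")]
print([d["Project_Name"] for d in DOCS if bool_matches(d, [after_start], SHOULD)])
# ['A', 'B', 'C']  (the should clauses did not filter anything)
print([d["Project_Name"] for d in DOCS
       if bool_matches(d, [after_start], SHOULD, minimum_should_match=1)])
# ['A', 'B']
```

In the real query, adding "minimum_should_match": 1 to the bool query should restrict the hits to documents matching at least one of the project clauses.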
The query you wrote gives you the total cost per month (irrespective of project); you need another aggregation that groups by project between aggregation 3 and aggregation 2.
If you only want projects A and B, use a filter aggregation:
{
"size": 0,
"aggs": {
"project": {
"filter": {
"bool": {
"must": [
{
"terms": {
"project": [
"a",
"b"
]
}
}
]
}
},
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"project_agg": {
"terms": {
"field": "project"
},
"aggs": {
"2": {
"sum": {
"field": "cost"
}
}
}
}
}
}
}
}
}
}
You need to aggregate on project before you aggregate cost:
{
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"2": {
"terms": {
"field": "project"
},
"aggs": {
"1": {
"sum": {
"field": "cost"
}
}
}
}
}
}
}
}
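The nesting both answers describe (a monthly date_histogram, then terms on project, then sum on cost) can be mimicked in plain Python to check what numbers to expect. This is a rough sketch. It assumes the records look like the sample in the question, with "updated" in epoch milliseconds. It buckets by UTC calendar month, whereas the queries use the CST6CDT time zone, so month boundaries can differ slightly.

```python
from collections import defaultdict
from datetime import datetime, timezone


def cost_by_month_and_project(records, projects):
    """Rough equivalent of filter(project) -> date_histogram(1M) -> terms(project) -> sum(cost)."""
    result = defaultdict(lambda: defaultdict(float))
    for rec in records:
        if rec["project"] not in projects:  # the filter aggregation
            continue
        ts = datetime.fromtimestamp(rec["updated"] / 1000, tz=timezone.utc)
        month = ts.strftime("%Y-%m")  # UTC months, not CST6CDT
        result[month][rec["project"]] += rec["cost"]
    return {m: dict(p) for m, p in sorted(result.items())}
```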
For the filtering, it depends on how you want to search. For a list of projects, you could use:
"query": {
"bool": {
"must": [
{ "terms": { "project": [ "a", "b" ] } } //Assuming field is mapped as "analyzed"
]
}
}
In case your mapping contains a .keyword variety, you would format the terms filter like this:
{ "terms": { "project.keyword": [ "A", "B" ] } } //Assuming field is mapped as "not_analyzed" or has a keyword field.
Here is an example of how a field is mapped in ES 5.5 as "text" with a "keyword" sub-field:
"ShortTextContent" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
In this case, I can access the analyzed version using "ShortTextContent" and the not_analyzed version using "ShortTextContent.keyword".
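The first terms filter uses lowercase "a" and "b", while the .keyword version uses "A" and "B". The reason is that a terms query is not analyzed: it looks up exactly the terms you pass. A text field, by contrast, stores the analyzed (lowercased) tokens. A small Python model of that follows. The regex tokenizer is a crude stand-in for the standard analyzer (an approximation, not its real rules).

```python
import re


def standard_like_tokens(text):
    """Very rough stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def terms_query_matches(indexed_terms, query_terms):
    """A terms query does not analyze its input: it needs an exact term match."""
    return any(q in indexed_terms for q in query_terms)


stored_value = "A"
text_field_terms = standard_like_tokens(stored_value)  # what "project" indexes
keyword_field_terms = [stored_value]                   # what "project.keyword" indexes

print(terms_query_matches(text_field_terms, ["A", "B"]))     # False
print(terms_query_matches(text_field_terms, ["a", "b"]))     # True
print(terms_query_matches(keyword_field_terms, ["A", "B"]))  # True
```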
I want to aggregate access counts by function path:
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"path.keyword": "/hex/*"
}
}
]
}
},
"from": 0,
"size": 0,
"aggs": {
"path": {
"terms": {
"field": "path.keyword"
}
}
}
}
And I get results like these:
{
"key": "/hex/user/admin_user/auth",
"doc_count": 38
},
{
"key": "/hex/report/chart/fastreport_lobby_all?start_date=2017-06-29&end_date=2017-07-05&category=date_range&value[]=payoff",
"doc_count": 35
},
{
"key": "/hex/report/chart/fastreport_lobby_all?start_date=2017-06-29&end_date=2017-07-05&category=lobby&value[]=payoff",
"doc_count": 35
},
{
"key": "/hex/report/chart/online_membership?start_date=2017-06-29&end_date=2017-07-05&category=datetime_range&value[]=user_total",
"doc_count": 34
}
There are two /hex/report/chart/fastreport_lobby_all?... results, so this is not the real count for this function.
Is there any way to count these as one, like this?
{
"key": "/hex/report/chart/fastreport_lobby_all",
"doc_count": 70
}
I don't think this is possible without a custom analyzer, something like:
PUT your_index
{
"settings": {
"analysis": {
"analyzer": {
"query_analyzer": {
"type": "custom",
"tokenizer": "split_query",
"filter": ["top1"
]
}
},
"filter":{
"top1":{
"type": "limit",
"max_token_count": 1
}
},
"tokenizer":{
"split_query":{
"type": "pattern",
"pattern": "\\?"
}
}
}
},
"mappings": {
"your_log_type": {
"properties": {
"path": {
"type": "text",
"fields": {
"keyword": {
"type":"keyword"
},
"no_query": {
"type":"text",
"fielddata":true,
"analyzer":"query_analyzer"
}
}
}
}
}
}
}
And then aggregate on the path.no_query field:
POST your_index/your_log_type/_search
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"path.keyword": "/hex/*"
}
}
]
}
},
"from": 0,
"size": 0,
"aggs" : {
"genres" : {
"terms" : { "field" : "path.no_query" }
}
}
}
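The analyzer above (a pattern tokenizer splitting on "?", then a limit filter keeping one token) effectively keeps the part of the path before the first "?". The same grouping can be sketched on the client in Python. The function names are hypothetical, and this only re-groups the buckets a terms aggregation actually returned (10 by default), so counts can be incomplete. The analyzer approach does the grouping inside Elasticsearch.

```python
from collections import Counter


def path_without_query(path):
    """Keep only the part before the first literal '?', like the analyzer above."""
    return path.split("?", 1)[0]


def merge_buckets(buckets):
    """Re-group terms buckets (keyed on the full path) by path without query string."""
    counts = Counter()
    for bucket in buckets:
        counts[path_without_query(bucket["key"])] += bucket["doc_count"]
    return counts
```

Another option is to store the stripped path in its own keyword field at index time. This avoids enabling fielddata on a text field, which can be memory-hungry.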