I'm trying to build a nested aggregation in Elasticsearch, but it keeps giving me this error: "cannot find agg type tags". How can I fix it? Thanks for your help. By the way, I don't have nested documents; I have one document with 180 fields. Can I still apply this aggregation? Here is my code:
{
"aggs": {
"comments": {
"nested": {
"path": "comments"
},
"aggs" : {
"red_products": {
"filter": {
"not": {
"terms": {
"text": [
"06melihgokcek",
"t.co","??","????","???"
]
}
}
},
"aggs": {
"top_docs": {
"terms": {
"field": "text",
"size": 50
}
},
"aggs" : {
"tags" : {
"terms" : {
"field" : "text",
"include" : ".*avni.*",
"exclude" : "fuat_.*"
}
}
}
}
}
}
}}}
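Your innermost aggs (the one containing tags at the bottom) is misplaced. It sits inside the same aggs object as top_docs, so Elasticsearch reads "aggs" as the name of a new aggregation and "tags" as its type, which is why it reports "cannot find agg type tags". It should be a child element of top_docs instead. As for your side question: the nested aggregation only works on fields mapped with "type": "nested" (and fields under it are then referenced by their full path, e.g. comments.text), so if you really have no nested documents, drop the nested wrapper and aggregate on your fields directly.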
{
"aggs": {
"comments": {
"nested": {
"path": "comments"
},
"aggs": {
"red_products": {
"filter": {
"not": {
"terms": {
"text": [
"06melihgokcek",
"t.co",
"??",
"????",
"???"
]
}
}
},
"aggs": {
"top_docs": {
"terms": {
"field": "text",
"size": 50
},
"aggs": { <---- this was the misplaced aggs
"tags": {
"terms": {
"field": "text",
"include": ".*avni.*",
"exclude": "fuat_.*"
}
}
}
}
}
}
}
}
}
}
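For reference, here is a minimal Python sketch that builds the same corrected request as a dict, which makes the parent/child relationship explicit: each sub-aggregation goes under its parent aggregation's own "aggs" key. It assumes Elasticsearch 5 or later, where the old "not" filter was removed, so it uses a bool must_not clause instead; on older versions the "not" form above still works. The function name build_aggs is illustrative only.

```python
import json

# Terms to exclude, copied from the question.
EXCLUDED = ["06melihgokcek", "t.co", "??", "????", "???"]


def build_aggs():
    """Corrected nesting: "tags" lives in top_docs' own "aggs" block."""
    tags = {
        "terms": {
            "field": "text",
            "include": ".*avni.*",
            "exclude": "fuat_.*",
        }
    }
    top_docs = {
        "terms": {"field": "text", "size": 50},
        # Sub-aggregations belong under the parent aggregation's "aggs" key,
        # not next to the parent inside the enclosing "aggs" object.
        "aggs": {"tags": tags},
    }
    red_products = {
        # bool/must_not replaces the removed "not" filter (assumes ES 5+).
        "filter": {"bool": {"must_not": {"terms": {"text": EXCLUDED}}}},
        "aggs": {"top_docs": top_docs},
    }
    return {
        "aggs": {
            "comments": {
                "nested": {"path": "comments"},
                "aggs": {"red_products": red_products},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_aggs(), indent=2))
```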
I am developing a query that counts how many unique "cp" values the most recent document contains.
The JSON is made up of several nested fields.
I am having trouble returning only the values from the document with the most recent date once nested fields are involved.
I have chained nested aggregations and, at the end, used a top_hits aggregation sorted in descending order with size 1 so that it returns only the latest document.
But it still returns all the documents, with different dates.
JSON:
{
"data" : [
{
"addresses" : [
{
"cp" : "33.33.33",
"services" : [
{
"field1" : "true",
"field2" : "1234"
}
]
}
]
}
],
"created_at" : "2020-09-03 14:39:01",
"#timestamp" : "2020-09-04T05:53:22.341661Z"
}
QUERY:
{"size": 0,
"aggs": {
"nested": {
"nested": {
"path": "data.addresses"
},
"aggs": {
"nested": {
"nested": {
"path": "data.addresses.services"
},
"aggs": {
"filter": {
"filter": {
"term": {
"data.addresses.services.field1.keyword": "true"
}
},
"aggs": {
"unique": {
"cardinality": {
"field": "data.addresses.services.field2.keyword"
}
},
"range":{
"top_hits": {
"size": 1,
"sort": [
{"created_at.keyword": {"order": "desc"}}]
}
}
}
}
}
}
}
}
}
}
I have tried sorting by the predefined field "created_at" or with #timestamp, but the result is the same.
Any advice on how to solve this?
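For this case, the solution is to drop top_hits and instead wrap the nested aggregations in a terms aggregation on created_at.keyword with
"order": {
"_key": "desc"
}
and "size": 1, so that only the bucket for the most recent date is kept.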
QUERY
{"size": 0,
"aggs": {
"filtrofecha": {
"terms": {
"field": "created_at.keyword",
"order": {
"_key": "desc"
},
"size": 1
},
"aggs": {
"nested": {
"nested": {
"path": "data.addresses"
},
"aggs": {
"nested": {
"nested": {
"path": "data.addresses.services"
},
"aggs": {
"filter": {
"filter": {
"term": {
"data.addresses.services.field1.keyword": "true"
}
},
"aggs": {
"unique": {
"cardinality": {
"field": "data.addresses.services.field2.keyword"
}
}
}
}
}
}
}
}
}
}
}
}
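Ordering by _key works here because created_at is a keyword string in yyyy-MM-dd HH:mm:ss form, where lexicographic order matches chronological order. Below is a minimal Python sketch of the same request plus a quick check of that ordering assumption. The function name latest_date_query is illustrative; if created_at used a format such as dd/MM/yyyy, _key ordering would no longer pick the latest date, and mapping it as a date field would be safer.

```python
def latest_date_query(date_field="created_at.keyword"):
    """Keep only the newest created_at bucket, then count unique field2 values."""
    services_filter = {
        "filter": {
            "filter": {"term": {"data.addresses.services.field1.keyword": "true"}},
            "aggs": {
                "unique": {
                    "cardinality": {"field": "data.addresses.services.field2.keyword"}
                }
            },
        }
    }
    return {
        "size": 0,
        "aggs": {
            "filtrofecha": {
                # _key desc + size 1 keeps only the largest (latest) date string.
                "terms": {"field": date_field, "order": {"_key": "desc"}, "size": 1},
                "aggs": {
                    "nested": {
                        "nested": {"path": "data.addresses"},
                        "aggs": {
                            "nested": {
                                "nested": {"path": "data.addresses.services"},
                                "aggs": services_filter,
                            }
                        },
                    }
                },
            }
        },
    }


# For this date format, lexicographic order equals chronological order.
sample_dates = ["2020-09-03 14:39:01", "2020-09-04 05:53:22", "2020-08-31 23:59:59"]
assert max(sample_dates) == "2020-09-04 05:53:22"
```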
I have an object with a nested field:
"parameters": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"values": {
"type": "keyword"
}
}
}
I am trying this aggregation:
GET places/place/_search?size=0
{
"query": {
"match_all": {}
},
"aggs": {
"parameters": {
"nested": {
"path": "parameters"
},
"aggs": {
"parameters_cnt_i": {
"terms": {
"field": "parameters.id",
"size": 100
},
"aggs": {
"parameters_cnt_v": {
"terms": {
"field": "parameters.values",
"size": 100
}
}
}
}
}
}
}
}
But this is not a good solution, because I have to set "size" to an arbitrarily large value.
The docs say:
If you want to retrieve all terms or all combinations of terms in a nested terms aggregation you should use the Composite aggregation
but I can't work out how to use a composite aggregation with a nested object. Is that possible?
My solution:
{
"size": 0,
"aggs" : {
"parameters" : {
"nested" : {
"path" : "parameters"
},
"aggs": {
"group":{
"composite" : {
"size": 100, // your size
"sources" : [
{ "id": { "terms" : { "field": "parameters.id"} }}
]
}
}
}
}
}
}
Try dropping your 3rd "aggs", like this:
{
"aggs": {
"parameters": {
"nested": {
"path": "parameters"
},
"aggs": {
"count_item_one": {
"terms" : {
"field": "parameters.item_one",
"size": 100
}
},
"count_item_two": {
"terms" : {
"field": "parameters.item_two",
"size": 100
}
}
}
}
}
}
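If your second item is nested again, you may need to wrap it in another nested aggregation, as you did with your first "aggs". Note that sibling terms aggregations count each field independently, so they won't give you id/value combinations.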
I have an array of objects (tags) in each document in Elasticsearch 5:
{
"tags": [
{ "key": "tag1", "value": "val1" },
{ "key": "tag2", "value": "val2" },
...
]
}
Now I want to find the unique tag values for a certain tag key, similar to this SQL query:
SELECT DISTINCT(tags.value) FROM tags WHERE tags.key='some-key'
I have come up with this DSL so far:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"filter" : { "terms": { "tags.key": "tag1" } },
"aggs": {
"my_tags_values": {
"terms" : {
"field" : "tags.value",
"size": 9999
}
}
}
}
}
}
}
But it shows this error:
[terms] unknown field [tags.key], parser not found.
Is this the right approach to solve the problem? Thanks for your help.
Note: I have declared the tags field as a nested field in my mapping.
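You've mixed things up there. You probably wanted to add a filter aggregation, but you didn't give it a name. Inside "aggs", Elasticsearch treats "filter" as the aggregation name and "terms" as its type, and then fails on "tags.key" because that is not a terms aggregation parameter. Give the filter aggregation its own name: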
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"my_filter": {
"filter": {
"terms": {
"tags.key": [
"tag1"
]
}
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 9999
}
}
}
}
}
}
}
}
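To get the SELECT DISTINCT result from that query's response, you only need the bucket keys of my_tags_values. A small Python sketch; distinct_tag_values is an illustrative name, and the sample response is made up and trimmed to the relevant keys.

```python
def distinct_tag_values(response: dict) -> list:
    """Equivalent of SELECT DISTINCT(tags.value) ... WHERE tags.key = 'tag1'."""
    buckets = response["aggregations"]["my_tags"]["my_filter"]["my_tags_values"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Made-up response, shaped like nested -> filter -> terms output.
sample = {
    "aggregations": {
        "my_tags": {
            "doc_count": 5,
            "my_filter": {
                "doc_count": 3,
                "my_tags_values": {
                    "buckets": [
                        {"key": "val1", "doc_count": 2},
                        {"key": "val9", "doc_count": 1},
                    ]
                },
            },
        }
    }
}
print(distinct_tag_values(sample))
```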
Alternatively, try a bool query inside the filter aggregation:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"my_filter": {
"filter": {
"bool": {
"must": [
{
"term": {
"tags.key": "tag1"
}
}
]
}
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 9999
}
}
}
}
}
}
}
}
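BTW: setting a terms aggregation's size to 0 to retrieve all buckets only works in Elasticsearch 1.x and 2.x. It was removed in 5.0, where size must be greater than 0, so on Elasticsearch 5 keep an explicit size such as 9999.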
I'm trying to write a query to generate a plot. My index mapping looks like this:
"mappings": {
"mydata": {
"properties": {
"type": { "type": "string", "index": "not_analyzed" },
"stamp": { "type": "date", "format": "date_hour_minute_second_millis" },
"data": { "type": "object" }
}
}
}
Depending on the type, the data field will contain different objects, e.g.,
temperature_data = {
"type": "temperature",
"stamp": "2015-11-01T15:25:19.123",
"data": {"temperature": 23.4, "variance": 0.0}
}
humidity_data = {
"type": "humidity",
"stamp": "2015-11-01T15:26:21.063",
"data": {"humidity": 75.1, "variance": 0.0}
}
I'm trying to bucket the data by type and then run a date histogram to get the stats of each reading (temperature, humidity). My problem is how to set the field in the stats aggregation, since it changes with the type (for example, for "type": "temperature" the field is data.temperature):
query = {
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{"range" : {
"stamp" : {
"gt" : start_stamp,
"lt" : end_stamp
}
}}
]
}
}
}
},
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"field": "data."+field???
}
}
}
}
}
}
}
}
* UPDATE *
As suggested, I added a data-type.groovy file to config/scripts/ containing the following:
return doc['data.temperature'].value
Elasticsearch is able to compile the script:
[2015-11-02 19:50:32,651][INFO ][script] [Atum] compiling script file [/home/user/elasticsearch-1.7.0/config/scripts/data-type.groovy]
I updated the query to load the script file:
query = {
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{"range" : {
"stamp" : {
"gt" : start_stamp,
"lt" : end_stamp
}
}}
]
}
}
}
},
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"script": {"file": "data-type"}
}
}
}
}
}
}
}
}
When I run the query I get the following output:
{u'status': 400, u'error': u'SearchPhaseExecutionException[Failed to execute phase [query], ... Parse Failure [Unexpected token START_OBJECT in [point_stats].]]; }]'}
There's only temperature data in the database. If I replace "script": {"file": "data-type"} with "field": "data.temperature", the query works.
One option is to rename the humidity and temperature fields to the same name, such as value, so you can simply aggregate on data.value. You still know what kind of reading each value is from the type field.
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"field": "data.value"
}
}
}
}
}
}
}
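If you go with renaming, it can be done client-side before indexing. A minimal Python sketch, assuming (as in the example documents above) that each reading is stored under data.<type>; normalize_reading is a hypothetical helper, not part of any Elasticsearch client.

```python
def normalize_reading(doc: dict) -> dict:
    """Return a copy of doc with data.<type> renamed to data.value."""
    reading_type = doc["type"]  # e.g. "temperature" -> data.temperature
    data = dict(doc["data"])  # shallow copy so the input is left untouched
    data["value"] = data.pop(reading_type)
    return {**doc, "data": data}


temperature_data = {
    "type": "temperature",
    "stamp": "2015-11-01T15:25:19.123",
    "data": {"temperature": 23.4, "variance": 0.0},
}
humidity_data = {
    "type": "humidity",
    "stamp": "2015-11-01T15:26:21.063",
    "data": {"humidity": 75.1, "variance": 0.0},
}

for reading in (temperature_data, humidity_data):
    print(normalize_reading(reading))
```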
The second option is to use a script, but that would be slower and would scale less well as you add more types of data (pressure, etc.):
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"script": "return doc.type.value == 'temperature' ? doc['data.temperature'].value : doc['data.humidity'].value"
}
}
}
}
}
}
}
Note that for this second option you need to enable dynamic scripting.
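A third approach, not from the original thread: since the client already knows the set of types, it can generate one filter aggregation per type (instead of a terms aggregation on type) and point each stats aggregation at data.<type>, which is exactly the "data." + field the question was after. This avoids both renaming and scripting, at the cost of updating the type list when you add, say, pressure. A Python sketch; READING_TYPES and per_type_stats_aggs are illustrative names.

```python
READING_TYPES = ["temperature", "humidity"]  # known types; extend as needed


def per_type_stats_aggs(interval: str) -> dict:
    """One filter aggregation per type, each running stats on data.<type>."""
    return {
        reading_type: {
            "filter": {"term": {"type": reading_type}},
            "aggs": {
                "points": {
                    "date_histogram": {"field": "stamp", "interval": interval},
                    "aggs": {
                        "point_stats": {"stats": {"field": "data." + reading_type}}
                    },
                }
            },
        }
        for reading_type in READING_TYPES
    }


# Drop-in replacement for the "aggs" section of the query above.
query = {"size": 0, "aggs": per_type_stats_aggs("1h")}
```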
I have a catalog of products that I want to calculate aggregates on. The trouble comes when I try nested aggregations with a filter that contains both nested and parent fields: I either get wrong counts or 0 hits. Here is a sample of my product mapping:
"Products": {
"properties": {
"ProductID": {
"type": "long"
},
"ProductType": {
"type": "long"
},
"ProductName": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"Prices": {
"type": "nested",
"properties": {
"CurrencyType": {
"type": "integer"
},
"Cost": {
"type": "double"
}
}
}
}
}
Here is an example of the SQL query that I am trying to replicate in Elasticsearch:
SELECT PRODPR.Cost AS PRODPR_Cost
,COUNT(PROD.ProductID) AS PROD_ProductID_Count
FROM Products PROD WITH (NOLOCK)
LEFT OUTER JOIN Prices PRODPR WITH (NOLOCK) ON (PRODPR.objectid = PROD.objectid)
WHERE PRODPR.CurrencyType = 4
AND PROD.ProductType IN (
11273
,11293
,11294
)
GROUP BY PRODPR.Cost
Elasticsearch queries I came up with:
First one (the following query returns correct counts with just CurrencyType as a filter, but when I add the ProductType filter, it gives me wrong counts):
GET /IndexName/Products/_search
{
"aggs": {
"price_agg": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "Prices",
"filter": {
"term": {
"Prices.CurrencyType": "8"
}
}
}
},
{
"terms": {
"ProductType": [
"11273",
"11293",
"11294"
]
}
}
]
}
},
"aggs": {
"price_nested_agg": {
"nested": {
"path": "Prices"
},
"aggs": {
"59316518_group_agg": {
"terms": {
"field": "Prices.Cost",
"size": 0
},
"aggs": {
"product_count": {
"reverse_nested": { },
"aggs": {
"ProductID_count_agg": {
"value_count": {
"field": "ProductID"
}
}
}
}
}
}
}
}
}
}
},
"size": 0
}
Second one (the following query returns correct counts with just CurrencyType as a filter, but when I add the ProductType filter, it gives me 0 hits):
GET /IndexName/Products/_search
{
"aggs": {
"price_agg": {
"nested": {
"path": "Prices"
},
"aggs": {
"currency_filter": {
"filter": {
"bool": {
"must": [
{
"term": {
"Prices.CurrencyType": "4"
}
},
{
"terms": {
"ProductType": [
"11273",
"11293"
]
}
}
]
}
},
"aggs": {
"59316518_group_agg": {
"terms": {
"field": "Prices.Cost",
"size": 0
},
"aggs": {
"product_count": {
"reverse_nested": {},
"aggs": {
"ProductID_count_agg": {
"value_count": {
"field": "ProductID"
}
}
}
}
}
}
}
}
}
}
},
"size": 0
}
I have tried several other queries, but the two above are the closest I have come. Has anyone come across this use case? What am I doing wrong? Any help is appreciated. Thanks!
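One way to structure this, sketched here as it is not from the original thread: ProductType lives on the parent document, so it has to be filtered at the root level, because nested Prices objects cannot see parent fields; that is why filtering on it inside the nested aggregation returns 0 hits. CurrencyType, in contrast, generally has to be filtered inside the nested aggregation. Filtering it at the root only selects products that have some price in that currency, and the nested aggregation then still counts all of their prices. The names cost_histogram_body and by_cost are illustrative, and "size": 0 on the terms aggregation follows the question's Elasticsearch 1.x/2.x setup.

```python
PRODUCT_TYPES = [11273, 11293, 11294]


def cost_histogram_body(currency_type: int = 4) -> dict:
    """Count products per Prices.Cost for one currency and a set of product types."""
    return {
        "size": 0,
        # Parent-level filter: ProductType is a root field, so filter it here.
        "query": {"terms": {"ProductType": PRODUCT_TYPES}},
        "aggs": {
            "prices": {
                "nested": {"path": "Prices"},
                "aggs": {
                    # Nested-level filter: keep only price objects in this currency.
                    "currency_filter": {
                        "filter": {"term": {"Prices.CurrencyType": currency_type}},
                        "aggs": {
                            "by_cost": {
                                # size 0 = all buckets on ES 1.x/2.x only.
                                "terms": {"field": "Prices.Cost", "size": 0},
                                "aggs": {
                                    # Step back to the parent product to count it.
                                    "product_count": {
                                        "reverse_nested": {},
                                        "aggs": {
                                            "ProductID_count_agg": {
                                                "value_count": {"field": "ProductID"}
                                            }
                                        },
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }
```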