ElasticSearch Advanced Aggregations - elasticsearch

I currently have documents indexed with the following structure:
"ProductInteractions": {
"properties": {
"SKU": {
"type": "string"
},
"Name": {
"type": "string"
},
"Sources": {
"properties": {
"Source": {
"type": "string"
},
"Type": {
"type": "string"
}
}
}
}
}
I want to aggregate on the results when searching over this type. Initially I only wanted the terms from the Source field, which was easy: I just used a terms aggregation on the Source field.
Now I would like to aggregate the Type field as well. However, the types are related to the sources. For example, I could have two Sources like this:
{
"Source": "The Store",
"Type": "Purchase"
}
and
{
"Source": "The Store",
"Type": "Return"
}
I want to show the different types and their counts for each different source. In other words, I would want my response to be something like this:
{
"aggs": {
"Sources": [
{
"Key": "The Store",
"DocCount": 2,
"Aggregations": {
"Types": [
{
"Key": "Purchase",
"DocCount": 1
},
{
"Key": "Return",
"DocCount": 1
}
]
}
}
]
}
}
Is there a way to get these sub-aggregations?

Yes, there is, but you need to change your mapping slightly so that the fields are `not_analyzed`:
"ProductInteractions": {
"properties": {
"SKU": {
"type": "string"
},
"Name": {
"type": "string"
},
"Sources": {
"properties": {
"Source": {
"type": "string",
"index": "not_analyzed"
},
"Type": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Then you can use the following aggregation in order to get what you want:
{
"aggs": {
"sources": {
"terms": {
"field": "Sources.Source"
},
"aggs": {
"types": {
"terms": {
"field": "Sources.Type"
}
}
}
}
}
}
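As a rough illustration of how the response to this aggregation could be consumed, here is a minimal Python sketch. It assumes the response follows the usual terms-aggregation shape (a `buckets` list per aggregation name, with the sub-aggregation keyed by its name inside each bucket). The helper name `types_per_source` and the sample response are made up for this example and are not real output.

```python
from typing import Dict


def types_per_source(response: dict) -> Dict[str, Dict[str, int]]:
    """Flatten the nested terms aggregation into {source: {type: doc_count}}."""
    result: Dict[str, Dict[str, int]] = {}
    # "sources" and "types" are the aggregation names used in the request above.
    for source_bucket in response["aggregations"]["sources"]["buckets"]:
        type_buckets = source_bucket["types"]["buckets"]
        result[source_bucket["key"]] = {
            bucket["key"]: bucket["doc_count"] for bucket in type_buckets
        }
    return result


# Hand-written sample in the terms-bucket shape (illustrative only).
sample_response = {
    "aggregations": {
        "sources": {
            "buckets": [
                {
                    "key": "The Store",
                    "doc_count": 2,
                    "types": {
                        "buckets": [
                            {"key": "Purchase", "doc_count": 1},
                            {"key": "Return", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

counts = types_per_source(sample_response)
print(counts)
```

The resulting dictionary is close to the "Sources / Types" layout asked for in the question, just keyed by the bucket values.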

Related

How to do aggregation on nested objects - Elasticsearch

I'm pretty new to Elasticsearch so please bear with me.
This is part of my document in ES.
{
"source": {
"detail": {
"attribute": {
"Size": ["32 Gb",4],
"Type": ["Tools",4],
"Brand": ["Sandisk",4],
"Color": ["Black",4],
"Model": ["Sdcz36-032g-b35",4],
"Manufacturer": ["Sandisk",4]
}
},
"title": {
"list": [
"Sandisk Cruzer 32gb Usb 32 Gb Flash Drive , Black - Sdcz36-032g"
]
}
}
}
So what I want to achieve is to find the best three or top three hits of the attribute object. For example, if I do a search for "sandisk", I want to get three attributes like ["Size", "Color", "Model"] or whatever attributes based on the top hits aggregation.
So I ran a query like this:
{
"size": 0,
"aggs": {
"categoryList": {
"filter": {
"bool": {
"filter": [
{
"term": {
"title.list": "sandisk"
}
}
]
}
},
"aggs": {
"results": {
"terms": {
"field": "detail.attribute",
"size": 3
}
}
}
}
}
}
But it seems to be not working. How do I fix this? Any hints would be much appreciated.
This is the _mappings. It is not the complete one, but I guess this would suffice.
{
"catalog2_0": {
"mappings": {
"product": {
"dynamic": "strict",
"dynamic_templates": [
{
"attributes": {
"path_match": "detail.attribute.*",
"mapping": {
"type": "text"
}
}
}
],
"properties": {
"detail": {
"properties": {
"attMaxScore": {
"type": "scaled_float",
"scaling_factor": 100
},
"attribute": {
"dynamic": "true",
"properties": {
"Brand": {
"type": "text"
},
"Color": {
"type": "text"
},
"MPN": {
"type": "text"
},
"Manufacturer": {
"type": "text"
},
"Model": {
"type": "text"
},
"Operating System": {
"type": "text"
},
"Size": {
"type": "text"
},
"Type": {
"type": "text"
}
}
},
"description": {
"type": "text"
},
"feature": {
"type": "text"
},
"tag": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
},
"title": {
"properties": {
"en": {
"type": "text"
}
}
}
}
}
}
}
}
According to the documentation, you can't aggregate on a field with the text datatype by default; the field must have the keyword datatype.
You also can't aggregate on the detail.attribute field in that way. The detail.attribute field doesn't store any value: it is an object datatype (not a nested one, as you wrote in the question), which means it is a container for other fields such as Size, Brand, etc. So you should aggregate against, for example, the detail.attribute.Size field, provided it has the keyword datatype.
Another probable error is that you are running a term query on a text field (what is the datatype of the title.list field?). The term query is meant for fields with the keyword datatype, while the match query is used to query text fields.
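The two points above (match for text fields, terms aggregation on a keyword field) can be sketched as a small Python helper that builds the request body. This is only a sketch under stated assumptions: the `.keyword` sub-field is hypothetical, since the mapping in the question defines the attributes as plain text fields, and the function name `build_attribute_query` is invented for illustration.

```python
import json


def build_attribute_query(search_text: str, attribute: str, size: int = 3) -> dict:
    """Build a search body: match on the text field, terms agg on a keyword field."""
    # ASSUMPTION: "<attribute>.keyword" is a hypothetical keyword sub-field;
    # the mapping shown in the question does not define it.
    keyword_field = f"detail.attribute.{attribute}.keyword"
    return {
        "size": 0,
        # match analyzes the search text, so it suits a text field.
        "query": {"match": {"title.list": search_text}},
        # terms aggregations need an aggregatable field such as keyword.
        "aggs": {"results": {"terms": {"field": keyword_field, "size": size}}},
    }


body = build_attribute_query("sandisk", "Brand")
print(json.dumps(body, indent=2))
```

Note that the aggregation targets one concrete leaf field (here Brand) rather than the detail.attribute object itself.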
Here is what I have used for a nested aggs query, with the actual value names removed.
The field is a keyword (which, as already mentioned, is required) and is part of a nested JSON object:
"STATUS_ID": {
"type": "keyword",
"index": "not_analyzed",
"doc_values": true
},
Query
GET index name/_search?size=200
{
"aggs": {
"panels": {
"nested": {
"path": "nested path"
},
"aggs": {
"statusCodes": {
"terms": {
"field": "nested path.STATUS.STATUS_ID",
"size": 50
}
}
}
}
}
}
Result
"aggregations": {
"panels": {
"doc_count": 12108963,
"statusCodes": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "O",
"doc_count": 5912218
},
{
"key": "C",
"doc_count": 401586
},
{
"key": "E",
"doc_count": 135628
},
{
"key": "Y",
"doc_count": 3742
},
{
"key": "N",
"doc_count": 1012
},
{
"key": "L",
"doc_count": 719
},
{
"key": "R",
"doc_count": 243
},
{
"key": "H",
"doc_count": 86
}
]
}
}

ElasticSearch query doesn't return documents that have an "empty" nested property

I'm running into a weird problem. I have a document mapping for which one of the properties is a nested object.
{
"userLog": {
"properties": {
"userInfo": {
"properties": {
"userId": {
"type": "text"
},
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
},
"email": {
"type": "text"
}
}
},
"violations": {
"type": "integer"
},
"malfunctions": {
"type": "integer"
},
"extensionsUsed": {
"type": "integer"
},
"date": {
"type": "date",
"format": "yyyy-MM-dd||yyyy/MM/dd||yyyyMMdd||epoch_millis"
},
"events": {
"type": "nested",
"properties": {
"editorId": {
"type": "text"
},
"editorRole": {
"type": "text"
},
"editedTimestamp": {
"type": "date",
"format": "epoch_millis"
},
"createdTimestamp": {
"type": "date",
"format": "epoch_millis"
},
"userId": {
"type": "text"
},
"timestamp": {
"type": "date",
"format": "epoch_millis"
},
"eventType": {
"type": "text"
}
}
}
}
}
}
Some userLogs have events and some don't. My queries only return userLogs that have events, however, and I'm not sure why. There are definitely userLogs that exist without events in the index. I can see them in Kibana. They just aren't returned in the search. Here's what I'm running for a query:
GET index_name/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "20170913",
"format": "yyyyMMdd"
}
}
}
],
"should": [
{
"match_phrase": {
"userInfo.userId": "Xvo9qblajOVaM3bQQMaV4GKk7S42"
}
}
],
"minimum_number_should_match": 1
}
}
}
Based on this discussion, I modified my query to be the following:
GET one20_eld_portal/_search
{
"query": {
"bool": {
"must_not": [
{
"nested": {
"path": "events",
"query": {
"bool": {
"filter": {
"exists": {
"field": "events.userId"
}
}
}
}
}
}
],
"should": [
{
"match_phrase": {
"userInfo.uid": "Xvo9qblajOVaM3bQQMaV4GKk7S42"
}
}
],
"minimum_should_match": 1
}
}
}
but this doesn't return any results. Any help is greatly appreciated!
It turns out the "empty" logs weren't being returned because the userId wasn't being set properly on them.

Range value search in elastic search, not returning results

I have a problem.
I have tried to make plenty of queries and none have returned any documents.
My data format is something like:
{
"_index": "orders",
"_type": "order",
"_id": "AVad66hjiOD-asNwVILB",
"_score": 1,
"_source": {
"document": {
"orderID": "1337",
"sku": "awesomeSku",
"customerID": "7331",
"productID": "20490859",
"variantID": "97920239",
"createTime": "2016-07-13T13:23:19Z",
"retailPrice": "10000",
"costPrice": "10000",
"new": 123
}
}
}
My query:
{
"query": {
"bool": {
"filter": [
{ "range": { "new": { "gte": "20" } } }
]
}
}
}
I just want to start with something simple and find all documents that have the attribute "new" with a value above 20.
Any feedback would be awesome.
Edit:
Data format in ES:
{
"orders": {
"mappings": {
"order": {
"properties": {
"document": {
"properties": {
"costPrice": {
"type": "string"
},
"createTime": {
"type": "string"
},
"customerID": {
"type": "string"
},
"new": {
"type": "long"
},
"orderID": {
"type": "string"
},
"productID": {
"type": "string"
},
"retailPrice": {
"type": "string"
},
"sku": {
"type": "string"
},
"variantID": {
"type": "string"
}
}
}
}
}
}
}
}
You need to run the query against the document.new field, since all your fields sit inside the top-level document object:
{
"query": {
"bool": {
"filter": [
{
"range": {
"document.new": {
"gte": 20
}
}
}
]
}
}
}
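To make the point about full field paths concrete, here is a small Python sketch that lists the dotted paths of every leaf in a `_source` object and builds the range filter from one of them. The helper names `leaf_paths` and `range_filter` are invented for this example, and the sample source is a trimmed copy of the document in the question.

```python
from typing import Iterator


def leaf_paths(source: dict, prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf value in a _source object."""
    for key, value in source.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Inner objects contribute their key as a path segment.
            yield from leaf_paths(value, path)
        else:
            yield path


def range_filter(field: str, gte: float) -> dict:
    """Build a bool/filter query with a single range clause on the given path."""
    return {"query": {"bool": {"filter": [{"range": {field: {"gte": gte}}}]}}}


# Trimmed copy of the _source from the question.
source = {"document": {"orderID": "1337", "sku": "awesomeSku", "new": 123}}

paths = list(leaf_paths(source))
print(paths)

query = range_filter("document.new", 20)
print(query)
```

The listing shows that the field is addressable as document.new, not new, which is why the original filter matched nothing.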

Unable to drop result bucket in terms aggregation - Elasticsearch

I have documents in Elasticsearch with the following structure:
"mappings": {
"document": {
"properties": {
"@timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"@version": {
"type": "string"
},
"id_secuencia": {
"type": "long"
},
"event": {
"properties": {
"elapsedTime": {
"type": "double"
},
"requestTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"error": {
"properties": {
"errorCode": {
"type": "string",
"index": "not_analyzed"
},
"failureDetail": {
"type": "string"
},
"fault": {
"type": "string"
}
}
},
"file": {
"type": "string",
"index": "not_analyzed"
},
"messageId": {
"type": "string"
},
"request": {
"properties": {
"body": {
"type": "string"
},
"header": {
"type": "string"
}
}
},
"responseTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"service": {
"properties": {
"operation": {
"type": "string",
"index": "not_analyzed"
},
"project": {
"type": "string",
"index": "not_analyzed"
},
"proxy": {
"type": "string",
"index": "not_analyzed"
},
"version": {
"type": "string",
"index": "not_analyzed"
}
}
},
"timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"user": {
"type": "string",
"index": "not_analyzed"
}
}
},
"type": {
"type": "string"
}
}
}
}
I need to retrieve a list of unique values for the field "event.file" (to show in a Kibana Data Table) according to the following criteria:
There is more than one document with the same value for the field "event.file".
All the occurrences of that value of "event.file" resulted in an error (the field "event.error.errorCode" exists in all of those documents).
The approach I've been testing is a terms aggregation, which gives me a list of buckets, each holding all documents for a single file name. What I haven't been able to do is drop some of the resulting buckets according to the criteria above (if at least one document in a bucket does not have an error, the bucket should be discarded).
Is this the correct approach or is there a better/easier way to get this type of result?
Thanks a lot.
After trying out several queries I found the following approach (see query below) to be valid for my purpose. The problem I see now is that apparently it is not possible to do this in Kibana, as it has no support for pipeline aggregations (see https://github.com/elastic/kibana/issues/4584).
{
"query": {
"bool": {
"must": [
{
"filtered": {
"filter": {
"exists": {
"field": "event.file"
}
}
}
}
]
}
},
"size": 0,
"aggs": {
"file-events": {
"terms": {
"field": "event.file",
"size": 0,
"min_doc_count": 2
},
"aggs": {
"files": {
"filter": {
"exists": {
"field": "event.file"
}
},
"aggs": {
"totalFiles": {
"value_count": {
"field": "event.file"
}
}
}
},
"errors": {
"filter": {
"exists": {
"field": "event.error.errorCode"
}
},
"aggs": {
"totalErrors": {
"value_count": {
"field": "event.error.errorCode"
}
}
}
},
"exhausted": {
"bucket_selector": {
"buckets_path": {
"total_files":"files>totalFiles",
"total_errors":"errors>totalErrors"
},
"script": "total_errors == total_files"
}
}
}
}
}
}
Again, if I'm missing something feedback will be appreciated :)
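For readers who want to check the selection logic outside Elasticsearch, the following Python sketch reproduces it client-side: count documents per file, count those with an error code, and keep only files with at least two documents where both counts match. It is an approximation of what the terms, filter and bucket_selector aggregations do together, not how Elasticsearch implements them; the function name and the sample events are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def files_with_only_errors(events: Iterable[dict], min_doc_count: int = 2) -> List[str]:
    """Return file names seen at least min_doc_count times where every event has an errorCode."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for event in events:
        file_name = event.get("file")
        if file_name is None:
            continue  # mirrors the exists filter on event.file
        totals[file_name] += 1
        if (event.get("error") or {}).get("errorCode") is not None:
            errors[file_name] += 1
    # Same test as the bucket_selector script: total_errors == total_files.
    return sorted(
        name
        for name, total in totals.items()
        if total >= min_doc_count and errors[name] == total
    )


# Hypothetical events, shaped like the "event" object in the mapping.
events = [
    {"file": "a.xml", "error": {"errorCode": "E1"}},
    {"file": "a.xml", "error": {"errorCode": "E2"}},
    {"file": "b.xml", "error": {"errorCode": "E1"}},
    {"file": "b.xml"},
    {"file": "c.xml", "error": {"errorCode": "E3"}},
]

selected = files_with_only_errors(events)
print(selected)
```

In this sample only a.xml qualifies: b.xml has one event without an error, and c.xml appears only once.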

function_score query in elasticsearch won't change score

I have an index with following doc structure: Company > Jobs (nested)
Companies have a name and jobs have an address. By default I search jobs by address. On top of that, I'm trying to boost certain companies by name using a function_score query, but my query doesn't seem to boost anything or change any scores.
{
"query": {
"filtered": {
"filter": {},
"query": {
"function_score": {
"query": {
"nested": {
"path": "active_jobs",
"score_mode": "max",
"query": {
"multi_match": {
"query": "United States",
"type": "cross_fields",
"fields": [
"active_jobs.address.city",
"active_jobs.address.country",
"active_jobs.address.state"
]
}
},
"inner_hits": {
"size": 1000
}
}
},
"functions": [
{
"filter": {
"term": {
"name": "Amazon"
}
},
"weight": 100
}
]
}
}
}
},
"size": 30,
"from": 0
}
[Update 1]
Here is the mapping for active_jobs property:
"active_jobs": {
"type": "nested",
"properties": {
"active": {
"type": "boolean"
},
"address": {
"properties": {
"city": {
"type": "string"
},
"country": {
"type": "string"
},
"state": {
"type": "string"
},
"state_code": {
"type": "string"
}
}
},
"id": {
"type": "long"
},
"title": {
"type": "string"
},
"updated_at": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
