How to do aggregation on nested objects - Elasticsearch

I'm pretty new to Elasticsearch so please bear with me.
This is part of my document in ES.
{
"source": {
"detail": {
"attribute": {
"Size": ["32 Gb",4],
"Type": ["Tools",4],
"Brand": ["Sandisk",4],
"Color": ["Black",4],
"Model": ["Sdcz36-032g-b35",4],
"Manufacturer": ["Sandisk",4]
}
},
"title": {
"list": [
"Sandisk Cruzer 32gb Usb 32 Gb Flash Drive , Black - Sdcz36-032g"
]
}
}
}
So what I want to achieve is to find the best three or top three hits of the attribute object. For example, if I do a search for "sandisk", I want to get three attributes like ["Size", "Color", "Model"] or whatever attributes based on the top hits aggregation.
So I ran a query like this:
{
"size": 0,
"aggs": {
"categoryList": {
"filter": {
"bool": {
"filter": [
{
"term": {
"title.list": "sandisk"
}
}
]
}
},
"aggs": {
"results": {
"terms": {
"field": "detail.attribute",
"size": 3
}
}
}
}
}
}
But it does not seem to work. How do I fix this? Any hints would be much appreciated.
This is the _mappings. It is not the complete one, but I guess this would suffice.
{
"catalog2_0": {
"mappings": {
"product": {
"dynamic": "strict",
"dynamic_templates": [
{
"attributes": {
"path_match": "detail.attribute.*",
"mapping": {
"type": "text"
}
}
}
],
"properties": {
"detail": {
"properties": {
"attMaxScore": {
"type": "scaled_float",
"scaling_factor": 100
},
"attribute": {
"dynamic": "true",
"properties": {
"Brand": {
"type": "text"
},
"Color": {
"type": "text"
},
"MPN": {
"type": "text"
},
"Manufacturer": {
"type": "text"
},
"Model": {
"type": "text"
},
"Operating System": {
"type": "text"
},
"Size": {
"type": "text"
},
"Type": {
"type": "text"
}
}
},
"description": {
"type": "text"
},
"feature": {
"type": "text"
},
"tag": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
},
"title": {
"properties": {
"en": {
"type": "text"
}
}
}
}
}
}
}
}

According to the documentation, by default you can't run a terms aggregation on a field that has the text datatype. The field must have the keyword datatype.
Also, you can't aggregate on the detail.attribute field that way. The detail.attribute field doesn't store any value: it is an object datatype (not a nested one, as the question title suggests), which means it is only a container for other fields such as Size, Brand, etc. So you should aggregate against a leaf field such as detail.attribute.Size, provided that field has the keyword datatype.
Another likely error is that you are running a term query against a text field (what is the datatype of the title.list field?). A term query does not analyze its input, so it is meant for keyword fields, while a match query is the one to use against text fields.
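A minimal sketch of the two fixes described above, written as a Python dict so the request body can be checked locally before sending it. It assumes that title.list is a text field and that detail.attribute.Brand has been given a keyword sub-field. The sub-field name "raw" is hypothetical; the mapping in the question defines only text fields, so it would have to be added first.

```python
import json

# Assumption: "detail.attribute.Brand.raw" is a hypothetical keyword sub-field.
# The mapping shown in the question does not define it.
AGG_FIELD = "detail.attribute.Brand.raw"

search_body = {
    "size": 0,
    "aggs": {
        "categoryList": {
            # match instead of term, because title.list is assumed to be a text field
            "filter": {"match": {"title.list": "sandisk"}},
            "aggs": {
                # terms aggregation on a concrete leaf field, not on the object container
                "results": {"terms": {"field": AGG_FIELD, "size": 3}}
            },
        }
    },
}

# Print the JSON body that would be sent to the _search endpoint
print(json.dumps(search_body, indent=2))
```

This returns the top three Brand values among the matching documents, not the names of the attributes themselves.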

Here is what I have used for a nested aggs query, minus the actual value names.
The actual field is a keyword, which as already mentioned is required, that is part of a nested JSON object:
"STATUS_ID": {
"type": "keyword",
"index": "not_analyzed",
"doc_values": true
},
Query
GET index name/_search?size=200
{
"aggs": {
"panels": {
"nested": {
"path": "nested path"
},
"aggs": {
"statusCodes": {
"terms": {
"field": "nested path.STATUS.STATUS_ID",
"size": 50
}
}
}
}
}
}
Result
"aggregations": {
"panels": {
"doc_count": 12108963,
"statusCodes": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "O",
"doc_count": 5912218
},
{
"key": "C",
"doc_count": 401586
},
{
"key": "E",
"doc_count": 135628
},
{
"key": "Y",
"doc_count": 3742
},
{
"key": "N",
"doc_count": 1012
},
{
"key": "L",
"doc_count": 719
},
{
"key": "R",
"doc_count": 243
},
{
"key": "H",
"doc_count": 86
}
]
}
}

Related

Elasticsearch update a new mapping on index with default values

I am updating my index with new properties in Elasticsearch and trying to add default values for the newly created properties. I tried the approach below, but the update query fails with the following error message: 'failed to create query: [nested] nested object under path [summaryTableColumns] is not of nested type'
New Mapping
{
"properties": {
"subjectPropertyFields": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
"summaryTableColumns": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
}
}
}
Update query using Painless that I have tried
{
"script": {
"source": "ctx._source.summaryTableColumns= params.summaryTableColumns",
"lang": "painless",
"params": {
"summaryTableColumns": [
{
"key": "Property Name",
"value": "name"
},
{
"key": "City",
"value": "city"
},
{
"key": "Distance",
"value": "propertyAddress"
},
{
"key": "Units",
"value": "units"
},
{
"key": "Built",
"value": "yearBuilt"
},
{
"key": "Occupancy",
"value": "occupancyAsOfDate"
},
{
"key": "Avg SF",
"value": "avgSf"
},
{
"key": "Avg Rent",
"value": "avgMarketRentSf"
},
{
"key": "Avg Rent/SF",
"value": "avgRentSf"
},
{
"key": "NA",
"value": "NA"
}
]
}
},
"query": {
"bool": {
"must_not": [
{
"nested": {
"path": "summaryTableColumns",
"query": {
"bool": {
"filter": {
"exists": {
"field": "summaryTableColumns"
}
}
}
}
}
}
],
"should": {
"bool": {
"must_not": {
"match": {
"templateName": "salesComps"
}
}
}
}
}
}
}
The error I am facing:
'failed to create query: [nested] nested object under path [summaryTableColumns] is not of nested type'

how do I implement a single-word auto complete using Elasticsearch 6

I would like to implement a single-word autocomplete using Elasticsearch 6. I have seen a fair number of posts on how to do this with earlier versions; however, it seems that autocomplete has changed significantly in the latest version.
I am using the standard mapping for autocomplete:
PUT advertising_tins
{
"settings": {
"analysis": {
"analyzer": {
"completion_analyzer": {
"type": "custom",
"filter": [
"lowercase",
"completion_filter"
],
"tokenizer": "keyword"
}
},
"filter": {
"completion_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 24
}
}
}
},
"mappings": {
"item": {
"properties": {
"date": {
"type": "long"
},
"id": {
"type": "text"
},
"title": {
"type": "text"
},
"suggest": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
},
"completion": {
"type": "text",
"analyzer": "completion_analyzer",
"search_analyzer": "standard"
}
}
}
}
}
}
}
I am indexing like this:
POST advertising_tins/item/_bulk
{"index":{}}
{"date": 20180217, "title": "Vintage Spice Cardboard Tin of Mace Dainty Brand St. Paul, MN 1 oz.","id": "305232814","suggest": [ "spice","cardboard","tin","mace","dainty","brand","st","paul","mn","oz"]}
And querying like this:
POST advertising_tins/_search?pretty
{
"size": 0,
"query": {
"term": {
"suggest.completion": "car"
}
},
"aggs": {
"suggestions": {
"terms": {
"field": "suggest.raw"
}
}
}
}
However, my results return all terms in the suggest field instead of just the single term "cardboard".
{
"took": 4,
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"suggestions": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "brand",
"doc_count": 1
},
{
"key": "cardboard",
"doc_count": 1
},
{
"key": "dainty",
"doc_count": 1
},
{
"key": "mace",
"doc_count": 1
},
{
"key": "mn",
"doc_count": 1
},
{
"key": "oz",
"doc_count": 1
},
{
"key": "paul",
"doc_count": 1
},
{
"key": "spice",
"doc_count": 1
},
{
"key": "st",
"doc_count": 1
},
{
"key": "tin",
"doc_count": 1
}
]
}
}
}
Any idea how I can fix this and get just a single term match?
You are almost there. It can be achieved with the default Completion Suggester, you only need to change the type of your completion field to "completion":
"mappings": {
"item": {
"properties": {
"date": {
"type": "long"
},
"id": {
"type": "text"
},
"title": {
"type": "text"
},
"suggest": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
},
"completion": {
"type" : "completion", <--- here
"analyzer": "completion_analyzer",
"search_analyzer": "standard"
}
}
}
}
}
}
And add a "suggest" part into your query:
POST advertising_tins/_search
{
"size": 0,
"query": {
"term": {
"suggest.completion": "car"
}
},
"suggest" : { <--- Here goes the suggest query
"my-suggestion" : {
"text" : "car",
"completion" : {
"field" : "suggest.completion"
}
}
},
"aggs": {
"suggestions": {
"terms": {
"field": "suggest.raw"
}
}
}
}
The response will look like this:
{
// ...
"hits": //... ,
"aggregations": // ...,
"suggest": {
"my-suggestion": [
{
"text": "car",
"offset": 0,
"length": 3,
"options": [
{
"text": "cardboard", <--- here is the suggestion
"_index": "advertising_tins",
"_type": "item",
"_id": "GLeUqGEBVrFe7u7pR5uA",
"_score": 1,
"_source": {
"date": 20180217,
"title": "Vintage Spice Cardboard Tin of Mace Dainty Brand St. Paul, MN 1 oz.",
"id": "305232814",
"suggest": [
"spice",
"cardboard",
"tin",
"mace",
"dainty",
"brand",
"st",
"paul",
"mn",
"oz"
]
}
}
]
}
]
}
}
The response also includes the _source of the suggested document, so you might not even need to use "query" and "aggs" parts.
Hope that helps!
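As a rough sketch of the last point, the request can be reduced to the "suggest" part alone, and the suggested strings can be pulled out of a response shaped like the one above. The helper name suggested_texts and the trimmed sample_response are illustrative only, not part of any Elasticsearch client.

```python
import json

# Suggest-only request body; the field name comes from the mapping above.
suggest_body = {
    "suggest": {
        "my-suggestion": {
            "text": "car",
            "completion": {"field": "suggest.completion"},
        }
    }
}


def suggested_texts(response, name="my-suggestion"):
    """Collect the suggested strings from a search response (hypothetical helper)."""
    return [
        option["text"]
        for entry in response.get("suggest", {}).get(name, [])
        for option in entry.get("options", [])
    ]


# Trimmed copy of the response shown above, used only to exercise the helper
sample_response = {
    "suggest": {
        "my-suggestion": [
            {
                "text": "car",
                "offset": 0,
                "length": 3,
                "options": [{"text": "cardboard", "_score": 1}],
            }
        ]
    }
}

print(json.dumps(suggest_body, indent=2))
print(suggested_texts(sample_response))  # ['cardboard']
```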

ElasticSearch - Filter results by inner hits

We have a simple index of entities with mapping:
PUT resource/_mapping/entity
{
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"claims": {
"type": "nested",
"properties": {
"claimid": {
"type": "keyword"
},
"priority": {
"type": "short"
},
"visibility": {
"type": "keyword"
}
}
}
}
}
Here's a sample document in the index:
POST resource/entity/
{
"id": "2",
"name": "e2",
"claims": [
{
"claimid": "c1",
"priority": "2",
"visibility": "M",
"reqid" : "2"
},
{
"claimid": "c2",
"priority": "1",
"visibility": "V",
"reqid" : "2"
},
{
"claimid": "c5",
"priority": "3",
"visibility": "H",
"reqid" : "2"
}
]
}
And a query to filter documents by provided set of 'claims.claimid', then to sort by 'claims.priority', select the one with highest priority and return only the 'claims.visibility' e.g.:
GET resource/entity/_search/
{
"query": {
"nested": {
"path": "claims",
"query": {
"bool": {
"must": [
{
"terms": {
"claims.claimid": [
"c1",
"c4",
"c5"
]
}
}
]
}
},
"inner_hits": {
"sort": [
{
"claims.priority": "asc"
}
],
"size":1,
"_source":{"includes":["claims.visibility"]}
}
}
}
}
And finally the problem to be solved: how can the query be modified to filter out documents whose highest-priority inner hit has visibility "H"? Or what other query will return a set of documents with the visibility of the highest-priority claim, filtered by the provided claim ids, but only those where that visibility is not "H"?
The catch is that we have to sort the claims across all visibility types first, and only then filter out the documents whose resulting visibility is "H", over the complete list of results.

ElasticSearch Advanced Aggregations

I currently have documents indexed with the following structure:
"ProductInteractions": {
"properties": {
"SKU": {
"type": "string"
},
"Name": {
"type": "string"
},
"Sources": {
"properties": {
"Source": {
"type": "string"
},
"Type": {
"type": "string"
},
}
}
}
}
I want to aggregate on results when searching over this type. I initially just wanted the terms from the Source field, which was easy. I just used a terms aggregations for the Source field.
Now I would like to aggregate the Type field as well. However, the types are related to the sources. For example, I could have two Sources like this:
{
"Source": "The Store",
"Type": "Purchase"
}
and
{
"Source": "The Store",
"Type": "Return"
}
I want to show the different types and their counts for each different source. In other words, I would want my response to be something like this:
{
"aggs": {
"Sources": [
{
"Key": "The Store",
"DocCount": 2,
"Aggregations": {
"Types": [
{
"Key": "Purchase",
"DocCount": 1
},
{
"Key": "Return",
"DocCount": 1
}
]
}
}
]
}
}
Is there a way to get these sub-aggregations?
Yes, there is, but you need to slightly change your mapping to make your fields `not_analyzed`:
"ProductInteractions": {
"properties": {
"SKU": {
"type": "string"
},
"Name": {
"type": "string"
},
"Sources": {
"properties": {
"Source": {
"type": "string",
"index": "not_analyzed"
},
"Type": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Then you can use the following aggregation in order to get what you want:
{
"aggs": {
"sources": {
"terms": {
"field": "Sources.Source"
},
"aggs": {
"types": {
"terms": {
"field": "Sources.Type"
}
}
}
}
}
}
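For readers on Elasticsearch 5.x or later, where the string type with "index": "not_analyzed" was replaced by the keyword type, the same idea might look like the sketch below. It is written as Python dicts for easy checking; the "size": 0 entry is an optional addition that suppresses the hits and is not part of the answer above.

```python
import json

# Assumption: Elasticsearch 5.x or later, where "keyword" replaces not_analyzed strings.
mapping_properties = {
    "SKU": {"type": "text"},
    "Name": {"type": "text"},
    "Sources": {
        "properties": {
            "Source": {"type": "keyword"},
            "Type": {"type": "keyword"},
        }
    },
}

# Same sub-aggregation as above: one bucket per source, then one bucket per type inside it
aggs_body = {
    "size": 0,  # optional: return only the aggregation results
    "aggs": {
        "sources": {
            "terms": {"field": "Sources.Source"},
            "aggs": {"types": {"terms": {"field": "Sources.Type"}}},
        }
    },
}

print(json.dumps({"properties": mapping_properties}, indent=2))
print(json.dumps(aggs_body, indent=2))
```

If a single document holds several Sources entries, the object mapping flattens them, so a type can be counted under a source it was not paired with; a nested mapping combined with a nested aggregation would avoid that.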

Elastic Search - OR filter with boolean and ids

I'm trying to search through items, where some of them might be private.
If an item is private, only friends of the item's owner (the array item.friends) may see the item.
If it's not private, everyone can see it.
So my logic is:
If the item is not private (is_private=0) OR the user id (4 in my example) is in the array item.friends, the user can see the item.
Still, I get no results. Every item is currently set to is_private=1, so I guess something is wrong with my ids filter.
Any suggestions?
// ---- Mapping
{
"item": {
"properties": {
"name": {
"type": "string"
},
"description": {
"type": "string"
},
"created": {
"type": "date"
},
"location": {
"properties": {
"location": {
"type": "geo_point"
}
}
},
"is_proaccount": {
"type": "integer"
},
"is_given_away": {
"type": "integer"
},
"is_private": {
"type": "integer"
},
"friends": {
"type": "integer",
"index_name": "friend"
}
}
}
}
// ----- Example insert
{
"name": "Test",
"description": "Test",
"created": "2012-02-20T12:21:30",
"location": {
"location": {
"lat": "59.919914",
"lon": "10.753414"
}
},
"is_proaccount": "0",
"is_given_away": "0",
"is_private": 1,
"friends": [
1,
2,
3,
4,
5,
6,
7,
8,
9,
10
]
}
// ----- Query
{
"from": 0,
"size": 30,
"filter": {
"or": [
{
"bool": {
"must": [
{
"term": {
"is_private": 0
}
}
]
}
},
{
"ids": {
"values": [
4
],
"type": "friends"
}
}
]
},
"query": {
"match_all": {}
}
}
The "ids" filter probably does not mean what you think it means: it filters on the document ID (and, optionally, on the document type.)
See http://www.elasticsearch.org/guide/reference/query-dsl/ids-filter.html
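A minimal sketch of what the intended filter might look like once the "ids" clause is replaced by a term clause on the friends field. It assumes a recent Elasticsearch version, where the top-level "filter" and the "or" filter from the question are expressed as a bool query with "should" clauses, and it assumes the field is addressable as friends. The can_see function is only a local restatement of the rule for checking the logic and is not an Elasticsearch API.

```python
import json

USER_ID = 4  # the viewer's user id, as in the question

# Assumption: bool/should replaces the old "or" filter; "friends" is the field name to query.
search_body = {
    "from": 0,
    "size": 30,
    "query": {
        "bool": {
            "filter": {
                "bool": {
                    "should": [
                        {"term": {"is_private": 0}},     # public items
                        {"term": {"friends": USER_ID}},  # viewer is listed in friends
                    ],
                    "minimum_should_match": 1,
                }
            }
        }
    },
}


def can_see(item, user_id):
    """Local restatement of the rule: public item, or viewer listed in friends."""
    return item.get("is_private") == 0 or user_id in item.get("friends", [])


print(json.dumps(search_body, indent=2))
print(can_see({"is_private": 1, "friends": [1, 2, 3, 4]}, USER_ID))  # True
```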
