Faceted search in webshop with Elastic

I've seen a few examples of faceted search in Elastic, but all of them assume you know in advance which fields you want to create buckets on.
How should I go about it when I have a webshop with multiple categories, where the properties of the articles differ per category?
Is there a way to discover what properties your documents have when you run a query (e.g. after filtering by category)?
I have this query right now:
{
  "from": 0, "size": 10,
  "query": {
    "bool": {
      "must": [
        { "terms": { "color": ["red", "green", "purple"] } },
        { "terms": { "make": ["honda", "toyota", "bmw"] } }
      ]
    }
  },
  "aggregations": {
    "all_cars": {
      "global": {},
      "aggs": {
        "colors": {
          "filter": { "terms": { "make": ["honda", "toyota", "bmw"] } },
          "aggregations": {
            "filtered_colors": { "terms": { "field": "color.keyword" } }
          }
        },
        "makes": {
          "filter": { "terms": { "color": ["red", "green"] } },
          "aggregations": {
            "filtered_makes": { "terms": { "field": "make.keyword" } }
          }
        }
      }
    }
  }
}
How can I know which fields I can aggregate on? Is there a way to describe the properties of a document after running a query, so that I know the possible fields to aggregate on?

Right now I am storing all properties of my article in an array, and I can quickly aggregate on them like this:
{
  "size": 0,
  "aggregations": {
    "array_aggregation": {
      "terms": {
        "field": "properties.keyword",
        "size": 10
      }
    }
  }
}
This is a step in the right direction but that way I don't know what the type of a property is.
Here's a sample object:
{
  "price": 10000,
  "color": "red",
  "make": "honda",
  "sold": "2014-10-28",
  "properties": [
    "price",
    "color",
    "make",
    "sold"
  ]
}
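One way to avoid the type problem altogether is to derive the facetable fields from the index mapping (GET <index>/_mapping) rather than from a hand-maintained properties array. A minimal Python sketch, where the mapping dict stands in for the properties section of a real mapping response and the field names are taken from the sample object above:

```python
# Sketch: build one aggregation per field, choosing the aggregation type
# from the field's mapping type. The mapping dict below is hypothetical;
# a real one would come from the GET <index>/_mapping response.
def build_aggs(mapping_properties):
    """Return an "aggs" body with one entry per facetable field."""
    aggs = {}
    for field, spec in mapping_properties.items():
        ftype = spec.get("type")
        if ftype == "text":
            # text fields usually carry a .keyword sub-field for terms aggs
            aggs[field] = {"terms": {"field": field + ".keyword"}}
        elif ftype == "keyword":
            aggs[field] = {"terms": {"field": field}}
        elif ftype in ("integer", "long", "float", "double"):
            aggs[field] = {"stats": {"field": field}}
        elif ftype == "date":
            aggs[field] = {"date_histogram": {"field": field,
                                              "calendar_interval": "month"}}
    return aggs

mapping = {
    "price": {"type": "integer"},
    "color": {"type": "text"},
    "make": {"type": "keyword"},
    "sold": {"type": "date"},
}
aggs = build_aggs(mapping)
```

This way the type of each property (terms facet vs. numeric range vs. date histogram) falls out of the mapping instead of having to be stored per document.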

Could you use the filter aggregation, which filters first and then runs a terms aggregation inside it?

Related

Elasticsearch: How to do 'group by' with Painless in scripted fields?

I would like to do something like the following using Painless:
select day, sum(price)/sum(quantity) as ratio
from data
group by day
Is it possible?
I want to do this in order to visualize the ratio field in Kibana, since Kibana itself can't divide aggregated values, but I would gladly hear about alternative solutions beyond scripted fields.
Yes, it's possible. You can achieve this with the bucket_script pipeline aggregation:
{
  "aggs": {
    "days": {
      "date_histogram": {
        "field": "dateField",
        "interval": "day"
      },
      "aggs": {
        "price": {
          "sum": {
            "field": "price"
          }
        },
        "quantity": {
          "sum": {
            "field": "quantity"
          }
        },
        "ratio": {
          "bucket_script": {
            "buckets_path": {
              "sumPrice": "price",
              "sumQuantity": "quantity"
            },
            "script": "params.sumPrice / params.sumQuantity"
          }
        }
      }
    }
  }
}
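The per-bucket arithmetic the bucket_script performs can be sketched in plain Python (the sample documents below are hypothetical, with values chosen so one day sums to price 1000 and quantity 12):

```python
# Plain-Python sketch of what the date_histogram + bucket_script pipeline
# computes: per day, sum price and quantity, then divide the two sums.
from collections import defaultdict

docs = [  # hypothetical sample documents
    {"day": "2020-02-01", "price": 600.0, "quantity": 8},
    {"day": "2020-02-01", "price": 400.0, "quantity": 4},
    {"day": "2020-02-02", "price": 300.0, "quantity": 3},
]

buckets = defaultdict(lambda: {"price": 0.0, "quantity": 0})
for d in docs:
    buckets[d["day"]]["price"] += d["price"]
    buckets[d["day"]]["quantity"] += d["quantity"]

# the bucket_script step: ratio = sumPrice / sumQuantity, per bucket
ratios = {day: b["price"] / b["quantity"] for day, b in buckets.items()}
```

The key point is that the division happens per date-histogram bucket, on the already-summed values, not per document.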
UPDATE:
You can use the above query through the Transform API, which will create an aggregated index out of the source index.
For instance, I've indexed a few documents in a test index; we can then dry-run the above aggregation query to see what the target aggregated index would look like:
POST _transform/_preview
{
  "source": {
    "index": "test2",
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "transtest"
  },
  "pivot": {
    "group_by": {
      "days": {
        "date_histogram": {
          "field": "#timestamp",
          "calendar_interval": "day"
        }
      }
    },
    "aggregations": {
      "price": {
        "sum": {
          "field": "price"
        }
      },
      "quantity": {
        "sum": {
          "field": "quantity"
        }
      },
      "ratio": {
        "bucket_script": {
          "buckets_path": {
            "sumPrice": "price",
            "sumQuantity": "quantity"
          },
          "script": "params.sumPrice / params.sumQuantity"
        }
      }
    }
  }
}
The response looks like this:
{
  "preview" : [
    {
      "quantity" : 12.0,
      "price" : 1000.0,
      "days" : 1580515200000,
      "ratio" : 83.33333333333333
    }
  ],
  "mappings" : {
    "properties" : {
      "quantity" : {
        "type" : "double"
      },
      "price" : {
        "type" : "double"
      },
      "days" : {
        "type" : "date"
      }
    }
  }
}
What you see in the preview array are the documents that will be indexed into the transtest target index, which you can then visualize in Kibana like any other index.
So what a transform actually does is run the aggregation query I gave you above and store each resulting bucket as a document in another index, ready to be used.
I found a solution to get the ratio of sums with a TSVB visualization in Kibana.
First, create two sum aggregations: one that sums price and another that sums quantity. Then choose the 'Bucket Script' aggregation to divide the aforementioned sums, using a Painless script.
The only drawback I found is that you cannot aggregate on multiple columns.

Sort multi-bucket aggregation by source fields inside inner multi-bucket aggregation

TL;DR: Using an inner multi-bucket aggregation (top_hits with size: 1) inside an outer multi-bucket aggregation, is it possible to sort the buckets of the outer aggregation by the data in the inner buckets?
I have the following index mapping:
{
  "parent": {
    "properties": {
      "children": {
        "type": "nested",
        "properties": {
          "child_id": { "type": "keyword" }
        }
      }
    }
  }
}
and each child (in the data) also has the properties last_modified: Date and other_property: String.
I need to fetch a list of children (across all parents, but without the parents), keeping only the one with the latest last_modified per child_id. Then I need to sort and paginate those results to return manageable amounts of data.
I'm able to get the data and paginate over it with a combination of nested, terms, top_hits, and bucket_sort aggregations (and also get the total count with cardinality)
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "children": {
      "nested": {
        "path": "children"
      },
      "aggs": {
        "totalCount": {
          "cardinality": {
            "field": "children.child_id"
          }
        },
        "oneChildPerId": {
          "terms": {
            "field": "children.child_id",
            "order": { "_term": "asc" },
            "size": 1000000
          },
          "aggs": {
            "lastModified": {
              "top_hits": {
                "_source": [
                  "children.other_property"
                ],
                "sort": {
                  "children.last_modified": {
                    "order": "desc"
                  }
                },
                "size": 1
              }
            },
            "paginate": {
              "bucket_sort": {
                "from": 36,
                "size": 3
              }
            }
          }
        }
      }
    }
  }
}
but after more than a solid day of going through the docs and experimenting, I seem to be no closer to figuring out how to sort the buckets of my oneChildPerId aggregation by the other_property of the single child retrieved by the lastModified aggregation.
Is there a way to sort a multi-bucket aggregation by results in a nested multi-bucket aggregation?
What I've tried:
I thought I could use bucket_sort for that too, but apparently its sort can only be used with paths containing other single-bucket aggregations and ending in a metric one.
I've tried to find a way to somehow transform the one-result multi-bucket of lastModified into a single bucket, but haven't found any.
I'm using ElasticSearch 6.8.6 (the bucket_sort and similar tools weren't available in ES 5.x and older).
I had the same problem. I needed a terms aggregation with a nested top_hits, and wanted to sort by a specific field inside the nested aggregation.
I'm not sure how performant my solution is, but the desired behaviour can be achieved with a single-value metric aggregation on the same level as the top_hits. You can then sort by this new aggregation via the order field of the terms aggregation.
Here is an example:
POST books/_doc
{ "genre": "action", "title": "bookA", "pages": 200 }
POST books/_doc
{ "genre": "action", "title": "bookB", "pages": 35 }
POST books/_doc
{ "genre": "action", "title": "bookC", "pages": 170 }
POST books/_doc
{ "genre": "comedy", "title": "bookD", "pages": 80 }
POST books/_doc
{ "genre": "comedy", "title": "bookE", "pages": 90 }
GET books/_search
{
  "size": 0,
  "aggs": {
    "by_genre": {
      "terms": {
        "field": "genre.keyword",
        "order": { "max_pages": "asc" }
      },
      "aggs": {
        "top_book": {
          "top_hits": {
            "size": 1,
            "sort": [{ "pages": { "order": "desc" } }]
          }
        },
        "max_pages": { "max": { "field": "pages" } }
      }
    }
  }
}
by_genre has an order field which sorts by the sub-aggregation called max_pages. max_pages was added solely for this purpose: it produces a single-value metric that order can sort by.
The query above returns (I've shortened the output for clarity):
{ "genre" : "comedy", "title" : "bookE", "pages" : 90 }
{ "genre" : "action", "title" : "bookA", "pages" : 200 }
If you change "order": {"max_pages": "asc"} to "order": {"max_pages": "desc"}, the output becomes:
{ "genre" : "action", "title" : "bookA", "pages" : 200 }
{ "genre" : "comedy", "title" : "bookE", "pages" : 90 }
The type of the max_pages aggregation can be changed as needed, as long as it is a single-value metric aggregation (e.g. sum, avg, etc.).
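The ordering trick can be sketched in plain Python against the sample books above: bucket by genre, compute max(pages) per bucket, then sort the buckets by that single value:

```python
# Sketch of the order-by-metric trick: group books by genre, compute
# max(pages) per bucket, sort buckets by that metric, and pick the
# top hit (longest book) inside each bucket.
books = [
    {"genre": "action", "title": "bookA", "pages": 200},
    {"genre": "action", "title": "bookB", "pages": 35},
    {"genre": "action", "title": "bookC", "pages": 170},
    {"genre": "comedy", "title": "bookD", "pages": 80},
    {"genre": "comedy", "title": "bookE", "pages": 90},
]

buckets = {}
for b in books:
    buckets.setdefault(b["genre"], []).append(b)

# equivalent of "order": {"max_pages": "asc"} on the terms aggregation
genres_asc = sorted(buckets, key=lambda g: max(x["pages"] for x in buckets[g]))

# equivalent of top_hits with size 1, sorted by pages desc
top_per_genre = {g: max(buckets[g], key=lambda x: x["pages"]) for g in buckets}
```

With the sample data this yields comedy (max 90) before action (max 200), matching the output shown above.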

elastic search sort aggregation by selected field

How can I sort the output from an aggregation by a field that is in the source data, but not part of the output of the aggregation?
In my source data I have a date field that I would like the output of the aggregation to be sorted by date.
Is that possible? I've looked at using "order" within the aggregation, but I don't think it can see that date field to use it for sorting?
I've also tried adding a sub aggregation which includes the date field, but again, I cannot get it to sort on this field.
I'm calculating a hash for each document in my ETL on the way into Elasticsearch. My data set contains a lot of duplication, so I'm trying to use an aggregation on the hash field to filter out duplicates, and that works fine. I need the output from the aggregation to retain a date sort order so that I can work with the output in Angular.
The documents are like this:
{
  "_id": 123,
  "_source": {
    "hash": "01010101010101",
    "user": "1",
    "dateTime": "2001/2/20 09:12:21",
    "action": "Login"
  }
}
{
  "_id": 124,
  "_source": {
    "hash": "01010101010101",
    "user": "1",
    "dateTime": "2001/2/20 09:12:21",
    "action": "Login"
  }
}
{
  "_id": 132,
  "_source": {
    "hash": "0202020202020",
    "user": "1",
    "dateTime": "2001/2/20 09:20:43",
    "action": "Logout"
  }
}
{
  "_id": 200,
  "_source": {
    "hash": "0303030303030303",
    "user": "2",
    "dateTime": "2001/2/22 09:32:14",
    "action": "Login"
  }
}
So I want to use an aggregation on the hash value to remove duplicates from my set and then render the response in date order.
My query:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "action": "Login"
              }
            }
          ]
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "md5": {
      "terms": {
        "field": "hash",
        "size": 0
      },
      "aggs": {
        "byDate": {
          "terms": {
            "field": "dateTime",
            "size": 0
          }
        }
      }
    }
  }
}
Currently the output is ordered on the hash and I need it ordered on the date field within each hash bucket. Is that possible?
If the aggregation on "hash" is just for removing duplicates, it might work for you to simply aggregate on "dateTime" first, followed by the terms aggregation on "hash". For example:
GET my_index/test/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "action": "Login" } }
          ]
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "byDate": {
      "terms": {
        "field": "dateTime",
        "order": { "_term": "asc" }  <---- EDIT: must specify order here
      },
      "aggs": {
        "byHash": {
          "terms": {
            "field": "hash"
          }
        }
      }
    }
  }
}
This way, your results would be sorted by "dateTime" first.
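As a sanity check of the idea, the dedup-then-sort behaviour can be sketched client-side in Python using the sample documents from the question:

```python
# Sketch of the desired result: filter to Login actions, keep one
# document per hash, and return the survivors in dateTime order.
docs = [
    {"_id": 123, "hash": "01010101010101", "user": "1",
     "dateTime": "2001/2/20 09:12:21", "action": "Login"},
    {"_id": 124, "hash": "01010101010101", "user": "1",
     "dateTime": "2001/2/20 09:12:21", "action": "Login"},
    {"_id": 132, "hash": "0202020202020", "user": "1",
     "dateTime": "2001/2/20 09:20:43", "action": "Logout"},
    {"_id": 200, "hash": "0303030303030303", "user": "2",
     "dateTime": "2001/2/22 09:32:14", "action": "Login"},
]

seen, deduped = set(), []
# walk documents in date order, so the first document kept per hash is
# also the earliest one (string sort suffices for this sample format)
for d in sorted(docs, key=lambda d: d["dateTime"]):
    if d["action"] == "Login" and d["hash"] not in seen:
        seen.add(d["hash"])
        deduped.append(d)
```

Aggregating by date first, as in the answer above, gives the same effect server-side: the date buckets come back sorted, and each hash appears inside the date bucket it belongs to.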

Elasticsearch 2.1: Intersection of aggregations

I have some sample data in Elasticsearch, which looks like this:
Data1: {
  "name": "rahul",
  "socialnetwork": "facebook",
  "day": 1
}
Data2: {
  "name": "rahul",
  "searchengine": "google",
  "day": 1
}
Data3: {
  "name": "vivek",
  "socialnetwork": "facebook",
  "day": 1
}
Data4: {
  "name": "devendra",
  "searchengine": "google",
  "day": 2
}
Data5: {
  "name": "rahul",
  "socialnetwork": "facebook",
  "day": 2
}
I need to get aggregations on the "name" field, where socialnetwork = "facebook" and searchengine = "google".
As far as I know, we can run two aggregations and take the intersection of their results.
1st aggregation :
{
  "query": {
    "match": {
      "searchengine": "google"
    }
  },
  "aggs": {
    "searcheng": {
      "terms": {
        "field": "name"
      }
    }
  }
}
2nd aggregation :
{
  "query": {
    "match": {
      "socialnetwork": "facebook"
    }
  },
  "aggs": {
    "socialnet": {
      "terms": {
        "field": "name"
      }
    }
  }
}
And then take the common terms (i.e. the intersection) of both aggregations.
But I am not able to compute that intersection using Elasticsearch.
I have tried many things: sub-aggregations don't help in this case, significant-terms aggregation results are not good enough, and filters and pipeline aggregations didn't lead anywhere either.
The sample data above is just a simplified version of a big data set; there are more than two filters, around 20.
No, you don't need an intersection of two aggregations.
The above can easily be achieved using a bool query. For your desired output you can use a should clause:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "searchengine": "google"
          }
        },
        {
          "match": {
            "socialnetwork": "facebook"
          }
        }
      ],
      "minimum_number_should_match": 1
    }
  },
  "aggs": {
    "searcheng": {
      "terms": {
        "field": "name",
        "min_doc_count": 2
      }
    }
  }
}
Hope it helps.
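For comparison, a true client-side intersection of the two terms aggregations can be sketched in Python using the sample data from the question. One caveat of the should + min_doc_count approach worth noting: it counts matching documents, not matched conditions, so a name with two facebook documents and no google document would also reach min_doc_count: 2; intersecting the two result sets avoids that.

```python
# Sketch: run the two "aggregations" client-side and intersect the
# resulting name sets, using the question's five sample documents.
docs = [
    {"name": "rahul", "socialnetwork": "facebook", "day": 1},
    {"name": "rahul", "searchengine": "google", "day": 1},
    {"name": "vivek", "socialnetwork": "facebook", "day": 1},
    {"name": "devendra", "searchengine": "google", "day": 2},
    {"name": "rahul", "socialnetwork": "facebook", "day": 2},
]

# names bucketed under each condition (the two terms aggregations)
facebook = {d["name"] for d in docs if d.get("socialnetwork") == "facebook"}
google = {d["name"] for d in docs if d.get("searchengine") == "google"}

# names matching both conditions: the intersection the question asks for
both = facebook & google
```

With around 20 filters, the same pattern generalizes to intersecting one name set per filter.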

Aggregations for categories, sorted by category sequence

I have an Elasticsearch index in which each document contains the following:
"category": {
  "id": 4,
  "name": "Green",
  "seq": 2
}
I can use aggregations to get me the doc count for each of the categories:
{
  "size": 0,
  "aggs": {
    "category": {
      "terms": {
        "field": "category.name"
      }
    }
  }
}
This is fine, but the aggs are sorted by the doc count. What I'd like is to have the buckets sorted by the seq value, something that's easy in SQL.
Any suggestions?
Thanks!
Take a look at ordering terms aggregations.
Something like this could work, but only if "name" and "seq" have the right relationship (one-to-one, or it works out in some other way):
POST /test_index/_search
{
  "size": 0,
  "aggs": {
    "category": {
      "terms": {
        "field": "category.name",
        "order": { "seq_num": "asc" }
      },
      "aggs": {
        "seq_num": {
          "max": {
            "field": "category.seq"
          }
        }
      }
    }
  }
}
Here is some code I used for testing:
http://sense.qbox.io/gist/4e551b2faec81eb0343e0e6d0cc9b10f20d7d4c1
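The order-by-sub-aggregation behaviour above can be sketched in plain Python (the sample documents are hypothetical, reusing the category shape from the question):

```python
# Sketch: group documents by category name, take max(seq) per bucket
# (the seq_num sub-aggregation), then sort buckets by that value.
docs = [
    {"category": {"id": 4, "name": "Green", "seq": 2}},
    {"category": {"id": 7, "name": "Red", "seq": 1}},
    {"category": {"id": 4, "name": "Green", "seq": 2}},
]

buckets = {}
for d in docs:
    c = d["category"]
    buckets.setdefault(c["name"], []).append(c["seq"])

# equivalent of "order": { "seq_num": "asc" } on the terms aggregation
ordered = sorted(buckets, key=lambda name: max(buckets[name]))
```

Because the name-to-seq relationship is one-to-one here, max(seq) per bucket is simply the category's seq, which is why the max sub-aggregation works as a sort key.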
