How to build a price comparison with Elasticsearch

I have to build a price comparison system, and my idea is to build it on Elasticsearch.
My problem is this: how can I aggregate seller prices for each product?
Say I have this simple mapping:
products: {
product: {
properties: {
id: {
type: "long"
},
name: {
type: "string"
},
....
sellers: {
dynamic: "true",
properties: {
sellerId: {
type: "long"
},
price: {
type: "float"
}
}
}
}
}
}
Can I aggregate or facet the price (min, max, and seller count) for each product?
Or is there a way to build this with parent/child relations?

Assuming you're using 1.0 and not 0.90, you can do this quite easily with the min, max and value_count aggregations.
{
"query": {
"match": {
"name": "item1"
}
},
"aggs": {
"Min": {
"min": {
"field": "sellers.price"
}
},
"Max": {
"max": {
"field": "sellers.price"
}
},
"SellerCount": {
"value_count": {
"field": "sellers.sellerId"
}
}
}
}
Or you could use a sub-aggregation to return this information for every product rather than for one specific product.
{
"aggs": {
"Products": {
"terms": {
"field": "name",
"size": 10
},
"aggs": {
"Min": {
"min": {
"field": "sellers.price"
}
},
"Max": {
"max": {
"field": "sellers.price"
}
},
"SellerCount": {
"value_count": {
"field": "sellers.sellerId"
}
}
}
}
}
}
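To turn the per-product buckets from the query above into rows for a comparison table, something like the following might work. It is only a sketch: the function name summarize_products and the sample numbers are made up, and the response is assumed to have the usual terms-bucket shape with the aggregation names Products, Min, Max and SellerCount used above.

```python
def summarize_products(response):
    """Turn the "Products" terms buckets into one row per product."""
    rows = []
    for bucket in response["aggregations"]["Products"]["buckets"]:
        rows.append({
            "product": bucket["key"],
            "min_price": bucket["Min"]["value"],
            "max_price": bucket["Max"]["value"],
            "seller_count": bucket["SellerCount"]["value"],
        })
    return rows


# Hand-written sample in the usual shape of a terms aggregation response.
# The numbers are invented for illustration only.
sample_response = {
    "aggregations": {
        "Products": {
            "buckets": [
                {"key": "item1", "doc_count": 1,
                 "Min": {"value": 9.5}, "Max": {"value": 12.0},
                 "SellerCount": {"value": 3}},
                {"key": "item2", "doc_count": 1,
                 "Min": {"value": 20.0}, "Max": {"value": 20.0},
                 "SellerCount": {"value": 1}},
            ]
        }
    }
}

for row in summarize_products(sample_response):
    print(row)
```

One thing to watch: if name is an analyzed string field, the terms buckets are built per token rather than per full product name, so bucketing on id may be the safer key.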

Related

Compose nested aggregations

Sorry for any mistakes in my English.
I hope someone can help me.
Suppose I have the following mapping for my index:
PUT test-index
{
"mappings": {
"properties": {
"nestedOBJField": {
"type": "nested",
"index": true
},
"keywordField": {
"type": "keyword",
"index": true
}
}
}
}
Is it possible to use the composite aggregation with nested fields?
It would be very handy if I could do something like this:
GET /test-index/_search
{
"size": 0,
"aggs": {
"TestAgg": {
"composite": {
"size": 10000,
"sources": [
{
"keyWordFieldAgg": {
"terms": {
"field": "keyWordField"
}
}
},
{
"nestedFieldAgg": {
"terms": {
"field": "nestedOBJField.attribute"
}
}
}
]
}
}
}
}
But this approach returns a number of errors.
I would appreciate it a lot if someone could help.
The property nestedOBJField is of type "nested", while keyWordField is of type keyword and sits at the same level as nestedOBJField.
To aggregate on nested fields you need a nested aggregation, but then every source in the composite aggregation must live under that nested path. This open issue can tell more about it.
You can use the following workarounds.
First workaround: move keyWordField inside the nested object in your documents.
{
"mappings": {
"properties": {
"nestedOBJField": {
"type": "nested",
"properties":{
"keywordField": {
"type": "keyword"
}
}
}
}
}
}
Sample Document
{
"nestedOBJField":[
{
"attribute":"1",
"age":1,
"keywordField":"xyz"
},
{
"attribute":"2",
"age":2,
"keywordField":"xyz"
}
]
}
Query
"aggs": {
"TestAgg": {
"nested": {
"path": "nestedOBJField"
},
"aggs": {
"name": {
"composite": {
"size": 10000,
"sources": [
{
"nestedFieldAgg": {
"terms": {
"field": "nestedOBJField.attribute.keyword"
}
}
},
{
"a":{
"terms": {
"field": "nestedOBJField.keywordField"
}
}
}
]
}
}
}
}
}
Moving the field inside the nested property means duplicating its value in every nested object, so any update has to touch all of them.
Second workaround: use a terms aggregation with reverse_nested. Pagination will be an issue in this case.
{
"size": 0,
"aggs": {
"TestAgg": {
"nested": {
"path": "nestedOBJField"
},
"aggs": {
"name": {
"terms": {
"field": "nestedOBJField.attribute.keyword",
"size": 10
},
"aggs": {
"back_to_parent": {
"reverse_nested": {},
"aggs": {
"keywords": {
"terms": {
"field": "keywordField",
"size": 10
}
}
}
}
}
}
}
}
}
}
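For comparison, the composite variant from the first workaround can be paged: each response carries an after_key, which is sent back as "after" in the next request. Below is a rough sketch of that loop. The search argument is a hypothetical callable that takes a request body and returns the response as a dict (in practice it would wrap your client), and fake_search only exists to show the flow with invented buckets.

```python
import copy


def iter_composite_buckets(search, body, agg_path):
    """Yield every bucket of a composite aggregation, one page at a time.

    search   -- hypothetical callable: request body dict -> response dict
    body     -- request body that contains the composite aggregation
    agg_path -- aggregation names leading to it, e.g. ["TestAgg", "name"]
    """
    body = copy.deepcopy(body)  # leave the caller's body untouched
    # Walk the request down to the composite definition.
    node = body
    for name in agg_path:
        node = node["aggs"][name]
    composite = node["composite"]

    while True:
        result = search(body)["aggregations"]
        for name in agg_path:
            result = result[name]
        if not result["buckets"]:
            return
        yield from result["buckets"]
        after_key = result.get("after_key")
        if after_key is None:
            return
        composite["after"] = after_key  # ask for the next page


def fake_search(body):
    """Stand-in for a real search call; returns two invented pages."""
    after = body["aggs"]["TestAgg"]["aggs"]["name"]["composite"].get("after")
    if after is None:
        key = {"nestedFieldAgg": "1", "a": "xyz"}
    elif after["nestedFieldAgg"] == "1":
        key = {"nestedFieldAgg": "2", "a": "xyz"}
    else:
        return {"aggregations": {"TestAgg": {"name": {"buckets": []}}}}
    page = {"after_key": key, "buckets": [{"key": key, "doc_count": 1}]}
    return {"aggregations": {"TestAgg": {"name": page}}}


request = {"size": 0, "aggs": {"TestAgg": {
    "nested": {"path": "nestedOBJField"},
    "aggs": {"name": {"composite": {"size": 1, "sources": [
        {"nestedFieldAgg": {"terms": {"field": "nestedOBJField.attribute.keyword"}}},
        {"a": {"terms": {"field": "nestedOBJField.keywordField"}}},
    ]}}},
}}}

keys = [b["key"]["nestedFieldAgg"] for b in
        iter_composite_buckets(fake_search, request, ["TestAgg", "name"])]
print(keys)
```

With a page size well below 10000 this keeps each response small, which is the main reason to prefer composite over a large terms aggregation when every bucket is needed.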

Perform multi-field / multi-dimensional aggregations with nested fields in Elastic Search

I am tracking the attendance of a few students. I store their details in the index as shown below.
Each doc in "entries" has a few other fields. The following data shows that a student attended 6 classes on "Monday".
"entries" is of type "nested".
{
"reg_id": 1111,
"entries": [
{
"id": "123",
"day": "Monday"
},
{
"id": "1234",
"attendance": true
},
{
"id": "12345",
"classes_attended": 6
}
]
}
I want the count of students for each classes_attended value on each day.
For example: "72 student entries found for "Monday" with 6 classes attended".
Sample desired output (this is just a sample; I am completely fine if the output schema changes):
[
{
"day" : "monday",
"classes_attended": 6,
"count": 4
},
{
"day" : "monday",
"classes_attended": 1,
"count": 5
},
{
"day" : "tuesday",
"classes_attended": 5,
"count": 2
},
{
"day" : "tuesday",
"classes_attended": 6,
"count": 1
}
]
I'm not sure how to start with the aggregation query.
I tried the following query, but I know it's not the correct solution:
"aggs": {
"attendance_aggs": {
"nested": {
"path": "entries"
},
"aggs": {
"days": {
"terms": {
"field": "entries.day"
},
"aggs": {
"attended": {
"reverse_nested": {},
"aggs":{
"class_attended_day": {
"terms": {
"field": "entries.classes_attended"
},
"aggs": {
"class_attended_days_count": {
"reverse_nested": {},
"aggs": {
"classes_attended_final": {
"cardinality": {
"field": "entries.class_attended"
}
}
}
}
}
}
}
}
}
}
}
}
}
It's unclear what the top-level object is, so I'm going to assume it's a "student attendance entry per day". I'm also unsure what the entries.id values represent, but I'll assume you'll need them at some later point, so I'll leave them untouched.
Now, since all that your entries objects have in common is the id, they can be decoupled. You should use nested if and only if the objects share attributes whose connections to each other need to be preserved. Since I don't see entries.id anywhere in your aggs, I'd recommend the following adjustments to your mapping:
PUT students
{
"mappings": {
"properties": {
"day": { ------------
"type": "keyword" |
}, |
"attendance": { |
"type": "boolean" | <--
}, |
"classes_attended": { |
"type": "integer" |
}, ------------
"entries": {
"type": "nested",
"properties": {
"day": {
"type": "keyword",
"copy_to": "day" <--
},
"attendance": {
"type": "boolean",
"copy_to": "attendance" <--
},
"classes_attended": {
"type": "integer",
"copy_to": "classes_attended" <--
}
}
}
}
}
}
and here's your query:
GET students/_search
{
"size": 0,
"aggs": {
"days": {
"terms": {
"field": "day"
},
"aggs": {
"classes_attended": {
"terms": {
"field": "classes_attended"
},
"aggs": {
"student_count": {
"cardinality": {
"field": "_id"
}
}
}
}
}
}
}
}
The response can then be post-processed into whatever you prefer.
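As one possible way to do that post-processing, here is a small sketch that flattens the two-level buckets into the list shape from the question. The function name flatten_attendance is made up, and the sample response is hand-written to mirror the desired output rather than taken from a real cluster.

```python
def flatten_attendance(response):
    """Flatten days -> classes_attended buckets into one row per pair."""
    rows = []
    for day in response["aggregations"]["days"]["buckets"]:
        for classes in day["classes_attended"]["buckets"]:
            rows.append({
                "day": day["key"],
                "classes_attended": classes["key"],
                "count": classes["student_count"]["value"],
            })
    return rows


# Hand-written sample in the shape the query above should return.
sample_response = {
    "aggregations": {
        "days": {
            "buckets": [
                {"key": "monday", "doc_count": 9, "classes_attended": {"buckets": [
                    {"key": 6, "doc_count": 4, "student_count": {"value": 4}},
                    {"key": 1, "doc_count": 5, "student_count": {"value": 5}},
                ]}},
                {"key": "tuesday", "doc_count": 3, "classes_attended": {"buckets": [
                    {"key": 5, "doc_count": 2, "student_count": {"value": 2}},
                    {"key": 6, "doc_count": 1, "student_count": {"value": 1}},
                ]}},
            ]
        }
    }
}

for row in flatten_attendance(sample_response):
    print(row)
```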
EDIT
You could hijack reverse_nested, but you will then need to re-enter the nested context, since you're referencing other nested entries:
GET students/_search
{
"size": 0,
"aggs": {
"attendance_aggs": {
"nested": {
"path": "entries"
},
"aggs": {
"days": {
"terms": {
"field": "entries.day"
},
"aggs": {
"attended": {
"reverse_nested": {},
"aggs": {
"class_attended_day": {
"nested": {
"path": "entries"
},
"aggs": {
"class_attended_day": {
"terms": {
"field": "entries.classes_attended"
},
"aggs": {
"classes_attended_final": {
"cardinality": {
"field": "entries.classes_attended"
}
}
}
}
}
}
}
}
}
}
}
}
}
}

Sum and count aggregations over Elasticsearch fields

I am new to Elasticsearch and want to perform certain aggregations over the fields of an Elasticsearch 5.x index. The index contains documents with the fields langs (which has a nested structure) and docLang. These are dynamically mapped fields. The following are example documents:
DOC 1:
{
"_index":"A",
"_type":"document",
"_id":"1",
"_source":{
"text":"This is a test sentence.",
"langs":{
"X":{
"en":1,
"es":2,
"zh":3
},
"Y":{
"en":4,
"es":5,
"zh":6
}
},
"docLang": "en"
}
}
DOC 2:
{
"_index":"A",
"_type":"document",
"_id":"2",
"_source":{
"text":"This is a test sentence.",
"langs":{
"X":{
"en":1,
"es":2
},
"Y":{
"en":3,
"es":4
}
},
"docLang": "es"
}
}
DOC 3:
{
"_index":"A",
"_type":"document",
"_id":"3",
"_source":{
"text":"This is a test sentence.",
"langs":{
"X":{
"en":1
},
"Y":{
"en":2
}
},
"docLang": "en"
}
}
I want to perform a sum aggregation over the langs field so that, for each key (X/Y) and each language, I get the sum across all documents in the index. I also want the count of documents for each language in the docLang field.
E.g., for the 3 documents above, the sum aggregation over the langs field would look like this:
"langs":{
"X":{
"en":3,
"es":4,
"zh":3
},
"Y":{
"en":9,
"es":9,
"zh":6
}
}
And the docLang counts would look like this:
"docLang":{
"en" : 2,
"es" : 1
}
Also, because of some production environment restrictions, I cannot use scripts in Elasticsearch. So I was wondering whether it is possible to do this with plain field-based aggregations on the fields above?
{
"size": 0,
"aggs": {
"X": {
"nested": {
"path": "langs.X"
},
"aggs": {
"X_sum_en": {
"sum": {
"field": "langs.X.en"
}
},
"X_sum_es": {
"sum": {
"field": "langs.X.es"
}
},
"X_sum_zh": {
"sum": {
"field": "langs.X.zh"
}
}
}
},
"Y": {
"nested": {
"path": "langs.Y"
},
"aggs": {
"Y_sum_en": {
"sum": {
"field": "langs.Y.en"
}
},
"Y_sum_es": {
"sum": {
"field": "langs.Y.es"
}
},
"Y_sum_zh": {
"sum": {
"field": "langs.Y.zh"
}
}
}
},
"sum_docLang": {
"terms": {
"field": "docLang.keyword",
"size": 10
}
}
}
}
You didn't mention it, but I think it's important: for the query above I mapped X and Y as nested fields:
"langs": {
"properties": {
"X": {
"type": "nested",
"properties": {
"en": {
"type": "long"
},
"es": {
"type": "long"
},
"zh": {
"type": "long"
}
}
},
"Y": {
"type": "nested",
"properties": {
"en": {
"type": "long"
},
"es": {
"type": "long"
},
"zh": {
"type": "long"
}
}
}
}
}
But if your fields are not nested at all (and here I mean the actual nested field type in Elasticsearch), a simple aggregation like this one should be enough:
{
"size": 0,
"aggs": {
"X_sum_en": {
"sum": {
"field": "langs.X.en"
}
},
"X_sum_es": {
"sum": {
"field": "langs.X.es"
}
},
"X_sum_zh": {
"sum": {
"field": "langs.X.zh"
}
},
"Y_sum_en": {
"sum": {
"field": "langs.Y.en"
}
},
"Y_sum_es": {
"sum": {
"field": "langs.Y.es"
}
},
"Y_sum_zh": {
"sum": {
"field": "langs.Y.zh"
}
},
"sum_docLang": {
"terms": {
"field": "docLang.keyword",
"size": 10
}
}
}
}
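If you want the result back in the langs/docLang shape from the question, the flat response of this last query can be regrouped on the client. A minimal sketch, assuming the aggregation names follow the "<key>_sum_<lang>" pattern used above; the function name and the sample response are hypothetical, with values matching the three example documents.

```python
def reshape_language_sums(response, keys=("X", "Y")):
    """Regroup flat "<key>_sum_<lang>" results into nested dicts."""
    aggs = response["aggregations"]
    langs = {key: {} for key in keys}
    for name, result in aggs.items():
        # "X_sum_en" -> key "X", language "en"; other names are skipped.
        key, sep, lang = name.partition("_sum_")
        if sep and key in langs:
            langs[key][lang] = result["value"]
    doc_lang = {b["key"]: b["doc_count"]
                for b in aggs["sum_docLang"]["buckets"]}
    return {"langs": langs, "docLang": doc_lang}


# Hand-written sample; sum aggregations report floating-point values.
sample_response = {
    "aggregations": {
        "X_sum_en": {"value": 3.0}, "X_sum_es": {"value": 4.0}, "X_sum_zh": {"value": 3.0},
        "Y_sum_en": {"value": 9.0}, "Y_sum_es": {"value": 9.0}, "Y_sum_zh": {"value": 6.0},
        "sum_docLang": {"buckets": [
            {"key": "en", "doc_count": 2},
            {"key": "es", "doc_count": 1},
        ]},
    }
}

print(reshape_language_sums(sample_response))
```

For the nested variant of the query, the sums sit one level deeper (under the X and Y aggregation results), so the lookup would need to descend into those first.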

ElasticSearch aggregations using filter and without it

I'm building a product list page with filters. There are a lot of filters, and the data for them is computed in ES with aggregations.
The simplest example is min/max price:
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"shop_id": 44
}
},
{
"term": {
"CategoryId": 36898
}
},
{
"term": {
"products_status": 1
}
},
{
"term": {
"availability": 3
}
}
]
}
}
}
},
"aggs": {
"min_price": {
"min": {
"field": "products_price"
}
},
"max_price": {
"max": {
"field": "products_price"
}
}
}
}
This request returns the minimum and maximum price according to the rules set in the filter (CategoryId 36898, shop_id 44, etc.).
It works perfectly.
The question is: is it possible to change this request so that it also returns aggregations without the filters? Or is it possible to return aggregation data for another filter in the same request?
So I want:
min_price and max_price for the filtered data (query 1)
and min_price and max_price for the unfiltered data (or for data filtered with query 2).
You can use the global aggregation so that the filters in the query block are not applied to it.
For example, for your query, use the following JSON input.
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"shop_id": 44
}
},
{
"term": {
"CategoryId": 36898
}
},
{
"term": {
"products_status": 1
}
},
{
"term": {
"availability": 3
}
}
]
}
}
}
},
"aggs": {
"min_price": {
"min": {
"field": "products_price"
}
},
"max_price": {
"max": {
"field": "products_price"
}
},
"without_filter_min": {
"global": {},
"aggs": {
"price_value": {
"min": {
"field": "products_price"
}
}
}
},
"without_filter_max": {
"global": {},
"aggs": {
"price_value": {
"max": {
"field": "products_price"
}
}
}
}
}
}
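The "query 2" part of the question can probably be covered the same way, by putting a filter aggregation inside the global one. The sketch below builds such a request body; the function name price_range_body, the aggregation names all_products and second_query, and the shop_id 45 filter are made up for illustration. It keeps the filtered query syntax used in this thread, which newer Elasticsearch versions have replaced with a bool query and a filter clause.

```python
import json


def price_range_body(main_filter, second_filter=None):
    """Build a request with min/max price for the query and for a global scope."""
    price_stats = {
        "min_price": {"min": {"field": "products_price"}},
        "max_price": {"max": {"field": "products_price"}},
    }
    if second_filter is None:
        # Unfiltered: min/max over every document in the index.
        global_aggs = dict(price_stats)
    else:
        # "Query 2": min/max over documents matching a different filter.
        global_aggs = {
            "second_query": {"filter": second_filter, "aggs": dict(price_stats)}
        }
    return {
        "size": 0,
        "query": {"filtered": {"query": {"match_all": {}}, "filter": main_filter}},
        "aggs": {
            **price_stats,  # scoped by the query above
            "all_products": {"global": {}, "aggs": global_aggs},
        },
    }


main = {"bool": {"must": [{"term": {"shop_id": 44}},
                          {"term": {"CategoryId": 36898}}]}}
other = {"term": {"shop_id": 45}}  # hypothetical second filter

print(json.dumps(price_range_body(main, other), indent=2))
```

Using a single global aggregation with both metrics under it also avoids declaring global twice, as the JSON above does.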

For each country/colour/brand combination , find sum of number of items in elasticsearch

This is a portion of the data I have indexed in elasticsearch:
{
"country" : "India",
"colour" : "white",
"brand" : "sony",
"numberOfItems" : 3
}
I want to get the total sum of numberOfItems on a per country basis, per colour basis and per brand basis. Is there any way to do this in elasticsearch?
The following should lead you straight to the answer.
Make sure scripting is enabled before using it.
{
"aggs": {
"keys": {
"terms": {
"script": "doc['country'].value + doc['colour'].value + doc['brand'].value"
},
"aggs": {
"keySum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
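If scripting is not an option, nesting three terms aggregations should give the same per-combination sums. This is a sketch under assumptions: the aggregation names countries, colours and brands and both function names are made up, the sample response is hand-written, and the three fields are assumed to be indexed so that each value forms a single term (not analyzed).

```python
def combination_query(size=10):
    """Request body: country -> colour -> brand buckets with a sum at the leaf."""
    return {
        "size": 0,
        "aggs": {"countries": {
            "terms": {"field": "country", "size": size},
            "aggs": {"colours": {
                "terms": {"field": "colour", "size": size},
                "aggs": {"brands": {
                    "terms": {"field": "brand", "size": size},
                    "aggs": {"keySum": {"sum": {"field": "numberOfItems"}}},
                }},
            }},
        }},
    }


def flatten_combinations(response):
    """One (country, colour, brand, sum) tuple per leaf bucket."""
    rows = []
    for country in response["aggregations"]["countries"]["buckets"]:
        for colour in country["colours"]["buckets"]:
            for brand in colour["brands"]["buckets"]:
                rows.append((country["key"], colour["key"], brand["key"],
                             brand["keySum"]["value"]))
    return rows


# Hand-written sample response for the single document shown above.
sample_response = {"aggregations": {"countries": {"buckets": [
    {"key": "India", "doc_count": 1, "colours": {"buckets": [
        {"key": "white", "doc_count": 1, "brands": {"buckets": [
            {"key": "sony", "doc_count": 1, "keySum": {"value": 3.0}},
        ]}},
    ]}},
]}}}

print(flatten_combinations(sample_response))
```

Compared with the concatenating script, the keys stay separate, so "ab" + "c" and "a" + "bc" cannot collide.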
To get a single result, you can apply a sum aggregation to a filtered query with a term (or terms) filter, e.g.:
{
"query": {
"filtered": {
"filter": {
"term": {
"country": "India"
}
}
}
},
"aggs": {
"total_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
To get statistics for all countries/colours/brands in a single pass over the data, you can use the following query with 3 multi-bucket terms aggregations, each containing a single-value sum metric as a sub-aggregation:
{
"query": {
"match_all": {}
},
"aggs": {
"countries": {
"terms": {
"field": "country"
},
"aggs": {
"country_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"colours": {
"terms": {
"field": "colour"
},
"aggs": {
"colour_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"brands": {
"terms": {
"field": "brand"
},
"aggs": {
"brand_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
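Reading the three bucket lists out of that response could look roughly like this. The function name sums_by_dimension and the sample response are hypothetical; the aggregation names are the ones from the query above.

```python
def sums_by_dimension(response):
    """Map each dimension to {bucket key: sum of numberOfItems}."""
    sum_names = {
        "countries": "country_sum",
        "colours": "colour_sum",
        "brands": "brand_sum",
    }
    aggs = response["aggregations"]
    return {
        name: {b["key"]: b[sum_name]["value"] for b in aggs[name]["buckets"]}
        for name, sum_name in sum_names.items()
    }


# Hand-written sample response for the single document shown earlier.
sample_response = {"aggregations": {
    "countries": {"buckets": [
        {"key": "India", "doc_count": 1, "country_sum": {"value": 3.0}}]},
    "colours": {"buckets": [
        {"key": "white", "doc_count": 1, "colour_sum": {"value": 3.0}}]},
    "brands": {"buckets": [
        {"key": "sony", "doc_count": 1, "brand_sum": {"value": 3.0}}]},
}}

print(sums_by_dimension(sample_response))
```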

Resources