Elasticsearch - mutiplication of 2 fields and then sum aggregation - elasticsearch

I'm trying to make a simple query in elasticsearch but I can't figure out how to do it. I searched all over the internet and there was no discussion on this situation.
Let's say I have items like those:
{
"item_id": 1,
"item_price": 100,
"item_quantity": 2
},
{
"item_id": 2,
"item_price": 200,
"item_quantity": 3
},
{
"item_id": 3,
"item_price": 150,
"item_quantity": 1
},
{
"item_id": 4,
"item_price": 250,
"item_quantity": 5
}
I want to make a query that will give me the result of the total price in the stock.
for example: 100*2 + 200*3 + 150*1 + 250*5
the result for this query supposed to be 2,200
The answer query for the last data is working, But what about this complex situation:
POST tests/test2/
{
"item_category": "aaa",
"items":
[
{
"item_id": 1,
"item_price": 100,
"item_quantity": 2
},
{
"item_id": 2,
"item_price": 150,
"item_quantity": 4
}
]
}
POST tests/test2/
{
"item_category": "bbb",
"items":
[
{
"item_id": 3,
"item_price": 200,
"item_quantity": 3
},
{
"item_id": 4,
"item_price": 200,
"item_quantity": 5
}
]
}
POST tests/test2/
{
"item_category": "ccc",
"items":
[
{
"item_id": 5,
"item_price": 300,
"item_quantity": 2
},
{
"item_id": 6,
"item_price": 150,
"item_quantity": 8
}
]
}
POST tests/test2/
{
"item_category": "ddd",
"items":
[
{
"item_id": 7,
"item_price": 80,
"item_quantity": 10
},
{
"item_id": 8,
"item_price": 250,
"item_quantity": 4
}
]
}
In this case the next query is not working and give me a wrong answer (1,420 instead of 6,000):
GET tests/test2/_search
{
"query": {
"match_all": { }
},
"aggs": {
"total_price": {
"sum": {
"script": {
"lang": "painless",
"inline": "doc['items.item_price'].value * doc['items.item_quantity'].value"
}
}
}
}
}

You can use sum aggregation for values calculated using script
{
"aggs": {
"total_price": {
"sum": {
"script": {
"lang": "painless",
"inline": "doc['item_price'].value * doc['item_quantity'].value"
}
}
}
}
}
Take a look here https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html#_script_9 for more details
Update
As for your advanced case, it would be better to map your items field as nested type, after that you can use this aggregation
{
"aggs": {
"nested": {
"nested": {
"path": "items"
},
"aggs": {
"total_price": {
"sum": {
"script": {
"inline": "doc['items.item_price'].value * doc['items.item_quantity'].value"
}
}
}
}
}
}
}
this is the mapping query for the example DB in the question:
PUT tests
{
"mappings": {
"test2": {
"properties": {
"items": {
"type": "nested"
}
}
}
}
}
just to clarify, You must make the mapping query before the index has been created. (changing mapping for existing field is not allowed).

Related

Interval search for messages in Elasticsearch

I need to split the found messages into intervals. Can this be done with Elasticsearch?
For example. There are 10 messages, you need to divide them into 3 intervals. It should look like this...
[0,1,2,3,4,5,6,7,8,9] => {[0,1,2], [3,4,5,6], [7,8,9]}.
I'm only interested in the beginning of the intervals. For example: {[count - 3, min 0], [count - 4, min 3], [count - 3, min - 7]}
Example.
PUT /test_index
{
"mappings": {
"properties": {
"id": {
"type": "long"
}
}
}
}
POST /test_index/_doc/0
{
"id": 0
}
POST /test_index/_doc/1
{
"id": 1
}
POST /test_index/_doc/2
{
"id": 2
}
POST /test_index/_doc/3
{
"id": 3
}
POST /test_index/_doc/4
{
"id": 4
}
POST /test_index/_doc/5
{
"id": 5
}
POST /test_index/_doc/6
{
"id": 6
}
POST /test_index/_doc/7
{
"id": 7
}
POST /test_index/_doc/8
{
"id": 8
}
POST /test_index/_doc/9
{
"id": 9
}
It is necessary to divide the values ​​into 3 intervals with the same number of elements in each interval:
{
...
"aggregations": {
"result": {
"buckets": [
{
"min": 0.0,
"doc_count": 3
},
{
"min": 3.0,
"doc_count": 4
},
{
"min": 7.0,
"doc_count": 3
}
]
}
}
}
There is a similar function: "variable width histogram":
GET /test_index/_search?size=0
{
"aggs": {
"result": {
"variable_width_histogram": {
"field": "id",
"buckets": 3
}
}
},
"query": {
"match_all": {}
}
}
But "variable width histogram" separates documents by id value, not by the number of elements in the bucket
Assuming your mapping is like:
{
"some_numeric_field" : {"type" : "integer"}
}
Then you can build histograms out of it with fixed interval sizes:
POST /my_index/_search?size=0
{
"aggs": {
"some_numeric_field": {
"histogram": {
"field": "some_numeric_field",
"interval": 7
}
}
}
}
Results:
{
...
"aggregations": {
"prices": {
"buckets": [
{
"key": 0.0,
"doc_count": 7
},
{
"key": 7.0,
"doc_count": 7
},
{
"key": 14.0,
"doc_count": 7
}
]
}
}
}
To get the individual values inside each bucket, just add a sub-aggregation, maybe "top_hits" or anything else like a "terms"
aggregation.
Without knowing more about your data, I really cannot help further.

elasticSearch count multiple

I have two profiles, "A" and "B" both have events in the elastic
this is the elastic data for ex:
{hits: [
{tag:"A"},
{tag:"B"},
{tag:B}
]}
I want to count how much events tag "a" have and and how much "B" in one request
Ive tried this but it counts them total as 3 and I want A:1 and B:2
GET forensics/_count
{
"query": {
"terms": {
"waas_tag": ["A","B"]
}
}
}
You can use term vector API to get information about the terms of a particular field.
Adding a working example with index data and response
Index Data
{
"waas_tag": [
{
"tag": "A"
},
{
"tag": "B"
},
{
"tag": "B"
}
]
}
Term Vector API:
GET _termvectors/1?fields=waas_tag.tag
Response:
"term_vectors": {
"waas_tag.tag": {
"field_statistics": {
"sum_doc_freq": 2,
"doc_count": 1,
"sum_ttf": 3
},
"terms": {
"a": {
"term_freq": 1, // note this
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 1
}
]
},
"b": {
"term_freq": 2, // note this
"tokens": [
{
"position": 101,
"start_offset": 2,
"end_offset": 3
},
{
"position": 202,
"start_offset": 4,
"end_offset": 5
}
]
}
}
}
}
at the end I found a solution not using count but msearch
GET forensics/_msearch
{} // this means {index:"forensics"}
{"query":{"term":{"waas_tag":"A"}}}
{} // this means {index:"forensics"}
{"query":
{
"bool":{
"must":[{"term":{"waas_tag":"B"}
},
{
"range":{"#timestamp":{"gte":"now-20d","lt":"now/s"}}}]}
}
}
You can use filters aggregation to get the count for each tag in a single query without using _msearch endpoint. This query should work:
{
"size": 0,
"aggs": {
"counts": {
"filters": {
"filters": {
"CountA": {
"term": {
"waas_tag": "A"
}
},
"CountB": {
"term": {
"waas_tag": "B"
}
}
}
}
}
}
}

Stats Aggregation with Min Mode in ElasticSearch

I have the below mapping in ElasticSearch
{
"properties":{
"Costs":{
"type":"nested",
"properties":{
"price":{
"type":"integer"
}
}
}
}
}
So every document has an Array field Costs, which contains many elements and each element has price in it. I want to find the min and max price with the condition being - that from each array the element with the minimum price should be considered. So it is basically min/max among the minimum value of each array.
Lets say I have 2 documents with the Costs field as
Costs: [
{
"price": 100,
},
{
"price": 200,
}
]
and
Costs: [
{
"price": 300,
},
{
"price": 400,
}
]
So I need to find the stats
This is the query I am currently using
{
"costs_stats":{
"nested":{
"path":"Costs"
},
"aggs":{
"price_stats_new":{
"stats":{
"field":"Costs.price"
}
}
}
}
}
And it gives me this:
"min" : 100,
"max" : 400
But I need to find stats after taking minimum elements of each array for consideration.
So this is what i need:
"min" : 100,
"max" : 300
Like we have a "mode" option in sort, is there something similar in stats aggregation also, or any other way of achieving this, maybe using a script or something. Please suggest. I am really stuck here.
Let me know if anything is required
Update 1:
Query for finding min/max among minimums
{
"_source":false,
"timeout":"5s",
"from":0,
"size":0,
"aggs":{
"price_1":{
"terms":{
"field":"id"
},
"aggs":{
"price_2":{
"nested":{
"path":"Costs"
},
"aggs":{
"filtered":{
"aggs":{
"price_3":{
"min":{
"field":"Costs.price"
}
}
},
"filter":{
"bool":{
"filter":{
"range":{
"Costs.price":{
"gte":100
}
}
}
}
}
}
}
}
}
},
"minValue":{
"min_bucket":{
"buckets_path":"price_1>price_2>filtered>price_3"
}
}
}
}
Only few buckets are coming and hence the min/max is coming among those, which is not correct. Is there any size limit.
One way to achieve your use case is to add one more field id, in each document. With the help of id field terms aggregation can be performed, and so buckets will be dynamically built - one per unique value.
Then, we can apply min aggregation, which will return the minimum value among numeric values extracted from the aggregated documents.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"Costs": {
"type": "nested"
}
}
}
}
Index Data:
{
"id":1,
"Costs": [
{
"price": 100
},
{
"price": 200
}
]
}
{
"id":2,
"Costs": [
{
"price": 300
},
{
"price": 400
}
]
}
Search Query:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15 <-- note this
},
"aggs": {
"nested_entries": {
"nested": {
"path": "Costs"
},
"aggs": {
"min_position": {
"min": {
"field": "Costs.price"
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
}
Using stats aggregation also, it can be achieved (if you add one more field id that uniquely identifies your document)
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15 <-- note this
},
"aggs": {
"costs_stats": {
"nested": {
"path": "Costs"
},
"aggs": {
"price_stats_new": {
"stats": {
"field": "Costs.price"
}
}
}
}
}
}
}
}
Update 1:
To find the maximum value among those minimums (as seen in the above query), you can use max bucket aggregation
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15 <-- note this
},
"aggs": {
"nested_entries": {
"nested": {
"path": "Costs"
},
"aggs": {
"min_position": {
"min": {
"field": "Costs.price"
}
}
}
}
}
},
"maxValue": {
"max_bucket": {
"buckets_path": "id_terms>nested_entries>min_position"
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
},
"maxValue": {
"value": 300.0,
"keys": [
"2"
]
}
}

Unique values query without aggregations

We have an index of unique products where each document represents a single product, with the following fields: product_id, group_id, group_score, and product_score.
Consider the following index:
{
"product_id": "100-001",
"group_id": "100",
"group_score": 100,
"product_score": 60,
},
{
"product_id": "100-002",
"group_id": "100",
"group_score": 100,
"product_score": 40,
},
{
"product_id": "100-001",
"group_id": "100",
"group_score": 100,
"product_score": 50,
},
{
"product_id": "200-001",
"group_id": "200",
"group_score": 73,
"product_score": 20,
},
{
"product_id": "200-002",
"group_id": "200",
"group_score": 73,
"product_score": 53,
}
Every group contains ~1-200 products.
We are trying to a query that matches the following conditions:
1. Products should be sorted by their group_score (desc).
2. No more than one product per group_id.
3. Get the product with the highest product_score within the group.
For example, applying the query on the above should return:
{
"product_id": "100-001"
},
{
"product_id": "200-002"
}
We ended up with the following query:
{
"size": 0,
"aggs": {
"group_by_group_id": {
"terms": {
"field": "group_id",
"order":{
"max_group_score":"desc"
}
},
"aggs": {
"top_scores_hits": {
"top_hits": {
"sort": [
{
"product_score": {
"order": "desc"
}
}
],
"size": 1
}
},
"max_group_score":{
"max":{
"field":"group_score"
}
}
}
}
}
}
The problem is that the query is really slow because of the aggregations and the search performance is important.
We would love to hear your opinion about a better/efficient solution.
Changing the index structure is tolerable.

Elasticsearch query fails to return results when querying a nested object

I have an object which looks something like this:
{
"id": 123,
"language_id": 1,
"label": "Pablo de la Pena",
"office": {
"count": 2,
"data": [
{
"id": 1234,
"is_office_lead": false,
"office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
},
{
"id": 5678,
"is_office_lead": false,
"office": {
"id": 2,
"address_line_1": "77 High Road",
"address_line_2": "Edinburgh",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "EH1 2DE",
"city_id": 2
}
}
]
},
"primary_office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
}
My Elasticsearch mapping looks like this:
"mappings": {
"item": {
"properties": {
"office": {
"properties": {
"data": {
"type": "nested",
}
}
}
}
}
}
My Elasticsearch query looks something like this:
GET consultant/item/_search
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"term": {
"language_id": 1
}
},
{
"term": {
"office.data.office.city_id": 1
}
}
]
}
}
}
This returns zero results, however, if I remove the second term and leave it only with the language_id clause, then it works as expected.
I'm sure this is down to a misunderstading on my part of how the nested object is flattened, but I'm out of ideas - I've tried all kinds of permutations of the query and mappings.
Any guidance hugely appreciated. I am using Elasticsearch 6.1.1.
I'm not sure if you need the entire record or not, this solution gives every record that has language_id: 1 and has an office.data.office.id: 1 value.
GET consultant/item/_search
{
"from": 0,
"size": 100,
"query": {
"bool":{
"must": [
{
"term": {
"language_id": {
"value": 1
}
}
},
{
"nested": {
"path": "office.data",
"query": {
"match": {
"office.data.office.city_id": 1
}
}
}
}
]
}
}
}
I put 3 different records in my test index for proofing against false hits, one with different language_id and one with different office ids and only the matching one returned.
If you only need the office data, then that's a bit different but still solvable.

Resources