Elasticsearch: how to filter by summed values in nested objects? - elasticsearch

I have the following product structure in Elasticsearch:
POST /test/products/1
{
"name": "product1",
"sales": [
{
"quantity": 10,
"customer": "customer1",
"date": "2014-01-01"
},
{
"quantity": 1,
"customer": "customer1",
"date": "2014-01-02"
},
{
"quantity": 5,
"customer": "customer2",
"date": "2013-12-30"
}
]
}
POST /test/products/2
{
"name": "product2",
"sales": [
{
"quantity": 1,
"customer": "customer1",
"date": "2014-01-01"
},
{
"quantity": 15,
"customer": "customer1",
"date": "2014-02-01"
},
{
"quantity": 1,
"customer": "customer2",
"date": "2014-01-21"
}
]
}
The sales field is a nested object. I need to filter products like this:
"get all products which have total quantity >= 16 and sales.customer = 'customer1'".
The total quantity is sum(sales.quantity) where sales.customer = 'customer1'.
Therefore the search results should contain only 'product2'.
I tried to use aggs but I didn't understand how to filter in this case.
I haven't found any information about it in the elasticsearch documentation.
Is it possible?
I would welcome any ideas, thanks!

First, be clear about what you want as the result: counts or document fields? Aggregations only give counts; to get document fields you need a filter in the query. If you want fields, you cannot filter on sum(sales.quantity) >= 16. If you want counts, you can use a range aggregation, but as far as I know a range only works on indexed document fields, not on computed values.
The nearest solution I can give you is below:
{
"size" : 0,
"query" : {
"filtered" : {
"query" : { "match_all": {} },
"filter" : {
"nested": {
"path": "sales",
"filter" : {"term" : {"sales.customer" : "customer1"}}
}
}
}
},
"aggregations" : {
"salesNested" : {
"nested" : {"path" : "sales"},
"aggregations" : {
"aggByrange" : {
"range": {
"field": "sales.quantity",
"ranges": [
{
"from": 16
}]
}
},
"quantityStats" : {
"stats" : { "field" : "sales.quantity" }
}
}
}
}
}
The query above uses "field": "sales.quantity". For your case you would need to replace sales.quantity with the sum computed by the quantityStats aggregation, which I don't think Elasticsearch provides.
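To make the requested condition concrete, here is a minimal client-side sketch in Python. It does not call Elasticsearch; it only reproduces the arithmetic "sum(sales.quantity) where sales.customer = 'customer1'" on the two sample documents from the question. The helper names total_quantity and filter_by_total are hypothetical and exist only for this illustration.

```python
# Sample documents copied from the question. In practice these could be
# the hits returned by a nested filter on sales.customer.
products = [
    {"name": "product1", "sales": [
        {"quantity": 10, "customer": "customer1", "date": "2014-01-01"},
        {"quantity": 1, "customer": "customer1", "date": "2014-01-02"},
        {"quantity": 5, "customer": "customer2", "date": "2013-12-30"},
    ]},
    {"name": "product2", "sales": [
        {"quantity": 1, "customer": "customer1", "date": "2014-01-01"},
        {"quantity": 15, "customer": "customer1", "date": "2014-02-01"},
        {"quantity": 1, "customer": "customer2", "date": "2014-01-21"},
    ]},
]


def total_quantity(product, customer):
    """Sum sales.quantity over the sales entries of a single customer."""
    return sum(s["quantity"] for s in product["sales"] if s["customer"] == customer)


def filter_by_total(products, customer, minimum):
    """Hypothetical helper: names of products whose per-customer total is >= minimum."""
    return [p["name"] for p in products if total_quantity(p, customer) >= minimum]


matching = filter_by_total(products, "customer1", 16)
print(matching)
```

product1 totals 11 for customer1 and product2 totals 16, so only product2 passes the threshold, which is the result the question expects.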

Related

Count number of inner elements of array property (Including repeated values)

Given I have the following records.
[
{
"profile": "123",
"inner": [
{
"name": "John"
}
]
},
{
"profile": "456",
"inner": [
{
"name": "John"
},
{
"name": "John"
},
{
"name": "James"
}
]
}
]
I want to get something like:
"aggregations": {
"name": {
"buckets": [
{
"key": "John",
"doc_count": 3
},
{
"key": "James",
"doc_count": 1
}
]
}
}
I'm a beginner using Elasticsearch, and this seems to be a pretty simple operation to do, but I can't find how to achieve this.
If I try a simple terms aggregation, it returns 2 for John instead of 3.
Example request I'm trying:
{
"size": 0,
"aggs": {
"name": {
"terms": {
"field": "inner.name"
}
}
}
}
How can I possibly achieve this?
Additional Info: It will be used on Kibana later.
I can change mapping to whatever I want, but AFAIK Kibana doesn't like the "Nested" type. :(
You need a value_count aggregation. By default, terms only returns a doc_count, whereas the value_count aggregation counts how many values of a given field exist.
So, for your purposes:
{
"size": 0,
"aggs": {
"name": {
"terms": {
"field": "inner.name"
},
"aggs": {
"total": {
"value_count": {
"field": "inner.name"
}
}
}
}
}
}
Which returns:
"aggregations" : {
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "John",
"doc_count" : 2,
"total" : {
"value" : 3
}
},
{
"key" : "James",
"doc_count" : 1,
"total" : {
"value" : 2
}
}
]
}
}
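The difference between counting documents and counting values can be illustrated locally. The following Python sketch runs on the two sample records from the question and does not involve Elasticsearch; count_documents and count_occurrences are hypothetical helper names chosen for this example.

```python
from collections import Counter

# The two sample records from the question.
records = [
    {"profile": "123", "inner": [{"name": "John"}]},
    {"profile": "456", "inner": [{"name": "John"}, {"name": "John"}, {"name": "James"}]},
]


def count_documents(records):
    """Count, per name, how many records contain it at least once (like doc_count)."""
    return Counter(
        name
        for record in records
        for name in {item["name"] for item in record["inner"]}
    )


def count_occurrences(records):
    """Count every inner element, including repeats inside one record."""
    return Counter(item["name"] for record in records for item in record["inner"])


print(dict(count_documents(records)))
print(dict(count_occurrences(records)))
```

Counting documents gives 2 for John, which is what the plain terms aggregation reported, while counting every inner element gives the 3 that the question asks for.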

ElasticSearch Max Agg on lowest value inside a list property of the document

I'm looking to do a Max aggregation on a value of a property in my document; the property is a list of complex objects (key and value). Here's my data:
[{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
},
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}]
When I do the nested Max aggregation on "listItems.value", I expect the max value returned to be 200 (and not 5000), because I want the logic to first find the MIN value under listItems for each document and then do the Max aggregation on those minimums. Is it possible to do something like this?
Thanks.
The search query performs the following aggregations:
Terms aggregation on the id field
Min aggregation on listItems.value
Max bucket aggregation, a sibling pipeline aggregation that identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of the bucket(s).
Please refer to the nested aggregation documentation for a detailed explanation of it.
Adding a working example with index data, index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"listItems": {
"type": "nested"
},
"id":{
"type":"text",
"fielddata":"true"
}
}
}
}
Index Data:
{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
}
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}
Search Query:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id"
},
"aggs": {
"nested_entries": {
"nested": {
"path": "listItems"
},
"aggs": {
"min_position": {
"min": {
"field": "listItems.value"
}
}
}
}
}
},
"maxValue": {
"max_bucket": {
"buckets_path": "id_terms>nested_entries>min_position"
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": "2",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 200.0
}
}
}
]
},
"maxValue": {
"value": 200.0,
"keys": [
"2"
]
}
}
The initial post mentioned nested aggregation, so I was sure the question was about nested documents. Since I arrived at this solution before seeing the other answer, I'm keeping the whole thing for history, but it actually differs only in adding the nested aggregation.
The whole process can be explained like this:
Bucket each document into its own bucket.
Use a nested aggregation to be able to aggregate on nested documents.
Use a min aggregation to find the minimum value among all of a document's nested documents, and therefore for the document itself.
Finally, use another aggregation to calculate the maximum among the results of the previous aggregation.
Given this setup:
// PUT /index
{
"mappings": {
"properties": {
"children": {
"type": "nested",
"properties": {
"value": {
"type": "integer"
}
}
}
}
}
}
// POST /index/_doc
{
"children": [
{ "value": 12 },
{ "value": 45 }
]
}
// POST /index/_doc
{
"children": [
{ "value": 7 },
{ "value": 35 }
]
}
I can use these aggregations in the request to get the required value:
{
"size": 0,
"aggs": {
"document": {
"terms": {"field": "_id"},
"aggs": {
"children": {
"nested": {
"path": "children"
},
"aggs": {
"minimum": {
"min": {
"field": "children.value"
}
}
}
}
}
},
"result": {
"max_bucket": {
"buckets_path": "document>children>minimum"
}
}
}
}
{
"aggregations": {
"document": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "O4QxyHQBK5VO9CW5xJGl",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 7.0
}
}
},
{
"key": "OoQxyHQBK5VO9CW5kpEc",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 12.0
}
}
}
]
},
"result": {
"value": 12.0,
"keys": [
"OoQxyHQBK5VO9CW5kpEc"
]
}
}
}
There should also be a workaround using a script to calculate the max: all the script needs to do is find and return the smallest value in the document.
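The "minimum per document, then maximum across documents" logic described in this answer can be checked with a short Python sketch. It works on plain lists rather than on an index; the document ids doc_a and doc_b are made up for illustration, and the values are the children.value numbers from the setup above.

```python
# children.value lists of the two sample documents, keyed by hypothetical ids.
documents = {
    "doc_a": [12, 45],
    "doc_b": [7, 35],
}


def max_of_minimums(documents):
    """Return the largest per-document minimum and the ids that reach it."""
    # Step 1: one "bucket" per document, holding that document's minimum.
    minimums = {doc_id: min(values) for doc_id, values in documents.items()}
    # Step 2: the maximum over those minimums, like max_bucket over the buckets.
    best = max(minimums.values())
    keys = [doc_id for doc_id, minimum in minimums.items() if minimum == best]
    return best, keys


print(max_of_minimums(documents))
```

The per-document minimums are 12 and 7, so the result is 12, matching the "result" value in the response shown above.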

Elasticsearch : How to do 'group by' with painless in scripted fields?

I would like to do something like the following using painless:
select day,sum(price)/sum(quantity) as ratio
from data
group by day
Is it possible?
I want to do this in order to visualize the ratio field in kibana, since kibana itself doesn't have the ability to divide aggregated values, but I would gladly listen to alternative solutions beyond scripted fields.
Yes, it's possible, you can achieve this with the bucket_script pipeline aggregation:
{
"aggs": {
"days": {
"date_histogram": {
"field": "dateField",
"interval": "day"
},
"aggs": {
"price": {
"sum": {
"field": "price"
}
},
"quantity": {
"sum": {
"field": "quantity"
}
},
"ratio": {
"bucket_script": {
"buckets_path": {
"sumPrice": "price",
"sumQuantity": "quantity"
},
"script": "params.sumPrice / params.sumQuantity"
}
}
}
}
}
}
UPDATE:
You can use the above query through the Transform API which will create an aggregated index out of the source index.
For instance, I've indexed a few documents in a test index, and we can dry-run the above aggregation query to see what the target aggregated index would look like:
POST _transform/_preview
{
"source": {
"index": "test2",
"query": {
"match_all": {}
}
},
"dest": {
"index": "transtest"
},
"pivot": {
"group_by": {
"days": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "day"
}
}
},
"aggregations": {
"price": {
"sum": {
"field": "price"
}
},
"quantity": {
"sum": {
"field": "quantity"
}
},
"ratio": {
"bucket_script": {
"buckets_path": {
"sumPrice": "price",
"sumQuantity": "quantity"
},
"script": "params.sumPrice / params.sumQuantity"
}
}
}
}
}
The response looks like this:
{
"preview" : [
{
"quantity" : 12.0,
"price" : 1000.0,
"days" : 1580515200000,
"ratio" : 83.33333333333333
}
],
"mappings" : {
"properties" : {
"quantity" : {
"type" : "double"
},
"price" : {
"type" : "double"
},
"days" : {
"type" : "date"
}
}
}
}
What you see in the preview array are the documents that will be indexed in the transtest target index, which you can then visualize in Kibana like any other index.
So what a transform actually does is run the aggregation query I gave you above and store each bucket in another index that can then be used.
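For readers who want to verify the arithmetic behind the ratio, here is a small Python sketch of the same "group by day, then divide the sums" computation. The rows are invented sample data (chosen so that one day sums to price 1000 and quantity 12, like the preview above), and ratio_per_day is a hypothetical helper, not part of any Elasticsearch client.

```python
from collections import defaultdict

# Invented sample rows; field names follow the SQL in the question.
rows = [
    {"day": "2020-02-01", "price": 600.0, "quantity": 7.0},
    {"day": "2020-02-01", "price": 400.0, "quantity": 5.0},
    {"day": "2020-02-02", "price": 90.0, "quantity": 3.0},
]


def ratio_per_day(rows):
    """Compute sum(price) / sum(quantity) for each day."""
    sums = defaultdict(lambda: {"price": 0.0, "quantity": 0.0})
    for row in rows:
        sums[row["day"]]["price"] += row["price"]
        sums[row["day"]]["quantity"] += row["quantity"]
    # Days whose total quantity is zero are skipped to avoid dividing by zero.
    return {
        day: totals["price"] / totals["quantity"]
        for day, totals in sums.items()
        if totals["quantity"] != 0
    }


ratios = ratio_per_day(rows)
print(ratios)
```

Note that this is the ratio of the sums, not the average of per-row ratios, which is the same distinction the bucket_script makes by operating on the two sum aggregations.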
I found a solution to get the ratio of sums with a TSVB visualization in Kibana.
You may see the image here to see an example.
First, you create two sum aggregations, one that sums price and another that sums quantity. Then you choose the 'Bucket Script' aggregation to divide the two sums using a Painless script.
The only drawback I found is that you cannot aggregate on multiple columns.

Elasticsearch query_string filter with Fields when not empty string

I'm trying to build a query_string with the Elasticsearch DSL. In SQL style, my query looks like this:
SELECT NAME, DESCRIPTION, URL, FACEBOOK_URL, YEAR_CREATION FROM MY_INDEX WHERE FACEBOOK_URL <> '' AND MATCH('NAME: sometext OR DESCRIPTION: sometext') AND YEAR_CREATION > 2000
I don't know how to include a filter for a non-empty value of FACEBOOK_URL.
Thanks for your help...
@Kamal's point is very clear. You should check the type of your "FACEBOOK" field, which must be keyword, not text.
Please see the below mapping, sample documents, the request query and response.
Note that I may not have added all the fields but only the concerned fields so as to mirror the query you've added.
Mapping:
PUT facebook
{
"mappings": {
"properties": {
"name":{
"type": "text",
"fields": {
"keyword":{
"type":"keyword"
}
}
},
"description":{
"type": "text",
"fields": {
"keyword":{
"type":"keyword"
}
}
},
"facebook_url":{
"type": "keyword"
},
"year_creation":{
"type": "date"
}
}
}
}
Sample Docs:
In the below 4 documents, only the 3rd document mentioned would be something that you would want to be returned.
Docs 1 and 2 have empty values of facebook_url while doc 4 does not have the field in the first place at all.
POST facebook/_doc/1
{
"name": "sometext",
"description": "sometext",
"facebook_url": "",
"year_creation": "2019-01-01"
}
POST facebook/_doc/2
{
"name": "sometext",
"description": "sometext",
"facebook_url": "",
"year_creation": "2019-01-01"
}
POST facebook/_doc/3
{
"name" : "sometext",
"description" : "sometext",
"facebook_url" : "http://mytest.fb.link",
"year_creation" : "2019-01-01"
}
POST facebook/_doc/4
{
"name": "sometext",
"description": "sometext",
"year_creation": "2019-01-01"
}
Request Query:
POST facebook/_search
{
"_source": ["name", "description","facebook_url","year_creation"],
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"name": "sometext"
}
},
{
"match": {
"description": "sometext"
}
}
]
}
},
{
"exists": {
"field": "facebook_url"
}
},
{
"range": {
"year_creation": {
"gte": "2000-01-01"
}
}
}
],
"must_not": [
{
"term": {
"facebook_url": {
"value": ""
}
}
}
]
}
}
}
I think the query is self-explanatory.
I have added an Exists query so that documents without the field do not appear in the result; for empty values I've added a clause in must_not.
Notice that in my design I've mapped facebook_url as keyword, since it makes no sense to have it as text. For that reason I've used a Term query.
Also note that for date filtering I've used a Range query. Do go through the links for more clarification, as it is important to understand how each of these queries works.
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.148216,
"hits" : [
{
"_index" : "facebook",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.148216,
"_source" : {
"facebook_url" : "http://mytest.fb.link",
"year_creation" : "2019-01-01",
"name" : "sometext",
"description" : "sometext"
}
}
]
}
}
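The way the clauses combine can be mirrored in a few lines of Python on the four sample documents. This is only a sketch of the boolean logic: the substring test is a deliberate simplification of the match query, the date comparison relies on ISO-formatted strings, and the helper name matches is hypothetical.

```python
# The four sample documents from above, keyed by their ids.
docs = {
    "1": {"name": "sometext", "description": "sometext",
          "facebook_url": "", "year_creation": "2019-01-01"},
    "2": {"name": "sometext", "description": "sometext",
          "facebook_url": "", "year_creation": "2019-01-01"},
    "3": {"name": "sometext", "description": "sometext",
          "facebook_url": "http://mytest.fb.link", "year_creation": "2019-01-01"},
    "4": {"name": "sometext", "description": "sometext",
          "year_creation": "2019-01-01"},
}


def matches(doc, text="sometext", min_date="2000-01-01"):
    """Apply the same four conditions as the bool query, in plain Python."""
    # should: name OR description contains the text (simplified match).
    text_hit = text in doc.get("name", "") or text in doc.get("description", "")
    # exists: the field must be present at all (rules out doc 4).
    has_field = "facebook_url" in doc
    # must_not term "": the value must not be the empty string (rules out docs 1 and 2).
    not_empty = doc.get("facebook_url") != ""
    # range gte: ISO date strings compare correctly as plain strings.
    recent = doc.get("year_creation", "") >= min_date
    return text_hit and has_field and not_empty and recent


matching_ids = [doc_id for doc_id, doc in docs.items() if matches(doc)]
print(matching_ids)
```

Both the exists check and the empty-string check are needed: the first alone would still let documents 1 and 2 through, and the second alone would let document 4 through.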
Updated Answer:
Change the ANNEE_CREATION field from integer to date, as that is the correct type for date values.
You have not applied a range query on the date field, although the query in your question calls for one.
Note that the must_not logic should be applied to the keyword field of FACEBOOK, not to the text field.
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":" Bordeaux",
"fields":[
"VILLE",
"ADRESSE",
"FACEBOOK"
]
}
},
{
"exists":{
"field":"FACEBOOK"
}
}
],
"must_not":[
{
"term":{
"FACEBOOK.keyword":{ // Make sure this is a keyword field
"value":""
}
}
}
],
"filter":[
{
"range":{
"FONDS_LEVEES_TOTAL":{
"gt":0
}
}
},
{
"range":{ // Apply the range query here based on what you've mentioned in the question
"ANNEE_CREATION":{ // Make sure this is the date field
"gte": "2015" // Make sure you apply the correct query parameter in the range query
}
}
}
]
}
},
"track_total_hits":true,
"from":0,
"size":8,
"_source":[
"FACEBOOK",
"NOM",
"ANNEE_CREATION",
"FONDS_LEVEES_TOTAL"
]
}
As expected only the document having Id 3 is returned as result.

Elasticsearch - bump individual result to the top

I'm working with Elasticsearch. I have an array of documents, and I'm trying to sort documents by the property price, except that I'd like a particular document to be the first result no matter what.
Below is the "sort" array I'm using in an attempt to put the document with ID 1213 first, followed by all other documents ordered by price ascending.
[
{
"id": {
"mode": "max",
"order": "desc",
"nested_filter": {
"term": {
"id": 1213
}
},
"missing": "_last"
}
},
{
"price": {
"order": "asc"
}
}
]
This doesn't appear to be working, though—document 1213 doesn't appear first. What am I doing wrong here?
As an example—the ideal returned result:
[{"id": 1213, "name": "Blue Sunglasses", "price": 12},
{"id": 1000, "name": "Green Sunglasses", "price": 2},
{"id": 1031, "name": "Purple Sunglasses", "price": 4},
{"id": 5923, "name": "Yellow Sunglasses", "price": 18}]
Instead, I get:
[{"id": 1000, "name": "Green Sunglasses", "price": 2},
{"id": 1031, "name": "Purple Sunglasses", "price": 4},
{"id": 1213, "name": "Blue Sunglasses", "price": 12},
{"id": 5923, "name": "Yellow Sunglasses", "price": 18}]
As others have already asked, what is the reason for the nested_filter?
There are many possible ways to do what you need. Here is one that fits the simple requirements you have mentioned so far:
{
"query" : {
"custom_filters_score" : {
"query" : {
"match_all" : {}
},
"filters" : [
{
"filter" : {
"term" : {
"id" : "1213"
}
},
"boost" : 2
}
]
}
},
"sort" : [
"_score",
"price"
]
}
The assumption here is that your query is simple, like the match_all query, and does not affect the scores in any way. If you have something more complicated, you can wrap it in a constant_score query so that it does not affect the scores. Ideally you get the document set you want with all documents having the same score, and then the custom_filters_score query boosts the score of the document you want. You can do this for any number of documents by adding further filters or, if the documents should be treated equally, by using a terms filter. Finally, sort by the score and then by the price.
In this case you need to use function_score to modify score of each doc.
{
"query": {
"function_score": {
"functions": [
{
"filter": {
"term": {
"id": "1213"
}
},
"weight": 1
},
{
"script_score": {
"script": "(1 / doc['price'].value)"
}
}
],
"score_mode": "sum",
"boost_mode" : "replace",
"query" : {
//YOUR QUERY GOES HERE
}
}
}
}
Explanation:
{
"script_score": {
"script": "(1 / doc['price'].value)"
}
}
This computes a score from the price and gives a value below 1 (for prices above 1). The higher the price, the smaller the score, so sorting by score yields ascending price. If you want to switch to descending price, just replace it with
"script": "(1 - (1 / doc['price'].value))"
{
"filter": {
"term": {
"id": "1213"
}
},
"weight": 1
}
This gives any document with "id" = 1213 an extra score of 1. The final score is the sum of those two functions.
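The effect of summing the two functions can be checked outside Elasticsearch. The Python sketch below applies the same formula (a weight of 1 for the pinned id plus 1 / price) to the four products from the question and sorts by the resulting score; the function name score and the pinned_id parameter are hypothetical.

```python
# The four products from the question.
products = [
    {"id": 1000, "name": "Green Sunglasses", "price": 2},
    {"id": 1031, "name": "Purple Sunglasses", "price": 4},
    {"id": 1213, "name": "Blue Sunglasses", "price": 12},
    {"id": 5923, "name": "Yellow Sunglasses", "price": 18},
]


def score(doc, pinned_id=1213):
    """Sum of the two functions: filter weight plus inverse price."""
    weight = 1.0 if doc["id"] == pinned_id else 0.0
    return weight + 1.0 / doc["price"]


# Highest score first, as a search sorted by _score would return it.
ranked = sorted(products, key=score, reverse=True)
ranked_ids = [doc["id"] for doc in ranked]
print(ranked_ids)
```

Document 1213 scores a little above 1 and every other document scores below 1, so it comes first and the rest follow by ascending price. This only holds while every price is greater than 1: a product priced below 1 would score above 1 from the script alone and could outrank the pinned document, in which case a larger weight would be needed.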
