Elasticsearch - Summing filtered array based on terms aggregation

I have the following data structure in my documents in elasticsearch which represents a purchase:
{
...
"lineItems": [
{ "id": "1", "quantity": 2 },
{ "id": "2", "quantity": 1 }
]
...
}
I'm trying to work out the most popular product id in a provided date range. Using a terms aggregation I can work out the number of appearances of a product id in baskets, but I'm having trouble summing the quantities to work out how many of an item was purchased.
My current search looks like this:
{
"query": ...,
"size":0,
"aggs":{
"basketAppearances":{
"terms":{
"field":"lineItems.id.keyword"
},
"aggs":{
"timesPurchased":{
"sum":{
"field":"lineItems.quantity"
}
},
"order":{
"bucket_sort":{
"sort":[
{
"timesPurchased":"desc"
}
]
}
}
}
}
}
}
The problem with the above is that the sum covers every entry in the document's lineItems array, so basketAppearances is correct but timesPurchased is not. For example, I get the following result:
{
...
"aggregations":{
"basketAppearances":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":"1",
"doc_count":1,
"timesPurchased":{
"value": 3
}
},
{
"key":"2",
"doc_count":1,
"timesPurchased":{
"value": 3
}
}
]
}
}
...
}
I need to sum only the array entries whose ID matches the terms bucket they fall in, i.e. filter the array by the term of the terms aggregation.
I appreciate the best answer here is probably to change my data format to have a different document type of "line item" and add a line item per array entry, but the data structure matches my data elsewhere (and it makes sense elsewhere) and I'd ideally like to keep it the same.
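To make the mismatch concrete, here is a minimal Python sketch (not Elasticsearch code) that contrasts the two behaviours on the sample purchase. The function names and the sample list are hypothetical. The first function mimics what happens when the array of objects is flattened, so every id bucket sees every quantity in the document; the second keeps each id paired with its own quantity, which is the pairing a nested mapping combined with a nested aggregation is designed to preserve. Whether that mapping change is acceptable for this index is an assumption to check.

```python
from collections import defaultdict

# Hypothetical sample purchase mirroring the document shape in the question.
purchases = [
    {"lineItems": [{"id": "1", "quantity": 2}, {"id": "2", "quantity": 1}]},
]

def flattened_sums(docs):
    """Mimic flattened arrays: each id bucket sums every quantity in the doc."""
    appearances = defaultdict(int)
    totals = defaultdict(int)
    for doc in docs:
        ids = {item["id"] for item in doc["lineItems"]}
        doc_total = sum(item["quantity"] for item in doc["lineItems"])
        for item_id in ids:
            appearances[item_id] += 1
            totals[item_id] += doc_total
    return dict(appearances), dict(totals)

def per_item_sums(docs):
    """Keep each id paired with its own quantity before summing."""
    totals = defaultdict(int)
    for doc in docs:
        for item in doc["lineItems"]:
            totals[item["id"]] += item["quantity"]
    return dict(totals)

appearances, flattened = flattened_sums(purchases)
per_item = per_item_sums(purchases)
print(appearances, flattened, per_item)
```

The flattened totals reproduce the value of 3 for both buckets shown in the result above, while the per-item totals are the numbers the question is after.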

Related

Calculate & Sort Sales Differences Across 2 Date Ranges in Elasticsearch 6

I'm using Elasticsearch 6. I want to calculate sales differences (grouped by categories) across 2 date ranges and sort it afterwards.
In the example query below, I used Bucket Script aggregation to evaluate the sales differences before sorting it with Bucket Sort aggregation to obtain the top 10 largest sales difference.
{
"size":0,
"aggs":{
"categories":{
"terms":{
"field":"category"
},
"aggs":{
"months_range":{
"date_range":{
"field":"date",
"ranges":[
{ "from":"01-2015", "to":"03-2015", "key":"start_month" },
{ "from":"03-2015", "to":"06-2015", "key":"end_month" }
],
"keyed":true
},
"aggs":{
"sales":{
"sum":{
"field":"sales_amount"
}
}
}
},
"sales_difference":{
"bucket_script":{
"buckets_path":{
"startMonthSales":"months_range.start_month>sales", // correct syntax?
"endMonthSales":"months_range.end_month>sales" // correct syntax?
},
"script":"params.endMonthSales - params.startMonthSales"
}
},
"sales_bucket_sort":{
"bucket_sort":{
"sort":[
{
"sales_difference":{
"order":"desc"
}
}
],
"size":10
}
}
}
}
}
}
My questions:
Is my bucket path syntax correct? Is it possible to access an individual date_range bucket, e.g. months_range.end_month, in the bucket path?
How's the performance of executing a custom script in Elasticsearch as compared to running similar business logic in the application server?
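For comparison with the second question, this is a rough sketch of what the same business logic could look like on the application server. It is plain Python over hypothetical rows; the field names follow the query above, and the ranges are treated as inclusive at "from" and exclusive at "to". On the first question, the pipeline aggregation documentation describes selecting one bucket of a multi-bucket aggregation with a bracketed key, in the form agg_name['key']>metric, so that form is worth checking against the version in use.

```python
from datetime import date

# Hypothetical rows; field names follow the query above.
sales = [
    {"category": "books", "date": date(2015, 1, 15), "sales_amount": 100.0},
    {"category": "books", "date": date(2015, 4, 2), "sales_amount": 250.0},
    {"category": "toys", "date": date(2015, 2, 1), "sales_amount": 300.0},
    {"category": "toys", "date": date(2015, 5, 20), "sales_amount": 120.0},
]

def sales_differences(rows, start_range, end_range, size=10):
    """Per category: sum sales in each range, subtract, return the top `size`."""
    totals = {}
    for row in rows:
        start_total, end_total = totals.get(row["category"], (0.0, 0.0))
        # Each range includes its first date and excludes its last.
        if start_range[0] <= row["date"] < start_range[1]:
            start_total += row["sales_amount"]
        if end_range[0] <= row["date"] < end_range[1]:
            end_total += row["sales_amount"]
        totals[row["category"]] = (start_total, end_total)
    diffs = [(cat, end - start) for cat, (start, end) in totals.items()]
    diffs.sort(key=lambda pair: pair[1], reverse=True)
    return diffs[:size]

top = sales_differences(
    sales,
    start_range=(date(2015, 1, 1), date(2015, 3, 1)),
    end_range=(date(2015, 3, 1), date(2015, 6, 1)),
)
print(top)
```

Doing this in the application means pulling the per-range sums (or the raw rows) over the network first, which is the main cost to weigh against running the script inside the aggregation.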

Sorting a set of results with pre-ordered items

I have a list of pre-ordered items (order by score ASC) like:
[{
"id": "id2",
"score": 1
}, {
"id": "id12",
"score": 1
}, {
"id": "id8",
"score": 1.4
}, {
"id": "id9",
"score": 1.4
}, {
"id": "id14",
"score": 1.75
}, {
...
}]
Let's say I have an elasticsearch index with a massive number of items. Note that there's no "score" field in the indexed documents.
Now I want elasticsearch to return only those items with ids in the said list. Ok, this one is easy. I'm now stuck at sorting the result. That means I need the result to be sorted exactly as my pre-ordered list above.
Any suggestion for me to achieve that?
I'm not an English native speaker, so sorry for my grammar and words.
As of version 7.4, Elasticsearch has a pinned query that promotes selected documents to rank higher than those matching a given query. In your case, this search query should return what you want:
GET /_search
{
"query": {
"pinned" : {
"ids" : ["id2", "id12", "id8"],
"organic" : {
other queries
}
}
}
}
For more information you can check Elasticsearch official documentation here.
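If the cluster predates the pinned query, one fallback is to fetch the matching documents in any order and restore the order on the client. The sketch below assumes the hits carry an _id that matches the id in the pre-ordered list; the hits list and the sort_like helper are hypothetical.

```python
# Pre-ordered list from the question (already sorted by score ascending).
preordered = [
    {"id": "id2", "score": 1},
    {"id": "id12", "score": 1},
    {"id": "id8", "score": 1.4},
]

# Hypothetical hits as they might come back from an ids filter, in arbitrary order.
hits = [{"_id": "id8"}, {"_id": "id2"}, {"_id": "id12"}]

def sort_like(hits, ordered_items):
    """Reorder hits to follow the position of their id in ordered_items."""
    rank = {item["id"]: position for position, item in enumerate(ordered_items)}
    # Ids missing from the list sort after all known ids.
    return sorted(hits, key=lambda hit: rank.get(hit["_id"], len(rank)))

sorted_hits = sort_like(hits, preordered)
print([hit["_id"] for hit in sorted_hits])
```

This only works when the whole id list fits in one response page, so it suits short lists better than deep pagination.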

Elasticsearch ordering by field value which is not in the filter

Can somebody please help me write a query that orders the result items by the value of a field that is not part of the query in the request? I have this query:
{
"_source": [
"ico",
"name",
"city",
"status"
],
"sort": {
"_score": "desc",
"status": "asc"
},
"size": 20,
"query": {
"bool": {
"should": [
{
"match": {
"normalized": {
"query": "idona",
"analyzer": "standard",
"boost": 3
}
}
},
{
"term": {
"normalized2": {
"value": "idona",
"boost": 2
}
}
},
{
"match": {
"normalized": "idona"
}
}
]
}
}
}
The result is sorted by the status field in ascending alphabetical order. Status contains a few values like [active, canceled, old....], and I need something like a boost for every possible value, e.g. active boost 5, canceled boost 4, old boost 3, and so on. Is it possible to do this? Thanks.
You would need a custom sort using script to achieve what you want.
I've used a generic match_all query here; replace it with your own query logic. The part that matters is the sort section of the query below.
Make sure that status is mapped as a keyword type.
Custom Sorting Based on Values
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":[
{ "_score": "desc" },
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"if(params.scores.containsKey(doc['status'].value)) { return params.scores[doc['status'].value];} return 100000;",
"params":{
"scores":{
"active":5,
"old":4,
"cancelled":3
}
}
},
"order":"desc"
}
}
]
}
In the above query, add your own status values to the scores section. For example, if you have a status new and want it to carry the value 2, your scores would look like this:
{
"scores":{
"active":5,
"old":4,
"cancelled":3,
"new":2
}
}
So the documents are first sorted by _score, and documents with equal _score are then ordered by the script sort.
Note that the script sort uses desc order because I understand you want active documents at the top, followed by the other values. Feel free to play around with it.
Hope this helps!
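To see what ordering the two sort keys produce, here is a small Python emulation of the same logic. It is a sketch, not Elasticsearch code: the hits are hypothetical, and the status map and fallback value are copied from the script above.

```python
# Status weights and fallback copied from the script sort above.
STATUS_SCORES = {"active": 5, "old": 4, "cancelled": 3}
FALLBACK_SCORE = 100000

def sort_key(hit):
    """Order by _score descending, then by status weight descending."""
    status_score = STATUS_SCORES.get(hit["status"], FALLBACK_SCORE)
    return (-hit["_score"], -status_score)

# Hypothetical hits; three of them tie on _score.
hits = [
    {"id": "a", "_score": 2.0, "status": "old"},
    {"id": "b", "_score": 2.0, "status": "active"},
    {"id": "c", "_score": 2.0, "status": "new"},
    {"id": "d", "_score": 3.0, "status": "cancelled"},
]

ranked_ids = [hit["id"] for hit in sorted(hits, key=sort_key)]
print(ranked_ids)
```

Two things stand out: the status weight only breaks ties between equal _score values, and with desc order the fallback of 100000 places any status missing from the map ahead of active. A fallback of 0 would push unknown statuses to the end instead.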

how to get the total value of field in kibana 4.x

I need to get the total value of field in Kibana using script field.
For example, if the id field has the values 1, 2, 3, 4, 5,
I want to sum all the values of id, so the expected output is 15.
After getting the total of each field, I need to evaluate the formula below.
lifetime=a-b-c-(d-e-f)*g
Here a, b, c, d, e, f, g are the totals of each field's values.
For more information, please refer to this question, which I raised earlier.
You could do something like this in your scripted fields:
doc['id'].value
Hence, you could use a sum aggregation to get the total value in Kibana.
This SO could be handy!
EDIT
If you're trying to do it using Elasticsearch, you could do something like this within your request body:
"aggs":{
"total":{
"sum":{
"script":"doc['id'].value"
}
}
}
You could follow up with this ref; if you're using Painless, make sure you specify it in lang. Related SO
You can definitely use sum aggregations to get the sum of id, but to further equate your formula, you can take a look at pipeline aggregations to use the sum value for further calculations.
Take a look at bucket script aggregation, with proper bucket path to sum aggregator you can achieve your solution.
For sample documents
{
"a":100,
"b":200,
"c":400,
"d":600
}
query
{
"size": 0,
"aggs": {
"result": {
"terms": {"script":"'nice to have it here'"},
"aggs": {
"suma": {
"sum": {
"field": "a"
}
},
"sumb": {
"sum": {
"field": "b"
}
},
"sumc": {
"sum": {
"field": "c"
}
},
"equation": {
"bucket_script": {
"buckets_path": {
"suma": "suma",
"sumb": "sumb",
"sumc" : "sumc"
},
"script": "suma + sumb + 2*sumc"
}
}
}
}
}
}
Now you can surely add term filter on each sum agg to filter the summation for each sum aggregator.
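As a cross-check for whatever the aggregation returns, the lifetime formula from the question can be worked through by hand. The sketch below sums each field over a couple of hypothetical documents and then applies lifetime=a-b-c-(d-e-f)*g to the totals; the document values are made up for illustration.

```python
# Hypothetical documents; each carries the seven fields from the formula.
docs = [
    {"a": 100, "b": 10, "c": 5, "d": 20, "e": 3, "f": 2, "g": 1},
    {"a": 50, "b": 5, "c": 5, "d": 10, "e": 1, "f": 1, "g": 1},
]

def field_totals(documents, fields):
    """Sum each field across all documents, like one sum aggregation per field."""
    return {field: sum(doc[field] for doc in documents) for field in fields}

def lifetime(totals):
    """Apply the question's formula to the per-field totals."""
    return (
        totals["a"] - totals["b"] - totals["c"]
        - (totals["d"] - totals["e"] - totals["f"]) * totals["g"]
    )

totals = field_totals(docs, "abcdefg")
result = lifetime(totals)
print(totals, result)
```

With these values the totals are a=150, b=15, c=10, d=30, e=4, f=3, g=2, so the formula gives 150 - 15 - 10 - (30 - 4 - 3) * 2 = 79.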

ElasticSearch - sort search results by relevance and custom field (Date)

For example, I have entities with two fields, Text and Date. I want to search these entities and get results sorted by Date, but a plain sort by Date gives unexpected results.
For the search query "Iphone 6", the top of the results holds the newest texts that contain only "6", not "iphone 6". Without sorting, the results look fine, but they are not ordered by Date as I want.
How can I write a custom sort function that considers both relevance and Date? Or is there a way to give the Date field a weight that is taken into account in scoring?
In addition, I may want to suppress results that match only "6". How can I customize the search to match only on bigrams, for example?
Did you try a bool query like this?
{
"query": {
"bool": {
"must": {
"match": {
"field": "iphone 6"
}
}
}
},
"sort": {
"date": {
"order": "desc"
}
}
}
Alternatively, you can keep your own query and just add this sort, which I think is the more appropriate approach:
"sort": [
{ "date": { "order": "desc" }},
{ "_score": { "order": "desc" }}
]
All matching results are sorted first by date, then by relevance.
The solution is to use _score and the date field both in sort. _score as the first sort order and date field as secondary sort order.
You can use simple match query to perform relevance match.
Try it out.
Data setup:
POST ecom/prod
{
"name":"iphone 6",
"date":"2019-02-10"
}
POST ecom/prod
{
"name":"iphone 5",
"date":"2019-01-10"
}
POST ecom/prod
{
"name":"iphone 6",
"date":"2019-02-28"
}
POST ecom/prod
{
"name":"6",
"date":"2019-03-01"
}
Query for relevance and date based sorting:
POST ecom/prod/_search
{
"query": {
"match": {
"name": "iphone 6"
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"date": {
"order": "desc"
}
}
]
}
You could definitely use a phrase matching query for this.
It does position-aware matching so the documents will be considered a match for your query only if both "iphone" and "6" occur in the searched fields AND that their occurrences respects this order, "iphone" shows up before "6".
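The idea of position-aware matching can be illustrated with a few lines of Python. This is a simplified model with whitespace tokenization and no slop, not how Elasticsearch implements its match_phrase query; the function name is hypothetical.

```python
def phrase_matches(text, phrase):
    """Return True if the phrase tokens occur consecutively and in order."""
    tokens = text.lower().split()
    wanted = phrase.lower().split()
    last_start = len(tokens) - len(wanted)
    # Slide a window over the text and compare it with the phrase tokens.
    return any(
        tokens[start:start + len(wanted)] == wanted
        for start in range(last_start + 1)
    )

print(phrase_matches("new iphone 6 case", "iphone 6"))
print(phrase_matches("6", "iphone 6"))
print(phrase_matches("6 iphone", "iphone 6"))
```

A document that holds only "6", or holds the two terms in the wrong order, fails the check, which is the behaviour the question asks for when suppressing results that match only "6".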
It looks like you want to sort first by relevance and then by date. Note that the query below sorts by pubDate only; to rank by relevance first, list _score ahead of pubDate in the sort.
{ "query" : {
"match" : {
"my_field" : "my query"
}
},
"sort": {
"pubDate": {
"order": "desc",
"mode": "min"
}
}
}
When sorting on fields with more than one value, remember that the
values do not have any intrinsic order; a multivalue field is just a
bag of values. Which one do you choose to sort on? For numbers and
dates, you can reduce a multivalue field to a single value by using
the min, max, avg, or sum sort modes. For instance, you could sort on
the earliest date in each dates field by using the above query.
elasticsearch guide sorting
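The effect of the sort modes described in the quote can be sketched in Python. The documents below are hypothetical, and plain integers stand in for dates to keep the example short.

```python
from statistics import mean

# Reducers corresponding to the min, max, avg and sum sort modes.
SORT_MODES = {"min": min, "max": max, "avg": mean, "sum": sum}

# Hypothetical documents with a multivalue field; integers stand in for dates.
docs = [
    {"id": "a", "dates": [3, 9]},
    {"id": "b", "dates": [5, 6]},
    {"id": "c", "dates": [1, 20]},
]

def sorted_ids(documents, field, mode, reverse=False):
    """Reduce the multivalue field with the chosen mode, then sort on it."""
    reducer = SORT_MODES[mode]
    ordered = sorted(documents, key=lambda doc: reducer(doc[field]), reverse=reverse)
    return [doc["id"] for doc in ordered]

print(sorted_ids(docs, "dates", "min"))
print(sorted_ids(docs, "dates", "avg"))
print(sorted_ids(docs, "dates", "max", reverse=True))
```

The same documents come out in a different order depending on the mode, which is why the mode has to be chosen deliberately for multivalue fields.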
I think your relevance is broken. You should use two different analyzers, 1 for setting up your index and another for searching. like this:
PUT /my_index/my_type/_mapping
{
"my_type": {
"properties": {
"name": {
"type": "string",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
You can also read more about this here: https://www.elastic.co/guide/en/elasticsearch/guide/master/_index_time_search_as_you_type.html
Once you fix the relevance then sorting should work correctly.
