Elasticsearch: find the difference in a field value using a range query

I need to find out how many kWh were consumed between two given times. Right now I run two queries, one sorted asc and one sorted desc, to get the first and last records in that time window, and then subtract the two kWh values. Is there a way to get the kWh value without running two queries?
Range query:
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "createdtime": {
              "gte": "1566757800000",
              "lte": "1566844199000",
              "boost": 2.0
            }
          }
        },
        {
          "match": {
            "meter_id": 101
          }
        }
      ]
    }
  },
  "size": 1,
  "from": 0,
  "sort": { "createdtime": { "order": "desc" } }
}
The other query is identical except that the sort order is asc.
Each query returns one record, and I subtract the two values in my application to get the difference.

You could run a single query and use top_hits aggregations to extract the first and last values, but Elasticsearch won't calculate the difference for you; you still have to subtract them outside Elasticsearch.
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "createdtime": {
              "gte": "1566757800000",
              "lte": "1566844199000",
              "boost": 2.0
            }
          }
        },
        {
          "match": {
            "meter_id": 101
          }
        }
      ]
    }
  },
  "aggs": {
    "range": {
      "filter": {
        "range": {
          "createdtime": {
            "gte": "1566757800000",
            "lte": "1566844199000"
          }
        }
      },
      "aggs": {
        "min": {
          "top_hits": {
            "sort": [{ "createdtime": { "order": "asc" } }],
            "_source": { "includes": ["kwh_value"] },
            "size": 1
          }
        },
        "max": {
          "top_hits": {
            "sort": [{ "createdtime": { "order": "desc" } }],
            "_source": { "includes": ["kwh_value"] },
            "size": 1
          }
        }
      }
    }
  }
}
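Since the subtraction still happens client-side, here is a minimal Python sketch of that step. It assumes the response has the aggregation layout of the query above (a "range" filter aggregation holding "min" and "max" top_hits with size 1) and that kwh_value is a cumulative meter reading, so the latest reading minus the earliest one is the energy used in the window. The function name and sample response are hypothetical.

```python
def kwh_between(response: dict) -> float:
    """Latest kwh_value minus earliest kwh_value from the top_hits response."""
    window = response["aggregations"]["range"]
    first_hits = window["min"]["hits"]["hits"]  # sorted by createdtime asc
    last_hits = window["max"]["hits"]["hits"]   # sorted by createdtime desc
    if not first_hits or not last_hits:
        raise ValueError("no readings in the requested time window")
    first = first_hits[0]["_source"]["kwh_value"]
    last = last_hits[0]["_source"]["kwh_value"]
    return last - first


# Hypothetical response, trimmed to the fields read above.
sample_response = {
    "aggregations": {
        "range": {
            "min": {"hits": {"hits": [{"_source": {"kwh_value": 1200.5}}]}},
            "max": {"hits": {"hits": [{"_source": {"kwh_value": 1342.0}}]}},
        }
    }
}

print(kwh_between(sample_response))  # 141.5
```

If the meter can reset within the window, a simple last-minus-first difference is no longer correct; the sketch does not handle that case.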

Related

How to get 3 random search results in an Elasticsearch query

I have an Elasticsearch query that returns records within a range of publishedDate values:
{
  query: {
    bool: {
      filter: [],
      must: {
        range: {
          publishedDate: {
            gte: "2018-11-01",
            lte: "2019-03-30"
          }
        }
      }
    }
  },
  from: 0,
  size: 3
}
I need to show 3 random results every time I send this query.
The Elasticsearch documentation mentions that I can pass a seed to get random results.
Following the documentation, I updated my query to:
{
  "query": {
    "bool": {
      "filter": [],
      "must": {
        "range": {
          "publishedDate": {
            "gte": "2018-11-01",
            "lte": "2019-03-30"
          }
        }
      }
    },
    "function_score": {
      "functions": [
        {
          "random_score": {
            "seed": "123123123"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 3
}
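But it doesn't work: Elasticsearch reports that the query is malformed. How can I correct this query so that it returns 3 random results?
The query is malformed because function_score sits next to bool inside "query", and "query" accepts only one query clause. function_score has to wrap the query it rescores. If you just need random results, you could restructure the query like this: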
{
  "query": {
    "function_score": {
      "query": {
        "range": {
          "publishedDate": {
            "gte": "2018-11-01",
            "lte": "2019-03-30"
          }
        }
      },
      "boost": "5",
      "random_score": {},
      "boost_mode": "multiply"
    }
  },
  "from": 0,
  "size": 3
}
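Without a seed, random_score produces a different ordering on each request, which is what you want here. Adapted from the Elasticsearch function_score documentation: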
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
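If the request is built in application code, a small Python helper makes the structure explicit: function_score wraps the range query instead of sitting next to it. This is a sketch with a hypothetical helper name. The seeded branch assumes a recent Elasticsearch version, where a reproducible random_score takes a field (such as _seq_no) alongside seed; check the documentation for your version.

```python
import json


def random_results_body(gte, lte, size=3, seed=None):
    """Request body returning `size` random hits with publishedDate in [gte, lte]."""
    random_score = {}  # no seed: a different ordering on every request
    if seed is not None:
        # Reproducible ordering for a given seed (assumes a version that
        # expects a field next to seed).
        random_score = {"seed": seed, "field": "_seq_no"}
    return {
        "query": {
            "function_score": {
                # function_score wraps the query whose hits it rescores
                "query": {"range": {"publishedDate": {"gte": gte, "lte": lte}}},
                "random_score": random_score,
                "boost_mode": "multiply",
            }
        },
        "from": 0,
        "size": size,
    }


print(json.dumps(random_results_body("2018-11-01", "2019-03-30"), indent=2))
```

Passing a seed is useful when paging through the same random ordering; leave it out to get a fresh set of 3 each time.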

Need an aggregation on a document's inner array of objects - Elasticsearch

I am trying to run an aggregation over documents like the following:
{
  "pid": 900000,
  "mid": 9000,
  "cid": 90,
  "bid": 1000,
  "gmv": 1000000,
  "vol": 200,
  "data": [
    {
      "date": "25-11-2018",
      "gmv": 100000,
      "vol": 20
    },
    {
      "date": "24-11-2018",
      "gmv": 100000,
      "vol": 20
    },
    {
      "date": "23-11-2018",
      "gmv": 100000,
      "vol": 20
    }
  ]
}
The analysis needed is:
1. Filter all documents on mid and/or cid.
2. Filter data.date to the last 7 days and sum data.vol over that range for each pid.
3. Sort by the sum from the previous step in descending order.
4. Group these results by pid.
In other words, we want the top products by total volume (quantity sold) within a date range for a specific cid and/or mid. Here pid is the product ID, mid the merchant ID, and cid the category ID.
First, change your mapping so that the inner objects can be queried as nested fields: set the type of the data field to nested. Because your dates look like 25-11-2018, data.date also needs to be mapped as a date with a matching format (dd-MM-yyyy); the default date format does not parse them.
Then use a nested range query on data.date as a filter, together with a terms filter on mid/cid. Once you have the right set of documents, aggregate on pid with a nested sub-aggregation that sums data.vol, and order the pid buckets by that sum.
Here is the query:
{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "data",
            "query": {
              "range": {
                "data.date": {
                  "gte": "25-11-2018",
                  "lte": "28-11-2018"
                }
              }
            }
          }
        },
        {
          "terms": {
            "mid": [
              "9000"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "AGG_PID": {
      "terms": {
        "field": "pid",
        "size": 10,
        "order": {
          "AGG_DATA>TOTAL_SUM": "desc"
        },
        "min_doc_count": 1
      },
      "aggs": {
        "AGG_DATA": {
          "nested": {
            "path": "data"
          },
          "aggs": {
            "TOTAL_SUM": {
              "sum": {
                "field": "data.vol"
              }
            }
          }
        }
      }
    }
  }
}
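You can adapt the query as needed. Note that a range in the query only decides which products match; it does not restrict which data entries the sum aggregation sees. To sum only the entries inside the date range, also apply the range in a filter aggregation inside the nested aggregation, as in the next answer. Hope this helps.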
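The mapping change this answer describes could look like the following Python dict, sent as the body of an index-creation request. It is a sketch that assumes typeless mappings (Elasticsearch 7.x and later) and guesses numeric types for the other fields. The date format accepts both dd-MM-yyyy, as in the question's documents, and yyyy-MM-dd, as used in the next answer's query.

```python
import json

# Assumed field types; the essential part is that "data" is nested.
index_body = {
    "mappings": {
        "properties": {
            "pid": {"type": "long"},
            "mid": {"type": "long"},
            "cid": {"type": "long"},
            "bid": {"type": "long"},
            "gmv": {"type": "long"},
            "vol": {"type": "long"},
            "data": {
                # each array element is indexed as its own hidden document
                "type": "nested",
                "properties": {
                    "date": {"type": "date", "format": "dd-MM-yyyy||yyyy-MM-dd"},
                    "gmv": {"type": "long"},
                    "vol": {"type": "long"},
                },
            },
        }
    }
}

print(json.dumps(index_body, indent=2))
```

Changing an existing field to nested requires creating a new index and reindexing; the type of an already-mapped field cannot be changed in place.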
Here is a nested aggregation query that orders the pid buckets by the sum of vol within the date range. You can add any number of filters in the query part.
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "mid": "2"
          }
        }
      ]
    }
  },
  "aggs": {
    "top_products_sorted_by_order_volume": {
      "terms": {
        "field": "pid",
        "order": {
          "nested_data_object>order_volume_by_range>order_volume_sum": "desc"
        }
      },
      "aggs": {
        "nested_data_object": {
          "nested": {
            "path": "data"
          },
          "aggs": {
            "order_volume_by_range": {
              "filter": {
                "range": {
                  "data.date": {
                    "gte": "2018-11-26",
                    "lte": "2018-11-27"
                  }
                }
              },
              "aggs": {
                "order_volume_sum": {
                  "sum": {
                    "field": "data.vol"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
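To sanity-check the buckets this aggregation returns, the same computation can be written in plain Python over in-memory documents shaped like the one in the question. This is a reference sketch only; the function name and sample data are hypothetical. It keeps documents with the given mid, sums data.vol for entries whose date falls in the range, groups by pid, and sorts by the total in descending order, like the terms aggregation ordered by the nested sum. As with the aggregation, a matching product with no entries in range still appears, with a total of 0.

```python
from datetime import date, datetime


def top_products(docs, mid, start, end):
    """[(pid, total vol in [start, end])] sorted by total, highest first."""
    totals = {}
    for doc in docs:
        if doc["mid"] != mid:  # the query-level filter
            continue
        totals.setdefault(doc["pid"], 0)  # the pid bucket exists even if nothing is in range
        for entry in doc["data"]:  # nested + filter aggregation
            day = datetime.strptime(entry["date"], "%d-%m-%Y").date()
            if start <= day <= end:
                totals[doc["pid"]] += entry["vol"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


docs = [
    {"pid": 900000, "mid": 9000, "data": [
        {"date": "25-11-2018", "vol": 20},
        {"date": "24-11-2018", "vol": 20},
        {"date": "23-11-2018", "vol": 20},
    ]},
    {"pid": 900001, "mid": 9000, "data": [
        {"date": "24-11-2018", "vol": 50},
        {"date": "01-11-2018", "vol": 100},
    ]},
    {"pid": 900002, "mid": 1, "data": [{"date": "24-11-2018", "vol": 999}]},
]

print(top_products(docs, 9000, date(2018, 11, 23), date(2018, 11, 25)))
# [(900000, 60), (900001, 50)]
```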

How to get aggregations and sort based on child document hits in elasticsearch

I have documents indexed like so:
{
  "attrib": "value", // etc
  "prices": [
    {
      "p": 10,
      "d": "2016-01-01"
    },
    {
      "p": 20,
      "d": "2016-01-02"
    },
    {
      "p": 30,
      "d": "2016-01-03"
    },
    {
      "p": 40,
      "d": "2016-01-04"
    }
  ]
}
I would like to get aggregation buckets to tell me something like this:
Price Buckets
prices.p between 1 and 10 (20)
prices.p between 11 and 20 (22)
prices.p between 21 and 30 (2)
Date Buckets
prices.d between Jan 1 and Jan 30 (20)
prices.d between Feb 1 and Feb 28 (22)
prices.d between Mar 1 and Mar 30 (2)
where each count is the number of parent documents that have at least one prices.X between X and Y, NOT the total number of prices.
Second, if I filter to get only documents with prices.p between 1 and 30, I need the aggregation to reflect this.
Third, I'd like to order my results by the top child hit of each result.
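The difference between counting prices and counting parent documents can be shown with a small Python sketch over in-memory documents (the function name and sample data are hypothetical). Ranges are treated as [from, to), the way Elasticsearch's range aggregation treats from and to. A plain nested range aggregation reports the "prices" number, while the question asks for the "parents" number.

```python
def bucket_counts(docs, ranges, field="p"):
    """For each [lo, hi) range, count matching prices and matching parent docs."""
    buckets = []
    for lo, hi in ranges:
        price_count = 0
        parent_count = 0
        for doc in docs:
            hits = sum(1 for price in doc["prices"] if lo <= price[field] < hi)
            price_count += hits
            if hits:  # a parent counts once, however many of its prices match
                parent_count += 1
        buckets.append({"from": lo, "to": hi, "prices": price_count, "parents": parent_count})
    return buckets


docs = [
    {"prices": [{"p": 10}, {"p": 20}, {"p": 30}, {"p": 40}]},
    {"prices": [{"p": 5}, {"p": 15}]},
]

for bucket in bucket_counts(docs, [(0, 25), (25, 50)]):
    print(bucket)
# {'from': 0, 'to': 25, 'prices': 4, 'parents': 2}
# {'from': 25, 'to': 50, 'prices': 2, 'parents': 1}
```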
So in plain English, my query would be:
"Find me all documents with at least one price between X and Y having a date between A and B, order the results by price (or date as required)"
My query so far:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "filter": {
                  "bool": {
                    "must": [
                      {
                        "range": {
                          "prices.p": {
                            "gte": 1,
                            "lte": 30
                          }
                        }
                      }
                    ]
                  }
                },
                "path": "prices",
                "inner_hits": {
                  "sort": [
                    "p"
                  ]
                }
              }
            }
          ]
        }
      },
      "query": {
        "match_all": {}
      }
    }
  }
}
This returns documents in the default sort order, but with the inner_hits sorted by prices.p, so I can display the lowest price for an item along with the date for that price (prices.d).
Similarly, I'd like to filter where prices.d is between two dates, and aggregate on the dates as well.
Lastly, I'd like to order the full document hits by their first inner hit (p or d).
You have to use nested aggregations. To build two sets of buckets, run two nested aggregations side by side. A nested range aggregation on its own counts the nested prices objects, so each range bucket below also has a reverse_nested sub-aggregation; its doc_count is the number of parent documents in that bucket, which is the count you asked for.
To filter your buckets further, add a parent query; it filters your document set as well as your buckets.
Here is the query. For simplicity I changed the nested d field to an integer, but the same approach works for a date range.
{
  "aggs": {
    "p_range": {
      "nested": {
        "path": "prices"
      },
      "aggs": {
        "p_nested_range": {
          "range": {
            "field": "prices.p",
            "ranges": [
              { "from": 0, "to": 1000 },
              { "from": 1000, "to": 2000 }
            ]
          },
          "aggs": {
            "parent_docs": {
              "reverse_nested": {}
            }
          }
        }
      }
    },
    "d_range": {
      "nested": {
        "path": "prices"
      },
      "aggs": {
        "d_nested_range": {
          "range": {
            "field": "prices.d",
            "ranges": [
              { "from": 0, "to": 500 },
              { "from": 500, "to": 1000 }
            ]
          },
          "aggs": {
            "parent_docs": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "filter": {
                  "bool": {
                    "must": [
                      {
                        "range": {
                          "prices.p": {
                            "gte": 200,
                            "lte": 1400
                          }
                        }
                      }
                    ]
                  }
                },
                "path": "prices",
                "inner_hits": {
                  "sort": [
                    "prices.p"
                  ]
                }
              }
            }
          ]
        }
      },
      "query": {
        "match_all": {}
      }
    }
  }
}
Furthermore, if you want to filter only the document set without affecting your buckets, take a look at post_filter.
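Edit: a sort inside inner_hits only orders the nested hits within each document, not the parent documents. To order the parent documents by their first inner hit, add a top-level sort on prices.p with nested_path set to prices, a nested_filter that repeats the query's range, and mode min. Each document is then ranked by its lowest price inside the range, which is also its first inner hit because the inner hits are sorted by prices.p ascending. (Elasticsearch 6.1 and later express nested_path and nested_filter as a "nested" object with "path" and "filter".) The reverse_nested sub-aggregations count parent documents per bucket.
{
  "aggs": {
    "p_range": {
      "nested": {
        "path": "prices"
      },
      "aggs": {
        "p_nested_range": {
          "range": {
            "field": "prices.p",
            "ranges": [
              { "from": 0, "to": 1000 },
              { "from": 1000, "to": 2000 }
            ]
          },
          "aggs": {
            "parent_docs": {
              "reverse_nested": {}
            }
          }
        }
      }
    },
    "d_range": {
      "nested": {
        "path": "prices"
      },
      "aggs": {
        "d_nested_range": {
          "range": {
            "field": "prices.d",
            "ranges": [
              { "from": 0, "to": 500 },
              { "from": 500, "to": 1000 }
            ]
          },
          "aggs": {
            "parent_docs": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "filter": {
                  "bool": {
                    "must": [
                      {
                        "range": {
                          "prices.p": {
                            "gte": 200,
                            "lte": 1400
                          }
                        }
                      }
                    ]
                  }
                },
                "path": "prices",
                "inner_hits": {
                  "sort": [
                    "prices.p"
                  ]
                }
              }
            }
          ]
        }
      },
      "query": {
        "match_all": {}
      }
    }
  },
  "sort": [
    {
      "prices.p": {
        "order": "asc",
        "mode": "min",
        "nested_path": "prices",
        "nested_filter": {
          "range": {
            "prices.p": {
              "gte": 200,
              "lte": 1400
            }
          }
        }
      }
    }
  ]
}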
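The parent ordering the question asks for (each document ranked by its lowest price inside the filter range, i.e. its first inner hit when inner hits are sorted by price) can be written as a plain-Python reference for checking results. This is a hypothetical helper with an assumed "id" field on each document; like the nested filter, it drops documents that have no price in range.

```python
def sort_by_lowest_matching_price(docs, gte, lte):
    """IDs of documents with a price in [gte, lte], ordered by their lowest such price."""
    ranked = []
    for doc in docs:
        matching = [price["p"] for price in doc["prices"] if gte <= price["p"] <= lte]
        if matching:  # documents without a matching price are filtered out
            ranked.append((min(matching), doc["id"]))
    ranked.sort(key=lambda pair: pair[0])
    return [doc_id for _, doc_id in ranked]


docs = [
    {"id": "a", "prices": [{"p": 150}, {"p": 900}]},
    {"id": "b", "prices": [{"p": 300}, {"p": 1500}]},
    {"id": "c", "prices": [{"p": 1600}]},
    {"id": "d", "prices": [{"p": 250}]},
]

print(sort_by_lowest_matching_price(docs, 200, 1400))  # ['d', 'b', 'a']
```

Document "a" ranks by 900, not by its out-of-range 150, which is why the sort has to apply the same filter as the query rather than look at the first element of the prices array.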

Elasticsearch single request to do Union query Top N

I'm not sure how to do an SQL-like UNION in Elasticsearch. I tried a bool query, but it doesn't meet my requirement. For example, the document structure is:
{
  "id": "123",
  "authorId": 28,
  "title": "Five Ways to Tap into...",
  "byLine": "ashd jsabbdjs international",
  "category": "Cat1"
}
When the user types something, I need to find the top 5 matching titles in each category. This can be done with multiple queries to Elasticsearch, but I was wondering whether it can be done in one request.
Use a terms aggregation with a top_hits sub-aggregation:
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "top_5": {
          "top_hits": {
            "size": 5
          }
        }
      }
    }
  }
}
Here is a query that returns one bucket per category, each holding the top 5 hits for the typed text (without an explicit sort, top_hits orders hits by score):
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "authorId": [
              1,
              28
            ]
          }
        }
      ],
      "should": [
        {
          "query_string": {
            "query": "*int*",
            "fields": [
              "title^2",
              "byLine^1"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "top_5": {
          "top_hits": {
            "size": 5
          }
        }
      }
    }
  }
}
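On the client side, the buckets can be turned into a category-to-titles mapping. A minimal Python sketch, assuming the aggregation names used above (categories and top_5); the function name and sample response, including its titles, are hypothetical and trimmed to the fields read.

```python
def titles_by_category(response):
    """Map each category bucket key to the titles of its top hits, in score order."""
    buckets = response["aggregations"]["categories"]["buckets"]
    return {
        bucket["key"]: [hit["_source"]["title"] for hit in bucket["top_5"]["hits"]["hits"]]
        for bucket in buckets
    }


sample_response = {
    "aggregations": {
        "categories": {
            "buckets": [
                {"key": "Cat1", "doc_count": 2, "top_5": {"hits": {"hits": [
                    {"_score": 2.1, "_source": {"title": "Five Ways to Tap into..."}},
                    {"_score": 1.4, "_source": {"title": "International outlook"}},
                ]}}},
                {"key": "Cat2", "doc_count": 1, "top_5": {"hits": {"hits": [
                    {"_score": 0.9, "_source": {"title": "Interior design basics"}},
                ]}}},
            ]
        }
    }
}

print(titles_by_category(sample_response))
# {'Cat1': ['Five Ways to Tap into...', 'International outlook'], 'Cat2': ['Interior design basics']}
```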

ElasticSearch - significant term aggregation with range

How can I add a range to a significant_terms aggregation query? For example:
{
  "query": {
    "terms": {
      "text_content": [
        "searchTerm"
      ]
    },
    "range": {
      "dateField": {
        "from": "date1",
        "to": "date2"
      }
    }
  },
  "aggregations": {
    "significantQTypes": {
      "significant_terms": {
        "field": "field1",
        "size": 10
      }
    }
  },
  "size": 0
}
will not work. Any suggestions on how to specify the range?
Instead of a range query, use a range filter, since relevance scoring doesn't seem to matter in your case.
To combine your query with a range filter, use a filtered query (see the filtered query documentation). The filtered query was removed in Elasticsearch 5.0; from that version on, put the range in the filter clause of a bool query instead.
Try something like this:
{
  "query": {
    "filtered": {
      "query": {
        "terms": {
          "text_content": [
            "searchTerm"
          ]
        }
      },
      "filter": {
        "range": {
          "dateField": {
            "from": "date1",
            "to": "date2"
          }
        }
      }
    }
  },
  "aggs": {
    "significantQTypes": {
      "significant_terms": {
        "field": "field1",
        "size": 10
      }
    }
  },
  "size": 0
}
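On Elasticsearch 5.0 and later, where the filtered query no longer exists, the same request can use a bool query with the range in its filter clause. The Python sketch below builds that body. The helper name is hypothetical, the field names are the ones from the question, and gte/lte replace the older from/to range parameters, which were inclusive by default.

```python
import json


def significant_terms_body(term, date_from, date_to, size=10):
    """significant_terms over docs matching `term` with dateField in [date_from, date_to]."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [{"terms": {"text_content": [term]}}],
                # filter clauses do not affect scoring and can be cached
                "filter": [{"range": {"dateField": {"gte": date_from, "lte": date_to}}}],
            }
        },
        "aggs": {
            "significantQTypes": {"significant_terms": {"field": "field1", "size": size}}
        },
    }


print(json.dumps(significant_terms_body("searchTerm", "date1", "date2"), indent=2))
```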
Hope this helps!
