Filter weekends from a date histogram with Elasticsearch

I have a specific use case that I'm struggling with. To give you more context, I have an index in Elasticsearch that holds data for working days only, so for a given month I only have working days, without weekends or holidays.
The problem is that a record is only produced on a day when there is activity, so if, for example, there was no activity on Monday 2018-12-10, I will not have a record for that day.
I want a date histogram that ignores only the weekends, so that the final set of data includes the working days without activity and I can count them at the end.
For example, a query for this month (December) looks like this:
{
  "aggs" : {
    "sales_over_time" : {
      "date_histogram" : {
        "field" : "date",
        "interval" : "month"
      }
    }
  }
}
I expect to get buckets that skip all the weekends.
If you have any ideas on how I should handle this, please share them.
Thank you
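One possible approach, sketched here rather than taken from the thread: ask for daily buckets with `min_doc_count` set to 0 and `extended_bounds` covering the month, so that days without documents still come back as empty buckets, and then drop Saturdays and Sundays on the client. The field name `date` comes from the question; the bounds, the helper names and the sample response are assumptions for illustration, and holidays are not handled.

```python
from datetime import date

# Request body: daily buckets, with empty days kept via min_doc_count 0.
# "interval" matches the question; newer Elasticsearch versions use
# "calendar_interval" instead. The bounds are assumed example values.
DAILY_HISTOGRAM = {
    "size": 0,
    "aggs": {
        "sales_over_time": {
            "date_histogram": {
                "field": "date",
                "interval": "day",
                "format": "yyyy-MM-dd",
                "min_doc_count": 0,
                "extended_bounds": {"min": "2018-12-01", "max": "2018-12-31"},
            }
        }
    },
}


def working_day_buckets(buckets):
    """Keep only Monday-Friday buckets (weekday() is 0 for Monday, 6 for Sunday)."""
    return [
        b for b in buckets
        if date.fromisoformat(b["key_as_string"]).weekday() < 5
    ]


def count_idle_working_days(buckets):
    """Count working days whose bucket holds no documents."""
    return sum(1 for b in working_day_buckets(buckets) if b["doc_count"] == 0)


# Hypothetical slice of a response: Friday 2018-12-07 through Monday 2018-12-10.
sample = [
    {"key_as_string": "2018-12-07", "doc_count": 3},
    {"key_as_string": "2018-12-08", "doc_count": 0},
    {"key_as_string": "2018-12-09", "doc_count": 0},
    {"key_as_string": "2018-12-10", "doc_count": 0},
]
print(count_idle_working_days(sample))
```

Filtering on the client keeps the query simple; the same weekday test could also be moved into a script, at the cost of running it per document.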

Related

Get records for particular day of the week in ElasticSearch

I have an ES cluster that has some summarized numerical data such that there is exactly 1 record per day. I want to write a query that will return the documents for a specific day of the week. For example, all records for Tuesdays. Currently I am doing this by getting all records for the required date range and then filtering out the ones for the day that I need. Is there a way to do that with a query?
You can do it using a script like this:
POST my_index/_search
{
  "query": {
    "script": {
      "script": {
        "source": "doc.my_date.value.dayOfWeek == 2"
      }
    }
  }
}
If you're going to run this query often, you would probably be better off adding another field, dayOfWeek, to your documents. It would hold the day of the week, which you can then query easily with a term query. That would be more efficient than a script.
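A minimal sketch of that suggestion, assuming documents are enriched in application code before indexing and that `my_date` is a string starting with `yyyy-MM-dd`. The helper names are hypothetical; the numbering follows ISO (Monday = 1, Sunday = 7), which matches the `== 2` test for Tuesday in the script above.

```python
from datetime import date


def with_day_of_week(doc, date_field="my_date", target_field="dayOfWeek"):
    """Return a copy of doc with the ISO day of week (Monday=1 ... Sunday=7)."""
    enriched = dict(doc)
    # Only the leading yyyy-MM-dd part of the value is parsed.
    enriched[target_field] = date.fromisoformat(doc[date_field][:10]).isoweekday()
    return enriched


def day_of_week_query(day, target_field="dayOfWeek"):
    """Build a term query matching a single day of the week."""
    return {"query": {"term": {target_field: day}}}


doc = with_day_of_week({"my_date": "2018-12-11", "value": 42})
print(doc["dayOfWeek"])      # 2018-12-11 was a Tuesday
print(day_of_week_query(2))  # request body for "all Tuesdays"
```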

ElasticSearch 2.4 date range histogram using the difference between two date fields

I haven't been able to find anything about this for ES 2.*, either here or in the docs, so sorry if this is a duplicate.
What I am trying to do is create an aggregation in an ElasticSearch query that builds buckets based on the difference between two date fields in a record.
For example, if I had data in ES for a shop, I might like to see the time difference between a purchase_date field and a shipped_date field.
In that case I'd want an aggregation with buckets giving me the hits where shipped_date - purchase_date is < 1 day, 1-2 days, 3-4 days or 5+ days.
Ideally I was hoping this was possible in an ES query. Is that the case, or would the best approach be to process the results into my own array based on the time difference for each hit?
I was able to achieve this by using the built-in expression language, which is enabled by default in ES 2.4. The functionality I wanted was to group my results by the difference between EndDate and DateProcessed in increments of 15 days. The relevant part of the query is:
{
  ...,
  "aggs": {
    "reason": {
      "date_histogram": {
        "min_doc_count": 1,
        "interval": "1296000000ms", // 15 days
        "format": "epoch_millis",
        "script": {
          "lang": "expression",
          "inline": "doc['DateProcessed'] > doc['EndDate'] ? doc['DateProcessed'] - doc['EndDate'] : -1"
        }
      }
      ...
    }
  }
}
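For comparison, the client-side alternative mentioned in the question (processing the hits yourself) could look roughly like the sketch below. It mirrors the expression above: the difference in milliseconds when DateProcessed is later than EndDate, otherwise -1, rounded down to a 15-day bucket. The function names and sample records are assumptions for illustration, not part of the original answer.

```python
from collections import Counter
from datetime import datetime, timedelta

BUCKET_MS = 15 * 24 * 60 * 60 * 1000  # 15 days = 1296000000 ms


def delay_bucket(end_date, date_processed):
    """Return the start (in ms) of the 15-day bucket holding the delay."""
    if date_processed > end_date:
        # Integer number of milliseconds between the two dates.
        diff_ms = (date_processed - end_date) // timedelta(milliseconds=1)
    else:
        diff_ms = -1  # same fallback value as the expression script
    return (diff_ms // BUCKET_MS) * BUCKET_MS  # round down to the bucket start


def delay_histogram(records):
    """Count records per bucket, ordered by bucket key."""
    counts = Counter(delay_bucket(r["EndDate"], r["DateProcessed"]) for r in records)
    return dict(sorted(counts.items()))


# Hypothetical records: delays of 3, 16 and 20 days, plus one processed early.
records = [
    {"EndDate": datetime(2016, 1, 1), "DateProcessed": datetime(2016, 1, 4)},
    {"EndDate": datetime(2016, 1, 1), "DateProcessed": datetime(2016, 1, 17)},
    {"EndDate": datetime(2016, 1, 1), "DateProcessed": datetime(2016, 1, 21)},
    {"EndDate": datetime(2016, 1, 10), "DateProcessed": datetime(2016, 1, 5)},
]
print(delay_histogram(records))
```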

ElasticSearch - Unique Tags for multiple documents (indexing)

We would like a unique tag with multiple values in Elasticsearch. To be clearer: we need to draw a time-series graph, so we fetch values between two dates. But of course we have different kinds of data; that is where our tags come in. We want to search our tags with autocompletion, then select our values by date.
{tag :["sdfsf", "fddsfsd", "fsdfsf"]
  {
    values : 145.45
    date : "2004-10-23"
  },
  {
    values : 556.09
    date : "2010-02-13"
  }
}
After a bit of research we found the parent/child technique, but because we want to do completion on the tag (in the parent), we need an aggregation, which is impossible in ES with "has_parent".
Our current solution is:
{
  {
    tag :["sdfsf", "fddsfsd", "fsdfsf"],
    values : 145.45,
    date : "2004-10-23"
  },
  {
    tag :null,
    values : 556.09,
    date : "2010-02-13"
  }, {etc...}
}
So we only have one tag, which is easy to check with completion. But it's kind of "ugly".
Does anybody know a proper way to do what we want?
Thanks in advance.

Need an aggregation on datefield in elastic search which gets the count of a day of last six months (i.e. 1st of last six months count)

I have an index in Elasticsearch which contains a date field "createdDate". I need to get the count of documents created on the 1st of each month for the last six months (e.g. in September: the counts for 1st August, 1st July, 1st June, 1st May, 1st April and 1st March).
It would be a great help if someone could look into this.
Thanks.
Try a date histogram aggregation.
{
  "aggs" : {
    "monthly_cont" : {
      "date_histogram" : {
        "field" : "createdDate",
        "interval" : "month"
      }
    }
  }
}
See the date histogram aggregation documentation for details.
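Note that a monthly histogram counts every document in each month. If only documents created on the 1st are wanted, one possible variant (an assumption, not part of the answer above) is a `date_range` aggregation with one single-day range per month. The sketch below builds such a request body; the helper and aggregation names are hypothetical, and `createdDate` is the field from the question.

```python
from datetime import date, timedelta


def first_days_of_previous_months(today, months=6):
    """Return the 1st of each of the `months` months before today's month, newest first."""
    year, month = today.year, today.month
    firsts = []
    for _ in range(months):
        month -= 1
        if month == 0:  # step back across a year boundary
            year, month = year - 1, 12
        firsts.append(date(year, month, 1))
    return firsts


def first_day_counts_agg(today, field="createdDate", months=6):
    """Build a date_range aggregation with one [1st, 2nd) range per month."""
    ranges = [
        {
            "key": d.isoformat(),
            "from": d.isoformat(),                      # inclusive
            "to": (d + timedelta(days=1)).isoformat(),  # exclusive
        }
        for d in first_days_of_previous_months(today, months)
    ]
    return {
        "size": 0,
        "aggs": {
            "first_of_month": {
                "date_range": {"field": field, "format": "yyyy-MM-dd", "ranges": ranges}
            }
        },
    }


body = first_day_counts_agg(date(2018, 9, 15))
print([r["key"] for r in body["aggs"]["first_of_month"]["date_range"]["ranges"]])
```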

Conditional Sorting in ElasticSearch

I have some documents that I would like to sort on a date field. For documents with date equal to a specified date, example today, and all dates after that I would like to sort ascending. For dates before the specified date I would like to sort in descending order.
Is this possible in ElasticSearch? If so, could you suggest any literature or an approach?
date is of type "date" and format "dateOptionalTime".
Thanks
Yes, this is possible in ElasticSearch using a script, either for sorting or for scoring.
My preference would be a scoring script, because script-based scoring is going to be quicker (according to the documentation).
With a scoring script, you could store the date as a Unix timestamp in a field of type int/long and use an MVEL script in the custom_score query. You might need to re-index your documents. You would also need to convert the searched-for time into a Unix timestamp before sending it to ElasticSearch.
The script then subtracts the requested timestamp from each document's timestamp and takes the absolute value. The results are then sorted in ascending order: the lowest 'distance' is the best.
So when looking for documents dated about a year ago, it would look something like:
"query": {
  "custom_score": {
    "query": {
      ....
    },
    "params": {
      "req_date_stamp": 1348438345
    },
    "script": "abs(doc['timestamp'].value - req_date_stamp)"
  }
},
"sort": {
  "_score": {
    "order": "asc"
  }
}
(Apologies for any mistakes in my JSON - I tested this idea in pyes)
You might need to tweak this to get the rounding right - for example your question mentions matching days, so you might want to round the timestamp generator to the nearest day.
For "full" info you can check out the Custom Score Query docs and follow the link to MVEL scripting.
For this kind of specific use case, you should use a sorting script.
See the "script based sorting" section in the Sort documentation page.
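To make the ordering asked for in the question concrete, here is a client-side sketch of the sort key: dates on or after the specified date come first in ascending order, followed by earlier dates in descending order. The function name and the sample dates are assumptions for illustration. A sort script returns a single value, so the two parts of this key would have to be folded into one number there.

```python
from datetime import date


def conditional_sort_key(doc_date, pivot):
    """Dates on or after pivot sort first (ascending), earlier dates after (descending)."""
    if doc_date >= pivot:
        return (0, (doc_date - pivot).days)  # group 0: nearest upcoming date first
    return (1, (pivot - doc_date).days)      # group 1: most recent past date first


pivot = date(2021, 10, 12)
dates = [
    date(2021, 10, 10),
    date(2021, 10, 14),
    date(2021, 10, 12),
    date(2021, 10, 11),
    date(2021, 10, 13),
]
ordered = sorted(dates, key=lambda d: conditional_sort_key(d, pivot))
print([d.isoformat() for d in ordered])
```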
My English is poor.
My solution is to use boost.
My data is: {"terms_id": [20211011,20211012,20211013,20211014],"sort_value":1} {"terms_id": [20211012,20211013,20211014],"sort_value":2} {"terms_id": [20211013,20211014,20211015],"sort_value":1}
My query is: {"bool":{"must":[],"should":[{"bool":{"must":[{"terms":{"terms_id":[20211012],"boost":5}}],"must_not":[]}},{"bool":{"must_not":[{"terms":{"terms_id":[20211012]}}]}}],"minimum_should_match":1}}
My sort is: {"_score":{"order":"desc"},"sort_value":{"order":"desc"}}
The result is: {"terms_id": [20211012,20211013,20211014],"sort_value":2} {"terms_id": [20211011,20211012,20211013,20211014],"sort_value":1} {"terms_id": [20211013,20211014,20211015],"sort_value":1}
