Elasticsearch - scripted sorting by date(s) field ascending with specific input date

Let's assume I have this kind of document (events):
[
  {
    "title": "Foo",
    "dates": [
      "2019-07-01",
      "2019-07-15",
      "2019-08-01"
    ]
  },
  {
    "title": "Bar",
    "dates": [
      "2019-07-18"
    ]
  }
]
And I want to perform a search that finds all events happening in a given date span, so I search for all events that take place between the 10th of July (2019-07-10) and the 10th of August (2019-08-10).
The query won't be the problem, but if I want to display the events in ascending order (so the "Foo" event is ranked first, because the 15th of July comes before the 18th) - how do I sort that correctly?
I thought about using a script to sort my documents, maybe by filtering the dates so only valid dates remain and then using the timestamp of the first remaining date for a simple numeric ordering. But how would I filter the dates in the script?
Could I use a "script field" and pass my fromDate and toDate to copy the filtered dates onto a new field (let's say validDates) and use THAT field for sorting?
BTW: the index will only contain a few thousand events.
UPDATE:
After some research it looks like this cannot be done without using nested objects instead of arrays. Please correct me if I am wrong; I would have preferred a simple array of dates over the nested type...
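One way to realize the script idea from the question, sketched for Elasticsearch 7.x with Painless (the index name events and the epoch-millisecond params are assumptions, not from the original post): the script keeps only dates inside the from/to window and returns the earliest one per document, so a plain ascending numeric sort ranks "Foo" before "Bar":
POST events/_search
{
  "query": {
    "range": { "dates": { "gte": "2019-07-10", "lte": "2019-08-10" } }
  },
  "sort": {
    "_script": {
      "type": "number",
      "order": "asc",
      "script": {
        "lang": "painless",
        "params": { "from": 1562716800000, "to": 1565395200000 },
        "source": "long min = Long.MAX_VALUE; for (def d : doc['dates']) { long t = d.toInstant().toEpochMilli(); if (t >= params.from && t <= params.to && t < min) { min = t; } } return min;"
      }
    }
  }
}
Documents whose dates all fall outside the window keep Long.MAX_VALUE and sort last, which is usually what you want when the query already filters them out.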

Related

Oracle SODA - How to sort on created using REST?

I can't work out how to use $orderby with SODA on an ID field (such as created or lastModified). I'm using SODA for REST directly and not the other projects.
Sort syntax is:
{
  $orderby: {
    path: 'created',
    datatype: 'date',
    order: 'desc'
  }
}
And I've also tried:
{
  "$orderby": {
    "$fields": [{
      "path": "created",
      "datatype": "date",
      "order": "desc"
    }],
    "$scalarRequired": true
  }
}
And replacing the path with $id: 'created' (as you can use that in a filter specification to access non-document metadata). But nothing orders properly.
Short of putting the created field into my objects when I create them (which defeats the purpose of having those fields), how can I use orderby on a metadata field?
Max here from the SODA dev team. I am not 100% sure what you mean by an "id field". Looks like you mean the "created on" and "last modified" document components automatically maintained by SODA, right? If so, we don't support orderbys on these (though it could be added as an enhancement).
As of now, as you mentioned in your post, best option is to create a field in your JSON documents' content and set it to ISO8601 format timestamp value (e.g. 2020-10-13T07:01:01). You can then do an orderby on such a field (with datatype "datetime"). Please let me know if more details on this are needed.
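For illustration, a filter specification along those lines might look like this (a sketch; the field name modified is hypothetical and would have to exist in your documents' content):
{
  "$query": {},
  "$orderby": {
    "$fields": [{
      "path": "modified",
      "datatype": "datetime",
      "order": "desc"
    }]
  }
}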
In SODA REST, when you're listing collection contents, you could specify since=timestamp and until=timestamp query parameters. That'll give you all documents with last modified timestamp greater than the "since" one, and less than or equal to the "until" one.
Example:
http://host:port/ords/scott/soda/latest/myColl?since=2020-01-01T00:00:00&until=2021-01-01T00:00:00
As part of this operation, SODA automatically adds an orderby on "last modified". Not sure if that's useful to you though, since that's just for listing all documents in the collection (i.e. you can't combine it with a QBE, for example). So if this doesn't meet your needs, the best option right now is to explicitly add something like a "modified" field to the document content, and do an orderby on that.

Elasticsearch simultaneous sort

I have multiple ES indices - each of the tables from the database is kept in a separate one.
Each of those indices has a different mapping, although they present similar data. For dates I have:
In index1 - First_date, which is a date type (2018-05-22).
In index2 - First_date, which is an integer (2018)
In index3 - justDate - an integer (2017)
In index4 - date - a string ("May 2018")
Is there a way of sorting by all of these fields simultaneously? I guess the answer might be script sorting; however, I'm interested in whether this can be achieved in any other way.
If not, maybe at least the same can be done for fields with the same field type.
It could look like this:
POST index1,index2,index3,index4/_search
{
  "sort": [
    { "First_date": { "order": "desc" } },
    { "justDate": { "order": "desc" } },
    { "date": { "order": "desc" } }
  ]
}
But I assume you want to sort by all of the fields in date order, which this query will not give you.
Solving this task with a script would add unnecessary calculation at query time.
I would suggest creating a date-formatted field in each index and filling it at index time. In that case the query above will work as is.
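For illustration, if every index gained a common date field at index time (sort_date is a hypothetical name, not from the original mappings), the multi-index sort becomes trivial; unmapped_type stops the query from failing against indices that have not been backfilled yet:
POST index1,index2,index3,index4/_search
{
  "sort": [
    { "sort_date": { "order": "desc", "unmapped_type": "date" } }
  ]
}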

Elasticsearch: Aggregate documents based on date range

I have a set of documents in ElasticSearch 5.5 with two date fields: start_date and end_date.
I want to aggregate them into date histogram buckets (e.g. weekly) such that if start_date < week X < end_date, the document lands in the "week X" bucket.
This means that a single document might be in multiple buckets.
Consider the following concrete example: I have a set of documents describing company employees, and for each employee you have a hire date and (optionally) a termination date. I want to build a date histogram of the number of active employees for the trailing twelve months.
Sample doc content:
{
"start_date": "2013-01-12T00:00:00.000Z",
"end_date": "2016-12-08T00:00:00.000Z",
"id": "123123123"
}
Is there a way to do this in ES?
I have found one way to do this, using filter aggregations (https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-filter-aggregation.html). If I need, say, a trailing 12-month report, then I would create 12 buckets, where each bucket defines filter conditions such as:
"bool":{
"must":[{
"range":{
"start_date":{
"lte":"2016-01-01T00:00:00.000Z"
}
}
},{
{
"range":{
"end_date":{
"gt":"2016-02-01T00:00:00.000Z"
}
}
}]
}
However, I feel it would be nice if there were an easier way to do this: if I want, say, trailing 365 days, I have to create 365 bucket filters, which makes the resulting query very large.
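For what it's worth, the 12 buckets can at least live in a single filters (plural) aggregation rather than 12 separate filter aggregations. A sketch with only two month buckets spelled out (the index name employees is assumed); each additional period just adds another named filter:
POST employees/_search
{
  "size": 0,
  "aggs": {
    "active_per_month": {
      "filters": {
        "filters": {
          "2016-01": {
            "bool": {
              "must": [
                { "range": { "start_date": { "lte": "2016-01-01T00:00:00.000Z" } } },
                { "range": { "end_date": { "gt": "2016-02-01T00:00:00.000Z" } } }
              ]
            }
          },
          "2016-02": {
            "bool": {
              "must": [
                { "range": { "start_date": { "lte": "2016-02-01T00:00:00.000Z" } } },
                { "range": { "end_date": { "gt": "2016-03-01T00:00:00.000Z" } } }
              ]
            }
          }
        }
      }
    }
  }
}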
I know this question is quite old, but as it's still open I am sharing my knowledge on it. The question does not clearly explain what kind of output is expected, but I think this can be achieved using the "Date Histogram Aggregation" and "Bucket Script Aggregation".
Here are the documentation links for both of these aggregations.
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-bucket-datehistogram-aggregation.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-pipeline-bucket-script-aggregation.html
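Neither link shows the shape of the request, so as a rough sketch (ES 5.5 syntax; field names from the sample doc above, index name employees assumed): a monthly date_histogram on start_date with a cumulative_sum pipeline gives a running total of hires. A parallel histogram on end_date would still be needed to subtract terminations for a true "active employees" count:
POST employees/_search
{
  "size": 0,
  "aggs": {
    "per_month": {
      "date_histogram": {
        "field": "start_date",
        "interval": "month"
      },
      "aggs": {
        "cumulative_hires": {
          "cumulative_sum": { "buckets_path": "_count" }
        }
      }
    }
  }
}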

Kibana 4.1 - use JSON input to create an Hour of Day field from @timestamp for histogram

Edit: I found the answer; see below for Logstash <= 2.0.
Plugin created for Logstash 2.0
Whoever is interested in this with Logstash 2.0 or above: I created a plugin that makes this dead simple:
The GEM is here:
https://rubygems.org/gems/logstash-filter-dateparts
Here is the documentation and source code:
https://github.com/mikebski/logstash-datepart-plugin
I've got a bunch of data in Logstash with an @timestamp for a range of a couple of weeks. I have a duration field that is a number field, and I can do a date histogram. But I would like a histogram over the hour of the day, rather than a linear histogram from date x to date y; that is, the x axis should be 0 -> 23 instead of date x -> date y.
I think I can use the JSON Input advanced text input to add a field to the result set which is the hour of day of the @timestamp. The help text says:
"Any JSON formatted properties you add here will be merged with the elasticsearch aggregation definition for this section. For example shard_size on a terms aggregation." This leads me to believe it can be done, but it does not give any examples.
Edited to add:
I have tried setting up an entry in the scripted fields based on the link below, but it will not work like the examples on their blog with 4.1. The following script gives an error when trying to add a field with format number and name test_day_of_week: Integer.parseInt("1234")
The problem looks like the scripting is not very robust. Oddly enough, I want to do exactly what they are doing in the examples (add fields for day of month, day of week, etc.). I can get the field to work if the script is doc['@timestamp'], but I cannot manipulate the timestamp.
The docs say Lucene expressions are allowed and show some trig and GCD examples for GIS-type stuff, but nothing for dates...
There is this update to the blog:
"UPDATE: As a security precaution, starting with version 4.0.0-RC1, Kibana scripted fields default to Lucene Expressions, not Groovy, as the scripting language. Since Lucene Expressions only support operations on numerical fields, the example below dealing with date math does not work in Kibana 4.0.0-RC1+ versions."
There is no suggestion for how to actually do this now. I guess I could go off and enable the Groovy plugin...
Any ideas?
EDIT - THE SOLUTION:
I added a filter using Ruby to do this, and it was pretty simple:
Basically, in a Ruby script you can access event['field'] and you can create new ones. I use the Ruby time methods to create new fields based on the @timestamp of the event.
ruby {
  code => "
    ts = event['@timestamp']
    event['weekday'] = ts.wday
    event['hour'] = ts.hour
    event['minute'] = ts.min
    event['second'] = ts.sec
    event['mday'] = ts.day
    event['yday'] = ts.yday
    event['month'] = ts.month
  "
}
This no longer appears to work in Logstash 1.5.4 - the Ruby date methods appear to be unavailable on the event's timestamp; it throws a "rubyexception" and does not add the fields to the Logstash events.
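If the failure is because event['@timestamp'] is a LogStash::Timestamp wrapper rather than a plain Ruby Time (an assumption; untested here), unwrapping it via its time accessor may restore the behaviour:
ruby {
  code => "
    # LogStash::Timestamp wraps a Ruby Time; unwrap it before calling wday/hour/etc.
    ts = event['@timestamp'].time
    event['weekday'] = ts.wday
    event['hour'] = ts.hour
    event['minute'] = ts.min
  "
}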
I've spent some time searching for a way to recover the functionality we had with Groovy scripted fields, which are no longer available for dynamic scripting, to provide fields such as "hourofday", "dayofweek", et cetera. What I've done is add these as Groovy script files directly on the Elasticsearch nodes themselves, like so:
/etc/elasticsearch/scripts/
hourofday.groovy
dayofweek.groovy
weekofyear.groovy
... and so on.
Those script files each contain a single line of Groovy, like so:
Integer.parseInt(new Date(doc["@timestamp"].value).format("d")) (dayofmonth)
Integer.parseInt(new Date(doc["@timestamp"].value).format("u")) (dayofweek)
To reference these in Kibana, first create a new search and save it, or choose one of your existing saved searches (please take a copy of the existing JSON before you change it, just in case) in the "Settings -> Saved Objects -> Searches" page. Then modify the query to add "Script Fields", so you get something like this:
{
  "query": {
    ...
  },
  "script_fields": {
    "minuteofhour": { "script_file": "minuteofhour" },
    "hourofday": { "script_file": "hourofday" },
    "dayofweek": { "script_file": "dayofweek" },
    "dayofmonth": { "script_file": "dayofmonth" },
    "dayofyear": { "script_file": "dayofyear" },
    "weekofmonth": { "script_file": "weekofmonth" },
    "weekofyear": { "script_file": "weekofyear" },
    "monthofyear": { "script_file": "monthofyear" }
  }
}
As shown, the "script_fields" line should fall outside the "query" itself, or you will get an error. Also ensure the script files are available to all your Elasticsearch nodes.

Conditional Sorting in ElasticSearch

I have some documents that I would like to sort on a date field. For documents with date equal to a specified date (for example today) and all dates after it, I would like to sort ascending. For dates before the specified date, I would like to sort in descending order.
Is this possible in ElasticSearch? If so, could you suggest any literature or an approach?
date is of type "date" and format "dateOptionalTime".
Thanks
Yes, this is possible in ElasticSearch using a script, either for sorting or for scoring.
My preference would be for a scoring script because 'script based score' is going to be quicker (according to the documentation).
Using a scoring script, you could index the date field as a Unix timestamp of type int/long and use an MVEL script in the custom_score query. You might need to re-index your documents. You would also need to convert the searched-for time into a Unix timestamp to pass to ElasticSearch.
The script would then subtract the requested timestamp from each document's timestamp and take the absolute value. The results are then sorted in ascending order - the lowest "distance" is the best.
So when looking for documents dated about a year ago, it would look something like:
"query": {
"custom_score" : {
"query" : {
....
},
"params" : {
"req_date_stamp" : 1348438345,
},
"script" : "abs(doc['timestamp'].value - req_date_timestamp)"
}
},
"sort": {
"_score": {
'order': 'asc'
}
}
(Apologies for any mistakes in my JSON - I tested this idea in pyes)
You might need to tweak this to get the rounding right - for example your question mentions matching days, so you might want to round the timestamp generator to the nearest day.
For "full" info you can check out the Custom Score Query docs and follow the link to MVEL scripting.
For this kind of specific use cases, you should use a sorting script.
See the "script based sorting" section in the Sort documentation page.
My solution uses boost.
My data is:
{ "terms_id": [20211011, 20211012, 20211013, 20211014], "sort_value": 1 }
{ "terms_id": [20211012, 20211013, 20211014], "sort_value": 2 }
{ "terms_id": [20211013, 20211014, 20211015], "sort_value": 1 }
My query is:
{
  "bool": {
    "must": [],
    "should": [
      { "bool": { "must": [{ "terms": { "terms_id": [20211012], "boost": 5 } }], "must_not": [] } },
      { "bool": { "must_not": [{ "terms": { "terms_id": [20211012] } }] } }
    ],
    "minimum_should_match": 1
  }
}
My sort is:
{ "_score": { "order": "desc" }, "sort_value": { "order": "desc" } }
The result is:
{ "terms_id": [20211012, 20211013, 20211014], "sort_value": 2 }
{ "terms_id": [20211011, 20211012, 20211013, 20211014], "sort_value": 1 }
{ "terms_id": [20211013, 20211014, 20211015], "sort_value": 1 }
