Elasticsearch: Sort by calculated date value - sorting

Is it possible to compare a date field to the current time and then sort by the result of that comparison (something like CASE expressions in a SQL ORDER BY)?
The goal is to move documents whose datetime field is later than the current time to the top of the list, while all documents whose datetime field is earlier than the current time have equal priority and should not be sorted by this field.

First, you can pass the current time as a Unix timestamp (for example from PHP's time()/microtime()) to keep the comparison simple. Elasticsearch has a script-based sort feature, and you can use if statements inside these scripts.
{
  "query" : {
    ....
  },
  "sort" : {
    "_script" : {
      "type" : "number",
      "script" : {
        "inline" : "if (doc['time'].value > current_time) return doc['time'].value; else return current_time",
        "params" : {
          "current_time" : 1476354035
        }
      },
      "order" : "asc"
    }
  }
}
You should pass the current time as a parameter when you run the query.
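To get the ordering the question describes (future-dated documents first, past ones tied at the bottom), one variation is to return the field value for future documents and a large sentinel for past ones, then sort ascending. This is only a sketch: it assumes Groovy-style inline scripting, a date field named time, and that the field values and the current_time/past_sentinel parameters are all in the same epoch unit (the sentinel just has to be larger than any timestamp you store).
{
  "query" : {
    "match_all" : {}
  },
  "sort" : {
    "_script" : {
      "type" : "number",
      "script" : {
        "inline" : "doc['time'].value > current_time ? doc['time'].value : past_sentinel",
        "params" : {
          "current_time" : 1476354035,
          "past_sentinel" : 9999999999999
        }
      },
      "order" : "asc"
    }
  }
}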

Related

Performance with nested data in a script field

I am wondering if there is a more performant way of performing a calculation on nested data in a script field, or of organizing the data. In the code below, the data will contain values for 50 states and/or other regions. Each user is tied to an area, so the script below checks whether the average_value in their area is above a certain threshold and returns a true/false value for each matching document.
Mapping
{
  "mydata" : {
    "properties" : {
      ...some fields,
      "related" : {
        "type" : "nested",
        "properties" : {
          "average_value" : {
            "type" : "integer"
          },
          "state" : {
            "type" : "string"
          }
        }
      }
    }
  }
}
Script
"script_fields" : {
"inBudget" : {
"script" : {
"inline" : "_source.related.find { it.state == default_area && it.average_value >= min_amount } != null",
"params" : {
"min_amount" : 100,
"default_area" : "CA"
}
}
}
}
I have a working solution using the above method, but it slows my query down and I am curious whether there is a better solution. I have been toying with the idea of using an inner object with a key, like related_CA, and keeping each state's data in a separate object; however, for flexibility I would rather not have to pre-define each region in the mapping (as I may not have them all ahead of time). I feel like I might be missing a simpler/better way, and I am open to reorganizing the data/mapping and/or changing the script.
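If filtering the result set (rather than returning a true/false flag on every hit) would also work for the use case, one non-script alternative is a nested query over the same fields. This is only a sketch; note that with an analyzed string mapping for state, the term value has to match the indexed token (for example lowercase "ca"):
{
  "query" : {
    "nested" : {
      "path" : "related",
      "query" : {
        "bool" : {
          "must" : [
            { "term" : { "related.state" : "ca" } },
            { "range" : { "related.average_value" : { "gte" : 100 } } }
          ]
        }
      }
    }
  }
}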

Elasticsearch, sort across date fields

In my Elasticsearch index, if I have records that look something like this:
{
"date1": "<someDate>",
"date2": "<someOtherDate>"
}
Is it possible to make a query that gives me the documents in order across the "date1" and "date2" fields?
For instance, if I have these records:
1: {"date1": "1950-01-01",
"date2": "2000-01-01"}
2: {"date1": "1960-01-01",
"date2": "1951-01-01"}
3: {"date1": "1970-01-01",
"date2": "1950-02-02"}
The order I want to receive them in should be 1, 3, 2, because 1 has the earliest date in its date1 field, then 3 has the next earliest in its date2 field, and then 2 in its date2 field.
Thanks!
According to the Elasticsearch documentation, you have two options:
sort on an array field using the sort mode option
sort using a custom sorting script
1. Sorting using array
The first option requires that you restructure your documents and index them like this:
PUT /my_index/my_type/1
{"date1": ["1950-01-01", "2000-01-01"]}
Then you will be able to make a query like this:
GET /my_index/my_type/_search
{
"sort" : [
{ "date1" : {"order" : "asc", "mode": "min"}}
]
}
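No special mapping is needed for the array itself, since any Elasticsearch field can hold one or more values; a plain date mapping is enough (the index and type names here are just the ones used above):
PUT /my_index
{
  "mappings" : {
    "my_type" : {
      "properties" : {
        "date1" : { "type" : "date" }
      }
    }
  }
}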
2. Sorting using custom script
The second option is to write a sorting script, and it works with your document structure. Here is an example:
GET /my_index2/_search
{
  "sort" : {
    "_script" : {
      "type" : "number",
      "script" : {
        "lang" : "painless",
        "inline" : "if (doc['date1'].value < doc['date2'].value) { return doc['date1'].value } else { return doc['date2'].value }"
      },
      "order" : "asc"
    }
  }
}
The suggested scripting language here is called Painless. Depending on the Elasticsearch version, doc['date1'].value may be a date object rather than a plain number; in that case, compare and return its epoch milliseconds (for example doc['date1'].value.millis) so the script produces a numeric sort value.
Discussion
Which one to choose is up to you. Performance can be a problem with the scripting option, and Painless scripting was introduced only in ES 5 (in ES 2.3 the closest equivalent was Groovy, which was not enabled by default because it is considered dangerous). Sorting on an array should be faster, since it is a built-in feature, but it requires storing the data differently.

How to return all documents where a string occurs in the document at least N times

If I wanted to return all documents that contain the term beetlejuice, I could use a query like
{
  "bool" : {
    "should" : [
      {
        "term" : {
          "description" : "beetlejuice"
        }
      }
    ]
  }
}
What's not clear is how to return all documents where the description field contains the string beetlejuice at least 3 times within it. I see minimum_should_match, but I think that is to be used for separate queries in a bool. How can I craft a query to match when a word occurs at least N times within the document's description field?
You can use scripting to achieve this.
Basically, all you need is the term frequency of the desired term in the document's field, and you can access that value from a script:
_index['FIELD']['TERM'].tf()
Sample filter script:
"filter" : {
  "script" : {
    "script" : "_index['description']['beetlejuice'].tf() > N",
    "params" : {
      "N" : 2
    }
  }
}
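As a rough sketch of where this fits in a complete request, assuming an Elasticsearch version whose scripting language exposes the _index term-statistics API (available with Groovy-era scripting, not in all later versions), you could combine the term query with the script filter like this:
{
  "query" : {
    "bool" : {
      "must" : {
        "term" : { "description" : "beetlejuice" }
      },
      "filter" : {
        "script" : {
          "script" : "_index['description']['beetlejuice'].tf() > N",
          "params" : { "N" : 2 }
        }
      }
    }
  }
}
With N set to 2, the filter keeps only documents where the term occurs at least 3 times.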

How to get elasticsearch most used words?

I am using a terms aggregation in Elasticsearch to get the most used words in an index with 380,607,390 documents (380 million), and I receive a timeout in my application.
The aggregated field is text with a simple analyzer (the field holds post content).
My question is:
Is the terms aggregation the correct aggregation for this, given such a large content field?
{
  "aggs" : {
    "keywords" : {
      "terms" : { "field" : "post_content" }
    }
  }
}
You can try this using min_doc_count. You would of course not want to get words that have been used only once or twice or three times...
You can set min_doc_count as per your requirement. This should reduce the time noticeably.
{
  "aggs" : {
    "keywords" : {
      "terms" : {
        "field" : "post_content",
        "min_doc_count" : 5
      }
    }
  }
}
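If the request also returns hits you do not need, a small additional saving is to skip them with size: 0 and cap the number of buckets returned; a minimal sketch (the size values are only examples):
{
  "size" : 0,
  "aggs" : {
    "keywords" : {
      "terms" : {
        "field" : "post_content",
        "size" : 20,
        "min_doc_count" : 5
      }
    }
  }
}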

Ordering term aggregation buckets by sub-aggregration result values

I have two questions about the query seen on this capture:
How do I order by value in the sum_category field in the results?
I use respsize again in the query, but that is not correct, as you can see below.
Even though I am only making an aggregation, why do all the documents come back with the result? If I run a GROUP BY query in SQL it returns only the grouped data, but Elasticsearch returns all the documents as if I had made a normal search query. How do I skip them?
Try this:
{
  "query" : {
    "match_all" : {}
  },
  "size" : 0,
  "aggs" : {
    "categories" : {
      "terms" : {
        "field" : "category",
        "size" : 999999,
        "order" : {
          "sum_category" : "desc"
        }
      },
      "aggs" : {
        "sum_category" : {
          "sum" : {
            "field" : "respsize"
          }
        }
      }
    }
  }
}
1) See the note in (2) for what your sort is doing. As for ordering the categories by the value of sum_category, see the order portion. There appears to be an old, closed issue related to that (https://github.com/elastic/elasticsearch/issues/4643), but it worked fine for me with Elasticsearch v1.5.2.
2) Although you do not have a match_all query, I think that is effectively what you are getting results for, and so the sort you specified is being applied to those results. To avoid getting them back, use the size: 0 portion.
Do you want buckets for all the categories? I noticed you did not specify a size for the main aggregation; that is what the size: 999999 portion is for.
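With size: 0, the response then contains only the aggregation buckets, already ordered by the sum, roughly like this (bucket keys, counts, and values are purely illustrative):
{
  "hits" : {
    "total" : 42,
    "hits" : []
  },
  "aggregations" : {
    "categories" : {
      "buckets" : [
        { "key" : "category_a", "doc_count" : 30, "sum_category" : { "value" : 12345 } },
        { "key" : "category_b", "doc_count" : 12, "sum_category" : { "value" : 6789 } }
      ]
    }
  }
}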
