In my Elasticsearch index, I have records that look something like this:
{
"date1": "<someDate>",
"date2": "<someOtherDate>"
}
Is it possible to make a query that returns the documents in order across the "date1" and "date2" fields?
For instance, if I have these records:
1: {"date1": "1950-01-01",
"date2": "2000-01-01"}
2: {"date1": "1960-01-01",
"date2": "1951-01-01"}
3: {"date1": "1970-01-01",
"date2": "1950-02-02"}
The order I want to receive them in is 1, 3, 2: record 1 has the earliest date (in its date1 field), record 3 has the next one (in its date2 field), and record 2 comes last (via its date2 field).
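In other words, each document should be ranked by the earlier of its two dates. A minimal Python sketch of that rule on the three sample records (ISO `YYYY-MM-DD` strings compare chronologically as plain strings, so no date parsing is needed here):

```python
# Sample records from the question, keyed by their number.
records = {
    1: {"date1": "1950-01-01", "date2": "2000-01-01"},
    2: {"date1": "1960-01-01", "date2": "1951-01-01"},
    3: {"date1": "1970-01-01", "date2": "1950-02-02"},
}

def earliest_date(doc):
    # ISO-8601 date strings sort lexicographically in chronological order,
    # so min() picks the earlier of the two dates.
    return min(doc["date1"], doc["date2"])

ordered_ids = sorted(records, key=lambda rid: earliest_date(records[rid]))
print(ordered_ids)  # [1, 3, 2]
```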
Thanks!
According to the Elasticsearch documentation, you have two options:
sort on an array, using the sort mode option
sort using a custom sorting script
1. Sorting using an array
The first option requires that you change your mapping and index documents like this:
PUT /my_index/my_type/1
{"date1": ["1950-01-01", "2000-01-01"]}
Then you will be able to make a query like this:
GET /my_index/my_type/_search
{
"sort" : [
{ "date1" : {"order" : "asc", "mode": "min"}}
]
}
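For existing data, the array layout can be produced on the client before reindexing. The following Python sketch uses plain dicts (no Elasticsearch client is assumed) to fold both dates into the `date1` array from the PUT example and to build the matching sort body; with `"mode": "min"`, Elasticsearch sorts each document by the smallest value in that array.

```python
import json

def to_array_layout(doc):
    """Fold date1 and date2 into a single multi-valued date1 field,
    matching the PUT example above."""
    return {"date1": [doc["date1"], doc["date2"]]}

def min_date_sort_body():
    # "mode": "min" makes Elasticsearch use the smallest value of a
    # multi-valued field as that document's sort key.
    return {"sort": [{"date1": {"order": "asc", "mode": "min"}}]}

original = {"date1": "1970-01-01", "date2": "1950-02-02"}
print(json.dumps(to_array_layout(original)))
# {"date1": ["1970-01-01", "1950-02-02"]}
print(json.dumps(min_date_sort_body()))
```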
2. Sorting using a custom script
The second option is to write a sorting script, which works with your existing document structure. Here is an example:
GET /my_index2/_search
{
"sort" : {
"_script" : {
"type" : "number",
"script" : {
"lang": "painless",
"inline": "return Math.min(doc['date1'].value, doc['date2'].value);"
},
"order" : "asc"
}
}
}
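The recommended scripting language is Painless. The script above assumes ES 5.x, where doc['date1'].value on a date field returns epoch milliseconds as a long; from ES 6.0 on, .value returns a date object instead, so it has to be converted to milliseconds before comparing, and the inline key is deprecated in favor of source.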
Discussion
Which one to choose is up to you. Performance can be a problem with the scripting option, since the script is evaluated for every matching document. Also, Painless was introduced only in ES 5 (in ES 2.3 the closest equivalent was Groovy, whose inline scripts were disabled by default for security reasons). Sorting on arrays should be faster, since it's a built-in feature, but it requires storing the data differently.
I'm using Elasticsearch's Update by Query API to update some documents with a Painless script like this (the actual query is more complicated):
POST ts-scenarios/_update_by_query?routing=test
{
"query": {
"term": { "routing": { "value": "test" } }
},
"script": {
"source": """ctx._source.tagIDs = ["5T8QLHIBB_kDC9Ugho68"]"""
}
}
This works, except that upon reindexing, other fields get reordered, including some classes which are automatically (de)serialized using JSON.NET's type handling. That means a document with the following source before the update:
{
"routing" : "testsuite",
"activities" : [
{
"$type" : "Test.Models.SomeActivity, Test"
},
{
"$type" : "Test.Models.AnotherActivity, Test",
"CustomParameter" : 1,
"CustomSetting" : false
}
]
}
ends up as
{
"routing" : "testsuite",
"activities" : [
{
"$type" : "Test.Models.SomeActivity, Test"
},
{
"CustomParameter" : 1,
"CustomSetting" : false,
"$type" : "Test.Models.AnotherActivity, Test"
}
],
"tagIDs" : [
"5T8QLHIBB_kDC9Ugho68"
]
}
which JSON.NET can't deserialize. Is there a way I can tell the script (or the Update by Query API) not to change the order of those other fields?
In case it matters, I'm using Elasticsearch OSS version 7.6.1 on macOS. I haven't checked whether an Ingest pipeline would work here, as I'm not familiar with them.
(It turns out I can make the deserialization more flexible by setting the MetadataPropertyHandling property to ReadAhead, as mentioned here. That works, but as mentioned it may hurt performance and there might be other situations where field order matters. Technically, it shouldn't; JSON isn't XML, but there are always edge cases where it does matter.)
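If the documents pass through your own code before reaching JSON.NET, one possible client-side workaround (not an Elasticsearch feature; the helper name is hypothetical) is to move every `$type` key back to the front of its object. A minimal Python sketch, relying on dicts preserving insertion order (Python 3.7+):

```python
import json

def move_type_first(value, key="$type"):
    """Recursively rebuild dicts so that `key` (if present) comes first.
    Hypothetical helper for illustration; all other keys keep their order."""
    if isinstance(value, dict):
        reordered = {}
        if key in value:
            reordered[key] = move_type_first(value[key], key)
        for k, v in value.items():
            if k != key:
                reordered[k] = move_type_first(v, key)
        return reordered
    if isinstance(value, list):
        return [move_type_first(item, key) for item in value]
    return value

# The reordered _source from the question, as returned after the update.
source = json.loads('''{
  "routing": "testsuite",
  "activities": [
    {"$type": "Test.Models.SomeActivity, Test"},
    {"CustomParameter": 1, "CustomSetting": false,
     "$type": "Test.Models.AnotherActivity, Test"}
  ]
}''')
fixed = move_type_first(source)
print(list(fixed["activities"][1]))
# ['$type', 'CustomParameter', 'CustomSetting']
```

Dict equality ignores key order, so `fixed == source` still holds; only the serialized order changes.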
Let's see if someone can shed some light on this one, which seems a little hard.
We need to correlate data from multiple indices and various fields. We are trying a Painless script.
Example:
We search an index to gather the queue IDs (queueid) of mails sent by someone@domain.
Once we have the queue IDs, we need to store them in an array and iterate over it to run new searches that gather data such as email receivers, spam checks, postfix results and so on.
Problem: How can we store the data from one search and use it later in the second search?
We are testing something like:
GET here_an_index/_search
{
"query": {
"bool" : {
"must": [
{
"range": {
"@timestamp": {
"gte": "now-15m",
"lte": "now"
}
}
}
],
"filter" : {
"script" : {
"script" : {
"source" : "doc['postfix_from'].value == params.from; qu = doc['postfix_queueid'].value; return qu",
"params" : {
"from" : "someone@domain"
}
}
}
}
}
}
}
And, of course, it throws an error.
"doc['postfix_from'].value ...",
"^---- HERE"
So, in a nutshell: is there any way to execute a search that looks up some field value based on a filter (like from:someone@domain) and use those values in later searches?
We have evaluated script fields and nested documents, but for architectural reasons, and because of what those changes would entail, we cannot use them right now.
Thank you very much!
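One common way to correlate data like this is to run two separate requests from the client instead of a single script: first collect the queue IDs, then pass them to a terms query. The Python sketch below only builds the request bodies and parses a response; it assumes `postfix_from` and `postfix_queueid` are keyword fields, that the timestamp field is `@timestamp`, and that sending the requests is left to whatever client you use. The `size` cap is an arbitrary choice for the sketch.

```python
def queueid_query(sender, window="now-15m"):
    # Step 1: find queue IDs of mails sent by `sender` in the time window.
    # Assumes postfix_from is a keyword field so a term query matches exactly.
    return {
        "size": 1000,  # arbitrary cap for this sketch
        "_source": ["postfix_queueid"],
        "query": {
            "bool": {
                "filter": [
                    {"range": {"@timestamp": {"gte": window, "lte": "now"}}},
                    {"term": {"postfix_from": sender}},
                ]
            }
        },
    }

def extract_queueids(response):
    # Collect distinct queue IDs, in hit order, from the first search.
    ids = []
    for hit in response["hits"]["hits"]:
        qid = hit["_source"].get("postfix_queueid")
        if qid is not None and qid not in ids:
            ids.append(qid)
    return ids

def details_query(queueids):
    # Step 2: fetch related records (receivers, spam checks, ...) with one
    # terms query instead of one search per queue ID.
    return {"query": {"terms": {"postfix_queueid": queueids}}}

fake_response = {"hits": {"hits": [
    {"_source": {"postfix_queueid": "A1B2C3"}},
    {"_source": {"postfix_queueid": "D4E5F6"}},
    {"_source": {"postfix_queueid": "A1B2C3"}},
]}}
ids = extract_queueids(fake_response)
print(ids)  # ['A1B2C3', 'D4E5F6']
print(details_query(ids))
```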
I'm using a completion suggester in Elasticsearch on a single field. The type contains documents of several users. Is there a way to limit the returned suggestions to documents that match a specific query?
I'm currently using this query:
{
"name" : {
"text" : "Peter",
"completion" : {
"field" : "name_suggest"
}
}
}
Is there a way to combine this query with a different one, e.g.
{
"query":{
"term" : {
"user_id" : "590c5bd2819c3e225c990b48"
}
}
}
Have a look at the context suggester, which is just a specialized completion suggester with filtering capabilities. Keep in mind, however, that this is still not a regular query filter.
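A sketch of what that could look like with category contexts on the completion field, using the Elasticsearch 5.x+ syntax (older versions differ) written as Python dicts; the context name `user` is a hypothetical choice, and the mapping is only a fragment:

```python
import json

# Mapping fragment: index user_id as a category context of name_suggest.
mapping = {
    "properties": {
        "user_id": {"type": "keyword"},
        "name_suggest": {
            "type": "completion",
            "contexts": [
                # "user" is a hypothetical context name; "path" reads the
                # context value from each document's user_id field.
                {"name": "user", "type": "category", "path": "user_id"}
            ],
        },
    }
}

def suggest_for_user(prefix, user_id):
    # Completion suggestions restricted to one user's documents.
    return {
        "suggest": {
            "name": {
                "prefix": prefix,
                "completion": {
                    "field": "name_suggest",
                    "contexts": {"user": [user_id]},
                },
            }
        }
    }

print(json.dumps(suggest_for_user("Peter", "590c5bd2819c3e225c990b48"), indent=2))
```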
You can specify both the query and the suggester in the same request, like this:
{
"query":{
"term" : {
"user_id" : "590c5bd2819c3e225c990b48"
}
},
"suggest": {
"name" : {
"text" : "Peter",
"completion" : {
"field" : "name_suggest"
}
}
}
}
I have a similar use case and have posted my question on the Elasticsearch forum, see here.
From what I've read so far, I don't think you can limit documents with the completion suggester. It essentially builds a finite state transducer (similar to a prefix tree) at index time; this makes it fast, but you lose the flexibility of filtering on additional fields. I don't think the context suggester would work in your case either (let me know if I am wrong), because the cardinality of user_id is very high.
I think edge n-gram partial matching is more flexible and might actually work for your use case.
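Roughly how the edge n-gram approach works: at index time each word is broken into its leading prefixes, so a normal match query on the typed prefix finds it, and because it is a regular query it can be combined with any filter. A Python sketch that mimics the token filter and builds example index settings and a query; the analyzer and filter names and the gram sizes are illustrative assumptions, using ES 7.x typeless mapping syntax:

```python
def edge_ngrams(word, min_gram=1, max_gram=20):
    # Mimics a lowercase + edge_ngram filter chain: leading prefixes.
    word = word.lower()
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]

index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Prefixes are generated at index time only; searching with the
            # standard analyzer keeps "pet" from matching everything with "p".
            "name": {"type": "text", "analyzer": "autocomplete", "search_analyzer": "standard"},
            "user_id": {"type": "keyword"},
        }
    },
}

def autocomplete_query(text, user_id):
    # A regular query, so an ordinary filter on user_id can be added.
    return {
        "query": {
            "bool": {
                "must": {"match": {"name": text}},
                "filter": {"term": {"user_id": user_id}},
            }
        }
    }

print(edge_ngrams("Peter"))  # ['p', 'pe', 'pet', 'pete', 'peter']
```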
Let me know what you end up implementing.
Without scripting, I need to sort records based on rating. The system-rating exists for all records, but a user-rating may or may not exist. If a user-rating does exist I want to use that value in the sort instead of the system-rating, for that particular record and only for that record.
I tried the missing setting, but it only accepts _first, _last, or a custom value (used as the sort value for documents that lack the field):
{
"sort" : [
{ "user_rating" : {"missing" : "_last"} }
],
"query" : {
"term" : { "meal" : "cabbage" }
}
}
...but is there a way to specify the custom value should be system_rating when user_rating is missing?
I can do the following:
query_hash[:sort] = []
if user_rating.exist?
query_hash[:sort] << {
"user_rating" => {
"order": sort_direction,
"unmapped_type": "long",
"missing": "_last",
}
}
end
query_hash[:sort] << {
"system_rating" => {
"order": sort_direction,
"unmapped_type": "long",
}
}
...but that will always sort user rated records on top regardless of the user_rating value.
I know that scripting will allow me to do it but we cannot use scripting. Is it possible?
The only options are scripting or building a custom field at indexing time that already contains the computed value for sorting.
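A minimal Python sketch of the indexing-time option, assuming each record carries at most one user rating: compute the sort value before indexing, so the query sorts on a plain field without any script. The field name `effective_rating` is hypothetical, and the document would need reindexing whenever either rating changes.

```python
def with_effective_rating(doc):
    """Return a copy of doc with a precomputed sort field: the user rating
    if one exists, otherwise the system rating."""
    out = dict(doc)
    user = doc.get("user_rating")
    out["effective_rating"] = user if user is not None else doc["system_rating"]
    return out

def sort_body(direction="desc"):
    # Sorting on the precomputed field needs no scripting.
    return {
        "sort": [{"effective_rating": {"order": direction}}],
        "query": {"term": {"meal": "cabbage"}},
    }

docs = [
    {"id": "a", "system_rating": 3},
    {"id": "b", "system_rating": 2, "user_rating": 5},
    {"id": "c", "system_rating": 4, "user_rating": 1},
]
indexed = [with_effective_rating(d) for d in docs]
# Local check of the order a descending sort on effective_rating gives:
print([d["id"] for d in sorted(indexed, key=lambda d: d["effective_rating"], reverse=True)])
# ['b', 'a', 'c']
```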
Is it possible to compare a date field to the current time and then sort by the result of that comparison (something like a CASE expression in an SQL ORDER BY)?
The goal is to move documents whose datetime field is later than the current time to the top of the list, while all documents whose datetime field is earlier than the current time are equal in priority and should not be sorted by that field.
First, storing the time as a Unix timestamp (microtime) makes it easier to work with. Elasticsearch also has a script sort feature, and you can use if statements in those scripts.
{
"query" : {
....
},
"sort" : {
"_script" : {
"type" : "number",
"script" : {
"inline": "if (doc['time'].value > current_time) return doc['time'].value; else return current_time",
"params" : {
"current_time" : 1476354035
}
},
"order" : "desc"
}
}
}
Send the current time as a parameter each time you run the query with this script.
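To see what such a script computes, here is a local Python simulation of the sort key, using epoch-second timestamps and the current_time value from the example. Future documents keep their own timestamp while every past document collapses to current_time, so all past documents tie; for the future documents to come first, the key has to be sorted in descending order.

```python
CURRENT_TIME = 1476354035  # the current_time param from the example above

def sort_key(doc_time, current_time=CURRENT_TIME):
    # Mirrors the script: future times keep their value, past times tie.
    return doc_time if doc_time > current_time else current_time

docs = {
    "past_a": CURRENT_TIME - 86400,
    "future_far": CURRENT_TIME + 7200,
    "past_b": CURRENT_TIME - 60,
    "future_near": CURRENT_TIME + 3600,
}

# Descending: future documents first. Python's sort is stable, so tied past
# documents keep their original relative order here; in Elasticsearch, ties
# would fall through to any secondary sort criteria.
ordered = sorted(docs, key=lambda name: sort_key(docs[name]), reverse=True)
print(ordered)  # ['future_far', 'future_near', 'past_a', 'past_b']
```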