Elasticsearch: conditionally sort on 2 fields, 1 replaces the other if it exists - ruby

Without scripting, I need to sort records based on rating. The system-rating exists for all records, but a user-rating may or may not exist. If a user-rating does exist I want to use that value in the sort instead of the system-rating, for that particular record and only for that record.
I looked into the missing setting, but it only allows _first, _last, or a custom value (used as the sort value for documents that lack the field):
{
  "sort" : [
    { "user_rating" : {"missing" : "_last"} }
  ],
  "query" : {
    "term" : { "meal" : "cabbage" }
  }
}
...but is there a way to specify the custom value should be system_rating when user_rating is missing?
I can do the following:
query_hash[:sort] = []
if user_rating.exist?
  query_hash[:sort] << {
    "user_rating" => {
      "order": sort_direction,
      "unmapped_type": "long",
      "missing": "_last",
    }
  }
end
query_hash[:sort] << {
  "system_rating" => {
    "order": sort_direction,
    "unmapped_type": "long",
  }
}
...but that will always sort user rated records on top regardless of the user_rating value.
I know that scripting will allow me to do it but we cannot use scripting. Is it possible?

The only way is scripting or building a custom field at indexing time that will contain the already built value for sorting.
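A minimal sketch of the second option, assuming the application can add a field before indexing. The field name effective_rating and both helper methods are hypothetical, not part of the question or of any Elasticsearch API. The helper copies user_rating when it is present and falls back to system_rating otherwise, so the query can sort on a single field without a script. The last lines only simulate the resulting order locally.

```ruby
# Hypothetical helper: adds a precomputed sort field to a document hash
# before it is sent to Elasticsearch. "effective_rating" is an assumed name.
def with_effective_rating(doc)
  rating = doc["user_rating"].nil? ? doc["system_rating"] : doc["user_rating"]
  doc.merge("effective_rating" => rating)
end

# Sort clause that would replace the two-field sort from the question.
def effective_rating_sort(sort_direction)
  [{ "effective_rating" => { "order" => sort_direction } }]
end

docs = [
  { "id" => 1, "system_rating" => 3 },
  { "id" => 2, "system_rating" => 5, "user_rating" => 1 },
  { "id" => 3, "system_rating" => 2, "user_rating" => 4 }
].map { |d| with_effective_rating(d) }

# Local simulation of a descending sort on the new field.
sorted_ids = docs.sort_by { |d| -d["effective_rating"] }.map { |d| d["id"] }
```

The trade-off is that the extra field has to be rewritten whenever a user rating is added or changed.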

Related

Is there a way to update a document with a Painless script without changing the order of unaffected fields?

I'm using Elasticsearch's Update by Query API to update some documents with a Painless script like this (the actual query is more complicated):
POST ts-scenarios/_update_by_query?routing=test
{
  "query": {
    "term": { "routing": { "value": "test" } }
  },
  "script": {
    "source": """ctx._source.tagIDs = ["5T8QLHIBB_kDC9Ugho68"]"""
  }
}
This works, except that upon reindexing, other fields get reordered, including some classes which are automatically (de)serialized using JSON.NET's type handling. That means a document with the following source before the update:
{
  "routing" : "testsuite",
  "activities" : [
    {
      "$type" : "Test.Models.SomeActivity, Test"
    },
    {
      "$type" : "Test.Models.AnotherActivity, Test",
      "CustomParameter" : 1,
      "CustomSetting" : false
    }
  ]
}
ends up as
{
  "routing" : "testsuite",
  "activities" : [
    {
      "$type" : "Test.Models.SomeActivity, Test"
    },
    {
      "CustomParameter" : 1,
      "CustomSetting" : false,
      "$type" : "Test.Models.AnotherActivity, Test"
    }
  ],
  "tagIDs" : [
    "5T8QLHIBB_kDC9Ugho68"
  ]
}
which JSON.NET can't deserialize. Is there a way I can tell the script (or the Update by Query API) not to change the order of those other fields?
In case it matters, I'm using Elasticsearch OSS version 7.6.1 on macOS. I haven't checked whether an Ingest pipeline would work here, as I'm not familiar with them.
(It turns out I can make the deserialization more flexible by setting the MetadataPropertyHandling property to ReadAhead, as mentioned here. That works, but as mentioned it may hurt performance and there might be other situations where field order matters. Technically, it shouldn't; JSON isn't XML, but there are always edge cases where it does matter.)

Elasticsearch, sort on cross datefields

In my Elasticsearch index, I have records that look something like this:
{
  "date1": "<someDate>",
  "date2": "<someOtherDate>"
}
Is it possible to make a query that gives me the documents in order across the "date1" and "date2" fields?
For instance, if I have these records:
1: {"date1": "1950-01-01", "date2": "2000-01-01"}
2: {"date1": "1960-01-01", "date2": "1951-01-01"}
3: {"date1": "1970-01-01", "date2": "1950-02-02"}
The order I want to receive them in is 1, 3, 2: record 1 has the earliest date overall (in its date1 field), record 3 has the next earliest (in its date2 field), and record 2 comes last (also via its date2 field).
Thanks!
According to the Elasticsearch documentation, you have two options:
sort on an array field using the sort mode option
sort using a custom sorting script
1. Sorting using array
The first option requires that you change your mapping and put documents like this:
PUT /my_index/my_type/1
{"date1": ["1950-01-01", "2000-01-01"]}
Then you will be able to make a query like this:
GET /my_index/my_type/_search
{
  "sort" : [
    { "date1" : {"order" : "asc", "mode": "min"}}
  ]
}
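To see why mode min produces the order asked for in the question, here is a small sketch that merges both dates into the date1 array, as in the PUT example, and ranks each record by its earliest value. It only emulates the sort locally in Ruby and does not call Elasticsearch.

```ruby
require "date"

records = {
  1 => { "date1" => "1950-01-01", "date2" => "2000-01-01" },
  2 => { "date1" => "1960-01-01", "date2" => "1951-01-01" },
  3 => { "date1" => "1970-01-01", "date2" => "1950-02-02" }
}

# Combine both dates into one multi-valued field, as the array option requires.
merged = records.transform_values { |r| { "date1" => [r["date1"], r["date2"]] } }

# Emulate {"order": "asc", "mode": "min"}: rank each document by its earliest date.
order = merged
  .sort_by { |_id, doc| doc["date1"].map { |d| Date.parse(d) }.min }
  .map(&:first)
```

With the three sample records, order comes out as 1, 3, 2.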
2. Sorting using custom script
The second option is to write a sorting script, and it works with your document structure. Here is an example:
GET /my_index2/_search
{
  "sort" : {
    "_script" : {
      "type" : "number",
      "script" : {
        "lang": "painless",
        "inline": "if (doc['date1'].value < doc['date2'].value) { return doc['date1'].value; } else { return doc['date2'].value; }"
      },
      "order" : "asc"
    }
  }
}
The suggested scripting language is Painless.
Discussion
Which one to choose is up to you. Performance can be a problem with the scripting option, and Painless was only introduced in ES 5 (in ES 2.3 the closest equivalent was Groovy, which was not enabled by default because it is dangerous). Sorting on arrays should be faster, since it is a built-in feature, but it requires storing the data differently.

Exists query for objects inside fields

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html says that it is possible to query for documents that have at least one non-null value in the original field.
If the value of the original field is an object, is it possible to query for the existence of a key in the object?
Example: a document is
{
  "user": {
    "name": "XY",
    "passport_id": 1234
  }
}
Can one make an exists query for user.name? I tried
{
  "query": {
    "exists" : { "field" : "user.name" }
  }
}
but it does not give any results.

Elasticsearch returned fields renaming

In my Elasticsearch index, I have a field called category, and I want it returned under the name cat, holding the actual values rather than an array of objects, something like MySQL's SELECT category AS cat.
I tried to use partial_fields:
"partial_fields" : {
  "cat" : {
    "include" : ["category"]
  }
}
but it returns
"fields": {
  "cat": [
    {
      "category": 1
    }
  ]
}
In fact, I want it to be something like
"fields": {
  "cat": [1]
}
Is there any way to do this?
That's not possible, unfortunately. You'll have to handle this in your application.
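Handling it in the application could look like the following sketch, which assumes the response has already been parsed into a Ruby hash shaped like the "fields" output above. The method name flatten_partial_field is hypothetical and not part of any client library.

```ruby
# Hypothetical post-processing step: flattens the partial_fields output
# {"cat" => [{"category" => 1}]} into {"cat" => [1]}.
def flatten_partial_field(fields, alias_name, source_name)
  values = fields.fetch(alias_name, []).map { |obj| obj[source_name] }
  fields.merge(alias_name => values)
end

hit_fields = { "cat" => [{ "category" => 1 }] }
renamed = flatten_partial_field(hit_fields, "cat", "category")
```

Here renamed holds the values directly under cat, and the original hit_fields hash is left unchanged.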

Check for id existence in param Array with Elasticsearch custom script field

Is it possible to add a custom script field that is a Boolean and returns true if the document's id exists in an array that is sent as a param?
Something like this https://gist.github.com/2437370
What would be the correct way to do this with mvel?
Update:
Having trouble getting it to work as specified in Imotov's answer.
Mapping:
place: {
  properties: {
    _id: { index: "not_analyzed", store: "yes" },
  }
}
Sort:
:sort=>{:_script=>{:script=>"return friends_visits_ids.contains(_fields._id.value)", :type=>"string", :params=>{:friends_visits_ids=>["4f8d425366eaa71471000011"]}, :order=>"asc"}}}
I don't get any errors, the documents just don't get sorted correctly.
Update 2
Oh, and I do get this back on the documents:
"sort"=>["false"]
You were on the right track. It might just be more efficient to store the list of ids in a map instead of an array if the list is large.
"sort" : {
  "_script" : {
    "script" : "return friends_visits_ids.containsKey(_fields._id.value)",
    "type" : "string",
    "params": {
      "friends_visits_ids": { "1" : {}, "2" : {}, "4" : {}}
    }
  }
}
Make sure that the _id field is stored. Otherwise _fields._id.value will return null for all records.
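Since the question builds the request from Ruby, the map form of the param could be produced from the existing array along these lines. This is only a sketch: ids_to_lookup is a hypothetical helper name, and the script string is the one from the answer above.

```ruby
# Turns ["a", "b"] into {"a" => {}, "b" => {}} so the script can call containsKey.
def ids_to_lookup(ids)
  ids.each_with_object({}) { |id, lookup| lookup[id.to_s] = {} }
end

sort_clause = {
  _script: {
    script: "return friends_visits_ids.containsKey(_fields._id.value)",
    type: "string",
    params: { friends_visits_ids: ids_to_lookup(["4f8d425366eaa71471000011"]) }
  }
}
```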
