Sorting by value in multivalued field in elasticsearch - sorting

I have a multivalue field with integers in the document, for example
{
values: [1,2,3,4,5]
}
I apply a range filter, for example from 2 to 4, and get a list of documents whose values contain 2, 3, or 4.
Now I'd like to sort the results so that documents containing 3 are returned first.
I could do it using script sorting:
{
sort:{
_script: {
script: "doc['values'].getValues().contains(3) ? 0 : 1",
type: "number"
}
}
}
But I don't like its performance, because getValues() actually returns a List, and its contains method is O(n).
Is there a better way?
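One possible alternative, not taken from this thread, is to avoid the script entirely and let the query score the preferred value. The sketch below only builds the request body as a Python dict. It assumes an Elasticsearch version that supports `filter` clauses inside a `bool` query, and the helper name `build_preferred_value_query` is hypothetical. Because the `bool` query has a `filter` clause, the `should` clause is optional, so documents without the preferred value still match but score lower.

```python
def build_preferred_value_query(low, high, preferred, field="values"):
    """Build a search body that filters by range and ranks docs containing `preferred` first."""
    return {
        "query": {
            "bool": {
                # Non-scoring range filter, as in the question (2 to 4).
                "filter": [{"range": {field: {"gte": low, "lte": high}}}],
                # Optional clause: documents containing the preferred value
                # receive a constant score; the others score lower.
                "should": [
                    {
                        "constant_score": {
                            "filter": {"term": {field: preferred}},
                            "boost": 1.0,
                        }
                    }
                ],
            }
        },
        # Highest score first, so documents containing the preferred value lead.
        "sort": [{"_score": {"order": "desc"}}],
    }


body = build_preferred_value_query(2, 4, 3)
```

The term lookup is served from the inverted index rather than by scanning each document's value list, which is the cost the question is trying to avoid. Whether it is actually faster on a given index would need to be measured.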

Related

Represent enum in Elastic Search for sorting

I have a use case to represent an enum for difficulty level (EASY, MEDIUM, DIFFICULT) in elastic search with support of sorting on this field. If this field is indexed as string the sorting will not work as expected.
One way to support this is to index integer values for each enumeration in ES and map it to string values when sorted results are returned by ES.
Are there other alternatives such that ES itself takes care of sorting in the enumeration order while this field is indexed as string? Can I specify custom sort function for a field? function_score is an option, but given that I have to sort based on enum ordering is there better way than defining custom function_score?
In my use case there are multiple such enumerations defining scales across dimensions like difficulty, height (low, medium, high), grades (good, average, poor), etc. Both of the above solutions require custom work whenever a new dimension is introduced. Can either of the above approaches be generalized?
You can check the answer to the same question here. You will need to use script_score like below:
GET /my-index-2/_search
{
"query": {
"script_score": {
"query": {
"match_all":{}
},
"script": {
"source": "if (doc['field name'].value == 'EASY') { return 2; } else if (doc['field name'].value == 'MEDIUM') { return 1; } else { return 0; }"
}
}
}
}
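The integer-mapping approach mentioned in the question can be generalized with one table of scales, so a new dimension only needs a new table entry. This is a minimal sketch under assumptions: the `SCALES` table, the `_rank` field suffix, and the helper names are all hypothetical, and the original string field is kept alongside the rank so nothing has to be mapped back after the search.

```python
# Hypothetical scales; each list is ordered from lowest to highest rank.
SCALES = {
    "difficulty": ["EASY", "MEDIUM", "DIFFICULT"],
    "height": ["LOW", "MEDIUM", "HIGH"],
    "grade": ["POOR", "AVERAGE", "GOOD"],
}


def encode(doc):
    """Return a copy of doc with a '<field>_rank' integer for every known scale field."""
    out = dict(doc)
    for field, scale in SCALES.items():
        if field in doc:
            # The position in the scale list is the sortable rank.
            out[field + "_rank"] = scale.index(doc[field])
    return out


def sort_clause(field, order="asc"):
    """Build a sort clause on the integer rank field that accompanies `field`."""
    return [{field + "_rank": {"order": order}}]


indexed = encode({"title": "q1", "difficulty": "MEDIUM", "grade": "GOOD"})
```

Sorting on `difficulty_rank` is then an ordinary numeric sort, with no script evaluated per document.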

Searching for multiple values in a String array in Elastic

I have a field that I am indexing into Elasticsearch that is an array of strings. So, for example, here is what the string array will look like in two records:
Record 1: {"str1", "str2", "str3", "str4", "str5"}
Record 2: {"str1", "str2", "str6", "str7", "str8"}
Question 1: I want to be able to query for multiple strings in this array. For example, my query has "str1", "str2", "str3" as the search parameters. I want to search for records where the string array has any of these three strings.
Question 2: For the scenario above, will Record 1 return with a higher score than Record 2 (since all three strings are in the array for Record 1 but only two are in Record 2)?
Is this possible at all? Can you please help with what the query should look like and if the scoring works the way I stated.
You can index them as an array, such as:
{
"myArrayField": [ "str1", "str2", "str3", "str4", "str5" ],
...
}
You would then be able to query a number of ways, the simplest for your case being a match query (which is analyzed):
{
"match" : {
"myArrayField" : "str1 str2 str3"
}
}
Or a terms query (which is not analyzed):
{
"terms" : {
"myArrayField" : [ "str1", "str2", "str3" ]
}
}
And yes, matches against more query terms will receive a higher score, so Record 1 would be scored higher than Record 2.
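The ranking intuition can be illustrated without Elasticsearch. The sketch below is a deliberate simplification: it only counts how many query terms each record contains, which is not how Elasticsearch computes relevance scores, and the helper name `matched_terms` is hypothetical.

```python
def matched_terms(record, query_terms):
    """Count how many distinct query terms appear in the record's array."""
    return len(set(record) & set(query_terms))


record1 = ["str1", "str2", "str3", "str4", "str5"]
record2 = ["str1", "str2", "str6", "str7", "str8"]
query = ["str1", "str2", "str3"]

# Order the records by number of matched terms, most matches first.
ranked = sorted(
    [("Record 1", record1), ("Record 2", record2)],
    key=lambda item: matched_terms(item[1], query),
    reverse=True,
)
```

Record 1 matches three terms and Record 2 matches two, so Record 1 comes first in `ranked`.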

Can I reference the document itself in an elasticsearch query?

I would like to get all documents where the value of fieldA is greater than the value of fieldB. What is the most efficient way to do this?
Kind of like this:
body: {
query: {
bool: {
must: {
range: {
"fieldA": { gt: "this.fieldB" }
}
}
}
}
}
The most efficient way to do it is to index a fieldC holding the value of fieldA - fieldB and use a range filter to find all records with fieldC greater than 0. It can also be a boolean field that is true if fieldA is greater than fieldB and false otherwise.
If reindexing is not an option, it's possible to use a script filter to perform this check, but that essentially means a full index scan for every search, so it is not going to scale.
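The index-time approach can be sketched as a small preprocessing step applied before each document is sent to Elasticsearch. This is a sketch under assumptions: the helper `add_comparison_fields` and the boolean field name `aGreaterThanB` are hypothetical, while `fieldC` follows the answer above.

```python
def add_comparison_fields(doc):
    """Return a copy of doc with the precomputed comparison fields added."""
    out = dict(doc)
    # Numeric difference, queryable with a range filter.
    out["fieldC"] = doc["fieldA"] - doc["fieldB"]
    # Boolean alternative (hypothetical field name), queryable with a term filter.
    out["aGreaterThanB"] = doc["fieldA"] > doc["fieldB"]
    return out


# Search body that finds documents where fieldA is greater than fieldB.
RANGE_QUERY = {
    "query": {"bool": {"filter": [{"range": {"fieldC": {"gt": 0}}}]}}
}

enriched = add_comparison_fields({"fieldA": 7, "fieldB": 5})
```

The comparison is computed once per document at index time, so the search itself is a plain range lookup.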

Aggregate Terms Usage Count

I'm trying to work out a way of finding the most popular terms and their usage in ElasticSearch. The Terms Aggregation is very close but returns the count of documents that the term appeared in, rather than how many times the term appeared.
For example, imagine an appropriate index has been created to index these example documents:
{ text: 'one two two' }
{ text: 'two three' }
Then executing the following search:
{
aggregations: {
popular_terms: {
terms: {
field: 'text'
}
}
}
}
Will return:
... {
buckets: [
{ key: 'two', value: 2 },
{ key: 'one', value: 1 },
{ key: 'three', value: 1 }
]
}
Is it possible to search with an aggregation counting instances of the terms in a similar way? So in this example returning 3 for the value 'two' as it appears twice in the first document?
An aggregation counts the number of documents that match a criterion (e.g. terms), so it won't return what you are expecting.
For your use case you can probably use term vectors, which report how often a term occurs within a document.
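The difference between the two counts can be shown on the example documents from the question. This sketch runs outside Elasticsearch and assumes whitespace splitting as a stand-in for the index analyzer. Both function names are hypothetical.

```python
from collections import Counter

docs = [{"text": "one two two"}, {"text": "two three"}]


def document_counts(documents):
    """Count, per term, the number of documents containing it (terms aggregation style)."""
    counts = Counter()
    for doc in documents:
        # A set collapses repeated terms, so each document counts once per term.
        counts.update(set(doc["text"].split()))
    return counts


def term_frequencies(documents):
    """Count, per term, every occurrence across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc["text"].split())
    return counts


doc_counts = document_counts(docs)
total_counts = term_frequencies(docs)
```

Here `doc_counts["two"]` is 2 while `total_counts["two"]` is 3. The second number is the one the question asks for, and it is the sum of the per-document term frequencies.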

ElasticSearch, how to search for a document containing a specific array element

I am having a little problem with elasticsearch and wonder if someone can help me solve it.
I have a document containing an array of tuples (publications).
Something like :
{
....
publications: [
{
item1: 385294,
item2: 11
},
{
item1: 395078,
item2: 1
}
]
....
}
The problem I have is retrieving documents that contain a specific tuple, for example (item1 = 395078 AND item2 = 1).
Whatever I try, it always seems to treat item1 and item2 separately; I fail to tell Elasticsearch that item1 and item2 must have specific values inside the same tuple, not across the whole array...
Is there something I'm missing here?
Thanks
This is not directly possible.
Elasticsearch flattens the array of objects before checking the condition.
This means that
Elasticsearch matches
a=x AND b=y1 against [{a=x,b=y},{a=x1,b=y1}], which would not happen with conventional array checking.
What you can do here is one of the following:
Use the nested type - https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html (but an extra document is created for each element in the array)
Store the array as
publications: [
{
385294:11
},
{
395078:1
}
]
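The flattening behaviour described above can be simulated in a few lines. This sketch models the matching logic only and does not call Elasticsearch. The function names are hypothetical, and `matches_nested` stands in for what a nested mapping achieves by keeping each object separate.

```python
def flatten(objects):
    """Mimic default indexing of an object array: one multi-valued list per field."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat


def matches_flattened(objects, **conditions):
    """Each condition is checked against the whole array, independently of the others."""
    flat = flatten(objects)
    return all(value in flat.get(key, []) for key, value in conditions.items())


def matches_nested(objects, **conditions):
    """All conditions must hold inside the same object."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in objects
    )


publications = [{"item1": 385294, "item2": 11}, {"item1": 395078, "item2": 1}]

# item1 comes from the first tuple and item2 from the second one.
cross_flat = matches_flattened(publications, item1=385294, item2=1)
cross_nested = matches_nested(publications, item1=385294, item2=1)
```

`cross_flat` is True even though no single tuple holds both values, which is the false match described in the question. `cross_nested` is False.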
