ElasticSearch | Randomize results with same score - elasticsearch

In Elasticsearch, is it possible to randomize the order of search results with equal scores without breaking pagination?
I'm hosting a database with thousands of job candidates. When a company searches for a particular skill (or a combination of skills), the results always come back in the same order, so the candidates at the top of the search results have a huge advantage.
Example search query:
let params = {
index: 'candidates',
type: 'candidate',
explain: true,
size: size,
from: from,
body: {
_source: {
includes: ['firstName', 'middleName', 'lastName']
},
query: {
bool: {
must: [/* Left out */],
should: [/* Left out */],
}
}
}
};

Henry's answer is good, but I think it is easier to do:
function_score: {
query: {
...
},
random_score: {
seed: 12345678910,
field: '_seq_no',
weight: 0.0001
},
boost_mode: 'sum'
}
So there is no need to boost the original score; just weight the random score down so that it contributes little (but still enough to break ties).
I dislike this approach to breaking ties, though: even a small contribution to the score can still change the order of results whose scores are not equal but very close. This is why I opened this feature request.
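To make the concern about near ties concrete, here is a minimal sketch in plain JavaScript (no Elasticsearch client involved). It assumes random_score yields values in [0, 1), so with boost_mode 'sum' and a weight w the random contribution lies in [0, w). The function names canSwap and finalScore are hypothetical helpers for illustration only.

```javascript
// Assumption: random_score produces a value in [0, 1), so after weighting
// the random contribution lies in [0, weight).

// Final score under boost_mode 'sum': base score plus weighted random value.
function finalScore(baseScore, randomValue, weight) {
  return baseScore + randomValue * weight;
}

// Two documents can change places only if the gap between their base
// scores is smaller than the largest possible random contribution.
function canSwap(scoreA, scoreB, weight) {
  return Math.abs(scoreA - scoreB) < weight;
}
```

With a weight of 0.0001, canSwap reports that exact ties can be reordered (the desired effect), but so can two documents whose scores differ by 0.00005, which is the side effect described above. Documents whose scores differ by 0.1 keep their relative order.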

You could use a function_score query: wrap your bool query in it and add a random_score function. The next step is to find a weighting that matches your needs using "boost" and "boost_mode", or "weight".
Note that if you use filters the output score will be 0, so you will need to change "boost_mode" from "multiply" to "replace", "sum" or another mode.
Finally, don't forget to add a seed (and a field, as of ES 7.0) to random_score to keep pagination near-consistent.
For your example I would suggest something like:
let params = {
...
body: {
...
query: {
function_score: {
query: {
bool: {
must: [/* Left out */],
should: [/* Left out */],
boost: 100
}
},
random_score: {
seed: 12345678910,
field: '_seq_no'
},
boost_mode: 'sum'
}
}
}
};
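The suggestion above could be wrapped in a small helper so that every page of one search reuses the same seed. This is only a sketch: the function name buildRandomizedSearch and its parameters are hypothetical, and it only builds the request object, it does not call Elasticsearch. It places function_score under query, since function_score is itself a query.

```javascript
// Hypothetical helper: builds the search params for one page of results.
// boolQuery holds the must/should clauses that were left out in the question.
function buildRandomizedSearch(boolQuery, seed, from, size) {
  return {
    index: 'candidates',
    from,
    size,
    body: {
      query: {
        function_score: {
          // Boost the real relevance score so it dominates the random part.
          query: { bool: { ...boolQuery, boost: 100 } },
          // Same seed + same field => same random value per document,
          // which keeps the ordering stable from page to page.
          random_score: { seed, field: '_seq_no' },
          boost_mode: 'sum'
        }
      }
    }
  };
}

// Usage: one seed per search session, reused for every page.
const firstPage = buildRandomizedSearch({ must: [], should: [] }, 42, 0, 10);
const secondPage = buildRandomizedSearch({ must: [], should: [] }, 42, 10, 10);
```

Choosing a new seed per user or per session changes who appears first among equal scores, while reusing it across pages keeps the pagination consistent for that user.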

Related

Can't sort buckets based on specific fields of complex key

I'm new to OpenSearch and couldn't find an answer that worked for this use case. Essentially, my query uses scripts to access document field values within a multi_terms aggregation, then aggregates them into buckets reflecting certain metrics. The bucket key is an array of strings in the format ['val1', 'val2', 'val3'] with an associated key_as_string of 'val1|val2|val3'.
My goal is to sort these buckets after aggregation by any one of these 3 values. The problem is that I can't get sorting to work other than through a root "order" entry, which sorts by the entire key (I think). The query is:
aggregations: {
plans: {
multi_terms: {
size: 10000,
terms: [
{
script: "doc['plan.title.keyword'].value"
},
{
script: "doc['plan.type.keyword'].value"
},
{
script: "doc['plan.id.keyword'].value"
}
],
order: { _key: order } // This orders buckets by entire key?
},
aggregations: {
completed: {
filter: {
term: { 'status.keyword': 'Completed' }
}
},
in_progress: {
filter: {
term: { 'status.keyword': 'Started' }
}
},
stopped: {
filter: {
term: { 'status.keyword': 'Stopped' }
}
},
assigned: {
filter: {
term: { 'status.keyword': 'Assigned' }
}
},
my_bucket: {
bucket_sort: {
sort: [{_key: {order: 'asc'}}] // Breaks sort
}
}
}
}
},
The buckets returned by the query are correct, but their order is not, and I can't seem to get it right. I've tried various ways of implementing bucket_sort to no avail. It feels like there is an easy solution and I'm just not finding it. My end goal is to sort the returned buckets by a specified index of the key.
Can anyone tell me what I'm doing wrong here?
Note: Using OpenSearch v2.3
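One possible workaround, not a confirmed OpenSearch-side solution, is to sort the returned buckets on the client. The sketch below assumes each bucket has the shape described in the question (a key array of strings); the function name sortBucketsByKeyPart is hypothetical.

```javascript
// Hypothetical client-side helper: sorts multi_terms buckets by one
// element of their array key, e.g. keyIndex 1 sorts by plan type.
function sortBucketsByKeyPart(buckets, keyIndex, order = 'asc') {
  const direction = order === 'desc' ? -1 : 1;
  // Copy first so the original response is left untouched.
  return [...buckets].sort((a, b) => {
    const x = String(a.key[keyIndex]);
    const y = String(b.key[keyIndex]);
    if (x < y) return -direction;
    if (x > y) return direction;
    return 0;
  });
}
```

This only works when all buckets are fetched in one response (the query above uses size: 10000), because the sort happens after aggregation rather than inside it.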

min_score excluding documents with higher scores

I have a trove of several million documents which I'm querying like this:
const query = {
min_score: 1,
query: {
bool: {
should: [
{
multi_match: {
query: "David",
fields: ["displayTitle^2", "synopsisList.text"],
type: "phrase",
slop: 2
}
},
{
nested: {
path: "contributors",
query: {
multi_match: {
query: "David",
fields: [
"contributors.characterName",
"contributors.contributionBy.displayTitle"
],
type: "phrase",
slop: 2
}
},
score_mode: "sum"
}
}
]
}
}
};
This query is giving sane looking results for a wide range of terms. However, it has a problem with "David" - and presumably others.
"David" crops up fairly regularly in the text. With the min_score option this query always returns 0 documents. When I remove min_score I get thousands of documents the best of which has a score of 22.749.
Does anyone know what I'm doing wrong? I guess min_score doesn't work the way I think it does.
Thanks
The problem I was actually trying to solve was that, when I added filter clauses to the query above, Elasticsearch returned every document that satisfied the filter, even those with a score of zero. That is how should works: alongside a filter, should clauses are optional. I didn't realise that I could nest the should inside a must, which achieves the desired effect.
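A minimal sketch of that nesting is shown here, under the assumption that the filter clauses and should clauses are passed in as arrays. The function name buildFilteredQuery and the status filter in the usage lines are hypothetical; the should clause is a shortened version of the one in the question.

```javascript
// Hypothetical helper: wraps the should clauses in their own bool query
// and puts that bool inside must, so at least one should clause has to
// match in addition to the filters.
function buildFilteredQuery(shouldClauses, filterClauses) {
  return {
    query: {
      bool: {
        filter: filterClauses,
        must: [
          // A bool with only should clauses requires at least one to match.
          { bool: { should: shouldClauses } }
        ]
      }
    }
  };
}

// Usage with a made-up filter field.
const filtered = buildFilteredQuery(
  [{ multi_match: { query: 'David', fields: ['displayTitle^2'], type: 'phrase', slop: 2 } }],
  [{ term: { status: 'published' } }]
);
```

With this structure, documents that only satisfy the filter are no longer returned, so min_score is not needed to exclude zero-score hits.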

Treat multiple clauses in Elasticsearch query as distinct

I have an Elasticsearch query with a 'Should' clause of the following format. The intention is to search for multiple query strings with a single request:
[
{ match: { "name": { query: "Candied Apples" } } },
{ match: { "name": { query: "Canned Pears" } } }
]
As per https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-query-strings.html ES is combining these clauses, so a document named 'Canned Apples and Pears' is getting a higher score than 'Canned Pears', even though 'Canned Pears' is an exact match of one of the query strings. Is there a better way to structure my query so that each clause is evaluated separately?
To be clear, I would want a document with the name 'Canned Apples and Pears' to be returned as part of the search example above, but it should have a lower score than any documents named "Candied Apples" or "Canned Pears", because it does not match any of the search clauses exactly. This means a minimum_should_match value of 100% is not appropriate.
Full disclosure - I'm new to ES!
One approach which seems to achieve the scoring structure I am looking for, is to combine "match_phrase" and "match" for each search term into a single query, e.g:
[
{match_phrase: { 'name': { query: 'Candied Apples' } } },
{match_phrase: { 'name': { query: 'Canned Pears' } } },
{match: { 'name': { query: 'Candied Apples' } } },
{match: { 'name': { query: 'Canned Pears' } } }
]
This means that all full and partial matches are returned, but any successful phrase matches will have a higher score than the results of the 'match' clauses.
Please note that for this to work, the 'match_phrase' clauses must be listed first.
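The clause list above can be generated from the search terms instead of being written by hand. This is a sketch only: the function name buildNameClauses is hypothetical, and it keeps the phrase clauses first as the answer describes.

```javascript
// Hypothetical helper: for each search term, emit a match_phrase clause
// (rewards exact phrase hits) and a match clause (catches partial hits).
function buildNameClauses(terms, field = 'name') {
  const phraseClauses = terms.map((term) => ({
    match_phrase: { [field]: { query: term } }
  }));
  const matchClauses = terms.map((term) => ({
    match: { [field]: { query: term } }
  }));
  // Phrase clauses first, mirroring the order used in the answer.
  return [...phraseClauses, ...matchClauses];
}

const shouldClauses = buildNameClauses(['Candied Apples', 'Canned Pears']);
```

The resulting array is meant to be used as the should list of a bool query.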

Random document in ElasticSearch

Is there a way to get a truly random sample from an elasticsearch index? i.e. a query that retrieves any document from the index with probability 1/N (where N is the number of documents currently indexed)?
And as a follow-up question: if all documents have some numeric field s, is there a way to get a document through weighted random sampling, i.e. where the probability to get document i with value s_i is equal to s_i / sum(s_j for j in index)?
I know it is an old question, but now it is possible to use random_score, with the following search query:
{
"size": 1,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "1477072619038"
}
}
]
}
}
}
For me it is very fast with about 2 million documents.
I use the current timestamp as the seed, but you can use anything you like. The advantage is that the same seed always gives the same results, so you can use each user's session id as the seed and every user will get a different, but stable, order.
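Since a session id is usually a string, one way to turn it into a numeric seed is a simple hash. The sketch below is an assumption about how that could be done, not part of the answer above: seedFromSessionId and buildRandomOrderQuery are hypothetical names, and the hash is a basic 32-bit polynomial hash rather than anything Elasticsearch provides.

```javascript
// Hypothetical helper: derives a stable unsigned 32-bit seed from a string.
function seedFromSessionId(sessionId) {
  let hash = 0;
  for (const ch of sessionId) {
    // ">>> 0" keeps the running value within unsigned 32-bit range.
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return hash;
}

// Builds the same kind of query as above, seeded per session.
function buildRandomOrderQuery(sessionId, size) {
  return {
    size,
    query: {
      function_score: {
        functions: [
          { random_score: { seed: seedFromSessionId(sessionId) } }
        ]
      }
    }
  };
}
```

The same session id always maps to the same seed, so a user paging through results sees a consistent order, while different sessions are likely to get different orders.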
The only way I know of to get random documents from an index (at least in versions <= 1.3.1) is to use a script:
sort: {
_script: {
script: "Math.random() * 200000",
type: "number",
params: {},
order: "asc"
}
}
You can use that script to make some weighting based on some field of the record.
It's possible that in the future they might add something more complicated, but you'd likely have to request that from the ES team.
You can use random_score with a function_score query.
{
"size":1,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": 11
}
}
],
"score_mode": "sum"
}
}
}
The bad part is that this will apply a random score to every document, sort the documents, and then return the first one. I don't know of anything that is smart enough to just pick a random document.
NEST way:
var result = _elastic.Search<dynamic>(s => s
.Query(q => q
.FunctionScore(fs => fs.Functions(f => f.RandomScore())
.Query(fq => fq.MatchAll()))));
Raw query way:
GET index-name/_search
{
"size": 1,
"query": {
"function_score": {
"query" : { "match_all": {} },
"random_score": {}
}
}
}
You can use random_score to randomly order responses or retrieve a document with roughly 1/N probability.
Additional notes:
https://github.com/elastic/elasticsearch/issues/1170
https://github.com/elastic/elasticsearch/issues/7783

Elasticsearch flexible search

Does anyone have experience with Elasticsearch and getting the searches to be more flexible?
Currently, if I query for "House" it returns the correct items, but if "Hous" is typed in, nothing is returned. Also, if I search for "O.J." it returns O.J., but if I search for OJ I get nothing.
Use prefix matching:
bool: {
must: [
{
multi_match: {
query: "your text query",
type: "phrase_prefix",
max_expansions: 4,
fields: ["field1", "field2"]
}
}
]
}
You can also add fuzziness, which tolerates small misspellings but may yield less accurate results.
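As a rough sketch of combining the two ideas: the fuzziness parameter cannot be used with the phrase_prefix type of multi_match, so the example below puts a fuzzy multi_match in a separate should clause. The function name buildFlexibleQuery is hypothetical, and "AUTO" is just one possible fuzziness setting.

```javascript
// Hypothetical helper: matches either by prefix (for partially typed
// words such as "Hous") or by fuzzy matching (for small misspellings).
function buildFlexibleQuery(text, fields) {
  return {
    bool: {
      should: [
        {
          multi_match: {
            query: text,
            type: 'phrase_prefix',
            max_expansions: 4,
            fields
          }
        },
        {
          // Default multi_match type, which accepts fuzziness.
          multi_match: {
            query: text,
            fields,
            fuzziness: 'AUTO'
          }
        }
      ],
      // Require at least one of the two strategies to match.
      minimum_should_match: 1
    }
  };
}

const flexible = buildFlexibleQuery('Hous', ['field1', 'field2']);
```

Whether "OJ" matches "O.J." depends mostly on how the field is analyzed at index time, so this query alone may not cover that case.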
