min_score excluding documents with higher scores - elasticsearch

I have a trove of several million documents which I'm querying like this:
const query = {
  min_score: 1,
  query: {
    bool: {
      should: [
        {
          multi_match: {
            query: "David",
            fields: ["displayTitle^2", "synopsisList.text"],
            type: "phrase",
            slop: 2
          }
        },
        {
          nested: {
            path: "contributors",
            query: {
              multi_match: {
                query: "David",
                fields: [
                  "contributors.characterName",
                  "contributors.contributionBy.displayTitle"
                ],
                type: "phrase",
                slop: 2
              }
            },
            score_mode: "sum"
          }
        }
      ]
    }
  }
};
This query is giving sane looking results for a wide range of terms. However, it has a problem with "David" - and presumably others.
"David" crops up fairly regularly in the text. With the min_score option this query always returns 0 documents. When I remove min_score I get thousands of documents the best of which has a score of 22.749.
Does anyone know what I'm doing wrong? I guess min_score doesn't work the way I think it does.
Thanks

The problem I was trying to solve was that when I added some filter clauses to the above query, Elasticsearch would return all the documents that satisfied the filter, even those with a score of zero. That's how should works: once the bool query also has filter (or must) clauses, the should clauses become optional and only affect the score. I didn't realise that I can nest the should inside a must, which achieves the desired effect.
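A minimal sketch of that restructuring, for illustration only. The original filter clauses are not shown, so the `term` filter on a `status` field below is a hypothetical stand-in, and only the first scoring clause from the question is repeated. The scoring `should` clauses sit in their own `bool`, which is placed inside `must`, so a document has to match at least one of them to be returned.

```javascript
// Scoring clauses from the question (only the first one is repeated here).
const scoringClauses = [
  {
    multi_match: {
      query: "David",
      fields: ["displayTitle^2", "synopsisList.text"],
      type: "phrase",
      slop: 2
    }
  }
];

const filteredQuery = {
  query: {
    bool: {
      // Hypothetical filter clause; replace with the real filters.
      filter: [{ term: { status: "published" } }],
      // The inner bool has only should clauses, so at least one must match.
      must: [{ bool: { should: scoringClauses } }]
    }
  }
};

console.log(JSON.stringify(filteredQuery, null, 2));
```

With this shape, documents that pass the filter but match none of the scoring clauses are no longer returned, so `min_score` may not be needed at all.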

Related

ElasticSearch | Randomize results with same score

In ElasticSearch is it possible to randomize the order of search results with equal score without losing pagination?
I'm hosting a database with thousands of job candidates. When a company searches for a particular skill (or a combination of skills), the results always come back in the same order, so the candidates at the top of the search results have a huge advantage.
Example for a search query:
let params = {
  index: 'candidates',
  type: 'candidate',
  explain: true,
  size: size,
  from: from,
  body: {
    _source: {
      includes: ['firstName', 'middleName', 'lastName']
    },
    query: {
      bool: {
        must: [/* Left out */],
        should: [/* Left out */],
      }
    }
  }
};
Henry's answer is good, but I think it is easier to do:
function_score: {
  query: {
    ...
  },
  random_score: {
    seed: 12345678910,
    field: '_seq_no'
  },
  // weight is a sibling of random_score, not a parameter inside it
  weight: 0.0001,
  boost_mode: 'sum'
}
So there is no need to boost the original score, just weight the random score down so that it contributes little (but still enough to break ties).
I dislike this approach to breaking ties, though: even if the random value contributes only a little to the score, it can still change the order of results whose scores are not equal but very close. This is why I opened this feature request.
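To make that concern concrete, here is a small arithmetic sketch with made-up numbers. It assumes `random_score` yields a value from 0 up to (but not including) 1, which is multiplied by the weight and added to the original score (`boost_mode: 'sum'`). The scores and random values are hard-coded for illustration and do not come from Elasticsearch.

```javascript
const weight = 0.0001;

// Hypothetical documents: A scores slightly higher than B before tie-breaking.
const docs = [
  { id: "A", score: 3.20005, random: 0.1 },
  { id: "B", score: 3.2, random: 0.9 }
];

// Final score = original score + weight * random value, sorted descending.
const ranked = docs
  .map(d => ({ id: d.id, finalScore: d.score + weight * d.random }))
  .sort((x, y) => y.finalScore - x.finalScore);

console.log(ranked.map(d => d.id));
```

Here B ends up ahead of A even though its original score was lower, because the gap between the two scores is smaller than the weight.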
You could use a function_score query: wrap your bool query in it and add a random_score function. The next step is to find a weighting that matches your needs using "boost" and "boost_mode" or "weight".
Note that if you use only filters the output score will be 0, so you will need to change the "boost_mode" from "multiply" to "replace", "sum" or something else.
Finally, don't forget to add a seed (and a field as of ES 7.0) to the random_score to keep pagination near-consistent.
From your example I would suggest something like :
let params = {
  ...
  body: {
    ...
    query: {
      function_score: {
        query: {
          bool: {
            must: [/* Left out */],
            should: [/* Left out */],
            boost: 100
          }
        },
        random_score: {
          seed: 12345678910,
          field: '_seq_no'
        },
        boost_mode: 'sum'
      }
    }
  }
};

Treat multiple clauses in Elasticsearch query as distinct

I have an Elasticsearch query with a 'Should' clause of the following format. The intention is to search for multiple query strings with a single request:
[
{ match: { "name": { query: "Candied Apples" } } },
{ match: { "name": { query: "Canned Pears" } } }
]
As per https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-query-strings.html ES is combining these clauses, so a document named 'Canned Apples and Pears' is getting a higher score than 'Canned Pears', even though 'Canned Pears' is an exact match of one of the query strings. Is there a better way to structure my query so that each clause is evaluated separately?
To be clear, I would want a document with the name 'Canned Apples and Pears' to be returned as part of the search example above, but it should have a lower score than any documents named "Candied Apples" or "Canned Pears", because it does not match any of the search clauses exactly. This means a minimum_should_match value of 100% is not appropriate.
Full disclosure - I'm new to ES!
One approach which seems to achieve the scoring structure I am looking for, is to combine "match_phrase" and "match" for each search term into a single query, e.g:
[
{match_phrase: { 'name': { query: 'Candied Apples' } } },
{match_phrase: { 'name': { query: 'Canned Pears' } } },
{match: { 'name': { query: 'Candied Apples' } } },
{match: { 'name': { query: 'Canned Pears' } } }
]
This means that all full and partial matches are returned, but any successful phrase matches will have a higher score than the results of the 'match' clauses.
Please note that for this to work, the 'match_phrase' clauses must be listed first.
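As a rough sketch, the clause list above can be generated from any list of search strings. The helper name `buildShouldClauses` is made up for this example, and wrapping the result in `bool.should` is an assumption about how the clauses are used in the full request.

```javascript
// Builds one match_phrase and one match clause per search string,
// with the match_phrase clauses listed first as in the answer above.
function buildShouldClauses(field, searchStrings) {
  const phraseClauses = searchStrings.map(s => ({
    match_phrase: { [field]: { query: s } }
  }));
  const matchClauses = searchStrings.map(s => ({
    match: { [field]: { query: s } }
  }));
  return [...phraseClauses, ...matchClauses];
}

const nameSearch = {
  query: {
    bool: {
      should: buildShouldClauses("name", ["Candied Apples", "Canned Pears"])
    }
  }
};

console.log(JSON.stringify(nameSearch, null, 2));
```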

How to query two different fields with different query terms in same request. ElasticSearch 5.x

I'm new to ElasticSearch and starting to love it. I am working on a Rails application (using elasticsearch-rails / elasticsearch-model).
I have two fields, both strings consisting of tags:
about_me & about_you
Now I want to query the about_you of other users with the current user's about_me.
At the same time, I wish to query the about_me of the other users with the about_you of the current user.
Does this make sense? Like two fields, two queries and each query is aimed at a particular field.
I just need a hint on how this can be achieved in ES. For the sake of completeness, here is the partial method I created in my Rails model (it is incomplete):
def home_search(query_you, query_me)
  search_definition =
    {
      query: {
        multi_match: {
          query: query_me,
          fields: ['about_you']
        }
        ..... SOMETHINGs MISSING HERE ..... ?
      },
      suggest: {
        text: query,
        about_me: {
          term: {
            size: 1,
            field: :about_me
          }
        },
        about_you: {
          term: {
            size: 1,
            field: :about_you
          }
        }
      }
    }
  self.class.__elasticsearch__.search(search_definition)
end
Any help, link or donations are welcome. Thank you!
I'm not sure I've understood your question but I can suggest two options:
First, use a bool query with should clauses and minimum_should_match=1. In this case you can write two queries, one for each of your searches. If you want to distinguish between the results, you can pass a _name parameter in each query, something like this:
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "multi_match": {
            "query": "query_me",
            "fields": [
              "about_you"
            ],
            "_name": "about_you"
          }
        },
        {
          "multi_match": {
            "query": "query_you",
            "fields": [
              "about_me"
            ],
            "_name": "about_me"
          }
        }
      ]
    }
  }
}
By providing _name you can see which queries matched each hit in your search result.
The second approach could be a _msearch request, in which you pass multiple queries to the endpoint and get all the results back in one response.
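A minimal sketch of building a _msearch body, assuming a hypothetical index called `users` and made-up tag strings. The body is newline-delimited JSON: each search is a header line followed by a query line, and the body must end with a newline. The helper name `buildMsearchBody` is illustrative and not part of any client library.

```javascript
// Turns a list of { index, query } pairs into an NDJSON _msearch body.
function buildMsearchBody(searches) {
  return searches
    .flatMap(({ index, query }) => [
      JSON.stringify({ index }),  // header line
      JSON.stringify({ query })   // search body line
    ])
    .join("\n") + "\n";           // trailing newline is required
}

const index = "users"; // hypothetical index name

const ndjson = buildMsearchBody([
  // Current user's about_me tags searched against other users' about_you.
  { index, query: { match: { about_you: "hiking cooking" } } },
  // Current user's about_you tags searched against other users' about_me.
  { index, query: { match: { about_me: "music travel" } } }
]);

console.log(ndjson);
```

The responses come back in the same order as the searches, so the two result sets stay separate instead of being merged into one scored list.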
Here are some useful links:
Bool Query
Named Queries

Search for multiple incomplete words with Elasticsearch

I have a database of records, each of which has a right and a left field, and both these fields contain text. The database is indexed with Elasticsearch.
I want to search through both fields of these records and find the records in which either field contains two or more words starting with given prefixes. The search should be specific enough to find only the records that contain all words in the query, not just some of them.
For example, a query qui bro should return the record containing the sentence The quick brown fox jumped over the lazy dog, but not the one containing the sentence The quick fox jumped over the lazy dog.
I've seen a description of how to perform prefix queries with Elasticsearch (and can reproduce it when searching for one word in one field).
I've also seen a description of how to perform multi-match queries to search through several fields at once.
But what I need is some combination of these techniques, which would allow me both to search through several fields at once, and to look only for parts of words. And to get only those records that have all the words whose parts are contained in the query.
How can I do that? Any method will do (prefixes, ngrams, whatever).
(P.S.: My question may, to a certain extent, be a duplicate of this one, but since it never was answered, I hope I'm not breaking any rules by asking mine.)
======================================
UPDATED:
Oh, I might have the first part of the question. Here is the syntax that seems to work in my Rails app (using elasticsearch-rails gem):
response = Paragraph.search query: {bool: { must: [ { prefix: {right: "qui"}}, {prefix: {right: "bro"}} ] } }
Or, to re-write it in pure Elasticsearch syntax:
{
"bool": {
"must": [
{ "prefix": { "right": "qui" }},
{ "prefix": { "right": "bro" }}
]
}
}
So my updated question now is how to combine this prefix search with a multi_match search (to search through both the right and the left field).
OK, here is a possible answer that seems to work. The code has to search through multiple fields for several incomplete words and return only the records that contain all these words.
Here is the request written in elasticsearch-rails syntax:
response = Paragraph.search query: {bool: { must: [ { multi_match: { query: "qui", type: "phrase_prefix", fields: ["right", "left"]}}, { multi_match: { query: "brow", type: "phrase_prefix", fields: ["right", "left"]}}]}}
Or, re-written in the syntax that is used on Elasticsearch site:
{query:
{bool:
{ must:
[
{ multi_match:
{
query: "qui",
type: "phrase_prefix",
fields: ["right", "left"]
}
},
{ multi_match:
{
query: "brow",
type: "phrase_prefix",
fields: ["right", "left"]
}
}
]
}
}
}
This seems to work. But if somebody has other solutions (particularly if these solutions will make the search case-insensitive), I will be happy to hear them.
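As a sketch, the same request can be generated for any number of partial words by splitting the input on whitespace and emitting one multi_match clause per word. The helper name `buildPrefixQuery` is made up for this example.

```javascript
// Builds a bool/must query with one phrase_prefix multi_match per word,
// so a record is returned only if every partial word matches some field.
function buildPrefixQuery(input, fields) {
  const words = input.trim().split(/\s+/).filter(w => w.length > 0);
  return {
    query: {
      bool: {
        must: words.map(word => ({
          multi_match: { query: word, type: "phrase_prefix", fields }
        }))
      }
    }
  };
}

const prefixSearch = buildPrefixQuery("qui bro", ["right", "left"]);
console.log(JSON.stringify(prefixSearch, null, 2));
```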

Elasticsearch flexible search

Does anyone have experience with Elasticsearch and getting the searches to be more flexible?
Currently, if I have a query "House" it will return the correct items. But if "Hous" is typed in, nothing gets returned. Also, if I search for "O.J." it will return O.J., but if I search for OJ I get nothing.
Use prefix matching:
bool: {
  must: [
    {
      multi_match: {
        query: "your text query",
        type: "phrase_prefix",
        max_expansions: 4,
        fields: ["field1", "field2"]
      }
    }
  ]
}
You can also add fuzziness. This allows matches on terms that differ from the query by a few characters, but may yield less accurate results.
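A rough sketch of combining the two, under some assumptions: as far as I know the `fuzziness` parameter is not accepted together with the `phrase_prefix` type, so the fuzzy match goes in a separate multi_match clause with the default type, and both are placed in a `should` so that either can produce a hit. The field names are the same placeholders as above.

```javascript
const searchText = "Hous";
const fields = ["field1", "field2"]; // placeholder field names

const flexibleSearch = {
  query: {
    bool: {
      should: [
        // Prefix matching, as in the snippet above.
        {
          multi_match: {
            query: searchText,
            type: "phrase_prefix",
            max_expansions: 4,
            fields
          }
        },
        // Fuzzy matching in a separate clause (default multi_match type).
        {
          multi_match: {
            query: searchText,
            fuzziness: "AUTO",
            fields
          }
        }
      ],
      minimum_should_match: 1
    }
  }
};

console.log(JSON.stringify(flexibleSearch, null, 2));
```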
