Treat multiple clauses in Elasticsearch query as distinct - elasticsearch

I have an Elasticsearch query with a 'Should' clause of the following format. The intention is to search for multiple query strings with a single request:
[
{ match: { "name": { query: "Candied Apples" } } },
{ match: { "name": { query: "Canned Pears" } } }
]
As per https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-query-strings.html ES is combining these clauses, so a document named 'Canned Apples and Pears' is getting a higher score than 'Canned Pears', even though 'Canned Pears' is an exact match of one of the query strings. Is there a better way to structure my query so that each clause is evaluated separately?
To be clear, I would want a document with the name 'Canned Apples and Pears' to be returned as part of the search example above, but it should have a lower score than any documents named "Candied Apples" or "Canned Pears", because it does not match any of the search clauses exactly. This means a minimum_should_match value of 100% in not appropriate.
Full disclosure - I'm new to ES!

One approach which seems to achieve the scoring structure I am looking for, is to combine "match_phrase" and "match" for each search term into a single query, e.g:
[
{match_phrase: { 'name': { query: 'Candied Apples' } } },
{match_phrase: { 'name': { query: 'Canned Pears' } } },
{match: { 'name': { query: 'Candied Apples' } } },
{match: { 'name': { query: 'Canned Pears' } } }
]
This means that all full and partial matches are returned, but any successful phrase matches will have a higher score than the results of the 'match' clauses.
Please note that for this to work, the 'match_phrase' clauses must be listed first.

Related

Atlas Search Index partial match

I have a test collection with these two documents:
{ _id: ObjectId("636ce11889a00c51cac27779"), sku: 'kw-lids-0009' }
{ _id: ObjectId("636ce14b89a00c51cac2777a"), sku: 'kw-fs66-gre' }
I've created a search index with this definition:
{
"analyzer": "lucene.standard",
"searchAnalyzer": "lucene.standard",
"mappings": {
"dynamic": false,
"fields": {
"sku": {
"type": "string"
}
}
}
}
If I run this aggregation:
[{
$search: {
index: 'test',
text: {
query: 'kw-fs',
path: 'sku'
}
}
}]
Why do I get 2 results? I only expected the one with sku: 'kw-fs66-gre' 😬
During indexing, the standard anlyzer breaks the string "kw-lids-0009" into 3 tokens [kw][lids][0009], and similarly tokenizes "kw-fs66-gre" as [kw][fs66][gre]. When you query for "kw-fs", the same analyzer tokenizes the query as [kw][fs], and so Lucene matches on both documents, as both have the [kw] token in the index.
To get the behavior you're looking for, you should index the sku field as type autocomplete and use the autocomplete operator in your $search stage instead of text
You're still getting 2 results because of the tokenization, i.e., you're still matching on [kw] in two documents. If you search for "fs66", you'll get a single match only. Results are scored based on relevance, they are not filtered. You can add {$project: {score: { $meta: "searchScore" }}} to your pipeline and see the difference in score between the matching documents.
If you are looking to get exact matches only, you can look to using the keyword analyzer or a custom analyzer that will strip the dashes, so you deal w/ a single token per field and not 3

min_score excluding documents with higher scores

I have a trove of several million documents which I'm querying like this:
const query = {
min_score: 1,
query: {
bool: {
should: [
{
multi_match: {
query: "David",
fields: ["displayTitle^2", "synopsisList.text"],
type: "phrase",
slop: 2
}
},
{
nested: {
path: "contributors",
query: {
multi_match: {
query: "David",
fields: [
"contributors.characterName",
"contributors.contributionBy.displayTitle"
],
type: "phrase",
slop: 2
}
},
score_mode: "sum"
}
}
]
}
}
};
This query is giving sane looking results for a wide range of terms. However, it has a problem with "David" - and presumably others.
"David" crops up fairly regularly in the text. With the min_score option this query always returns 0 documents. When I remove min_score I get thousands of documents the best of which has a score of 22.749.
Does anyone know what I'm doing wrong? I guess min_score doesn't work the way I think it does.
Thanks
The problem I was trying to solve was that when I added some filter clauses to the above query elastic would return all the documents that satisfied the filter even those with a score of zero. That's how should works. I didn't realise that I can nest the should inside a must which achieves the desired effect.

Elasticsearch search result relevance issue

Why does match query return less relevant results first? I have an index field named normalized. Its mapping is:
normalized: {
type: "text"
analyzer: "autocomplete"
}
settings for this field are:
analysis; {
filter: {
autocomplete_filter: {
type: "edge_ngram",
min_gram => "1",
max_gram => "20"
}
analyzer: {
autocomplete: {
filter: [
"lowercase",
"asciifolding",
"autocomplete_filter"
],
type: "custom",
tokenizer: "standard"
}
}
so as I know it makes an ascii, lowercase, tokens e.g. MOUSE = m, mo, mou, mous, mouse.
The problem is that request like:
{
'query': {
'bool': {
'must': {
'match': {
'normalized': 'simag'
}
}
}
}
}
returns results like
"siman siman service"
"mgr simona simunkova simiki"
"Siman - SIMANS"
"simunek simunek a simunek"
.....
But there is no SIMAG which contains all the letters of the match phrase.
How to achieve that most relevant result will be the words which contains all the letters before the tokens which does not contain all letters.
Hope somebody understand what I need.
Thanks.
PS: I am not sure but what about this query:
{
'query': {
'bool': {
'should': [
{'term': {'normalized': 'simag'}},
{'match': {'normalized': 'simag'}}
]
}
}
}
Does it make sense in comparison to previous code?
Please note that match query is analyzed, which means the same analyzer is used at the query time, which was used at the index time for the field you mentioned in your query.
In your case, you applied autocomplete analyzer on your normalized field and as you mentioned, it generates below token for MOUSE :
MOUSE = m, mo, mou, mous, mouse.
In similar way, if you search for mouse using the match query on the same field, it would search for below query strings :-
m, mo, mou, mous, mouse .. hence results which contain the words like mousee or mouser will also come as during index .. it created tokens which matches with the tokens generated on the search term.
Read more about match query on Elastic site https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html first line itself explains your search results
match queries accept text/numerics/dates, analyzes them, and
constructs a query:
If you want to go deep and understand, how your search query is matching the documents and its score use explain API
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html

Scope Elasticsearch Results to Specific Ids

I have a question about the Elasticsearch DSL.
I would like to do a full text search, but scope the searchable records to a specific array of database ids.
In SQL world, it would be the functional equivalent of WHERE id IN(1, 2, 3, 4).
I've been researching, but I find the Elasticsearch query DSL documentation a little cryptic and devoid of useful examples. Can anyone point me in the right direction?
Here is an example query which might work for you. This assumes that the _all field is enabled on your index (which is the default). It will do a full text search across all the fields in your index. Additionally, with the added ids filter, the query will exclude any document whose id is not in the given array.
{
"bool": {
"must": {
"match": {
"_all": "your search text"
}
},
"filter": {
"ids": {
"values": ["1","2","3","4"]
}
}
}
}
Hope this helps!
As discussed by Ali Beyad, ids field in the query can do that for you. Just to complement his answer, I am giving an working example. In case anyone in the future needs it.
GET index_name/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"field": "your query"
}
},
{
"ids" : {
"values" : ["0aRM6ngBFlDmSSLpu_J4", "0qRM6ngBFlDmSSLpu_J4"]
}
}
]
}
}
}
You can create a bool query that contains an Ids query in a MUST clause:
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-ids-query.html
By using a MUST clause in a bool query, your search will be further limited by the Ids you specify. I'm assuming here by Ids you mean the _id value for your documents.
According to es doc, you can
Returns documents based on their IDs.
GET /_search
{
"query": {
"ids" : {
"values" : ["1", "4", "100"]
}
}
}
With elasticaBundle symfony 5.2
$query = new Query();
$IdsQuery = new Query\Ids();
$IdsQuery->setIds($id);
$query->setQuery($IdsQuery);
$this->finder->find($query, $limit);
You have two options.
The ids query:
GET index/_search
{
"query": {
"ids": {
"values": ["1, 2, 3"]
}
}
}
or
The terms query:
GET index/_search
{
"query": {
"terms": {
"yourNonPrimaryIdField": ["1", "2","3"]
}
}
}
The ids query targets the document's internal _id field (= the primary ID). But it often happens that documents contain secondary (and more) IDs which you'd target thru the terms query.
Note that if your secondary IDs contain uppercase chars and you don't set their field's mapping to keyword, they'll be normalized (and lowercased) and the terms query will appear broken because it only works with exact matches. More on this here: Only getting results when elasticsearch is case sensitive

Boost a document based on the existence of a field

Is it possible to boost a document's relevance based on the presence of a field? I've read about function score queries but I'm wondering how existence is taken into account - from my understanding, the field_value_factor applies to the content of the field, not on its presence.
Function score query is a possibility, however a score function is computationally expensive and not necessary (keep in mind exists query used below may not have been available in 2014).
Can do the following:
POST _search
{
"query": {
"bool" : {
"should" : [
{
match_all: {
"boost": 10
}
},
{
"exists": {
"field": "some_field_that_should_exist"
}
}
],
"minimum_should_match" : 2,
}
}
}
With a minimum should match of 2 we say that both clauses must match in order for the should clause to match. This prevents the boost from being applied to documents that do not have the field.
This can be simplified. Exists is a constant score query, boost can be used with it directly.
{
"query":{
"exists":{
"field":"some_field_that_should_exist",
"boost":10
}
}
}

Resources