Is it possible to ensure n-matches to a search in Elasticsearch? - elasticsearch

I am trying to figure out how to implement the following in Elasticsearch, and feel as though I have read documentation before on how to do it but can no longer find it.
I have 3 fields I will be searching on; profileIds, title, and description.
For profile ids I'm just searching for an exact term match, which is trivial enough.
I will have a list of phrases to match against title and description, but I only want a document to match if there are 3 or more matches in total between any keyword and the title or description (it doesn't have to be the same keyword or the same field).
I get that I should have a nested Or query setup like so: (matches profile ids, (has 3 matches on title OR description for any of the keywords)) but the part I am struggling with is saying "3 matches".
Is this possible in Elasticsearch?

You can try a bool query combined with terms queries, using minimum_should_match.
Example:
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "profile_ids": {
              "value": "42"
            }
          }
        }
      ],
      "should": [
        {
          "terms": {
            "title": ["foo", "bar", "baz"],
            "minimum_should_match": 3
          }
        },
        {
          "terms": {
            "description": ["a", "b", "c", "d"],
            "minimum_should_match": 3
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
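Each terms query matches only when at least 3 of its keywords match. The outer minimum_should_match of 1 then requires at least one of the two terms queries to match, while the profile_ids term must always match. This counts matches per field, so it requires 3 matches within title or 3 within description, not 3 in total across both.
Note that if you have fewer than 3 keywords, the terms query will match only if all of them match.
Also note that minimum_should_match inside a terms query is only accepted by older Elasticsearch versions; later versions removed it.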
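The requirement in the question, at least 3 matches in total across title and description, can also be expressed with a single bool query that has one should clause per (field, phrase) pair. The following is a minimal sketch in Python that only builds the request body as a dict. The function name build_min_matches_query, the sample phrases, and the choice of match_phrase and filter are assumptions for illustration, not part of the original answer.

```python
import json


def build_min_matches_query(profile_id, phrases,
                            fields=("title", "description"), min_matches=3):
    """Return a search body requiring `min_matches` phrase hits in total.

    Hypothetical helper: one match_phrase clause is generated per
    (field, phrase) pair, so hits are counted across both fields.
    """
    should = [
        {"match_phrase": {field: phrase}}
        for field in fields
        for phrase in phrases
    ]
    return {
        "query": {
            "bool": {
                # Exact match on the profile id (field name taken from the question).
                "filter": [{"term": {"profileIds": profile_id}}],
                "should": should,
                # Applies to the should clauses of this bool query.
                "minimum_should_match": min_matches,
            }
        }
    }


body = build_min_matches_query("42", ["foo", "bar", "baz"])
print(json.dumps(body, indent=2))
```

A phrase found in both title and description counts as two matches here, which seems to fit the "any keyword, any field" wording of the question.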

Related

Search multiple fields and output summed score with Elasticsearch

I have multiple fields, e.g. f1, f2, f3, and I want to search each one for a single term and return an aggregated score wherever any field matches. I do not want to search every field with the same terms, only each field with its own term, e.g. f1:t1, f2:t2, f3:t3.
Originally, I was using a bool must query with multi_match, with the terms concatenated as t1 t2 t3 and all fields searched, but the results aren't great. A dis_max query gets better results because I can search each field by its own term, but if, for example, t1 is found in f1 AND t2 in f2, dis_max gives back only the highest of the matching scores. So if I have 3 documents, { "f1": "foo", "f2": "foo" }, { "f1": "foo", "f2": "bar" }, and { "f1": "foo", "f2": "baz" }, and I search for f1:foo and f2:ba, I can still get back the first record, whose f2 is foo, when it was created most recently. What I'm trying to say is that f1 matched foo, so there is a score for that, and f2 matched bar, so the resulting score should be f1.score + f2.score, which always brings that document to the top because it matches both.
I found that I could programmatically build a query that uses query_string, e.g. (limited to two fields for brevity):
GET /_search
{
  "query": {
    "query_string": {
      "query": "(f1:foo OR f1.autocomplete:foo) OR (f2:ba OR f2.autocomplete:ba)"
    }
  }
}
but I need to add a boost to the fields, and this doesn't allow for that. I could also use a dis_max with a set of queries, but I'm really not sure how to aggregate the score in that case.
To put it another way: if I have people data and I want to search by first name and last name, without searching first names by last name or last names by first name, a result that matches both first and last name should rank higher than one that matches only one or the other.
Is there a good or proper way to achieve this? I feel like I've been over a lot of the query API and haven't found anything that fits.
You can use a simple bool query with should clauses:
{
  "query": { "bool": {
    "minimum_should_match": 1,
    "should": [
      { "term": { "f1": "foo" } },
      { "term": { "f2": "ba" } }
    ]
  } }
}
The more clauses a document matches, the higher its score will be.
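To see why should clauses rank a document that matches both fields above one that matches only one, it helps to compare how the two query types combine clause scores: a bool query adds up the scores of its matching should clauses, while dis_max takes the best one and adds tie_breaker times the others. The sketch below imitates this in plain Python with made-up clause scores. It does not call Elasticsearch, and the function names and numbers are purely illustrative.

```python
def bool_should_score(clause_scores):
    """Imitate bool/should scoring: sum the scores of the matching clauses.

    A score of None stands for a clause that did not match.
    """
    matched = [s for s in clause_scores if s is not None]
    return sum(matched)


def dis_max_score(clause_scores, tie_breaker=0.0):
    """Imitate dis_max scoring: best clause plus tie_breaker times the rest."""
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)


# Hypothetical per-clause scores for the f1 and f2 clauses.
both_fields = [2.0, 0.5]   # f1 matched and f2 matched
one_field = [2.0, None]    # only f1 matched

print("bool    :", bool_should_score(both_fields), "vs", bool_should_score(one_field))
print("dis_max :", dis_max_score(both_fields), "vs", dis_max_score(one_field))
```

With these numbers the bool variant separates the two documents, whereas dis_max with the default tie_breaker of 0 scores them the same, which mirrors the behaviour described in the question.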
I was unable to edit the answer provided, so I am posting the solution derived from it here.
GET _search
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "match": {
            "f1": {
              "query": "foo",
              "boost": 1.5
            }
          }
        },
        {
          "match": {
            "f1.autocomplete": {
              "query": "foo",
              "boost": 1.5
            }
          }
        },
        {
          "match": {
            "f2": {
              "query": "ba",
              "boost": 1
            }
          }
        },
        {
          "match": {
            "f2.autocomplete": {
              "query": "ba",
              "boost": 1
            }
          }
        }
      ]
    }
  }
}
This gets me results that meet all of my criteria.
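Since the question mentions building the query programmatically, here is a rough sketch of how a request body of that shape could be generated from a mapping of field to (term, boost). The helper name build_per_field_query and its parameters are hypothetical; the autocomplete subfield name is taken from the query above.

```python
import json


def build_per_field_query(field_terms, subfield="autocomplete"):
    """Build a bool/should body with one boosted match clause per field.

    `field_terms` maps a field name to a (term, boost) pair; each field is
    searched on itself and on its `<field>.<subfield>` variant.
    """
    should = []
    for field, (term, boost) in field_terms.items():
        for name in (field, f"{field}.{subfield}"):
            should.append({"match": {name: {"query": term, "boost": boost}}})
    return {"query": {"bool": {"minimum_should_match": 1, "should": should}}}


body = build_per_field_query({"f1": ("foo", 1.5), "f2": ("ba", 1)})
print(json.dumps(body, indent=2))
```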

Elasticsearch - Impact of adding Boost to query

I have a very simple Elasticsearch query, shown below.
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "minimum_should_match": 1,
            "should": [
              {
                "match": {
                  "tag": {
                    "query": "Audience: PRO Brand: Samsung",
                    "boost": 3,
                    "operator": "and"
                  }
                }
              },
              {
                "match": {
                  "tag": {
                    "query": "audience: PRO brand samsung",
                    "boost": 2,
                    "operator": "or"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
I want to know whether adding a boost to the query has any performance impact, and whether boosting helps on a very large data set where the search word occurs frequently.
Elasticsearch applies a boost parameter with a default value (1.0) anyway, so IMO giving it a different value won't make much difference to performance, but you should be able to measure it yourself.
Regarding your second question, adding a boost definitely makes sense when your search words are common, as it helps you find the relevant documents. For example, suppose you are searching for the word query in an index containing Elasticsearch posts (query will be very common in Elasticsearch posts), but you want to give more weight to documents that have the tag elasticsearch-query. Adding a boost in this case will give you more relevant results.
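As a rough illustration of that example, the sketch below builds a request body in which the common word must match and a boosted optional clause favours documents carrying the tag. The field names body and tag, the boost value, and the helper name build_boosted_tag_query are assumptions made up for this sketch.

```python
import json


def build_boosted_tag_query(text, tag, boost=2.0,
                            text_field="body", tag_field="tag"):
    """Require `text` to match and give tagged documents extra weight."""
    return {
        "query": {
            "bool": {
                # Every hit must contain the (common) search word.
                "must": [{"match": {text_field: text}}],
                # Optional clause: documents with the tag score higher.
                "should": [
                    {"term": {tag_field: {"value": tag, "boost": boost}}}
                ],
            }
        }
    }


body = build_boosted_tag_query("query", "elasticsearch-query")
print(json.dumps(body, indent=2))
```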

ElasticSearch Ignoring words having one single letter

I'm a beginner with Elasticsearch. I have an application that uses Elasticsearch to look up the ingredients in a given food or fruit.
I'm facing a scoring problem when the user types, for example, "Vitamine d".
Elasticsearch gives the best score to "vitamine", even though "Vitamine D" exists and should normally have the highest score.
It seems that if the second word ("d" in my case) is just one letter, Elasticsearch ignores it.
I tried another example, "vitamine b12", and got the correct scoring.
Here is the query that the application sends to the server:
{
  "from": 0,
  "size": 5,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "constNomFr": {
              "query": "vitamine d"
            }
          }
        }
      ],
      "should": [
        {
          "prefix": {
            "constNomFr": {
              "value": "vitamine d",
              "boost": 2
            }
          }
        }
      ]
    }
  },
  "_source": {
    "excludes": [
      "alimentDtos"
    ]
  }
}
What could I modify to make it work?
Thank you so much.
If you can identify your ingredients, I recommend indexing them in a separate field, "ingredients", with its type set to keyword. This way you can use a term filter and you can even run aggregations.
You may already have your documents indexed that way; in that case, if you are using the default mapping, just run your query against your_field_name.keyword.
If you don't have your ingredients indexed as an array, then you should take a look at the Elasticsearch analyzers to choose or build the right one.
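A possible shape for the second suggestion is sketched below: the original match clause stays, and an exact term clause against the keyword subfield boosts documents whose whole value equals the input. This assumes the index uses the default dynamic mapping, so that constNomFr.keyword exists. Keyword matching is exact and case-sensitive unless a normalizer is configured, so the stored value would have to equal the user's input exactly. The helper name is hypothetical.

```python
import json


def build_exact_boost_query(text, field="constNomFr", boost=2, size=5):
    """Match on the analyzed field and boost exact matches on `<field>.keyword`."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {field: {"query": text}}}],
                "should": [
                    # Assumed subfield created by the default dynamic mapping.
                    {"term": {f"{field}.keyword": {"value": text, "boost": boost}}}
                ],
            }
        },
        "_source": {"excludes": ["alimentDtos"]},
    }


body = build_exact_boost_query("Vitamine D")
print(json.dumps(body, indent=2))
```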

Boost search results based on tags

I have ElasticSearch with a bunch of TV episodes indexed.
Each episode is tagged with an array of tags describing the key features of the content.
Now I want to implement a "similar to" functionality where I want to search all episodes that have a maximum overlap of tags (but not necessarily all) for a given episode.
Example:
Original Episode Tags: ["a","b","c","d"]
Some Other Episode 1: ["a","b"] // should match, 2 matching tags
Some Other Episode 2: ["a","b","c","x","y"] // should match higher, 3 matching tags
Some Other Episode 3: ["a"] // should match lower, only 1 matching tag
Some Other Episode 4: ["e","f","g"] // shouldn't match, no matching tags
I tried using a boolean query with a should clause but the problem is that once I reach the minimum_should_match requirement the document matches and the rest of the clauses seem to be ignored from the score calculation.
You can use a terms query/filter with a specified boost:
"query": {
  "bool": {
    "must": {
      "terms": {
        "Tags": ["a", "b", "c", "d"],
        "boost": 1
      }
    }
  }
}
So the score for Episode 1 would be 2, for Episode 2 it would be 3, and so on.
I think I found the (a) right way to do it:
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            { "term": { "tags": "a" } },
            { "term": { "tags": "b" } },
            { "term": { "tags": "c" } }
          ]
        }
      },
      "functions": [
        { "filter": { "term": { "tags": "a" } }, "weight": 5 },
        { "filter": { "term": { "tags": "b" } }, "weight": 5 },
        { "filter": { "term": { "tags": "c" } }, "weight": 5 }
      ]
    }
  }
}
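The should clause makes sure that at least one tag matches on the matching document, whereas the functions clause boosts the score of the matching document by a factor of 5 for every matching tag. Note that function_score combines the function results by multiplication by default (score_mode: multiply), so two matching tags give a factor of 25; set score_mode to sum if the weights should be added instead.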
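If the goal is a score that simply grows with the number of shared tags, one option is to add the weights instead of multiplying them and to let the function result replace the query score. The sketch below builds such a body using the score_mode and boost_mode options of function_score, and includes a small local overlap count to show the ranking it aims for. The helper names and the weight of 1 are assumptions for illustration.

```python
import json


def build_similar_tags_query(tags, weight=1, field="tags"):
    """Build a function_score body whose score is `weight` per matching tag."""
    tag_filters = [{"term": {field: tag}} for tag in tags]
    return {
        "query": {
            "function_score": {
                # At least one tag has to match for a document to be returned.
                "query": {"bool": {"should": tag_filters, "minimum_should_match": 1}},
                "functions": [{"filter": f, "weight": weight} for f in tag_filters],
                "score_mode": "sum",      # add the weights of the matching functions
                "boost_mode": "replace",  # ignore the score of the inner query
            }
        }
    }


def overlap(tags, other_tags):
    """Number of shared tags, i.e. the ranking the query is meant to produce."""
    return len(set(tags) & set(other_tags))


original = ["a", "b", "c", "d"]
print(json.dumps(build_similar_tags_query(original), indent=2))
print(overlap(original, ["a", "b", "c", "x", "y"]))
```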

How to limit the results in a multi match query?

I use multiple match_phrase queries when I search. However, I have to limit the results of each match phrase separately. I mean, I want to take only 2 results for each match phrase. I can't find any limit/size attribute. Do you know of a solution?
Example code:
"query": {
  "bool": {
    "should": [
      {
        "match_phrase": {
          "text": {
            "query": " Home is clear and big ",
            "slop": 2
          }
        }
      },
      {
        "match_phrase": {
          "text": {
            "query": "365 different company use our system in test",
            "slop": 2
          }
        }
      }
    ]
  }
}
Use the size parameter:
{"size": 3, "from": 0, "query": ...}
The simplest solution is to make two individual searches, one for each condition. The size parameter can be set to retrieve only the first 2 results for each query.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html
The boolean should query will not distinguish which condition has been satisfied: it returns documents for which at least one of the two conditions holds. The scores for the two matches will be combined into a single score but it will be impossible to tell which s
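The "one search per phrase" approach can be sent in a single round trip with the multi search (_msearch) API, which takes newline-delimited JSON: a header line followed by a body line for each search. The sketch below only builds that payload as a string. The index name my-index and the helper name are placeholders, not taken from the question.

```python
import json


def build_msearch_payload(phrases, index="my-index", field="text",
                          size=2, slop=2):
    """Build an _msearch body with one size-limited match_phrase search per phrase."""
    lines = []
    for phrase in phrases:
        # Header line: which index this search runs against (placeholder name).
        lines.append(json.dumps({"index": index}))
        # Body line: the search itself, limited to `size` hits.
        lines.append(json.dumps({
            "size": size,
            "query": {"match_phrase": {field: {"query": phrase, "slop": slop}}},
        }))
    # The payload has to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_msearch_payload([
    "Home is clear and big",
    "365 different company use our system in test",
])
print(payload)
```

Each response in the result then corresponds to one phrase, so it stays clear which condition every hit satisfied.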

Resources