Search multiple fields and output summed score with Elasticsearch

I have multiple fields, e.g. f1, f2, f3, and I want to search each one for its own single term and return the aggregated score whenever any field matches. I do not want to search every field for the same terms, only each field for its own term, e.g. f1:t1, f2:t2, f3:t3.
Originally, I was using a bool query with a must clause containing a multi_match, with the terms concatenated as "t1 t2 t3" and all fields searched, but the results weren't great. A dis_max query gives better results because I can search each field for its own term, but if, for example, t1 is found in f1 AND t2 in f2, dis_max gives back only the highest of the resulting scores. So if I have 3 documents, { "f1": "foo", "f2": "foo" }, { "f1": "foo", "f2": "bar" }, { "f1": "foo", "f2": "baz" }, and I search for f1:foo and f2:ba, I can still get back the first record (with f2 of foo) on top if it was created most recently. What I'm trying to say is: f1 matched foo, so there's a score related to that, and f2 matched bar, so the resulting score should be f1.score + f2.score, which always brings that document to the top because it matches both.
I've found that I could programmatically build a query that uses query_string, e.g. (limited to two fields for brevity):
GET /_search
{
"query": {
"query_string": {
"query": "(f1:foo OR f1.autocomplete:foo) OR (f2:ba OR f2.autocomplete:ba)"
}
}
}
but I need to add a boost to the fields and this doesn't allow for that. I could also use a dis_max with a set of queries, but I'm really not sure how to aggregate score in that case.
To put it another way, what I'm trying to do is this: if I have people data and I want to search by first name and last name, without searching first names by the last name or last names by the first name, a result that matches both first and last name should rank higher than one that matches only one or the other.
Is there a better/good/proper way to achieve this? I feel like I've been over a lot of the query API and haven't found anything that fits well.

You can use a simple bool query with should clauses:
{ "query": { "bool": {
"minimum_should_match": 1,
"should": [
{ "term": { "f1": "foo" } },
{ "term": { "f2": "ba" } }
]
} } }
The more clauses a document matches, the higher its score will be.
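To make the difference between the two approaches concrete, here is a toy comparison in Python of how dis_max and bool/should combine per-clause scores. The helper names (dis_max_score, bool_should_score) and the clause scores are made up for illustration; real scores come from Elasticsearch's relevance model, and the f2 match on "ba" is assumed to come from the autocomplete sub-field mentioned in the question.

```python
# Toy comparison of how dis_max and bool/should combine per-clause scores.
# The clause scores are hypothetical numbers, not real Elasticsearch output.

def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score, plus tie_breaker times the other matching scores."""
    matched = [s for s in clause_scores if s > 0]
    if not matched:
        return 0.0
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)


def bool_should_score(clause_scores):
    """Sum of the scores of all matching should clauses."""
    return sum(s for s in clause_scores if s > 0)


# Hypothetical scores for the clauses f1:foo and f2:ba (0 means no match).
docs = {
    "foo/foo": [1.0, 0.0],
    "foo/bar": [1.0, 0.8],
    "foo/baz": [1.0, 0.8],
}

for name, scores in docs.items():
    print(name, dis_max_score(scores), bool_should_score(scores))
```

With the default tie_breaker of 0, dis_max gives all three documents the same score, so their order is decided by something else, while the summed should score puts the two documents that match both fields ahead.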

I was unable to edit the answer provided, so I'm posting the solution that was derived from the other answer here.
GET _search
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"f1": {
"query": "foo",
"boost": 1.5
}
}
},
{
"match": {
"f1.autocomplete": {
"query": "foo",
"boost": 1.5
}
}
},
{
"match": {
"f2": {
"query": "ba",
"boost": 1
}
}
},
{
"match": {
"f2.autocomplete": {
"query": "ba",
"boost": 1
}
}
}
]
}
}
}
This gets me results that meet all of my criteria.
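Since the question mentions building the query programmatically, here is a minimal Python sketch that generates the same request body from a mapping of field to (term, boost). The function name build_field_query is hypothetical, and the ".autocomplete" sub-field is assumed to exist in the mapping, as it does in the question.

```python
import json


def build_field_query(field_terms):
    """Build a bool/should query from {field: (term, boost)}.

    Each field is searched only for its own term, on the field itself and
    on its ".autocomplete" sub-field (assumed to exist in the mapping).
    """
    should = []
    for field, (term, boost) in field_terms.items():
        for name in (field, field + ".autocomplete"):
            should.append({"match": {name: {"query": term, "boost": boost}}})
    return {"query": {"bool": {"minimum_should_match": 1, "should": should}}}


query = build_field_query({"f1": ("foo", 1.5), "f2": ("ba", 1)})
print(json.dumps(query, indent=2))
```

The printed JSON can be sent as the body of a _search request; adding a third field only means adding one more entry to the dictionary.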

Related

What is the difference between should and boost final score calculation?

I'm a little confused about the difference between should and boost in the final score calculation.
When a bool query has a must clause, the should clauses act as a boost factor: none of them has to match, but if they do, the relevancy score for that document is boosted and it appears higher in the results.
So, if we have:
one query which contains must and should clauses
vs
a second query which contains a must clause and a boosting clause
Is there a difference?
When would you recommend using must and should vs. must and boosting clauses in a query?
You can read the documentation of the boolean query here; there is a huge difference between should and boost.
Both should and must contribute to the _score of the document and, as mentioned in the documentation above:
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
Boost, by contrast, is a parameter with which you can increase the weight of a clause by a value you choose. Let me explain with an example.
Index sample docs
POST _doc/1
{
"brand" : "samsung",
"name" : "samsung phone"
}
POST _doc/2
{
"brand" : "apple",
"name" : "apple phone"
}
Boolean Query using should without boost
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple"
}
}
}
]
}
}
}
Search result showing score
"max_score": 1.3862942,
Now in same query use boost of factor 10
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple",
"boost": 10
}
}
}
}
]
}
}
}
Query result showing boost
"max_score": 7.624619
Note the considerably higher score.
In short, when you want documents that match a particular clause to rank higher, you can additionally pass the boost parameter to that clause; it is applied on top of the normal score calculated by should or must.
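The two max_score values quoted above can be related with a little arithmetic. The sketch below assumes that both should clauses contribute the same score for the "apple" document in the unboosted query; that is an inference from the numbers, not something Elasticsearch reported.

```python
# Back-of-the-envelope check of the two max_score values quoted above.
# Assumption: both should clauses contribute the same score for the
# "apple" document, so each contributes half of the unboosted total.
unboosted_total = 1.3862942
per_clause = unboosted_total / 2  # about 0.6931471

# A boost of 10 on the "brand" clause scales only that clause's score;
# the "name" clause keeps its implicit boost of 1.
boosted_total = per_clause * 1 + per_clause * 10

print(round(per_clause, 7), round(boosted_total, 5))
```

Under that assumption the boosted total comes out at about 7.6246, which agrees with the reported 7.624619: the clause scores are still summed, and the boost multiplies one of the summands.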

Must match multiple values

I have a query that works fine when I need the property of a document
to match just one value.
However I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow
I will get both if I have 1 and 2 in the must clause.
But if I have just 1, I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I haven't found a way to search with two values with must.
Is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
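The default behaviour of should described above can be summarised in a small toy model. This is a Python sketch of the matching rule only (no scoring, and filter clauses are left out); bool_matches is a hypothetical helper, not part of any Elasticsearch client.

```python
def bool_matches(must, should, minimum_should_match=None):
    """Toy model of whether a bool query matches one document.

    `must` and `should` are lists of booleans saying whether each clause
    matched. When minimum_should_match is not given, it defaults to 1 if
    there are should clauses and no must clauses, and to 0 otherwise.
    """
    if minimum_should_match is None:
        minimum_should_match = 1 if should and not must else 0
    return all(must) and sum(should) >= minimum_should_match


# Outer query: the id matches, the color does not. Still a hit, because
# should is optional next to a must clause (it only affects the score).
print(bool_matches(must=[True], should=[False]))

# Inner bool from the answer: no must clause, so at least one of the
# two id clauses has to match.
print(bool_matches(must=[], should=[False, True]))
print(bool_matches(must=[], should=[False, False]))
```

Setting minimum_should_match explicitly removes the dependence on sibling clauses, which is why it is worth stating in queries like the one above.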
Instead of a match query, you could simply try the terms query for ORing between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
A term or terms query is the right way to fetch an exact text or id; a match query analyzes the input and searches within the analyzed content of the id or text.
Ex:
id = '4'
id = '44'
Depending on how the field is analyzed, a match query for id = 4 can return both 4 and 44, since 4 matches in both. This is where the terms query comes into play.
The same search using a terms query will return 4 only.
So the accepted answer is wrong. Use @Rahul's answer. There is just one more thing you need to do: index the field as a keyword instead of as text.
Here is an example of indexing a field both as text and as keyword (the mapping is for a flat field; for a nested one, change it accordingly).
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Using @Rahul's answer as-is may not work, because the field might be analyzed as text.
id - accesses the text field
id.keyword - accesses the keyword field
So it would be:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say the accepted answer will return false results. Please use @Rahul's answer with the corresponding mapping.

Elasticsearch Query Help - Multiple Nested AND/OR

I am struggling with elasticsearch filters. I have a company_office type that looks like this:
{
"company_office_id": 1,
"is_headquarters": true,
"company": {
"name": "Some Company Inc"
},
"attribute_values": [
{
"attribute_id": 1,
"attribute_value": "attribute 1 value"
},
{
"attribute_id": 2,
"attribute_value": "ABC"
},
{
"attribute_id": 3,
"attribute_value": "DEF"
},
{
"attribute_id": 3,
"attribute_value": "HIJ"
}
]
}
Let's assume that attribute_value is not_analyzed - so I can match on it exactly.
Now I want to filter on a combination of multiple attribute_id and value fields. Something like this in SQL:
SELECT *
FROM CompanyOffice c
JOIN Attributes a --omitting the ON here, just assume the join is valid
WHERE
c.is_headquarters = true AND
(
(a.attribute_id=2 AND a.attribute_value IN ('ABC')) OR
(a.attribute_id=3 AND a.attribute_value IN ('DEF','HIJ'))
)
So I need to filter on specific fields + multiple combinations of id/value.
Here is the query I tried:
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"must" : [
{ "term": {"is_headquarters": true } },
{"bool": {
"must":[
{"term": {"attribute_values.attribute_id": 1}},
{"bool": { "should": [{"term": {"attribute_values.attribute_value": "HIJ"}}]}}
]
}}
]
}
}
}
}
}
This query is returning results even though the company_office does not have any id/value pairing of 1/'HIJ'. My thinking here is that because this bool filter is sitting inside of the parent must section, all items must be true:
{"bool": {
"must":[
{"term": {"attribute_values.attribute_id": 1}},
{"bool": { "should": [{"term": {"attribute_values.attribute_value": "HIJ"}}]}}
]
}}
Why would this query return results given the data sample provided at the beginning of the question? Is there a different way to write the filter and accomplish what I am trying to do?
Thanks so much for any help!
If you want to query deeper objects without flattening their structure, you need to set
"type": "nested"
on "attribute_values" property.
Then refer to the documentation on how to write nested queries, and you should correctly retrieve the whole document. Use inner hits to retrieve the matched attribute_values.
By default, Elasticsearch does not nest properties when indexing. All subfields are flattened into separate fields, without the ability to query them by their actual structure. You will not see this effect, because the original document is returned.
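The flattening effect is easy to reproduce outside Elasticsearch. The Python sketch below mimics it with a hypothetical flatten helper (a simplification of what actually happens at index time) and applies the 1/'HIJ' condition from the question to both views of the sample document.

```python
def flatten(doc, prefix=""):
    """Flatten nested objects into dotted field names with lists of values,
    roughly how a non-nested array of objects ends up being indexed."""
    fields = {}
    for key, value in doc.items():
        name = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_name, sub_values in flatten(item, name + ".").items():
                    fields.setdefault(sub_name, []).extend(sub_values)
            else:
                fields.setdefault(name, []).append(item)
    return fields


office = {
    "is_headquarters": True,
    "attribute_values": [
        {"attribute_id": 1, "attribute_value": "attribute 1 value"},
        {"attribute_id": 2, "attribute_value": "ABC"},
        {"attribute_id": 3, "attribute_value": "DEF"},
        {"attribute_id": 3, "attribute_value": "HIJ"},
    ],
}

flat = flatten(office)

# Flattened view: the id and the value are checked independently.
flat_hit = (1 in flat["attribute_values.attribute_id"]
            and "HIJ" in flat["attribute_values.attribute_value"])

# Nested view: both conditions must hold within the same object.
nested_hit = any(a["attribute_id"] == 1 and a["attribute_value"] == "HIJ"
                 for a in office["attribute_values"])

print(flat_hit, nested_hit)  # prints: True False
```

The flattened view matches because id 1 and value 'HIJ' both occur somewhere in the document, which is the unexpected hit described in the question.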
Apart from that, your queries are a bit off. In the last "should" statement you have only 1 term filter, so it's effectively a "must" part; in any case, they will have to be rewritten in the nested format.
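As a rough sketch of what the rewritten request could look like, the Python below builds a query body equivalent to the SQL in the question. It assumes attribute_values is mapped as "type": "nested" and a recent Elasticsearch version in which bool accepts filter clauses (the question itself uses the older filtered query); attribute_clause is a hypothetical helper.

```python
import json


def attribute_clause(attribute_id, values):
    """One nested clause: the id and one of the values on the same object."""
    return {
        "nested": {
            "path": "attribute_values",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"attribute_values.attribute_id": attribute_id}},
                        {"terms": {"attribute_values.attribute_value": values}},
                    ]
                }
            },
        }
    }


query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"is_headquarters": True}},
                {
                    "bool": {
                        # OR between the id/value combinations.
                        "should": [
                            attribute_clause(2, ["ABC"]),
                            attribute_clause(3, ["DEF", "HIJ"]),
                        ],
                        "minimum_should_match": 1,
                    }
                },
            ]
        }
    }
}

print(json.dumps(query, indent=2))
```

Each nested clause is evaluated against one attribute_values object at a time, so an id can no longer pair with a value from a different object.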

Elasticsearch fuzzy matching: How can I get direct hits first?

I'm using Elasticsearch to search names in a database, and I want it to be fuzzy to allow for minor spelling errors. Based on the advice I've found on the matter, I'm using "match" and "fuzziness" instead of "fuzzy", which definitely seems to be more accurate. This is my query:
{ "query":
{ "match":
{ "last_name":
{ "query": "Beach",
"type": "phrase",
"fuzziness": 2
}
}
}
}
However, even though I have numerous results with last_name "Beach" (I know there's at least 100), I also get results with last_name "Beech" and "Berch" in the first 10 hits returned by my query. Can someone help me figure out how to get the exact matches first?
Try changing your query to a boolean query with 2 should queries.
The first one being your current query, and the second being a query that only gives exact matches; then give that one a big boost (like 10.0).
That should get your exact matches on top while still listing your partial matches.
I tried to edit Constantijn's answer above to include a sample based on it, but the edit is still not appearing (pending approval). So I will just put the sample here instead:
{
"query": {
"bool": {
"should": [
{
"match": {
"last_name": {
"query": "Beach",
"fuzziness": 2,
"boost": 1
}
}
},
{
"match": {
"last_name": {
"query": "Beach",
"boost": 10
}
}
}
]
}
}
}
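Why this pushes exact matches to the top can be seen with a toy ranking in Python. The edit_distance and toy_score helpers are hypothetical, and the flat score of 1.0 for a fuzzy hit is a simplification; Elasticsearch's real fuzzy scoring is more involved.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def toy_score(name, query, fuzziness=2, exact_boost=10.0):
    """Sum of two should clauses: a fuzzy one (boost 1) and an exact one."""
    name, query = name.lower(), query.lower()
    fuzzy = 1.0 if edit_distance(name, query) <= fuzziness else 0.0
    exact = exact_boost if name == query else 0.0
    return fuzzy + exact


names = ["Beech", "Berch", "Beach", "Smith"]
ranked = sorted(names, key=lambda n: toy_score(n, "Beach"), reverse=True)
print(ranked)
```

"Beach" matches both clauses and collects the boosted exact score on top of the fuzzy one, while "Beech" and "Berch" only get the fuzzy score and so sort below it.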

Is it possible to ensure n-matches to a search in Elasticsearch?

I am trying to figure out how to implement the following in Elasticsearch, and feel as though I have read documentation before on how to do it but can no longer find it.
I have 3 fields I will be searching on; profileIds, title, and description.
For profile ids I'm just searching for an exact term match, which is trivial enough.
I will be having a list of phrases to match against title and description, but I only want to match if there's a total of 3 or more matches with any keyword against the title or description (it doesn't have to be the same keyword on the same field).
I get that I should have a nested Or query setup like so: (matches profile ids, (has 3 matches on title OR description for any of the keywords)) but the part I am struggling with is saying "3 matches".
Is this possible in Elasticsearch?
You can try with bool query together with terms query and using minimum_should_match.
Example:
{
"query": {
"bool": {
"must": [
{
"term": {
"profile_ids": {
"value": "42"
}
}
}
],
"should": [
{
"terms": {
"title": ["foo", "bar", "baz"],
"minimum_should_match": 3
}
},
{
"terms": {
"description": ["a", "b", "c", "d"],
"minimum_should_match": 3
}
}
],
"minimum_should_match": 1
}
}
}
So each of the terms queries must match on at least 3 keywords. Then at least one of those two should match, while the profile_ids must always match.
Note that if you have fewer than 3 keywords, the terms query will match if all terms match.
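The matching rule described above can be written down as a small Python sketch. The helpers field_matches and doc_matches are hypothetical and use naive whitespace tokenization; they model the per-field threshold of this answer, not a total counted across both fields.

```python
def field_matches(text, keywords, minimum_should_match=3):
    """True if enough distinct keywords occur as whole words in the text.

    If fewer keywords are given than the minimum, all of them are required.
    """
    if not keywords:
        return False
    words = set(text.lower().split())
    hits = sum(1 for k in keywords if k.lower() in words)
    return hits >= min(minimum_should_match, len(keywords))


def doc_matches(doc, profile_id, title_keywords, description_keywords):
    """The profile id must match, and title OR description must reach
    the minimum number of keyword hits."""
    return profile_id in doc["profile_ids"] and (
        field_matches(doc["title"], title_keywords)
        or field_matches(doc["description"], description_keywords)
    )


doc_a = {"profile_ids": ["42"], "title": "foo bar baz", "description": "a b"}
doc_b = {"profile_ids": ["42"], "title": "foo bar", "description": "a b"}

print(doc_matches(doc_a, "42", ["foo", "bar", "baz"], ["a", "b", "c", "d"]))
print(doc_matches(doc_b, "42", ["foo", "bar", "baz"], ["a", "b", "c", "d"]))
```

The second document has four keyword hits in total but only two per field, so it does not match under this rule; counting hits across both fields together would need a different query shape.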
