Match query fuzzily to an array of candidates - elasticsearch

I have an index in Elasticsearch with the following document structure:
{
  "questions": [
    "What is your name?",
    "How are you called?",
    "What should I call you?",
    ...
  ],
  "answer": "<answer>"
}
I would like to match queries to one of the entries in the questions array.
For example, for the query "What's your name?", the returned document should be the one containing the closest-matching questions entry among all documents in the index.
I have tried:
{
  "query": {
    "match": {
      "questions": { "query": "<question>", "fuzziness": "auto" }
    }
  }
}
But that sometimes returns a "wrong" document, even when the query exactly matches one of the questions entries of some document.
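It may help to see what "fuzziness": "auto" actually allows. The Elasticsearch docs say AUTO (equivalent to AUTO:3,6) permits 0 edits for terms of 0-2 characters, 1 edit for terms of 3-5 characters, and 2 edits for longer terms. That budget applies to each term, not to the question as a whole. The sketch below computes it locally in Python, with no cluster needed. The helper names are hypothetical, and the regex tokenizer is only a rough stand-in for the standard analyzer.

```python
import re

def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Max edit distance that fuzziness AUTO:low,high allows for one term."""
    if len(term) < low:
        return 0  # 0..2 chars by default: must match exactly
    if len(term) < high:
        return 1  # 3..5 chars by default: one edit allowed
    return 2      # longer terms: two edits allowed

def allowed_edits(query: str) -> dict[str, int]:
    """Per-token budget; the regex split roughly approximates the standard analyzer."""
    return {tok: auto_fuzziness(tok) for tok in re.findall(r"\w+", query.lower())}

print(allowed_edits("What is your name?"))
# {'what': 1, 'is': 0, 'your': 1, 'name': 1}
```

With one edit allowed, "your" also matches "you" (as in "What should I call you?"). The match query combines terms with OR by default. So a document can score well on scattered fuzzy term hits instead of on one close question.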
I've also tried:
{
  "query": {
    "match_phrase": { "questions": "<query>" }
  }
}
But that doesn't allow fuzziness, and since the queries are human input, it doesn't catch enough cases.
Finally, I tried:
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "questions": { "fuzziness": "auto", "value": "<first word of the query>" }
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "questions": { "fuzziness": "auto", "value": "<second word of the query>" }
              }
            }
          }
        },
        ...
      ]
    }
  }
}
But that (as far as I can tell) only matches questions that match the query word for word, with each word matched fuzzily.
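The per-word span_near query above can be generated from the user's input. This is a sketch with a hypothetical helper name. Unlike the original, it sets slop and in_order explicitly, and the values shown are assumptions. With slop 0 and in_order true, each word must sit directly after the previous one, which would explain why only near-verbatim questions match. A larger slop tolerates extra words between the query words, but every query word still needs its own fuzzy hit.

```python
import json
import re

def fuzzy_span_near(field: str, query: str, slop: int = 0, in_order: bool = True) -> dict:
    """One fuzzy span_multi clause per query word, combined with span_near."""
    words = re.findall(r"\w+", query.lower())  # rough stand-in for the analyzer
    clauses = [
        {"span_multi": {"match": {"fuzzy": {field: {"value": w, "fuzziness": "AUTO"}}}}}
        for w in words
    ]
    return {"query": {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}}

print(json.dumps(fuzzy_span_near("questions", "What is your name?"), indent=2))
```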
What I would like (as far as I understand it) is a fuzzy TF-IDF score computed for each entry of questions separately: take the best-matching entry in each document and rank the documents by that best match, not by the questions array as a whole.
I'm a novice when it comes to Elasticsearch, so I'd appreciate any tips and tricks or outright solutions you might have. Thank you!
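To make "rank by the best single entry" concrete, here is a toy local illustration. It uses difflib's SequenceMatcher ratio as a stand-in similarity, which is not Elasticsearch's TF-IDF or BM25 scoring. The document ids and the second document are made up for the example. Each document is scored by its best-matching question (the max), so a long questions array neither helps nor hurts.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Toy string similarity in [0, 1]; NOT Elasticsearch's relevance score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def rank_by_best_entry(query: str, docs: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Score each document by its single best-matching question, highest first."""
    scored = [
        (doc_id, max(similarity(query, q) for q in questions))
        for doc_id, questions in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample documents
docs = {
    "name": ["What is your name?", "How are you called?", "What should I call you?"],
    "address": ["Where do you live?", "What is your address?"],
}
print(rank_by_best_entry("What's your name?", docs))
```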

Related

Counting the SEARCH term/phrase in a specific field in Elasticsearch

I have data like this:
{
  "name_id": 2145,
  "address": "Antartica",
  "characteristics": "He is a very nice person with very nice personality. the nicest thing about him is his nice dog"
}
Now I am running this query:
GET friends/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "name_id.keyword": "B08F2BWX2V" } },
        { "match_phrase": { "characteristics": "nice" } }
      ]
    }
  }
}
Is there a way to get the results together with the word count, i.e.:
nice : 4
There is an Elasticsearch API that can return the token count information you need: the Term vectors API.
I'm not sure it will be exactly what you need, but the post below answers a question similar to yours:
https://stackoverflow.com/a/69734423/18778181
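Below is a minimal Python sketch of the request and of reading term_freq from the response. It assumes the 7.x+ URL form (GET /<index>/_termvectors/<_id>) and a hypothetical document _id of "1". The query above filters on name_id.keyword, but term vectors address a document by its _id. If term vectors are not stored in the mapping, Elasticsearch computes them on the fly. The local count also shows why "nice" may come out as 3 rather than 4. Under the standard analyzer (approximated here by lowercasing and splitting on non-word characters), "nicest" is a separate token.

```python
import re

def termvectors_request(index: str, doc_id: str, field: str) -> tuple[str, dict]:
    """Path and body for a Term vectors request (7.x+ style URL, no mapping type)."""
    path = f"/{index}/_termvectors/{doc_id}"
    body = {"fields": [field], "term_statistics": True, "positions": False, "offsets": False}
    return path, body

def term_freq(response: dict, field: str, term: str) -> int:
    """Pull term_freq for one term out of a Term vectors response; 0 if absent."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return terms.get(term, {}).get("term_freq", 0)

def local_token_count(text: str, term: str) -> int:
    """Offline approximation of the analyzed token count."""
    return re.findall(r"\w+", text.lower()).count(term)

text = ("He is a very nice person with very nice personality. "
        "the nicest thing about him is his nice dog")
print(local_token_count(text, "nice"))  # 3: whole tokens only
print(text.lower().count("nice"))       # 4: raw substring count also hits "nicest"
```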

Give more score to documents that contains all query terms

I have a problem with scoring in Elasticsearch. When a user enters a query that contains three terms, a document that contains two of the words many times sometimes outscores a document that contains all three words. For example, if the user enters "elasticsearch query tutorial", I want documents that contain all three words to score higher than a document with a lot of "tutorial" and "elasticsearch" terms in it.
PS: I am using minimum_should_match and shingles in my query. Although they made ranking a lot better, they did not solve this problem completely. I need something like query coordination (the coord factor) in Lucene's practical scoring function. Is there anything like that in Elasticsearch with BM25?
One possible solution is to use a function_score query:
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "filter": { "match": { "title": "elasticsearch" } },
          "weight": 1
        },
        {
          "filter": { "match": { "title": "query" } },
          "weight": 1
        },
        {
          "filter": { "match": { "title": "tutorial" } },
          "weight": 1
        }
      ],
      "score_mode": "sum"
    }
  }
}
In this case, documents that match more of the terms clearly rank higher. However, this completely ignores TF-IDF (or BM25) and any other relevance factors.
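The same idea generalizes to any input: one weight-1 filter function per distinct query term, summed with score_mode "sum". This is a Python sketch with hypothetical helper names. The whitespace split simplifies real analysis. toy_score only mimics the resulting ranking locally by counting distinct terms present and ignoring frequency; it does not reproduce Elasticsearch's exact scores.

```python
def coordination_query(field: str, user_query: str) -> dict:
    """One weight-1 filter function per distinct query term, summed by score_mode."""
    terms = list(dict.fromkeys(user_query.lower().split()))  # de-duplicate, keep order
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [{"filter": {"match": {field: t}}, "weight": 1} for t in terms],
                "score_mode": "sum",
            }
        }
    }

def toy_score(doc_text: str, user_query: str) -> int:
    """Local stand-in: number of distinct query terms present, ignoring frequency."""
    return len(set(user_query.lower().split()) & set(doc_text.lower().split()))

q = "elasticsearch query tutorial"
print(toy_score("elasticsearch tutorial elasticsearch tutorial tutorial", q))  # 2
print(toy_score("a short elasticsearch query tutorial", q))                    # 3
```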

Must match multiple values

I have a query that works fine when I need a property of a document to match just one value.
However, I also need to be able to search with must on two values.
So if a banana has id 1, a lemon has id 2, and I search for yellow,
I want to get both if I have 1 and 2 in the must clause,
but only the banana if I have just 1.
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "should": [
        { "match": { "fruit.color": "yellow" } }
      ],
      "must": [
        { "match": { "fruit.id": "1" } }
      ]
    }
  }
}
I havenĀ“t found a way to search with two values with must.
is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
  "query": {
    "bool": {
      "should": {
        "match": {
          "fruit.color": "yellow"
        }
      },
      "must": {
        "bool": {
          "should": [
            {
              "match": {
                "fruit.id": "1"
              }
            },
            {
              "match": {
                "fruit.id": "2"
              }
            }
          ]
        }
      }
    }
  }
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
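Here is a sketch that builds this query for any list of ids. It sets minimum_should_match explicitly so the intent is clear; for a bool with only should clauses it already defaults to 1. The toy_search helper is a hypothetical local stand-in for the bool logic, not Elasticsearch itself. The must part filters on id, and the should part only changes the order.

```python
def fruit_query(color: str, ids: list[str]) -> dict:
    """Build the bool query: at least one id must match; color only adds score."""
    return {
        "query": {
            "bool": {
                "should": {"match": {"fruit.color": color}},
                "must": {
                    "bool": {
                        "should": [{"match": {"fruit.id": i}} for i in ids],
                        "minimum_should_match": 1,
                    }
                },
            }
        }
    }

def toy_search(docs: list[dict], color: str, ids: list[str]) -> list[str]:
    """Hypothetical local stand-in: keep docs whose id is listed, color matches first."""
    hits = [d for d in docs if d["id"] in ids]                   # must: id 1 OR id 2 ...
    hits.sort(key=lambda d: d["color"] == color, reverse=True)  # should: only reorders
    return [d["name"] for d in hits]

fruits = [
    {"name": "banana", "id": "1", "color": "yellow"},
    {"name": "lemon", "id": "2", "color": "yellow"},
    {"name": "apple", "id": "3", "color": "red"},  # hypothetical extra doc
]
print(toy_search(fruits, "yellow", ["1"]))       # ['banana']
print(toy_search(fruits, "yellow", ["1", "2"]))  # ['banana', 'lemon']
```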
Instead of a match query, you could simply use the terms query to OR between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries:
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "should": [
        { "match": { "fruit.color": "yellow" } }
      ],
      "must": [
        { "terms": { "fruit.id": ["1", "2"] } }
      ]
    }
  }
}
A term or terms query is the right way to fetch an exact text value or id; a match query analyzes the input and searches inside the id or text.
Example:
id = '4'
id = '44'
Depending on how the id field is analyzed (for example, with an n-gram analyzer that also produces the token "4" for "44"), a match query for id = 4 can return both 4 and 44. This is where the terms query comes into play.
The same search using a terms query on a keyword field returns 4 only.
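Whether a match query for 4 also hits 44 depends on the analyzer. This local sketch compares three cases: a standard-like analyzer, a hypothetical edge n-gram analyzer, and an exact terms lookup on a keyword field. The helpers are hypothetical, and the regex tokenizer only approximates the standard analyzer.

```python
import re

def standard_tokens(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return re.findall(r"\w+", text.lower())

def edge_ngram_tokens(text: str, min_gram: int = 1, max_gram: int = 10) -> list[str]:
    """Hypothetical edge n-gram analyzer: prefixes of each word."""
    return [w[:n] for w in standard_tokens(text)
            for n in range(min_gram, min(max_gram, len(w)) + 1)]

def match_hits(ids: list[str], query: str, analyzer) -> list[str]:
    """A match query (OR) hits when any query token equals any indexed token."""
    wanted = set(analyzer(query))
    return [i for i in ids if wanted & set(analyzer(i))]

def terms_hits(ids: list[str], values: list[str]) -> list[str]:
    """A terms query on a keyword field compares whole values exactly."""
    return [i for i in ids if i in values]

ids = ["4", "44"]
print(match_hits(ids, "4", standard_tokens))    # ['4']
print(match_hits(ids, "4", edge_ngram_tokens))  # ['4', '44']
print(terms_hits(ids, ["4"]))                   # ['4']
```

With standard-like analysis, "4" and "44" are different tokens. The overlap only appears when the analyzer emits partial tokens, and a terms query on a keyword field avoids the question altogether.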
So the accepted answer is wrong. Use Rahul's answer. There is just one more thing you need to do: instead of text, map the field as keyword.
Here is an example of indexing a field both as text and as keyword (the mapping is for a top-level field; for a nested field, change it accordingly):
{
  "index_patterns": [ "test" ],
  "mappings": {
    "kb_mapping_doc": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
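The mapping above uses the legacy template format with a mapping type (kb_mapping_doc). Mapping types were deprecated in Elasticsearch 7.x and removed in 8.x. The sketch below builds the equivalent body for a composable index template (PUT _index_template/<name>, available from 7.8) in Python. The helper name is hypothetical.

```python
import json

def keyword_multifield_template(pattern: str, field: str) -> dict:
    """Body for a composable index template (no mapping type level)."""
    return {
        "index_patterns": [pattern],
        "template": {
            "mappings": {
                "_source": {"enabled": True},
                "properties": {
                    field: {
                        "type": "text",                              # analyzed: query `id`
                        "fields": {"keyword": {"type": "keyword"}},  # exact: query `id.keyword`
                    }
                },
            }
        },
    }

print(json.dumps(keyword_multifield_template("test", "id"), indent=2))
```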
Using Rahul's answer alone may not have worked because the field was probably analyzed as text.
id - accesses the text field
id.keyword - accesses the keyword field
So the query would be:
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "should": [{
        "match": {
          "color": "yellow"
        }
      }],
      "must": [{
        "terms": {
          "id.keyword": ["1", "2"]
        }
      }]
    }
  }
}
So I would say the accepted answer can return incorrect results. Please use Rahul's answer with the corresponding mapping.

Elasticsearch fuzzy matching: How can I get direct hits first?

I'm using Elasticsearch to search names in a database, and I want it to be fuzzy to allow for minor spelling errors. Based on the advice I've found on the matter, I'm using "match" and "fuzziness" instead of "fuzzy", which definitely seems to be more accurate. This is my query:
{
  "query": {
    "match": {
      "last_name": {
        "query": "Beach",
        "type": "phrase",
        "fuzziness": 2
      }
    }
  }
}
However, even though I have numerous results with last_name "Beach" (I know there's at least 100), I also get results with last_name "Beech" and "Berch" in the first 10 hits returned by my query. Can someone help me figure out how to get the exact matches first?
Try changing your query to a bool query with two should clauses.
The first is your current query, and the second is a query that only returns exact matches; give that one a big boost (like 10.0).
That should put your exact matches on top while still listing your fuzzy matches.
I tried to edit Constantijn's answer above to include a sample based on it, but the edit is still pending approval, so I will put the sample here instead:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "last_name": {
              "query": "Beach",
              "fuzziness": 2,
              "boost": 1
            }
          }
        },
        {
          "match": {
            "last_name": {
              "query": "Beach",
              "boost": 10
            }
          }
        }
      ]
    }
  }
}
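This local sketch shows why the boosted second clause puts exact hits first. An exact "Beach" satisfies both clauses, while "Beech" or "Berch" only satisfy the fuzzy one. The helper names are hypothetical. The sketch uses plain Levenshtein distance, whereas Elasticsearch's match query defaults to Damerau-Levenshtein (fuzzy_transpositions). The 1 and 11 scores only mimic the boosts; they are not real BM25 scores.

```python
def exact_first_query(field: str, value: str, fuzziness: int = 2, exact_boost: float = 10) -> dict:
    """Fuzzy clause for recall plus a heavily boosted non-fuzzy clause for exact hits."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: {"query": value, "fuzziness": fuzziness, "boost": 1}}},
                    {"match": {field: {"query": value, "boost": exact_boost}}},
                ]
            }
        }
    }

def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def toy_rank(names: list[str], value: str, fuzziness: int = 2, exact_boost: float = 10) -> list[str]:
    """Local stand-in: 1 point for a fuzzy hit, plus exact_boost for an exact hit."""
    scored = []
    for name in names:
        d = levenshtein(name.lower(), value.lower())
        if d <= fuzziness:
            scored.append((name, 1 + (exact_boost if d == 0 else 0)))
    return [n for n, _ in sorted(scored, key=lambda p: p[1], reverse=True)]

print(toy_rank(["Beech", "Beach", "Berch", "Peach", "Smith"], "Beach"))
# ['Beach', 'Beech', 'Berch', 'Peach']
```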

Elasticsearch complex proximity query

Given that I have a query like below:
council* W/5 (tip OR tips)
The above query can be translated as: Find anything that has council* and (tip OR tips) no more than 5 words apart.
So the following texts will match:
Shellharbour City Council Tip
council best tip
councils top 10 tips
But this one should not match:
... City Council at Shellharbour. There is not any good tip at all.
I need help building an Elasticsearch query for that. I was thinking about a regexp query, but I'm not sure whether there are better alternatives. Thanks
You can use a combination of the span_near, span_multi, and span_or queries. The query below performs the same search:
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "prefix": { "text": "council" }
            }
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "text": {
                    "value": "tip"
                  }
                }
              },
              {
                "span_term": {
                  "text": {
                    "value": "tips"
                  }
                }
              }
            ]
          }
        }
      ],
      "slop": 5,
      "in_order": true
    }
  }
}
The important thing to look out for is span_term, which holds the text you're searching for. Note that span_term is not analyzed, so its value must match the indexed token (lowercase, with the standard analyzer). In this example I only had one field, called "text". Slop is the maximum number of words allowed between the matching terms, and in_order indicates that the order of the words is important. So "tip council" will not match, whereas "council tip" will.
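This sketch builds the same query for other inputs and adds a toy local check of the slop rule. The helper names are hypothetical. toy_matches only approximates Lucene's span semantics for two single-token spans in order: at most slop tokens may sit between them. Against the question's examples, the three positive texts match and the Shellharbour sentence does not, since seven words separate "council" and "tip".

```python
import re

def proximity_query(field: str, prefix: str, alternatives: list[str],
                    slop: int, in_order: bool = True) -> dict:
    """span_near of a prefix span and an OR of exact span_terms."""
    return {
        "query": {
            "span_near": {
                "clauses": [
                    {"span_multi": {"match": {"prefix": {field: prefix}}}},
                    {"span_or": {"clauses": [
                        {"span_term": {field: {"value": alt}}} for alt in alternatives
                    ]}},
                ],
                "slop": slop,
                "in_order": in_order,
            }
        }
    }

def toy_matches(text: str, prefix: str, alternatives: list[str], slop: int) -> bool:
    """In-order check: a prefix token, then an alternative with <= slop tokens between."""
    tokens = re.findall(r"\w+", text.lower())
    starts = [i for i, t in enumerate(tokens) if t.startswith(prefix)]
    ends = [j for j, t in enumerate(tokens) if t in alternatives]
    return any(0 <= j - i - 1 <= slop for i in starts for j in ends)

for text in ["Shellharbour City Council Tip", "council best tip", "councils top 10 tips",
             "... City Council at Shellharbour. There is not any good tip at all."]:
    print(text, toy_matches(text, "council", ["tip", "tips"], 5))
```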
