Get from ElasticSearch why a result is a hit

In the ElasticSearch query below, I search for the word Balances in two fields, name and notes:
GET /_search
{ "query": {
"multi_match": { "query": "Balances",
"fields": ["name","notes"]
}
}
}
And this is the result, where the match is in the name field:
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.673515,
"hits" : [
{
"_index" : "idx",
"_type" : "_doc",
"_id" : "25",
"_score" : 1.673515,
"_source" : {
"name" : "Deposits checking accounts balances",
"notes" : "These are the notes",
"@timestamp" : "2019-04-18T21:05:00.387Z",
"id" : 25,
"@version" : "1"
}
}
]
}
Now, I want to know in which field ElasticSearch found the value. I could evaluate the result and see if the searched text is in name or notes, but I cannot do that if it's a fuzzy search.
Can ElasticSearch tell me in which field the text was found, and in addition provide a snippet with 5 words to the left and to the right of the result to tell the user why the result is a hit?
What I want to achieve is similar to Google highlighting in bold the text that was found within a phrase.

I think the two solutions in "Find out which fields matched in a multi match query" are still valid:
Use highlighting to find it.
Split the query up into multiple named match queries.
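Both options can be combined in a single request. The following is a minimal sketch, not a definitive implementation: it only builds the request body as a Python dict, to be sent to _search with whatever client you use. The helper names build_explainable_search and explain_hit are hypothetical, and the <b> tags and the fragment size of 80 are assumptions. Note that fragment_size is measured in characters, not words, so "5 words to the left and right" can only be approximated.

```python
def build_explainable_search(text, fields, fragment_size=80):
    """Build a _search body that reports which field matched and why."""
    return {
        "query": {
            "bool": {
                # One named match query per field; the names of the clauses
                # that matched come back in each hit's "matched_queries".
                "should": [
                    {"match": {field: {"query": text, "_name": field}}}
                    for field in fields
                ],
                "minimum_should_match": 1,
            }
        },
        "highlight": {
            # Wrap the matched terms in bold tags, Google-style.
            "pre_tags": ["<b>"],
            "post_tags": ["</b>"],
            # Approximate snippet length in characters (not words).
            "fragment_size": fragment_size,
            "fields": {field: {} for field in fields},
        },
    }


def explain_hit(hit):
    """Return (names of matched queries, highlighted snippets per field)."""
    return hit.get("matched_queries", []), hit.get("highlight", {})


body = build_explainable_search("Balances", ["name", "notes"])
```

For a fuzzy search, the same structure should apply: add "fuzziness" next to "query" inside each match clause, and the highlighter still marks the terms that actually matched.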

Related

How to make Elastic Engine understand a field is not to be analyzed for an exact match?

This question follows up on my previous post, where exact search did not work with either Match or MatchPhrasePrefix.
Then I found a similar post here, where the search field is set to not_analyzed in the mapping definition (by @Russ Cam).
But I am using
package id="Elasticsearch.Net" version="7.6.0" targetFramework="net461"
package id="NEST" version="7.6.0" targetFramework="net461"
and that might be the reason the solution did not work.
If I pass "SOME", it matches both "SOME" and "SOME OTHER LOAN", which should not be the case (as with "product value" in my earlier post).
How can I do the same using NEST 7.6.0?
I'm not aware of what your current mapping looks like, and I don't know NEST either, but I will explain how to make Elasticsearch treat a field as not analyzed for an exact match, using an example in the Elasticsearch DSL.
For an exact (case-sensitive) match, all you need to do is define the field type as keyword. For a field of type keyword, the data is indexed as is, without applying any analyzer, and hence it is perfect for exact matching.
PUT test
{
"mappings": {
"properties": {
"field1": {
"type": "keyword"
}
}
}
}
Now let's index some docs:
POST test/_doc/1
{
"field1":"SOME"
}
POST test/_doc/2
{
"field1": "SOME OTHER LOAN"
}
For exact matching we can use a term query. Let's search for "SOME"; we should get only document 1.
GET test/_search
{
"query": {
"term": {
"field1": "SOME"
}
}
}
The output that we get:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931472,
"_source" : {
"field1" : "SOME"
}
}
]
}
}
So the crux is to make the field type keyword and use a term query.
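To see why the field type matters, here is a small in-memory sketch. It is not Elasticsearch code: analyze_text is a rough stand-in (an assumption) for what an analyzed text field does, namely lowercasing and splitting into word tokens, while analyze_keyword keeps the whole value as a single term. All three function names are hypothetical.

```python
import re


def analyze_text(value):
    """Rough stand-in for text-field analysis: lowercase word tokens."""
    return re.findall(r"\w+", value.lower())


def analyze_keyword(value):
    """A keyword field indexes the whole value as one unchanged term."""
    return [value]


def matching_ids(docs, query_terms, analyze):
    """Ids of docs whose indexed terms contain any of the query terms."""
    return [
        doc_id
        for doc_id, value in docs.items()
        if set(query_terms) & set(analyze(value))
    ]


docs = {1: "SOME", 2: "SOME OTHER LOAN"}

# match query on an analyzed field: the query text is analyzed as well,
# so the token "some" is found in both documents.
text_hits = matching_ids(docs, analyze_text("SOME"), analyze_text)

# term query on a keyword field: the query term is compared as is,
# so only the document whose whole value is "SOME" qualifies.
keyword_hits = matching_ids(docs, ["SOME"], analyze_keyword)
```

With the two documents indexed above, text_hits contains both ids, while keyword_hits contains only document 1, which is the behaviour the question asks for.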

Querying against an array only works for first element

I am having an issue searching for items within an array in a document; a simple tagging system in my case. I have a relatively simple document representing a recipe. This is a truncated version of the data in the index:
{
"_index" : "recipes",
"_type" : "recipe",
"_id" : "37",
"_version" : 1,
"found" : true,
"_source" : {
"id" : 37,
"title" : "Crab Cakes",
"tags" : [
"seafood",
"appetizer"
]
}
}
When I search for the tag seafood it matches this recipe. However, when I search for the tag appetizer, I get nothing. Here is the explain for a very basic appetizer query:
curl -XGET 'http://localhost:9200/recipes/recipe/37/_explain?pretty' -H 'Content-Type: application/json' -d'{"query":{"term":{"tags":"appetizer"}}}'
Which results in this:
{
"_index" : "recipes",
"_type" : "recipe",
"_id" : "37",
"matched" : false,
"explanation" : {
"value" : 0.0,
"description" : "no matching term",
"details" : [ ]
}
}
The correct answer came in the comments from sramalingam24: change the query to a match instead of a term.
[updated]
I also tested switching tags to a keyword field, and that works as well. This is the solution I ended up going with.

More like this query with ElasticsearchTemplate

In "More like this query" we can specify not only documents, but some other text:
{
"more_like_this" : {
"fields" : ["title", "description"],
"like" : [
{
"_index" : "imdb",
"_type" : "movies",
"_id" : "1"
},
"and potentially some more text here as well"
],
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
Is it possible, using Spring Data and ElasticsearchTemplate, to add a set of documents and a chunk of text to the query, like the example above? Reading https://docs.spring.io/spring-data/elasticsearch/docs/current/api/ it seems that the MoreLikeThisQuery class does not support that. Am I right? Is that the only way to make that query?

Elasticsearch: Nested query under a boolean 'should' not returning results

I'm running the following query (it has been shortened for clarity):
body : {
query : {
bool : {
must : [
{
match : {
active : 1
}
},
],
should : [
{
term : {
apply : '2'
}
},
{
nested : {
path : 'items',
query : {
terms : {
'items.product' : ["1","2"]
}
}
}
}
],
minimum_should_match : 1
}
}
}
};
When I run this query, I don't get back the documents that match the nested query in the should clause; I only get back documents matching the first condition. What am I doing wrong? Why can't the terms query test the field against an array of input values and return results?
When I change the nested query to a match_all or match the items.product field to an exact value, I do get results.
Changing the nested query into the following instead of the current nested query (while everything else stays the same) gives me no results either.
nested : {
path : 'items',
query : {
bool : {
must : [
{
terms : {
'items.product' : ["1","2"],
minimum_should_match : 1
}
},
]
}
}
}
Any help would be greatly appreciated - this has been driving me crazy for a couple days now!
EDITED to include discussion of the index mapping
Given that the terms condition expects a non-analyzed field (per the documentation here), I would recommend you verify that your index has a mapping that specifically makes it so. For instance:
{"mappings" : {
"your_doc_type" : {
"properties" : {
"items" : {
"type" : "nested",
"properties" : {
"product" : {"type" : "string", "index" : "not_analyzed"},
...
... Other properties of the nested object
...
}
},
...
... Mappings for the other fields in your document type
...
}
}
}
}
That should enable the terms to do what they are supposed to do when checking items.product.
My earlier suspicion was that there is something else in your query (min_score perhaps) that is filtering out results based upon score, and that threshold is weeding out the documents that match the items.product condition but not the apply condition due to the underlying Lucene scoring model. In other words, if all other things are equal for documents meeting only one item of the should query, the ones that meet the "apply":"2" condition will score higher than the documents for which items.product is 1 or 2. This was my empirical observation querying a trivially small test set of data with your query.
Test data set:
{"active":1, "apply":"2", "items" : [{"product": "3"}]}
{"active":0, "apply":"2", "items" : [{"product": "3"}]}
{"active":1, "apply":"3", "items" : [{"product": "3"}]}
{"active":1, "apply":"3", "items" : [{"product": "1"}]}
{"active":1, "apply":"3", "items" : [{"product": "2"}]}
Based on the conditions in your query, we should see three documents returned - the first, fourth, and fifth documents.
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1rIJ0nixSnh_cG",
"_score" : 0.731233,
"_source":{"active":1, "apply":"2", "items" : [{"product": "3"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cK",
"_score" : 0.4601705,
"_source":{"active":1, "apply":"3", "items" : [{"product": "2"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cJ",
"_score" : 0.35959372,
"_source":{"active":1, "apply":"3", "items" : [{"product": "1"}]}
} ]
The expected documents came back, but you can see that the first document (for which apply is 2, meeting the first criterion of the should query) scored much higher.
If your intent is for these conditions to not affect the scoring of the documents but to use them instead as simple inclusion/exclusion criteria, you may want to switch to a filtered query. Something like:
{
"query" : {"filtered" : {
"query" : {"match_all" : {}},
"filter" : {"bool" : {
"must" : [
{"term" : {"active" : 1}}
],
"should" : [
{"term" : {"apply" : "2"}},
{"nested" : {
"path": "items",
"query" : {
"terms" : {"items.product" : ["1", "2"]}
}
}}
]
}}
}}
}
Since you are now specifying a filter instead, these conditions should not impact the scoring of the returned documents but instead only determine whether a document qualifies at all for the result set (the scores are then calculated independently of the conditions above). Using this filtered query, the results from my dumb data set are:
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1rIJ0nixSnh_cG",
"_score" : 1.0,
"_source":{"active":1, "apply":"2", "items" : [{"product": "3"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cK",
"_score" : 1.0,
"_source":{"active":1, "apply":"3", "items" : [{"product": "2"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cJ",
"_score" : 1.0,
"_source":{"active":1, "apply":"3", "items" : [{"product": "1"}]}
} ]
The scores are now identical for all returned documents, without regard for which part of the should was satisfied.
Note that the query property above is match_all - if you had other conditions in your query that are not represented in the original question, then you would need to modify this accordingly.
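One caveat for newer clusters: the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, where the same intent is expressed with the filter clause of a bool query. The sketch below builds such a request body as a Python dict. The function names are hypothetical, and matches is only a plain-Python mirror of the filter logic, used to check the expectation against the test data set above without a cluster.

```python
def build_filter_query(products):
    """bool query that only filters (no scoring) on active, apply and items.product."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"active": 1}},
                    {
                        # At least one of the two alternatives must hold.
                        "bool": {
                            "should": [
                                {"term": {"apply": "2"}},
                                {
                                    "nested": {
                                        "path": "items",
                                        "query": {
                                            "terms": {"items.product": products}
                                        },
                                    }
                                },
                            ],
                            "minimum_should_match": 1,
                        }
                    },
                ]
            }
        }
    }


def matches(doc, products):
    """Plain-Python mirror of the filter above, for local checking only."""
    in_products = any(item["product"] in products for item in doc["items"])
    return doc["active"] == 1 and (doc["apply"] == "2" or in_products)


test_docs = [
    {"active": 1, "apply": "2", "items": [{"product": "3"}]},
    {"active": 0, "apply": "2", "items": [{"product": "3"}]},
    {"active": 1, "apply": "3", "items": [{"product": "3"}]},
    {"active": 1, "apply": "3", "items": [{"product": "1"}]},
    {"active": 1, "apply": "3", "items": [{"product": "2"}]},
]

body = build_filter_query(["1", "2"])
# Zero-based positions of the documents that pass the filter.
matched_positions = [i for i, d in enumerate(test_docs) if matches(d, ["1", "2"])]
```

Because every condition sits in filter context, none of them contributes to the score, and the local check selects the first, fourth, and fifth documents, as expected.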

ElasticSearch search query processing

I have been reading up on ElasticSearch and couldn't find an answer for how to do the following:
Say you have some records with "study" in the title, and a user searches for "studying" instead of "study". How would you set up ElasticSearch to match this?
Thanks,
Alex
ps: Sorry, if this is a duplicate. Wasn't sure what to search for!
You might be interested in this: http://www.elasticsearch.org/guide/reference/query-dsl/flt-query/
For example, I have indexed book titles, and for this query:
{
"query": {
"bool": {
"must": [
{
"fuzzy": {
"book": {
"value": "ringing",
"min_similarity": "0.3"
}
}
}
]
}
}
}
I got
{
"took" : "1",
"timed_out" : "false",
"_shards" : {
"total" : "5",
"successful" : "5",
"failed" : "0"
},
"hits" : {
"total" : "1",
"max_score" : "0.19178301",
"hits" : [
{
"_index" : "library",
"_type" : "book",
"_id" : "3",
"_score" : "0.19178301",
"_source" : {
"book" : "The Lord of the Rings",
"author" : "J R R Tolkein"
}
}
]
}
}
which is the only correct result.
You could apply stemming to your documents, so that when you index studying, you are actually indexing study underneath. When you query, the same analysis is applied, so a search for studying becomes a search for study, and you will find a match whether you search for study or studying.
Stemming depends on the language, of course, and there are different techniques; for English, snowball is fine. The trade-off is that you lose some information when you index the data, since you can no longer distinguish between studying and study. If you want to keep that distinction, you can index the same text in different ways using a multi_field and apply different text analysis to each version. That way you can search on multiple fields, both the non-stemmed and the stemmed version, perhaps giving them different weights.
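As a rough illustration of the multi-field idea, the sketch below builds an index definition and a search body as Python dicts. It assumes a recent Elasticsearch version, where the old multi_field type is written as a fields entry in the mapping and mappings are typeless. The names english_snowball, stemmed_english, and the title.stemmed sub-field, as well as the boost of 2, are assumptions chosen for the example.

```python
def build_index_body(field="title"):
    """Index settings and mapping with a plain field and a stemmed sub-field."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Snowball stemmer for English.
                    "english_snowball": {"type": "snowball", "language": "English"}
                },
                "analyzer": {
                    "stemmed_english": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "english_snowball"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                field: {
                    # Non-stemmed version, default analysis.
                    "type": "text",
                    "fields": {
                        # Stemmed version of the same text.
                        "stemmed": {"type": "text", "analyzer": "stemmed_english"}
                    },
                }
            }
        },
    }


def build_search_body(text, field="title", exact_boost=2):
    """Search both versions, weighting the non-stemmed field higher."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": [f"{field}^{exact_boost}", f"{field}.stemmed"],
            }
        }
    }


index_body = build_index_body()
search_body = build_search_body("studying")
```

With a setup like this, a search for studying should still reach documents containing study through the stemmed sub-field, while documents containing the exact word can rank higher through the boosted plain field.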
