No match returned when using match_phrase_prefix in Elasticsearch

I am using match_phrase_prefix. It returns results when I search with a few characters, but when I add more characters to the search field, the query returns zero hits.
For example, if I type abcd, it returns a match.
But if I type abcd e, I get no hits even though a matching document exists.
Below is the query I am using.
Here _field is the field name and
_queryText is the search value that I enter.
Can I use a must or should condition, or minimum_should_match, here? If yes, how?
Thanks in advance
{
body: {
'query': {
'match_phrase_prefix': {
[_field]: _queryText
},
},
'size': 15,
}
}

match_phrase_prefix is the same as match_phrase, except that it allows prefix matches on the last term in the text.
In your case, when you search for abcd, it is the only term in the search query, so it is treated as a prefix, and documents containing abcd, abcde or abcdef will all match.
But as soon as you change your search query to abcd e, it contains two terms, abcd and e. Now abcd must match as a whole term and e is treated as a prefix, so only documents in which abcd is immediately followed by a term starting with e (e, ef, efg and so on) will match. A document that contains only abcde does not match, because abcde is a single term and not abcd followed by another term.
It would help if you could provide some sample documents, so that I can explain more precisely what should match, what shouldn't, and why.
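The behaviour described above can be sketched in plain Python. This is only a simplified model, not how Elasticsearch implements the query: it assumes whitespace tokenization and lowercasing, a slop of 0, and ignores the max_expansions limit. The function name phrase_prefix_match is made up for this illustration.

```python
def phrase_prefix_match(doc_text, query_text):
    """Simplified model of match_phrase_prefix (assumption: whitespace
    tokens, lowercasing, slop 0, no max_expansions limit)."""
    doc = doc_text.lower().split()
    query = query_text.lower().split()
    if not query:
        return False
    # All terms except the last must match exactly; the last one is a prefix.
    *phrase, prefix = query
    n = len(query)
    for start in range(len(doc) - n + 1):
        window = doc[start:start + n]
        if window[:-1] == phrase and window[-1].startswith(prefix):
            return True
    return False


# A single query term is treated as a prefix.
print(phrase_prefix_match("abcdef", "abcd"))
# Two terms: "abcd" must be a whole term, followed by a term starting with "e".
print(phrase_prefix_match("abcd efg", "abcd e"))
print(phrase_prefix_match("abcde fgh", "abcd e"))
```

Under these assumptions the first two calls report a match and the last one does not, which is the situation described in the question if the indexed value is a single term such as abcde.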

Related

Elasticsearch: how can I influence the "directionality" of a trigram match?

We use Elasticsearch to search address data. To support non-exact matches, we include a variant of the street name field that is analyzed with an ngram tokenizer (trigrams, to be specific), and we use a minimum_should_match clause of "3<75%" for the queries on this field. This means: if there are 3 or fewer trigrams in the search term, all of them have to match; if there are more than 3, then 75% of them have to match.
Generally this works OK, but there are cases where we get unintended results like this:
We search for "Uhland" and we find "Am Maschlandgraben". As far as I can tell, "Uhland" is split into "uhl", "hla", "lan", "and", and 3 of those 4 trigrams can be matched to the trigrams of "Am MascHLANDgraben" (the matching part in upper case). 3 out of 4 is 75%, which fulfills our "3<75%" requirement, so it becomes a match.
So there is a "directionality" (for lack of a better word) to that 75% match. It only counts against the number of terms in the search term and ignores how many trigrams of the indexed document are not matched.
One could argue that the 75% match requirement is not met in that example, because 10 out of the 13 trigrams from "Am Maschlandgraben" are not matched by the trigrams of "Uhland". And in fact, if you reverse the query and search for "Am Maschlandgraben", you won't find "Uhland" as a match, because now the "directionality" is reversed: the query sees that only 3 out of 13 trigrams are matched, and that does not meet the requirement of "3<75%".
What I would love to figure out is how I can modify the query so that the 75% match has no "directionality" and always has to hold on "both sides" of the comparison. To stay with the example above, I want neither "Uhland" to match "Am Maschlandgraben" nor "Am Maschlandgraben" to match "Uhland".
So, to put it in plain language: instead of "75% of the search term trigrams need to match the indexed document", I would like to have "75% of both the search term trigrams and the indexed document trigrams need to match".
I hope I have communicated my intention well enough (English is not my native language).
Here is an example of how our query looks right now:
{
"query": {
"bool": {
"should": [
{
"match": {
"address.street.trigram": {
"query": "Uhland",
"minimum_should_match": "3<75%"
}
}
}
]
}
}
}
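The arithmetic in the question can be reproduced with a small Python sketch. It is a simplified model and not Elasticsearch code: it assumes the ngram tokenizer lowercases and splits on whitespace (which is consistent with the count of 13 trigrams given above), and that the percentage is rounded down. The last function only spells out the two-sided rule the question asks for; it is not an existing query option.

```python
def trigrams(text):
    """Distinct trigrams per whitespace-separated word (assumed analysis)."""
    grams = set()
    for word in text.lower().split():
        grams.update(word[i:i + 3] for i in range(len(word) - 2))
    return grams


def required_matches(num_clauses, threshold=3, percent=75):
    """Model of minimum_should_match "3<75%": all clauses are required up to
    the threshold, otherwise the percentage, rounded down."""
    if num_clauses <= threshold:
        return num_clauses
    return num_clauses * percent // 100


def one_sided_match(query, indexed):
    """Only the query trigrams are counted, as described in the question."""
    q, d = trigrams(query), trigrams(indexed)
    return len(q & d) >= required_matches(len(q))


def two_sided_match(query, indexed):
    """The rule the question asks for: the requirement holds in both directions."""
    return one_sided_match(query, indexed) and one_sided_match(indexed, query)


print(one_sided_match("Uhland", "Am Maschlandgraben"))  # 3 of 4 query trigrams
print(one_sided_match("Am Maschlandgraben", "Uhland"))  # 3 of 13 query trigrams
print(two_sided_match("Uhland", "Am Maschlandgraben"))
```

In this model the first direction matches (3 of 4 trigrams, 3 required), the reverse does not (3 of 13 trigrams, 9 required), and the two-sided rule rejects the pair in both directions.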

Extract the words used by a fuzzy query

I use a fuzzy query in Elasticsearch and it works fine.
For example, if I search for dogs and the _source contains the word dog,
I get the correct document, but I can't tell that the word dog is what produced the match.
If _source has 10,000 words, how can I find out that the query matched dog?
Do you have any idea how to find the words that contributed to the result?
I found my solution by adding:
"highlight": {
"fields" : {
"myfields" : {}
}
}
With this I can find the exact position of the hits.
I'm not sure I understood your question correctly, but it looks like you might want to try highlighting.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html
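Once highlighting is enabled, each hit carries a "highlight" section whose fragments wrap the matched words in <em> tags by default. A minimal Python sketch for pulling those words out follows. The helper name highlighted_terms and the sample hit are hand-written for illustration and are not real search output; the field name myfields is taken from the snippet above.

```python
import re


def highlighted_terms(hit, field):
    """Collect the words wrapped in the default <em>...</em> highlight tags."""
    fragments = hit.get("highlight", {}).get(field, [])
    terms = []
    for fragment in fragments:
        terms.extend(re.findall(r"<em>(.*?)</em>", fragment))
    return terms


# Hypothetical hit, written by hand to show the shape of the data.
sample_hit = {
    "_id": "1",
    "highlight": {
        "myfields": ["the quick <em>dog</em> ran", "a <em>dog</em> barked"],
    },
}

print(highlighted_terms(sample_hit, "myfields"))
```

If custom pre_tags and post_tags are configured on the highlighter, the regular expression would need to be adjusted accordingly.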

Fuzziness on 3 letter words

What kind of analyzers would you implement in Elasticsearch for searching book titles?
The requirements are that there must be fuzziness and that some words are only 3 letters long.
I'm not going to include code because I would like to get fresh insight.
The problem I am having occurs when I misspell a 3-letter word.
Say I type "dns" and there is a document with a field value "dna"; then I get
"kindness" or something else that has dns in the word.
I believe you can solve your problem with the fuzziness parameter of the fuzzy query. It lets you set the maximum edit distance, so that long words are not matched when your input is a very short word.
{
"fuzzy" : {
"user" : {
"value" : "ki",
"fuzziness" : 2,
"prefix_length" : 1
}
}
}
The above query matches terms that start with the letter 'k' and are within 2 edits of 'ki', for example 3-letter words that start with 'k' and 4-letter words that start with 'ki'. A fuzziness of 2 means that up to 2 edits are allowed, e.g. changing 'i' to another letter and then adding a letter, or adding two letters while keeping 'ki'. The prefix length tells Elasticsearch how many leading characters of the query must match exactly before fuzziness is applied.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html
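To make the edit-distance reasoning concrete, here is a small Python sketch. It is a simplified model, not the Lucene implementation: it uses plain Levenshtein distance (Elasticsearch also counts a transposition of two adjacent characters as one edit by default), and the helper names edit_distance and fuzzy_match are made up for this example.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def fuzzy_match(term, value, fuzziness, prefix_length):
    """Simplified fuzzy match: exact prefix, then a bounded edit distance."""
    if term[:prefix_length] != value[:prefix_length]:
        return False
    return edit_distance(term, value) <= fuzziness


print(edit_distance("dns", "dna"))       # one substitution
print(edit_distance("dns", "kindness"))  # five insertions
print(fuzzy_match("kite", "ki", fuzziness=2, prefix_length=1))
print(fuzzy_match("bike", "ki", fuzziness=2, prefix_length=1))
```

Since "dns" and "kindness" are 5 edits apart in this model, a fuzzy query with a fuzziness of 1 or 2 would not be expected to return "kindness" for "dns" by itself.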

Elasticsearch term query with colons

I have a string field "title"(not analyzed) in elasticsearch. A document has title "Garfield 2: A Tail Of Two Kitties (2006)".
When I use the following JSON to query, no results are returned.
{"query":{"term":{"title":"Garfield 2: A Tail Of Two Kitties (2006)"}}}
I tried to escape the colon character and the parentheses, like:
{"query":{"term":{"title":"Garfield 2\\: A Tail Of Two Kitties \\(2006\\)"}}}
Still not working.
A term query won't tokenize or apply analyzers to the search text. Instead, it looks for an exact match, which won't work here because string fields are analyzed/tokenized by default.
To explain this in more detail:
Let's say there is a string value "I am in summer:camp".
When indexing, it is broken into tokens as below:
"I am in summer:camp" => [ I , am , in , summer , camp ]
Hence a term search for "I am in summer:camp" still won't work, because the token "I am in summer:camp" is not present in the index.
Something like a phrase query might work better here.
Or you can set "index" to "not_analyzed" in the field mapping to make sure that the string is not tokenized.
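The difference between the two cases can be sketched in Python. The analyze function below is only a rough stand-in that splits on non-word characters to reproduce the token list shown above; it is not the actual Elasticsearch analyzer, and both function names are hypothetical.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: split on non-word characters."""
    return re.findall(r"\w+", text)


def term_query(indexed_terms, value):
    """A term query looks for one exact term; the input is not analyzed."""
    return value in indexed_terms


text = "I am in summer:camp"

analyzed_terms = analyze(text)   # what an analyzed field would index
not_analyzed_terms = [text]      # a not_analyzed field keeps one single term

print(analyzed_terms)
print(term_query(analyzed_terms, text))      # full string is not a term
print(term_query(analyzed_terms, "summer"))  # individual tokens are terms
print(term_query(not_analyzed_terms, text))  # exact value is the only term
```

In this model the full string is only found when the field stores it as one unanalyzed term, which is why the mapping of the field matters more than escaping the colon.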

Elasticsearch Regex Query

I am running elasticsearch v1.1.1 and I am having trouble getting results from regex searches.
{
"query" : {
"regexp" : {
"lastname" : "smit*"
}
}
}
This returns 0 results (even though I know I have 'smith' in the data).
I have also tried:
{
"query" : {
"filtered" : {
"filter" : {
"regexp" : {
"lastname" : "smit*"
}
}
}
}
}
Any help would be appreciated.
First off, a lot of this depends on how you indexed the field: analyzed or not, what kind of tokenizer, whether it was lowercased, etc.
To answer your specific question concerning regexp queries: assuming your field is indexed as "smith" (all lower case), you should change your search string to "smit.*", which should match "smith". "smit." should also work.
The reason is that in regexp (which is different from wildcard) "." matches any single character and "*" matches any number (including 0) of the previous character. So your search "smit*" would match "smi", "smit", "smitt" or "smittt", but not "smith". The construct ".*" means any number (including 0) of the previous character, which is ".", i.e. any character. The combination of the two is the regexp equivalent of the wildcard "*".
That said, I'd caution that regexp and wildcard searches can have significant performance problems in text indexes, depending on the nature of the field, how it's indexed and the number of documents. These kinds of searches can be very useful, but more than one person has built wildcard or regexp searches, tested them on small data sets only, and then been disappointed by the production performance. Use with caution.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
ElasticSearch Regexp Filter
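The difference between "smit*" and "smit.*" can be checked quickly with Python's re module. This is only an approximation: Python's regex dialect is not the same as Lucene's, but these three patterns use only "." and "*", which mean the same in both. re.fullmatch is used because an Elasticsearch regexp has to match the whole term. The helper name whole_term_match is made up for this example.

```python
import re


def whole_term_match(pattern, term):
    """Pattern must match the entire term, like an Elasticsearch regexp."""
    return re.fullmatch(pattern, term) is not None


for pattern in ("smit*", "smit.*", "smit."):
    for term in ("smi", "smit", "smitt", "smith"):
        print(pattern, term, whole_term_match(pattern, term))
```

The output shows that "smit*" accepts "smi", "smit" and "smitt" but not "smith", while "smit.*" and "smit." both accept "smith".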
