Search part of string with Elasticsearch - elasticsearch

In Elasticsearch, I am trying to make it search for part of a word.
My example data is
{
  "_id" : "1",
  "name" : "Nopales"
}
{
  "_id" : "2",
  "name" : "ginger ale"
}
{
  "_id" : "3",
  "name" : "Kale"
}
{
  "_id" : "4",
  "name" : "Triticale Flour Whole Grain"
}
and the request is to find the documents matching the word "ale"
GET /_search
{
  "query": {
    "query_string" : { "default_field" : "name", "query" : "*ale*" }
  }
}
and it returns results like
Kale, Triticale Flour Whole Grain, ginger ale, Nopales
but the best match should be ginger ale, which contains the exact word, then Kale, then Nopales (based on word count), and finally Triticale Flour Whole Grain.
Does anyone have an idea how to achieve this?

I think you should use *ale. Your current query searches for ale anywhere in the values of the name field, and all of the values Kale, Triticale Flour Whole Grain, ginger ale, and Nopales contain ale. If you query with *ale instead, it will only match strings ending in ale. You can also combine different kinds of wildcards, for example "query" : "(*ale) OR (ale*)".
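To get something closer to the ranking described in the question (exact word first, partial matches after), one option is a bool query whose should clauses boost an exact term match above the wildcard. This is only a sketch; the boost value is illustrative, and it assumes name is analyzed with the standard analyzer, so that "ginger ale" is tokenized into ginger and ale:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": { "query": "ale", "boost": 10 } } },
        { "query_string": { "default_field": "name", "query": "*ale*" } }
      ]
    }
  }
}

Because both clauses sit in should, documents containing the exact token ale (like "ginger ale") score higher, while the wildcard clause still returns partial matches such as Kale and Nopales.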

Related

ElasticSearch results are inaccurate

My current query is:
GET /index/_search
{
"query": {
"simple_query_string": {
"query": "(\"cheaper+than,+therapy\")",
"analyzer": "standard",
"flags": "OR|AND",
"fields": ["name"]
}
}
}
My main problem is that at the moment this still finds matches like "GOLF . . . CHEAPER THAN THERAPY". I don't want matches like this. I want to tolerate some typos and normalize the search query, but I don't want to extend it. So the TMs "GOLF . . . CHEAPER THAN THERAPY" and "RUNNING IS: CHEAPER THAN THERAPY" should not be in the result.
The result should only show entries that are almost the same as my search query.
I tried something with fuzziness and so on, but it did not help.
The field name is a text field.
I expect the following results:
CHEAPER THAN THERAPY
CHEAPER THAN, THERAPY
I don't expect the following results:
GOLF . . . CHEAPER THAN THERAPY
"CHEAPER THAN THERAPY" MOORENKO'S
SHOPPING IS CHEAPER THAN THERAPY!
RUNNING IS: CHEAPER THAN THERAPY
CHEAPER THAN THERAPY AND WAY MORE FUN!
What do I have to do to get more accurate results?
You can use a fuzzy query on the keyword field.
The standard analyzer is the default analyzer, used if none is specified. It provides grammar-based tokenization; basically, it breaks text into a number of tokens.
So when you use simple_query_string, it just checks whether any document contains the tokens ["CHEAPER", "THAN", "THERAPY"].
You can instead use a fuzzy query on text.keyword, which matches against the whole string:
{
  "query": {
    "fuzzy": {
      "text.keyword": {
        "value": "CHEAPER THAN THERAPY",
        "fuzziness": "AUTO"
      }
    }
  }
}
Result
[
  {
    "_index" : "index129",
    "_type" : "_doc",
    "_id" : "pnXJM3oBX7bKb5rQ30Vb",
    "_score" : 1.6739764,
    "_source" : {
      "text" : "CHEAPER THAN THERAPY"
    }
  },
  {
    "_index" : "index129",
    "_type" : "_doc",
    "_id" : "p3XJM3oBX7bKb5rQ60UT",
    "_score" : 1.5902774,
    "_source" : {
      "text" : "CHEAPER THAN, THERAPY"
    }
  }
]

Change compound token default behaviour in lucene/elasticsearch

Lucene/Elasticsearch support compound tokens / subtokens. This is an important feature, e.g. for German with its composed words. The default behaviour of Lucene is to combine the subtokens with an OR, in order not to hurt recall by excluding documents from being returned. In specific situations, however, the opposite is required.
Assume that I want to index the following two documents:
Document 1:
PUT /idxwith/_doc/1
{
  "name": "stockfisch"
}
Document 2:
PUT /idxwith/_doc/2
{
  "name" : "laufstock"
}
Where the words will be decomposed as follows:
stockfisch ==> stock, fisch
laufstock ==> lauf, stock
Now with the following search query:
POST /idxwith/_search
{
  "query": {
    "match": {
      "name": {
        "query": "stockfisch"
      }
    }
  }
}
I'd expect only the first document to be returned - which is not the case. As the subtokens are combined with OR, both documents will be returned (hurting the precision of my search):
"hits" : [
  {
    "_index" : "idxwith",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.3287766,
    "_source" : {
      "name" : "stockfisch"
    }
  },
  {
    "_index" : "idxwith",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 0.241631,
    "_source" : {
      "name" : "laufstock"
    }
  }
]
I'm looking for hints on how to adapt lucene (or elastic) to make this behaviour configurable, i.e. to be able to define that subtokens are combined with an AND if necessary.
Thanks!
To solve this problem you can use a match_phrase query like this:
POST /idxwith/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "stockfisch"
      }
    }
  }
}
A phrase query matches terms up to a configurable slop (which defaults to 0) in any order. Transposed terms have a slop of 2. For more info about match_phrase, check here.
It is also possible to use the operator parameter in a match query, which requires all terms to be present; more info here.
In your specific case I think match_phrase is the better option, since the order of the terms is important.
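As a sketch of the operator alternative mentioned above, using the index from the question:

POST /idxwith/_search
{
  "query": {
    "match": {
      "name": {
        "query": "stockfisch",
        "operator": "and"
      }
    }
  }
}

With the operator set to and, a document only matches if it contains all of the query's tokens, so laufstock (decomposed into lauf, stock) no longer matches a query for stockfisch (decomposed into stock, fisch).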

Get from ElasticSearch why a result is a hit

In the ElasticSearch below I search for the word Balances in two fields name and notes:
GET /_search
{
  "query": {
    "multi_match": {
      "query": "Balances",
      "fields": ["name", "notes"]
    }
  }
}
And the result in the name field:
"hits" : {
  "total" : {
    "value" : 1,
    "relation" : "eq"
  },
  "max_score" : 1.673515,
  "hits" : [
    {
      "_index" : "idx",
      "_type" : "_doc",
      "_id" : "25",
      "_score" : 1.673515,
      "_source" : {
        "name" : "Deposits checking accounts balances",
        "notes" : "These are the notes",
        "#timestamp" : "2019-04-18T21:05:00.387Z",
        "id" : 25,
        "#version" : "1"
      }
    }
  ]
}
Now, I want to know in which field Elasticsearch found the value. I could inspect the result and check whether the searched text is in name or notes, but I cannot do that if it's a fuzzy search.
Can ElasticSearch tell me in which field the text was found, and in addition provide a snippet with 5 words to the left and to the right of the result to tell the user why the result is a hit?
What I want to achieve is similar to Google highlighting in bold the text that was found within a phrase.
I think the two solutions in "Find out which fields matched in a multi match query" are still valid:
Use highlighting to find it.
Split the query up into multiple named match queries.
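A sketch of the highlighting approach, applied to the query from the question: each hit then carries a highlight section that names the field(s) that matched and wraps the matching terms (by default in <em> tags), which is exactly the Google-style snippet behaviour being asked for.

GET /_search
{
  "query": {
    "multi_match": {
      "query": "Balances",
      "fields": ["name", "notes"]
    }
  },
  "highlight": {
    "fields": {
      "name": {},
      "notes": {}
    }
  }
}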

Given a Document ID find the matching Document in Elasticsearch

I have indexed some articles in Elasticsearch. Now suppose a user likes an article, and I want to recommend some matching articles to him. Assume the articles are precise and well written to the point, and that all articles are of the same type.
I know this is like taking all the tokens related to that article and searching all other articles for them. Is there anything in Elasticsearch which does this for me?
Or any other way of doing this?
You can use the More Like This query.
From the docs: it selects a set of representative terms from the input documents, forms a query using these terms, executes the query, and returns the results.
Usage:
{
  "query": {
    "more_like_this" : {
      "fields" : ["title", "description"],
      "like" : [
        {
          "_index" : "your index",
          "_type" : "articles",
          "_id" : "1" # your document id
        }
      ],
      "min_term_freq" : 1,
      "max_query_terms" : 12
    }
  }
}

ElasticSearch search query processing

I have been reading up on Elasticsearch and couldn't find an answer to the following:
Say you have some records with "study" in the title, and a user uses the word "studying" instead of "study". How would you set up Elasticsearch to match this?
Thanks,
Alex
PS: Sorry if this is a duplicate. Wasn't sure what to search for!
You might be interested in this: http://www.elasticsearch.org/guide/reference/query-dsl/flt-query/
For example, I have indexed book titles, and with this query:
{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy": {
            "book": {
              "value": "ringing",
              "min_similarity": "0.3"
            }
          }
        }
      ]
    }
  }
}
I got
{
  "took" : "1",
  "timed_out" : "false",
  "_shards" : {
    "total" : "5",
    "successful" : "5",
    "failed" : "0"
  },
  "hits" : {
    "total" : "1",
    "max_score" : "0.19178301",
    "hits" : [
      {
        "_index" : "library",
        "_type" : "book",
        "_id" : "3",
        "_score" : "0.19178301",
        "_source" : {
          "book" : "The Lord of the Rings",
          "author" : "J R R Tolkein"
        }
      }
    ]
  }
}
which is the only correct result..
You could apply stemming to your documents, so that when you index studying you are effectively indexing study. You do the same at query time, so that when you search for studying you'll again be searching for study and you'll find a match, whether you look for study or studying.
Stemming of course depends on the language, and there are different techniques; for English, snowball works fine. What happens is that you lose some information when you index data this way, since you can no longer distinguish between studying and study. If you want to keep that distinction you can index the same text in different ways using a multi-field and apply different text analysis to each. That way you can search on multiple fields, both the non-stemmed version and the stemmed version, possibly giving them different weights.
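A sketch of such a multi-field mapping with a snowball-based stemmed sub-field. The index name, field names, and the mapping syntax (Elasticsearch 7+, which has no mapping types) are assumptions for illustration; the original answer predates current versions:

PUT /library
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_stemmed": {
          "tokenizer": "standard",
          "filter": ["lowercase", "snowball"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "stemmed": {
            "type": "text",
            "analyzer": "english_stemmed"
          }
        }
      }
    }
  }
}

A query can then target both sub-fields at once, weighting the exact (non-stemmed) form higher:

GET /library/_search
{
  "query": {
    "multi_match": {
      "query": "studying",
      "fields": ["title^2", "title.stemmed"]
    }
  }
}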
