ElasticSearch: How to use edge_ngram and have real relevant hits to display first


I'm new to Elasticsearch, and I'm trying to build a search for an e-commerce site that suggests 5 to 10 matching products to the user.
Since it has to work while the user is typing, we followed the official documentation and used edge_ngram, and it kind of worked. But when we tested it, the results were not what we expected, as the example below (from our test) shows.
[Image: searching example]
As the image shows, the search for the term "Furadeira" (power drill) returns accessories before the power drill itself. How can I improve the results? Even taking into account the position where the match occurs in the string would help, I guess.
This is the code I have so far:
//PUT example
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
},
"portuguese_stop": {
"type": "stop",
"stopwords": "_portuguese_"
},
"portuguese_stemmer": {
"type": "stemmer",
"language": "light_portuguese"
}
},
"analyzer": {
"portuguese": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"portuguese_stop",
"portuguese_stemmer"
]
},
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
}
}
/* mapping */
//PUT /example/products/_mapping
{
"products": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
/* Search */
//GET /example/products/_search
{
"query" : {
"query_string": {
"query" : "furadeira",
"type" : "most_fields", // Tried without this aswell
"fields" : [
"name^8",
"model^10",
"manufacturer^4",
"description"
]
}
}
}
/* Product example */
// PUT example/products/38313
{
"name": "FITA VEDA FRESTA (ESPUMA 4503) 12X5 M [ H0000164055 ]",
"description": "Caracteristicas do produto:Ve…Diminui ruidos indesejaveis.",
"price":21.90,
"product_id": 38313,
"image": "http://placehold.it/200x200",
"quantity": 92,
"width": 20.200,
"height": 1.500,
"length": 21.500,
"weight": 0.082,
"model": "167083",
"manufacturer": "3M DO BRASIL"
}
Thanks in advance.

You could turn your query into a so-called bool query that contains your existing query in a must clause and an additional query in a should clause that matches exactly (not using the ngrammed field). Documents that also match the should clause will be scored higher.
See the bool query documentation.
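A minimal sketch of that idea, written as a Python dict that could be sent as the search body. It assumes a hypothetical name.exact subfield indexed with a non-ngram analyzer (for example standard); that subfield is not part of the mapping in the question and would have to be added. The function name and the boost value are also just illustrative.

```python
import json

def build_autocomplete_query(text, boost=5.0):
    """Match on the edge-ngram field and reward whole-word matches."""
    return {
        "query": {
            "bool": {
                # Every hit must match the edge-ngram analyzed field.
                "must": {"match": {"name": {"query": text}}},
                # Optional clause: hits that also match the whole word score higher.
                # "name.exact" is an assumed subfield, not in the original mapping.
                "should": {
                    "match": {"name.exact": {"query": text, "boost": boost}}
                },
            }
        }
    }

body = build_autocomplete_query("furadeira")
print(json.dumps(body, indent=2))
```

The must clause decides which documents are returned; the should clause only changes their order.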

Let's assume you have a field that differentiates main products from accessories. I'll call it level_field.
There are two approaches you can take:
1) Boost the _score of main products by adding a should clause:
Put your main query in the must clause, and in the should clause use level_field to boost the _score of documents that are main products.
{
"query": {
"bool": {
"must": {
"match": {
"name": {
"query": "furadeira"
}
}
},
"should": [
{ "match": {
"level_field": {
"query": "level1",
"boost": 3
}
}},
{ "match": {
"level_field": {
"query": "level2",
"boost": 2
}
}}
]
}
}
}
2) Alternatively, decrease the _score of documents that are not main products by using a boosting query:
{
"query": {
"boosting": {
"positive": {
"query_string": {
"query" : "furadeira",
"type" : "most_fields",
"fields" : [
"name^8",
"model^10",
"manufacturer^4",
"description"
]
}
},
"negative": {
"term": {
"level_field": {
"value": "level2"
}
}
},
"negative_boost": 0.2
}
}
}
I hope it helps

Related

ElasticSearch Search-as-you-type field type field with partial search

I recently updated my ngram implementation to use the search_as_you_type field type.
https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-as-you-type.html
This worked great, but I noticed that partial searching does not work.
If I search for the number 00060434 I get the desired result, but I would also like to be able to search for 60434 and have it return document 3.
Is there a way to do this with the search_as_you_type field type, or can I only do it with ngrams?
PUT searchasyoutype_example
{
"settings": {
"analysis": {
"analyzer": {
"englishAnalyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"trim",
"ascii_folding"
]
}
},
"filter": {
"ascii_folding": {
"type": "asciifolding",
"preserve_original": true
}
}
}
},
"mappings": {
"properties": {
"number": {
"type": "search_as_you_type",
"analyzer": "englishAnalyzer"
},
"fullName": {
"type": "search_as_you_type",
"analyzer": "englishAnalyzer"
}
}
}
}
PUT searchasyoutype_example/_doc/1
{
"number" : "00069794",
"fullName": "Employee 1"
}
PUT searchasyoutype_example/_doc/2
{
"number" : "00059840",
"fullName": "Employee 2"
}
PUT searchasyoutype_example/_doc/3
{
"number" : "00060434",
"fullName": "Employee 3"
}
GET searchasyoutype_example/_search
{
"query": {
"multi_match": {
"query": "00060434",
"type": "bool_prefix",
"fields": [
"number",
"number._index_prefix",
"fullName",
"fullName._index_prefix"
]
}
}
}
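To make the gap concrete, here is a small Python sketch (plain string logic, not Elasticsearch itself) that compares the leading fragments of a value with its general n-grams. The helper names and gram sizes are made up for illustration.

```python
def prefixes(token):
    """Leading fragments of a token, as a prefix-oriented index would hold."""
    return {token[:n] for n in range(1, len(token) + 1)}

def ngrams(token, min_gram=1, max_gram=8):
    """All substrings of the token between min_gram and max_gram characters."""
    return {
        token[i:i + n]
        for i in range(len(token))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(token)
    }

number = "00060434"
print("60434" in prefixes(number))  # False: not a leading fragment
print("60434" in ngrams(number))    # True: found as an inner substring
```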

I think you need to query number, number._2gram and number._3gram, like below:
GET searchasyoutype_example/_search
{
"query": {
"multi_match": {
"query": "00060434",
"type": "bool_prefix",
"fields": [
"number",
"number._2gram",
"number._3gram"
]
}
}
}
search_as_you_type creates three subfields (._2gram, ._3gram and ._index_prefix). This article explains in more detail how it works:
https://ashish.one/blogs/search-as-you-type/

Elasticsearch "AND in query_string" vs. "default_operator AND"

elasticsearch v7.1.1
I don't understand the difference between a query_string containing "AND"
and one that relies on "default_operator": "AND".
I thought they should yield the same result, but they don't:
HTTP PUT http://localhost:9200/umlautsuche
{
"settings": {
"analysis": {
"char_filter": {
"my_char_filter": {
"type": "mapping",
"mappings": ["ph => f"]
}
},
"filter": {
"my_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 10
}
},
"analyzer": {
"my_name_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
],
"filter": [
"lowercase",
"german_normalization"
]
}
}
}
},
"mappings": {
"date_detection": false,
"dynamic_templates": [
{
"string_fields_german": {
"match_mapping_type": "string",
"match": "*",
"mapping": {
"type": "text",
"analyzer": "my_name_analyzer"
}
}
},
{
"dates": {
"match": "lastModified",
"match_pattern": "regex",
"mapping": {
"type": "date",
"ignore_malformed": true
}
}
}
]
}
}
HTTP POST http://localhost:9200/_bulk
{ "index" : { "_index" : "umlautsuche", "_id" : "1" } }
{"vorname": "Stephan-Jörg", "nachname": "Müller", "ort": "Hollabrunn"}
{ "index" : { "_index" : "umlautsuche", "_id" : "2" } }
{"vorname": "Stephan-Joerg", "nachname": "Mueller", "ort": "Hollabrunn"}
{ "index" : { "_index" : "umlautsuche", "_id" : "3" } }
{"vorname": "Stephan-Jörg", "nachname": "Müll", "ort": "Hollabrunn"}
No results here, which I did not expect:
HTTP POST http://localhost:9200/umlautsuche/_search
{
"query": {
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": ["vorname", "nachname"]
}
}
}
This query gives the results I expected:
HTTP POST http://localhost:9200/umlautsuche/_search
{
"query": {
"query_string": {
"query": "Stefan AND Müller AND Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": ["vorname", "nachname"]
}
}
}
How do I configure the query/analyzer so that I don't need these "AND"s between my search terms?

What you are facing is an obscure corner of the boolean logic of query_string operators, and possibly undocumented behavior. Because of this, I believe it is better either to use a bool query with explicit logic or to use copy_to.
Let me explain in a bit more detail what is going on and how you can fix it.
Why doesn't the first query match?
In order to see how the query gets executed, let's set profile: true:
POST /umlautsuche/_search
{
"query": {
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": [
"vorname",
"nachname"
]
}
},
"profile": true
}
In the ES response we will see:
"profile": {
"shards": [
{
"id": "[QCANVs5gR0GOiiGCmEwj7w][umlautsuche][0]",
"searches": [
{
"query": [
{
"type": "BooleanQuery",
"description": "+((+nachname:stefan +nachname:muller) | (+vorname:stefan +vorname:muller)) +(nachname:jor* | vorname:jor*)",
"time_in_nanos": 17787641,
"breakdown": {
"set_min_competitive_score_count": 0,
We are interested in this part:
"+((+nachname:stefan +nachname:muller) | (+vorname:stefan +vorname:muller)) +(nachname:jor* | vorname:jor*)"
Without going into deep analysis, we can tell that this query requires stefan and muller to appear together in the same field: both in nachname, or both in vorname. That is impossible here, because no document has both tokens in a single field (stefan is never a surname, and muller is never a first name among the documents).
What we actually want, I presume, is "find people whose full name matches Stefan Müller Jör*". This is not what the query generated by Elasticsearch does.
Why does the second query match?
Let's do the same trick with profile: true. The response would contain this:
"profile": {
"shards": [
{
"id": "[QCANVs5gR0GOiiGCmEwj7w][umlautsuche][0]",
"searches": [
{
"query": [
{
"type": "BooleanQuery",
"description": "+(nachname:stefan | vorname:stefan) +(nachname:muller | vorname:muller) +(nachname:jor* | vorname:jor*)",
"time_in_nanos": 17970342,
"breakdown": {
We can see that the query got interpreted like this:
"+(nachname:stefan | vorname:stefan) +(nachname:muller | vorname:muller) +(nachname:jor* | vorname:jor*)"
Which we can roughly interpret as "find people for whom each of these three terms matches either the first name or the surname", which is what we expect it to do.
The documentation of the query_string query says that with default_operator: AND, spaces should be interpreted as ANDs:
The default operator used if no explicit operator is specified. For
example, with a default operator of OR, the query capital of Hungary
is translated to capital OR of OR Hungary, and with default operator
of AND, the same query is translated to capital AND of AND Hungary.
The default value is OR.
However, from what we have just seen, this does not seem to hold, at least when querying multiple fields.
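The difference between the two parsed shapes can be replayed offline. The following Python sketch is only a rough stand-in for my_name_analyzer (ph => f, lowercasing, a crude approximation of german_normalization) and evaluates both boolean shapes from the profile output against the three sample documents. It is an illustration of the logic, not of how Lucene executes the query.

```python
import re

UMLAUTS = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

def analyze(text):
    """Rough approximation of my_name_analyzer; not the real filter chain."""
    text = text.replace("ph", "f")              # char filter: ph => f
    tokens = re.findall(r"\w+", text.lower())   # split on hyphens and spaces
    folded = []
    for tok in tokens:
        tok = tok.translate(UMLAUTS)
        for digraph, plain in (("ae", "a"), ("oe", "o"), ("ue", "u")):
            tok = tok.replace(digraph, plain)   # crude: ignores special cases
        folded.append(tok)
    return folded

DOCS = {
    1: {"vorname": "Stephan-Jörg", "nachname": "Müller"},
    2: {"vorname": "Stephan-Joerg", "nachname": "Mueller"},
    3: {"vorname": "Stephan-Jörg", "nachname": "Müll"},
}
FIELDS = ("vorname", "nachname")

def has(doc, field, term):
    """True if the field contains the term (a trailing * means prefix match)."""
    tokens = analyze(doc[field])
    if term.endswith("*"):
        return any(t.startswith(term[:-1]) for t in tokens)
    return term in tokens

def first_shape(doc):
    # +((+f:stefan +f:muller) for some field f) +(jor* in any field)
    same_field = any(
        has(doc, f, "stefan") and has(doc, f, "muller") for f in FIELDS
    )
    return same_field and any(has(doc, f, "jor*") for f in FIELDS)

def second_shape(doc):
    # +(stefan in any field) +(muller in any field) +(jor* in any field)
    return all(
        any(has(doc, f, term) for f in FIELDS)
        for term in ("stefan", "muller", "jor*")
    )

first_hits = [i for i, d in DOCS.items() if first_shape(d)]
second_hits = [i for i, d in DOCS.items() if second_shape(d)]
print(first_hits)   # []
print(second_hits)  # [1, 2]
```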
So what can we do about it?
Use bool with explicit logic
This query seems to work:
POST /umlautsuche/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"fields": [
"vorname"
]
}
},
{
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"fields": [
"nachname"
]
}
}
]
}
}
}
This query is not an exact equivalent; consider it an example. For instance, if we had another record like this, without "Jörg":
{"vorname": "Stephan", "nachname": "Müll", "ort": "Hollabrunn"}
the bool query above would match it despite the missing "Jörg". To overcome this you can write a more complex bool query, but that will not do if you want to avoid parsing user input.
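One possible shape for such a more complex bool query is sketched below in Python: one must clause per whitespace-separated term, each searching both fields. Note that it does split the user input, so it is exactly the kind of input parsing mentioned above; the helper name is hypothetical, and reserved query_string characters in the input are not escaped here.

```python
def per_term_query(user_input, fields):
    """Build one must clause per term so that every term has to match somewhere."""
    must = [
        {
            "query_string": {
                "query": term,
                "analyze_wildcard": True,
                "fields": list(fields),
            }
        }
        for term in user_input.split()
    ]
    return {"query": {"bool": {"must": must}}}

body = per_term_query("Stefan Müller Jör*", ["vorname", "nachname"])
for clause in body["query"]["bool"]["must"]:
    print(clause["query_string"]["query"])
```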
How can we still use plain, unparsed query string?
Introduce a copy_to field
We can try the copy_to capability. It copies the content of several fields into another field, so that their contents are analyzed and searched together.
We will have to modify the mapping configuration (unfortunately, the existing index will have to be recreated):
"mappings": {
"date_detection": false,
"dynamic_templates": [
{
"name_fields_german": {
"match_mapping_type": "string",
"match": "*name",
"mapping": {
"type": "text",
"analyzer": "my_name_analyzer",
"copy_to": "full_name"
}
}
},
{
"string_fields_german": {
"match_mapping_type": "string",
"match": "*",
"mapping": {
"type": "text",
"analyzer": "my_name_analyzer"
}
}
},
{
"dates": {
"match": "lastModified",
"match_pattern": "regex",
"mapping": {
"type": "date",
"ignore_malformed": true
}
}
}
]
}
Then we can populate the index in exactly the same manner as we did before.
Now we can query the new field full_name with the following query:
POST /umlautsuche/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "Stefan Müller Jör*",
"analyze_wildcard": true,
"default_operator": "AND",
"fields": [
"full_name"
]
}
}
]
}
}
}
This query returns the same 2 documents as the second query. Thus, in this case default_operator: AND behaves as we would expect, requiring all tokens from the query to match.
Hope that helps!

Elasticsearch Edge NGram tokenizer higher score when word begins with n-gram

Suppose there is the following mapping with Edge NGram Tokenizer:
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete_analyzer": {
"tokenizer": "autocomplete_tokenizer",
"filter": [
"standard"
]
},
"autocomplete_search": {
"tokenizer": "whitespace"
}
},
"tokenizer": {
"autocomplete_tokenizer": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 10,
"token_chars": [
"letter",
"symbol"
]
}
}
}
},
"mappings": {
"tag": {
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "text",
"analyzer": "autocomplete_analyzer",
"search_analyzer": "autocomplete_search"
}
}
}
}
}
And the following documents are indexed:
POST /tag/tag/_bulk
{"index":{}}
{"name" : "HITS FIND SOME"}
{"index":{}}
{"name" : "TRENDING HI"}
{"index":{}}
{"name" : "HITS OTHER"}
Then searching
{
"query": {
"match": {
"name": {
"query": "HI"
}
}
}
}
yields all with the same score, or TRENDING HI with a higher score than one of the others.
How can it be configured so that entries that actually start with the searched n-gram get a higher score? In this case, HITS FIND SOME and HITS OTHER should score higher than TRENDING HI; at the same time, TRENDING HI should still be in the results.
A highlighter is also used, so the solution shouldn't break it.
The highlighter used in query is:
"highlight": {
"pre_tags": [
"<"
],
"post_tags": [
">"
],
"fields": {
"name": {}
}
}
Using this with match_phrase_prefix messes up the highlighting, yielding <H><I><T><S> FIND SOME when searching only for H.

You need to understand how Elasticsearch/Lucene analyzes your data and calculates the search score.
1. Analyze API
https://www.elastic.co/guide/en/elasticsearch/reference/current/_testing_analyzers.html This will show you what Elasticsearch stores; in your case (for TRENDING HI):
T / TR / TRE / ... / TRENDING / H / HI
2. Score
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
The bool query is often used to build complex queries for a particular use case. Use must to filter documents, then should to score. A common approach is to use different analyzers on the same field (with the fields keyword in the mapping, you can analyze the same field in different ways).
3. Don't break the highlighting
According to the docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html#specify-highlight-query
You can add an extra query:
{
"query": {
"bool": {
"must" : [
{
"match": {
"name": "HI"
}
}
],
"should": [
{
"prefix": {
"name": "HI"
}
}
]
}
},
"highlight": {
"pre_tags": [
"<"
],
"post_tags": [
">"
],
"fields": {
"name": {
"highlight_query": {
"match": {
"name": "HI"
}
}
}
}
}
}
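The token list from step 1 can be approximated offline. This Python sketch mimics an edge_ngram tokenizer with min_gram 1 and max_gram 10; as a simplification it only treats letters as token characters (the mapping in the question also keeps symbols), and it is not a replacement for calling the Analyze API.

```python
def edge_ngram_tokens(text, min_gram=1, max_gram=10):
    """Emit the leading fragments of every run of letters in the text."""
    tokens = []
    word = ""
    for ch in text + " ":          # trailing space flushes the last word
        if ch.isalpha():
            word += ch
        elif word:
            longest = min(max_gram, len(word))
            tokens.extend(word[:n] for n in range(min_gram, longest + 1))
            word = ""
    return tokens

tokens = edge_ngram_tokens("TRENDING HI")
print(" / ".join(tokens))
# T / TR / TRE / TREN / TREND / TRENDI / TRENDIN / TRENDING / H / HI
```

Because HI is emitted for the second word as well, a plain match on this field cannot tell "starts with HI" apart from "contains a word starting with HI".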

In this particular case you could add a match_phrase_prefix clause to your query, which does a prefix match on the last term of the query text:
{
"query": {
"bool": {
"should": [
{
"match": {
"name": "HI"
}
},
{
"match_phrase_prefix": {
"name": "HI"
}
}
]
}
}
}
The match term will match on all three results, but the match_phrase_prefix won't match on TRENDING HI. As a result, you'll get all three items in the results, but TRENDING HI will appear with a lower score.
Quoting the docs:
The match_phrase_prefix query is a poor-man’s autocomplete[...] For better solutions for search-as-you-type see the completion suggester and Index-Time Search-as-You-Type.
On a side note, if you're introducing that bool query, you'll probably want to look at the minimum_should_match option, depending on the results you want.

A possible solution for this problem is to use multi-fields. They allow the same data from your source document to be indexed in different ways. In your case you could index the name field as default text, then as n-grams, and also as edge n-grams. The query would then be a bool query that matches against all of those fields.
The final score of documents is composed of the match value for each one. Those matches are also called signals, signalling that there is a match between the query and the document. The document with most signals matching gets the highest score.
In your case all documents would match the ngram HI. But only the HITS FIND SOME and the HITS OTHER document would get the edgengram additional score. This would give those two documents a boost and put them on top. The complication with this is that you have to make sure that the edgengram doesn't split on whitespaces, because then the HI at the end would get the same score as in the beginning of the document.
Here is an example mapping and query for your case:
PUT /tag/
{
"settings": {
"analysis": {
"analyzer": {
"edge_analyzer": {
"tokenizer": "edge_tokenizer"
},
"kw_analyzer": {
"tokenizer": "kw_tokenizer"
},
"ngram_analyzer": {
"tokenizer": "ngram_tokenizer"
},
"autocomplete_analyzer": {
"tokenizer": "autocomplete_tokenizer",
"filter": [
"standard"
]
},
"autocomplete_search": {
"tokenizer": "whitespace"
}
},
"tokenizer": {
"kw_tokenizer": {
"type": "keyword"
},
"edge_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10
},
"ngram_tokenizer": {
"type": "ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
},
"autocomplete_tokenizer": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 10,
"token_chars": [
"letter",
"symbol"
]
}
}
}
},
"mappings": {
"tag": {
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "text",
"fields": {
"edge": {
"type": "text",
"analyzer": "edge_analyzer"
},
"ngram": {
"type": "text",
"analyzer": "ngram_analyzer"
}
}
}
}
}
}
}
And a query:
POST /tag/_search
{
"query": {
"bool": {
"should": [
{
"function_score": {
"query": {
"match": {
"name.edge": {
"query": "HI"
}
}
},
"boost": "5",
"boost_mode": "multiply"
}
},
{
"match": {
"name.ngram": {
"query": "HI"
}
}
},
{
"match": {
"name": {
"query": "HI"
}
}
}
]
}
}
}
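The signal idea can be illustrated with a toy model in Python. It does not reproduce Elasticsearch scoring (no BM25, no norms); it simply adds a fixed weight for each subfield that contains the query token, using simplified versions of the three analyzers above. All function names are invented for this sketch.

```python
import re

def edge_tokens(text, lo=2, hi=10):
    """edge_ngram tokenizer without token_chars: prefixes of the whole text."""
    return {text[:n] for n in range(lo, min(hi, len(text)) + 1)}

def ngram_tokens(text, lo=2, hi=10):
    """ngram tokenizer on letters and digits: substrings inside each word."""
    grams = set()
    for word in re.findall(r"[A-Za-z0-9]+", text):
        for i in range(len(word)):
            for n in range(lo, hi + 1):
                if i + n <= len(word):
                    grams.add(word[i:i + n])
    return grams

def standard_tokens(text):
    """Very rough stand-in for the standard analyzer: lowercase words."""
    return {w.lower() for w in text.split()}

def toy_score(doc, query, edge_boost=5):
    """Sum of fixed weights per matching signal; NOT the real relevance score."""
    score = 0
    if query in edge_tokens(doc):
        score += edge_boost          # name.edge signal, boosted
    if query in ngram_tokens(doc):
        score += 1                   # name.ngram signal
    if query.lower() in standard_tokens(doc):
        score += 1                   # plain name signal
    return score

docs = ["TRENDING HI", "HITS FIND SOME", "HITS OTHER"]
ranked = sorted(docs, key=lambda d: toy_score(d, "HI"), reverse=True)
for d in ranked:
    print(toy_score(d, "HI"), d)
```

In this toy model the two HITS documents collect the boosted edge signal, while TRENDING HI only collects the two unboosted ones, so it stays in the results but ranks last.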

ElasticSearch - Fuzzy and strict match with multiple fields

We want to leverage Elasticsearch to find similar objects.
Let's say I have an object with 4 fields:
product_name, seller_name, seller_phone, platform_id.
Similar products can have different product names and seller names across different platforms (fuzzy match).
The phone, however, is strict, and a single variation might yield a wrong record (strict match).
What we're trying to create is a query that will:
Take into account all the fields we have for the current record and OR between them.
Require that platform_id is the specific one I want to look at (AND).
Fuzzy-match product_name and seller_name.
Strictly match the phone number, or ignore it in the OR between the fields.
If I wrote it in pseudocode, it would be something like:
((product_name like 'some_product_name') OR (seller_name like
'some_seller_name') OR (seller_phone = 'some_phone')) AND (platform_id
= 123)

To do an exact match on seller_phone, I index this field without the ngram analyzer, and use fuzzy queries for product_name and seller_name.
Mapping
PUT index111
{
"settings": {
"analysis": {
"analyzer": {
"edge_n_gram_analyzer": {
"tokenizer": "whitespace",
"filter" : ["lowercase", "ednge_gram_filter"]
}
},
"filter": {
"ednge_gram_filter" : {
"type" : "NGram",
"min_gram" : 2,
"max_gram": 10
}
}
}
},
"mappings": {
"document_type" : {
"properties": {
"product_name" : {
"type": "text",
"analyzer": "edge_n_gram_analyzer"
},
"seller_name" : {
"type": "text",
"analyzer": "edge_n_gram_analyzer"
},
"seller_phone" : {
"type": "text"
},
"platform_id" : {
"type": "text"
}
}
}
}
}
Index documents
POST index111/document_type
{
"product_name":"macbok",
"seller_name":"apple",
"seller_phone":"9988",
"platform_id":"123"
}
For following pseudo sql query
((product_name like 'some_product_name') OR (seller_name like 'some_seller_name') OR (seller_phone = 'some_phone')) AND (platform_id = 123)
Elastic Query
POST index111/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"platform_id": {
"value": "123"
}
}
},
{
"bool": {
"should": [{
"fuzzy": {
"product_name": {
"value": "macbouk",
"boost": 1.0,
"fuzziness": 2,
"prefix_length": 0,
"max_expansions": 100
}
}
},
{
"fuzzy": {
"seller_name": {
"value": "apdle",
"boost": 1.0,
"fuzziness": 2,
"prefix_length": 0,
"max_expansions": 100
}
}
},
{
"term": {
"seller_phone": {
"value": "9988"
}
}
}
]
}
}]
}
}
}
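To see why the misspelled values above still match while a phone number must be exact, here is a small Python sketch of edit distance. It uses plain Levenshtein distance; Elasticsearch's fuzzy query by default also counts a transposition of two adjacent characters as a single edit, which this sketch does not model.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

print(levenshtein("macbouk", "macbok"))  # 1, within fuzziness 2
print(levenshtein("apdle", "apple"))     # 1, within fuzziness 2
print(levenshtein("9989", "9988"))       # 1, but the term query needs 0
```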
Hope this helps

Elasticsearch custom analyzer with ngram and without word delimiter on hyphens

I am trying to index strings that contain hyphens but do not contain spaces, periods or any other punctuation. I do not want to split up the words based on hyphens, instead I would like to have the hyphens be part of the indexed text.
For example, my 6 text strings would be:
magazineplayon
magazineofhorses
online-magazine
best-magazine
friend-of-magazines
magazineplaygames
I would like to be able to search these strings for text containing "play" or for text starting with "magazine".
I have been able to use ngram to make the text containing "play" work properly. However, the hyphen is causing text to split and it is including results where "magazine" is in the word after a hyphen. I only want words starting at the beginning of the string with "magazine" to appear.
Based on the sample above, only these 3 should appear when beginning with "magazine":
magazineplayon
magazineofhorses
magazineplaygames
Please help with my ElasticSearch Index Sample:
DELETE /sample
PUT /sample
{
"settings": {
"index.number_of_shards":5,
"index.number_of_replicas": 0,
"analysis": {
"filter": {
"nGram_filter": {
"type": "nGram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter",
"digit"
]
},
"word_delimiter_filter": {
"type": "word_delimiter",
"preserve_original": true,
"catenate_all" : true
}
},
"analyzer": {
"ngram_index_analyzer": {
"type" : "custom",
"tokenizer": "lowercase",
"filter" : ["nGram_filter", "word_delimiter_filter"]
}
}
}
}
}
PUT /sample/1/_create
{
"name" : "magazineplayon"
}
PUT /sample/3/_create
{
"name" : "magazineofhorses"
}
PUT /sample/4/_create
{
"name" : "online-magazine"
}
PUT /sample/5/_create
{
"name" : "best-magazine"
}
PUT /sample/6/_create
{
"name" : "friend-of-magazines"
}
PUT /sample/7/_create
{
"name" : "magazineplaygames"
}
GET /sample/_search
{
"query": {
"wildcard": {
"name": "*play*"
}
}
}
GET /sample/_search
{
"query": {
"wildcard": {
"name": "magazine*"
}
}
}
Update 1
I updated all my create statements to use TEST after sample:
PUT /sample/test/7/_create
{
"name" : "magazinefairplay"
}
I then ran the following command to return only names that had the word "play" in them instead of doing the wildcard search. This worked correctly and returned only two records.
POST /sample/test/_search
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{"match": { "name.substrings": "play" }}
]
}
}
}
I ran the following command to return only names that started with "magazine". My expectation was that "online-magazine", "best-magazine" and "friend-of-magazines" would not appear. However, all seven records were returned including these three.
POST /sample/test/_search
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{"match": { "name.prefixes": "magazine" }}
]
}
}
}
Is there a way to filter out the prefix where the hyphen is used?

You're on the right path; however, you also need to add another analyzer that uses the edge-ngram token filter in order to make the "starts with" constraint work. You can keep the ngram for checking fields that "contain" a given word, but you need edge-ngram to check that a field "starts with" some token.
PUT /sample
{
"settings": {
"index.number_of_shards": 5,
"index.number_of_replicas": 0,
"analysis": {
"filter": {
"nGram_filter": {
"type": "nGram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter",
"digit"
]
},
"edgenGram_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"ngram_index_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"nGram_filter"
]
},
"edge_ngram_index_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"edgenGram_filter"
]
}
}
}
},
"mappings": {
"test": {
"properties": {
"name": {
"type": "string",
"fields": {
"prefixes": {
"type": "string",
"analyzer": "edge_ngram_index_analyzer",
"search_analyzer": "standard"
},
"substrings": {
"type": "string",
"analyzer": "ngram_index_analyzer",
"search_analyzer": "standard"
}
}
}
}
}
}
}
Then your query will become (i.e. search for all documents whose name field contains play or starts with magazine)
POST /sample/test/_search
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{"match": { "name.substrings": "play" }},
{"match": { "name.prefixes": "magazine" }}
]
}
}
}
Note: don't use wildcard queries to search for substrings, as they can severely hurt the performance of your cluster.
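The effect of combining the keyword tokenizer with the two filters can be checked with a short Python sketch. It is a simplified model of the analyzers above (whole string as one token, lowercased, min_gram 2, max_gram 20), not the actual Elasticsearch analysis chain.

```python
def edge_ngrams(token, lo=2, hi=20):
    """Leading fragments only: supports "starts with"."""
    return {token[:n] for n in range(lo, min(hi, len(token)) + 1)}

def ngrams(token, lo=2, hi=20):
    """Fragments from any position: supports "contains"."""
    return {
        token[i:i + n]
        for i in range(len(token))
        for n in range(lo, hi + 1)
        if i + n <= len(token)
    }

names = [
    "magazineplayon",
    "magazineofhorses",
    "online-magazine",
    "best-magazine",
    "friend-of-magazines",
    "magazineplaygames",
]

# The keyword tokenizer keeps each name, hyphens included, as a single token.
starts_with = [n for n in names if "magazine" in edge_ngrams(n.lower())]
contains = [n for n in names if "play" in ngrams(n.lower())]
print(starts_with)  # ['magazineplayon', 'magazineofhorses', 'magazineplaygames']
print(contains)     # ['magazineplayon', 'magazineplaygames']
```

Because the hyphenated names are never split, "magazine" is not a leading fragment of online-magazine, best-magazine or friend-of-magazines.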
