I'm looking to implement an auto-complete-like feature in my app with Elasticsearch.
Let's say my input is "ronan f". I want Elasticsearch to return all elements where "ronan" or "f" is contained in the last name or first name, and I expect it to sort the results by score, so the element closest to what I searched for is on top.
I tried multiple requests but none of them returned the results I expected.
For example:
{
"query": {
"bool": {
"must_not": [
{
"match": {
"email": "*#guest.booking.com"
}
}
],
"should": [
{
"match": {
"lastname": "ronan"
}
},
{
"match": {
"firstname": "ronan"
}
},
{
"match": {
"lastname": "f"
}
},
{
"match": {
"firstname": "f"
}
}
],
"minimum_should_match" : 1
}
},
"sort": [
"_score"
],
"from": 0,
"size": 30
}
With this request the scores seem a bit odd. For example:
"_index": "clients",
"_type": "client",
"_id": "4369",
"_score": 20.680058,
"_source": {
"firstname": "F",
"lastname": "F"
}
is ranked above:
"_index": "clients",
"_type": "client",
"_id": "212360",
_score": 9.230003,
"_source": {
"firstname": "Ronan",
"lastname": "Fily"
}
To me, the second result should rank higher than the first.
Can someone show me how I can achieve the result I want?
For info, I can't use the Completion Suggester functionality of Elasticsearch because I can't access the configuration of the database (so no changes to the indexes).
OK, since you can reindex your data, I've included a "starts with" analyzer. It is case-insensitive and works on text fields (I think first name and last name can contain multiple words).
Delete and recreate your index using the settings and mappings below.
Define your analyzer (PUT my_index):
{
  "settings": {
    "analysis": {
      "filter": {
        "name_ngrams": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "partial_name": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "name_ngrams",
            "asciifolding"
          ]
        },
        "full_name": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}
Then update your mappings (PUT my_index/_mapping), using this for your fields:
"lastname": {
"type": "text",
"analyzer": "partial_name",
"search_analyzer": "full_name"
},
"firstname": {
"type": "text",
"analyzer": "partial_name",
"search_analyzer": "full_name"
}
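With these mappings in place, a simple bool/should query like yours should behave as a case-insensitive "starts with" search without wildcards (a minimal sketch reusing the fields above):
GET my_index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "firstname": "ronan f" } },
        { "match": { "lastname": "ronan f" } }
      ],
      "minimum_should_match": 1
    }
  },
  "size": 30
}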
If this is not clear and the Elasticsearch documentation doesn't help, don't hesitate to ask.
I have indexed terms that are 2-4 characters with no spaces, but users often search for the "full term", which I don't have indexed and which has 3 extra characters after a blank space.
Ex: I index "A1" or "A1B" or "A1B2", and the "full term" is something like
"A1 11A" or "A1B ABA" or "A1B2 2C8".
This is the current mapping:
"code": {
"type": "text"
},
If the user searches "A1", it should bring back all of them, which is correct; if they type "A1B", I want to bring back only the last two; and if they search "A1B2 2C8", I want to bring back only the last one.
Is that possible? If so, what would be the best search/index strategy?
Index Mapping:
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 10
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"properties": {
"code": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
Index data:
{
"code": "A1"
}
{
"code": "A1B"
}
{
"code": "A1B2"
}
Search Query:
{
"query": {
"match": {
"code": {
"query": "A1B2 2C8"
}
}
}
}
Search Result:
"hits": [
{
"_index": "65067196",
"_type": "_doc",
"_id": "3",
"_score": 1.3486402,
"_source": {
"code": "A1B2"
}
}
]
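For the intermediate case the same index works: searching "A1B" matches the edge n-grams of both "A1B" and "A1B2" but not "A1" (a sketch using the same match query as above):
{
  "query": {
    "match": {
      "code": {
        "query": "A1B"
      }
    }
  }
}
This brings back only the last two documents, since the edge n-grams produced for "A1" stop at a1.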
When I run this query:
"multi_match": {
"query": "paper copier ",
"fields": [ "allStringFields" ],
"type": "cross_fields",
"operator": "and",
"analyzer": "synonym"
}
I get 1342 results.
But when I run this query (note the word order):
"multi_match": {
"query": " copier paper ",
"fields": [ "allStringFields" ],
"type": "cross_fields",
"operator": "and",
"analyzer": "synonym"
}
I get zero results.
I am using a synonym analyzer and it is the cause of this behavior.
Is there a solution to this?
Adding a working example with index data, mapping, search query, and search result. In the example below I have taken two synonyms, table and tables.
Regarding "I get zero results": please go through your index mapping once again. In the example, the search keyword table chair is searched in both the title and content fields, and the query returns the documents that contain both table AND chair. For a detailed explanation, refer to the ES documentation on the multi_match query and the synonym token filter.
Index Mapping:
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms": [
"table, tables"
]
}
},
"analyzer": {
"synonym_analyzer": {
"filter": [
"lowercase",
"synonym_filter"
],
"tokenizer": "standard"
}
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text"
}
}
}
}
Index Data:
{ "title": "table chair" }
{ "title": "tables chair" }
{ "title": "table fan" }
{ "title": "light fan", "content": "chair" }
Search Query:
{
"query": {
"multi_match": {
"query": "table chair",
"operator": "and",
"type":"cross_fields",
"fields": [
"title","content"
],
"analyzer": "synonym_analyzer"
}
}
}
Search Result:
"hits": [
{
"_index": "synonym",
"_type": "_doc",
"_id": "1",
"_score": 1.7227666,
"_source": {
"title": "table chair"
}
},
{
"_index": "synonym",
"_type": "_doc",
"_id": "2",
"_score": 1.3862942,
"_source": {
"title": "tables chair"
}
}
]
Searching table chair or chair table gives the same search result as shown above.
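For completeness, the reversed word order uses the identical query body; only the query string changes:
{
  "query": {
    "multi_match": {
      "query": "chair table",
      "operator": "and",
      "type": "cross_fields",
      "fields": [
        "title",
        "content"
      ],
      "analyzer": "synonym_analyzer"
    }
  }
}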
I am currently implementing Elasticsearch in my application. Please assume that "Hello World" is the data we need to search. Our requirement is that we should get the result by entering "h" or "Hello World" or "Hello Worlds" as the keyword.
This is our current query:
{
  "query": {
    "wildcard": {
      "title": {
        "value": "h*"
      }
    }
  }
}
By using this we get the right result for the keyword "h". But we also need to get results in the case of small spelling mistakes.
You need to use the english analyzer, which stems tokens to their root form. More info can be found in the Elasticsearch documentation on language analyzers.
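You can verify the stemming with the _analyze API; the english analyzer reduces Worlds and World to the same root token (a quick sketch):
GET _analyze
{
  "analyzer": "english",
  "text": "Hello Worlds"
}
This yields the tokens hello and world, so a query for "Hello Worlds" can match a document containing "Hello World".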
I implemented it using your example data, query, and expected results, with an edge n-gram analyzer and a match query.
Index Mapping
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 10
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "english"
}
}
}
}
Index document
{
"title" : "Hello World"
}
Search query for h and its result
{
"query": {
"match": {
"title": "h"
}
}
}
"hits": [
{
"_index": "so-60524477-partial-key",
"_type": "_doc",
"_id": "1",
"_score": 0.42763555,
"_source": {
"title": "Hello World"
}
}
]
Search query for Hello Worlds; the same document comes back in the result
{
"query": {
"match": {
"title": "Hello worlds"
}
}
}
Result
"hits": [
{
"_index": "so-60524477-partial-key",
"_type": "_doc",
"_id": "1",
"_score": 0.8552711,
"_source": {
"title": "Hello World"
}
}
]
Edge n-grams and n-grams perform better than wildcards: for a wildcard query, all documents have to be scanned to see which ones match the pattern, whereas n-grams break text into small tokens at index time.
Ex: Quick Foxes will be stored as [ Qu, Qui, Quic, Quick, Fo, Fox, Foxe, Foxes ], depending on the min_gram and max_gram sizes.
Fuzziness can be used to find similar terms:
Mapping
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"properties": {
"text":{
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
Query
GET my_index/_search
{
"query": {
"match": {
"text": {
"query": "hello worlds",
"fuzziness": 1
}
}
}
}
I'm having issues getting the Elasticsearch results I need.
My mappings look like this:
"mappings": {
"product": {
"_meta": {
"model": "App\\Entity\\Product"
},
"dynamic_date_formats": [],
"properties": {
"articleNameSearch": {
"type": "text",
"analyzer": "my_analyzer"
},
"articleNumberSearch": {
"type": "text",
"fielddata": true
},
"brand": {
"type": "nested",
"properties": {
"name": {
"type": "text"
}
}
}
}
}
},
My settings:
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "my_index",
"creation_date": "1572252785482",
"analysis": {
"filter": {
"standard": {
"type": "standard"
}
},
"analyzer": {
"my_analyzer": {
"filter": [
"standard"
],
"type": "custom",
"tokenizer": "lowercase"
}
}
},
"number_of_replicas": "1",
"uuid": "bwmc7NZ9RXqB1lpQ3e8HTQ",
"version": {
"created": "5060399"
}
}
}
The data inside:
"hits": [
{
"_index": "my_index",
"_type": "product",
"_id": "14",
"_score": 1.0,
"_source": {
"articleNumberSearch": "5003xx843",
"articleNameSearch": "this is a test string",
"brand": {
"name": "Brand name"
}
}
},
Currently the PHP code for the query looks like this (this does not return correct records):
$searchQuery = new BoolQuery();
$formattedQuery = "*" . str_replace(['.', '|'], '', trim(mb_strtolower($query))) . "*";
/**
* Test NGRAM analyzer
*/
$matchQuery = new Query\MultiMatch();
$matchQuery->setFields([
'articleNumberSearch',
'articleNameSearch',
]);
$matchQuery->setQuery($formattedQuery);
$searchQuery->addMust($matchQuery);
/**
* Nested query
*/
$nestedQuery = new Nested();
$nestedQuery->setPath('brand');
$nestedQuery->setQuery(
new Match('brand.name', 'Brand name')
);
$searchQuery->addMust($nestedQuery);
I'm creating an auto-complete search field, where you can search articleNumberSearch and articleNameSearch while the brand name is always a fixed value.
I want to be able to search, for example:
500, which should find this hit, because 500 is in articleNumberSearch.
But I should also be able to search:
this is string
A couple of questions:
Which query do I need to use?
Am I using the right analyzer?
Is my analyzer correctly configured?
You should create an ngram-type tokenizer.
The ngram tokenizer first breaks text down into words whenever it encounters one of a list of specified characters, then emits n-grams of each word.
Something like this:
"analysis": {
"analyzer": {
"autocomplete": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"token_chars": [
"letter",
"digit",
"symbol",
"punctuation"
],
"min_gram": "1",
"type": "ngram",
"max_gram": "2"
}
}
}
NGram Tokenizer
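With an analyzer like that applied to both fields at index time, a multi_match query can replace the wildcard string building in the PHP code (a sketch; it assumes articleNumberSearch and articleNameSearch are remapped with the autocomplete analyzer and a plain standard search_analyzer):
GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "this is string",
      "fields": [
        "articleNumberSearch",
        "articleNameSearch"
      ]
    }
  }
}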
I'm trying to achieve Google-style autocomplete and autocorrection with Elasticsearch.
Mappings:
POST music
{
"settings": {
"analysis": {
"filter": {
"nGram_filter": {
"type": "nGram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"nGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"nGram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"song": {
"properties": {
"song_field": {
"type": "string",
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"suggest": {
"type": "completion",
"analyzer": "simple",
"search_analyzer": "simple",
"payloads": true
}
}
}
}
}
Docs:
POST music/song
{
"song_field" : "beautiful queen",
"suggest" : "beautiful queen"
}
POST music/song
{
"song_field" : "beautiful",
"suggest" : "beautiful"
}
I expect that when the user types "beaatiful q" they will get something like beautiful queen (beaatiful corrected to beautiful, and q completed to queen).
I've tried the following query:
POST music/song/_search?search_type=dfs_query_then_fetch
{
"size": 10,
"suggest": {
"didYouMean": {
"text": "beaatiful q",
"completion": {
"field": "suggest"
}
}
},
"query": {
"match": {
"song_field": {
"query": "beaatiful q",
"fuzziness": 2
}
}
}
}
Unfortunately, the completion suggester doesn't allow any typos, so I get this response:
"suggest": {
"didYouMean": [
{
"text": "beaatiful q",
"offset": 0,
"length": 11,
"options": []
}
]
}
In addition, the search gave me these results (beautiful ranked higher even though the user had started to type "queen"):
"hits": [
{
"_index": "music",
"_type": "song",
"_id": "AVUj4Y5NancUpEdFLeLo",
"_score": 0.51315063,
"_source": {
"song_field": "beautiful"
"suggest": "beautiful"
}
},
{
"_index": "music",
"_type": "song",
"_id": "AVUj4XFAancUpEdFLeLn",
"_score": 0.32071912,
"_source": {
"song_field": "beautiful queen"
"suggest": "beautiful queen"
}
}
]
UPDATE!!!
I found out that I can use a fuzzy query with the completion suggester, but now I get no suggestions when querying (fuzziness only supports an edit distance of up to 2):
POST music/song/_search
{
"size": 10,
"suggest": {
"didYouMean": {
"text": "beaatefal q",
"completion": {
"field": "suggest",
"fuzzy" : {
"fuzziness" : 2
}
}
}
}
}
I still expect "beautiful queen" as the suggestion response.
When you want to provide suggestions of two or more words, I have found out (the hard way) that it's not worth using ngrams or edge ngrams in Elasticsearch.
Using the shingle token filter and a shingle analyzer will provide you with multi-word phrases, and if you couple that with match_phrase_prefix it should give you the functionality you're looking for.
Basically something like this:
PUT /my_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"my_shingle_filter": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 2,
"output_unigrams": false
}
},
"analyzer": {
"my_shingle_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"my_shingle_filter"
]
}
}
}
}
}
And don't forget to do your mapping:
{
"my_type": {
"properties": {
"title": {
"type": "string",
"fields": {
"shingles": {
"type": "string",
"analyzer": "my_shingle_analyzer"
}
}
}
}
}
}
Ngrams and edge ngrams tokenize down to single characters, whereas the shingle analyzer and filter group words together, providing a much more efficient way of producing and searching for phrases. I spent a lot of time messing with the two approaches above until I saw shingles mentioned and read up on them. Much better.
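To tie it together, the autocomplete query targets the shingled subfield with match_phrase_prefix (a sketch; title.shingles comes from the mapping above):
GET /my_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "title.shingles": {
        "query": "beautiful q"
      }
    }
  }
}
Note that with output_unigrams set to false a single-word query produces no shingles, so you may also want to query the plain title field as a fallback.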