Highlight with fuzziness and ngram

Highlight with fuzziness and ngram - elasticsearch

I guess the title of the topic spoiled you enough :D
I use edge_ngram and highlight to build an autocomplete search. I have added fuzziness in the query to allow users to mispell their search, but it brokes a bit the highlight.
When i write Sport this is what I get :
<em>Spor</em>t
<em>Spor</em>t mécanique
<em>Spor</em>t nautique
I guess it's because it matches with the token spor generated by the ngram tokenizer.
The query:
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "sport",
"operator": "and",
"fuzziness": "AUTO"
}
}
},
{
"match_phrase_prefix": {
"name.raw": {
"query": "sport"
}
}
}
]
}
},
"highlight": {
"fields": {
"name": {
"term_vector": "with_positions_offsets"
}
}
}
}
And the mapping:
{
"settings": {
"analysis": {
"analyzer": {
"partialAnalyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": ["asciifolding", "lowercase"]
},
"keywordAnalyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["asciifolding", "lowercase"]
},
"searchAnalyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["asciifolding", "lowercase"]
}
},
"tokenizer": {
"ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [ "letter", "digit" ]
}
}
}
},
"mappings": {
"place": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "partialAnalyzer",
"search_analyzer": "searchAnalyzer",
"term_vector": "with_positions_offsets",
"fields": {
"raw": {
"type": "string",
"analyzer": "keywordAnalyzer"
}
}
}
}
}
}
}
I tried to add a new match clause without fuzziness in the query to try to match the keyword before the match with fuzziness but it changed nothing.
'match': {
'name': {
'query': 'sport',
'operator': 'and'
}
Any idea how I can handle this?
Regards, Raphaël

You could do that with highlight_query I guess
Try this in your highlighting query.
"highlight": {
"fields": {
"name": {
"term_vector": "with_positions_offsets",
"highlight_query": {
"match": {
"name.raw": {
"query": "spotr",
"fuzziness": 2
}
}
}
}
}
}
I hope it helps.

Related

How to make edge_ngram token match with certaint quantity ofwords between them?

I'm trying to make a search request that retrieves the results only when less than
5 words are between requested tokens.
{
"settings": {
"index": {
"analysis": {
"filter": {
"stopWords": {
"type": "stop",
"stopwords": [
"_english_"
]
}
},
"normalizer": {
"lowercaseNormalizer": {
"filter": [
"lowercase",
"asciifolding"
],
"type": "custom",
"char_filter": []
}
},
"analyzer": {
"autoCompleteAnalyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "autoCompleteTokenizer"
},
"autoCompleteSearchAnalyzer": {
"type": "custom",
"tokenizer": "lowercase"
},
"charGroupAnalyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "charGroupTokenizer"
}
},
"tokenizer": {
"charGroupTokenizer": {
"type": "char_group",
"max_token_length": "20",
"tokenize_on_chars": [
"whitespace",
"-",
"\n"
]
},
"autoCompleteTokenizer": {
"token_chars": [
"letter"
],
"min_gram": "3",
"type": "edge_ngram",
"max_gram": "20"
}
}
}
}
}
}
The settings:
{
"mappings": {
"_doc": {
"properties": {
"description": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 64
}
},
"analyzer": "autoCompleteAnalyzer",
"search_analyzer": "autoCompleteSearchAnalyzer"
},
"text": {
"type": "text",
"analyzer": "charGroupAnalyzer"
}
}
}
}
}
}
}
And make a bool request with request:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": [
"description.name"
],
"operator": "and",
"query": "rounded elephant",
"fuzziness": 1
}
},
{
"match_phrase": {
"description.text": {
"analyzer": "charGroupAnalyzer",
"query": "rounded elephant",
"slop": 5,
"boost": 20
}
}
}
]
}
}
}
I expect the request to retrieve documents, where description contains:
... rounded very interesting elephant ...
This works good, when i use the complete words, like rounded elephant.
But, whe i enter prefixed words, like round eleph it fails.
But it's obvious that the description.name and description.text have different tokenizers (name contains ngram tokens, but text contain word tokens), so i get completely wrong results.
How can I configure mappings and search, to be able to use ngrams with distance between tokens?

Elasticsearch term query to number token

I need to explain some weird behavior of term query to Elasticsearch database which contains number part in the string. Query is pretty simple:
{
"query": {
"bool": {
"should": [
{
"term": {
"address.street": "8 kvetna"
}
}
]
}
}
}
The problem is that term 8 kvetna returns empty result. I tried to _analyze it ad it make regular tokens like 8, k, kv, kve .... Also I am pretty sure there is a value 8 kvetna in database.
Here is the mapping for the field:
{
"settings": {
"index": {
"refresh_interval": "1m",
"number_of_shards": "1",
"number_of_replicas": "1",
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "20"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"asciifolding",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "standard"
}
"default": {
"filter": [
"lowercase",
"asciifolding"
],
"type": "custom",
"tokenizer": "standard"
}
}
}
}
},
"mappings": {
"doc": {
"dynamic": "strict",
"_all": {
"enabled": false
},
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"analyzer": "autocomplete"
},
"street": {
"type": "text",
"analyzer": "autocomplete"
}
}
}
}
}
}
}
What caused this weird result? I don't understand it. Thanks for any help.

Great start so far! Your only issue is that you're using a term query, while you should use a match one. A term query will try to do an exact match for 8 kvetna and that's not what you want. The following query will work:
{
"query": {
"bool": {
"should": [
{
"match": { <--- change this
"address.street": "8 kvetna"
}
}
]
}
}
}

elasticsearch match query is not working for numbers

I have a search query which is used to search in report name.
I have indexed the field name with autocomplete,edge_ngram
Normal field name search is proper when i'm having a number / year in the field name it's not working.
Query :
{
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"field_name": {
"query": "hybrid seeds india 2017",
"operator": "and"
}
}
}
]
}
}
}
},
"from": 0,
"size": 10
}
Setting and the Mappings
{
"mappings": {
"pages": {
"properties": {
"report_name": {
"fields": {
"autocomplete": {
"search_analyzer": "report_name_search",
"analyzer": "report_name_index",
"type": "string"
},
"report_name": {
"index": "not_analyzed",
"type": "string"
}
},
"type": "multi_field"
}
}
}
},
"settings": {
"analysis": {
"filter": {
"report_name_ngram": {
"max_gram": 150,
"min_gram": 2,
"type": "edge_ngram"
}
},
"analyzer": {
"report_name_index": {
"filter": [
"lowercase",
"report_name_ngram"
],
"tokenizer": "keyword"
},
"report_name_search": {
"filter": [
"lowercase"
],
"tokenizer": "keyword"
}
}
}
}
}
Can you guys help me out in this.
Thanks in advance

Elastic Search FVH highlights the smallest matching token

Settings:
{
"settings": {
"analysis": {
"analyzer": {
"idx_analyzer_ngram": {
"type": "custom",
"filter": [
"lowercase",
"asciifolding",
"edgengram_filter_1_32"
],
"tokenizer": "ngram_alltokenchar_tokenizer_1_32"
},
"ngrm_srch_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "keyword"
}
},
"tokenizer": {
"ngram_alltokenchar_tokenizer_1_32": {
"token_chars": [
"letter",
"whitespace",
"punctuation",
"symbol",
"digit"
],
"min_gram": "1",
"type": "nGram",
"max_gram": "32"
}
}
}
}
}
Mappings:
{
"properties": {
"TITLE": {
"type": "string",
"fields": {
"untouched": {
"index": "not_analyzed",
"type": "string"
},
"ngramanalyzed": {
"search_analyzer": "ngrm_srch_analyzer",
"index_analyzer": "idx_analyzer_ngram",
"type": "string",
"term_vector": "with_positions_offsets"
}
}
}
}
}
Query:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "have some ha",
"fields": [
"TITLE.ngramanalyzed"
],
"default_operator": "and"
}
}
}
},
"highlight": {
"fields": {
"TITLE.ngramanalyzed": {}
}
}
}
I have document indexed with TITLE have some happy meal. When I search have some, I am able to get proper highlights.
<em>have</em> <em>some</em> happy meal
As i type more have some ha, the highlight results are not as expected.
<em>ha</em>ve <em>some</em> <em>ha</em>ppy meal
The have word gets partially highlighted as ha.
I would expect it to highlight the longest matching token, because with an ngrams with min size = 1, this gives me a highlight of 1 or more char while there should be another matching token of 4 or 5 chars (for example: have should also be highlighted along with ha being highlighted.
I am not able to find any solution for the same. Please suggest.

Boost if result begin with the word

I use Elasticsearch to search with autocompletion with an ngram filter. I need to boost a result if it starts with the search keyword.
My query is simple :
"query": {
"match": {
"query": "re",
"operator": "and"
}
}
And this is my results :
Restaurants
Couture et retouches
Restauration rapide
But I want them like this :
Restaurants
Restauration rapide
Couture et retouches
How can I boost a result starting with the keyword?
In case it can helps, here is my mapping :
{
"settings": {
"analysis": {
"analyzer": {
"partialAnalyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": ["asciifolding", "lowercase"]
},
"searchAnalyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["asciifolding", "lowercase"]
}
},
"tokenizer": {
"ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [ "letter", "digit" ]
}
}
}
},
"mappings": {
"place": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "partialAnalyzer",
"search_analyzer": "searchAnalyzer",
"term_vector": "with_positions_offsets"
}
}
}
}
}
Regards,

How about this idea, not 100% sure of it as it depends on the data I think:
create a sub-field in your name field that should be analyzed with keyword analyzer (pretty much staying as is)
change the query to be a bool with shoulds
one should is the query you have now
the other should is a match with phrase_prefix on the sub-field.
The mapping:
{
"settings": {
"analysis": {
"analyzer": {
"partialAnalyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"asciifolding",
"lowercase"
]
},
"searchAnalyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"asciifolding",
"lowercase"
]
},
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"asciifolding",
"lowercase"
]
}
},
"tokenizer": {
"ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"place": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "partialAnalyzer",
"search_analyzer": "searchAnalyzer",
"term_vector": "with_positions_offsets",
"fields": {
"as_is": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
The query:
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "re",
"operator": "and"
}
}
},
{
"match": {
"name.as_is": {
"query": "re",
"type": "phrase_prefix"
}
}
}
]
}
}
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Highlight with fuzziness and ngram - elasticsearch

You could do that with highlight_query I guess Try this in your highlighting query. "highlight": { "fields": { "name": { "term_vector": "with_positions_offsets", "highlight_query": { "match": { "name.raw": { "query": "spotr", "fuzziness": 2 } } } } } } I hope it helps.

Related

How to make edge_ngram token match with certaint quantity ofwords between them?

Elasticsearch term query to number token

elasticsearch match query is not working for numbers

Elastic Search FVH highlights the smallest matching token

Boost if result begin with the word

Categories

Resources