I need to explain some weird behavior of a term query against an Elasticsearch index, where the query string contains a number. The query is pretty simple:
{
"query": {
"bool": {
"should": [
{
"term": {
"address.street": "8 kvetna"
}
}
]
}
}
}
The problem is that the term 8 kvetna returns an empty result. I ran it through _analyze and it produces the expected tokens like 8, k, kv, kve .... Also I am pretty sure there is a value 8 kvetna in the database.
Here is the mapping for the field:
{
"settings": {
"index": {
"refresh_interval": "1m",
"number_of_shards": "1",
"number_of_replicas": "1",
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "20"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"asciifolding",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "standard"
},
"default": {
"filter": [
"lowercase",
"asciifolding"
],
"type": "custom",
"tokenizer": "standard"
}
}
}
}
},
"mappings": {
"doc": {
"dynamic": "strict",
"_all": {
"enabled": false
},
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"analyzer": "autocomplete"
},
"street": {
"type": "text",
"analyzer": "autocomplete"
}
}
}
}
}
}
}
What caused this weird result? I don't understand it. Thanks for any help.
Great start so far! Your only issue is that you're using a term query where you should use a match query. A term query is not analyzed, so it looks for the exact term 8 kvetna in the inverted index, and that term never exists there because your autocomplete analyzer splits the value into tokens at index time. The following query will work:
{
"query": {
"bool": {
"should": [
{
"match": { <--- change this
"address.street": "8 kvetna"
}
}
]
}
}
}
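To see why, you can run the input through the _analyze API (a minimal sketch; my_index is a placeholder for your index name):
GET /my_index/_analyze
{
"analyzer": "autocomplete",
"text": "8 kvetna"
}
This returns the edge-ngram tokens (8, k, kv, kve, ...). The inverted index only ever contains those tokens, so an un-analyzed term lookup for the whole string 8 kvetna can never match.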
Related
I'm trying to make a search request that retrieves results only when fewer than 5 words appear between the requested tokens.
The settings:
{
"settings": {
"index": {
"analysis": {
"filter": {
"stopWords": {
"type": "stop",
"stopwords": [
"_english_"
]
}
},
"normalizer": {
"lowercaseNormalizer": {
"filter": [
"lowercase",
"asciifolding"
],
"type": "custom",
"char_filter": []
}
},
"analyzer": {
"autoCompleteAnalyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "autoCompleteTokenizer"
},
"autoCompleteSearchAnalyzer": {
"type": "custom",
"tokenizer": "lowercase"
},
"charGroupAnalyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "charGroupTokenizer"
}
},
"tokenizer": {
"charGroupTokenizer": {
"type": "char_group",
"max_token_length": "20",
"tokenize_on_chars": [
"whitespace",
"-",
"\n"
]
},
"autoCompleteTokenizer": {
"token_chars": [
"letter"
],
"min_gram": "3",
"type": "edge_ngram",
"max_gram": "20"
}
}
}
}
}
}
The mappings:
{
"mappings": {
"_doc": {
"properties": {
"description": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 64
}
},
"analyzer": "autoCompleteAnalyzer",
"search_analyzer": "autoCompleteSearchAnalyzer"
},
"text": {
"type": "text",
"analyzer": "charGroupAnalyzer"
}
}
}
}
}
}
}
And I run a bool query like this:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": [
"description.name"
],
"operator": "and",
"query": "rounded elephant",
"fuzziness": 1
}
},
{
"match_phrase": {
"description.text": {
"analyzer": "charGroupAnalyzer",
"query": "rounded elephant",
"slop": 5,
"boost": 20
}
}
}
]
}
}
}
I expect the request to retrieve documents where the description contains:
... rounded very interesting elephant ...
This works well when I use complete words, like rounded elephant.
But when I enter prefixed words, like round eleph, it fails.
It's obvious why: description.name and description.text have different tokenizers (name contains ngram tokens, but text contains word tokens), so I get completely wrong results.
How can I configure mappings and search, to be able to use ngrams with distance between tokens?
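One possible direction, sketched under the assumption that you can reindex: keep the standard tokenizer and apply edge_ngram as a token filter instead of a tokenizer. A token filter emits each word's grams at that word's position, so a match_phrase with slop and a plain (non-ngram) search analyzer can still line up positions. The names autoCompletePhraseAnalyzer and edge_ngram_filter below are hypothetical:
"analysis": {
"filter": {
"edge_ngram_filter": {
"type": "edge_ngram",
"min_gram": "3",
"max_gram": "20"
}
},
"analyzer": {
"autoCompletePhraseAnalyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"edge_ngram_filter"
]
}
}
}
With "analyzer": "autoCompletePhraseAnalyzer" and a simple standard + lowercase "search_analyzer" on description.text, the query round eleph becomes two tokens at positions 0 and 1 and can phrase-match the grams of rounded and elephant with slop.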
Hi, can somebody help me understand how Elasticsearch evaluates the relevance of tokens? I have a field nn whose mapping looks like this:
{
"settings": {
"index": {
"refresh_interval": "-1",
"number_of_shards": "4",
"analysis": {
"filter": {
"stopwords_SK": {
"ignore_case": "true",
"type": "stop",
"stopwords_path": "stopwords/slovak.txt"
},
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "2",
"max_gram": "20"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"stopwords_SK",
"lowercase",
"stopwords_SK",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1"
}
},
"mappings": {
"doc": {
"dynamic": "strict",
"properties": {
"nn": {
"type": "text",
"fielddata": true,
"fields": {
"raw": {
"type": "keyword"
}
},
"boost": 10,
"analyzer": "autocomplete"
}
}
}
}
}
The nn field is tokenized with the standard tokenizer. The following simple query works well and returns relevant results like "softone sro", "softec sro"...
{
"_source": [
"nn",
"nazov"
],
"size": 10,
"query": {
"bool": {
"must": [
{
"match": {
"nn": "softo"
}
}
]
}
}
}
But if I add a should condition to the query, it returns absolutely no relevant results, and the previously most relevant hits like "softone" or "softec" are missing. It returns e.g. "zo soz kovo zts nova as zts elektronika as" or "agentura socialnych sluzieb ass no"...
Here is the should query
{
"_source": [
"nn",
"nazov"
],
"size": 10,
"query": {
"bool": {
"must": [
{
"match": {
"nn": "softo"
}
}
],
"should": [
{
"match": {
"nn": "as"
}
},
{
"match": {
"nn": "sro"
}
}
]
}
}
}
Why is the should query missing the "softone" and "softec" items, which are the most relevant in the first query? I thought relevance was based on token length, which would mean the "soft" token is more relevant than the "so" token.
Thanks.
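If you want to see exactly where the score of each hit comes from, the _explain API might help (a minimal sketch; my_index and document id 1 are placeholders, and the URL shape differs slightly between Elasticsearch versions):
GET /my_index/doc/1/_explain
{
"query": {
"bool": {
"must": [
{ "match": { "nn": "softo" } }
],
"should": [
{ "match": { "nn": "as" } },
{ "match": { "nn": "sro" } }
]
}
}
}
The response breaks the score down clause by clause and term by term, which should show how much the short, very common grams as and sro contribute compared to softo.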
I am trying to implement an autocomplete suggester for movie titles, somewhat similar to IMDB. Below is the mapping I used. It gives decent results. I am using edge ngrams; are there any better alternatives?
But it has some flaws:
"war civil" and "civil war" give the same results, i.e. it doesn't give priority to movies with the words in the same order as the query.
It doesn't give any results when the space between words is omitted, e.g. "smoking barrels" gives good results, but "smokingbarrels" gives zero results.
What is wrong with the query and mapping below?
curl -XPUT "http://localhost:9200/movieindex" -H 'Content-Type: application/json' -d'
{
"settings": {
"index": {
"analysis": {
"filter": {},
"analyzer": {
"edge_ngram_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "edge_ngram_tokenizer"
},
"edge_ngram_search_analyzer": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter",
"digit",
"symbol"
]
}
}
}
}
},
"mappings": {
"movies": {
"properties": {
"title": {
"type": "text",
"fields": {
"edgengram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
}
},
"analyzer": "standard"
}
}
}
}
}'
GET /movieindex/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title.edgengram": {
"query": "smokingbarrels",
"fuzziness": 1
}
}
}
]
}
}
}
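To make the mismatch visible, it may help to compare what the two analyzers emit (a minimal sketch using the _analyze API against the mapping above):
GET /movieindex/_analyze
{
"analyzer": "edge_ngram_analyzer",
"text": "smoking barrels"
}
GET /movieindex/_analyze
{
"analyzer": "edge_ngram_search_analyzer",
"text": "smokingbarrels"
}
The index side stores per-word prefixes of at most max_gram = 10 characters, while the search side emits the single 14-character token smokingbarrels; no stored gram is within edit distance 1 of that token, which would explain the zero results.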
I have a search query which is used to search report names.
I have indexed the field with an autocomplete edge_ngram analyzer.
A normal report name search works properly, but when the report name contains a number / year, it doesn't work.
Query:
{
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"field_name": {
"query": "hybrid seeds india 2017",
"operator": "and"
}
}
}
]
}
}
}
},
"from": 0,
"size": 10
}
Settings and mappings:
{
"mappings": {
"pages": {
"properties": {
"report_name": {
"fields": {
"autocomplete": {
"search_analyzer": "report_name_search",
"analyzer": "report_name_index",
"type": "string"
},
"report_name": {
"index": "not_analyzed",
"type": "string"
}
},
"type": "multi_field"
}
}
}
},
"settings": {
"analysis": {
"filter": {
"report_name_ngram": {
"max_gram": 150,
"min_gram": 2,
"type": "edge_ngram"
}
},
"analyzer": {
"report_name_index": {
"filter": [
"lowercase",
"report_name_ngram"
],
"tokenizer": "keyword"
},
"report_name_search": {
"filter": [
"lowercase"
],
"tokenizer": "keyword"
}
}
}
}
}
Can you guys help me out with this?
Thanks in advance
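As a first debugging step, it might help to run the failing string through both analyzers (a minimal sketch; my_index is a placeholder, and the exact _analyze request format depends on your Elasticsearch version):
GET /my_index/_analyze
{
"analyzer": "report_name_index",
"text": "hybrid seeds india 2017"
}
GET /my_index/_analyze
{
"analyzer": "report_name_search",
"text": "hybrid seeds india 2017"
}
Since both analyzers use the keyword tokenizer, each side produces tokens from the whole string, not from individual words; comparing the index-side grams with the single search-side token should show exactly where the match involving the year breaks down.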
I guess the title of the topic already spoils it enough :D
I use edge_ngram and highlighting to build an autocomplete search. I have added fuzziness to the query to let users misspell their search, but it breaks the highlighting a bit.
When I write Sport, this is what I get:
<em>Spor</em>t
<em>Spor</em>t mécanique
<em>Spor</em>t nautique
I guess it's because it matches the token spor generated by the edge_ngram tokenizer.
The query:
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "sport",
"operator": "and",
"fuzziness": "AUTO"
}
}
},
{
"match_phrase_prefix": {
"name.raw": {
"query": "sport"
}
}
}
]
}
},
"highlight": {
"fields": {
"name": {
"term_vector": "with_positions_offsets"
}
}
}
}
And the mapping:
{
"settings": {
"analysis": {
"analyzer": {
"partialAnalyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": ["asciifolding", "lowercase"]
},
"keywordAnalyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["asciifolding", "lowercase"]
},
"searchAnalyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["asciifolding", "lowercase"]
}
},
"tokenizer": {
"ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [ "letter", "digit" ]
}
}
}
},
"mappings": {
"place": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "partialAnalyzer",
"search_analyzer": "searchAnalyzer",
"term_vector": "with_positions_offsets",
"fields": {
"raw": {
"type": "string",
"analyzer": "keywordAnalyzer"
}
}
}
}
}
}
}
I tried to add a new match clause without fuzziness to the query, hoping it would match the keyword before the fuzzy match does, but it changed nothing.
"match": {
"name": {
"query": "sport",
"operator": "and"
}
}
Any idea how I can handle this?
Regards, Raphaël
You could do that with highlight_query I guess
Try this in your highlighting query.
"highlight": {
"fields": {
"name": {
"term_vector": "with_positions_offsets",
"highlight_query": {
"match": {
"name.raw": {
"query": "spotr",
"fuzziness": 2
}
}
}
}
}
}
I hope it helps.