I'm using Elasticsearch 5.3 and have this super simple multi_match query. None of the documents contains the nonsense query string, neither in title nor in content. Still, ES gives me plenty of matches.
{
"query" : {
"multi_match" : {
"query" : "42a6o8435a6o4385q023bf50",
"fields" : [
"title","content"
],
"type" : "best_fields",
"operator" : "AND",
"minimum_should_match" : "100%"
}
}
}
The analyzer for content is "english" and for title it is this one:
"analyzer": {
"english_autocomplete": {
"tokenizer": "standard",
"filter": [
"lowercase",
"english_stop",
"english_stemmer",
"autocomplete_filter"
]
}
}
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
Am I missing something or how can I tell ES not to do that?
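A likely culprit, for what it's worth: the edge_ngram filter with min_gram 1 runs at search time as well as at index time, since title has no separate search analyzer. The query is expanded into single-character grams ("4", "2", "a", ...), and because those grams share a position they behave like synonyms, so even operator AND is satisfied by a one-character match. A hedged sketch of the usual fix, keeping the index-time analyzer as-is and adding a search-time analyzer without the ngram filter (the name english_autocomplete_search is my own):

"english_autocomplete_search": {
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "english_stop",
    "english_stemmer"
  ]
}

and in the mapping for title:

"title": {
  "type": "text",
  "analyzer": "english_autocomplete",
  "search_analyzer": "english_autocomplete_search"
}

With that, only whole query terms have to match the indexed edge n-grams, and the nonsense string no longer matches anything.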
I'm having trouble searching for special characters with query_string. I need to search for an email address in the format "xxx#xxx.xxx". At index time I use a custom normalizer that applies lowercase and ASCII folding. At search time I use a custom analyzer with a whitespace tokenizer and filters that apply lowercase and ASCII folding. However, I am not able to search for a simple email address.
This is my mapping
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 1,
"analysis": {
"analyzer": {
"folding": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
},
"normalizer": {
"lowerasciinormalizer": {
"type": "custom",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"email": {
"type": "keyword",
"normalizer": "lowerasciinormalizer"
}
}
}
}
And this is my search query
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "pippo#pluto.it",
"fields": [
"email"
],
"analyzer": "folding"
}
}
]
}
}
}
Searching without special characters works fine. In fact, if I do "query": "pippo*" I get the correct results.
I also tested the tokenizer with
GET /_analyze
{
"analyzer": "whitespace",
"text": "pippo#pluto.com"
}
and I get what I expect:
{
"tokens" : [
{
"token" : "pippo#pluto.com",
"start_offset" : 0,
"end_offset" : 15,
"type" : "word",
"position" : 0
}
]
}
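The normalizer can be checked the same way, since _analyze also accepts a normalizer parameter when run against the index that defines it (my_index below is a placeholder for the real index name):

GET /my_index/_analyze
{
  "normalizer": "lowerasciinormalizer",
  "text": "Pippo#Pluto.IT"
}

which should return the single token "pippo#pluto.it".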
Any suggestions?
Thanks.
Edit:
I'm using Elasticsearch 7.5.1.
The query above works correctly; my problem was somewhere else.
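For anyone landing here with a similar setup: since email is a keyword field with a normalizer, a term query is a simpler alternative; the normalizer is applied to the query input, and query_string's special-character handling never comes into play. A minimal sketch, again assuming the index is called my_index:

GET /my_index/_search
{
  "query": {
    "term": {
      "email": "pippo#pluto.it"
    }
  }
}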
I was wondering whether there is any way for the phrase suggester to correct spelling mistakes at the start of a word that stem from phonetic differences.
Elasticsearch 5.1.2
Testing in Kibana 5.1.2
For example:
Instead of "circus" someone wrote "sircus", or instead of "coding" someone wrote "koding".
The funny thing is that instead of "phrase" you can write "frase" and get a suggestion.
Here is my setup.
Settings:
PUT text_index
{
"settings": {
"analysis": {
"analyzer": {
"suggests_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding",
"shingle_filter"
],
"type": "custom"
},
"reverse": {
"type": "custom",
"tokenizer": "standard",
"filter": ["standard", "reverse"]
}
},
"filter": {
"shingle_filter": {
"min_shingle_size": 2,
"max_shingle_size": 5,
"type": "shingle"
}
}
}
},
"mappings": {
"testtype": {
"properties": {
"suggest_field": {
"type": "text",
"analyzer": "suggests_analyzer",
"fields": {
"reverse": {
"type": "text",
"analyzer": "reverse"
}
}
}
}
}
}
}
Some documents:
POST text_index/testtype/_bulk
{"index":{}}
{ "suggest_field": "phrase"}
{"index":{}}
{ "suggest_field": "Circus"}
{"index":{}}
{ "suggest_field": "Coding"}
Querying:
POST text_index/_search
{
"suggest" : {
"text" : "sircus",
"simple_phrase" : {
"phrase" : {
"field" : "suggest_field",
"max_errors": 0.9,
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
},
"direct_generator" : [ {
"field" : "suggest_field",
"suggest_mode" : "always"
}, {
"field" : "suggest_field.reverse",
"suggest_mode" : "always",
"pre_filter" : "reverse",
"post_filter" : "reverse"
}]
}
}
}
}
Also, I repeated the following steps a few times (between 5 and 10) without changing anything:
delete index
put index, settings & mappings
add documents
query (for "codign")
Sometimes I get suggestions and sometimes I don't. Is there any explanation for it?
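One likely explanation for the flakiness: suggesters work with shard-local term statistics, and with the default of five shards and only three documents, the candidates depend on which shard each document happened to land on. Forcing a single shard for such a small test index makes the results deterministic; a minimal sketch, to be merged with the analysis and mappings shown above:

PUT text_index
{
  "settings": {
    "number_of_shards": 1
  }
}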
Try setting "prefix_length": 0 in the direct_generator.
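For context: the direct generator assumes the first prefix_length characters of the input (default 1) are correct, which is exactly why "sircus" never suggests "circus": the first letter is the one that is wrong. It may also be why "frase" → "phrase" can still surface via the reverse-field generator, which catches leading errors by reversing the tokens. A sketch of the generators from the query above with the prefix requirement relaxed:

"direct_generator" : [ {
  "field" : "suggest_field",
  "suggest_mode" : "always",
  "prefix_length" : 0
}, {
  "field" : "suggest_field.reverse",
  "suggest_mode" : "always",
  "pre_filter" : "reverse",
  "post_filter" : "reverse"
} ]

Note that prefix_length 0 makes candidate generation more expensive, since far more terms qualify.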
I have been reading this documentation from Elasticsearch:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/analysis-word-delimiter-tokenfilter.html
At the same time, I was searching the internet for examples.
Unfortunately, there were only a few, and most of them don't work.
I would really appreciate it if someone could post an example of how to use the word_delimiter token filter in Elasticsearch.
Thanks.
Elasticsearch version: 5.2
Try the following mappings
PUT demo
{
"settings": {
"analysis": {
"analyzer": {
"index_analyzer_v1" : {
"tokenizer" : "whitespace",
"filter" : [ "word_delimeter"]
}
},
"filter": {
"ngram_filter" : {
"type" : "nGram",
"min_gram": 1,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
},
"word_delimeter" : {
"type" : "word_delimiter",
"generate_number_parts" : true,
"catenate_words" : true,
"catenate_numbers": true,
"preserve_original" : true,
"stem_english_possessive": true
},
"stop_words" : {
"type": "stop",
"stopwords": ["and", "is", "the", "we", "in", "are", "was", "were", "of"]
}
}
}
},
"mappings": {
"product": {
"dynamic": "strict",
"properties": {
"name": {
"type": "text",
"analyzer": "index_analyzer_v1"
}
}
}
}
}
Index the following document
POST demo/product
{
"name":"SH-09"
}
Run the following query
POST demo/_search
{
"query": {"bool": {"must": [
{"term": {
"name": {
"value": "09"
}
}}
]}}
}
Furthermore, if you want to see the values stored in the inverted index, run the following query:
GET demo/_analyze
{
  "analyzer": "index_analyzer_v1",
  "text": "SH-09"
}
Hope this helps
I know this question is old, but anyway: I can't comment because I don't have reputation points. Note that "everybody" contains no "-" as in the above answer, so if you want to find "body" inside "everybody", a simple wildcard query might help:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-wildcard-query.html
I hope it was useful.
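A hedged sketch of such a wildcard query against the demo index from the answer above (keep in mind that leading wildcards are expensive, since every term in the field has to be scanned):

POST demo/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*body*"
      }
    }
  }
}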
I am searching for the following key phrase: "xiaomi redmi note 3" in my Elasticsearch database. I am making the following bool query:
"filtered" : {
"query" : {
"match" : {
"name" : {
"query" : "xiaomi redmi note 3",
"type" : "boolean",
"operator" : "AND"
}
}
}
}
However, no matches are found, even though Elasticsearch contains the following document:
xiaomi redmi note 3 16GB 4G Phablet
Why doesn't ES match this document?
What I've noticed in general is that ES doesn't match any numbers in the key phrase. Does it have to do with the analyzer I'm using?
EDIT
"analyzer": {
"second": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trim",
"autocomplete_filter"
]
},
and the mapping for my field is:
"name" : {
"type" : "string",
"index_analyzer" : "second",
"search_analyzer" : "standard",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
},
"search_quote_analyzer" : "second"
},
Autocomplete_filter:
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10
}
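A plausible explanation, given that filter: with min_gram set to 2, the single-character token "3" yields no edge n-grams at all, so "3" is never put in the index; the standard search analyzer still emits "3" as a query term, and with operator AND one missing term sinks the whole match. This is easy to check with _analyze (index name my_index assumed, using the old-style query-string form that matches this mapping's ES version):

GET /my_index/_analyze?analyzer=second&text=note%203

If no token for "3" comes back, lowering min_gram to 1 and reindexing should make the document match.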
I want to use Elasticsearch for multi-word searches, where all the fields are checked in a document with the assigned analyzers.
So if I have a mapping:
{
"settings": {
"analysis": {
"analyzer": {
"folding": {
"tokenizer": "standard",
"filter": [ "lowercase", "asciifolding" ]
}
}
}
},
"mappings" : {
"typeName" :{
"date_detection": false,
"properties" : {
"stringfield" : {
"type" : "string",
"index" : "folding"
},
"numberfield" : {
"type" : "multi_field",
"fields" : {
"numberfield" : {"type" : "double"},
"untouched" : {"type" : "string", "index" : "not_analyzed"}
}
},
"datefield" : {
"type" : "multi_field",
"fields" : {
"datefield" : {"type" : "date", "format": "dd/MM/yyyy||yyyy-MM-dd"},
"untouched" : {"type" : "string", "index" : "not_analyzed"}
}
}
}
}
}
}
As you can see, I have different types of fields, but I do know the structure.
What I want to do is run a search with a single string that checks all fields, using their analyzers too.
For example if the query string is:
John Smith 2014-10-02 300.00
I want to search for "John", "Smith", "2014-10-02" and "300.00" in all the fields, calculating the relevance score as well. The best result is the one that has the most field matches in a single document.
So far I was able to search in all the fields by using multi_field, but in that case I was not able to parse 300.00, since 300 was stored in the string part of multi_field.
If I was searching in "_all" field, then no analyzer was used.
How should I modify my mapping or my queries to be able to do a multi-word search, where dates and numbers are recognized in the multi-word query string?
Right now, when I do a search, an error occurs, since the whole string cannot be parsed as a number or a date. And if I use the string representation of the multi_field, then 300.00 will not be a result, since the string representation stored is 300.
(what I would like is similar to google search, where dates, numbers and strings are recognized in a multi-word query)
Any ideas?
Thanks!
Using a whitespace tokenizer in the analyzer, and applying that analyzer as the search_analyzer of the fields in the mapping, will split the query into parts, each of which is then matched against the index to find the best matches. Using an ngram-based index_analyzer also improves the results considerably.
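If the parsing errors are the main blocker, the lenient flag is worth knowing about: it tells Elasticsearch to ignore query terms that cannot be converted to a field's type instead of failing the whole request. A hedged sketch against the mapping above:

{
  "query": {
    "multi_match": {
      "query": "John Smith 2014-10-02 300.00",
      "fields": ["stringfield", "numberfield", "datefield"],
      "lenient": true
    }
  }
}

Each whitespace-separated term is then tried against every field it can be parsed for, and documents matching more terms score higher.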
I am using the following setup for the query:
"query": {
"multi_match": {
"query": "sample query",
"fuzziness": "AUTO",
"fields": [
"title",
"subtitle",
]
}
}
And for mappings and settings:
{
"settings" : {
"analysis": {
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"standard",
"lowercase",
"ngram"
]
}
},
"filter": {
"ngram": {
"type": "ngram",
"min_gram": 2,
"max_gram": 15
}
}
}
},
"mappings": {
"title": {
"type": "string",
"search_analyzer": "whitespace",
"index_analyzer": "autocomplete"
},
"subtitle": {
"type": "string"
}
}
}
See the following answer and article for more details.
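To see what the autocomplete analyzer actually puts into the index (and therefore what the fuzzy multi_match has to hit), the _analyze API is handy; my_index is a placeholder, and the old query-string form is used here to match this mapping's index_analyzer-era version:

GET /my_index/_analyze?analyzer=autocomplete&text=sample%20query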