I want to use English and German custom analyzers together with other analyzers, for example ngram. Is the following mapping correct? I am getting an error for the German analyzer: [unknown setting [index.filter.german_stop.type]]. I searched but did not find any information about using multiple language analyzers of the custom type. Is it possible to use a language-specific ngram filter?
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"english_analyzer": {
"type": "custom",
"filter": [
"lowercase",
"english_stop",
"ngram_filter_en"
],
"tokenizer": "whitespace"
}
},
"filter": {
"english_stop": {
"type": "stop"
},
"ngram_filter_en": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 25
}
},
"german_analyzer" : {
"type" : "custom",
"filter" : [
"lowercase",
"german_stop",
"ngram_filter_de"
],
"tokenizer" : "whitespace"
}
},
"filter" : {
"german_stop" : {
"type" : "stop"
},
"ngram_filter_de" : {
"type" : "edge_ngram",
"min_ngram" : "1",
"max_gram" : 25
}
}
},
"mappings" : {
"dynamic" : true,
"properties": {
"content" : {
"tye" : "text",
"properties" : {
"en" : {
"type" : "text",
"analyzer" : "english_analyzer"
},
"de" : {
"type" : "text",
"analyzer" : "german_analyzer"
}
}
}
}
There are small syntax errors.
You have your last filter object outside the analysis context, and german_analyzer needs to sit inside the analyzer section next to english_analyzer.
You cannot have the same key multiple times in a JSON object, so the two filter blocks have to be merged into one.
The settings below should work (min_ngram is also a typo for min_gram):
{
  "analysis": {
    "analyzer": {
      "english_analyzer": {
        "type": "custom",
        "filter": [
          "lowercase",
          "english_stop",
          "ngram_filter_en"
        ],
        "tokenizer": "whitespace"
      },
      "german_analyzer": {
        "type": "custom",
        "filter": [
          "lowercase",
          "german_stop",
          "ngram_filter_de"
        ],
        "tokenizer": "whitespace"
      }
    },
    "filter": {
      "english_stop": {
        "type": "stop"
      },
      "ngram_filter_en": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 25
      },
      "german_stop": {
        "type": "stop"
      },
      "ngram_filter_de": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 25
      }
    }
  }
}
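One more detail worth checking: a stop filter with no stopwords setting defaults to the English list, so german_stop as defined above will not remove German stop words. A minimal sketch of making both filters language-specific:

"filter": {
  "english_stop": {
    "type": "stop",
    "stopwords": "_english_"
  },
  "german_stop": {
    "type": "stop",
    "stopwords": "_german_"
  }
}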
To understand the error in your mapping:
{
"analysis": {
"analyzer": {
"filter": {
"english_stop": {
"type": "stop"
},
"ngram_filter_en": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 25
}
},
"german_analyzer" : {
"type" : "custom",
"filter" : [
"lowercase",
"german_stop",
"ngram_filter_de"
],
"tokenizer" : "whitespace"
}
},
"filter" : {//**This is outside analysis, you cannot simply add another filter key inside analysis, so you can merge both as above**
"german_stop" : {
"type" : "stop"
},
"ngram_filter_de" : {
"type" : "edge_ngram",
"min_ngram" : "1",
"max_gram" : 25
}
}
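Once the index creation succeeds, you can sanity-check each analyzer with the _analyze API. A quick sketch (the sample text is my own):

GET test/_analyze
{
  "analyzer": "german_analyzer",
  "text": "ein schönes Buch"
}

The response lists the exact tokens produced, so you can see whether the edge ngrams and stop word removal behave the way you expect for each language.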
Related
I'm trying to get this code to work but I'm getting the error below.
Reference: https://www.youtube.com/watch?v=PQGlhbf7o7c
Please let me know how this can be fixed. Thank you.
Code:
PUT test
{
"settings": {
"index": {
"analysis": {
"filter": {},
"analyzer": {
"keyword_analyzer": {
"filter": [
"lowercase",
"asciifolding",
"trim"
],
"char_filter": [],
"type": "custom",
"tokenizer": "keywords"
},
"edge_ngram_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "edge_ngram_tokenizer"
},
"edge_ngram_search_analyzer": {
"tokenizer": "lowercase"
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 5,
"token_chars": [
"letter"
]
}
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keywordstring": {
"type": "text",
"analyzer": "keyword_analyzer"
},
"edgengram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
},
"completion": {
"type": "completion"
}
},
"analyzer": "standard"
}
}
}
}
}
Error:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "unknown setting [index.mappings.properties.name.analyzer] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
}
],
"type" : "illegal_argument_exception",
"reason" : "unknown setting [index.mappings.properties.name.analyzer] please check that any required plugins are installed, or check the breaking changes documentation for removed settings",
"suppressed" : [
{
"type" : "illegal_argument_exception",
"reason" : "unknown setting [index.mappings.properties.name.fields.completion.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
},
{
"type" : "illegal_argument_exception",
"reason" : "unknown setting [index.mappings.properties.name.fields.edgengram.analyzer] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
},
{
"type" : "illegal_argument_exception",
"reason" : "unknown setting [index.mappings.properties.name.fields.edgengram.search_analyzer] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
},
{
"type" : "illegal_argument_exception",
"reason" : "unknown setting [index.mappings.properties.name.fields.edgengram.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
},
{
"type" : "illegal_argument_exception",
"reason" : "unknown setting [index.mappings.properties.name.fields.keywordstring.analyzer] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
},
{
"type" : "illegal_argument_exception",
"reason" : "unknown setting [index.mappings.properties.name.fields.keywordstring.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
},
{
"type" : "illegal_argument_exception",
"reason" : "unknown setting [index.mappings.properties.name.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
}
]
},
"status" : 400
}
You were almost there, but the payload structure when creating an index should look like this:
PUT test
{
"settings": {
"analysis": {
...
}
},
"mappings": {
"properties": {
...
}
}
}
In your case this would mean the following (note that the tokenizer section also has to move out of analyzer so the two are siblings under analysis, and keyword_analyzer should reference the built-in keyword tokenizer, since no tokenizer named keywords is defined):
PUT test
{
  "settings": {
    "analysis": {
      "filter": {},
      "analyzer": {
        "keyword_analyzer": {
          "filter": [
            "lowercase",
            "asciifolding",
            "trim"
          ],
          "char_filter": [],
          "type": "custom",
          "tokenizer": "keyword"
        },
        "edge_ngram_analyzer": {
          "filter": [
            "lowercase"
          ],
          "tokenizer": "edge_ngram_tokenizer"
        },
        "edge_ngram_search_analyzer": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 5,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keywordstring": {
            "type": "text",
            "analyzer": "keyword_analyzer"
          },
          "edgengram": {
            "type": "text",
            "analyzer": "edge_ngram_analyzer",
            "search_analyzer": "edge_ngram_search_analyzer"
          },
          "completion": {
            "type": "completion"
          }
        },
        "analyzer": "standard"
      }
    }
  }
}
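To check that the edge-ngram search behaves as intended, you can index a document and query the edgengram subfield. A quick sketch with a made-up value:

POST test/_doc
{
  "name": "Elasticsearch"
}

GET test/_search
{
  "query": {
    "match": {
      "name.edgengram": "elas"
    }
  }
}

The edge_ngram_tokenizer indexes the 2-5 character prefixes of each word (el, ela, elas, elast), and the lowercase search analyzer keeps the query as a single token, so this match should return the document.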
The Elasticsearch query below is not returning any results for my application:
"query" : {
"bool" : {
"must" : [
{
"simple_query_string" : {
"query" : "IN-123456",
"fields" : [
"field1.auto^1.0",
"field2.auto^1.0"
],
"flags" : -1,
"default_operator" : "AND",
"analyze_wildcard" : false,
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
}],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
}
Note that I have a document in the Elasticsearch data source with the matching text "IN-123456" in field2.
I am able to find the same document with "123456" as the query text.
Below is the index used:
{
"document_****": {
"aliases": {
"document": {}
},
"mappings": {
"_doc": {
"dynamic": "strict",
"date_detection": false,
"properties": {
"#timestamp": {
"type": "date"
},
"field2": {
"type": "keyword",
"fields": {
"auto": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
}
},
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "document_***",
"creation_date": "1****",
"analysis": {
"filter": {
"autocomplete_filter_30": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "30"
},
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "20"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"stop",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "standard"
},
"autocomplete_30": {
"filter": [
"lowercase",
"stop",
"autocomplete_filter_30"
],
"type": "custom",
"tokenizer": "standard"
},
"autocomplete_nonstop": {
"filter": [
"lowercase",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"uuid": "***",
"version": {
"created": "6020499"
}
}
}
}
}
Note: a few values are replaced with * for confidentiality reasons.
Check your mapping. The query below works fine:
POST v_upload_branch/_doc
{
"branch_name":"IN-123456",
"branch_head":"Chennai",
}
GET v_upload_branch/_search
{
"query" : {
"bool" : {
"must" : [
{
"simple_query_string" : {
"query" : "IN-123456",
"fields" : [
"branch_head^1.0",
"branch_name^1.0"
],
"flags" : -1,
"default_operator" : "AND",
"analyze_wildcard" : false,
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
}],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
}
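A likely reason this works is that branch_name is mapped dynamically as text with the standard analyzer, which keeps "in" as a token. You can check what an analyzer emits for the search text with _analyze; a quick sketch:

GET v_upload_branch/_analyze
{
  "analyzer": "standard",
  "text": "IN-123456"
}

This returns the tokens in and 123456, so a simple_query_string with AND can still match both.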
After analyzing my index mapping, I found that the stop token filter is removing the prefix "IN" from the token stream, since "in" is part of the default English stop word list:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html
Because of this, Elasticsearch ignores the prefix "IN" while searching and returns no results.
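This is easy to confirm against the index's own analyzer; a quick sketch using the document alias from the mapping above:

GET document/_analyze
{
  "analyzer": "autocomplete",
  "text": "IN-123456"
}

The token stream contains only the edge ngrams of "123456"; the "in" token is dropped by the stop filter. The autocomplete_nonstop analyzer already defined in the index avoids this, since it omits the stop filter.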
I can't figure out why highlighting is not working. The query works, but the highlight just shows the field content without em tags. Here are my settings and mappings:
PUT wmsearch
{
"settings": {
"index.mapping.total_fields.limit": 2000,
"analysis": {
"analyzer": {
"custom": {
"type": "custom",
"tokenizer": "custom_token",
"filter": [
"lowercase"
]
},
"custom2": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"custom_token": {
"type": "ngram",
"min_gram": 3,
"max_gram": 10
}
}
}
},
"mappings": {
"doc": {
"properties": {
"document": {
"properties": {
"reference": {
"type": "text",
"analyzer": "custom"
}
}
},
"scope" : {
"type" : "nested",
"properties" : {
"level" : {
"type" : "integer"
},
"ancestors" : {
"type" : "keyword",
"index" : "true"
},
"value" : {
"type" : "keyword",
"index" : "true"
},
"order" : {
"type" : "integer"
}
}
}
}
}
}
}
Here is my query:
GET wmsearch/_search
{
"query": {
"simple_query_string" : {
"fields": ["document.reference"],
"analyzer": "custom2",
"query" : "bloom"
}
},
"highlight" : {
"fields" : {
"document.reference" : {}
}
}
}
The query does return the correct results, and the highlight field exists within the results. However, there are no em tags around "bloom"; it just shows the entire string with no tags at all.
Does anyone see any issues here, or can anyone help?
Thanks
I got it to work by adding "index_options": "offsets" to my mappings for document.reference.
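For reference, a minimal sketch of that change (the rest of the mapping stays as above); storing offsets in the postings gives the highlighter the term positions it needs on a field with large ngram token streams:

"reference": {
  "type": "text",
  "analyzer": "custom",
  "index_options": "offsets"
}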
I use Elasticsearch 2.3.5. I want to add my custom analyzer to the mapping while creating the index.
PUT /library
{
"settings": {
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [
"letter",
"digit"
]
}
},
"analyzer": {
"index_ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
}
},
"search_term_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
},
"mappings": {
"book": {
"properties": {
"Id": {
"type": "long",
"search_analyzer": "search_term_analyzer",
"index_analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
},
"Title": {
"type": "string",
"search_analyzer": "search_term_analyzer",
"index_analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
}
}
}
}
}
I took this template example from the official guide:
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"type1" : {
"properties" : {
"field1" : { "type" : "string", "index" : "not_analyzed" }
}
}
}
}
But I get an error when trying to execute the first part of the code. Here is my error:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "analyzer [search_term_analyzer] not found for field [Title]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [book]: analyzer [search_term_analyzer] not found for field [Title]",
"caused_by": {
"type": "mapper_parsing_exception",
"reason": "analyzer [search_term_analyzer] not found for field [Title]"
}
},
"status": 400
}
I can make it work if I put my mappings inside of settings, but I think that is the wrong way. What I ultimately want is to find a book using part of its title. I have the "King Arthur" book, for example. My query looks like this:
POST /library/book/_search
{
"query": {
"match": {
"Title": "kin"
}
}
}
Nothing is found. What am I doing wrong? Could you help me? It seems my analyzer and tokenizer don't work. How can I get the terms "k", "i", "ki", "king", etc.? Right now I think I have only two terms: 'king' and 'arthur'.
You have misplaced the search_term_analyzer analyzer; it should be inside the analyzer section:
PUT /library
{
"settings": {
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [
"letter",
"digit"
]
}
},
"analyzer": {
"index_ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
},
"search_term_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
},
"mappings": {
"book": {
"properties": {
"Id": {
"type": "long", <---- you probably need to make this a string or remove the analyzers
"search_analyzer": "search_term_analyzer",
"analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
},
"Title": {
"type": "string",
"search_analyzer": "search_term_analyzer",
"analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
}
}
}
}
}
Also make sure to use analyzer instead of index_analyzer; the latter has been deprecated in ES 2.x.
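To see which terms the index analyzer actually produces (and get the "k", "ki", "kin", "king" terms you asked about), the _analyze API helps; a quick sketch in the 2.x query-parameter form:

GET /library/_analyze?analyzer=index_ngram_analyzer&text=King

This should return every 1-15 character gram of king (k, i, n, g, ki, in, ng, kin, ing, king), which is exactly what a match query for "kin" needs to find in the index.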
I'm trying a simple test of Elasticsearch synonyms without success. This is what I have so far:
POST /mysearch
{
"settings" : {
"number_of_shards" : 5,
"number_of_replicas" : 0,
"analysis": {
"filter" : {
"my_ascii_folding" : {
"type" : "asciifolding",
"preserve_original" : true
},
"my_stopwords": {
"type": "stop",
"stopwords": [ ]
},
"mysynonym" : {
"type" : "synonym",
"synonyms" : [
"foo => bar"
]
}
},
"char_filter": {
"my_htmlstrip": {
"type": "html_strip"
}
},
"analyzer": {
"index_text_analyzer":{
"type": "custom",
"tokenizer": "standard",
"filter": [ "lowercase", "my_stopwords", "my_ascii_folding" ]
},
"index_html_analyzer":{
"type": "custom",
"tokenizer": "standard",
"char_filter": "my_htmlstrip",
"filter": [ "lowercase", "my_stopwords", "my_ascii_folding" ]
},
"search_text_analyzer":{
"type": "custom",
"tokenizer": "standard",
"filter": [ "mysynonym", "lowercase", "my_stopwords" ]
}
}
}
},
"mappings" : {
"news" : {
"_source" : { "enabled" : true },
"_all" : {"enabled" : false},
"properties" : {
"name" : { "type" : "string", "index" : "analyzed", "store": "yes" , "analyzer": "index_text_analyzer" , "search_analyzer": "search_text_analyzer" }
}
}
}
}
Add some documents:
POST /mysearch/news
{
"name":"foo kar"
}
POST /mysearch/news
{
"name":"bar kar"
}
Do a search:
POST /mysearch/_search?q=name:foo
{
}
This gives me the result that matches foo, but not the one containing bar. Why?
I think you are doing it wrong, for the following reasons:
Why do you use foo => bar? This means that you replace foo with bar, whereas if they are synonyms, they should both be indexed. So I would use foo,bar instead.
Why, at indexing time, are you using a different analyzer than at search time? At indexing time you will want your text to be indexed with its synonyms.
Let me give you an example: assume you index foo kar. Since bar is a synonym of foo, you'd want to index the synonym as well, so that the index will contain foo, bar, kar. This way, if you search for foo or bar, that document WILL be found in the index even if the original text didn't contain bar.
That being said, I would suggest the following:
POST /mysearch
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 0,
"analysis": {
"filter": {
"my_ascii_folding": {
"type": "asciifolding",
"preserve_original": true
},
"my_stopwords": {
"type": "stop",
"stopwords": []
},
"mysynonym": {
"type": "synonym",
"synonyms": [
"foo,bar"
]
}
},
"char_filter": {
"my_htmlstrip": {
"type": "html_strip"
}
},
"analyzer": {
"index_text_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"my_stopwords",
"my_ascii_folding"
]
},
"index_html_analyzer": {
"type": "custom",
"tokenizer": "standard",
"char_filter": "my_htmlstrip",
"filter": [
"lowercase",
"my_stopwords",
"my_ascii_folding"
]
},
"search_text_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"mysynonym",
"lowercase",
"my_stopwords"
]
}
}
}
},
"mappings": {
"news": {
"_source": {
"enabled": true
},
"_all": {
"enabled": false
},
"properties": {
"name": {
"type": "string",
"index": "analyzed",
"store": "yes",
"analyzer": "search_text_analyzer"
}
}
}
}
}
Or, if you don't want to index the synonyms, and instead index just the original text and expand synonyms only at search time, make the following changes:
keep "synonyms": ["foo,bar"], because, as mentioned above, foo would otherwise be replaced by bar
explicitly specify the two analyzers:
"index_analyzer": "index_text_analyzer",
"search_analyzer": "search_text_analyzer"
The two changes above will result in your text being indexed as is (with no synonyms), but at search time, when you search for foo, Elasticsearch will also search for its synonym: foo or bar.
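A quick way to watch the synonym expansion is to run the search analyzer through _analyze (a sketch in the old query-parameter form matching the ES version this question targets):

GET /mysearch/_analyze?analyzer=search_text_analyzer&text=foo

With "foo,bar" configured, the resulting token stream contains both foo and bar at the same position, which is why a search for either term matches the document.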