I'm trying to create a custom analyzer in Elasticsearch. Here is the analyzer:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer" : "standard",
"filter" : ["custom_stopper", "custom_stems", "custom_synonyms"]
},
"filter" : {
"custom_stopper" : {
"type" : "stop",
"stopwords_path" : "analyze/stopwords.txt"
},
"custom_stems" : {
"type" : "stemmer_override",
"rules_path" : "analyze/stem.txt"
},
"custom_synonyms" : {
"type" : "synonyms",
"synonyms_path" : "analyze/synonym.txt"
}
}
}
}
}
}
but it is throwing this error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "analyzer [filter] must specify either an analyzer type, or a tokenizer"
}
],
"type": "illegal_argument_exception",
"reason": "analyzer [filter] must specify either an analyzer type, or a tokenizer"
},
"status": 400
}
What am I doing wrong here?
The filter section must be at the same level as analyzer, directly under analysis, not nested inside it. Note also that stop, stemmer_override, and synonym are token filters, so the analyzer references them through its filter list (not char_filter), and the synonym filter type is synonym, not synonyms. The structure looks like this:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"custom_stopper",
"custom_stems",
"custom_synonyms"
]
}
},
"filter": {
"custom_stopper": {
"type": "stop",
"stopwords_path": "analyze/stopwords.txt"
},
"custom_stems": {
"type": "stemmer_override",
"rules_path": "analyze/stem.txt"
},
"custom_synonyms": {
"type": "synonyms",
"synonyms_path": "analyze/synonym.txt"
}
}
}
}
}
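Note that stopwords_path, rules_path, and synonyms_path are resolved relative to the Elasticsearch config directory, so the three files must exist there on every node. Once the index is created, you can sanity-check the analyzer with the _analyze API (my_index below is a hypothetical index name):
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "some sample text to analyze"
}
The response lists the tokens after stopword removal, stemmer overrides, and synonym expansion have been applied, in that filter order.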
I am trying to create an Elasticsearch index using the JSON below, which causes an exception. The version of Elasticsearch I'm using is 6.4.0.
The exception says Root mapping definition has unsupported parameters, and I'm not sure what the problem is.
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase"
],
"char_filter": [
"html_strip"
],
"type": "custom",
"tokenizer": "whitespace"
}
}
}
},
"mappings" :{
"properties" :{
"title" :{
"type" : "text",
"analyzer" : "my_analyzer"
}
}
}
}
This is causing the exception below:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Root mapping definition has unsupported parameters: [title : {analyzer=my_analyzer, type=text}]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [properties]: Root mapping definition has unsupported parameters: [title : {analyzer=my_analyzer, type=text}]",
"caused_by": {
"type": "mapper_parsing_exception",
"reason": "Root mapping definition has unsupported parameters: [title : {analyzer=my_analyzer, type=text}]"
}
},
"status": 400
}
This is due to the fact that you are not adding _doc in your mappings section. Mapping types are deprecated, but Elasticsearch 6.x still requires exactly one type name in the mapping; see schedule_for_removal_of_mapping_types for why this is mandatory in the 6.x versions.
Please refer to the Elasticsearch 6.4 official documentation on how to create mappings.
To make it work, you need to add _doc in the mappings section as below:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase"
],
"char_filter": [
"html_strip"
],
"type": "custom",
"tokenizer": "whitespace"
}
}
}
},
"mappings": {
"_doc": { // Note this, you need to add this
"properties": {
"title": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
}
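With the index created this way (assuming it was created as my_index; the original request does not show the index name), a quick _analyze call shows the custom analyzer at work:
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "<b>Quick FOXES</b>"
}
html_strip removes the tags, the whitespace tokenizer splits what is left, and lowercase should leave the tokens quick and foxes.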
I'm using Elasticsearch 6.8 with Python 3.7.
I'm trying to create my own synonyms that map emoticons to text.
For example, ":-)" should be treated as "happy-smiley".
I'm trying to build the synonyms and create the index with the following code:
def create_analyzer(es_api, index_name, doc_type):
    body = {
        "settings": {
            "index": {
                "analysis": {
                    "filter": {
                        "synonym_filter": {
                            "type": "synonym",
                            "synonyms": [
                                ":-), happy-smiley",
                                ":-(, sad-smiley"
                            ]
                        }
                    },
                    "analyzer": {
                        "synonym_analyzer": {
                            "tokenizer": "standard",
                            "filter": ["lowercase", "synonym_filter"]
                        }
                    }
                }
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    "tweet": {"type": "text", "fielddata": "true"},
                    "existence": {"type": "text"},
                    "confidence": {"type": "float"}
                }
            }
        }
    }
    res = es_api.indices.create(index=index_name, body=body)
But I'm getting this error:
elasticsearch.exceptions.RequestError: RequestError(400, 'illegal_argument_exception', 'failed to build synonyms')
What is wrong and how can I fix it?
I can tell you what's wrong and (updated) how to fix it.
If you run this query in Dev Tools or via cURL, you will see the reason for the error; the Python client seems to truncate the error details, so you cannot see the reason there.
PUT st_t3
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms": [
":-), happy-smiley",
":-(, sad-smiley"
]
}
},
"analyzer": {
"synonym_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym_filter"
]
}
}
}
}
},
"mappings": {
"properties": {
"tweet": {
"type": "text",
"fielddata": "true"
},
"existence": {
"type": "text"
},
"confidence": {
"type": "float"
}
}
}
}
Response:
{
"error": {
"root_cause": [
{
"type": "remote_transport_exception",
"reason": "[127.0.0.1:9301][indices:admin/create]"
}
],
"type": "illegal_argument_exception",
"reason": "failed to build synonyms",
"caused_by": {
"type": "parse_exception",
"reason": "parse_exception: Invalid synonym rule at line 1",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "term: :-) was completely eliminated by analyzer"
}
}
},
"status": 400
}
So the reason "reason": "term: :-) was completely eliminated by analyzer" - means that Elastic not supporting this characters in synonym filter.
UPDATE
It can be done with a mapping char_filter, which rewrites the emoticon before tokenization.
Example:
PUT st_t3
{
"settings": {
"index": {
"analysis": {
"char_filter": {
"happy_filter": {
"type": "mapping",
"mappings": [
":-) => happy-smiley",
":-( => sad-smiley"
]
}
},
"analyzer": {
"smile_analyzer": {
"type": "custom",
"char_filter": [
"happy_filter"
],
"tokenizer": "standard",
"filter": [
"lowercase"
]
}
}
}
}
},
"mappings": {
"properties": {
"tweet": {
"type": "text",
"fielddata": "true"
},
"existence": {
"type": "text"
},
"confidence": {
"type": "float"
}
}
}
}
Test
POST st_t3/_analyze
{
"text": ":-) test",
"analyzer": "smile_analyzer"
}
Response:
{
"tokens" : [
{
"token" : "happy",
"start_offset" : 0,
"end_offset" : 2,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "smiley",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "test",
"start_offset" : 4,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
I am trying to configure an index in Elasticsearch with the PUT command below and am getting the following error. I'm not able to figure out what the error is. I'm using version 7.4.2 of the ELK stack.
The PUT request is as below:
PUT /myfeed
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"analysis": {
"analyzer": {
"folding": {
"type": "custom",
"tokenizer": "standard",
"char_filter": ["html_strip"],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"feed": {
"_all": {
"enabled": false
},
"properties": {
"feed": {
"type": "keyword"
},
"link": {
"type": "keyword"
},
"published": {
"type": "date"
},
"message": {
"type": "string",
"analyzer": "folding"
},
"title": {
"type": "string",
"analyzer": "folding"
}
}
}
}
}
The error on the console is as below:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Root mapping definition has unsupported parameters: [feed : {_all={enabled=false}, properties={feed={type=keyword}, link={type=keyword}, published={type=date}, message={analyzer=folding, type=string}, title={analyzer=folding, type=string}}}]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [_doc]: Root mapping definition has unsupported parameters: [feed : {_all={enabled=false}, properties={feed={type=keyword}, link={type=keyword}, published={type=date}, message={analyzer=folding, type=string}, title={analyzer=folding, type=string}}}]",
"caused_by": {
"type": "mapper_parsing_exception",
"reason": "Root mapping definition has unsupported parameters: [feed : {_all={enabled=false}, properties={feed={type=keyword}, link={type=keyword}, published={type=date}, message={analyzer=folding, type=string}, title={analyzer=folding, type=string}}}]"
}
},
"status": 400
}
Please help me fix this.
_all is deprecated (and removed in 7.x), as Lupanoide pointed out (thanks). In addition, Elasticsearch 7.x no longer accepts mapping type names such as feed, and the string field type was replaced by text/keyword back in 5.x.
So the mapping that worked for me is as below:
PUT /myfeed
{
"mappings": {
"properties": {
"feed": {
"type": "keyword"
},
"link": {
"type": "keyword"
},
"published": {
"type": "date"
},
"message": {
"type": "text",
"analyzer": "folding"
},
"title": {
"type": "keyword"
}
}
},
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"analysis": {
"analyzer": {
"folding": {
"type": "custom",
"tokenizer": "standard",
"char_filter": ["html_strip"],
"filter": ["lowercase", "asciifolding"]
}
}
}
}
}
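As a quick check of the folding analyzer (the text is just an illustrative sample), you can run:
POST /myfeed/_analyze
{
  "analyzer": "folding",
  "text": "<p>Déjà Vu</p>"
}
html_strip removes the tags, and lowercase plus asciifolding should reduce this to the tokens deja and vu.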
Thanks
I can't figure out why highlighting is not working. The query works, but the highlight just shows the field content without <em> tags. Here are my settings and mappings:
PUT wmsearch
{
"settings": {
"index.mapping.total_fields.limit": 2000,
"analysis": {
"analyzer": {
"custom": {
"type": "custom",
"tokenizer": "custom_token",
"filter": [
"lowercase"
]
},
"custom2": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"custom_token": {
"type": "ngram",
"min_gram": 3,
"max_gram": 10
}
}
}
},
"mappings": {
"doc": {
"properties": {
"document": {
"properties": {
"reference": {
"type": "text",
"analyzer": "custom"
}
}
},
"scope" : {
"type" : "nested",
"properties" : {
"level" : {
"type" : "integer"
},
"ancestors" : {
"type" : "keyword",
"index" : "true"
},
"value" : {
"type" : "keyword",
"index" : "true"
},
"order" : {
"type" : "integer"
}
}
}
}
}
}
}
Here is my query:
GET wmsearch/_search
{
"query": {
"simple_query_string" : {
"fields": ["document.reference"],
"analyzer": "custom2",
"query" : "bloom"
}
},
"highlight" : {
"fields" : {
"document.reference" : {}
}
}
}
The query does return the correct results, and the highlight field exists within the results. However, there are no <em> tags around "bloom"; it just shows the entire string with no tags at all.
Does anyone see any issues here or can anyone help?
Thanks
I got it to work by adding "index_options": "offsets" to my mappings for document.reference.
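For reference, the updated field mapping would look roughly like this (only the changed field is shown; since index_options cannot be changed on an existing field, the index has to be recreated and the data reindexed):
"document": {
  "properties": {
    "reference": {
      "type": "text",
      "analyzer": "custom",
      "index_options": "offsets"
    }
  }
}
Storing offsets in the postings gives the highlighter the positions it needs to wrap the matched ngrams in <em> tags.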
I am trying to use the completion suggester with the Greek language. Unfortunately I have problems with accents like ά. I've tried a few approaches.
One was simply to set the greek analyzer in the mapping; the other was a lowercase analyzer with asciifolding. No success; with the greek analyzer I don't even get a result with the accent.
Below is what I did. It would be great if anyone could help me out here.
Mapping
PUT t1
{
"mappings": {
"profession" : {
"properties" : {
"text" : {
"type" : "keyword"
},
"suggest" : {
"type" : "completion",
"analyzer": "greek"
}
}
}
}
}
Dummy
POST t1/profession/?refresh
{
"suggest" : {
"input": [ "Μάγειρας"]
}
,"text": "Μάγειρας"
}
Query
GET t1/profession/_search
{ "suggest":
{ "profession" :
{ "prefix" : "Μα"
, "completion" :
{ "field" : "suggest"}
}}}
I found a way to do it with a custom analyzer, or via a plugin for Elasticsearch, which I highly recommend when it comes to non-Latin text.
Option 1
PUT t1
{ "settings":
{ "analysis":
{ "filter":
{ "greek_lowercase":
{ "type": "lowercase"
, "language": "greek"
}
}
, "analyzer":
{ "autocomplete":
{ "tokenizer": "lowercase"
, "filter":
[ "greek_lowercase" ]
}
}
}}
, "mappings": {
"profession" : {
"properties" : {
"text" : {
"type" : "keyword"
},
"suggest" : {
"type" : "completion",
"analyzer": "autocomplete"
}
}}}
}
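You can check what the suggester will index by testing the analyzer (the expected output is an assumption based on the Greek lowercase filter, which lowercases and strips Greek accents):
POST t1/_analyze
{
  "analyzer": "autocomplete",
  "text": "Μάγειρας"
}
This should return the single token μαγειρας, so a prefix such as Μα (normalized to μα) now matches.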
Option 2 ICU Plugin
Install ES Plugin:
https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html
{ "settings": {
"index": {
"analysis": {
"normalizer": {
"latin": {
"filter": [
"custom_latin_transform"
]
}
},
"analyzer": {
"latin": {
"tokenizer": "keyword",
"filter": [
"custom_latin_transform"
]
}
},
"filter": {
"noDelimiter": {"type": "word_delimiter"},
"custom_latin_transform": {
"type": "icu_transform",
"id": "Greek-Latin/UNGEGN; Lower(); NFD; [:Nonspacing Mark:] Remove; NFC"
}
}
}
}
}
, "mappings":
{ "doc" : {
"properties" : {
"verbose" : {
"type" : "keyword"
},
"name" : {
"type" : "keyword"
},
"slugHash":{
"type" : "keyword",
"normalizer": "latin"
},
"level": { "type": "keyword" },
"hirarchy": {
"type" : "keyword"
},
"geopoint": { "type": "geo_point" },
"suggest" :
{ "type" : "completion"
, "analyzer": "latin"
, "contexts":
[ { "name": "level"
, "type": "category"
, "path": "level"
}
]
}}
}
}}
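With the plugin installed and the settings above applied to an index (say t2; the original snippet does not name one), the transliteration chain can be checked the same way:
POST t2/_analyze
{
  "analyzer": "latin",
  "text": "Μάγειρας"
}
The Greek-Latin/UNGEGN transform followed by Lower(), NFD, mark removal, and NFC should yield a single keyword token along the lines of mageiras, so Greek and Latin input normalize to the same form.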