Updating an analyzer within Elasticsearch settings

I'm using Sense (a Chrome plugin) and I've managed to set up an analyzer that is working correctly. If I issue a GET on the settings (/media/_settings), the following is returned:
{
  "media": {
    "settings": {
      "index": {
        "creation_date": "1424971612982",
        "analysis": {
          "analyzer": {
            "folding": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_shards": "5",
        "uuid": "ks98Z6YCQzKj-ng0hU7U4w",
        "version": {
          "created": "1040499"
        },
        "number_of_replicas": "1"
      }
    }
  }
}
I am trying to update it by doing the following:
Closing the index
Issuing this PUT command (removing a filter)
PUT /media/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}
Opening the index
But when the settings come back, the filter is not removed. Can you not update an analyzer once you've created it?

Short answer: no.
Longer answer, from the ES docs:
"Although you can add new types to an index, or add new fields to a type, you can’t add new analyzers or make changes to existing fields. If you were to do so, the data that had already been indexed would be incorrect and your searches would no longer work as expected."
The best way is to create a new index and move your data. Some clients have helpers to do this for you, but it's not part of the standard Java client.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/reindex.html
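A minimal sketch of that migration with the Python client (the new index name media_v2, the `es` client variable, and the exact calls are assumptions for illustration, not part of the original answer):

```python
# Sketch of the create-and-reindex approach: apply the analyzer change to a
# brand-new index, then copy documents over with the _reindex API.

NEW_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folding": {
                    "tokenizer": "standard",
                    "filter": ["lowercase"],  # "asciifolding" removed
                }
            }
        }
    }
}

REINDEX_BODY = {
    "source": {"index": "media"},
    "dest": {"index": "media_v2"},
}

def migrate(es):
    """Create the new index with the changed analyzer, then copy the docs.

    `es` is assumed to be an elasticsearch-py client instance.
    """
    es.indices.create(index="media_v2", body=NEW_SETTINGS)
    es.reindex(body=REINDEX_BODY, wait_for_completion=True)
```

Once the copy finishes, you can point an alias (or your application) at media_v2 and drop the old index.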

Related

How to use a custom analyser on specific elasticsearch documents

Suppose I have a custom analyser that I want to use only on specific documents whose entity_type is "table". How would I go about that?
Document I want to match:
{
  ... other keys
  "_source": {
    "entity_type": "table" // <-- I want to match this and use the custom analyser on this entire document
  }
}
Custom analyser (currently just set as the default, but I want it to affect only tables):
elasticsearch.indices.create(
    index="myIndex",
    body={
        "settings": {
            "analysis": {
                "char_filter": {
                    "underscore_to_dash": {
                        "type": "mapping",
                        "mappings": ["_ => -"],
                    }
                },
                "analyzer": {
                    "default": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                        "char_filter": ["underscore_to_dash"],
                    }
                },
            },
        }
    })
An analyzer can only be applied to a specific field, not to a whole document. So in your case it might make sense to have two fields: one for documents with "entity_type": "table" and another for other docs.
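A rough sketch of that two-field idea (the field name name_table and the analyzer name underscore_analyzer are hypothetical; the point is that analyzers attach to fields, so the routing has to happen at index time):

```python
# Text from "table" documents goes into a field that uses the custom
# analyzer; everything else stays in a plain field.

MAPPING = {
    "properties": {
        "entity_type": {"type": "keyword"},
        "name": {"type": "text"},              # regular docs
        "name_table": {                        # docs with entity_type == "table"
            "type": "text",
            "analyzer": "underscore_analyzer", # the custom analyzer from settings
        },
    }
}

def route_fields(doc):
    """Copy the name into the analyzer-specific field for table docs."""
    if doc.get("entity_type") == "table":
        doc = {**doc, "name_table": doc["name"]}
    return doc
```

Queries against table documents then target name_table, while everything else keeps querying name.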

Elasticsearch: single analyzer across multiple indices

I have time-based indices
students-2018
students-2019
students-2020
I have defined one analyzer with synonyms and I want to reuse it across multiple indices. How do I achieve that?
You can define an index template that includes your custom analyzer and an index pattern covering all your student indices.
You can add your index pattern to the index template call below, as mentioned in the official docs.
A sample index template definition:
{
  "index_patterns": ["students-*"],
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}
Now all your student indices like students-2018 and students-2019 will have this my_custom_analyzer, which is defined in the index template.
Create a student index without any settings or analyzer:
PUT http://{{your-es-hostname}}/students-2018
Then check its settings using GET http://{{your-es-hostname}}/students-2018, which would give the output below, including the analyzer created in the index template:
{
  "students-2018": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "students-2018",
        "creation_date": "1588653678067",
        "analysis": {
          "analyzer": {
            "my_custom_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "char_filter": [
                "html_strip"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "kjGEgKCOSJeIlrASP-RaMQ",
        "version": {
          "created": "7040299"
        }
      }
    }
  }
}
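If you register the template from the Python client, a sketch might look like this (the template name and the use of the legacy _template endpoint are assumptions):

```python
# Registering the index template so every new index matching "students-*"
# automatically gets my_custom_analyzer.

STUDENT_TEMPLATE = {
    "index_patterns": ["students-*"],
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "char_filter": ["html_strip"],
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
}

def register_template(es):
    """Register the template; `es` is assumed to be an elasticsearch-py client."""
    es.indices.put_template(name="students_template", body=STUDENT_TEMPLATE)
```

Note that a template only applies to indices created after it is registered; existing indices keep their old settings.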

Empty value generates mapper_parsing_exception for Elasticsearch completion suggester field

I have a name field which is a completion suggester, and indexing generates a mapper_parsing_exception error, stating value must have a length > 0.
There are indeed some empty values in this field. How do I accommodate them?
ignore_malformed had no effect, either at the properties or the index level.
I tried filtering out empty strings in the analyzer by setting a minimum length:
PUT /genes
{
  "settings": {
    "analysis": {
      "filter": {
        "remove_empty": {
          "type": "length",
          "min": 1
        }
      },
      "analyzer": {
        "keyword_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "remove_empty"
          ]
        }
      }
    }
  },
  "mappings": {
    "gene": {
      "properties": {
        "name": {
          "type": "completion",
          "analyzer": "keyword_lowercase"
        }
      }
    }
  }
}
Or filter empty strings as a stopword:
"remove_empty": {
  "type": "stop",
  "stopwords": [""]
}
Attempting to apply a filter to the name mapping generates an unsupported parameter error:
"mappings": {
  "gene": {
    "properties": {
      "name": {
        "type": "completion",
        "analyzer": "keyword_lowercase",
        "filter": "remove_empty"
      }
    }
  }
}
This sure feels like it ought to be simple. Is there a way to do this?
Thanks!
I have faced the same issue. After some research, it seems that currently the only option is to change the data (e.g. replace empty values with some dummy non-empty value) before indexing.
But there is also good news: this issue exists on GitHub and was resolved about a month ago. It is planned to be released in version 6.4.0.
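A minimal sketch of that workaround in Python (the placeholder value and the helper name are arbitrary choices, not anything Elasticsearch prescribes):

```python
# Replace empty completion-suggester inputs before sending documents to
# Elasticsearch, so indexing does not fail with mapper_parsing_exception.

PLACEHOLDER = "__unnamed__"

def clean_completion_field(doc, field="name"):
    """Replace a missing or empty suggester input with a dummy value."""
    value = doc.get(field)
    if value is None or (isinstance(value, str) and not value.strip()):
        doc = {**doc, field: PLACEHOLDER}
    return doc
```

Pick a placeholder your users will never actually type, so it never surfaces as a real suggestion.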

title.folded not working in elasticsearch

I am new to Elasticsearch. I have created an index and used the following analyser
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
while creating the index. The problem is that when I use
title.folded = "string to be searched" I am unable to get results for some data that is present in the index, and if I don't use it I get the results, but then the accent folding does not work. What could be the problem?
You only configured that analyzer in the index settings; you also have to apply it to specific fields in your mapping. See the example in the analyzer docs.
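To make title.folded work, the folding analyzer has to be attached to a sub-field of title in the mapping. A sketch of that wiring, plus a pure-Python approximation of what lowercase + asciifolding do to text (the mapping body is an assumption built from the settings in the question):

```python
import unicodedata

# Multi-field mapping sketch: the sub-field "folded" is what lets queries
# against "title.folded" use the folding analyzer.
TITLE_MAPPING = {
    "properties": {
        "title": {
            "type": "text",
            "fields": {
                "folded": {
                    "type": "text",
                    "analyzer": "folding",  # defined in the index settings
                }
            },
        }
    }
}

def fold(text):
    """Rough pure-Python approximation of lowercase + asciifolding."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return decomposed.encode("ascii", "ignore").decode("ascii")

# With this mapping, "title" keeps the default analysis while
# "title.folded" matches accent-insensitively, e.g.
# fold("Café Müller") == "cafe muller"
```

Remember that adding the sub-field only affects documents indexed afterwards; existing documents need to be reindexed to populate title.folded.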

Remove index filter settings on an existing index with documents

I've created an index on my elasticsearch server with the following settings:
PUT /myindex
{
  "settings": {
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}
After adding a lot of documents, I've updated my index settings using the following request:
PUT /myindex/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [ "asciifolding" ]
        }
      }
    }
  }
}
and removed the lowercase filter, but it seems that all the documents in that index are still indexed with lowercase filtering. Do I have to reindex all my documents (sigh), or is there a way to tell Elasticsearch to update all documents according to my new filter settings?
You need to reindex; the underlying Lucene index segments are immutable. If you have a fresh ES version, this API will help you: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html. Otherwise you have to use search & scroll, or just refetch the data from the original source.
