Elasticsearch: single analyzer across multiple indices

I have time-based indices
students-2018
students-2019
students-2020
I have defined one analyzer with synonyms. How can I reuse the same analyzer across all of these indices?

You can define an index template that matches all your student indices and declare your custom analyzer in its settings.
Add your index pattern to the template's index_patterns field, as described in the official docs.
Sample index template definition (this is the body of a PUT _template/<template-name> request; the template name is up to you):
{
  "index_patterns": ["students-*"],
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}
Now all your student indices, such as students-2018 and students-2019, will have this my_custom_analyzer defined in the index template.
Create a student index without any settings or analyzer, for example:
PUT http://{{your-es-hostname}}/students-2018
Then check its settings with GET http://{{your-es-hostname}}/students-2018, which gives the output below and includes the analyzer created in the index template.
{
  "students-2018": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "students-2018",
        "creation_date": "1588653678067",
        "analysis": {
          "analyzer": {
            "my_custom_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "char_filter": [
                "html_strip"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "kjGEgKCOSJeIlrASP-RaMQ",
        "version": {
          "created": "7040299"
        }
      }
    }
  }
}
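To verify the analyzer is actually usable on an index created from the template, you can run it through the _analyze API. A minimal sketch, assuming the students-2018 index above exists:
GET students-2018/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "<p>Élasticsearch STUDENTS</p>"
}
The html_strip char filter should remove the <p> tags, and the lowercase and asciifolding filters should produce the tokens elasticsearch and students.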

Related

Custom predefined stopword list in Elasticsearch

How can I define a custom stopword list globally, so that it is accessible from all indices?
Ideally I would use this stopword list just like the predefined language-specific stopword lists:
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": "_my_predefined_stopword_list_"
        }
      }
    }
  }
}
The official Elasticsearch documentation describes how to create a custom filter with a list of stopwords. You can find the description here:
https://www.elastic.co/guide/en/elasticsearch/guide/current/using-stopwords.html
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "spanish_stop": {
          "type": "stop",
          "stopwords": [ "si", "esta", "el", "la" ]
        },
        "light_spanish": {
          "type": "stemmer",
          "language": "light_spanish"
        }
      },
      "analyzer": {
        "my_spanish": {
          "tokenizer": "spanish",
          "filter": [
            "lowercase",
            "asciifolding",
            "spanish_stop",
            "light_spanish"
          ]
        }
      }
    }
  }
}
After defining the spanish_stop filter you can use it in the definition of your indices.
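That still defines the list per index. To make a stopword list available from all indices, one option (sketched below, with assumed names) is to keep the words in a file on every node and reference it via the stop filter's stopwords_path parameter, then put the filter into an index template so every newly created index gets it. Here my_stopwords.txt (placed in the Elasticsearch config directory) and the template name global_stopwords are assumptions:
PUT /_template/global_stopwords
{
  "index_patterns": ["*"],
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords_path": "my_stopwords.txt"
        }
      }
    }
  }
}
Any index created after this template exists can then refer to my_stop in its analyzers without repeating the word list.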

Finding Synonym library

I have been using synonyms in Elasticsearch to map data. I have created an index setting like this:
PUT uk-2016.06.22
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "british,uk,england,britain"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}
Rather than manually creating a synonym mapping in a txt file, is there any synonym library available for download that I can use for an Elasticsearch application? It seems difficult to find one.
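For what it's worth, the synonym token filter can also load its rules from a file instead of an inline list, and it understands the WordNet prolog format, so a downloaded WordNet synonym file (commonly wn_s.pl) can be plugged in directly. A minimal sketch, assuming the file has been copied to analysis/wn_s.pl inside the config directory of every node:
PUT uk-2016.06.22
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "format": "wordnet",
          "synonyms_path": "analysis/wn_s.pl"
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}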

Elasticsearch cluster level analyzer

How can I define one custom analyzer that will be used in more than one index (at the cluster level)? All the examples I can find show how to create a custom analyzer on a specific index.
My analyzer, for example:
PUT try_index
{
  "settings": {
    "analysis": {
      "filter": {
        "od_synonyms": {
          "type": "synonym",
          "synonyms": [
            "dog, cat => animal",
            "john, lucas => boy",
            "emma, kate => girl"
          ]
        }
      },
      "analyzer": {
        "od_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "od_synonyms"
          ]
        }
      }
    }
  },
  "mappings": {
    "record": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "standard",
          "search_analyzer": "od_analyzer"
        }
      }
    }
  }
}
Any idea how to change my analyzer scope to cluster level?
thanks
There is no "scope" for analyzers. But you can do something similar with index templates:
PUT /_template/some_name_here
{
  "template": "a*",
  "order": 0,
  "settings": {
    "analysis": {
      "filter": {
        "od_synonyms": {
          "type": "synonym",
          "synonyms": [
            "dog, cat => animal",
            "john, lucas => boy",
            "emma, kate => girl"
          ]
        }
      },
      "analyzer": {
        "od_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "od_synonyms"
          ]
        }
      }
    }
  }
}
In "template" you put the index name pattern that this template should be applied to when an index is created. You could very well specify "*" to match all indices. I think that's the best you can do for what you want.
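To illustrate, any index whose name matches the pattern picks up the analyzer automatically when it is created, and you can exercise it through the _analyze API (shown here with the JSON body syntax of current Elasticsearch versions; animals-2016 is a made-up index name matching the a* pattern):
PUT animals-2016

GET animals-2016/_analyze
{
  "analyzer": "od_analyzer",
  "text": "lucas and his dog"
}
The synonym rules should rewrite lucas to boy and dog to animal in the returned tokens.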

ngrams in elasticsearch are not working

I use the Elasticsearch ngram filter:
"analysis": {
  "filter": {
    "desc_ngram": {
      "type": "ngram",
      "min_gram": 3,
      "max_gram": 8
    }
  },
  "analyzer": {
    "index_ngram": {
      "type": "custom",
      "tokenizer": "keyword",
      "filter": [ "desc_ngram", "lowercase" ]
    },
    "search_ngram": {
      "type": "custom",
      "tokenizer": "keyword",
      "filter": "lowercase"
    }
  }
}
And I have these two documents:
{
  "name": "Shana Calandra",
  "username": "shacalandra"
},
{
  "name": "Shana Launer",
  "username": "shalauner"
}
And I am using this query:
{
  "query": {
    "match": {
      "_all": "Shana"
    }
  }
}
When I search with this query it returns both documents, but I can't search by part of a word: for example, using "Shan" instead of "Shana" in the query doesn't return anything.
Maybe my mapping is wrong; I can't tell whether the problem is in the mapping or in the query.
If you specify
"mappings": {
  "test": {
    "_all": {
      "index_analyzer": "index_ngram",
      "search_analyzer": "search_ngram"
    },
for your mapping of the _all field then it will work. _all has its own analyzers, and I suspect you applied the analyzers only to name and username and not to _all.
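Putting both pieces together, a complete index definition would look roughly like the sketch below. It keeps the 1.x-era syntax used in the question and answer (_all, index_analyzer, and string fields were all removed in later Elasticsearch versions), and the index name test_index and type name test are assumptions:
PUT test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "desc_ngram": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 8
        }
      },
      "analyzer": {
        "index_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "desc_ngram", "lowercase" ]
        },
        "search_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "_all": {
        "index_analyzer": "index_ngram",
        "search_analyzer": "search_ngram"
      },
      "properties": {
        "name": { "type": "string" },
        "username": { "type": "string" }
      }
    }
  }
}
With _all ngrammed at index time and only lowercased at search time, a match query on _all for "Shan" should then find both documents.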

Updating analyzer within ElasticSearch settings

I'm using Sense (Chrome plugin) and I've managed to set up an analyzer, and it is working correctly. If I issue a GET on the settings (/media/_settings), the following is returned:
{
  "media": {
    "settings": {
      "index": {
        "creation_date": "1424971612982",
        "analysis": {
          "analyzer": {
            "folding": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_shards": "5",
        "uuid": "ks98Z6YCQzKj-ng0hU7U4w",
        "version": {
          "created": "1040499"
        },
        "number_of_replicas": "1"
      }
    }
  }
}
I am trying to update it by doing the following:
1. Closing the index
2. Issuing this PUT command (removing a filter):
PUT /media/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}
3. Opening the index
But when the settings come back, the filter is not removed. Can you not update an analyzer once you've created it?
Short answer: No.
Longer answer. From the ES docs:
"Although you can add new types to an index, or add new fields to a type, you can’t add new analyzers or make changes to existing fields. If you were to do so, the data that had already been indexed would be incorrect and your searches would no longer work as expected."
The best way is to create a new index and move your data. Some clients have helpers to do this for you, but it's not part of the standard Java client.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/reindex.html
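On more recent Elasticsearch versions (2.3 and later), where the _reindex API exists, that route looks roughly like this; media_v2 is a made-up name for the replacement index:
PUT /media_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}

POST /_reindex
{
  "source": { "index": "media" },
  "dest": { "index": "media_v2" }
}
Once the copy finishes you can point an alias (or your application) at media_v2 and drop the old index.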
