Remove index filter settings on an existing index with documents - elasticsearch

I've created an index on my elasticsearch server with the following settings:
PUT /myindex
{
  "settings": {
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}
After adding a lot of documents, I've updated my index settings using the following request:
PUT /myindex/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [ "asciifolding" ]
        }
      }
    }
  }
}
and removed the lowercase filter from the index, but it seems that all documents in that index are still indexed with lowercase filtering. Do I have to reindex all my documents (sigh), or is there any way to tell Elasticsearch to update all documents with my new filter settings?

You need to reindex: the underlying Lucene index segments are immutable, so already-indexed documents are never re-analyzed in place. If you are on a recent ES version, the Reindex API will help you: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html. Otherwise you have to use search & scroll, or simply re-fetch the data from the original source.
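For illustration, a minimal _reindex call, assuming you have already created a new index with the corrected analysis settings (myindex_v2 is a hypothetical name):
POST _reindex
{
  "source": { "index": "myindex" },
  "dest": { "index": "myindex_v2" }
}
Once the copy finishes you can delete the old index and, if your clients query through an alias, point the alias at the new index so nothing else has to change.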

Related

How to use a custom analyser on specific elasticsearch documents

Suppose I have a custom analyser that I want to use only on specific documents, namely those whose entity_type is table. How would I go about that?
Document I want to match:
{
  ... other keys
  "_source": {
    "entity_type": "table" // <-- I want to match this and use the custom analyser on this entire document
  }
}
Custom analyser (currently just set as the default, but I want it to affect only tables):
elasticsearch.indices.create(
    index="myIndex",
    body={
        "settings": {
            "analysis": {
                "char_filter": {
                    "underscore_to_dash": {
                        "type": "mapping",
                        "mappings": ["_ => -"],
                    }
                },
                "analyzer": {
                    "default": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                        "char_filter": ["underscore_to_dash"],
                    }
                },
            },
        }
    },
)
An analyzer can only be applied to a specific field, not conditionally to a whole document. So in your case it might make sense to have two fields: use one for documents with "entity_type": "table" and the other for all other docs.
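As a sketch of that idea (assuming a recent ES version without mapping types, a hypothetical text field called name, and that the analyzer is registered under a name such as underscore_analyzer instead of default), a multi-field lets you index the same value both ways and query name.for_tables only when entity_type is table:
PUT /myIndex/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "for_tables": {
          "type": "text",
          "analyzer": "underscore_analyzer"
        }
      }
    }
  }
}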

Search for parts of a string in the _id field in an existing Elasticsearch index

Hi,
I am working with an existing Elasticsearch index, trying to search for a string in the _id field.
The _id in this index consists of two concatenated strings, and I need to be able to search for the second part of that string.
After reading the documentation I found that I should probably use ngrams to search for a substring, but I can't make this work properly.
I found an example online from someone who was trying to do the same, so I updated my index with the following:
PUT /"myIndex"
{"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"partial_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"partial": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"partial_filter"
]
}
}
}
}}
And then I tried to add this:
PUT /"index"/_mapping/type2
{
"type2": {
"properties": {
"_id": {
"type": "string",
"analyzer": "partial"
}
}
}
}
That gives me an exception: "Rejecting mapping update to [bci_report_provider_s_dev-"myIndex"] as the final mapping would have more than 1 type: [type2, bci-report]"
How can I resolve this, and is there another way to do a partial search on the _id field?
Thanks a lot in advance!
Bjørn Olav Berg

title.folded not working in elasticsearch

I am new to Elasticsearch. I have created an index and used the following analyser
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
while creating the index. The problem is that when I query with
title.folded = "string to be searched" I am unable to get results for some data that is present in the index, and if I don't use it I get the results, but then accent folding does not work. What could be the problem?
You only configured that analyzer in the index settings; you also have to apply it to the relevant fields in the mapping. See the example in the analyzer docs.
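For example (a sketch, assuming a recent ES version, that the field is called title as in your query, and myindex as a placeholder index name), folded would be mapped as a multi-field that uses the folding analyzer:
PUT /myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "folded": {
            "type": "text",
            "analyzer": "folding"
          }
        }
      }
    }
  }
}
After reindexing your documents, a match query on title.folded should be accent-insensitive, while plain title keeps the standard behaviour.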

Updating an analyzer within Elasticsearch settings

I'm using Sense (a Chrome plugin) and I've managed to set up an analyzer that is working correctly. If I issue a GET (/media/_settings) on the settings, the following is returned:
{
  "media": {
    "settings": {
      "index": {
        "creation_date": "1424971612982",
        "analysis": {
          "analyzer": {
            "folding": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_shards": "5",
        "uuid": "ks98Z6YCQzKj-ng0hU7U4w",
        "version": {
          "created": "1040499"
        },
        "number_of_replicas": "1"
      }
    }
  }
}
I am trying to update it by doing the following:
1. Closing the index
2. Issuing this PUT command (removing a filter):
PUT /media/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}
3. Opening the index
But when the settings come back, the filter is not removed. Can you not update an analyzer once you've created it?
Short answer: No.
Longer answer, from the ES docs:
"Although you can add new types to an index, or add new fields to a type, you can't add new analyzers or make changes to existing fields. If you were to do so, the data that had already been indexed would be incorrect and your searches would no longer work as expected."
The best way is to create a new index and move your data across. Some clients have helpers to do this for you, but it's not part of the standard Java client.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/reindex.html
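As a sketch of that pattern (assuming a hypothetical new index name media_v2, and a recent ES version that has the _reindex API; on older versions you would copy the data with search & scroll as the linked guide describes):
PUT /media_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}
POST _reindex
{
  "source": { "index": "media" },
  "dest": { "index": "media_v2" }
}
If your application reads through an alias rather than the concrete index name, you can then switch the alias over to media_v2 atomically with the _aliases API.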

Elasticsearch index tokenizer keyword not working

I have an index with fields like:
"Id":{"type":"string","analyzer":"string_lowercase"} // guid for example
In elasticsearch.yml:
index:
  analysis:
    analyzer:
      string_lowercase:
        tokenizer: keyword
        filter: lowercase
But filtering like this
{
  "filter": {
    "term": {
      "Id": "2c4294c2-ca84-4f69-b648-8a014ff6e55d"
    }
  }
}
is not working for the whole GUID value, only for parts of it ("2c4294c2", "ca84", ...).
Interestingly, on another machine it works properly with the same configuration.
You can't add a custom analyzer through elasticsearch.yml. There is a REST API for adding a custom analyzer. For your requirement, below is the required command:
PUT <index name>
{
  "settings": {
    "analysis": {
      "analyzer": {
        "string_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  }
}
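To check that the analyzer behaves as intended, you can run the _analyze API against the index (a sketch, assuming the settings above are in place); it should return the whole GUID as a single lowercase token rather than several fragments:
POST <index name>/_analyze
{
  "analyzer": "string_lowercase",
  "text": "2c4294c2-ca84-4f69-b648-8a014ff6e55d"
}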
