Cannot change the analyzer and mapping on an already existing index - elasticsearch

I would like to change the settings and mapping on an already existing index in Elasticsearch. However, I get an error:
curl -XPOST localhost:9200/myindex/_close
{"acknowledged":true}
curl -XPUT localhost:9200/myindex/_settings -d '{
  "index": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}'
{"acknowledged":true}
curl -XPUT localhost:9200/myindex/mytype/_mapping -d '{
  "properties": {
    "myfield": {
      "type": "string",
      "search_analyzer": "custom_analyzer",
      "index_analyzer": "custom_analyzer"
    }
  }
}'
{"error":"MergeMappingException[Merge failed with failures {[mapper [myfield] has different index_analyzer]}]","status":400}
What am I doing wrong?

The analyzer of an existing field cannot be changed in place, so you have to reindex your data.
zero downtime, version 1
To reindex, you can follow these steps (a sketch follows the list):
create a new index with the wanted settings/mappings
pull your data from the old index into the new one with the _bulk API
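A minimal sketch of such a reindex, assuming the old index is myindex and the new one is myindex_v2 (names and file references here are hypothetical):
# 1. create the new index with the desired settings and mappings
curl -XPUT 'localhost:9200/myindex_v2' -d @settings_and_mappings.json
# 2. open a scroll over the old index
curl -XGET 'localhost:9200/myindex/_search?search_type=scan&scroll=5m&size=500' -d '{"query": {"match_all": {}}}'
# 3. page through the results with the returned _scroll_id, rewrite each hit
#    into bulk format (an {"index": ...} action line followed by the source
#    line), and post each page to the new index:
curl -XPOST 'localhost:9200/myindex_v2/_bulk' --data-binary @bulk_page.json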
zero downtime, version 2
Or you can create a new index with the wanted settings/mappings and create an alias (with the old index name) pointing to the newly created index, as shown below.
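For example, if the old index was myindex and the new one myindex_v2 (names hypothetical), you would remove the old index first (an alias cannot share its name with an existing index) and then add the alias:
curl -XPOST localhost:9200/_aliases -d '{
  "actions": [
    { "add": { "index": "myindex_v2", "alias": "myindex" } }
  ]
}'
Clients keep addressing myindex and are transparently routed to the new index.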
downtime needed
The last way to do this is to close your index, make your changes, and then reopen the index.

Related

Elasticsearch updating the analyzer creates a members field

I came across a problem where I needed to update the stopwords on an index, which was specifying the english analyzer as the default analyzer. Typically, the analyzers are specified in the settings for the index:
{
  "twitter": {
    "settings": {
      "index": {
        "creation_date": "1469465586110",
        "analysis": {
          "filter": {
            "lowercaseFilter": {
              "type": "lowercase"
            }
          },
          "analyzer": {
            "default": {
              "type": "english"
            },
            ...
So, the analyzers are located at <index name>.settings.index.analysis.analyzer
To update the analyzer, I ran these commands:
curl -XPOST "http://localhost:9200/twitter/_close" && \
curl -XPUT "http://localhost:9200/twitter/_settings" -d'
{
"analysis": {
"analyzer": {
"default": {
"type": "english",
"stopwords": "_none_"
}
}
}
}' && \
curl -XPOST "http://localhost:9200/twitter/_open"
After running those commands, I verified that the default analyzer was analyzing text, and keeping all stopwords.
However, when I use the Jest client, the settings look like this, and the analysis isn't happening properly (note how the analysis settings are under the "members" property now):
{
  "twitter": {
    "settings": {
      "index": {
        "members": {
          "analysis": {
            "analyzer": {
              "default": {
                "type": "english",
                "stopwords": "_none_"
              },
I've stepped through the code and everything looks in order.
I figured it out. So by running:
sudo tcpflow -p -c -i lo0 port 9200 2>/dev/null | grep -oE '.*(GET|POST|PUT|DELETE) .*_dev.*' -A30
I could see that the JsonObject I was sending included a members field, which is where Gson's JsonObject stores its child objects internally. Since I was passing this raw object into Jest's UpdateSettings builder, it was serialized in a way I didn't expect (including the members field) and sent to Elasticsearch that way. I solved the problem by calling the JsonObject's toString() method and passing the resulting string to the UpdateSettings builder.
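A minimal sketch of the fix, assuming an already configured JestClient (all variable names here are hypothetical):
import com.google.gson.JsonObject;
import io.searchbox.client.JestClient;
import io.searchbox.indices.settings.UpdateSettings;

// build {"analysis": {"analyzer": {"default": {...}}}} with Gson
JsonObject analyzer = new JsonObject();
analyzer.addProperty("type", "english");
analyzer.addProperty("stopwords", "_none_");

JsonObject analyzers = new JsonObject();
analyzers.add("default", analyzer);

JsonObject analysis = new JsonObject();
analysis.add("analyzer", analyzers);

JsonObject settings = new JsonObject();
settings.add("analysis", analysis);

// Passing the raw JsonObject made Jest serialize Gson's internal "members"
// field; serializing to a JSON string first produces the intended body.
UpdateSettings updateSettings = new UpdateSettings.Builder(settings.toString()).build();
jestClient.execute(updateSettings);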

Adding ngram to existing index

Is there a way to add ngram matching to an existing index? I saw plenty of examples in the documentation of how to create an index that searches using ngrams, but when I try to follow those instructions I get an error:
{"error":"IndexAlreadyExistsException[[nameOfIndex] already exists]","status":400}
An example of the curl command I'm using:
curl -XPUT elasticUrl/nameOfIndex -d '{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "title": {
      "properties": {
        "text_field": {
          "type": "string",
          "term_vector": "yes"
        }
      }
    }
  }
}'
Try this:
First close the index, then apply the settings updates, and then open the index again:
POST /blog/_close
// apply index settings updates
POST /blog/_open
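For example, the middle step might send analysis settings like these (the analyzer and filter names are made up for illustration; note that the field mapping would still need to reference the new analyzer, which for an existing field generally means reindexing):
PUT /blog/_settings
{
  "analysis": {
    "analyzer": {
      "ngram_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "my_ngram"]
      }
    },
    "filter": {
      "my_ngram": {
        "type": "nGram",
        "min_gram": 2,
        "max_gram": 3
      }
    }
  }
}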

How to dynamically update the synonym filter data base

Is it possible to dynamically update the synonym filter database in Elasticsearch? As I'm doing a synonym training process with my app, I'd like to add new words to the synonym database dynamically.
curl -XPOST 'localhost:9200/my-index/_close'
echo
echo updating new Synynom database
curl -XPUT 'localhost:9200/my-index/_settings' -d @analyzer.json
echo
echo Opening index
curl -XPOST 'localhost:9200/my-index/_open'
The analyzer.json file looks like this:
{
  "analysis": {
    "analyzer": {
      "his-synonym": {
        "tokenizer": "whitespace",
        "filter": [
          "lowercase",
          "my-synonym"
        ]
      }
    },
    "filter": {
      "my-synonym": {
        "type": "synonym",
        "synonyms": [
          "big,large",
          "run,jogg"
        ]
      }
    }
  }
}
If your synonyms are kept in a file, you need to update the file, close the index, and then re-open it; it should pick up the new synonyms automatically.
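A file-based version of the filter above might look like this (synonyms_path is resolved relative to the Elasticsearch config directory; the filename is an assumption):
{
  "analysis": {
    "filter": {
      "my-synonym": {
        "type": "synonym",
        "synonyms_path": "analysis/synonyms.txt"
      }
    }
  }
}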

Change settings and mappings on existing index in Elasticsearch

I would like the following settings and mapping set on an already existing index in Elasticsearch:
{
  "analysis": {
    "analyzer": {
      "dot-analyzer": {
        "type": "custom",
        "tokenizer": "dot-tokenizer"
      }
    },
    "tokenizer": {
      "dot-tokenizer": {
        "type": "path_hierarchy",
        "delimiter": "."
      }
    }
  }
}
{
  "doc": {
    "properties": {
      "location": {
        "type": "string",
        "index_analyzer": "dot-analyzer",
        "search_analyzer": "keyword"
      }
    }
  }
}
I have tried to add these two lines of code:
client.admin().indices().prepareUpdateSettings(Index).setSettings(settings).execute().actionGet();
client.admin().indices().preparePutMapping(Index).setType(Type).setSource(mapping).execute().actionGet();
But this is the result:
org.elasticsearch.index.mapper.MapperParsingException: Analyzer [dot-analyzer] not found for field [location]
Anyone? Thanks a lot,
Stine
This seems to work:
if (client.admin().indices().prepareExists(Index).execute().actionGet().exists()) {
    // the index exists: close it, update the analysis settings, then reopen it
    client.admin().indices().prepareClose(Index).execute().actionGet();
    client.admin().indices().prepareUpdateSettings(Index).setSettings(settings.string()).execute().actionGet();
    client.admin().indices().prepareOpen(Index).execute().actionGet();
    // drop and re-put the mapping so it can reference the new analyzer
    client.admin().indices().prepareDeleteMapping(Index).setType(Type).execute().actionGet();
    client.admin().indices().preparePutMapping(Index).setType(Type).setSource(mapping).execute().actionGet();
} else {
    // the index doesn't exist yet: create it with settings and mapping in one go
    client.admin().indices().prepareCreate(Index).addMapping(Type, mapping).setSettings(settings).execute().actionGet();
}
If you look at your settings after sending the changes, you'll notice that the analyzer is not there. In fact, you can't change the analysis section of the settings on a live index. It's better to create the index with the desired settings from the start; otherwise you can close it first:
curl -XPOST localhost:9200/index_name/_close
While the index is closed you can send the new settings. After that you can reopen the index:
curl -XPOST localhost:9200/index_name/_open
While the index is closed it doesn't use any cluster resources, but it is neither readable nor writable. If you want to close and reopen the index using the Java API, you can use the following code:
client.admin().indices().prepareClose(indexName).execute().actionGet();
//TODO update settings
client.admin().indices().prepareOpen(indexName).execute().actionGet();

Why Elasticsearch "not_analyzed" field is split into terms?

I have the following field in my mapping definition:
...
"my_field": {
"type": "string",
"index":"not_analyzed"
}
...
When I index a document with my_field = 'test-some-another', the value is split into 3 terms: test, some, another.
What am I doing wrong?
I created the following index:
curl -XPUT localhost:9200/my_index -d '{
  "index": {
    "settings": {
      "number_of_shards": 5,
      "number_of_replicas": 2
    },
    "mappings": {
      "my_type": {
        "_all": {
          "enabled": false
        },
        "_source": {
          "compressed": true
        },
        "properties": {
          "my_field": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}'
Then I index the following document:
curl -XPOST localhost:9200/my_index/my_type -d '{
  "my_field": "test-some-another"
}'
Then I use the plugin https://github.com/jprante/elasticsearch-index-termlist with the following API:
curl -XGET localhost:9200/my_index/_termlist
That gives me the following response:
{"ok":true,"_shards":{"total":5,"successful":5,"failed":0},"terms": ["test","some","another"]}
Verify that the mapping is actually getting set by running:
curl localhost:9200/my_index/_mapping?pretty=true
The command that creates the index seems to be incorrect. It shouldn't contain "index" : { as a root element. Try this:
curl -XPUT localhost:9200/my_index -d '{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  },
  "mappings": {
    "my_type": {
      "_all": {
        "enabled": false
      },
      "_source": {
        "compressed": true
      },
      "properties": {
        "my_field": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'
In Elasticsearch a field is indexed when it goes into the inverted index, the data structure Lucene uses to provide its fast full-text search capabilities. If you want to search on a field, you have to index it. When you index a field you can decide whether to index it as it is, or to analyze it, which means choosing a tokenizer to apply to it; the tokenizer generates a list of tokens (words), and a list of token filters can then modify the generated tokens (even add or delete some). The way you index a field affects how you can search on it: if you index a field without analyzing it, and its text is composed of multiple words, you'll be able to find that document only by searching for that exact text, whitespace included.
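For instance, with the corrected mapping above, a term query for the exact value matches, while a single word from it does not (a sketch):
curl -XGET 'localhost:9200/my_index/my_type/_search' -d '{
  "query": { "term": { "my_field": "test-some-another" } }
}'
# returns the document: the whole value was indexed as a single term
curl -XGET 'localhost:9200/my_index/my_type/_search' -d '{
  "query": { "term": { "my_field": "test" } }
}'
# returns no hits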
You can have fields that you only want to search on, and never show: indexed and not stored (the default in Lucene).
You can have fields that you want to search on and also retrieve: indexed and stored.
You can have fields that you don't want to search on, but do want to retrieve and show: not indexed, only stored.
