Change settings and mappings on an existing index in Elasticsearch

I would like the following settings and mapping set on an already existing index in Elasticsearch:
{
  "analysis": {
    "analyzer": {
      "dot-analyzer": {
        "type": "custom",
        "tokenizer": "dot-tokenizer"
      }
    },
    "tokenizer": {
      "dot-tokenizer": {
        "type": "path_hierarchy",
        "delimiter": "."
      }
    }
  }
}
{
  "doc": {
    "properties": {
      "location": {
        "type": "string",
        "index_analyzer": "dot-analyzer",
        "search_analyzer": "keyword"
      }
    }
  }
}
I have tried to add these two lines of code:
client.admin().indices().prepareUpdateSettings(Index).setSettings(settings).execute().actionGet();
client.admin().indices().preparePutMapping(Index).setType(Type).setSource(mapping).execute().actionGet();
But this is the result:
org.elasticsearch.index.mapper.MapperParsingException: Analyzer [dot-analyzer] not found for field [location]
Anyone? Thanks a lot,
Stine
This seems to work:
if (client.admin().indices().prepareExists(Index).execute().actionGet().exists()) {
    client.admin().indices().prepareClose(Index).execute().actionGet();
    client.admin().indices().prepareUpdateSettings(Index).setSettings(settings.string()).execute().actionGet();
    client.admin().indices().prepareOpen(Index).execute().actionGet();
    client.admin().indices().prepareDeleteMapping(Index).setType(Type).execute().actionGet();
    client.admin().indices().preparePutMapping(Index).setType(Type).setSource(mapping).execute().actionGet();
} else {
    client.admin().indices().prepareCreate(Index).addMapping(Type, mapping).setSettings(settings).execute().actionGet();
}

If you look at your settings after sending the changes, you'll notice that the analyzer is not there. In fact, you can't change the analysis section of the settings on a live index. It's better to create the index with the desired settings in the first place; otherwise, you can close it first:
curl -XPOST localhost:9200/index_name/_close
While the index is closed you can send the new settings. After that you can reopen the index:
curl -XPOST localhost:9200/index_name/_open
While the index is closed it doesn't use any cluster resources, but it is neither readable nor writable. If you want to close and reopen the index using the Java API, you can use the following code:
client.admin().indices().prepareClose(indexName).execute().actionGet();
//TODO update settings
client.admin().indices().prepareOpen(indexName).execute().actionGet();
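Putting the pieces together, here is a minimal sketch of the update step the TODO leaves open: it builds the dot-analyzer settings from the question with XContentBuilder and applies them between the close and open calls. Variable names like indexName are illustrative, the API is the old transport-client one used throughout this question, and error handling is omitted:

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

import org.elasticsearch.common.xcontent.XContentBuilder;

// Build the analysis settings from the question programmatically.
XContentBuilder settings = jsonBuilder()
    .startObject()
        .startObject("analysis")
            .startObject("analyzer")
                .startObject("dot-analyzer")
                    .field("type", "custom")
                    .field("tokenizer", "dot-tokenizer")
                .endObject()
            .endObject()
            .startObject("tokenizer")
                .startObject("dot-tokenizer")
                    .field("type", "path_hierarchy")
                    .field("delimiter", ".")
                .endObject()
            .endObject()
        .endObject()
    .endObject();

// Apply them while the index is closed, then reopen it.
client.admin().indices().prepareClose(indexName).execute().actionGet();
client.admin().indices().prepareUpdateSettings(indexName)
    .setSettings(settings.string())
    .execute().actionGet();
client.admin().indices().prepareOpen(indexName).execute().actionGet();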

Related

Elasticsearch Text with Path Hierarchy vs KeyWord using Prefix query performance

I'm trying to work out the best way to filter results based on folder hierarchies. We will use this to simulate a situation where we want to get all assets/documents in a provided folder and all of its subfolders (recursive search).
So, for example, for a structure like this:
/someFolder/someSubfolder/1
/someFolder/someSubfolder/1/subFolder
/someFolder/someSubfolder/2
/someFolder/someSubfolder/2/subFolder
If we search for /someFolder/someSubfolder/1, we want to get as results:
/someFolder/someSubfolder/1
/someFolder/someSubfolder/1/subFolder
Now, I've found two ways to do this, and I'm not sure which one would be better from a performance perspective:
1. Use a text property with the path_hierarchy tokenizer
2. Use a keyword property and a prefix query to get results
Both of the above seem to work as I want them to (unless I missed something), but I'm not sure which one is better. On one hand, I've read that filtering should be done on keyword fields; on the other hand, the path_hierarchy tokenizer seems to be designed exactly for these scenarios, but it can only be used with text fields.
Below is the sample code I prepared.
Create the index and push some test data into it:
PUT test-index-2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "folderPath": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
POST test-index-2/_doc/
{
  "folderPath": "8bf5ad7949a1_104d753b-0fdf-4b07-9213-534dec89112a/Folder with Spaces"
}
POST test-index-2/_doc/
{
  "folderPath": "8bf5ad7949a1_104d753b-0fdf-4b07-9213-534dec89112a/Folder with Spaces/SomeTestValue/11"
}
Now both of the queries below return two results, matching on the partial path hierarchy.
1.
GET test-index-2/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "folderPath": "8bf5ad7949a1_104d753b-0fdf-4b07-9213-534dec89112a/Folder with Spaces" } }
      ]
    }
  }
}
2.
GET test-index-2/_search
{
  "query": {
    "prefix": { "folderPath.keyword": "8bf5ad7949a1_104d753b-0fdf-4b07-9213-534dec89112a/Folder with Spaces" }
  }
}
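To see why the plain term filter in query 1 can match a partial path, it helps to inspect the tokens my_tokenizer emits at index time. A small sketch of the _analyze call, assuming a configured classic transport client named client:

import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;

// List the tokens "my_tokenizer" produces for one of the indexed paths.
// For "a/b/c", path_hierarchy emits "a", "a/b" and "a/b/c", which is why
// an exact term lookup on a partial path matches both documents above.
AnalyzeResponse response = client.admin().indices()
    .prepareAnalyze("test-index-2",
        "8bf5ad7949a1_104d753b-0fdf-4b07-9213-534dec89112a/Folder with Spaces/SomeTestValue/11")
    .setAnalyzer("my_analyzer")
    .execute().actionGet();

for (AnalyzeResponse.AnalyzeToken token : response.getTokens()) {
    System.out.println(token.getTerm());
}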
Now the question would be: which solution is better if we want to get a subset of results?

Mapping definition for [fields] has unsupported parameters: [analyzer : case_sensitive]

In my search engine, users can choose whether to search case-sensitively or not. If they do, the query searches fields that use a custom case-sensitive analyser. This is my setup:
GET /candidates/_settings
{
  "candidates": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "candidates",
        "creation_date": "1528210812046",
        "analysis": {
          "analyzer": {
            "case_sensitive": {
              "filter": [
                "stop",
                "porter_stem"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        },
        ...
      }
    }
  }
}
So I have created a custom analyser called case_sensitive taken from this answer. I am trying to define my mapping as follows:
PUT /candidates/_mapping/candidate
{
  "properties": {
    "first_name": {
      "type": "text",
      "fields": {
        "case": {
          "type": "text",
          "analyzer": "case_sensitive"
        }
      }
    }
  }
}
So, when querying, for a case-sensitive match, I can do:
"simple_query_string": {
  "query": "**text to search**",
  "fields": [
    "first_name.case"
  ]
}
I am not even getting to the last step: I get the error described in the title when I try to define the mapping.
I initially thought that my error was similar to this one, but I think that issue is only related to using the keyword tokenizer, not the standard one.
In this mapping definition I was actually trying to adjust the mappings for several different fields, not just first_name. One of those fields has the type long, and that part of the mapping definition was throwing the error. When I remove it from the mapping definition, everything works as expected. However, I am unsure why this fails for that data type.

How to ingest an Elasticsearch database definition file

I got a file with some definitions for an Elasticsearch database in the following format:
PUT /drafts
{
  "settings": {
    "max_result_window": "100000"
  }
}
PUT /drafts/draft/_mapping
{
  "draft": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "analysis_id": {
        "type": "keyword"
      }
    }
  }
}
PUT /variants
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    },
    "max_result_window": "100000"
  }
}
How can I ingest that into my Elasticsearch server in bulk?
I tried the _bulk API call, but that does not seem to work:
curl localhost:9200/_bulk -d @file
I have seen this format in the Elasticsearch tutorials, but it never states how to run those files... Is it even possible?
The Bulk API is for indexing, updating, and deleting documents. The operations above alter mappings or index settings, which is not possible with the Bulk API.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
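The file is in the Kibana/Sense "console" format, so one option is to split it on the method-plus-path header lines and send each block as its own request. A rough sketch in Java (the file name, and the assumption that every body is a single JSON object, are mine):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplayDefinitions {
    public static void main(String[] args) throws Exception {
        // "definitions.txt" is a placeholder for the file from the question.
        String text = Files.readString(Path.of("definitions.txt"));

        // A request starts with a line like "PUT /drafts"; everything up to
        // the next such line is that request's JSON body.
        Matcher m = Pattern.compile("(?m)^(GET|PUT|POST|DELETE)\\s+(\\S+)\\s*$").matcher(text);
        List<int[]> spans = new ArrayList<>();
        List<String[]> headers = new ArrayList<>();
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
            headers.add(new String[] { m.group(1), m.group(2) });
        }

        HttpClient client = HttpClient.newHttpClient();
        for (int i = 0; i < headers.size(); i++) {
            int bodyEnd = (i + 1 < spans.size()) ? spans.get(i + 1)[0] : text.length();
            String body = text.substring(spans.get(i)[1], bodyEnd).trim();

            HttpRequest.Builder builder = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200" + headers.get(i)[1]))
                .header("Content-Type", "application/json");
            builder.method(headers.get(i)[0], body.isEmpty()
                ? HttpRequest.BodyPublishers.noBody()
                : HttpRequest.BodyPublishers.ofString(body));

            HttpResponse<String> response =
                client.send(builder.build(), HttpResponse.BodyHandlers.ofString());
            System.out.println(headers.get(i)[0] + " " + headers.get(i)[1]
                + " -> " + response.statusCode());
        }
    }
}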

Elasticsearch updating the analyzer creates a members field

I came across a problem where I needed to update the stopwords on an index, which was specifying the english analyzer as the default analyzer. Typically, the analyzers are specified in the settings for the index:
{
  "twitter": {
    "settings": {
      "index": {
        "creation_date": "1469465586110",
        "analysis": {
          "filter": {
            "lowercaseFilter": {
              "type": "lowercase"
            }
          },
          "analyzer": {
            "default": {
              "type": "english"
            },
            ...
So, the analyzers are located at <index name>.settings.index.analysis.analyzer
To update the analyzer, I ran these commands:
curl -XPOST "http://localhost:9200/twitter/_close" && \
curl -XPUT "http://localhost:9200/twitter/_settings" -d'
{
"analysis": {
"analyzer": {
"default": {
"type": "english",
"stopwords": "_none_"
}
}
}
}' && \
curl -XPOST "http://localhost:9200/twitter/_open"
After running those commands, I verified that the default analyzer was analyzing text, and keeping all stopwords.
However, when I use the Jest client, the settings now look like this, and the analysis isn't happening properly (note how the analysis settings are under the "members" property now):
{
  "twitter": {
    "settings": {
      "index": {
        "members": {
          "analysis": {
            "analyzer": {
              "default": {
                "type": "english",
                "stopwords": "_none_"
              },
              ...
I've stepped through the code and everything looks in order.
I figured it out. By running:
sudo tcpflow -p -c -i lo0 port 9200 2>/dev/null | grep -oE '.*(GET|POST|PUT|DELETE) .*_dev.*' -A30
I could see that the JsonObject I was sending included a members field, which is where Gson's JsonObject stores its contents internally. Since I was passing this raw object into Jest's UpdateSettings builder, it was serialized in a way I didn't expect (including the members field) and sent to Elasticsearch that way. I solved the problem by calling the JsonObject's toString() method and passing that string to the UpdateSettings builder.
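In code, the fix looks roughly like this; a sketch assuming Jest's UpdateSettings action and a configured JestClient (the helper that builds the Gson object is hypothetical):

import com.google.gson.JsonObject;
import io.searchbox.client.JestClient;
import io.searchbox.indices.settings.UpdateSettings;

// `jestClient` is a configured JestClient; buildAnalysisSettings() is a
// hypothetical helper that assembles the analysis settings as a JsonObject.
JsonObject settingsJson = buildAnalysisSettings();

// Problematic: handing Jest the raw JsonObject got it serialized with its
// internal "members" map, so Elasticsearch stored {"members": {...}}.
// UpdateSettings broken = new UpdateSettings.Builder(settingsJson).build();

// Fix from the answer above: serialize to a plain JSON string first.
UpdateSettings update = new UpdateSettings.Builder(settingsJson.toString()).build();
jestClient.execute(update);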

I cannot change the analyzer and mapping on an already existing index

I would like to change the settings and mapping on an already existing index in Elasticsearch. However, I get an error.
curl -XPOST localhost:9200/myindex/_close
{"acknowledged":true}

curl -XPUT localhost:9200/myindex/_settings -d '{
  "index": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}'
{"acknowledged":true}

curl -XPUT localhost:9200/myindex/mytype/_mapping -d '{
  "properties": {
    "myfield": {
      "type": "string",
      "search_analyzer": "custom_analyzer",
      "index_analyzer": "custom_analyzer"
    }
  }
}'
{"error":"MergeMappingException[Merge failed with failures {[mapper [myfield] has different index_analyzer]}]","status":400}
What am I doing wrong?
Existing mappings are essentially immutable, so you have to reindex your data.
zero downtime, ver. 1
To reindex, you can follow these steps:
create a new index with the wanted settings/mappings
pull your data from the old index to the new one with the _bulk API
zero downtime, ver. 2
Or you can create a new index with the wanted settings/mappings and add an alias (with the old index name) pointing to the newly created index; see the sketch below.
downtime needed
The last way to do this is to close your index, make your changes, and reopen the index again.
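A rough sketch of the ver. 2 alias swap with the same old-style Java API used in the questions above (index and type names are illustrative; the settings and mapping sources are assumed to be built elsewhere):

// 1. Create the new index with the desired analysis settings and mapping.
client.admin().indices().prepareCreate("myindex_v2")
    .setSettings(settings)
    .addMapping("mytype", mapping)
    .execute().actionGet();

// 2. Reindex documents from "myindex" into "myindex_v2" here
//    (scan/scroll reads plus _bulk writes, or an external reindexing tool).

// 3. An alias cannot share a name with a live index, so the old index has
//    to go first; this leaves a brief window before the alias takes over.
client.admin().indices().prepareDelete("myindex").execute().actionGet();
client.admin().indices().prepareAliases()
    .addAlias("myindex_v2", "myindex")
    .execute().actionGet();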
