How to add "_routing.path" without downtime & reindexing in Elastic 1.x? - elasticsearch

Elasticsearch 1.x lets you define, in the mapping, a default path from which the required routing value is extracted, e.g.:
{
"comment" : {
"_routing" : {
"required" : true,
"path" : "blog.post_id"
}
}
}
Is it possible to add that path on the fly, without downtime?
The mapping was previously defined as:
{
"comment" : {
"_routing" : {
"required" : true
}
}
}

The update will not work. Even if the command is acknowledged, the change will not be applied.
You need to reindex the documents as well. The routing value determines which shard a document is stored on, so if the path changes and yields different values, documents would be expected on a different shard from the one they currently live on. Had the change been allowed, you would effectively be changing the hash used both to route documents to shards and to GET them back, and lookups would break.
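To make the shard argument concrete, here is a minimal sketch of routing-based shard selection. It assumes the general scheme "hash of the routing value modulo the number of shards"; zlib.crc32 is only a stand-in and is not the hash function Elasticsearch uses, and the document field names are hypothetical.

```python
import zlib


def shard_for(routing_value: str, number_of_shards: int) -> int:
    """Pick a shard as hash(routing) % number_of_shards.

    zlib.crc32 is a stand-in hash for illustration only;
    it is NOT the hash Elasticsearch uses internally.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


def moved_documents(docs, old_field, new_field, number_of_shards):
    """Return the ids of documents whose shard would change if the routing
    value were taken from new_field instead of old_field."""
    moved = []
    for doc in docs:
        old_shard = shard_for(str(doc[old_field]), number_of_shards)
        new_shard = shard_for(str(doc[new_field]), number_of_shards)
        if old_shard != new_shard:
            moved.append(doc["id"])
    return moved
```

Any document that shows up in moved_documents would no longer be found by a GET that uses the new routing value, which is why a reindex is needed.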

Related

Elasticsearch conflict while putting document to index

I want to create an index, configure its settings through a template, and create an alias for it at the same time:
{
"template_1" : {
"order" : 0,
"index_patterns" : [
"test*"
],
"settings" : {
"index" : {
"number_of_shards" : "2",
"number_of_replicas" : "2"
}
},
"mappings" : { },
"aliases" : {
"some-alias" : { }
}
}
}
When I try to put a document using the alias, Elasticsearch tries to create an index with the alias name. What I am looking for instead is for it to find the index that has this alias, and to throw an error if no index exists with this alias.
The problem is that you are referencing multiple indices with a single alias, so when you PUT a document ES does not know which index to store it in.
Quoting the doc:
If no write index is specified and there are multiple indices referenced by an alias, then writes will not be allowed.
One solution, as per the quote above, is to specify a write index (see docs) as the default destination for new documents (it's also possible to specify rollover rules to update it).
The other solution, of course, is to use the actual index name when putting docs.
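As a rough illustration of the first solution, the sketch below builds the body for a POST to /_aliases that marks exactly one index as the write index through the is_write_index alias option. The helper name and the index names in the usage line are hypothetical; only the payload shape is meant to be taken from Elasticsearch.

```python
def write_index_actions(alias, indices, write_index):
    """Build an _aliases request body that attaches `alias` to every index
    in `indices` and flags `write_index` as the only write target."""
    if write_index not in indices:
        raise ValueError("write_index must be one of the aliased indices")
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    # Exactly one index gets is_write_index = True.
                    "is_write_index": index == write_index,
                }
            }
            for index in indices
        ]
    }


# Hypothetical index names matching the "test*" pattern of the template.
body = write_index_actions("some-alias", ["test-1", "test-2"], "test-2")
```

Sending this body to the _aliases endpoint should let writes through the alias land in the flagged index while searches still cover all of them.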

Changing type of property in index type's mapping

I have index mapping for type 'T1' as below:
"T1" : {
"properties" : {
"prop1" : {
"type" : "text"
}
}
}
Now I want to change the type of prop1 from text to keyword. I don't want to delete the index. I have also read suggestions to create another property with the new type and use it instead, but then I would have to update the old documents, which I would rather not do. I tried the PUT API as below, but it never works.
PUT /indexName/T1/_mapping -d
{
"T1" : {
"properties" : {
"prop1" : {
"type" : "keyword"
}
}
}
}
Is there any way to achieve this?
The type of an existing field cannot be changed in place, hence the PUT API call you used will not work. You have to create a new index with the updated mapping and reindex all the data into it.
To prevent downtime you can use an alias:
https://www.elastic.co/blog/changing-mapping-with-zero-downtime
The mapping of an existing field cannot be changed once it is persisted. The only option is to create a new index with the correct mappings and reindex your data using the reindex API provided by ES.
You can read about the reindex API here:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/docs-reindex.html
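The two answers combine into a three-step flow: create the new index, reindex into it, then switch an alias. The sketch below only builds the request bodies for those steps; the function names, the alias name and the index names are hypothetical, and it assumes clients already talk to the alias rather than to the index name.

```python
def new_index_body():
    """Body for creating the new index with prop1 mapped as keyword."""
    return {
        "mappings": {
            "T1": {"properties": {"prop1": {"type": "keyword"}}}
        }
    }


def reindex_body(source_index, dest_index):
    """Body for POST /_reindex: copy all documents from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def alias_swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases: move the alias in a single request, so
    readers never see a moment where the alias points nowhere."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

For example, reindex_body("indexName_v1", "indexName_v2") followed by alias_swap_actions("indexName", "indexName_v1", "indexName_v2") would cover the copy and the switch; the old index can be deleted once the new one has been verified.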

ElasticSearch Filtered Aliases Creation - Best Practice

We are planning to use Filtered Aliases as mentioned here - https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html
Our input data is going to be a stream with each line of the stream corresponding to an object we would like to store in ES.
Each object contains an 'id', which we are using for routing and filtering.
QUESTION -
How do we create the aliases and index the data in a performant way?
-- Do we index all the data, keep track of all the unique 'id's, and at the very end create the filtered aliases? OR
-- For each object, check whether an alias for that 'id' exists and, if it doesn't, create one?
I'm leaning towards the first approach. Is it advisable and performant compared to the second approach?
TIA.
Based on our discussion above, and after having glanced over the blog article you posted, I'm fairly sure that in your case you don't need aliases at all and the routing key will suffice. That holds only because you have a single index; with many indices it would no longer be true.
You simply need to specify the routing key to use when indexing your document. Before ES 2.0 you can use the path setting of the _routing field for that purpose; it has been deprecated since ES 1.5, but in your case it still does the job.
{
"customer" : {
"_routing" : {
"required" : true,
"path" : "customer_id" <----- the field you use as the routing key
},
"properties": { ... }
}
}
Then when searching you simply need to specify &routing=<customer_id> in your search URL in addition to your customer id filter (since a given shard can host documents for different customers). Your search will go directly to the shard identified by the given routing key, and thus, only retrieve data from the specified customer.
Using a filtered alias for this brings nothing as the filter and routing key you'd include in your alias definition would not contribute anything additional, since the retrieved documents are already "filtered" (kind of) by the routing key. This is way easier than trying to detect (on each new document to index) if an alias exists or not and create it if it doesn't.
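A minimal sketch of such a routed search, assuming a plain term query is an acceptable way to express the customer id filter; the helper name and base URL are illustrative only.

```python
from urllib.parse import quote, urlencode


def routed_search(base_url, index, customer_id):
    """Return (url, body) for a search that is sent only to the shard
    selected by the routing key and still filters on customer_id, since
    one shard can hold documents of several customers."""
    query_string = urlencode({"routing": customer_id})
    url = f"{base_url}/{quote(index)}/_search?{query_string}"
    body = {"query": {"term": {"customer_id": customer_id}}}
    return url, body


url, body = routed_search("http://localhost:9200", "customers", "cid1")
```

The routing parameter narrows the search to one shard, and the term query removes the other customers' documents that happen to share it.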
UPDATE:
Now if you absolutely have/want to create filtered aliases, the more performant way would be the first one you mentioned:
1. First index your daily data.
2. Then run a terms aggregation on your customer_id field with a size high enough (i.e. higher than the cardinality of the field, which was ~100 in your case) to make sure you capture all unique customer ids for which to create aliases.
3. Loop over all the buckets to retrieve all unique customer ids.
4. Create all aliases in one shot, using one action for each customer_id:
curl -XPOST 'http://localhost:9200/_aliases' -d '{
"actions" : [
{
"add" : {
"index" : "customers",
"alias" : "alias_cid1",
"routing" : "cid1",
"filter" : { "term" : { "customer_id" : "cid1" } }
}
},
{
"add" : {
"index" : "customers",
"alias" : "alias_cid2",
"routing" : "cid2",
"filter" : { "term" : { "customer_id" : "cid2" } }
}
},
{
"add" : {
"index" : "customers",
"alias" : "alias_cid3",
"routing" : "cid3",
"filter" : { "term" : { "customer_id" : "cid3" } }
}
},
...
]
}'
Note that you don't have to worry about whether an alias already exists: the command won't fail, it will silently ignore the existing alias.
When this command has run, you'll have all your aliases on your unique index, properly configured with a filter and a routing key.
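The aggregation and alias-creation steps above can be sketched as follows. The helpers only build and transform JSON payloads; the aggregation name customer_ids and the function names are assumptions made for the example, while the index and field names come from the command above.

```python
def customer_ids_request(size=1000):
    """Search body for a terms aggregation over customer_id; `size` must
    exceed the number of distinct customers so that none is cut off."""
    return {
        "size": 0,
        "aggs": {"customer_ids": {"terms": {"field": "customer_id", "size": size}}},
    }


def customer_ids_from_response(response):
    """Extract the unique customer ids from the aggregation buckets."""
    buckets = response["aggregations"]["customer_ids"]["buckets"]
    return [str(bucket["key"]) for bucket in buckets]


def filtered_alias_actions(index, customer_ids):
    """Build one _aliases request that adds a filtered, routed alias
    for every customer id."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": f"alias_{cid}",
                    "routing": cid,
                    "filter": {"term": {"customer_id": cid}},
                }
            }
            for cid in customer_ids
        ]
    }
```

Feeding the search response into customer_ids_from_response and the result into filtered_alias_actions("customers", ...) yields a body with the same shape as the curl example, with one action per customer.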

Elasticsearch querying alias with routing giving partial results

In an effort to create a multi-tenant architecture for my project,
I've created an Elasticsearch cluster with an index 'tenant':
"tenant" : {
"some_type" : {
"_routing" : {
"required" : true,
"path" : "tenantId"
},
Now,
I've also created some aliases -
"tenant" : {
"aliases" : {
"tenant_1" : {
"index_routing" : "1",
"search_routing" : "1"
},
"tenant_2" : {
"index_routing" : "2",
"search_routing" : "2"
},
"tenant_3" : {
"index_routing" : "3",
"search_routing" : "3"
},
"tenant_4" : {
"index_routing" : "4",
"search_routing" : "4"
}
I've added some data with tenantId = 2
After all that, I tried to query 'tenant_2' but I only got partial results, while querying 'tenant' index directly returns with the full results.
Why's that?
I was sure that routing was supposed to query all the shards that documents with tenantId = 2 reside on.
Once you have created aliases in Elasticsearch, you have to do all operations through the aliases only, be it indexing, updating or searching.
If possible, try reindexing the data and check again (I hope it is a test index).
Remove all the indices:
curl -XDELETE 'localhost:9200/' # Warning: don't use this in production!
Use this command only on a test index.
Create the index again. Create the aliases again. Do all indexing, search and delete operations on the alias name. Even the import of data should be done via the alias name.

Attaching a TTL field with every log sent via logstash to Elasticsearch

Summary: I want to attach a TTL field to the logs in Logstash and send them over to Elasticsearch.
I have already gone through the documentation but could not get much out of it, since it is not very clear.
This is my config file in logstash.
input {
stdin {
type => "stdin-type"
}
}
output {
stdout { debug => true debug_format => "json"}
elasticsearch {}
}
Now suppose that for each log that is read, I want to attach a TTL of, say, 5 days.
I know how to activate the TTL option in Elasticsearch, but it is not clear to me what changes I have to make in the Elasticsearch configuration files.
The documentation asks to look for the mappings folder, but there is none in the Elasticsearch download folder.
Looking for expert help.
Have a look here if you want to put the mapping on the file system. You have to go to the config folder and create a folder called mappings there, and another one with the name of the index inside mappings. Since Logstash creates one index per day by default, you'd better use _default as the folder name, so that the mapping will be applied to all indices.
The file that you create under that folder must have the name of the type you want to apply the mapping to. I don't remember exactly which type Logstash uses, so I would use the _default_ mapping definition. Just call the file _default_.json and put the following content in it:
{
"_default_" : {
"_ttl" : { "enabled" : true }
}
}
As you can see, the name of the type must appear both in the filename and in its content.
Alternatively, you could avoid putting anything on the file system and create an index template containing your custom mapping, like the following:
{
"template" : "logstash-*",
"mappings" : {
"_default_" : {
"_ttl" : { "enabled" : true }
}
}
}
The mapping will then be applied to all the indices whose name matches the template pattern. If you use the _default_ mapping definition the mapping will be applied as default to all the types that are going to be created.
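Since the question asks for a TTL of 5 days, here is a hedged sketch that builds the template body with a default TTL added to the _ttl mapping. It assumes an Elasticsearch version that still supports _ttl (the feature was deprecated in 2.0 and removed in 5.0); the helper name is hypothetical.

```python
def ttl_template(pattern="logstash-*", default_ttl="5d"):
    """Build an index template body that enables _ttl for every type
    (through the _default_ mapping) with a default time to live."""
    return {
        "template": pattern,
        "mappings": {
            "_default_": {
                "_ttl": {"enabled": True, "default": default_ttl}
            }
        },
    }


# The body would be sent with a PUT to /_template/<template name>.
body = ttl_template()
```

With a default in place, documents indexed without an explicit TTL would expire after the configured period, so Logstash itself does not need to set the field.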
