For the app I created, the indices are generated once a week, and the type and nature of the data do not vary, which means I need the same mapping for each of these indices. Is it possible in Elasticsearch to apply the same mapping to all indices as they are created? That would save me the overhead of defining the mapping each time an index is created.
Definitely, you can use what is called an index template. Since your mapping type is stable, that's the perfect condition for using index templates.
It's as easy as creating an index. See below: whenever you want to index a document into an index whose name matches my_*, ES will select that template and create the index for you using the given mappings, settings and aliases:
curl -XPUT localhost:9200/_template/template_1 -d '{
  "template" : "my_*",
  "settings" : {
    "number_of_shards" : 1
  },
  "aliases" : {
    "my_alias" : {}
  },
  "mappings" : {
    "my_type" : {
      "properties" : {
        "my_field": { "type": "string" }
      }
    }
  }
}'
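For instance, indexing a document into any index whose name starts with my_ will create that index from the template; a minimal sketch (the index name my_2015.06.01 and the field value are just illustrations):
curl -XPUT localhost:9200/my_2015.06.01/my_type/1 -d '{
  "my_field": "some value"
}'
The newly created my_2015.06.01 index will have one shard, the my_alias alias and the my_type mapping defined in the template.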
It's basically the technique used by Logstash when it needs to index new logs for each new day in a new daily index.
You can employ an index template to address your problem. The official documentation can be found here.
A use case showing how to apply this, with examples, can be found in this blog.
How can I implement creating a new Elasticsearch index after every fixed number of days, and if that's possible, how can I search over all the previous indices? Currently we have only one index which holds all the data. I looked at the Rollover API of ES; is this the correct way? The problem is that when we want to search for some data in the previous indices, how can this be done? Any answers are appreciated, thanks.
Yes, you are on the correct path. For searching your old indices, you can link multiple indices to one alias using the aliases API, and instead of searching a single index, you search against the unified alias.
Refer to this official example on how to link multiple indices to the same alias (alias1 in the example below):
POST /_aliases
{
  "actions" : [
    { "add" : { "index" : "test1", "alias" : "alias1" } },
    { "add" : { "index" : "test2", "alias" : "alias1" } }
  ]
}
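Once the alias is in place, a single search against it covers both indices; a minimal sketch (the match_all query is just a placeholder for your real query):
GET /alias1/_search
{
  "query" : {
    "match_all" : {}
  }
}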
I want to create an index, modify its settings with a template, and at the same time create an alias for it:
{
  "template_1" : {
    "order" : 0,
    "index_patterns" : [
      "test*"
    ],
    "settings" : {
      "index" : {
        "number_of_shards" : "2",
        "number_of_replicas" : "2"
      }
    },
    "mappings" : { },
    "aliases" : {
      "some-alias" : { }
    }
  }
}
When I try to put a document using the alias, it tries to create an index with the alias name. What I'm looking for instead is something that resolves the alias to the index carrying it, and throws an error if no index with this alias exists.
The problem is that you are referencing multiple indices with a single alias, so when you PUT a document, ES does not know which index to store it in.
Quoting the doc:
If no write index is specified and there are multiple indices referenced by an alias, then writes will not be allowed.
One solution, as per the quote above, is to specify a write index (see docs) as the default destination for new documents (it's also possible to specify rollover rules to update it).
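In recent ES versions this is done with the is_write_index flag on the alias action; a minimal sketch, where the index name test-1 is just an example matching the test* pattern above:
POST /_aliases
{
  "actions" : [
    { "add" : { "index" : "test-1", "alias" : "some-alias", "is_write_index" : true } }
  ]
}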
The other solution, of course, is to use the actual index name when putting documents.
I have an index mapping for type 'T1' as below:
"T1" : {
  "properties" : {
    "prop1" : {
      "type" : "text"
    }
  }
}
Now I want to change the type of prop1 from text to keyword. I don't want to delete the index. I have also read suggestions to create another property with the new type and replace the old one, but then I would have to update the old documents, which I am not interested in doing. I tried to use the PUT mapping API as below, but it never works.
PUT /indexName/T1/_mapping
{
  "T1" : {
    "properties" : {
      "prop1" : {
        "type" : "keyword"
      }
    }
  }
}
Is there any way to achieve this?
The type of an existing field cannot be modified, hence the PUT API you used will not work. You will have to create a new index with the updated mapping and reindex all the data into it.
To prevent downtime you can always use alias:
https://www.elastic.co/blog/changing-mapping-with-zero-downtime
An existing field's mapping cannot be updated once it is persisted. The only option is to create a new index with the correct mappings and reindex your data using the reindex API provided by ES.
You can read about the reindex API here:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/docs-reindex.html
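As a minimal sketch (old_index and new_index are placeholders for your actual index names):
POST /_reindex
{
  "source" : { "index" : "old_index" },
  "dest" : { "index" : "new_index" }
}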
We are planning to use Filtered Aliases as mentioned here - https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html
Our input data is going to be a stream with each line of the stream corresponding to an object we would like to store in ES.
Each object contains an 'id', which we are using for routing and filtering.
QUESTION -
How do we create the aliases and index the data in a performant way?
-- Do we index all data, keep track of all the unique 'id's, and at the very end create the filtered aliases? OR
-- For each object, check if an alias for that 'id' exists and, if it doesn't, create one?
I'm leaning towards the first approach. Is it advisable and performant compared to the second approach?
TIA.
Based on our discussion above and after having glanced over the blog article you posted, I'm pretty positive that in your case you don't need aliases at all and the routing key would suffice. Again, only because you have a single index, if you had many indices this would not be true anymore!
You simply need to specify the routing key to use when indexing your documents. Until ES 2.0 you can declare it in the mapping via the _routing field; even though it has been deprecated since ES 1.5, in your case it serves the purpose.
{
  "customer" : {
    "_routing" : {
      "required" : true,
      "path" : "customer_id"      <----- the field you use as the routing key
    },
    "properties": { ... }
  }
}
Then when searching you simply need to specify &routing=<customer_id> in your search URL in addition to your customer id filter (since a given shard can host documents for different customers). Your search will go directly to the shard identified by the given routing key, and thus, only retrieve data from the specified customer.
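A quick sketch of such a search (the customers index name and the cid1 customer id are illustrative):
curl -XGET 'localhost:9200/customers/_search?routing=cid1' -d '{
  "query" : { "term" : { "customer_id" : "cid1" } }
}'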
Using a filtered alias for this brings nothing as the filter and routing key you'd include in your alias definition would not contribute anything additional, since the retrieved documents are already "filtered" (kind of) by the routing key. This is way easier than trying to detect (on each new document to index) if an alias exists or not and create it if it doesn't.
UPDATE:
Now if you absolutely have/want to create filtered aliases, the more performant way would be the first one you mentioned:
First index your daily data
Then run a terms aggregation on your customer_id field with a size high enough (i.e. higher than the cardinality of the field, which was ~100 in your case) to make sure you capture all unique customer ids for your aliases (see the aggregation sketch at the end of this answer)
Loop over all the buckets to retrieve all unique customer ids
Create all aliases in one shot using one action for each customer_id
curl -XPOST 'http://localhost:9200/_aliases' -d '{
"actions" : [
{
"add" : {
"index" : "customers",
"alias" : "alias_cid1",
"routing" : "cid1",
"filter" : { "term" : { "customer_id" : "cid1" } }
}
},
{
"add" : {
"index" : "customers",
"alias" : "alias_cid2",
"routing" : "cid2",
"filter" : { "term" : { "customer_id" : "cid2" } }
}
},
{
"add" : {
"index" : "customers",
"alias" : "alias_cid3",
"routing" : "cid3",
"filter" : { "term" : { "customer_id" : "cid3" } }
}
},
...
]
}'
Note that you don't have to worry whether an alias already exists: the whole command won't fail; it will silently ignore (re-add) any alias that already exists.
When this command has run, you'll have all your aliases on your unique index, properly configured with a filter and a routing key.
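For reference, the terms aggregation mentioned in step 2 above could look something like the following sketch; the aggregation name unique_customers is arbitrary and the size of 1000 is just a value comfortably above the ~100 cardinality discussed earlier:
curl -XGET 'http://localhost:9200/customers/_search' -d '{
  "size" : 0,
  "aggs" : {
    "unique_customers" : {
      "terms" : { "field" : "customer_id", "size" : 1000 }
    }
  }
}'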
Take the simplest case of indexing the following document in Elasticsearch:
{
"name": "Mark",
"age": 28
}
With automatic mapping, the mapping for this index would now look like:
"properties" : {
  "doc" : {
    "properties" : {
      "age" : { "type" : "long" },
      "name" : { "type" : "string" }
    }
  }
}
But say I then wanted to allow this document to be indexed as well:
{
"name": "Bill",
"age": "seven"
}
If I try this, the mapping does not update and Elasticsearch throws an error, since there is a conflict with the type of the age property.
Is there any way to do this so both docs could be automatically indexed and consequently queryable?
Mappings are defined per type, so what you could do is have two types in your index:
numeric
alphabetical
And split the documents according to the value of the age field. When you run a query, you can query both types at once.
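A rough sketch of this approach, assuming an ES 1.x cluster where different types in the same index may map the same field name to different datatypes (later versions restrict and eventually remove multiple types); the people index name is just an example:
# age is a number, so this document goes into the numeric type
curl -XPUT 'localhost:9200/people/numeric/1' -d '{ "name": "Mark", "age": 28 }'

# age is a word, so this document goes into the alphabetical type
curl -XPUT 'localhost:9200/people/alphabetical/1' -d '{ "name": "Bill", "age": "seven" }'

# query both types at once
curl -XGET 'localhost:9200/people/numeric,alphabetical/_search' -d '{
  "query" : { "match" : { "name" : "Mark" } }
}'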
You can add new fields to a mapping, but you cannot change the type of an existing field. To do that you need to drop the index, create a new mapping, and index the data again.
For more info, refer to this link reference.
You can't change an existing mapping; you can only add new fields to it. Otherwise you have to delete the old mapping and create a new mapping for that particular index.