If I create an alias in ElasticSearch over two or more indices, does that change anything in how those indices are stored? I.e., are they merged on the filesystem somewhere, or is the merging done at the API level?
There is no merge of any kind. An alias is just a logical representation (from the doc):
The index aliases API allow to alias an index with a name, with all APIs automatically converting the alias name to the actual index name. An alias can also be mapped to more than one index, and when specifying it, the alias will automatically expand to the aliases indices.
When you query the API with an alias, ES will resolve the alias and run your query on every index which is part of the alias.
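To make the "logical representation" point concrete, here is a minimal Python sketch that only builds the JSON body for `POST /_aliases`. The alias and index names are hypothetical. Sending this body with any HTTP client just records the alias in the cluster metadata. No documents are copied or moved.

```python
import json


def add_alias_body(alias, indices):
    """Build a POST /_aliases body that points one alias at several indices.

    Only alias metadata changes; each index keeps its own shards on disk.
    """
    return {"actions": [{"add": {"index": idx, "alias": alias}} for idx in indices]}


# Hypothetical index names; a later GET /logs/_search fans out to both of them.
body = add_alias_body("logs", ["logs-2018.01", "logs-2018.02"])
print(json.dumps(body, indent=2))
```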
I have an ElasticSearch template that includes several indices and creates several aliases. When testing, I've always created my template (and the underlying aliases) first and then the indexes, and the aliases show up as expected. I'm now in a situation where the indexes already exist on a test environment, and when I created my template there, the aliases did not show up.
Am I correct in assuming that the reason the aliases aren't showing is because the indexes already existed? If that is correct, is there a way to get the template to pick up the indices without deleting and re-creating the indices? Why is it that the indices need to be created after the template in order to be picked up?
I'm new to ElasticSearch, so I apologize if the answer is obvious. I looked through the documentation for templates, indexes and aliases but couldn't find an explanation for the behavior I was seeing.
Templates are only applied at index creation time.
If you create an index before creating a template that matches it, the template won't be applied to that index. The same thing happens if you later change something in your template, such as the number of shards or the aliases: the change only affects indices created afterwards.
From the documentation about index templates:
An index template is a way to tell Elasticsearch how to configure an index when it is created.
And
Templates are configured prior to index creation. When an index is created - either manually or through indexing a document - the template settings are used as a basis for creating the index.
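The behavior described in those quotes can be illustrated with a toy model in plain Python (not an Elasticsearch client): templates are stored separately and consulted only inside index creation, so storing a template never touches indices that already exist. The class and names are hypothetical, and `fnmatch` wildcards stand in for index patterns.

```python
from fnmatch import fnmatch


class ToyCluster:
    """Simplified model: templates are only consulted when an index is created."""

    def __init__(self):
        self.templates = {}  # template name -> (index pattern, set of aliases)
        self.indices = {}    # index name -> set of aliases

    def put_template(self, name, pattern, aliases):
        # Storing a template does not modify any existing index.
        self.templates[name] = (pattern, set(aliases))

    def create_index(self, index):
        # Matching templates are applied here, and only here.
        aliases = set()
        for pattern, template_aliases in self.templates.values():
            if fnmatch(index, pattern):
                aliases |= template_aliases
        self.indices[index] = aliases


cluster = ToyCluster()
cluster.create_index("logs-old")                  # created before the template exists
cluster.put_template("logs", "logs-*", ["logs"])
cluster.create_index("logs-new")                  # created after the template
print(cluster.indices)
```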
If you define your alias in the template, it will only be applied to indices created after the template. If you want to add the alias to existing indices, you will need to do it manually using the aliases API.
You can't apply templates to existing indices.
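In practice that means two requests: the template handles future indices, and a one-off aliases call handles the ones that already exist. The sketch below only builds the request bodies. It assumes the legacy template endpoint `PUT /_template/<name>` with `index_patterns`, as described in the 6.x docs (newer versions also offer composable templates under `/_index_template`), and the pattern and alias names are hypothetical.

```python
import json


def template_body(pattern, alias):
    # PUT /_template/<name>: applied only to indices created from now on.
    return {"index_patterns": [pattern], "aliases": {alias: {}}}


def existing_indices_alias_body(pattern, alias):
    # POST /_aliases: the wildcard is expanded once, against the indices
    # that exist when this request is made.
    return {"actions": [{"add": {"index": pattern, "alias": alias}}]}


print(json.dumps(template_body("logs-*", "logs")))
print(json.dumps(existing_indices_alias_body("logs-*", "logs")))
```

Using both covers old and new indices without deleting or recreating anything.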
Is there a recipe out there to reindex all ElasticSearch indices with Curator?
I see that it can reindex a set of indices into one (the daily-to-monthly use case); however, I don't see anything suggesting it could easily apply a new mapping file to every Elastic index.
My guess is that I'll need to write a wrapper script around Curator that grabs the index names and feeds them into Curator.
I'm not sure I understood you correctly, since you mention both reindexing and mapping changes...
If you want to set or update the mapping of a collection of indices, and you know the indices to update by name (or pattern), you can apply the same mapping or mapping change to all of them at once: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html#_multi_index_2
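A minimal sketch of that multi-index call, building only the method, path, and body. It assumes the 7.x typeless endpoint `PUT /<index1>,<index2>/_mapping` (6.x clusters also expect a type name in the path); the index names and the new field are hypothetical. A pattern such as `metrics-*` can be used in the path instead of listing names.

```python
import json


def put_mapping_request(indices, properties):
    """Return (method, path, body) for one put-mapping call covering several indices."""
    path = "/" + ",".join(indices) + "/_mapping"
    return "PUT", path, {"properties": properties}


method, path, body = put_mapping_request(
    ["metrics-2018.09", "metrics-2018.10"],  # hypothetical index names
    {"host": {"type": "keyword"}},           # hypothetical new field
)
print(method, path)
print(json.dumps(body))
```

Note that put mapping can add new fields but cannot change the type of an existing field; that is the case where reindexing becomes necessary.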
For reindexing, there is no way to specify multiple source/target pairs in a single call, although you can split one index into many. But, as you suggested, you can use successive calls to the reindex API.
BTW: the reindex API does not copy the settings or mappings from the source index into the destination index. You need to handle that yourself, for example with an index template: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/indices-templates.html
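The wrapper-script idea could look roughly like this sketch, which only builds the sequence of requests. Because `_reindex` copies documents only, each destination index is created first with the new mapping. The `-v2` suffix, the index names, and the mapping are hypothetical, and the mapping uses the 7.x typeless format (6.x nests it under a type name). The index list could come from something like `GET /_cat/indices?h=index`.

```python
def reindex_plan(indices, mappings, suffix="-v2"):
    """Return (method, path, body) tuples: create each target, then reindex into it."""
    calls = []
    for src in indices:
        dst = src + suffix  # hypothetical naming convention for the new index
        calls.append(("PUT", "/" + dst, {"mappings": mappings}))
        calls.append(("POST", "/_reindex",
                      {"source": {"index": src}, "dest": {"index": dst}}))
    return calls


new_mappings = {"properties": {"message": {"type": "text"}}}  # hypothetical mapping
plan = reindex_plan(["logs-2018.08", "logs-2018.09"], new_mappings)
for method, path, body in plan:
    print(method, path, body)
```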
I have a data source which will create a large number of entries that I'm planning to store in ElasticSearch.
The source creates two entries for the same document in ElasticSearch:
the 'init' part, which records the init time and other details under a random key in ES
the 'finish' part, which contains the main data and updates (merges into) the initially created document in ES under the init part's random key.
I will need to use time-based indexes in ElasticSearch, with an alias pointing to the current index, managed with the rollover index API.
For updates I'll use the update API to merge init and finish.
Question: If the init document with the random key is not in the current index (but in an older one that has already been rolled over), would updating it by its key succeed? If not, what is the best practice for performing the update?
After some time without answers, I set out to test it myself.
Short answer: After the index is rolled over under an alias, an update operation using the alias refers to the new index only, so it will create the document in the new index, resulting in two separate documents.
One way of solving it is to perform a search in the last 2 (or more if needed) indexes and figure out which non-alias index name to use for the update.
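A sketch of that lookup step in plain Python. It assumes the search is an `ids` query across the recent concrete indices (for example `GET /events-000001,events-000002/_search` with `{"query": {"ids": {"values": [doc_id]}}}`), then reads `_index` from the hit. The update path shown is the 7.x form (6.x uses `/<index>/<type>/<id>/_update`). The index names, document id, and sample response are made up.

```python
def concrete_index_for(search_response, doc_id):
    """Return the concrete index holding doc_id in a parsed search response, or None."""
    for hit in search_response["hits"]["hits"]:
        if hit["_id"] == doc_id:
            return hit["_index"]
    return None


def update_request(index, doc_id, partial_doc):
    # Target the concrete index, not the write alias, so the merge hits the old document.
    return "POST", f"/{index}/_update/{doc_id}", {"doc": partial_doc}


# Made-up response shaped like a search result.
resp = {"hits": {"hits": [{"_index": "events-000002", "_id": "abc123", "_source": {}}]}}
idx = concrete_index_for(resp, "abc123")
print(update_request(idx, "abc123", {"finished": True}))
```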
Another solution, which I prefer, is to avoid rollover altogether: calculate the index name from the document's required date field, and create new indexes from the application, using a template to define the mapping. This way, event sourcing and replaying the documents in order will yield the same indexes.
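A minimal sketch of deriving the index name from the document itself, assuming a hypothetical ISO 8601 `timestamp` field with an explicit offset that both the 'init' and 'finish' parts carry, and a hypothetical `events-` prefix with daily indices. Normalizing to UTC keeps the name stable no matter which time zone produced the event, which is what makes replays deterministic.

```python
from datetime import datetime, timezone


def index_for(doc, prefix="events-"):
    """Derive the daily index name from the document's own timestamp."""
    ts = datetime.fromisoformat(doc["timestamp"])
    if ts.tzinfo is None:
        # A naive time would be interpreted in the local zone, breaking replays.
        raise ValueError("timestamp must include a UTC offset")
    return prefix + ts.astimezone(timezone.utc).strftime("%Y.%m.%d")


# 01:30 at +02:00 is still the previous day in UTC.
print(index_for({"timestamp": "2018-09-15T01:30:00+02:00"}))
```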
We are using time-based indexes that are created automatically using an index template.
The ideal situation now would be to have an alias, let's call it 'current', that points to the most recently created index.
The question, then, is whether there's any way to do this through the index template. I can see that you can specify aliases, but I also want to remove the bindings between the previous indexes and this 'current' alias.
You can use the Curator tool to do this for you.
That tool has an alias command that you can use in conjunction with the indices subcommand to remove the alias from your old index.
curator alias --name current --remove indices oldindex
You can combine the above command with a cron that would kick in around the same time your index template gets into action to create the new time-based index.
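If you would rather not add Curator, the same cron job could call the aliases API directly. Actions sent in a single `POST /_aliases` request are applied atomically, so the remove and the add take effect together. This sketch only builds the body; the index names are hypothetical, and the current holders could be read from `GET /_alias/current`.

```python
import json


def move_alias_body(alias, current_holders, new_index):
    """Build a POST /_aliases body that moves `alias` onto new_index in one request."""
    actions = [{"remove": {"index": idx, "alias": alias}}
               for idx in current_holders if idx != new_index]
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


body = move_alias_body("current", ["logs-2018.09.13"], "logs-2018.09.14")
print(json.dumps(body, indent=2))
```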
I'm wondering how exactly index aliases work behind the scenes in Elasticsearch.
Does an alias have a separate copy of the data for each index it is linked to? Or is it only aware of the index names, and not the data within each index?
If the latter, are aggregations much slower when performed on an alias with many linked indices?
From the Index Aliases Elasticsearch reference:
APIs in elasticsearch accept an index name when working against a specific index, and several indices when applicable. The index aliases API allow to alias an index with a name, with all APIs automatically converting the alias name to the actual index name. An alias can also be mapped to more than one index, and when specifying it, the alias will automatically expand to the aliases indices. An alias can also be associated with a filter that will automatically be applied when searching, and routing values.
So, based on this, an alias is only aware of the index names, not the data within each index. An aggregation can be slower when performed against an alias that spans many indices: the request is fanned out to every shard of every index behind the alias, and the coordinating node then has to merge all the per-shard partial results, so more indices (and therefore more shards) means more work per request.
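A simplified model of that fan-out and merge, in plain Python rather than anything Elasticsearch actually runs: each shard builds partial term counts over its own documents, and the coordinating node merges them. Every extra index behind the alias adds more partial results to compute and merge. The real `terms` aggregation also caps each shard's contribution at `shard_size` terms, which this toy ignores. The data is made up.

```python
from collections import Counter


def shard_terms(docs, field):
    # Runs once per shard, over that shard's documents only.
    return Counter(doc[field] for doc in docs)


def reduce_terms(partials, size):
    # The coordinating node merges the per-shard results into the final buckets.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total.most_common(size)


# Hypothetical data: two indices behind one alias, one shard each.
shards = [
    [{"status": "ok"}, {"status": "ok"}, {"status": "error"}],
    [{"status": "ok"}, {"status": "error"}, {"status": "timeout"}],
]
partials = [shard_terms(docs, "status") for docs in shards]
buckets = reduce_terms(partials, size=2)
print(buckets)
```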