How do index aliases work in Elasticsearch?

I'm wondering how exactly index aliases work behind the scenes in Elasticsearch.
Does an alias have a separate copy of the data for each index it is linked to? Or is it only aware of the index names, and not the data within each index?
If the latter is the case, are aggregations much slower when performed on an alias with many linked indices?

From the Index Aliases Elasticsearch reference:
APIs in elasticsearch accept an index name when working against a specific index, and several indices when applicable. The index aliases API allow to alias an index with a name, with all APIs automatically converting the alias name to the actual index name. An alias can also be mapped to more than one index, and when specifying it, the alias will automatically expand to the aliases indices. An alias can also be associated with a filter that will automatically be applied when searching, and routing values.
So, based on this, an alias is only aware of the index names and not the data within each index. An aggregation could be slower when performed against an alias that spans multiple indices because, as far as I know, Elasticsearch has to run the aggregation against every index (and every shard) behind the alias and then merge the partial results.
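To make the cost argument concrete, here is a toy Python model of a terms-style count over an alias. It is not Elasticsearch code: the index names, the documents and the `terms_count` helper are all made up for illustration. Each index behind the alias produces a partial count, and a final step merges them.

```python
from collections import Counter

# Hypothetical data: two small indices that sit behind one alias.
INDICES = {
    "logs-2024.01": [{"level": "error"}, {"level": "info"}],
    "logs-2024.02": [{"level": "error"}, {"level": "error"}],
}
# The alias itself stores only index names, never documents.
ALIASES = {"logs": ["logs-2024.01", "logs-2024.02"]}


def terms_count(alias, field):
    """Count field values across every index the alias expands to."""
    merged = Counter()
    for index_name in ALIASES[alias]:
        # Each index computes its own partial result ...
        partial = Counter(doc[field] for doc in INDICES[index_name])
        # ... and the partial results are merged at the end.
        merged.update(partial)
    return dict(merged)


result = terms_count("logs", "level")
print(result)
```

In this sketch the work grows with the number of indices that have to be visited, not with the alias itself, which is the point the answer is making.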

Related

Prevent data duplication over multiple indices in Elasticsearch

Data duplication prevention is handled at the index level with the field "_id".
However, to avoid having huge indices, I work with several small indices linked under an alias. Is there a mechanism in place to check existing _ids at the alias level (over multiple indices) when a document is inserted, or should it be handled at the application level?
Not natively, no. You'd need to handle this in your own code.
Before inserting your document, you first need to find out which concrete index already contains it, by searching through the alias:
GET alias/_search?q=_id:123456&filter_path=hits.hits._index
In the response you'll get the concrete index name that you can then use to index/update your new document version.
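The second half of that step, picking the target index from the search response, could look roughly like the Python sketch below. It assumes the filtered response has the shape `{"hits": {"hits": [{"_index": ...}]}}` and that it may come back empty when nothing matches. The function names and the `events-00000N` index names are hypothetical, and the HTTP call itself is left out.

```python
def concrete_index_from_response(response):
    """Return the concrete index name from a filtered _search response, or None."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    return hits[0]["_index"]


def choose_target_index(response, write_index):
    """Reuse the index that already holds the _id, else fall back to write_index."""
    found = concrete_index_from_response(response)
    return found if found is not None else write_index


# Assumed response shapes, for illustration only.
existing = {"hits": {"hits": [{"_index": "events-000001"}]}}
missing = {}

print(choose_target_index(existing, "events-000003"))
print(choose_target_index(missing, "events-000003"))
```

Note that a check followed by a write is not atomic, so two writers inserting the same _id at the same moment can still end up in different indices.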

Updating existing documents in ElasticSearch (ES) while using rollover API

I have a data source which will create a high number of entries that I'm planning to store in ElasticSearch.
The source creates two entries for the same document in ElasticSearch:
the 'init' part which records init-time and other details under a random key in ES
the 'finish' part which contains the main data, and updates the initially created document (merges) in ES under the init's random key.
I will need to use time-based indexes in ElasticSearch, with an alias pointing to the actual index, managed with the rollover API.
For updates I'll use the update API to merge init and finish.
Question: If the init document with the random key is not in the current index (but in an older one that has already been rolled over), would updating it using its key execute successfully? If not, what is the best practice for performing the update?
After some silence, I set out to test it myself.
Short answer: After the index is rolled over under an alias, an update operation using the alias refers to the new index only, so it will create the document in the new index, resulting in two separate documents.
One way of solving this is to search the last two (or more, if needed) indexes and work out which non-alias index name to use for the update.
The other solution, which I prefer, is to avoid rollover altogether and instead calculate the index name from a required date field of the document, creating each new index from the application and using a template to define the mapping. This way, event sourcing and replaying the documents in order will yield the same indexes.
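A minimal Python sketch of the date-based naming idea follows. The `events` prefix, the `init_time` field name and the `yyyy.MM.dd` suffix are assumptions chosen for illustration; the only requirement from the answer is that the name is derived from a date stored in the document, so the 'init' and 'finish' parts resolve to the same index.

```python
from datetime import datetime, timezone


def index_name_for(doc, prefix="events", date_field="init_time"):
    """Derive a daily index name from an ISO-8601 date stored in the document."""
    ts = datetime.fromisoformat(doc[date_field])
    if ts.tzinfo is not None:
        # Normalize to UTC so the same instant always maps to the same index.
        ts = ts.astimezone(timezone.utc)
    return f"{prefix}-{ts:%Y.%m.%d}"


init_doc = {"init_time": "2024-01-15T10:00:00"}
# The finish part carries the same init_time, here written with a UTC offset.
finish_doc = {"init_time": "2024-01-15T12:00:00+02:00", "payload": "main data"}

print(index_name_for(init_doc))
print(index_name_for(finish_doc))
```

Because the name depends only on the document, replaying the same documents later targets the same indexes regardless of when the replay runs.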

Get multiple ElasticSearch indices in Bosun

We have several different indices in ElasticSearch:
myindex1.messages.ttl60-${date:format=yyyy.MM.dd}
myindex2.messages.ttl60-${date:format=yyyy.MM.dd}
myindex3.messages.ttl60-${date:format=yyyy.MM.dd}
All of them share the same schema and are used to log events.
Now I want to create ONE alert in Bosun for all the listed indices, but I don't want to write their names explicitly.
Can I use some kind of pattern matching for indices, like we have in Kibana: *messages*?
I tried the esindices expression, but it requires literal index names.
Maybe you can just combine all the indices into an alias on the Elasticsearch side, using a template.
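As a rough illustration of that suggestion, the Python sketch below builds the body of a composable index template that attaches an alias to every newly created index matching a pattern. The alias name `all-messages`, the pattern and the helper name are assumptions. A template only affects indices created after it is installed, and whether Bosun accepts an alias name in place of an index name is something to verify.

```python
import json


def alias_template_body(patterns, alias):
    """Build an index template body that adds `alias` to matching new indices."""
    return {
        "index_patterns": list(patterns),
        "template": {"aliases": {alias: {}}},
    }


# Hypothetical pattern covering the three index families from the question.
body = alias_template_body(["*.messages.ttl60-*"], "all-messages")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent with a request such as PUT _index_template/<template-name>; existing indices would have to be added to the alias separately.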

Elastic search index capability?

We need a generic index whose properties change with every object. We need full-text search capability in a distributed system.
Could we index different objects in one generic index in Elasticsearch?
Yes, Elasticsearch lets you index different documents in the same index using a single dynamic mapping or, if you have only a few different types of documents, multiple mappings on the same index (multiple mapping types per index are only available in versions before 6.0).

Does an ElasticSearch Alias change the underlying index files

If I create an alias in ElasticSearch over two or more indices, does that change anything in the storage of those indices? I.e., are they merged on the filesystem somewhere, or is the merging done at the API level?
There is no merge of any kind. An alias is just a logical representation (from the docs):
The index aliases API allow to alias an index with a name, with all APIs automatically converting the alias name to the actual index name. An alias can also be mapped to more than one index, and when specifying it, the alias will automatically expand to the aliases indices.
When you query the API with an alias, ES will resolve the alias and run your query on every index which is part of the alias.
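The "logical representation" idea can be pictured with a small Python toy model. This is not how Elasticsearch is implemented; the `AliasTable` class and the index names are invented for illustration. The alias is just a set of index names that gets expanded at request time, so adding or removing an index never touches stored data.

```python
class AliasTable:
    """Toy model: an alias is only a set of index names."""

    def __init__(self):
        self._aliases = {}

    def add(self, alias, index):
        self._aliases.setdefault(alias, set()).add(index)

    def remove(self, alias, index):
        self._aliases.get(alias, set()).discard(index)

    def resolve(self, name):
        # A name that is not an alias is treated as a concrete index name.
        return sorted(self._aliases.get(name, {name}))


table = AliasTable()
table.add("logs", "logs-2024.01")
table.add("logs", "logs-2024.02")
before = table.resolve("logs")

table.remove("logs", "logs-2024.01")
after = table.resolve("logs")

print(before)
print(after)
```

A query against the alias would then run once per resolved name, which matches the description above.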
