Rolling indices in Elasticsearch

I see a lot of topics on how to create rolling indices in Elasticsearch using Logstash.
But is there a way to achieve the same, i.e. create indices on a daily basis, in Elasticsearch without Logstash?
I came across a post which says to run a cron job to create the indices as the date rolls, but that is a manual job I would have to set up. I was looking for out-of-the-box options, if available, in Elasticsearch.

Yes, use index templates (which is what Logstash uses internally to achieve the creation of rolling indices).
Simply create a template with a name pattern like the one below, and then every time you index a document into an index whose name matches that pattern, ES will create the index for you:
curl -XPUT localhost:9200/_template/my_template -d '{
  "template" : "logstash-*",
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "my_type" : {
      "properties": {
        ...
      }
    }
  }
}'
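For example, with that template in place, indexing a document into a not-yet-existing, date-stamped index (the index name and document below are just an illustration) is enough to get the index created with the template's settings and mappings, assuming automatic index creation (action.auto_create_index) hasn't been disabled:
curl -XPOST 'localhost:9200/logstash-2015.12.01/my_type' -d '{
  "message": "hello",
  "@timestamp": "2015-12-01T00:00:00Z"
}'
Your indexing client only needs to compute the current date when building the index name; no cron job is required.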

Related

ElasticSearch - How to make a 1-to-1 copy of an existing index

I'm using Elasticsearch 2.3.3 and trying to make an exact copy of an existing index (using the reindex plugin bundled with the Elasticsearch installation).
The problem is that the data is copied, but settings such as the mapping and the analyzer are left out.
What is the best way to make an exact copy of an existing index, including all of its settings?
My main goal is to create a copy, change the copy, and only if all went well, switch an alias to the copy (zero-downtime backup and restore).
In my opinion, the best way to achieve this would be to leverage index templates. Index templates allow you to store a specification of your index, including settings (hence analyzers) and mappings. Then whenever you create a new index which matches your template, ES will create the index for you using the settings and mappings present in the template.
So, first create an index template called index_template with the template pattern myindex-*:
PUT /_template/index_template
{
  "template": "myindex-*",
  "settings": {
    ... your settings ...
  },
  "mappings": {
    "type1": {
      "properties": {
        ... your mapping ...
      }
    }
  }
}
What will happen next is that whenever you index a new document into any index whose name matches myindex-*, ES will use this template (with its settings and mappings) to create the new index.
So say your current index is called myindex-1 and you want to reindex it into a new index called myindex-2. You'd send a reindex query like this one:
POST /_reindex
{
  "source": {
    "index": "myindex-1"
  },
  "dest": {
    "index": "myindex-2"
  }
}
myindex-2 doesn't exist yet, but it will be created in the process using the settings and mappings of index_template because the name myindex-2 matches the myindex-* pattern.
Simple as that.
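Since the stated goal is to switch an alias to the copy only once it has been verified, the last step could be an atomic alias swap. A sketch, assuming the alias is called myindex (the alias name is hypothetical):
POST /_aliases
{
  "actions": [
    { "remove": { "index": "myindex-1", "alias": "myindex" } },
    { "add": { "index": "myindex-2", "alias": "myindex" } }
  ]
}
Both actions are applied atomically, so clients querying the alias never see an in-between state.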
The following seems to achieve exactly what I wanted:
Using Snapshot and Restore, I was able to restore to a different index:
POST /_snapshot/index_backup/snapshot_1/_restore
{
  "indices": "original_index",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "original_index",
  "rename_replacement": "replica_index"
}
As far as I can currently tell, it has accomplished exactly what I needed.
A 1-to-1 copy of my original index.
I also suspect this operation has better performance than re-indexing for my purposes.
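For anyone reproducing this: the restore call above assumes a repository named index_backup that already contains a snapshot snapshot_1. A minimal sketch of creating both (the filesystem location is hypothetical and must be whitelisted in path.repo):
PUT /_snapshot/index_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/index_backup"
  }
}
PUT /_snapshot/index_backup/snapshot_1?wait_for_completion=true
{
  "indices": "original_index",
  "include_global_state": false
}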
I'm facing the same issue when using the reindex API.
Basically, I'm merging daily, weekly, and monthly indices to reduce the number of shards.
We have a lot of indices with different data inputs, and maintaining a template for every case is not an option, so we rely on dynamic mapping.
Due to dynamic mapping, the reindex process can produce conflicts if your data is complicated, say JSON stored in a string field, and the reindexed field can end up as something else.
Solution:
1. Copy the mapping of your source index
2. Create a new index, applying the mapping
3. Disable dynamic mapping
4. Start the reindex process
This can be scripted, and the script should of course have error checking in place.
Abbreviated scripts below.
Create a new empty index with the mapping from the original index:
#!/bin/bash
SRC=$1
DST=$2
# Create a temporary file for holding the SRC mapping
TMPF=$(mktemp)
# Extract the SRC mapping; use `jq` to get the first record
# (the response is keyed by the index name), write to TMPF
curl -f -s "${URL:?}/${SRC}/_mapping" | jq -M -c 'first(.[])' > "${TMPF:?}"
# Create the new index with that mapping
curl -s -H 'Content-Type: application/json' -XPUT "${URL:?}/${DST}" -d @"${TMPF:?}"
# Disable dynamic mapping on the new index
curl -s -H 'Content-Type: application/json' -XPUT \
  "${URL:?}/${DST}/_mapping" -d '{ "dynamic": false }'
Start reindexing:
curl -s -XPOST "${URL:?}/_reindex" -H 'Content-Type: application/json' -d'
{
  "conflicts": "proceed",
  "source": {
    "index": "'${SRC}'"
  },
  "dest": {
    "index": "'${DST}'",
    "op_type": "create"
  }
}'
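Putting it together, an invocation could look like this (the script name is hypothetical, assuming the two snippets above live in one file):
export URL='http://localhost:9200'
./reindex_with_mapping.sh logstash-2017.01.01 logstash-2017.01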

Query two indexes simultaneously in Kibana 4?

Whenever I create a visualization, Kibana 4 asks me to select the index to search against. My project requires searching data that is present in multiple indexes, and hence I am stuck. I wish to search two indexes for my data and then visualize them. Any help would be valuable.
Kibana can create visualizations from multiple indexes, but the indexes must have similar names (or alias names with similar names). For example, you can simply grab data from the indexes logstash-2015-01-01 and logstash-2015-01-02 using the mask logstash-*.
But yes, it would be handy if we could write something like index1,another_index.
A solution that works in any case: create an alias in Elasticsearch for the indexes you want to query simultaneously, and then use the alias as an index pattern in Kibana.
In the Marvel plugin, through the Sense interface, you can create an alias for multiple indexes with this request:
POST _aliases
{
  "actions" : [
    { "add" : { "index" : "test1", "alias" : "alias1" } },
    { "add" : { "index" : "test2", "alias" : "alias1" } }
  ]
}
Or using curl:
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "add" : { "index" : "test1", "alias" : "alias1" } },
    { "add" : { "index" : "test2", "alias" : "alias1" } }
  ]
}'
Then you just need to add an index pattern in Kibana for "alias1" and create your visualizations.
For more information on aliases, see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html
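As an optional sanity check, you can ask Elasticsearch which indices the alias points to:
curl -XGET 'http://localhost:9200/_alias/alias1'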
Thanks for all the help, but I figured out a way in which this could be done.
In the Index Patterns section of Kibana 4, create an index pattern named _all. This index pattern matches all the indexes present in your Elasticsearch cluster. Hence, when you create a new visualization, simply select the _all index pattern, and all the data fields from all the indexes in your Elasticsearch cluster are accessible; you can easily use them to create visualizations.
If I understand what you are asking correctly, then it may depend on how you've named your indexes.
I can query multiple Logstash indexes by selecting my pattern 'logstash-*'. When you set up your indexes, Kibana gives you the option to specify a pattern.
(Settings => Indices => Index Pattern => Add New)
I hope that helps.
Two wildcards (i.e. *-*) work for me in Kibana 4.
I'm not sure I understand correctly, but I think your best option is to create that visualization on both indexes separately and build a dashboard including both visualizations.
Kibana can't display a single visualization with searches from two separate indexes.

How to view the response for multiple indices for a single query

I have created multiple indices in Elasticsearch and have passed a single query to all of them. Is there any way to know how many results came from each index?
Here is a screenshot of my elasticsearch-head plugin, showing a single aggregation applied to two indices.
[screenshot not included]
As you can see in the figure, I have done an aggregation named "posted_time" on the indices foodfind and comics (red box 1).
But in the response window to the right, only the results for the index "comics" are shown. How can I see the results for the other index too?
You can use a terms aggregation on the field _index for this.
Let's say you need to run the same query on index-a, index-b, and index-c.
You would make the request in this pattern:
curl -XPOST 'http://localhost:9200/index-a,index-b,index-c/_search' -d '{
  "aggs" : {
    "indexStats" : {
      "terms" : {
        "field" : "_index"
      }
    }
  }
}'
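The response will then contain one bucket per index with its hit count. A trimmed illustration of the shape (the counts are made up):
{
  "aggregations": {
    "indexStats": {
      "buckets": [
        { "key": "index-a", "doc_count": 42 },
        { "key": "index-b", "doc_count": 17 },
        { "key": "index-c", "doc_count": 3 }
      ]
    }
  }
}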

Why are Elasticsearch aliases not unique

The Elasticsearch documentation describes aliases as a feature for reindexing data with zero downtime:
1. Create a new index and index the whole data set into it
2. Let your alias point to the new index
3. Delete the old index
This would be a great feature if aliases were unique, but it's possible for one alias to point to multiple indexes. Considering that the deletion of the old index might fail, my application might end up talking to two indexes which might not be in sync. Even worse: the application doesn't know about that.
Why is it possible to reuse an alias?
It allows you to easily have several indexes that are both used individually and together with other indexes. This is useful for example when having a logging index where sometimes you want to query the most recent (logs-recent alias) and sometimes want to query everything (logs alias). There are probably lots of other use cases but this one pops up as the first for me.
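A sketch of that use case (all index and alias names are illustrative): each daily index is added to the broad logs alias, while logs-recent is atomically moved so it only ever covers the newest index:
POST /_aliases
{
  "actions": [
    { "add": { "index": "logs-2015-12-02", "alias": "logs" } },
    { "remove": { "index": "logs-2015-12-01", "alias": "logs-recent" } },
    { "add": { "index": "logs-2015-12-02", "alias": "logs-recent" } }
  ]
}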
As per the documentation you can send both the remove and add in one request:
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "remove" : { "index" : "test1", "alias" : "alias1" } },
    { "add" : { "index" : "test2", "alias" : "alias1" } }
  ]
}'
After that succeeds, you can remove your old index; if that fails, you will just have an extra index taking up some space until it's cleaned out.
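The cleanup itself is then just a delete of the old index (name taken from the example above):
curl -XDELETE 'http://localhost:9200/test1'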

Updating the default index number_of_replicas setting for new indices

I've tried updating the number of replicas as follows, according to the documentation:
curl -XPUT 'localhost:9200/_settings' -d '
{ "index" : { "number_of_replicas" : 4 } }'
This correctly changes the replica count for existing indices. However, when Logstash creates a new index the following day, number_of_replicas is set to the old value.
Is there a way to permanently change the default value for this setting without updating all the elasticsearch.yml files in the cluster and restarting the services?
FWIW, I've also tried
curl -XPUT 'localhost:9200/logstash-*/_settings' -d '
{ "index" : { "number_of_replicas" : 4 } }'
to no avail.
Yes, you can use index templates. Index templates are a great way to set default settings (including mappings) for new indices created in a cluster.
Index Templates
Index templates allow to define templates that will automatically be applied to new indices created. The templates include both settings and mappings, and a simple pattern template that controls if the template will be applied to the index created.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html
For your example:
curl -XPUT 'localhost:9200/_template/logstash_template' -d '
{
  "template" : "logstash-*",
  "settings" : { "number_of_replicas" : 4 }
}'
This will set the default number of replicas to 4 for all new indexes that match the name "logstash-*". Note that this will not change existing indexes, only newly created ones.
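To confirm the template was registered as expected, you can fetch it back:
curl -XGET 'localhost:9200/_template/logstash_template'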
