How to create rolling index with date as the index name? - elasticsearch

My Elasticsearch index will ingest thousands of documents per second. The service that puts documents into the index doesn't create a new index explicitly; it just gets the current date in Node.js and indexes docs into "log-YYYY.MM.DD". So, we know the index is created automatically if it isn't present.
Now, my question is: can this practice of creating the index and indexing docs into it at the same time cause performance issues or failures, given that the index will be ingesting thousands of docs per second?
If the answer to the above question is yes, how can I create a rolling index with the date as the index name? Say today is 5 May 2021; I want automatic creation of the index for 6 May 2021, in the format log-2021.05.06.

For your first question, maybe this can help:
how many indices?
For the second question, I think you can use an index alias, like:
PUT /_index_template/logdate_template
{
  "index_patterns": [
    "log*"
  ],
  "priority": 1,
  "template": {
    "aliases": {
      "log": {}
    },
    "mappings": {
      // your mappings
    }
  }
}
Since the index_patterns value here is "log*", your application code can have a job that creates the index every day by generating the date in the required format and calling
PUT log-YYYY.MM.DD
The advantage of the index alias is that you can access all of those indices with just "log".
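For example, once a few daily indices exist, a single search against the alias spans all of them. A minimal sketch (the @timestamp field is hypothetical; use whatever date field your documents carry):
GET /log/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d"
      }
    }
  }
}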

Related

Reindexing more than 10k documents in Elasticsearch

Let's say I have an index, A. It contains 26k documents. Now I want to change a field, status, to type keyword. As I can't change the type of A's existing status field, I will create a new index, B, with my desired type.
I followed reindex API:
POST _reindex
{
  "source": {
    "index": "A",
    "size": 10000
  },
  "dest": {
    "index": "B",
    "version_type": "external"
  }
}
But the problem is, here I can migrate only 10k docs. How to copy the rest?
How can I copy all the docs without losing any?
Delete the size: 10000 and the problem will be solved.
By the way, the size field inside source in the Reindex API sets the batch size Elasticsearch uses to fetch and reindex docs on each round, not a cap on the total; by default the batch size is 1000. (You thought it meant how many documents you want to reindex in total.)
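So the same request without the size field copies everything:
POST _reindex
{
  "source": {
    "index": "A"
  },
  "dest": {
    "index": "B",
    "version_type": "external"
  }
}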

Moving data from one Elasticsearch index to another with a higher number of shards, or increasing the shard number in an existing index

I am new to Elasticsearch and I have been reading the documentation to find a way of increasing the number of shards my index consists of. Currently my index looks like this:
country_data 0 p STARTED 227 100.7kb 192.168.0.115 $HOSTNAME
country_data 0 r STARTED 227 100.7kb 192.168.0.116 $HOSTNAME
I wanted to increase the number of shards to 5; however, I was unable to find a proper way of doing it. I learnt from another Stack Overflow question that I should be able to do it like this:
POST _reindex?slices=5
{
  "source": {
    "index": "country_data"
  },
  "dest": {
    "index": "country_data_new"
  }
}
However, when I did that I got a copy of my country_data with the same number of shards and replicas (1 and 1). I tried to learn more about it in the documentation, but all I found is this: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/option_slices.html
I couldn't find anything in the documentation about increasing the number of shards of an existing index, or about how to move data to a new index with more shards. I would be grateful for any insights into this problem, or at least a pointer to where I could learn how to do it.
This can be done in either of the ways mentioned below.
1st option: you can use the Elasticsearch Split Index API, sketched just below.
I suggest you go through the documentation before proceeding with this method.
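As a rough sketch, using the index names from the question (the Split API requires the source index to be made read-only first, and on 6.x the source must also have been created with a suitable index.number_of_routing_shards):
# block writes on the source index, a prerequisite for splitting
PUT /country_data/_settings
{
  "index.blocks.write": true
}
# split into a new index with 5 primary shards
# (the target shard count must be a multiple of the source's)
POST /country_data/_split/country_data_new
{
  "settings": {
    "index.number_of_shards": 5
  }
}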
2nd option: create a new index with the same mappings and the required shard settings, then use the Reindex API to copy data from the source index to the destination index.
To create the new index:
PUT /<NEW_INDEX_NAME>
{
  "settings": {
    "number_of_shards": <REQUIRED_NUMBER_OF_SHARDS>
  },
  "mappings": { <MAPPINGS_OF_SOURCE_INDEX> }
}
If you don't give the number of shards in the settings while creating an index, by default (in Elasticsearch 7.x and later) it creates the index with one primary and one replica shard.
To reindex from the source to the newly created index:
POST _reindex
{
  "source": {
    "index": "<SOURCE_INDEX_NAME>"
  },
  "dest": {
    "index": "<NEW_INDEX_NAME>"
  }
}
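Once the reindex finishes, the new shard count can be verified with the same _cat endpoint the question's output came from:
GET _cat/shards/<NEW_INDEX_NAME>?v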

Joining two indices in Elasticsearch like a table join

I am relatively new to Elasticsearch. I have an index called post which contains documents like this:
{
  "id": 1,
  "link": "https:www.instagram.com/p/XXXXX/",
  "profile_id": 11,
  "like_count": 100,
  "comment_count": 12
}
I have another index called profile which contains documents like this:
{
  "id": 11,
  "username": "superman",
  "name": "Superman",
  "followers": 12312
}
So, as you can see, I have all profile data under the index called profile and all post data under the index called post. The profile_id present in a post document is linked to the id present in a profile document.
Is there any way, when I am querying the post index and filtering post documents, to have the profile data appear along with each post document, based on the profile_id present in the post document? Or can I somehow fetch both by doing a multi-index search?
Thank you in advance; any help will be appreciated.
For the sake of performance, Elasticsearch encourages you to denormalize your data and model your documents according to the responses you wish to get from your queries. However, in your case, I would suggest defining the post-profile relation with a join datatype (link to Elastic documentation) and using the parent-join queries to run your searches (link to Elastic documentation).
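A minimal sketch of that approach, under the assumption that both document types move into one index (the index name social and the field name relation are illustrative, not from the question):
PUT /social
{
  "mappings": {
    "properties": {
      "relation": {
        "type": "join",
        "relations": {
          "profile": "post"
        }
      }
    }
  }
}
Profiles are indexed as parents and posts as children; children must be routed to the parent's shard:
PUT /social/_doc/11
{
  "username": "superman",
  "followers": 12312,
  "relation": "profile"
}
PUT /social/_doc/1?routing=11
{
  "link": "https:www.instagram.com/p/XXXXX/",
  "like_count": 100,
  "relation": { "name": "post", "parent": "11" }
}
A has_child query can then return each profile together with its matching posts via inner_hits:
GET /social/_search
{
  "query": {
    "has_child": {
      "type": "post",
      "query": { "range": { "like_count": { "gte": 50 } } },
      "inner_hits": {}
    }
  }
}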

Elasticsearch: Aggregate documents based on date range

I have a set of documents in ElasticSearch 5.5 with two date fields: start_date and end_date.
I want to aggregate them into date histogram buckets (e.g. weekly) such that if start_date < week X < end_date, the document lands in the "week X" bucket.
This means that a single document might be in multiple buckets.
Consider the following concrete example: I have a set of documents describing company employees, and for each employee I have the hire date and (optionally) the termination date. I want to build a date histogram of the number of active employees for the trailing twelve months.
Sample doc content:
{
  "start_date": "2013-01-12T00:00:00.000Z",
  "end_date": "2016-12-08T00:00:00.000Z",
  "id": "123123123"
}
Is there a way to do this in ES?
I have found one way to do this, using filter aggregations (https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-filter-aggregation.html). If I need, say, a trailing-12-months report, then I would create 12 buckets, where each bucket defines filter conditions such as:
"bool":{
"must":[{
"range":{
"start_date":{
"lte":"2016-01-01T00:00:00.000Z"
}
}
},{
{
"range":{
"end_date":{
"gt":"2016-02-01T00:00:00.000Z"
}
}
}]
}
However, I feel it would be nice if there were an easier way to do this, since if I want, say, trailing 365 days, I have to create 365 bucket filters, which makes the resulting query very large.
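For reference, one such bucket wrapped into a complete request could look like this (the index name employees is hypothetical):
POST /employees/_search
{
  "size": 0,
  "aggs": {
    "active_2016_01": {
      "filter": {
        "bool": {
          "must": [
            { "range": { "start_date": { "lte": "2016-01-01T00:00:00.000Z" } } },
            { "range": { "end_date": { "gt": "2016-02-01T00:00:00.000Z" } } }
          ]
        }
      }
    }
  }
}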
I know this question is quite old, but as it's still open I am sharing my knowledge on it. The question does not clearly explain what kind of output is expected, but I still think this can be achieved using the Date Histogram aggregation and the Bucket Script aggregation.
Here are the documentation links for both of these aggregations:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-bucket-datehistogram-aggregation.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-pipeline-bucket-script-aggregation.html
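As a rough sketch of that direction, assuming an index named employees: a monthly date_histogram on start_date with a cumulative_sum pipeline gives the running total of hires; the same pattern on end_date gives cumulative terminations, and the difference between the two series is the active headcount:
POST /employees/_search
{
  "size": 0,
  "aggs": {
    "hires_per_month": {
      "date_histogram": {
        "field": "start_date",
        "interval": "month"
      },
      "aggs": {
        "cumulative_hires": {
          "cumulative_sum": {
            "buckets_path": "_count"
          }
        }
      }
    }
  }
}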

Remove or delete old data from elastic search

How do I remove old data from an Elasticsearch index, given that the index has a large amount of data being inserted every day?
You can do that with the Delete By Query plugin.
Assuming you have some timestamp or creation-date field in your index, your query would look something like this:
DELETE /your_index/your_type/_query
{
  "query": {
    "range": {
      "timestamp": {
        "lte": "now-10y"
      }
    }
  }
}
This will delete records older than 10 years.
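Note that the plugin syntax above is for Elasticsearch 2.x; from 5.0 onwards the same functionality is built in as the _delete_by_query API, so the equivalent request would be:
POST /your_index/_delete_by_query
{
  "query": {
    "range": {
      "timestamp": {
        "lte": "now-10y"
      }
    }
  }
}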
I hope this helps.
Split your data into daily indices and use an alias as the old index name, then delete the oldest index each day, just as Logstash does:
Daily indices: logstash-20151011, logstash-20151012, logstash-20151013.
Full alias: logstash
Then delete the oldest index daily.
If you are using time-based indices, that should be something like:
curl -XDELETE http://localhost:9200/test-2017-06
