Elasticsearch TTL not working - elasticsearch

I use Elasticsearch for logs. I don't want to use daily indices and delete them with a cron job; I want to use the TTL instead. I've activated TTL and set it to 30s. The request returns a successful response, and I can see the TTL value (in milliseconds) when I request the mapping.
Everything looks fine, but the documents are not being deleted...
_mapping :
{
"logs" : {
"webservers" : {
"_ttl" : {
"default" : 30000
},
"properties" : {
"@timestamp" : {
"type" : "date",
"format" : "dateOptionalTime"
}
}
}
}
}

I guess you just need to enable _ttl for your type; it is disabled by default:
{
"webservers" : {
"_ttl" : { "enabled" : true, "default" : "30s" }
}
}
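A quick way to confirm this from the mapping response is to check for the enabled flag. The sketch below is only an illustration: the helper name ttl_enabled is hypothetical, and it assumes the mapping response shows "enabled" : true once _ttl has been switched on. It inspects a mapping that has already been fetched and does not talk to a cluster.

```python
def ttl_enabled(mapping_response, index, doc_type):
    """Return True only if _ttl is explicitly enabled for the given type."""
    index_body = mapping_response[index]
    # Some versions nest types under "mappings"; the response in the question does not.
    types = index_body.get("mappings", index_body)
    ttl = types.get(doc_type, {}).get("_ttl", {})
    return ttl.get("enabled", False) is True


# Mapping as posted in the question: a default is set, but _ttl is not enabled.
question_mapping = {
    "logs": {"webservers": {"_ttl": {"default": 30000}, "properties": {}}}
}
# Assumed shape of the mapping after applying the fix above.
fixed_mapping = {
    "logs": {"webservers": {"_ttl": {"enabled": True, "default": 30000}}}
}

print(ttl_enabled(question_mapping, "logs", "webservers"))
print(ttl_enabled(fixed_mapping, "logs", "webservers"))
```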

Related

Why auto delete index policy doesn't work in elasticsearch?

I created an index policy in Kibana to delete indices older than 7 days. Below is the configuration:
I have indices that use this policy, but none of them get deleted. Below are the settings of one of those indices. It already specifies the policy to use: metrics-log-retention. Is there anything I missed?
{
"aws-logs-2022-02-01" : {
"settings" : {
"index" : {
"lifecycle" : {
"name" : "metrics-log-retention"
},
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "aws-logs-2022-02-01",
"creation_date" : "1643673636747",
"priority" : "100",
"number_of_replicas" : "1",
"uuid" : "lLmO753nRpuw6bauKIJI2Q",
"version" : {
"created" : "7150299"
}
}
}
}
}
Below is the hot phase. I have disabled all options under hot, as shown in the screenshot below, but it still doesn't work.
Below is the raw data for the index policy:
{
"metrics-log-retention" : {
"version" : 4,
"modified_date" : "2022-02-10T22:24:14.492Z",
"policy" : {
"phases" : {
"hot" : {
"min_age" : "0ms",
"actions" : {
"rollover" : {
"max_size" : "50gb",
"max_primary_shard_size" : "50gb",
"max_age" : "1d"
}
}
},
"delete" : {
"min_age" : "6d",
"actions" : {
"delete" : {
"delete_searchable_snapshot" : true
}
}
}
}
},
"in_use_by" : {
"indices" : [
"aws-logs-2022-02-01",
"aws-logs-2022-02-04",
"aws-logs-2022-02-05",
"aws-logs-2022-02-02",
"aws-logs-2022-02-03",
"aws-metrics-2022-02-01",
"aws-metrics-2022-02-07",
"aws-logs-2022-02-08",
"aws-metrics-2022-02-06",
"aws-logs-2022-02-09",
"aws-logs-2022-02-06",
"aws-metrics-2022-02-09",
"aws-logs-2022-02-07",
"aws-metrics-2022-02-08",
"aws-metrics-2022-02-03",
"aws-metrics-2022-02-02",
"aws-metrics-2022-02-05",
"aws-metrics-2022-02-04",
"aws-logs-2022-02-11",
"aws-logs-2022-02-12",
"aws-logs-2022-02-10",
"aws-logs-2022-02-13",
"aws-metrics-2022-02-10",
"aws-metrics-2022-02-12",
"aws-metrics-2022-02-11",
"aws-metrics-2022-02-13"
],
"data_streams" : [ ],
"composable_templates" : [ ]
}
}
}
As you can see in the hot phase advanced settings, the default rollover settings are 30 days or 50GB, so your indices will stay in the hot phase for 30 days unless they grow beyond 50GB first.
Once an index leaves the hot phase it enters the delete phase, and if you hover over the (i) icon you can see that the 7 days are counted AFTER the rollover from the hot phase.
So if you really want your indexes to be deleted after 7 days, you need to:
configure the hot phase to be shorter (say 6 days)
configure the delete phase to kick in after 1 day from rollover
That way, the index stays in the hot phase for six days and is then deleted one day after the rollover.
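Because the delete phase's min_age is counted from the rollover, the earliest deletion time is the creation time plus the rollover age plus min_age. Here is a minimal sketch of that arithmetic using the values from the raw policy in the question (max_age of 1d, min_age of 6d) and the posted creation_date. The function name is hypothetical, and the sketch assumes rollover is triggered by age alone, exactly at max_age, ignoring how often the lifecycle checks run.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def earliest_delete_time(creation_ms, rollover_max_age, delete_min_age):
    """Earliest time the delete phase can start, counted from rollover."""
    created = EPOCH + timedelta(milliseconds=creation_ms)
    rolled_over = created + rollover_max_age  # assumed: age-based rollover only
    return rolled_over + delete_min_age


# creation_date of aws-logs-2022-02-01, with max_age=1d and min_age=6d.
eligible = earliest_delete_time(
    1643673636747, timedelta(days=1), timedelta(days=6)
)
print(eligible.isoformat())
```

With a 30-day rollover age instead of 1 day, the same calculation puts the deletion more than a month after creation, which is the behaviour the answer describes.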
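Just add this to the crontab on the ES host and it will delete old indices automatically. Note that % has to be escaped as \% inside a crontab entry, and the date format has to match the index names (dashes here):
0 7 * * * curl -u LOGIN:PASSWORD -XDELETE http://localhost:9200/aws-logs-$(date --date="7 days ago" +"\%Y-\%m-\%d")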
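The index names in the question use dashes (aws-logs-2022-02-01), so whatever builds the name to delete has to produce the same pattern. A small sketch of that name calculation, with a hypothetical helper index_to_delete; it only computes the name and does not issue the DELETE request.

```python
from datetime import date, timedelta


def index_to_delete(prefix, days_to_keep, today):
    """Name of the daily index that is exactly days_to_keep days old."""
    cutoff = today - timedelta(days=days_to_keep)
    # Dashes, to match names such as aws-logs-2022-02-01.
    return f"{prefix}-{cutoff:%Y-%m-%d}"


print(index_to_delete("aws-logs", 7, date(2022, 2, 13)))
```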

Setting enabled to true Elasticsearch

I'm new to Elasticsearch. I have an index type as follows:
{
"myindex" : {
"mappings" : {
"systemChanges" : {
"_all" : {
"enabled" : false
},
"properties" : {
"autoChange" : {
"type" : "boolean"
},
"changed" : {
"type" : "object",
"enabled" : false
},
"created" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
}
}
}
}
}
}
I'm unable to fetch the documents having changed.new = completed. After some research I found that this is because the changed field is set to enabled : false, and I need to change that. I tried the following:
curl -X PUT "localhost:9200/myindex/" -H 'Content-Type: application/json' -d' {
"mappings": {
"systemChanges" : {
"properties" : {
"changed" : {
"enabled" : true
}
}
}
}
}'
But I get the following error:
{"error":{"root_cause":[{"type":"index_already_exists_exception","reason":"already exists","index":"myindex"}],"type":"index_already_exists_exception","reason":"already exists","index":"myindex"},"status":400}
How can I change the enabled to true in order to fetch the details of the changed.new field?
You are trying to create an index with a name that already exists, hence the error.
See the link below for updating a mapping:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
The enabled setting can be updated on existing fields using the PUT mapping API.
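As a rough sketch of what such a request could look like, the snippet below builds the URL and body for a PUT mapping call instead of a create-index call. It assumes a version that still uses mapping types (as the mapping in the question does) and the typed endpoint /{index}/_mapping/{type}; the helper name is hypothetical, and the snippet only prints the request rather than sending it. Whether the cluster accepts the change depends on the Elasticsearch version, and documents indexed while the field was disabled would presumably need to be reindexed before they become searchable.

```python
import json


def build_put_mapping_request(host, index, doc_type, field):
    """Build (url, body) for a PUT mapping call that enables an object field."""
    url = f"http://{host}/{index}/_mapping/{doc_type}"
    body = {"properties": {field: {"type": "object", "enabled": True}}}
    return url, json.dumps(body)


url, body = build_put_mapping_request(
    "localhost:9200", "myindex", "systemChanges", "changed"
)
print("PUT", url)
print(body)
```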

time-based when configure an index pattern not working

Hi!
I have an issue setting a date field as time-based when I configure my index pattern. When I choose my date field as the Time-field name, I cannot visualize any data in Discover.
However, when I uncheck the box named Index contains time-based events, all the data appears:
Maybe I forgot something in my mapping? Here is the mapping I've set for this index:
"index_test" : {
"mappings": {
"tr": {
"_source": {
"enabled":true
},
"properties" : {
"id" : { "type" : "integer" },
"volume" : { "type" : "integer" },
"high" : { "type" : "float" },
"low" : { "type" : "float" },
"timestamp" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm:ss" }
}
}
}
}
I am currently trying to use Timelion as well, and it does not seem to find any data to show. I think this is because the time-based box is unchecked... Any idea how to set this timestamp as time-based without losing access to the data in Discover?
Simple question with a simple answer... I just forgot to set the time picker in the top-right corner of Discover to show past data:

Bad indexing performance of elasticsearch

Currently, I'm using Elasticsearch to store and query some logs. We set up a five-node Elasticsearch cluster: two indexing nodes and three query nodes. Each of the two indexing servers runs redis, logstash and elasticsearch. Elasticsearch uses NFS storage as its data store. Our requirement is to index 300 log entries/second, but the best performance I can get from Elasticsearch is only 25 log entries/second!
The Elasticsearch heap (Xmx) is 16G.
Version of each component:
Redis: 2.8.12
logstash: 1.4.2
elasticsearch: 1.5.0
Our current index settings are like this:
{
"userlog" : {
"settings" : {
"index" : {
"index" : {
"store" : {
"type" : "mmapfs"
},
"translog" : {
"flush_threshold_ops" : "50000"
}
},
"number_of_replicas" : "1",
"translog" : {
"flush_threshold_size" : "1G",
"durability" : "async"
},
"merge" : {
"scheduler" : {
"max_thread_count" : "1"
}
},
"indexing" : {
"slowlog" : {
"threshold" : {
"index" : {
"trace" : "2s",
"info" : "5s"
}
}
}
},
"memory" : {
"index_buffer_size" : "3G"
},
"refresh_interval" : "30s",
"version" : {
"created" : "1050099"
},
"creation_date" : "1447730702943",
"search" : {
"slowlog" : {
"threshold" : {
"fetch" : {
"debug" : "500ms"
},
"query" : {
"warn" : "10s",
"trace" : "1s"
}
}
}
},
"indices" : {
"memory" : {
"index_buffer_size" : "30%"
}
},
"uuid" : "E1ttme3fSxKVD5kRHEr_MA",
"index_currency" : "32",
"number_of_shards" : "5"
}
}
}
}
Here's my logstash config:
input {
redis {
host => "eanprduserreporedis01.eao.abn-iad.ea.com"
port => "6379"
type => "redis-input"
data_type => "list"
key => "userLog"
threads => 15
}
# Second redis block begin
redis {
host => "eanprduserreporedis02.eao.abn-iad.ea.com"
port => "6379"
type => "redis-input"
data_type => "list"
key => "userLog"
threads => 15
}
# Second redis block end
}
output {
elasticsearch {
cluster => "customizedlog_prod"
index => "userlog"
workers => 30
}
stdout{}
}
A very strange thing is that although the indexing speed is currently only ~20/s, the IO wait is very high, almost 70%, and most of it is read traffic. According to nfsiostat, the current read speed is about 200Mbps! So to index each log entry, it reads about 10Mbits of data, which is insane because the average length of our log entries is less than 10K.
So I took a jstack dump of Elasticsearch. Here is one of the RUNNABLE threads:
"elasticsearch[somestupidhostname][bulk][T#3]" daemon prio=10 tid=0x00007f230c109800 nid=0x79f6 runnable [0x00007f1ba85f0000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:730)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:715)
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
at org.apache.lucene.store.DataInput.readVInt(DataInput.java:122)
at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:221)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock(SegmentTermsEnumFrame.java:152)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:506)
at org.elasticsearch.common.lucene.uid.PerThreadIDAndVersionLookup.lookup(PerThreadIDAndVersionLookup.java:104)
at org.elasticsearch.common.lucene.uid.Versions.loadDocIdAndVersion(Versions.java:150)
at org.elasticsearch.common.lucene.uid.Versions.loadVersion(Versions.java:161)
at org.elasticsearch.index.engine.InternalEngine.loadCurrentVersionFromIndex(InternalEngine.java:1002)
at org.elasticsearch.index.engine.InternalEngine.innerCreate(InternalEngine.java:277)
- locked <0x00000005fc76b938> (a java.lang.Object)
at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:256)
at org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:455)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:437)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:149)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:515)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:422)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Can anyone tell me what Elasticsearch is doing and why the indexing is so slow? And is it possible to improve it?
It may not be entirely responsible for your poor performance, but check out the batch_count option of the redis input. I'll bet it gets better if you pull more than 1 document from redis at a time.

Elasticsearch not storing field, what am I doing wrong?

I have something like the following template in my Elasticsearch. I only want certain parts of the data returned, so I turned _source off and explicitly set store for the fields I want.
{
"template_1" : {
"order" : 20,
"template" : "test*",
"settings" : { },
"mappings" : {
"_default_" : {
"_source" : {
"enabled" : false
}
},
"type_1" : {
"mydata" : {
"store" : "yes",
"type" : "string"
}
}
}
}
}
However, when I query the data, I don't get the fields back. The query does work if I enable the _source field. I am just starting with Elasticsearch, so I am not quite sure what I am doing wrong. Any help would be appreciated.
Field definitions should be wrapped in the properties section of your mapping:
"type_1" : {
"properties": {
"mydata" : {
"store" : "yes",
"type" : "string"
}
}
}
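The same restructuring can be expressed as a small transformation on the type mapping. This is only an illustrative sketch: the helper name is hypothetical, and it uses the simplifying heuristic that every key not starting with an underscore (other than properties itself) is a field definition, which would misclassify type-level settings that have no underscore.

```python
def wrap_fields_in_properties(type_mapping):
    """Move bare field definitions of a type mapping under "properties"."""
    # Keep meta fields such as _source, and any existing "properties" block.
    result = {
        key: value
        for key, value in type_mapping.items()
        if key.startswith("_") or key == "properties"
    }
    # Heuristic (assumption): everything else is a field definition.
    fields = {k: v for k, v in type_mapping.items() if k not in result}
    if fields:
        properties = dict(result.get("properties", {}))
        properties.update(fields)
        result["properties"] = properties
    return result


broken_type = {"mydata": {"store": "yes", "type": "string"}}
print(wrap_fields_in_properties(broken_type))
```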
