I'm running a logstash instance that reads records from kafka and inserts them into elasticsearch.
I had a problem with the elasticsearch configuration, and new records weren't being inserted into elasticsearch.
Eventually I was able to fix the elasticsearch output. But even while the elasticsearch output was unable to write records, logstash didn't stop reading more data from kafka.
So when I restarted logstash, it didn't pick up from the last successfully written kafka offset. Basically I lost all the records consumed after the elasticsearch output stopped writing.
How can I prevent that from happening again? Is there a way to stop the whole pipeline when there is an error in the output?
My simplified config file:
input {
  kafka {
    zk_connect => "zk01:2181,zk02:2181,zk03:2181/kafka"
    topic_id => "my-topic"
    auto_offset_reset => "smallest"
    group_id => "logstash-es"
    codec => "json"
  }
}
output {
  elasticsearch {
    index => "index-%{+YYYY-MM-dd}"
    document_type => "dev"
    hosts => ["elasticsearch01","elasticsearch02","elasticsearch03","elasticsearch04","elasticsearch05","elasticsearch06"]
    template => "/my-template.json"
    template_overwrite => true
    manage_template => true
  }
}
I am relatively new to the whole ELK setup, so please bear with me.
What I want to do is send the CloudTrail logs that are stored on S3 into a locally hosted (non-AWS, I mean) ELK setup. I am not using Filebeat anywhere in the setup; I believe it isn't mandatory, since Logstash can deliver data to ES directly.
Am I right here?
Once the data is in ES, I would simply want to visualize it in Kibana.
What I have tried so far, given that my ELK is up and running and that there is no Filebeat involved in the setup:
Using the S3 Logstash plugin.
Contents of /etc/logstash/conf.d/aws_ct_s3.conf:
input {
  s3 {
    access_key_id => "access_key_id"
    bucket => "bucket_name_here"
    secret_access_key => "secret_access_key"
    prefix => "AWSLogs/<account_number>/CloudTrail/ap-southeast-1/2019/01/09"
    sincedb_path => "/tmp/s3ctlogs.sincedb"
    region => "us-east-2"
    codec => "json"
    add_field => { "source" => "gzfiles" }
  }
}
output {
  stdout { codec => json }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "attack-%{+YYYY.MM.dd}"
  }
}
When logstash is started with the above conf, I can see everything working fine. Using the head Google Chrome plugin, I can see that documents are continuously being added to the specified index. In fact, when I browse it, I can see the data I need is there. I am able to see the same on the Kibana side too.
The data in each of these gzip files has the format:
{
  "Records": [
    dictionary_D1,
    dictionary_D2,
    ...
  ]
}
And I want each of these dictionaries from the list above to become a separate event in Kibana. With some Googling around, I understand that I could use the split filter to achieve this. Now my aws_ct_s3.conf looks something like:
input {
  s3 {
    access_key_id => "access_key_id"
    bucket => "bucket_name_here"
    secret_access_key => "secret_access_key"
    prefix => "AWSLogs/<account_number>/CloudTrail/ap-southeast-1/2019/01/09"
    sincedb_path => "/tmp/s3ctlogs.sincedb"
    region => "us-east-2"
    codec => "json"
    add_field => { "source" => "gzfiles" }
  }
}
filter {
  split {
    field => "Records"
  }
}
output {
  stdout { codec => json }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "attack-%{+YYYY.MM.dd}"
  }
}
And with this I am indeed getting the data I need in Kibana.
Now the problem is:
Without the filter in place, the volume of documents being shipped by Logstash from S3 to Elasticsearch was in the GBs, while after applying the filter it has stopped at roughly 5000 documents.
I do not know what I am doing wrong here. Could someone please help?
Current config:
java -XshowSettings:vm => Max Heap Size: 8.9 GB
elasticsearch jvm options => max and min heap size: 6GB
logstash jvm options => max and min heap size: 2GB
ES version - 6.6.0
LS version - 6.6.0
Kibana version - 6.6.0
This is what the current heap usage looks like: (screenshot not reproduced here)
Normally in ELK, Logstash parses data and sends it to Elasticsearch.
I want to know whether it is possible for Logstash to send the same data to different locations in real time.
If it is possible, please let me know how to do it.
Define several output blocks that match on type and send to different hosts, for example:
output {
  if [type] == "syslog" {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "logstash-%{+YYYY.MM.dd}"
      codec => "plain"
      workers => 1
      manage_template => true
      template_name => "logstash"
      template_overwrite => false
      flush_size => 100
      idle_flush_time => 1
    }
  }
}
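To connect this back to the question: every event passes through every output block unless a conditional excludes it, so sending the same data to several destinations is just a matter of declaring several outputs. Here is a minimal sketch of that idea; the host names and the file path are made up for illustration and would need to be replaced with real destinations:
output {
  # The same event is sent to all three outputs below; conditionals like the
  # one in the example above are only needed for routing by type.
  elasticsearch {
    hosts => ["es-cluster-a:9200"]   # hypothetical first cluster
    index => "logstash-%{+YYYY.MM.dd}"
  }
  elasticsearch {
    hosts => ["es-cluster-b:9200"]   # hypothetical second cluster
    index => "logstash-%{+YYYY.MM.dd}"
  }
  file {
    # a local copy of the same events, one JSON document per line
    path => "/var/log/logstash/copy-%{+YYYY.MM.dd}.json"
    codec => "json_lines"
  }
}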
Using the elasticsearch output in logstash, how can I update only the @timestamp for a log message if it is newer?
I don't want to reindex the whole document, nor have the same log message indexed twice.
Also, if the @timestamp is older, it must not update/replace the current version.
Currently, I'm doing this:
filter {
  if ("cloned" in [tags]) {
    fingerprint {
      add_tag => [ "lastlogin" ]
      key => "lastlogin"
      method => "SHA1"
    }
  }
}
output {
  if ("cloned" in [tags]) {
    elasticsearch {
      action => "update"
      doc_as_upsert => true
      document_id => "%{fingerprint}"
      index => "lastlogin-%{+YYYY.MM}"
      sniffing => true
      template_overwrite => true
    }
  }
}
It is similar to "How to deduplicate documents while indexing into elasticsearch from logstash", but I do not want to always update the message field; only if the @timestamp field is more recent.
You can't decide at the Logstash level whether a document needs to be updated or nothing should be done; this needs to be decided at the Elasticsearch level. That means you need to experiment and test with the _update API.
I suggest looking at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#upserts. Meaning, if the document exists, the script is executed (where you can check the @timestamp if you want); otherwise the content of upsert is indexed as a new document.
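For what it's worth, the logstash elasticsearch output can also carry such a script itself through its script* options, so the comparison runs in Elasticsearch while Logstash still drives the upsert. The following is only a rough, untested sketch built on the question's config: it assumes a Painless-capable cluster, that @timestamp is stored as an ISO8601 string (so lexical comparison is valid), and that the event is exposed to the script under the default script_var_name of "event":
output {
  if ("cloned" in [tags]) {
    elasticsearch {
      action => "update"
      document_id => "%{fingerprint}"
      index => "lastlogin-%{+YYYY.MM}"
      scripted_upsert => true
      script_lang => "painless"
      script_type => "inline"
      # Overwrite the stored document only when the incoming @timestamp is newer;
      # otherwise turn the update into a no-op. For a brand-new document,
      # ctx._source has no @timestamp yet, so the event is written as-is.
      script => "if (ctx._source['@timestamp'] == null || params.event['@timestamp'].compareTo(ctx._source['@timestamp']) > 0) { ctx._source.putAll(params.event); } else { ctx.op = 'none'; }"
    }
  }
}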
Now I have a question. My Logstash configuration file is as follows:
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    db => 10
    data_type => "list"
    key => "local_tag_del"
  }
}
filter {
}
output {
  elasticsearch {
    action => "delete"
    hosts => ["127.0.0.1:9200"]
    codec => "json"
    index => "mbd-data"
    document_type => "localtag"
    document_id => "%{album_id}"
  }
  file {
    path => "/data/elasticsearch/result.json"
  }
  stdout {}
}
I want to read an id from redis with logstash and have it tell ES to delete the corresponding document.
Excuse me, my English is poor; I hope someone can help me.
Thanks.
I can't help you particularly, because your problem is spelled out in your error message - logstash couldn't connect to your elasticsearch instance.
That usually means one of:
elasticsearch isn't running
elasticsearch isn't bound to localhost
That has nothing to do with your logstash config. Using logstash to delete documents is a bit unusual though, so I'm not entirely sure this isn't an XY problem.
I'm trying to reindex data by using elasticsearch as the input and sending it back to elasticsearch as the output. The script runs fine, but the indexing goes on indefinitely. The script is below:
input {
  elasticsearch {
    host => "10.0.0.11"
    index => "logstash-2015.02.05"
  }
}
output {
  elasticsearch {
    host => "10.0.0.11"
    protocol => "http"
    cluster => "logstash"
    node_name => "logindexer"
    index => "logstash-2015.02.05_new"
  }
}
This means that if I have 200 docs in the logstash-2015.02.05 index, it creates duplicate records in logstash-2015.02.05_new and keeps going until I stop the logstash agent. Is there a way to restrict the new index to contain exactly the same documents as the old index? Please help.