Logstash output to Elasticsearch is not fast enough

I use Logstash as an indexer to move data from Redis into Elasticsearch, but it is not fast enough for the volume of data. I then tried multiple workers, but that led to various problems. Are there better ways to speed up the output? Thanks.
Here is my configuration:
input {
  redis {
    host => "10.240.93.41"
    data_type => "list"
    key => "tcpflow"
  }
}
filter {
  csv {
    columns => ["ts", "node", "clientip", "vip", "rtt", "city", "isp", "asn", "province"]
    separator => "|"
  }
}
output {
  elasticsearch {
    index => "tcpflow-%{+YYYY.MM.dd}"
    index_type => "tcpflow"
    cluster => "elasticsearch"
    host => ["10.240.93.41", "10.240.129.32"]
    #protocol => "node"
    #protocol => "http"
    #port => 8200
    protocol => "transport"
    manage_template => false
    workers => 30
  }
}

The redis{} input in logstash defaults to reading one document at a time. Try setting batch_count to something in the 100-1000 range, depending on the size of your documents.
Having multiple worker threads ("-w") is ideal, unless you're using the multiline{} filter, which is not thread-safe.
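A minimal sketch of the indexer input with batching enabled (host and key copied from the question; the batch_count value is only an example to tune):

input {
  redis {
    host => "10.240.93.41"
    data_type => "list"
    key => "tcpflow"
    # Pull up to 250 events per Redis round trip instead of one at a time;
    # raise or lower this within the 100-1000 range based on document size.
    batch_count => 250
  }
}

Combine this with extra filter workers by starting Logstash with the "-w" flag mentioned above.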

Related

Can Logstash send data simultaneously to multiple locations along with Elasticsearch?

Normally in ELK, Logstash parses data and sends it to Elasticsearch.
I want to know whether Logstash can send the same data to different locations at the same time, in real time.
If it is possible, please let me know how to do it.
Create several output blocks that match on type and send to different hosts.
output {
  if [type] == "syslog" {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "logstash-%{+YYYY.MM.dd}"
      codec => "plain"
      workers => 1
      manage_template => true
      template_name => "logstash"
      template_overwrite => false
      flush_size => 100
      idle_flush_time => 1
    }
  }
}
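Every event that reaches the output section is handed to each output listed there, so the same data can go to several destinations at once. A minimal sketch, assuming a second Elasticsearch cluster at a hypothetical address and an optional local file copy:

output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]            # primary cluster
    index => "logstash-%{+YYYY.MM.dd}"
  }
  elasticsearch {
    hosts => ["10.0.0.5:9200"]             # hypothetical second destination
    index => "logstash-%{+YYYY.MM.dd}"
  }
  file {
    path => "/var/log/logstash/copy.log"   # hypothetical local copy
  }
}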

Using Redis key as Elasticsearch index name

I am attempting to use a logstash indexer to move data from redis to elasticsearch.
On the side that feeds Redis, I give a 'key' to one set of logs in the Logstash output:
redis {
  host => "server"
  port => "7379"
  data_type => "list"
  key => "aruba"
}
On the indexer end, I read each key in the input:
input {
  redis {
    host => "localhost"
    port => "6379"
    data_type => "list"
    type => "redis-input"
    key => "logstash"
    codec => "json"
    threads => 32
    batch_count => 1000
    #timeout => 10
  }
  redis {
    host => "localhost"
    port => "6379"
    data_type => "list"
    type => "redis-input"
    key => "aruba"
    codec => "json"
    threads => 32
    batch_count => 1000
    #timeout => 10
  }
}
I am attempting to use the key in Logstash to build the index name, i.e. something like aruba-2017.24.10, but the output always goes to the default logstash index. I tried
if [redis.key] == "xyz" {
  elasticsearch { index => "xyz-%{time}" }
}
or if [key] == "xyz" ....
also tried
elasticsearch {
  index => "%{key}-%{time}"
}
and elasticsearch { index => "%{redis.key}-%{time}" }
etc. None of it seems to work.
While @sysadmin1138 is right that nested fields are accessed via [field][subfield] rather than [field.subfield], your problem is that you are trying to access data that is not in your log event.
While in Redis, your log events have a key associated with them, but that key is not part of the event itself; it is merely used to address the events in Redis. When Logstash fetches an event from Redis, it uses the "key" to specify which events it wants, but the key never makes it to Elasticsearch.
To see this for yourself, try running Logstash with stdout { codec => "rubydebug" } as an output plugin; it will pretty-print your whole log event so you can see exactly what data is included.
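For instance, a throwaway debug output along these lines:

output {
  # Pretty-print every event so you can see exactly which fields it contains.
  stdout { codec => "rubydebug" }
}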
To your rescue comes the add_field option, which exists for every Logstash input. You can add it to your input:
redis {
  host => "localhost"
  port => "6379"
  data_type => "list"
  type => "redis-input"
  key => "aruba"
  codec => "json"
  threads => 32
  batch_count => 1000
  add_field => {
    "[redis][key]" => "aruba"
  }
}
Then changing your conditional to use [redis][key] will leave your code working.
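Putting the pieces together, a minimal sketch of the output side (the aruba value and the date-based index pattern mirror the question; the host is illustrative):

output {
  if [redis][key] == "aruba" {
    elasticsearch {
      hosts => ["localhost:9200"]
      # The field added on the input drives the index name at output time.
      index => "%{[redis][key]}-%{+YYYY.MM.dd}"
    }
  }
}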
(Cheers to RELK stacks)
This is likely due to an incorrectly written field reference in your conditional.
if [redis.key] == "xyz" {
  elasticsearch { index => "xyz-%{time}" }
}
Should be:
if [redis][key] == "xyz" {
  elasticsearch { index => "xyz-%{time}" }
}

Reindexing in Elasticsearch 1.7

There is a problem with our mappings for Elasticsearch 1.7. I am fixing it by creating a new index with the correct mappings. I understand that since I am creating a new index, I will have to reindex the existing data from the old index into the new one. The problem is that I have googled around and can't find a way to reindex from old to new; it seems the reindex API was introduced in ES 2.3 and is not supported in 1.7.
My question is how do I reindex my data from old to new after fixing my mappings. Alternatively, what is the best practice for making mapping changes in ES 1.7?
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html will not work for me because we're on an old version of ES (1.7)
https://www.elastic.co/blog/changing-mapping-with-zero-downtime
I initially went down that path but got stuck; I need a way to reindex from the old index to the new one.
Late for your use case, but wanted to put it out there for others. This is an excellent step-by-step guide on how to reindex an Elasticsearch index using Logstash version 1.5 while maintaining the integrity of the original data: http://david.pilato.fr/blog/2015/05/20/reindex-elasticsearch-with-logstash/
This is the logstash-simple.conf the author creates:
input {
  # We read from the "old" cluster
  elasticsearch {
    hosts => [ "localhost" ]
    port => "9200"
    index => "index"
    size => 500
    scroll => "5m"
    docinfo => true
  }
}
filter {
  mutate {
    remove_field => [ "@timestamp", "@version" ]
  }
}
output {
  # We write to the "new" cluster
  elasticsearch {
    host => "localhost"
    port => "9200"
    protocol => "http"
    index => "new_index"
    index_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
  # We print dots to see it in action
  stdout {
    codec => "dots"
  }
}
There are a few options for you:
Use Logstash - it is very easy to create a reindex config in Logstash and use it to reindex your documents. For example:
input {
  elasticsearch {
    hosts => [ "localhost" ]
    port => "9200"
    index => "index1"
    size => 1000
    scroll => "5m"
    docinfo => true
  }
}
output {
  elasticsearch {
    host => "localhost"
    port => "9200"
    protocol => "http"
    index => "index2"
    index_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
}
The problem with this approach is that it will be relatively slow, since only a single machine performs the reindexing.
Another option is to use this tool. It will be faster than Logstash, but you will have to provide segmentation logic for your documents to speed up processing. For example, if you have a numeric field whose values range from 1 to 100, you could segment the queries into, say, 10 intervals (1-10, 11-20, ... 91-100), so the tool spawns 10 indexers that reindex your old index in parallel.
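If you want similar parallelism with Logstash alone, one option (a sketch; the numeric field "value" and its ranges are hypothetical) is to run several indexer instances, each restricted to a slice of the data through the elasticsearch input's query setting:

input {
  elasticsearch {
    hosts => [ "localhost" ]
    index => "index1"
    size => 1000
    scroll => "5m"
    docinfo => true
    # This instance handles values 1-10; start others with 11-20, 21-30, and so on.
    query => '{ "query": { "range": { "value": { "gte": 1, "lte": 10 } } } }'
  }
}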

I want to delete documents with Logstash, but it throws an exception

I have run into a problem. My Logstash configuration file is as follows:
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    db => 10
    data_type => "list"
    key => "local_tag_del"
  }
}
filter {
}
output {
  elasticsearch {
    action => "delete"
    hosts => ["127.0.0.1:9200"]
    codec => "json"
    index => "mbd-data"
    document_type => "localtag"
    document_id => "%{album_id}"
  }
  file {
    path => "/data/elasticsearch/result.json"
  }
  stdout {}
}
I want Logstash to read IDs from Redis and tell Elasticsearch to delete the matching documents.
Excuse me, my English is poor; I hope someone can help me.
Thanks.
I can't help you much beyond this, because your problem is spelled out in your error message: Logstash couldn't connect to your Elasticsearch instance.
That usually means one of:
elasticsearch isn't running
elasticsearch isn't bound to localhost
That has nothing to do with your Logstash config. Using Logstash to delete documents is a bit unusual though, so I'm not entirely sure this isn't an XY problem.

separate indexes on logstash

Currently I have a Logstash configuration that pushes data to Redis, and an Elasticsearch server that pulls the data using the default 'logstash' index.
I've added another shipper and have successfully managed to move its data using the default index as well. My goal is to move and store that data in a separate index; what is the best way to achieve this?
This is my current configuration using the default index:
shipper output:
output {
  redis {
    host => "my-host"
    data_type => "list"
    key => "logstash"
    codec => json
  }
}
elk input:
input {
  redis {
    host => "my-host"
    data_type => "list"
    key => "logstash"
    codec => json
  }
}
Try setting the index field in the output. Give it the name you want and run Logstash again; a separate index will be created with that name.
input {
  redis {
    host => "my-host"
    data_type => "list"
    key => "logstash"
    codec => json
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    index => "redis-logs"
    cluster => "cluster name"
  }
}
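If the goal is one index per shipper, here is a sketch (the key, type, and index names are illustrative): have the second shipper tag its events, e.g. with type => "shipper-b" on its input, and branch on that in the indexer output:

output {
  if [type] == "shipper-b" {
    elasticsearch {
      index => "shipper-b-logs-%{+YYYY.MM.dd}"
      cluster => "cluster name"
    }
  } else {
    elasticsearch {
      index => "redis-logs-%{+YYYY.MM.dd}"
      cluster => "cluster name"
    }
  }
}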
