Logstash restart slow with large persistent queue - elasticsearch

I have a Logstash server that somehow stopped listening on its syslog input (it didn't crash, which is odd in itself, but that is a case for another question). It was configured with a maximum queue size of 100 GB and, after some time (31 GB of queue), I decided to restart it.
After restarting, Logstash gets "stuck" on
[INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
and it doesn't start sending messages to Elasticsearch (neither the ones already on the queue nor newly received ones).
If I delete the queue folder and restart Logstash, I get new events, but obviously I lose all the old ones.
Why does Logstash take so long to process the persistent queue when it is big? Are there any settings I should tune to make the pipeline flow?
I have saved the old persistent queue so I can look deeper into it. Any pointers?


Difference between using Filebeat and Logstash to push log file to Elasticsearch

I am trying out the ELK stack to visualise my log files. I have tried different setups:
1) Logstash file input plugin https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html
2) Logstash Beats input plugin https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html with Filebeat Logstash output https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html
3) Filebeat Elasticsearch output https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html
Can someone list their differences and when to use which setup? If this is not the right place to ask, please point me to the right one, like Super User, DevOps or Server Fault.
1) To use the Logstash file input you need a Logstash instance running on the machine you want to collect the logs from. If the logs are on the same machine where you already run Logstash, this is not a problem. If the logs are on remote machines, running a Logstash instance there is not always recommended, because it needs more resources than Filebeat.
2 and 3) For collecting logs on remote machines Filebeat is recommended, since it needs fewer resources than a Logstash instance. Use the Logstash output if you want to parse your logs, add or remove fields, or enrich your data. If you don't need anything like that, you can use the Elasticsearch output and send the data directly to Elasticsearch.
This is the main difference. If your logs are on the same machine that runs Logstash, you can use the file input. If you need to collect logs from remote machines, you can use Filebeat and send the data to Logstash if you want to transform it, or directly to Elasticsearch if you don't.
Another advantage of using Filebeat, even on the Logstash machine, is that you won't lose any logs if your Logstash instance is down, because Filebeat will resend the events. With the file input you can lose events in some cases.
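The decision rule in this answer can be written down as a small function. This is only a sketch of the reasoning above: the function name choose_setup and its two boolean parameters are made up for illustration and are not part of any Elastic tool.

```python
def choose_setup(logs_on_logstash_host: bool, needs_transformation: bool) -> str:
    """Return the setup suggested by the answer above (illustrative only)."""
    if logs_on_logstash_host:
        # Logs sit next to an existing Logstash instance: the file input is enough.
        return "logstash file input"
    if needs_transformation:
        # Remote logs that need parsing or enrichment go through Logstash.
        return "filebeat -> logstash -> elasticsearch"
    # Remote logs that can be indexed as they are.
    return "filebeat -> elasticsearch"


print(choose_setup(logs_on_logstash_host=False, needs_transformation=True))
```

The first branch ignores the point about Filebeat resending events when Logstash is down, so treat it as the simplest option for that case rather than the only one.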
An additional point for large-scale applications: if you have a lot of Beat instances (Filebeat, Heartbeat, Metricbeat...), you would not want all of them opening connections and sending data directly to the Elasticsearch instance at the same time.
Too many concurrent indexing connections may result in a long bulk queue, bad responsiveness and timeouts. For that reason, the common setup in most cases is to place Logstash between the Beat instances and Elasticsearch to control the indexing.
For larger-scale systems, the common setup is to add a buffering message queue (Apache Kafka, RabbitMQ or Redis) between Beats and Logstash for resiliency, to avoid congestion on Logstash during event spikes.
Figures are captured from Logz.io. They also have a good article on this topic.
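To see why a buffer between the shippers and the indexer helps during spikes, here is a toy simulation. It is a sketch under simplifying assumptions, not a model of Kafka, Redis or Logstash: time advances in ticks, the consumer indexes a fixed number of events per tick, and events that do not fit in the buffer are counted as rejected (in a real deployment this would show up as timeouts, retries or backpressure). The name simulate and all the numbers are invented for the example.

```python
def simulate(arrivals, drain_per_tick, buffer_capacity):
    """Simulate a consumer draining a bounded buffer at a fixed rate.

    arrivals: number of events arriving at each tick.
    Returns (delivered, rejected, peak_backlog).
    """
    backlog = 0
    delivered = 0
    rejected = 0
    peak_backlog = 0
    for incoming in arrivals:
        # Accept only as many events as the buffer still has room for.
        accepted = min(incoming, buffer_capacity - backlog)
        rejected += incoming - accepted
        backlog += accepted
        peak_backlog = max(peak_backlog, backlog)
        # The consumer indexes at most drain_per_tick events per tick.
        sent = min(backlog, drain_per_tick)
        backlog -= sent
        delivered += sent
    # Whatever is still buffered when arrivals stop is delivered afterwards.
    delivered += backlog
    return delivered, rejected, peak_backlog


spike = [10, 10, 100, 10, 10]  # one burst in the middle
print(simulate(spike, drain_per_tick=30, buffer_capacity=40))    # (80, 60, 40)
print(simulate(spike, drain_per_tick=30, buffer_capacity=1000))  # (140, 0, 100)
```

With the small buffer, 60 of the 140 events arrive when there is no room for them. With the large buffer, the same burst is absorbed and drained over the following ticks.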
I'm not really familiar with (2), but:
Logstash (1) is usually a good choice when you want to take some content, play around with it using input/output filters, match it to your analyzers, and then send it to Elasticsearch.
Ex.
You point Logstash at your MySQL database; it takes a row, modifies the data (maybe does some math on it, concatenates some fields and cuts out some words), then sends it to Elasticsearch as processed data.
As for Filebeat (2), it's a perfect choice for picking up already processed data and passing it to Elasticsearch.
Logstash (as the name clearly states) is mostly good for log files and similar data. Usually you make only small changes to those.
Ex. I have some log files on my servers (incl. errors, syslogs, process logs...).
Logstash watches those files, automatically picks up new lines added to them and sends those to Elasticsearch.
Then you can filter things in Elasticsearch and find what's important to you.
P.S.: Logstash also has a really good way of load balancing large amounts of data going to ES.
You can now use Filebeat to send logs directly to Elasticsearch or to Logstash (without a Logstash agent on the source machine, but you still need a Logstash server, of course).
The main advantage is that Logstash lets you custom-parse each line of the logs, whereas Filebeat alone will simply send the log line, with not much separation into fields.
Elasticsearch will still index and store the data.

Understanding the logstash retry policy

I have Kibana and an Elasticsearch instance running on one machine. Logstash and Filebeat are running on another machine.
The flow is working perfectly fine. I have one doubt that I need to clear up. I took Elasticsearch down and made Logstash pump some logs to Elasticsearch. Since Elasticsearch was down, I was expecting the data to be lost. But when I brought the Elasticsearch service back up, Kibana was able to show the logs that were sent while Elasticsearch was down.
When I searched online, I learned that Logstash retries the connection if Elasticsearch is down.
May I please know how to set this parameter?
The reason is that the elasticsearch output implements exponential backoff using two parameters called:
retry_initial_interval
retry_max_interval
If a bulk call fails, Logstash will wait for retry_initial_interval seconds and try again. If it still fails, it will wait for 2 * retry_initial_interval and try again, and so on, doubling the wait each time until it reaches retry_max_interval, at which point it will keep trying every retry_max_interval seconds indefinitely.
Note that this retry policy only works when ES is unreachable. If there's another error, such as a mapping error (HTTP 400) or a conflict (HTTP 409), the bulk call will not be retried.
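The schedule described above can be sketched in a few lines. This is an illustration of the doubling-with-a-cap rule, not the plugin's actual code. The function name backoff_schedule is made up, and the values 2 and 64 are example settings chosen for the demonstration (check the elasticsearch output plugin documentation for the defaults in your version).

```python
def backoff_schedule(retry_initial_interval, retry_max_interval, attempts):
    """Return the wait in seconds before each of the first `attempts` retries."""
    waits = []
    wait = retry_initial_interval
    for _ in range(attempts):
        waits.append(min(wait, retry_max_interval))
        # Double the wait for the next retry, but never exceed the maximum.
        wait = min(wait * 2, retry_max_interval)
    return waits


print(backoff_schedule(2, 64, 8))  # [2, 4, 8, 16, 32, 64, 64, 64]
```

Once the cap is reached the wait stays at retry_max_interval, which matches the "keep trying indefinitely" behaviour described in the answer.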

Filebeat and buffering

Sorry if it's a naive question.
I have Filebeat configured to ship data directly to ES. If ES is offline and a Filebeat harvester finds a log to ship, will it buffer, retry and ship?
Here is what I tried: my Docker container generated a log file, Filebeat picked up that log entry and reported that it had sent 'x' events, but ES wasn't reachable. I deleted the log file, thinking that Filebeat had buffered it, and then started ES. I don't see the logs coming through.
How do I handle this scenario?
If you want queuing, you have to add a broker such as Kafka, Redis or RabbitMQ; several configurations are possible. You can also send to Logstash, which will keep the data if ES is down. If ES is down, check your log: you will see "connection refused" and no data sent.

elasticsearch and logstash shutting down prematurely

I am pretty new to this and currently have a single Unix (CentOS) server running Logstash, Elasticsearch and Kibana. The data is consumed from a RabbitMQ exchange and this works pretty well, but for some reason, after a few hours, the Kibana dashboard becomes inactive, the Elasticsearch node becomes inactive and Logstash stops consuming. I initially set it up by starting each process manually (e.g. ./elasticsearch) and wonder if setting them up as services would prevent this from occurring.
I want to ensure that the setup runs continuously without any interruptions.
http://192.xxx.xxx.xxx:9200/_plugin/head/
Any suggestions and links appreciated

Why is elasticsearch crashing from inactive shards and logstash failing on bulk actions?

I am currently testing an ELK stack on one Ubuntu 14.04 box. It has 6 GB of RAM and 1 TB of storage. This is pretty modest, but for the amount of data I am getting, this should be plenty, right? I followed this ELK stack guide. In summary, I have Kibana 4, Logstash 1.5 and Elasticsearch 1.4.4 all running on one box, with an nginx server acting as a reverse proxy so I can access Kibana from outside. The main difference from the guide is that instead of syslogs, I am taking JSON input from a logstash-forwarder, sending about 300 events/minute.
Once started, everything is fine -- the logs show up on Kibana and there are no errors. After about 3 hours, elasticsearch crashes. I get a
Discover: Cannot read property 'indexOf' of undefined
error on the site. Logs can be seen on pastebin. It seems that shards become inactive and elasticsearch updates the index_buffer size.
If I refresh the Kibana UI, it starts working again for my JSON logs. However, if I test a different log source (using a TCP input instead of lumberjack), I get similar errors to the above, except that log processing stops: after anywhere from 10 minutes to an hour, no more logs are processed, and I cannot stop Logstash unless I perform a kill -KILL.
Killing logstash (pid 13333) with SIGTERM
Waiting logstash (pid 13333) to die...
Waiting logstash (pid 13333) to die...
Waiting logstash (pid 13333) to die...
Waiting logstash (pid 13333) to die...
Waiting logstash (pid 13333) to die...
logstash stop failed; still running.
Logstash error log shows . Logstash .log file is empty...
For the tcp input, I get about 1500 events every 15 minutes, in a bulk insert process by logstash.
Any ideas here?
EDIT: I also observed that when starting my elasticsearch process, my shards are set to a lower mb...
[2015-05-08 19:19:44,302][DEBUG][index.engine.internal ] [Eon] [logstash-2015.05.05][0] updating index_buffer_size from [64mb] to [4mb]
@jeffrey, I had the same problem with the DNS filter.
I did two things.
First, I installed dnsmasq as a caching DNS resolver. It helps if you have a high-latency or heavily loaded DNS server.
Second, I increased the number of Logstash worker threads. Just use the -w option.
The thread trick works without dnsmasq. The dnsmasq trick does not work without the extra threads.
I found out my problem. The problem was in my Logstash configuration -- the DNS filter (http://www.logstash.net/docs/1.4.2/filters/dns) for some reason caused Logstash to crash/hang. Once I took out the DNS filter, everything worked fine. Perhaps there was some error in the way the DNS filter was configured.
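To illustrate why a caching resolver reduces the load from repeated lookups, here is a minimal sketch of a time-limited cache around a lookup function. It is not how dnsmasq or the Logstash DNS filter is implemented; the class name CachingResolver, the ttl_seconds parameter and the injectable clock are all assumptions made for the example.

```python
import time


class CachingResolver:
    """Wrap a slow lookup function with a small time-limited cache."""

    def __init__(self, lookup, ttl_seconds=60.0, clock=time.monotonic):
        self._lookup = lookup      # function: name -> answer
        self._ttl = ttl_seconds    # how long an answer stays valid
        self._clock = clock        # injectable so the cache can be tested
        self._cache = {}           # name -> (expires_at, answer)

    def resolve(self, name):
        now = self._clock()
        entry = self._cache.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no call to the slow lookup
        answer = self._lookup(name)
        self._cache[name] = (now + self._ttl, answer)
        return answer
```

In real use the lookup could be something like socket.gethostbyname from the standard library. Every event that repeats a hostname within the TTL is then answered from memory instead of waiting on the DNS server.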
