Centralized logging with Kafka and ELK stack

There are more than 50 Java applications (they are not microservices, so we don't have to worry about multiple instances of a service). My architect has designed a solution that collects the log files, feeds them into a Kafka topic, feeds Kafka into Logstash, and pushes the result into Elasticsearch so we can view the logs in Kibana. I am new to Kafka and the ELK stack. Will someone point me in the right direction on how to do this task? I have learnt that Log4j and SLF4J can be configured to push logs to a Kafka topic.
1. How do I consume from Kafka and load it into Logstash? Do I have to write a Kafka consumer, or can we do that just by configuration?
2. How will Logstash feed the logs to Elasticsearch?
3. How can I differentiate all of the 50 applications' logs? Do I have to create a topic for each and every application?
I have put down the business problem; now I need step-by-step expert advice. Thanks in advance.

Essentially, what your architect has laid out for you can be divided into two major components based on their function (at the architecture level):
Log Buffer (Kafka)
Log Ingester (ELK)
[Java Applications] =====> [Kafka] ------> [ELK]
If you study ELK on its own, it may seem sufficient for your solution and Kafka may appear superfluous. However, Kafka has an important role to play when it comes to scale: when many of your Java applications send logs to ELK at once, ELK may become overloaded and break.
To keep ELK from being overloaded, your architect has set up a buffer (Kafka). Kafka receives logs from the applications and queues them up while ELK is under load. This way you do not break ELK, and you also do not lose logs when ELK is struggling.
Answers to your questions, in the same order:
(1) Logstash has 'input' plugins that can be used to set up a link between Kafka and Logstash. Read up on Logstash and its plugins:
i- Logstash Guide or Reference
ii- Input Plugins (scroll down to find Kafka plugin)
(2) Logstash feeds the received logs to Elasticsearch via its Elasticsearch output plugin. See the Logstash output plugin for Elasticsearch.
(3) I may not be spot-on on this, but I think you would be able to filter and distinguish the logs at the Logstash level once you receive them from Kafka. You can apply tags or fields to each log message on reception; this additional information lets you distinguish the applications from one another in Elasticsearch and Kibana.
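To make (1)-(3) concrete, here is a minimal sketch of what such a Logstash pipeline could look like, assuming JSON log events; the broker addresses, topic, field values, and index name are placeholders, not values from your setup:

    input {
      kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092"   # your Kafka brokers
        topics            => ["app-logs"]                # topic(s) your applications write to
        codec             => "json"                      # if the applications emit JSON log events
      }
    }

    filter {
      mutate {
        # (3) attach extra fields/tags here so Kibana can filter per application or environment
        add_field => { "environment" => "prod" }
        add_tag   => ["java-app"]
      }
    }

    output {
      elasticsearch {
        hosts => ["http://es-host:9200"]
        index => "app-logs-%{+YYYY.MM.dd}"
      }
    }

If the application name is already part of each JSON event, you can reference it in the index name or simply filter on that field in Kibana instead of creating one topic per application.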
Implementation Steps
As somebody who is new to Kafka and ELK, follow these steps towards your solution:
Step 1: Set up ELK first. Once you do that, you will be able to see how logs are visualized, and it will become clearer what the end solution may look like.
Guide to ELK Stack
Step 2: Set up Kafka to link your application logs to ELK.
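As one hedged starting point for Step 2: Log4j 2 ships a Kafka appender, so a configuration roughly like the sketch below (broker addresses and topic name are placeholders, and the kafka-clients jar must be on the application's classpath) would send each application's log events to the topic that Logstash reads from:

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration status="WARN">
      <Appenders>
        <!-- Ships every log event to the Kafka topic that Logstash consumes -->
        <Kafka name="KafkaAppender" topic="app-logs">
          <!-- JSON events are easy to parse on the Logstash side -->
          <JsonLayout compact="true" eventEol="true" properties="true"/>
          <Property name="bootstrap.servers">kafka1:9092,kafka2:9092</Property>
        </Kafka>
        <Console name="Console" target="SYSTEM_OUT"/>
      </Appenders>
      <Loggers>
        <!-- Keep the Kafka client's own logging away from the Kafka appender to avoid recursion -->
        <Logger name="org.apache.kafka" level="warn"/>
        <Root level="info">
          <AppenderRef ref="KafkaAppender"/>
          <AppenderRef ref="Console"/>
        </Root>
      </Loggers>
    </Configuration>

To differentiate the 50 applications without 50 topics, each application could either write to its own topic here or carry an application identifier in the layout or MDC so every event is tagged at the source.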
Caveats:
You may find that ELK has a fairly steep learning curve. It takes time to understand how each element of the ELK stack works and what its individual configuration options and query languages are.
To gain a deep understanding of ELK, take the local deployment path, where you set up ELK on your own system. Avoid the hosted/cloud ELK services for that purpose.

Logstash has a Kafka input and an Elasticsearch output, so this is configuration on the Logstash side. You can differentiate the applications using configuration on the Log4j side (although using multiple topics is another possibility).

Related

ELK (Elasticsearch, Logstash, Kibana) stack - Do I really need both Logstash and Filebeat configured?

I would like to deploy the ELK stack on-premises for our custom application. So I referred to the official docs for installation guides and installed an Elasticsearch cluster and Kibana. Then comes the question: the documentation says I can process the logs from any custom app if I would like to (if the built-in modules are not suitable for me), and I should just configure Filebeat so it can harvest these logs as input. But what should the output for Filebeat be? I've heard that Elasticsearch should receive processed, structured logs (for example, in JSON format) as input; but our application produces plain-text logs (as it's a Java app, the logs can include stack traces and other mixed data), and they should be processed and structured first... Or shouldn't they?
So, here are my questions regarding this situation:
Do I need to set the Filebeat output as the Logstash input to format and structure the logs, and then set the Logstash output as the Elasticsearch input? Or can I forward logs from Filebeat straight to Elasticsearch?
Do I really need Filebeat in this situation, or can Logstash be configured to read log files on its own?
Filebeat and Logstash can both work either on their own or in concert. If all you have to do is tail your log files and send them to Elasticsearch, without performing any processing on them, then I'd say go for Filebeat, as it's more lightweight than Logstash.
If you need to perform some processing and transformation on your log files, then you have a few options depending on which solution you pick. You can leverage:
Filebeat processors
Logstash filters
Elasticsearch ingest processors
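To illustrate the last option, here is a sketch of an Elasticsearch ingest pipeline with a grok processor for a hypothetical "timestamp level message" log format; the pipeline name and pattern are assumptions you would adapt:

    PUT _ingest/pipeline/java-app-logs
    {
      "description": "Parse plain-text Java application log lines",
      "processors": [
        {
          "grok": {
            "field": "message",
            "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log.level} %{GREEDYDATA:log.message}"]
          }
        }
      ]
    }

Filebeat can then be pointed at this pipeline via the pipeline setting of its Elasticsearch output, so the parsing happens in Elasticsearch rather than in Filebeat or Logstash.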
As a side note, I draw your attention to the fact that your Java app doesn't necessarily have to produce plain-text logs. Using ecs-logging-java, it can also produce JSON logs that are ready to be ingested into Elasticsearch.
If you use the above logging library, then Filebeat would be perfectly suitable for your use case, but it depends of course on whether you need to parse and process the message field in your logs or not.
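For reference, a minimal filebeat.yml for that JSON-logging setup could look like the sketch below; the paths and hosts are placeholders, and option names differ slightly between Filebeat versions (newer releases use the filestream input with an ndjson parser instead of the log input shown here):

    filebeat.inputs:
      - type: log
        paths:
          - /var/log/myapp/*.json        # wherever ecs-logging-java writes its JSON lines
        json.keys_under_root: true       # lift the JSON fields to the top level of the event
        json.add_error_key: true         # flag lines that fail to parse as JSON

    output.elasticsearch:
      hosts: ["http://localhost:9200"]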

Visualize Kafka Logs in Kibana

I am trying to visualise Kafka logs through the ELK stack. In particular, I need to see the number of messages unread by consumers in real time. I have looked at the log folder in Kafka but wasn't able to make sense of it.
Where would I find information related to consumer offsets, and how do I load it into Elasticsearch?
Is there any documentation about the logs in Kafka (e.g. server.log, controller.log)?
Kafka's log folder doesn't hold consumer lag.
You'll want to export this data from the consumer applications themselves, or install external monitoring such as Prometheus, Burrow, or Remora, then scrape it and index it into Elasticsearch.
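For a quick, ad-hoc look at lag without extra tooling, the kafka-consumer-groups script that ships with Kafka reports it per partition (the broker address and group name below are placeholders):

    # the LAG column is the log-end offset minus the group's committed offset, per partition
    bin/kafka-consumer-groups.sh \
      --bootstrap-server localhost:9092 \
      --describe \
      --group my-consumer-group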

Difference between using Filebeat and Logstash to push log file to Elasticsearch

I am trying out ELK to visualise my log file. I have tried different setups:
Logstash file input plugin https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html
Logstash Beats input plugin https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html with Filebeat Logstash output https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html
Filebeat Elasticsearch output https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html
Can someone list their differences and when to use which setup? If this is not the right place to ask, please point me to the right one, like Super User, DevOps, or Server Fault.
1) To use the Logstash file input you need a Logstash instance running on the machine from which you want to collect the logs. If the logs are on the same machine where you are already running Logstash, this is not a problem, but if the logs are on remote machines, a Logstash instance is not always recommended because it needs more resources than Filebeat.
2 and 3) For collecting logs on remote machines, Filebeat is recommended since it needs fewer resources than a Logstash instance. You would use the Logstash output if you want to parse your logs, add or remove fields, or do some enrichment on your data; if you don't need to do anything like that, you can use the Elasticsearch output and send the data directly to Elasticsearch.
This is the main difference: if your logs are on the same machine where you are running Logstash, you can use the file input; if you need to collect logs from remote machines, you can use Filebeat and send them to Logstash when you want to transform your data, or directly to Elasticsearch when you don't.
Another advantage of using Filebeat, even on the Logstash machine, is that if your Logstash instance is down you won't lose any logs: Filebeat will resend the events, whereas with the file input you can lose events in some cases.
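For the Filebeat-to-Logstash variant, the Logstash side is essentially a beats input, optional filters, and an elasticsearch output; a rough sketch follows, where the grok pattern and index name are made-up placeholders:

    input {
      beats {
        port => 5044                      # Filebeat's output.logstash points here
      }
    }

    filter {
      grok {
        # example parse of a "timestamp level message" style line; adjust to your log format
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
      }
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "filebeat-%{+YYYY.MM.dd}"
      }
    }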
An additional point for large-scale applications: if you have a lot of Beats instances (Filebeat, Heartbeat, Metricbeat...), you would not want them all to open connections and send data directly to the Elasticsearch instance at the same time.
Having too many concurrent indexing connections may result in a high bulk queue, poor responsiveness, and timeouts. For that reason, in most cases the common setup is to place Logstash between the Beats instances and Elasticsearch to control the indexing.
For larger-scale systems, the common setup is to add a buffering message queue (Apache Kafka, RabbitMQ, or Redis) between Beats and Logstash for resiliency, to avoid congestion on Logstash during event spikes.
The figures are captured from Logz.io, which also has a good article on this topic.
Not really familiar with (2).
But Logstash (1) is usually a good choice for taking content, playing around with it using input/output filters, matching it to your analyzers, and then sending it to Elasticsearch.
For example, you point Logstash at your MySQL database; it takes a row, modifies the data (maybe does some math on it, concatenates some fields, cuts out some words), and then sends it to Elasticsearch as processed data.
As for Filebeat (2), it's a perfect choice for picking up already-processed data and passing it on to Elasticsearch.
Logstash (as the name clearly states) is mostly good for log files and things like that; usually you only make small changes to them.
For example, I have some log files on my servers (including error logs, syslogs, process logs...).
Logstash listens to those files, automatically picks up new lines added to them, and sends them to Elasticsearch.
Then you can filter things in Elasticsearch and find what's important to you.
P.S.: Logstash also has a really good way of load-balancing when there is too much data for ES.
You can now use Filebeat to send logs directly to Elasticsearch or to Logstash (without a Logstash agent on the source machine, but you still need a Logstash server, of course).
The main advantage is that Logstash allows you to custom-parse each line of the logs... whereas Filebeat alone simply ships the log lines, without much separation into fields.
Elasticsearch will still index and store the data.

How to push performance test logs to kibana via elastic search

Is there a possibility to push the analysis report taken from Performance Center to Logstash and visualize it in Kibana? I want to automate the task of checking each vuser log file and then pushing the errors to the ELK stack. How can I retrieve the files by script and automate this? I can't find any direction on this, because I need to automate the task of automatically reading from each vuser_log file.
Filebeat should be your tool for getting done what you described.
To automatically read entries that are written to a file (which could be a log file), you simply need a shipper tool, and that can be Filebeat (it integrates well with the ELK stack; Logstash can do the same thing, but it is heavier and requires a JVM).
To do this with the ELK stack you need the following:
Filebeat should be set up on "all" instances where your main application is running and generating logs.
Filebeat is a simple, lightweight shipper tool that can read your log entries and send them to Logstash.
Set up one instance of Logstash (that's the L of ELK), which will receive events from Filebeat. Logstash will send the data to Elasticsearch.
Set up one instance of Elasticsearch (that's the E of ELK), where your data will be stored.
Set up one instance of Kibana (that's the K of ELK). Kibana is the front-end tool for viewing and interacting with Elasticsearch via REST calls.
Refer to the following link for setting up the above:
https://logz.io/blog/elastic-stack-windows/
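Tying those steps together, the Filebeat side could be as small as the sketch below; the path glob, the multiline pattern, and the Logstash address are assumptions about your environment rather than Performance Center specifics:

    filebeat.inputs:
      - type: log
        paths:
          - /path/to/results/*/vuser_*.log      # adjust to where the vuser log files are written
        multiline.pattern: '^\d{4}-\d{2}-\d{2}' # lines not starting with a date are continuations
        multiline.negate: true
        multiline.match: after

    output.logstash:
      hosts: ["logstash-host:5044"]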

Can Kafka be used as a messaging service between oracle and elasticsearch

Can Kafka be used as a messaging service between Oracle and Elasticsearch? Are there any downsides to this approach?
Kafka Connect provides you with a JDBC source connector and an Elasticsearch sink connector.
No downsides that I am aware of, other than service maintenance.
Feel free to use Logstash instead, but Kafka provides better resiliency and scalability.
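With Kafka Connect, both ends are just connector configurations. A hedged sketch using the Confluent JDBC source and Elasticsearch sink connectors follows; every connection string, table, and topic name is a placeholder:

    # oracle-source.properties: polls an Oracle table and writes new rows to a Kafka topic
    name=oracle-source
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    connection.url=jdbc:oracle:thin:@db-host:1521/ORCLPDB1
    connection.user=app_user
    connection.password=secret
    mode=incrementing
    incrementing.column.name=ID
    table.whitelist=MY_TABLE
    topic.prefix=oracle-

    # es-sink.properties: indexes that topic into Elasticsearch
    name=es-sink
    connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
    connection.url=http://localhost:9200
    topics=oracle-MY_TABLE
    key.ignore=true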
I have tried this in the past with SQL Server instead of Oracle and it works great, and I am sure you could take the same approach with Oracle, since the Logstash JDBC plugin that I am going to describe below supports Oracle DB.
So basically you would need the Logstash JDBC input plugin (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html), which points to your Oracle DB instance and pushes the rows over to Kafka using the Kafka output plugin (https://www.elastic.co/guide/en/logstash/current/plugins-outputs-kafka.html).
Now, to read the contents from Kafka, you would need another Logstash instance (this is the indexer) that uses the Kafka input plugin (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html). Finally, use the Elasticsearch output plugin in the Logstash indexer configuration file to push the events to Elasticsearch.
So the pipeline would look like this,
Oracle -> Logstash Shipper -> Kafka -> Logstash Indexer -> Elastic search.
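A rough sketch of the two Logstash configurations in that pipeline, with the query, topic, hosts, and index names as placeholders to adapt:

    # logstash-shipper.conf: Oracle -> Kafka
    input {
      jdbc {
        jdbc_driver_library     => "/path/to/ojdbc8.jar"
        jdbc_driver_class       => "Java::oracle.jdbc.driver.OracleDriver"
        jdbc_connection_string  => "jdbc:oracle:thin:@db-host:1521/ORCLPDB1"
        jdbc_user               => "app_user"
        jdbc_password           => "secret"
        schedule                => "* * * * *"      # poll every minute
        statement               => "SELECT * FROM my_table WHERE id > :sql_last_value"
        use_column_value        => true
        tracking_column         => "id"
      }
    }
    output {
      kafka {
        bootstrap_servers => "kafka1:9092"
        topic_id          => "oracle-rows"
        codec             => json
      }
    }

    # logstash-indexer.conf: Kafka -> Elasticsearch
    input {
      kafka {
        bootstrap_servers => "kafka1:9092"
        topics            => ["oracle-rows"]
        codec             => json
      }
    }
    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "oracle-rows"
      }
    }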
So overall I think this is a pretty scalable way to push events from your DB to Elasticsearch. As for downsides: at times you may feel there are one too many components in your pipeline, which can be frustrating, especially when you have failures. So you need to put appropriate controls and monitoring in place at every level to make sure you have a functioning data-aggregation pipeline as described above. Give it a try, and good luck!
