Why do we need Filebeat when we can ship logs to Logstash / Elasticsearch directly?

Hi, as a newbie to Elastic I'm wondering why we need Filebeat to ship logs to Elasticsearch (ES) or Logstash.
As far as I know, we can read logs directly from files with Logstash and send them from there to ES. If that is possible, why do we need Filebeat as an intermediary layer between the logs and Logstash?
What I knew: xyzlogfile ---> logstash-file ---> ES ---> kibana
Why do we need Filebeat in between: xyzlogfile ---> filebeat ---> logstash-file ---> ES ---> kibana

I assume you are talking about File Input Plugin vs Filebeat.
Some points to note:
Logstash is much heavier than Filebeat in terms of memory and CPU usage. It requires a JVM, which might be fine if you already deploy Java software, but for many projects a JVM is unnecessary overhead. Filebeat is just a lightweight native executable (written in Go).
You might not need Logstash at all:
If your logs are JSON.
If you don't need any parsing and you are OK with the timestamps generated by Filebeat. ([EDIT 2021-01-01] Filebeat has various processors; it can even do arbitrary script execution through a pure Go implementation of ECMAScript 5.1, https://www.elastic.co/guide/en/beats/filebeat/current/processor-script.html)
If you only need simple pattern-based parsing (e.g. a grok filter), you can just use Ingest Nodes (https://www.elastic.co/guide/en/elasticsearch/reference/5.0/ingest.html); a minimal sketch of this Logstash-free setup follows this list.
For more complex parsing, event cloning, or grouping, Logstash will probably be needed. Writing a Ruby filter, for example, is super easy, so you can prototype fast. For very high production loads you might need to write a custom filter plugin, or you could try writing your own custom Processor for Ingest Nodes (I haven't tried that yet, but I can tell you that writing a custom Logstash filter is pretty straightforward).
All the above points relate to ingesting file contents, but Logstash also has many input/output plugins that you might need and that are only available with Logstash.
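As a minimal sketch of that Logstash-free setup (the paths, pipeline name and log format here are assumptions for illustration, not anything from the question), Filebeat ships straight to Elasticsearch and points at an ingest pipeline:

filebeat.inputs:
  - type: filestream              # use the older "log" input type on old Filebeat versions
    id: xyz-logs                  # hypothetical input id
    paths:
      - /var/log/xyz/*.log        # hypothetical path

output.elasticsearch:
  hosts: ["http://localhost:9200"]   # assumes a local single-node cluster
  pipeline: xyz-pipeline             # hypothetical ingest pipeline, defined below

The grok parsing then happens inside Elasticsearch, in an ingest pipeline created once via the REST API (the pattern assumes a "timestamp level message" line layout):

PUT _ingest/pipeline/xyz-pipeline
{
  "description": "parse xyz log lines (assumed format: timestamp, level, message)",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}"]
      }
    }
  ]
}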
If all your files are located on the same node as the Logstash process, then using the File Input Plugin could be an option ("xyzlogfile--->logstash-file--->ES--->kibana").
However, for most deployments you want to collect data from many nodes with different roles and software stacks deployed on them. You do not want to deploy a Logstash instance on all of those nodes, so "xyzlogfile--->filebeat--->logstash-beats--->ES--->kibana" should be used (or, alternatively, "xyzlogfile--->filebeat--->ES--->kibana" with an Ingest Node).

Based on Mastering Elastic Stack (Packt):
Beats are data shippers, shipping data from a variety of inputs such as files, data streams, or logs, whereas Logstash is a data parser. Though Logstash can ship data, that is not its primary use.
Logstash consumes a lot of memory and requires a higher amount of resources, whereas Beats requires fewer resources and consumes little memory.

Related

ELK (Elasticsearch, Logstash, Kibana) stack - Do I really need both Logstash and Filebeat configured?

I would like to deploy the ELK stack on-premise for our custom application. So I referred to the official docs for installation guides, and installed an Elasticsearch cluster and Kibana.
Then comes the question: the documentation says I can process logs from any custom app if I want to (if the built-in modules are not suitable for me), and that I should just configure Filebeat so it can harvest these logs as input. But what should be the output for Filebeat? I've heard that Elasticsearch should get processed, structured logs (for example, in JSON format) as input; but our application produces plain-text logs (as it's a Java app, logs can include stack traces and other mixed data), so they should be processed and structured first... Or shouldn't they?
So, here are my questions regarding this situation:
Do I need to set the Filebeat output as Logstash input to format and structure the logs, and then set the Logstash output as Elasticsearch input? Or can I forward logs from Filebeat straight to Elasticsearch?
Do I really need Filebeat in this situation, or can Logstash be configured to read log files on its own?
Filebeat and Logstash can work either on their own or in concert. If all you have to do is tail your log files and send them to Elasticsearch, without performing any processing on them, then I'd say go for Filebeat, as it's more lightweight than Logstash.
If you need to perform some processing and transformation on your log files, then you have a few options depending on which solution you pick (a small Filebeat example follows the list). You can leverage:
Filebeat processors
Logstash filters
Elasticsearch ingest processors
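For the first option, here is a small sketch of the kind of processing Filebeat itself can do (the paths and the date pattern are assumptions): a multiline parser that folds Java stack-trace lines into the event of the log line preceding them, which addresses the mixed-data concern in the question:

filebeat.inputs:
  - type: filestream
    id: app-logs                          # hypothetical input id
    paths:
      - /var/log/myapp/*.log              # hypothetical path
    parsers:
      - multiline:
          type: pattern
          pattern: '^\d{4}-\d{2}-\d{2}'   # assumes each log record starts with a date
          negate: true                    # lines NOT matching the pattern...
          match: after                    # ...are appended to the previous event

Anything beyond this kind of lightweight restructuring (grok-style field extraction, enrichment) is where Logstash filters or Elasticsearch ingest processors come in.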
As a side note, I draw your attention to the fact that your Java app doesn't necessarily have to produce plain-text logs. Using ecs-logging-java, it can also produce JSON logs ready to be ingested into Elasticsearch.
If you use the above logging library, then Filebeat would be perfectly suitable for your use case, but it depends of course on whether you need to parse and process the message field in your logs or not.

Difference between using Filebeat and Logstash to push log file to Elasticsearch

I am trying out the ELK stack to visualise my log file. I have tried different setups:
1. Logstash file input plugin: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html
2. Logstash Beats input plugin (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html) with the Filebeat Logstash output (https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html)
3. Filebeat Elasticsearch output: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html
Can someone list their differences and when to use which setup? If this is not the right place to ask, please point me to the right place, like Super User, DevOps or Server Fault.
1) To use the Logstash file input you need a Logstash instance running on the machine from which you want to collect the logs. If the logs are on the same machine where you are already running Logstash, this is not a problem, but if the logs are on remote machines, a Logstash instance on each is not always recommended because it needs more resources than Filebeat.
2 and 3) For collecting logs on remote machines, Filebeat is recommended since it needs fewer resources than a Logstash instance. You would use the Logstash output if you want to parse your logs, add or remove fields, or do some enrichment on your data; if you don't need to do anything like that, you can use the Elasticsearch output and send the data directly to Elasticsearch.
This is the main difference: if your logs are on the same machine where you are running Logstash, you can use the file input; if you need to collect logs from remote machines, use Filebeat, and send the events to Logstash if you want to transform your data, or directly to Elasticsearch if you don't.
Another advantage of using Filebeat, even on the Logstash machine, is that if your Logstash instance is down you won't lose any logs: Filebeat will resend the events, whereas with the file input you can lose events in some cases.
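For setup 2, the wiring looks like this (a sketch; the port, hosts and log format are assumptions). On the Filebeat side:

output.logstash:
  hosts: ["logstash-host:5044"]      # hypothetical Logstash host

And the matching Logstash pipeline:

input {
  beats {
    port => 5044                     # must match the Filebeat output above
  }
}

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }   # assumed log format
  }
  date {
    match => ["ts", "ISO8601"]       # use the log's own timestamp as @timestamp
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
}

For setup 3 you would drop this pipeline entirely and replace output.logstash with output.elasticsearch in filebeat.yml.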
An additional point for large-scale applications: if you have a lot of Beats instances (Filebeat, Heartbeat, Metricbeat...), you would not want them all to open connections and send data directly to the Elasticsearch instance at the same time.
Having too many concurrent indexing connections may result in a high bulk queue, bad responsiveness and timeouts. For that reason, in most cases the common setup is to place Logstash between the Beats instances and Elasticsearch to control the indexing.
For larger-scale systems, the common setup adds a buffering message queue (Apache Kafka, RabbitMQ or Redis) between the Beats and Logstash, for resiliency, to avoid congestion on Logstash during event spikes; see the sketch below.
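As a sketch of that buffered layout (the broker addresses and topic name are assumptions), Filebeat writes to Kafka:

output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]   # hypothetical brokers
  topic: "logs"                           # hypothetical topic

and Logstash consumes from it at its own pace:

input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topics => ["logs"]
  }
}

During an event spike the backlog accumulates in Kafka instead of overwhelming Logstash or Elasticsearch.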
Logz.io also has a good article on this topic.
Not really familiar with (2).
But Logstash (1) is usually a good choice when you want to take content, play around with it using its input, filter and output plugins, match it to your analyzers, and then send it to Elasticsearch.
Ex.: You point Logstash at your MySQL database; it takes a row, modifies the data (maybe does some math on it, concatenates some fields, cuts out some words) and then sends it to Elasticsearch as processed data.
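A sketch of that MySQL example using the Logstash JDBC input (the connection details, query, and field names are all assumptions for illustration):

input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"    # hypothetical database
    jdbc_user => "reader"
    jdbc_password => "secret"
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT first_name, last_name, amount FROM orders"
    schedule => "* * * * *"           # poll once a minute
  }
}

filter {
  mutate {
    # concatenate two columns into one field, as described above
    add_field => { "full_name" => "%{first_name} %{last_name}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "orders"                 # hypothetical index name
  }
}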
As for Filebeat (2), it's a perfect choice for picking up already processed data and passing it to Elasticsearch.
Logstash (as the name states) is mostly good for log files and the like; usually you only make tiny changes to those.
Ex.: I have some log files on my servers (including error logs, syslogs, process logs...).
Logstash listens to those files, automatically picks up new lines added to them, and sends them to Elasticsearch.
Then you can filter in Elasticsearch and find what's important to you.
P.S.: Logstash also has a really good way of load-balancing large volumes of data going to ES.
You can now use Filebeat to send logs either directly to Elasticsearch or to Logstash (no Logstash agent is needed on the source machine, but you still need a Logstash server, of course).
The main advantage is that Logstash will allow you to custom-parse each line of the logs, whereas Filebeat alone will simply ship the log lines with little separation into fields.
Elasticsearch will still index and store the data.

What is the most beneficial way to gather server hardware utilization, app logs and app JVM metrics using the Elastic Stack?

Besides the ELK stack's standard goal of gathering application log data, I want to leverage the stack for more advanced data collection such as JVM metrics (via JMX) and host CPU/RAM/disk/network utilization.
The most suitable option I thought of is Metricbeat, but I doubt whether Metricbeat alone is enough for the purposes described above.
Since I'm aiming at a minimal stack of things to configure, will Metricbeat-Elasticsearch-Kibana be enough for collecting app logs, app JVM metrics and host hardware utilization, or are there more suitable alternatives?
UPDATE
Oh, I see now that I also need Filebeat besides Metricbeat for gathering app logs.
Is there any out-of-the-box single solution that combines the Filebeat and Metricbeat agents?
Currently Filebeat and Metricbeat are separate binaries and you need to run both:
Filebeat to collect your logs (and potentially parse them with an Elasticsearch Ingest node).
Metricbeat with the system module for CPU/RAM/disk/network; there is also a JMX / Jolokia module for that functionality.
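A sketch of the Metricbeat side (the ports, the Jolokia agent, and the mapped MBean are assumptions for illustration):

metricbeat.modules:
  - module: system
    metricsets: ["cpu", "memory", "network", "filesystem"]
    period: 10s

  - module: jolokia
    metricsets: ["jmx"]
    period: 10s
    hosts: ["localhost:8778"]        # assumes a Jolokia agent attached to the JVM
    namespace: "app"                 # metrics are written under jolokia.app.*
    jmx.mappings:
      - mbean: "java.lang:type=Memory"
        attributes:
          - attr: HeapMemoryUsage
            field: memory.heap

output.elasticsearch:
  hosts: ["http://localhost:9200"]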

Why install Logstash if I can just send the data through REST to Elasticsearch?

I installed Elasticsearch and Kibana, and I'm following the tutorial:
https://www.elastic.co/guide/en/elasticsearch/reference/current/_index_and_query_a_document.html
Inserting and reading data works perfectly, e.g.:
PUT /customer/external/1?pretty
{
"name": "John Doe"
}
So that makes me wonder: what do I need Logstash or Filebeat for?
My plan is to log each web request on a website to Elasticsearch for analytics.
Do I need to install Logstash? I don't understand what I would need it for.
(I don't plan to store the data in a file.) I will read the request info (e.g. IP address, time, user_id, etc.) from a PHP script and simply send it through an HTTP REST request, as in the example above, to the Elasticsearch server, which will save the data anyway. So I don't see any reason to store the data on the web server (that would be data duplication), and if I wanted to, why would I need Logstash anyway? I could just read a .log file and send it to Elasticsearch, like this example: https://www.elastic.co/guide/en/elasticsearch/reference/current/_exploring_your_data.html
No, you do not have to install Logstash, if you plan to collect, normalize and write your application data yourself. As you correctly assumed, Logstash would be a replacement for your PHP script.
Nevertheless, you might still consider having a look at Logstash. Since it is developed and maintained by the same company that takes care of Elasticsearch, you could benefit from upcoming changes and optimizations.
As you can read from the introduction, Logstash is a tool to read data from multiple sources, normalize it and write the result to multiple destinations. For more details on which sources, filters and outputs Logstash offers, you should also take a look at the pipeline documentation.
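Every Logstash pipeline has the same three-stage input -> filter -> output shape. As a sketch of the scenario in the question (the port, field name and index are assumptions), the PHP app could POST its request-log JSON to Logstash instead of straight to Elasticsearch:

input {
  http {
    port => 8080                     # hypothetical port the app POSTs to
  }
}

filter {
  date {
    match => ["time", "UNIX"]        # assumes the app sends a Unix-epoch field named "time"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "weblogs"               # hypothetical index name
  }
}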

Filebeat directly to Elasticsearch or via Logstash?

We are installing Elasticsearch and Kibana for log aggregation/analysis. The first system to use it is greenfield, so we output structured logs from the services that make up our system. Given that we don't need to add structure to our logs, I was planning on using Filebeat to ship the logs directly to Elasticsearch and not use Logstash. Is this a sensible option, or does Logstash have value over and above parsing that we might need? If we do use Logstash, can I use it to harvest the log files, or should I still use Filebeat to pump the logs to Logstash?
Logstash is useful if you need to aggregate logs from many servers and apply some common transformations and filtering to your events.
If your log events are already structured and you are ok with indexing them directly, then you can definitely have Filebeat send them directly to ES. If ES goes down (e.g. for maintenance), Filebeat will retry until it can successfully send the events.
Is this a sensible option, or does Logstash have value over and above parsing that we might need?
Deciding whether to use Logstash, in your case, depends on whether you need to treat the logs before inserting them into ES.
In addition to parsing (which is apparently unnecessary in your use case), you can use Logstash to add a location with the geoip filter, parse a date with the date filter, replace one word with another, replace a field with its hash, etc.
You can have a look at the available filter plugins in the Logstash documentation.
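For instance, a minimal filter block combining the two enrichments just mentioned (the field names and date layout are assumptions):

filter {
  geoip {
    source => "clientip"             # assumes events carry an IP field named "clientip"
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]   # assumed Apache-style timestamp
  }
}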
If we do use Logstash, can I use it to harvest the log files, or should I still use Filebeat to pump the logs to Logstash?
If you need Logstash and can afford to run it on the machine where your logs are, you can avoid using Filebeat by using the file input.
But keep in mind that Logstash, especially when used for parsing, can consume a lot of resources. It is usually better to run it on another machine and use Filebeat to pump the logs to Logstash.
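In that case the file input replaces Filebeat entirely; a minimal sketch (the path is assumed):

input {
  file {
    path => "/var/log/myapp/*.log"   # hypothetical path
    start_position => "beginning"    # also read existing content on the first run
  }
}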
