Where will ELK and Filebeat reside?

I am working in a distributed environment. I have a central machine which needs to monitor some 100 machines.
So I need to use the ELK stack to keep monitoring the data.
Since Elasticsearch, Logstash, Kibana and Filebeat are independent pieces of software, I want to know where I should ideally place them in my distributed environment.
My approach was to keep Kibana and Elasticsearch on the central node and keep Logstash and Filebeat on the individual nodes.
Logstash will send data to the central node's Elasticsearch, and Kibana will display it.
Please let me know if this design is right.

Your design is not bad, but if you install Elasticsearch on only one server, over time you will face availability problems.
You can do this:
Install filebeat and logstash on all the nodes.
Install Elasticsearch as a cluster (a minimal config sketch follows below). That way, if one Elasticsearch node goes down, another node can easily take over.
Install Kibana on the central node.
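For the Elasticsearch cluster, a minimal elasticsearch.yml sketch for each node (names and IPs are placeholders; the discovery settings assume Elasticsearch 7.x, older versions use discovery.zen.ping.unicast.hosts instead):

cluster.name: central-logging            # same value on every node
node.name: es-node-1                     # unique per node
network.host: 0.0.0.0                    # listen on all interfaces
discovery.seed_hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]             # the other cluster nodes (placeholders)
cluster.initial_master_nodes: ["es-node-1", "es-node-2", "es-node-3"]  # only needed when bootstrapping the cluster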
NB:
Make sure you configure Filebeat to point to more than one Logstash server (a short sketch follows after these notes). That way, if one Logstash instance fails, Filebeat can still ship logs to another one.
Also make sure your Logstash configuration points to all the data nodes of your Elasticsearch cluster.
You can also go further by installing Kibana on, say, 3 nodes and putting a load balancer in front of them. That way the load balancer will route users to a healthy Kibana instance.
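For the first note, a minimal filebeat.yml sketch (the IPs are placeholders; as far as I know, without loadbalance Filebeat sends everything to one of the listed hosts and only fails over when it becomes unreachable):

output.logstash:
  hosts: ["10.0.0.11:5044", "10.0.0.12:5044"]   # placeholder Logstash nodes
  loadbalance: true                             # spread events across all listed hosts and skip dead ones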
UPDATE
With elasticsearch configured, we can configure logstash as follows:
output {
  elasticsearch {
    hosts => ["http://123.456.789.1:9200","http://123.456.789.2:9200"]
    index => "indexname"
  }
}
You don't need to add stdout { codec => rubydebug } in your configuration.
Hope this helps.

Related

How to configure filebeat for logstash cluster environment?

I am missing something very basic when I think of how Filebeat will be configured in a clustered logstash setup.
As per the article
https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html
and its architecture diagram, I think that there is some kind of load balancer in front of the Logstash cluster. However, the Filebeat output documentation suggests that an array of all the Logstash nodes must be specified. Using this list of nodes, Filebeat will do the load balancing from the client side.
Also as per this GitHub issue, there is no native logstash clustering available yet.
So, my question is: what kind of setup do I need to be able to point my multiple Filebeat instances to one Logstash service endpoint without specifying the Logstash nodes in the cluster?
Is it possible?
Would having a load balancer in front of the Logstash cluster be of any help?
Thanks,
Manish
Since the Logstash clustering feature is still in the works and you don't want to specify all the Logstash hosts inside all your Beats configurations, the only solution I see is to use a TCP load balancer in front of Logstash.
All your Beats would point to that load balancer endpoint and you can manage your Logstash cluster behind that load balancer as you see fit. Be aware, though, that you're adding a hop (hence latency) between your Beats and your Logstash cluster.
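For example, if the load balancer exposes a single endpoint (the hostname below is hypothetical), every Beat only needs:

output.logstash:
  hosts: ["logstash-lb.example.com:5044"]   # hypothetical load balancer VIP in front of the Logstash nodes

The load balancer then decides which Logstash instance actually receives each connection, so adding or removing Logstash nodes never touches the Beats configuration.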

How to gather logs to Elasticsearch

I have logs of web apps on different servers (many machines). How can I gather these logs in a system where I have Elasticsearch and Kibana installed? When I searched, I only found tutorials that show setups where the logs, Logstash, Beats, Elasticsearch and Kibana are all on the same machine.
Since you have many machines which produce logs, you need to set up the ELK stack with Filebeat, Logstash, Elasticsearch and Kibana.
You need to set up a Filebeat instance on each machine.
It will listen to your log files on each machine and forward them to the Logstash instance you mention in the filebeat.yml configuration file, like below:
#=========================== Filebeat inputs =============================
filebeat.inputs:
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /path_to_your_log_1/ELK/your_log1.log
    - /path_to_your_log_2/ELK/your_log2.log

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["private_ip_of_logstash_server:5044"]
The Logstash server listens on port 5044 and streams all logs through the Logstash configuration files:
input {
  beats { port => 5044 }
}
filter {
  # your log filtering logic goes here
}
output {
  elasticsearch {
    hosts => [ "elasticsearch_server_private_ip:9200" ]
    index => "your_index_name"
  }
}
In logstash you can filter and split your logs into fields and send them to elasticsearch.
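For example, a rough filter sketch, assuming the web apps write Apache/Nginx-style access logs (if not, swap in a grok pattern for your own format):

filter {
  grok {
    # split the raw line into named fields (clientip, verb, request, response, ...)
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # use the timestamp from the log line instead of the time of ingestion
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}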
Elasticsearch saves all the data we send through Logstash in indexes.
All the data in Elasticsearch can be read through Kibana, and with Kibana we can create dashboards with many types of charts based on our data.
Below is the basic architecture for ELK with filebeat:
You need to install Filebeat first, which collects the logs from all the web servers.
After that, the logs need to be passed from Filebeat -> Logstash.
In Logstash you can format and drop unwanted logs based on grok patterns (a small sketch follows after these steps).
Forward the logs from Logstash -> Elasticsearch for storing and indexing.
Connect Kibana to Elasticsearch to add the index and view the logs based on the selected index.
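For the Logstash step, a small sketch of dropping unwanted events (the loglevel field is an assumption about your log format, and the tag check presumes a grok filter ran earlier in the pipeline):

filter {
  if "_grokparsefailure" in [tags] {
    drop { }      # discard lines that did not match the grok pattern
  }
  if [loglevel] == "DEBUG" {
    drop { }      # assumed field name; skip noisy debug entries
  }
}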
As mentioned in other answers, you will need to install Filebeat on all of your instances to listen to your log files and ship the logs.
Your Filebeat configuration will depend on your log format (for example log4j) and where you want to ship it (for example: Kafka, Logstash, Elasticsearch).
Config example:
filebeat.inputs:
- type: log
  paths:
    - /var/log/system.log
  multiline.pattern: '^\REGEX_TO_MATCH_YOUR_LOG_FORMAT'
  multiline.negate: true
  multiline.match: after

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "filebeat_internal"
  password: "YOUR_PASSWORD"
Also, Logstash is not mandatory if you don't want to use it; logs can be sent directly to Elasticsearch, but you will need to set up an ingest pipeline in your Elasticsearch cluster to process the incoming logs (more on ingest pipelines here).
Also one more useful link: Working With Ingest Pipelines In ElasticSearch And Filebeat
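If you go that route, a minimal Filebeat sketch pointing at an ingest pipeline (the pipeline name is hypothetical, and the pipeline itself has to be created in Elasticsearch first):

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  pipeline: "weblogs"                 # hypothetical ingest pipeline defined in Elasticsearch beforehand
  username: "filebeat_internal"
  password: "YOUR_PASSWORD"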
In order to grab all your web application logs you need to set up the ELK stack. Right now you have Elasticsearch set up, which is just the database where all the log data is saved. In order to view those logs you need Kibana, which is the UI, and then you need Logstash and Filebeat to read your application's logs and transfer them to Logstash, or directly to Elasticsearch.
If you want a proper centralized logging system then I recommend you use Logstash with Filebeat as well. Since you have different servers, install Filebeat on each server, install Logstash on your main server where you have Kibana and Elasticsearch, and point all the Filebeats to that server.
Filebeat is a lightweight data shipper that we install as an agent on servers to send specific types of operational data to Logstash; Logstash then does the filtering and sends the log data to Elasticsearch.
Check How To Setup ELK and follow the instructions on that website. Also, look at
FileBeat + ELK Setup
You can use Splunk and the Splunk forwarder to gather all the logs together.
Use the Splunk forwarder on your web servers to forward all the logs to your centralized server that runs Splunk.
If you don't want to add another tool to the Elasticsearch and Kibana stack, you can send logs directly to Elasticsearch, but you should be careful while constructing your pipeline so you end up with a more stable system.
To gather the logs you can use Python or another language, but for Python I would use this library:
https://elasticsearch-py.readthedocs.io/en/master/
There is also a Medium tutorial for Python:
https://medium.com/naukri-engineering/elasticsearch-tutorial-for-beginners-using-python-b9cb48edcedc
If you prefer other languages to push your logs to Elasticsearch, you can of course use them too. I just suggested Python because I am more familiar with it, and you can also use it to build a fast prototype before turning it into a live product.

Logstash - is Pull possible?

We are trying to build up an ElasticSearch data collector. The ElasticSearch cluster should receive data from different servers. These servers are at other locations (and on other networks) than the ElasticSearch cluster. The clients are connected to the ElasticCluster via one-way VPN connections.
As a first attempt we installed Logstash on each client server to collect the data, filter it and send it to the ElasticCluster. So far this was no problem in a test environment. The problem now is that the Logstash on the client tries to establish a connection to Elasticsearch, and this attempt is blocked by the firewall. It is, however, possible to open a connection from the ElasticCluster side to each client and receive the data. What we need is a way to get the data from Logstash such that we open the connection and pull the data from Logstash (PULL). Is there a way to do this without changing the VPN configuration?
Logstash pushes events; if your Logstash instances can't initiate the connection to the Elasticsearch nodes, you will need something in the middle or you will have to allow the traffic on the firewall/VPN.
For example, you can have an Elasticsearch instance in the middle that the Logstash servers push data to, and then another Logstash in your main cluster environment with a pipeline whose input is that Elasticsearch in the middle; this way the data is pulled from that Elasticsearch.
edit:
As I've said in the comment, you need a setup like the one described below.
Here you have your servers sending data to a Logstash instance; this Logstash has an output to an Elasticsearch instance, so it initiates the connection and pushes the data.
On your main cluster side, where you have your Elasticsearch cluster and a one-way VPN that can only start connections outward, you will have another Logstash; this Logstash has an input that queries the outside Elasticsearch node, pulling the data.
In a Logstash pipeline you can use an elasticsearch input, which queries an Elasticsearch node and then sends the received data through your filters and outputs.
input {
  elasticsearch {
    hosts => ["middle-elasticsearch:9200"]   # the elasticsearch in the middle (placeholder host)
    index => "your_index"
    docinfo => true
    schedule => "* * * * *"                  # poll every minute instead of running once
  }
}
filter {
  # your filters
}
output {
  elasticsearch {
    hosts => ["cluster-node-1:9200", "cluster-node-2:9200"]   # your cluster nodes (placeholders)
    index => "your_index"
  }
}
Is it clearer now?

What happens if logstash sends data to elasticsearch at a rate faster than it can index?

So I have multiple hosts with logstash installed on each host. Logstash on all these hosts reads from the log files generated by the host and sends data to my single aws elasticsearch cluster.
Now consider a scenario where large quantities of logs are being generated by each host at the same time. Since Logstash is installed on each host and just forwards the data to the ES cluster, I assume that even if my Elasticsearch cluster is not able to index it, my hosts won't be affected. Are the logs just lost in such a scenario?
Can my host machines get affected in any way?
In short, you may lose some logs on the host machines; that's why message queueing solutions like Kafka are used: https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html#deploying-message-queueing
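A rough sketch of that pattern, with Kafka as the buffer (broker address, topic and Elasticsearch endpoint are placeholders): the Logstash on each host ships to Kafka, and a separate Logstash indexer consumes from Kafka at whatever rate Elasticsearch can absorb.

# on each host: output to Kafka instead of straight to Elasticsearch
output {
  kafka {
    bootstrap_servers => "kafka-broker:9092"   # placeholder broker
    topic_id => "app-logs"                     # placeholder topic
    codec => json
  }
}

# on the indexing tier: consume from Kafka and write to Elasticsearch
input {
  kafka {
    bootstrap_servers => "kafka-broker:9092"
    topics => ["app-logs"]
    codec => json
  }
}
output {
  elasticsearch {
    hosts => ["https://your-aws-es-endpoint:443"]
  }
}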

Logstash cluster output to Elasticsearch cluster without multicast

I want to run logstash -> elasticsearch with high availability and cannot find an easy way to achieve it. Please review how I see it and correct me:
Goal:
5 machines each running elasticsearch united into a single cluster.
5 machines each running logstash server and streaming data into elasticsearch cluster.
N machines under monitoring each running lumberjack and streaming data into logstash servers.
Constraint:
It is supposed to run on a PaaS (CoreOS/Docker), so multicast discovery does not work.
Solution:
Lumberjack allows you to specify a list of Logstash servers to forward data to. Lumberjack will randomly select a target server and switch to another one if that server goes down. It works.
I can use the ZooKeeper discovery plugin to construct the Elasticsearch cluster. It works.
With multicasting, each Logstash server discovers and joins the Elasticsearch cluster. Without multicasting it only lets me specify a single Elasticsearch host, but that is not highly available. I want to output to the cluster, not to a single host that can go down.
Question:
Is it realistic to add a zookeeper discovery plugin to logstash's embedded elasticsearch? How?
Is there an easier (natural) solution for this problem?
Thanks!
You could potentially run a separate (non-embedded) Elasticsearch instance within the Logstash container, but configure it not to store data, maybe setting these instances as the master nodes:
node.data: false
node.master: true
You could then add your Zookeeper plugin to all Elasticsearch instances so they form the cluster.
Logstash then logs over HTTP to the local Elasticsearch, which works out where among the 5 data-storing nodes to actually index the data.
Alternatively, this Q explains how to get plugins working with the embedded version of Elasticsearch: Logstash output to Elasticsearch on AWS EC2
