How to configure Filebeat as an agent in a Kubernetes cluster - Elasticsearch

I am trying to add ELK to my project, which runs on Kubernetes. I want logs to flow Filebeat -> Logstash -> Elasticsearch. I prepared my filebeat.yml file, but in my company Filebeat is configured as an agent in the cluster, and I don't really know what that means. How do I configure Filebeat in this case? Do I just add the file to the project so it is taken into consideration once the pod starts, or how does it work?

You can configure Filebeat in a couple of ways.
1 - As a DaemonSet, meaning each node of your Kubernetes cluster runs one Filebeat pod. In this architecture you usually need only one filebeat.yml configuration file, where you set the inputs, filters, outputs (output to Logstash, Elasticsearch, etc.), and so on. In this case Filebeat needs root access inside your cluster. A minimal configuration sketch follows below.
2 - As a sidecar next to your application's Kubernetes resource. You can configure an emptyDir volume in the Deployment/StatefulSet, share it with the Filebeat sidecar, and set Filebeat to monitor that directory.
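For the DaemonSet approach, a minimal filebeat.yml sketch could look like the one below. It is usually mounted into the Filebeat DaemonSet through a ConfigMap; the container log path, the NODE_NAME environment variable, and the Logstash host are assumptions you would adapt to your cluster.

# filebeat.yml - minimal sketch for the DaemonSet approach (paths and hosts
# are placeholders, adapt them to your environment).
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log
    processors:
      # Enrich each event with pod, namespace, and label metadata.
      # NODE_NAME is assumed to be injected into the DaemonSet pod spec.
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
            - logs_path:
                logs_path: "/var/log/containers/"

output.logstash:
  hosts: ["logstash:5044"]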

Related

toggle specific plugin in FluentD (through k8s manifest file)

I have an EFK stack that outputs EKS cluster logs to both Elasticsearch and S3. I wonder if there's a way to add a switch to enable/disable the S3 output, maybe using an ENV variable in the FluentD manifest file. Would appreciate help if anyone knows how to implement this feature.
P.S: can share files as needed

How to gather logs to Elasticsearch

I have logs of web apps on different servers (many machines). How can I gather these logs into a system where I have Elasticsearch and Kibana installed? When I searched, I only found tutorials for setups where the logs, Logstash, Beats, Elasticsearch and Kibana are all on the same machine.
Since you have many machines producing logs, you need to set up the ELK stack with Filebeat, Logstash, Elasticsearch and Kibana.
You need to set up a Filebeat instance on each machine.
It will listen to your log files on each machine and forward them to the Logstash instance you specify in the filebeat.yml configuration file, like below:
#=========================== Filebeat inputs =============================
filebeat.inputs:
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /path_to_your_log_1/ELK/your_log1.log
    - /path_to_your_log_2/ELK/your_log2.log

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["private_ip_of_logstash_server:5044"]
The Logstash server listens on port 5044 and streams all logs through the Logstash configuration files:
input {
  beats {
    port => 5044
  }
}
filter {
  # your log filtering logic goes here
}
output {
  elasticsearch {
    hosts => ["elasticsearch_server_private_ip:9200"]
    index => "your_index_name"
  }
}
In Logstash you can filter and split your logs into fields and send them to Elasticsearch.
Elasticsearch saves all the data we send through Logstash in indices.
All data in the Elasticsearch database can be viewed through Kibana. We can create dashboards with many types of charts based on our data using Kibana.
Below is the basic architecture for ELK with Filebeat:
You need to install Filebeat first, which collects logs from all the web servers.
After that, logs are passed from Filebeat -> Logstash.
In Logstash you can format and drop unwanted logs based on Grok patterns (see the sketch after this list).
Forward logs from Logstash -> Elasticsearch for storing and indexing.
Connect Kibana with Elasticsearch to add an index and view the logs for the selected index.
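As an illustration of the Grok step, here is a minimal Logstash filter sketch; the pattern and the DEBUG check are assumptions for illustration, not part of the original answer, and you would replace them with whatever matches your own log format.

filter {
  # Example only: parse a simple "LEVEL message" style line.
  grok {
    match => { "message" => "%{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  # Drop debug-level noise before it reaches Elasticsearch.
  if [level] == "DEBUG" {
    drop { }
  }
}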
As mentioned in the other answers, you will need to install Filebeat on all of your instances to listen to your log files and ship the logs.
Your Filebeat configuration will depend on your log format (for example log4j) and where you want to ship it (for example Kafka, Logstash, or Elasticsearch).
Config example:
filebeat.inputs:
- type: log
  paths:
    - /var/log/system.log
  multiline.pattern: '^\REGEX_TO_MATCH_YOUR_LOG_FORMAT'
  multiline.negate: true
  multiline.match: after

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "filebeat_internal"
  password: "YOUR_PASSWORD"
Also, Logstash is not mandatory if you don't want to use it; logs can be sent directly to Elasticsearch, but you will need to set up an ingest pipeline in your Elasticsearch cluster to process the incoming logs. More on ingest pipelines here.
Also one more useful link: Working With Ingest Pipelines In ElasticSearch And Filebeat
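For the Logstash-free setup, a hedged sketch of the Filebeat side could look like this; the pipeline name "my-logs-pipeline" is an assumption, and the pipeline itself has to be created in Elasticsearch beforehand (see the ingest pipeline links above).

# Sketch only: ship directly to Elasticsearch and let an ingest pipeline
# (created separately in the cluster) do the parsing.
output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "filebeat_internal"
  password: "YOUR_PASSWORD"
  pipeline: "my-logs-pipeline"   # placeholder pipeline name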
In order to grab all your web application logs you need to set up the ELK stack. Right now you have Elasticsearch set up, which is just the database where all the log data is saved. In order to view those logs you need Kibana, which is the UI, and then you need Logstash and Filebeat to read the logs of your application and transfer them to Logstash or directly to Elasticsearch.
If you want a proper centralized logging system, then I recommend using Logstash together with Filebeat. As you have different servers, you install Filebeat on each server, and on your main server, where you have Kibana and Elasticsearch, you install Logstash and point all Filebeats to that server.
Filebeat is a lightweight data shipper that we install as an agent on servers to send specific types of operational data to Logstash; Logstash then does the filtering and sends the log data to Elasticsearch.
Check How To Setup ELK and follow the instructions on that website. Also have a look at:
FileBeat + ELK Setup
You can use Splunk and the Splunk forwarder to gather all the logs together.
Use the Splunk forwarder on your web servers to forward all the logs to your centralized server which runs Splunk.
If you don't want to add another tool to the Elasticsearch and Kibana stack, you can send logs directly to Elasticsearch, but you should be careful while constructing your pipeline to have a more stable system.
To gather logs you can use Python or another language, but for Python I would use this library (a small sketch follows at the end of this answer):
https://elasticsearch-py.readthedocs.io/en/master/
There is also a Medium tutorial for Python:
https://medium.com/naukri-engineering/elasticsearch-tutorial-for-beginners-using-python-b9cb48edcedc
If you prefer other languages to push your logs to Elasticsearch, you can of course use them too. I just suggested Python because I am more familiar with it, and you can also use it to create a fast prototype before making it a live product.
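A minimal sketch of pushing one log line with a recent elasticsearch-py client (8.x) might look like the following; the host, index name, and document fields are placeholders for illustration, not a fixed schema for your logs.

# Sketch: index a single log line into Elasticsearch with elasticsearch-py 8.x.
# Host, index name, and document fields are placeholders.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.index(
    index="app-logs",  # placeholder index name
    document={
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "message": "user login succeeded",
        "host": "web-01",
    },
)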

Need to ship logs to elastic from EKS

We have an EKS cluster running and we are looking for best practices to ship application logs from pods to Elastic.
In the EKS workshop there is an option to ship the logs to CloudWatch and then to Elastic.
I am wondering whether there is an option to ship the logs directly to Elastic, or what the best practice is.
Additional requirement:
We need to determine from which namespace the logs are coming and deliver them to a dedicated index.
You can deploy the EFK stack in the Kubernetes cluster. Follow the reference --> https://github.com/acehko/kubernetes-examples/tree/master/efk/production
Fluentd would be deployed as a DaemonSet so that one replica runs on each node, collecting the logs from all pods and pushing them to Elasticsearch. A stripped-down manifest sketch follows below.
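As a rough illustration of the DaemonSet part (the linked repo has the full production setup, including RBAC and extra volume mounts), a minimal manifest might look like this; the namespace, image tag, and the Elasticsearch service address are assumptions.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          # Community image that ships with the Elasticsearch output plugin.
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch.logging.svc"   # placeholder service address
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          volumeMounts:
            - name: varlog
              mountPath: /var/log
      volumes:
        - name: varlog
          hostPath:
            path: /var/log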

How to change default GKE stackdriver logging to fluentd

My GKE cluster currently has the default settings and logs to Stackdriver. However, I would like to be able to log to the Elastic Stack that I am deploying at elastic.co.
https://cloud.google.com/solutions/customizing-stackdriver-logs-fluentd
I see that I am able to customize the filtering and parsing of the default fluentd DaemonSet, but how do I install the Elasticsearch output plugin so that I can stream logs to my Elasticsearch endpoint instead of Stackdriver?
The tutorial you linked to answers your question. You need to create a GKE cluster without the built-in fluentd (by passing the --no-enable-cloud-logging flag when creating the cluster) and then install a custom DaemonSet with the fluentd configuration you want to use.
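For the cluster-creation step, the flag mentioned above would be passed like this (cluster name and zone are placeholders; on newer gcloud versions the equivalent logging option may differ):

gcloud container clusters create my-cluster \
    --zone us-central1-a \
    --no-enable-cloud-logging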

Logstash cluster output to Elasticsearch cluster without multicast

I want to run Logstash -> Elasticsearch with high availability and cannot find an easy way to achieve it. Please review how I see it and correct me:
Goal:
5 machines each running elasticsearch united into a single cluster.
5 machines each running logstash server and streaming data into elasticsearch cluster.
N machines under monitoring each running lumberjack and streaming data into logstash servers.
Constraint:
It is supposed to run on a PaaS (CoreOS/Docker), so multicast discovery does not work.
Solution:
Lumberjack allows you to specify a list of Logstash servers to forward data to. Lumberjack will randomly select a target server and switch to another one if that server goes down. That works.
I can use the ZooKeeper discovery plugin to build the Elasticsearch cluster. That works.
With multicast, each Logstash server discovers and joins the Elasticsearch cluster. Without multicast, it only lets me specify a single Elasticsearch host. But that is not highly available: I want to output to the cluster, not to a single host that can go down.
Question:
Is it realistic to add a ZooKeeper discovery plugin to Logstash's embedded Elasticsearch? How?
Is there an easier (more natural) solution for this problem?
Thanks!
You could potentially run a separate (non-embedded) Elasticsearch instance within the Logstash container, but configure that Elasticsearch not to store data, and maybe set these instances as the master nodes:
node.data: false
node.master: true
You could then add your ZooKeeper plugin to all Elasticsearch instances so they form the cluster.
Logstash then logs over HTTP to the local Elasticsearch node, which works out which of the 5 data-storing nodes should actually index the data.
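In that setup, the Logstash output could simply point at the local node; this is a sketch using the current elasticsearch output syntax rather than whatever Logstash version you are running, so treat the option names as assumptions:

output {
  elasticsearch {
    # Local non-data node; it routes documents to the data nodes in the cluster.
    hosts => ["localhost:9200"]
  }
}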
Alternatively, this question explains how to get plugins working with the embedded version of Elasticsearch: Logstash output to Elasticsearch on AWS EC2
