Can I use a single elasticsearch/kibana for multiple k8 clusters? - elasticsearch

Do you know of any gotcha's or requirements that would not allow using a single ES/kibana as a target for fluentd in multiple k8 clusters?
We are engineering rolling out a new kubernetes model. I have requirements to run multiple kubernetes clusters, lets say 4-6. Even though the workload is split in multiple k8 clusters, I do not have a requirement to split the logging and believe it would be easier to find the logs for pods in all clusters in a centralized location. Also less maintenance for kibana/elasticsearch.
Using EFK for Kubernetes, can I point Fluentd from multiple k8 clusters at a single ElasticSearch/Kibana? I don't think I'm the first one with this thought however I haven't been able to find any discussion of doing this. Found lots of discussions of setting up efk but all that I have found only discuss a single k8 to its own elasticsearch/kibana.
Has anyone else gone down the path of using a single es/kibana to service logs from multiple kubernetes clusters? We'll plunge ahead with testing it out but seeing if anyone else has already gone down this road.

I dont think you should create an elastic instance for each kubernetes cluster, you can run a main elastic instance and index it all logs.
But even if you don`t have an elastic instance for each kubernetes client, i think you sohuld have a drp, so lets says instead moving your logs of all pods to elastic directly, maybe move it to kafka, and then split it to two elastic clusters.
Also it is very depend on the use case, if every kubernetes cluster is on different regions, and you need the pod`s logs in low latency (<1s), so maybe one elastic instance is not the right answer.

Based on [1] we can read:
Fluentd collects logs from pods running on cluster nodes, then routes
them to a central​​​​​​ized Elasticsearch.
Then Elasticsearch ingests these logs from Fluentd and stores them in a central location. It is also used to efficiently search text files.
Kibana is the UI; the user can visualize the collected logs and metrics and create custom dashboards based on queries.
There are several ways in which they can solve your dilemma:
a) Create a centralized dashboard and use each cluster’s Elasticsearch as backend. So you can see all your clusters logs in one place.
b) Create an Elasticsearch cluster and add each Elasticsearch into it. This is NOT the best option since you will duplicate your data several times, you will need to handle each index shards and you will need to fight with the split brain dilemma but it’s great for data resiliency.
c) Use another solution like an APM (New Relic, Instana, etc) to fully centralize your logs in one place.
[1] https://techbeacon.com/enterprise-it/9-top-open-source-tools-monitoring-kubernetes

Related

Unknow source of daily clean up of indices

I have two separate elastic clusters, each one of elastic node is docker container, which live in docker swarm. I aggregate logs from various microservices in indices, and one of them is in format "logs-timestamp".
In one of cluster I have those indices from previous days, in other one I have only from present day.
This affect only those ones in "logs-timestamp" format.
Do you have any idea? or point from I can start to lookup?
Does elastic has some form of builtin garbage collector?
Ps. I didn't start this project so basiclly I have quite small knowledge about whole infrastructure.
You should check the ILM policies documentation (here) which is one way of automatically removing old indices.
In short, check the result of this command in kibana
GET _ilm/policy
It will tell you if you have some policy configured.
The other way I know for automatic indices curation is Curator ( see here and here). You should check if Curator is installed somewhere in your infrastructure and check the configuration.
Hope it helps.

Elastic search cluster on Kubernetes Cluster vs VM

I want to setup elastic stack (elastic search, logstash, beats and kibana) for monitoring my kubernetes cluster which is running on on-prem bare metals. I need some recommendations on the following 2 approaches, like which one would be more robust,fault-tolerant and of production grade. Let's say I have a K8 cluster named as K8-abc.
Approach 1- Will be it be good to setup the elastic stack outside the kubernetes cluster?
In this approach, all the logs from pods running in kube-system namespace and user-defined namespaces would be fetched by beats(running on K8-abc) and put into into the ES Cluster which is configured on Linux Bare Metals via Logstash (which is also running on VMs). And for fetching the kubernetes node logs, the beats running on respective VMs (which are participating in forming the K8-abc) would fetch the logs and put it into the ES Cluster which is configured on VMs. The thing to note here is the VMs used for forming the ES Cluster are not the part of the K8-abc.
Approach 2- Will be it be good to setup the elastic stack on the kubernetes cluster k8-abc itself?
In this approach, all the logs from pods running in kube-system namespace and user-defined namespaces would be send to Elastic search cluster configured on the K8-abc via logstash and beats (both running on K8-abc). For fetching the K8-abc node logs, the beats running on VMs (which are participating in forming the K8-abc) would put the logs into ES running on K8-abc via logstash which is running on k8-abc.
Can some one help me in evaluating the pros and cons of the before mentioned two approaches? It will be helpful even if the relevant links to blogs and case studies is provided.
I would be more inclined to the second solution. It has many advantages over the first one however it may seem more complex as it comes to the initial setup. You can actually ask similar question when it comes to migrate any other type of workload to Kubernetes. It has many advantages over VM. To name just a few:
self-healing cluster,
service discovery and integrated load balancing,
Such solution is much easier to scale (HPA) in comparison with VMs,
Storage orchestration. Kubernetes allows you to automatically mount a storage system of your choice, such as local storage, public cloud providers, and many more including Dynamic Volume Provisioning mechanism.
All the above points could be easily applied to any other workload and may bee seen as Kubernetes advantages in general so let's look why to use it for implementing Elastic Stack:
It looks like Elastic is actively promoting use of Kubernetes on their website. See also this article.
They also provide an official elasticsearch helm chart so it is already quite well supported by Elastic.
Probably there are many other reasons in favour of Kubernetes solution I didn't mention here. Here you can find a hands-on article about setting up Highly Available and Scalable Elasticsearch on Kubernetes.

Can I run multiple instances of elasticsearch on same machine for log aggregation?

I'm little confused here. Please help me out.
I have a spring-boot application which feeds the data into elasticsearch. This spring-boot runs on AWS instance. Right now, I do not have proper log aggregation and I want to use ELK stack for it.
Please help me out with these concerns...
Can I make a new log cluster on the same elasticsearch instance and feed the log data into it? Is it a good idea?
Should I use a different elasticsearch instance on the same machine with different port and direct all the log traffic to this instance?
Should I host my elasticsearch onto a new aws server and direct all the traffic? Will latency cause problems on later stages when the log data feed is huge?
The set of questions you've asked will have broad and varied answers depending on factors such as volume, velocity and capacity.
Think of an ES cluster as a database. If you have multiple log files/sources, you can insert them into different indexes on the same cluster.

Running multiple elasticsearch instances

I need to setup 2 Elasticsearch instances:
one for kibana logs (my separate application will throw logs at it)
one for search for my production application
My plan is to create a separate folders with elasticsearch in them. They dont talk to each other which means they are separate databases and if one goes down, the other still runs. Is this good solution or should I use only one elasticsearch folder with muliple elasticsearch.yaml configuration files? What is the best practice for multiple elasticsearch instances?
The best practice is to NOT run two Elasticsearch instances on the SAME server.
Your production search will probably need a lot of ram to work fast and stay responsive. You don't want your logging system interfere with that.

Sharing elasticsearch between Logstash/graylog2 and my own application

Would it be safe to share an elasticsearch cluster (or single-node elasticsearch cluster) between Logstash or graylog2 and my own application? what configuration changes/additions should be made for accomodating that? what kind of name-spacing would the application require for storing its own data in separation from graylog/Logstash?
I'd rather avoid maintaining separate clusters, especially on dev boxes but also in general - if the architecture allows.
It is technically possible but not recommended. You will experience load on the logging cluster that you want to decouple from the other applications using ES.
Graylog2 supports defining an index prefix for having multiple setups running in one ES cluster.
We have both(Kibana and Graylog) running with shared Elasticsearch. It's just that the indexing pattern is something different we have to add Circuit breakers in Elasticsearch so that Kibana search query for logs would not expand beyond a certain size.

Resources