How to view application-specific logs while running services using docker-compose - elasticsearch

How can we view application-specific logs while running services using docker-compose, without getting into each of the containers? We have microservices running in Rails, Python, and Java in a single docker-compose environment. What would be a cost-effective, open-source solution that our Operations team can use for monitoring and searching logs? We would want to avoid Elasticsearch for this as we don't have a big budget; we'd appreciate your input.

Elasticsearch provides a free tier as well; see the ELK subscription levels. You can use the Basic subscription, which is free and open.
You can easily set up logging infrastructure using:
ELK - Elasticsearch, Logstash, Kibana
Filebeat - log shipper for Docker containers
Metricbeat - metrics shipper for Docker containers
The infrastructure will scale irrespective of how many containers you have.
You can check out some basic monitoring and logging examples here - link
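As a minimal sketch of what that can look like in practice (image versions, file paths, and service names here are assumptions to adapt to your stack), a docker-compose fragment running a single-node Elasticsearch, Kibana, and a Filebeat shipper alongside your services:

```yaml
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.1  # assumed version
    environment:
      - discovery.type=single-node   # fine for dev/test; use a real cluster in production
    ports:
      - "9200:9200"
  kibana:
    image: docker.elastic.co/kibana/kibana:7.10.1
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
  filebeat:
    image: docker.elastic.co/beats/filebeat:7.10.1
    user: root  # needs access to the Docker log directory and socket
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro       # your shipper config
      - /var/lib/docker/containers:/var/lib/docker/containers:ro # container JSON logs
      - /var/run/docker.sock:/var/run/docker.sock:ro             # metadata enrichment
    depends_on:
      - elasticsearch
```

With this in place, every container's stdout/stderr ends up searchable in Kibana without shelling into individual containers.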

As well as the free Basic license mentioned in the other answer, most Elastic tools are also available in Apache-licensed OSS versions.
Beats agents mostly support autodiscovery in Docker and docker-compose, making them really easy to use on an ongoing basis, even with short-lived containers.
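For reference, a hedged sketch of what that autodiscovery looks like in a filebeat.yml (the output host is an assumption based on a compose service named elasticsearch):

```yaml
# filebeat.yml -- Docker autodiscover with hints
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true  # containers can opt in/out via co.elastic.logs/* labels
output.elasticsearch:
  hosts: ["elasticsearch:9200"]  # assumed service name on the compose network
```

New containers are picked up automatically as they start, so nothing needs re-deploying when services come and go.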
It would help if you specify whether the budget constraints are around a) licensing costs, b) time and effort for your Operations team, or c) something else.

Related

Can I use a single Elasticsearch/Kibana for multiple k8s clusters?

Do you know of any gotchas or requirements that would prevent using a single ES/Kibana as a target for fluentd in multiple k8s clusters?
Our engineering team is rolling out a new Kubernetes model. I have requirements to run multiple Kubernetes clusters, let's say 4-6. Even though the workload is split across multiple k8s clusters, I do not have a requirement to split the logging, and I believe it would be easier to find the logs for pods in all clusters in a centralized location. It also means less maintenance for Kibana/Elasticsearch.
Using EFK for Kubernetes, can I point fluentd from multiple k8s clusters at a single Elasticsearch/Kibana? I don't think I'm the first one with this thought; however, I haven't been able to find any discussion of doing this. I found lots of discussions about setting up EFK, but all of them only cover a single k8s cluster with its own Elasticsearch/Kibana.
Has anyone else gone down the path of using a single ES/Kibana to serve logs from multiple Kubernetes clusters? We'll plunge ahead with testing it out, but I'm asking in case anyone else has already gone down this road.
I don't think you should create an Elastic instance for each Kubernetes cluster; you can run one main Elastic instance and index all the logs into it.
But even if you don't have an Elastic instance for each Kubernetes cluster, I think you should have a DR plan. So, instead of moving the logs of all pods to Elastic directly, consider moving them to Kafka first and then splitting them out to two Elastic clusters, as sketched below.
It also depends heavily on the use case: if each Kubernetes cluster is in a different region and you need the pod logs at low latency (<1s), then one Elastic instance may not be the right answer.
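As one hedged illustration of the Kafka buffering idea (shown here with a Beats shipper; fluentd can do the same through its Kafka output plugin, and the broker addresses and topic name are assumptions):

```yaml
# filebeat.yml -- buffer logs in Kafka instead of writing straight to Elasticsearch
output.kafka:
  hosts: ["kafka-1:9092", "kafka-2:9092"]  # assumed broker addresses
  topic: "k8s-pod-logs"                    # assumed topic name
  compression: gzip
# Independent consumers can then index the topic into two Elasticsearch
# clusters, giving you the DR split described above.
```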
Based on [1] we can read:
Fluentd collects logs from pods running on cluster nodes, then routes them to a centralized Elasticsearch.
Then Elasticsearch ingests these logs from Fluentd and stores them in a central location. It is also used to efficiently search text files.
Kibana is the UI; the user can visualize the collected logs and metrics and create custom dashboards based on queries.
There are several ways to solve your dilemma:
a) Create a centralized dashboard and use each cluster's Elasticsearch as the backend, so you can see all your clusters' logs in one place (one way to do this is cross-cluster search, sketched below).
b) Create an Elasticsearch cluster and add each Elasticsearch into it. This is NOT the best option, since you will duplicate your data several times, you will need to handle each index's shards, and you will have to contend with the split-brain problem, but it's great for data resiliency.
c) Use another solution like an APM (New Relic, Instana, etc) to fully centralize your logs in one place.
[1] https://techbeacon.com/enterprise-it/9-top-open-source-tools-monitoring-kubernetes
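One concrete way to implement option (a) is cross-cluster search: register each cluster's Elasticsearch as a remote on the cluster that backs the central Kibana. A hedged sketch (aliases and transport addresses are assumptions):

```yaml
# elasticsearch.yml on the cluster behind the central Kibana
cluster:
  remote:
    k8s-one:
      seeds:
        - "es.cluster-one.example.com:9300"  # assumed transport address
    k8s-two:
      seeds:
        - "es.cluster-two.example.com:9300"
```

Kibana can then query remote indices with the alias:index syntax, e.g. k8s-one:logstash-*, so logs stay where they were ingested but are searchable from one place.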

Elasticsearch on Kubernetes - 'Elastic Cloud (ECK)' vs 'Helm charts'

For the purpose of log file aggregation, I'm looking to set up a production Elasticsearch instance on an on-premises (vanilla) Kubernetes cluster.
There seem to be two main options for deployment:
Elastic Cloud on Kubernetes (ECK) - https://github.com/elastic/cloud-on-k8s
Helm Charts - https://github.com/elastic/helm-charts
I've used the old (soon to be deprecated) helm charts successfully but just discovered ECK.
What are the benefits and disadvantages of both of these options? Any constraints or limitations that could impact long-term use?
The main difference is that the Helm charts are pretty unopinionated while the Operator is opinionated: it has a lot of best practices built in, like a hard requirement on using security. Also, the Operator Framework is built on the reconciliation loop and will continuously check whether your cluster is in the desired state or not. Helm charts are more like a package manager where you run specific commands (install a cluster in version X with Y nodes, now add 2 more nodes, now upgrade to version Z, ...).
If ECK is Cloud-on-Kubernetes, you can think of the Helm charts as Stack-on-Kubernetes: a way of defining the exact specification for running our Docker images in a Kubernetes environment.
Another difference is that the Helm charts are open source, while the Operator is free but uses the Elastic License (the main limitation being that you can't use it to run a paid Elasticsearch service).
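To make the opinionated, declarative nature of the operator concrete, this is roughly what an ECK deployment looks like (adapted from the ECK quickstart; the version and node count here are assumptions): you declare the desired state and the reconciliation loop keeps the cluster there.

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logging
spec:
  version: 7.10.1        # assumed stack version
  nodeSets:
    - name: default
      count: 3           # the operator reconciles toward three nodes
      config:
        node.store.allow_mmap: false  # avoids tuning vm.max_map_count on hosts
```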
1. Elastic Cloud (ECK):
ADVANTAGES
document oriented (JSON)
multilingual - the ICU plugin, an Elasticsearch plugin based on the Lucene implementation of the Unicode text segmentation standard, is used to index and tokenize multilingual content
managing and monitoring multiple clusters
upgrading to new stack versions with ease
scaling cluster capacity up and down
changing cluster configuration
dynamically scaling local storage (includes Elastic Local Volume, a local storage driver)
scheduling backups
secure by default - clusters have encryption enabled and are protected with a strong default password right at creation time
free features - Canvas, Maps, Uptime
hot-warm-cold and custom topologies
official GKE support
free tier
DISADVANTAGES
it is not as good at being a data store as some other options like MongoDB, Hadoop, etc.; for smaller use cases it will perform fine, but if you are streaming TBs of data every day, you will find that it either chokes or loses data
its learning curve is much steeper
it is problematic when you can't or won't create a production-worthy setup for economic reasons; for test and dev a single node will work fine, but when you move to production you should have no less than a 3-node/2-replica setup
You can find more information here: ECK.
2. Elastic Stack Kubernetes Helm Charts:
ADVANTAGES
huge community
easy to deploy and use in Kubernetes
each component in the stack takes care of a different step in the logging pipeline, and together they all provide a comprehensive and powerful logging solution for Kubernetes
rich analysis capabilities
DISADVANTAGES
difficult to maintain at scale
You can find more information here: open-source-monitoring-tools-for-kubernetes.
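For comparison with the ECK manifest above, a hedged sketch of the Helm route, where the equivalent choices live in a values file for the official elastic/elasticsearch chart (all values are assumptions):

```yaml
# values.yaml, applied with:
#   helm repo add elastic https://helm.elastic.co
#   helm install elasticsearch elastic/elasticsearch -f values.yaml
clusterName: "logging"
replicas: 3                  # assumed cluster size
minimumMasterNodes: 2        # quorum for a 3-node cluster
esJavaOpts: "-Xms1g -Xmx1g"  # assumed heap sizing
volumeClaimTemplate:
  resources:
    requests:
      storage: 30Gi          # assumed per-node storage
```

Unlike the operator, nothing reconciles this afterwards; changes are applied by running helm upgrade yourself.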

Elasticsearch cluster on Kubernetes cluster vs VMs

I want to set up the Elastic Stack (Elasticsearch, Logstash, Beats, and Kibana) for monitoring my Kubernetes cluster, which runs on on-prem bare metal. I need some recommendations on the following two approaches, i.e. which one would be more robust, fault-tolerant, and production grade. Let's say I have a k8s cluster named K8-abc.
Approach 1 - Would it be good to set up the Elastic Stack outside the Kubernetes cluster?
In this approach, all the logs from pods running in the kube-system and user-defined namespaces would be fetched by Beats (running on K8-abc) and put into the ES cluster, which is configured on Linux bare metal, via Logstash (which also runs on VMs). For fetching the Kubernetes node logs, the Beats running on the respective VMs (which form K8-abc) would fetch the logs and put them into the ES cluster configured on the VMs. The thing to note here is that the VMs used for forming the ES cluster are not part of K8-abc.
Approach 2 - Would it be good to set up the Elastic Stack on the Kubernetes cluster K8-abc itself?
In this approach, all the logs from pods running in the kube-system and user-defined namespaces would be sent to the Elasticsearch cluster configured on K8-abc via Logstash and Beats (both running on K8-abc). For fetching the K8-abc node logs, the Beats running on the VMs (which form K8-abc) would put the logs into the ES running on K8-abc via the Logstash that also runs on K8-abc.
Can someone help me evaluate the pros and cons of the two approaches mentioned above? It would be helpful even if relevant links to blogs and case studies were provided.
I would be more inclined toward the second solution. It has many advantages over the first one; however, it may seem more complex when it comes to the initial setup. You can actually ask a similar question when migrating any other type of workload to Kubernetes. It has many advantages over VMs. To name just a few:
a self-healing cluster,
service discovery and integrated load balancing,
much easier scaling (HPA) in comparison with VMs,
storage orchestration - Kubernetes allows you to automatically mount a storage system of your choice, such as local storage or public cloud providers, including a Dynamic Volume Provisioning mechanism (a sketch follows below).
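As a hedged illustration of that last point, an Elasticsearch data node can simply claim storage and let the cluster provision it (the StorageClass name and size are assumptions):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: es-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard  # assumed StorageClass backed by a dynamic provisioner
  resources:
    requests:
      storage: 20Gi           # assumed per-node data volume size
```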
All the above points could easily apply to any other workload and may be seen as Kubernetes advantages in general, so let's look at why to use it for implementing the Elastic Stack:
It looks like Elastic is actively promoting the use of Kubernetes on their website. See also this article.
They also provide an official Elasticsearch Helm chart, so it is already quite well supported by Elastic.
There are probably many other reasons in favour of the Kubernetes solution that I didn't mention here. Here you can find a hands-on article about setting up highly available and scalable Elasticsearch on Kubernetes.

What is the most beneficial way to gather server hardware utilization, app logs, and app JVM metrics using the Elastic Stack?

Besides the ELK stack's standard goal of gathering application log data, I want to leverage the stack for more advanced data collection, such as JVM metrics (via JMX) and the host's CPU/RAM/disk/network utilization.
The most suitable tool, I thought, would be Metricbeat, but I doubt whether Metricbeat alone is enough for the purposes described above.
Since I'm aiming at a minimal stack of things to configure, will Metricbeat-Elasticsearch-Kibana be enough for collecting app logs, app JVM metrics, and the host's hardware utilization, or are there more suitable alternatives?
UPDATE
Oh, I see now that I also need Filebeat besides Metricbeat for gathering app logs.
Is there any out-of-the-box single solution that combines the Filebeat and Metricbeat agents?
Currently, Filebeat and Metricbeat are separate binaries, and you need to run both:
Filebeat to collect your logs (and potentially parse them with an Elasticsearch ingest node).
Metricbeat with the system module for CPU/RAM/disk/network, and we also have a JMX / Jolokia module for that functionality.
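A hedged sketch of what that Metricbeat configuration can look like, covering host metrics and JVM metrics through a Jolokia agent (the Jolokia endpoint, namespace, and MBean mapping are illustrative assumptions):

```yaml
# metricbeat.yml -- host metrics plus JVM metrics via Jolokia
metricbeat.modules:
  - module: system
    metricsets: ["cpu", "memory", "network", "filesystem"]
    period: 10s
  - module: jolokia
    metricsets: ["jmx"]
    hosts: ["localhost:8778"]   # assumed Jolokia agent endpoint
    namespace: "app"            # required; groups the resulting events
    jmx.mappings:
      - mbean: "java.lang:type=Memory"
        attributes:
          - attr: HeapMemoryUsage
            field: memory.heap  # assumed target field name
output.elasticsearch:
  hosts: ["localhost:9200"]     # assumed Elasticsearch address
```

Together with a Filebeat instance for the logs, that keeps the stack at the three components you were aiming for.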

ELK Stack using Docker on EC2 Container Service

I am contemplating setting up an ELK (Elasticsearch, Logstash, and Kibana) stack on AWS using Docker images, but I am unsure about performance and persistent storage.
If I just deploy the Docker images to the EC2 Container Service with my configuration, then I guess I also need to point to a place for persistent storage for both Logstash and Elasticsearch. Is S3 storage fast enough, or does that even matter when we are talking about logs? I am pretty sure I can live with a few minutes' delay in indexing, but using Kibana, I would like to get data reasonably fast.
Is this a viable solution for a production setup with a couple of gigs' worth of logs daily? I expect the log volume to rise once we see the value of this and start logging more to get more insight.
So:
Is it fast enough to use S3 for storage of log files?
Is it a viable solution for a production site that produces 5+ gigs of data a day?
You might take a look at the AWS Elasticsearch Service. It's Elasticsearch and Kibana as a service on AWS that you don't have to manage manually. I've just started using it for application-level events that my (desktop app) users voluntarily report, and it's been really useful.
