To aggregate log files, I'm looking to set up a production Elasticsearch instance on an on-premise (vanilla) Kubernetes cluster.
There seem to be two main options for deployment:
Elastic Cloud on Kubernetes (ECK) - https://github.com/elastic/cloud-on-k8s
Helm Charts - https://github.com/elastic/helm-charts
I've used the old (soon to be deprecated) helm charts successfully but just discovered ECK.
What are the benefits and disadvantages of both of these options? Any constraints or limitations that could impact long-term use?
The main difference is that the Helm Charts are pretty unopinionated while the Operator is opinionated: it has a lot of best practices built in, like a hard requirement on using security. Also, the Operator Framework is built on the reconciliation loop and will continuously check whether your cluster is in the desired state. Helm Charts are more like a package manager where you run specific commands (install a cluster in version X with Y nodes, now add 2 more nodes, now upgrade to version Z, ...).
If ECK is Cloud-on-Kubernetes, you can think of the Helm charts as Stack-on-Kubernetes. They're a way of defining exact specifications for running our Docker images in a Kubernetes environment.
Another difference is that the Helm Charts are open source while the Operator is free but uses the Elastic License (the main limitation is that you can't use it to run a paid Elasticsearch service).
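To make the contrast concrete, here is a minimal sketch of a reconciliation step. It is not how ECK is implemented; `ClusterState`, `reconcile`, and the version numbers are made up for illustration. The idea is that you declare the desired state once, and the operator repeatedly compares it with what is actually running and derives the actions itself, whereas with Helm you issue each of those actions as a command.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical, simplified view of a cluster: a version and a node count."""
    version: str
    nodes: int


def reconcile(desired: ClusterState, actual: ClusterState) -> List[str]:
    """Return the actions needed to move `actual` towards `desired`.

    An operator would run this comparison over and over; an empty list
    means the cluster already matches the declared state.
    """
    actions = []
    if actual.nodes < desired.nodes:
        actions.append(f"add {desired.nodes - actual.nodes} node(s)")
    elif actual.nodes > desired.nodes:
        actions.append(f"remove {actual.nodes - desired.nodes} node(s)")
    if actual.version != desired.version:
        actions.append(f"upgrade {actual.version} -> {desired.version}")
    return actions


# Illustrative values only.
desired = ClusterState(version="7.5.0", nodes=5)
actual = ClusterState(version="7.4.0", nodes=3)
print(reconcile(desired, actual))   # actions still to perform
print(reconcile(desired, desired))  # nothing to do once the states match
```

A real operator also has to handle ordering (for example rolling upgrades) and partial failures, which this sketch ignores.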
1. Elastic Cloud (ECK):
ADVANTAGES
document oriented (JSON)
multilingual - the ICU plugin, an Elasticsearch plugin based on the Lucene implementation of the Unicode text segmentation standard, is used to index and tokenize multilingual content
managing and monitoring multiple clusters
upgrading to new stack versions with ease
scaling cluster capacity up and down
changing cluster configuration
dynamically scaling local storage (includes Elastic Local Volume, a local storage driver)
scheduling backups
secure by default - clusters have encryption enabled and are protected with a strong default password right at creation time
free features - Canvas, Maps, Uptime
hot-warm-cold and custom topologies
official GKE support
free tier
DISADVANTAGES
it is not as good at being a data store as some other options like MongoDB, Hadoop, etc. For smaller use cases, it will perform fine. If you are streaming terabytes of data every day, you will find that it either chokes or loses data
its learning curve is much steeper
it is a poor fit when you can't or won't create a production-worthy setup because of economics. For test and dev, a single node will work fine. When you move to production, you should have no less than a 3-node/2-replica
More information you can find here: ECK.
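The answer does not say why three nodes is the floor. One likely reason is majority-based master election: a cluster can only elect a master while more than half of its master-eligible nodes are reachable. The sketch below shows that arithmetic; the function names are made up for illustration and are not Elasticsearch settings.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one node")
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for n in (1, 2, 3, 5):
    print(f"{n} node(s): quorum={quorum(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With one or two nodes no failure can be tolerated, while three nodes survive the loss of one, which is why two-node clusters buy little over a single node.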
2. Elastic Stack Kubernetes Helm Charts:
ADVANTAGES
huge community
easy to deploy and use in Kubernetes
each component in the stack takes care of a different step in the logging pipeline, and together, they all provide a comprehensive and powerful logging solution for Kubernetes
rich analysis capabilities
DISADVANTAGES
difficult to maintain at scale
More information you can find here: open-source-monitoring-tools-for-kubernetes.
Related
How can we view application-specific logs while running services with docker-compose, without going into each of the containers? We have microservices written in Rails, Python, and Java running in a single docker-compose environment. What would be a cost-effective open source solution that our Operations team can use for monitoring and searching logs? We would like to avoid Elasticsearch because we don't have a big budget. We appreciate your input.
Elasticsearch provides a free tier as well (see ELK - subscriptions). You can use the Basic tier, which is listed as free and open.
You can easily set up logging infrastructure using:
ELK - Elasticsearch, Logstash, Kibana
filebeat - log shipper for Docker containers
metricbeat - Metricbeat for Docker containers
The infrastructure would scale irrespective of how many containers you have.
You can check out some basic monitoring and logging examples here - link
In addition to the free license mentioned in the other answer, most Elastic tools are available in Apache-licensed OSS versions.
Beats agents mostly support autodiscovery in docker and docker-compose, making them really easy to use on an ongoing basis, even with short-lived containers.
It would help if you specify whether the budget constraints are around a) licensing costs, b) time and effort for your Operations team, or c) something else.
I want to set up the Elastic Stack (Elasticsearch, Logstash, Beats and Kibana) for monitoring my Kubernetes cluster, which is running on on-prem bare metal. I need some recommendations on the following two approaches: which one would be more robust, fault-tolerant and production grade? Let's say I have a K8s cluster named K8-abc.
Approach 1 - Would it be good to set up the Elastic Stack outside the Kubernetes cluster?
In this approach, all the logs from pods running in the kube-system namespace and user-defined namespaces would be fetched by Beats (running on K8-abc) and put into the ES cluster, which is configured on Linux bare metal, via Logstash (which is also running on VMs). To fetch the Kubernetes node logs, the Beats running on the respective VMs (the ones that form K8-abc) would fetch the logs and put them into the ES cluster, which is configured on VMs. The thing to note here is that the VMs used to form the ES cluster are not part of K8-abc.
Approach 2 - Would it be good to set up the Elastic Stack on the Kubernetes cluster K8-abc itself?
In this approach, all the logs from pods running in the kube-system namespace and user-defined namespaces would be sent to the Elasticsearch cluster configured on K8-abc via Logstash and Beats (both running on K8-abc). To fetch the K8-abc node logs, the Beats running on the VMs (the ones that form K8-abc) would put the logs into ES running on K8-abc via Logstash, which is also running on K8-abc.
Can someone help me evaluate the pros and cons of the two approaches above? Relevant links to blogs and case studies would also be helpful.
I would be more inclined towards the second solution. It has many advantages over the first one, although it may seem more complex when it comes to the initial setup. You could ask a similar question about migrating any other type of workload to Kubernetes. It has many advantages over VMs, to name just a few:
self-healing cluster,
service discovery and integrated load balancing,
much easier scaling (HPA) in comparison with VMs,
storage orchestration: Kubernetes allows you to automatically mount a storage system of your choice, such as local storage, public cloud providers, and many more, including the Dynamic Volume Provisioning mechanism.
All the above points could easily be applied to any other workload and may be seen as Kubernetes advantages in general, so let's look at why to use it for implementing the Elastic Stack:
It looks like Elastic is actively promoting the use of Kubernetes on their website. See also this article.
They also provide an official elasticsearch helm chart so it is already quite well supported by Elastic.
Probably there are many other reasons in favour of Kubernetes solution I didn't mention here. Here you can find a hands-on article about setting up Highly Available and Scalable Elasticsearch on Kubernetes.
I am trying to deploy a production-grade Elasticsearch 6.3.0 on Kubernetes.
I came across a few articles, but I'm still not sure which approach is best.
https://github.com/pires/kubernetes-elasticsearch-cluster
It doesn't use a StatefulSet.
https://anchormen.nl/blog/big-data-services/elastic-search-deployment-kubernetes/
This is pretty old.
I am using Elasticsearch for app search.
The images from Elastic are:
docker pull docker.elastic.co/elasticsearch/elasticsearch:6.3.0
docker pull docker.elastic.co/elasticsearch/elasticsearch-oss:6.3.0
I would like to go with the -oss image, as it is the core Apache-licensed one.
Is there any good documentation on setting up a production-grade 6.3.0 version on Kubernetes?
One of the most promising new developments for running Elasticsearch on Kubernetes is the Elasticsearch Operator.
Kubernetes Operators allow for more sophistication when it comes to dealing with the requirements of complex tools (and Elasticsearch is definitely one). Especially when considering the need to avoid losing Elasticsearch data, an operator is the way to go.
We have been exploring Apache Ambari with HDP 2.2 to set up a cluster. Our backend features three environments: testing, staging and production, which is standard practice in our industry.
When we deploy a cluster in the testing environment with Ambari, what is the easiest way to get the same cluster configuration in the staging and, later, the production environment?
The initial step seems easy: you create a cluster in the testing environment using the UI and then you export the configuration as a blueprint. Subsequently, you use the exported blueprint to create a new cluster in the other environments. So far, so good.
Inevitably, we will need to change our Ambari configuration (e.g. deploy a new service, increase the heap size for the JVMs, ...). I was hoping we could just update the blueprint (using the UI or by hand) and then use the updated blueprint to update the different clusters as well. However, this does not seem possible unless you destroy and recreate the cluster, which seems a bit harsh (we don't want to lose our data).
Alternatively we could use the REST API of Ambari to do specific updates to the configuration but as configuration changes with respect to the initial blueprint will undoubtedly accumulate, this will prove unwieldy and unmaintainable over time, I am afraid.
Can you suggest a better solution for this use case?
I believe the easiest way would be to dump each service's configuration to a file, then import each of those configurations into the other clusters. This could be done simply by using the Ambari API or by using the script provided by Ambari to update configurations (/var/lib/ambari-server/resources/scripts/configs.sh).
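Before importing a dump into another cluster, it can help to see what would actually change. The following is a minimal sketch that assumes each service's configuration has been dumped as a flat JSON object mapping property names to values. That layout is an assumption, and what configs.sh or the Ambari API actually writes may need unwrapping first. `diff_configs` is a hypothetical helper, and the property values are made up.

```python
import json
from typing import Dict, Optional, Tuple


def diff_configs(
    source: Dict[str, str], target: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each differing property to (value in target, value in source).

    None means the property is missing on that side.
    """
    changes = {}
    for key in sorted(set(source) | set(target)):
        old, new = target.get(key), source.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Stand-ins for dumps taken from the testing and staging clusters.
testing = json.loads('{"dfs.replication": "3", "dfs.blocksize": "134217728"}')
staging = json.loads('{"dfs.replication": "2"}')

for key, (old, new) in diff_configs(testing, staging).items():
    print(f"{key}: {old!r} -> {new!r}")
```

Keeping the dumps in version control alongside such a diff gives a reviewable record of how the environments drift from each other over time.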
Would it be safe to share an Elasticsearch cluster (or a single-node Elasticsearch cluster) between Logstash or Graylog2 and my own application? What configuration changes or additions should be made to accommodate that? What kind of namespacing would the application require to keep its own data separate from Graylog/Logstash?
I'd rather avoid maintaining separate clusters, especially on dev boxes but also in general - if the architecture allows.
It is technically possible but not recommended. You will experience load on the logging cluster that you want to decouple from the other applications using ES.
Graylog2 supports defining an index prefix for having multiple setups running in one ES cluster.
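The same idea works for your own application: give it its own index prefix so that its indices never collide with the logging ones. Below is a minimal sketch; the `myapp` prefix, the daily naming scheme, and both helper functions are assumptions for illustration, not anything Graylog2 or Logstash require.

```python
from datetime import date


def index_name(prefix: str, day: date) -> str:
    """Build a daily index name such as 'myapp-2024.01.31'."""
    # Elasticsearch index names must be lowercase.
    if not prefix or prefix != prefix.lower():
        raise ValueError("prefix must be non-empty and lowercase")
    return f"{prefix}-{day:%Y.%m.%d}"


def owned_by(prefix: str, index: str) -> bool:
    """True if `index` belongs to the application using `prefix`."""
    return index.startswith(prefix + "-")


app_index = index_name("myapp", date(2024, 1, 31))
print(app_index)
print(owned_by("myapp", app_index))
print(owned_by("myapp", "logstash-2024.01.31"))
```

Choose prefixes so that none is a prefix of another (for example `logs` and `logs-audit`), otherwise wildcard patterns such as `logs-*` will match both applications' indices.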
We have both (Kibana and Graylog) running with a shared Elasticsearch. The only difference is the index pattern, and we have to add circuit breakers in Elasticsearch so that Kibana search queries for logs do not expand beyond a certain size.