I am contemplating setting up an ELK (Elasticsearch, Logstash, and Kibana) stack on AWS using Docker images, but I am unsure about performance and persistent storage.
If I just deploy the Docker images to the EC2 Container Service with my configuration, then I guess I also need to point to a place for persistent storage for both Logstash and Elasticsearch. Is S3 storage fast enough, or does that even matter when I am talking about logs? I am pretty sure I can live with a few minutes' delay on indexing, but when using Kibana I would like to get data back reasonably fast.
Is this a viable solution for a production setup with a couple of gigs' worth of logs daily? I expect the log volume to rise once we see the value of this and start logging more to get more insight.
So:
Is it fast enough to use S3 for storage of log files?
Is it a viable solution for a production site that produces 5+ gigs of data a day?
You might take a look at AWS Elasticsearch Service. It's Elasticsearch and Kibana as a service on AWS that you don't have to manage manually. I've just started using it for application-level events that my (desktop app) users voluntarily report, and it's been really useful.
My current set-up:
I have an AWS EC2 instance for monitoring services which runs dockerized Grafana (grafana:8.3.4) and Loki (loki:2.5.0). Logs from multiple other services running on other EC2 instances are sent to this Loki instance by dockerized Promtail running on those instances. Right now I'm using BoltDB and the filesystem as storage, so the data is stored inside the container, and I'm persisting the /loki/data folder inside the container to the local filesystem as a volume so that I don't lose any data on container restart.
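For context, a minimal sketch of the compose setup (simplified; the host path and config filename are illustrative, not my exact files):

```yaml
# Simplified sketch of the monitoring instance; host paths are illustrative.
version: "3.8"
services:
  loki:
    image: grafana/loki:2.5.0
    command: -config.file=/etc/loki/local-config.yaml   # image default config
    ports:
      - "3100:3100"
    volumes:
      - ./loki-data:/loki/data        # persist data across container restarts
  grafana:
    image: grafana/grafana:8.3.4
    ports:
      - "3000:3000"
    depends_on:
      - loki
```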
What I'm looking for:
Is it possible to rotate the data when I hit the disk usage limit on the EC2 instance? For example, move the old Loki data to remote storage like AWS S3 while Loki continues to use the filesystem as storage, and whenever I want to browse the older logs, copy that older Loki data from S3 back onto the Loki instance's filesystem so I can browse it? If this is not possible, is there another way to rotate the Loki data so it can safely be consumed later?
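From the Loki docs, I gather that pointing the boltdb-shipper store at S3 directly would look something like the snippet below (bucket name and region are placeholders), but I'm not sure whether that is the right approach versus copying files around myself:

```yaml
# Hypothetical Loki (2.5.x) storage snippet: boltdb-shipper index with
# chunks and shipped index files kept in S3. Bucket and region are placeholders.
schema_config:
  configs:
    - from: 2022-01-01
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
  aws:
    bucketnames: my-loki-bucket      # placeholder bucket
    region: us-east-1                # placeholder region

compactor:
  working_directory: /loki/compactor
  shared_store: s3
```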
Is it also possible to push old logs to Loki? For example, I've started the Grafana-Loki service today, but my services have been running and generating logs for a month. Is it possible to push those older logs, with their appropriate timestamps, to Loki?
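I found Promtail's pipeline stages, and I assume backfilling would involve a timestamp stage along these lines (the path, regex, and time format are guesses for my log format), plus relaxing Loki's reject_old_samples limits so month-old entries aren't dropped:

```yaml
# Hypothetical Promtail scrape config for backfilling old log files.
# __path__, the regex, and the timestamp format are assumptions.
scrape_configs:
  - job_name: backfill
    static_configs:
      - targets: [localhost]
        labels:
          job: backfill
          __path__: /var/log/old-service/*.log
    pipeline_stages:
      - regex:
          expression: '^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2}))'
      - timestamp:
          source: ts
          format: RFC3339
```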
As the title says, I need to get data from the nginx access log, process it, and store it in a database. Does anyone have any ideas about this? Thank you for reading this post.
You should not store nginx logs in the DB and try to read them through Laravel; it will very quickly cause performance and storage issues, especially in production. Another issue: if you have multiple servers, how would you aggregate all the logs?
Common practice is to use NoSQL for such tasks. You can set up another dedicated server where you collect all your logs and analyze them: install an exporter on every one of your servers, point it at your log file, and have it ship the logs to the central log server. You can set this up yourself using something like the ELK stack; with ELK you can use Filebeat and Logstash for this.
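For example, a minimal Filebeat config on each web server shipping the nginx access log to a central Logstash might look roughly like this (hostname and path are placeholders); Filebeat also has an nginx module that parses the access log format for you:

```yaml
# Hypothetical filebeat.yml on each web server; host and path are placeholders.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/access.log

output.logstash:
  hosts: ["logs.example.internal:5044"]   # central Logstash, placeholder host
```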
Better yet, use one of the managed services out there such as GCP Logging, Splunk, etc. You have to pay for them, but they offer a lot of benefits. Splunk provides you with an exporter; with GCP you could use fluentd. If you are using containers, you can also set up a fluentd container and shared volumes to export the logs.
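As a rough sketch of the shared-volume idea (the application image, tag, and paths are placeholders, not a specific recommendation):

```yaml
# Hypothetical docker-compose sketch: the app writes logs to a shared volume
# and a fluentd container tails that volume and forwards the logs elsewhere.
version: "3.8"
services:
  app:
    image: my-app:latest                  # placeholder application image
    volumes:
      - app-logs:/var/log/app
  fluentd:
    image: fluent/fluentd:v1.16-1         # placeholder tag
    volumes:
      - app-logs:/var/log/app:ro
      - ./fluent.conf:/fluentd/etc/fluent.conf:ro
volumes:
  app-logs:
```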
How can we view application-specific logs for services running under docker-compose, without getting into each of the containers? We have microservices running in Rails, Python, and Java in a single docker-compose environment. What would be a cost-effective open-source solution the Operations team can use for monitoring and searching logs? We would want to avoid Elasticsearch for this as we don't have a big budget; appreciate your inputs.
Elasticsearch provides a free tier as well; see the ELK subscriptions page. You can use the Basic license, which is free and open.
You can easily set up logging infrastructure using:
ELK - Elasticsearch, Logstash, Kibana
Filebeat - log shipper for Docker containers
Metricbeat - metrics shipper for Docker containers
The infrastructure would scale irrespective of how many containers you have.
You can check out some basic monitoring and logging examples here - link
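As a starting point, a single-node sketch for evaluating the free Basic tier might look roughly like this (versions are placeholders, and security is disabled only for local testing, not production):

```yaml
# Hypothetical docker-compose sketch for local evaluation only.
version: "3.8"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.9
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false      # local testing only
    ports:
      - "9200:9200"
  kibana:
    image: docker.elastic.co/kibana/kibana:7.17.9
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
```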
As well as the free license mentioned in the other answer, most Elastic tools are available in Apache-licensed OSS versions.
Beats agents mostly support autodiscovery in docker and docker-compose, making them really easy to use on an ongoing basis, even with short-lived containers.
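For instance, a Filebeat autodiscover setup with Docker hints might look roughly like this (the output host is a placeholder):

```yaml
# Hypothetical filebeat.yml using Docker autodiscover with hints;
# the Elasticsearch host is a placeholder.
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
```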
It would help if you specified whether the budget constraints are around a) licensing costs, b) time and effort for your Operations team, or c) something else.
I want to set up the Elastic Stack (Elasticsearch, Logstash, Beats, and Kibana) for monitoring my Kubernetes cluster, which runs on on-prem bare metal. I need some recommendations on the following two approaches, i.e. which one would be more robust, fault-tolerant, and production grade. Let's say I have a Kubernetes cluster named K8-abc.
Approach 1 - Would it be good to set up the Elastic Stack outside the Kubernetes cluster?
In this approach, all the logs from pods running in the kube-system namespace and user-defined namespaces would be fetched by Beats (running on K8-abc) and put into the ES cluster, which is configured on Linux bare metal, via Logstash (also running on VMs). For fetching the Kubernetes node logs, the Beats running on the respective VMs (which form K8-abc) would fetch the logs and put them into the ES cluster configured on the VMs. The thing to note here is that the VMs used for forming the ES cluster are not part of K8-abc.
Approach 2 - Would it be good to set up the Elastic Stack on the Kubernetes cluster K8-abc itself?
In this approach, all the logs from pods running in the kube-system namespace and user-defined namespaces would be sent to the Elasticsearch cluster configured on K8-abc via Logstash and Beats (both running on K8-abc). For fetching the K8-abc node logs, the Beats running on the VMs (which form K8-abc) would put the logs into the ES running on K8-abc via the Logstash running on K8-abc.
Can someone help me evaluate the pros and cons of the two approaches mentioned above? Even relevant links to blogs and case studies would be helpful.
I would be more inclined toward the second solution. It has many advantages over the first one, although it may seem more complex when it comes to the initial setup. You could actually ask a similar question about migrating any other type of workload to Kubernetes; it has many advantages over VMs. To name just a few:
self-healing cluster,
service discovery and integrated load balancing,
much easier scaling (HPA) in comparison with VMs,
storage orchestration: Kubernetes allows you to automatically mount a storage system of your choice, such as local storage or a public cloud provider, including via the Dynamic Volume Provisioning mechanism.
All the above points could easily be applied to any other workload and may be seen as Kubernetes advantages in general, so let's look at why to use it for the Elastic Stack specifically:
It looks like Elastic is actively promoting the use of Kubernetes on their website. See also this article.
They also provide an official Elasticsearch Helm chart, so it is already quite well supported by Elastic (see the sketch below).
There are probably many other reasons in favour of the Kubernetes solution that I didn't mention here. Here you can find a hands-on article about setting up Highly Available and Scalable Elasticsearch on Kubernetes.
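Following up on the Helm chart point above, a values.yaml sketch for the official elastic/elasticsearch chart might look roughly like this (replica count, resources, and the storage class are assumptions, not sizing recommendations):

```yaml
# Hypothetical values.yaml overrides for the elastic/elasticsearch Helm chart;
# replica count, resources, and storage class are placeholders.
replicas: 3
minimumMasterNodes: 2

resources:
  requests:
    cpu: "1000m"
    memory: "2Gi"
  limits:
    cpu: "2000m"
    memory: "4Gi"

volumeClaimTemplate:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-storage     # placeholder storage class for bare metal
  resources:
    requests:
      storage: 50Gi
```

The chart itself is published via the https://helm.elastic.co repository.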
I'm a little confused here. Please help me out.
I have a Spring Boot application which feeds data into Elasticsearch. This application runs on an AWS instance. Right now I do not have proper log aggregation, and I want to use the ELK stack for it.
Please help me out with these concerns...
Can I make a new log cluster on the same Elasticsearch instance and feed the log data into it? Is it a good idea?
Should I use a different Elasticsearch instance on the same machine, on a different port, and direct all the log traffic to that instance?
Should I host my Elasticsearch on a new AWS server and direct all the traffic there? Will latency cause problems at later stages when the log data feed is huge?
The set of questions you've asked will have broad and varied answers depending on factors such as volume, velocity and capacity.
Think of an ES cluster as a database. If you have multiple log files/sources, you can insert them into different indexes on the same cluster.
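For example, a log shipper can be pointed at its own index on the existing cluster; with Filebeat that might look roughly like this (host and index name are placeholders):

```yaml
# Hypothetical filebeat.yml snippet writing application logs to a dedicated
# index on the existing cluster; host and index name are placeholders.
output.elasticsearch:
  hosts: ["http://my-es-host:9200"]
  index: "app-logs-%{+yyyy.MM.dd}"

# Required when overriding the default index name:
setup.template.name: "app-logs"
setup.template.pattern: "app-logs-*"
setup.ilm.enabled: false
```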