My current set-up:
I have an AWS EC2 instance for monitoring services which runs dockerized Grafana (grafana:8.3.4) and Loki (loki:2.5.0). Logs from multiple other services running on other EC2 instances are sent to this Loki instance by dockerized Promtail running on those instances. Right now I'm using boltdb and filesystem as storage, so the data is stored inside the container, and I'm persisting the /loki/data folder of the container as a volume on the local filesystem so that I don't lose any data on container restart.
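For reference, a minimal sketch of how that volume persistence can be wired up with the Docker SDK for Python; the host path, container name and config file location are assumptions, adjust them to your setup:

```python
# Sketch: run Loki 2.5.0 with /loki/data bind-mounted to the host so chunks
# and the boltdb index survive container restarts. Host path is an assumption.
import docker

client = docker.from_env()
client.containers.run(
    "grafana/loki:2.5.0",
    command="-config.file=/etc/loki/local-config.yaml",  # default config path in the image
    volumes={"/home/ec2-user/loki-data": {"bind": "/loki/data", "mode": "rw"}},
    ports={"3100/tcp": 3100},
    name="loki",
    detach=True,
)
```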
What I'm looking for:
Is it possible to rotate the data when I hit the disk usage limit on the EC2 instance? For example, move the old Loki data to remote storage such as AWS S3 while Loki continues to use the filesystem as storage, and whenever I want to browse the older logs, copy that older data from S3 back onto the Loki instance's filesystem so that I can browse it. If this is not possible, is there another way to rotate the Loki data so it can safely be consumed later?
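One way to do that rotation manually is a cron script that ships chunk files older than some cutoff to S3 and removes them locally. A minimal boto3 sketch, assuming the /loki/data layout of the filesystem store and a hypothetical bucket name:

```python
# Sketch: move Loki chunk files older than RETENTION_DAYS to S3, then delete
# them locally. Bucket name and paths are assumptions; test on a copy first,
# and only touch old files so active chunks aren't moved out from under Loki.
import os
import time

import boto3

LOKI_DATA = "/loki/data/chunks"        # filesystem chunk store (assumed path)
BUCKET = "my-loki-archive"             # hypothetical bucket
RETENTION_DAYS = 30
cutoff = time.time() - RETENTION_DAYS * 86400

s3 = boto3.client("s3")
for root, _, files in os.walk(LOKI_DATA):
    for name in files:
        path = os.path.join(root, name)
        if os.path.getmtime(path) < cutoff:
            key = os.path.relpath(path, LOKI_DATA)
            s3.upload_file(path, BUCKET, f"loki-archive/{key}")
            os.remove(path)
```

To browse those logs again you would copy the same prefix back into the local chunk directory (the boltdb index files under /loki/data would need the same treatment). Note also that Loki supports S3 natively as a chunk store, which may be a simpler long-term option than rotating by hand.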
Is it also possible to push old logs to Loki? For example, I've started the Grafana-Loki service today, but my services have been running and generating logs for a month. Is it possible to push those older logs, with their original timestamps, to Loki?
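Loki's HTTP push API accepts an explicit per-line timestamp (nanoseconds since epoch), so backfilling is possible in principle. A minimal sketch with the requests library, assuming Loki is reachable on port 3100; note that old or out-of-order samples may be rejected unless limits such as reject_old_samples are relaxed in the Loki config:

```python
# Sketch: push a backdated log line to Loki's /loki/api/v1/push endpoint.
# The labels and URL are assumptions. Loki may reject samples older than
# reject_old_samples_max_age unless that limit is relaxed in its config.
from datetime import datetime, timezone

import requests

LOKI_URL = "http://localhost:3100/loki/api/v1/push"

ts = datetime(2022, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
ts_ns = str(int(ts.timestamp() * 1e9))  # nanoseconds since epoch, as a string

payload = {
    "streams": [
        {
            "stream": {"job": "backfill", "host": "service-1"},  # labels (assumed)
            "values": [[ts_ns, "an old log line from January"]],
        }
    ]
}

resp = requests.post(LOKI_URL, json=payload)
resp.raise_for_status()  # Loki returns 204 No Content on success
```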
Related
I've done quite a bit of research and have yet to find an answer to this. Here's what I'm trying to accomplish:
I have an ELK stack container running in a pod on a k8s cluster in GCE - the cluster also contains a PersistentVolume (format: ext4) and a PersistentVolumeClaim.
In order to scale the ELK stack to multiple pods/nodes and keep persistent data in ElasticSearch, I either need to have all pods write to the same PV (using the node/index structure of the ES file system), or have some volume logic to scale up/create these PVs/PVCs.
Currently what happens is if I spin up a second pod on the replication controller, it can't mount the PV.
So I'm wondering if I'm going about this the wrong way, and what is the best way to architect this solution to allow for persistent data in ES when my cluster/nodes autoscale.
Persistent Volumes have access semantics. On GCE, I'm assuming you are using a Persistent Disk, which can either be mounted as writable by a single pod or by multiple pods as read-only. If you want multi-writer semantics, you need to set up NFS or some other storage that lets you write from multiple pods.
In case you are interested in running NFS - https://github.com/kubernetes/kubernetes/blob/release-1.2/examples/nfs/README.md
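To stay with one language for these examples, here is a rough sketch using the official kubernetes Python client of what an NFS-backed PersistentVolume with ReadWriteMany access (so several pods can mount it read-write) might look like; the server address, export path and capacity are placeholders:

```python
# Sketch: create an NFS-backed PV with ReadWriteMany access so multiple pods
# can mount it read-write. NFS server/path and capacity are placeholder values.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pv = client.V1PersistentVolume(
    metadata=client.V1ObjectMeta(name="elk-shared-pv"),
    spec=client.V1PersistentVolumeSpec(
        capacity={"storage": "100Gi"},
        access_modes=["ReadWriteMany"],
        nfs=client.V1NFSVolumeSource(server="10.0.0.10", path="/exports/elk"),
    ),
)
core.create_persistent_volume(body=pv)
```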
FYI: We are still working on supporting auto-provisioning of PVs as you scale your deployment. As of now it is a manual process.
In the Kubernetes example of an Elasticsearch production deployment, there is a warning about using emptyDir, advising that it "be adapted according to your storage needs", which links to the documentation on persistent storage in Kubernetes.
Is it better to use persistent storage, which is external to the node and therefore needs (high) I/O over the network, or can we deploy a reliable Elasticsearch using multiple data nodes with local emptyDir storage?
Context: We're deploying our Kubernetes on commodity hardware, and we prefer not to use SAN for the storage layer (because it doesn't seem like commodity).
The warning is there so that folks don't assume that using emptyDir provides a persistent storage layer. An emptyDir volume will persist only as long as the pod is running on the same host. If the host is replaced or its disk becomes corrupted, all data would be lost. Using network-mounted storage is one way to work around both of these failure modes. If you want to use replicated storage instead, that works as well.
I am contemplating setting up an ELK (Elasticsearch, Logstash and Kibana) stack on AWS using Docker images, but I am unsure about performance and persistent storage.
If I just deploy the Docker images to the EC2 Container Service with my configuration, then I guess I also need to point to a place for persistent storage for both Logstash and Elasticsearch. Is S3 storage fast enough, or does that even matter when I am talking about logs? I am pretty sure I can live with a few minutes' delay in indexing, but using Kibana I would like to get data reasonably fast.
Is this a viable solution for a production setup with a couple of gigs worth of logs daily? I expect the log volume to rise once we see the value of this and start logging more to get more insight.
So:
Is it fast enough to use S3 for storage of log files?
Is it a viable solution for a production site that produces 5+ gigs of data a day?
You might take a look at AWS Elasticsearch Service. It's Elasticsearch and Kibana as a service on AWS that you don't have to manage manually. I've just started using it for application-level events that my (desktop app) users are voluntarily reporting, and it's been really useful.
I have cron jobs which create Hive instances using Elastic MapReduce to run my queries, processing raw data from an Amazon S3 folder and pushing the results to the database.
When a cron job runs, it creates an Amazon EC2 instance. If there are any errors in the job, the logs are stored on that VM and vanish once the EC2 instance is shut down. Is there any way to send those logs by email, or to any other server, so that I can investigate them even after the VM is shut down?
You can have a look at logrotate: it can zip your log files periodically, you can choose which log files to rotate, and it can also send the zipped file to a desired email address. http://geekospace.com/using-logrotate-to-rotate-log-files/ (Disclaimer: I wrote this article)
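Another option, since the instance already has access to S3, is to copy the job logs off the box before it shuts down. A minimal boto3 sketch, where the log directory, bucket and key prefix are assumptions:

```python
# Sketch: upload the job's log files to S3 so they survive the instance being
# terminated. Log directory, bucket and key prefix are assumed values.
import os
from datetime import datetime, timezone

import boto3

LOG_DIR = "/mnt/var/log/hadoop/steps"   # assumed log location on the instance
BUCKET = "my-job-logs"                  # hypothetical bucket
prefix = datetime.now(timezone.utc).strftime("emr-logs/%Y-%m-%d")

s3 = boto3.client("s3")
for root, _, files in os.walk(LOG_DIR):
    for name in files:
        path = os.path.join(root, name)
        key = f"{prefix}/{os.path.relpath(path, LOG_DIR)}"
        s3.upload_file(path, BUCKET, key)
```

Elastic MapReduce can also be pointed at an S3 log URI when the job flow is created, which copies step logs there automatically.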
It's a pretty quick question - I have set up a pretty simple LAMP-based website on EC2. I created an EBS volume and mounted it to the instance, where I'm saving all the MySQL data and other backups.
Now, in order to connect to the instance, I use WinSCP with the Elastic IP, from where I can view all the data.
My question is: say I terminate the instance - the backup data and MySQL data which reside on the EBS volume will still be available, right? So how can I access this data?
I mean, using WinSCP and the same Elastic IP I won't be able to connect anymore since the instance is terminated - so how can I access the data stored on the EBS volume?
Sorry for the ignorant question but just starting to play with EC2
Thanks
I'm assuming you've created an EBS-backed instance and attached a further EBS volume to it as a chunk of extra storage. In that case, when you terminate the instance, the boot EBS volume is released and deleted, but the additionally attached EBS volume is only released - it remains in the 'Available' state after the instance has been destroyed, and its data contents are left intact. You can then access whatever is on it by simply attaching it to another running instance.
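If you want to script that re-attachment, a minimal boto3 sketch (volume ID, instance ID, region and device name are placeholders); after attaching you would still mount the device from inside the new instance:

```python
# Sketch: attach a surviving EBS volume to another running instance.
# Volume ID, instance ID, region and device name are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.attach_volume(
    VolumeId="vol-0123456789abcdef0",
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",
)
ec2.get_waiter("volume_in_use").wait(VolumeIds=["vol-0123456789abcdef0"])
# then, on the instance: sudo mount /dev/xvdf <mountpoint> (device name may differ)
```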