How to configure Elasticsearch snapshots using persistent volumes as the "shared file system repository" in Kubernetes (on GCP)?

I have registered the snapshot repository and have been able to create snapshots of the cluster for a single pod, using a mounted persistent volume as the "shared file system repository" backup storage.
However, in a production cluster with multiple nodes, the shared file system must be mounted on all the data and master nodes.
Hence I would have to mount the same persistent volume on every data and master node.
But the GCE persistent disks backing my persistent volumes don't support the "read write many" access mode, so I can't mount the volume on all the nodes and therefore can't register the snapshot repository. Is there a way to use persistent volumes as the backup snapshot storage for a production Elasticsearch cluster on Google Kubernetes Engine?

Reading this, I guess that you are using a self-managed cluster rather than GKE, since on GKE you cannot install agents on the master nodes and the workers get recreated whenever there is a node-pool update. Please make this clear, since it can be misleading.
There are several volume types that allow multiple readers and writers, such as CephFS, GlusterFS and NFS. You can take a look at the different volume types and their supported access modes in the Kubernetes documentation.
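To make this concrete, here is a minimal sketch of registering the shared-file-system repository once a ReadWriteMany-capable volume (for example an NFS export) is mounted at the same path inside every master and data pod. It uses Python and the plain snapshot REST API; the service URL, repository name and mount path are placeholders, and the path must also be whitelisted via path.repo in each node's elasticsearch.yml.

```python
import requests

# Placeholders: adjust the Elasticsearch HTTP endpoint, the repository name,
# and the path where the shared (e.g. NFS) volume is mounted in every
# master/data pod and whitelisted via `path.repo`.
ES_URL = "http://elasticsearch:9200"
REPO_NAME = "my_fs_backup"
SHARED_PATH = "/usr/share/elasticsearch/backups"

# Register a "shared file system" (fs) snapshot repository.
resp = requests.put(
    f"{ES_URL}/_snapshot/{REPO_NAME}",
    json={"type": "fs", "settings": {"location": SHARED_PATH}},
)
resp.raise_for_status()

# Take a snapshot of all indices and wait for it to complete.
resp = requests.put(
    f"{ES_URL}/_snapshot/{REPO_NAME}/snapshot_1",
    params={"wait_for_completion": "true"},
)
resp.raise_for_status()
print(resp.json())
```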

Related

Change persistent disk type to SSD

I have an Elasticsearch cluster running via ECK on a GKE cluster for production purposes, and in order to increase its performance I'm thinking of changing the persistent disk type to SSD. I came across solutions that suggest creating a snapshot of the disk in GCE and then creating another SSD disk from the data stored in the snapshot. I'm still concerned about whether this carries a risk of data loss, and, if I create another disk, whether my Elasticsearch (being a StatefulSet) will be able to pick it up.
Since this is a production deployment, I would advise proceeding as follows:
Create a volume snapshot (doc).
Set up a secondary cluster (doc).
Modify the deployment so that it uses an SSD (doc); see the sketch after this list.
Deploy to the second cluster.
Once this new deployment has been fully tested you can switch over the traffic.
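For step 3, here is a rough sketch of what the SSD storage class could look like on GKE, using the Python Kubernetes client. The class name fast-ssd is a placeholder, and the new deployment's volume claim templates would then reference it through storageClassName.

```python
from kubernetes import client, config

# Assumptions: kubeconfig access to the GKE cluster; "fast-ssd" is a
# placeholder name. The GCE PD provisioner accepts type "pd-ssd"
# (versus "pd-standard" for regular persistent disks).
config.load_kube_config()

ssd_class = client.V1StorageClass(
    metadata=client.V1ObjectMeta(name="fast-ssd"),
    provisioner="kubernetes.io/gce-pd",
    parameters={"type": "pd-ssd"},
    volume_binding_mode="WaitForFirstConsumer",
)
client.StorageV1Api().create_storage_class(body=ssd_class)
```

With ECK, the Elasticsearch resource's volumeClaimTemplates in the secondary deployment would then point at this storage class.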

How can I mount a hostPath into a StatefulSet?

How can I mount a hostPath into each pod in a StatefulSet when I don't know the names of the nodes in advance (so I can't pre-create a PV on each node)?
I want to set up an Elasticsearch cluster on a number of nodes, mounting each Elasticsearch data directory onto the SSD of the host node...
How can I accomplish this with a statefulset?
Instead of a HostPath volume, you should use a Local Persistent Volume for this kind of use case.
The biggest difference is that the Kubernetes scheduler understands which node a Local Persistent Volume belongs to. With HostPath volumes, a pod referencing a HostPath volume may be moved by the scheduler to a different node resulting in data loss. But with Local Persistent Volumes, the Kubernetes scheduler ensures that a pod using a Local Persistent Volume is always scheduled to the same node.
Consider using the local static provisioner for this; it has instructions for bare-metal environments.
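For illustration, here is a rough sketch of what the local static provisioner effectively sets up, written with the Python Kubernetes client: a StorageClass with no dynamic provisioner and delayed binding, plus one local PV pinned to a node. The node name, disk path, class name and size are placeholders, and with the provisioner in place you would not create these PVs by hand.

```python
from kubernetes import client, config

# Placeholders: node name "node-1", disk path "/mnt/disks/ssd1",
# class name "local-ssd", capacity "100Gi".
config.load_kube_config()

# StorageClass with no dynamic provisioner and delayed binding, so the
# scheduler chooses a node before the claim is bound to a local PV.
sc = client.V1StorageClass(
    metadata=client.V1ObjectMeta(name="local-ssd"),
    provisioner="kubernetes.io/no-provisioner",
    volume_binding_mode="WaitForFirstConsumer",
)
client.StorageV1Api().create_storage_class(body=sc)

# A local PV pinned to one node via nodeAffinity; pods using it will only
# ever be scheduled onto that node.
pv = client.V1PersistentVolume(
    metadata=client.V1ObjectMeta(name="es-data-node-1"),
    spec=client.V1PersistentVolumeSpec(
        capacity={"storage": "100Gi"},
        access_modes=["ReadWriteOnce"],
        persistent_volume_reclaim_policy="Retain",
        storage_class_name="local-ssd",
        local=client.V1LocalVolumeSource(path="/mnt/disks/ssd1"),
        node_affinity=client.V1VolumeNodeAffinity(
            required=client.V1NodeSelector(
                node_selector_terms=[
                    client.V1NodeSelectorTerm(
                        match_expressions=[
                            client.V1NodeSelectorRequirement(
                                key="kubernetes.io/hostname",
                                operator="In",
                                values=["node-1"],
                            )
                        ]
                    )
                ]
            )
        ),
    ),
)
client.CoreV1Api().create_persistent_volume(body=pv)
```

The StatefulSet's volumeClaimTemplates would then request storageClassName: local-ssd, and the scheduler keeps each pod co-located with the node that holds its volume.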

How to run stateful applications in Apache Mesos?

How can stateful containers be run inside Mesos?
According to the Mesos documentation, the sandbox can be used to store state:
With the introduction of persistent volumes, executors and tasks should never create files outside of the sandbox.
At the same time, sandbox files are scheduled for garbage collection when:
An executor is removed or terminated.
A framework is removed.
An executor is recovered unsuccessfully during agent recovery.
Is this the only way? Or can Docker containers be used to maintain state (in a similar manner to a VM)?
So for example, can a container be created and run across 2 nodes? Can such a container contain state and not be disposed of after the task is completed?
The key statement in that quote from the Mesos documentation is
With the introduction of persistent volumes...
You're correct that sandboxes can be garbage collected. However, Mesos provides a primitive called persistent volumes which allows you to create volumes that will persist across task failures and agent restarts and will not be garbage collected.
Additionally, Mesos now provides support for network storage via the Docker volume isolator. This allows you to mount network volumes using Docker volume drivers, which enables the use of a wide variety of storage back-ends.
Docker containers can store persistent state, but they must do so in either a Mesos persistent volume or a network-attached volume via the Docker volume isolator. These volumes live outside the Docker container and are mounted into the container, so they persist after the container has died.
Mesos tasks cannot be run across multiple nodes. Note that it would be possible for multiple tasks on different nodes to access the same network-attached volume via the Docker volume isolator, provided the back-end storage provider supports concurrent access.
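As a hedged sketch only: an operator can create a persistent volume out of a reserved disk resource through the master's /master/create-volumes endpoint (frameworks can also do this via offer operations). The master address, agent ID, role, principal, credentials and persistence ID below are all placeholders, and the exact field names should be checked against the persistent volume documentation for your Mesos version.

```python
import json
import requests

# Placeholders throughout: master address, agent (slave) ID, role,
# principal/password and persistence ID.
MASTER = "http://mesos-master:5050"

volume = {
    "name": "disk",
    "type": "SCALAR",
    "scalar": {"value": 512},            # size in MB
    "role": "elasticsearch",
    "reservation": {"principal": "operator"},
    "disk": {
        "persistence": {"id": "es-data-1", "principal": "operator"},
        "volume": {"container_path": "data", "mode": "RW"},
    },
}

resp = requests.post(
    f"{MASTER}/master/create-volumes",
    data={"slaveId": "<agent-id>", "volumes": json.dumps([volume])},
    auth=("operator", "password"),       # operator principal credentials
)
resp.raise_for_status()                  # the master accepts the request asynchronously
```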

How Do I Make A Persistent Volume Accessible to Multiple Kubernetes Pods?

I've done quite a bit of research and have yet to find an answer to this. Here's what I'm trying to accomplish:
I have an ELK stack container running in a pod on a k8s cluster in GCE - the cluster also contains a PersistentVolume (format: ext4) and a PersistentVolumeClaim.
In order to scale the ELK stack to multiple pods/nodes and keep persistent data in ElasticSearch, I either need to have all pods write to the same PV (using the node/index structure of the ES file system), or have some volume logic to scale up/create these PVs/PVCs.
Currently, if I spin up a second pod in the replication controller, it can't mount the PV.
So I'm wondering if I'm going about this the wrong way, and what is the best way to architect this solution to allow for persistent data in ES when my cluster/nodes autoscale.
Persistent volumes have access semantics. On GCE I'm assuming you are using a Persistent Disk, which can either be mounted as writable by a single pod or by multiple pods as read-only. If you want multi-writer semantics, you need to set up NFS or some other storage backend that lets you write from multiple pods.
In case you are interested in running NFS: https://github.com/kubernetes/kubernetes/blob/release-1.2/examples/nfs/README.md
FYI: We are still working on supporting auto-provisioning of PVs as you scale your deployment. As of now it is a manual process.
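For reference, here is a minimal sketch of an NFS-backed PersistentVolume with a matching ReadWriteMany claim, using the Python Kubernetes client. The NFS server, export path, names and sizes are placeholders, and model class names can differ slightly between client versions.

```python
from kubernetes import client, config

# Placeholders: NFS server/export, object names, capacity.
config.load_kube_config()
core = client.CoreV1Api()

# A PV backed by an NFS export, which (unlike a GCE PD) can be mounted
# read-write by many pods at once.
pv = client.V1PersistentVolume(
    metadata=client.V1ObjectMeta(name="elk-shared"),
    spec=client.V1PersistentVolumeSpec(
        capacity={"storage": "50Gi"},
        access_modes=["ReadWriteMany"],
        nfs=client.V1NFSVolumeSource(
            server="nfs-server.example.com", path="/exports/elk"
        ),
    ),
)
core.create_persistent_volume(body=pv)

# A claim that every pod in the deployment/replication controller can mount.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="elk-shared"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],
        resources=client.V1ResourceRequirements(requests={"storage": "50Gi"}),
        storage_class_name="",  # bind to the pre-created PV, no dynamic provisioning
    ),
)
core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```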

Does Elasticsearch need persistent storage when deployed on Kubernetes?

In the Kubernetes example of an Elasticsearch production deployment, there is a warning about using emptyDir, advising that it "be adapted according to your storage needs", which links to the documentation on persistent storage in Kubernetes.
Is it better to use persistent storage, which is external to the node and therefore requires (high) I/O over the network, or can we deploy a reliable Elasticsearch cluster using multiple data nodes with local emptyDir storage?
Context: We're deploying our Kubernetes on commodity hardware, and we prefer not to use SAN for the storage layer (because it doesn't seem like commodity).
The warning is there so that folks don't assume that using emptyDir provides a persistent storage layer. An emptyDir volume will persist only as long as the pod is running on the same host. If the host is replaced or its disk becomes corrupted, then all data would be lost. Using network-mounted storage is one way to work around both of these failure modes. If you want to use replicated storage instead, that works as well.
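To make the difference concrete, here is a small sketch (Python Kubernetes client, names illustrative) of the two volume choices for an Elasticsearch data directory:

```python
from kubernetes import client

# 1) emptyDir: the data lives only as long as the pod stays on that host;
#    a host replacement or disk failure loses it.
scratch = client.V1Volume(
    name="es-data",
    empty_dir=client.V1EmptyDirVolumeSource(),
)

# 2) PVC-backed volume: the data survives pod rescheduling and host
#    replacement, at the cost of I/O over the network to the storage backend.
durable = client.V1Volume(
    name="es-data",
    persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
        claim_name="es-data-claim"
    ),
)
```

With multiple data nodes and Elasticsearch index replicas, losing a host with emptyDir storage means re-replicating shards rather than losing data outright, which may or may not be acceptable depending on your requirements.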
