How Do I Make A Persistent Volume Accessible to Multiple Kubernetes Pods? - elasticsearch

I've done quite a bit of research and have yet to find an answer to this. Here's what I'm trying to accomplish:
I have an ELK stack container running in a pod on a k8s cluster in GCE - the cluster also contains a PersistentVolume (format: ext4) and a PersistentVolumeClaim.
In order to scale the ELK stack to multiple pods/nodes and keep persistent data in ElasticSearch, I either need to have all pods write to the same PV (using the node/index structure of the ES file system), or have some volume logic to scale up/create these PVs/PVCs.
Currently what happens is if I spin up a second pod on the replication controller, it can't mount the PV.
So I'm wondering if I'm going about this the wrong way, and what is the best way to architect this solution to allow for persistent data in ES when my cluster/nodes autoscale.

Persistent Volumes have access semantics. on GCE I'm assuming you are using a Persistent Disk, which can either be mounted as writable to a single pod or to multiple pods as read-only. If you want multi writer semantics, you need to setup Nfs or some other storage that let's you write from multiple pods.
In case you are interested in running NFS - https://github.com/kubernetes/kubernetes/blob/release-1.2/examples/nfs/README.md
FYI: We are still working on supporting auto-provisioning of PVs as you scale your deployment. As of now it is a manual process.

Related

change persistent disk type to ssd

I have an elasticsearch running as a ECK on a GKE cluster for production purposes and in order to increase its performance I'm thinking of changing the persistent disk type to ssd. I came accross solutions that incite the need to create a snapshot of the disk in GCE and then create another ssd disk with the data stored in the snapshot. I'm still concerned whether it still has a risk of data loss and if I create another disk will my elastic be able to match it or not as it is statefulset.
Since this is a production deployment I would advise to do as follows:
Create a volume snapshot (doc).
Set up a secondary cluster (doc).
Modify the deployment so that it uses an SSD (doc).
Deploy to the second cluster.
Once this new deployment has been fully tested you can switch over the traffic.

Image pull over multiple K8s nodes

When I create a pod, a corresponding image is pulled to the node where the pod is created
Can I have those images shared among the cluster nodes, instead of being stored locally on each node?
Thanks a lot
Best Regards
It's possible if you have shared storage across all the Kubernetes nodes. However, it's not a good idea 🙅 since typically the place where images get stored is also the place where the container runtime stores its files when it's actually running the container. For example, if you are using Docker, everything gets stored under /var/lib/docker or in the case of containerd it's /var/lib/containerd
So in summary, it's possible with shared files/cluster file systems like NFS, Ceph, Glusterfs, AWS EFS, etc, but it's not a good idea in my opinion 🚫.
Update (#BMitch):
Make sure that the container storage driver you are using supports the filesystem that you are using.
✌️

Does elasticsearch need a persistent storage when deployed on kubernetes?

In the Kubernetes example of Elasticsearch production deployment, there is a warning about using emptyDir, and advises to "be adapted according to your storage needs", which is linked to the documentation of persistent storage on Kubernetes.
Is it better to use a persistent storage, which is an external storage for the node, and so needs (high) I/O over network, or can we deploy a reliable Elasticsearch using multiple data nodes with local emptyDir storage?
Context: We're deploying our Kubernetes on commodity hardware, and we prefer not to use SAN for the storage layer (because it doesn't seem like commodity).
The warning is so that folks don't assume that using emptyDir provides a persistent storage layer. An emptyDir volume will persist as long as the pod is running on the same host. But if the host is replaced or it's disk becomes corrupted, then all data would be lost. Using network mounted storage is one way to work around both of these failure modes. If you want to use replicated storage instead, that works as well.

High availability issue with rethinkdb cluster in kubernetes

I'm setting up rethinkdb cluster inside kubernetes, but it doesn't work as expected for high availability requirement. Because when a pod is down, kubernetes will creates another pod, which runs another container of the same image, old mounted data (which is already persisted on host disk) will be erased and the new pod will join the cluster as a brand new instance. I'm running k8s in CoreOS v773.1.0 stable.
Please correct me if i'm wrong, but that way it seems impossible to setup a database cluster inside k8s.
Update: As documented here http://kubernetes.io/v1.0/docs/user-guide/pod-states.html#restartpolicy, if RestartPolicy: Always it will restart the container if exits failure. It means by "restart" that it brings up the same container, or create another one? Or maybe because I stop the pod via command kubectl stop po so it doesn't restart the same container?
That's how Kubernetes works, and other solution works probably same way. When a machine is dead, the container on it will be rescheduled to run on another machine. That other machine has no state of container. Event when it is the same machine, the container on it is created as a new one instead of restarting the exited container(with data inside it).
To persistent data, you need some kind of external storage(NFS, EBS, EFS,...). In case of k8s, you may want to look into this https://github.com/kubernetes/kubernetes/blob/master/docs/design/persistent-storage.md This Github issue also has many information https://github.com/kubernetes/kubernetes/issues/6893
And in deed, that's the way to achieve HA in my opinion. Container are all stateless, they don't hold anything inside them. Any configuration needs for them should be store outside such as using thing like Consul or Etcd. By separating this like this, it's easier to restart a container
Try using PetSets http://kubernetes.io/docs/user-guide/petset/
That allows you to name your (pet) pods. If a pod is killed, then it will come back with the same name.
Summary of the petset feature is as follows.
Stable hostname
Stable domain name
Multiple pets of a similar type will be named with a "-n" (rethink-0,
rethink-1, ... rethink-n for example)
Persistent volumes
Now apps can cluster/peer together
When a pet pod dies, a new one will be started and will assume all the same "state" (including disk) of the previous one.

Strategy to persist the node's data for dynamic Elasticsearch clusters

I'm sorry that this is probably a kind of broad question, but I didn't find a solution form this problem yet.
I try to run an Elasticsearch cluster on Mesos through Marathon with Docker containers. Therefore, I built a Docker image that can start on Marathon and dynamically scale via either the frontend or the API.
This works great for test setups, but the question remains how to persist the data so that if either the cluster is scaled down (I know this is also about the index configuration itself) or stopped, and I want to restart later (or scale up) with the same data.
The thing is that Marathon decides where (on which Mesos Slave) the nodes are run, so from my point of view it's not predictable if the all data is available to the "new" nodes upon restart when I try to persist the data to the Docker hosts via Docker volumes.
The only things that comes to my mind are:
Using a distributed file system like HDFS or NFS, with mounted volumes either on the Docker host or the Docker images themselves. Still, that would leave the question how to load all data during the new cluster startup if the "old" cluster had for example 8 nodes, and the new one only has 4.
Using the Snapshot API of Elasticsearch to save to a common drive somewhere in the network. I assume that this will have performance penalties...
Are there any other way to approach this? Are there any recommendations? Unfortunately, I didn't find a good resource about this kind of topic. Thanks a lot in advance.
Elasticsearch and NFS are not the best of pals ;-). You don't want to run your cluster on NFS, it's much too slow and Elasticsearch works better when the speed of the storage is better. If you introduce the network in this equation you'll get into trouble. I have no idea about Docker or Mesos. But for sure I recommend against NFS. Use snapshot/restore.
The first snapshot will take some time, but the rest of the snapshots should take less space and less time. Also, note that "incremental" means incremental at file level, not document level.
The snapshot itself needs all the nodes that have the primaries of the indices you want snapshoted. And those nodes all need access to the common location (the repository) so that they can write to. This common access to the same location usually is not that obvious, that's why I'm mentioning it.
The best way to run Elasticsearch on Mesos is to use a specialized Mesos framework. The first effort is this area is https://github.com/mesosphere/elasticsearch-mesos. There is a more recent project, which is, AFAIK, currently under development: https://github.com/mesos/elasticsearch. I don't know what is the status, but you may want to give it a try.

Resources