Kubernetes - Apply pod affinity rule to live deployment - elasticsearch

I am guess I am just asking for confirmation really. As had some major issues in the past with our elastic search cluster on kubernetes.
Is it fine to add a pod affinity to rule to a already running deployment. This is a live production elastic search cluster and I want to pin the elastic search pods to specific nodes with large storage.
I kind of understand kubernetes but not really elastic search so dont want to cause any production issues/outages as there is no one around that could really help to fix it.
Currently running 6 replicas but want to reduce to 3 that run on 3 worker nodes with plenty of storage.
I have labelled my 3 worker nodes with the label 'priority-elastic-node=true'
This is podaffinity i will add to my yaml file and apply:
podAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: priority-elastic-node
operator: In
values:
- "true"
topologyKey: "kubernetes.io/hostname"
What I assume will happen is nothing after I apply but then when I start scaling down the elastic node replicas the elastic nodes stay on the preferred worker nodes.

Any change to the pod template will cause the deployment to roll all pods. That includes a change to those fields. So it’s fine to change, but your cluster will be restarted. This should be fine as long as your replication settings are cromulent.

Related

How can I deploy a Elasticsearch on K8S with auto scale enabled?

I am planning to deploy Elasticsearch to K8S. I deployed a StatefulSet pods which has replicas: 3 configuration. That means there will be 3 pods deployed and I am hoping these 3 nodes will work as data nodes for Elasticsearch cluster.
But I got an error which is [1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured.
It means I need to specify all three hosts ip or name on ES configuration. But I'd like to enable auto scale on the ES cluster which means I don't know all the node name when I deploy. I'd like to make ES cluster pick up new node automatically. How can I achieve that?

Can I use a single elasticsearch/kibana for multiple k8 clusters?

Do you know of any gotcha's or requirements that would not allow using a single ES/kibana as a target for fluentd in multiple k8 clusters?
We are engineering rolling out a new kubernetes model. I have requirements to run multiple kubernetes clusters, lets say 4-6. Even though the workload is split in multiple k8 clusters, I do not have a requirement to split the logging and believe it would be easier to find the logs for pods in all clusters in a centralized location. Also less maintenance for kibana/elasticsearch.
Using EFK for Kubernetes, can I point Fluentd from multiple k8 clusters at a single ElasticSearch/Kibana? I don't think I'm the first one with this thought however I haven't been able to find any discussion of doing this. Found lots of discussions of setting up efk but all that I have found only discuss a single k8 to its own elasticsearch/kibana.
Has anyone else gone down the path of using a single es/kibana to service logs from multiple kubernetes clusters? We'll plunge ahead with testing it out but seeing if anyone else has already gone down this road.
I dont think you should create an elastic instance for each kubernetes cluster, you can run a main elastic instance and index it all logs.
But even if you don`t have an elastic instance for each kubernetes client, i think you sohuld have a drp, so lets says instead moving your logs of all pods to elastic directly, maybe move it to kafka, and then split it to two elastic clusters.
Also it is very depend on the use case, if every kubernetes cluster is on different regions, and you need the pod`s logs in low latency (<1s), so maybe one elastic instance is not the right answer.
Based on [1] we can read:
Fluentd collects logs from pods running on cluster nodes, then routes
them to a central​​​​​​ized Elasticsearch.
Then Elasticsearch ingests these logs from Fluentd and stores them in a central location. It is also used to efficiently search text files.
Kibana is the UI; the user can visualize the collected logs and metrics and create custom dashboards based on queries.
There are several ways in which they can solve your dilemma:
a) Create a centralized dashboard and use each cluster’s Elasticsearch as backend. So you can see all your clusters logs in one place.
b) Create an Elasticsearch cluster and add each Elasticsearch into it. This is NOT the best option since you will duplicate your data several times, you will need to handle each index shards and you will need to fight with the split brain dilemma but it’s great for data resiliency.
c) Use another solution like an APM (New Relic, Instana, etc) to fully centralize your logs in one place.
[1] https://techbeacon.com/enterprise-it/9-top-open-source-tools-monitoring-kubernetes

Elasticsearch in production with kubernetes

I am working on product in which we are using elasticsearch for search. Our production setup is in K8S (1.7.7) and we are able to scale it pretty well. Only thing I am not sure about is whether we should be hosting elasticsearch in k8s (it can go on dedicated host as well using label selector nodes) or it is advisable to host elasticsearch on VM than docker.
Our data set size is 2-3 GB and would go further. But this is the benchmark we can consider.
And elasticsearch cluster I am planning to have ti is - 3 master (with 2 as eligible master), one client node, and one data node. We can scale datanode and client node as data increases.
Is anyone did this before? thanks in advance.
IMO the best resource for Elasticsearch on Kubernetes is https://github.com/pires/kubernetes-elasticsearch-cluster
Note that while there are official Docker containers, no official solution for orchestration is being provided at the moment. This is currently covered by the community only.
3 master (with 2 as eligible master)
This doesn't make much sense. You'll want 3 master eligible nodes with the setting discovery.zen.minimum_master_nodes: 2 and one of the 3 nodes will be the actual master.

how to sync up two ElasticSearch cluster

I need to setup a replicated ES clusterII in data centerII, the ES clusterII just need to sync up with ES clusterI which in data centerI. So far my idea is that store snapshot in custerII and restore the snapshot in order to sync up clusterI. But this way kind of having some delay. Is there any better way please.
The ability to cluster is a concept baked into ElasticSearch. However it was not designed to be scaled across datacenters because this involves network latency, but it can do it.
The idea behind ElasticSearch is to have a highly-available cluster that replicates shards within itself (i.e. a replica level of 2 in a cluster means that you have 2 copies of the data across your cluster). This means one cluster alone is its own backup.
First, if you don't have it configured as a cluster, do so by adding the following to your /etc/elasticsearch/elasticsearch.yml (or wherever you put your config):
/etc/elasticsearch/elasticsearch.yml:
cluster.name: thisismycluster
node.name: ${HOSTNAME}
Alternatively, you can make node.name whatever you want, but it's best to put in your hostname.
You also want to make sure you have the ElasticSearch service bound to a particular address and/or interface, where the interface is probably your best bet because you need a point-to-point link across those datacenters:
/etc/elasticsearch/elasticsearch.yml:
network.host: [_tun1_]
You will need to make sure you set a list of discovery hosts, which means that on every host in the cluster, if their cluster.name parameter name matches, they will be discovered and assigned to that cluster. ElasticSearch takes care of the rest, it's magical!
You may add the host by name (only if defined in your /etc/hosts or DNS across your datacenters can resolve it) or IP:
/etc/elasticsearch/elasticsearch.yml:
discovery.zen.ping.unicast.hosts: ["ip1", "ip2", "..."]
Save the config and restart ElasticSearch:
sudo systemctl restart elasticsearch
OR
sudo service elasticsearch restart
If you aren't using systemd (depending on your OS), I would highly suggest using it.
I will tell you though that doing snapshots with ElasticSearch is a terrible idea, and to avoid it at all costs because ElasticSearch built the mentality of high-availability into the application already - this is why this application is so powerful and is being heavily adopted by the community and companies alike.

How to configure kibana for multiple nodes in the same cluster?

I have a ES cluster with three nodes, and the master will produce through election. So I want to know how to make Kibana listen to the cluster rather than only for one node like using elasticsearch.url: "http://localhost:8200". Thanks
This is not possible right now, there's an open issue about it: #214 (opened for 2+ years)
Kibana 4 can currently only reverse-proxy a single ES host. You don't necessarily need to reference a master node. You could create a small client node (with master: false and data: false), add it to your cluster and configure Kibana to point at it. That way you get load-balancing for free and don't need to care if the master changes or goes down.
UPDATE
As of Kibana 6.6, it is possible to provide multiple ES hosts
Kibana suggests to "run an Elasticsearch Coordinating only node on the same machine as Kibana". Load Balancing Across Multiple Elasticsearch Nodes describes the details about how to configure the ElasticSearch Coordinating node for Kibana.
You can use a reverse-proxy/load-balancer but it gets messy. A simpler solution will be to install Elasticsearch with coordinating-only node configuration on same machine/VM/instance as Kibana with following configurations:
1. Add these lines to elasticsearch.yml for coordinating-only node.
node.master: false
node.data: false
node.ingest: false
Update the discovery.zen.ping.unicast.hosts to add the new node (coordinating-only). In your case, there will be 4 hosts.
refer to localhost: in kibana dashboard configuration file.
Hope this helps.

Resources