How do data nodes store data in ECK - elasticsearch

I'm new to ECK and I'm struggling to find information on how data nodes store data within ECK. My kube cluster has multiple ECK clusters in different namespaces. I use a node selector along with a taint to schedule the pods onto the nodes where I want them to go.
- name: data
  count: {{ .Values.dataNodeCount }}
  config:
    node.roles: ["data", "ingest", "ml", "transform"]
  volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: {{ .Values.volumeSize }}
        storageClassName: standard
  podTemplate:
    spec:
      nodeSelector:
        elasticcluster: 'true'
      tolerations:
        - key: "elasticapp"
          operator: "Equal"
          value: "some-app"
          effect: "NoSchedule"
This should almost always guarantee that each pod gets scheduled to a node carrying the some-app taint. But how is data replicated across the data nodes? If I have 3 data nodes, is the data shared between them? Does each node hold a copy of the entire dataset, so that no data is lost if a node crashes? How often is data replicated between them? Is this something that's controlled within ECK?
I've been poking through the documentation and nothing I've found gives a clear answer. Ideally I'd like each data node to hold a 1:1 copy of what all the other nodes have, so that if a pod goes down or one node's data gets corrupted, another node can take over rapidly.

Related

JMeter Worker YAML

I had a question about the properties within the JMeter Worker YAML file. Currently we are using an Azure B8ms node.
[B8ms spec table: CPU | RAM | Data Disks | Max IOPS | Temp Storage]
The properties for the JMeter worker are the following:
# JMeter Worker Deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jmeter-workers
  namespace: execution
  labels:
    jmeter_mode: slave
spec:
  replicas: 1
  serviceName: jmeter-workers-svc
  selector:
    matchLabels:
      jmeter_mode: slave
  template:
    metadata:
      labels:
        jmeter_mode: slave
    spec:
      containers:
        - name: jmslave
          image: crpplshduks.azurecr.io/devops-tools/jmeterworker:5.4.3.4
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 1099
            - containerPort: 50000
          resources:
            requests:
              cpu: 2
              memory: 2G
      imagePullSecrets:
        - name: regcred
      nodeSelector:
        type: hp
My question was specifically about the values used for these properties:
cpu: 2
memory: 2G
Currently, when running tests with 1000-2000 users, the node CPU isn't going above 20%. If I removed these properties or left them blank, would that allow the workers to use the full resources available on the node machines? What is the best practice?
From current tests with these properties, a JMeter worker can only handle 25-30 concurrent users before response times start to be skewed in Grafana: Grafana reports 1-2 minute response times, but when manually testing the same pages in the application we observe much lower response times.
Has anyone else experienced this?
As per the documentation:
By default, a container has no resource constraints and can use as much of a given resource as the host’s kernel scheduler allows
So if you remove the constraints, only the underlying Linux kernel will decide how many CPU cycles and RAM pages get allocated to the container.
The other story is JMeter itself: first of all, make sure to follow the JMeter Best Practices. Also, as of JMeter 5.5 it allocates only 1 GB of JVM heap to itself by default, so you might want to increase this as well.
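For example, a minimal sketch of the worker container with a larger heap, assuming this particular jmeterworker image starts JMeter through the standard bin/jmeter script (which reads the HEAP environment variable); the exact values are illustrative, not a recommendation:
      containers:
        - name: jmslave
          image: crpplshduks.azurecr.io/devops-tools/jmeterworker:5.4.3.4
          env:
            # HEAP is picked up by JMeter's standard startup script;
            # size it to fit inside the container's memory request/limit.
            - name: HEAP
              value: "-Xms2g -Xmx2g -XX:MaxMetaspaceSize=256m"
          resources:
            requests:
              cpu: 2
              memory: 4G
            limits:
              cpu: 4
              memory: 4G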

How to state number of cpus in autoscaler config.yaml for specific node

I am trying to find the right setting in the config.yaml for the Ray autoscaler.
I know there is max_workers, but that applies to the cluster as a whole. I want to limit the number of CPUs launched on each worker node.
for example:
worker_node:
  max_cpus: 3
head_node:
  max_cpus: 4
How do I do that?
The number of CPUs per worker is determined by the worker's configuration, which is provider-specific. This is what the node_config field is for. For example, with AWS, if you wanted to specify a 4-CPU machine you would do something like
available_node_types:
  cpu_4_ondemand:
    node_config:
      InstanceType: m4.xlarge
    min_workers: 1
    max_workers: 5
Notice the InstanceType field, which is specific to EC2 (it's 4 CPUs because that's how many CPUs an m4.xlarge instance has).
For Kubernetes, you would place a pod manifest in the node_config field. For example
node_config:
  apiVersion: v1
  kind: Pod
  metadata:
    # Automatically generates a name for the pod with this prefix.
    generateName: ray-worker-
    # Must match the worker node service selector above if a worker node
    # service is required.
    labels:
      component: ray-worker
  spec:
    containers:
      # In a valid pod spec, resources are set per container.
      - name: ray-node
        resources:
          requests:
            cpu: 4000m
            memory: 512Mi
For more information, you may be interested in taking a look at the provider-specific examples in the Ray repo. For example, here are the AWS examples: https://github.com/ray-project/ray/tree/master/python/ray/autoscaler/aws
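If the goal is only to cap how many CPUs Ray schedules onto each node, each entry under available_node_types can also carry a resources override that tells Ray how many CPUs to advertise for that node type. A minimal sketch; the node-type name is made up, and this assumes a Ray version that uses the available_node_types schema:
available_node_types:
  cpu_limited_worker:
    node_config:
      InstanceType: m4.xlarge
    # Advertise only 3 CPUs to Ray on this node type,
    # even though the instance has 4.
    resources: {"CPU": 3}
    min_workers: 1
    max_workers: 5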

How to make k8s allocate gpu/npu devices following specific rule

I have multiple GPU cards in one machine, and I need Kubernetes to allocate GPU/NPU devices according to rules I set.
For example, suppose there are 8 GPU cards with IDs 0-7, and only device0, device1, device6 and device7 are available. Now I need to create one pod with 2 devices, and these two devices must be either (device0, device1) or (device6, device7). Other combinations, such as (device0, device6), are not valid.
Is there any way to do that? I am using Kubernetes 1.18 and have implemented my own device plugin.
I don't understand why you would write a rule like this:
every device-id must be smaller than 4
If you want to limit the number of GPUs, you should be using limits and requests, which is nicely explained in Schedule GPUs.
So you can just limit the resource to 4 GPUs like so:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 4 # requesting 4 GPUs
If you have different types of GPUs on different nodes, you can use labels, as described in Clusters containing different types of GPUs.
# Label your nodes with the accelerator type they have.
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
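A pod can then pin itself to one of those labels with a plain nodeSelector. A minimal sketch reusing the label applied above; the pod name, container name and image are placeholders:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod                    # placeholder name
spec:
  containers:
    - name: cuda-container         # placeholder container
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-p100 # schedule only on nodes with this label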
If your nodes are running different versions of GPUs, then use Node Labels and Node Selectors to schedule pods to appropriate GPUs. Following is an illustration of this workflow:
As part of your Node bootstrapping, identify the GPU hardware type on your nodes and expose it as a node label.
NVIDIA_GPU_NAME=$(nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0)
source /etc/default/kubelet
KUBELET_OPTS="$KUBELET_OPTS --node-labels='alpha.kubernetes.io/nvidia-gpu-name=$NVIDIA_GPU_NAME'"
echo "KUBELET_OPTS=$KUBELET_OPTS" > /etc/default/kubelet
Specify the GPU types a pod can use via Node Affinity rules.
kind: Pod
apiVersion: v1
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/affinity: >
      {
        "nodeAffinity": {
          "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
              {
                "matchExpressions": [
                  {
                    "key": "alpha.kubernetes.io/nvidia-gpu-name",
                    "operator": "In",
                    "values": ["Tesla K80", "Tesla P100"]
                  }
                ]
              }
            ]
          }
        }
      }
spec:
  containers:
    - name: gpu-container-1
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 2
This will ensure that the pod will be scheduled to a node that has a Tesla K80 or a Tesla P100 Nvidia GPU.
You could find other relevant information on unofficial-kubernetes Scheduling gpus.

Kubernetes: Run Pods only on EC2 Nodes that have GPUs

I am setting up GPU monitoring on a cluster using a DaemonSet and NVIDIA DCGM. Obviously it only makes sense to monitor nodes that have a GPU.
I'm trying to use nodeSelector for this purpose, but the documentation states that:
For the pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). The most common usage is one key-value pair.
I intended to check if the label beta.kubernetes.io/instance-type was any of those:
[p3.2xlarge, p3.8xlarge, p3.16xlarge, p2.xlarge, p2.8xlarge, p2.16xlarge, g3.4xlarge, g3.8xlarge, g3.16xlarge]
But I don't see how to express an OR relationship using nodeSelector.
Node Affinity was the solution:
spec:
  template:
    metadata:
      labels:
        app: dcgm-exporter
      annotations:
        prometheus.io/scrape: 'true'
        description: |
          This `DaemonSet` provides GPU metrics in Prometheus format.
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: beta.kubernetes.io/instance-type
                    operator: In
                    values:
                      - p2.xlarge
                      - p2.8xlarge
                      - p2.16xlarge
                      - p3.2xlarge
                      - p3.8xlarge
                      - p3.16xlarge
                      - g3.4xlarge
                      - g3.8xlarge
                      - g3.16xlarge

How to require one pod per minion/kublet when configuring a replication controller?

I have 4 nodes (kubelets) configured with a label role=nginx
master ~ # kubectl get node
NAME LABELS STATUS
10.1.141.34 kubernetes.io/hostname=10.1.141.34,role=nginx Ready
10.1.141.40 kubernetes.io/hostname=10.1.141.40,role=nginx Ready
10.1.141.42 kubernetes.io/hostname=10.1.141.42,role=nginx Ready
10.1.141.43 kubernetes.io/hostname=10.1.141.43,role=nginx Ready
I modified the replication controller and added these lines
spec:
  replicas: 4
  selector:
    role: nginx
But when I fire it up I get 2 pods on one host. What I want is 1 pod on each host. What am I missing?
Prior to DaemonSet being available, you can also specify that your pod uses a host port and set the number of replicas in your replication controller to something greater than your number of nodes. The host port constraint will allow only one pod per host.
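A minimal sketch of that host-port workaround, assuming a plain nginx image; the names and port number are illustrative:
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 4
  selector:
    role: nginx
  template:
    metadata:
      labels:
        role: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
              hostPort: 80   # only one pod can bind this host port per node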
I was able to achieve this by modifying the labels as follows:
master ~ # kubectl get nodes -o wide
NAME LABELS STATUS
10.1.141.34 kubernetes.io/hostname=10.1.141.34,role=nginx1 Ready
10.1.141.40 kubernetes.io/hostname=10.1.141.40,role=nginx2 Ready
10.1.141.42 kubernetes.io/hostname=10.1.141.42,role=nginx3 Ready
10.1.141.43 kubernetes.io/hostname=10.1.141.43,role=nginx4 Ready
I then created 4 nginx replication controllers, each referencing one of the nginx{1|2|3|4} roles and labels.
A replication controller doesn't guarantee one pod per node, as the scheduler will find the best fit for each pod. I think what you want is the DaemonSet controller, which is still under development. Your workaround posted above would work too.
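For reference, once the DaemonSet controller became available, the same goal could be expressed directly. A minimal sketch, assuming a plain nginx image and the role=nginx node label from the question:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      role: nginx
  template:
    metadata:
      labels:
        role: nginx
    spec:
      nodeSelector:
        role: nginx          # one pod on every node labelled role=nginx
      containers:
        - name: nginx
          image: nginx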
