Support of multiple node types in an IBM Cloud Private cluster - ibm-cloud-private

IBM Cloud Private supports multiple node types (x, p, z) in the same cluster. What should we define in the Helm deployment to make sure a deployment goes to a particular node type?

IBM Cloud Private supports mixed architectures on the worker nodes.
For example, if you deploy a z application, it will only try to run on z nodes.
All the master nodes should be one architecture, either x or p.
From the ICP App Center, you can create different charts for different platforms.
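As a rough sketch of what such a platform-specific chart can do (the value name arch, the app name, and the image tag are illustrative assumptions, not taken from a real ICP chart), the target architecture can be exposed in values.yaml and templated into a nodeSelector:

# values.yaml (hypothetical): amd64 (x), ppc64le (p), or s390x (z)
arch: amd64

# templates/deployment.yaml (excerpt): pin pods to nodes of that architecture
spec:
  template:
    spec:
      nodeSelector:
        beta.kubernetes.io/arch: {{ .Values.arch }}
      containers:
        - name: my-app
          image: my-app-{{ .Values.arch }}:1.0   # architecture-specific image tag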

We found an issue with enabling 'nodeSelector' to select which platform to deploy to. This issue is tracked in both the ICP and Kubernetes communities:
ICP issue: https://github.ibm.com/IBMPrivateCloud/roadmap/issues/1737
Kubernetes Chart issue: https://github.com/kubernetes/charts/issues/1899

You can get the node information from Infrastructure-->Node.
Different platforms need different architecture-specific images, so we have to use nodeSelector to ensure pods are scheduled to the right nodes. We are now trying to enable multi-arch Docker images at https://github.com/docker-library/official-images; once this is finished, nodeSelector will no longer be needed.
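Concretely, here is a minimal Pod sketch pinned to x86-64 nodes (the image is a placeholder; beta.kubernetes.io/arch is the label the kubelet applied to nodes at the time, renamed kubernetes.io/arch in later Kubernetes releases):

apiVersion: v1
kind: Pod
metadata:
  name: example-amd64
spec:
  # only schedulable onto amd64 (x) worker nodes
  nodeSelector:
    beta.kubernetes.io/arch: amd64
  containers:
    - name: example
      image: nginx:1.15   # must be an image built for amd64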
For usage metrics of the different nodes in the cluster, you can get resource usage from the dashboard by default. If you install an add-on monitoring framework such as Prometheus on ICP, you will get more metrics.

Related

Cannot obtain cAdvisor container metrics on Windows Kubernetes nodes

I have configured a mixed-node Kubernetes cluster. Two worker nodes are Ubuntu Server 18.04.4 and two worker nodes are Windows Server 2019 Standard. I have deployed several Docker containers as deployments/pods to each set of worker nodes (.NET Core apps on Ubuntu and legacy WCF apps on Windows). Everything seems to work as advertised.
I am now at the point where I want to monitor the resources of the pods/containers. I have deployed Prometheus, kube-state-metrics, and metrics-server, and I have Prometheus scraping the nodes. For container metrics, the kubelet/cAdvisor is returning everything I need from the Ubuntu nodes, such as container_cpu_usage_seconds_total, container_cpu_cfs_throttled_seconds_total, etc. But the kubelet/cAdvisor on the Windows nodes only gives me some basic information:
http://localhost:8001/api/v1/nodes/[WINDOWS_NODE]/proxy/metrics/cadvisor
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="",kernelVersion="10.0.17763.1012",osVersion="Windows Server 2019 Standard"} 1
# HELP container_scrape_error 1 if there was an error while getting container metrics, 0 otherwise
# TYPE container_scrape_error gauge
container_scrape_error 0
# HELP machine_cpu_cores Number of CPU cores on the machine.
# TYPE machine_cpu_cores gauge
machine_cpu_cores 2
# HELP machine_memory_bytes Amount of memory installed on the machine.
# TYPE machine_memory_bytes gauge
machine_memory_bytes 1.7179398144e+10
So while the cAdvisor on the Ubuntu nodes gives me everything I ever wanted about containers and more, the cAdvisor on the Windows nodes only gives me the above.
I have examined the PowerShell scripts that install/configure the kubelet on the Windows nodes, but I don't see how I might configure a switch or config file, or whether there is a magical setting I am missing, that would enable container metrics to be published when the kubelet/cAdvisor is scraped. Any suggestions?
There is a metrics/resource/v1alpha1 endpoint, but it provides only four basic metrics.
Documentation
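As a sketch, a Prometheus scrape job for that endpoint could look like the following (the job name is arbitrary, and the token/CA paths assume Prometheus runs in-cluster with a service account):

scrape_configs:
  - job_name: kubelet-resource-metrics
    scheme: https
    metrics_path: /metrics/resource/v1alpha1   # serves only the basic CPU/memory metrics
    kubernetes_sd_configs:
      - role: node                             # one target per node's kubelet
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true               # kubelet certs are often self-signed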
I think that cAdvisor doesn't support Windows nodes properly; what you see is just an emulated interface with limited metrics.
GitHub issue

Elasticsearch on Kubernetes - 'Elastic Cloud (ECK)' vs 'Helm charts'

For the purpose of log file aggregation, I'm looking to set up a production Elasticsearch instance on an on-premise (vanilla) Kubernetes cluster.
There seem to be two main options for deployment:
Elastic Cloud (ECK) - https://github.com/elastic/cloud-on-k8s
Helm Charts - https://github.com/elastic/helm-charts
I've used the old (soon to be deprecated) helm charts successfully but just discovered ECK.
What are the benefits and disadvantages of both of these options? Any constraints or limitations that could impact long-term use?
The main difference is that the Helm charts are pretty unopinionated while the Operator is opinionated: it has a lot of best practices built in, like a hard requirement on using security. Also, the Operator Framework is built on the reconciliation loop and will continuously check whether your cluster is in the desired state or not. Helm charts are more like a package manager where you run specific commands (install a cluster in version X with Y nodes, now add 2 more nodes, now upgrade to version Z, ...).
If ECK is Cloud-on-Kubernetes, you can think of the Helm charts as Stack-on-Kubernetes. They're a way of defining exact specifications for running our Docker images in a Kubernetes environment.
Another difference is that the Helm charts are open source, while the Operator is free but uses the Elastic License (the main limitation is that you can't use it to run a paid Elasticsearch service).
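To illustrate the operator model: with ECK you declare the cluster you want, and the reconciliation loop keeps it that way. This is essentially the ECK quickstart manifest (the name and version are placeholders):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.6.0
  nodeSets:
    - name: default
      count: 3                        # the operator reconciles toward 3 nodes
      config:
        node.store.allow_mmap: false  # avoids requiring the vm.max_map_count sysctl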
1. Elastic Cloud (ECK):
ADVANTAGES
document oriented (JSON)
multilingual - the ICU plugin is used to index and tokenize multilingual content; it is an Elasticsearch plugin based on the Lucene implementation of the Unicode text segmentation standard
managing and monitoring multiple clusters
upgrading to new stack versions with ease
scaling cluster capacity up and down
changing cluster configuration
dynamically scaling local storage (includes Elastic Local Volume, a local storage driver)
scheduling backups
secure by default - encryption is enabled and clusters are protected with a strong default password right at creation time
free features - Canvas, Maps, Uptime
hot-warm-cold and custom topologies
official GKE support
free tier
DISADVANTAGES
it is not as good at being a data store as some other options like MongoDB, Hadoop, etc.; for smaller use cases it will perform fine, but if you are streaming TBs of data every day, you will find that it either chokes or loses data
its learning curve is much steeper
when you can't or won't create a production-worthy setup because of economics: for test and dev a single node will work fine, but when you move to production you should have no less than a 3-node/2-replica setup
You can find more information here: ECK.
2. Elastic Stack Kubernetes Helm Charts:
ADVANTAGES
huge community
easy to deploy and use in Kubernetes
each component in the stack takes care of a different step in the logging pipeline, and together they all provide a comprehensive and powerful logging solution for Kubernetes
rich analysis capabilities
DISADVANTAGES
difficult to maintain at scale
You can find more information here: open-source-monitoring-tools-for-kubernetes.

Elastic search cluster on Kubernetes Cluster vs VM

I want to set up the Elastic Stack (Elasticsearch, Logstash, Beats, and Kibana) to monitor my Kubernetes cluster, which is running on on-prem bare metal. I need some recommendations on the following 2 approaches, i.e. which one would be more robust, fault-tolerant, and production-grade. Let's say I have a K8s cluster named K8-abc.
Approach 1 - Would it be good to set up the Elastic Stack outside the Kubernetes cluster?
In this approach, all the logs from pods running in the kube-system namespace and user-defined namespaces would be fetched by Beats (running on K8-abc) and put into the ES cluster, which is configured on Linux bare metal, via Logstash (which is also running on VMs). For fetching the Kubernetes node logs, the Beats running on the respective VMs (the ones that form K8-abc) would fetch the logs and put them into the ES cluster configured on the VMs. The thing to note here is that the VMs used for forming the ES cluster are not part of K8-abc.
Approach 2 - Would it be good to set up the Elastic Stack on the Kubernetes cluster K8-abc itself?
In this approach, all the logs from pods running in the kube-system namespace and user-defined namespaces would be sent to the Elasticsearch cluster configured on K8-abc via Logstash and Beats (both running on K8-abc). For fetching the K8-abc node logs, the Beats running on the VMs (the ones that form K8-abc) would put the logs into the ES cluster running on K8-abc via the Logstash running on K8-abc.
Can someone help me evaluate the pros and cons of the two approaches mentioned above? Even relevant links to blogs and case studies would be helpful.
I would be more inclined to the second solution. It has many advantages over the first one, although it may seem more complex when it comes to the initial setup. You could ask a similar question about migrating any other type of workload to Kubernetes; it has many advantages over VMs. To name just a few:
self-healing cluster,
service discovery and integrated load balancing,
much easier scaling (HPA) in comparison with VMs,
storage orchestration: Kubernetes allows you to automatically mount a storage system of your choice, such as local storage or public cloud providers, including the Dynamic Volume Provisioning mechanism.
All the above points could easily apply to any other workload and may be seen as Kubernetes advantages in general, so let's look at why to use it for implementing the Elastic Stack:
It looks like Elastic is actively promoting use of Kubernetes on their website. See also this article.
They also provide an official Elasticsearch Helm chart, so it is already quite well supported by Elastic.
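As a hedged sketch (these keys follow the elastic/helm-charts elasticsearch chart; verify them against the chart version you install), a small three-node values.yaml might look like:

clusterName: logging
replicas: 3                  # three Elasticsearch pods
minimumMasterNodes: 2        # quorum for a three-node cluster
esJavaOpts: "-Xms1g -Xmx1g"
volumeClaimTemplate:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 30Gi          # persistent volume per node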
There are probably many other reasons in favour of the Kubernetes solution that I didn't mention here. Here you can find a hands-on article about setting up Highly Available and Scalable Elasticsearch on Kubernetes.
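To make approach 2 concrete, the Beats part is typically a Filebeat DaemonSet, so one agent runs on every node and picks up the container logs under /var/log. A minimal hedged sketch (the image version is an example, and the filebeat.yml that points the output at Logstash/Elasticsearch is omitted; in practice you would mount it from a ConfigMap):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      containers:
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:7.6.0
          volumeMounts:
            - name: varlog
              mountPath: /var/log   # node and container logs
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log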

Getting list of EMR Release labels via Amazon API

I need to retrieve the list of available EMR release labels in order to run my Java application, which starts an EC2 instance and executes a Hadoop job. The main problem here is that EMR release labels are specific to each region, and I need to get this list dynamically. A sample of the code used in my application:
runJobFlowRequest.setReleaseLabel("emr-4.8.0");
Does anyone have an idea where I could obtain this list of release labels programmatically via the Amazon API, in order to use it in my application?
I do not think there is any API exposed to get this list programmatically. Usually most regions will have the same release labels, and the EMR console should show the most up-to-date list for a specific region.
You can also see the list in the documentation: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-relguide-links.html

How to deploy a Cassandra cluster on two ec2 machines?

It's a known fact that it is not possible to create a cluster in a single machine by changing ports. The workaround is to add virtual Ethernet devices to our machine and use these to configure the cluster.
I want to deploy a cluster of, let's say, 6 nodes on two EC2 instances. That means 3 nodes on each machine. Is it possible? If so, what should the seed nodes' addresses be?
Is it a good idea for production?
You can use the DataStax AMI on AWS. DataStax Enterprise is a suitable solution for production.
I am not sure about your multiple-nodes-per-machine layout, because each node needs its own config files, and one node per machine is the default. I have no idea how to change that.
There are simple instructions here. When you configure the instance settings, you have to provide advanced settings for the cluster, like --clustername yourCluster --totalnodes 6 --version community, etc. You can also install Cassandra manually by installing the latest versions of Java and Cassandra.
You can build the cluster by modifying /etc/cassandra/cassandra.yaml (Ubuntu 12.04) fields like cluster_name, seeds, listen_address, broadcast_rpc_address, and num_tokens. cluster_name has to be the same for the whole cluster. Seeds are the contact-point nodes whose IPs you should add to every node's configuration. I am confused about tokens.
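For reference, the relevant cassandra.yaml fields look like this (the IPs are placeholders; set listen_address and broadcast_rpc_address to each node's own IP, and keep cluster_name and the seed list identical on every node):

cluster_name: 'yourCluster'      # must be identical across the whole cluster
num_tokens: 256                  # vnodes; avoids assigning initial_token by hand
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # same seed list on every node, e.g. one seed per EC2 instance
      - seeds: "10.0.0.1,10.0.0.2"
listen_address: 10.0.0.1         # this node's own IP
rpc_address: 0.0.0.0             # accept client connections on any interface
broadcast_rpc_address: 10.0.0.1  # the address advertised to clients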
