Azure AKS - splitting node pool over multiple Availability Zones - high-availability

I'm new to Azure so please bear with me! I'm looking to create an HA (99.99%) node pool for AKS. I am more familiar with AWS and availability zones, whereby I'd split the auto scaling group over 3 AZs and that would be that.
It appears that Azure has picked up on AZs and does offer them (https://azure.microsoft.com/en-gb/blog/azure-availability-zones-now-available-for-the-most-comprehensive-resiliency-strategy/); however, I don't see any way to specify these parameters when creating an AKS cluster - https://learn.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-create
Am I missing something here? If I use an availability set, there is only a 99.95% availability target, which doesn't fulfil what I need. Basically, I want to architect so that if an AZ fails in Azure, my app keeps running...
Thanks!

Update: AKS with Availability Zone Support is now generally available: https://learn.microsoft.com/en-us/azure/aks/availability-zones
But note that the availability zone configuration can only be set at cluster creation time!
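For illustration, a minimal sketch with the current CLI (the resource group, cluster name and region are placeholders; the region must be one that supports availability zones):

    # Create an AKS cluster whose nodes are spread across zones 1, 2 and 3
    az aks create \
        --resource-group myResourceGroup \
        --name myAKSCluster \
        --location eastus2 \
        --node-count 3 \
        --zones 1 2 3 \
        --generate-ssh-keys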

Unfortunately, Azure Availability Zones do not support AKS currently. They are only available for some regions and services. For details, see the supported regions and services.

Related

Elastic search cluster on Kubernetes Cluster vs VM

I want to set up the Elastic Stack (Elasticsearch, Logstash, Beats and Kibana) for monitoring my Kubernetes cluster, which is running on on-prem bare metal. I need some recommendations on the following two approaches, i.e. which one would be more robust, fault-tolerant and production grade. Let's say I have a K8s cluster named K8-abc.
Approach 1 - Would it be good to set up the Elastic Stack outside the Kubernetes cluster?
In this approach, all the logs from pods running in the kube-system namespace and user-defined namespaces would be fetched by Beats (running on K8-abc) and put into the ES cluster, which is configured on Linux bare metal, via Logstash (which is also running on VMs). And for fetching the Kubernetes node logs, the Beats running on the respective VMs (which participate in forming K8-abc) would fetch the logs and put them into the ES cluster configured on the VMs. The thing to note here is that the VMs used for forming the ES cluster are not part of K8-abc.
Approach 2 - Would it be good to set up the Elastic Stack on the Kubernetes cluster K8-abc itself?
In this approach, all the logs from pods running in the kube-system namespace and user-defined namespaces would be sent to the Elasticsearch cluster configured on K8-abc via Logstash and Beats (both running on K8-abc). For fetching the K8-abc node logs, the Beats running on the VMs (which participate in forming K8-abc) would put the logs into the ES running on K8-abc via the Logstash running on K8-abc.
Can someone help me evaluate the pros and cons of the two approaches mentioned above? It would be helpful even if relevant links to blogs and case studies are provided.
I would be more inclined to the second solution. It has many advantages over the first one, although it may seem more complex when it comes to the initial setup. You can actually ask a similar question when it comes to migrating any other type of workload to Kubernetes. It has many advantages over VMs. To name just a few:
self-healing cluster,
service discovery and integrated load balancing,
much easier scaling (HPA) in comparison with VMs,
storage orchestration: Kubernetes allows you to automatically mount a storage system of your choice, such as local storage, public cloud providers, and many more, including a Dynamic Volume Provisioning mechanism.
All the above points can easily be applied to any other workload and may be seen as advantages of Kubernetes in general, so let's look at why to use it for implementing the Elastic Stack:
It looks like Elastic is actively promoting the use of Kubernetes on their website. See also this article.
They also provide an official Elasticsearch Helm chart, so it is already quite well supported by Elastic.
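As a minimal sketch, bootstrapping via the official chart could look like this (Helm 3 syntax; the release name is arbitrary):

    # Add the official Elastic Helm repository and install Elasticsearch
    helm repo add elastic https://helm.elastic.co
    helm repo update
    helm install elasticsearch elastic/elasticsearch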
There are probably many other reasons in favour of the Kubernetes solution that I didn't mention here. Here you can find a hands-on article about setting up highly available and scalable Elasticsearch on Kubernetes.

Lifecycle of an EC2 Container Service Instance

In my project I have a constraint where all of the traffic received will go to a certain IP. The Elastic IP feature works well for this.
My question is, considering we are using Amazon's Docker service (ECS) without autoscaling (so instances/tasks will be scaled manually), can we treat the instances created by the ECS service as we would treat normal, on-demand instances? As in, they won't be terminated/stopped unless explicitly done by a user (or an API call, or whatever).
As mentioned in the Scaling a Cluster documentation, if you created your cluster after November 24th, 2015 using either the first-run wizard or the Create Cluster wizard, then an Auto Scaling group will have been created to manage the instances backing your cluster.
For the most part, the answer to your question is yes. The instances wouldn't normally get replaced. It is also important to note that because the cluster is backed by an Auto Scaling group, Auto Scaling might replace unhealthy instances for you. If an instance fails its EC2 health checks for some reason, it will be marked as unhealthy and scheduled for replacement.
By default, my understanding is that there are no CloudWatch alarms or scaling policies affecting this Auto Scaling group, so it should only be when an instance becomes unhealthy that it would get replaced.
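If you want to verify this yourself, a hedged sketch with the AWS CLI (the Auto Scaling group name is a placeholder):

    # Inspect the Auto Scaling group backing the ECS cluster
    aws autoscaling describe-auto-scaling-groups \
        --auto-scaling-group-names my-ecs-asg

    # List attached scaling policies (expected to be empty by default)
    aws autoscaling describe-policies \
        --auto-scaling-group-name my-ecs-asg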

VMSS Autoscaling: WADPerformanceCounters

I've added the autoscaling settings to my Service Fabric template and, after deploying it, the portal shows that autoscale is configured. But what I am not able to see is the WADPerformanceCounters table (mentioned in the documentation) in my storage account. So how is the autoscaling executed without the information about the counters?
Thanks.
If autoscale cannot find the data it's configured to look at, it will set your capacity equal to the "default" configured in the autoscale rule.
As for what could explain the behavior you're seeing, here are a couple of hypotheses:
1) There are two types of metrics in Azure today: host and guest. Host metrics live in Azure-internal data stores and as such don't require a storage account to store the data. Guest metrics, however, do live in a storage account. So depending on how you added autoscale, you might have added host metrics instead of guest metrics. For more info, see this doc: https://learn.microsoft.com/en-us/azure/monitoring-and-diagnostics/insights-autoscale-common-metrics
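As a hedged sketch, configuring autoscale against a host metric (which needs no storage account) might look like this with today's CLI; the resource names are placeholders:

    # Create an autoscale setting for the scale set (host metrics only)
    az monitor autoscale create \
        --resource-group myResourceGroup \
        --resource myScaleSet \
        --resource-type Microsoft.Compute/virtualMachineScaleSets \
        --min-count 2 --max-count 10 --count 2

    # Scale out by 1 instance when average CPU exceeds 70% over 5 minutes
    az monitor autoscale rule create \
        --resource-group myResourceGroup \
        --autoscale-name myScaleSet \
        --condition "Percentage CPU > 70 avg 5m" \
        --scale out 1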
2) As you can see in this template using guest metrics, for guest metrics the scale set must have the WAD extension configured to point to the storage account; it's probably worth checking that the storage account specified in the WAD extension config is the same storage account you looked for the table in.
For host metrics, you can find the list of supported metrics here:
https://learn.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-supported-metrics#microsoftcomputevirtualmachinescalesets
For guest metrics, as mentioned above, you need to configure the Windows Azure Diagnostics (WAD) extension correctly on your VMSS. Specifically, the autoscale engine queries the WAD{value}PT1M{value} tables in your configured diagnostic storage account. These tables contain the local one-minute aggregation of the performance counter data.
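One way to check whether those tables were actually created is to list the tables in the diagnostics storage account (the account name is a placeholder; authentication via an account key or connection string is assumed):

    # List WAD* tables in the configured diagnostics storage account
    az storage table list \
        --account-name mydiagstorage \
        --query "[?starts_with(name, 'WAD')].name" \
        --output table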

Datastax Cassandra - Amazon EC2 instance - Cluster with three nodes spanning across Amazon regions

I am planning to create a cluster with three nodes, and each node will be launched in a different Amazon EC2 region.
As per the DataStax documentation, I will use Ec2MultiRegionSnitch, and the replication strategy is NetworkTopologyStrategy. Below is what I need to achieve:
Cluster size: 3 (spanning across Amazon EC2 regions).
Replication factor: 3.
Read and write consistency level: QUORUM.
Based on the above configuration, I can survive a single node loss (meaning the loss of any one Amazon region; correct me if I am wrong). With RF = 3, QUORUM needs 2 replicas to respond, so reads and writes should still succeed with one region down.
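For concreteness, a sketch of the keyspace definition this implies (the keyspace and data-center names are placeholders; with Ec2MultiRegionSnitch each region shows up as its own DC):

    # One replica per region; total RF = 3, so QUORUM (2) tolerates
    # the loss of any single region. DC names below are placeholders.
    cqlsh -e "CREATE KEYSPACE my_keyspace WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'us-east': 1, 'us-west': 1, 'eu-west': 1
    };"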
In order to achieve the above configuration, I have two option
Option 1: Using the DataStax-provided Amazon EC2 AMI image.
This option launches the instance with almost all the components needed to run Cassandra, plus some monitoring tools (OpsCenter, etc.).
But it stores all data on the EC2 instance store, so data persists only during the life of the instance, and the storage size depends on the instance type.
Option 2: Using a customised installation.
In this option, I have to launch an Amazon EC2 Ubuntu AMI, install Java, and install the DataStax Community Edition.
This option enables me to store all my data on EBS. Hence I can expand the EBS volume whenever needed, and at the same time I can restore any node using an EBS snapshot.
My Question:
Which one of the options is suitable for my needs?
Note:
I have read the documentation provided by DataStax and am very new to Cassandra, so whatever inputs you provide will be very useful to me.
Thanks
It's not true that you get the DataStax AMI only with EC2 ephemeral storage. Starting from version 2.5, they claim you can choose EBS as well: Introducing the DataStax Auto-Clustering AMI 2.5. That's a relatively easy way of getting started, which I've personally chosen.
Should you choose EBS or EC2 ephemeral storage?
The answer is: it depends...
The past (~2012-2013):
EC2 instances with ephemeral storage were the better choice. Detailed performance benchmarks over the years indicated that EBS was getting better, but attached physical drives still performed better.
The past (~2014):
The EC2 ephemeral-storage choice was still better. DataStax wrote a nice post about pricing, network and failure resilience: What is the story with AWS storage?
Present (~2016):
Instaclustr claims:
"By running Cassandra on Amazon EBS, you can run denser, cheaper Cassandra clusters with just as much availability as ephemeral storage instances."
Nice presentation here: AWS re:Invent 2015 | (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second on 60 Nodes
All in all, I suggest doing a TCO analysis, and if there isn't a big difference in price, choose EBS because of the out-of-the-box ability to take a snapshot. What's more, chances are EBS will keep improving over time.
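For example, a snapshot-based backup of a node's data volume could be as simple as the following sketch (the volume ID is a placeholder; flushing first keeps the snapshot consistent):

    # Flush memtables to disk on the node for a consistent on-disk state
    nodetool flush

    # Then snapshot the EBS data volume (volume ID is a placeholder)
    aws ec2 create-snapshot \
        --volume-id vol-0123456789abcdef0 \
        --description "cassandra node data volume backup"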

Clustering in Amazon ec2 using Starcluster

Is it possible to deploy instances from an AMI to multiple zones in Amazon EC2 using StarCluster?
Can anyone give me feedback on this, please?
I need to deploy instances to various zones in Amazon EC2.
Yes, you can. The addnode command has a flag just for that:
-z ZONE, --availability-zone=ZONE
In the load balancer, however, you can't. I have a fork of StarCluster that enables it, though. When you launch it, you specify --ignore-grp. It's only useful when you work with spot instances, as it will bid in the cheapest zone.
This fork also supports multiple zones in a VPC, which is also handy.
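As a hedged usage sketch of the flag mentioned above (the zone and cluster tag are placeholders):

    # Add a node to an existing cluster in a specific availability zone
    starcluster addnode -z us-east-1b mycluster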
