OpenShift master/slave - continuous integration

I have an OpenShift project with 3 pods: FE, BE1, BE2.
FE communicates with BE1 via a REST API, and BE1 with BE2 via a REST API as well.
I need to implement replication of the pods. My idea is to make a copy of the pods, so that if one pod in a set stops working, traffic is redirected to the other set.
It would look like this:
Set_1 : FEr1 -> BE1r1 -> BE2r1,
Set_2 : FEr2 -> BE1r2 -> BE2r2
FE is a React app in a container.
BE1 and BE2 are Java apps in separate containers.
I don't know how to configure this. Each container has pipeline configuration and application.template files.
Does anybody know how this can be done, or perhaps another way to achieve it?
Thanks!

If I'm understanding you correctly, your question essentially boils down to "How do I run an active-passive K8S Service?" Because if I could give you an answer on how to run an "active-passive service" for FEr1 / FEr2, then you could use the same technique for each pod in your "sets". So, to simplify my answer, I'm going to focus on how to have a single "active-passive" service. You can then extrapolate on your own how to create a chain of "active-passive" services.
I will begin with the fact that there is no native "active-passive" service object in Kubernetes or OpenShift. It's kind of antithetical to most K8S design patterns. So you are either going to have to change your architecture or you are going to have to build something fairly customized.
When trying to find a link I could share to demonstrate some of your options, I found this blog post from Paul Dally which details most of the options I was going to outline. It is a great exploration of active-passive services in Kubernetes. For convenience, I'm going to summarize it here and add some commentary. But he goes into some great detail and I'd recommend reading the original blog post from Paul.
His option #1, and his recommended approach, is essentially "don't do that". He talks about the disadvantages of an active-passive approach and why K8S patterns generally don't take an active-passive approach. I concur: your best option is just to rearchitect your services so that they are not active-passive.
His option #2 is essentially another recommendation of "don't do that". I will paraphrase his second option as "if you are in a situation where you are forced to only have one active pod, the more Kubernetes-native approach is to only run one pod". In this option you use only a single pod, but rely on Kubernetes-native Deployments/StatefulSets and liveness probes to keep that single pod available. Obviously, if your pod has a slow startup, this has some challenges.
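To illustrate what that Kubernetes-native setup looks like, here is a minimal sketch (the names, image, and /healthz endpoint are placeholder assumptions, not from the blog post):

    # Sketch: a single replica that Kubernetes itself restarts when the liveness probe fails.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: be1
    spec:
      replicas: 1                 # one active pod, no passive copy
      selector:
        matchLabels:
          app: be1
      template:
        metadata:
          labels:
            app: be1
        spec:
          containers:
          - name: be1
            image: registry.example.com/be1:latest   # placeholder image
            ports:
            - containerPort: 8080
            livenessProbe:        # restart the container if it stops answering
              httpGet:
                path: /healthz    # placeholder health endpoint
                port: 8080
              initialDelaySeconds: 30
              periodSeconds: 10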
His option #3 is basically his option of last resort. To quote his article, "Make sure that you have fully considered and thoughtfully ruled out the preceding options before continuing with an active/passive load balancing approach." He then details an approach where you use a normal K8S Deployment/StatefulSet to create your pods and a normal K8S Service to route traffic between them. But, so that they don't get active-active traffic balancing, you add an additional selector to the Service, e.g. "role=active". Since none of the pods will have this label, the selector will prevent either of the pods from being routed to.
This leads to the trick: you create an additional Deployment (and Pod) whose sole job is to maintain that "role=active" label. It's perfectly possible to patch the labels of a running pod. So he provides some pseudo-code for a script that you could run in that "failover manager" pod. Essentially the "failover manager" just checks for availability, by whatever rules you define, and then controls the failover from the active to the passive pod by removing and adding the label.
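To make the selector trick concrete, a minimal sketch of the Service side might look like this (the app name, port, and label key are illustrative assumptions; Paul's post has the actual pseudo-code for the failover script):

    # Sketch: a Service that only routes to whichever pod currently carries role=active.
    apiVersion: v1
    kind: Service
    metadata:
      name: be1
    spec:
      selector:
        app: be1
        role: active      # no pod carries this label by default, so no traffic flows yet
      ports:
      - port: 8080
        targetPort: 8080
    # The "failover manager" pod then moves the label between pods, e.g. with
    #   kubectl label pod <active-pod> role=active --overwrite
    # and removes it from a failed pod with
    #   kubectl label pod <failed-pod> role-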
He does talk about the challenges of this, including making sure it's hardened enough and has the proper permissions. I'd suggest that if you take this approach, you make it a full-fledged operator, because essentially that's what this kind of approach is: writing a custom operator.
I will also, however, mention another similar approach that I'll call option #4. Essentially what you are doing with option #3 is creating custom routing logic by patching the Service. You could just embrace that custom routing approach and deploy something like your own HAProxy. I don't have a sample config for you, but active-passive failover is a fairly well explored area for HAProxy. You are adding an additional layer of routing, but you are using more off-the-shelf functionality rather than patching Services on the fly.

Related

How to provide mutual TLS (mTLS) with Spring application in Kubernetes?

I have an interesting problem, maybe you could help me out.
There are two Spring applications, called app1 and app2. There are plenty of REST calls happening to both of the services. I need to implement a security solution where both of them can communicate with each other over REST, protected by mutual TLS (mTLS, where each app has its own certificate for the other).
Implementing it the standard way is not that hard, Spring has solutions for it (with keystores, etc.), but the twist is, I have to create it in a Kubernetes environment.
The two apps are not in the same cluster: app1 is in our cluster, but app2 is deployed in one of our partner's systems.
I am pretty new to k8s and not sure what the best method to achieve this is. Should I store the certs or the keystore(s) as Secrets? Use and configure the nginx ingress somehow, or maybe Istio would be useful? I would really like to find the optimal solution but I don't know the right way.
I would really like it if I could configure it outside my app and let k8s take care of it, but I am not sure if that is the right thing to do.
Any help would be really appreciated, some guidance to find the right path or some kind of real-life examples.
Thank you for your help!
Mikolaj has probably covered everything, but still let me add my two cents.
I don't have much experience working with Istio; however, I would also suggest checking out the Linkerd service mesh.
Step 1.
Even if you are on multi-cloud (GKE & EKS, for example), it will still work.
See the multicluster guide for details and installation steps.
Linkerd will use a trust anchor shared between the clusters so traffic can flow encrypted and is not exposed to the public internet.
You have to generate the certificate that will form a common base of trust between the clusters.
Each proxy will get a copy of the certificate and use it for validation.
The answer to your problem is more complex, as there is no one-size-fits-all solution that turns out to be the best. It all depends on what exactly you want to do and what tools you have for it. suren put it very well in the comment:
if you are still in the stage of PoC, then note that there are a couple of ways of achieving what you want. Istio would be a valid way, for example. You could have the other service in a ServiceEntry, enable mTLS and there you go. You don't even have to manage secrets for this specific scenario, as it is automatic. But there are other ways. Even with Istio there are other ways. If you are on any cloud provider, you might have some managed services as well
This is a very good comment and I would also recommend an Istio-based solution to you. First of all, check the official mTLS documentation for Istio. You will also find specific usage examples and sample configuration files there.
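For orientation, a sketch combining the strict-mTLS policy from those docs with the ServiceEntry idea from the quoted comment could look roughly like this (the namespace, hostname, port, and certificate paths are assumptions for illustration):

    # Enforce mTLS for all workloads in app1's namespace (namespace name assumed).
    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: default
      namespace: app1-ns
    spec:
      mtls:
        mode: STRICT
    ---
    # Register the partner-hosted app2 as an external service (hostname and port assumed).
    apiVersion: networking.istio.io/v1beta1
    kind: ServiceEntry
    metadata:
      name: app2-partner
      namespace: app1-ns
    spec:
      hosts:
      - app2.partner.example.com
      location: MESH_EXTERNAL
      resolution: DNS
      ports:
      - number: 8443
        name: tls
        protocol: TLS
    ---
    # Originate mutual TLS towards app2 with app1's client certificate
    # (paths assumed, mounted into the sidecar or workload).
    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: app2-partner-mtls
      namespace: app1-ns
    spec:
      host: app2.partner.example.com
      trafficPolicy:
        tls:
          mode: MUTUAL
          clientCertificate: /etc/certs/app1-cert.pem
          privateKey: /etc/certs/app1-key.pem
          caCertificates: /etc/certs/partner-ca.pem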
You also mentioned in the question that your application will run between two clusters. Take a look at this tutorial, which shows exactly how to solve this situation:
Istio injects an envoy sidecar to every pod and makes sure all the traffic goes through the envoy proxy. Envoy proxies compose the data plane. The control plane manages the Envoy sidecars. In previous versions of Istio, the control plane used to have other components, such as Pilot, Citadel, and Galley. These components got consolidated into a single binary called “istiod”. The control plane also deals with the configurations, certificates, secrets, and health checking.
For more information, also look at a related problem on Stack Overflow and another tutorial.
Take into account that in addition to Istio itself, you can use ready-made cloud solutions, for example on GKE: Configuring TLS and mTLS on the Istio ingress.
Another way might be to use Anthos Service Mesh; see, for example: mTLS.

Icinga2 checks over multiple hosts

I have an HPC cluster and I would like to monitor its health with Icinga2. I have a number of checks defined for each node in the cluster, but what I would really like is to get a notification if more than a certain percentage of the nodes are sick.
I notice that it is possible to define a dummy host which represents the cluster and use the Icinga domain-specific language to achieve something like what I'm interested in (http://docs.icinga.org/icinga2/latest/doc/module/icinga2/chapter/advanced-topics?highlight-search=up_count#access-object-attributes-at-runtime). However, this seems like an inelegant and awkward solution.
Is it possible to define this kind of "aggregate" or "meta check" over a hostgroup?
There isn't any other solution; such an example was put inside the docs and has helped quite a few users, even if it isn't that elegant. External addons such as the Business Process module can do the same but require additional configuration. The Vagrant box integrates the Icinga Web 2 module, for instance.
Other users tend to use check_multi or check_cluster for that. That isn't that elegant either.
There are no immediate plans to implement such a feature, although the idea is good and has been around for a long time.

Clustering Microservice Components

We have a set of microservices collaborating with each other in the ecosystem. We used to have occasional problems where one or more of these microservices would go down unexpectedly. Thankfully, we have some monitoring built around it which realizes this and takes corrective action.
Now, we would like to have redundancy built around each of those microservices. I'm thinking of a master/slave approach where a slave is always on standby, and when the master goes down, the slave takes over.
Should we consider using a framework as a service registry, where we register each of those microservices and allow them to be controlled? Any other suggestions on how to achieve the kind of master/slave architecture with the microservices that would enable us to have failover redundancy?
I thought about this for a couple of minutes and this is what I currently think is the best method, based on experience.
There are a couple of problems you will face with availability. The first is always having at least one endpoint up. This is easy enough to do by installing on multiple servers. In the enterprise space, you would use a name for the endpoint and then have it resolve to multiple servers (virtual or hardware). You would also load balance it.
The second is the registry. This is a very easy problem to solve with API management software. The really good software in this space is not cheap, so this is not weekend-hobbyist type software. But there are open source API management solutions out there. As I work in the enterprise space, I am very familiar with options like Apigee, CA, Mashery, etc., so I cannot recommend an open source option and feel good about myself.
You could build your own registry, if you desire. Just be careful how you design it, as a "registry of all interface points" leads to services that become more tightly coupled.

How to configure kube-proxy master_url with multiple apiservers

I'm using a cluster setup with multiple apiservers behind a load balancer for external access, installed on bare metal.
As mentioned in the High Availability Kubernetes Clusters docs, I would like to use internal load balancing by utilizing the kubernetes service within my cluster. This works fine so far, but I'm not sure what the best way to set up kube-proxy is. It obviously cannot use the service IP, since kube-proxy itself does the proxying to that IP based on data from the apiserver (master). I could use the IP of any one of the apiservers, but this would mean losing high availability. So, the only viable option I currently see is to utilize my external load balancer, but this seems somehow wrong.
Does anybody have any ideas or best practices?
This is quite an old question, but as the problem persists... here it goes.
There is a bug in the Kubernetes restclient which does not allow using more than one IP/URL, as it will always pick the first IP/URL in the list. This affects kube-proxy and also kubelet, leaving a single point of failure in those tools if you don't use a load balancer (as you did) in a multi-master setup. Using the load balancer is probably not the most elegant solution ever, but currently (I think) it is the easier one.
Another solution (which I prefer, but may not work for everyone, and it does not solve all the problems) is to create a DNS entry that will round-robin your API servers; but as pointed out in one of the links below, that only solves the load balancing, not the HA.
You can see the progress of this story in the following links:
The kube-proxy/kubelet issue: https://github.com/kubernetes/kubernetes/issues/18174
The restclient PR: https://github.com/kubernetes/kubernetes/pull/30588
The "official" solution: https://github.com/kubernetes/kubernetes/issues/18174#issuecomment-199381822
I think the way it is meant to be set up is that you have a kube-proxy on each master node, so each kube-proxy points to its master on 127.0.0.1 / localhost.
The podmaster determines which apiserver should run, which in turn makes use of the local proxy of that master.
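For illustration, that localhost setup usually boils down to giving the kube-proxy on each master a kubeconfig whose server is the local apiserver; a rough sketch (file paths and port are assumptions, not from the original answer):

    # Sketch: kubeconfig for the kube-proxy on a master node, pointing at the local apiserver.
    apiVersion: v1
    kind: Config
    clusters:
    - name: local
      cluster:
        server: https://127.0.0.1:6443            # local apiserver; port assumed
        certificate-authority: /etc/kubernetes/pki/ca.crt
    users:
    - name: kube-proxy
      user:
        client-certificate: /etc/kubernetes/pki/kube-proxy.crt
        client-key: /etc/kubernetes/pki/kube-proxy.key
    contexts:
    - name: local
      context:
        cluster: local
        user: kube-proxy
    current-context: local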

Marathon vs Aurora and their purposes

Both Marathon and Aurora are built on Mesos and are supposedly engineered for running long-running services. My questions are:
What are their differences? I have struggled in finding any good explanations regarding their key differences
Do these frameworks run anything that runs on Linux? For Marathon they state that it can run anything that "is executable in a shell" but this is sort of vague :)
Thanks!
Disclaimer: I am the VP of Apache Aurora, and have been the tech lead of the Aurora team at Twitter for ~5 years. My likely-biased opinions are my own and do not necessarily represent those of Twitter or the ASF.
Do these frameworks run anything that runs on Linux? For Marathon they state that it can run anything that "is executable in a shell" but this is sort of vague :)
Essentially, yes. Ultimately these systems are sophisticated machinery to execute shell code somewhere in a cluster :-)
What are their differences? I have struggled in finding any good explanations regarding their key differences
Aurora and Marathon do indeed offer similar feature sets, both being classified as "service schedulers". In other words, you hand us instructions for how to run your application servers, and we do our best to keep them up.
I'll offer some differences in broad strokes. When it comes to shortcomings mentioned in each, I think it's safe to say that the communities are aware and intend to fix them.
Ease of use
Aurora is not easy to install. It will likely feel like you are trailblazing while setting it up. It exposes a thrift API, which means you'll need a thrift client to interact with it programmatically (a REST-like API is coming, but is vaporware at the moment), or use our command line client. Aurora has a DSL for configuration which can be daunting, but allows you to easily share templates and common patterns as you use the system more.
Marathon, on the other hand, helps you to run 'Hello World' as quickly as possible. It has great docs to do this in many environments and there's little overhead to get going. It has a REST API, making it easier to adapt to custom tools. It uses JSON for configuration, which is easy to start with but more prone to cargo culting.
Targeted use cases
Aurora has always been designed to handle a large engineering organization. The clusters at Twitter have tens of thousands of machines and hundreds of engineers using them. It is critical to Twitter's business. As a result, we take our requirements of scale, stability, and security very seriously. We make sure to only condone features that we believe are trustworthy at scale in production (for example, we have our Docker support labeled as beta because of known issues with Docker itself and the Mesos-Docker integration). We also have features like preemption that make our clusters suitable for mixing business-critical services with prototypes and experiments.
I can't make any claim for or against Marathon's scalability. On the feature front, Marathon has built out features quickly, but this can feel bleeding-edge in practice (Docker support is a good example). This is not always due to Marathon itself, but also to layers further down the stack. Marathon does not provide preemption.
Ownership
To some, ownership and governance of a project is important. I feel that in practice it does not define the openness of a project, but for some people/companies the legal fine print can be a deal-breaker.
Marathon is owned by a company (Mesosphere)
To some, this is beneficial, to others it is not. It means that you can pay for support and features. It also means that there is something to be sold, and the project direction is ultimately decided by Mesosphere's interests.
Aurora is owned by the Apache Software Foundation
This means it is subject to the governance model of the ASF, driven by the community. Aurora does not have paying customers, and there is not currently a software shop that you can pay for development.
tl;dr If you are just getting your feet wet with running services on Mesos, I would suggest Marathon as your first port of call. It will be easier for you to get running and poke around the ecosystem. If you are forming the 'private cloud strategy' for a company, I suggest seriously considering Aurora, as it is proven and specifically designed for that.
So I've been evaluating both and this is my summary.
Aurora
[+] also handles recurring jobs
[+] finer grained, extensive file-based configuration
[+] has namespaces so multiple environments can co-exist
[-] read-only UI, no official API
[~] file-based configuration and CLI-based execution bring overhead (which can be justified by the more extensive feature set)
Marathon
[+] very easy to setup and use
[+] UI that provides control and extensive API (even with features missing from UI at the moment)
[+] event bus to listen in on api calls
[-] handles only long-running jobs
[-] does not have separate deployment-run-cleanup steps; if necessary, these need to be combined in a script or one-liner
Even though Aurora has better capabilities, I prefer Marathon due to Aurora's complexity/overhead and its lack of a UI (for control) and API
I have more experience with Marathon.
Ideological:
Marathon is a relatively well-tested product that is used in production at Airbnb. Aurora is an early Apache project (so YMMV).
Both are open source and active. Feel free to contribute pull requests or file issues!
Technical:
Marathon doesn't schedule batch tasks or cron jobs
Marathon has a friendly UI and better health indicators (in 0.8.x)
In regards to your second question, you can run any command or docker container, and Mesos will do the resource isolation for you. If you have 50% CentOS nodes and 50% Ubuntu nodes and you run a task that executes apt-get, the task will have a 50% chance of failure. Mesos and Marathon have no awareness of the actual machines.
Disclaimer: I don't have hands-on experience with Aurora, only with Marathon.
ad Q1: In a nutshell, Apache Aurora is capable of doing what Marathon + Chronos can provide, that is, scheduling both long-running services and recurring (batch) jobs; see also the Aurora user guide.
ad Q2: Yes, anything. Currently based on cgroups and Docker but hey, you can roll your own.
