I need to deploy several containers to a Kubernetes cluster. The objective is to automate the deployment of Kafka, Kafka Connect, PostgreSQL, and others. Some of them already provide a Helm operator that we could use. So my question is, can we somehow use those helm operators inside our operator? If so, what would be the best approach?
The only method I can think of so far is calling the helm setup console commands from within a deployment app.
Another approach, without using those helm files, would be to implement the functionality of each operator in my own operator, which doesn't seem to make much sense since what I need has already been developed and is publicly available.
I'm very new to operator development so please excuse me if this is a silly question.
Edit:
The main purpose of the operator is to deploy X databases. Along with that, we would like to have a single operator/bundle that deploys the whole system right away. Does it even make sense to use an operator for bundling, even if we have additional tasks for some of the containers? With this, the user would specify in the yaml file:
databases:
  - type: "postgres"
    name: "users"
  - type: "postgres"
    name: "purchases"
and 2 PostgreSQL databases would be created. Those databases could then be referenced in other yaml files or further down in the same yaml file. Concrete case: the information from the databases will be pulled by Debezium (another container), so Debezium needs to know their addresses. So the operator should create a service and associate the service address with the database name.
This is part of an ETL system. The idea is that the operator would allow an easy deployment of the whole system by taking care of most of the configuration.
With this in mind, we were wondering whether it was possible to pick up existing Helm operators (or another kind of operator) and deploy them with small modifications to their configuration, such as different ports for different databases.
But after reading F1ko's reply I gained new perspectives. Perhaps this is not possible with an operator as initially expected?
Edit2: Clarification of edit1.
Just for clarification purposes:
Helm is a package manager with which you can install an application onto the cluster in a bundled manner: it basically provides you with all the necessary YAMLs, such as ConfigMaps, Services, Deployments, and whatever else is needed to get the desired application up and running in a proper way.
An Operator is essentially a controller. In Kubernetes, there are lots of different controllers that define the "logic" whenever you do something (e.g. the replication-controller adds more replicas of a Pod if you decide to increase the replicas field). There are simply too many controllers to list them all and run them individually, which is why they are compiled into a single binary known as the kube-controller-manager.
Custom-built controllers are called operators for easier distinction. These operators simply watch over the state of certain "things" and are going to perform an action if needed. Most of the time these "things" are going to be CustomResources (CRs) which are essentially new Kubernetes objects that were introduced to the cluster by applying CustomResourceDefinitions (CRDs).
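To make that concrete, here is a minimal, purely illustrative CRD sketch; the group example.com, the Database kind, and the spec fields are invented for this example and are not tied to any real operator:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # must be <plural>.<group>
  name: databases.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                type:
                  type: string
                name:
                  type: string

Applying such a definition is what makes kind: Database objects valid in the cluster; an operator then watches those objects and reconciles them.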
That being said, it is not uncommon to use Helm to deploy operators. However, try to avoid the term "helm operator", as it actually refers to a very specific operator and may lead to confusion in the future: https://github.com/fluxcd/helm-operator
So my question is, can we somehow use those helm operators inside our operator?
Although you may build your own operator with the operator-sdk, which then lets you deploy other operators or trigger certain events in them (e.g. by editing their CRDs), there is no reason to do so.
The only method I can think of so far is calling the helm setup console commands from within a deployment app.
Most likely what you are looking for is a proper CI/CD workflow.
Simply commit the helm chart and the values.yaml files that you use during helm install to a Git repository and have a CI/CD tool (such as GitLab) deploy them to your cluster every time you make a new commit.
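As a rough sketch only (the image, chart path, and release name below are assumptions, not something from your setup), such a GitLab job could look like this:

# .gitlab-ci.yml sketch; adjust image, paths, and kubeconfig handling to your setup
stages:
  - deploy

deploy-kafka:
  stage: deploy
  image: alpine/helm:3.12.0   # any image that ships the helm CLI works
  script:
    # assumes the cluster credentials are provided via a KUBECONFIG CI/CD variable
    - helm upgrade --install kafka ./charts/kafka -f ./charts/kafka/values.yaml --namespace kafka --create-namespace
  only:
    - main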
Update: As the OP edited their question and left a comment, I decided to update this post:
The main purpose of the operator is to deploy X databases. Along with that we would like to have a single operator/bundle that deploys the whole system right away.
Do you think it makes sense to bundle operators together in another operator, as one would do with Helm?
No, it does not make sense at all. That's exactly what Helm is there for. With Helm you can bundle things; you can even bundle multiple helm charts together, which may be what you are actually looking for. You can have one helm chart that passes the needed values down to the actual operator helm charts and therefore use something like the service name in multiple locations.
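For illustration, a sketch of such an umbrella chart; the chart names, versions, repositories, and value keys below are placeholders you would replace with the charts you actually use:

# Chart.yaml of a hypothetical umbrella chart
apiVersion: v2
name: etl-system
version: 0.1.0
dependencies:
  - name: postgresql
    version: "12.x.x"
    repository: "https://charts.bitnami.com/bitnami"
  - name: strimzi-kafka-operator
    version: "0.38.x"
    repository: "https://strimzi.io/charts/"

# values.yaml of the umbrella chart: everything under a dependency's name
# is passed down to that subchart, so you control its configuration in one place
postgresql:
  fullnameOverride: users-db   # the service name you can then reference elsewhere
  primary:
    service:
      ports:
        postgresql: 5433

helm dependency update pulls the subcharts in, and a single helm install of the umbrella chart deploys the whole bundle.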
In the case of operators inside operators, is it still necessary to configure every sub-operator individually when configuring the operator?
As mentioned above, it does not make any sense to do it like that; it is just an over-engineered approach. However, if you truly want to go with the operator approach, there are basically two approaches you could take:
Write an operator that configures the other operators by changing their CRs, ConfigMaps, etc. (see the sketch after this list); with this approach you will have a somewhat lightweight operator, but you will have to ensure it stays compatible at all times with all the different operators you want it to interact with (whenever they move to a new apiVersion with breaking changes, introduce new CRs, or anything of that kind, you will have to adapt again).
Extract the entire logic from the existing operators into your own operator (i.e. rebuild something that already exists); with this approach you will have a big monolithic application that will be a huge pain to maintain, as you will continuously have to update your code whenever there is an update in the upstream operator.
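To illustrate what the first approach would mean in code, here is a rough controller-runtime sketch that edits another operator's CR; the Strimzi Kafka CR and the spec.kafka.replicas path are just an example of an upstream resource, so treat them as assumptions and check the real CRD:

// Sketch only: bump the broker replica count on a "Kafka" CR that another operator owns.
package controllers

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func scaleKafka(ctx context.Context, c client.Client, namespace, name string, replicas int64) error {
	u := &unstructured.Unstructured{}
	u.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "kafka.strimzi.io", // example group/version/kind of an upstream CR
		Version: "v1beta2",
		Kind:    "Kafka",
	})
	// Fetch the CR that the other operator reconciles.
	if err := c.Get(ctx, client.ObjectKey{Namespace: namespace, Name: name}, u); err != nil {
		return err
	}
	// Change only the desired state; the upstream operator does the actual work.
	if err := unstructured.SetNestedField(u.Object, replicas, "spec", "kafka", "replicas"); err != nil {
		return err
	}
	return c.Update(ctx, u)
}

The coupling mentioned above is exactly this GVK and field path: any upstream change to them means adapting your operator.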
Hopefully it is clear by now that building your own operator for "operating" other operators comes with a lot of painful dependencies and should not be the way to go.
Is it possible to deploy different configurations of images? Such as databases configured with different ports?
Good operators and helm charts let you do that out of the box, either via a respective CR / ConfigMap or a values.yaml file; however, that depends on which solutions you are going to use. So in general the answer is: yes, it is possible if it is supported.
I have an OpenShift project with 3 pods: FE, BE1, BE2.
FE communicates with BE1 via a REST API, and BE1 with BE2 via a REST API as well.
I need to implement replication of the pods. My idea is to make a copy of the pods, and if one of the pods in a set stops working, traffic will be redirected to the other set.
It will be like this:
Set_1 : FEr1 -> BE1r1 -> BE2r1,
Set_2 : FEr2 -> BE1r2 -> BE2r2
FE is a React app in a container.
BE1 and BE2 are Java apps in separate containers.
I don't know how to configure it. Every container contains pipeline configuration and application.template files.
Does somebody know how this could be done, or maybe another way to achieve it?
Thanks!
If I'm understanding you correctly, your question essentially boils down to "How do I run an active-passive K8S Service?" Because if I could give you an answer on how to run an "active-passive service" for FEr1 / FEr2, then you could use the same technique for each pod in your "sets". So, to simplify my answer, I'm going to focus on how to have a single "active-passive" service. You can then extrapolate on your own how to create a chain of "active-passive" services.
I will begin with the fact that there is no native "active-passive" Service object in Kubernetes or OpenShift. It's kind of antithetical to most K8S design patterns. So you are either going to have to change your architecture or build something fairly customized.
When trying to find a link I could share to demonstrate some of your options, I found this blog post from Paul Dally which details most of the options I was going to outline. It is a great exploration of active-passive services in Kubernetes. For convenience, I'm going to summarize here and add some commentary, but he goes into great detail and I'd recommend reading the original blog post from Paul.
His option #1, and his recommended approach, is essentially "don't do that". He talks about the disadvantages of an active-passive approach and why K8S patterns generally don't take an active-passive approach. I concur: your best option is just to rearchitect your services so that they are not active-passive.
His option #2 is essentially another recommendation of "don't do that". I will paraphrase his second option as "if you are in a situation where you are forced to only have one active pod, the more Kubernetes-native approach would be to only run one pod". In this option you use only a single pod, but use Kubernetes-native Deployments/StatefulSets and liveness probes to keep the single pod available. Obviously, if your pod has slow startup, this has some challenges.
His option #3 is basically his option of last resort. To quote his article, "Make sure that you have fully considered and thoughtfully ruled out the preceding options before continuing with an active/passive load balancing approach." But then he details an approach where you could use a normal K8S Deployment/StatefulSet to create your pods and a normal K8S Service to route traffic between them. But, so that they don't have active-active traffic balancing you add an additional selector to the service e.g. "role=active". Since none of the pods will have this label, the selector will prevent either of the pods from being routed to.
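A sketch of what such a Service could look like for one of your backends (the app label and port are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: be1
spec:
  selector:
    app: be1        # matches both BE1r1 and BE1r2
    role: active    # no pod carries this label by default, so no traffic flows yet
  ports:
    - port: 8080
      targetPort: 8080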
But this leads to the trick: you create an additional Deployment (and Pod) whose sole job is to maintain that "role=active" label. It's perfectly possible to patch the labels of a running pod. So he provides some pseudo-code for a script that you could run in that "failover manager" pod. Essentially the "failover manager" is just checking for availability, by whatever rules you define, and then controls the failover from the active to passive pod by deleting and adding the label.
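To illustrate the kind of call that failover manager makes, here is a client-go sketch of moving the label (pod names and namespace are placeholders; the health-checking logic is left out):

// Sketch: shift the "role=active" label from the failed pod to the standby pod.
package failover

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

func promote(ctx context.Context, cs kubernetes.Interface, ns, failedPod, standbyPod string) error {
	// Setting the label to null in a merge patch removes it from the unhealthy pod.
	removeActive := []byte(`{"metadata":{"labels":{"role":null}}}`)
	if _, err := cs.CoreV1().Pods(ns).Patch(ctx, failedPod, types.MergePatchType, removeActive, metav1.PatchOptions{}); err != nil {
		return err
	}
	// Adding it to the standby pod makes the Service selector match that pod instead.
	addActive := []byte(`{"metadata":{"labels":{"role":"active"}}}`)
	_, err := cs.CoreV1().Pods(ns).Patch(ctx, standbyPod, types.MergePatchType, addActive, metav1.PatchOptions{})
	return err
}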
He does talk about the challenges of this, including making sure it's hardened enough and has the proper permissions. I'd suggest that if you take this approach, you make it a full-fledged operator, because essentially that's what this kind of approach is: writing a custom operator.
I will also, however, mention another similar approach that I'll call option #4. Essentially what you are doing with option #3 is creating custom routing logic by patching the service. You could just embrace that custom routing approach and deploy something like your own HAProxy. I don't have a sample config for you, but active-passive failover is a fairly well explored area for HAProxy. You are adding an additional layer of routing, but you are using more off-the-shelf functionality rather than patching services on the fly.
While building operators using the OperatorSDK Go framework, we end up creating Kubernetes resources such as Deployments, Services, etc. programmatically by leveraging structs from k8s modules/packages. Compared to creating these manifests in yaml/json formats, this is quite cumbersome and requires quite a bit of coding. And any changes to the manifest would require code changes, and the new version of the operator needs to be rolled out.
I am wondering whether existing templating/overlay tools such as Helm or Kustomize can be used for building these k8s resources within the operator code. This would also enable you to externalise the manifest/template files from the operator code. I couldn't find any good examples of how these tools can be used as modules/libraries within a Go program. Please provide any pointers, suggestions or alternate approaches.
Related question: Kubernetes operator create Deployment using yaml template
This talks about how you can read a yaml file and unmarshal it into a Deployment object. Here, I would still need to code templating/overlay logic within the operator.
You can use the Helm engine programmatically by calling engine.Render:
func Render(chrt *chart.Chart, values chartutil.Values) (map[string]string, error)
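A rough sketch of how that can be wired up with the Helm v3 Go packages (the chart path, release name, and values here are placeholders for whatever your operator ships):

// Sketch: render a chart's templates in-process and get back plain YAML strings.
package main

import (
	"fmt"

	"helm.sh/helm/v3/pkg/chart/loader"
	"helm.sh/helm/v3/pkg/chartutil"
	"helm.sh/helm/v3/pkg/engine"
)

func main() {
	// Load a chart from a local directory (could be bundled with the operator image).
	chrt, err := loader.Load("./charts/my-app")
	if err != nil {
		panic(err)
	}

	// Values you would normally pass via values.yaml or --set.
	vals := map[string]interface{}{
		"image": map[string]interface{}{"tag": "1.2.3"},
	}

	// Merge the values with the release metadata the templates expect (.Release.*, .Capabilities.*).
	renderVals, err := chartutil.ToRenderValues(chrt, vals, chartutil.ReleaseOptions{
		Name:      "my-release",
		Namespace: "default",
		IsInstall: true,
	}, chartutil.DefaultCapabilities)
	if err != nil {
		panic(err)
	}

	// Render returns a map of template file name -> rendered manifest string.
	manifests, err := engine.Render(chrt, renderVals)
	if err != nil {
		panic(err)
	}
	for name, manifest := range manifests {
		fmt.Printf("--- %s ---\n%s\n", name, manifest)
	}
}

From there you can unmarshal each rendered manifest into a typed or unstructured object and create it with your operator's client, which keeps the templates external to the Go code.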
I have multiple instances of the same engine running as Windows services on the same environment and system that just have slightly different connection strings as they point to different queues. Other than a couple of lines in the config (XML), the rest of the application is exactly the same (config and binaries). When config changes are made, this is done to all instances, which is time consuming, so I am doing some research into the best method of managing the config files in a scalable and version-controlled way. Currently I use a batchfile to copy the default engine directory and config over and then find and replace the individual strings. I'd prefer to have a template config that can be updated that pulls in set variables for the connection strings depending on the instance and environment. I understand that this may be possible using chef, puppet or ansible but to my understanding these are more for system configuration as opposed to individual application files? Does anyone know if this is possible with gitlab or AWS? Before committing to the learning curve I'm trying to discern if one of the aforementioned config management tools would be overkill for this scenario or a realistic solution?
I understand that this may be possible using chef, puppet or ansible but to my understanding these are more for system configuration as opposed to individual application files?
Managing individual files, including details of their contents, is a common facet of configuration management. Chef, Puppet, and Ansible can all do this with relative ease.
Does anyone know if this is possible with gitlab or AWS?
No doubt, someone does. And I anticipate, but cannot confirm, that the answer is "yes" for both.
Before committing to the learning curve I'm trying to discern if one of the aforementioned config management tools would be overkill for this scenario or a realistic solution?
A configuration management system would almost certainly be overkill if the particular task you describe is the only thing you are considering them for.
Currently I use a batchfile to copy the default engine directory and config over and then find and replace the individual strings. I'd prefer to have a template config that can be updated that pulls in set variables for the connection strings depending on the instance and environment.
In the first place, if it ain't broke, don't fix it. On the other hand, if it is broke, and switching to a template-based approach is a reasonable method to resolve the issue, then you can certainly implement that with a for-purpose local script without bringing in all the apparatus of a configuration management system.
In the event that you do decide that the current mechanism needs to be replaced, do, for goodness sake, ditch batchfile. It's one of the worst scripting languages ever inflicted on humanity. PowerShell would be a natural replacement on Windows, but you might also consider Python, or pretty much any programming language you know.
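Since the language hardly matters, here is a minimal sketch of the template-based idea (written in Go's text/template purely for illustration; the file names, fields, and connection strings are invented for the example):

// Sketch: render one shared config template per instance; all names and values are examples.
package main

import (
	"os"
	"text/template"
)

// Settings that differ between the otherwise identical service instances.
type InstanceConfig struct {
	QueueName        string
	ConnectionString string
}

func main() {
	// config.xml.tmpl would contain placeholders such as {{.ConnectionString}}.
	tmpl, err := template.ParseFiles("config.xml.tmpl")
	if err != nil {
		panic(err)
	}

	instances := []InstanceConfig{
		{QueueName: "orders", ConnectionString: "Server=queue-a;Queue=orders"},
		{QueueName: "billing", ConnectionString: "Server=queue-a;Queue=billing"},
	}

	for _, inst := range instances {
		out, err := os.Create("engine-" + inst.QueueName + ".config")
		if err != nil {
			panic(err)
		}
		// Fill the shared template with this instance's values.
		if err := tmpl.Execute(out, inst); err != nil {
			panic(err)
		}
		out.Close()
	}
}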
Kubernetes has a rapidly evolving API and I am trying to find best practices, recommendations, or really any kind of guidance about how to write Go software that gracefully handles supporting its evolving API and supports multiple versions simultaneously. I am sure I am not the first person to attempt this, but so far I have not found any guidance about Kubernetes specifically, and what I have read about polymorphism in Go has not inspired a great solution yet.
Kubernetes is written in Go and provides Go packages like k8s.io/api/extensions/v1beta1 and k8s.io/api/networking/v1beta1. Kubernetes resources, for example Ingress, are first released in one API group (extensions) and as they become more mature, get moved to another API group (networking) and can also change versions (e.g. go from v1beta1 to plain v1). Kubernetes also provides k8s.io/client-go for interacting with a Kubernetes cluster.
I am an experienced object-oriented (and other types of) programmer, but fairly new to Go and completely new to the Kubernetes packages. What I want to accomplish is a program architecture that allows me to write code once and have it work on any version of the Kubernetes resource, at least as long as the resource contains all the features I care about. In a typical object-oriented environment, I would create a base Ingress class and have all these various versions derive from it, and package up operations so that I could just work on Ingress everywhere. My sense is that Go intends for people to take a different approach, and in any case there are complications because of the client/server aspect.
Client/server and APIs
My Go program is a client of the Kubernetes server. Various versions of the server will support various versions of the Kubernetes API, and therefore various versions of the Ingress resource. So my first problem is that I have to do something like this to get a list of all the Ingresses:
ingressesExt, err := il.kubeClient.ExtensionsV1beta1().Ingresses(namespace).List(metav1.ListOptions{})
ingressesNet, err := il.kubeClient.NetworkingV1beta1().Ingresses(namespace).List(metav1.ListOptions{})
I have to gracefully handle errors about the API not being supported. Because the return types are different, AFAIK there is no unified interface where I can just make one call and get the results in a single list. It seems like this is the sort of thing someone should have solved and provided a solution for, but so far I have not found anything.
Type conversion
I also have to find some way to merge ingressesExt and ingressesNet into a single usable list, with an eye toward maintainability/extensibility now that Ingress has graduated to NetworkingV1.
Kubernetes utilities
I see that Kubernetes provides a lot of auto-generated code and utilities, but I have not found a lot of documentation about how to use them. For example, Ingress has functions like
DeepCopy
Marshal
XXX_DiscardUnknown
XXX_Merge
XXX_Unmarshal
Maybe I can use these to do the type conversion? Combine marshal, unmarshal, discard, and merge somehow to take the data from one version and import it into another?
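For example, I was imagining something along these lines (an untested sketch that leans on the two versions being structurally almost identical; I have not verified that this covers every field):

// Untested sketch: round-trip through JSON to copy an extensions/v1beta1 Ingress
// into a networking/v1beta1 Ingress, relying on the matching json tags.
package ingress

import (
	"encoding/json"

	extv1beta1 "k8s.io/api/extensions/v1beta1"
	netv1beta1 "k8s.io/api/networking/v1beta1"
)

func extToNet(in extv1beta1.Ingress) (netv1beta1.Ingress, error) {
	var out netv1beta1.Ingress
	raw, err := json.Marshal(in)
	if err != nil {
		return out, err
	}
	if err := json.Unmarshal(raw, &out); err != nil {
		return out, err
	}
	// Report the target group/version on the copy instead of the source one.
	out.APIVersion = "networking.k8s.io/v1beta1"
	return out, nil
}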
Questions
Hopefully you see the issue and understand what I am trying to achieve.
Are there packages from Kubernetes or other open source authors that make some progress in unifying the APIs like I need?
Are any of the Kubernetes auto-generated functions meant for general use (as opposed to internal use) and helpful to my challenge? I have not found documentation for any but DeepCopy.
What is the "Go way" of abstracting out the differences between the various versions of the Ingress object such that I can write the rest of the code to work on any version? Keep in mind that I may need to make another API call for further processing, in which case I would need to know the concrete type of the object and select the right API call. It is not obvious to me that client-go provides any support for such auto-selection of API calls.
We are struggling with trying to figure out the best approach for updating processor configurations as a flow progresses through the dev, test, and prod stages. We would really like to avoid manipulating host, port, etc. references in the processors when the flow is deployed to a specific environment. At least in our case, we will have different hosts for things like Elasticsearch, Postgres, etc. How have others handled this?
Things we have considered:
Pull the config from a properties file using expression language. This is great for processors that have EL enabled, but it doesn't help for those where it isn't supported.
Manipulate the flow.xml and overwrite the host, port, etc. configurations. We are a bit concerned about inadvertently corrupting the XML and about how portable this will be across NiFi versions.
Any tips or suggestions would be greatly appreciated. There is a good chance that there is an obvious solution we have neglected to consider.
EDIT:
We are going with the templates that Bryan suggested. They will definitely meet our needs and appear to be a good way for us to control configurations across numerous environments.
https://github.com/aperepel/nifi-api-deploy
This discussion comes up frequently, and there is definitely room for improvement here...
You are correct that currently one approach is to extract environment-related property values into the bootstrap.conf and then reference them through expression language, so the flow.xml.gz can be moved from one environment to the other. As you mentioned, this only works well with properties that support expression language.
In order to make this easier in the future, there is a feature proposal for an idea called a Variable Registry:
https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry
An interesting approach you may want to look at is using templates. There is a GitHub project that can be used to help with this:
https://github.com/aperepel/nifi-api-deploy
You can look at this post about automating NiFi template deployment.
For automating NiFi template deployment, there is a tool that works well: https://github.com/hermannpencole/nifi-config
Prepare your NiFi development:
- Create a template on NiFi and download it
- Extract a sample configuration with the tool

Deploy it to production:
- Undeploy the old version with the tool
- Deploy the template with the tool
- Update the production configuration with the tool