We have a fairly large, old SFCC project, which started on Pipelines and should now finally be migrated to Controllers. For this we need to identify which Pipelines are easy candidates for an initial migration.
We are doing this based on the following criteria:
which other Pipelines are calling this pipeline
how many Pipeline.execute calls to this pipeline exist in the controllers we already have
is the pipeline using a custom hooking mechanism (identified by a certain include pipeline) - I guess this can be abstracted to "which other Pipelines is this Pipeline calling"
Is there already something out there that does something close to this?
So in case somebody else faces this challenge of eventually getting rid of Pipelines in a legacy project, you can also use my Node analyser script:
https://github.com/Andreas-Schoenefeldt/SFCCAnalyser
After installation, start it with npm run.
I need to deploy several containers to a Kubernetes cluster. The objective is to automate the deployment of Kafka, Kafka Connect, PostgreSQL, and others. Some of them already provide a Helm operator that we could use. So my question is: can we somehow use those Helm operators inside our operator? If so, what would be the best approach?
The only method I can think of so far is calling the helm setup console commands from within a deployment app.
Another approach, without using those helm files, would be implementing the functionality of each operator in my own operator, which doesn't seem to make much sense since what I need has already been developed and is public.
I'm very new to operator development so please excuse me if this is a silly question.
Edit:
The main purpose of the operator is to deploy X databases. Along with that we would like to have a single operator/bundle that deploys the whole system right away. Does it even make sense to use an operator for this bundling, even if we have additional tasks for some of the containers? With this, the user would specify in the yaml file:
    databases:
      - type: "postgres"
        name: "users"
      - type: "postgres"
        name: "purchases"
and 2 PostgreSQL databases would be created. Those databases could then be referenced in other yaml files or further down in the same yaml file. The concrete case at hand: the information from the databases will be pulled by Debezium (another container), so Debezium needs to know their addresses. So the operator should create a service and associate the service address with the database name.
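As a rough illustration of what we have in mind (all names and labels here are hypothetical, just to show the shape of the generated object), the operator could create something like:

    apiVersion: v1
    kind: Service
    metadata:
      name: users                    # derived from the database name in the spec above
    spec:
      selector:
        app: postgres-users          # hypothetical label the operator would put on the database pods
      ports:
        - port: 5432
          targetPort: 5432

Debezium could then be pointed at users:5432 through its connector configuration.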
This is part of an ETL system. The idea is that the operator would allow an easy deployment of the whole system by taking care of most of the configuration.
With this in mind, we were wondering whether it would be possible to take existing Helm operators (or another kind of operator) and deploy them with small modifications to the configuration, such as different ports for different databases.
But after reading F1ko's reply I gained new perspectives. Perhaps this is not possible with an operator as initially expected?
Edit2: Clarification of edit1.
Just for clarification purposes:
Helm is a package manager with which you can install an application onto the cluster in a bundled manner: it basically provides you with all the necessary YAMLs, such as ConfigMaps, Services, Deployments, and whatever else is needed to get the desired application up and running in a proper way.
An Operator is essentially a controller. In Kubernetes, there are lots of different controllers that define the "logic" whenever you do something (e.g. the replication-controller adds more replicas of a Pod if you decide to increment the replicas field). There are simply too many controllers to list them all and run them individually, which is why they are compiled into a single binary known as the kube-controller-manager.
Custom-built controllers are called operators for easier distinction. These operators simply watch over the state of certain "things" and are going to perform an action if needed. Most of the time these "things" are going to be CustomResources (CRs) which are essentially new Kubernetes objects that were introduced to the cluster by applying CustomResourceDefinitions (CRDs).
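To make the relationship concrete, here is a minimal, purely illustrative example (group, kind and fields are made up): the CRD introduces a new Database kind to the cluster, and an operator would then watch Database objects and reconcile them.

    # CustomResourceDefinition: teaches the cluster the new "Database" kind
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: databases.etl.example.com
    spec:
      group: etl.example.com
      scope: Namespaced
      names:
        kind: Database
        plural: databases
      versions:
        - name: v1alpha1
          served: true
          storage: true
          schema:
            openAPIV3Schema:
              type: object
              properties:
                spec:
                  type: object
                  properties:
                    type: { type: string }
                    name: { type: string }
    ---
    # CustomResource: an instance of that kind the operator reacts to
    apiVersion: etl.example.com/v1alpha1
    kind: Database
    metadata:
      name: users
    spec:
      type: postgres
      name: users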
With that being said, it is not uncommon to use helm to deploy operators, however, try to avoid the term "helm operator" as it is actually referring to a very specific operator and may lead to confusion in the future: https://github.com/fluxcd/helm-operator
So my question is, can we somehow use those helm operators inside our operator?
Although you may build your own operator with the operator-sdk, which then lets you deploy or trigger certain events from other operators (e.g. by editing their CRs), there is no reason to do so.
The only method I can think of so far is calling the helm setup console commands from within a deployment app.
Most likely what you are looking for is a proper CI/CD workflow.
Simply commit the helm chart and values.yaml files that you are using during helm install inside a Git repository and have a CI/CD tool (such as GitLab) deploy them to your cluster every time you make a new commit.
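As a minimal sketch, assuming GitLab CI and a chart committed in the same repository (release name, chart path and namespace are placeholders):

    # .gitlab-ci.yml - deploy the committed chart on every push to the default branch
    deploy:
      stage: deploy
      image: alpine/helm:latest      # any image that ships the helm CLI works
      script:
        - helm upgrade --install my-release ./chart -f values.yaml --namespace my-namespace
      rules:
        - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'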
Update: As the OP edited his question and left a comment, I decided to update this post:
The main purpose of the operator is to deploy X databases. Along with that we would like to have a single operator/bundle that deploys the whole system right away.
Do you think it makes sense to bundle operators together in another operator, as one would do with Helm?
No, it does not make sense at all. That's exactly what helm is there for. With helm you can bundle stuff; you can even bundle multiple helm charts together, which may be what you are actually looking for. You can have one helm chart that passes the needed values down to the actual operator helm charts and therefore use something like the service-name in multiple locations.
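A rough sketch of such an umbrella chart (chart names, versions and repositories are placeholders): the parent chart declares the other charts as dependencies and passes shared values down to them.

    # Chart.yaml of the umbrella chart
    apiVersion: v2
    name: etl-stack
    version: 0.1.0
    dependencies:
      - name: postgresql
        version: "12.x.x"
        repository: "https://charts.example.com"
      - name: kafka
        version: "22.x.x"
        repository: "https://charts.example.com"

    # values.yaml of the umbrella chart: a top-level key named after a subchart
    # is passed down to that subchart; global values are visible to every subchart
    global:
      storageClass: standard
    postgresql:
      fullnameOverride: users-db     # hypothetical value: the service name other charts would use
    kafka:
      replicaCount: 1                # hypothetical subchart value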
In the case of operators inside operators, is it still necessary to configure every sub-operator individually when configuring the operator?
As mentioned above, it does not make any sense to do it like that; it is just an over-engineered approach. However, if you truly want to go with the operator approach, there are basically two approaches you could take:
Write an operator that configures the other operators by changing their CRs, ConfigMaps etc.; with this approach you will have a somewhat lightweight operator, however you will have to ensure it stays compatible at all times with all the different operators you want it to interact with (whenever they change to a new apiVersion with breaking changes, introduce new CRs or anything of that kind, you will have to adapt again).
Extract the entire logic from the existing operators into your operator (i.e. rebuild something that already exists); with this approach you will have a big monolithic application that will be a huge pain to maintain, as you will continuously have to update your code whenever there is an update in the upstream operator.
Hopefully it is clear by now that building your own operator for "operating" other operators comes with a lot of painful dependencies and should not be the way to go.
Is it possible to deploy different configurations of images? Such as databases configured with different ports?
Good operators and helm charts let you do that out of the box, either via a respective CR / ConfigMap or a values.yaml file, however, that now depends on what solutions you are going to use. So in general the answer is: yes, it is possible if supported.
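As a hedged illustration only (the actual keys depend entirely on the chart or operator you pick), overriding a port usually boils down to a single value:

    # values.yaml for a hypothetical database chart
    service:
      port: 5433                     # expose this instance on a non-default port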
The question is tied more to CI/CD practices and infrastructure. In the release process we follow, we group a set of microservice Docker image tags as a single release, run them through the CI/CD pipeline and promote that version.yaml to staging and production - a sort of mono-release pattern. The problem with this is that at some point we need to serialize: other changes have to wait until a mono-release is tested and tagged as ready for the next stage. A little more description regarding this here.
An alternative would be the micro-release strategy, where each microservice is released to production in parallel through the CI/CD pipeline. But would this mean that there are as many pipelines as there are microservices? An alternative could be a single pipeline, but with parallel test cases and a polling CD - sort of like the GitOps way, which takes the latest production-tagged Docker images.
There seems to be precious little information regarding the way microservices are released. Most of it talks about interface-level or API-level versioning and releasing, which is not really what I am after.
Assuming your organization is developing services in a microservices architecture and is deploying to a Kubernetes cluster, you must use some CD tool (continuous delivery tool) to release new microservices, or even to update a microservice.
Take a look at tools like Jenkins (https://www.jenkins.io), DroneIO (https://drone.io)... Some organizations use Python scripts, or Go and so on... I, personally, do not like this approach; I think the best solution is to pick a tool from the CNCF Landscape (https://landscape.cncf.io/zoom=150) in the Continuous Integration & Delivery group, as these are tools tested and used in the market.
An alternative would be the micro-release strategy, where each microservice is released to production in parallel through the CI/CD pipeline. But would this mean that there are as many pipelines as there are microservices?
It's OK; in some tools you have a parameterized pipeline that builds projects based on received parameters, but I think the best solution is to have one pipeline per service, plus some parameterized pipelines to deploy, apply specific tests, archive assets and so on... like the micro-release strategy you describe.
Agreed, there is little information about this out there. From all I understand, the approach of keeping one pipeline per service sounds reasonable. With a growing number of microservices you will run into several problems:
how do you keep track of changes in the configuration
how do you test your services efficiently with regression and integration tests
how do you efficiently setup environments
The key here is most probably that you make better use of parameterized environment variables that you then version in an efficient manner. This will allow you to keep track of changes. To achieve this, make sure to a.) strictly parameterize all variables in the container configs and the code and b.) organize the config variables in a way that allows you to inject them at runtime.
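A minimal sketch in regard to my point a.), assuming Kubernetes (all names are placeholders): keep the container config strictly parameterized and feed the values in from a ConfigMap instead of hard-coding them.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: orders-config
    data:
      DB_HOST: users-db
      DB_PORT: "5432"
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: orders
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: orders
      template:
        metadata:
          labels:
            app: orders
        spec:
          containers:
            - name: orders
              image: registry.example.com/orders:1.4.2   # placeholder image and tag
              envFrom:
                - configMapRef:
                    name: orders-config                   # every key becomes an env variable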
As for point b.), this is slightly more tricky. As it looks like you are using Kubernetes, you might just want to pick something like Helm charts. The question is how you structure your config files, and you have two options:
Use something like Kustomize, which is a configuration management tool that will allow you to do versioning to a certain degree, following a GitOps approach. This comes (in my biased opinion) with a good amount of flaws: Git is ultimately not meant for configuration management, and it's hard to follow changes, build diffs, and identify the relevant history if you handle that number of services.
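For reference, a minimal Kustomize layout could look like this (paths and names are placeholders); an overlay pins environment-specific values on top of a shared base:

    # base/kustomization.yaml
    resources:
      - deployment.yaml
      - service.yaml

    # overlays/staging/kustomization.yaml
    resources:
      - ../../base
    namePrefix: staging-
    configMapGenerator:
      - name: orders-config
        literals:
          - DB_HOST=staging-users-db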
You use a Continuous Delivery API (I work for one, so make sure you question this sufficiently). CD APIs connect to all your systems: CI pipelines, clusters, image registries, external resources (DBs, file storage), internal resources (Elastic, Redis) etc. They dynamically inject environment variables at run-time and create the manifests with each deployment. They cache these as so-called "deployment sets". Deployment sets are the representation of the state of an environment at deployment time. This approach has several advantages: it allows you to share, version, diff and relaunch any state any service and application were in at any given point in time. It provides a very clear and bulletproof audit of anything in the setup. QA environments or test-feature environments can be spun up through the API or UI, allowing for fully featured regression and integration tests.
I use DMSDK to ingest data; I have multiple custom flows to run following data ingestion. Instead of manually running the flows one by one, what is the best way to orchestrate MarkLogic Data Hub flows?
Gradle, triggers, or other scheduling tools?
I concur with Dave Cassel that NiFi, or perhaps something like MuleSoft, or maybe even Camel is a great way to manage running your flows. Particularly if you are talking about operational management.
To answer on other mechanisms:
Crontab doesn't connect to MarkLogic itself. You'd have to write scripts or code to make something actually happen. You won't have much control either, nor logging, unless you add that as well.
We have great plugins for Gradle that make running flows real easy. Great during development and such, but perhaps less suited for scheduling or operational tasking.
Triggers inside MarkLogic only respond to insertion of data, so you'd still have to initiate an update from outside anyhow.
Scheduled Tasks inside MarkLogic have similar limitations to Crontab and Gradle. They don't do much by themselves, so you have to write code anyhow. They provide no logging by themselves, nor ways to operationally manage the tasks, other than through the Admin UI.
JAR package might depend on what JAR package you actually mean. You can create a JAR of your ml-gradle project, but that doesn't give you a lot of gain over calling Gradle itself.
Personally, I'd have a close look at the operational requirements. Think of, for instance: the need to get a status overview, interrupt schedules, loop to retry on failure, built-in logging, and facilities to send notifications when attention is needed.
HTH!
There are a variety of answers that will work, of course; my preference is NiFi. This keeps any scheduling overhead outside of MarkLogic, with the trade-off that you'll need to have NiFi running.
So I've been investigating CI/CD pipelines using Concourse and Cloud Foundry lately, and I've been confused about what the best way to do this is. I've been thinking about how the overall flow would go from development to release. There are a lot of talks and videos that discuss this at a very high level, but they often abstract away too much of the actual implementation details for it to be useful. How do people actually roll this out in real companies? I have a lot of questions, so I will try to list a few of them here in the hope that someone could enlighten me a little.
What does the overall process and pipeline look like conceptually from development to prod? So far I have something along the lines of :
During development each product team is under their own org, with each developer possibly having their own development "space" that they could manually cf push to and just develop against. There will be development spaces that devs can just directly push to as well as spaces that can only be used by the automated pipeline to deploy artifacts for functional tests.
Once devs finish a feature they would make a pull request, which would trigger a smaller pipeline with some tests using something like the git-multibranch-resource or the git-pullrequest-resource, which would hook into the github required status check hooks and report back if any particular PRs are able to be merged into master or not
Once all checks pass and the pull request is merged into master the below pipeline is kicked off, which validates the master branch before releasing the artifact to prod.
code repo [master] -> build -> snapshot artifact repo -> deploy to test space -> run functional tests -> deploy to staging space -> run smoke tests and maybe other regression tests -> deploy artifact to prod -> monitoring/rollbacks (?)
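In Concourse terms I imagine that flow as something like the following sketch (resource names, task files and spaces are placeholders I made up); each job only takes versions that have passed the previous one:

    resources:
      - name: source
        type: git
        source:
          uri: https://github.com/example/app.git
          branch: master

    jobs:
      - name: build
        plan:
          - get: source
            trigger: true
          - task: build-and-publish-snapshot
            file: source/ci/build.yml
      - name: deploy-test
        plan:
          - get: source
            trigger: true
            passed: [build]
          - task: run-functional-tests
            file: source/ci/functional-tests.yml
      - name: deploy-staging
        plan:
          - get: source
            trigger: true
            passed: [deploy-test]
          - task: run-smoke-tests
            file: source/ci/smoke-tests.yml
      - name: deploy-prod
        plan:
          - get: source
            trigger: true
            passed: [deploy-staging]
          - task: deploy
            file: source/ci/deploy-prod.yml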
What other things could/should be added to this pipeline or any part of this process?
Once you automate deployment, how do you also automate things like canary releases or rollbacks once something happens? Should this be part of the pipeline or something completely separate?
I've been playing with the idea of creating spaces temporarily and then tearing them down for the functional testing phase, would there be any benefit to doing that? The idea is that the apps being deployed would have their own clean environments to use, but this could also potentially be slow, and it is difficult to know what services are required inside of each space. You would have to read the manifest, which only specifies service-names, which seems to necessitate some sort of canonical way of naming service instances within the same space? The alternative is managing a pool of spaces which also seems complicated...
Should the pipeline generate the manifest files? Or should that be completely up to the developers? Only the developers know which services the app needs, but it also seems like things like instance count, memory etc. should be something that the performance tests/pipeline should be able to determine/automate. You could generate a manifest inside the pipeline, but then you would not know which services the app needs without reading a manifest... chicken and egg problem?
I have many more burning questions, but I will cut it off here for now. I know the subjects have kind of bounced back and forth between Concourse and Cloud Foundry, but it seems when discussing CI/CD concepts the nitty gritty implementation details are often the actual tricky bits which tangle the two rather tightly together. I am also aware that the specific implementation details are often very specific to each company, but it would be great if people could talk about how they have implemented these pipelines / automated pipelines using Concourse and Cloud Foundry at their companies (if you can spare the details of course). Thanks everyone!
During development each product team is under their own org, with each developer possibly having their own development "space" that they could manually cf push to and just develop against. There will be development spaces that devs can just directly push to as well as spaces that can only be used by the automated pipeline to deploy artifacts for functional tests.
Honestly, it doesn't matter if you create multiple orgs in your Cloud Foundry. If your CI/CD system runs on the same director that is (ab)used by other developers, you are probably going to have a hard time (I have been there).
Once devs finish a feature they would make a pull request, which would trigger a smaller pipeline with some tests using something like the git-multibranch-resource or the git-pullrequest-resource, which would hook into the github required status check hooks and report back if any particular PRs are able to be merged into master or not
We are doing almost exactly that. For PRs, check out jtarchie's PR resource here: https://github.com/jtarchie/github-pullrequest-resource.
The only difference is that we are not using GitHub Checks. The problem with them is that you have to select a fixed set of checks for a branch.
But if I just changed manifest xyz in the PR, I don't want to run all the tests. You can overcome that problem by using the GitHub Status API with only the pending and successful statuses.
Once all checks pass and the pull request is merged into master the below pipeline is kicked off, which validates the master branch before releasing the artifact to prod.
We make PRs into the develop branch, following the Git Flow system. Our releases are merged into master manually.
You want to check first which updates you want to carry out before you merge every PR into master and trigger an update of the production system. Your test cases might be good, but you can always miss something.
What other things could/should be added to this pipeline or any part of this process?
You can have a pipeline which updates releases/stemcells in your manifests automatically.
Once you automate deployment, how do you also automate things like canary releases or rollbacks once something happens? Should this be part of the pipeline or something completely separate?
Test your stuff on a staging system before you go to production. Otherwise you a) won't know whether the update happens with zero downtime, and b) preventing a potential problem in production is always better than doing rollbacks.
Of course you can also create a rollback pipeline, but if you get to that point, something else might be wrong with your setup.
Should the pipeline generate the manifest files? Or should that be completely up to the developers? Only the developers know which services the app needs, but it also seems like things like instance count, memory etc. should be something that the performance tests/pipeline should be able to determine/automate. You could generate a manifest inside the pipeline, but then you would not know which services the app needs without reading a manifest... chicken and egg problem?
We write our manifests by ourselves and use the CI/CD system to update/deploy/test them.
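For instance, a hand-written manifest kept in the repo might look like this (app name, memory and service names are placeholders), and the pipeline just runs cf push against it in each space:

    # manifest.yml checked into the repo next to the code
    applications:
      - name: orders-api
        memory: 512M
        instances: 2
        buildpacks:
          - java_buildpack
        services:
          - orders-db                # service instance name, must exist in the target space
        env:
          SPRING_PROFILES_ACTIVE: cloud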
But if you find a valid case and a concept that lets you apply a manifest-generating pipeline to many cases, I would just try it out.
At the very end you have to decide whether a certain automation holds business value for your company.
cheers, Dennis
I've been researching cloud based CI systems for a while now and cannot seem to find any systems that can address a major need of mine.
I'm building CI processes for development on Salesforce, but this question is more generally about builds which rely on an external resource. In our builds, we deploy code into a cloud hosted Salesforce instance and then run the tests in that instance. During a build, the external resource is effectively locked and build failures will occur if two builds target the same external resource at the same time. This means that the normal concurrency model of cloud based CI systems would start tripping over the Salesforce instance (external resource) with a concurrency greater than 1.
To complicate things a bit more, we actually have 5 different external resources for each project (feature, master, packaging, beta, and release) and need to control the concurrency of any builds relying on an external resource to 1. For example, all our feature branches build against the feature external resource. We can identify these builds by the branch name which uses the pattern feature/* and need to ensure that only one feature build runs at a time. However, the feature build doesn't tie up the other 4 external resources so ideally any builds that would need those resources should still be able to run concurrently.
I currently accomplish this in Jenkins using the Throttle Concurrent Builds plugin and assign a throttle group to each build identifying the external resource it relies on. This has been successful at preventing concurrent builds from tripping over external resources.
A few clarifications:
I'm not asking how to reduce concurrency to 1 at the repo level. I know every cloud CI system can do that. I should be able to run as many concurrent builds as I have external resources (in my case, 5).
Ideally, I'd like to be able to use a regex pattern on branch name as the "group" with which to block concurrence. So, a setting like: If branch name matches 'feature/.*' then limit concurrency to 1. I want to avoid having to manually configure new feature branches in the build system and instead match on pattern.
I have to say, it's been nearly impossible to find a restrictive Google search term that would help me answer this question. Hopefully someone out there has faced this problem before and can shed some light for me :)
With the Jenkins Pipeline plugin you can set the stage concurrency to 1 - and only one thing will pass through that stage at a time. The stage was designed to be able to represent things like this.
https://www.cloudbees.com/blog/parallelism-and-distributed-builds-jenkins
stage "build"
node {
sh './test-the-awesome'
}
stage name: "environment test", concurrency: 1
node {
sh 'tests that lock the environment'
}
You can put the build pipeline in a Jenkinsfile in a repo too: https://documentation.cloudbees.com/docs/cookbook/pipeline-as-code.html (so any branches that build, also obey that lock).
As pointed out by Jesse Glick in the comments below, perhaps a more general solution (not yet compatible with Pipeline) is to use the Lockable Resources Plugin, which will then work across jobs of any type.
I accomplish this with a Drone.io setup.
Essentially, I use a grunt plugin to access a Redis db hosted externally. It provides semaphore locking on any param you'd like.
Determine if the lock is free for that Env.
If so, set that Env's key with a reasonable timeout.
Run the tests
Clear the lock
If the lock is held, get its expiration time, and sleep until then.
I am not aware of any cloud based CI tools that can manage external resources the way you want to, unless you include the logic as part of the build script, which you've already said you'd prefer not to do. If you decide you want to do that you could do it with Snap CI or Drone or any of the other cloud tools I imagine.
In this sort of situation, I would usually recommend an agent-based system such as Go.cd