I am training models using MLflow on Databricks and outputting the final models to S3. Then I use Seldon Core to package and deploy the models to AWS EKS.
I am looking for a tool that bridges the gap: one that takes the model from S3, packages it into a Docker container, and uses a Seldon Core Kubernetes template to push it to AWS EKS.
The tool that seems to fit the job is Kubeflow Pipelines. Other contenders are Jenkins, GitLab, and TravisCI.
Is Kubeflow the right tool for the job, and what are the pros and cons of Kubeflow versus the others? I'd appreciate input from anyone who has already done the research or maybe even built such a pipeline...
GitLab CI actually does exactly what Kubeflow Pipelines does, out of the box, using YAML similar to CircleCI's or TravisCI's. I ended up using it as an alternative to Kubeflow Pipelines; a sketch of such a pipeline is below.
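As a rough illustration only (not a drop-in config: the S3 bucket and path, image names, Dockerfile, and seldon-deployment.yaml manifest are all placeholders), a .gitlab-ci.yml that pulls a model from S3, bakes it into a Docker image, and applies a SeldonDeployment to EKS could look something like this:

```yaml
stages:
  - package
  - deploy

package-model:
  stage: package
  image: docker:latest
  services:
    - docker:dind
  before_script:
    # docker:latest is Alpine-based; add the AWS CLI to pull the model
    - apk add --no-cache aws-cli
  script:
    # fetch the trained model from S3 (bucket and path are placeholders)
    - aws s3 cp s3://my-model-bucket/latest/model.joblib model/
    # bake it into an image; the Dockerfile is assumed to wrap the model
    # with a Seldon-compatible server
    - docker build -t "$CI_REGISTRY_IMAGE/model:$CI_COMMIT_SHORT_SHA" .
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker push "$CI_REGISTRY_IMAGE/model:$CI_COMMIT_SHORT_SHA"

deploy-model:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    # apply a SeldonDeployment manifest referencing the freshly pushed image;
    # kubeconfig access to the EKS cluster is assumed to come from CI variables
    - kubectl apply -f seldon-deployment.yaml
```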
Regarding Kubeflow...
After experimenting with Kubeflow at versions 0.5 and 0.6, our feeling was that it is still quite unstable. Installation never went smoothly, neither on Minikube (local K8s) nor on AWS EKS. For Minikube, the install scripts in the documentation are broken, and you will see many people having issues and editing the install scripts by hand (which is what I had to do to get it installed properly). On EKS we were not able to install 0.5 and had to install a much older version. Kubeflow wants to manage worker nodes in a particular manner, and our security policies do not allow that; only in an older version could we override that option.
Kubeflow is also switching to Kustomize, and that migration is not stable yet, so if you adopt Kubeflow now you will be using ksonnet, which is no longer supported, and you will be learning a tool that you will throw out the window sooner or later.
All in all, I would wait for version 1.0; in the meantime, GitLab does an awesome job as an alternative to Kubeflow Pipelines.
Hope this helps others who have the same thoughts.
Both GitHub Actions and Bitbucket Pipelines seem to serve similar functions at a surface level. Is it trivial to migrate the YAML for Actions into a Pipeline, or do they operate fundamentally differently?
For example: running something simple like SuperLinter (used on Github Actions) on Bitbucket Pipelines.
I've searched for examples or explanations of the migration process, but with little success so far; perhaps they're just not compatible, or maybe I'm missing something. This is my first time using Bitbucket over GitHub. Any resources and tips are welcome.
They are completely unrelated CI systems, and there is no straightforward migration path from one to the other.
Both systems write their definitions in YAML, just like GitLab CI, but the only thing you can reuse is your YAML knowledge itself (syntax and anchors).
As CI systems, both will start some kind of agent to run a list of instructions, a script, so you can probably reuse most of the ideas in your scripts. But the execution environments are very different, so be ready to write plenty of tweaks, as Benjamin commented.
E.g., forget about that SuperLinter. Instead, Bitbucket Pipelines has a concept of pipes, which serve a similar purpose but are implemented with a rather different approach.
Another key difference: GHA runs on VMs, and you configure whatever you need with "setup" actions. BBP runs on Docker containers that should already include most of the runtime and tooling you need, since "setup" pipes cannot exist. So you will end up installing tooling on every run (via apt, yum, apk, wget, ...) rather than maintaining and keeping updated a crazy number of images full of tooling and language runtimes: https://stackoverflow.com/a/72959639/11715259
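To make the contrast concrete, here is a minimal bitbucket-pipelines.yml sketch. The image tag, pipe version, and variables are illustrative; atlassian/aws-s3-deploy is just one example of a pipe. Note how tooling is installed inline on every run, and how a pipe stands in for what a marketplace action would do in GHA:

```yaml
image: node:18  # BBP runs steps in a Docker image, so pick one with most of your runtime

pipelines:
  default:
    - step:
        name: Lint
        script:
          # no "setup" actions here: tooling missing from the image
          # must be installed on every run
          - apt-get update && apt-get install -y shellcheck
          - shellcheck scripts/*.sh
    - step:
        name: Deploy
        script:
          # a "pipe" is BBP's reusable building block, loosely analogous
          # to a marketplace action in GHA
          - pipe: atlassian/aws-s3-deploy:1.1.0
            variables:
              AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID
              AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY
              AWS_DEFAULT_REGION: us-east-1
              S3_BUCKET: my-bucket
              LOCAL_PATH: build
```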
I've been looking through the site and have found some information on this topic, but most of it is old and possibly outdated.
example: Continuous Integration tools
We are: a SaaS product with a microservice architecture (200+ services).
We have: We currently do our builds through Bamboo, and we use Nexus as an artifact manager with proper versioning. We deploy those artifacts with Bamboo to many different machines. For our frontend we build the code through Continua and use AWS CodeDeploy to handle deployment. We use Bitbucket and Jira for development. We did a POC with Bitbucket Pipelines, but it lacked proper version management as well as proper environment management, and manually setting up ten servers for every repository is just something we don't want to do.
We want: Since Bamboo is EOL next year, and since there are many alternatives with different levels of complexity, we are currently unsure which tools are best suited to our needs. We are currently running everything on dedicated Linux machines, but we want to switch to Docker containers in AWS in the near future. Support for running gulp scripts and the like would be great, since that could help us move from Continua and Bamboo to one single solution.
The setup of Bamboo has been a struggle in the past due to difficulties with the software itself, so a nice balance between features and complexity would be best. Does anybody have experience with one or more of the options out there? Some that come to mind are CircleCI, TeamCity, GitLab, Jenkins, and AWS CodePipeline.
Many thanks,
Kenny
Bamboo doesn't reach EOL next year; rather, Atlassian is forcing a switch from perpetual licenses to Data Center (DC) licenses that must be renewed every year. You can get discounted prices when switching from Server to DC licenses. See details at https://www.atlassian.com/licensing/data-center
I would propose Kraken CI. It is open source and can run on-premises as well as in the cloud. In the cloud it supports AWS and Azure and can autoscale depending on the number of tasks.
If you are interested please contact me.
I have a Node.js server that communicates with a MongoDB database. As part of the continuous-integration process, I'd like to spin up a MongoDB database and run my tests against the server + DB.
With Bitbucket Pipelines I can spin up containers that provide both Node.js and MongoDB, and then run my tests against this setup (roughly like the sketch below).
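For reference, the Bitbucket setup is roughly this (a minimal sketch; the image tags and database name are illustrative):

```yaml
image: node:18  # build container running the Node.js tests

pipelines:
  default:
    - step:
        name: Run tests against MongoDB
        services:
          - mongo
        script:
          - npm ci
          # the MongoDB service container is reachable on localhost
          - MONGO_URL=mongodb://localhost:27017/test npm test

definitions:
  services:
    mongo:
      image: mongo:6
```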
What would be the best way to achieve this with Visual Studio Team Services? Some options that come to mind:
1) Hosted pipelines seem easiest but they don't have MongoDB on them. I could use Tool Installers, but there's no mention of a MongoDB installer, and in fact I don't see any tool installer in my list of available tasks. Also, it is mentioned that there is no admin access to the hosted pipeline machines and I believe MongoDB requires admin access. Lastly, downloading and installing Mongo takes quite a bit of time.
2) Set up my own private pipeline - i.e. a VM with Node + Mongo, and install the pipeline agent on it. Do I have to spin up a dedicated Azure instance for this? Will this instance be torn down and set up again on each test run, or will it remain up between test runs (meaning I have to take extra care to clean it up)?
3) Magically use a container in the pipeline through an option that I haven't yet discovered...?
I'd really like to use a container to run my tests because then I can use the same container locally during the development process, rather than having to maintain multiple environments. Can this be done?
So as it turns out, VSTS now has Docker support in its pipeline (when I wrote my question it was in beta and I didn't find it for whatever reason). It can be found at https://marketplace.visualstudio.com/items?itemName=ms-vscs-rm.docker.
This task lets you spin up a container of your choice and run a single command in it. If the command should run synchronously as part of the pipeline, then Run in Background needs to be unchecked (which will be the case for regular build commands, I guess). I ended up pushing a build script into my Git repository and running it in a container.
And re. my question in (2) above - machines in private pipelines aren't cleaned up between pipeline runs.
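For anyone landing here later: VSTS has since become Azure DevOps, and its YAML pipelines support service containers, which map nicely onto this use case. A minimal sketch under that assumption (the image tag and connection string are illustrative; this is not the marketplace task linked above):

```yaml
resources:
  containers:
    - container: mongo
      image: mongo:6
      ports:
        - 27017:27017   # expose MongoDB to the host running the job

jobs:
  - job: test
    pool:
      vmImage: ubuntu-latest
    services:
      mongo: mongo   # attach the MongoDB service container to this job
    steps:
      - script: |
          npm ci
          npm test
        displayName: Run tests against MongoDB
        env:
          MONGO_URL: mongodb://localhost:27017/test
```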
We have a distributed system with many services which talk to each other.
Sometimes a code change in one service will require a feature to have been deployed in another service.
We use Octopus to deploy all the things, which is cool, but we really want to prevent a service from being deployed before the things it depends on are deployed.
Is there a way we can do this with Octopus Deploy?
For example, can I make the NuGet package for one service depend on an explicit version range of another package?
If you don't want to deploy all your projects as one massive deployment with a series of steps that push your different services to different machines, then I don't think there's a built-in way to make your deployments depend on each other's version numbers like that (see this UserVoice suggestion asking Octopus for that very feature).
However, I do think you could write a PowerShell script that runs as a pre-deployment step and checks the version number of one NuGet package against a version range stored in another. The script could then halt or allow the deployment accordingly.
We develop software for Linux and Mac using C++ and Python. So far we have installed all required packages into a virtualenv using pip, but the third-party libraries take a substantial amount of time to compile. We want to speed up the build process on the build servers.
One way is to not wipe out the build agent workspace between builds. Is that possible when using Amazon EC2 servers?
The following Jenkins plugin can be used to copy files into the slave workspace.
https://wiki.jenkins-ci.org/display/JENKINS/Copy+To+Slave+Plugin
Once you get an instance to the desired base state, you can use it to create an AMI. If you then launch from that AMI in the future, all the libraries should already be in place, and at that point you can do any additional bootstrapping you need.
Note that the instance will use your existing key pair unless you prep the instance, before creating the AMI, to use the key provided at launch.