Handling multiple branches in continuous integration - continuous-integration

I've been dealing with the problem of scaling CI at my company and at the same time trying to figure out which approach to take when it comes to CI and multiple branches. There is a similar question at stackoverflow, Multiple feature branches and continuous integration. I've started a new one because I'd like to get more of discussion and provide some analysis in the question.
So far I've found that there are 2 main approaches that I can take (or maybe some others???).
Multiple set of jobs (talking about Jenkins/Hudson here) per branch
Write tooling to manage the extra jobs
Create/modify/delete Jobs in bulk
Custom settings for each job per branch (SCM url, dep management repos duplications)
Some examples of people tackling this problem with shell tools, ant scripts and Jenkins CLI. See:
http://jenkins.361315.n4.nabble.com/Multiple-branches-best-practice-td2306578.html
http://jenkins.361315.n4.nabble.com/Is-it-possible-to-handle-multiple-branches-where-some-jobs-should-run-on-each-one-without-duplicatin-td954729.html
http://jenkins.361315.n4.nabble.com/Parallel-development-with-branches-td1013013.html
Configure or Create hudson job automatically
Will cause more load on your CI cluster
Feedback cycle for devs slows down (if the infrastructure cannot handle the new load)
Multiple set of jobs per 2 branches (dev & stable)
Manage the two sets manually (if you change the conf of a job then be sure to change in the other branch)
PITA but at least so few to manage
Other extra branches won't get a full test suite before they get pushed to dev
Unsatisfied devs. Why should a dev care about CI scaling problems. He has a simple request, when I branch I would like to test my code. Simple.
So it seems if I want to provide devs with CI for their own custom branches I need special tooling for Jenkins (API or shellscripts or something?) and handle scaling. Or I can tell them to merge more often to DEV and live without CI on custom branches. Which one would you take or are there other options?

When you talk about scaling CI you're really talking about scaling the use of your CI server to handle all your feature branches along with your mainline. Initially this looks like a good approach as the developers in a branch get all the advantages of the automated testing that the CI jobs include. However, you run into problems managing the CI server jobs (like you have discovered) and more importantly, you aren't really doing CI. Yes, you are using a CI server, but you aren't continuously integrating the code from all of your developers.
Performing real CI means that all of your developers are committing regularly to the mainline. Easy to say, but the hard part is doing it without breaking your application. I highly recommend you look at Continuous Delivery, especially the Keeping Your Application Releasable section in Chapter 13: Managing Components and Dependencies. The main points are:
Hide new functionality until it's finished (A.K.A Feature Toggles).
Make all changes incrementally as a series of small changes, each of which is releasable.
Use branch by abstraction to make large-scale changes to the codebase.
Use components to decouple parts of your application that change at different rates.
They are pretty self explanatory except branch by abstraction. This is just a fancy term for:
Create an abstraction over the part of the system that you need to change.
Refactor the rest of the system to use the abstraction layer.
Create a new implementation, which is not part of the production code path until complete.
Update your abstraction layer to delegate to your new implementation.
Remove the old implementation.
Remove the abstraction layer if it is no longer appropriate.
The following paragraph from the Branches, Streams, and Continuous Integration section in Chapter 14: Advanced Version Control summarises the impacts.
The incremental approach certainly requires more discipline and care - and indeed more creativity - than creating a branch and diving gung-ho into re-architecting and developing new functionality. But it significantly reduces the risk of your changes breaking the application, and will save your and your team a great deal of time merging, fixing breakages, and getting your application into a deployable state.
It takes quite a mind shift to give up feature branches and you will always get resistance. In my experience this resistance is based on developers not feeling safe committing code the the mainline and this is a reasonable concern. This in turn usually stems from a lack of knowledge, confidence or experience with the techniques listed above and possibly with the lack of confidence with your automated tests. The former can be solved with training and developer support. The latter is a far more difficult problem to deal with, however branching doesn't provide any extra real safety, it just defers the problem until the developers feel confident enough with their code.

I would set up separate jobs for each branch. I've done this before and it isn't hard to manage and set up if you've set up Hudson/Jenkins correctly. A quick way to create multiple jobs is to copy from an existing job that has similar requirements and modify them as needed. I'm not sure if you want to allow each developer to setup their own jobs for their own branches, but it isn't much work for one person (i.e. a build manager) to manage. Once the custom branches have been merged into stable branches, corresponding jobs can be removed when they are no longer necessary.
If you're worried about the load on the CI server, you could set up separate instances of the CI or even separate slaves to help balance the load across multiple servers. Make sure that the server you are running Hudson/Jenkins on is adequate. I've used Apache Tomcat and just had to ensure that it had enough memory and processing power to process the build queue.
It's important to be clear on what you want to achieve using CI and then figure out a way to implement it without much manual effort or duplication. There's nothing wrong with using other external tools or scripts that are executed by your CI server that help simplify your overall build management process.

I would choose dev+stable branches. And if you still want custom branches and afraid of the load, then why not move these custom ones to the cloud and let developers manage it themselves, e.g. http://cloudbees.com/dev.cb
This is the company where Kohsuke is now.
There is an Eclipse Tooling also, so if you are on Eclipse, you will have it tightly integrated right into dev env.

Actually what is really problematic is build isolation with feature branches. In our company we have a set of separate maven projects all be part of a larger distribution. These projects are maintained by different teams but for each distribution all projects need to be released. A featurebranch may now overlap from one project to another and thats when build isolation gets painfully. There are several solutions we've tried:
create separate snapshot repositories in nexus for each feature branch
share local repositories on dedicated slaves
use the repository-server-plugin with upstream repositories
build all within one job with one private repository
As a matter of fact, the last solution is the most promising. All other solutions lack in one or another way. Together with the job-dsl plugin it is easy to setup a new feature branch. simply copy and paste the groovy script, adapt branches and let the seed job create the new jobs. Make sure that the seed job removes nonmanaged jobs. Then you can easily scale with feature branches over different maven projects.
But as tom said well above, it would be nicer to overcome the necessity of feature branches and teach devs to integrate cleanly, but that is a longer process and the outcome is not clear with many legacy system parts you won't touch any more.
my 2 cents

Related

Best Practice for Having a Base Project and Multiple Similar Sub-projects

I have been writing an E-shop project for a customer and now I have signed a new similar contract with another customer. I was wondering what would be the best practice to continue the first project while staring the second so that the reusability is at maximum?
One way would be to change the first project to read all menu items, slider pictures, ... from the database so that I can deliver the same project to both customers with different databases. The benefit of this approach is that I have to manage only one project, but it leads me to gradually write a CMS, which is a time-consuming task.
The other solution would be to use Git. For example, I would fork the base project into two different projects. If the functionality I am writing is the base one, then I would push it into the base project; otherwise, I push it into the appropriate forked project.
Which one is a better approach in your opinion? Or you guys have any better idea?
Cheers,
Habib
There are a few things that need to be considered.
First of all, This project as you said has the capability to be sold more. So, you must think about how much is possible to make it dynamic via Configuration files, Hooks & Plugins to make the modification to the functionalities of the project through that. I know you have considered this already.
Second, Using a Core Repository and different forks for customization. (It's a great idea but needs proper discipline, workflow and manpower to make sure everything is fine-tuned and works properly )
It's highly recommended to make your application cloud-native and provide proper UAT/QAT Environment for test before launching on the production, And also implementing Test cases to be checked within the Git and CI/CD pipelines in order to prevent issues in the merge process.
I'm not certain about what you want, but if you want to develop an enterprise project that contains many features such as wallet, tracking, payment,... I think you can implement each service as a microservice and integrate all of them.
About git, I think it's better just for handling the source code and you had better use git module for handling microservice and just using branches for developing process
I have finally found some solutions that I would like to share with you guys. Let's divide differences into 2 big categories of data differences and code differences:
Differences in data
If the database in each project is different (e.g., the product has some features in one project and some other features in another project), then the best solution is to use NoSQLs such as MongoDB. In the first place, NoSQLs are designated to support databases that don't have well-defined data structures, and you don't know what features you may add to each entity at present or in the future. It completely applies to my scenario that each shop may have a different data structure. However, since my project is based on Laravel and it does not have built-in support for MongoDB, I have decided to design some key-value tables that haven't been so bad so far.
Differences in the code
Regarding differences in the code, I would definitely suggest branches in Git and other functionalities provided by Git repositories such as Gitlab repository mirroring. Each feature has a different branch in my code, and I can provide each customer with different functionalities by merging those branches I want to deliver to the customer.
All in all, you may take as much business logic as you can into the database since changing it in the future is more straightforward. On the other hand, you'd better keep themes in the code because every customer likes a different theme, and changing them in the code is easier than taking them to the database.

TFS Team Build - Testing to Production

I have scoured the internet to find out what I can on this, but have come away short. I need to know two things.
Firstly, is there a best practice for how TFS & Team Build should be used in a Development > Test > Production environment? I currently have my local VS get the latest files. Then I work on them & check them in. This creates a build that then pushes the published files into a location on the test server which IIS references. This creates my test environment. I wonder then what is the best practice for deploying this to a Live environment once testing is complete?
Secondly, off the back of the previous - my web application is connected to a database. So, the test version will point to a test database. But when this is then tested and put live, I will need that process to also make sure that any data connections are changed to the live database.
I am pretty much doing all this from scratch and am learning as I go along.
I'd suggest you to look at Microsoft Release Management since it's the tool that can help you to do exactly the things you mentioned. It can also be integrated with TFS.
In general, release management is:
the process of managing, planning, scheduling and controlling a
software build through different stages and environments; including
testing and deploying software releases.
Specifically, the tool that Microsoft offers would enable you to automate the release process, from development to production, keeping track of what and how everything is done when a particular stage is reached.
There's an MSDN article, Automate deployments with Release Management, that gives a good overview:
Basically, for each release path, you can define your own stages, each one made of a workflow (the so-called deployment sequence) containing the activities you want to perform using pre-defined machines from a pool.
It's possible to insert manual interventions/approvals if necessary, and the whole thing can be triggered automatically once your build is done.
Since you are pretty much in control of the actions performed on each machine in each stage (through the use of built-in or custom actions/components) it is also certainly possible to change configuration files, for example to test different scenarios, etc..
Another image to give you and idea of how it can be done:

Single package in continuous delivery pipeline when building in parallel

My company is using Jenkins for continuous integration and I'm trying to move towards CD. I'm using git hub as a code repository. Right now we are merging feature branches into a uat environment and when a particular feature has been accepted the feature branch will be merged to our production branch.
This is obviously dangerous because two changes could be tested together and deployed separately.
Ideally we would have a package tested and deployed without rebuilding but I'm having trouble seeing how this is possible. If two people work on two different features, the first is finished, packaged and goes into testing, the second is then finished and packaged without the first? But then how can I deploy the package without invalidating the testing of the other feature?
I'm not sure on the correct way to integrate features with a single deployable package.
Any help would be greatly appreciated.
Further,
If you look at http://ptgmedia.pearsoncmg.com/images/chap5_9780321601919/elementLinks/fig5_6.jpg
my concern is that check-in 1 can be deployed when it passes acceptance testing and that package will be deployed, but what if acceptance testing failed? Check-in 5 contains the same problem as check-in 1 so no deployment to production can be done until check-in 1 is fixed or removed. Removing the change would be annoying as there could be multiple commits to be removed, and a fix + testing could take a long time.
Continuous Delivery is an extension of Continuous Integration. CI is all about evaluating your changes in the context of everyone else's on a frequent basis (if you commit less than once per day it can't count as CI)
Branching, of any kind, is all about isolating change and so is fundamentally at odds with CI. Feature branching and CI are opposed.
What most organisations do is merge branches before testing. This compromises the value of the feature branch, but retains the value of CI. If you don't do this then the CI has little real value for the reasons that you describe - you are not evaluating changes in a realistic context.
Sorry but you can't have both, they are opposites!
Regarding the difference in cycle time of hotfixes vs less critical things have you looked into feature toggles? http://martinfowler.com/bliki/FeatureToggle.html
If you want to do Continuous Delivery then branching is a no-no. Well, mostly. Releases should be tagged in SCM, the fix applied to release and merged back into HEAD.
You should also have automated tests to prove the fix actually fixes the problem. This might be hard in some circumstances. In that case the minimum you should do is verify the fix doesn't break existing behaviour (if that's the intention of the fix).
Feature toggles are good, so is branching by abstraction, however in practice this is adopted only by the most mature and experienced teams who have adopted CD. I suspect you're not at that point yet, so this will help you overcome your bump until you're more comfortable with CD.
If two features are supposed to be deployed at the same time, then I guess you should use the TDD principle of creating a FAILING test first, then implementing code to make it go green. Check that test in, so no build can move forward until you've got it implemented. This will make it absolutely clear this build isn't destined for production, as the feature isn't complete. Not a good idea for this test to be a CI, but at a latest phase of testing... providing you have multiple test phases that is!

Benefits of CI for highly modularized projects

There has been some discussion in abandoning our CI system (Hudson FWIW) due to the fact that our projects are somewhat segmented. Without revealing too much, you can think of each project as similar to a web site project: it has dependencies, its own unit tests, etc.
It seems like one of the major benefits of CI is to make sure that each component of a project works together, but aside from project inheritance most of our projects are standalone and unit tested fairly well.
Given what I have explained here (the oddity in our project organization); can anyone explain any benefits of CI for segmented\modular\many projects?
So far as I can tell, this is the only good reason I've found:
“Bugs are also cumulative. The more bugs you have, the harder it is to remove each one. This is partly because you get bug interactions, where failures show as the result of multiple faults - making each fault harder to find. It's also psychological - people have less energy to find and get rid of bugs when there are many of them - a phenomenon that the Pragmatic Programmers call the Broken Windows syndrome.”
From here: http://martinfowler.com/articles/continuousIntegration.html#BenefitsOfContinuousIntegration
I would use Hudson for the following reasons:
Ensuring that your projects build/compile properly.
Building jobs dependent on the build success of other jobs.
Ensuring that your code adheres to agreed-upon coding standards.
Running unit tests.
Notifying development team of any issues found.
If the number of projects steadily increases, you will find the need to be able to manage each one effectively, especially considering the above reasons for doing so.
In your situation, you can benefit from CI in (at least) these two ways:
You can let the CI server run certain larger test suites automatically after each subversion/... check-in. Especially those which test the interaction of different modules, hence the name continuous integration. This takes away the maintenance work and waiting time from the developers when they consider a check-in. Some CI (e.g. Hudson) also can be configured to automatically build modules when a depending module is build. This way you can let it automatically test if depending modules are compatible with the new version of the changed one.
You can let the CI server publish the new artifacts to the repository of a dependency resolver (e.g., Ivy, Maven). This way, the various modules can automatically download the latest (stable) revisions of the modules they depend on. Combine this point with the previous one and imagine the possibilities (!!!).

How to migrate from "Arcane Integration" to Continuous Integration?

Right now a project I'm working on has reached a level of complexity that requires more than a few steps (actually its become arcane!) to produce a complete/usable product. And unfortunately we didn't start out with a Continuos Integration mindset, so as you can imagine its kind of painful at times, and at others I can easily waste half a day trying to get a clean/tested build.
Anyways as any HUGE project it consists of many components in many different languages (not only enterprise style Java or C# for example), as well as many graphical, and textual resources. Now the problem is that when I look for Continuos Integration, I always find best practices and techniques that assume one is starting a new project, from the ground up. However this isn't a new project, so I was wondering what are some good resources to proactively start migrating from Arcane Integration towards Continuos Integration :)
Thanks in advance!
Here it is in two simple (hah) steps.
Go for the repeatable build:
Use source control, get all code checked in.
Establish and document all tools used to build (mainly, which compiler version). Have a repeatable deployment and set up process for these tools.
Establish and document clearly any resources which are necessary to build, but are not checked in (third party installations, service packs, etc). Have a repeatable deployment and set up process for these dependencies.
Before commiting to source control, developers must
update their working copy
successfully build
run and pass automated tests
These steps can be done 1 at a time, sort of a path to follow. You'll get benefits at each stage. For example, if you aren't using source control at all, just getting the code into source control (without anything else) is a big step forward. Also, if there are no automated tests, then developers can't run them - but they can still get the prior commits and get the compiler to check their work.
If you can do all of these, you'll get to a nice sane place.
The goals are repeatable build processes and developers that are plugged in to how their changes affect the build and other developers.
Then you can reap the bonuses by establishing higher compliance:
Developers establish a frequent commit habit. Code that is in the working copy should never be more than 1 day old.
Automated build process monitors source control for check-ins and gets the results to a place where the users can accept them (such as a test environment, a preview website, or even simply placing an .exe where the user can find it).
The same way you eat an elephant (one bite at a time) ;-) Continuous integration requires an automated build. Start with that. Automate the building of each piece. Ant or NAnt is a great way to do this. Have each component's construction be a NAnt task. Then your entire system build can aggregate those individual tasks.
From there, you can add tasks for deployment, for unit testing, etc. If you want to use a CI technology, you can wire it up to your NAnt build.
I would start by first writing down all the steps it takes you to do the build and test manually. After that you at least have a guide for doing it the old way, and writing things down gives you the chance to look at it as a complete process.
Then look for parts to script.
Ideally you want to trigger a build and test from a code commit and only rebuild and retest the changed parts, with perhaps a full build and test nightly or weekly. You'll need log files or database entries and reports on the build success or lack of it.
You'll want to search out and evaluate pre-built products and open-source build-your-own kits. You can certainly write all the scripting and reporting yourself, but it will take a while and you'll probably end up with a just barely good enough reporting system since your job is coding the product, not coding the build system. :-)
I would guess that migrating isn't really an option--Half-ass solutions will only make it worse.
My approach would be to take one creative engineer who understands the build process, sit him down and say "Fix this". Give him a week or two.
The end goal would be a process that runs beginning to end with a single make command.
I also recommend an automated "Setup" procedure where you simply do a checkout and run a batch file from a network share to install and build all your tools. The amount of time this will save overall is staggering if you bring in new programmers. Most projects take one to three days to get set up on a new computer--and it's always the "new" programmer who doesn't know what's going on doing the installs on his own system...
In short: Incrementally
Choose a framework that will work across the diverse range of projects.
One by one, add components to the framework.
If you are not familiar with the framework, tackle a couple of the easier components first, to reduce risk of screwing up.
If you do understand the framework, tackle some of the more difficult and/or commonly built components first, so your team (and management) will appreciate the benefits early, and support the effort more.
Be sure to have a plan to include all of your components, because that's when the full benefit will be realized.
Bring your team with you; make sure you have consensus that this is going to be valuable, or people won't maintain it as the components change.

Resources