Using CI to Build Interdependent Projects in Isolation

So, I have an interesting situation. I have a group that is interested in using CI to help catch developer errors. Great - we are all for that. The problem I am having wrapping my head around things is that they want to build interdependent projects isolated from one another.
For example, we have a solution for our common/shared libraries that contains multiple projects, some of which depend on others in the solution. I would expect that if someone submits a change to one of the projects in the solution, the CI server would try to build the solution. After all, the solution is aware of dependencies and will build things, optimized, in the correct order.
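To make "build things in the correct order" concrete, here is a minimal sketch, assuming the project references are available as a plain mapping (the project names are hypothetical). The order a solution-aware build uses is a topological sort of the reference graph, shown here with Kahn's algorithm.

```python
from collections import deque


def build_order(deps):
    """Return projects so that each one comes after everything it references.

    deps maps a project name to the set of project names it references.
    Raises ValueError if the references form a cycle.
    """
    # Copy the graph and make sure every referenced project is also a key.
    graph = {p: set(refs) for p, refs in deps.items()}
    for refs in list(graph.values()):
        for r in refs:
            graph.setdefault(r, set())

    # Count unbuilt references per project and record who depends on whom.
    waiting = {p: len(refs) for p, refs in graph.items()}
    dependents = {p: [] for p in graph}
    for p, refs in graph.items():
        for r in refs:
            dependents[r].append(p)

    # Start with projects that reference nothing; release the others
    # as soon as everything they reference has been built.
    ready = deque(sorted(p for p, n in waiting.items() if n == 0))
    order = []
    while ready:
        p = ready.popleft()
        order.append(p)
        for child in sorted(dependents[p]):
            waiting[child] -= 1
            if waiting[child] == 0:
                ready.append(child)

    if len(order) != len(graph):
        raise ValueError("circular project references")
    return order


print(build_order({"App": {"Data", "Common"}, "Data": {"Common"}, "Common": set()}))
# ['Common', 'Data', 'App']
```

A cycle (the chicken-and-egg case) cannot be ordered at all, which is why the sketch raises instead of guessing.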
Instead, they have broken out each project and are attempting to build them independently, without regard for dependencies. This causes other errors because dependent DLLs might not yet exist(!) and a possible chicken-and-egg problem if related changes were made in two or more projects.
This results in a lot of emails about broken builds (one for each project), even if the last set of changes did not really break anything! When I raised this issue, I was told that this was a known issue and the CI server would "rebuild things a few times to work out the dependencies!"
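To see why "rebuild things a few times" happens, here is a rough simulation. It assumes each project is an isolated CI job, that a job only sees DLLs produced in an earlier round, and that failed jobs are simply retried (the round-based model and project names are assumptions, not how any particular CI server works). The number of rounds grows with the longest reference chain, and every failed attempt is one of those broken-build emails.

```python
def isolated_rounds(deps):
    """Simulate building every project as a separate, dependency-unaware CI job.

    Assumes all jobs in a round run in isolation, so a job succeeds only if
    every project it references was built in an earlier round; failed jobs
    are retried in the next round. deps must list every project as a key.
    Returns (rounds, failed_jobs).
    """
    built = set()
    pending = set(deps)
    rounds = failed = 0
    while pending:
        rounds += 1
        succeeded = {p for p in pending if set(deps[p]) <= built}
        if not succeeded:
            raise ValueError("circular references: no job can ever succeed")
        failed += len(pending) - len(succeeded)  # one broken-build email each
        built |= succeeded
        pending -= succeeded
    return rounds, failed


chain = {"Common": set(), "Data": {"Common"}, "Services": {"Data"}, "App": {"Services"}}
print(isolated_rounds(chain))  # (4, 6)
```

For this four-project chain that is 4 rounds and 6 failed jobs, where building the solution in dependency order takes a single pass with no failures.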
This sounds like a bass-ackwards way to do CI builds. But maybe that is because I am older and ignorant of newer trends - so, has anyone known of a CI setup that builds interdependent projects independently? And for what good reason?
Oh, and they are expecting us to use the build outputs from the CI process, and each built DLL gets a potentially different version number. So we no longer have a single version number for the output of all related DLLs. The best we can ascertain is that something was built on a specific calendar day.
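One way to keep a single version number for related DLLs, sketched under the assumption that the CI server exposes one build counter per run (the four-part version scheme and the function names are hypothetical): stamp every output of the run with the same version, and fail the run if the outputs disagree.

```python
def stamp_versions(projects, major, minor, build_number):
    """Give every assembly produced by one CI run the same version."""
    version = f"{major}.{minor}.{build_number}.0"
    return {p: version for p in projects}


def check_single_version(artifact_versions):
    """Fail the run unless all outputs carry exactly one version; return it."""
    versions = set(artifact_versions.values())
    if len(versions) != 1:
        raise ValueError(f"expected exactly one version, got {sorted(versions)}")
    return versions.pop()


stamped = stamp_versions(["Common.dll", "Data.dll", "App.dll"], 2, 1, 347)
print(check_single_version(stamped))  # 2.1.347.0
```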
I seem to be unable to convince them that this is A Bad Thing.
So, what am I missing here?
Thanks!
Peace!

I second your opinion.
You risk spending more time dealing with the unnecessary noise than with the actual issues. And repeating portions of the CI verification pipeline only extends the overall CI execution time, which goes against the CI goal of reducing the feedback loop.
If anything you should try to bring as many dependent projects as possible (ideally all of them) under the same CI umbrella, to maximize the benefit of fast feedback of breakages/regressions on any part of the system as a whole.
Personally I also advocate using a gating CI system (based on pre-commit verifications) to prevent regressions rather than just detecting them and relying on human intervention for repairs.
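The difference between gating and detecting can be sketched in a few lines. This is a minimal model, assuming a hypothetical `verify` hook that stands in for "build and test this combination": each candidate change is verified on top of the current mainline and only merged if it passes, so a breaking change is bounced back instead of landing and waiting for someone to repair it.

```python
def gate(mainline, candidates, verify):
    """Pre-commit gating: verify each candidate on top of the current mainline.

    mainline: list of change ids already accepted.
    candidates: changes waiting to be merged, in arrival order.
    verify: hypothetical hook for "build and test this combination";
            returns True if it passes.
    Returns (new_mainline, rejected).
    """
    accepted = list(mainline)
    rejected = []
    for change in candidates:
        trial = accepted + [change]
        if verify(trial):
            accepted = trial
        else:
            rejected.append(change)  # bounced back to its author, mainline untouched
    return accepted, rejected


def verify(changes):
    # Stand-in for build + tests: pretend "breaks-api" breaks the build.
    return "breaks-api" not in changes


print(gate(["base"], ["fix-typo", "breaks-api", "add-report"], verify))
# (['base', 'fix-typo', 'add-report'], ['breaks-api'])
```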

Related

Single package in continuous delivery pipeline when building in parallel

My company is using Jenkins for continuous integration and I'm trying to move towards CD. I'm using GitHub as a code repository. Right now we are merging feature branches into a UAT environment, and when a particular feature has been accepted the feature branch is merged into our production branch.
This is obviously dangerous because two changes could be tested together and deployed separately.
Ideally we would have a package tested and deployed without rebuilding but I'm having trouble seeing how this is possible. If two people work on two different features, the first is finished, packaged and goes into testing, the second is then finished and packaged without the first? But then how can I deploy the package without invalidating the testing of the other feature?
I'm not sure on the correct way to integrate features with a single deployable package.
Any help would be greatly appreciated.
Further,
If you look at http://ptgmedia.pearsoncmg.com/images/chap5_9780321601919/elementLinks/fig5_6.jpg
my concern is that check-in 1 can be deployed when it passes acceptance testing and that package will be deployed, but what if acceptance testing failed? Check-in 5 contains the same problem as check-in 1 so no deployment to production can be done until check-in 1 is fixed or removed. Removing the change would be annoying as there could be multiple commits to be removed, and a fix + testing could take a long time.

Continuous Delivery is an extension of Continuous Integration. CI is all about evaluating your changes in the context of everyone else's on a frequent basis (if you commit less than once per day, it can't count as CI).
Branching, of any kind, is all about isolating change and so is fundamentally at odds with CI. Feature branching and CI are opposed.
What most organisations do is merge branches before testing. This compromises the value of the feature branch, but retains the value of CI. If you don't do this then the CI has little real value for the reasons that you describe - you are not evaluating changes in a realistic context.
Sorry but you can't have both, they are opposites!

Regarding the difference in cycle time of hotfixes vs less critical things have you looked into feature toggles? http://martinfowler.com/bliki/FeatureToggle.html
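A minimal sketch of the feature-toggle idea in Python: both code paths are merged and deployed, and a flag decides which one runs, so an unfinished feature does not block a release. The `FEATURE_<NAME>` environment-variable convention is an assumption for illustration; real setups often read toggles from configuration instead.

```python
import os


class FeatureToggles:
    """Minimal feature-toggle lookup: unfinished features ship switched off."""

    def __init__(self, defaults, env=None):
        self._flags = dict(defaults)
        env = os.environ if env is None else env
        # Hypothetical convention: FEATURE_<NAME>=true|1|on overrides the default.
        for name in self._flags:
            value = env.get(f"FEATURE_{name.upper()}")
            if value is not None:
                self._flags[name] = value.strip().lower() in ("1", "true", "on")

    def is_enabled(self, name):
        # Unknown toggles are treated as off.
        return self._flags.get(name, False)


toggles = FeatureToggles({"new_checkout": False})


def checkout_flow():
    # Both paths are deployed; the toggle decides which one runs.
    return "new checkout" if toggles.is_enabled("new_checkout") else "old checkout"
```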

If you want to do Continuous Delivery then branching is a no-no. Well, mostly. Releases should be tagged in SCM, with a fix applied to the release and merged back into HEAD.
You should also have automated tests to prove the fix actually fixes the problem. This might be hard in some circumstances. In that case the minimum you should do is verify the fix doesn't break existing behaviour (if that's the intention of the fix).
Feature toggles are good, so is branching by abstraction, however in practice this is adopted only by the most mature and experienced teams who have adopted CD. I suspect you're not at that point yet, so this will help you overcome your bump until you're more comfortable with CD.
If two features are supposed to be deployed at the same time, then I guess you should use the TDD principle of creating a FAILING test first, then implementing code to make it go green. Check that test in, so no build can move forward until you've got it implemented. This will make it absolutely clear this build isn't destined for production, as the feature isn't complete. It's not a good idea for this test to be part of CI; put it in a later phase of testing... providing you have multiple test phases, that is!

To Clean or not to Clean

I work on a medium sized project that uses continuous integration to perform regular builds. Our project has quite a long build time at the moment (45-55 mins) and we have been looking at what optimizations can be made to reduce this time.
One of the optimizations that has been suggested is to eliminate the clean step that we have at the start of every build, i.e. deleting the entire build directory and getting all source files from source control. Instead, we would just retrieve the files that have changed and start a new build. Rough estimates put this at saving us 10-20 mins per build, but the suggestion made me a little uncomfortable.
So I turn to the Stack Overflow community to see what the best practise is... does your continuous integration always do a clean build? Are there particular reasons for and/or against this?
I'll post my thoughts below in an attempt not to bias anyone but am really interested to hear some other opinions.

For continuous integration, it's important to have a rapid turnaround. I have made that trade-off before, and I would say it's worth it. Occasionally, it will allow things to slip through, but the gains in getting feedback sooner are worth it.
In addition to performing frequent incremental builds, have less frequent clean builds. This gives you most of the benefits of both approaches.
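The "frequent incremental, less frequent clean" policy can be written as a small decision function. This is a sketch with hypothetical thresholds (a clean build every 20th run, or whenever the last clean build is more than a day old); tune both to your own build times.

```python
from datetime import datetime, timedelta


def needs_clean_build(build_number, last_clean, now, every_n=20, max_age=timedelta(days=1)):
    """Incremental by default; wipe the workspace every Nth build, or once
    the last clean build is older than max_age (i.e. at least daily)."""
    if last_clean is None:
        return True  # never had a clean build: start with one
    return build_number % every_n == 0 or now - last_clean >= max_age


morning = datetime(2024, 5, 6, 9, 0)
print(needs_clean_build(41, morning, morning + timedelta(hours=3)))   # False
print(needs_clean_build(41, morning, morning + timedelta(hours=25)))  # True
```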

The reason for continuous integration is to identify all problems early, not some of the problems quickly and the others - well, whenever.
Not cleaning provides an opportunity for a problem to go undetected for a significant amount of time. Most partial build systems still rely on file time stamps for integrity, need I say more.
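The timestamp weakness can be shown with a minimal sketch (function names are hypothetical, and real build tools differ in detail). A make-style check only rebuilds when the source is newer than the output, so a changed file that ends up with an older modification time (for example when a sync sets file times from the repository, or build agents' clocks disagree) is silently skipped; comparing a content hash catches it.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stale_by_mtime(source, output):
    """make-style check: rebuild only if the output is missing or older than the source."""
    out = Path(output)
    return not out.exists() or Path(source).stat().st_mtime > out.stat().st_mtime


def stale_by_hash(source, digest_at_last_build):
    """Content check: rebuild whenever the bytes differ from the last build."""
    return digest_at_last_build is None or file_digest(source) != digest_at_last_build
```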
Cleaning is often the only way to be certain the build is good. An alternate may be to clean periodically (say nightly), so the worst case is a day before a problem is detected (is that early enough?).
What's your budget for improved build servers? Can you make your build go faster - optimization, more/faster hardware, parallel build steps, a faster compiler, etc.? Can you move to a faster build tool, such as SCons or similar, that will make use of all 8 CPUs in your build server (particularly if you use make)?
I would Clean.

Continuous Integration is all about repeatability. If you can produce a reliable build each time without cleaning, do it. The problem with not removing the build directory is that files that are removed from SCM might not get removed from the build directory, and as a result could mess up deployments and testing.
Personally I would recommend cleaning your build directory, but not deleting your source. This assumes your SCM can sync your source correctly.
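The deleted-file problem can be checked for directly. A minimal sketch, assuming one output per source file at the same relative path (e.g. `src/net/http.c` to `build/net/http.o`; the extensions and layout are hypothetical): list the outputs whose source has disappeared, since an incremental build never removes them.

```python
from pathlib import Path


def orphaned_outputs(source_dir, build_dir, src_ext=".c", out_ext=".o"):
    """Return build outputs whose source file no longer exists.

    Assumes one output per source file at the same relative path. An
    incremental build never deletes these, so they can still end up in
    packages or on test machines.
    """
    src_root, out_root = Path(source_dir), Path(build_dir)
    sources = {p.relative_to(src_root).with_suffix("") for p in src_root.rglob(f"*{src_ext}")}
    return sorted(
        out for out in out_root.rglob(f"*{out_ext}")
        if out.relative_to(out_root).with_suffix("") not in sources
    )
```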
It takes ~15 minutes to clean? That's a pretty long time; I would be interested in knowing what is taking so long.

Benefits of CI for highly modularized projects

There has been some discussion about abandoning our CI system (Hudson FWIW) due to the fact that our projects are somewhat segmented. Without revealing too much, you can think of each project as similar to a web site project: it has dependencies, its own unit tests, etc.
It seems like one of the major benefits of CI is to make sure that each component of a project works together, but aside from project inheritance most of our projects are standalone and unit tested fairly well.
Given what I have explained here (the oddity in our project organization), can anyone explain any benefits of CI for segmented/modular/many-project setups?
So far as I can tell, this is the only good reason I've found:
“Bugs are also cumulative. The more bugs you have, the harder it is to remove each one. This is partly because you get bug interactions, where failures show as the result of multiple faults - making each fault harder to find. It's also psychological - people have less energy to find and get rid of bugs when there are many of them - a phenomenon that the Pragmatic Programmers call the Broken Windows syndrome.”
From here: http://martinfowler.com/articles/continuousIntegration.html#BenefitsOfContinuousIntegration

I would use Hudson for the following reasons:
Ensuring that your projects build/compile properly.
Building jobs dependent on the build success of other jobs.
Ensuring that your code adheres to agreed-upon coding standards.
Running unit tests.
Notifying development team of any issues found.
If the number of projects steadily increases, you will find the need to be able to manage each one effectively, especially considering the above reasons for doing so.
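"Building jobs dependent on the build success of other jobs" boils down to one computation: given a module that was just built, which other modules depend on it, directly or transitively? A minimal sketch with hypothetical module names; it illustrates the idea and is not how Hudson implements its triggers.

```python
from collections import deque


def downstream(deps, changed):
    """Every module that depends on `changed`, directly or transitively.

    deps maps each module to the modules it depends on. These are the jobs
    a CI server would need to rebuild and retest after `changed` is built.
    """
    dependents = {}
    for module, needs in deps.items():
        for n in needs:
            dependents.setdefault(n, set()).add(module)
    seen, queue = set(), deque([changed])
    while queue:
        for m in dependents.get(queue.popleft(), ()):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return seen


deps = {"web": {"core"}, "api": {"core"}, "admin": {"api"}, "report": set()}
print(sorted(downstream(deps, "core")))  # ['admin', 'api', 'web']
```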

In your situation, you can benefit from CI in (at least) these two ways:
You can let the CI server run certain larger test suites automatically after each Subversion/... check-in, especially those which test the interaction of different modules, hence the name continuous integration. This takes the maintenance work and waiting time off the developers when they consider a check-in. Some CI servers (e.g. Hudson) can also be configured to automatically build modules when a module they depend on is built. This way you can let it automatically test whether dependent modules are compatible with the new version of the changed one.
You can let the CI server publish the new artifacts to the repository of a dependency resolver (e.g., Ivy, Maven). This way, the various modules can automatically download the latest (stable) revisions of the modules they depend on. Combine this point with the previous one and imagine the possibilities (!!!).
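Ivy and Maven each have their own resolution rules; the sketch below only illustrates the idea of "download the latest stable revision" against a hypothetical repository index, where the status labels and the dotted-integer version format are assumptions. Note that versions must be compared numerically, not as strings.

```python
def latest_stable(published, module):
    """Newest 'release' revision of a module in a hypothetical repository index.

    published is a list of (module, version, status) tuples, with versions
    like "1.10.0". Versions are compared numerically, so 1.10.0 beats 1.9.3.
    """
    candidates = [
        tuple(int(part) for part in version.split("."))
        for name, version, status in published
        if name == module and status == "release"
    ]
    if not candidates:
        return None
    return ".".join(str(part) for part in max(candidates))


index = [
    ("core", "1.9.3", "release"),
    ("core", "1.10.0", "release"),
    ("core", "2.0.0", "integration"),  # built by CI, not yet promoted
    ("web", "3.0.0", "release"),
]
print(latest_stable(index, "core"))  # 1.10.0
```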

Recommended number of projects in Visual Studio Solution

We are starting to develop a new application that will include something like 30-50 projects developed by about a dozen developers with C# in MS Visual Studio.
I am working on componentizing the application modules in order to support the architecture and enable parallel work.
We are arguing: how many solutions should we have?
Some claim that we should have 1-2 solutions with 15-30 projects each. Some claim that we need a solution per component, which means about 10-12 solutions with about 3-6 projects each.
I would be happy to hear pros/cons and experience with each direction (or any other direction).

I've worked on products on both extremes: one with ~100 projects in a single solution, and one with >20 solutions, of 4-5 projects each (Test, Business Layer, API, etc).
Each approach has its advantages and disadvantages.
A single solution is very useful when making changes - it's easier to work with dependencies, and it allows refactoring tools to work well. It does, however, result in longer load times and longer build times.
Multiple solutions can help enforce separation of concerns, keep build/load times low, and may be well suited to having multiple teams with narrower focus and well-defined service boundaries. They do, however, have a large drawback when it comes to refactoring, since many references are file references, not project references.
Maybe there's room for a hybrid approach: use smaller solutions for the most part, but create a single solution including all projects for times when larger-scale changes are required. Of course, you then have to maintain two separate solutions...
Finally, the structure of your projects and dependencies will have some influence on how you organize your solutions.
And keep in mind, in the long run RAM is cheaper than programmer time...

Solutions are really there for dependency management, so you can have a project in more than one solution if more than one thing depends on it. The number of solutions should really depend on your dependency graph.
Edit: This means you shouldn't be sticking projects that are not dependent on each other into the same solution, as it creates the illusion of dependency which means someone could create a real dependency when two projects should really be independent.

I've worked on a solution with close to 200 projects. It's not a big deal if you have enough RAM :).
One important thing to remember is that if projects depend on each other (be it with Dependencies or References), they should probably be in the same solution. Otherwise you get strange behavior when different projects have different dependencies in different solutions.

You want to maintain project references. If you can safely break up your solution with two or more discrete sets of projects that depend on each other, then do it. If you can't, then they all belong together.
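Finding the "discrete sets of projects" is a connected-components problem on the reference graph, treating references as undirected. A minimal sketch with hypothetical project names: each group it returns is a candidate solution, since no project in one group references a project in another.

```python
def independent_groups(deps):
    """Split projects into groups with no references between groups.

    deps maps each project to the projects it references. Projects in
    different groups never reference each other, directly or indirectly.
    """
    neighbours = {p: set() for p in deps}
    for p, refs in deps.items():
        for r in refs:
            neighbours.setdefault(r, set())
            neighbours[p].add(r)
            neighbours[r].add(p)
    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over references in both directions
            p = stack.pop()
            if p in group:
                continue
            group.add(p)
            stack.extend(neighbours[p] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


projects = {"Billing": {"Common"}, "Common": set(), "Reports": {"Charts"}, "Charts": set(), "Tools": set()}
print(independent_groups(projects))
# [['Billing', 'Common'], ['Charts', 'Reports'], ['Tools']]
```

If this returns a single group, the advice above applies: they all belong together.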

We have a solution that has approximately 130 projects. About 3 years ago, when we were using VS.NET 2003, it was a terrible problem. Sometimes the solution and VSS were crashing.
But now with VS.NET 2005 it's OK. Only loading takes a long time. Some of my coworkers unload projects that they don't use; that's another option to speed things up.
Changing the build type to Release is another problem. But we have MSBuild scripts now, so we no longer use VS.NET's Release build.

I think you should not exaggerate your number of projects/solutions. Componentize what can and will be reused, otherwise don't componentize!
It will only make things less transparent and increase build times. Partitioning can also be done within a project using folders or a logical class structure.

When deciding how many projects vs. solutions you need, you need to consider some questions:
logical layers of your application;
dependency between projects;
how projects are built;
who works with what projects;
Currently we have 1 solution with 70 projects.
For our continuous integration we created 5 MSBuild projects, so CI does not build our development solution.
Previously, we had a separate solution for the presentation layer (web and related projects) in a separate git repository. This solution was used by outsourced and freelance web developers.

I am working with a solution that has 405 projects currently. On a really fast machine this is doable, but only with current Visual Studio 2017 or 2012. Other versions crash frequently.

I don't think the actual number of solutions matters. Much more important is that you break the thing up along functional lines. As a silly, contrived example if you have a clutch of libraries that handles interop with foreign web services, that would be a solution; an EXE with the DLLs it needs to work would be another.

The only thing about so many projects in one solution is that the references and build order start to get confusing.
As a general rule I'd gravitate toward decreasing the number of projects (make the project a little more generic) and have devs share source control on those projects, but separate the functionality within those projects by sub-namespaces.

You should have as many as you need. There is no hard limit or best practice. Practice and read about what projects and solutions are and then make the proper engineering decisions about what you need from there.

It has been said in other answers however it is probably worth saying again.
Project dependencies are good in that they can rebuild dependent projects if the dependency has a change.
If you use an assembly (file) reference instead, there is no way that VS or MSBuild is going to know that a chain of projects needs to be built. What will happen is that on the first build, the project that has changed will be rebuilt. If you have put the dependency on the build output, then at least on the second build the project dependent on it will build. But then you have an unknown number of builds needed to get to the end of the chain.
With project dependencies it will sort it all out for you.
So the answer is: have as many (or as few) solutions as needed to ensure project dependencies are used.
If your team is broken down into functional areas that have a more formal release mechanism, rather than check-ins of source code, then splitting it along those lines would be the way to go; otherwise the dependency map is your friend.

How to migrate from "Arcane Integration" to Continuous Integration?

Right now a project I'm working on has reached a level of complexity that requires more than a few steps (actually it's become arcane!) to produce a complete/usable product. And unfortunately we didn't start out with a Continuous Integration mindset, so as you can imagine it's kind of painful at times, and at others I can easily waste half a day trying to get a clean/tested build.
Anyway, like any HUGE project, it consists of many components in many different languages (not only enterprise-style Java or C#, for example), as well as many graphical and textual resources. Now the problem is that when I look into Continuous Integration, I always find best practices and techniques that assume one is starting a new project from the ground up. However, this isn't a new project, so I was wondering what are some good resources to proactively start migrating from Arcane Integration towards Continuous Integration :)
Thanks in advance!

Here it is in two simple (hah) steps.
Go for the repeatable build:
Use source control, get all code checked in.
Establish and document all tools used to build (mainly, which compiler version). Have a repeatable deployment and set up process for these tools.
Establish and document clearly any resources which are necessary to build, but are not checked in (third party installations, service packs, etc). Have a repeatable deployment and set up process for these dependencies.
Before committing to source control, developers must
update their working copy
successfully build
run and pass automated tests
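The three pre-commit steps above can be wrapped in a small script so nobody skips one. A minimal sketch: the `git`/`make` commands are hypothetical placeholders for your own SCM, build and test invocations, and the runner is injectable so it can be tried without touching a real repository.

```python
import subprocess

# Hypothetical commands; substitute your own SCM, build and test invocations.
PRE_COMMIT_STEPS = [
    ("update working copy", ["git", "pull", "--rebase"]),
    ("build", ["make", "all"]),
    ("run automated tests", ["make", "test"]),
]


def run_pre_commit(steps, run=subprocess.run):
    """Run each step in order and stop at the first failure.

    Returns the name of the failed step, or None if it is safe to commit.
    """
    for name, cmd in steps:
        if run(cmd).returncode != 0:
            return name
    return None
```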
These steps can be done 1 at a time, sort of a path to follow. You'll get benefits at each stage. For example, if you aren't using source control at all, just getting the code into source control (without anything else) is a big step forward. Also, if there are no automated tests, then developers can't run them - but they can still get the prior commits and get the compiler to check their work.
If you can do all of these, you'll get to a nice sane place.
The goals are repeatable build processes and developers that are plugged in to how their changes affect the build and other developers.
Then you can reap the bonuses by establishing higher compliance:
Developers establish a frequent commit habit. Code that is in the working copy should never be more than 1 day old.
Automated build process monitors source control for check-ins and gets the results to a place where the users can accept them (such as a test environment, a preview website, or even simply placing an .exe where the user can find it).
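The "monitors source control for check-ins" part is, at its simplest, a polling loop. A minimal sketch: `get_head` and `build` are hypothetical hooks into your SCM and build script, and real CI servers add locking, queuing and notifications on top of this.

```python
import time


def poll_and_build(get_head, build, last_built=None, interval=60, max_polls=None, sleep=time.sleep):
    """Poll source control and trigger a build whenever the head revision changes.

    get_head and build are hypothetical hooks; max_polls=None polls forever.
    Returns the last revision that was built.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        head = get_head()
        if head != last_built:
            build(head)  # e.g. compile, test, then copy the .exe where users can find it
            last_built = head
        polls += 1
        sleep(interval)
    return last_built
```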

The same way you eat an elephant (one bite at a time) ;-) Continuous integration requires an automated build. Start with that. Automate the building of each piece. Ant or NAnt is a great way to do this. Have each component's construction be a NAnt task. Then your entire system build can aggregate those individual tasks.
From there, you can add tasks for deployment, for unit testing, etc. If you want to use a CI technology, you can wire it up to your NAnt build.

I would start by first writing down all the steps it takes you to do the build and test manually. After that you at least have a guide for doing it the old way, and writing things down gives you the chance to look at it as a complete process.
Then look for parts to script.
Ideally you want to trigger a build and test from a code commit and only rebuild and retest the changed parts, with perhaps a full build and test nightly or weekly. You'll need log files or database entries and reports on the build success or lack of it.
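For the "database entries and reports" part, even the standard library goes a long way. A minimal sketch using SQLite, where the table layout and the build kinds ("incremental", "nightly") are assumptions for illustration.

```python
import sqlite3


def open_log(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS builds (
               id INTEGER PRIMARY KEY, revision TEXT, kind TEXT, succeeded INTEGER)"""
    )
    return db


def record(db, revision, kind, succeeded):
    db.execute(
        "INSERT INTO builds (revision, kind, succeeded) VALUES (?, ?, ?)",
        (revision, kind, int(succeeded)),
    )
    db.commit()


def success_rate(db, kind):
    """Fraction of builds of a given kind that passed, or None if there were none."""
    total, passed = db.execute(
        "SELECT COUNT(*), COALESCE(SUM(succeeded), 0) FROM builds WHERE kind = ?", (kind,)
    ).fetchone()
    return passed / total if total else None
```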
You'll want to search out and evaluate pre-built products and open-source build-your-own kits. You can certainly write all the scripting and reporting yourself, but it will take a while and you'll probably end up with a just barely good enough reporting system since your job is coding the product, not coding the build system. :-)

I would guess that migrating isn't really an option--Half-ass solutions will only make it worse.
My approach would be to take one creative engineer who understands the build process, sit him down and say "Fix this". Give him a week or two.
The end goal would be a process that runs beginning to end with a single make command.
I also recommend an automated "Setup" procedure where you simply do a checkout and run a batch file from a network share to install and build all your tools. The amount of time this will save overall is staggering if you bring in new programmers. Most projects take one to three days to get set up on a new computer--and it's always the "new" programmer who doesn't know what's going on doing the installs on his own system...
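A first step toward such a setup script is checking that the documented build tools are present in the right versions. A minimal sketch: the tool list and minimum versions are hypothetical, and it assumes each tool prints a `major.minor` version for `--version` (as `git` and `cmake` do).

```python
import re
import shutil
import subprocess

# Hypothetical manifest: tool name -> minimum (major, minor) version.
REQUIRED_TOOLS = {"git": (2, 20), "cmake": (3, 16)}


def installed_version(tool, which=shutil.which, run=subprocess.run):
    """Return the tool's (major, minor) version, or None if it's not on PATH."""
    if which(tool) is None:
        return None
    out = run([tool, "--version"], capture_output=True, text=True).stdout
    m = re.search(r"(\d+)\.(\d+)", out)
    return (int(m.group(1)), int(m.group(2))) if m else None


def check_tools(required, version_of=installed_version):
    """List problems; an empty list means the machine is ready to build."""
    problems = []
    for tool, minimum in required.items():
        found = version_of(tool)
        if found is None:
            problems.append(f"{tool}: not found")
        elif found < minimum:
            problems.append(f"{tool}: {found} is older than {minimum}")
    return problems
```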

In short: Incrementally
Choose a framework that will work across the diverse range of projects.
One by one, add components to the framework.
If you are not familiar with the framework, tackle a couple of the easier components first, to reduce risk of screwing up.
If you do understand the framework, tackle some of the more difficult and/or commonly built components first, so your team (and management) will appreciate the benefits early, and support the effort more.
Be sure to have a plan to include all of your components, because that's when the full benefit will be realized.
Bring your team with you; make sure you have consensus that this is going to be valuable, or people won't maintain it as the components change.
