To Clean or not to Clean - continuous-integration

I work on a medium-sized project that uses continuous integration to perform regular builds. Our project has quite a long build time at the moment (45-55 mins) and we have been looking at what optimizations can be made to reduce this time.
One of the optimizations that has been suggested is to eliminate the clean step we have at the start of every build, i.e., deleting the entire build directory and getting all source files from source control. Instead, we would just retrieve the files that have changed and start a new build. Rough estimates put this at saving us 10-20 mins per build, but the suggestion made me a little uncomfortable.
So I turn to the Stack Overflow community to see what the best practice is... does your continuous integration always do a clean build? Are there particular reasons for and/or against this?
I'll post my thoughts below in an attempt not to bias anyone but am really interested to hear some other opinions.

For continuous integration, it's important to have a rapid turnaround. I have made that trade-off before, and I would say it's worth it. Occasionally, it will allow things to slip through, but the gains in getting feedback sooner are worth it.
In addition to performing frequent incremental builds, have less frequent clean builds. This gives you most of the benefits of both approaches.
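As a rough sketch of that policy (the trigger values, run_build, and build.sh are all hypothetical stand-ins for whatever your CI server actually provides):

    # Sketch: incremental on every commit, clean once a night.
    # The trigger values and build.sh are hypothetical placeholders.
    import shutil
    import subprocess

    BUILD_DIR = "build"

    def run_build(trigger):
        if trigger == "nightly":
            # Scheduled build: wipe everything and rebuild from scratch.
            shutil.rmtree(BUILD_DIR, ignore_errors=True)
        # build.sh is expected to (re)create BUILD_DIR as needed.
        return subprocess.call(["./build.sh", BUILD_DIR])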

The reason for continuous integration is to identify all problems early, not just some of the problems quickly and the rest, well, whenever.
Not cleaning provides an opportunity for a problem to go undetected for a significant amount of time. Most partial build systems still rely on file timestamps for integrity; need I say more?
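To make that concrete, the core of the timestamp heuristic used by make-style tools is roughly the following (a sketch, not any particular tool's exact logic):

    import os

    def is_stale(target, sources):
        # Rebuild `target` if it is missing or any source is newer.
        if not os.path.exists(target):
            return True
        target_mtime = os.path.getmtime(target)
        return any(os.path.getmtime(s) > target_mtime for s in sources)

    # What this heuristic cannot see:
    # - a source deleted from SCM leaves a stale object that still links
    # - clock skew or restored files can make a changed source look "old"
    # - changed compiler flags touch no timestamp at all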
Cleaning is often the only way to be certain the build is good. An alternative may be to clean periodically (say, nightly), so the worst case is a day before a problem is detected (is that early enough?).
What's your budget for improved build servers? Can you make your build go faster: optimization, more/faster hardware, parallel build steps, a faster compiler, etc.? Can you move to a faster build tool such as scons or similar that will make use of all 8 CPUs in your build server (particularly if you currently use make)?
I would Clean.
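For what it's worth on the scons suggestion above: its build files are plain Python, so a minimal sketch (target and source names hypothetical) looks like this, and `scons -j8` will parallelize it across all 8 CPUs:

    # SConstruct -- a minimal sketch; target and source names are hypothetical.
    env = Environment(CCFLAGS="-O2")
    env.Program(target="app", source=["main.c", "net.c", "util.c"])
    # Run with `scons -j8` to keep all 8 CPUs busy. SCons also tracks
    # dependencies by content signature rather than timestamp alone.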

Continuous Integration is all about repeatability. If you can produce a reliable build each time without cleaning, do it. The problem with not removing the build directory is that files that are removed from SCM might not get removed from the build directory, and as a result could mess up deployments and testing.
Personally I would recommend cleaning your build directory, but not deleting your source. This assumes your SCM can sync your source correctly.
It takes ~15 minutes to clean? That's a pretty long time; I would be interested in knowing what is taking so long.

Using CI to Build Interdependent Projects in Isolation

So, I have an interesting situation. I have a group that is interested in using CI to help catch developer errors. Great - we are all for that. The problem I am having wrapping my head around is that they want to build interdependent projects isolated from one another.
For example, we have a solution for our common/shared libraries that contains multiple projects, some of which depend on others in the solution. I would expect that if someone submits a change to one of the projects in the solution, the CI server would try to build the solution. After all, the solution is aware of dependencies and will build things, optimized, in the correct order.
Instead, they have broken out each project and are attempting to build them independently, without regard for dependencies. This causes other errors because dependent DLLs might not yet exist(!) and a possible chicken-and-egg problem if related changes were made in two or more projects.
This results in a lot of emails about broken builds (one for each project), even if the last set of changes did not really break anything! When I raised this issue, I was told that this was a known issue and the CI server would "rebuild things a few times to work out the dependencies!"
This sounds like a bass-ackwards way to do CI builds. But maybe that is because I am older and ignorant of newer trends - so, does anyone know of a CI setup that builds interdependent projects independently? And for what good reason?
Oh, and they are expecting us to use the build outputs from the CI process, and each built DLL gets a potentially different version number. So we no longer have a single version number for the output of all related DLLs. The best we can ascertain is that something was built on a specific calendar day.
I seem to be unable to convince them that this is A Bad Thing.
So, what am I missing here?
Thanks!
Peace!
I second your opinion.
You risk spending more time dealing with the unnecessary noise than with the actual issues. And repeating portions of the CI verification pipeline only extends the overall CI execution time, which goes against the CI goal of reducing the feedback loop.
If anything you should try to bring as many dependent projects as possible (ideally all of them) under the same CI umbrella, to maximize the benefit of fast feedback of breakages/regressions on any part of the system as a whole.
Personally I also advocate using a gating CI system (based on pre-commit verifications) to prevent regressions rather than just detecting them and relying on human intervention for repairs.
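For illustration, a gating check can start out as simple as a client-side hook that refuses the commit unless the build and tests pass; a minimal sketch, with both commands as placeholders for your real entry points:

    #!/usr/bin/env python
    # Minimal pre-commit gate: refuse the commit unless build and tests pass.
    # build.sh and run_tests.sh are placeholders for your real entry points.
    import subprocess
    import sys

    CHECKS = [
        ["./build.sh"],       # must compile
        ["./run_tests.sh"],   # must pass the automated tests
    ]

    for cmd in CHECKS:
        if subprocess.call(cmd) != 0:
            sys.exit("pre-commit gate failed: " + " ".join(cmd))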

Is CI worth implementing for a one or two man project?

At work, where we do LOB .NET/MSSQL development, many projects we have are 2-person or even 1-person projects that have development life cycles of 1-3 months. The developers serve as business analysts/project managers/QA, so things get done fast with minimal 'BS time' spent. We do get bigger projects that can take 6 months and have a team of 5 devs on them, but these are more uncommon.
We're pushing to have everyone doing TDD going forward (my most recent project has full code coverage and was developed solo), and I was doing research on the architecture required to take maximum benefit of it. It seems that most people doing TDD are also doing CI: they have a build server, do automated builds, and have some kind of automated client build tool (FinalBuilder or NAnt), etc.
So my questions - I see the obvious benefits on the uncommon large projects where you have 5 people working on the same codebase at once - but will we see much benefit from doing CI on the small 2 man projects? What about on a 1 man project - for those is it just a complete waste since you're really not 'integrating' with anyone? And, how would you pitch CI / automated builds / build server to management?
Having an automated/repeatable build process, and being able to prove that the current build passes all tests and runs in a server environment is worth the effort on any size project IMHO.
I'd pitch it this way: manual builds are manual. Things can get mixed up even on small projects. An automated build solves this problem. The amount of time it takes to set up the build script will be made up many times over during the lifecycle of the application.
As far as CI with test runs etc. goes: it's a constant health check on the quality of the code base. It's good to know as soon as possible when one thing inadvertently breaks another thing.
On a small project, you don't need most of the infrastructure to do CI, especially the build server. What you do need is the tests, the build automation, revision control, and a controlled build environment. You can just as well have your build and test servers be virtual machine images you run on your workstations... just so long as the images are under revision control like the rest of the project.
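At that scale the whole "CI server" can start life as one short script run by a scheduler or a commit hook; a sketch, with every command a placeholder for the project's real tooling:

    # A one-file "CI server" for a one- or two-person project.
    # Every command below is a placeholder for your real tooling.
    import subprocess
    import sys
    import time

    STEPS = [
        ["svn", "update"],                 # sync the working copy
        ["msbuild", "App.sln"],            # build the solution
        ["nunit-console", "Tests.dll"],    # run the automated tests
    ]

    for step in STEPS:
        print(time.strftime("%H:%M:%S"), " ".join(step))
        if subprocess.call(step) != 0:
            sys.exit("BUILD BROKEN at: " + " ".join(step))
    print("build green")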

Is it safe to use incremental rebuild for generating a release build in Visual C++?

I sometimes have issues with the incremental rebuild in Visual C++ (currently 2003). Some dependencies do not seem to be checked correctly, and some files aren't built when they should be. I suppose those issues come from the timestamp approach to incremental rebuilds.
I don't consider it a huge issue when building a debug build at my desk, but for a distributable build it is a problem.
Is it safe to use an incremental build on a build server, or is a full build a requirement?
You need a build you distribute to be re-creatable, should users experience problems that need investigating.
I would not rely on an incremental build. Also, I would always delete all source from the build machine, and fetch it from scratch from the source control system before building a release. This way, you know you can repeat the build process again by fetching the same source code.
If you use an incremental build, the build will be produced differently each time, because only a subset of the system will need to be built. I think it's just good to eliminate as many possible differences between release builds as possible, so for this reason incremental builds are out.
It's a good idea to label or somehow mark the versions of each source file in the source control system with the version number of the build. This enables you to keep track of the exact source that went into building the release. With a decent source code control system the labels can be used to track down all the changes that were made to the code between one release and the next. This can be a real help when trying to track down a bug that you know was introduced between the two releases.
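A lightweight complement to labeling is baking the same identifier into the binaries at build time; a sketch, with svnversion standing in for whatever revision query your SCM provides, and the version scheme purely hypothetical:

    # Record which source revision produced this build (sketch).
    # `svnversion` is a placeholder; use your SCM's equivalent query.
    import subprocess

    revision = subprocess.check_output(["svnversion"], text=True).strip()
    build_label = "1.2." + revision        # hypothetical versioning scheme

    with open("version.h", "w") as f:
        f.write('#define BUILD_LABEL "%s"\n' % build_label)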
Incremental builds can still be useful on a development machine when you're not distributing the build, just for saving time during the code/debug/test/repeat development cycle.
I'd argue that you should avoid the "is it safe" question. It's simply not worth looking into.
The advantage of incremental builds is the cost saving. Small per build, it adds up for all the debug builds a developer typically makes, and then adds up further for all developers on your team.
On the other hand, release builds are rare. The costs of a release build are not in developers waiting on them, they're found much more in the team of testers that need to validate it.
For that reason, I find that incremental builds are without a doubt cost-saving for debug builds, but I refuse to spend any time calculating the small saving I'd get out of incrementally building a release build.
I would always prefer to do a full clean and rebuild for any release for peace of mind if nothing else. Assuming that you're talking about externally released builds and not just builds using the Release configuration, the fact that these builds are relatively infrequent means that any time saving will be minimal in the long run.
Binaries built with the incremental build feature are going to be bigger and slower, even for release builds, because they necessarily contain scaffolding to enable incremental linking that isn't needed in a fully optimized build. I'd recommend not using incremental builds for this reason.

How to migrate from "Arcane Integration" to Continuous Integration?

Right now a project I'm working on has reached a level of complexity that requires more than a few steps (actually, it's become arcane!) to produce a complete/usable product. And unfortunately we didn't start out with a Continuous Integration mindset, so as you can imagine it's kind of painful at times, and at others I can easily waste half a day trying to get a clean/tested build.
Anyways, like any HUGE project, it consists of many components in many different languages (not only enterprise-style Java or C#, for example), as well as many graphical and textual resources. Now the problem is that when I look for Continuous Integration advice, I always find best practices and techniques that assume one is starting a new project from the ground up. However, this isn't a new project, so I was wondering: what are some good resources for proactively migrating from Arcane Integration towards Continuous Integration? :)
Thanks in advance!
Here it is in two simple (hah) steps.
Go for the repeatable build:
Use source control, get all code checked in.
Establish and document all tools used to build (mainly, which compiler version). Have a repeatable deployment and set up process for these tools.
Establish and document clearly any resources which are necessary to build, but are not checked in (third party installations, service packs, etc). Have a repeatable deployment and set up process for these dependencies.
Before committing to source control, developers must:
update their working copy
successfully build
run and pass automated tests
These steps can be done 1 at a time, sort of a path to follow. You'll get benefits at each stage. For example, if you aren't using source control at all, just getting the code into source control (without anything else) is a big step forward. Also, if there are no automated tests, then developers can't run them - but they can still get the prior commits and get the compiler to check their work.
If you can do all of these, you'll get to a nice sane place.
The goals are repeatable build processes and developers that are plugged in to how their changes affect the build and other developers.
Then you can reap the bonuses by establishing higher compliance:
Developers establish a frequent commit habit. Code that is in the working copy should never be more than 1 day old.
Automated build process monitors source control for check-ins and gets the results to a place where the users can accept them (such as a test environment, a preview website, or even simply placing an .exe where the user can find it).
The same way you eat an elephant (one bite at a time) ;-) Continuous integration requires an automated build. Start with that. Automate the building of each piece. Ant or NAnt is a great way to do this. Have each component's construction be a NAnt task. Then your entire system build can aggregate those individual tasks.
From there, you can add tasks for deployment, for unit testing, etc. If you want to use a CI technology, you can wire it up to your NAnt build.
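Whatever the tool, the aggregate build is just the component builds run in dependency order, which is the job an aggregating NAnt script does for you; a language-neutral sketch of that ordering in Python, with project names and the build command hypothetical:

    # Build components in dependency order; names and build_one.sh are
    # hypothetical placeholders.
    import subprocess

    DEPS = {
        "Common":   [],
        "Data":     ["Common"],
        "Services": ["Common", "Data"],
    }

    def build_in_order(deps):
        built = set()
        def build(project):
            if project in built:
                return
            for dep in deps[project]:      # dependencies first
                build(dep)
            subprocess.check_call(["./build_one.sh", project])
            built.add(project)
        for project in deps:
            build(project)

    build_in_order(DEPS)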
I would start by first writing down all the steps it takes you to do the build and test manually. After that you at least have a guide for doing it the old way, and writing things down gives you the chance to look at it as a complete process.
Then look for parts to script.
Ideally you want to trigger a build and test from a code commit and only rebuild and retest the changed parts, with perhaps a full build and test nightly or weekly. You'll need log files or database entries and reports on the build success or lack of it.
You'll want to search out and evaluate pre-built products and open-source build-your-own kits. You can certainly write all the scripting and reporting yourself, but it will take a while and you'll probably end up with a just barely good enough reporting system since your job is coding the product, not coding the build system. :-)
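That said, the trigger itself can start as a dumb poller before you adopt a full product; a sketch, with the svn commands and build_and_test.sh as placeholders:

    # Poll SCM and kick off a build whenever a new revision appears.
    # The svn commands and build_and_test.sh are placeholders.
    import subprocess
    import time

    last = None
    while True:
        subprocess.call(["svn", "update"])   # sync the working copy
        rev = subprocess.check_output(["svnversion"], text=True).strip()
        if rev != last:
            ok = subprocess.call(["./build_and_test.sh"]) == 0
            print("rev %s: %s" % (rev, "OK" if ok else "FAILED"))
            last = rev
        time.sleep(60)                       # poll once a minute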
I would guess that migrating isn't really an option; half-assed solutions will only make it worse.
My approach would be to take one creative engineer who understands the build process, sit him down and say "Fix this". Give him a week or two.
The end goal would be a process that runs beginning to end with a single make command.
I also recommend an automated "Setup" procedure where you simply do a checkout and run a batch file from a network share to install and build all your tools. The amount of time this will save overall is staggering if you bring in new programmers. Most projects take one to three days to get set up on a new computer--and it's always the "new" programmer who doesn't know what's going on doing the installs on his own system...
In short: Incrementally
Choose a framework that will work across the diverse range of projects.
One by one, add components to the framework.
If you are not familiar with the framework, tackle a couple of the easier components first, to reduce the risk of screwing up.
If you do understand the framework, tackle some of the more difficult and/or commonly built components first, so your team (and management) will appreciate the benefits early, and support the effort more.
Be sure to have a plan to include all of your components, because that's when the full benefit will be realized.
Bring your team with you; make sure you have consensus that this is going to be valuable, or people won't maintain it as the components change.

Are code freezes still relevant when using a continuous integration build setup?

I've used a Continuous Integration server in the past with great success, and hadn't had the need to ever perform a code freeze on the source control system.
However, lately it seems that everywhere I look, most shops use the concept of a code freeze when preparing for a release, or even a new test version of their product. This practice is in place even on my current project.
When you check-in early and often, and use unit tests, integration tests, acceptance tests, etc., are code freezes still needed?
Continuous integration is a "build" but it's part of the programming part of the development cycle. Just like the "tests" in TDD are part of the programming part of the development cycle.
There will still be builds and testing as part of the overall development cycle.
The point of continuous integration and tests is to shorten the feedback loops and give programmers more visibility. Ultimately, this does mean fewer problems in testing and builds, but it doesn't mean you no longer do the original parts of your development cycle - they are just more effective and can be raised to a higher level, since more trivial problems are being caught earlier in the development cycle.
So you will still have to have a code freeze (or at least a branch) in order to ensure the baseline for what you are shipping is as expected. Just because someone can implement something with a high degree of confidence does not mean it goes into your release without passing through the same final cycles, and the code freeze is an important part of that.
With CI, your code freezes can be very short, since your final build, testing and release may be very reliable, and code freeze may not even exist on small projects, since there is no need for a branch - you release and go right back into development on the next set of features very quickly.
I'd also like to add that CI and TDD allow the final build and testing phase to revert back closer to the traditional waterfall (do all dev, do all testing, then release), as opposed to the more continual QA which has been done on projects with weekly or monthly builds. Your testers can use the CI builds to give early feedback, but it's effectively a different sort of feedback than in the final testing, where you are looking for stability and reliability as opposed to functionality (which obviously was missed in the unit "tests" which the developers had built).
Code freezes are important, because continuous integration does not replace runtime regression testing.
Having an application build and pass unit testing is only a small part of the challenge. Ideally, when you freeze code for a release, you are signing off on two things:
This code has been fully regression-tested, and is defect-free
This code is EXACTLY the code that should be in production (for SOX compliance).
If you're using a modern SCM, just fork the code at that point and start work on the next release in a branch, then merge when the project is deployed. (Of course, place a label so you can roll back to that point if you need to apply a breaking patch.)
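With git, for instance, that fork-and-label step comes down to two commands; a sketch (branch and tag names are hypothetical):

    # Fork the release and label the rollback point (sketch; the branch
    # and tag names are hypothetical).
    import subprocess

    def cut_release(version):
        subprocess.check_call(["git", "branch", "release/" + version])
        subprocess.check_call(["git", "tag", "v" + version])
        # mainline stays open for the next release's development

    cut_release("2.1")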
Once code is in "release mode", it should not be touched.
Our typical process:
    Development
        ||
        \/
       QAT
        ||
        \/
       UAT => Freeze until deploy date => Deploy => Merge and repeat
         \                                             /
          \-- New Branch for future dev --------------/
Of course, we usually have many parallel branches during development, that merge back up into the release stream before UAT.
The code freeze has more to do with QA than it has to do with Dev. The code freeze is the point where QA has said: "Enough. We only have bandwidth to fully test the new features added in so far." That doesn't mean dev doesn't have the bandwidth to add more features, it's just that QA needs time with a quiescent code base to ensure that everything works together.
If you're all in continuous integration mode (QA included) this could be just a freeze of a very short time while QA puts the final seal of approval on the whole package just before it goes out the door.
It all depends on how tightly your QA and regression testing are integrated into the dev cycle.
I'd second the votes already mentioned about SCM branching and allowing dev to continue on a different code branch than what QA is testing. It all goes back to the same thing. QA and regression testing need a quiescent code base for a period of time prior to release.
I think that code freezes are important because each new feature is a potential new source of bugs. Sure regression tests are great and help address this issue. But code freezes allow the developers to focus on fixing currently outstanding bugs and get the current feature set into a release worthy state.
At best, if I really wanted to develop new code during a code freeze, I would fork the frozen tree, do my work there, then after the freeze, merge the forked tree back in.
I'm going to sound like one of the context-driven people but the answer is "it depends".
Code Freeze is a strategy to cope with a problem. If you don't have the problem it is good at addressing, then no, it isn't needed. If you have another technique for addressing the problem, then no, it isn't needed.
Code Freeze is one technique for reducing risk. The advantages it brings are stability and simplicity. The disadvantage it brings is lost velocity: new development stalls while the freeze is in effect.
Another technique is to use Branching, such as with "Feature Branches". The disadvantage of Branching is cost of dealing with the branches, of merging changes.
The technique you're describing for reducing risk is Automated Testing to give fast feedback. The trade-off here is increased velocity for some increased risk (you will miss some bugs).
Of these approaches I'm with you, I prefer the Automated Testing. But there are some situations, such as very high cost of failure, where a Code Freeze does provide a lot of value.
