Should code coverage be executed EVERY build? [closed]

I'm a huge fan of Brownfield Application Development. It's a great book, no doubt, and I'd recommend it to all devs out there. I'm here because I got to the point in the book about code coverage. At my new shop, we're using Team City for automated builds/continuous integration and it takes about 40 minutes for the build to complete. The Brownfield book talks all about frictionless development and how we want to ease the common burdens that developers have to endure. Here's what I read on page 130:
"Code coverage: Two processes for the price of one?
As you can see from the sample target in listing 5.2, you end up with two output files:
one with the test results and one with the code coverage results. This is because you
actually are executing your tests during this task.
You don’t technically need to execute your tests in a separate task if you’re running
the code coverage task. For this reason, many teams will substitute an automated
code coverage task for their testing task, essentially performing both actions in the
CI process. The CI server will compile the code, test it, and generate code coverage
stats on every check-in.
Although there’s nothing conceptually wrong with this approach, be aware of some
downsides. First, there’s overhead to generating code coverage statistics. When
there are a lot of tests, this overhead could be significant enough to cause friction in
the form of a longer-running automated build script. Remember that the main build
script should run as fast as possible to encourage team members to run it often. If
it takes too long to run, you may find developers looking for workarounds.
For these reasons, we recommend executing the code coverage task separately from
the build script’s default task. It should be run at regular intervals, perhaps as a separate scheduled task in your build file that executes biweekly or even monthly, but we
don’t feel there’s enough benefit to the metric to warrant the extra overhead of having
it execute on every check-in."
This is contrary to the practice at my current shop, where we execute NCover per build. I want to go to my lead and request we not do this, but the best I can do is tell him "this is what the Brownfield book says". I don't think that's good enough. So I'm relying on you guys to fill me in with your personal experiences and advice on this topic. Thanks.

There are always two competing interests in continuous integration / automated build systems:
You want the build to run as quickly as possible
You want the build to run with as much feedback as possible (e.g. the greatest number of tests run, the most information available about the build's stability and coverage, etc.)
You will always need to make tradeoffs and find a balance between these competing interests. I usually try to keep my build times under 10 minutes, and will consider a build system broken if it takes more than about 20 minutes to give any sort of meaningful feedback about the build's stability. But this doesn't need to be a complete build that tests every case; there may be additional tests that are run later or in parallel on other machines to further test the system.
If you are seeing build times of 40 minutes, I would recommend you do one of the following as soon as possible:
Distribute the build/testing onto multiple machines, so that tests can be run in parallel and you can get faster feedback
Find things that are taking a lot of time in your build but are not giving a great amount of benefit, and only do those tasks as part of a nightly build
I would 100% recommend the first solution if at all possible. However, sometimes the hardware isn't available right away and we have to make sacrifices.
Code coverage is a relatively stable metric, in that it is relatively rare that your code coverage numbers would get dramatically worse within a single day. So if the code coverage is taking a long time to perform, then it's not really critical that it occurs on every build. But you should still try to get code coverage numbers at least once a night. Nightly builds can be allowed to take a bit longer, since there (presumably) won't be anybody waiting on them, but they still provide regular feedback about your project's status and ensure there aren't lots of unforeseen problems being introduced.
That said, if you are able to get the hardware to do more distributed or parallel building/testing, you should definitely go that route - it will ensure that your developers know as soon as possible if they broke something or introduced a problem in the system. The cost of the hardware will quickly pay itself back in the increased productivity that occurs from the rapid feedback of the build system.
Also, if your build machine is not constantly working (i.e. there is a lot of time when it is idle), then I would recommend setting it up to do the following:
When there is a code change, do a build and test. Leave out some of the longer running tasks, including potentially code coverage.
Once this build/test cycle completes (or in parallel), kick off a longer build that tests things more thoroughly, does code coverage, etc
Both of these builds should give feedback about the health of the system
That way, you get the quick feedback, but also get the more extended tests for every build, so long as the build machine has the capacity for it.
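As a rough illustration of the fast/extended split described above, here is a minimal Python sketch. The task names, durations, the `essential` flag and the 10-minute budget are all assumptions made up for the example; they are not taken from any real build or CI server API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildTask:
    name: str
    minutes: float
    essential: bool  # assumed flag: must run on every check-in


def split_pipeline(tasks, fast_budget_minutes=10.0):
    """Return (fast, extended) task lists.

    Essential tasks always go into the fast build. Optional tasks are added
    shortest first while the fast build stays within the budget; everything
    else is deferred to the extended (or nightly) build.
    """
    fast = [t for t in tasks if t.essential]
    used = sum(t.minutes for t in fast)
    extended = []
    optional = sorted((t for t in tasks if not t.essential), key=lambda t: t.minutes)
    for task in optional:
        if used + task.minutes <= fast_budget_minutes:
            fast.append(task)
            used += task.minutes
        else:
            extended.append(task)
    return fast, extended


# Hypothetical task list, durations in minutes.
tasks = [
    BuildTask("compile", 3, True),
    BuildTask("unit tests", 4, True),
    BuildTask("lint", 1, False),
    BuildTask("integration tests", 20, False),
    BuildTask("code coverage", 12, False),
]
fast, extended = split_pipeline(tasks)
print([t.name for t in fast])
print([t.name for t in extended])
```

With these made-up numbers, coverage and the integration tests land in the extended build, while the fast build stays under the budget.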

I wouldn't make any presumptions about how to fix this - you're putting the cart before the horse a bit here. You have a complaint that the build takes too long, so that's the issue I would ask to resolve, without preconceived notions about how to do it. There are many other potential solutions to this problem (faster machines, different processes, etc.) and you would be wise not to exclude any of them.
Ultimately this is a question of whether your management values the output of the build system enough to justify the time it takes. (And whether any action you might take to remedy the time consumption has acceptable fidelity in output).

This is a per team and per environment decision. You should first determine your threshold for build duration, and then factor out longer running processes into less-frequent occurrences (ideally no fewer than 1 or 2 times a day in CI) once that has been determined.

The objection appears to be that executing all the tests, and collecting code coverage, is expensive, and you don't (well, someone doesn't) want to pay that price for each build.
I cannot imagine why on earth you (or that someone) would not want to always know what the coverage status was.
If the build machine has nothing else to do, then it doesn't matter if it does this too.
If your build machine is too busy doing builds, maybe you've overloaded it by asking it to serve too many masters, or you are doing too many builds (why so many changes? Hmm, maybe the tests aren't very good!).
If the problem is that the tests themselves really do take a long time, you can perhaps find a way to optimize the tests. In particular, you shouldn't need to re-run tests for the part of the code that didn't change. Figuring out how to do this (and trusting it) might be a challenge.
Some test coverage tools (such as ours) enable you to track what tests cover which part of the code, and, given a code change, which tests need to be re-run. With some additional scripting, you can simply re-run the tests that are affected first; this enables you to get what amounts to full test results early/fast without running all the tests. Then if there are issues with the build you find out as soon as possible.
[If you are paranoid and don't really trust the incremental testing process, you can run them for the early feedback, and then go on to run all the tests again, giving you full results.]
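To make the idea of re-running only the affected tests concrete, here is a minimal Python sketch. It assumes you already have a map from each test to the source files it covers (for example, exported by a coverage tool); the file and test names are hypothetical, and this is not the API of any particular coverage product.

```python
def select_affected_tests(coverage_map, changed_files):
    """Return the tests whose covered files overlap the changed files.

    coverage_map: dict of test name -> iterable of source files it exercises.
    changed_files: iterable of source files touched by the check-in.
    """
    changed = set(changed_files)
    return sorted(
        test for test, files in coverage_map.items() if changed & set(files)
    )


# Hypothetical per-test coverage data.
coverage_map = {
    "test_login": ["auth.py", "session.py"],
    "test_search": ["search.py", "index.py"],
    "test_checkout": ["cart.py", "session.py"],
}

affected = select_affected_tests(coverage_map, ["session.py"])
print(affected)
```

The selection is only as trustworthy as the coverage map, so a stale map will miss tests; that is one reason to still run the full suite afterwards.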

Related

CI and automation

I'm about to start CI, and I have a fully automated verification system, but as I read, the automation run will start after the developer code is pushed to the cloud (and that happens many times a day). When I run the whole automation bundle it takes around 1 hour to finish the tests.
So I'm wondering whether that time is acceptable and, if not, what I can do to decrease it. Is there a method or tool that could help? Please advise.
Thanks in advance.
Consider running the automated tests at the end of the day. I had a similar situation, and my solution was to use cron jobs scheduled for midnight.
That way you avoid running the full test suite for each build.
If automated testing is needed for each build, try to introduce nodes (e.g. for Jenkins). You can add additional nodes and run on several machines; I think I also did a similar thing for BitRise.
Divide test cases with some logic, e.g. logins in one run, negative test cases in a different one, and so on.
Chop test cases down into smaller sections, use only core tests for each build, and do a complete run at the end of the day.
There are many measures to ensure faster runs, and not all of them are handled programmatically.
But test speed can also be drastically increased programmatically: parallelisation, concurrent runs, grid, etc.
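One way to picture splitting a one-hour suite across several nodes is the greedy sketch below, in Python. The test group names and durations are invented for the example, and the function is not a Jenkins or BitRise API; it only works out which group would go to which node.

```python
import heapq


def shard_tests(durations, node_count):
    """Assign test groups to nodes, longest first, to the least-loaded node.

    durations: dict of test group name -> expected minutes.
    Returns a list with one list of group names per node.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    heap = [(0.0, i) for i in range(node_count)]  # (current load, node index)
    shards = [[] for _ in range(node_count)]
    ordered = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, minutes in ordered:
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + minutes, idx))
    return shards


# Hypothetical groups adding up to a 60-minute suite.
durations = {"login": 20, "checkout": 15, "search": 15, "negative_cases": 10}
shards = shard_tests(durations, node_count=2)
print(shards)
```

With two nodes and these numbers, each node gets about 30 minutes of work, so the wall-clock time is roughly halved.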
Hope this helps,

What would be the best way to add Performance Testing in Continuous Integration/Delivery Environment [closed]

I'm about to integrate performance testing on CI, but I'm having a hard time deciding which tests I should execute and how to run them:
Should I run load testing on individual APIs or should I run the whole workflow (e.g.: Login -> Home Page -> Search)?
How long should I run it?
Should I also add stress, peak and soak testing?
I'm thinking those should be run outside of CI.
Any comments would be really appreciated.
The main idea of including tests in continuous integration process is protecting yourself from regressions, i.e. to ensure that a new feature or a bug fix doesn't cause performance degradation.
So common practice is to have a short-term load test that covers essential features and simulates the anticipated number of users, run periodically (i.e. on each commit or pull request to the master/integration branch).
If you have enough capacity and environment availability it would also make sense to have a soak test in place, but it should occur less frequently (you want to keep build time short enough, don't you?), i.e. overnight or over the weekend. This way you will be able to identify possible memory leaks.
Both approaches assume having some reference metrics collected by previous ad-hoc runs which can be considered as acceptable pass/fail criteria.
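A pass/fail check against reference metrics could look roughly like the Python sketch below. The endpoint names, the response times and the 10% tolerance are assumptions for illustration only; real thresholds would come from your own earlier runs.

```python
def find_regressions(baseline_ms, current_ms, tolerance=0.10):
    """Return endpoints whose response time exceeds baseline by more than tolerance.

    baseline_ms / current_ms: dict of endpoint name -> response time in ms.
    Endpoints missing from the current run are skipped, not failed.
    """
    regressions = {}
    for name, base in baseline_ms.items():
        current = current_ms.get(name)
        if current is not None and current > base * (1 + tolerance):
            regressions[name] = (base, current)
    return regressions


# Hypothetical numbers: baseline from an earlier ad-hoc run, current from this build.
baseline = {"login": 200, "search": 500}
current = {"login": 210, "search": 600}

regressions = find_regressions(baseline, current)
print(regressions)
build_ok = not regressions
```

Here `search` is 20% slower than its baseline, so the build would be flagged, while `login` stays inside the tolerance.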
A stress test is about identifying saturation/breaking points, so normally it is not included in the CI pipeline and is run manually before major releases.
Check out How to Include Load Testing in your Continuous Integration Environment article for more information.
It's difficult to say what you should test without knowing what's important to your application. In many applications, testing the performance of login would be less important than testing the performance of search, and so a sub-optimal use of time. Getting an accurate measure of search performance against test data can be challenging, but it is hardly impossible. If running on CI, one could develop scripts that test performance in a critical or often-changing area. Your CI system could be set up to watch for changes in these areas and then kick off the performance tests if necessary. When done, CI could notify the developer if the performance in an area where he or she made changes doesn't meet a specified threshold. I'd worry about running a few mission-critical performance tests with a tight feedback loop rather than about running many types of tests. Remember, someone has to maintain these tests and the supporting infrastructure.
If your developers are committing code continuously and have concerns about large load tests running for every single commit, you can ask them to batch the commits and trigger the build for the performance test phase at a later time. However, all the interim builds can be allowed to go through the full QA cycle if the QA phase is well optimized and can finish within minutes. A 10-15 minute period is a good interval for a build to complete full automated tests. The pipeline should mark these as interim builds that are not production ready, and should allow them into production only after all the performance tests have completed. You can also extend the pipeline with short load tests that run alongside the CI QA tests, and defer the larger load tests to the end of the day. In summary, the production deployment for the build has to wait until the load and soak tests have been completed. This does limit your ability to deploy several times throughout the day without risk. If you would rather accept a certain degree of risk, you can proceed to deploy minor changes like property and configuration changes.

What’s the ROI of Continuous Integration?

Currently, our organization does not practice Continuous Integration.
In order for us to get a CI server up and running, I will need to produce a document demonstrating the return on the investment.
Aside from cost savings by finding and fixing bugs early, I'm curious about other benefits/savings that I could stick into this document.
My #1 reason for liking CI is that it helps prevent developers from checking in broken code, which can sometimes cripple an entire team. Imagine if I make a significant check-in involving some db schema changes right before I leave for vacation. Sure, everything works fine on my dev box, but I forget to check in the db schema change script, which may or may not be trivial. Well, now there are complex changes referring to new/changed fields in the database, but nobody who is in the office the next day actually has that new schema, so now the entire team is down while somebody looks into reproducing the work I already did and just forgot to check in.
And yes, I used a particularly nasty example with db changes but it could be anything, really. Perhaps a partial check-in with some emailing code that then causes all of your devs to spam your actual end-users? You name it...
So in my opinion, avoiding a single one of these situations will make the ROI of such an endeavor pay off VERY quickly.
If you're talking to a standard program manager, they may find continuous integration to be a little hard to understand in terms of simple ROI: it's not immediately obvious what physical product that they'll get in exchange for a given dollar investment.
Here's how I've learned to explain it: "Continuous Integration eliminates whole classes of risk from your project."
Risk management is a real problem for program managers that is outside the normal ken of software engineering types who spend more time writing code than worrying about how the dollars get spent. Part of working with these sorts of people effectively is learning to express what we know to be a good thing in terms that they can understand.
Here are some of the risks that I trot out in conversations like these. Note, with sensible program managers, I've already won the argument after the first point:
Integration risk: in a continuous integration-based build system, integration issues like "he forgot to check in a file before he went home for a long weekend" are much less likely to cause an entire development team to lose a whole Friday's worth of work. Savings to the project avoiding one such incident = number of people on the team (minus one due to the villain who forgot to check in) * 8 hours per work day * hourly rate per engineer. Around here, that amounts to thousands of dollars that won't be charged to the project. ROI Win!
Risk of regression: with a unit test / automatic test suite that runs after every build, you reduce the risk that a change to the code breaks something that used to work. This is much more vague and less assured. However, you are at least providing a framework wherein some of the most boring and time consuming (i.e., expensive) human testing is replaced with automation.
Technology risk: continuous integration also gives you an opportunity to try new technology components. For example, we recently found that Java 1.6 update 18 was crashing in the garbage collection algorithm during a deployment to a remote site. Due to continuous integration, we had high confidence that backing down to update 17 had a high likelihood of working where update 18 did not. This sort of thing is much harder to phrase in terms of a cash value but you can still use the risk argument: certain failure of the project = bad. Graceful downgrade = much better.
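The integration-risk estimate above is simple enough to write down as a small Python sketch. The team size and hourly rate in the example are placeholders I made up; plug in your own figures.

```python
def incident_cost(team_size, hourly_rate, hours_per_day=8):
    """Cost of one lost day: (team size - 1) * hours per work day * hourly rate.

    One person is subtracted for the engineer who forgot to check in,
    as in the formula from the answer.
    """
    return (team_size - 1) * hours_per_day * hourly_rate


# Assumed figures: a team of 10 at a rate of 75 per hour.
cost = incident_cost(team_size=10, hourly_rate=75)
print(cost)  # 5400
```

Even one avoided incident of this kind puts a concrete number in the ROI document.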
CI assists with issue discovery. Measure the amount of time currently that it takes to discover broken builds or major bugs in the code. Multiply that by the cost to the company for each developer using that code during that time frame. Multiply that by the number of times breakages occur during the year.
There's your number.
Every successful build is a release candidate - so you can deliver updates and bug fixes much faster.
If part of your build process generates an installer, this allows a fast deployment cycle as well.
From Wikipedia:
when unit tests fail or a bug emerges, developers might revert the codebase back to a bug-free state, without wasting time debugging
developers detect and fix integration problems continuously - avoiding last-minute chaos at release dates (when everyone tries to check in their slightly incompatible versions)
early warning of broken/incompatible code
early warning of conflicting changes
immediate unit testing of all changes
constant availability of a "current" build for testing, demo, or release purposes
immediate feedback to developers on the quality, functionality, or system-wide impact of code they are writing
frequent code check-in pushes developers to create modular, less complex code
metrics generated from automated testing and CI (such as metrics for code coverage, code complexity, and features complete) focus developers on developing functional, quality code, and help develop momentum in a team
well-developed test-suite required for best utility
We use CI (Two builds a day) and it saves us a lot of time keeping working code available for test and release.
From a developer's point of view, CI can be intimidating when the automatic build result, sent by email to the whole world (developers, project managers, etc.), says:
"Error in loading DLL Build of 'XYZ.dll' failed." and you are Mr. XYZ and they know who you are :)!
Here's my example from my own experiences...
Our system has multiple platforms and configurations with over 70 engineers working on the same code base. We suffered from somewhere around 60% build success for the less commonly used configs and 85% for the most commonly used. There was a constant flood of e-mails on a daily basis about compile errors or other failures.
I did some rough calculations and estimated that we lost an average of an hour a day per programmer to bad builds, which totals nearly 10 man days of work every day. That doesn't factor in the costs that occur in iteration time when programmers refuse to sync to the latest code because they don't know if it's stable; that costs us even more.
After deploying a rack of build servers managed by Team City we now see an average success rate of 98% on all configs, the average compile error stays in the system for minutes not hours and most of our engineers are now comfortable staying at the latest revision of the code.
In general I would say that a conservative estimate on our overall savings was around 6 man months of time over the last three months of the project compared with the three months prior to deploying CI. This argument has helped us secure resources to expand our build servers and focus more engineer time on additional automated testing.
Our biggest gain is from always having a nightly build for QA. Under our old system each product, at least once a week, would find out at 2AM that someone had checked in bad code. This meant there was no nightly build for QA to test with, and the remedy was to send release engineering an email. They would diagnose the problem and contact a dev. Sometimes it took as long as noon before QA actually had something to work with. Now, in addition to having a good installer every single night, we actually install it on VMs of all the different supported configurations every night. So now when QA comes in, they can start testing within a few minutes. Under the old way, QA came in, grabbed the installer, fired up all the VMs, installed it, and then started testing. We save QA probably 15 minutes per configuration to test on, per QA person.
There are free CI servers available, and free build tools like NAnt. You can implement it on your dev box to discover the benefits.
If you're using source control, and a bug-tracking system, I imagine that consistently being the first to report bugs (within minutes after every check-in) will be pretty compelling. Add to that the decrease in your own bug-rate, and you'll probably have a sale.
The ROI is really the ability to provide what the customer wants. This is of course very subjective, but when CI is implemented with the involvement of the end customer, you will see that customers start appreciating what they are getting, and hence you tend to see fewer issues during User Acceptance.
Would it save cost? Maybe not.
Would the project fail during UAT? Definitely no.
Would the project be closed partway through? There is a high possibility, when the customers find that it would not deliver the expected result.
Would you get real-time, real data about the project? Yes.
So it helps in failing faster, which helps mitigate risks earlier.

Build Quality

We have 3 branches {Dev,Test,Release} and will have continuous integration set up for each branch. We want to be able to assign build qualities to each branch i.e. Dev - Ready for test...
Has anyone any experience with this that can offer any advice/best practice approach?
We are using TFS 2008 and we are aware that it has Build Qualities built in. What we are looking for is when to apply a quality and what kinds of qualities people use.
Thanks
:)
Your goal here is to get the highest quality possible in each branch, balanced against the burden of verifying that level of quality.
Allowing quality to drop in any branch is always harmful. Don't think you can let the Dev branch go to hell and then fix it up before merging. It doesn't work well, for two reasons:
Recovering is harder than you expect. When a branch is badly broken, you don't know how broken it really is. That's because each issue hides others. It's also hard to make any progress on any issue because you'll run into other problems along the way.
Letting quality drop saves you nothing. People sometimes say "quality, cost, schedule - pick any 2" or something like that. The false assumption here is that you "save" by allowing quality to slip. The problem is that as soon as quality drops, so does "velocity" - the speed at which you get work done. The good news is that keeping quality high doesn't really cost extra, and everyone enjoys working with a high-quality code base.
The compromise you have to make is on how much time you spend verifying quality.
If you do Test Driven Development well, you will end up with a comprehensive set of extremely fast, reliable unit tests. Because of those qualities, you can reasonably require developers to run them before checking in, and run them regularly in each branch, and require that they pass 100% at all times. You can also keep refactoring as you go, which lets you keep velocity high over the life of the project.
Similarly, if you write automated integration / customer tests well, so they run quickly and reliably, then you can require that they be run often, as well, and always pass.
On the other hand, if your automated tests are flaky, if they run slowly, or if you regularly operate with "known failures", then you will have to back off on how often people must run them, and you'll spend a lot of time working through these issues. It sucks. Don't go there.
Worst case, most of your tests are not automated. You can't run them often, because people are really slow at these things. Your non-release branch quality will suffer, as will the merging speed and development velocity.
Assessing the quality of a build in a deterministic and reproducible way is definitely challenging. I suggest the following:
If you are set up to do automated regression testing then all those tests should pass.
Developers should integration test each of their changes using an official Dev build newly installed on an official and clean test rig and give their personal stamp of approval.
When these two items are satisfied for a particular Dev build you can be reasonably certain that promoting this build to Test will not be wasting the time of your QA team.

Policy for fixing broken nightly builds [closed]

I guess everybody agrees that having continuous builds and continuous integration is beneficial for the quality of the software product. Defects are found early so they can be fixed ASAP. For continuous builds, which take several minutes, it is usually easy to find the one who caused the defect. However, for nightly integration tests, which take a long time to run, this may be a challenge. Here are the specifics of the situation, for which I'm looking for an optimal solution:
Running integration tests takes more than 1 hour. Therefore they are run overnight. Multiple check-ins happen every day (team of about 15 developers) so it is sometimes difficult to find the "culprit" (if any).
Integration testing environment depends on other environments (web services and databases), which may fail from time to time. This causes integration tests to fail.
So how to organize the team so that these failures are fixed early? In my opinion, there should be someone appointed to DIAGNOSE the defect(s). This should be the first task in the morning. If he needs the expertise of others, they should be readily available. Once the source (component, database, web service) of the failure is determined, the owner should start fixing it (or another team should be notified).
How to appoint the one who diagnoses the defects? Ideally, someone would volunteer (ha ha). This won't happen very often, I'm afraid. I've heard of another option: whoever comes first to the office should check the results of the nightly builds. This is OK, if the whole team agrees. However, this rewards those who come late. I suppose that this role should rotate in the team. The excuse "I don't know much about builds" should not be accepted. Diagnosing the source of the failure should be rather straightforward. If it is not, then adding more diagnostic logging to the code should improve the visibility into integration test failures.
Any experience in this area or suggestions for improvements of the above approach?
A famous policy about broken nightly builds, attributed to Microsoft, is that the guy whose commit broke the build becomes responsible for maintaining nightly builds until someone else breaks it.
That makes sense, since
everyone makes mistakes, so the necessary rotation will occur (empowered with Least-Recently-Used choice pattern for ambiguous cases)
it encourages people to write better code
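For the ambiguous case, where several check-ins could have broken the build, a least-recently-used pick could be sketched in Python as below. The names and the `last_served` bookkeeping (day number on which each person last owned the nightly build) are hypothetical; the policy itself says nothing about how the record is kept.

```python
def pick_build_owner(suspects, last_served):
    """Pick who takes over the nightly build.

    With a single suspect, that person is chosen. With several, the one who
    owned the build least recently is chosen; people who never owned it
    (missing from last_served) come first. Ties are broken by name.
    """
    if not suspects:
        raise ValueError("at least one suspect is required")
    return min(suspects, key=lambda name: (last_served.get(name, -1), name))


# Hypothetical record: day number on which each person last owned the build.
last_served = {"ana": 30, "bo": 12}

owner = pick_build_owner(["ana", "bo", "cy"], last_served)
print(owner)
```

Here `cy` has never owned the build, so `cy` is picked ahead of `bo` and `ana`.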
What I generally do (I've done it for a team of between 8 and 10 people) is to have one guy who checks the build as the first thing he does in the morning -- some would say he is responsible for QA, I suppose.
If there is a problem, he's responsible for finding out what/how -- of course, he can ask help from the other members of the team, if needed.
This means there's at least one member of the team that has to have a great knowledge of the whole application -- but that's not a bad thing anyway : it'll help diagnose problems the day that application is used in production and suffers a failure.
And instead of having one guy do that, I like it when there are two: one for one week, the other for the second week, for instance; this way, there are greater chances of always having someone who can diagnose problems, even if one of them is on holiday.
As a sidenote : the more useful things you log during the build, the easier it is to find out what went wrong -- and why.
Why not let everyone in the team check the build every morning ?
Well, not every one wants to, first of all -- and that will be done better if the one doing it likes what he does
And you don't want 10 people spending half an hour every day on that ^^
In your case I'd suggest whoever is in charge of the CM. If that is the manager or technical lead who has too many responsibilities why not give it to a junior developer? I wish someone had forced me early in my career to get to know source control more thoroughly. Not only that but looking at other people's code to track down a source of error is a real skill building or knowledge learning exercise. They say you gain the most from looking at other people's code and I'm a firm believer of this.
Pair experienced with inexperienced
You may want to consider having pairs of developers diagnose the broken builds. I've had good luck with that. Especially if you pair team members who have little familiarity with the build system and team members who have significant familiarity. This may reduce the possibility of team members saying "I don't know much about builds" as a way to try and get out of the duty, and it will decrease your bus number and increase collective ownership.
Give the team a choice of your assigned solution or one of their own making
You could put the issue to your team and ask them to offer a solution. Tell them that if they don't come up with a workable solution, you will make a weekly schedule, assigning one pair per day and making sure that everyone has the opportunity to participate.
Practice continuous integration so you don't need infrequent mega-builds
You can distribute builds between machines if a build is too slow for one machine to do.
Use a build status monitor so that whoever checked something in can be made responsible for build failures.
Have an afternoon check-in deadline
Either:
Nobody checks in after 5pm
or
Nobody checks in after 5pm unless they're prepared to stay at work until their build passes as green - even if that means working on, committing a fix and waiting for a rebuild.
It's much easier to enforce and obey the first form, so that's what I'd follow.
Members of a former team of mine actually got phoned up and told to return to work to fix the build... and they did.
I'd be tempted to suggest splitting things up in either of a couple of ways:
Time split - Assuming that the tests could run twice a night, why not run the tests against the code at 2 different time points, i.e. all the check-ins up to X p.m. and then the remainder? That could help narrow down where the problem is.
Team split - Can the code be split into smaller pieces so that the tests could be run on different machines to help narrow down which group should dig into things?
This assumes you could run the tests multiple times and divide things up in such a way, so it is only a rough idea.
We have a stand-up meeting every morning, before starting work. One of the things on the checklist is the status of the nightly build. Our build system spits out an email after it's run, reporting the status, so this is easy to find out - as it happens, it goes to one guy, but really it should go to everyone, or be posted onto the project wiki.
If the build is broken, then fixing it becomes a top-priority task, to be handled like any other task, which means that we'll decide at the standup who is going to work on it, and then they go and do so. We do pair programming, and will usually treat this as a pair task. If we're short-staffed, we might assign one person to investigate the breakage, and then pair someone with him to fix it a bit later.
We don't have a formal mechanism for assigning the task: we're a small team (six people, usually), and have collective code ownership, so we just work it out between ourselves. If we think one particular pair's checkin broke the build, it would usually be them who fix it. If not, it could be anyone; it's usually decided by seeing who isn't currently in the middle of some other task.

Resources