I have been educating myself about monorepos as I believe it is a great solution for my team and the current state of our projects. We have multiple web products (Client portal, Internal Portal, API, Core shared code).
Where I am struggling to find the answer that I want to find is versioning.
What is the versioning strategy when all of your projects and products are inside a monorepo?
1 version fits all?
Git sub-modules with independent versioning (kind of breaks the point of having a mono repo)
Other strategy?
And from a CI perspective, when you commit something in project A, should you launch the whole suite of tests in all of the projects to make sure that nothing broke, even though there was no necessarily a change made to a dependency/share module?
What is the versioning strategy when all of your projects and products are inside a monorepo?
I would suggest that one version fits all for the following reasons:
When releasing your products you can tag the entire branch as release-x.x.x for example. If bugs come up you wouldn't need to check "which version was of XXX was YYY using"
It also makes it easier to force that version x.x.x of XXX uses version x.x.x of YYY. In essence, keeping your projects in sync. How you go about this of course depends on what technology your projects are written in.
And from a CI perspective, when you commit something in project A, should you launch the whole suite of tests in all of the projects to make sure that nothing broke, even though there was no necessarily a change made to a dependency/share module?
If the tests don't take particularly long to execute, no harm can come from this. I would definitely recommend this. The more often your tests run the sooner you could uncover time dependent or environment dependent bugs.
If you do not want to run tests all the time for whatever reason, you could query your VCS and write a script which conditionally triggers tests depending on what has changed. This relies heavily on integration between your VCS and your CI server.
Related
There are multiple modules in our applications and each of them have their separate version and depend on other modules(external to our organization). They all have a parent POM which has it's own version, independent from the children's version.
When one of those modules change, they're converted to snapshots.
For the following example:
Parent v14.0
- module1 v1.5.0
- dependency1(module2 v15.0.0)
- dependency2(external-jar v12.0.1)
- module2 v15.0.0
- module3 v3.1.0
If there would be a change in module2, then module2's version would become v15.0-SNAPSHOT, then module1 becomes v1.5-SNAPSHOT. The parent remains the same.
The purpose for not having the same version on parent+modules, is that we want to localize the updates made to some modules and not affect the others' versions.
This has been designed like this a long time ago and there are several bash scripts to support the updates, although they're not handling all the cases. In any case, we don't have a one-click release process and we feel we are quite far from it with this approach.
We don't know how to convince the management towards a single version approach on all modules. How do you feel about the above? Did you ever encountered a project using the above structure and how well did it go?
Thank you!
I've had to deal with such situations before. There is an actual benefit from having decentralized versions, especially in cases where your product is made out of a large number of modules and this is because of the following facts:
You don't have to release all of them as a whole, if only a handful have been changed (which, from my observation is almost always the case).
You don't have to create unnecessary tags in your version control for code which hasn't changed since the previous release.
You don't have to waste an excessive amount of time releasing modules which don't need to be released.
You know with certainty which modules have changed in a release, which helps a lot when you need to investigate a complex bug, which seems to be dating back a while.
You can actually release certain modules/aggregators before the actual release date of the complete product, allowing for more testing time and a feeling of completeness for a given part of the product.
You can make feature branch releases much easier and implement a continuous delivery in a better way.
You can re-use the same code across multiple development branches without wondering if that branched version matches the one for your branch (or at least with less confusion).
What we ended up doing was:
Extract a parent or set of parents (with no sub-modules).
Try to use fixed versions for parents as much as possible. This is a bit of a caveat, as you must change all modules that inherit it, but in the end it improves the stability.
Extract each of the modules whose versions are independent of the rest to separate modules.
Extract sets of modules whose versions must always move along together into aggregators.
Create jobs in your CI server that can do releases or manually release these modules.
Use the versions-maven-plugin.
I think it's a lot more mature of a project and company's development principles to use decentralized versions and I must admit that in the beginning I was very reluctant to this approach. You might not realize or understand the benefits immediately, but with some practice and a proper setup, you will start seeing the upsides. I'm not saying there aren't caveats like... for example bumping the version of a parent, or having to know in which modules to bump the version of one of your modules.
From my experience, this module actually works better in the end, once you've become used to working with it.
From my experience: Everywhere we tried this, it eventually failed.
Although Maven supports this approach it is not advisable because of the additional effort.
I try to use the following criteria when choosing whether to use distinct projects or a multimodule structure:
If all projects have the same release cycle, I put them in a common multi module structure. In that case, I give them all the same version and release them together.
If a part of the project is used by different other projects (organizational projects), I always split them of and give them a separate lifecycle and a separate version.
If one part of my project stabilizes, I split it off, and give it a separate lifecycle (Maven refactoring)
To do it different alway results in homebrew solutions that neither scale well, nor are easy to maintain.
Hope that helps.
I am setting up a CruiseControl.Net server. So far, it only builds a project (.Net website), and I kind-of know how to set up unit testing, code coverage, etc in the future.
What I will need to have soon is this:
The developers commit changes to SVN continually, thus CCNet builds often.
CCNet will publish the latest version to the development server, as soon as a commit is validated (with unit tests etc).
The project manager validates a specific version, in order to publish it to the pre-production server, and create a SVN tag from this revision.
The last point is where my problem lies: how exactly can I set up things so the project manager can, for instance, browse to the CCNet web dashboard, select a previous specific build, and says "this is the build I want to publish" ?
I believe that my thinking is flawed somewhere, but I can't put my finger on it. Maybe CCNet is not the right place to do these manipulations ?
In my mind, I can create a SVN tag using CCNet, and mostly work from the trunk, but maybe I can't ? Maybe it's the other way around, and I should add a CCNet project every time a tag is created under SVN ?
The final goal is that I want to automate the publication process: zip creation (for archiving), web.config modification (using Nant for instance), and website publication (using FTP).
In all these steps, I want to limit the manual intervention to the maximum. If I can avoid to add a new project to CCNet every time a tag or branch is created in SVN, that would be awesome.
Thanks for your help, and sorry if it's not very easy to read, but it's not very clear in my head either...
Since you can create any task, you should be able to achieve the goal, though unfortunately not out-of-the-box.
Since you use SVN, it all depends actually on revision. I think I'd create a separate project for your third scenario and added a parameter where PM would provide revision number. Then based on that I'd tag sources etc. in my own task.
Regarding the other points, I think this is similar. Recently for web projects we started using MSDeploy, and in each stage build the MSDeploy package was created. Then there was a separate build called Deploy, that when forced allows us to select which package we want to deploy using MSDeploy.
Having several environments, however, started a little bit like overkill for managing with CCNet, and I'll be looking into kwakee at some time.
I decided to use the following pattern after reading semantic versioning at http://semver.org/. However, I have some unsolved issues in my mind in terms of automaticng and integrating SDLC tools.
Version Pattern:
major.minor.revision.build
Such that;
Major: major changes, should be increamented manually.
Minor: minor changes, should be increamented automatically, whenever a new feature or an enhancement to existing feature is solved in issue tracking system.
Revision: changes not affecting the minor changes, should be increamented automatically, whenever a bug is solved in issue tracking system.
Assume that developers never commit the source unless an issue has been solved in issue tracking system, and the issue tracking system is JIRA in this configuration. This means that there are bugs, improvements, and new features as issue types by default, apart from the tasks.
Furthermore, I am adding a continous integration tool in this configuration, and assume that it is bamboo (by the way, I never used bamboo before, I used Hudson), and I am using Eclipse IDE with mylyn plugin and plus the project is a Maven project (web).
Now, I want to elucidate what I want to do by illustrating following scenario. Analyst (A) opens an issue (I), which is a new feature, related to Maven project (P). As a developer (D), I receive an email about the issue, and I open the task via Mylyn interface in Eclipse. I understand and develop the new feature related to issue (I). Consider, I am a Test Driven Development oriented developer, thus I wrote the Unit, DBUnit, and User-Acceptance (for example using Selenium) tests correspondingly. Finally, I commit the changes to the source control. I think the rest should be cycled automatically but I don't know how can I achieve this? The auto-cycled part is the following:
The Source Control System should have a post-hook script that triggers the Continous integration tool to build the project (P). While building, in the proper phase the test code should be run, and their reports generated. The user-acceptance test should be performed in a dedicated server (For example, jboss, or Tomcat). The order of this acceptance test should be, up the server, run the UA test, then generate the UA test reports and down the server. If all these steps have been successfuly completed, the versioning should be performed. In versioning part, the Maven plugin, or what so ever, should take the number of issues solved from the Issue Tracking System, and increment the related version fragments (minor and revision), at last appends the build number. The fragments of the version may be saved in manifest file in order to show it in User Interface. Last but not the least, the CI tool should deploy it in Test environment. That's all auto-cycled processes I want.
The deployment of the artifact to the production environment should be done automatically or manually?
Let's start with the side question: On the automatic deployment to production, this requires the sign off of "the business" whomever that is. How good do your tests need to be to automatically push to production? Are they good enough that you trust things to just go live? What's your downtime? Is that acceptable? If your tests miss something, can you rollback? Are you monitoring production so you know if you've introduced problems? Generally, the answers to enough of these questions is negative enough that you can't auto-deploy there as the result of a build / autotest event.
As for the tracking, you'll need a few things. You'll need all your assumptions to be true (which I doubt they are, but if you get there that's awesome). You'll also need a build number that can be incremented after build time based on test results. You'll need source changes to be annotated with bug ids. You'll need the build system to parse the source changes and make associations with issues. You'll need an API into the build system so you can get the count of issues associated with the build. Finally you'll need your own bit of scripting to do the query and update the build number accordingly.
That's totally doable, but is it really worth having? What's the value you attach to the numbering scheme?
I've been dealing with the problem of scaling CI at my company and at the same time trying to figure out which approach to take when it comes to CI and multiple branches. There is a similar question at stackoverflow, Multiple feature branches and continuous integration. I've started a new one because I'd like to get more of discussion and provide some analysis in the question.
So far I've found that there are 2 main approaches that I can take (or maybe some others???).
Multiple set of jobs (talking about Jenkins/Hudson here) per branch
Write tooling to manage the extra jobs
Create/modify/delete Jobs in bulk
Custom settings for each job per branch (SCM url, dep management repos duplications)
Some examples of people tackling this problem with shell tools, ant scripts and Jenkins CLI. See:
http://jenkins.361315.n4.nabble.com/Multiple-branches-best-practice-td2306578.html
http://jenkins.361315.n4.nabble.com/Is-it-possible-to-handle-multiple-branches-where-some-jobs-should-run-on-each-one-without-duplicatin-td954729.html
http://jenkins.361315.n4.nabble.com/Parallel-development-with-branches-td1013013.html
Configure or Create hudson job automatically
Will cause more load on your CI cluster
Feedback cycle for devs slows down (if the infrastructure cannot handle the new load)
Multiple set of jobs per 2 branches (dev & stable)
Manage the two sets manually (if you change the conf of a job then be sure to change in the other branch)
PITA but at least so few to manage
Other extra branches won't get a full test suite before they get pushed to dev
Unsatisfied devs. Why should a dev care about CI scaling problems. He has a simple request, when I branch I would like to test my code. Simple.
So it seems if I want to provide devs with CI for their own custom branches I need special tooling for Jenkins (API or shellscripts or something?) and handle scaling. Or I can tell them to merge more often to DEV and live without CI on custom branches. Which one would you take or are there other options?
When you talk about scaling CI you're really talking about scaling the use of your CI server to handle all your feature branches along with your mainline. Initially this looks like a good approach as the developers in a branch get all the advantages of the automated testing that the CI jobs include. However, you run into problems managing the CI server jobs (like you have discovered) and more importantly, you aren't really doing CI. Yes, you are using a CI server, but you aren't continuously integrating the code from all of your developers.
Performing real CI means that all of your developers are committing regularly to the mainline. Easy to say, but the hard part is doing it without breaking your application. I highly recommend you look at Continuous Delivery, especially the Keeping Your Application Releasable section in Chapter 13: Managing Components and Dependencies. The main points are:
Hide new functionality until it's finished (A.K.A Feature Toggles).
Make all changes incrementally as a series of small changes, each of which is releasable.
Use branch by abstraction to make large-scale changes to the codebase.
Use components to decouple parts of your application that change at different rates.
They are pretty self explanatory except branch by abstraction. This is just a fancy term for:
Create an abstraction over the part of the system that you need to change.
Refactor the rest of the system to use the abstraction layer.
Create a new implementation, which is not part of the production code path until complete.
Update your abstraction layer to delegate to your new implementation.
Remove the old implementation.
Remove the abstraction layer if it is no longer appropriate.
The following paragraph from the Branches, Streams, and Continuous Integration section in Chapter 14: Advanced Version Control summarises the impacts.
The incremental approach certainly requires more discipline and care - and indeed more creativity - than creating a branch and diving gung-ho into re-architecting and developing new functionality. But it significantly reduces the risk of your changes breaking the application, and will save your and your team a great deal of time merging, fixing breakages, and getting your application into a deployable state.
It takes quite a mind shift to give up feature branches and you will always get resistance. In my experience this resistance is based on developers not feeling safe committing code the the mainline and this is a reasonable concern. This in turn usually stems from a lack of knowledge, confidence or experience with the techniques listed above and possibly with the lack of confidence with your automated tests. The former can be solved with training and developer support. The latter is a far more difficult problem to deal with, however branching doesn't provide any extra real safety, it just defers the problem until the developers feel confident enough with their code.
I would set up separate jobs for each branch. I've done this before and it isn't hard to manage and set up if you've set up Hudson/Jenkins correctly. A quick way to create multiple jobs is to copy from an existing job that has similar requirements and modify them as needed. I'm not sure if you want to allow each developer to setup their own jobs for their own branches, but it isn't much work for one person (i.e. a build manager) to manage. Once the custom branches have been merged into stable branches, corresponding jobs can be removed when they are no longer necessary.
If you're worried about the load on the CI server, you could set up separate instances of the CI or even separate slaves to help balance the load across multiple servers. Make sure that the server you are running Hudson/Jenkins on is adequate. I've used Apache Tomcat and just had to ensure that it had enough memory and processing power to process the build queue.
It's important to be clear on what you want to achieve using CI and then figure out a way to implement it without much manual effort or duplication. There's nothing wrong with using other external tools or scripts that are executed by your CI server that help simplify your overall build management process.
I would choose dev+stable branches. And if you still want custom branches and afraid of the load, then why not move these custom ones to the cloud and let developers manage it themselves, e.g. http://cloudbees.com/dev.cb
This is the company where Kohsuke is now.
There is an Eclipse Tooling also, so if you are on Eclipse, you will have it tightly integrated right into dev env.
Actually what is really problematic is build isolation with feature branches. In our company we have a set of separate maven projects all be part of a larger distribution. These projects are maintained by different teams but for each distribution all projects need to be released. A featurebranch may now overlap from one project to another and thats when build isolation gets painfully. There are several solutions we've tried:
create separate snapshot repositories in nexus for each feature branch
share local repositories on dedicated slaves
use the repository-server-plugin with upstream repositories
build all within one job with one private repository
As a matter of fact, the last solution is the most promising. All other solutions lack in one or another way. Together with the job-dsl plugin it is easy to setup a new feature branch. simply copy and paste the groovy script, adapt branches and let the seed job create the new jobs. Make sure that the seed job removes nonmanaged jobs. Then you can easily scale with feature branches over different maven projects.
But as tom said well above, it would be nicer to overcome the necessity of feature branches and teach devs to integrate cleanly, but that is a longer process and the outcome is not clear with many legacy system parts you won't touch any more.
my 2 cents
There has been some discussion in abandoning our CI system (Hudson FWIW) due to the fact that our projects are somewhat segmented. Without revealing too much, you can think of each project as similar to a web site project: it has dependencies, its own unit tests, etc.
It seems like one of the major benefits of CI is to make sure that each component of a project works together, but aside from project inheritance most of our projects are standalone and unit tested fairly well.
Given what I have explained here (the oddity in our project organization); can anyone explain any benefits of CI for segmented\modular\many projects?
So far as I can tell, this is the only good reason I've found:
“Bugs are also cumulative. The more bugs you have, the harder it is to remove each one. This is partly because you get bug interactions, where failures show as the result of multiple faults - making each fault harder to find. It's also psychological - people have less energy to find and get rid of bugs when there are many of them - a phenomenon that the Pragmatic Programmers call the Broken Windows syndrome.”
From here: http://martinfowler.com/articles/continuousIntegration.html#BenefitsOfContinuousIntegration
I would use Hudson for the following reasons:
Ensuring that your projects build/compile properly.
Building jobs dependent on the build success of other jobs.
Ensuring that your code adheres to agreed-upon coding standards.
Running unit tests.
Notifying development team of any issues found.
If the number of projects steadily increases, you will find the need to be able to manage each one effectively, especially considering the above reasons for doing so.
In your situation, you can benefit from CI in (at least) these two ways:
You can let the CI server run certain larger test suites automatically after each subversion/... check-in. Especially those which test the interaction of different modules, hence the name continuous integration. This takes away the maintenance work and waiting time from the developers when they consider a check-in. Some CI (e.g. Hudson) also can be configured to automatically build modules when a depending module is build. This way you can let it automatically test if depending modules are compatible with the new version of the changed one.
You can let the CI server publish the new artifacts to the repository of a dependency resolver (e.g., Ivy, Maven). This way, the various modules can automatically download the latest (stable) revisions of the modules they depend on. Combine this point with the previous one and imagine the possibilities (!!!).