How does CI affect semantic versioning?

The Continuous Delivery book recommends keeping everything - including CI scripts - in version control. Current CI systems such as GitLab CI already follow this rule of thumb and look for the CI scripts in the same codebase.
On the other hand, we version our codebase (and its build artifacts) whenever it changes, and we follow semantic versioning for that: incrementing the patch field for bugfixes, the minor field for non-breaking features, and so on...
We also make sure the version is incremented between commits by checking it in CI.
But some commits only change the CI scripts, e.g. adding an analysis job, optimizing another one, etc.
My question, after this long, boring preface, is: what is the best practice for versioning such changes to the CI? They can potentially affect the final build artifact (e.g. changing a build flag in a CI job for optimization, or ...).
Is it ok to increment the version in this case?
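For reference, the "version must be incremented" check mentioned above could be a small script run as a CI job. Here is a minimal sketch, assuming the version lives in a VERSION file at the repository root (that file name and location are assumptions):

```bash
#!/usr/bin/env bash
# Minimal sketch of a "version must be bumped" check run as a CI job.
# Assumes the version is stored in a VERSION file at the repo root.
set -euo pipefail

current=$(cat VERSION)
previous=$(git show HEAD~1:VERSION 2>/dev/null || echo "0.0.0")

# sort -V orders version-like strings; fail unless the current version
# sorts strictly after the previous one.
if [ "$previous" = "$current" ] || \
   [ "$(printf '%s\n%s\n' "$previous" "$current" | sort -V | tail -n1)" != "$current" ]; then
  echo "Version $current is not an increment over $previous" >&2
  exit 1
fi
```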

Git is a revision control system. Every time you commit something to a git repo, it labels the content of the repo with a content hash value that represents that version of the repo. Semantic versioning of a git repo's content is redundant and pointless. The whole point of SemVer is to provide a means for producers to communicate risk to consumers. In other words, semantic versioning is intended for build product labeling, not the bits that go into producing the build.
If you attempt to apply SemVer semantics to the repo, you are labeling the product inputs, not the product itself. You should not apply a SemVer string until after all unit/regression/acceptance tests have been performed. How else can you have any certainty whether the code/build-script changes have broken anything?
Pre-build labeling cannot work. Build processes that are capable of reproducing the exact same output twice in a row are extremely rare, if any exist at all. It is a violation of best practice to have multiple APIs/packages in the world with the same SemVer string attached to them. If you label the repo content and then forward that label to the build output, every time you run the build you produce a package with different content. There will always be some risk that more than one of those outputs will be released into the wild. Many security-conscious consumers pay close attention to the content hash of packages they consume. Detecting that a particular producer has released multiple package hashes without bumping the version number will raise red flags and lead to mistrust of that producer's internal processes.
This is a very deep topic that can't be fully covered here. Other issues to consider are OS/compiler/toolchain updates. Will you also be committing the entire build toolchain to the same repo? That is an untenable approach, full of hazards I couldn't fully enumerate without taking a few months off work to document them.
Best practice:
Use semantic commit messages that clearly state the developer's intent (see the example after this list).
Validate build outputs prior to packaging/labeling.
Always keep humans in the loop for non-prerelease publications.
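For example, assuming "semantic commit messages" here means something along the lines of the Conventional Commits style (that interpretation is an assumption), such messages might look like:

```bash
git commit -m "fix: handle empty result set in the nightly analysis job"
git commit -m "feat: add optional optimization flag to the release build"
git commit -m "ci: split lint and test into separate stages (no product change)"
```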
Just for clarity, let me add that maintaining build scripts and tool manifests in the repo is considered a best practice. It ties the versions of your scripts and tools to the versions of the code you are building. Git does this job quite well, by creating a commit hash that encompasses the state of the entire repo (minus the tags, if I recall correctly). But there will eventually be issues with older versions of tools being withdrawn from file shares/feeds, particularly when they are found to create security vulnerabilities.
It will sometimes be the case that older versions of your products cannot be reproduced using the earlier build process. Checking in the binaries is often promoted as a fix for this issue, but I would argue that it's an anti-pattern. Binaries you are likely never going to want or need in the future should not be stored in your repo. It just clogs everything up.
Consider using an alternate archival system. Maintaining a separate archive of older tools isn't a bad idea, but you will often find that you simply can't run them on current hardware and OSes without significant reconfiguration of your build machine(s) and re-introducing well-known security risks. You should prune such an archive regularly, based on the latest known risks, weighing the cost of having to do some additional work if/when the day ever comes that you need to build from a really old commit hash.
It is better to maintain an up-to-date build system that can build all of your code base back to some reasonable point in its history. That point is usually the oldest bits that you are willing to actively support with bug fixes.

These days I'm using the SemVer-compatible HeadVer (https://github.com/line/HeadVer) and feeling happy.
It is very CI-friendly thanks to automatic incremental versioning, but you can still announce breaking changes, because the major version number is defined manually.
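For illustration, generating a HeadVer-style string in a CI job might look roughly like this, assuming the {head}.{yearweek}.{build} format described in the linked repository (the variable names and the use of GitLab's pipeline counter are assumptions):

```bash
# Rough sketch of producing a HeadVer-style version in CI.
HEAD=2                              # bumped by hand only for breaking changes
YEARWEEK=$(date +%y%V)              # two-digit year + ISO week number, e.g. 2434
BUILD=${CI_PIPELINE_IID:-0}         # any monotonically increasing build counter will do
echo "${HEAD}.${YEARWEEK}.${BUILD}" # e.g. 2.2434.57
```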

Related

Maven parent POM with different child versions

There are multiple modules in our applications, and each of them has its own version and depends on other modules (external to our organization). They all have a parent POM, which has its own version, independent from the children's versions.
When one of those modules changes, it is converted to a snapshot.
For the following example:
Parent v14.0
- module1 v1.5.0
  - dependency1 (module2 v15.0.0)
  - dependency2 (external-jar v12.0.1)
- module2 v15.0.0
- module3 v3.1.0
If there is a change in module2, then module2's version becomes v15.0-SNAPSHOT and module1 becomes v1.5-SNAPSHOT. The parent remains the same.
The reason for not having the same version on the parent and the modules is that we want to localize the updates made to some modules and not affect the other modules' versions.
This was designed a long time ago, and there are several bash scripts to support the updates, although they don't handle all the cases. In any case, we don't have a one-click release process, and we feel we are quite far from it with this approach.
We don't know how to convince management to move towards a single-version approach for all modules. How do you feel about the above? Have you ever encountered a project using this structure, and how well did it go?
Thank you!
I've had to deal with such situations before. There is a real benefit to having decentralized versions, especially when your product is made up of a large number of modules, because of the following facts:
You don't have to release all of them as a whole, if only a handful have been changed (which, from my observation is almost always the case).
You don't have to create unnecessary tags in your version control for code which hasn't changed since the previous release.
You don't have to waste an excessive amount of time releasing modules which don't need to be released.
You know with certainty which modules have changed in a release, which helps a lot when you need to investigate a complex bug that seems to date back a while.
You can release certain modules/aggregators before the actual release date of the complete product, allowing for more testing time and a feeling of completeness for a given part of the product.
You can make feature branch releases much more easily and implement continuous delivery in a better way.
You can re-use the same code across multiple development branches without wondering if that branched version matches the one for your branch (or at least with less confusion).
What we ended up doing was:
Extract a parent or set of parents (with no sub-modules).
Try to use fixed versions for parents as much as possible. This is a bit of a caveat, as you must change all modules that inherit a parent when it changes, but in the end it improves stability.
Extract each module whose version is independent of the rest into its own separate module.
Extract sets of modules whose versions must always move along together into aggregators.
Create jobs in your CI server that can do releases or manually release these modules.
Use the versions-maven-plugin.
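For example, bumping a single module's version with the versions-maven-plugin might look roughly like this (the module name and version are hypothetical):

```bash
cd module1
mvn versions:set -DnewVersion=1.5.1-SNAPSHOT   # rewrite this module's POM version
mvn versions:commit                            # accept the change and remove the backup POMs
```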
I think using decentralized versions reflects a more mature project and more mature company development principles, and I must admit that in the beginning I was very reluctant about this approach. You might not realize or understand the benefits immediately, but with some practice and a proper setup you will start seeing the upsides. I'm not saying there aren't caveats, like... for example bumping the version of a parent, or having to know in which modules to bump the version of one of your modules.
From my experience, this model actually works better in the end, once you've become used to working with it.
From my experience: Everywhere we tried this, it eventually failed.
Although Maven supports this approach, it is not advisable because of the additional effort.
I try to use the following criteria when choosing whether to use distinct projects or a multi-module structure:
If all projects have the same release cycle, I put them in a common multi-module structure. In that case, I give them all the same version and release them together.
If a part of the project is used by several other projects (organizational projects), I always split it off and give it a separate lifecycle and a separate version.
If one part of my project stabilizes, I split it off and give it a separate lifecycle (Maven refactoring).
Doing it differently always results in homebrew solutions that neither scale well nor are easy to maintain.
Hope that helps.

Point of Artifact Repositories For Internal Code

Imagine that for a cloud-based solution, a good portion of the deployed code is developed internally. My question is: what is the point of using an artifact repository for internal code, when you could always build whatever version you need directly from the source code?
In other words, doesn't it make more sense to spend the time on the build server to make it easy to build the desired artifact versions from the code, versus adding an artifact repository like Nexus to feed build artifacts to deployments?
In theory yes, if you can be certain that:
everything that went into an artifact is checked in, such as sources and data files
the exact environment (OS, compiler, linker, tools) used to build your artifact can be restored perfectly (snapshot of a virtual machine)
nothing was forgotten
EDIT
In practice, as Mark O'Conner notes, even then two builds will normally not be identical, because they typically include timestamps, and checksums that depend on those timestamps. You would have to somehow fix those manually during the build, or somehow exactly reproduce the time and timing on your build machine.
Otherwise you might face the situation where you cannot (exactly) rebuild a certain artifact. I prefer to have everything that is published stored in a safe place.
The Continuous Delivery book calls the practice of building a binary more than once an antipattern:
This antipattern violates two important principles. The first is to keep the deployment pipeline efficient, so the team gets feedback as soon as possible. Recompiling violates this principle because it takes time, especially in large systems. The second principle is to always build upon foundations known to be sound. The binaries that get deployed into production should be exactly the same as those that went through the acceptance test process - and indeed in many pipeline implementations, this is checked by storing hashes of the binaries at the time they are created and verifying that the binary is identical at every subsequent stage in the process.
Binary equality checking via hash may also be important for auditing purposes in highly regulated domains.
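As a rough illustration of the hash-checking idea from the quote above (the artifact name is hypothetical):

```bash
# At packaging time: record the hash of the artifact that passed the build.
sha256sum build/myapp-1.4.2.tar.gz > myapp-1.4.2.tar.gz.sha256

# At every later pipeline stage: verify the binary is byte-for-byte identical.
sha256sum -c myapp-1.4.2.tar.gz.sha256   # exits non-zero if the hash no longer matches
```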

Maintaining upstream vendor source with Xcode and SVN

Question: what is the best way to maintain a project based on another OSS project, using Xcode, with versions managed by SVN?
I'd like to start a fork (?) of a reasonably popular open source project (it's allowed). Mostly, I want to build my own user interface written in Cocoa/ObjC for it and throw in a few custom features of my own as well.
Now, this OSS project isn't exactly small. The project itself has over 3000 files, and the build process is pretty intense, consisting of multiple stages and steps which need to compile build tools, run those, and then compile the results.
All this is fine and dandy in Xcode, since it's easy enough to set up build phases and rules to handle everything.
What I'm not clear on is how best to manage patches from upstream. They are constantly working on the project, and I'd like to be able to keep up to date with those patches as easily as possible, as many of the diff files affect sometimes up to a hundred (!) files at once.
So maintaining a pristine unmodified copy of that source tree so I can apply patches to it seems like a smart thing to do, because I really don't want to be sorting through hundreds of files every few weeks merging patches by hand.
What I'm thinking of doing in this regard is:
1) Set up an "upstream" SVN repo to hold a copy of the upstream source, plus the bare minimum required to compile it in Xcode (so an xcproject, a few xcconfigs, some prefix header files and that's it)
2) Set up my own "downstream" SVN repo where I do all my work and apply my own modifications.
Whenever upstream releases a patch, I can apply it to #1 then synchronize across to #2, and deal with any issues created by my own modifications.
What I'm not clear about is whether this is a sane way of handling things, or whether there's some better practice I should be following.
Is this the best way to handle things, or should I be looking at doing this some other way?
In the SVN world this was named "vendor branches" a long time ago and is used intensively by many teams (you can also google this phrase).
Technically it's:
one SVN repo
at least one special branch (special only in terms of usage), which is linked via svn:externals to the third-party repo of the upstream code
your place for changes (trunk or any other place; I prefer trunk), initially created as a copy of the vanilla code, where you perform all your code hacks
If (or "when") the vendor branch gets updates from upstream, you just merge the branch into your place, integrate the changes, and continue working.
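As a rough sketch of the classic vendor-branch workflow (this variant keeps the pristine upstream copy inside your own repo rather than linking it with svn:externals; all URLs, paths and revision numbers are hypothetical):

```bash
# 1. Import the pristine upstream source and start your own trunk from it.
svn import upstream-1.0/ http://svn.example.com/repo/vendor/current -m "Import upstream 1.0"
svn copy   http://svn.example.com/repo/vendor/current \
           http://svn.example.com/repo/trunk -m "Fork trunk from upstream 1.0"

# 2. When a new upstream release arrives, update the vendor area so it again
#    mirrors pristine upstream, then merge the delta into your trunk working copy.
#    (Here r105 stands for the revision of the previous vendor import.)
svn merge -r105:HEAD http://svn.example.com/repo/vendor/current trunk-wc
```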

Multiple feature branches and continuous integration

I've been doing some reading about continuous integration recently, and there is a scenario that could occur that I don't understand how to deal with appropriately.
We have a stable mainline/trunk branch and create branches for features. Each developer keeps their own feature branch up to date by merging from trunk into their branch on a regular basis. However, it is entirely possible that two or more feature branches could be created and worked on over a period of several weeks or months. In this time many releases of the software could be deployed. This is where my confusion arises.
It is very likely that changes in one feature branch will cause merge conflicts with other feature branches. CI suggests you should merge into trunk at least daily, which would resolve the conflicts quickly. However, you may not want to merge the feature code into trunk because it may not be finished, or you may not want that feature available in the next release. So how do you deal with this scenario and still follow the CI principle of daily code integration?
There are no feature branches in proper CI. Use feature toggles instead.
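As a minimal sketch of what a toggle can look like, here an environment variable guards the unfinished code path (the variable and function names are hypothetical; toggles are just as often implemented in application code or configuration):

```bash
# The unfinished feature is merged to trunk but stays switched off until the flag is flipped.
if [ "${FEATURE_NEW_REPORTING:-false}" = "true" ]; then
    enable_new_reporting_module      # new, still-incomplete code path
else
    enable_legacy_reporting_module   # current production behaviour
fi
```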
The idea, explained more fully in this article, is to merge from the trunk/release branch into feature branches daily, but only merge back in the other direction once a feature meets your definition of 'done'.
Code written by one feature team will be pushed into the trunk once it's complete, and will be 'distributed' to the other teams, where conflicts can be dealt with, as part of the daily merge process.
This doesn't go as far as satisfying Nick's desire for a version control system that can be used as a backup tool, unless the changes being made are small enough that they can be committed to the feature branch within a timeframe where the risk of losing your work is acceptable.
I personally don't try to reintegrate code into the release branch before it's done, and although I've never really tried, I'm sure building feature toggles in for unfinished work has its own issues.
I think they mean merging mainline into the feature branch, not the other way around. This way, the feature branch will not deviate from mainline too much and will be kept in an easily mergeable state.
The git folks do the same thing by rebasing feature branches on top of the master branch before submitting a feature.
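For example, keeping a feature branch current this way might look like the following (branch names are hypothetical):

```bash
git checkout feature/new-reporting
git fetch origin
git rebase origin/master   # replay the feature commits on top of the latest mainline
```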
In my experience with CI, you should keep your feature branches up to date with the mainline changes, as others have suggested. This has been working for me for several releases. If you are using Subversion, make sure you merge with merge tracking enabled. That way, when you merge your changes back to the mainline, it will look as if you are only merging the feature changes, rather than resolving conflicts your feature might have with the mainline. If you are using a more advanced VCS like git, the first operation will be a rebase and the second will be a merge.
There are tools that can help you get this done more smoothly, like feature branches with Bamboo.
Feature branches being merged back into the mainline, and OFTEN, is an essential part of Continuous Integration. For a thorough breakdown, see this article.
There are now some good resources showing how to combine CI and feature branches. Bamboo or Feature Branch Notifier are some places to look.
And this is another, quite long, article showing the pros of so-called distributed CI. Below is one excerpt explaining the benefits:
Distributed CI has the advantage for Continuous Deployment because it keeps a clean and stable Mainline branch that can always be deployed to Production. During a Centralized CI process, an unstable Mainline will exist if code does not integrate properly (broken build) or if there is unfinished work integrated. This works quite well with iteration release planning, but creates a bottleneck for Continuous Deployment. The direct line from developer branch to Production must be kept clean in CD, Distributed CI does this by only allowing Production ready code to be put into the Mainline.
One thing that can still be challenging is keeping the branch build isolated so that it doesn't pollute your binary repository by pushing its branch builds to it. Bamboo seems to address that, but I'm not sure it's as easy with Jenkins.

Release management with a distributed version control system

We're considering a switch from SVN to a distributed VCS at my workplace.
I'm familiar with all the reasons for wanting to use a DVCS for day-to-day development: local version control, easier branching and merging, etc., but I haven't seen that much that's compelling in terms of managing software releases. Here's our release process:
Discover what changes are available for merging.
Run a query to find the defects/tickets associated with these changes.
Filter out changes associated with "open" tickets. In our environment, tickets must be in a closed state in order to be merged into a release branch.
Filter out changes we don't want in the release branch. We are very conservative when it comes to merging changes. If a change isn't absolutely necessary, it doesn't get merged.
Merge available changes, preferably in chronological order. We group changes together if they're associated with the same ticket.
Block unwanted changes from the release branch (svnmerge block) so we don't have to deal with them again.
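For a rough picture of what the discover, merge and block steps look like with svnmerge (revision numbers are hypothetical):

```bash
cd release-1.4-wc                 # working copy of the release branch
svnmerge.py avail                 # which trunk changes are available to merge?
svnmerge.py merge -r 2101,2105    # merge the approved, closed-ticket changes
svnmerge.py block -r 2102-2104    # block the changes we never want in this release
svn commit -m "Merge approved changes into release 1.4, block the rest"
```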
Sometimes we can be juggling 3-5 different milestones at a time. Some milestones have very different constraints, and the block list can get quite long.
I've been messing around with git, mercurial and plastic, and as far as I can tell none of them address this model very well. It seems like they would work very well when you have only one product you're releasing, but I can't imagine using them for juggling multiple, very different products from the same codebase.
For example, cherry-picking seems to be an afterthought in mercurial. (You have to use the 'transplant' extension, which is disabled by default). After you cherry-pick a change into a branch it still shows up as an available integration. Cherry-picking breaks the mercurial way of working.
DVCS seems to be better suited for feature branches. There's no need for cherry-picking if you merge directly from a feature branch to trunk and the release branch. But who wants to do all that merging all the time? And how do you query for what's available to merge? And how do you make sure all the changes in a feature branch belong together? It sounds like total chaos.
I'm torn because the coder in me wants DVCS for day-to-day work. I really want it. But I fear the day when I have to put the release manager hat and sort out what needs to be merged and what doesn't. I want to write code, I don't want to be a merge monkey.
You really want to be using git in this situation, because it is so vastly superior when it comes to merging and release management. Git allows for a signoff process for changes to go to release; it actually provides support for multiple layers of release management, precisely because that is how Linux is managed.
Simply put each release in a branch. Instead of blocking out changes you don't want, accept only the ones you do, by signing off for release only on the changes that are going into that release.
Git also allows you to cherry-pick a collection of changes into a single patch to be sent upstream, so you don't have to carry all the 'oops, that didn't quite work' patches into the release branch or repository, only nice clean feature or fix patches.
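A sketch of what that looks like in practice (branch names and commit hashes are hypothetical):

```bash
git checkout release-2.3
git cherry-pick -x 4f2a91c             # take a single approved fix; -x records its origin commit
git cherry-pick -x 77d03ab^..81c55fe   # or take a small range of related commits
```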
