maven parent pom with different child versions - maven

There are multiple modules in our applications and each of them have their separate version and depend on other modules(external to our organization). They all have a parent POM which has it's own version, independent from the children's version.
When one of those modules change, they're converted to snapshots.
For the following example:
Parent v14.0
- module1 v1.5.0
- dependency1(module2 v15.0.0)
- dependency2(external-jar v12.0.1)
- module2 v15.0.0
- module3 v3.1.0
If there would be a change in module2, then module2's version would become v15.0-SNAPSHOT, then module1 becomes v1.5-SNAPSHOT. The parent remains the same.
The purpose for not having the same version on parent+modules, is that we want to localize the updates made to some modules and not affect the others' versions.
This has been designed like this a long time ago and there are several bash scripts to support the updates, although they're not handling all the cases. In any case, we don't have a one-click release process and we feel we are quite far from it with this approach.
We don't know how to convince the management towards a single version approach on all modules. How do you feel about the above? Did you ever encountered a project using the above structure and how well did it go?
Thank you!

I've had to deal with such situations before. There is an actual benefit from having decentralized versions, especially in cases where your product is made out of a large number of modules and this is because of the following facts:
You don't have to release all of them as a whole, if only a handful have been changed (which, from my observation is almost always the case).
You don't have to create unnecessary tags in your version control for code which hasn't changed since the previous release.
You don't have to waste an excessive amount of time releasing modules which don't need to be released.
You know with certainty which modules have changed in a release, which helps a lot when you need to investigate a complex bug, which seems to be dating back a while.
You can actually release certain modules/aggregators before the actual release date of the complete product, allowing for more testing time and a feeling of completeness for a given part of the product.
You can make feature branch releases much easier and implement a continuous delivery in a better way.
You can re-use the same code across multiple development branches without wondering if that branched version matches the one for your branch (or at least with less confusion).
What we ended up doing was:
Extract a parent or set of parents (with no sub-modules).
Try to use fixed versions for parents as much as possible. This is a bit of a caveat, as you must change all modules that inherit it, but in the end it improves the stability.
Extract each of the modules whose versions are independent of the rest to separate modules.
Extract sets of modules whose versions must always move along together into aggregators.
Create jobs in your CI server that can do releases or manually release these modules.
Use the versions-maven-plugin.
I think it's a lot more mature of a project and company's development principles to use decentralized versions and I must admit that in the beginning I was very reluctant to this approach. You might not realize or understand the benefits immediately, but with some practice and a proper setup, you will start seeing the upsides. I'm not saying there aren't caveats like... for example bumping the version of a parent, or having to know in which modules to bump the version of one of your modules.
From my experience, this module actually works better in the end, once you've become used to working with it.

From my experience: Everywhere we tried this, it eventually failed.
Although Maven supports this approach it is not advisable because of the additional effort.
I try to use the following criteria when choosing whether to use distinct projects or a multimodule structure:
If all projects have the same release cycle, I put them in a common multi module structure. In that case, I give them all the same version and release them together.
If a part of the project is used by different other projects (organizational projects), I always split them of and give them a separate lifecycle and a separate version.
If one part of my project stabilizes, I split it off, and give it a separate lifecycle (Maven refactoring)
To do it different alway results in homebrew solutions that neither scale well, nor are easy to maintain.
Hope that helps.

Related

How does CI affect semantic versioning?

In Countinous Delivery book, it's recommended to keep everything - including CI scripts - in the version control. Actually, current CI systems like gitlab CI already follow this rule of thumb and search for CI scripts in the same codebase.
On the other hand, we are versioning our codebase (and it's built artifacts) whenever it changes. And we follow semantic versioning for that; incrementing patch field for bugfixes, minor for non-breaking features, and so on...
And we make sure the version is incremented between commits by checking it in the CI.
But, there are commits that only change the CI scripts; i.e. adding an analysis job, optimizing another, etc.
My question, after this long boring preface, is that what is the best practice to versioning such changes to the CI? Since it possibly can affect the final built artifact (e.g. changing a build flag in the CI job for optimization or ...).
Is it ok to increment the version in this case?
Git is a revision control system. Every time you commit something to a git repo, it labels the content of the repo with a content hash value that represents that version of the repo. Semantic versioning of a git repo's content is redundant and pointless. The whole point of SemVer is to provide a means for producers to communicate risk to consumers. In other words, semantic versioning is intended for build product labeling, not the bits that go into producing the build.
If you attempt to apply SemVer semantics to the repo, you are labeling the product inputs, not the product itself. You should not apply a SemVer string until after all unit/regression/acceptance tests have been performed. How else can you have any certainty whether the code/build-script changes have broken anything?
Pre-build labeling cannot work. Build processes that are capable of reproducing the exact same output twice in a row, are extremely rare, if any exist at all. It is a violation of best practice to have multiple API's/packages in the world with the same SemVer string attached to them. If you label the repo content and then forward that label to the build output, every time you run the build, you produce a package with different content. There will always be some risk that more than one of those outputs will be released into the wild. Many security conscious consumers pay close attention to the content hash of packages they consume. Detecting that a particular producer has released multiple package hashes without bumping the version number, will raise red flags and lead to mistrust of that producers internal processes.
This is a very deep topic that can't be fully covered here. Other issues to consider are OS/Compiler/Tool chain updates. Will you also be committing the entire build tool chain to the same repo? This is an untenable approach, full of hazards I cannot fully enumerate, without taking a few months off work to document them.
Best practice:
Use semantic commit messages that clearly state the developer's intent.
Validate build outputs prior to packaging/labeling.
Always keep humans in the loop, for non-prerelease publications.
Just for clarity, let me add that maintaining build scripts and tool manifests in the repo is considered a best practice. It ties the versions of your scripts and tools, to the versions of the code you are building. Git does do this job quite well, by creating a commit hash that encompasses the state of the entire repo (minus the tags if I recall correctly). But there will be issues eventually, with older versions of tools, being withdrawn from file shares/feeds, particularly when they are found to create security vulnerabilities.
It will sometimes be the case, that older versions of your products, cannot be reproduced using the earlier build process. Checking in the binaries is often promoted as a fix for this issue, but I would argue that it's an anti-pattern. Binaries you are likely never going to want or need in the future, should not be stored in your repo. It just clogs everything up.
Consider using an alternate archival system. Maintaining a separate archive of older tools isn't a bad idea, but you will often find that you simply can't run them on current hardware and OS's, without significant reconfiguration of build machine(s) and re-introducing well known security risks. You should frequently prune such an archive, based on the latest known risks and weighing the cost of having to do some additional work, if/when the day ever comes, that you need to build from a really old commit hash.
It is better to maintain an up-to-date build system, that can build all of your code base, back to some reasonable point in its history. That point is usually the oldest bits that you are willing to actively support with bug fixes.
These days I'm using SemVer compatible Headver; https://github.com/line/HeadVer and feeling happy.
It is very CI friendly thanks to the automatic incremental versioning, but still be able to announce when breaking changes happens by allowing to define major version number manually.

How to version products inside monorepo?

I have been educating myself about monorepos as I believe it is a great solution for my team and the current state of our projects. We have multiple web products (Client portal, Internal Portal, API, Core shared code).
Where I am struggling to find the answer that I want to find is versioning.
What is the versioning strategy when all of your projects and products are inside a monorepo?
1 version fits all?
Git sub-modules with independent versioning (kind of breaks the point of having a mono repo)
Other strategy?
And from a CI perspective, when you commit something in project A, should you launch the whole suite of tests in all of the projects to make sure that nothing broke, even though there was no necessarily a change made to a dependency/share module?
What is the versioning strategy when all of your projects and products are inside a monorepo?
I would suggest that one version fits all for the following reasons:
When releasing your products you can tag the entire branch as release-x.x.x for example. If bugs come up you wouldn't need to check "which version was of XXX was YYY using"
It also makes it easier to force that version x.x.x of XXX uses version x.x.x of YYY. In essence, keeping your projects in sync. How you go about this of course depends on what technology your projects are written in.
And from a CI perspective, when you commit something in project A, should you launch the whole suite of tests in all of the projects to make sure that nothing broke, even though there was no necessarily a change made to a dependency/share module?
If the tests don't take particularly long to execute, no harm can come from this. I would definitely recommend this. The more often your tests run the sooner you could uncover time dependent or environment dependent bugs.
If you do not want to run tests all the time for whatever reason, you could query your VCS and write a script which conditionally triggers tests depending on what has changed. This relies heavily on integration between your VCS and your CI server.

Should I leave the parent pom in SNAPSHOT or pass it to RELEASE when releasing one of the parent's modules

I would like to have more information about Maven best practices when releasing projects.
Say I have a parent pom with N modules. All are in SNAPSHOT versions including the parent. If I want to release one but not all of the modules, should I pass version for the parent pom project in RELEASE, or leave it in SNAPSHOT?
What are the pros and cons?
A bit more generally speaking, there are huge advantages in terms of simplicity, and very little disadvantages (a bit more hard drive consumption on the Nexus box), to keeping all version numbers uniform across a parent-child structure project, including keeping their RELEASE / SNAPSHOT status in sync. By always doing simultaneous releases of all modules, you do not have to keep in mind and manage the component inter-dependencies. You do not need to announce the releases to those clients, who only depend on a component that has not really changed.
That being said, if you do decide to "micro manage" a parent-child structure project, and every time to release only a certain subset of components, I would follow the following rules:
1.keep a component in its old RELEASE version until it needs to change. Promote it to the next SNAPSHOT only once it has been modified.
1a.That applies to parent poms too.
2.if a module depends on another module that has been modified, that counts as a modification of the former module as well
(if you want to modify a dependency, but to keep the item that depends on it in its old version, it helps a lot in terms of consistency and being able to reason about what is going on to simply branch out, and modify and release the dependency in a separate branch, in which the item that depends on it is also considered as being modified, but from where we do not release it. This helps you keep in mind that the dependency and the item that depends on it are being released from different branches, and are therefore deliberately inconsistent with each other at present.)
2a.this extends to cases where instead of "depends on a module that" we have "has a parent that". In a lot of ways, parents are like dependencies, in maven.
3.when you want to release (promote to the next RELEASE version) a module, you need to also release all modified modules it depends on (including its modified parents)
Respecting these will keep your project in a consistent state and make it easy to reason about its status, in my experience.
Since your parent is in SNAPSHOT, I assume it has been modified. Therefore you need to bump it to the next RELEASE when releasing the child you want to release.
If the parent has not changed since its last release, you could have kept it in its last RELEASE version, and this way you would not need to release it now together with the child that "depends on" (in terms of being its child) it.
You are writing 'All are in SNAPSHOT versions including the parent' you should not set in your parent SNAPSHOT in this way you can't release all the project.
If you want to release only a subpart of the project then make an other parent and include between your parent and all the projects or use as an other parent. In fact you have your reason to release only subset of your project and in this way you have a better documentation of your work.

Should I increase the version number in my project if a dependency changes?

Let's suppose I have a project called myLib-1.1.0. This project has a dependency on lib-dependency-1.2.3.
If there's a new version for this dependency and I need to use it, should I change my project version as well? No other modifications are made to myLib.
At the same time myLib is a dependency for various other projects. My main concern is the impact of a small change in a dependency might have upstream.
Yes. In maven, released versions are immutable. If you release 1.1.0 with a dependency to lib-dependency-1.2.3 then that's it.
If you change to depend on lib-dependency-1.2.4 then that's a new version. You should not redeploy 1.1.0 since some people might have already pulled that (supposedly immutable) 1.1.0.
So that means you need a different version, even if it's a just a new qualifier (myLib-1.1.0-RC-2 for example, but better just 1.1.1)
Maven doesn't recheck remote repos for release versions once it has it in the local repo, so if someone has 1.1.0 already locally, they will not get the new, fixed 1.1.0.
And about your rippling problem. Upstream projects should depend on the lowest acceptable released version. i.e. if the upstream project itself is ok with myLib-1.1.0 because it doesn't need (indirectly) lib-dependency-1.2.4 then it should stay with 1.1.0
Any code change that potentially affects the behavior should be given a new version number, in other words: anything that's not an absolute trivial change should be given a new version number. A changed dependency would definitely qualify for that because, unless you do a thorough code inspection of the dependency, you have no reason to assume that they only made absolute trivial changes.
Changes are often advertised as "small" (similar to being absolutely trivial as I call it above), but they hardly ever are. They may be negligible in someone's use case, but not in someone else's use case. I've even seen circumstances where there were only changes to Javadocs in a project that would break things down the line. (You could argue about how smart it is for someone to depend that strongly on Javadoc, but that's besides the point, isn't it?)
That is not to say that you can't accumulate changes and release a bunch of them as a single release. While accumulating, your project is in flux, and should have a ...-SNAPSHOT version. There should be no two versions of myLib-1.1.0 (without the -SNAPSHOT) that have even the least little change.
The fact that you're re-releasing your project also makes explicit the fact that regression testing and such should be redone to validate that it's still working with the changes in its dependency.

Benefits of CI for highly modularized projects

There has been some discussion in abandoning our CI system (Hudson FWIW) due to the fact that our projects are somewhat segmented. Without revealing too much, you can think of each project as similar to a web site project: it has dependencies, its own unit tests, etc.
It seems like one of the major benefits of CI is to make sure that each component of a project works together, but aside from project inheritance most of our projects are standalone and unit tested fairly well.
Given what I have explained here (the oddity in our project organization); can anyone explain any benefits of CI for segmented\modular\many projects?
So far as I can tell, this is the only good reason I've found:
“Bugs are also cumulative. The more bugs you have, the harder it is to remove each one. This is partly because you get bug interactions, where failures show as the result of multiple faults - making each fault harder to find. It's also psychological - people have less energy to find and get rid of bugs when there are many of them - a phenomenon that the Pragmatic Programmers call the Broken Windows syndrome.”
From here: http://martinfowler.com/articles/continuousIntegration.html#BenefitsOfContinuousIntegration
I would use Hudson for the following reasons:
Ensuring that your projects build/compile properly.
Building jobs dependent on the build success of other jobs.
Ensuring that your code adheres to agreed-upon coding standards.
Running unit tests.
Notifying development team of any issues found.
If the number of projects steadily increases, you will find the need to be able to manage each one effectively, especially considering the above reasons for doing so.
In your situation, you can benefit from CI in (at least) these two ways:
You can let the CI server run certain larger test suites automatically after each subversion/... check-in. Especially those which test the interaction of different modules, hence the name continuous integration. This takes away the maintenance work and waiting time from the developers when they consider a check-in. Some CI (e.g. Hudson) also can be configured to automatically build modules when a depending module is build. This way you can let it automatically test if depending modules are compatible with the new version of the changed one.
You can let the CI server publish the new artifacts to the repository of a dependency resolver (e.g., Ivy, Maven). This way, the various modules can automatically download the latest (stable) revisions of the modules they depend on. Combine this point with the previous one and imagine the possibilities (!!!).

Resources