Reducing redundant SNAPSHOTs - Maven

Our project currently has more than 500 SNAPSHOT jars that are downloaded daily. This is becoming a burden for development, especially since our Artifactory instance is hosted remotely and the connection to it is not ideal.
I would like to know the best way to reduce the number of redundant SNAPSHOT jars that are deployed to Artifactory. We have a large multi-module project that deploys almost 200 new SNAPSHOT jars even though only one module actually contains changes.
I found a similar question raised in the forum, but there was no definitive answer there either. Incremental builds are not viable for us because of this issue.
Any suggestion is appreciated!

The "Managing Disk Space Usage" page in the Artifactory wiki describes various methods for cleaning up old snapshots:
Limiting the number of snapshots - you can specify the maximum number of snapshots that may be stored. In the Edit Repository dialog, select the Basic Settings tab, check the Handle Snapshots checkbox, and then set the Max Unique Snapshots field. This value is zero by default, which means that all snapshots are kept.
Using a user plugin for custom cleanup logic - you can write scripts that implement virtually any custom cleanup logic, which provides an extensive and flexible set of customization options. See examples of such scripts on GitHub.
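For illustration, here is a minimal sketch of such a user plugin. The repository key (libs-snapshot-local), the 30-day retention window, and the cron schedule are all placeholders to adapt; the real examples on GitHub are more robust than this:

    import org.artifactory.repo.RepoPathFactory

    jobs {
        // Scheduled cleanup job; the Quartz cron expression here means 2 AM daily
        cleanOldSnapshots(cron: "0 0 2 ? * *") {
            def cutoff = System.currentTimeMillis() - 30L * 24 * 60 * 60 * 1000
            deleteOlderThan(RepoPathFactory.create("libs-snapshot-local", ""), cutoff)
        }
    }

    // Recursively walk the repository and delete files older than the cutoff
    def deleteOlderThan(repoPath, long cutoff) {
        repositories.getChildren(repoPath).each { item ->
            if (item.folder) {
                deleteOlderThan(item.repoPath, cutoff)
            } else if (item.lastModified < cutoff) {
                log.info "Removing old snapshot artifact ${item.repoPath}"
                repositories.delete(item.repoPath)
            }
        }
    }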

If only one module changes, why do you rebuild the entire project? In other words, your often-changing module appears to have a different lifecycle than the rest of your project, so you should consider moving it into a separate project with its own release cycle.
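In the meantime, Maven's reactor options let you rebuild and deploy just the changed module plus whatever depends on it, rather than the whole tree (the module name below is a placeholder):

    # -pl selects the module; -amd ("also make dependents") also rebuilds
    # the modules that depend on it
    mvn -pl often-changing-module -amd clean deploy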

What are the consequences of always using Maven Snapshots?

I work with a small team that manages a large number of very small applications (~100 portlets). Each portlet has its own git repository. In a code review today, someone made a small edit and then updated their pom.xml version from 1.88-SNAPSHOT to 1.89-SNAPSHOT. I added a comment asking whether this is the best way to do releases, but I don't really know the negative consequences of doing it this way.
Why not do this? I know snapshots are not supposed to be releases, but why not? What are the consequences of using only snapshots? I know Maven does not cache snapshots the same way as non-snapshots, so it may download the artifact every time, but let's pretend the caching doesn't matter. From a release-management perspective, why is using a SNAPSHOT version every time and just bumping the number a bad idea?
UPDATE:
Each of these projects results in a war file that will never be available on a maven repo outside of our team, so there are no downstream users.
The main reason for not doing this is that the whole Maven ecosystem relies on a specific definition of what a snapshot version is, and that definition is not the one you're using in your question: a snapshot is only supposed to represent a version currently in active development, not a stable version. The consequence is that a lot of the tools built around Maven assume this definition by default:
The maven-release-plugin will not let you prepare a release with a snapshot version as the released version. You'll need to resort to tagging by hand in your version control, or write your own scripts. It also means that users of those libraries won't be able to use this plugin with its default configuration; they'll need to set allowTimestampedSnapshots (a configuration sketch follows this list).
The versions-maven-plugin, which can be used to automatically update to the latest release version, won't work properly either, so your users won't be able to use it without configuration pain.
Repository managers, like Artifactory or Nexus, come with a built-in distinction between repositories hosting snapshot dependencies and those hosting release dependencies. For example, a shared company-wide Nexus could be configured to purge old snapshots, which would break things for you: imagine someone depends on 1.88-SNAPSHOT and it is completely removed - you would have to go back in time and redeploy it, until the next removal... Also, certain internal Artifactory repositories can be configured not to accept any snapshots, so you won't be able to deploy there; your users would again be forced to add more repository configuration to point at repositories that do allow snapshots, which they may not want to do.
Maven is about convention over configuration, meaning that all Maven projects should share the same semantics (directory layout, versioning...). New developers joining your project will be confused and lose time trying to understand why your project is built the way it is.
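As a concrete illustration of the first point, this is roughly what downstream users would have to add to their POM before the release plugin tolerates snapshot-style dependencies (a minimal sketch; the plugin version is deliberately omitted):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-release-plugin</artifactId>
      <configuration>
        <!-- opt in to releasing despite timestamped SNAPSHOT dependencies -->
        <allowTimestampedSnapshots>true</allowTimestampedSnapshots>
      </configuration>
    </plugin>

Similarly, a command like mvn versions:use-latest-releases has nothing to offer their projects if you never publish a release version.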
In the end, doing this will just cause more pain for your users and won't simplify a single thing for you. You could probably make it somewhat work, but when something eventually breaks (because of company policy, or some other future change), don't act surprised...
Tunaki gave a lot of good reasons why this breaks Maven best practices, and I fully support that view. But even if you don't care about "conventions of other companies", there are reasons:
If you are not doing CI (and treating every build as a potential release), you need to distinguish between versions that should go to production and those that are just for testing. If everything is a SNAPSHOT, this is hard to do.
If someone (accidentally) deploys a second 1.88-SNAPSHOT, it becomes the new 1.88-SNAPSHOT, hiding the old one (which remains available under a concrete timestamp, but that is messy). Release versions cannot be deployed twice.
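To illustrate that last point: each deployment of a unique snapshot is stored under a timestamped file name, and the plain -SNAPSHOT coordinate silently resolves to the newest one (the artifact name and dates below are made up):

    myapp-1.88-20150707.103000-1.jar   <- first deployment of 1.88-SNAPSHOT
    myapp-1.88-20150708.091500-2.jar   <- second deployment; 1.88-SNAPSHOT
                                          now resolves here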

Specific cleanup interval for artifacts in TeamCity

I have a project in TeamCity with a build configuration for the master release branch. It is compiled every time a new version of our product is released.
In order to be able to pinpoint the introduction of errors, I need a long retention time for some artifacts of this build configuration. Since other artifacts are rather big (full CD installation packages), my server's hard drive fills up quickly if I simply increase the cleanup interval of this configuration.
Is it possible to configure two different cleanup intervals somehow? I would love to have a long retention time for the really important artifacts while throwing the big ones away early.
I currently use TeamCity 9.0.3.
Let's say, for example, that my project has two artifacts:
smallupdatepack.zip (32 MB)
reallybigupdatecd.iso (700 MB)
I would like to configure TeamCity so that the .iso is kept for, say, the last 10 builds, while the .zip is kept for the last 150 builds.
What I do not want is a solution where all the .zip files are kept forever while only the .iso files are deleted on an interval - which is all that seemed possible to me using the build configuration's artifact patterns alone.
You can specify custom cleanup rules for projects/build configurations on the Build History Clean-up page.
In your case, you can have an aggressive cleanup for all builds and a lenient cleanup for the master build's project/configuration.
When you edit any of these settings, you can set an individual retention period for artifacts. Cleanup can thus be configured per build configuration; however, within a single build configuration you cannot set different cleanup rules for different artifacts.
The answer by Biswajit_86 looks like the only built-in mechanism for special cleanup rules. From the description, the configuration-specific settings should override the project settings and give you what you need, but maybe it doesn't work that way. Try it out, and if it doesn't, file a bug/suggestion with JetBrains.
The only other thing I can think of is to create a separate build configuration that publishes only the artifacts you want to keep longer than your default rule. Give it a snapshot dependency on the configuration that creates the files and check the box to run on the same build agent, so it doesn't need to rebuild them and can just publish what was already created. Set up a build trigger so that this new configuration runs whenever the other one finishes, and then set this configuration's cleanup rules to the longer retention setting.
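An outline of that setup, using the file names from the question (the configuration names are made up, and the retention numbers are the ones you mentioned):

    UpdateBuild (existing configuration)
      artifact paths:  smallupdatepack.zip
                       reallybigupdatecd.iso
      cleanup rule:    keep artifacts from the last 10 builds

    UpdateArchive (new configuration; snapshot dependency on UpdateBuild,
                   runs on the same agent, triggered when UpdateBuild finishes)
      artifact paths:  smallupdatepack.zip
      cleanup rule:    keep artifacts from the last 150 builds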

What backend does Jenkins (Hudson) use for archiving build artifacts?

I've read about the disadvantages (especially this one) of using SVN to store build artifacts (large binary files). Hudson was suggested as an alternative.
How does Hudson handle these files?
Edit: My project is not Java-based.
Hudson can create/keep an archive of build artifacts, and provides a nice browser view for inspecting them.
You need to enable Archive the Artifacts in the job definition.
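For reference, in today's Jenkins the same option is also available as a Pipeline step; a minimal sketch (the build command and glob pattern are placeholders):

    // Jenkinsfile (Declarative Pipeline)
    pipeline {
        agent any
        stages {
            stage('Build') {
                steps {
                    sh 'make package'   // placeholder for your real build step
                }
            }
        }
        post {
            success {
                // fingerprint: true records checksums, so a binary can later
                // be traced back to the build that produced it
                archiveArtifacts artifacts: '**/*.tar.gz', fingerprint: true
            }
        }
    }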
Hudson basically uses flat file storage. You can find those files within Hudson in the jobs/<job>/builds/ folders. I'm not sure I'd say "use Hudson as an alternative to checking files into source control", but using something as an alternative is a decent idea if it provides:
an authoritative place to store versioned binaries
access control
checksums for tamper resistance
release meta-data (environment information; approval level)
retention periods
I'm not sure how well Hudson scores on those marks, but I think it does at least some of that. SVN is not terrible as a solution there either, but it really struggles with retention periods (old builds tend to eat disk space like crazy) and isn't well optimized for large binaries - most SCM systems are optimized for smallish text files.
I stole the list above from this presentation: http://www.anthillpro.com/html/resources/webinars/Role_of_Binary_repositories_in_Software_Configuration_Management.html (registration required)
We use Jenkins for our builds, and we also store the artifacts from the builds. Like Eric said above, Hudson/Jenkins store artifacts using flat file storage, organized by build.
Some things I have noticed from use (in response to Eric's points about an alternative to source control for binaries):
Each build stores its own artifacts, so you do have versioning of sorts.
You can use the fingerprinting option when archiving. This will allow you to differentiate between versions and also check for corruption.
Retention periods are completely up to you. We keep artifacts forever.
FYI, our projects are not Java either (they are C/C++) and our artifacts are tar.gz/zip files and documents.
It may or may not be the best way to store binaries, but it is definitely decent as long as you have regular backups (weekly in our case) and your disk is fault tolerant.

Jazz SCM Continuous Integration - build stream vs. workspace?

I am in the process of setting up a continuous integration build for a Spring Roo application using the Rational Team Concert (RTC) IDE and Jazz build engine. When setting up the build definition, the Build Workspace field on the Jazz Source Control tab allows the selection of either a user's repository workspace or a stream.
The RTC Continuous Integration Best Practices and other Jazz build resources consistently refer to using a dedicated repository workspace associated with a build user, leading me to believe that this is the preferred approach. I have not been able to find any information on building from the stream directly. Our project's stream contains all of the artifacts required to build, and I have tested and confirmed that the continuous integration build works from the stream. I am unable to think of any reason why I would need to create and manage a specific workspace for this purpose.
My question is, am I playing with fire by building directly off of the stream? Are there potential downstream complications with this approach that I am not aware of?
Answering my own question in case another SO user has the same question in the future.
After some experimentation, I discovered that a drawback to building directly from the stream was that it ignores the "Build only if there are changes accepted" property on the Jazz Source Control tab. As a result, builds from a stream may only be done at predefined intervals - it is not possible to configure the build to only happen when new changes have been committed to the stream.
A dedicated workspace is required for the build to accept new changes from the stream and use them to trigger a build request.
There is another BIG difference here, and it has to do with HOW the build gets done.
If you build from a dedicated build repository workspace, then your build workspace already has a copy of all of the code. When your changes are delivered, and the build is kicked off, then only the changed files (your change set) need to be updated and physically copied from the repository to the build repository workspace. Since most changes are small, this involves the copying of anywhere from 0.1% to 2% of your codebase from the repository.
If you build from "the stream", then your build workspace needs to be created (you have to compile somewhere!). So when this is created, your ENTIRE codebase needs to be updated and physically copied from the repository to the build repository workspace. This means retrieving 100% of your codebase from the repository.
Each file operation involves a call to discover the needed resource, a fetch of that resource from the database hosting the repository, and then the Jazz application serving the source file over the network. This puts load on the database server, the web server, and the application server; the more you download like this, the more load you put on these components.
There are some things you can do to minimize this load on the Jazz infrastructure. Using a content caching proxy (such as a simple Squid proxy server) can help.
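As a rough illustration, a caching proxy needs very little configuration to start helping; a minimal squid.conf sketch (the port, cache path, and sizes are examples to adapt):

    # listen for proxy requests
    http_port 3128
    # 10 GB on-disk cache (ufs store; 16 first-level and 256 second-level dirs)
    cache_dir ufs /var/spool/squid 10000 16 256
    # allow large build/source artifacts to be cached
    maximum_object_size 512 MB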
For more detail on your options here, and the relative merits of those options, go and read my blog post and whitepaper on Jazz Performance concerns (http://dtoczala.wordpress.com/2013/02/11/jazz-performance-a-guide-to-better-performance/). That article is almost a year old now, but still remains valid. You can also look at the Jazz Deployment Wiki (https://jazz.net/wiki/bin/view/Deployment/WebHome), and check out the sections on performance troubleshooting and performance concerns.

How to take backup of StarTeam project

I have a project repository on a StarTeam server.
I need to take regular backups of it.
How can I achieve this?
The StarTeam backup steps are given in Appendix C of “The StarTeam Administrator’s Guide”.
It depends on what you mean by backing up the project. If you mean backing up the entire repository, StarTeam makes this really easy: you just need a snapshot of the DB and a full copy of the repository files (the full steps are documented). However, if you mean backing up a specific project in the repository, and ONLY that project, with all history intact, then this is not currently possible - or at least it is a major challenge.
StarTeam used to have the ability to import/export projects, but support and development of that tool was discontinued years ago. Backing up a single project independently of the rest of the server is still possible, and useful in the case where you want to split the repository into a separately managed repository. Here is how to do that:
Create a duplicate repository including all of the repository files.
Delete everything from the clone except for the project(s) that you want to split off - note that in StarTeam 2011 the Project Delete was broken, so you may need to do this with a direct SQL query that marks the projects/views as deleted. Contact Support if you run into problems deleting manually, especially if you have a large repository.
Once your clone has been pruned of unnecessary projects, run the Online Purge tool until the deleted projects and their files have been removed from the DB and the Vault.
You can now change what you need to change on the new repository, such as the users, groups, security, etc. without affecting the first repository.
Once you have validated the new repository is working properly, you can then run a similar process on the first repository to get rid of the projects that were split off.
Another potential use for this is if a project has reached end of life and you want to keep it offline and backed up, yet restorable with full history on demand (for regulatory purposes, etc.), while removing it from the active repository so other projects run faster. This is probably best done in batches of projects, though, as the process is currently quite labor intensive.
