I've read about the disadvantages (especially this one) of using SVN to store build artifacts (large binary files). Hudson was suggested as an alternative.
How does Hudson handle these files?
Edit: My project is not Java-based.
Hudson can create/keep an archive of build artifacts, and provides a nice browser view for inspecting them.
You need to enable Archive the Artifacts in the job definition.
Hudson basically uses flat file storage. You can find those files within Hudson in the jobs/builds/ folders (a sketch of that layout follows this answer). I'm not sure I'd say "use Hudson as an alternative to checking files into source control", but using something as an alternative is a decent idea if it provides:
an authoritative place to store versioned binaries
access control
checksums for tamper resistance
release meta-data (environment information; approval level)
retention periods
I'm not sure how well Hudson scores on those marks, but I think it does at least some of that. SVN is non-terrible as a solution there as well, but really struggles with retention periods (old builds tend to eat disk space like crazy) and isn't terribly well optimized for large binaries - most SCM systems are optimized for smallish text files.
I stole the list above from this presentation: http://www.anthillpro.com/html/resources/webinars/Role_of_Binary_repositories_in_Software_Configuration_Management.html (registration required)
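For reference, a minimal sketch of that flat-file layout: it walks a Hudson/Jenkins home directory and lists every archived artifact. The JENKINS_HOME path is an assumption for your installation.

```python
# Walk the conventional Hudson/Jenkins flat-file layout
# (jobs/<job>/builds/<build>/archive/) and list every archived artifact.
import os

JENKINS_HOME = "/var/lib/jenkins"  # assumed install location

jobs_dir = os.path.join(JENKINS_HOME, "jobs")
for job in os.listdir(jobs_dir):
    builds_dir = os.path.join(jobs_dir, job, "builds")
    if not os.path.isdir(builds_dir):
        continue
    for build in os.listdir(builds_dir):
        archive = os.path.join(builds_dir, build, "archive")
        if not os.path.isdir(archive):
            continue
        for root, _, files in os.walk(archive):
            for name in files:
                print(os.path.join(root, name))
```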
We use Jenkins for our builds, but we also store the artifacts from the builds. Like Eric said above, Hudson/Jenkins store artifacts using flat file storage, organized by build.
Some things I have noticed from use (in response to Eric's questions about an alternative to source control for binaries):
Each build stores its own artifacts, so you do have versioning of sorts.
You can use the fingerprinting option when archiving. This lets you differentiate between versions and also check for corruption (see the sketch at the end of this answer).
Retention periods are completely up to you. We keep artifacts forever.
FYI, our projects are not Java either (they are C/C++) and our artifacts are tar.gz/zip files and documents.
It may or may not be the best way to store binaries, but it is definitely decent as long as you have regular backups (weekly in our case) and your disk is fault tolerant.
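For reference, Jenkins fingerprints are just MD5 digests of the archived files; here is a minimal sketch of the same idea, with a hypothetical artifact path:

```python
# A fingerprint in the Jenkins sense is an MD5 digest per archived file:
# matching digests identify the same artifact across builds, and an unexpected
# digest signals a different version or corruption.
import hashlib

def fingerprint(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

print(fingerprint("artifacts/myproject-1.2.3.tar.gz"))  # hypothetical path
```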
Related
I am looking for a tool to manage the collection of binary files (input components) that make up a software release. This is a software product and we have released multiple versions each year for the last 20 years. The details and types of files may vary, but this is something many software teams need to manage.
What's a Software Release made of?
A mixture of files go into our software releases, including:
Windows executables/binaries (40 DLLs and 30+ EXE files).
Scripts used by the installer to create a database
API assemblies for various platforms (.NET, ActiveX, and Java)
Documentation files (HTML, PDF, CHM)
Source code for example applications
The full set of collected files for a single version of the release is about 90MB. Most are built from source code, but some are 3rd party.
Manual Process
Long ago we managed this manually.
1. When starting each new release, the files used to build the last release would be copied to a new folder on a shared drive.
2. The developers would manually add or update files in this folder (hoping nothing was lost or deleted accidentally).
3. The software installer script would be compiled using the files in this folder to produce a SETUP.EXE (output).
4. Iterate steps 2 and 3 during validation & testing until release.
Automatic Process
Some years ago we adopted CI (building our binaries nightly or on-demand).
We resorted to putting 3rd party binaries under version control since they usually don't change as often.
Then we automated the process of collecting & updating files for a release based on the CI build outputs. Finally we were able to automate the construction of our SETUP.EXE.
Remaining Gaps
Great so far, but this leaves us with two problems:
Rebuilding Assemblies: The CI mostly builds projects when something has changed, but when forced it will re-compile a binary that doesn't have any code change. The output is a fresh build of a binary we've previously tested (hint: should we always trust these are equivalent?).
Latest vs Stable: Mostly our CI machine builds the latest version of each project. In some cases this is ok, but often we want to release an older, tested or stable version. To do this we have separate CI projects for the latest and stable builds - this works but is clumsy.
Thanks for your patience if you've got this far :-)
I Still Haven't Found What I'm Looking For
After some time searching for solutions it seems it might be easier to build our own solution, but surely someone else has solved these problems before!?
What we want is a way to store and manage binary files (either outputs from CI, or 3rd party files) such that each is tagged with a version (v1.2.3.4) that allows:
The CI to publish new versions of each binary (but reject rebuilt versions that already exist).
The development team to make a recipe for a software release (kinda like NuGet packages.config) that specifies components to include (a sketch follows this list):
package name
version
path/destination in the release folder
The automatic packaging script to use the recipe to collect the required files, and compile the install package (e.g. SETUP.EXE).
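I don't know of a tool that provides this out of the box, but for illustration, here is a minimal sketch of what such a recipe and the collection step could look like, assuming a simple JSON recipe and a <package>\<version> folder layout in the repository; every name below is hypothetical:

```python
# Hypothetical release recipe (package name, version, destination) and a
# collector that copies each pinned component into the release folder.
import json
import os
import shutil

RECIPE = """
[
  {"package": "CoreEngine", "version": "1.2.3.4", "dest": "bin"},
  {"package": "ApiDotNet",  "version": "2.0.1.0", "dest": "api/dotnet"},
  {"package": "UserGuide",  "version": "1.2.0.0", "dest": "docs"}
]
"""

REPO_ROOT = r"\\server\artifact-repo"  # assumed layout: <package>\<version>\<files>
RELEASE_DIR = "release"

for item in json.loads(RECIPE):
    src = os.path.join(REPO_ROOT, item["package"], item["version"])
    dst = os.path.join(RELEASE_DIR, item["dest"])
    os.makedirs(dst, exist_ok=True)
    for name in os.listdir(src):
        shutil.copy2(os.path.join(src, name), dst)
```

The installer compile step (producing SETUP.EXE) would then run against the populated release folder.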
I am aware of past debates about storing binaries in a VCS. For now I am looking for a better solution. That approach does not appear ideal for long-term ongoing use (e.g. how to prune old binaries)... amongst other issues.
I have tried some artifact repositories currently available. From my investigation these provide a solution for component/artifact storage and version control. However they do not provide tools for managing a list of components/artifacts to include in a software release.
Does anybody out there know of tools for this?
Have you found a way to get your CI infrastructure to address these remaining issues?
If you're using an artifact repository to solve this problem, how do you manage and automate the process?
This is a very broad topic, but it sounds like you want a release management tool (e.g. BuildMaster, developed by my company Inedo), possibly in conjunction with a package management server like ProGet (which you tagged, and is how I discovered this question).
To address some of your specific questions, I'll associate each with the feature that would solve the problem:
A mixture of files go into our software releases, including...
This is handled in BuildMaster with artifacts. This video gives a basic overview of how they are manually added to releases and deployed to a file system: https://inedo.com/support/tutorials/buildmaster/deployments/deploying-a-simple-web-app-to-iis
Of course, once that works to satisfaction, you can automate the import of artifacts from your existing CI tool, create them from a BuildMaster deployment plan itself, pull them from your package server, whatever. Down the line you can also have your CI tool call the BuildMaster release management API to create a release and automatically have it include all the artifacts and components you want (this is what most of our customers do now, i.e. have a build step in TeamCity create a release from a template).
Rebuilding Assemblies ... The output is a fresh build of a binary we've previously tested (hint: should we always trust these are equivalent?)
You can mostly assume they are functionally equivalent, but it's precisely the times when they are not that problems arise. This is especially true with package managers that do not lock dependencies to specific version numbers (e.g. NuGet, npm). You should be releasing exactly the same binary that was tested in previous environments.
[we want] the development team to make a recipe for a software release (kinda like NuGet packages.config) that specifies components to include:
This is handled with releases. A developer can choose a release's name, dates, etc., and associate it with a pipeline (i.e. a set of testing stages that the artifacts are deployed to), then "click the deploy button" and have the automation do all the work.
Releases are grouped by "application", similar to a project in TeamCity. As a more advanced use case, you can use deployables. Deployables are essentially individual components of an application you include in a release; in your case the "Documentation" could be a deployable, and maybe contain an artifact of the .pdf and .docx files. Deployables from other applications (maybe a different team is responsible for them, or whatever) can then be referenced and "included" in a release, or you can reference ones from a past release.
Hopefully that provides some overview and fits your needs. Getting into this space is a bit overwhelming because there are so many terms, technologies, and methodologies, but my advice is to start simple and then slowly build upon it, e.g.:
deploy a single, manually uploaded component through BuildMaster to a share drive, then manually deploy it from there
add a deployment plan that imports the component
add a second plan and associate it with the 2nd stage that takes the uploaded artifact and deploys it to the target, bypassing the need for the share drive
add more deployment plans and associate them with pipeline stages and promote through them all to "close out" a release
add an agent and deploy to that instead of the default localhost server
add more components and segregate their deployment with deployables
add event listeners to email team members at points in the process
start adding approvals if you require gated "sign-offs"
and so on.
Our project currently has more than 500 SNAPSHOT jars to be downloaded daily. This is becoming a burden for development, especially since our Artifactory instance is hosted remotely and the connection is not ideal.
I would like to know the best way to reduce the redundant SNAPSHOT jars that are deployed to Artifactory. We have a large multi-module project that deploys almost 200 new SNAPSHOT jars even when only 1 module actually contains changes.
I found a similar question raised in the forum, but no definite answer there either. Incremental builds are not viable for us due to this issue.
Any suggestion is appreciated!
The "Managing Disk Space Usage" page in the Artifactory wiki describes various methods for cleaning up old snapshots:
Limiting the Number of Snapshots - you can specify the maximum number of snapshots that may be stored. In the Edit Repository dialog, select the Basic Settings tab, check the Handle Snapshots checkbox, and set the Max Unique Snapshots field. This value is zero by default, which means that all snapshots are saved.
Using a user plugin for custom cleanup logic - you may write scripts to implement virtually any custom cleanup logic, which gives you an extensive and flexible set of customization capabilities. See examples of such scripts on GitHub.
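If a full user plugin is more than you need, similar cleanup can be scripted against Artifactory's REST API. A rough sketch, where the server URL, repository name, 30-day retention window, and credentials are all assumptions:

```python
# Rough cleanup sketch: run an AQL query to find snapshot artifacts older than
# 30 days, then delete each one. Endpoint paths follow Artifactory's REST API;
# the server, repo name, and credentials below are placeholders.
import requests

BASE = "https://artifactory.example.com/artifactory"
AUTH = ("admin", "password")  # placeholder credentials

aql = 'items.find({"repo":"libs-snapshot-local","created":{"$before":"30d"}})'
resp = requests.post(f"{BASE}/api/search/aql", data=aql,
                     headers={"Content-Type": "text/plain"}, auth=AUTH)
resp.raise_for_status()

for item in resp.json()["results"]:
    path = f"{item['repo']}/{item['path']}/{item['name']}"
    requests.delete(f"{BASE}/{path}", auth=AUTH).raise_for_status()
```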
If only one module changes, then why do you rebuild the entire project? In other words, your often-changing module appears to have a different lifecycle than the rest of your project, so you should consider moving it to a separate project with its own release cycle.
I have several projects which use code from a large set of component libraries. These libraries are under source control.
The libraries repository contains all the libraries used by all my projects and contains multiple versions of multiple libraries. Each library/version pair lives in its own folder. Each of my projects identifies the specific library/version pairs it needs through the folder paths of the references in its project file.
For example $(LibraryPath)\SomeLibrary\v1.1.5
Please note that the libraries repository is only ever added to. No changes are made to stuff already in the repository. Ever.
I have of course been able to configure my build plan to pull the libraries repository to a libraries subfolder of the working directory. So far so good. However, with the automatic branch management feature of Bamboo, this setup means that the libraries repository is cloned for each and every branch in all projects.
Not funny. No, really, not funny...
What I would like to do is:
pull the libraries repository in each build plan
but pull it to a fixed location that is the same for all build plans
it doesn't have to be an absolute path
but it does need to be outside the working directory of the current build plan to avoid unnecessary duplication
Unfortunately the Checkout Directory of the Source Code Checkout configuration task in a Bamboo build plan doesn't allow me to specify either an absolute path or a relative one that goes "up" for one or more levels from the working dir. The hint text explicitly states "(Optional) Specify an alternative sub-directory to which the code will be checked out." And indeed, specifying something like ..\Library gets punished with the message "Checkout to parent directory is forbidden".
I have seen information on the "artifact sharing" feature of Bamboo. This will probably work, but it seems like overkill for what I want to achieve.
What would be the easiest and least complicated way to achieve my goal using Atlassian's Bamboo Continuous Integration?
Out-of-the-box alternatives are welcome, but please don't direct me to any products that require intimate CLI use and/or whose documentation assumes (extensive) knowledge of 'nix and/or Java setup. I am on Windows and spoiled rotten by powerful (G)UI's.
I have the same problem - with a repository weighing in at around 2GB.
I'd like to simply "git checkout myBranch" and "git clean -fxd" instead of cloning every time (which should save a lot of time and disk space). However I also like Bamboo's automatic trigger with new branches showing up.
Like the OP, I'd love to be able to put "..\SharedDirectory" in the "Checkout Directory" for the "Source Code Checkout" task, but it won't let me go above the \JOB_KEY\ folder.
One possible solution is replacing the "Source Code Checkout" task with the two git commands above. That way I can specify exactly when/where/how to do the checkout. I think there may be problems with the initial checkout in this case, but once that is solved, all subsequent branches would use the same shared folder, and no more pulling down 2GB every time.
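For illustration, a minimal sketch of that replacement script task, shown in Python; the shared path, repository URL, and the Bamboo branch variable are assumptions:

```python
# Rough sketch of a Bamboo "Script" task that replaces the Source Code Checkout
# task with a shared clone; paths, URL, and the branch variable are assumptions.
import os
import subprocess

SHARED = r"C:\bamboo-shared\libraries"          # fixed location outside any working dir
REPO_URL = "https://example.com/libraries.git"  # hypothetical repository URL
branch = os.environ.get("bamboo_planRepository_branchName", "master")

if not os.path.isdir(os.path.join(SHARED, ".git")):
    # first run only: clone once into the shared location
    subprocess.run(["git", "clone", REPO_URL, SHARED], check=True)

subprocess.run(["git", "-C", SHARED, "fetch", "--all"], check=True)
subprocess.run(["git", "-C", SHARED, "checkout", branch], check=True)
subprocess.run(["git", "-C", SHARED, "clean", "-fxd"], check=True)
```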
Imagine for a cloud based solution, a good portion of the deployed code is developed internally. My question is what is the point of using an Artifact Repository for internal code where you could always build whatever version directly from the source code?
In other words, doesn't it make more sense to spend the time on the build server to facilitate ease of building desired artifact versions from the code, rather than adding an artifact repository like Nexus to feed build artifacts to deployments?
In theory yes, if you can be certain that:
everything that went into an artifact is checked in, such as sources and data files
the exact environment (OS, compiler, linker, tools) used to build your artifact can be restored perfectly (e.g. from a snapshot of a virtual machine)
nothing was forgotten
EDIT
In practice, as Mark O'Connor notes, even then two builds will normally not be identical, because binaries typically include timestamps and checksums that depend on them. You would have to somehow fix those manually during the build, or exactly reproduce the time and timing on your build computer.
Otherwise you might face the situation that you cannot (exactly) rebuild a certain artifact. I prefer to have everything that is published stored in a safe place.
The Continuous Delivery book calls the practice of building a binary more than once an antipattern:
This antipattern violates two important principles. The first is to keep the deployment pipeline efficient, so the team gets feedback as soon as possible. Recompiling violates this principle because it takes time, especially in large systems. The second principle is to always build upon foundations known to be sound. The binaries that get deployed into production should be exactly the same as those that went through the acceptance test process - and indeed in many pipeline implementations, this is checked by storing hashes of the binaries at the time they are created and verifying that the binary is identical at every subsequent stage in the process.
Binary equality checking via hash may also be important for auditing purposes in highly regulated domains.
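A minimal sketch of that hash check, with hypothetical file paths: record a digest when the binary is first produced, then verify it before every later stage.

```python
# Store a SHA-256 digest when the binary is built, then verify it before each
# subsequent pipeline stage so you know the same bits are being promoted.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

recorded = sha256_of("build/output/app.exe")  # saved alongside the artifact at build time

def verify_before_stage(path: str, expected: str) -> None:
    actual = sha256_of(path)
    if actual != expected:
        raise RuntimeError(f"artifact changed since build: {actual} != {expected}")

verify_before_stage("staging/app.exe", recorded)
```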
I have recently been charged with building out our "software infrastructure" and so I am putting together a continuous integration server.
After a build completes, would it be considered bad form for the CI system to check some of the artifacts it creates into a tag, so that they can be fetched easily later (or, if the build breaks, so you can more easily recreate the problem)?
For the record we use SVN and BuildMaster (free edition) here.
This is more of a best practices question rather than a how-to question. (It is pretty easy to do with BuildMaster)
Seth
If you believe this approach would be beneficial to you, go ahead and do it. As long as you maintain a clear trace of what source code was used to build each artifact, you'll be fine.
You should keep this artifact repository separated from the source code repository.
It is however a little odd to use a source code repository for this - these are typically used for things that will change, something your artifacts most definitely should not.
Source code repositories are also often used in a context where you want to check out "everything", for example the entire trunk. With artifacts you are typically looking for a specific version, and checking out all of them would only be done if exporting them to some other medium.
There are several artifact repositories specialized for this, for example Artifactory or Apache Archiva, but a properly backed up file server with thought-through access settings may be a simple and good-enough solution.
I would say it's a smell to check in binaries as a tag. Your build artifacts should be associated with a particular build version in your build system, and that build should be associated with a particular checkin. You should be able to recreate the exact source code from that information. If what you're looking for is a one-stop-function to open the precise source-code revision that generated the broken build, I'd suggest that you invest some time into building a Powershell module that will do that for you.
Something with a signature like:
OpenBuild -projectName "some project name" -buildNumber "some build number"
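For illustration only, here is roughly what such a function would do under the hood, sketched in Python rather than Powershell; the mapping file and repository URL are hypothetical stand-ins for wherever your build system records the build-to-revision association:

```python
# Hypothetical OpenBuild equivalent: look up the SVN revision recorded for a
# given build, then check out exactly that source. The JSON mapping file is a
# stand-in for the build system's own records.
import json
import subprocess

def open_build(project: str, build_number: str) -> None:
    with open("build-revisions.json") as f:
        records = json.load(f)
    rev = next(r["revision"] for r in records
               if r["project"] == project and r["build"] == build_number)
    subprocess.run(["svn", "checkout", "-r", str(rev),
                    "https://svn.example.com/repo/trunk",
                    f"{project}-build-{build_number}"], check=True)

open_build("some project name", "42")
```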