File server with incremental patches - caching

I'm deploying my build remotely to a file server and clients are downloading it. My build is mostly a loose collection of binary files with some text files. In total the build is around 1 GiB.
However, most of the time I'm making small changes to the executable or small binary data changes so the delta between builds is small.
I'd like to be able to push a build and anyone that downloads the build only downloads the delta. If a new user downloads it for the first time they would get the full build and not the delta. It would also be nice for users to pick a build from the past and grab that build.
I was thinking something like git would work because it meets all the requirements I listed, but git requires users to download the entire history of the repository.
I could write something like this myself, with delta patching and compression, but I imagine someone has written this before. Does anyone have any recommendations that meet my requirements?
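To make the requirements above concrete, here is a minimal sketch of the "only download what changed" idea at file granularity. It assumes each build is published together with a manifest mapping relative paths to content hashes; the function names `build_manifest` and `plan_update` are hypothetical, and this is not a byte-level delta patcher.

```python
import hashlib
from pathlib import Path


def build_manifest(build_dir: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    manifest = {}
    for path in sorted(build_dir.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(build_dir).as_posix()] = digest
    return manifest


def plan_update(local: dict[str, str], remote: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (files to download, files to delete) to turn `local` into `remote`."""
    # A file must be fetched when it is new or its hash differs from the local copy.
    to_download = sorted(p for p, h in remote.items() if local.get(p) != h)
    # A file must be removed when the target build no longer contains it.
    to_delete = sorted(p for p in local if p not in remote)
    return to_download, to_delete
```

A first-time user has an empty local manifest, so `plan_update({}, remote)` lists every file, which matches the "full build on first download" requirement. Keeping one manifest per published build would also allow picking a build from the past. Note that a file that changes by a single byte is still re-downloaded whole in this sketch.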

Related

Can you host a bitbucket pipeline internally?

We are currently using Bitbucket Cloud to host our grails-app repository. We want to set up some pipelines to do things like run unit tests and make sure the app compiles before a branch can be merged to master.
I know this can be done fairly easily by letting Bitbucket host the pipeline and committing a well-written pipeline file. The problem is that our app is very large: even on brand-new MacBook Pros it takes 20 minutes to compile, and on some older ones it can take 2 hours or more. Grails, thankfully, only recompiles files that have changed since the last compilation. However, this can't be used in a Bitbucket pipeline that works off a fresh pull of the app every time it runs.
My solution to this was wanting to set up a pipeline to run for us internally so that it can already have the app pulled, and just switch to the desired branch and run from there. This still might take time if switching between 2 very diverged branches, but it's better than compiling from fresh every time.
I can't seem to find any documentation on hosting a pipeline internally with Bitbucket Cloud. Does anyone know if this is possible and, if so, where the documentation for it is?
It would also be acceptable to find a solution to the long compilation problem itself with bitbucket hosted pipelines.
A few weeks ago, self-hosted runners were made available as a public beta. Here are the details: https://community.atlassian.com/t5/Bitbucket-Pipelines-articles/Bitbucket-Pipelines-Runners-is-now-in-open-beta/ba-p/1691022
Additionally, if you're looking to retain some of your files from one build to the next to avoid repeating the same work, have a look at caches (https://support.atlassian.com/bitbucket-cloud/docs/cache-dependencies/). There are some built-in ones that you could use, but you can define your own custom ones as well. Essentially, a cache is just a way of preserving the contents of a directory for a future build.

How can I collect the output from CI?

I don't know how to collect the data from each build machine on CI. (I use TeamCity for CI, and this is my first time setting up CI on my own.)
After building the code and running the .exe file, an output file is generated. It is a very simple .csv file, less than 1 KB in size. I want to collect this data in one place and compute some statistics.
Building and running the .exe file works fine. However, I don't know the next step. I have two ideas.
(Idea 1) Set up a log database server (e.g. Kibana/Elasticsearch) and send the output to it. However, that seems like overkill.
(Idea 2) Create a batch file and just copy the log to somewhere.
However, I don't know the usual way to collect data with CI, and I suspect there is a better solution. Is there a way to collect the data using CI?
I suggest using build artifacts: you can configure your builds so that they produce some files and make them available to users of TeamCity. Then you can download them and analyze them as needed. Given that the files are pretty small, I think it's an ideal option.
If you need to collect the artifacts from every build, you can configure another build that runs a Python script, which in turn uses the TeamCity REST API to collect the artifacts from specific builds, zips them, and produces the complete set of your files.
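As a rough illustration of the analysis half of such a script, the sketch below assumes the .csv artifacts have already been downloaded into one local folder (by hand or through the REST API, which is not shown here). It also assumes every file has a header row; the column name passed to `summarize` and both function names are hypothetical.

```python
import csv
import statistics
from pathlib import Path


def collect_rows(artifact_dir: Path) -> list[dict[str, str]]:
    """Read every .csv under artifact_dir and tag each row with its source file."""
    rows = []
    for csv_path in sorted(artifact_dir.rglob("*.csv")):
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Remember which build output the row came from.
                row["source_file"] = csv_path.relative_to(artifact_dir).as_posix()
                rows.append(row)
    return rows


def summarize(rows: list[dict[str, str]], column: str) -> dict[str, float]:
    """Compute simple statistics over one numeric column (assumed to exist)."""
    values = [float(row[column]) for row in rows]
    return {"count": len(values), "mean": statistics.mean(values), "max": max(values)}
```

Since each file is under 1 KB, reading all of them into memory on every run should be cheap, so no database is needed for this step.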
As an example, you can check a build on the JetBrains test server: just select a finished build and navigate to the Artifacts tab.
Please ask more questions if my answer is not clear enough.

In Bamboo, how do I pull a component library repository to a fixed location to avoid per-branch duplication?

I have several projects which use code from a large set of component libraries. These libraries are under source control.
The libraries repository contains all the libraries used by all my projects and contains multiple versions of multiple libraries. Each library/version pair lives in its own folder. Each of my projects identifies the specific library/version pairs it needs through the folder paths of the references in its project file.
For example $(LibraryPath)\SomeLibrary\v1.1.5
Please note that the libraries repository is only ever added to. No changes are made to stuff already in the repository. Ever.
I have of course been able to configure my build plan to pull the libraries repository to a libraries subfolder of the working directory. So far so good. However, with the auto branch management feature of Bamboo, this setup means that the libraries repository is cloned for each and every branch in all projects.
Not funny. No, really, not funny...
What I would like to do is:
- pull the libraries repository in each build plan
- but pull it to a fixed location that is the same for all build plans
  - it doesn't have to be an absolute path
  - but it does need to be outside the working directory of the current build plan to avoid unnecessary duplication
Unfortunately the Checkout Directory of the Source Code Checkout configuration task in a Bamboo build plan doesn't allow me to specify either an absolute path or a relative one that goes "up" for one or more levels from the working dir. The hint text explicitly states "(Optional) Specify an alternative sub-directory to which the code will be checked out." And indeed, specifying something like ..\Library gets punished with the message "Checkout to parent directory is forbidden".
I have seen information on the "artifact sharing" feature of Bamboo. This will probably work, but it seems like overkill for what I want to achieve.
What would be the easiest and least complicated way to achieve my goal using Atlassian's Bamboo Continuous Integration?
Out-of-the-box alternatives are welcome, but please don't direct me to any products that require intimate CLI use and/or whose documentation assumes (extensive) knowledge of 'nix and/or Java setup. I am on Windows and spoiled rotten by powerful (G)UIs.
I have the same problem - with a repository weighing in at around 2GB.
I'd like to simply "git checkout myBranch" and "git clean -fxd" instead of cloning every time (which should save a lot of time and disk space). However I also like Bamboo's automatic trigger with new branches showing up.
Like the OP, I'd love to be able to put "..\SharedDirectory" in the "CheckoutDirectory" for the "Source Code Checkout" task, but it won't let me go above the \JOB_KEY\ folder.
One possible solution is to replace the "Source Code Checkout" task with the two git commands above. That way I can specify exactly when, where, and how to do the checkout. I think there may be problems with the initial checkout in this case, but once that is solved, all subsequent branches would use the same shared folder, and there would be no more pulling down 2GB every time.
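A minimal sketch of that idea as a script task, assuming git is on the PATH. The `fetch` and `reset --hard` steps are additions beyond the two commands mentioned above, and the function names, the repository URL, and the shared directory are all hypothetical. The initial checkout is handled by cloning once when the shared folder has no `.git` directory yet.

```python
import subprocess
from pathlib import Path


def checkout_commands(repo_url: str, shared_dir: Path, branch: str) -> list[list[str]]:
    """Build the git commands that bring shared_dir to a clean copy of branch."""
    commands = []
    if not (shared_dir / ".git").exists():
        # Initial checkout: the only time the full repository is downloaded.
        commands.append(["git", "clone", repo_url, str(shared_dir)])
    git = ["git", "-C", str(shared_dir)]
    commands.append(git + ["fetch", "origin"])                      # assumption: remote is "origin"
    commands.append(git + ["checkout", branch])
    commands.append(git + ["reset", "--hard", f"origin/{branch}"])  # match the remote branch tip
    commands.append(git + ["clean", "-fxd"])                        # drop untracked and ignored files
    return commands


def run_checkout(repo_url: str, shared_dir: Path, branch: str) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for command in checkout_commands(repo_url, shared_dir, branch):
        subprocess.run(command, check=True)
```

Because every branch build would reuse one folder, two builds running at the same time on the same agent could interfere with each other, so this sketch only fits setups where builds on an agent run one at a time.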

Best practice for placement of large test datasets?

I'm dealing with an enormous amount of data (say, video) and most integration tests require at least a decent subset of this data.
These test files (subsets) can range from 200MB to 2GB.
Where would be a good place to put these files? Ideally they would not go directly into our version control system because people shouldn't have to download 5GB+ of test data every time they want to check out the project.
The test data needs to be updated by Jenkins whenever a schema change occurs (we already have this part figured out), so either maven or svn would need to download the latest version if anybody wanted to run the integration tests.
It would be great if it could be on-demand since we never run all the tests at once locally (e.g., if we are running TestX, then download the files required for this test before running).
Does anybody have any suggestion(s) on how to approach this?
Edit -- For the sake of simplicity let's say that the test files are incompressible.
In this case I would set up a file server share that contains all the test data in a nicely organized way. Then let each test download the test data it needs. The advantage is that you can update the test data in the central place without updating the tests themselves. The next time the tests run, the new test data will be downloaded.
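A minimal sketch of the on-demand download, assuming the share is mounted as an ordinary filesystem path. The helper name `ensure_test_data` and the size-plus-modification-time staleness check are assumptions for illustration, not a prescribed method.

```python
import shutil
from pathlib import Path


def ensure_test_data(name: str, share_root: Path, cache_dir: Path) -> Path:
    """Return a local copy of share_root/name, copying it only when missing or stale."""
    source = share_root / name
    target = cache_dir / name
    src_stat = source.stat()
    if target.exists():
        tgt_stat = target.stat()
        # Treat the cached file as current when the size matches and it is not older.
        if tgt_stat.st_size == src_stat.st_size and tgt_stat.st_mtime >= src_stat.st_mtime:
            return target
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy2 also copies the modification time
    return target
```

A test would call this helper in its setup with only the files it needs, so running TestX locally never pulls the data for the other tests.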
If you need versioning, you could use a repository manager like Nexus instead of a simple filesystem. If you need auditability, I would suggest a version control system like Subversion. However, make sure that you use a separate repo just for your test data, so you can easily clean it out by replacing it with an empty repo that gets only the newest test data loaded.

Maintaining upstream vendor source with Xcode and SVN

Question: What is the best way to maintain a project based on another OSS project, through Xcode and version managed by SVN?
I'd like to start a fork (?) of a reasonably popular open source project (it's allowed). Mostly, I want to build my own user interface written in Cocoa/ObjC for it and throw in a few custom features of my own as well.
Now, this OSS project isn't exactly small. The project itself has over 3000 files, and the build process is pretty intense, consisting of multiple stages and steps that compile build tools, run them, then compile the results.
All this is fine and dandy in Xcode, since it's easy enough to setup build phases and rules to handle everything.
What I'm not clear on is how best to manage patches from upstream. They are constantly working on the project and I'd like to keep up to date with those patches as easily as possible, as many of the diff files affect up to a hundred (!) files at once.
So maintaining a pristine unmodified copy of that source tree so I can apply patches to it seems like a smart thing to do, because I really don't want to be sorting through hundreds of files every few weeks merging patches by hand.
What I'm thinking of doing in this regard is:
1) Set up an "upstream" SVN repo to hold a copy of the upstream source, plus the bare minimum required to compile it in Xcode (so an xcodeproj, a few xcconfigs, some prefix header files and that's it)
2) Set up my own "downstream" SVN repo where I do all my work and apply my own modifications.
Whenever upstream releases a patch, I can apply it to #1 then synchronize across to #2, and deal with any issues created by my own modifications.
What I'm not clear about is whether this is a sane way of handling things, or whether there's some better practice I should be following.
Is this the best way to handle things, or should I be looking at doing this some other way?
In the SVN world this has long been known as "Vendor Branches" and has been used intensively by many teams (you can google the phrase for more).
Technically it's:
- one SVN repo
- at least one special branch (special in terms of usage, nothing more), which is linked via svn:externals to the third-party repo of upstream code
- your place for changes (trunk or any other place, I prefer trunk), initially created as a copy of the vanilla code, where you perform all your code hacks
If (or "when") the vendor branch gets updates from upstream, you just merge the branch into your place for changes, integrate the changes, and continue to work.

Resources