Pipeline Cache setup issues - caching

I may be approaching the new Cache pipeline task the wrong way, but I'm trying to improve the Build task. Currently, the build task takes on average 20 minutes to complete. I've been reading about the new Cache task, but it seems that would help with caching items like NuGet and npm packages, which for us happens in a different task and takes only 1 minute.
Is there a cache setup that can help with the build itself?

Caching is especially useful in scenarios where the same dependencies are downloaded over and over at the start of each run. This is often a time-consuming process involving hundreds or thousands of network calls.
Caching can be effective at improving build time provided the time to restore and save the cache is less than the time to produce the output again from scratch. Because of this, caching may not be effective in all scenarios and may actually have a negative impact on build time.
Please refer to this document for details.
Here are two other ways to reduce build time:
You can reduce build time in CI pipelines through parallel builds. Here are some references: blog1, blog2.
To obtain faster performance, you can use a private (self-hosted) agent to run the build pipeline, as private agents cache everything between builds. If you do need to clean the repo (for example, to avoid problems caused by residual files from a previous build), you can select the Clean option in the build definition.
.NET/NuGet:
If you use PackageReference to manage NuGet dependencies directly within your project file and have packages.lock.json file(s), you can enable caching by setting the NUGET_PACKAGES environment variable to a path under $(Pipeline.Workspace) and caching this directory.
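A minimal sketch of that setup in pipeline YAML, using the documented Cache@2 task (the key layout and restore step are illustrative):

    variables:
      NUGET_PACKAGES: $(Pipeline.Workspace)/.nuget/packages

    steps:
    - task: Cache@2
      displayName: Cache NuGet packages
      inputs:
        # the cache key changes whenever any lock file changes
        key: 'nuget | "$(Agent.OS)" | **/packages.lock.json'
        restoreKeys: |
          nuget | "$(Agent.OS)"
        path: $(NUGET_PACKAGES)
    - script: dotnet restore --locked-mode
      displayName: Restore packages into the cached directory

If the cache restore and save together take less time than the network restore they replace, this pays off; otherwise it will not.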

Related

How to set up GitLab CI to run multiple build steps efficiently while indicating at which step it is?

I am pretty new to GitLab. I've set up pipelines and stages via .gitlab-ci.yml and they seem to work but I've just discovered that some of my assumptions were wrong.
I have a large, multi-project Gradle setup, producing many artifacts. We are in the process of setting up GitLab and I really wanted to make use of the GitLab UI to show the progress of the build. The idea was to nicely indicate to developers and reviewers how far the build got before it failed, something like:
Got its dependencies
Compiled main code, YAY!
Compiled test code, yippie!
Passed unit tests, we rock!
Passed integration tests, awesome!
Passed various static code analysis tests. We're almost good to go!
Generated documentation - can we ship it?
I've set up each of these as individual jobs in individual stages, (incorrectly) assuming that Gradle would be able to do its incremental build magic and that this would be almost as quick as running it as a single step.
Then I noticed that each stage causes what seems to be a Docker container reinitialization. This also means that the Gradle daemon has to restart and has no knowledge of the past: it has to fetch all the dependencies again. I think I could cache these, but it seems they would be cached separately for each job. My assumption that serialized jobs would execute inside the same container instance was proven wrong, so some jobs end up repeating what earlier jobs already did because that output isn't available to them, significantly increasing the build time.
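The dependency half of that, at least, can be shared between jobs by giving them a common cache key; a minimal .gitlab-ci.yml sketch, assuming Gradle's home directory is redirected into the project tree (the key name and paths are illustrative):

    variables:
      # keep Gradle's caches inside the project so the runner can archive them
      GRADLE_USER_HOME: "$CI_PROJECT_DIR/.gradle"

    cache:
      key: gradle-dependencies   # same key for every job, so the cache is shared
      paths:
        - .gradle/wrapper
        - .gradle/caches

Note that a cache is still an archive that has to be uploaded and restored per job, so it reduces, but does not remove, the per-job overhead.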
I think I understand that I could declare artifacts for each job and make them available to dependent jobs that way, but that does not eliminate all of the overhead and adds some of its own: copying the artifacts out to "somewhere" and then back, while also hitting the limits on how much I can pass along. In fact, my unit test job is now failing and I can't see why because of the log size limit, but it seems to have to do only with artifacts (the report), as the unit tests pass nicely when I run them outside GitLab.
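For reference, handing compiled output forward looks roughly like this (the job name, script, and paths are assumptions, not from the post):

    compile-main:
      stage: build
      script: ./gradlew assemble
      artifacts:
        paths:
          - build/          # whatever later jobs need to reuse
        expire_in: 1 day    # keep artifact storage in check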
I also think I understand that the idea behind jobs was to be able to run them in parallel on separate runners. That is a very fine feature and I probably can use them for later stages, but not for (1)-(5) as they heavily rely on a lot of output of at least some of the previous jobs.
I could merge (1)-(5) into a single job (and a single stage) for performance reasons, but then there is no indication in the UI (that I know of) as to how far the build got ... and the logs would be even longer and nastier to figure out even if the log limit got lifted.
Do any of you have any suggestions as to what I am missing / should do here?
After further research, I found that this is not possible (yet). Jobs are meant to be units of (potentially) concurrent execution and can only communicate by copying artifacts, obviously.
What I would be interested in is steps smaller than jobs that would be indicated in the UI and that could post their artifacts when they (the steps) complete, but before the entire job is done. This would eliminate the 1-2 minutes of job startup overhead that I am facing now.

How to deal with a very long process in TeamCity?

We want to introduce new tests that will be driven from TeamCity. The build part itself is reasonably fast but we expect the processes that follow to take a very long time (hours to days). A different machine will produce results and the results will be analysed. And of course we want to see the results at the end in TeamCity.
While we fully expect the long runtimes, we don't want to keep our TC server occupied for hours or days while it's waiting for the final results.
We see several basic options:
estimate the runtime and run a follow-up test at a predetermined time period
keep checking at regular intervals
run another build manually when the initial run is complete
How do you deal with a situation like this? What kind of considerations need to be taken into account? Did you try something like this and did it work (or not)?
Your three listed options sound reasonable, but you are missing important features of TeamCity that are at your disposal.
An alternative:
Set up a build that will run ONLY the very long process. Let's call it the Long_Build.
Have a second build, which does the analysis, depend on the success of the Long_Build. Let's call this one the Analysis_Build.
Set up an agent capable of running the Long_Build, and TeamCity will run that build only on that agent.
Have your QA build, or main build, or whatever build succeed IFF (if and only if) the Analysis_Build checks out. This build specifically is an information-gathering build that will likely check all of your other tests and everything that your QA deployment depends on in order to call a changeset or set of changesets good enough for deployment.
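In modern TeamCity this chain can be written down in the Kotlin DSL (a feature newer than some TeamCity versions discussed here); a rough sketch, with all names as placeholders:

    // .teamcity/settings.kts sketch; the exact package version varies by TeamCity release
    import jetbrains.buildServer.configs.kotlin.*

    object LongBuild : BuildType({
        name = "Long_Build"
        requirements {
            // pin the long process to the dedicated agent
            equals("teamcity.agent.name", "long-run-agent")
        }
    })

    object AnalysisBuild : BuildType({
        name = "Analysis_Build"
        dependencies {
            snapshot(LongBuild) {
                // only start the analysis if the long build succeeded
                onDependencyFailure = FailureAction.FAIL_TO_START
            }
        }
    })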
My advice is to never run builds manually. Whatever criteria you would use to run a build manually can be scripted and run automatically. This way you are closer to a continuous integration process where you can guarantee quality.
Also, if you are not already doing it, make sure you label your source control with successful builds. Whether it is your Long_Build, Analysis_Build or QA build you should have a way to obtain successful builds that have passed a series of quality specs.

Specific cleanup interval for artifacts in TeamCity

I have a project in TeamCity that has a build configuration for the master release branch. It is compiled every time a new version of our product is released.
In order to be able to pinpoint the introduction of errors, I need a long retention time for some artifacts of this build configuration. As some other artifacts are rather big (full CD installation packages), my server's hard drive gets pretty full when I simply increase the cleanup interval of this configuration.
Is it possible to configure two different cleanup intervals somehow? I would love to have a long retention time for the really important artifacts, while throwing the big ones away early.
I currently use TeamCity 9.0.3
Let's say, for example, that my project has two artifacts:
smallupdatepack.zip (32 MB)
reallybigupdatecd.iso (700 MB)
I would like to configure TeamCity in a way that keeps the .iso for, e.g., the last 10 builds, but keeps the .zip for the last 150 builds.
What I do not want is a solution where all the .zip files are kept forever, while only the .iso files are deleted on an interval, which is all that seemed possible to me using the build configuration's artifact-pattern settings alone.
You can specify custom cleanup rules for projects/targets on the Build History Clean-up page.
In your case, you can have an aggressive cleanup for all builds and a lenient cleanup for the project/target of the master build.
If you edit any of the settings, you can set an individual period for artifacts. You can set up artifact cleanup per target. However, for the same target you cannot set up different cleanup rules for multiple artifacts.
The answer by #Biswajit_86 looks like the only thing available for setting special clean-up rules. I looked at it, and it seems like the configuration-specific settings should override the project settings and give you what you need, but maybe it doesn't work that way. Try it out and see if it works. If not, file a bug/suggestion with JetBrains.
The only other thing I could think of was to create a separate build configuration that only publishes the artifacts that you want to keep longer than your default rule. Give it a snapshot dependency on the configuration that creates the files and check the box to run on the same build agent. That way it doesn't need to rebuild them and can just publish what was already created. Set up a build trigger so that this new configuration runs whenever the other one finishes. Then set the clean up rules for this configuration to the longer retention setting.
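In the Kotlin DSL, such a companion configuration might be sketched like this (the build names, paths, and IDs are placeholders; the retention itself is still set in the clean-up rules):

    object PublishSmallArtifacts : BuildType({
        name = "Publish_Small_Artifacts"
        // republish only the small file that deserves long retention
        artifactRules = "dist/smallupdatepack.zip"
        dependencies {
            snapshot(MasterRelease) {
                runOnSameAgent = true  // reuse files already on the agent
            }
        }
        triggers {
            finishBuildTrigger {
                buildType = "${MasterRelease.id}"  // run whenever the main build finishes
            }
        }
    })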

Speed up Adobe CQ5 Maven build

I need a solution to speed up the Maven build process.
The project is based on Adobe CQ5 (otherwise known as AEM), and I have more than 10 modules in my project, where the build happens linearly.
Currently the build process takes more than 10 minutes to compile.
Is there any specific tool available, or another way to speed up the process?
Thanks
I have one suggestion: if you have 10 modules, make a separate profile for each module and build only the part you are changing and modifying; there is no need to deploy all 10 modules each time unnecessarily. It will not speed up the Maven build itself, but it can help you save time. It is a workaround, but it will be helpful.
Try mvn -T 4 clean install, which builds with 4 threads. It's a multi-threaded mode for running Maven and is faster. See the Apache documentation here.
To add to the other answers:
1) Give more memory resources to the server where the AEM instance is installed. Content creation involves a lot of disk access, so using an SSD is a must.
2) Having a clean AEM instance helps to speed up the process. As you may know, the AEM repository grows because of revisions, so each time you deploy, the repository size grows and it becomes slower. If it's a production environment, use maintenance tools like revision clean-up and compaction.
revision clean up
how to maintain repository
As far as I know, there is no such mechanism to speed this up.
It is better to build a particular module; it will deploy faster rather than waiting for all 10 modules to build.
Thanks
You can try using the suggested answers to build modules in parallel. It should speed up the build in theory.
But really there is no magic answer. You have to find the bottleneck in your build, it could be the number of dependencies, it could be a specific slow plugin, it could be hardware related, and it could be something else.
Try this: https://github.com/lptr/maven-time-tracker
It may help you find the bottleneck.
I would like to answer this question, knowing it was posted a long time ago.
Currently I am using AEM 6.3, and to recompile and deploy CORE module changes there is a simple Maven trick:
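The command itself did not survive in the post; judging from the description below, it was presumably along these lines (the autoInstallBundle profile name is the AEM Maven archetype default and an assumption here, not confirmed by the original answer):

    # -T 1C: one thread per CPU core; -pl core: build only the core module
    # autoInstallBundle: archetype profile that pushes the result to a running AEM instance (assumed)
    mvn clean install -T 1C -pl core -PautoInstallBundle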
This command tells Maven to:
run 1 thread/core
compile just the core module of the list of projects
and, of course, zip the package and send it to the running AEM instance.
In my observation, this reduced the turnaround time by a huge margin.

What is the best way to setup an integration testing server?

Setting up an integration server, I'm in doubt about the best approach regarding the use of multiple tasks to complete the build. Is the best way to set it all up as one big job, or to make small dependent ones?
You definitely want to break up the tasks. Here is a nice example of CruiseControl.NET configuration that has different targets (tasks) for each step. It also uses a common.build file which can be shared among projects with little customization.
http://code.google.com/p/dot-net-reference-app/source/browse/#svn/trunk
I use TeamCity with an NAnt build script. TeamCity makes it easy to set up the CI server part, and an NAnt build script makes it easy to do a number of tasks as far as report generation is concerned.
Here is an article I wrote about using CI with CruiseControl.NET, it has a nant build script in the comments that can be re-used across projects:
Continuous Integration with CruiseControl
The approach I favour is the following setup (assuming you are in a .NET project):
CruiseControl.NET.
NAnt tasks for each individual step. NAntContrib for alternative CC templates.
NUnit to run unit tests.
NCover to perform code coverage.
FxCop for static analysis reports.
Subversion for source control.
CCTray or similar on all dev boxes to get notification of builds and failures etc.
On many projects you find that there are different levels of tests and activities which take place when someone does a checkin. Sometimes these can grow to the point where it can be a long time after a checkin before a dev can see whether they have broken the build.
What I do in these cases is create three builds (or maybe two):
A CI build is triggered by checkin and does a clean SVN Get, Build and runs lightweight tests. Ideally you can keep this down to minutes or less.
A more comprehensive build which could be hourly (if changes) which does the same as the CI but runs more comprehensive and time consuming tests.
An overnight build which does everything and also runs code coverage and static analysis of the assemblies and runs any deployment steps to build daily MSI packages etc.
The key thing about any CI system is that it needs to be organic and constantly tweaked. There are some great extensions to CruiseControl.NET which log and chart build timings etc. for the steps, let you do historical analysis, and so allow you to continuously tweak the builds to keep them snappy. It's something that managers find hard to accept, but a build box will probably keep you busy for a fifth of your working time just to stop it grinding to a halt.
We use buildbot, with the build broken down into discrete steps. There is a balance to be found between breaking build steps down with enough granularity and keeping each one a complete unit.
For example, at my current position we build the sub-pieces for each of our platforms (Mac, Linux, Windows) on their respective platforms. We then have a single step (with a few sub-steps) that compiles them into the final version that will end up in the final distributions.
If something goes wrong in any of those steps it is pretty easy to diagnose.
My advice is to write the steps out on a whiteboard in as vague terms as you can and then base your steps on that. In my case that would be:
Build Plugin Pieces
Compile for Mac
Compile for PC
Compile for Linux
Make final Plugins
Run Plugin tests
Build intermediate IDE (We have to bootstrap building)
Build final IDE
Run IDE tests
I would definitely break down the jobs. Chances are you'll make changes in the builds, and it'll be easier to track down issues if you have smaller tasks instead of searching through one monolithic build.
You should be able to create one big job from the smaller pieces anyway.
G'day,
As you're talking about integration testing, my big (obvious) tip would be to make the test server's build and configuration as close as possible to the deployment environment.
</thebloodyobvious> (-:
cheers,
Rob
Break your tasks up into discrete goal/operations, then use a higher-level script to tie them all together appropriately.
This makes your build process easier to understand for other people (you're documenting as you go so anyone on your team can pick it up, right?), as well as increasing the potential for re-use. It's likely you won't reuse the high-level scripts (although this could be possible if you have similar projects), but you can definitely reuse (even if it's copy/paste) the discrete operations rather easily.
Consider the example of getting the latest source from your repository. You'll want to group the tasks/operations for retrieving the code with some logging statements and reference the appropriate account information. This is the sort of thing that's very easy to reuse from one project to the next.
For my team's environment, we use NAnt since it provides a common scripting environment between dev machines (where we write/debug the scripts) and the CI server (since we just execute the same scripts in a clean environment). We use Jenkins to manage our builds, but at their core each project is just calling into the same NAnt scripts, and then we manipulate the results (i.e., archive the build output, flag failing tests, etc.).
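As a sketch of one such discrete, reusable operation in an NAnt script, here is the "get latest source" example from above (the target name, program, and property names are illustrative):

    <!-- reusable "get latest source" operation with logging -->
    <target name="get-latest" description="Update the working copy from source control">
      <echo message="Updating source from repository as ${svn.user}..." />
      <exec program="svn">
        <arg value="update" />
        <arg value="--username" />
        <arg value="${svn.user}" />
      </exec>
    </target>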
