What are some good patterns for using Go in Concourse CI tasks? For example, should I build locally with all dependencies and check cross-compiled binaries into the repo? Should I build on Concourse prior to running the task?
Examples of what people do here would be great. Public repos of pipelines/tasks even better.
The way I see it, there are currently three options for handling Go builds:
Use vendoring
Explicitly declare the dependencies as Concourse resources
Maintain a Docker image with the required dependencies already included
All options have pros and cons. The first is currently my favorite: the responsibility for handling dependencies lies with the project maintainers, and there is a very clear way to see which versions/revisions of the dependencies are being used (just check the vendoring tool's config), but it does force you to keep all dependency code in the project's repo.
The second option follows the Go "philosophy" of always tracking master, but it may lead to slower builds (Concourse will need to check every single resource regularly) and to sudden breakage when dependencies change.
The third option lets you implicitly pin the revisions of the dependencies inside the Docker image, so in that respect it's similar to the first. However, it requires maintaining Docker images (not necessarily one per project, but possibly more than one, depending on how many projects use this option and whether there are conflicting dependency versions between them).
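As an illustration of the first option, the task's run script can stay very small, because everything needed to build is already in the repo. A minimal sketch, assuming a pre-modules GOPATH layout and placeholder names (my-project as the Concourse input, github.com/myorg/my-project as the import path):

```sh
#!/bin/sh
# Hypothetical Concourse task script for the vendoring option; the
# repo name and import path below are placeholders.
set -e

# Put the input repo onto a GOPATH so the go tool can find it.
export GOPATH="$PWD/go"
mkdir -p "$GOPATH/src/github.com/myorg"
cp -r my-project "$GOPATH/src/github.com/myorg/my-project"

cd "$GOPATH/src/github.com/myorg/my-project"
# All dependencies are already in vendor/, so no network access is needed.
go build ./...
go test ./...
```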
Context:
We're running the free version of TeamCity to manage our projects. Some of those projects have dependencies on each other.
The problem
Some projects have chained Snapshot Dependencies, and those dependencies are always rebuilt instead of their latest artifacts being reused.
Example: A depends on B, and B depends on C. Pushing to A triggers a build of C, followed by a build of B and finally a build of A.
Ideally, A would be built against the latest already-built versions of B and C.
Where I think the problem lies (but I might be wrong)
Each of our projects has a number of Snapshot dependencies, and each snapshot dependency is configured with the following parameters turned on:
[x] Do not run new build if there is a suitable one
[x] Only use successful builds from suitable ones
For the first option, the documentation says:
If this option is set, TeamCity will not run a new dependency build if another dependency build, in progress or completed, with the appropriate source revision already exists. See also Suitable Builds (https://www.jetbrains.com/help/teamcity/2022.10/snapshot-dependencies.html#Suitable+Builds).
If we look in the Suitable Builds doc, it shows a list of requirements for a build to be considered suitable. The one I think is relevant is here:
It must not have any custom settings, including those defined via reverse.dep (related feature request: TW-23700, http://youtrack.jetbrains.com/issue/TW-23700).
However, we currently have reverse.dep.*.env.SOME_PARAMETER as a Configuration Parameter in every single one of our builds (it's inherited through a template).
Based on that, it seems to me that the "Do not run new build if there is a suitable one" option is doing nothing, and therefore that's why all our dependencies are built every time (or am I wrong?)
We also have, in every one of our builds, an environment variable called env.SOME_PARAMETER which has the same value as the reverse.dep configuration parameter.
My question
Is there a way to avoid using reverse.dep in my situation so that the "Do not run new build if there is a suitable one" option works? Perhaps by using the environment variable instead?
I asked the senior developer at the company I work for, and he said that in theory it should work but in practice it doesn't; he seems reticent to explain further. I'm just a beginner with TeamCity, so detailed explanations are welcome.
First things first: what is a Snapshot Dependency in a nutshell?
A Snapshot Dependency is, in a nutshell, a dependency between two build configurations which are linked by shared VCS Roots. VCS Roots are sort of like timelines in a book: they represent a chain of events (e.g. a git commit history) and let you build from a given point in time (e.g. a commit).
Now, TeamCity excels at what it is intended to do: Continuous Integration and Deployment. It does so by being tied closely to VCS Roots and, effectively, to changes in those (optionally narrowed-down) VCS Roots. A Snapshot Dependency links two build configurations together based on the VCS Roots they share and the changes in them.
An example
Build A has two VCS Roots, Foo and Bar. Foo and Bar are, say, different Git repositories which Build A needs to fetch before it is able to build the "A" artifact. Build B only needs Foo, and thus only has Foo attached as a VCS Root. Build A has a Snapshot Dependency on Build B, which we configure like yours: "Do not run new build if there is a suitable one" and "Only use successful builds from suitable ones".
So far so good. Now let's push a new commit to the Foo repository. TeamCity notices this and potentially triggers a new build of Build A, because the latest build of A is now outdated (it does not include our latest Foo commit). The Snapshot Dependency on B in A links these two build configurations together so that - with our configuration of the dependency above - we can require a build of B which includes the same revision of Foo that build A was kicked off with (e.g. the latest commit). Because such a build does not (yet) exist, a build of B is started and put above build A in the queue.
Simply put: the VCS Root is a timeline, the Snapshot Dependency is a link between two build configurations based on the timeline(s) they have in common, and the configuration of the dependency dictates what should happen when a dependency is needed (e.g. "Do not run new build if there is a suitable one").
If we had manually started a build B with the latest Foo revision included, this would have been a suitable candidate for reuse, and TeamCity would simply remove the queued B build once it discovered that a build of B already exists, which shares the same changes that build A is started with (the latest push to Foo).
If you want just the latest artifacts of B and C...
...use Artifact Dependencies and only those. Removing the Snapshot Dependency removes the need to build the dependency every time Build A is triggered by a new change in its VCS Root. However, it also means that there is no timeline linkage between the two builds, and that you yourself need to ensure that the artifacts produced by B and C are not tightly coupled to A. E.g. build C could produce a driver as an artifact, B could produce a user manual for that driver, and build A could just use the driver, expecting only that it is in working condition (but otherwise not depending on changes in it).
Your question about reverse.dep.*...
I've not heard about this causing trouble before. I would however expect that a Snapshot Dependency (and not just an artifact dependency) is required by TeamCity for you to be allowed to use it.
Question is: do you need it? It sounds like you've got the value elsewhere already, and honestly fetching values from previous builds is likely going to cause you trouble in the long run, especially if you don't have a specifically good reason to do so.
I am using go modules on go1.12 to handle my Go dependencies. Is it best practice to also commit the vendor/ directory into version control?
This is somewhat related to "Is it best-practice to commit the `vendor` directory?", which asks this question in the case of using dep. With dep, committing vendor/ is the only way to get truly reproducible builds. What about for go modules?
I'd like to give some arguments in favour of committing vendor, go.mod and go.sum.
I agree with the accepted answer's arguments that it's technically unnecessary and bloats the repo.
But here is a list of contra-arguments:
Building the project doesn't depend on some code being available on GitHub/GitLab/... or the Go proxy servers. Open source projects may disappear because of censorship, author incentives, licensing changes or other reasons I can't currently think of; this did happen on npm, the JavaScript package manager, and broke many projects. Not in your repo, not your code.
We may have used internal or 3rd party Go modules (private) which may also disappear or become inaccessible, but if they are committed in vendor, they are part of our project. Nothing breaks unexpectedly.
Private Go modules may not follow semantic versioning, which means the Go tools will rely on the latest commit hash when fetching them on-the-fly. Repo history may be rewritten (e.g. rebase) and you, a colleague or your CI job may end up with different code for the dependencies they use.
Committing vendor can improve your code review process. Typically we always commit dependency changes in a separate commit, so they can be easily viewed if you're curious.
Here's an interesting observation related to bloating the repo. If I'm doing a code review and a team member has included a new dependency with 300 files (or updated one with 300 files changed), I will be pretty curious to dive into that and start a discussion about code quality, the need for this dependency, or alternative Go modules. This may actually lead to decreasing your binary size and overall complexity.
If I just see a single new line in go.mod in a new Merge Request, chances are I won't even think about it.
CI/CD jobs which perform compilation and build steps need not waste time and network downloading the dependencies every time the job runs. All needed dependencies are local and present (go build -mod=vendor); see the sketch after this list.
These are on top of my head, if I remember something else, I'll add it here.
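As a concrete illustration of the CI point above, the job's build step can be as simple as the following sketch (assuming vendor/, go.mod and go.sum are all committed):

```sh
# Build and test entirely from the committed vendor/ directory;
# no module downloads and no network access are needed.
go build -mod=vendor ./...
go test -mod=vendor ./...
```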
Unless you need to modify the vendored packages, you shouldn't commit vendor/. Go modules already give you reproducible builds: the go.mod file records the exact version of every dependency (and go.sum records their expected checksums), which the go tool will respect and follow.
The vendor directory can be recreated by running the go mod vendor command, and it's even ignored by default by go build unless you ask it to use it with the -mod=vendor flag.
Read more details:
Go wiki: How do I use vendoring with modules? Is vendoring going away?
Command go: Modules and vendoring
Command go: Make vendored copies of dependencies
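If you don't commit vendor/ but still want it locally (for offline builds, say), it can be recreated at any time. A minimal sketch of the commands (on go1.12 the flag has to be passed explicitly):

```sh
# Recreate vendor/ from go.mod at any time.
go mod vendor

# Check that the downloaded modules match the checksums in go.sum.
go mod verify

# Build against vendor/ explicitly; go1.12 ignores it otherwise.
go build -mod=vendor ./...
```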
We recently brought Golang into the company that I work in for general use, but we hit a snag in the roll out because Go can use the go get command to get packages from the internet. Typically when we roll out Java and Python we are able to limit where the developer can pull packages from by pointing them to our internal artifactory.
So with Python we can change where they pull from by altering the pip command to pull from our internal artifactory, and with Java we can alter their settings.xml and pom.xml to point to our internal packages.
I know that during development you can fetch and pull dependencies into your local environment and then compile them into a standalone binary. What I am looking for is some mechanism that stops people from going out and pulling from the open internet.
Does something like this exist in Go? Can I stop people from going to the internet and go get 'ing packages?
Any help would be greatly appreciated.
It depends on your definition of "roll out", but typically there are three stages:
Development - at this point you can't prevent arbitrary go get calls, apart from putting the development machines behind restrictive proxies or similar technical measures.
Deployment - since Go programs can (should) be deployed as single binaries, go get is not used at all during deployment.
Building deployment artefacts - this is probably your issue:
The usual approach is not to fetch dependencies when building Go programs. Instead, dependencies are fetched during development, and made part of the source tree using vendoring, for example by using the dep tool.
At this point, the build step no longer needs to fetch any dependencies. The choice of which dependencies are allowed now becomes part of the rest of your process, such as code reviews.
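With dep, for example, the flow might look roughly like the following sketch (Gopkg.toml and Gopkg.lock are dep's own files; the commit message is just an example):

```sh
# One-time setup inside the project (which must live on the GOPATH).
dep init

# Fetch the dependencies declared in Gopkg.toml into vendor/ and
# record their exact revisions in Gopkg.lock.
dep ensure

# Commit vendor/ together with the manifest and lock file; the build
# step can then compile without fetching anything from the internet.
git add Gopkg.toml Gopkg.lock vendor
git commit -m "Vendor dependencies"
```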
Our company currently uses TFS for source control and build server. Most of our projects are written in C/C++, but we also have some .NET projects and wouldn't want to be limited if we need to use other languages in the future.
We'd like to use Git for our source control and we're trying to understand what would be the best choice for a build server. We have started looking into TeamCity, but there are some issues we're having trouble with which will probably be relevant regardless of our choice of build server:
Build dependencies - We'd like to be able to control the build dependencies for each <project, branch>. For example, have <MyProj, feature_branch> depend on <InfraProj1, feature_branch> and <InfraProj2, master>.
From what we’ve seen, to do that we might need to use Gradle or something similar to build our projects instead of plain MSBuild. Is this correct? Are there simpler ways of achieving this?
Local builds - Obviously we'd like to be able to build projects locally as well. This becomes somewhat of a problem when project dependencies are introduced, as we need a way to reference these resources or copy them locally for the build to succeed. How is this usually solved?
I'd appreciate any input, but a sample setup which covers these issues will also be a great help.
IMHO both issues you mention really fall into the configuration management category and thus, as you say, are unrelated to the build server choice.
A workspace for a project build (doesn't matter if centralized or local) should really contain all necessary resources for the build.
How can you achieve that? Have a project "metadata" git repo with a "content" file listing all your project components and their dependencies (each with its own git/other repo) and their exact versions, effectively tying them together coherently. (You may find it useful to store other metadata in this repo down the road as well, such as component-specific SCM info if you use a mix of SCMs across the workspace.)
A workspace pull wrapper script would first pull this metadata git repo, parse the content file, and then pull all the other project components and their dependencies according to the content file info; see the sketch below. Any build in such a workspace would have all the parts it needs.
When the time comes to modify either the code of a project component or the version of one of its dependencies, you also need to update the content file in the metadata git repo to reflect the change and commit it - this is how your project makes progress coherently, as a whole.
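A minimal sketch of such a wrapper, assuming a plain-text content file with one "name repo-url revision" entry per line (the metadata repo URL and the file format are made up for illustration):

```sh
#!/bin/sh
# Hypothetical workspace pull wrapper; the metadata repo URL and the
# "name url revision" content file format are assumptions.
set -e

# Fetch the metadata repo that ties the workspace together.
git clone git@example.com:myproject/metadata.git

# Pull every component and dependency at the exact revision recorded
# in the content file.
while read -r name url revision; do
  git clone "$url" "$name"
  git -C "$name" checkout "$revision"
done < metadata/content
```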
Of course, actually managing dependencies is another matter. Tons of opinions out there, some even conflicting.
By default, Go pulls imported dependencies by grabbing the latest version in master (github) or default (mercurial) if it cannot find the dependency on your GOPATH. And while this workflow is quite simple to grasp, it has become somewhat difficult to tightly control. Because all software change incurs some risk, I'd like to reduce the risk of this potential change in a manageable and repeatable way and avoid inadvertently picking up changes of a dependency, especially when running clean builds via CI server or preparing to deploy.
What is the most effective way I can pin (i.e. lock down or capture) a package dependency so I don't find myself unable to reproduce an old package, or even worse, unexpectedly broken when I'm about to release?
---- Update ----
Additional info on the Current State of Go Packaging. While I ended up (as of 7.20.13) capturing dependencies in a 3rd party folder and managing updates (ala Camlistore), I'm still looking for a better way...
Here is a great list of options.
Also, be sure to see the go 1.5 vendor/ experiment to learn about how go might deal with the problem in future versions.
You might find the way Camlistore does it interesting.
See the third party directory and in particular the update.pl and rewrite-imports.sh scripts. These scripts update the external repositories, change imports if necessary, and make sure that a static version of the external repositories is checked in with the rest of the Camlistore code.
This means that Camlistore has a completely repeatable build, as it is self-contained, but the third-party components can still be updated under the control of the Camlistore developers.
There is a project to help you manage your dependencies: check out gopack.
godep
I started using godep early last year (2014) and have been very happy with it (it addressed the concerns I mentioned in my original question). I am no longer using custom scripts to manage the vendoring of dependencies, as godep just takes care of it. It has been excellent for ensuring that no drift is introduced regardless of timing or a machine's package state. It works with the existing go get mechanism and introduces the ability to pin (godep save) and restore (godep restore) dependencies based on Godeps/Godeps.json.
Check it out:
https://github.com/tools/godep
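For reference, typical usage looks something like the following sketch (based on godep's documented commands; ./... simply means the whole project):

```sh
go get github.com/tools/godep

# Record the exact revisions of everything the project imports in
# Godeps/Godeps.json and copy the dependency code into the project.
godep save ./...

# Later (e.g. on a CI machine), check the dependencies back out at
# the pinned revisions before building.
godep restore
godep go build ./...
```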
There is no built-in tooling for this in Go. However, you can fork the dependencies yourself, either on local disk or in a cloud service, and only merge in upstream changes once you've vetted them.
The 3rd party repositories are completely under your control. 'go get' clones tip, you're right, but you're free to check out any revision of the cloned-by-go-get or cloned-by-you repository. As long as you don't run 'go get -u', nothing touches the 3rd party repositories already sitting on your hard disk.
Effectively, your external, locally cloned dependencies are always locked down by default.
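In practice that means checking out the revision you have vetted inside the clone on your GOPATH, for example (the import path and SHA below are placeholders):

```sh
# Pin a dependency that 'go get' has already cloned onto the GOPATH.
cd "$GOPATH/src/github.com/some/dependency"
git checkout 1234abcd  # the revision you have vetted

# Subsequent builds use this revision; avoid 'go get -u', which would
# move the clone back to tip.
```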