$GOPATH directory and disk space - go

After a year or two of developing with Go, I had a look at my $GOPATH directory size and was surprised to see it had grown to 4GB.
I understand that disk space is supposed to be cheap (and it's not that much on this 128GB laptop SSD), but still.
So my question is: is there some good practice for managing the size of the $GOPATH directory?
Would restarting from scratch be a good idea? (Although that could be very time-consuming.)

Your statement that deleting the GOPATH would be time-consuming sounds like you are managing the dependencies of your Go projects with plain go get .... In my environment I do two things to keep the GOPATH ephemeral.
Don't use plain go get for dependency management. Go 1.6 enabled the /vendor directory by default. Together with a dependency management tool like Glide, every dependency for a given project lives within the project directory.
This means that if you don't need a dependency anymore, you can wipe the project's vendor directory and download the dependencies again. That way the only dependencies on your disk are the ones you actually need.
Also, if you stop working on a project and delete it from your disk, its dependencies are deleted with it.
You can specify multiple paths in the GOPATH. Like the PATH environment variable, you separate them with a colon. The interesting thing is that Go uses the first path for downloading projects. On my machine the GOPATH looks like $HOME/.gopath:$HOME/projects. So if you put all your actual projects into the second directory, you get a clear separation between your projects and their dependencies. You can then wipe the first directory from time to time without fearing that you'll have to clone every one of your projects again.
These two things don't reduce the disk usage of your Go projects and their dependencies by themselves, but they give you a better overview of what's actually needed and let you delete the dependency directory whenever you like.
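A minimal sketch of that setup in a POSIX shell (the dependency import path is hypothetical; the GOPATH entries are the ones from the example above):

export GOPATH="$HOME/.gopath:$HOME/projects"
go get github.com/some/dependency   # go get downloads into the first entry, $HOME/.gopath
rm -rf "$HOME/.gopath"              # wipe the dependency cache; projects in $HOME/projects are untouched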

Related

Should I commit vendor directory with go mod?

I am using go modules on go1.12 to handle my Go dependencies. Is it best practice to also commit the vendor/ directory into version control?
This is somewhat related to Is it best-practice to commit the `vendor` directory?, which asks this question in the case of using dep. With dep, committing vendor/ is the only way to get truly reproducible builds. What about for go modules?
I'd like to give some arguments in favour of committing vendor, go.mod and go.sum.
I agree with the accepted answer's arguments that it's technically unnecessary and bloats the repo.
But here is a list of counter-arguments:
Building the project doesn't depend on some code being available on GitHub/GitLab/... or the Go proxy servers. Open source projects may disappear because of censorship, authors' incentives, licensing changes or other reasons I can't currently think of, which did happen on npm, the JavaScript package manager, and broke many projects. Not in your repo, not your code.
We may have used internal or 3rd party Go modules (private) which may also disappear or become inaccessible, but if they are committed in vendor, they are part of our project. Nothing breaks unexpectedly.
Private Go modules may not follow semantic versioning, which means the Go tools will rely on the latest commit hash when fetching them on the fly. Repo history may be rewritten (e.g. rebased) and you, a colleague or your CI job may end up with different code for the dependencies you use.
Committing vendor can improve your code review process. We typically commit dependency changes in a separate commit, so they can be easily viewed if you're curious.
Here's an interesting observation related to bloating the repo. If I'm doing a code review and a team member has included a new dependency with 300 files (or updated one with 300 files changed), I'll be pretty curious to dive into it and start a discussion about code quality, the need for this change, or alternative Go modules. This may actually lead to a decrease in your binary size and overall complexity.
If I just see a single new line in go.mod in a new Merge Request, chances are I won't even think about it.
CI/CD jobs which perform compilation and build steps don't need to waste time and network bandwidth downloading the dependencies every time the CI job is executed. All needed dependencies are local and present (go build -mod=vendor).
These are off the top of my head; if I remember something else, I'll add it here.
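As a rough sketch of the workflow these arguments assume (the module path is hypothetical), the dependency update, the vendor refresh and the commit travel together, and the CI build never touches the network:

go get example.com/some/module@v1.2.3   # add or update a dependency in go.mod and go.sum
go mod tidy                             # drop requirements that are no longer needed
go mod vendor                           # refresh the committed vendor/ directory
git add go.mod go.sum vendor/
git commit -m "Update dependencies"     # dependency changes land as their own reviewable commit
go build -mod=vendor ./...              # CI builds from the committed vendor tree, no downloads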
Unless you need to modify the vendored packages, you shouldn't commit vendor. Go modules already give you reproducible builds: the go.mod file records the exact versions of your dependencies (with commit hashes encoded in pseudo-versions for untagged ones), go.sum records their checksums, and the go tool will respect and follow them.
The vendor directory can be recreated by running the go mod vendor command, and it's even ignored by default by go build unless you ask it to use it with the -mod=vendor flag.
Read more details:
Go wiki: How do I use vendoring with modules? Is vendoring going away?
Command go: Modules and vendoring
Command go: Make vendored copies of dependencies
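For completeness, a quick sketch of how you can convince yourself that builds stay reproducible without a committed vendor/ directory:

go mod download   # fetch the exact versions pinned in go.mod into the local module cache
go mod verify     # confirm the downloaded modules match the checksums recorded in go.sum
go build ./...    # builds from the module cache; no vendor/ directory is required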

Least-impact solution to binary references in VCS

We are using TeamCity 2017.1 and have been using it for years with great joy. A long time ago, someone decided that all third-party binaries should be put into Subversion (our VCS of choice).
This has worked fine, but over time this repo has grown quite large, and since we have become better and better at using TeamCity, we now have dozens of build configurations which all use third-party binaries.
Our third-party folder is called Department and is around 2.6 GB in size. In itself that is not so bad, but remember that this folder is used by pretty much every single project on the build server!
Now, I will agree with everyone who says that we should use NuGet packages, network shares, etc., and that would work great with new projects. However, we have a lot of history and we cannot begin to change every single solution and branch.
A co-worker came up with the idea of making a single build project that in reality does nothing but keep a single folder updated with our Department stuff. Then we just need to find a way to reference this, without having to change all our projects and solutions.
My initial thought is using snapshot dependencies and then creating a symbolic link as the first build step and removing it as the last, in order to achieve the same relative levels.
But is there a better way? What do other people do?
And keep in mind that replacing this with NuGet or something else is not an option.
Let me follow your colleague's idea and improve on it. Set up a build configuration that monitors the Subversion repository and copies packages to a network share. That network share can then be used by the development teams as a NuGet repository. Projects that convert their dependencies from binary references to NuGet references will enjoy faster builds. Once all teams use the NuGet repository, you can retire that Subversion repository.
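For illustration, the manual steps that build configuration would automate could look roughly like this with the nuget.exe CLI (the share path and package name are hypothetical):

nuget add SomeLibrary.1.1.5.nupkg -Source \\buildserver\department-nuget      # publish a package to the folder-based feed
nuget sources add -Name "Department" -Source \\buildserver\department-nuget   # register the share as a source on a developer or build machine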

How to store Go dependencies?

I am using GoDep to resolve a project dependencies.
My problem is that repositories for dependencies might be removed and then my project wouldn't build.
I am trying to find any solution to store dependencies at Artifactory or another solution.
Please advise.
Regards.
Okay, so godep may be the standard way of doing this, but I usually found it a bit complicated. In my opinion, use a Makefile which sets a custom GOPATH and just include the dependencies with your code (remove their .git folder). This way the version is frozen and no one needs to do a godep restore or something similar.
You can make recipes like make deploy that build your code, run gofmt, clean the pkg files, install the binary to bin/ in your custom GOPATH, and then you just go and run the binary.
You can have another one like make install that will install any missing dependencies.
I've managed to create a watch recipe in my Makefile that keeps looking for changes on a Linux-based system using inotify-tools and triggers a rebuild.
Internally all the commands use standard go commands, but you get rid of godep and of maintaining its JSON file. Upgrading a dependency can be a bit of a problem, as you'd have to manually copy the whole directory into your custom path and remove its .git/ folder.
Our company uses this method and it seems to work quite nicely for us.
Plus, this method basically gets you away from the $GOPATH/src/github.com/repoName/ kind of paths.
If I seem unclear, let me know and I'll add a gist on GitHub.
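A rough sketch of that copy-and-freeze workflow in a POSIX shell (the repository URL, import path and directory names are hypothetical):

git clone https://github.com/some/dependency.git vendorsrc/src/github.com/some/dependency
rm -rf vendorsrc/src/github.com/some/dependency/.git   # freeze the version; no VCS metadata ships with the project
GOPATH="$(pwd)/vendorsrc:$HOME/go" go build ./...       # the project-local GOPATH entry is searched first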

Getting Leiningen to cache packages

In a ClojureScript project I'd like Leiningen to be less reliant on the internet connection during our CI builds. I was hoping to get it to cache packages on a network disk (using the :local-repo setting to create a "shared cache") and then add that as a repository in such a way that it fetches from there first and only goes to Clojars and other external sites when it can't find a package in the "shared cache".
I read this, removed my ~/.m2 folder, and added the following to my project.clj:
:profiles {:local-cache
           {:local-repo "/shared/disc/clojars-cache"
            :repositories {"local" {:uri "file:///shared/disc/clojars-cache"
                                    :releases {:checksum :ignore}}}}}
The initial build with lein with-profile +local-cache cljsbuild does indeed populate the cache, but
my ~/.m2/repository folder is recreated and filled with stuff, though it seems to be only the Clojure artifacts needed by Leiningen itself, and
after removing ~/.m2, subsequent rebuilds don't seem to use the local repository at all but instead download from Clojars anyway.
Clearly I'm missing something... or maybe I'm going about this in the completely wrong way.
In short, how can I get leiningen to
create a cache of packages on a network disc, and
get it to prefer this cache as the source of packages (over external sources like Clojars)?
Leiningen already prefers ~/.m2 by default. It will only go to Clojars if it doesn't already have a copy of the requested JAR stored locally in ~/.m2. The exception to this rule is SNAPSHOT versions, where by default it will go out to the network once a day to check whether the SNAPSHOT version it has is the latest.
You can set the :offline? key to true if you don't want Leiningen to go to the network at all.
Answering your questions:
How can I get Leiningen to create a cache of packages on a network disk?
Leiningen already creates a cache of packages in ~/.m2. You could symlink that directory to your network disk, or use :local-repo as you are now, although it sounds like :local-repo isn't working for you?
How can I get Leiningen to prefer this cache over external sources?
Leiningen already does this. It sounds like :local-repo either isn't working, hasn't been configured correctly, or that directory isn't writable by Leiningen?
Stepping back to look at the wider problem, you want to prevent unnecessary network traffic in your CI builds. Leiningen already caches every dependency by default. You haven't said which CI tool you're using, but they should all have the ability to cache the ~/.m2 folder between runs. Depending on the tool, you'll have to download your deps either once per project or once per machine. I'd recommend sticking with that rather than trying to share deps over the network, as that could lead to hard-to-debug test failures.
If that's not going to work for you, can you provide more details about your setup, and why you'd like Leiningen to be less reliant on the network in your CI builds?
UPDATE: After seeing that GitLab CI is being used, it looks like you need to add a caching config along these lines:
cache:
  paths:
    - ~/.m2

In Bamboo, how do I pull a component library repository to a fixed location to avoid per-branch duplication?

I have several projects which use code from a large set of component libraries. These libraries are under source control.
The libraries repository contains all the libraries used by all my projects and contains multiple versions of multiple libraries. Each library/version pair lives in its own folder. Each of my projects identifies the specific library/version pairs it needs through the folder paths of the references in its project file.
For example $(LibraryPath)\SomeLibrary\v1.1.5
Please note that the libraries repository is only ever added to. No changes are made to stuff already in the repository. Ever.
I have of course been able to configure my build plan to pull the libraries repository into a libraries subfolder of the working directory. So far so good. However, with the automatic branch management feature of Bamboo, this setup means that the libraries repository is cloned for each and every branch in all projects.
Not funny. No, really, not funny...
What I would like to do is:
pull the libraries repository in each build plan
but pull it to a fixed location that is the same for all build plans
it doesn't have to be an absolute path
but it does need to be outside the working directory of the current build plan to avoid unnecessary duplication
Unfortunately the Checkout Directory of the Source Code Checkout configuration task in a Bamboo build plan doesn't allow me to specify either an absolute path or a relative one that goes "up" for one or more levels from the working dir. The hint text explicitly states "(Optional) Specify an alternative sub-directory to which the code will be checked out." And indeed, specifying something like ..\Library gets punished with the message "Checkout to parent directory is forbidden".
I have seen information on the "artifact sharing" feature of Bamboo. This will probably work, but it seems like overkill for what I want to achieve.
What would be the easiest and least complicated way to achieve my goal using Atlassian's Bamboo Continuous Integration?
Out-of-the-box alternatives are welcome, but please don't direct me to any products that require intimate CLI use and/or whose documentation assumes (extensive) knowledge of 'nix and/or Java setup. I am on Windows and spoiled rotten by powerful (G)UI's.
I have the same problem - with a repository weighing in at around 2GB.
I'd like to simply "git checkout myBranch" and "git clean -fxd" instead of cloning every time (which should save a lot of time and disk space). However, I also like Bamboo's automatic triggering when new branches show up.
Like the OP, I'd love to be able to put "..\SharedDirectory" in the "Checkout Directory" for the "Source Code Checkout" task, but it won't let me go above the \JOB_KEY\ folder.
One possible solution is replacing the "Source Code Checkout" task with the two git commands above. That way I can specify exactly when/where/how to do the checkout. I think there may be problems with the initial checkout in this case - but once that is solved, all subsequent branches would use the same shared folder, and no more pulling down 2GB every time.
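A sketch of what such a script task could look like in a POSIX shell (the shared path and repository URL are hypothetical; Bamboo exposes the current branch to scripts as an environment variable along the lines of bamboo_planRepository_branchName):

SHARED=/build/shared/libraries-repo
if [ ! -d "$SHARED/.git" ]; then
  git clone https://example.com/libraries.git "$SHARED"   # one-time initial clone into the fixed shared location
fi
cd "$SHARED"
git fetch origin
git checkout "$bamboo_planRepository_branchName"
git clean -fxd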
