How to check in packages when using Go modules?

We are currently using govendor to manage packages in our Go repository. Since we use a lot of packages, we have decided to check the package source code into the vendor folder, so that we:
Save time downloading all packages every time the repository needs to be built on build machines.
Avoid the possibility of a package becoming unavailable online (deletion, network issues, etc.).
I am interested in using the modules feature introduced in Go 1.11. However, I can't seem to find a similar approach for checking in the packages instead of having to download them every time.
Any ideas?

Go modules provide a go mod vendor command that creates a vendor directory in your package root, just as glide, govendor, and dep do.
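For example, a minimal migration to checked-in modules might look like this (the module path is a placeholder):

go mod init github.com/you/yourproject    # create go.mod
go mod tidy                               # record dependencies in go.mod and go.sum
go mod vendor                             # copy dependency sources into ./vendor
git add go.mod go.sum vendor
git commit -m "Check in vendored dependencies"

After that, go build -mod=vendor compiles entirely from the checked-in sources.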

Related

Should I commit vendor directory with go mod?

I am using Go modules on Go 1.12 to handle my Go dependencies. Is it best practice to also commit the vendor/ directory into version control?
This is somewhat related to Is it best-practice to commit the `vendor` directory?, which asks this question in the case of using dep. With dep, committing vendor/ is the only way to get truly reproducible builds. What about for Go modules?
I'd like to give some arguments in favour of committing vendor, go.mod and go.sum.
I agree with the accepted answer's arguments that it's technically unnecessary and bloats the repo.
But here is a list of contra-arguments:
Building the project doesn't depend on some code being available on GitHub/GitLab/... or the Go proxy servers. Open source projects may disappear because of censorship, author incentives, licensing changes or other reasons, which has happened on npm, the JavaScript package manager, and broke many projects. Not in your repo, not your code.
We may have used internal or 3rd party Go modules (private) which may also disappear or become inaccessible, but if they are committed in vendor, they are part of our project. Nothing breaks unexpectedly.
Private Go modules may not follow semantic versioning, which means the Go tools will rely on the latest commit hash when fetching them on the fly. Repo history may be rewritten (e.g. rebased), and you, a colleague or your CI job may end up with different code for the dependencies you use.
Committing vendor can improve your code review process. We typically commit dependency changes in a separate commit, so they can be easily reviewed if anyone is curious.
Here's an interesting observation related to bloating the repo. If I review a merge request and a team member has included a new dependency with 300 files (or updated one with 300 files changed), I would be pretty curious to dive into that and start a discussion about code quality, the need for the change, or alternative Go modules. This may actually lead to a decrease in binary size and overall complexity.
If I just see a single new line in go.mod in a new Merge Request, chances are I won't even think about it.
CI/CD jobs which perform compilation and build steps need not waste time and network bandwidth downloading the dependencies every time the CI job is executed. All needed dependencies are local and present (go build -mod=vendor; see the sketch after this list).
These are off the top of my head; if I remember something else, I'll add it here.
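As a sketch of the CI point above, a build step that never touches the network could be as simple as:

go build -mod=vendor ./...
go test -mod=vendor ./...

With the vendor directory committed, these commands resolve every import from the local tree.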
Unless you need to modify the vendored packages, you shouldn't commit them. Go modules already give you reproducible builds: the go.mod file records the exact versions of your dependencies (and, for untagged dependencies, pseudo-versions containing commit hashes), which the go tool will respect and follow.
The vendor directory can be recreated by running the go mod vendor command, and it's even ignored by default by go build unless you ask it to use it with the -mod=vendor flag.
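For illustration, a go.mod pinning dependencies might look like this (module path and versions here are made up; the matching go.sum holds the checksums the go tool verifies on download):

module github.com/you/yourproject

go 1.12

require (
    github.com/pkg/errors v0.8.1
    golang.org/x/sync v0.0.0-20190423024810-112230192c58
)

The second require line shows a pseudo-version, which embeds the commit hash of an untagged dependency.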
Read more details:
Go wiki: How do I use vendoring with modules? Is vendoring going away?
Command go: Modules and vendoring
Command go: Make vendored copies of dependencies

How to develop vendored libraries?

I recently jumped back into an old Go project and migrated it to the vendor approach, using dep.
It completed successfully and I now have a bunch of libs in the local project's vendor/ directory. Great!
However, I now want to work on one of those vendored libs and see the changes in my main app live. The lib exists in its own project, of course; it doesn't live its life in this other app's vendor folder, and other local projects should be able to see the live changes too if necessary.
I'm thinking something like npm link, to compare across ecosystems.
How is this typically managed in Go?
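With Go modules (the successor to dep), the closest analogue to npm link is a replace directive in go.mod pointing at a local checkout. A sketch, with hypothetical paths and module names:

module github.com/you/app

require github.com/you/lib v0.1.0

replace github.com/you/lib => ../lib

While the replace line is present, builds of app use the live local copy of lib; removing the line switches back to the published version.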

Does golang have a central repository for the downloaded third-party packages?

I'm new to Golang. As I understand it, when you want to create a new Go project, you just need to create a directory and point the environment variable GOPATH to this directory. Inside this directory, you create three subdirectories: pkg, src and bin. Then when you execute go get ..., the third-party package will be installed in the pkg subdirectory. Later, if I want to create another Go project, I create a new dir called project2 and point GOPATH to project2. At this point go get ... will download the third-party package into the pkg subdirectory of project2. My question is: does Go have a central repository? If not, the same package will be downloaded twice if it is used in two different projects. Is that true?
I guess now there is https://gocenter.jfrog.com/
More info in this blog: https://jfrog.com/blog/go-at-full-speed-with-gocenter
There is no central repository of Go packages. Go always looks for packages in either GOPATH or GOROOT. go get simply downloads packages using git or Mercurial. I recommend reading:
https://golang.org/doc/code.html
and https://peter.bourgon.org/go-best-practices-2016/#repository-structure
GOPATH simply tells the Go toolchain where to search for the src and pkg directories.
Later if I want to create another Go project, I create a new dir called project2 and point GOPATH to project2
…
My question is, whether Go has a central repository? If not, the same package will be downloaded twice if they are used in two different projects. Is that true?
No, there is no central repository for Go code. However, it is also not true that the packages will always be downloaded twice.
The misconception here is that GOPATH points to an individual project: it does not. Instead, GOPATH points to an environment where all of your packages live; it is where go get will download packages, and where go build will look for packages when building.
Instead of changing GOPATH for every project, you should set GOPATH once and put all of your projects in $GOPATH/src/ (your projects don't contain an src/ directory; they go inside the src/ directory).
So for example, the entire tree might look like:
$GOPATH/src/bitbucket.org/ (or GitHub, or your website, or whatever)
├── YourProject
└── AnotherProject
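In other words, under this classic (pre-modules) workflow you set GOPATH once and let go get populate it, e.g.:

export GOPATH=$HOME/go
go get github.com/pkg/errors    # source lands in $GOPATH/src/github.com/pkg/errors
                                # compiled package objects are cached under $GOPATH/pkg

A second project importing the same package then reuses the copy already on disk.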
Update
It is worth noting that this answer is no longer correct. Now that Go modules are the normal versioning mechanism for Go code and $GOPATH is being phased out, a central proxy has been set up that routes all requests for packages through Google servers, where the various tagged versions of a package can be cached. A separate checksum database keeps auditable hashes for every package and can help you detect if a package author has changed an already released tag. All of this isn't a central repository in the same sense that PyPI (in the Python world) or npm (for JavaScript) are repositories: the packages are still fetched from their source control, but because all packages are routed through the proxy by default it serves a similar purpose. For more information see https://proxy.golang.org/
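You can inspect or override the proxy and checksum database through the go tool's environment settings, for example (go env -w needs Go 1.13 or later; the private-module pattern is a placeholder):

go env GOPROXY GOSUMDB                  # defaults: https://proxy.golang.org,direct and sum.golang.org
go env -w GOPROXY=direct                # bypass the proxy, fetch straight from source control
go env -w GOPRIVATE=*.corp.example.com  # skip proxy and checksum DB for private modules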
Recently, a new site that collects information about Go packages has emerged:
https://go.dev/.
go.dev is the hub for Go users providing centralized and curated resources from across the Go ecosystem.
It is an official companion website to golang.org. It does not qualify as a repository in the sense of CPAN, npmjs, NuGet or crates.io; for external packages, it simply links to their respective GitHub pages.
Go.dev is currently in MVP status. We're proud of what we've built and excited to share it with the community. We hope you find value and joy in using go.dev. Go.dev only has a small portion of the features we intend to build, and we are actively seeking feedback.
But as is written in the about page, it is still in early development. Maybe one day (hopefully) it will become a fully featured code repository.

How to store go dependencies?

I am using GoDep to resolve my project's dependencies.
My problem is that repositories for dependencies might be removed, and then my project wouldn't build.
I am trying to find a solution to store dependencies in Artifactory or elsewhere.
Please advise.
Regards.
Okay, so GoDep may be the standard way of doing this, but I usually found it a bit complicated. In my opinion, use a Makefile which sets a custom GOPATH, and just include dependencies with your code (remove their .git folders). This way the versions are frozen and no one needs to do a godep restore or something similar.
You can make recipes like make deploy that build your code, run gofmt, clean the pkg files and install the binary to your custom GOPATH's bin/, after which you just go and run the binary.
You can have another one like make install that will install any missing dependencies.
I've managed to create a watch target in my Makefile that keeps looking for changes on a Linux-based system using inotify-tools and triggers a rebuild.
Internally all the recipes use standard go commands, but you get rid of godep and maintaining its JSON. Upgrading a dependency may be a bit of a problem, though, as you'd have to manually copy the whole directory into your custom path and remove its .git/ folder.
Our company uses this method and it seems to work quite nicely for us.
Plus, this method basically gets you away from the $GOPATH/src/github.com/repoName/ kind of paths.
If I seem unclear, let me know and I'll add a gist on GitHub.
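For what it's worth, here is a minimal sketch of the kind of Makefile described above; every target and path name is hypothetical:

# dependencies live (without their .git folders) under ./gopath/src
GOPATH := $(CURDIR)/gopath

deploy:
	gofmt -w gopath/src/myapp
	GOPATH=$(GOPATH) go install myapp    # binary lands in gopath/bin/myapp

install:
	cd gopath/src/myapp && GOPATH=$(GOPATH) go get -d ./...   # fetch any missing dependencies

run: deploy
	$(GOPATH)/bin/myapp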

Project structure: scientific Python projects

I am looking for a better way to structure my research projects. I have the following setup:
There are projects a, b, c and a library lib. Each project tackles a different research question, and the library carries code that is used across projects. Thus all projects depend on lib. Things get more complicated because project c depends on projects a and b as well. When I work on project c, I will also update a, b or lib simultaneously. Each project is in a separate git repository.
So far I have dealt with this situation by including the dependencies above via git submodule, and all the source files are located in the root dir of the project. The advantage is that I keep track of which version of lib my projects depend on. Also, one of my projects could depend on an outdated version of lib. I run everything from the root directory without "installing" any of the packages to site-packages or so. When a path is not set correctly, I override it via sys.path.insert.
However, the following points make me want to change layout:
I keep losing track of which version of lib I am editing.
I want to make use of automated testing tools (tox, Jenkins, etc.) which seem to be much easier to handle with a standard project setup.
sys.path.insert can lead to subtle problems which are hard to debug.
I usually want all my projects to work with the tip of lib anyway.
Therefore I am currently rearranging all projects (especially lib) to be in line with the standard Python directory structure (source stored in a subdirectory, root contains a setup.py file) to be able to work in a virtualenv. Then I can list all my dependencies in requirements.txt. First I install lib in development mode via pip install -e . and then run pip freeze > requirements.txt, which includes a line similar to this:
-e git+<path_to_remote>@<sha>#egg=lib
So again I have generated a dependency on a specific commit (sha), as with git submodule, ensuring that I can check out an old commit and the project should run. I can now install everything in a virtualenv and have got rid of my path problems. Great.
I face some new trouble though. One problem is how to update the sha in requirements.txt. The easiest (but probably not most elegant) solution I see is to write a pre-commit hook that updates the sha before committing. Is there a better way?
And more generally - do you see a better solution given my setup?
As far as I can see, you have mostly solved your problem and there are only small bits left.
1) Don't use hashes to identify versions of your libraries. Even if you don't publish your libraries to the Cheese Shop, do normal library versioning (semver) and tag your git repositories accordingly. This way you will have human-readable and manageable versions in the git+https://github.com/... URLs of your dependencies.
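Concretely, with a tag like v1.2.0 on lib, the requirements.txt entry becomes (the URL is an example):

-e git+https://github.com/you/lib@v1.2.0#egg=lib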
2) Set up tox in a way that lets you test both the stable versions of your dependencies (the ones you tagged last time) and the master versions straight from the latest repo revision.
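A minimal tox.ini along the lines of point 2 could look like this (environment names and the URL are made up):

[tox]
envlist = stable, master

# pinned, tagged dependency versions
[testenv:stable]
deps =
    pytest
    -rrequirements.txt
commands = pytest

# lib straight from the latest repo revision
[testenv:master]
deps =
    pytest
    git+https://github.com/you/lib@master#egg=lib
commands = pytest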
