Best practices for external benchmarks when using Go Modules

I have a Go repository, and within it I have some benchmarks (in a _test suffixed package). These benchmarks compare it to, among other things, some third party libraries. I am not using these libraries in my non-benchmark code.
I am now migrating my repo to Go modules. I do not want those third-party libraries in my go.mod, since my library doesn't need them for normal usage and I don't want to tie my module to them unnecessarily.
What is the recommended go-mod way to do this? My ideas:
build tag on the benchmarks
benchmarks to another repo
module within my module

If someone wants to run your benchmark (for example, to check whether its stated results hold for their machine configuration), then they need to know what versions of dependencies those benchmarks were originally run with. The information needed to reproduce your test and benchmark results belongs in your go.mod file.
But note that “having a minimum version” is not the same as “importing”.
If a user builds your package but does not build and run its test, or if they build some other package within your module, then they will not need to download the source code for the benchmark dependency even if that dependency is included in your go.mod file.
(And the proposal in https://golang.org/issue/36460 doubles down on that property: if implemented, that proposal would avoid loading dependencies of packages that are never imported, potentially pruning out large chunks of the dependency graph.)
So if you really don't want users to have to build the dependencies of your benchmark, put the benchmark in a separate package from the one that you expect your users to import.

Related

What are the benefits of having a vendor folder?

I can't really grasp the purpose of having a vendor folder. Based on what I learned, it seems the vendor folder is only beneficial if you're trying to make your repo compatible with golang versions earlier than 1.11. We are running golang 1.12.14.
When I brought this up to my coworker he said:
Please use vendor with modules - go doesn't have a global artifactory. this is, currently, the best option to make sure you have hermetic builds and your code doesn't break when somebody changes something in their repo.
I thought this is what Go modules does? I asked this question and a commenter is saying I shouldn't use vendor? Does it make sense to add `go mod vendor` to a pre-commit hook?
Go modules bring the guarantee that you will be able to build your packages deterministically by pinning dependency versions in go.mod and recording their checksums in go.sum. That said, the promise of a deterministic build only holds if your dependencies are still accessible in the future, and you don't know whether that will be the case.
Vendoring, on the other hand, with or without Go modules, brings stronger guarantees because it lets you commit the dependencies next to the code. Thus even if the remote repository is no longer accessible (deleted, renamed, etc.), you will still be able to build your project.
Another alternative is to use Go modules along with a proxy. You can find more information in the official documentation. You can also look at some OSS implementations like gomods/athens or goproxy/goproxy. If you don't feel like setting up and maintaining your own proxy, some commercial offers are available on the market.
So should you go mod vendor each time you commit? Well, it's ultimately up to you, depending on the kind of guarantees you want. But yes, leveraging a proxy or vendoring your dependencies helps you get closer to reproducible builds.
Note: with Go 1.17, go mod vendor (from 1.17 Go commands) might be easier to use:
vendor contents
If the main module specifies go 1.17 or higher, go mod vendor now annotates vendor/modules.txt with the go version indicated by each vendored module in its own go.mod file.
The annotated version is used when building the module's packages from vendored source code.
If the main module specifies go 1.17 or higher, go mod vendor now omits go.mod and go.sum files for vendored dependencies, which can otherwise interfere with the ability of the go command to identify the correct module root when invoked within the vendor tree.
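To make the annotation concrete, here is a minimal Go sketch that reads vendor/modules.txt-style content and extracts the go version recorded for each vendored module. It assumes the line shapes that go mod vendor writes with Go 1.17 or later ("# path version" followed by an annotation such as "## explicit; go 1.17"). The sample content, the vendoredModule type and parseModulesTxt are hypothetical names used only for illustration, and replaced modules are deliberately not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// vendoredModule holds what this sketch extracts for one vendored module.
type vendoredModule struct {
	Path      string
	Version   string
	GoVersion string // empty when the annotation carries no "go" field
}

// parseModulesTxt scans modules.txt-style content and returns the modules
// it recognises. Only plain "# <path> <version>" module lines are handled.
func parseModulesTxt(content string) []vendoredModule {
	var mods []vendoredModule
	current := -1 // index of the module the next annotation belongs to

	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "## "):
			// Annotation line, e.g. "## explicit; go 1.17".
			if current < 0 {
				continue
			}
			for _, field := range strings.Split(strings.TrimPrefix(line, "## "), ";") {
				field = strings.TrimSpace(field)
				if strings.HasPrefix(field, "go ") {
					mods[current].GoVersion = strings.TrimPrefix(field, "go ")
				}
			}
		case strings.HasPrefix(line, "# "):
			// Module line: "# <path> <version>".
			fields := strings.Fields(strings.TrimPrefix(line, "# "))
			if len(fields) == 2 {
				mods = append(mods, vendoredModule{Path: fields[0], Version: fields[1]})
				current = len(mods) - 1
			} else {
				current = -1 // e.g. a replaced module; skipped in this sketch
			}
		}
		// Any other line is a package path and is ignored here.
	}
	return mods
}

func main() {
	// Made-up sample content, for illustration only.
	sample := "# example.com/a v1.2.3\n## explicit; go 1.17\nexample.com/a/pkg\n" +
		"# example.com/b v0.1.0\n## explicit\nexample.com/b\n"
	for _, m := range parseModulesTxt(sample) {
		fmt.Printf("%s %s go=%q\n", m.Path, m.Version, m.GoVersion)
	}
}
```

The go command itself is the authority on this file; the sketch is only meant to show where the per-module go version ends up.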
A vendor folder is a great way to organize and manage third-party dependencies in your project. It is especially useful when your code relies on external libraries or frameworks.
Benefits of having a Vendor Folder:
It helps to reduce dependency conflicts.
It allows you to keep a separate version of each library / framework installed in your project.
It helps to keep the project structure clean and organized.
It makes it easy to update, install, and remove any dependencies with minimal effort.
It makes it easier to switch between different versions of a library or framework.

Best practice(s) with Go Modules

I'm "all in" on Go Modules. Mostly, I prefer the experience. In Go development, I've -- perhaps like many others -- treated dependencies as if I worked in a mono repo, each of my projects had its own GOPATH and I'd often clone from scratch and pull all then-latest versions of dependencies.
Using Modules, I think I'm breaking the best practice:
For per-commit builds, my projects' go.mod file would contain only primary -- and often only one -- explicit dependencies. Effectively, I don't commit go.mod and leave my build process to generate it and then the build. My thinking being that, apart from e.g. specific platforms that I'm using, where my familiarity with them means I'm confident in pinning to a specific version, for other dependencies, I'd rather maintain currency and get #vLatest.
If I get to building releases, I'd then go mod tidy and commit the go.mod to source control for the basis of the build.
Besides:
potentially breaking builds (which is acceptable for currency);
the absence of go.sum and package hashes (which I'm not independently verifying but trusting, e.g. proxy.golang.org); and
the repetition of pulling dependencies which is unavoidable anyway with my build process,
Is this approach bad?
For building releases, dependency immutability and build reproducibility are critical. Relying on go mod tidy to create the go.mod assumes that the module's Git tag is immutable and always available, which is not the case. To ensure that the module tag is persistent and immutable, a Go module repository is recommended. Refer to the Go 1.11 documentation for a list of "always on" module repositories and enterprise proxies. A short video, "Go Module and Dependency Management - GoCenter and Project Athens", talks about immutable dependency management.

Multiple modules within the same project

I've been playing with Go modules and I was wondering what the best practice is in terms of the following directory structure:
project
├── go.mod
├── main.go
└── players
    ├── go.mod
    ├── players.go
    └── players_test.go
I was having problems importing the players package into my root project at first, but I noticed I could do this in the root go.mod file
module github.com/<name>/<project>
require (
    github.com/<name>/players v0.0.0
)
replace github.com/<name>/players => ./players
This then allows me to do import "github.com/<name>/players" in my main.go file.
Now this approach works and was taken from here but I'm not sure if that's the correct approach for this or whether this approach is just meant for updating a local package temporarily while it's outside version control.
Another option, that seems a little overkill, is to make every module its own repository?
TL;DR - What's the best-practice approach to having multiple modules within the same repository and importing them in other modules / a root main.go file?
In general a module should be a collection of packages.
But still you can create modules of single packages. As Volker said, this might only make sense, if you want these packages to have a different lifecycle. It could also make sense, when you want to import these modules from another project and don't want the overhead of the whole collection of packages.
In General:
A module is a collection of related Go packages that are versioned together as a single unit.
Modules record precise dependency requirements and create reproducible builds.
Most often, a version control repository contains exactly one module defined in the repository root. (Multiple modules are supported in a single repository, but typically that would result in more work on an on-going basis than a single module per repository).
Summarizing the relationship between repositories, modules, and packages:
1. A repository contains one or more Go modules.
2. Each module contains one or more Go packages.
3. Each package consists of one or more Go source files in a single directory.
Source of the Quote: https://github.com/golang/go/wiki/Modules#modules
To answer the question:
You can do it the way you have shown in your approach.
I understand this is an old question, but there are some more details that are worth mentioning when managing multiple modules in one repository, with or without go.work.
TL;DR
Each approach has pros and cons, but if you are working on a large code base with many modules, I'd suggest sticking to versioning based on commits or tags, and using Go workspaces for your day-to-day development.
Go Module Details
replace Directive with No Versioning
When you use replace directive pointing to a local directory, you will find the version of the dependency module as v0.0.0-00010101000000-000000000000. Essentially you get no version information.
If the main go.mod declares the module path github.com/name/project, that module cannot be built reproducibly, because the contents of the directory targeted by the replace directive may have changed. This can be especially problematic if the replace target, github.com/name/project/players, is used by many modules. Any change in such a common package can result in a behaviour change for all the dependents, all at the same time.
If that's not your concern, replace directive should work absolutely fine. In such a setup, go.work may be a layer you don't really need.
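The all-zero version mentioned above has the shape of a pseudo-version with a zero timestamp and a zero revision. As a rough illustration, the following Go sketch builds two of the pseudo-version forms described in the Go Modules Reference: the one used when no earlier tag is known, and the one used when the latest earlier tag is a release vX.Y.Z. The function names, the sample timestamp and the assumption that the commit follows a v0.1.0 tag are hypothetical; the go command computes real pseudo-versions itself.

```go
package main

import (
	"fmt"
	"time"
)

// pseudoTimeLayout is the UTC timestamp layout used inside pseudo-versions
// (yyyymmddhhmmss).
const pseudoTimeLayout = "20060102150405"

// mustUTC parses an RFC 3339 timestamp and converts it to UTC.
// It panics on malformed input, which is acceptable for a sketch.
func mustUTC(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t.UTC()
}

// shortRev truncates a commit hash to the 12 characters used in pseudo-versions.
func shortRev(rev string) string {
	if len(rev) > 12 {
		return rev[:12]
	}
	return rev
}

// pseudoNoBase builds the form used when no earlier tagged version is known:
// vX.0.0-yyyymmddhhmmss-abcdefabcdef.
func pseudoNoBase(major int, commit time.Time, rev string) string {
	return fmt.Sprintf("v%d.0.0-%s-%s", major, commit.UTC().Format(pseudoTimeLayout), shortRev(rev))
}

// pseudoAfterRelease builds the form used when the most recent earlier tag is
// the release vX.Y.Z: vX.Y.(Z+1)-0.yyyymmddhhmmss-abcdefabcdef.
func pseudoAfterRelease(major, minor, patch int, commit time.Time, rev string) string {
	return fmt.Sprintf("v%d.%d.%d-0.%s-%s", major, minor, patch+1,
		commit.UTC().Format(pseudoTimeLayout), shortRev(rev))
}

func main() {
	// Zero timestamp and zero revision: the placeholder seen with a local replace.
	fmt.Println(pseudoNoBase(0, time.Time{}, "000000000000"))

	// Hypothetical commit assumed to follow a v0.1.0 tag.
	fmt.Println(pseudoAfterRelease(0, 1, 0, mustUTC("2022-04-18T01:57:05Z"), "5f504416395d"))
}
```

Because the placeholder carries neither a real timestamp nor a real commit hash, it cannot identify a specific state of the code.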
With Versioning
If you want to ensure version setup works for reproducible and deterministic build for multiple modules, you can take a few different approaches.
One go.mod, one repository
This is probably the easiest approach. For each module, there is a clear commit history and versioning. As long as you refer to the module via remote repository, this is probably the easiest setup to start with, and dependency setup is very clear.
However, note that this approach means you need to manage multiple repositories, and setting up go.work to help will require an appropriate local directory mapping, which can be difficult for someone new to the code base.
Commit based versioning
It is still possible to define dependencies deterministically, with version information, so that you can build your code within a single repository. The commit-based approach requires the fewest steps and still works nicely. There are some catches to note, though.
For github.com/name/project to have a dependency on github.com/name/project/players, you need to ensure the code you need is in the remote repository. This is because github.com/name/project will pull the code and commit information from the remote repository, even if the same code is available in your local copy of the repository. This ensures that the version of github.com/name/project/players is taken from the commit reference, such as v0.1.1-0.20220418015705-5f504416395d (ref: details of "pseudo-version").
The module name must match the directory structure. For example, if you have the single repository github.com/name/project, and a module under /src/mymodule/, the module name must be github.com/name/project/src/mymodule. This is because when module path resolution takes place, Go finds the root of the repository (in the above example, this would be github.com/name/project.git), and then tries to follow the directory path based on the module name.
If you are working in a private repository, you will need to ensure the go.sum check doesn't block you. You can set GOPRIVATE=github.com/name/project to specify the module paths for which checksum verification should be skipped.
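To show which module paths a GOPRIVATE value like the one above would cover, here is a minimal Go sketch of the matching rule as I understand it from the Go Modules Reference: the value is a comma-separated list of glob patterns, and each pattern is matched against a leading prefix of the module path with the same number of path elements. The function name matchesPrivate and the sample module paths are hypothetical, and this is an approximation rather than the go command's own code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesPrivate reports whether modulePath is covered by a GOPRIVATE-style
// list of comma-separated glob patterns.
func matchesPrivate(patterns, modulePath string) bool {
	elems := strings.Split(modulePath, "/")
	for _, glob := range strings.Split(patterns, ",") {
		if glob == "" {
			continue
		}
		// A pattern with n slashes is compared with the first n+1
		// elements of the module path.
		n := strings.Count(glob, "/")
		if len(elems) < n+1 {
			continue // the module path is shorter than the pattern
		}
		prefix := strings.Join(elems[:n+1], "/")
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

func main() {
	private := "github.com/name/project" // the value used in the answer above
	for _, mod := range []string{
		"github.com/name/project/players",
		"github.com/other/lib", // hypothetical public module
	} {
		fmt.Printf("%s private=%v\n", mod, matchesPrivate(private, mod))
	}
}
```

With the single pattern from the answer, every module under github.com/name/project is treated as private, while unrelated modules are still verified as usual.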
Tag based versioning
Instead of using the commit SHA, you can use Git tags.
But because there could be many modules in one repository, Go needs to know which tag belongs to which module. For example, with the following directory structure:
# All assumed to be using `github.com/name/project` prefix before package name
mypackage/ # v1.0.0
anotherpackage/ # v0.5.1
nested/dependency/ # v0.8.3
You will need to create tags in github.com/name/project, named exactly to match the directory structure, such that:
mypackage/v1.0.0
anotherpackage/v0.5.1
nested/dependency/v0.8.3
This way, each tag is correctly referenced by Go Module, and your dependency can be kept deterministic.
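The naming rules for nested modules can be summarised in a small Go sketch: the module path is the repository root plus the subdirectory, and the release tag is the subdirectory plus the version. The helper names modulePath and releaseTag are hypothetical, and the sketch ignores the extra path suffix that major versions v2 and above require.

```go
package main

import (
	"fmt"
	"path"
)

// modulePath returns the module path for a module stored in subdir of the
// repository whose root module path is repoRoot.
func modulePath(repoRoot, subdir string) string {
	return path.Join(repoRoot, subdir)
}

// releaseTag returns the Git tag that marks version for the module in subdir.
// A module at the repository root uses the bare version as its tag.
func releaseTag(subdir, version string) string {
	if subdir == "" || subdir == "." {
		return version
	}
	return path.Clean(subdir) + "/" + version
}

func main() {
	repo := "github.com/name/project"
	modules := []struct{ dir, version string }{
		{"mypackage", "v1.0.0"},
		{"anotherpackage", "v0.5.1"},
		{"nested/dependency", "v0.8.3"},
	}
	for _, m := range modules {
		fmt.Printf("module %s -> tag %s\n", modulePath(repo, m.dir), releaseTag(m.dir, m.version))
	}
}
```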
go.work Behaviour
If a go.work file in a parent directory lists the local directory of a module such as github.com/name/project/players (added with go work use), that takes precedence and the local files are used. This is the case even when you have a version specified in your go.mod.
For local development that spans multiple modules, Go workspaces are a great way to work on several things at once, without first having to push the change to the dependency. At the same time, an actual release still requires separate commits, so that the first commit can be referenced by the later change.
go.work is said to be a file you rarely need to commit to the repository. You should be aware of the impact a go.work file in a parent path would have, though.
--
References:
https://go.dev/doc/modules/managing-source: Discussion around repository setup
https://go.dev/ref/mod: Go Modules Reference
Side Note:
I have given a talk about this at Go Conference, hosted in Japan - you can find some demo code, slides, etc. here if you are curious to know more with examples.
As of 2022, the best-practice approach to having multiple modules within the same repository and importing them in other modules is supported by the new "Go module workspace", released with Go 1.18 together with the new go work command.
See "Proposal: Multi-Module Workspaces in cmd/go" and issue 45713:
The presence of a go.work file in the working directory or a containing directory will put the go command into workspace mode.
The go.work file specifies a set of local modules that comprise a workspace.
When invoked in workspace mode, the go command will always select these modules and a consistent set of dependencies.
go.work file (the proposal called the directive directory; Go 1.18 shipped it as use):
go 1.18
use (
    ./baz // foo.org/bar/baz
    ./tools // golang.org/x/tools
)
replace golang.org/x/net => example.com/fork/net v1.4.5
See also CL 355689:
cmd/go: add GOWORK to go env command
GOWORK will be set to the go.work file's path, if in workspace mode
or will be empty otherwise.
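The "working directory or a containing directory" rule quoted above can be pictured as an upward search. The Go sketch below walks from a directory towards the filesystem root and returns the first go.work it finds. The function names findGoWork and fileExists are hypothetical, the existence check is injected so the search logic can be tried without touching the disk, and the GOWORK environment variable is not modelled; go env GOWORK remains the authoritative way to ask the go command.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findGoWork walks from dir up to the filesystem root and returns the path of
// the first go.work file reported by exists, or "" if there is none.
func findGoWork(dir string, exists func(string) bool) string {
	dir = filepath.Clean(dir)
	for {
		candidate := filepath.Join(dir, "go.work")
		if exists(candidate) {
			return candidate
		}
		parent := filepath.Dir(dir)
		if parent == dir {
			return "" // reached the root without finding a workspace file
		}
		dir = parent
	}
}

// fileExists reports whether p names a regular file on disk.
func fileExists(p string) bool {
	info, err := os.Stat(p)
	return err == nil && info.Mode().IsRegular()
}

func main() {
	wd, err := os.Getwd()
	if err != nil {
		fmt.Println("cannot determine working directory:", err)
		return
	}
	if p := findGoWork(wd, fileExists); p != "" {
		fmt.Println("workspace file:", p)
	} else {
		fmt.Println("no go.work found")
	}
}
```

This is also why a go.work file left in a parent directory can silently change which copy of a module gets built.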

How can I handle split packages in automatic modules?

I am currently testing to migrate an existing application to Jigsaw Modules. One of my modules uses ElasticSearch along with its Groovy Plugin.
org.elasticsearch:elasticsearch
org.elasticsearch.module:lang-groovy
Unfortunately, they share a split package, so mvn install gives me:
x reads package org.elasticsearch.script.groovy from both lang.groovy and elasticsearch
once for each required module in the descriptor, where x is the name of each module.
I assume that a newer elasticsearch version will have eliminated the split package by the time Java 9 is final, but is there generally a way to handle split packages in legacy dependencies?
I was hoping to be able to have those on the classpath instead of the module path, but after reading this conversation on the mailing list it seems that there is no way to tell the Maven compiler to do so.
maven 3.3.9
maven-compiler-plugin 3.6.0
jdk9-ea+149
elasticsearch 2.3.3
After some more testing, I think there are a few options which should tackle many (but definitely not all) 3rd party split package situations.
Clean up dependencies - maybe a dependency isn't actually needed or can be replaced by a newer (or more distinct) JAR.
Restructure your own module into two modules, each reading the package from one of the two 3rd party modules (if possible/reasonable).
Wrap one of the 3rd party modules (or both) in a simple module, which does nothing but explicitly export only the package(s) that are actually needed by your module.
Depending on the situation, one of these options might be a good fit to resolve the split package problem. But none of them can handle situations in which a coherent piece of code actually needs to access classes from both parts of the split package.

Maven copy resources in multi module project

My need is pretty basic but I could not find any clean answer to it: I simply need to be able to distribute a resource in a multi-module project.
Let us consider for example the LICENSE file, which I hereby assume to be the same for all modules. I prefer not to manually copy it into each and every module because the file could change over time. I also prefer not to statically link to resources (even if using relative paths) outside the project folder, because the modular structure can possibly change too.
Is there any plugin that can be used to robustly guarantee that each module is given the required file? It would be equally acceptable for such copy to be obtained by exploiting the POM of the parent project or directly performed by the super project in the modular hierarchy.
You could use the assembly and dependency plugins. Did you come across this link?
http://www.sonatype.com/people/2008/04/how-to-share-resources-across-projects-in-maven/
It describes that option. It's from 2008, but Maven has been around for quite some time, so I guess it's more or less up to date.
edit regarding comment
Another option is the maven-remote-resources-plugin.
For a more detailed example see:
http://maven.apache.org/plugins/maven-remote-resources-plugin/examples/sharing-resources.html
Since their intro actually speaks for itself, I quote (maven.apache.org):
This plugin is used to retrieve JARs of resources from remote repositories, process those resources, and incorporate them into JARs you build with Maven. A very common use-case is the need to package certain resources in a consistent way across your organization: at Apache it is required that every JAR produced contains a copy of the Apache license and a notice file that references all used software in a given project.
