My Go project consists of many components. Every component has its own vendor directory, which is populated by dep. Because the components have similar dependencies, there is huge duplication across the vendor directories.
Additionally, the vendor directories are quite big: ~20 MB each.
My idea is to reduce the size of the repository by defining a common vendor directory at the top of the project:
project
|--vendor
|--component1
|----main.go
|----vendor
|--component2
|----main.go
|----vendor
Every component then needs to define only the dependencies specific to it.
To avoid provisioning the common dependencies on every `dep ensure` executed at the component level, we can specify which packages should be ignored in the component's Gopkg.toml file:
ignored = ["github.com/aszecowka/calc"]
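For illustration, a full component-level Gopkg.toml might then look like the following sketch (the wildcard entry and the errors constraint are made-up examples, not from the original setup; trailing-* wildcards in ignored require a reasonably recent dep):

# component1/Gopkg.toml -- a sketch
# Packages provided by the common top-level vendor; dep ensure
# will not copy them into this component's vendor directory.
ignored = ["github.com/aszecowka/calc", "github.com/aszecowka/common*"]

# Only dependencies specific to this component are pinned here.
[[constraint]]
  name = "github.com/pkg/errors"
  version = "0.8.0"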
Question: Does anyone use this approach? Any alternatives?
Update: Context
In my company we are investigating a monorepo approach: we are trying to consolidate different Go projects, but we end up with a really huge repository, mostly because of the many vendor directories.
Related
I've been playing with Go modules and I was wondering what the best practice is in terms of the following directory structure:
project
├── go.mod
├── main.go
└── players
├── go.mod
├── players.go
└── players_test.go
I was having problems importing the players package into my root project at first, but I noticed I could do this in the root go.mod file:
module github.com/<name>/<project>

require (
    github.com/<name>/players v0.0.0
)

replace github.com/<name>/players => ./players
This then allows me to do import "github.com/<name>/players" in my main.go file.
Now this approach works and was taken from here but I'm not sure if that's the correct approach for this or whether this approach is just meant for updating a local package temporarily while it's outside version control.
Another option, which seems a little overkill, is to make every module its own repository.
TL;DR: What's the best-practice approach to having multiple modules within the same repository and importing them in other modules / a root main.go file?
In general, a module should be a collection of packages.
Still, you can create modules of single packages. As Volker said, this might only make sense if you want these packages to have a different lifecycle. It could also make sense when you want to import these modules from another project and don't want the overhead of the whole collection of packages.
In General:
A module is a collection of related Go packages that are versioned together as a single unit.
Modules record precise dependency requirements and create reproducible builds.
Most often, a version control repository contains exactly one module defined in the repository root. (Multiple modules are supported in a single repository, but typically that would result in more work on an on-going basis than a single module per repository).
Summarizing the relationship between repositories, modules, and packages:
1. A repository contains one or more Go modules.
2. Each module contains one or more Go packages.
3. Each package consists of one or more Go source files in a single directory.
Source of the Quote: https://github.com/golang/go/wiki/Modules#modules
To answer the question:
You can do it the way you have shown in your approach.
I understand this is an old question, but there are some more details that are worth mentioning when managing multiple modules in one repository, with or without go.work.
TL;DR
Each approach has pros and cons, but if you are working on a large code base with many modules, I'd suggest sticking to version handling based on commits or tags, and using Go Workspace for your day-to-day development.
Go Module Details
replace Directive with No Versioning
When you use a replace directive pointing to a local directory, you will find the version of the dependency module recorded as v0.0.0-00010101000000-000000000000. Essentially you get no version information.
With the main go.mod defined with the github.com/name/project module path, the github.com/name/project module cannot make a reproducible build, because the dependency target of the replace directive may have had its content updated. This can be especially problematic if the replace target github.com/name/project/players is used by many modules: any change in such a common package can result in a behaviour change for all the dependents, all at the same time.
If that's not your concern, the replace directive should work absolutely fine. In such a setup, go.work may be a layer you don't really need.
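A minimal sketch of that setup, reusing the module paths from this answer:

module github.com/name/project

go 1.18

require github.com/name/project/players v0.0.0-00010101000000-000000000000

// The local directory wins: whatever currently sits in ./players gets built,
// so the recorded (zero) version carries no information.
replace github.com/name/project/players => ./players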
With Versioning
If you want versioning that supports reproducible and deterministic builds across multiple modules, you can take a few different approaches.
One go.mod, one repository
This is probably the easiest approach. For each module, there is a clear commit history and versioning. As long as you refer to the module via the remote repository, this is probably the easiest setup to start with, and the dependency setup is very clear.
However, note that this approach means you'd need to manage multiple repositories, and getting go.work to help is going to require an appropriate local directory mapping, which can be difficult for someone new to the code base.
Commit based versioning
It is still possible to define dependencies deterministically, with version information, so that you can build your code within a single repository. The commit-based approach requires the fewest steps and still works nicely. There are some catches to be noted, though.
For github.com/name/project to have a dependency on github.com/name/project/players, you need to ensure the code you need is in the remote repository. This is because github.com/name/project will pull the code and commit information from the remote repository, even if the same code is available in your local copy of the repository. This ensures that the version of github.com/name/project/players is taken from the commit reference, such as v0.1.1-0.20220418015705-5f504416395d (ref: details of "pseudo-version").
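For example, assuming the commit above has been pushed to the remote, something like this records the pseudo-version (the hash is the one from the pseudo-version above):

# Resolve the dependency at an explicit commit; go rewrites it into
# a pseudo-version in go.mod.
go get github.com/name/project/players@5f504416395d

# go.mod then contains:
#   require github.com/name/project/players v0.1.1-0.20220418015705-5f504416395d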
The module name must match the directory structure. For example, if you have the single repository github.com/name/project and a module under /src/mymodule/, the module name must be github.com/name/project/src/mymodule. This is because, when module path resolution takes place, Go finds the root of the repository (in the above example, github.com/name/project.git) and then tries to follow the directory path based on the module name.
If you are working in a private repository, you will need to ensure the go.sum check doesn't block you. You can simply use GOPRIVATE=github.com/name/project to specify paths for which checksum verification should be skipped.
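For example, either per invocation or persisted in the go environment:

GOPRIVATE=github.com/name/project go build ./...

# or, persistently:
go env -w GOPRIVATE=github.com/name/project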
Tag based versioning
Instead of using the commit SHA, you can use Git tags.
But because there can be many modules in one repository, Go Modules needs to find out which tag maps to which module. For example, with the following directory structure:
# All assumed to be using `github.com/name/project` prefix before package name
mypackage/ # v1.0.0
anotherpackage/ # v0.5.1
nested/dependency/ # v0.8.3
You will need to create tags in github.com/name/project, named exactly to match the directory structure, such that:
mypackage/v1.0.0
anotherpackage/v0.5.1
nested/dependency/v0.8.3
This way, each tag is correctly referenced by Go Modules, and your dependencies can be kept deterministic.
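With plain Git, creating and publishing those tags could look like this sketch (each tag name must match the module's directory prefix exactly):

git tag mypackage/v1.0.0
git tag anotherpackage/v0.5.1
git tag nested/dependency/v0.8.3
git push origin --tags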
go.work Behaviour
If you have a go.work in a parent directory that includes the module with go work use (e.g. go work use ./players), that takes precedence and the local files are used. This is the case even when you have a version specified in your go.mod.
For local development that spans multiple projects, Go Workspace is a great way to work on multiple things at once, without needing to push the dependency's code change first. But at the same time, an actual release will still require broken-up commits, so that the first commit can be referenced later by other code changes.
go.work is said to be a file you rarely need to commit to the repository. You must be aware of the impact of having a go.work in parent paths, though.
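As a sketch, a go.work for the project/players layout from the question would look something like:

go 1.18

use (
    .         // the root module
    ./players // used locally instead of any required version
)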
--
References:
https://go.dev/doc/modules/managing-source: Discussion around repository setup
https://go.dev/ref/mod: Go Modules Reference
Side Note:
I have given a talk about this at Go Conference, hosted in Japan - you can find some demo code, slides, etc. here if you are curious to know more with examples.
In 2022, the best-practice approach to having multiple modules within the same repository and importing them in other modules is supported with the new "Go module workspace".
Released with Go 1.18 and the new go work command.
See "Proposal: Multi-Module Workspaces in cmd/go" and issue 45713:
The presence of a go.work file in the working directory or a containing directory will put the go command into workspace mode.
The go.work file specifies a set of local modules that comprise a workspace.
When invoked in workspace mode, the go command will always select these modules and a consistent set of dependencies.
go.work file:
go 1.18
use (
    ./baz // foo.org/bar/baz
    ./tools // golang.org/x/tools
)
replace golang.org/x/net => example.com/fork/net v1.4.5
You now have CL 355689
cmd/go: add GOWORK to go env command
GOWORK will be set to the go.work file's path, if in workspace mode
or will be empty otherwise.
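For example (the path shown is illustrative):

go env GOWORK
# /home/user/project/go.work  (empty when not in workspace mode)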
I have started learning Go (1.7.4) and have a project which currently produces two executables. I have a directory structure as below, following the standard Go layout:
GOPATH=`pwd`
bin
src/
src/<project1>
src/<project1>/vendor
src/<project1>/glide.yaml
src/<project2>
src/<project2>/vendor
src/<project2>/glide.yaml
pkg/
Project 1 and project 2 share a lot of dependencies.
Is there a way to share the vendor directory between project1 and project2 and still pin the versions to ensure reproducible builds?
I don't want to duplicate the glide.yaml and vendor directories for each project as it bloats the build and violates DRY.
The pkg directory is the obvious way to do this, but unlike vendor I don't have a dependency manager tool like glide to ensure a specific version is used (see also my related question).
A possibly related issue is how this project is organised. I believe in Go it would be more conventional for each project subdirectory to map to a single GitHub repository. However, for my project I want to build at least two executables. I realise you can do this by using different package names, but it confuses go and glide. I wrestled with getting this to work under a single project and decided/discovered it was easier to use the standard Go layout and work two levels up. For example, an advantage is that "go build" etc. just works in the subdirectories without having to name the package. I can also have my build, test and package machinery at the top level operate on all projects, and keep my Go environment separate from any others.
The programs are not complex enough to warrant separate git repositories (even as submodules). If there is a pattern that makes this work it might render my original question moot.
It should be possible to have a shared vendor directory. The way I am doing it involves Go 1.11 and the new Go feature called modules, but I am pretty sure it should work with vendor and tools like glide and dep. To use dep/glide, your directory structure might look like this:
- src
  - projects
    - project1
    - project2
    - vendor
    - glide.yaml
And you can build either from the projects folder, using go build -o p1 project1/*.go, or from an individual project folder, using go build.
The same structure, but outside of GOPATH, will work for Go 1.11 modules. You would have to set the GO111MODULE variable to "on" or "auto". Mind you, Go modules store dependencies in another location and download them automatically during the build process when needed.
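A sketch of the modules variant: a single go.mod at the projects level pins the shared dependencies once for both executables (the module path and the dependency here are illustrative):

// projects/go.mod
module example.com/projects

go 1.11

require github.com/pkg/errors v0.8.1

go build ./project1 and go build ./project2 then resolve against the same pinned versions.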
Note: the glide GitHub page recommends switching to dep as the more official tool.
Edit: Just tested it with dep. It works for me.
I recommend looking at the new vendoring system, Go modules: https://github.com/golang/go/wiki/Modules
It allows you to pin the versions of the packages used:
module github.com/my/thing

require (
    github.com/some/dependency v1.2.3
    github.com/another/dependency/v4 v4.0.0
)
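And if you still want an on-disk vendor directory shared by every package in the module, the module tooling can generate one from those pinned versions:

go mod tidy     # sync the require list with what the code actually imports
go mod vendor   # write all dependencies into a single vendor/ directory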
If you are using a module in multiple projects in IntelliJ IDEA, is it common practice to make a new project for these modules, or to make the modules in one of the projects they are included in?
It depends on how you're working, whether the common module is developed by you or another team, and other factors. Often, common code is put in a module which produces a JAR that the other modules depend upon, with all of them included in the same project. This is normal when you're developing the common-code module at the same time as the project and changes are typically committed together (although not necessarily to the same repository or branch). If multiple projects use the common-code module, though, and it's developed as a separate library, then it should maybe have a project of its own.
Also, you should probably be using Maven when things get this complicated.
Context:
I have a multi-module Maven project that looks like:
Root
|-- ModuleA
|-- ModuleB
|-- ModuleC
|-- ModuleD
|-- .......
There are around 25 modules under the Root:
A few of them represent the core of the application (5 modules).
Each of the remaining modules represents the business-process implementation related to a type of customer. These modules are completely independent of each other.
When packaging or releasing the 'Root' project, the generated artifact is a single ZIP file aggregating all the JARs of the 'Root' modules.
The single ZIP file is generated according to an assembly descriptor; it represents the delivery artifact.
At deployment time on the target environment, the single ZIP is unzipped under a directory where it is consumed (class-loaded) by an 'engine', a Java web application that provides the final services.
Constraints
The 'business constraints' on one side, and the desire to reduce regressions between versions on the other.
The above constraints led us to adopt the following release scenarios:
Either we release the Root and ALL its submodules. This means that the resulting ZIP will aggregate all the submodule JARs with the same version. The ZIP will contain something similar to [ModuleA-1.1.jar, ModuleB-1.1.jar, ModuleC-1.1.jar, ModuleD-1.1.jar, ......., ModuleX-1.1.jar].
Or we release the Root and A FEW of its submodules, the ones we want to update. The resulting ZIP will still aggregate all the submodule JARs: the released submodules are aggregated at their newly released versions, and the unreleased submodules at another 'appropriate' version. For example, after such an incremental release the ZIP will contain something similar to [ModuleA-1.2.jar, ModuleB-1.1.jar, ModuleC-1.2.jar, ModuleD-1.1.1.jar, ......., ModuleX-1.1.2.jar].
These two scenarios were made possible by:
Either declaring the modules as Maven MODULES ('module') for the first scenario,
Or declaring the modules as Maven DEPENDENCIES ('dependency') for the second, INCREMENTAL scenario, as sketched below.
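An illustrative sketch of the difference in the Root POM (the groupId and versions are made up):

<!-- Scenario 1: full release; the Root aggregates and releases the submodules -->
<modules>
    <module>ModuleA</module>
    <module>ModuleB</module>
</modules>

<!-- Scenario 2: incremental release; unchanged modules are pinned as dependencies -->
<dependencies>
    <dependency>
        <groupId>com.example.root</groupId>
        <artifactId>ModuleB</artifactId>
        <version>1.1</version>
    </dependency>
</dependencies>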
Question
Both scenarios work perfectly, BUT when we are in the 2nd (INCREMENTAL) scenario, maven-release-plugin:prepare uploads to the SCM (svn) all the modules [ModuleA, ModuleB, ModuleD, .... ModuleX]. It uploads both the released and the non-released ones, whereas the 'non-released modules' are declared as 'dependency' in the POM and not as 'module'.
1/ IS THERE a way to avoid uploading the 'non-released' modules? Is there a way to inject an 'exclude directory list' into the SCM svn provider?
2/ A MORE global question: is the approach we use a correct one, or is it an anti-pattern? In that case, what should the alternative be?
Thank you.
To me, your approach looks like an anti-pattern. I recommend only having projects in the same hierarchy that you want to release together. Projects with a different release lifecycle should live on their own; otherwise you will keep running into the issues you mentioned. If you run the release plugin from a root directory (multi-module setup), all of the content of that root directory will be tagged in SVN.
In your case, I would probably create the following hierarchies:
Core
One per customer type
Potentially one per type to bundle them (zip), depending on your structure
I would group them by the way you create the release. It might mean that you have to run the release plugin a couple of times instead of just once when you make a change, e.g. in Core, but it will be a lot cleaner.
Your packaging project will then pull in all of the dependencies and package/assemble them.
If you have common configuration options, I recommend putting them into a common parent POM. This doesn't have to be your root (multi-module) POM.
Did you try running the maven-release-plugin with the -pl (--projects) argument plus the list of all the modules you want to release?
Basically, this argument allows you to specify the list of modules against which the Maven command should be performed (if you omit it, all submodules are included; this is the default behaviour).
See more details about this command line here.
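Untested with the release plugin, but the invocation would look something like:

mvn -pl ModuleA,ModuleC release:prepare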
I have never tried it with the maven-release-plugin, and I don't know if it will work, especially regarding SCM operations.
We are working on a very large VS project.
We want the project structure to "hint" developers on the logical components and design.
For this purpose, which is best:
1. One project with many subfolders and namespaces.
2. Split into multiple projects based on a logical grouping of classes, with all projects in the same solution using solution folders.
3. Same as #2, but with multiple solutions instead of a single one with subfolders.
My projects are huge.
We separate each "module" into a different assembly, creating class libraries. Something like this:
Client.ProjectName (Solution)
Client (Class Library)
- SectionHandler...
- ComponentModels...
- Utilities...
Client.Web (Class Library)
- Handlers
- Extenders
Client.Net (Class Library)
- MailQueue
Client.Blog.WebControls.UI (Class Library)
- TopContent.ascx
- PostsList.ascx
Client.News.WebControls.UI (Class Library)
- TopContent.ascx
- PostsList.ascx
Client.Website
Each Class Library is a project under the solution Client.ProjectName or under some other shared solution.
The file system looks like this:
Client
|- Framework
|- Client
|- files...
|- Client.Web
|- files...
|- Client.Net
|- files...
|- SolutionName
|- Client.Blog.WebControls.UI
|- Client.News.WebControls.UI
|- Website
Shared client libs go immediately under the Client\Framework folder; they are meant to be used in all projects for this client. Specific projects go under the solution. We also have a folder called Company where we keep projects that can be used in any other project for any client; it is like the company framework.
The solutions we use:
One for the company framework
One for a client framework
One for each client solution
The same project can be referenced in multiple solutions, so you don't necessarily need to create all those solutions.
With this format we could reuse a lot of things in other projects simply by referencing a DLL. Without this structure, some projects wouldn't have been possible in the given time.
Solutions are just containers for projects, so it's really the splitting of the projects that is in question.
I would recommend using a different project (aka class library or assembly) for each major functional area. You may still want to use different namespaces within each project, but separating the major functional areas into different assemblies will make each assembly smaller. Therefore, if you need only one or two functions in an application, you reference only those projects instead of one massive project. This makes for smaller applications that compile faster and have less overhead.
In terms of solutions, you can organize those however you want because, like I said, they are only containers. You may want to put them all in one solution, or each in a separate solution, or put related projects into solutions. Personally, I either use one solution or, for large projects, a "master" solution so I can easily compile everything in one shot, plus individual solutions so I can work on projects individually.
A project should be your "atom" of re-use. Or to put it another way, projects are the granularity of reusable code. It's OK to have interdependent projects but each project should be planned to be useful for its own functionality.
A solution is just whatever collection of projects you need for development / build / test. You could have multiple solutions that specify different subsets of projects.
Folders within a project may help but they could be an indication that your project is getting too large.
Solution folders likewise mean your solution is probably getting too large. Can you divide your codebase into multiple solutions, each with a meaningful and testable output artifact? Solutions can depend on (tested) artifacts from other solutions, just as they do on third party libraries etc.
You should also consider how VS solutions and projects map to the granularity of projects in your version control schema and to any branch/merge policies you have.
I have grown to prefer a single solution with subfolders for the key domains, and I add the projects in those. It's easy to browse, and it gives your devs a rough idea of what goes where.
Having multiple solutions is mostly useful if the integration between the components in either solution is loose, so that each team has its own working solution and tests against released components from the other teams' solutions.