Facilitating git merges with multi-package go modules - go

We have two versions of a go module hosted by different people. Let's call them example.org/devproject and company.com/project. Different programs obviously import the two projects by different import paths, as appropriate, depending on whether they want the version maintained by example.org or company.com. Because the modules have numerous internal packages, the source files themselves contain references to their own module's name in many import statements.
My question is how example.org can make it as easy as possible for company.com to pull from our git repository, because right now a straight git pull messes up the import statements. Note that stack overflow has similar questions on this prior to go modules. However, this question is specifically about projects with go.mod files using go 1.11 or later, so please do not refer me to non-module answers or suggest not using go modules yet, because that is a different question.
I've tried the following that unfortunately do not work super well:
Use some abstract module name like module self in the go.mod file and add replace self => example.org/devproject. Then internal packages can just import "self/subpackage", and only the one replace line needs to change between repositories. However, that means all including projects need to use a replace directive, and we'd need to convince company.com to change all of their imports relative to "self", which isn't super likely. More importantly, this totally breaks go get and is probably a bad idea.
Use semi-automated scripts to maintain a git branch pull-me that is always exactly one commit ahead of master but has all the import paths and go.mod file fixed up for company.com. This is what we currently do, but it's annoying because the scripts don't know how to parse go source code, so just blindly replace example.org/devproject with company.com/project everywhere, meaning there is no guarantee that just because master works pull-me will, too. We somehow can't get gomvpkg working in a go module outside of a gopath (it will change package statements, but not the imports, which are the tricky part, and won't recurse into all the subpackages).
Create a fake gopath, copy the master branch into it, use gomvpkg to move stuff, then copy the files back with a revised import path, then delete the fake gopath. I can't 100% rule this out, but I gave up after gomvpkg just kept refusing to do the right thing and I got frustrated that something so conceptually simple was so difficult to make work in practice.
Are there any other solutions we should be considering? Are there any tools like govmpkg that understand go modules and will rewrite import paths without also trying to move stuff around in a gopath?
As is probably clear, the dynamics here are that example.org is much more motivated to get stuff upstreamed than company.com is to pull from us, so ideally most of the hoop jumping would happen on the example.org side, and there would maybe be a one-time simple non-intrusive request to company.com to do something to make this work.

Related

gentoo: how delete all config files on unmerging package (from its ebuild)

I am making my own personal package to have collection of usefull programs and configs. Main idea is to emerge this package and have system prepared for my prefferencies. Mainly it works (it simply depends on all my favourite programs), but I have two problems here:
how to install USE flags, UNMASK and such before affected programs are installed?
how to uninstall it (emerge --unmerge does NOT delete files in /etc, so even after uninstalling the package the USE flags (and others) are still kept - my intent is to REMOVE them, so next rebuild of world would NOT use them anymore - yes it means a lot of programs would lose some functionalities like support for some languages, support for some other programs and so on, it is desired result)
My solutions so far are:
The package have some files in /etc/portage/package.*
1.1. I emerge that package with --nodeps (so the config files are installed)
1.2. I emerge it again without that flag (so dependencies are installed
with right configuration))
I create (and install) script to parse /var/db/packages for my package CONTENTS and delete all /etc/portage/something files "manually" and I have to rum this script before unmerging the package
Is there better way to do it ?
You just doing/understanding it wrong! (sorry :)
First of all, instead of a metapackage (an empty ebuild that have only runtime dependencies) there is other ways:
use sets to describe your preferred packages. Manage your USE flags in a usual way (including per package USE if needed).
medium complexity solution is to write a metapackage ebuild (your current case) -- but, you can't mask/unmask USE flags anyway…
if you already have your overlay (obviously) -- defining your own profile would solve everything! Here you can manage everything just like you want: mask/unmask any USE flags, define what is system predefined package means for you, & etc…
Unfortunately, I don't use Gentoo portage (and emerge) and have no idea if it's possible to have multiple additive profiles. I have my own profiles here and it works perfectly with Paludis.
Second, never remove any configuration files (config-protected) after uninstall! There is no packages that do that, and there is a bunch of reasons for that… The main one is that user may have them modified and don't want to loose his changes. Moreover, personally I prefer to have all configs that I've ever touched to be in a dedicated VCS repo -- it wouldn't be nice, if someone, except me, would remove smth…
Imagine a real life example: user wants to reinstall some package and he has a bunch of configuration files, he spent some time to carefully edit them. Trivial way is to uninstall and then install again -- Oops! He lost his configs!
Moreover, from ebuild's POV, you have pkg_prerm and pkg_postrm functions, but both of them are called even at upgrade time (i.e. when unmerge followed by immediate merge phase). You have to be really careful to distinct that use cases… And what is more scare, having any "hardcoded" (and unique) rules in any package, you don't have any influence on them…
So, please, never remove any config protected files, let the user to take care of them (he is the boss, not a package manager)…
Update: If you really want to be able to remove some config-protected files, setting up your own profile looks even more better idea. You can set CONFIG_PROTECT_MASK to enforce unprotect files and/or directories. In that way you don't need to modify any ebuilds and/or write an ugly cleanup code.

Implications of not using repo path for my own packages

Suppose I decide to keep all personally developed packages organized
as follows:
$GOPATH/
bin/
pkg/
src/
somepkg1
somepkg2
...
somepkgN
Further, suppose there is a great deal of code reuse among them, so I
decide to keep the whole $GOPATH workspace under the same Git
repository (each package could be a submodule), as opposed to more
traditional scenario where subpackages are less coherent (co-existing
solely because of using go get from the same workspace):
$GOPATH/
bin/
pkg/
src/github.com/<me>/
somepkg1
somepkg2
...
somepkgN
I can see that with the former approach (not using github.com/<me>/
in the package paths), go get would not be able to fetch packages as
they are not "declaring" themselves to be available online. However,
one can easily work around that by using git submodules, so all
packages would be fetched in the first place (note it's a tightly
controlled ecosystem so there will be no name clashes).
Is there any other limitation (besides go get) of not using full
paths for packages?
(I am mostly concerned about limitations arising from certain code
refactoring/analysis tools that exploit the repository path as base path convention that
allows go get to look for the package online.)
For the Go compiler and all elements of the go tool except go get the package import path is an almost opaque string containing the import path. You can lay out your code like you want (the compiler itself happy compiles files from different folders into one package). If you don't need or want your code to be go getable there is no need to use a repo path. The analysis and refactoring tools in golang.org/x/tools work on the opaque import paths (as far as I know) and do not access the network.

Is it better to have golang statements go out to github or have relative path and why?

In golang import statements, is it better to have something like:
import(
'project/package'
)
or
import(
'github.com/owner/project/package'
)
for files local to your project?
For either option, why would you want to do one over the other?
I ask this because while it's easy and more intuitive to do the first one, I've also seen a lot of big projects like Kubernetes and Etcd do it the second way.
These are exactly the same as far as the go build tools are concerned. The fact that package is in the directory
$GOPATH/src/github.com/owner/project/package
or in the directory
$GOPATH/src/project/package
makes no difference.
The only difference is that with the former you can use go get to fetch the source automatically, and the latter you have to clone the code yourself.
What you don't want to use are paths relative to the project itself, like
import "./project/package"
That is not compatible with all the go tools, and is highly discouraged.

how to handle go import absolute paths and github forks?

There are plenty of questions around this, including why you shouldn't use import "./my/path" and why it only works because some legacy go code requires it.
If this is correct, how do you handle encapsulation of a project and by extension github forks? In every other lang, I can do a github fork of a project, or git clone, and everything is encapsulated there. How do I get the same behaviour out of a go project?
Simple example using the go "hello world" example.
hello.go
package main
import ("fmt"
"github.com/golang/examples/stringutil")
func main() {
fmt.Printf(stringutil.Reverse("hello, world")+"\n")
}
The above works great. But if I want to use my own stringutil which is in a subdirectory and will compile to a single binary, I still need the complete path:
package main
import ("fmt"
"github.com/myrepo/examples/util/stringutil")
func main() {
fmt.Printf(stringutil.Reverse("hello, world")+"\n")
}
Now, if someone copies or forks my repo, it has a direct dependency on "github.com/myrepo/", even though this is used entirely internally!
What if there are 20 different files that import utils/? I need to change each one each time someone forks? That is a lot of extraneous changes and a nonsensical git commit.
What am I missing here? Why are relative paths such a bad thing? How do I fork a project that refers to its own subsidiary directories (and their packages) without changing dozens of files?
As for the reasoning behind not allowing relative imports, you can read this discussion for some perspective: https://groups.google.com/forum/#!msg/golang-nuts/n9d8RzVnadk/07f9RDlwLsYJ
Personally I'd rather have them enabled, at least for internal imports, exactly for the reason you are describing.
Now, how to deal with the situation?
If your fork is just a small fix from another project that will probably be accepted as a PR soon - just manually edit the git remotes for it to refer to your own git repo and not the original one. If you're using a vendoring solution like godep, it will work smoothly since saving it will just vendor your forked code, and go get is never used directly.
If your fork is a big change and you intend to remain forked, rewrite all the import paths. You can automate it with sed or you can use gofmt -r that supports rewriting of the code being formatted.
[EDIT] I also found this tool which is designed to help with that situation: https://github.com/rogpeppe/govers
I've done both 1 and 2 - when I just had a small bugfix to some library I just changed the remote and verndored it. When I actually forked a library without intent of merging my changes back, I changed all the import paths and continued to use my repo only.
I can also think of an addition to vendoring tools allowing automation of this stuff, but I don't think any of them support it currently.

Whats a good best practice with Go workspaces?

I'm just getting into learning Go, and reading through existing code to learn "how others are doing it". In doing so, the use of a go "workspace", especially as it relates to a project's dependencies, seems to be all over the place.
What (or is there) a common best practice around using a single or multiple Go workspaces (i.e. definitions of $GOPATH) while working on various Go projects? Should I be expecting to have a single Go workspace that's sort of like a central repository of code for all my projects, or explicitly break it up and set up $GOPATH as I go to work on each of these projects (kind of like a python virtualenv)?
I think it's easier to have one $GOPATH per project, that way you can have different versions of the same package for different projects, and update the packages as needed.
With a central repository, it's difficult to update a package as you might break an unrelated project when doing so (if the package update has breaking changes or new bugs).
I used to use multiple GOPATHs -- dozens, in fact. Switching between projects and maintaining the dependencies was a lot harder, because pulling in a useful update in one workspace required that I do it in the others, and sometimes I'd forget, and scratch my head, wondering why that dependency works in one project but not another. Fiasco.
I now have just one GOPATH and I actually put all my dev projects - Go or not - within it. With one central workspace, I can still keep each project in its own git repository (src/<whatever>) and use git branching to manage dependencies when necessary (in practice, very seldom).
My recommendation: use just one workspace, or maybe two (like if you need to keep, for example, work and personal code more separate, though the recommended package path naming convention should do that for you).
If you just set GOPATH to $HOME/go or similar and start working, everything works out of the box and is really easy.
If you make lots of GOPATHs with lots of bin dirs for lots of projects with lots of common dependencies in various states of freshness you are, as should be quite obvious, making things harder on yourself. That's just more work.
If you find that, on occasion, you need to isolate some things, then you can make a separate GOPATH to handle that situation.
But in general, if you find yourself doing more work, it's often because you're choosing to make things harder.
I've got what must be approaching 100 projects I've accumulated in the last four years of go. I almost always work in GOPATH, which is $HOME/go on my computers.
Using one GOPATH across all of your projects is very handy, but I find this to only be the case for my own personal projects.
I use a separate GOPATH for each production system I maintain because I use git submodules in each GOPATH's directory tree in order to freeze dependencies.
So, something like:
~/code/my-project
- src
- github.com
+ dependency-one
+ dependency-two
- my-org
- my-project
* main.go
+ package-one
+ package-two
- pkg
- bin
By setting GOPATH to ~/code/my-project, then it uses the dependency-one and dependency-two git submodules within that project instead of using global dependencies.
Try envirius (universal virtual environments manager). It allows to compile any version of go and create any number of environments based on it. $GOPATH/$GOROOT are depend on each particular environment.
Moreover, it allows to create environments with mixed languages (for example, python & go in one environment).
At my company I created Virtualgo to make managing multiple GOPATHs super easy. A couple of advantages over handling it manually are:
Automatic switching to the correct GOPATH when you cd to a project.
It integrates well with vendoring tools
It also sets the new GOBIN in your path, so you can use the executables installed there.
It still has your original GOPATH as a backup. If a package is not found in the project specific workspace it will search the main GOPATH.
One workspace + godep is best as for me.
I follow KISS - one GOPATH, two go paths:
export GOPATH=$HOME/go:$HOME/development/go
That way third party stuff goes in a central place (package install uses the first path entry by default), and I can flexibly move my projects elsewhere, at the second path entry.
You might want to try the direnv package.
https://direnv.net/
Just use GoSwitch. Saves a heck of a lot of time and sanity.
Add the script to the root of each of your projects and source it.
It will make that project dir your gopath and also add/removes the exact bin folder of that project to path.
https://github.com/buffonomics/goswitch

Resources