Is it recommended to keep a program's sources (as opposed to lib sources) in a single file? - go

I am making my first steps into Go and obviously am still reasoning from what I'm used to in other languages rather than understanding Go's specifics and style yet.
I've decided to rewrite a Ruby background job I have that takes ages to execute. It iterates over a huge table in my database and processes the data individually for each row, so it's a good candidate for parallelization.
Coming from a Ruby on Rails task using an ORM, I pictured this as a fairly simple two-file program: one file containing a struct type and its methods to represent and work with a row, and a main file to run the database query and loop over the rows (maybe a third file to abstract the database access logic if it gets too heavy in the main file). This file separation, as I intended it, was meant for codebase clarity rather than having any relevance to the final binary.
I've read and seen several things on the topic, including questions and answers here, and it always seems to come down to writing code as libraries, installing them, and then importing them in a single-file (package main) program.
I've read that one may pass multiple files to go build/run, but it complains if there are several package names (so basically, everything should be in main), and it doesn't seem that common.
So, my questions are:
did I get it right, and is having code mostly as a library, with a single-file program importing it, the way to go?
if so, how do you deal with having to build the libraries repeatedly? Do you build/install on each change to the library codebase before executing (which is far less convenient than what go run promises to be), or is there something common I don't know of to run a library-dependent program quickly while working on the library code?

No.
Go and the go tool work on packages only (go run works on files, but that is a different story): when organizing Go code you should not think about files but about packages. A package may be split into several files, but that is done to keep test code separate, to limit file size, or to group types, methods, functions, etc.
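For illustration, here is a minimal sketch of a single main package split across two files purely for readability (the file, type, and method names are made up):

// row.go (same package as main.go)
package main

// Row is a hypothetical type representing one database row.
type Row struct {
    ID int
}

// Process does the per-row work.
func (r Row) Process() {
    // ...
}

// main.go
package main

func main() {
    r := Row{ID: 1}
    r.Process()
}

Both files declare package main, so go build compiles them together as one package; the split has no effect on the resulting binary.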
Your questions:
did I get it right, and is having code mostly as a library, with a single-file program importing it, the way to go?
No. Sometimes this has advantages, sometimes not. Sometimes a split into one lib plus one short main works well; in other cases a single large main might be better. Again: it is all about packages and never about files. There is nothing wrong with a single 12-file main package if it is a real standalone program. But maybe extracting some of the code into one or a few other packages would result in more readable code. It all depends.
if so, how do you deal with having to build the libraries repeatedly? Do you build/install on each change to the library codebase before executing (which is far less convenient than what go run promises to be), or is there something common I don't know of to run a library-dependent program quickly while working on the library code?
The go tool tracks the dependencies and recompiles whatever is necessary. Say you have a package main in main.go which imports a package foo. If you execute go run main.go, it will recompile package foo transparently, if (and only if) needed. So for quick hacks there is no need for the two-step go install foo; go run main.go. Once you extract code into three packages foo, bar, and waz, it might be a bit faster to go install foo, bar, and waz first.
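As a minimal sketch of that layout (the module path example.com/myjob is made up; foo follows the example above):

// foo/foo.go
package foo

// Answer stands in for whatever the library actually does.
func Answer() int { return 42 }

// main.go
package main

import (
    "fmt"

    "example.com/myjob/foo"
)

func main() {
    fmt.Println(foo.Answer())
}

Running go run . (or go run main.go) from the module root rebuilds foo automatically whenever foo/foo.go has changed; no separate go install step is needed while iterating.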

No. Look at the Go commands and the Go standard packages for exemplars of good programming style: Go Source Code

Related

go/packages.Load() returns different types.Named for identical code

I'm trying to determine whether two types are identical with go/types.Identical, and surprisingly enough, the types for the same piece of code returned by different packages.Load calls are always different.
Am I making a wrong assumption on those APIs?
package main

import (
    "fmt"
    "go/types"

    "golang.org/x/tools/go/packages"
)

func getTimeTime() *types.Named {
    pkgs, err := packages.Load(&packages.Config{
        Mode: packages.NeedImports | packages.NeedSyntax | packages.NeedTypes | packages.NeedDeps | packages.NeedTypesInfo,
        Overlay: map[string][]byte{
            "/t1.go": []byte(`package t
import "time"
var x time.Time`),
        },
    }, "file=/t1.go")
    if err != nil {
        panic(err)
    }
    for _, v := range pkgs[0].TypesInfo.Types {
        return v.Type.(*types.Named) // named type of time.Time
    }
    panic("unreachable")
}

func main() {
    t1, t2 := getTimeTime(), getTimeTime()
    if !types.Identical(t1, t2) {
        fmt.Println(t1, t2, "are different")
    }
}
Apparently, there is a piece of hidden documentation explaining all of this (it is attached to nothing, so it doesn't show up on godoc): https://cs.opensource.google/go/x/tools/+/master:go/packages/doc.go;l=75
Motivation and design considerations
The new package's design solves problems addressed by two existing
packages: go/build, which locates and describes packages, and
golang.org/x/tools/go/loader, which loads, parses and type-checks
them. The go/build.Package structure encodes too much of the 'go
build' way of organizing projects, leaving us in need of a data type
that describes a package of Go source code independent of the
underlying build system. We wanted something that works equally well
with go build and vgo, and also other build systems such as Bazel and
Blaze, making it possible to construct analysis tools that work in all
these environments. Tools such as errcheck and staticcheck were
essentially unavailable to the Go community at Google, and some of
Google's internal tools for Go are unavailable externally. This new
package provides a uniform way to obtain package metadata by querying
each of these build systems, optionally supporting their preferred
command-line notations for packages, so that tools integrate neatly
with users' build environments. The Metadata query function executes
an external query tool appropriate to the current workspace.
Loading packages always returns the complete import graph "all the way
down", even if all you want is information about a single package,
because the query mechanisms of all the build systems we currently
support ({go,vgo} list, and blaze/bazel aspect-based query) cannot
provide detailed information about one package without visiting all
its dependencies too, so there is no additional asymptotic cost to
providing transitive information. (This property might not be true of
a hypothetical 5th build system.)
In calls to TypeCheck, all initial packages, and any package that
transitively depends on one of them, must be loaded from source.
Consider A->B->C->D->E: if A,C are initial, A,B,C must be loaded from
source; D may be loaded from export data, and E may not be loaded at
all (though it's possible that D's export data mentions it, so a
types.Package may be created for it and exposed.)
The old loader had a feature to suppress type-checking of function
bodies on a per-package basis, primarily intended to reduce the work
of obtaining type information for imported packages. Now that imports
are satisfied by export data, the optimization no longer seems
necessary.
Despite some early attempts, the old loader did not exploit export
data, instead always using the equivalent of WholeProgram mode. This
was due to the complexity of mixing source and export data packages
(now resolved by the upward traversal mentioned above), and because
export data files were nearly always missing or stale. Now that 'go
build' supports caching, all the underlying build systems can
guarantee to produce export data in a reasonable (amortized) time.
Test "main" packages synthesized by the build system are now reported
as first-class packages, avoiding the need for clients (such as
go/ssa) to reinvent this generation logic.
One way in which go/packages is simpler than the old loader is in its
treatment of in-package tests. In-package tests are packages that
consist of all the files of the library under test, plus the test
files. The old loader constructed in-package tests by a two-phase
process of mutation called "augmentation": first it would construct
and type check all the ordinary library packages and type-check the
packages that depend on them; then it would add more (test) files to
the package and type-check again. This two-phase approach had four
major problems: 1) in processing the tests, the loader modified the
library package, leaving no way for a client application to see
both the test package and the library package; one would mutate
into the other. 2) because test files can declare additional methods
on types defined in the library portion of the package, the
dispatch of method calls in the library portion was affected by the
presence of the test files. This should have been a clue that the
packages were logically different. 3) this model of "augmentation"
assumed at most one in-package test per library package, which is
true of projects using 'go build', but not other build systems. 4)
because of the two-phase nature of test processing, all packages that
import the library package had to be processed before augmentation,
forcing a "one-shot" API and preventing the client from calling Load
in several times in sequence as is now possible in WholeProgram mode.
(TypeCheck mode has a similar one-shot restriction for a different
reason.)
Early drafts of this package supported "multi-shot" operation.
Although it allowed clients to make a sequence of calls (or concurrent
calls) to Load, building up the graph of Packages incrementally, it
was of marginal value: it complicated the API (since it allowed some
options to vary across calls but not others), it complicated the
implementation, it cannot be made to work in Types mode, as explained
above, and it was less efficient than making one combined call (when
this is possible). Among the clients we have inspected, none made
multiple calls to load but could not be easily and satisfactorily
modified to make only a single call. However, applications changes may
be required. For example, the ssadump command loads the user-specified
packages and in addition the runtime package. It is tempting to
simply append "runtime" to the user-provided list, but that does not
work if the user specified an ad-hoc package such as [a.go b.go].
Instead, ssadump no longer requests the runtime package, but seeks it
among the dependencies of the user-specified packages, and emits an
error if it is not found.
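Following the "one combined call" point above, here is a hedged sketch of loading both overlay files in a single packages.Load call; the assumption (drawn from the quoted text) is that one call type-checks the time dependency once and shares it, so the two time.Time types come out identical:

package main

import (
    "fmt"
    "go/types"

    "golang.org/x/tools/go/packages"
)

func main() {
    pkgs, err := packages.Load(&packages.Config{
        Mode: packages.NeedImports | packages.NeedSyntax | packages.NeedTypes | packages.NeedDeps | packages.NeedTypesInfo,
        Overlay: map[string][]byte{
            "/t1.go": []byte("package t1\nimport \"time\"\nvar x time.Time"),
            "/t2.go": []byte("package t2\nimport \"time\"\nvar x time.Time"),
        },
    }, "file=/t1.go", "file=/t2.go")
    if err != nil {
        panic(err)
    }
    // Collect the type of each package's x variable, i.e. time.Time.
    var tt []types.Type
    for _, p := range pkgs {
        if obj := p.Types.Scope().Lookup("x"); obj != nil {
            tt = append(tt, obj.Type())
        }
    }
    if len(tt) == 2 {
        fmt.Println(types.Identical(tt[0], tt[1])) // expected to print true
    }
}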

Splitting client/server code

I'm developing a client/server application in Go, and there are certain logical entities that exist both on the client and the server (the list is limited).
I would like to ensure that certain code for these entities is included ONLY in the server part and NOT in the client (vice versa would be nice, but is not as important).
The naive thought would be to rely on dead code elimination, but from my brief research it's not a reliable way to handle the task... go build simply won't eliminate dead code on the grounds that it might have been used via reflection (it doesn't matter that it wasn't, and there is no option to tune this).
A more solid approach seems to be splitting the code into different packages and importing appropriately. This seems reliable, but it over-complicates the code, forcing you to physically split certain entities between different packages and constantly keep this in mind...
And finally there are build tags, which allow multiple files under the same package to be built conditionally for the client and the server.
The motivation for using build tags is that I want to keep the code as clean as possible without introducing any synthetic entities.
Use case:
there are certain cryptography routines; the client works with the public key, the server operates with the private key... The code logically belongs to the same entity.
What option would you choose and why?
This "dead code elimination" is already done, in part, by the go tool. The go tool does not include everything from imported packages, only what is needed (or, more precisely, it excludes things that it can prove are unreachable).
For example, this application
package main; import _ "fmt"; func main() {}
results in an executable binary almost 300 KB smaller (on Windows amd64) than the following:
package main; import "fmt"; func main() {fmt.Println()}
Excludable things include functions, types and even unexported and exported variables. This is possible because even with reflection you can't call a function or "instantiate" types or refer to package variables just by having their names as a string value. So maybe you shouldn't worry about it that much.
Edit: With Go 1.7 released, it is even better: see the blog post Smaller Go 1.7 binaries.
So if you design your types and functions well, and you don't create "giant" registries where you enumerate functions and types (which explicitly generates references to them and thus renders them unexcludable), compiled binaries will only contain what is actually used from imported packages.
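To make the "registry" caveat concrete, here is a small made-up example of the pattern that defeats this exclusion:

package main

import "fmt"

func encrypt() { fmt.Println("encrypt") }
func decrypt() { fmt.Println("decrypt") }
func sign()    { fmt.Println("sign") }

// Enumerating functions like this creates explicit references to all of
// them, so the linker must keep every one (and everything it calls),
// even if only a single entry is ever used at run time.
var handlers = map[string]func(){
    "encrypt": encrypt,
    "decrypt": decrypt,
    "sign":    sign,
}

func main() {
    handlers["encrypt"]()
}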
I would not suggest using build tags for this kind of problem. By using them, you take on the extra responsibility of maintaining package/file dependencies yourself, which is otherwise done for you by the go tool.
You should not design and separate code into packages to make your output executables smaller. You should design and separate code into packages based on logic.
I would go with separating things into packages when it is really needed, and importing appropriately. Because this is really what you want: some code intended only for the client, some only for the server. You may have to think a little more during your design and coding phase, but at least you will see the result (what actually belongs to, and gets compiled into, the client and the server).
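A rough sketch of that split for the cryptography use case (the package layout and the choice of ed25519 are mine, not the asker's): the shared package holds the public-key side, and a nested package holds the private-key side so that only the server binary links it.

// mycrypto/verify.go: shared package, safe to link into the client.
package mycrypto

import "crypto/ed25519"

// Verify needs only the public key.
func Verify(pub ed25519.PublicKey, msg, sig []byte) bool {
    return ed25519.Verify(pub, msg, sig)
}

// mycrypto/sign/sign.go: imported only by the server binary, so the
// private-key code never ends up in the client executable.
package sign

import "crypto/ed25519"

// Sign needs the private key.
func Sign(priv ed25519.PrivateKey, msg []byte) []byte {
    return ed25519.Sign(priv, msg)
}

The client's main package imports only mycrypto; the server's main package imports both, so the question of dead code elimination never comes up for the private-key routines.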

How To Structure Large OpenCL Kernels?

I have worked with OpenCL on a couple of projects, but have always written the kernel as one (sometimes rather large) function. Now I am working on a more complex project and would like to share functions across several kernels.
But the examples I can find all show the kernel as a single file (very few even call secondary functions). It seems like it should be possible to use multiple files - clCreateProgramWithSource() accepts multiple strings (and combines them, I assume) - although pyopencl's Program() takes only a single source.
So I would like to hear from anyone with experience doing this:
Are there any problems associated with multiple source files?
Is the best workaround for pyopencl to simply concatenate files?
Is there any way to compile a library of functions (instead of passing in the library source with each kernel, even if not all are used)?
If it's necessary to pass in the library source every time, are unused functions discarded (no overhead)?
Any other best practices/suggestions?
Thanks.
I don't think OpenCL has a concept of multiple source files in a program - a program is one compilation unit. You can, however, use #include and pull in headers or other .cl files at compile time.
You can have multiple kernels in an OpenCL program - so, after one compilation, you can invoke any of the set of kernels compiled.
Any code not used - functions, or anything statically known to be unreachable - can be assumed to be eliminated during compilation, at some minor cost to compile time.
In OpenCL 1.2 you can link different object files together.

LoadLibrary from offset in a file

I am writing a scriptable game engine, for which I have a large number of classes that perform various tasks. The size of the engine is growing rapidly, so I thought of splitting the large executable up into DLL modules so that only the components the game writer actually uses are included. When the user compiles their game (which is to say their script), I want the correct DLLs to be part of the final executable. I already have quite a bit of overlay data, so I figured I might be able to store the DLLs as part of this block. My question boils down to this:
Is it possible to trick LoadLibrary into starting to read the file at a certain offset? That would save me from having to either extract the DLL into a temporary file, which is not clean, or scrap the automatic inclusion of DLLs altogether and simply instruct my users to package the DLLs along with their games.
Initially I thought of going for the "load DLL from memory" approach but rejected it on grounds of portability, and simply because it seems like such a horrible hack.
Any thoughts?
Kind regards,
Philip Bennefall
You are trying to solve a problem that doesn't exist. Loading a DLL doesn't actually require any physical memory. Windows creates a memory-mapped file for the DLL content. Code from the DLL only ever gets loaded when your program calls that code. Unused code doesn't require any system resources beyond reserved memory pages. You have 2 billion bytes' worth of those on a 32-bit operating system. You have to write a lot of code to consume them all; 50 megabytes of machine code is already a very large program.
The memory mapping is also the reason you cannot make LoadLibrary() do what you want to do. There is no realistic scenario where you need to.
Look into the linker's /DELAYLOAD option to improve startup performance.
I think every solution to this task is a "horrible hack" and nothing more.
The simplest way that I see is to create your own virtual drive that presents a custom filesystem and maps access from one real file (a compilation of your libraries) to multiple separate DLLs, for example like TrueCrypt does (it's open source). Then you can use the LoadLibrary function without changes.
But the only right way I see is to change your approach and not use this one at all. I think you need to create your own script interpreter and compiler, using structures, pointers and so on.
The main thing is that I don't see the benefit of using libraries here. Compiled code these days does not weigh that much and can be packed very well. Any other resources can be loaded dynamically on first use. All you need to do is organize the working cycle of all the components of the script engine in the right way.

Overhead for calling a procedure/function in another Oracle package

We're discussing the performance impact of putting a common function/procedure in a separate package or using a local copy in each package.
My thinking is that it would be cleaner to have the common code in a package, but others worry about the performance overhead.
Thoughts/experiences?
Put it in one place and call it from many - that's basic code re-use. Any overhead in calling one package from another will be minuscule. If they still doubt it, get them to demonstrate the performance difference.
The worriers are perfectly at liberty to prove the validity of their concerns by demonstrating a performance overhead. That ought to be trivial.
Meanwhile they should consider the memory usage and maintenance overhead in repeating code in multiple places.
Common code goes in one package.
Unless you are calling a procedure in a package situated on a different data base over a DB link, the overhead of calling a procedure in another package is negligible.
There are some performance concerns, as well as memory concerns, but they are few and far between. Besides, they fall into the "Oracle black magic" category. For example, check this link. If you can clearly understand what that is about, consider yourself an accomplished Oracle professional. If not, don't worry, because it's really hardcore stuff.
What you should consider, however, is the question of dependencies.
An Oracle package consists of two parts, a spec and a body:
The spec is a header, where the public procedures and functions (that is, those visible outside the package) are declared.
The body is their implementation.
Although closely connected, they are two separate database objects.
Oracle uses a package's status to indicate whether it is VALID or INVALID. If a package becomes invalid, then all the other packages
that depend on it become invalid too.
For example, if your program calls a procedure in package A, which calls a procedure in package B, that means that your program depends on package A, and package A depends on package B. In Oracle this relation is transitive, which means that your program depends on package B. Hence, if package B is broken, your program also breaks (terminates with an error).
That should be obvious. What is less obvious is that Oracle also tracks dependencies at compile time via package specs.
Let's assume that the specs and bodies of both package A and package B are successfully compiled and valid.
Then you go and make a change to the package body of package B. Because you only changed the body, but not the spec, Oracle assumes that the way package B is called has not changed and doesn't do anything.
But if along with the body you also change package B's spec, then Oracle suspects that you might have changed some procedure's parameters or something like that, and marks the whole chain as invalid (that is, packages B and A and your program).
Please note that Oracle doesn't check whether the spec really changed; it just checks the timestamp. So it's enough to recompile the spec to invalidate everything.
If invalidation happens, the next time you run your program it will fail.
But if you run it one more time after that, Oracle will recompile everything automatically and execute it successfully.
I know it's confusing. That's Oracle. Don't try to wrap your brain around it too much.
You only need to remember a couple of things:
Avoid complex inter-package dependencies if possible. If one thing depends on another thing, which depends on one more thing, and so on, then the probability of invalidating everything by recompiling just one database object is extremely high.
One of the worst cases is a "circular" dependency, where package A calls a procedure in package B, and package B calls a procedure in package A. In that case it is almost impossible to compile one without breaking the other.
Keep the package spec and the package body in separate source files. And if you only need to change the body, don't touch the spec!
