What is the best way to manage a large quantity of constants - oracle

I am currently working on a very complex program that processes rows from an input table and has a huge number of possible outcomes for each record. Because of this I have a very large number of constants defined for the outcome messages. There is one success message for the record, but a multitude of possible warnings and errors.
My first thought was to define all of my constants for these messages at the package body level, but then I decided to move each constant to the procedure where it is used. I'm now second guessing that decision and thinking of moving everything back to package body level. What is the best way to define this many constants? Ease of maintainability is my ultimate goal for this program since it is so complex.

I think this is a matter of taste. In my application I put all error codes into an Error-Package. All main and commonly used constants I put into a separate package (without a package body).

Again, a matter of taste, but I tend to put a list of named constants at the package spec level rather than the package body so that they can be referenced by any portion of the application. If I ever want to change the error code that c_err_for_specific_reason_x uses, it becomes a single place to do so.
If I wanted to hide the codes and put them within the body I would have a get_error_code(p_get_error_name varchar) function that did the translation based on you passing a valid constant name.
I've done both on different projects, but tend towards the list over the function most times. I tend to use the function if it a table-driven source of the data.

It ... wait for it ... depends!
Since you currently define your constants in the package body, you don't need them to be publicly accessible outside the package. So defining them in a spec really doesn't buy you anything.
Here's is the rule I follow: Define constants within the smallest scope needed. So if a constant is used only within one procedure, define it in that procedure. If it is used within more than one procedure, define it in the body. If it is used elsewhere by code in other packages (or non-packaged SPs) but only when using a particular package, define it in the spec of that package. If it is used by other code for general use, put it in a separate spec of such general constants.

Related

Common lisp best practices for splitting code between files

I'm moderately new to common lisp, but have extended experience with other 'separate compilation' languages (think C/C++/FORTRAN and such)
I know how to do an ASDF system definition. I know how to separate stuff in packages. I'm using SBCL, by the way.
The question is this: what's the best practice for splitting code (large packages) between .lisp files? I mean, in C there are include files, while lisp lives with the current image state. So with multiple files I need to handle dependencies or serial order in the system definition. But without something like forward declarations it's painful.
Simple example on what I want to do: I have, for example, two defstructs that are part of the same bigger data structure (like struct1 is a parent of some set of struct2). Some functions works on one, some other works on the other and some other use both.
So I would have: a packages.lisp, a fun1.lisp (with the first defstruct and related functions), a fun2.lisp (with the other defstruct and functions) and a funmix.lisp (with functions that use both). In an ideal world everything is sealed and compiling these in this order would be fine. As most of you know, this in practice almost never happen.
If I need to use struct2 functions from the struct1 ones I would need to either reorder or add a dependency. But then if there's some kind of back call (that can't be done with a closure) I would have struct1.lisp depending on struct2.lisp and vice-versa which is obviously not valid. So what? I could break the loop putting the defstruct in a separate file (say, structs.lisp) but what if either of the struct's function need to access the common functions in the third file? I would like to avoid style notes.
What's the common way to solve this, i.e. keeping loosely related code in the same file but still be able to interface to other ones. Is the correct solution to seal everything in a compilation unit (a single file)? use a package for every file with exports?
Lisp dependencies are simple, because in many cases, a Lisp implementation doesn't need to process the definition of something in order to compile its use.
Some exceptions to the rule are:
Macros: macros must be loaded in order to be expanded. There is a compile-time dependency between a file which uses macro and the file which defines them.
Packages: a package foo must be defined in order to use symbols like foo:bar or foo::priv. If foo is defined by a defpackage form in some foo.lisp file, then that file has to be loaded (either in source or compiled form).
Constants: constants defined with defconstant should be seen before their use. Similar remarks apply to inline functions, compiler macros.
Any custom things in a "domain specific language" which enforces definition before use. E.g. if Whizbang Inference Engine needs rules to be defined when uses of the rules are compiled, you have to arrange for that.
For certain diagnostics to be suppressed like calls to undefined functions, the defining and using files must be taken to be as a single compilation unit. (See below.)
All the above remarks also have implications for incremental recompilation.
When there is dependency like the above between files so that one is a prerequisite of the other, when the prerequisite is touched, the dependent one must be recompiled.
How to split code into files is going to be influenced by all the usual things: cohesion, coupling and what have you. Common-Lisp-specific reasons to keep certain things together in one file is inlining. The call to a function which is in the same file as the caller may be inlined. If your program supports any in-service upgrade, the granularity of code loading is individual files. If some functions foo and bar should be independently redefinable, don't put them in the same file.
Now about compilation units. Suppose you have a file foo.lisp which defines a function called foo and bar.lisp which calls (foo). If you just compile bar.lisp, you will likely get a warning that an undefined function foo has been called. You could compile foo.lisp first and then load it, and then compile bar.lisp. But that will not work if there is a circular reference between the two: say foo.lisp also calls (bar) which bar.lisp defines.
In Common Lisp, you can defer such warnings to the end of a compilation unit, and what defines a compilation unit isn't a single file, but a dynamic scope established by a macro called with-compilation-unit. Simply put, if we do this:
(with-compilation-unit
(compile-file "foo.lisp") ;; contains (defun foo () (bar))
(compile-file "bar.lisp")) ;; contains (defun bar () (foo))
If a compile-file isn't surrounded by with-compilation-unit then there is a compilation unit spanning that file. Otherwise, the outermost nesting of the with-compilation-unit macro determines the scope of what is in the compilation unit.
Warnings about undefined functions (and such) are deferred to the end of the compilation unit. So by putting foo.lisp and bar.lisp compilation into one unit, we suppress the warnings about either foo or bar not being defined and we can compile the two in any order.
Build systems use with-compilation-unit under the hood, as appropriate.
The compilation unit isn't about dependencies but diagnostics. Above, we don't have a compile time dependency. If we touch foo.lisp, bar.lisp doesn't have to be recompiled or vice versa.
By and large, Lisp codebases don't have a lot of hard dependencies among the files. Incremental compilation often means that just the affected files that were changed have to be recompiled. The C or C++ problem that everything has to be rebuilt because a core header file was touched is essentially nonexistent.
but what if
No matter how you first organize your code, if you change it significantly you are going to have to refactor. IMO there is no ideal way of grouping dependencies in advance.
As a rule of thumb it is generally safe to define generic functions first, then types, then actual methods, for example. For non-generic functions, you can cut circular dependencies by adding forward declarations:
(declaim (ftype function ...))
Having too much circular dependency is a bit of a code smell.
Is the correct solution to seal everything in a compilation unit
Yes, if you group the definitions in the same compilation unit (the same file), the file compiler will be able to silence the style notes until it reaches the end of file: at this point it knows if there are still missing references or if all the cross-references are resolved.
But then if there's some kind of back call (that can't be done with a closure)
If you have a specific example in mind please share, but typically you can define struct1 and its functions in a way that can be self-contained; maybe it can accept a map that binds event names to callbacks:
(make-struct-1 :callbacks (list :on-empty one-is-empty
:on-full one-is-full))
Similarly, struct2 can accept callbacks too (Dependency Injection) and the main struct ties them using closures (?).
Alternatively, you can design your data-structures so that they signal conditions, and the in the caller code you intercept them to bind things together.

Is there a difference between fun(n::Integer) and fun(n::T) where T<:Integer in performance/code generation?

In Julia, I most often see code written like fun(n::T) where T<:Integer, when the function works for all subtypes of Integer. But sometimes, I also see fun(n::Integer), which some guides claim is equivalent to the above, whereas others say it's less efficient because Julia doesn't specialize on the specific subtype unless the subtype T is explicitly referred to.
The latter form is obviously more convenient, and I'd like to be able to use that if possible, but are the two forms equivalent? If not, what are the practicaly differences between them?
Yes Bogumił Kamiński is correct in his comment: f(n::T) where T<:Integer and f(n::Integer) will behave exactly the same, with the exception the the former method will have the name T already defined in its body. Of course, in the latter case you can just explicitly assign T = typeof(n) and it'll be computed at compile time.
There are a few other cases where using a TypeVar like this is crucially important, though, and it's probably worth calling them out:
f(::Array{T}) where T<:Integer is indeed very different from f(::Array{Integer}). This is the common parametric invariance gotcha (docs and another SO question about it).
f(::Type) will generate just one specialization for all DataTypes. Because types are so important to Julia, the Type type itself is special and allows parameterization like Type{Integer} to allow you to specify just the Integer type. You can use f(::Type{T}) where T<:Integer to require Julia to specialize on the exact type of Type it gets as an argument, allowing Integer or any subtypes thereof.
Both definitions are equivalent. Normally you will use fun(n::Integer) form and apply fun(n::T) where T<:Integer only if you need to use specific type T directly in your code. For example consider the following definitions from Base (all following definitions are also from Base) where it has a natural use:
zero(::Type{T}) where {T<:Number} = convert(T,0)
or
(+)(x::T, y::T) where {T<:BitInteger} = add_int(x, y)
And even if you need type information in many cases it is enough to use typeof function. Again an example definition is:
oftype(x, y) = convert(typeof(x), y)
Even if you are using a parametric type you can often avoid using where clause (which is a bit verbose) like in:
median(r::AbstractRange{<:Real}) = mean(r)
because you do not care about the actual value of the parameter in the body of the function.
Now - if you are Julia user like me - the question is how to convince yourself that this works as expected. There are the following methods:
you can check that one definition overwrites the other in methods table (i.e. after evaluating both definitions only one method is present for this function);
you can check code generated by both functions using #code_typed, #code_warntype, #code_llvm or #code_native etc. and find out that it is the same
finally you can benchmark the code for performance using BenchmarkTools
A nice plot explaining what Julia does with your code is here http://slides.com/valentinchuravy/julia-parallelism#/1/1 (I also recommend the whole presentation to any Julia user - it is excellent). And you can see on it that Julia after lowering AST applies type inference step to specialize function call before LLVM codegen step.
You can hint Julia compiler to avoid specialization. This is done using #nospecialize macro on Julia 0.7 (it is only a hint though).

Create constants visible across packages, accessible directly

I would like to define my Error Codes in a package models.
error.go
package models
const{
EOK = iota
EFAILED
}
How can I use them in another package without referring to them as models.EOK. I would like to use directly as EOK, since these codes would be common across all packages.
Is it the right way to do it? Any better alternatives?
To answer you core question
You can use the dot import syntax to import the exported symbols from another package directly into your package's namespace (godoc):
import . "models"
This way you could directly refer to the EOK constant without prefixing it with models.
However I'd strongly advice against doing so, as it generates rather unreadable code. see below
General/style advice
Don't use unprefixed export path like models. This is considered bad style as it will easily globber. Even for small projects, that are used only internally, use something like myname/models. see goblog
Regarding your question about error generation, there are functions for generating error values, e.g. errors.New (godoc) and fmt.Errorf (godoc).
For a general introduction on go and error handling see goblog
W.r.t. the initial question, use a compact package name, for example err.
Choosing an approach to propagating errors, and generating error messages depends on the scale and complexity of the application. The error style you show, using an int, and then a function to decode it, is quite C-ish.
That style was partly caused by:
the lack of multiple value returns (unlike Go),
the need to use a simple type (to be easily propagated), and
that gets translated to text with a function (unlike Go's error interface), so that the local language strings can be changed.
For small apps with simple errors strings. I put the packages' error strings at the head of a package file, and just return them, maybe using errors.New(...), or fmt.Errorf if the string needs to be completed using some data.
That 'int' style of error reporting doesn't offer something as flexible as Go's error interface. The error interface lets us build information-rich error structures, to return useful information, and not just an int value or string.
An implication is different packages can yield different real-types which implement the Error interface. We don't need to agree a single error real-type across an entire set of packages. So error is an interface which can be easily propagated, like an int, yet, the real-type of error can be much richer than an int. Error generation (implementing Error) can be as centralised or distributed as we need, unlike strerror()-style functions which can be awkward to extend.

How should I organise structs, variables and interfaces in Go?

I have a codebase where one file contains quite a lot of Structs, Interfaces and Variables in the same file as functions and I'm not sure if I need to seperate this into separate files with appending filename. So for example accounts.go will be accounts_struct.go and accounts_interface.go with struct and interface respectively.
What would be a good approach for the file organisation when you have growing codebase for Structs, Variables and Interfaces?
A good model to check out is the source code of Go itself: http://golang.org/src
(in addition of the official "Effective Go")
You will see that this approach (separating based on language items like struct, interface, ...) is never used.
All the files are based on features, and it is best to use a proximity principle approach, where you can find in the same file the definition of what you are using.
Generally, those features are grouped in one file per package, except for large ones, where one package is composed of many files (net, net/http)
If you want to separate anything, separate the source (xxx.go) from the tests/benchmarks (xxx_test.go)
As Thomas Jay Rush adds in the comments
Sometimes source code is automatically generated -- especially data structure definitions.
If the data structures are in the same file as the hand-wrought code, one must build capacity to preserve the hand-wrought portion in the code-generation phase.
If the data structures are separated in a different file, then inclusion allows one to simply write the data structure file without worry.
Dave Cheney offers an interesting perspective in "Absolute Unit (Test) # LondonGophers" (March 2019)
You should take a broader view of the "unit" under test.
The units are not each internal function you write, but a whole package. Specifically the public API of a package.
Organizing your files to facilitate testing their Public API is a good idea.
accounts_struct_test.go would not, in that regards, make much sense.
See also "How I organize packages in Go" by Bartłomiej Klimczak
Sometimes, a few handlers or repositories are needed.
For example, some information can be stored in a database and then sent via an event to a different part of your platform. Keeping only one repository with a method like saveToDb() isn’t that handy at all.
All of elements like that are split by the functionality: repository_order.go or service_user.go.
If there are more than 3 types of the object, there are moved to a separate subfolder.
Here is my mental model for designing a package.
a. A package should encompass one idea or concept. http is a concept, http client or http message is not.
b. A file in a package should encompass a set of related types, a good rule of thumb is if two files share the same set of imports, merge them. Using the previous example, http/client.go, http/server.go are a good level of granularity
c. Don't do one file per type, that's not idiomatic Go.

When writing a single package meant to be used as a command, which is idiomatic: name all identifiers as private or name all identifiers as public?

In Go, public names start with an upper case letter and private names start with a lower case letter.
I'm writing a program that is not library and is a single package. Is there any Go idiom that stipulates whether my identifiers should be all public or all private? I do not plan on using this package as a library or as something that should be imported from another Go program.
I can't think of any reason why I'd want a mixture. It "feels" like going all private is the proper choice.
I don't think I got any concrete answer, but Nate was closest with telling me to think of "exporting vs non-exporting" instead of "public and private".
This leads me to believe that not exporting anything is the best approach. In the worst case scenario, if I end up importing code from my application in another package, I will have to rethink what should be exported and what shouldn't be. Which is a good thing IMO.
If you are attempting to adjust your mindset to be more Go idiomatic, you should stop thinking of variables, functions, and methods as public or private. The more accurate term is exported or not exported. It definitely has a more C like feel to it.
As others have stated exporting really isn't needed for application program code. If for organizational reasons you decide to break your program up into packages, you could use sub-packages. At work we've decided to do just this. We have:
projectgopath/src/projectname
projectname/subcomponent1
projectname/subcomponent2
So far I am really liking this structure. It aids in separation of concerns, but does not go to the extent of making a package outside of the main project. The intent is clear. The sub-package's intended use is for this program only...
The new go build and go install commands seem to deal very well with it. We group components together in packages and expose only the necessary bits via exports.
In the described situation both approaches are equally valid, so it's more or less a matter of personal preferences. In my case I'm using camelCase identifiers for package main, mostly out of habit.
A lot of my go files started their life in isolated commands and were moved to packages as they could be reused by a few commands around the same topic.
I think you should make private all that couldn't possibly be called from elsewhere (supposing one day you make it an importable package) and make public the big functions that can be understood from elsewhere (if any) and structs fields when they are orthogonal (I mean when a change of the value of one field doesn't break the consistency of the struct value).

Resources