When to reuse functions? - go

I have a function in my program that generates random strings.
func randString(s []rune, l int) string
s is a slice of runes containing the possible characters in the string. I pass
in a rune slice of both capital and lowercase alphabetic characters. l
determines the length of the string. This works great. But I also need to
generate random hex strings for html color codes.
It seems all sources say that it's good programming practice to reuse code. So I
made another []rune that held [0-9a-f] and fed that into randString. That
was before I realized that the stdlib already includes formatting verbs for int
types that suit me perfectly.
In practice, is it better to reuse my randString function or code a separate
(more efficient) function? I would generate a single random int and Sprintf it
rather than having to loop and generate 6 random ints which randString does.
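For reference, the reuse option described above might look roughly like the sketch below. Only the signature of randString appears in the question, so the body, the hexDigits name, and the use of math/rand are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randString returns a string of l runes, each drawn uniformly at random
// from s. The body is an assumed implementation; the question only shows
// the signature. It panics if s is empty and l > 0, because rand.Intn(0) panics.
func randString(s []rune, l int) string {
	out := make([]rune, l)
	for i := range out {
		out[i] = s[rand.Intn(len(s))]
	}
	return string(out)
}

// hexDigits is a hypothetical name for the second alphabet from the question.
var hexDigits = []rune("0123456789abcdef")

func main() {
	// Reusing randString for a color code costs six calls to rand.Intn.
	fmt.Println("#" + randString(hexDigits, 6))
}
```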

1) If there is an exact solution in the standard library, you should almost always choose to use that.
Because:
The standard library is tested. So it does what it says (or what we expect it to do). Even if there is a bug in it, it will be discovered (by you or by others) and will get fixed without your work/effort.
The standard library is written as idiomatic Go. Chances are it's faster even if it does a little more than what you need compared to the solution you could write.
The standard library may improve over time. Your program may get faster just because an implementation was improved in a new Go release, without any effort on your part.
The solution is presented (which means it's ready and requires no time from you).
The standard library is well and widely known, so your code will be easier to understand by others and by you later on.
If you've already imported the package (or will in the near future), this means zero or minimal overhead: libraries are statically linked, so the function you need is already linked into your program (the compiled executable binary).
2) If there is a solution provided by the standard library but it is a general solution to similar problems and/or offers more than what you need:
That means it's more likely not the optimal solution for you, as it may use more memory and/or run more slowly than your own solution could.
You need to decide if you're willing to accept that small performance loss for the gains listed above. This also depends on how, and how many times, you need to use it (e.g. if it's a one-time call, it shouldn't matter; if it's called very frequently in an endless loop, it should be examined carefully).
3) And at the other end: you should avoid using a solution provided by the standard library if it wasn't designed to solve your problem...
If it just happens that its "side-effect" solves your problem: Even if the current implementation would be acceptable, if it was designed for something else, future improvements to it could render your usage of it completely useless or could even break it.
Not to mention it would confuse other developers trying to read, improve or use your code (you included, after a certain amount of time).
As a side note: this question is exactly about the function you're trying to create: How to generate a random string of a fixed length in golang? I've presented multiple very efficient solutions.

This is fairly subjective and not Go-specific, but I think you shouldn't reuse code just for the sake of reuse. The more code you reuse, the more dependencies you create between different parts of your app, and as a result it becomes more difficult to maintain and modify. Code that is easy to understand and modify is much more important, especially if you work in a team.
For your particular example I would do the following.
If a random color is generated only once in your package/application then using fmt.Sprintf("#%06x", rand.Intn(256*256*256)) is perfectly fine (as suggested by Dave C).
If random colors are generated in multiple places, I would create a function func randColor() string and call it. Note that you can now optimize the randColor implementation however you like without changing the rest of the code. For example, you could have implemented randColor using randString initially and then switched to a more efficient implementation later.
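A minimal sketch of that wrapper, assuming math/rand is an acceptable source of randomness for color codes. The body uses the same Sprintf expression quoted above; callers only ever see the randColor signature, so the body can be swapped later.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randColor returns a random HTML color code such as "#1a2b3c".
func randColor() string {
	// One random draw covers all 24 bits (1<<24 == 256*256*256);
	// %06x pads the hex value with leading zeros to six digits.
	return fmt.Sprintf("#%06x", rand.Intn(1<<24))
}

func main() {
	fmt.Println(randColor())
}
```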

Related

If you write a compiler in pure Prolog, will it work as a decompiler also?

If you write a compiler in pure Prolog (no extra-logical bits), will it work as a decompiler also?
(A book I was reading opined on this, but I wonder if anyone has actually tried it)
I once wrote the equivalent of cdecl.org as a reversible program. It was a bit tricky, but I demonstrated that it could be done. (Somewhere in a pile of papers is the source code; one of these days, I hope to publish it on GitHub.) The code was 2 or 3 times as compact as some existing code that used tools such as yacc/lex (bison/flex).
For something like cdecl -- where you're translating between char ** const * const x and declare x as const pointer to const pointer to pointer to char, compiling/decompiling makes sense. But what does it mean to translate from arbitrary machine code to source code? Even translating between some IR and source code doesn't seem to make a lot of sense.
This question needs to be much more precise, as we don't know what a "compiler" is (an extraneous-information-dumping transformation from a graph - the program in language 1 - to another graph - the algorithmically equivalent graph in language 2, I suppose). It is also not clear what "no extra-logical bits" implies. If you get rid of these, what kind of compilers can you still build?
Seen this way, compilation looks like pure deduction (Prolog running forward, or CHR) while decompilation looks like a possibly very hard search (you will get a program among the gazillion possible ones, but it won't be pleasant to look at and will in no way resemble the one you had earlier). Someone who has a toolbox of theorems fresh in their mind can certainly say more.
But I would say not automagically, no. For one, there will be no guarantee that an infinite "recursion on the left" loop won't appear when "decompiling".

Why is `math.Sin` disallowed in a Go constant?

According to Effective Go, the function math.Sin cannot be used to define a constant because that function must happen at run-time.
What is the reasoning behind this limitation? Floating-point consistency? Quirk of the Sin implementation? Something else?
There is support for this sort of thing in other languages. In C, for example: as of version 4.3, GCC supports compile-time calculation of the sine function. (See section "General Optimizer Improvements").
However, as noted in this blog post by Bruce Dawson, this can cause unexpected issues. (See section "Compile-time versus run-time sin").
Is this a relevant concern in Go? Or is this usage restricted for a different reason?
Go doesn't support initializing a constant with the result of a function. Functions are called at runtime, not at compile time. But constants are defined at compile time.
It would be possible to make exceptions for certain functions (like math.Sin for example), but that would make the spec more complicated. The Go developers generally prefer to keep the spec simple and consistent.
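A small sketch of the distinction; the names halfPi and sinHalfPi are made up for illustration. A constant expression built from other constants is accepted, a function call is not, and a package-level variable is the usual workaround.

```go
package main

import (
	"fmt"
	"math"
)

// Allowed: math.Pi is itself a constant, so this whole expression is
// evaluated at compile time.
const halfPi = math.Pi / 2

// Not allowed: uncommenting the next line makes the build fail, because a
// function call is not a constant expression.
// const sinHalfPiConst = math.Sin(halfPi)

// Workaround: a package-level variable, initialized at run time before
// main starts.
var sinHalfPi = math.Sin(halfPi)

func main() {
	fmt.Println(halfPi, sinHalfPi)
}
```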
Go simply lacks the concept. There is no way of marking a function as pure (its return value depends only on its arguments, and it doesn't alter any kind of mutable state or perform I/O), there is no way for the compiler to infer pureness, and there's no attempt to evaluate any expression containing a function call at compile-time (because doing so for anything except a pure function of constant arguments would be a source of weird behavior and bugs, and because adding the machinery needed to make it work right would introduce quite a bit of complexity).
Yes, this is a substantial loss, which forces a tradeoff between code with bad runtime behavior, and code which is flat-out ugly. Go partisans will choose the ugly code and tell you that you are a bad human being for not finding it beautiful.
The best thing you have available to you is code generation. The integration of go generate into the toolchain and the provision of a complete Go parser in the standard library makes it relatively easy to munge code at build time, and one of the things that you can do with this ability is create more advanced constant-folding if you so choose. You still get all of the debuggability peril of code generation, but it's something.
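As a rough sketch of the code-generation route: a tiny generator that computes the value once and emits Go source declaring it as a constant. The function constSource, the package name trig, and the constant name SinOne are all hypothetical, and the sketch prints to standard output where a real generator would write a file.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// constSource returns Go source that declares name as a constant holding v.
// 'g' with precision -1 yields the shortest literal that round-trips to the
// same float64. Inf and NaN are not handled; they have no Go literal.
func constSource(pkg, name string, v float64) string {
	lit := strconv.FormatFloat(v, 'g', -1, 64)
	return fmt.Sprintf(
		"// Code generated by gen.go; DO NOT EDIT.\n\npackage %s\n\nconst %s = %s\n",
		pkg, name, lit)
}

func main() {
	// math.Sin runs here, at generation time, not in the final program.
	fmt.Print(constSource("trig", "SinOne", math.Sin(1)))
}
```

In a real project a generator like this would typically live in its own file, write its output to disk, and be invoked through a //go:generate directive in the package that needs the constant.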

When to use references versus types versus boxes and slices versus vectors as arguments and return types?

I've been working with Rust the past few days to build a new library (related to abstract algebra) and I'm struggling with some of the best practices of the language. For example, I implemented a longest common subsequence function taking &[&T] for the sequences. I figured this was Rust convention, as it avoided copying the data (T, which may not be easily copy-able, or may be big). When changing my algorithm to work with simpler &[T]'s, which I needed elsewhere in my code, I was forced to put the Copy type constraint in, since it needed to copy the T's and not just copy a reference.
So my higher-level question is: what are the best-practices for passing data between threads and structures in long-running processes, such as a server that responds to queries requiring big data crunching? Any specificity at all would be extremely helpful as I've found very little. Do you generally want to pass parameters by reference? Do you generally want to avoid returning references as I read in the Rust book? Is it better to work with &[&T] or &[T] or Vec<T> or Vec<&T>, and why? Is it better to return a Box<T> or a T? I realize the word "better" here is considerably ill-defined, but hope you'll understand my meaning -- what pitfalls should I consider when defining functions and structures to avoid realizing my stupidity later and having to refactor everything?
Perhaps another way to put it is, what "algorithm" should my brain follow to determine where I should use references vs. boxes vs. plain types, as well as slices vs. arrays vs. vectors? I hesitate to start using references and Box<T> returns everywhere, as I think that'd get me a sort of "Java in Rust" effect, and that's not what I'm going for!

How many lines should a function have at most?

Is there a good coding technique that specifies how many lines a function should have?
No. Lines of code is a pretty bad metric for just about anything. The exception is perhaps functions that have thousands and thousands of lines - you can be pretty sure those aren't well written.
There are however, good coding techniques that usually result in fewer lines of code per function. Things like DRY (Don't Repeat Yourself) and the Unix-philosophy ("Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." from Wikipedia). In this case replace "programs" with "functions".
I don't think it matters. Who is to say that once a function's length passes a certain number of lines it breaks a rule?
In general just code clean functions easy to use and reuse.
A function should have a well-defined purpose. That is, try to create functions which do a single thing, either by doing the thing itself or by delegating work to a number of other functions.
Most functional compilers are excellent at inlining. Thus there is no inherent price to pay for breaking up your code: The compiler usually does a good job at deciding if a function call should really be one or if it can just inline the code right away.
The size of the function is less relevant though most functions in FP tend to be small, precise and to the point.
There is a McCabe metric of Cyclomatic Complexity which you might read about at this Wikipedia article.
The metric measures how many tests and loops are present in a routine. A rule of thumb might be that 10 or under is a manageable amount of complexity, while anything above 10 becomes more fault-prone.
I have seen horrendous code that had a Complexity metric above 50. (It was error-prone and difficult to understand or change.) Re-writing it and breaking it down into subroutines reduced the complexity to 8.
Note that the Complexity metric usually grows with the lines of code, but it gives you a measure of complexity rather than of length.
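To make the metric concrete, here is a rough sketch that approximates it for Go source using the standard go/parser and go/ast packages. It counts 1 per function plus 1 for each if, for, range, non-default case, && and ||. The function name complexity and the sample input are made up for illustration, and published tools may count slightly differently.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// complexity returns an approximate cyclomatic complexity for every
// top-level function in src: 1 plus the number of branching constructs.
func complexity(src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	out := make(map[string]int)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		n := 1
		ast.Inspect(fn.Body, func(node ast.Node) bool {
			switch x := node.(type) {
			case *ast.IfStmt, *ast.ForStmt, *ast.RangeStmt:
				n++
			case *ast.CaseClause:
				if x.List != nil { // a nil list is the default clause
					n++
				}
			case *ast.BinaryExpr:
				if x.Op == token.LAND || x.Op == token.LOR {
					n++
				}
			}
			return true
		})
		out[fn.Name.Name] = n
	}
	return out, nil
}

// sample is a made-up function: 1 + if + for + if + && gives 5.
const sample = `package p

func classify(n int) string {
	if n < 0 {
		return "negative"
	}
	for i := 0; i < n; i++ {
		if i%2 == 0 && i > 10 {
			return "big even"
		}
	}
	return "other"
}
`

func main() {
	m, err := complexity(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(m) // prints map[classify:5]
}
```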
When working in Forth (or playing in Factor) I tend to continually refactor until each function is a single line! In fact, if you browse through the Factor libraries you'll see that the majority of words are one-liners and almost nothing is more than a few lines. In a language with inner functions and virtually zero cost for calls (that is, threaded code implicitly having no stack frames [only a return pointer stack], or with aggressive inlining) there is no good reason not to refactor until each function is tiny.
From my experience a function with a lot of lines of code (more than a few pages) is a nightmare to maintain and test. But having said that I don't think there is a hard and fast rule for this.
I came across some VB.NET code at my previous company that had one function of 13 pages, but my record is some VB6 code I have just picked up that is approx 40 pages! Imagine trying to work out which If statement an Else belongs to when they are pages apart on the screen.
The main argument against having functions that are "too long" is that subdividing the function into smaller functions that only do small parts of the entire job improves readability (by giving those small parts actual names, and helping the reader wrap his mind around smaller pieces of behavior, especially when line 1532 can change the value of a variable on line 45).
In a functional programming language, this point is moot:
You can subdivide a function into smaller functions that are defined within the larger function's body, which does not reduce the length of the original function.
Functions are expected to be pure, so there's no actual risk of line X changing the value read on line Y : the value of the line Y variable can be traced back up the definition list quite easily, even in loops, conditionals or recursive functions.
So, I suspect the answer would be "no one really cares".
I think a long function is a red flag and deserves more scrutiny. If I came across a function that was more than a page or two long during a code review I would look for ways to break it down into smaller functions.
There are exceptions though. A long function that consists of mostly simple assignment statements, say for initialization, is probably best left intact.
My (admittedly crude) guideline is a screenful of code. I have seen code with functions going on for pages. This is emetic, to be charitable. Functions should have a single, focused purpose. If you are trying to do something complex, have a "captain" function call helpers.
Good modularization makes friends and influences people.
IMHO, the goal should be to minimize the amount of code that a programmer would have to analyze simultaneously to make sense of a program. In general, excessively-long methods will make code harder to digest because programmers will have to look at much of their code at once.
On the other hand, subdividing methods into smaller pieces will only be helpful if those smaller pieces can be analyzed separately from the code which calls them. Splitting a method into sub-methods which would only be meaningful in the context where they are called is apt to impair rather than improve legibility. Even if the method would have been over 250 lines before splitting, breaking it into ten pieces which don't make sense in isolation would simply increase the simultaneous-analysis requirement from 250 lines to 300+ (depending upon how many lines are added for method headers, the code that calls them, etc.). When deciding whether a method should be subdivided, it's far more important to consider whether the pieces make sense in isolation than to consider whether the method is "too long". Some 20-line routines might benefit from being split into two ten-line routines and a two-line routine that calls them, but some 250-line routines might benefit from being left exactly as they are.
Another point which needs to be considered, btw, is that in some cases the required behavior of a program may not be a good fit with the control structures available in the language it's written in. Most applications have large "don't-care" aspects of their behavior, and it's generally possible to assign behavior that will fit nicely with a language's available control structures, but sometimes behavioral requirements may be impossible to meet without awkward code. In some such cases, confining the awkwardness to a single method which is bloated, but which is structured around the behavioral requirements, may be better than scattering it among many smaller methods which have no clear relationship to the overall behavior.

Static/strong typing and refactoring

It seems to me that the most invaluable thing about a static/strongly-typed programming language is that it helps refactoring: if/when you change any API, then the compiler will tell you what that change has broken.
I can imagine writing code in a runtime/weakly-typed language ... but I can't imagine refactoring without the compiler's help, and I can't imagine writing tens of thousands of lines of code without refactoring.
Is this true?
I think you're conflating when types are checked with how they're checked. Runtime typing isn't necessarily weak.
The main advantage of static types is exactly what you say: they're exhaustive. You can be confident all call sites conform to the type just by letting the compiler do its thing.
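A small Go sketch of that exhaustiveness; the Shape, Square, and totalArea names are made up for illustration. If the Area method were renamed or its signature changed, every implementation and call site that no longer conforms would fail to compile, so the compiler effectively lists what the change broke.

```go
package main

import "fmt"

// Shape is a hypothetical API used only for illustration.
type Shape interface {
	Area() float64
}

// Square is one implementation of Shape.
type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

// Compile-time assertion: this line stops compiling as soon as Square no
// longer satisfies Shape, for example after a method is added to Shape.
var _ Shape = Square{}

// totalArea is a call site the compiler re-checks after any API change.
func totalArea(shapes []Shape) float64 {
	var sum float64
	for _, s := range shapes {
		sum += s.Area()
	}
	return sum
}

func main() {
	fmt.Println(totalArea([]Shape{Square{2}, Square{3}})) // prints 13
}
```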
The main limitation of static types is that they're limited in the constraints they can express. This varies by language, with most languages having relatively simple type systems (C, Java), and others having extremely powerful type systems (Haskell, Cayenne).
Because of this limitation, types on their own are not sufficient. For example, in Java types are more or less restricted to checking that type names match. This means the meaning of any constraint you want checked has to be encoded into a naming scheme of some sort, hence the plethora of indirections and boilerplate common to Java code. C++ is a little better in that templates allow a bit more expressiveness, but they don't come close to what you can do with dependent types. I'm not sure what the downsides to the more powerful type systems are, though clearly there must be some or more people would be using them in industry.
Even if you're using static typing, chances are it's not expressive enough to check everything you care about, so you'll need to write tests too. Whether static typing saves you more effort than it requires in boilerplate is a debate that's raged for ages and that I don't think has a simple answer for all situations.
As to your second question:
How can we re-factor safely in a runtime typed language?
The answer is tests. Your tests have to cover all the cases that matter. Tools can help you in gauging how exhaustive your tests are. Coverage checking tools let you know whether lines of code are covered by the tests or not. Test mutation tools (jester, heckle) can let you know if your tests are logically incomplete. Acceptance tests let you know that what you've written matches requirements, and lastly regression and performance tests ensure that each new version of the product maintains the quality of the last.
One of the great things about having proper testing in place vs relying on elaborate type indirections is that debugging becomes much simpler. When running the tests you get specific failed assertions within tests that clearly express what they're doing, rather than obtuse compiler error statements (think C++ template errors).
No matter what tools you use: writing code you're confident in will require effort. It most likely will require writing a lot of tests. If the penalty for bugs is very high, such as aerospace or medical control software, you may need to use formal mathematical methods to prove the behavior of your software, which makes such development extremely expensive.
I totally agree with your sentiment. The very flexibility that dynamically typed languages are supposed to be good at is actually what makes the code very hard to maintain. Really, is there such a thing as a program that continues to work if the data types are changed in a non trivial way without actually changing the code?
In the meantime, you could check the type of the variable being passed, and somehow fail if it's not the expected type. You'd still have to run your code to root out those cases, but at least something would tell you.
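The same idea can be sketched in Go by accepting an untyped value and checking it at run time. This is only an illustration of the pattern: the double function is hypothetical, and in a dynamic language the check would use that language's own type inspection.

```go
package main

import "fmt"

// double accepts any value, as a dynamically typed function would, and
// fails loudly at run time if the value is not an int.
func double(v any) (int, error) {
	n, ok := v.(int)
	if !ok {
		// %T prints the dynamic type that was actually passed.
		return 0, fmt.Errorf("double: got %T, want int", v)
	}
	return n * 2, nil
}

func main() {
	fmt.Println(double(21))   // value accepted
	fmt.Println(double("21")) // rejected, but only once this line runs
}
```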
I think Google's internal tools actually do a compilation, and probably type checking, of their JavaScript. I wish I had those tools.
To start, I'm a native Perl programmer so on the one hand I've never programmed with the net of static types. OTOH I've never programmed with them so I can't speak to their benefits. What I can speak to is what it's like to refactor.
I don't find the lack of static types to be a problem wrt refactoring. What I find a problem is the lack of a refactoring browser. Dynamic languages have the problem that you don't really know what the code is really going to do until you actually run it. Perl has this more than most. Perl has the additional problem of having a very complicated, almost unparsable, syntax. Result: no refactoring tools (though they're working very rapidly on that). The end result is I have to refactor by hand. And that is what introduces bugs.
I have tests to catch them... usually. I do find myself often in front of a steaming pile of untested and nigh untestable code with the chicken/egg problem of having to refactor the code in order to test it, but having to test it in order to refactor it. Ick. At this point I have to write some very dumb, high level "does the program output the same thing it did before" sort of tests just to make sure I didn't break something.
Static types, as envisioned in Java or C++ or C#, really only solve a small class of programming problems. They guarantee your interfaces are passed bits of data with the right label. But just because you get a Collection doesn't mean that Collection contains the data you think it does. Because you get an integer doesn't mean you got the right integer. Your method takes a User object, but is that User logged in?
Classic example: public static double sqrt(double a) is the signature for the Java square root function. Square root doesn't work on negative numbers. Where does it say that in the signature? It doesn't. Even worse, where does it say what that function even does? The signature only says what types it takes and what it returns. It says nothing about what happens in between and that's where the interesting code lives. Some people have tried to capture the full API by using design by contract, which can broadly be described as embedding run-time tests of your function's inputs, outputs and side effects (or lack thereof)... but that's another show.
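A rough sketch of the design-by-contract idea, written in Go rather than Java: the constraints the signature cannot express become run-time checks on the input and the output. The name checkedSqrt and the tolerance are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// checkedSqrt wraps math.Sqrt with the contract that the type
// signature alone cannot state.
func checkedSqrt(a float64) (float64, error) {
	// Precondition: the input must be a non-negative number.
	if a < 0 || math.IsNaN(a) {
		return 0, fmt.Errorf("checkedSqrt: precondition a >= 0 violated by %v", a)
	}
	r := math.Sqrt(a)
	// Postcondition: squaring the result gives back the input, within an
	// arbitrarily chosen relative tolerance.
	if math.Abs(r*r-a) > 1e-9*math.Max(1, a) {
		return 0, fmt.Errorf("checkedSqrt: postcondition violated for %v", a)
	}
	return r, nil
}

func main() {
	fmt.Println(checkedSqrt(9))
	fmt.Println(checkedSqrt(-1))
}
```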
An API is far more than just function signatures (if it wasn't, you wouldn't need all that descriptive prose in the Javadocs) and refactoring is far more even than just changing the API.
The biggest refactoring advantage a statically typed, statically compiled, non-dynamic language gives you is the ability to write refactoring tools to do quite complex refactorings for you because it knows where all the calls to your methods are. I'm pretty envious of IntelliJ IDEA.
I would say refactoring goes beyond what the compiler can check, even in statically-typed languages. Refactoring is just changing a program's internal structure without affecting the external behavior. Even in dynamic languages, there are still things that you can expect to happen and test for; you just lose a little bit of assistance from the compiler.
One of the benefits of using var in C# 3.0 is that you can often change the type without breaking any code. The type needs to still look the same - properties with the same names must exist, methods with the same or similar signature must still exist. But you can really change to a very different type, even without using something like ReSharper.

Resources