Go's Type Inference Algorithm

What type inference algorithm does the Go compiler use?
I tried looking this up for Go but I can't find any documentation on it. I am tempted to assume it would be Hindley-Milner, but I would like to know for sure.

Go certainly doesn't use Hindley-Milner. Why would you think it does? In fact, Go doesn't have type inference in general, only in declarations such as the := construct (and var declarations with no explicit type), and those use the extremely simple rule of taking the evaluated type of the right-hand side and applying it to the newly declared variable on the left. It's actually pretty darn similar to C++11's auto keyword (except without the rules on handling const and references).
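To illustrate, a minimal sketch of that rule in action (just the standard library; the variable names are arbitrary):

package main

import "fmt"

func main() {
    i := 42                     // i gets type int (the default type of an untyped integer constant)
    f := 3.14                   // f gets type float64
    s := "hello"                // s gets type string
    p := &struct{ X int }{X: 1} // p gets type *struct{ X int }

    fmt.Printf("%T %T %T %T\n", i, f, s, p)
}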

Related

Why is `math.Sin` disallowed in a Go constant?

According to Effective Go, the function math.Sin cannot be used to define a constant because calls to that function can only happen at run time.
What is the reasoning behind this limitation? Floating-point consistency? Quirk of the Sin implementation? Something else?
There is support for this sort of thing in other languages. In C, for example, GCC supports compile-time calculation of the sine function as of version 4.3 (see the section "General Optimizer Improvements").
However, as noted in this blog post by Bruce Dawson, this can cause unexpected issues. (See section "Compile-time versus run-time sin").
Is this a relevant concern in Go? Or is this usage restricted for a different reason?
Go doesn't support initializing a constant with the result of a function. Functions are called at runtime, not at compile time. But constants are defined at compile time.
It would be possible to make exceptions for certain functions (like math.Sin for example), but that would make the spec more complicated. The Go developers generally prefer to keep the spec simple and consistent.
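A minimal sketch of what the rule means in practice (the identifier sinOne is invented for illustration):

package main

import "math"

// const sinOne = math.Sin(1) // does not compile: the value is not a constant expression

// The usual workaround is a package-level variable, which is computed once at
// program start-up (run time) rather than at compile time.
var sinOne = math.Sin(1)

func main() {
    _ = sinOne
}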
Go simply lacks the concept. There is no way to mark a function as pure (one whose return value depends only on its arguments and that neither alters mutable state nor performs I/O), the compiler makes no attempt to infer purity, and no expression containing a function call is ever evaluated at compile time. Doing so for anything other than a pure function of constant arguments would be a source of weird behavior and bugs, and adding the machinery needed to make it work correctly would introduce quite a bit of complexity.
Yes, this is a substantial loss, which forces a tradeoff between code with bad runtime behavior and code which is flat-out ugly. Go partisans will choose the ugly code and tell you that you are a bad human being for not finding it beautiful.
The best thing you have available is code generation. The integration of go generate into the toolchain and the provision of a complete Go parser in the standard library make it relatively easy to munge code at build time, and one of the things you can do with this ability is create more advanced constant folding if you so choose. You still get all of the debuggability perils of code generation, but it's something.
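As a rough sketch of that approach (the file, directive target, and identifier names are all invented for illustration), a //go:generate go run gen_sin.go directive in the main package could run a small generator that precomputes the value and writes it out as a true constant:

//go:build ignore

// gen_sin.go: a hypothetical generator invoked via "go generate". It evaluates
// math.Sin once at build time and emits the result as an ordinary Go constant.
package main

import (
    "fmt"
    "math"
    "os"
)

func main() {
    src := fmt.Sprintf(
        "// Code generated by gen_sin.go; DO NOT EDIT.\n\npackage main\n\nconst sinOfOne = %v\n",
        math.Sin(1))
    if err := os.WriteFile("sin_gen.go", []byte(src), 0o644); err != nil {
        panic(err)
    }
}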

Any documentation/article about the `&MyType{}` pattern in golang?

In most Go codebases I look at, people use types by reference:
type Foo struct {}
myFoo := &Foo{}
I usually take the opposite approach: I pass everything as a copy and only pass by reference when I want to do something destructive to the value, which lets me easily spot destructive functions (and which is fairly rare).
But seeing how references are commonplace, I guess it's not just a matter of taste. I get that there's a cost to duplicating values, but is it that much of a game changer? Or are there other reasons why references are preferred?
It would be great if someone could point me to an article or documentation about why references are preferred.
Thanks!
Go is pass by value. I try to use references, as in your example, as much as possible so that I don't have to think about whether I'm making duplicates of objects. Go is mostly meant for networking and scaling, which makes performance a priority. The obvious downside of this is, as you say, that methods receiving a pointer can destroy the object it points to.
Otherwise there is no rule as to which you should use. Both are quite ok.
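To make the destructive/non-destructive distinction concrete, a minimal sketch (the Counter type is invented for illustration):

package main

import "fmt"

type Counter struct{ n int }

// Value receiver: the method operates on a copy, so the caller's value is untouched.
func (c Counter) IncByValue() { c.n++ }

// Pointer receiver: the method mutates the caller's value in place.
func (c *Counter) IncByPointer() { c.n++ }

func main() {
    c := Counter{}
    c.IncByValue()
    fmt.Println(c.n) // 0
    c.IncByPointer() // shorthand for (&c).IncByPointer()
    fmt.Println(c.n) // 1
}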
Also, somewhat related to the question, from the Go docs: Pointers vs. Values

When to use references versus plain types versus boxes, and slices versus vectors, as argument and return types?

I've been working with Rust for the past few days to build a new library (related to abstract algebra), and I'm struggling with some of the best practices of the language. For example, I implemented a longest-common-subsequence function taking &[&T] for the sequences. I figured this was Rust convention, as it avoided copying the data (T, which may not be easily copyable, or may be big). When changing my algorithm to work with the plain &[T] slices I needed elsewhere in my code, I was forced to add a Copy type constraint, since it needed to copy the T's and not just a reference.
So my higher-level question is: what are the best-practices for passing data between threads and structures in long-running processes, such as a server that responds to queries requiring big data crunching? Any specificity at all would be extremely helpful as I've found very little. Do you generally want to pass parameters by reference? Do you generally want to avoid returning references as I read in the Rust book? Is it better to work with &[&T] or &[T] or Vec<T> or Vec<&T>, and why? Is it better to return a Box<T> or a T? I realize the word "better" here is considerably ill-defined, but hope you'll understand my meaning -- what pitfalls should I consider when defining functions and structures to avoid realizing my stupidity later and having to refactor everything?
Perhaps another way to put it is, what "algorithm" should my brain follow to determine where I should use references vs. boxes vs. plain types, as well as slices vs. arrays vs. vectors? I hesitate to start using references and Box<T> returns everywhere, as I think that'd get me a sort of "Java in Rust" effect, and that's not what I'm going for!

Why doesn't Haskell have symbols (a la Ruby) / atoms (a la Erlang)?

The two languages where I have used symbols are Ruby and Erlang, and I've always found them to be extremely useful.
Haskell does have algebraic datatypes, but I still think symbols would be mighty convenient. An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integer or a string "primary key".
The syntactic sugar for atoms could be minimal: :something or <something> is an atom. All atoms are values of a single type called Atom, which derives Show and Eq. You can then use it for more descriptive error codes, for example
type ErrorCode = Atom
type Message = String
data Error = Error ErrorCode Message
loginError = Error :redirect "Please login first"
In this case :redirect is more efficient than using a string ("redirect") and easier to understand than an integer (404).
The benefit may seem minor, but I say it is worth adding atoms as a language feature (or at least a GHC extension).
So why have symbols not been added to the language? Or am I thinking about this the wrong way?
I agree with camccann's answer that it's probably missing mainly because it would have to be baked quite deeply into the implementation and is of too little use for that level of complication. In Erlang (and Prolog and Lisp), symbols (or atoms) usually serve as special markers and play much the same role as a constructor. In Lisp, the dynamic environment includes the compiler, so it's partly also a (useful) compiler concept leaking into the runtime.
The problem is the following: symbol interning is impure (it modifies the symbol table). Because we never modify an existing object, it is still referentially transparent, but a naïve implementation can leak space in the runtime. In fact, as currently implemented in Erlang, you can actually crash the VM by interning too many symbols/atoms (the current limit is 2^20, I think), because they can never be garbage collected. It's also difficult to implement in a concurrent setting without a huge lock around the symbol table.
Both problems can be (and have been) solved, however. For example, see Erlang EEP 20. I use this technique in the simple-atom package. It uses unsafePerformIO under the hood, but only in (hopefully) rare cases. It could still use some help from the GC to perform an optimisation similar to indirection shortening. It also uses quite a few IORefs internally which isn't too great for performance and memory usage.
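For a feel of what such an implementation involves, here is a very rough sketch of impure interning behind a pure-looking API; this is not the actual simple-atom code, and all names are invented for illustration:

import Data.IORef
import qualified Data.Map.Strict as Map
import System.IO.Unsafe (unsafePerformIO)

newtype Atom = Atom Int deriving (Eq, Ord, Show)

-- A global symbol table mapping names to unique ids. Hiding it behind
-- unsafePerformIO is exactly the impurity (and potential space leak)
-- discussed above: entries are never removed.
{-# NOINLINE table #-}
table :: IORef (Map.Map String Int, Int)
table = unsafePerformIO (newIORef (Map.empty, 0))

intern :: String -> Atom
intern name = unsafePerformIO $ atomicModifyIORef' table $ \(m, next) ->
  case Map.lookup name m of
    Just i  -> ((m, next), Atom i)
    Nothing -> ((Map.insert name next m, next + 1), Atom next)

main :: IO ()
main = print (intern "redirect" == intern "redirect", intern "redirect" == intern "notFound")
-- prints (True,False): equal names intern to the same atom, and comparison is O(1).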
In summary, it can be done but implementing it properly is non-trivial. Compiler writers always weigh the power of a feature against its implementation and maintenance efforts, and it seems like first-class symbols lose out on this one.
I think the simplest answer is that, of the things Lisp-style symbols (which is where both Ruby and Erlang got the idea, I believe) are used for, in Haskell most are either:
Already done in some other fashion: e.g. a data type with a bunch of nullary constructors, which also behave as "convenient names for integers" (see the sketch after this list).
Awkward to fit in: things that exist at the level of language syntax instead of being regular data usually have more type information associated with them, but symbols would have to be either distinct types from each other (nearly useless without some sort of lightweight ad-hoc sum type) or all the same type (in which case they're barely different from just using strings).
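For the first point, a minimal sketch of how the question's Error example usually looks in plain Haskell (no new language feature needed; the names mirror the question):

data ErrorCode = Redirect | NotFound | Forbidden
  deriving (Show, Eq, Enum, Bounded)

type Message = String

data Error = Error ErrorCode Message
  deriving Show

loginError :: Error
loginError = Error Redirect "Please login first"

main :: IO ()
main = print loginError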
Also, keep in mind that Haskell itself is actually a very, very small language. Very little is "baked in", and of the things that are, most are just syntactic sugar for other primitives. This is a bit less true if you include a bunch of GHC extensions, but GHC with -XAndTheKitchenSinkToo is not the same language as Haskell proper.
Also, Haskell is very amenable to pseudo-syntax and metaprogramming, so there's a lot you can do even without having it built in. Particularly if you get into TH and scary type metaprogramming and whatever else.
So what it mostly comes down to is that most of the practical utility of symbols is already available from other features, and the stuff that isn't available would be more difficult to add than it's worth.
Atoms aren't provided by the language, but can be implemented reasonably as a library:
http://hackage.haskell.org/package/simple-atom
There are a few other libs on hackage, but this one looks the most recent and well-maintained.
Haskell uses type constructors* instead of symbols so that the set of symbols a function can take is closed, and can be reasoned about by the type system. You could add symbols to the language, but it would put you in the same place that using strings would - you'd have to check all possible symbols against the few with known meanings at runtime, add error handling all over the place, etc. It'd be a big workaround for all the compile-time checking.
The main difference between strings and symbols is interning: symbols are atomic and can be compared in constant time. Both, though, are types with an essentially unbounded number of distinct values, and that goes against the grain of Haskell's habit of specifying arguments and results with closed, finite types.
* I'm more familiar with OCaml than Haskell, so "type constructor" may not be the right term; I mean things like None or Just 3 (in Haskell these are called data constructors).
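A small sketch of that "closed set" point (names invented for illustration): with a data type the compiler knows every possible tag, so -Wall can warn about unhandled cases, whereas a string-keyed function must cope with arbitrary input at run time:

data Colour = Red | Green | Blue

describe :: Colour -> String
describe Red   = "warm"
describe Green = "natural"
describe Blue  = "cool"        -- drop this line and -Wall reports a missing case

describeStr :: String -> Either String String
describeStr "red"   = Right "warm"
describeStr "green" = Right "natural"
describeStr "blue"  = Right "cool"
describeStr other   = Left ("unknown colour: " ++ other)  -- run-time error handling

main :: IO ()
main = putStrLn (describe Red)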
An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integer or a string "primary key".
Use Enum instead.
data FileType = GZipped | BZipped | Plain
  deriving Enum

descr ft = ["compressed with gzip",
            "compressed with bzip2",
            "uncompressed"] !! fromEnum ft

What is the difference between Latent type and Manifest type?

Could someone give me a clear distinction between a latent and a manifest type system?
Sometimes, the same concept gets invented independently in different areas of computer science. This is one of those occasions. What the Scheme community calls latent and manifest typing, the rest of the world calls implicit and explicit typing. The meaning is exactly the same:
In explicit / manifest typing, the programmer has to explicitly write down the types, thus the types become manifest in the source code.
In implicit / latent typing, the programmer does not write down the types. The types are thus implicit or latent.
Please note that the question of implicit vs. explicit typing is completely orthogonal to e.g. dynamic vs. static typing, strong vs. weak typing, sound vs. unsound typing, safe vs. unsafe typing and nominal vs. structural vs. duck typing.
Haskell, for example, is implicitly, strongly, statically, soundly, safely, and nominally typed.
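A tiny Haskell illustration of the distinction (the function names are arbitrary); the two definitions are identical, and only the written-down type differs:

-- Explicit / manifest: the type is spelled out in the source.
addExplicit :: Int -> Int -> Int
addExplicit x y = x + y

-- Implicit / latent: no annotation, yet GHC still infers
-- addImplicit :: Num a => a -> a -> a at compile time.
addImplicit x y = x + y

main :: IO ()
main = print (addExplicit 2 3, addImplicit 2 3 :: Int)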
See Anton van Straaten's post on Lambda the Ultimate. It describes latent typing in the context of Scheme.
Manifest typing would be used in a statically typed language where the type of a term is declared syntactically or can be inferred at compile time from other such terms.
Latent typing: a style of typing that does not require explicit type declarations. It is associated with duck typing, dynamic typing and type inference. You can see this in languages like Python, Lisp, Haskell, etc.
Manifest typing: the types of all declared variables are explicitly written out. Languages like C, C++ and Java follow this.
Part of the reason it's hard to answer your question is that this is an active area of research. In particular, there are a whole bunch of people that would like to make it possible to mix typed and untyped languages, and to allow programs where certain parts are typed and certain parts are not.
I claim that there's not yet widespread agreement on what meaning will finally be attached to the term "latent type."
However, the issue of latent and manifest types is not the same issue as that of type inference.
Type inference, in a statically typed language, refers to a system that can deduce types for program terms without programmer assistance, typically using a Hindley-Milner-style type system and unification. Haskell and OCaml both have type inference.
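For instance, a minimal sketch in Haskell (the identifier compose is arbitrary): no annotation is given anywhere, yet GHC deduces the most general type by unification:

compose f g x = f (g x)
-- GHC infers: compose :: (b -> c) -> (a -> b) -> a -> c

main :: IO ()
main = putStrLn (compose show (+ 1) (41 :: Int))  -- prints 42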
