Recently I've participated in several Go job interviews. The first one asked me "How is a channel implemented?", the second one asked "How is a goroutine implemented?". Well, as you can guess, the next one asked "How is a Go interface implemented?".
I've been using Go for six months, but to be honest I never cared about or knew these Go internals.
I tried to learn them by reading the Go source code, but I can't really grasp the essence.
So the question is, for a noob in Go, how do I learn the Go internals?
The most organized collection of links to Go internals resources is probably this:
Golang Internals Resources
Other than that, answers to these questions aren't collected in one place; they are scattered across different blog posts.
Slice internals: The Go Blog: usage and internals
String internals: The Go Blog: Strings, bytes, runes and characters in Go
Constants internals: The Go Blog: Constants
Reflection insight: The Go Blog: The Laws of Reflection
Interface internals: RSC: Go Data Structures: Interfaces
Channel implementation: Overview on SO: How are Go channels implemented?
Channel internals: Go channel on steroids
Map implementation: Overview on SO: Golang map internal implementation - how does it search the map for a key?; also related: Go's maps under the hood
Map internals: Macro View of Map Internals In Go
Let me warn you that you may be missing the real point of the interviewers.
(Disclaimer: I do job interviews of Go programmers from time to time,
for a somewhat demanding project, so all of the below is my personal
world view. Still, it is shared by my cow-orkers ;-)).
Most of the time, it's rather worthless for an employee to know precisely how this or that bit of the runtime (or the compiler) is implemented—in part because this may change in any future release, and in part because there exist at least two up-to-date implementations of Go ("the gc suite", and a part of GCC), and each is free to implement a particular feature in any way it wishes.
What a (sensible) interviewer should really be interested in is
whether you understand "the why" of a particular core feature.
Say, when they ask you to explain how a channel is implemented,
what they should be interested to hear from you is that a channel
provides synchronization and may also provide buffering.
So you may tell them that a channel is like a variable protected by a mutex (in the case of an unbuffered channel), or like a slice protected by a mutex (in the buffered case).
And then add that an upshot of using a channel instead of a hand-crafted solution involving a mutex is that operations on channels can easily be combined using the select statement, while implementing matching functionality without channels is possible (and by that time they will probably want to hear from you about sync.Cond) but really cumbersome and error-prone.
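To make the comparison concrete, here is a rough sketch (purely illustrative, not how the runtime actually implements channels; the boundedQueue type and its methods are made up for this example) of the hand-crafted mutex-plus-sync.Cond solution next to the buffered channel that replaces it:

package main

import (
	"fmt"
	"sync"
)

// boundedQueue is a hand-rolled "slice protected by a mutex" with two
// condition variables, roughly what a buffered channel gives you for free.
type boundedQueue struct {
	mu       sync.Mutex
	notEmpty *sync.Cond
	notFull  *sync.Cond
	items    []int
	cap      int
}

func newBoundedQueue(capacity int) *boundedQueue {
	q := &boundedQueue{cap: capacity}
	q.notEmpty = sync.NewCond(&q.mu)
	q.notFull = sync.NewCond(&q.mu)
	return q
}

func (q *boundedQueue) Send(v int) {
	q.mu.Lock()
	for len(q.items) == q.cap { // block while the buffer is full
		q.notFull.Wait()
	}
	q.items = append(q.items, v)
	q.notEmpty.Signal()
	q.mu.Unlock()
}

func (q *boundedQueue) Recv() int {
	q.mu.Lock()
	for len(q.items) == 0 { // block while the buffer is empty
		q.notEmpty.Wait()
	}
	v := q.items[0]
	q.items = q.items[1:]
	q.notFull.Signal()
	q.mu.Unlock()
	return v
}

func main() {
	q := newBoundedQueue(2)
	go q.Send(1)
	fmt.Println(q.Recv())

	// The channel equivalent is one line to declare, and unlike the
	// hand-rolled queue it composes with select.
	ch := make(chan int, 2)
	go func() { ch <- 1 }()
	fmt.Println(<-ch)
}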
It's not actually completely worthless to know the nitty-gritty details of channels—say, to know that their implementation tries hard to lower the price paid for the synchronization in the "happy case" (there is no contention at the moment a goroutine accesses a channel), and that it is also clever about not jumping straight into the kernel to sleep on a lock in the "unhappy case"—but I see no point in knowing these details by heart.
The same applies to goroutines. You should maintain a clear picture of the differences between an OS process and a thread running in it, and of what context belongs to a thread—that is, what needs to be saved and restored when switching between threads. And then the differences between an OS thread and a "green thread", which is mostly what a goroutine is. It's okay to just know who schedules OS threads and who schedules goroutines, and why the latter is faster.
And what the benefits of having goroutines are (the main one is the network poller integrated into the scheduler (see this), the second is dynamic stacks, and the third is low context-switching overhead most of the time).
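As a toy illustration of how lightweight goroutines are compared to OS threads (the count here is arbitrary; the point is only that starting this many OS threads would be impractical):

package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 100000
	var wg sync.WaitGroup
	results := make(chan int, n)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) { // each goroutine starts with a small, growable stack
			defer wg.Done()
			results <- i * 2
		}(i)
	}

	wg.Wait()
	close(results)

	sum := 0
	for v := range results {
		sum += v
	}
	fmt.Println("goroutines finished, sum =", sum)
}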
My recommendation is to read through the list presented by @icza.
And in addition to what you've asked about,
I'd present the following list of what a good candidate
should be familiar with—in the order from easiest to hardest (to grok):
The mechanics of slices and append. You should know that arrays exist and how they are different from slices (a short sketch follows this list).
How interfaces are implemented.
The dualistic nature of Go strings (given that s contains a string, what is the difference between iterating over its contents via for i := 0; i < len(s); i++ { c := s[i] } versus for i, c := range s {}; see the second sketch after this list). Also: what kinds of data strings may contain—you should know that it's perfectly okay for them to contain arbitrary binary data; UTF-8 is not a requirement.
The differences between string and []byte.
How blocking I/O is implemented (for the network; that's about
the netpoller integrated into the runtime).
Knowing about the difference in handling non-network blocking I/O and syscalls in general is a bonus.
How the scheduler is implemented (those Ps running Gs on Ms).
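Here is the short sketch promised above for slices and append (the values are arbitrary; the point is the aliasing and reallocation behaviour):

package main

import "fmt"

func main() {
	backing := [4]int{1, 2, 3, 4} // an array: fixed size, value semantics
	s := backing[:2]              // a slice: len 2, cap 4, shares the array

	s = append(s, 99)    // fits within cap, so it writes into the backing array
	fmt.Println(backing) // [1 2 99 4]

	s = append(s, 5, 6, 7) // exceeds cap, so append allocates a new array
	s[0] = -1
	fmt.Println(backing[0]) // still 1: s no longer aliases the original array

	fmt.Println(len(s), cap(s)) // len 6; the grown cap is implementation-dependent
}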
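And the second sketch, for the byte-versus-rune iteration over strings and for the fact that a string may hold arbitrary bytes:

package main

import "fmt"

func main() {
	s := "héllo"

	// Index-based loop: visits individual bytes; len(s) is the byte length.
	for i := 0; i < len(s); i++ {
		fmt.Printf("byte %d: %x\n", i, s[i])
	}

	// range loop: decodes UTF-8 and yields runes; i jumps by the rune's
	// encoded width, so indices are not consecutive for multi-byte runes.
	for i, c := range s {
		fmt.Printf("rune at byte offset %d: %c\n", i, c)
	}

	// Strings can also carry raw binary data; nothing requires valid UTF-8.
	raw := string([]byte{0xff, 0xfe})
	fmt.Println(len(raw)) // 2
}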
This GitHub repo is helpful for getting to know Go internals; it gathers go-internals-related resources in one place:
A collection of articles and videos for understanding Go internals.
A book about the internals of the Go programming language.
Related
Look at this statement, taken from "The examples from Tony Hoare's seminal 1978 paper":
Go's design was strongly influenced by Hoare's paper. Although Go differs significantly from the example language used in the paper, the examples still translate rather easily. The biggest difference apart from syntax is that Go models the conduits of concurrent communication explicitly as channels, while the processes of Hoare's language send messages directly to each other, similar to Erlang. Hoare hints at this possibility in section 7.3, but with the limitation that "each port is connected to exactly one other port in another process", in which case it would be a mostly syntactic difference.
I'm confused.
Processes in Hoare's language communicate directly with each other. Goroutines also communicate directly with each other, but using channels.
So what impact does that limitation have in Go? What is the real difference?
The answer requires a fuller understanding of Hoare's work on CSP. The progression of his work can be summarised in three stages:
Based on Dijkstra's semaphores, Hoare developed monitors. These are as used in Java, except that Java's implementation contains a mistake (see Welch's article Wot No Chickens). It's unfortunate that Java ignored Hoare's later work.
CSP grew out of this. Initially, CSP required direct exchange from process A to process B. This rendezvous approach is used by Ada and Erlang.
CSP was completed by 1985, when his Book was first published. This final version of CSP includes channels as used in Go. Along with Hoare's team at Oxford, David May concurrently developed Occam, a language deliberately intended to blend CSP into a practical programming language. CSP and Occam influenced each other (for example in The Laws of Occam Programming). For years, Occam was only available on the Transputer processor, which had its architecture tailored to suit CSP. More recently, Occam has developed to target other processors and has also absorbed Pi calculus, along with other general synchronisation primitives.
So, to answer the original question, it is probably helpful to compare Go with both CSP and Occam.
Channels: CSP, Go and Occam all have the same semantics for channels. In addition, Go makes it easy to add buffering into channels (Occam does not).
Choices: CSP defines both the internal and external choice. However, both Go and Occam have a single kind of selection: select in Go and ALT in Occam. The fact that there are two kinds of CSP choice proved to be less important in practical languages.
Occam's ALT allows condition guards, but Go's select does not. There is a workaround, sketched below: a channel variable can be set to nil to imitate the same behaviour.
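A small sketch of that nil-channel workaround (the channels and values here are made up; the point is only that select never picks a case whose channel is nil):

package main

import "fmt"

func main() {
	a := make(chan int)
	b := make(chan int)

	go func(c chan int) { c <- 1; close(c) }(a)
	go func(c chan int) { c <- 2; close(c) }(b)

	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil // "guard off" this case: a nil channel is never selected
				continue
			}
			fmt.Println("from a:", v)
		case v, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			fmt.Println("from b:", v)
		}
	}
}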
Mobility: Go allows channel ends to be sent (along with other data) via channels. This creates a dynamically-changing topology and goes beyond what is possible in CSP, but Milner's Pi calculus was developed (out of his CCS) to describe such networks.
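A minimal illustration of this mobility (the request type and the squaring server are invented for the example): a reply channel is itself sent over a channel, so the communication topology is decided at run time.

package main

import "fmt"

type request struct {
	n     int
	reply chan int // the channel end travels with the request
}

func server(requests chan request) {
	for req := range requests {
		req.reply <- req.n * req.n
	}
}

func main() {
	requests := make(chan request)
	go server(requests)

	reply := make(chan int)
	requests <- request{n: 7, reply: reply}
	fmt.Println(<-reply) // 49
}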
Processes: A goroutine is a forked process; it terminates when it wants to and it doesn't have a parent. This is less like CSP / Occam, in which processes are compositional.
An example will help here: firstly Occam (n.b. indentation matters)
SEQ
  PAR
    processA()
    processB()
  processC()
and secondly Go
go processA()
go processB()
processC()
In the Occam case, processC doesn't start until both processA and processB have terminated. In Go, processA and processB fork very quickly, then processC runs straightaway.
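If the Occam-like behaviour is wanted in Go (processC only after processA and processB have terminated), the usual way, shown here as one sketch among several possible, is sync.WaitGroup; the three process functions are empty placeholders:

package main

import "sync"

func processA() {}
func processB() {}
func processC() {}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); processA() }()
	go func() { defer wg.Done(); processB() }()
	wg.Wait() // like Occam's PAR: both must terminate before we continue
	processC()
}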
Shared data: CSP is not really concerned with data directly. But it is interesting to note there is an important difference between Go and Occam concerning shared data. When multiple goroutines share a common set of data variables, race conditions are possible; Go's excellent race detector helps to eliminate problems. But Occam takes a different stance: shared mutable data is prevented at compilation time.
Aliases: related to the above, Go allows many pointers to refer to each data item. Such aliases are disallowed in Occam, so reducing the effort needed to detect race conditions.
The latter two points are less about Hoare's CSP and more about May's Occam. But they are relevant because they directly concern safe concurrent coding.
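For completeness, a minimal example of the kind of data race the Go race detector reports (run it with go run -race to see the warning; the counter is just a placeholder):

package main

import (
	"fmt"
	"sync"
)

func main() {
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter++ // unsynchronized read-modify-write: a data race
		}()
	}
	wg.Wait()
	fmt.Println(counter) // result is unpredictable without synchronization
}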
That's exactly the point: in the example language used in Hoare's initial paper (and also in Erlang), process A talks directly to process B, while in Go, goroutine A talks to channel C and goroutine B listens to channel C. I.e. in Go the channels are explicit while in Hoare's language and Erlang, they are implicit.
See this article for more info.
Recently, I've been working quite intensively with Go's channels, and have been working with concurrency and parallelism for many years, although I could never profess to know everything about this.
I think what you're asking is: what's the subtle difference between sending a message to a channel and sending it directly to the other process? If I understand you, the quick answer is simple.
Sending to a channel gives the opportunity for parallelism/concurrency on both sides of the channel. Beautiful, and scalable.
We live in a concurrent world. Sending a long, continuous stream of messages from A to B (asynchronously) means that B will need to process the messages at pretty much the same pace as A sends them, unless more than one instance of B has the opportunity to process a message taken from the channel, hence sharing the workload.
The good thing about channels is that you can have a number of producer/receiver goroutines which are able to push messages onto the queue, or consume from the queue and process them accordingly.
If you think linearly, like a single-core CPU, concurrency is basically like having a million jobs to do. A single-core CPU can only do one thing at a time, yet it gives the illusion that lots of things are happening at once. While executing some code, the OS often has to wait for something to come back from the network, disk, keyboard, mouse, etc., or for a process that is sleeping for a while; that wait gives the OS the opportunity to do something else in the meantime. It all happens extremely quickly, creating the illusion of parallelism.
Parallelism, on the other hand, is different in that a job can run on a completely different CPU, independent of what's going on with the other CPUs, and therefore doesn't run under the same constraints (although most OSes do a pretty good job of ensuring workloads are evenly distributed across all of their CPUs, with the possible exception of CPU-hungry, uncooperative, non-OS-yielding code, and even then the OS tames it).
The point is, having multi-core CPUs means more parallelism and more concurrency can occur.
Imagine a single queue at a bank which fans out to a number of tellers who can help you. If no customer is currently being served by a teller, that teller takes the next customer and becomes busy, and so on until all the tellers are busy. Whenever a customer walks away from a teller, that teller is able to handle the next customer in the queue.
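In Go terms, that bank queue is a single channel consumed by several worker goroutines; the teller function and the customer numbers below are invented purely to mirror the analogy:

package main

import (
	"fmt"
	"sync"
)

func teller(id int, customers <-chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	for c := range customers { // each teller takes the next waiting customer
		fmt.Printf("teller %d serving customer %d\n", id, c)
	}
}

func main() {
	customers := make(chan int)
	var wg sync.WaitGroup

	for id := 1; id <= 3; id++ { // three tellers share one queue
		wg.Add(1)
		go teller(id, customers, &wg)
	}

	for c := 1; c <= 10; c++ {
		customers <- c
	}
	close(customers) // no more customers; the tellers finish and go home
	wg.Wait()
}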
I am trying to understand the purpose of the WAM at a conceptual, high level, but all the sources I have consulted so far assume that I know more than I currently do, and they approach the issue from the bottom (the details). They start by throwing trees at me, whereas right now I am concerned with seeing the whole forest.
The answers to the following questions would help me in this endeavor:
Pick any group of accomplished, professional Prolog implementers - the SICStus people, the YAP people, the ECLiPSe people - whoever. Now give them the goal of implementing a professional, performant, WAM-based Prolog on an existing virtual machine - say the Erlang VM or the Java VM. To eliminate answers such as "it depends on what your other goals are", let's say that any other goals they have besides the one I just gave are the ones they had when they developed their previous implementations.
Would they (typically) implement a virtual machine (the WAM) inside of a VM (Erlang/JVM), meaning would you have a virtual machine running on top of, or being simulated by, another virtual machine?
If the answer to 1 is no, does that mean that they would try to somehow map the WAM and its associated instructions and execution straight onto the underlying Erlang/Java VM, in order to make the WAM 'disappear' so to speak and only have one VM running (Erlang/JVM)? If so, does this imply that any WAM heaps, stacks, memory operations, register allocations, instructions, etc. would actually be Erlang/Java ones (with some tweaking or massaging)?
If the answer to 1 is yes, does that mean that any WAM heaps, stacks, memory ops, etc. would simply be normal arrays or linked lists in whatever language (Erlang or Java, or even Clojure running on the JVM for that matter) the developers were using?
What I'm trying to get at is this: is the WAM merely an abstraction or tool to help the programmer organize code, understand what is going on, map Prolog to the underlying machine, perhaps provide portability, etc., or is it seen as an (almost) necessary, or at least quite useful, "end in itself" for implementing a Prolog?
Thanks.
I'm excited to see what those more knowledgeable than I are able to say in response to this interesting question, but in the unlikely event that I actually know more than you do, let me outline my understanding. We'll both benefit when the real experts show up and correct me and/or supply truer answers.
The WAM gives you a procedural description of a way of implementing Prolog. Prolog as specified does not say exactly how it must be implemented; it just talks about what behavior should be seen. So the WAM is an implementation approach. I don't think any of the popular systems follow it purely; they each have their own version of it. It's more like an architectural pattern and algorithm sketch than a specification like the Java virtual machine. If it were firmer, the book Warren's Abstract Machine: A Tutorial Reconstruction probably wouldn't need to exist. My (extremely sparse) understanding is that the principal trick is the employment of two stacks: one being the conventional call/return stack of every programming language since Algol, and the other being a special "trail" used for choice points and backtracking. (edit: @false has now arrived and stated that WAM registers are the principal trick, which I had never heard of before, demonstrating my ignorance.) In any case, to implement Prolog you need a correct way of handling the search. Before the WAM, people mostly used ad-hoc methods. I wouldn't be surprised to learn that there are newer and/or more sophisticated tricks, but it's a sound architecture that is widely used and understood.
So the answer to your three-part question is, I think, both. There will be a VM within the VM. The VM within the VM will, of course, be implemented in the appropriate language and will therefore use that language's primitives for handling the invisible parts of the VM (the stack and the trail). Clojure might provide insight into the ways a language can share things with its own implementation language. You would be free to intermix as desired.
The answer to your final question, what you're trying to get at, is that the WAM is merely an abstraction for the purposes you describe and not an end to itself. There is not, for instance, such a thing as "portable WAM bytecode" the way compiled Java becomes portable JVM bytecode which might justify it absent the other benefits. If you have a novel way of implementing Prolog, by all means try it and forget all about WAM.
Go's buffered channel is essentially a thread-safe FIFO queue. (See Is it possible to use Go's buffered channel as a thread-safe queue?)
I am wondering how it's implemented. Is it lock-free like described in Is there such a thing as a lockless queue for multiple read or write threads??
Grepping in Go's src directory (grep -r Lock .|grep chan) gives the following output:
./pkg/runtime/chan.c: Lock;
./pkg/runtime/chan_test.go: m.Lock()
./pkg/runtime/chan_test.go: m.Lock() // wait
./pkg/sync/cond.go: L Locker // held while observing or changing the condition
It doesn't seem to be locking on my machine (macOS, Intel x86_64), though. Is there any official resource to validate this?
If you read the runtime·chansend function in chan.c, you will see that runtime·lock is called before the check to see if the channel is buffered if(c->dataqsiz > 0).
In other words, buffered channels (and all channels in general) use locks.
The reason your search did not find it was you were looking for "Lock" with a capital L. The lock function used for channels is a non-exported C function in the runtime.
You can write lock-free (and even wait-free!) implementations of everything you like. Modern hardware primitives like CMPXCHG are enough to be universally usable. But writing and verifying such algorithms isn't one of the easiest tasks. In addition to that, much faster algorithms might exist: lock-free algorithms are just a very small subset of algorithms in general.
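For illustration, here is the compare-and-swap primitive as Go exposes it in sync/atomic, used in a tiny lock-free increment (the lockFreeIncrement helper is made up for this sketch; real lock-free queues are far more involved):

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func lockFreeIncrement(addr *int64) {
	for {
		old := atomic.LoadInt64(addr)
		if atomic.CompareAndSwapInt64(addr, old, old+1) {
			return // nobody raced us between the load and the swap
		}
		// someone else updated the value first; retry with the new value
	}
}

func main() {
	var n int64
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			lockFreeIncrement(&n)
		}()
	}
	wg.Wait()
	fmt.Println(n) // 100
}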
As far as I remember, Dmitry Vyukov has written a lock-free MPMC (multi-producer/multi-consumer) channel implementation for Go in the past, but the patch was abandoned because of some problems with Go's select statement. Supporting this statement efficiently seems to be really hard.
The main goal of Go's channel type is, however, to provide a high-level concurrency primitive that is easily usable for a broad range of problems. Even developers who aren't experts at concurrent programming should be able to write correct programs that can be easily reviewed and maintained in larger software projects. If you are interested in squeezing out every last bit of performance, you would have to write a specialized queue implementation that suits your needs.
So sometimes I need to write a data structure I can't find on Hackage, or what I find isn't tested or of high enough quality for me to trust, or it's just something I don't want as a dependency. I am reading Okasaki's book right now, and it's quite good at explaining how to design asymptotically fast data structures.
However, I am working specifically with GHC. Constant factors are a big deal for my applications. Memory usage is also a big deal for me. So I have questions specifically about GHC.
In particular
How to maximize sharing of nodes
How to reduce memory footprint
How to avoid space leaks due to improper strictness/laziness
How to get GHC to produce tight inner loops for important sections of code
I've looked around various places on the web, and I have a vague idea of how to work with GHC, for example, looking at core output, using UNPACK pragmas, and the like. But I'm not sure I get it.
So I popped open my favorite data structures library, containers, and looked at the Data.Sequence module. I can't say I understand a lot of what they're doing to make Seq fast.
The first thing that catches my eye is the definition of FingerTree a. I assume that's just me being unfamiliar with finger trees though. The second thing that catches my eye is all the SPECIALIZE pragmas. I have no idea what's going on here, and I'm very curious, as these are littered all over the code.
Many functions also have an INLINE pragma associated with them. I can guess what that means, but how do I make a judgement call on when to INLINE functions?
Things get really interesting around line ~475, in a section headed 'Applicative Construction'. They define a newtype wrapper to represent the Identity monad, they write their own copy of the strict state monad, and they have a function called applicativeTree which apparently is specialized to the Identity monad, and this increases sharing of the function's output. I have no idea what's going on here. What sorcery is being used to increase sharing?
Anyway, I'm not sure there's much to learn from Data.Sequence. Are there other 'model programs' I can read to gain wisdom? I'd really like to know how to soup up my data structures when I really need them to go faster. One thing in particular is writing data structures that make fusion easy, and how to go about writing good fusion rules.
That's a big topic! Most has been explained elsewhere, so I won't try to write a book chapter right here. Instead:
Real World Haskell, ch 25, "Performance" - discusses profiling, simple specialization and unpacking, reading Core, and some optimizations.
Johan Tibell is writing a lot on this topic:
Computing the size of a data structure
Memory footprints of common data types
Faster persistent structures through hashing
Reasoning about laziness
And some things from here:
Reading GHC Core
How GHC does optimization
Profiling for performance
Tweaking GC settings
General improvements
More on unpacking
Unboxing and strictness
And some other things:
Intro to specialization of code and data
Code improvement flags
applicativeTree is quite fancy, but mainly in a way which has to do with FingerTrees in particular, which are quite a fancy data structure themselves. We had some discussion of the intricacies over at cstheory. Note that applicativeTree is written to work over any Applicative. It just so happens that when it is specialized to Id then it can share nodes in a manner that it otherwise couldn't. You can work through the specialization yourself by inlining the Id methods and seeing what happens. Note that this specialization is used in only one place -- the O(log n) replicate function. The fact that the more general function specializes neatly to the constant case is a very clever bit of code reuse, but that's really all.
In general, Sequence teaches more about designing persistent data structures than about all the tricks for eking out performance, I think. Dons' suggestions are of course excellent. I'd also just browse through the source of the really canonical and tuned libs -- Map, IntMap, Set, and IntSet in particular. Along with those, it's worth taking a look at Milan's paper on his improvements to containers.
Codebase size has a lot to do with the complexity of a software system (the larger the codebase, the higher the costs of maintenance and extension). One way to measure codebase size is the simple 'lines of code (LOC)' metric (see also the blog entry 'implications of codebase-size').
I wondered how many of you out there use this metric as part of a retrospective to create awareness (for removing unused functionality or dead code). I think creating awareness that more lines of code mean more complexity in maintenance and extension can be valuable.
I am not taking LOC as a fine-grained metric (at the method or function level), but at the subcomponent or complete-product level.
I find it a bit useless. Some kinds of functions (user input handling, for example) are going to be a bit long-winded no matter what. I'd much rather use some form of complexity metric. Of course, you can combine the two, and/or any other metrics that take your fancy. All you need is a good tool. I use Source Monitor (with which I have no relationship other than being a satisfied user), which is free and can give you both LOC and complexity metrics.
I use SM when writing code to make me notice methods that have become too complex. I then go back and take a look at them. About half the time I say, OK, that NEEDS to be that complicated. What I'd really like is a (free) tool as good as SM but which also supports a tag list of some sort which says "ignore methods X, Y & Z - they need to be complicated". But I guess that could be dangerous, which is why I have so far not suggested the feature to SM's author.
I'm thinking it could be used to reward the team when the LOC decreases (assuming they are still producing valuable software and readable code...).
Not always true. While it is usually preferable to have a low LOC, that doesn't mean the code is any less complex. In fact, it's usually more so. Code that's been optimized to use the minimal number of cycles can be completely unreadable, even to the person who wrote it a week later.
As an example from a recent project, imagine setting individual color values (RGBA) read from a PNG file. You can do this in a bunch of ways, the most compact being just one line using bit shifts. This is a lot less readable and maintainable than another approach, such as using bitfields, which would take a structure definition and many more lines.
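For illustration (in Go rather than the C-style code the answer has in mind, and with an assumed R-G-B-A byte order), compare the compact one-liner with the spelled-out version; fewer lines, but not less complexity:

package main

import "fmt"

func main() {
	var pixel uint32 = 0x11223344 // assumed packing: R, G, B, A from high to low byte

	// One line, minimal LOC:
	r, g, b, a := uint8(pixel>>24), uint8(pixel>>16), uint8(pixel>>8), uint8(pixel)

	// More lines, arguably easier to read and maintain:
	red := uint8((pixel >> 24) & 0xFF)
	green := uint8((pixel >> 16) & 0xFF)
	blue := uint8((pixel >> 8) & 0xFF)
	alpha := uint8(pixel & 0xFF)

	fmt.Println(r, g, b, a)
	fmt.Println(red, green, blue, alpha)
}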
It also depends on the tool doing the LOC calculation. Does it count lines with just a single symbol on them as code (e.g. { and } in C-style languages)? That definitely doesn't make the code more complex, but it does make it more readable.
Just my two cents.
LOC counts are easy to obtain and deliver reasonable information within any non-trivial project. My first step in a new project is always counting the LOC.