Is this a compiler trick in Go?

There is a function in the Go standard library (https://github.com/golang/go/blob/master/src/io/ioutil/ioutil.go#L18-L38). Specifically, this branch seems redundant to me:
if int64(int(capacity)) == capacity {
	buf.Grow(int(capacity))
}
capacity is never modified and the function doesn't seem to be recursive. Is this something to trick the compiler?

The capacity argument to the readAll function in that file has type int64, while the argument to (*bytes.Buffer).Grow is int. So we have to narrow the input int64 to (perhaps) 32 bits, and the result may not be the same value as the original int64. If not, we skip the call (though I might argue that we should try to grow the buffer to the maximum int value, or, probably better, just return an error directly).

It looks like it is there to prevent allocating a very large buffer on architectures where int is smaller than 64 bits. The equality check fails when int is narrower than 64 bits and capacity is larger than the maximum int value.
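Here is a minimal, self-contained sketch of that round-trip check (the helper name is mine, not the standard library's):

package main

import "fmt"

// fitsInInt reports whether an int64 value survives a round trip through
// int. Converting int64 to int truncates on a 32-bit platform, so the
// comparison fails exactly when the value is not representable as an int.
func fitsInInt(capacity int64) bool {
	return int64(int(capacity)) == capacity
}

func main() {
	fmt.Println(fitsInInt(1 << 20))       // true on every platform
	fmt.Println(fitsInInt(5_000_000_000)) // true on 64-bit, false on 32-bit
}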

Related

Why does unsafe.Sizeof return a uintptr?

As per the documentation (https://golang.org/pkg/unsafe/#Sizeof), unsafe.Sizeof returns the size of the given expression in bytes. The size of any given expression could just as well be denoted by a uint32 or uint64. So why does Golang return a uintptr instead? Isn't that confusing? A uintptr is supposed to hold a pointer to some data value, but in this case it is not actually a pointer, it is just a number, right?
There are a lot of good answers in the comments, which boil down to "because that's big enough, yet not too big". I think, though, it might be helpful to view this from a historical perspective, with particular attention to how this all came about in the C programming language.
In very old (pre-standard) C, if you go far back enough in time, there was not even an explicit unsigned integer type. The PDP-11 had:
char, which was 8 bits and signed;
int, which was 16 bits and signed; and
pointers, which were 16 bits and unsigned.
That is:
int i;
int *u;
was how you made two integers, i being signed, and u being unsigned. Setting i to 32767 (0x7fff) and then incrementing it gave you -32768 (0x8000), which gradually increased to -1 (0xffff) and then zero. Setting u to 32767 and then incrementing it gave you 32768, which gradually increased to 65535, and then rolled over to zero.
The lack of distinction between integers and pointers meant that device drivers could read:
struct {
	int csr;
	int blk;
	int bar;
	int bcr;
};
0177440->bcr = count;
0177440->blk = block;
0177440->bar = addr;
0177440->csr = READ | GO;
which might be how one told a device to read some bytes or blocks.
(This is also why struct member names, like st_ino in struct stat, were all prefixed like this: st_ino just meant "some integer offset" and you could use the st_ino member with any pointer, or even with an ordinary variable. The prefix meant you could #include multiple headers without having their struct member names collide.)
All of this turned untenable when C was made to work on 32-bit and other machines. C grew an unsigned integer type, rather than pressing pointers into service as unsigned integers, and Steve Johnson's PCC compiler turned unsigned into a modifier, that could be applied to char and short as well as int. A lot of experimentation occurred. Eventually, in 1989, C was first standardized with most of the syntax and semantics that we have now (though new standards have added new types, and many functions, and so on).
Some of the early C pioneers were involved with creating Go, with particular influence from Ken Thompson. There is a quote on the Wikipedia page that is appropriate here:
When the three of us [Thompson, Rob Pike, and Robert Griesemer] got started, it was pure research. The three of us got together and decided that we hated C++. [laughter] ... [Returning to Go,] we started off with the idea that all three of us had to be talked into every feature in the language, so there was no extraneous garbage put into the language for any reason.
As we see from the early days of C, a pointer-as-integer is a suitable unsigned type that can not only hold any pointer, but, if treated as unsigned, can also hold any object size. A pointer-as-integer is not directly usable as a pointer, of course, and with a GC system and concurrency, we need the language itself to have pointers. But we also need to be able to write the runtime support for the language,1 for which we need integer-ized pointers, which also covers all of our needs for object sizes. So one type, built in to the compiler, covers all the requirements. That is as simple as possible, but no simpler.
1I say "we" as if I had anything to do with it. It's just obvious, once you have implemented a few runtime systems.
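As a small illustration of the point above (this snippet is mine, not from the answer): unsafe.Sizeof yields a uintptr, and the same type is wide enough to hold the bits of any pointer.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	var x int64
	size := unsafe.Sizeof(x)            // type uintptr, value 8
	addr := uintptr(unsafe.Pointer(&x)) // the pointer's bits as an integer
	fmt.Printf("%T %d\n", size, size)   // uintptr 8
	fmt.Printf("%T %#x\n", addr, addr)  // uintptr 0x...
}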

Why does type int exist in Go

There's int, int32, and int64 in Golang.
int32 has 32 bits,
int64 has 64 bits, and
int has 32 or 64 bits depending on the environment.
I think int32 and int64 would be entirely sufficient for a program.
I don't know why the int type should exist; won't it make the behavior of our code harder to predict?
Also, in C++ the types int and long have no fixed length, which I think makes programs fragile. I'm quite confused.
Usually each platform operates best with an integral type of its native size.
By using plain int you tell the compiler that you don't really care which bit width to use, and you let it choose the one it will work fastest with. Note that you always want to write your code so that it is as platform independent as possible...
On the other hand, the int32 / int64 types are useful if you need the integer to be of a specific size. This might matter, for example, if you want to write binary files (don't forget about endianness), or if you have a large array of integers (whose values only go up to 32 bits), where saving half the memory would be significant, etc.
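For the binary-file case, a hedged sketch of why a fixed-size type matters (the record layout here is invented for illustration): encoding/binary accepts only fixed-size data, precisely because the width of a plain int depends on the build target.

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// record uses only fixed-size fields, so its on-disk layout is the same
// on every platform: 8 + 4 = 12 bytes.
type record struct {
	ID    int64
	Count int32
}

func main() {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, record{ID: 7, Count: 42}); err != nil {
		panic(err)
	}
	fmt.Println(buf.Len()) // 12
}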
Usually the size of int is equal to the natural word size of the target, so if your program doesn't care about the size of int (the minimal int range is enough), it can perform well across a variety of compilers.
When you need a specific size, you can of course use int32, etc.
In versions of Go up to 1.0, int was just a synonym for int32, a 32-bit integer. Since int is used for indexing slices, this prevented slices from having more than about 2 billion elements.
In Go 1.1, int was made 64 bits long on 64-bit platforms, and therefore large enough to index any slice that fits in main memory. Therefore:
int32 is the type of 32-bit integers;
int64 is the type of 64-bit integers;
int is the smallest integer type that can index all possible slices.
In practice, int is large enough for most practical uses. Using int64 is only necessary when manipulating values that are larger than the largest possible slice index, while int32 is useful in order to save memory and reduce memory traffic when the larger range is not necessary.
The root cause for this is array addressability. If you came into a situation where you needed to call make([]byte, 5e9), your 32-bit executable would be unable to comply, while your 64-bit executable could carry on. Addressing an array with int64 on a 32-bit build is wasteful; addressing an array with int32 on a 64-bit build is insufficient. With int you can address an array up to its maximum allocation size on both architectures without having to code the distinction with int32/int64.
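A hedged sketch of that distinction (using math.MaxInt, which is available from Go 1.17 on): whether a length like 5e9 can even be expressed as an int depends on the build target.

package main

import (
	"fmt"
	"math"
)

func main() {
	want := int64(5_000_000_000) // the 5e9 elements from the example above
	if want > math.MaxInt {
		// Only reachable on a 32-bit build, where int tops out at ~2.1e9.
		fmt.Println("length not representable as int on this platform")
		return
	}
	fmt.Println("length is representable; make([]byte, int(want)) may still fail for lack of memory")
}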

What are reasons to use a sized or unsigned integer type in go?

Doing the go tour in chapter "Basics/Basic types" it says:
When you need an integer value you should use int unless you have a specific reason to use a sized or unsigned integer type.
What are those specific reasons? Can we name them all?
Other available resources only talk about 32- and 64-bit signed and unsigned types. But why would someone use integer types smaller than 32 bits?
If you cannot think of a reason not to use a standard int, you should use a standard int. In most cases, saving memory isn't worth the extra effort, and you are probably not going to need to store such large values anyway.
If you are storing a very large number of small values, you might be able to save a lot of memory by changing the datatype to a smaller one, such as byte. Storing 8-bit values in an int means we are storing 24 (or 56) bits of zeroes for every 8 bits of data, and thus wasting a lot of space. Of course, you could pack 4 (or maybe 8) bytes inside an int with some bit-shift magic, but why do the hard work when you can let the compiler and the CPU do it for you?
If you are trying to do computations that might not fit inside a 32-bit integer, you might want an int64 instead, or even a big.Int.
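A rough sketch of both points (the sizes in the comments assume a 64-bit platform):

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	// Saving memory: a million values in 0..255.
	asInts := make([]int, 1_000_000)   // ~8 MB of element data on 64-bit
	asBytes := make([]byte, 1_000_000) // ~1 MB of element data
	fmt.Println(len(asInts)*int(unsafe.Sizeof(asInts[0])),
		len(asBytes)*int(unsafe.Sizeof(asBytes[0])))

	// Outgrowing 32 bits: 3 billion wraps around when forced into an int32.
	var n int64 = 3_000_000_000
	fmt.Println(n, int32(n))
}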

What are Go arrays indexed by?

I am writing some 'memory allocator' type code, using an array and indexes rather than pointers. I'm hoping that the size of an index into the array is smaller than a pointer. I care because I am storing 'pointers' as integer indexes into an array rather than as 64-bit pointers.
I can't see anything in the Go spec that says what an array is indexed by. Obviously it's some kind of integer. Passing very large values makes the runtime complain that I can't pass negative numbers, so I'm guessing the value is somehow cast to a signed integer. So is it an int32? I'm guessing it's not an int64, because I didn't touch the top bit (which would have made it a negative number in two's complement).
Arrays may be indexed by any integer type.
The Array types section of the Go Programming Language Specification says that in an array type definition,
The length is part of the array's type and must be a constant
expression that evaluates to a non-negative integer value.
In an index expression such as a[x]:
x must be an integer value and 0 <= x < len(a)
But there is a limitation on the magnitude of an index; the description of Length and capacity says:
The built-in functions len and cap take arguments of various types and
return a result of type int. The implementation guarantees that the
result always fits into an int.
So the declared size of an array, or the index in an index expression, can be of any integer type (int, uint, uintptr, int8, int16, int32, int64, uint8, uint16, uint32, uint64), but it must be non-negative and within the range of type int (which is the same size as either int32 or int64 -- though it's a distinct type from either).
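A quick check of that rule (my own example): any integer type works as an index, as long as the value is in range for the slice.

package main

import "fmt"

func main() {
	s := []string{"a", "b", "c", "d"}

	var i8 int8 = 1
	var u16 uint16 = 2
	var i64 int64 = 3

	fmt.Println(s[i8], s[u16], s[i64]) // b c d
	fmt.Println(s[uint64(0)])          // a

	// With constants, s[-1] is rejected at compile time (a constant index
	// must be non-negative); s[4] compiles but panics at run time because
	// it is out of range for this slice.
}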
It's a very interesting question indeed. I have not found any direct rules in the documentation either; instead, I've found two great discussions in the Go groups.
In the first one, among many things, I've found an answer why indexes are implemented as int - but not uint:
Algorithms can benefit from the ability to express negative offsets
and such. If indexes were unsigned you'd always need a conversion in
these cases.
The second one specifically talks about the possibility (but only the possibility!) of using int64 for large arrays, mentioning the limitation of the len and cap functions (a limitation which is actually documented):
The built-in functions len and cap take arguments of various types and
return a result of type int. The implementation guarantees that the
result always fits into an int.
I do agree, though, that a more... official point of view wouldn't hurt.
Arrays and slices are indexed by ints. An int is defined as being a 32- or 64-bit signed integer. The most common implementation (6g) uses 32-bit integers regardless of the architecture at this point in time. However, it is planned that eventually an int will be 64 bits on 64-bit machines and therefore the same length as a pointer.
The language spec defines 3 implementation dependent numeric types:
uint either 32 or 64 bits
int same size as uint
uintptr an unsigned integer large enough to store the uninterpreted bits of a pointer value
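A small probe of those three implementation-dependent types on whatever platform this runs on (this snippet is mine, not from the spec):

package main

import (
	"fmt"
	"strconv"
	"unsafe"
)

func main() {
	fmt.Println("int bits:     ", strconv.IntSize) // 32 or 64
	fmt.Println("uint bytes:   ", unsafe.Sizeof(uint(0)))
	fmt.Println("uintptr bytes:", unsafe.Sizeof(uintptr(0)))
}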

Hashing of pointer values

Sometimes you need to take a hash of a pointer; not of the object the pointer points to, but of the pointer itself. A lot of the time, folks just punt and use the pointer value as an integer, chop off some high bits to make it fit, and maybe shift out the known-zero bits at the bottom. Thing is, pointer values aren't necessarily well distributed in the address space; in fact, if your allocator is doing its job, there's an excellent chance they're all clustered close together.
So, my question is, has anyone developed hash functions that are good for this? Take a 32- or 64-bit value that's maybe got 12 bits of entropy in it somewhere and spread it evenly across a 32-bit number space.
This page lists several methods that might be of use. One of them, due to Knuth, is as simple as multiplying (in 32 bits) by 2654435761, but "Bad hash results are produced if the keys vary in the upper bits." In the case of pointers, that's a rare enough situation.
Here are some more algorithms, including performance tests.
It seems that the magic words are "integer hashing".
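Here is a sketch of that "integer hashing" idea in Go, using the 32-bit Knuth constant quoted above; the right shift and the assumption of 8-byte alignment are my own choices, not anything prescribed by the answers.

package main

import (
	"fmt"
	"unsafe"
)

// hashPtr multiplies the (truncated) address by Knuth's constant and keeps
// the top bits of the product, which is where a multiplicative hash
// concentrates its mixing.
func hashPtr(p unsafe.Pointer, tableBits uint) uint32 {
	x := uint32(uintptr(p)) // low 32 bits of the address
	x >>= 3                 // drop bits that are zero due to 8-byte alignment
	return (x * 2654435761) >> (32 - tableBits)
}

func main() {
	v := new(int64)
	fmt.Println(hashPtr(unsafe.Pointer(v), 10)) // a bucket index in [0, 1024)
}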
They'll likely exhibit locality, yes, but in the lower bits, which means objects will be distributed throughout the hashtable. You'll only see collisions if one pointer's address differs from another's by a multiple of the hashtable's length.
If you know the lowest possible pointer address (which is often the case if you're working within a large buffer), just convert the pointer to an integer by subtracting the lowest possible pointer value; that could be, for example, the buffer's base address.
Remember: a pointer subtracted from a pointer yields an offset (an integer).
So: don't "chop off" bits; it's much better to convert to an offset.
The resulting offset value is much smaller than a pointer value.
In some cases it may also help to shift the offset right a couple of bits (i.e. divide by 4) before hashing it.
The problem with pointers is often that small blocks of memory are likely to be allocated at the same address (e.g. a block being freed and another block taking the freed block's place).
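A minimal sketch of that offset idea, assuming every object of interest lives inside one large buffer so that the subtraction is meaningful:

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	buf := make([]byte, 1<<16)
	base := uintptr(unsafe.Pointer(&buf[0]))

	p := &buf[1024] // stands in for an object placed inside the buffer
	offset := uintptr(unsafe.Pointer(p)) - base

	// The offsets are small and dense, so even a simple hash does well;
	// dividing by the allocation alignment strips the known-zero low bits.
	fmt.Println(offset, offset/8)
}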
Why not just use an existing hash function?
