How does an untyped constant '\n' get converted into a byte when passed as method arg? - go

I was watching this talk given at FOSDEM '17 about implementing "tail -f" in Go => https://youtu.be/lLDWF59aZAo
In the author's initial example program, he creates a Reader using a file handle, and then uses the ReadString method with delimiter '\n' to read the file line by line and print its contents. I usually use Scanner, so this was new to me.
Program below | Go Playground Link
package main
import (
"bufio"
"fmt"
"log"
"os"
)
func main() {
fileHandle, err := os.Open("someFile.log")
if err != nil {
log.Fatalln(err)
return
}
defer fileHandle.Close()
reader := bufio.NewReader(fileHandle)
for {
line, err := reader.ReadString('\n')
if err != nil {
log.Fatalln(err)
break
}
fmt.Print(line)
}
}
Now, ReadString takes a byte as its delimiter argument[https://golang.org/pkg/bufio/#Reader.ReadString]
So my question is, how in the world did '\n', which is a rune, get converted into a byte? I am not able to get my head around this. Especially since byte is an alias for uint8, and rune is an alias for int32.
I asked the same question in Gophers slack, and was told that '\n' is not a rune, but an untyped constant. If we actually created a rune using '\n' and passed it in, the compilation would fail. This actually confused me a bit more.
I was also given a link to a section of the Go spec regarding Type Identity => https://golang.org/ref/spec#Type_identity
If the program is not supposed to compile if it were an actual rune, why does the compiler allow an untyped constant to go through? Isn't this unsafe behaviour?
My guess is that this works due to a rule in the Assignability section in the Go spec, which says
x is an untyped constant representable by a value of type T.
Since '\n' can indeed be assigned to a variable of type byte, it is therefore converted.
Is my reasoning correct?

TL;DR Yes you are correct but there's something more.
'\n' is an untyped rune constant. It doesn't have a type but a default type which is int32 (rune is an alias for int32). It holds a single byte representing the literal "\n", which is the numeric value 10:
package main
import (
"fmt"
)
func main() {
fmt.Printf("%T %v %c\n", '\n', '\n', '\n') // int32 10 (newline)
}
https://play.golang.org/p/lMjrTFDZUM
The part of the spec that answers your question lies in the § Calls (emphasis mine):
Given an expression f of function type F,
f(a1, a2, … an)
calls f with arguments a1, a2, … an. Except for one
special case, arguments must be single-valued expressions assignable
to the parameter types of F and are evaluated before the function is
called.
"assignable" is the key term here and the part of the spec you quoted explains what it means. As you correctly guessed, among the various rules of assignability, the one that applies here is the following:
x is an untyped constant representable by a value of type T.
In our case this translates to:
'\n' is an untyped (rune) constant representable by a value of type byte
The fact that '\n' is actually converted to a byte when calling ReadString() is more apparent if we try passing an untyped rune constant wider than 1 byte, to a function that expects a byte:
package main
func main() {
foo('α')
}
func foo(b byte) {}
https://play.golang.org/p/W0EUZppWHH
The code above fails with:
tmp/sandbox120896917/main.go:9: constant 945 overflows byte
That's because 'α' is actually 2 bytes, which means it cannot be converted to a value of type byte (the maximum integer a byte can hold is 255 while 'α' is actually 945).
All this is explained in the official blog post, Constants.

Yes, your reading is correct. Spec: Assignability section applies here as the value you want to pass must be assignable to the type of the parameter.
When you pass the value '\n', that is an untyped constant specified by a rune literal. It represents a number equal to the Unicode code of the '\n' character (which is 10 by the way). The rule you quoted applies here:
x is an untyped constant representable by a value of type T.
Constants have a default type, which will be used when a type is "missing" from the context where the value is used. Such an example is the short variable declaration:
r := '\n'
fmt.Printf("%T", r)
The default type of a rune literal is that: rune. The above code prints int32 because the rune type is an alias for int32 (they are "identical", interchangable). Try it on the Go Playground.
Now if you try to pass the variable r to a function which expects a value of type byte, it is a compile time error, because this case matches none of the assignability rules. You need explicit type conversion to make such a case work:
r := '\n'
line, err := reader.ReadString(byte(r))
See related blog posts and questions:
Spec: Constants
The Go Blog: Constants
Defining a variable in Go programming language
Custom type passed to function as a parameter
Why do these two float64s have different values?
Does go compiler's evaluation differ for constant expression and other expression

Related

How does unkeyed literals prevention work

Per this doc
type Point struct {
X, Y float64
_ struct{} // to prevent unkeyed literals
}
For Point{X: 1, Y: 1} everything will be fine, but for Point{1,1} you will get a compile error:
./file.go:1:11: too few values in &Pointer literal
Then I tried it in another data type _ byte and _ func() as below
type Pointer struct {
X, Y int
//_ byte // to prevent unkeyed literals
//_ func() // to prevent unkeyed literals
}
Both of them could prevent unkeyed literals. How does it prevent unkeyed literal? Is _ struct{} more efficient?
Unkeyed structs require you to specify all struct keys; it is an error if you don't specify the value for Y for example:
type Point struct {
X, Y float64
}
_ = Point{1}
// Output:
// ./main.go:8:8: too few values in Point{...}
The _ struct{} field doesn't really prevent unkeyed literals from the same package, as you can still do:
type Point struct {
X, Y float64
_ struct{} // to prevent unkeyed literals
}
_ = Point{1, 2, struct{}{}}
// Ugly and weird, but valid!
But in order to be able to assign values in struct fields from other packages they need to be "exported", that is, start with a capital letter, and _ doesn't, so this is an error:
_ = x.Point{1, 2, struct{}{}}
// Output
// ./main.go:6:28: implicit assignment of unexported field '_' in x.Point literal
There is nothing special about _; you can use anything else that doesn't start with a capital as well, such as noexport struct{} or whatnot.
Why struct{} and not byte or int? Well, those types allocate some amount of memory; for an int it's usually 8 bytes (or 4 bytes on a 32bit system), and byte is an alias for uint8 and allocates one byte.
struct{} on the other hand is an "empty" type (you can't assign anything to it) and won't use any memory. This is a very small optimisation, but if you're going to type something you might as well type struct{}.
Is all of this worth it? In my opinion it's not; if someone wants to use unkeyed struct literals with your library then that's their choice. Many lint tools will already warn on this, including the built-in go vet:
$ go vet main.go
./main.go:8:6: net/mail.Address composite literal uses unkeyed fields
How does it prevent unkeys literals works?
An unkeyed struct literal must specify all fields; by adding a field that cannot be specified from outside the package, it makes it impossible to use this format, so it requires a keyed literal. "Keyed" or "unkeyed" refers to whether the field names appear in the struct literal.
Does _ struct{} more efficient?
Yes, because it has a width of zero, so it doesn't consume any memory. All other types would increase the memory footprint of the struct unnecessarily.

Converting between rune and byte (slice)

Go allows conversion from rune to byte. But the underlying type for rune is int32 (because Go uses UTF-8) and for byte it is uint8, the conversion therefore results in a loss of information. However it is not possible to convert from a rune to []byte.
var b byte = '©'
bs := []byte(string('©'))
fmt.Println(b)
fmt.Println(bs)
// Output
169
[194 169]
Working example
Why does Go allow conversion from rune to byte instead of rune to []byte?
Go supports conversion from rune to byte as it does for all pairs of numeric types. It would be a surprising special case if int32 to byte conversion was not allowed.
But the underlying type for rune is int32 (because Go uses UTF-8)
This misses an important detail: rune is an alias for int32. They are the same type.
It's true that the underlying type is rune is int32, but that's because rune and int32 are the same type and the underlying type of a builtin type is the type itself.
The representation of Unicode code points as int32 values is unrelated to UTF-8 encoding.
the conversion therefore results in a loss of information
Yes, conversions between numeric types can result in loss of information. This is one reason why conversions in Go must be explicit.
Note that the statement var b byte = '©' does not do any conversions. The expression '#' is an untyped constant.
The compiler reports an error if the assignment of an untyped constant results in a loss of information. For example, the statement var b byte = '世' causes a compilation error.
All UTF-8 encoding functionality in the language is related to the string type. The UTF-8 aware conversions are all to or from the string type. The []byte(numericType) conversion could be supported, but that would bring UTF-8 encoding outside of the string type.
The Go authors regret including the string(numericType) conversion because it's not very useful in practice and the conversion is not what some people expect. A library function is a better place for the functionality.
Yes it is possible to convert from a rune to []byte (for example via a byte) and back again.
package main
import "fmt"
func main() {
var b byte = '©'
bs := []byte{b}
fmt.Printf("%T %v\n", b, b) // uint8 169
fmt.Printf("%T %v\n", bs, bs) // []uint8 [169]
s := string(bs[0]) // s := string(b) works too.
r2 := rune(s[0]) // r2 := rune(b) works too.
fmt.Printf("%T %v\n", s, s) // string ©
fmt.Printf("%T %v\n", r2, r2) // int32 169
}
The reason for this behaviour is the same reason why it's legal to do
var b int32
b = 1000000
fmt.Printf("%b\n", b)
fmt.Printf("%b", uint8(b))
// Output:
// 11110100001001000000
// 1000000
You should expect the conversion to loose data when you put data of a type with larger memory footprint into one with a smaller memory footprint.
Also, for encoding a rune you can use EncodeRune which indeed uses a []byte.

Why does %T not print the type of my constant?

I'm just learning golang using the official tour/tutorial. In one of the examples, I see a note that says An untyped constant takes the type needed by its context.
I'm trying this:
package main
import "fmt"
const (
// Create a huge number by shifting a 1 bit left 100 places.
// In other words, the binary number that is 1 followed by 100 zeroes.
Big = 1 << 100
)
func main() {
fmt.Printf("Big is of type %T\n", Big)
}
But this fails when I run it, with:
# command-line-arguments
./compile63.go:12:13: constant 1267650600228229401496703205376 overflows int
Why am I unable to discover the type of the constant this way? (Please note I'm a total noob and quite possibly haven't yet discovered enough about the language to be able to solve this myself).
func Printf
func Printf(format string, a ...interface{}) (n int, err error)
Printf formats according to a format specifier and writes to standard
output. It returns the number of bytes written and any write error
encountered.
The Go Programming Language
Specification
Variable
declarations
A variable declaration creates one or more variables, binds
corresponding identifiers to them, and gives each a type and an
initial value.
If a list of expressions is given, the variables are initialized with
the expressions following the rules for assignments. Otherwise, each
variable is initialized to its zero value.
If a type is present, each variable is given that type. Otherwise,
each variable is given the type of the corresponding initialization
value in the assignment. If that value is an untyped constant, it is
first converted to its default type; if it is an untyped boolean
value, it is first converted to type bool. The predeclared value nil
cannot be used to initialize a variable with no explicit type.
Constants
Constants may be typed or untyped. Literal constants, true, false,
iota, and certain constant expressions containing only untyped
constant operands are untyped.
A constant may be given a type explicitly by a constant declaration or
conversion, or implicitly when used in a variable declaration or an
assignment or as an operand in an expression. It is an error if the
constant value cannot be represented as a value of the respective
type.
An untyped constant has a default type which is the type to which the
constant is implicitly converted in contexts where a typed value is
required, for instance, in a short variable declaration such as i := 0
where there is no explicit type. The default type of an untyped
constant is bool, rune, int, float64, complex128 or string
respectively, depending on whether it is a boolean, rune, integer,
floating-point, complex, or string constant.
Numeric types
A numeric type represents sets of integer or floating-point values.
Some predeclared architecture-independent numeric types:
int32 the set of all signed 32-bit integers (-2147483648 to 2147483647)
int64 the set of all signed 64-bit integers (-9223372036854775808 to 9223372036854775807)
The value of an n-bit integer is n bits wide and represented using
two's complement arithmetic.
There are also some predeclared numeric types with
implementation-specific sizes:
uint either 32 or 64 bits
int same size as uint
Big is an untyped constant. There is no type to discover. It is given a type when used in a variable or assignment. The default type for the untyped constant Big is int.
const Big = 1 << 100
In Go, all arguments are passed by value as if by assignment. For fmt.Printf, the second argument is of type interface{}. Therefore, equivalently,
var arg2 interface{} = Big // constant 1267650600228229401496703205376 overflows int
fmt.Printf("Big is of type %T\n", arg2)
The default type for an untyped integer constant is int. Overflow is a compile-time error.
For example,
package main
import "fmt"
const (
// Create a huge number by shifting a 1 bit left 100 places.
// In other words, the binary number that is 1 followed by 100 zeroes.
Big = 1 << 100
)
func main() {
var arg2 interface{} = Big
fmt.Printf("Big is of type %T\n", arg2)
fmt.Printf("Big is of type %T\n", Big)
}
Playground: https://play.golang.org/p/9tynPTek3wN
Output:
prog.go:12:6: constant 1267650600228229401496703205376 overflows int
prog.go:15:13: constant 1267650600228229401496703205376 overflows int
Reference: The Go Blog: Constants
While it is true that an untyped constant takes the type needed by its context, the type it can assume is limited by the primitives of the language, so a constant that big is not actually usable anywhere in the code, due to the fact that it does not fit even in an uint64. The only use it could have would be that of using it in another constant expression, because otherwise that error will always be thrown.
Note that in Printf (and similar functions), the constant is converted to an interface{}, and thus it takes a type of int by default. For 32-bit machines, you need to do a type conversion first if you have a constant expression which overflows int32.
const i = 1 << 50
fmt.Println(i) // => constant 1125899906842624 overflows int
fmt.Println(int64(i)) // => 1125899906842624
If you want to do proper arithmetic on arbitrarily big numbers, there is a handy package: math/big.
i := big.NewInt(1)
i.Lsh(i, 100)
fmt.Println(i.String())
Playground

golang how does the rune() function work

I came across a function posted online that used the rune() function in golang, but I am having a hard time looking up what it is. I am going through the tutorial and inexperienced with the docs so it is hard to find what I am looking for.
Specifically, I am trying to see why this fails...
fmt.Println(rune("foo"))
and this does not
fmt.Println([]rune("foo"))
rune is a type in Go. It's just an alias for int32, but it's usually used to represent Unicode points. rune() isn't a function, it's syntax for type conversion into rune. Conversions in Go always have the syntax type() which might make them look like functions.
The first bit of code fails because conversion of strings to numeric types isn't defined in Go. However conversion of strings to slices of runes/int32s is defined like this in language specification:
Converting a value of a string type to a slice of runes type yields a
slice containing the individual Unicode code points of the string.
[golang.org]
So your example prints a slice of runes with values 102, 111 and 111
As stated in #Michael's first-rate comment fmt.Println([]rune("foo")) is a conversion of a string to a slice of runes []rune. When you convert from string to []rune, each utf-8 char in that string becomes a Rune. See https://stackoverflow.com/a/51611567/12817546. Similarly, in the reverse conversion, when converted from []rune to string, each rune becomes a utf-8 char in the string. See https://stackoverflow.com/a/51611567/12817546. A []rune can also be set to a byte, float64, int or a bool.
package main
import (
. "fmt"
)
func main() {
r := []rune("foo")
c := []interface{}{byte(r[0]), float64(r[0]), int(r[0]), r, string(r), r[0] != 0}
checkType(c)
}
func checkType(s []interface{}) {
for k, _ := range s {
Printf("%T %v\n", s[k], s[k])
}
}
byte(r[0]) is set to “uint8 102”, float64(r[0]) is set to “float64 102”,int(r[0]) is set to “int 102”, r is the rune” []int32 [102 111 111]”, string(r) prints “string foo”, r[0] != 0 and shows “bool true”.
[]rune to string conversion is supported natively by the spec. See the comment in https://stackoverflow.com/a/46021588/12817546. In Go then a string is a sequence of bytes. However, since multiple bytes can represent a rune code-point, a string value can also contain runes. So, it can be converted to a []rune , or vice versa. See https://stackoverflow.com/a/19325804/12817546.
Note, there are only two built-in type aliases in Go, byte (alias of uint8) and rune (alias of int32). See https://Go101.org/article/type-system-overview.html. Rune literals are just 32-bit integer values. For example, the rune literal 'a' is actually the number "97". See https://stackoverflow.com/a/19311218/12817546. Quotes edited.

what's wrong with golang constant overflows uint64

userid := 12345
did := (userid & ^(0xFFFF << 48))
when compiling this code, I got:
./xxxx.go:511: constant -18446462598732840961 overflows int
Do you know what is the matter with this and how to solve it ?
Thanks.
^(0xFFFF << 48) is an untyped constant, which in go is an arbitrarily large value.
0xffff << 48 is 0xffff000000000000. When you negate it, you get -0xffff000000000001 (since with two's complement, -x = ^x + 1, or ^x = -(x + 1)).
When you write userid := 12345, userid gets the type int. Then when you try to and (&) it with the untyped constant -0xffff000000000001 the compiler figures that this constant needs to be an int. At this point, the compiler complains because the value is too large in magnitude to be an int.
If you're trying to get the constant 0x0000ffffffffffff, then you can use 1<<48 - 1, which (if you've got 64-bit ints), will fit. Since your code will never work if int is 32-bits, then you should probably use int64 in your code rather than int to make it portable.
The blog post https://blog.golang.org/constants explains how constants work, and some background on why they are the way they are.
The Go Programming Language Specification
Constants
Numeric constants represent values of arbitrary precision and do not
overflow.
Constants may be typed or untyped.
A constant may be given a type explicitly by a constant declaration or
conversion, or implicitly when used in a variable declaration or an
assignment or as an operand in an expression. It is an error if the
constant value cannot be represented as a value of the respective
type.
An untyped constant has a default type which is the type to which the
constant is implicitly converted in contexts where a typed value is
required, for instance, in a short variable declaration such as i := 0
where there is no explicit type. The default type of an untyped
constant is bool, rune, int, float64, complex128 or string
respectively, depending on whether it is a boolean, rune, integer,
floating-point, complex, or string constant.
Numeric types
int is an implementation-specific size, either 32 or 64 bits.
userid is of type int. For example,
package main
import "fmt"
func main() {
userid := 12345
did := uint64(userid) & ^uint64(0xFFFF<<48)
fmt.Println(userid, did)
}
Output:
12345 12345

Resources