String literal in struct is longer than the same string outside struct - go

I'm running into a really weird issue in my Go code. It seems that identical strings, one declared inside a struct and one outside, have different lengths when used. The following code shows an example:
type evaluateTest struct {
name string
expected int
fen string
}
func TestEvaluate(t *testing.T) {
cases := []evaluateTest{
{"Pawn testing", 330, "​​8/8/8/8/4P3/3P4/2P5/8 w KQkq - 0 11"},
}
for _, test := range cases {
outside := "​8/8/8/8/4P3/3P4/2P5/8 w KQkq - 0 11"
fmt.Printf("String in struct has length %v\n", len(test.fen))
fmt.Printf("String outside struct has length %v\n", len(outside))
This outputs:
String in struct has length 41
String outside struct has length 38
Looping through the string and printing the character codes gives junk characters in the first three positions (decimal 226, 128, 139) of the string in the struct, and none in the one declared outside.
I'm really at a loss as to what's going on here. Any help is very appreciated.

One string starts with two zero width spaces (\u200b). The other starts with one.
In situations like this, it's helpful to print using %q to see what's going on. See the playground example.

Related

golang, £ char causing weird  character

I have a function that generates a random string from a string of valid characters. I'm occasionally getting weird results when it selects a £
I've reproduced it to the following minimal example:
func foo() string {
validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~#:!£$%^&*"
var result strings.Builder
for i := 0; i < len(validChars); i++ {
currChar := validChars[i]
result.WriteString(string(currChar))
}
return result.String()
}
I would expect this to return
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~#:!£$%^&*
But it doesn't, it produces
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~#:!£$%^&*
^
where did you come from ?
if I take the £ sign out of the original validChars string, that weird A goes away.
func foo() string {
validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~#:!$%^&*"
var result strings.Builder
for i := 0; i < len(validChars); i++ {
currChar := validChars[i]
result.WriteString(string(currChar))
}
return result.String()
}
This produces
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~#:!$%^&*
A string is a type alias for []byte. Your mental model of a string is probably that it consists of a slice of characters - or, as we call it in Go: a slice of rune.
For many runes in your validChars string this is fine, as they are part of the ASCII chars and can therefore be represented in a single byte in UTF-8. However, the £ rune is represented as 2 bytes.
Now if we consider a string £, it consists of 1 rune but 2 bytes. As I've mentioned, a string is really just a []byte. If we grab the first element like you are effectively doing in your sample, we will only get the first of the two bytes that represent £. When you convert it back to a string, it gives you an unexpected rune.
The fix for your problem is to first convert string validChars to a []rune. Then, you can access its individual runes (rather than bytes) by index, and foo will work as expected. You can see it in action in this playground.
Also note that len(validChars) will give you the count of bytes in the string. To get the count of runes, use utf8.RuneCountInString instead.
Finally, here's a blog post from Rob Pike on the subject that you may find interesting.

How does an untyped constant '\n' get converted into a byte when passed as method arg?

I was watching this talk given at FOSDEM '17 about implementing "tail -f" in Go => https://youtu.be/lLDWF59aZAo
In the author's initial example program, he creates a Reader using a file handle, and then uses the ReadString method with delimiter '\n' to read the file line by line and print its contents. I usually use Scanner, so this was new to me.
Program below | Go Playground Link
package main
import (
"bufio"
"fmt"
"log"
"os"
)
func main() {
fileHandle, err := os.Open("someFile.log")
if err != nil {
log.Fatalln(err)
return
}
defer fileHandle.Close()
reader := bufio.NewReader(fileHandle)
for {
line, err := reader.ReadString('\n')
if err != nil {
log.Fatalln(err)
break
}
fmt.Print(line)
}
}
Now, ReadString takes a byte as its delimiter argument[https://golang.org/pkg/bufio/#Reader.ReadString]
So my question is, how in the world did '\n', which is a rune, get converted into a byte? I am not able to get my head around this. Especially since byte is an alias for uint8, and rune is an alias for int32.
I asked the same question in Gophers slack, and was told that '\n' is not a rune, but an untyped constant. If we actually created a rune using '\n' and passed it in, the compilation would fail. This actually confused me a bit more.
I was also given a link to a section of the Go spec regarding Type Identity => https://golang.org/ref/spec#Type_identity
If the program is not supposed to compile if it were an actual rune, why does the compiler allow an untyped constant to go through? Isn't this unsafe behaviour?
My guess is that this works due to a rule in the Assignability section in the Go spec, which says
x is an untyped constant representable by a value of type T.
Since '\n' can indeed be assigned to a variable of type byte, it is therefore converted.
Is my reasoning correct?
TL;DR Yes you are correct but there's something more.
'\n' is an untyped rune constant. It doesn't have a type but a default type which is int32 (rune is an alias for int32). It holds a single byte representing the literal "\n", which is the numeric value 10:
package main
import (
"fmt"
)
func main() {
fmt.Printf("%T %v %c\n", '\n', '\n', '\n') // int32 10 (newline)
}
https://play.golang.org/p/lMjrTFDZUM
The part of the spec that answers your question lies in the § Calls (emphasis mine):
Given an expression f of function type F,
f(a1, a2, … an)
calls f with arguments a1, a2, … an. Except for one
special case, arguments must be single-valued expressions assignable
to the parameter types of F and are evaluated before the function is
called.
"assignable" is the key term here and the part of the spec you quoted explains what it means. As you correctly guessed, among the various rules of assignability, the one that applies here is the following:
x is an untyped constant representable by a value of type T.
In our case this translates to:
'\n' is an untyped (rune) constant representable by a value of type byte
The fact that '\n' is actually converted to a byte when calling ReadString() is more apparent if we try passing an untyped rune constant wider than 1 byte, to a function that expects a byte:
package main
func main() {
foo('α')
}
func foo(b byte) {}
https://play.golang.org/p/W0EUZppWHH
The code above fails with:
tmp/sandbox120896917/main.go:9: constant 945 overflows byte
That's because 'α' is actually 2 bytes, which means it cannot be converted to a value of type byte (the maximum integer a byte can hold is 255 while 'α' is actually 945).
All this is explained in the official blog post, Constants.
Yes, your reading is correct. Spec: Assignability section applies here as the value you want to pass must be assignable to the type of the parameter.
When you pass the value '\n', that is an untyped constant specified by a rune literal. It represents a number equal to the Unicode code of the '\n' character (which is 10 by the way). The rule you quoted applies here:
x is an untyped constant representable by a value of type T.
Constants have a default type, which will be used when a type is "missing" from the context where the value is used. Such an example is the short variable declaration:
r := '\n'
fmt.Printf("%T", r)
The default type of a rune literal is that: rune. The above code prints int32 because the rune type is an alias for int32 (they are "identical", interchangable). Try it on the Go Playground.
Now if you try to pass the variable r to a function which expects a value of type byte, it is a compile time error, because this case matches none of the assignability rules. You need explicit type conversion to make such a case work:
r := '\n'
line, err := reader.ReadString(byte(r))
See related blog posts and questions:
Spec: Constants
The Go Blog: Constants
Defining a variable in Go programming language
Custom type passed to function as a parameter
Why do these two float64s have different values?
Does go compiler's evaluation differ for constant expression and other expression

golang how does the rune() function work

I came across a function posted online that used the rune() function in golang, but I am having a hard time looking up what it is. I am going through the tutorial and inexperienced with the docs so it is hard to find what I am looking for.
Specifically, I am trying to see why this fails...
fmt.Println(rune("foo"))
and this does not
fmt.Println([]rune("foo"))
rune is a type in Go. It's just an alias for int32, but it's usually used to represent Unicode points. rune() isn't a function, it's syntax for type conversion into rune. Conversions in Go always have the syntax type() which might make them look like functions.
The first bit of code fails because conversion of strings to numeric types isn't defined in Go. However conversion of strings to slices of runes/int32s is defined like this in language specification:
Converting a value of a string type to a slice of runes type yields a
slice containing the individual Unicode code points of the string.
[golang.org]
So your example prints a slice of runes with values 102, 111 and 111
As stated in #Michael's first-rate comment fmt.Println([]rune("foo")) is a conversion of a string to a slice of runes []rune. When you convert from string to []rune, each utf-8 char in that string becomes a Rune. See https://stackoverflow.com/a/51611567/12817546. Similarly, in the reverse conversion, when converted from []rune to string, each rune becomes a utf-8 char in the string. See https://stackoverflow.com/a/51611567/12817546. A []rune can also be set to a byte, float64, int or a bool.
package main
import (
. "fmt"
)
func main() {
r := []rune("foo")
c := []interface{}{byte(r[0]), float64(r[0]), int(r[0]), r, string(r), r[0] != 0}
checkType(c)
}
func checkType(s []interface{}) {
for k, _ := range s {
Printf("%T %v\n", s[k], s[k])
}
}
byte(r[0]) is set to “uint8 102”, float64(r[0]) is set to “float64 102”,int(r[0]) is set to “int 102”, r is the rune” []int32 [102 111 111]”, string(r) prints “string foo”, r[0] != 0 and shows “bool true”.
[]rune to string conversion is supported natively by the spec. See the comment in https://stackoverflow.com/a/46021588/12817546. In Go then a string is a sequence of bytes. However, since multiple bytes can represent a rune code-point, a string value can also contain runes. So, it can be converted to a []rune , or vice versa. See https://stackoverflow.com/a/19325804/12817546.
Note, there are only two built-in type aliases in Go, byte (alias of uint8) and rune (alias of int32). See https://Go101.org/article/type-system-overview.html. Rune literals are just 32-bit integer values. For example, the rune literal 'a' is actually the number "97". See https://stackoverflow.com/a/19311218/12817546. Quotes edited.

string() does what I hoped strconv.Itoa() would do

I have a short program that converts a few binary numbers into their ASCII equivalents. I tried translating this into go today and found that strconv.Itoa() doesn't work as I expected.
// translate Computer History Museum t-shirt
// http://i.ebayimg.com/images/g/qksAAOSwaB5XjsI1/s-l300.jpg
package main
import (
"fmt"
"strconv"
)
func main() {
var binaryStrings [3]string
binaryStrings = [3]string{"01000011","01001000","01001101"}
for _,bin := range binaryStrings {
if decimal, err := strconv.ParseInt(bin, 2, 64); err != nil {
fmt.Println(err)
} else {
letter := strconv.Itoa(int(decimal))
fmt.Println(bin, decimal, letter, string(decimal))
}
}
}
which outputs
$ go run chm-tshirt.go
01000011 67 67 C
01001000 72 72 H
01001101 77 77 M
So it seems like string() is doing what I thought strconv.Itoa() would do. I was expecting the third column to show what I get in the fourth column. Is this a bug or what am I missing?
strconv.Itoa formats an integer as a decimal string. Example: strconv.Itoa(65) and strconv.Itoa('A') return the string "65".
string(intValue) yields a string containing the UTF-8 representation of the integer. Example: string('A') and string(65) evaluate to the string "A".
Experience has shown that many people erroneously expect string(intValue) to return the decimal representation of the integer value. Because this expectation is so common, the Go 1.15 version of go vet warns about string(intValue) conversions when the type of the integer value is not rune or byte (read details here).

How to check if Scanln throws an error in Golang

I am new to Go. I've been searching for answers and I know there really is one I just haven't found it.
To better explain my question, here is my code:
func main() {
...
inputs := new(Inputs)
fmt.Println("Input two numbers: ")
fmt.Scanln(&inputs.A)
fmt.Scanln(&inputs.B)
fmt.Println("Sum is:", inputs.A + inputs.B)
}
And here is my struct:
type Inputs struct {
A, B int
}
If I will input '123' for Input A and another '123' for Input B, I will have an output of "Sum is: 246". But if I will mistakenly input '123j' , it will no longer work since A and B accept int(s) only.
Now, how to catch the panic from fmt.Scanln or is there a way? Thanks in advance.
Scanln returns values ... don't ignore them.
You're ignoring two important return values. The count of the scan and an error .. if there was one.
n, err := fmt.Scanln(&inputs.A)
...will give you what you need to check. err will tell you that a newline was expected and wasn't found .. because you've tried to store the value in an int .. and it errors when the last character in the available value isn't a newline.

Resources