Golang - ToUpper() on a single byte? - go

I have a []byte, b, and I want to select a single byte, b[pos] and change it too upper case (and then lower case) The bytes type has a method called ToUpper(). How can I use this for a single byte?
Calling ToUpper on single Byte
OneOfOne gave the most efficient (when calling thousands of times), I use
val = byte(unicode.ToUpper(rune(b[pos])))
in order to find the byte and change the value
b[pos] = val
Checking if byte is Upper
Sometimes, instead of changing the case of a byte, I want to check if a byte is upper or lower case; All the upper case roman-alphabet bytes are lower than the value of the lower case bytes.
func (b Board) isUpper(x int) bool {
return b.board[x] < []byte{0x5a}[0]
}

For a single byte/rune, you can use unicode.ToUpper.
b[pos] = byte(unicode.ToUpper(rune(b[pos])))

I want to remind OP that bytes.ToUpper() operates on unicode code points encoded using UTF-8 in a byte slice while unicode.ToUpper() operates on a single unicode code point.
By asking to convert a single byte to upper case, OP is implying that the "b" byte slice contains something other than UTF-8, perhaps ASCII-7 or some 8-bit encoding such as ISO Latin-1 (e.g.). In that case OP needs to write an ISO Latin-1 (e.g.) ToUpper() function or OP must convert the ISO Latin-1 (e.g.) bytes to UTF-8 or unicode before using the bytes.ToUpper() or unicode.ToUpper() function.
Anything less creates a pending bug. Neither of the previously mentioned functions will properly convert all possible ISO Latin-1 (e.g.) encoded characters to upper case.

Use the following code to test if an element of the board is an ASCII uppercase letter:
func (b Board) isUpper(x int) bool {
v := b.board[x]
return 'A' <= v && v <= 'Z'
}
If the application only needs to distinguish between upper and lowercase letters, then there's no need for the lower bound test:
func (b Board) isUpper(x int) bool {
return b.board[x] <= 'Z'
}
The code in this answer improves on the code in the question in a few ways:
The code in the answer returns the correct value for a board element containing 'Z' (run playground example below for demonstration).
'Z' and 0x85 are the same value, but the code is easier to understand with 'Z'.
It's simpler to compare directly with the value 'Z'. No need to create a slice.
playground example
Edit: Revamped answer based on new information in the question since time of my original answer.

You can use bytes.ToUpper, you just need to deal with making the input a slice,
and making the output a byte:
package main
import "bytes"
func main() {
b, pos := []byte("north"), 1
b[pos] = bytes.ToUpper(b)[pos]
println(string(b) == "nOrth")
}

Related

Does the conversion from string to rune slice make a copy?

I'm teaching myself Go from a C background.
The code below works as I expect (the first two Printf() will access bytes, the last two Printf() will access codepoints).
What I am not clear is if this involves any copying of data.
package main
import "fmt"
var a string
func main() {
a = "èe"
fmt.Printf("%d\n", a[0])
fmt.Printf("%d\n", a[1])
fmt.Println("")
fmt.Printf("%d\n", []rune(a)[0])
fmt.Printf("%d\n", []rune(a)[1])
}
In other words:
does []rune("string") create an array of runes and fill it with the runes corresponding to "string", or it's just the compiler that figures out how to get runes from the string bytes?
It is not possible to turn []uint8 (i.e. a string) into []int32 (an alias for []rune) without allocating an array.
Also, strings are immutable in Go but slices are not, so the conversion to both []byte and []rune must copy the string's bytes in some way or another.
It involves a copy because:
strings are immutable; if the conversion []rune(s) didn't make a copy, you would be able to index the rune slice and change the string contents
a string value is a "(possibly empty) sequence of bytes", where byte is an alias of uint8, whereas a rune is a "an integer value identifying a Unicode code point" and an alias of int32. The types are not identical and even the lengths may not be the same:
a = "èe"
r := []rune(a)
fmt.Println(len(a)) // 3 (3 bytes)
fmt.Println(len(r)) // 2 (2 Unicode code points)

How does an untyped constant '\n' get converted into a byte when passed as method arg?

I was watching this talk given at FOSDEM '17 about implementing "tail -f" in Go => https://youtu.be/lLDWF59aZAo
In the author's initial example program, he creates a Reader using a file handle, and then uses the ReadString method with delimiter '\n' to read the file line by line and print its contents. I usually use Scanner, so this was new to me.
Program below | Go Playground Link
package main
import (
"bufio"
"fmt"
"log"
"os"
)
func main() {
fileHandle, err := os.Open("someFile.log")
if err != nil {
log.Fatalln(err)
return
}
defer fileHandle.Close()
reader := bufio.NewReader(fileHandle)
for {
line, err := reader.ReadString('\n')
if err != nil {
log.Fatalln(err)
break
}
fmt.Print(line)
}
}
Now, ReadString takes a byte as its delimiter argument[https://golang.org/pkg/bufio/#Reader.ReadString]
So my question is, how in the world did '\n', which is a rune, get converted into a byte? I am not able to get my head around this. Especially since byte is an alias for uint8, and rune is an alias for int32.
I asked the same question in Gophers slack, and was told that '\n' is not a rune, but an untyped constant. If we actually created a rune using '\n' and passed it in, the compilation would fail. This actually confused me a bit more.
I was also given a link to a section of the Go spec regarding Type Identity => https://golang.org/ref/spec#Type_identity
If the program is not supposed to compile if it were an actual rune, why does the compiler allow an untyped constant to go through? Isn't this unsafe behaviour?
My guess is that this works due to a rule in the Assignability section in the Go spec, which says
x is an untyped constant representable by a value of type T.
Since '\n' can indeed be assigned to a variable of type byte, it is therefore converted.
Is my reasoning correct?
TL;DR Yes you are correct but there's something more.
'\n' is an untyped rune constant. It doesn't have a type but a default type which is int32 (rune is an alias for int32). It holds a single byte representing the literal "\n", which is the numeric value 10:
package main
import (
"fmt"
)
func main() {
fmt.Printf("%T %v %c\n", '\n', '\n', '\n') // int32 10 (newline)
}
https://play.golang.org/p/lMjrTFDZUM
The part of the spec that answers your question lies in the § Calls (emphasis mine):
Given an expression f of function type F,
f(a1, a2, … an)
calls f with arguments a1, a2, … an. Except for one
special case, arguments must be single-valued expressions assignable
to the parameter types of F and are evaluated before the function is
called.
"assignable" is the key term here and the part of the spec you quoted explains what it means. As you correctly guessed, among the various rules of assignability, the one that applies here is the following:
x is an untyped constant representable by a value of type T.
In our case this translates to:
'\n' is an untyped (rune) constant representable by a value of type byte
The fact that '\n' is actually converted to a byte when calling ReadString() is more apparent if we try passing an untyped rune constant wider than 1 byte, to a function that expects a byte:
package main
func main() {
foo('α')
}
func foo(b byte) {}
https://play.golang.org/p/W0EUZppWHH
The code above fails with:
tmp/sandbox120896917/main.go:9: constant 945 overflows byte
That's because 'α' is actually 2 bytes, which means it cannot be converted to a value of type byte (the maximum integer a byte can hold is 255 while 'α' is actually 945).
All this is explained in the official blog post, Constants.
Yes, your reading is correct. Spec: Assignability section applies here as the value you want to pass must be assignable to the type of the parameter.
When you pass the value '\n', that is an untyped constant specified by a rune literal. It represents a number equal to the Unicode code of the '\n' character (which is 10 by the way). The rule you quoted applies here:
x is an untyped constant representable by a value of type T.
Constants have a default type, which will be used when a type is "missing" from the context where the value is used. Such an example is the short variable declaration:
r := '\n'
fmt.Printf("%T", r)
The default type of a rune literal is that: rune. The above code prints int32 because the rune type is an alias for int32 (they are "identical", interchangable). Try it on the Go Playground.
Now if you try to pass the variable r to a function which expects a value of type byte, it is a compile time error, because this case matches none of the assignability rules. You need explicit type conversion to make such a case work:
r := '\n'
line, err := reader.ReadString(byte(r))
See related blog posts and questions:
Spec: Constants
The Go Blog: Constants
Defining a variable in Go programming language
Custom type passed to function as a parameter
Why do these two float64s have different values?
Does go compiler's evaluation differ for constant expression and other expression

golang how does the rune() function work

I came across a function posted online that used the rune() function in golang, but I am having a hard time looking up what it is. I am going through the tutorial and inexperienced with the docs so it is hard to find what I am looking for.
Specifically, I am trying to see why this fails...
fmt.Println(rune("foo"))
and this does not
fmt.Println([]rune("foo"))
rune is a type in Go. It's just an alias for int32, but it's usually used to represent Unicode points. rune() isn't a function, it's syntax for type conversion into rune. Conversions in Go always have the syntax type() which might make them look like functions.
The first bit of code fails because conversion of strings to numeric types isn't defined in Go. However conversion of strings to slices of runes/int32s is defined like this in language specification:
Converting a value of a string type to a slice of runes type yields a
slice containing the individual Unicode code points of the string.
[golang.org]
So your example prints a slice of runes with values 102, 111 and 111
As stated in #Michael's first-rate comment fmt.Println([]rune("foo")) is a conversion of a string to a slice of runes []rune. When you convert from string to []rune, each utf-8 char in that string becomes a Rune. See https://stackoverflow.com/a/51611567/12817546. Similarly, in the reverse conversion, when converted from []rune to string, each rune becomes a utf-8 char in the string. See https://stackoverflow.com/a/51611567/12817546. A []rune can also be set to a byte, float64, int or a bool.
package main
import (
. "fmt"
)
func main() {
r := []rune("foo")
c := []interface{}{byte(r[0]), float64(r[0]), int(r[0]), r, string(r), r[0] != 0}
checkType(c)
}
func checkType(s []interface{}) {
for k, _ := range s {
Printf("%T %v\n", s[k], s[k])
}
}
byte(r[0]) is set to “uint8 102”, float64(r[0]) is set to “float64 102”,int(r[0]) is set to “int 102”, r is the rune” []int32 [102 111 111]”, string(r) prints “string foo”, r[0] != 0 and shows “bool true”.
[]rune to string conversion is supported natively by the spec. See the comment in https://stackoverflow.com/a/46021588/12817546. In Go then a string is a sequence of bytes. However, since multiple bytes can represent a rune code-point, a string value can also contain runes. So, it can be converted to a []rune , or vice versa. See https://stackoverflow.com/a/19325804/12817546.
Note, there are only two built-in type aliases in Go, byte (alias of uint8) and rune (alias of int32). See https://Go101.org/article/type-system-overview.html. Rune literals are just 32-bit integer values. For example, the rune literal 'a' is actually the number "97". See https://stackoverflow.com/a/19311218/12817546. Quotes edited.

Why is there no type mismatch error?

I define the number to be input by user as var input float64 and I input an integer and I would expect to get an error but I get err = <nil>. What am I missing?
package main
import (
"fmt"
)
func main() {
var input float64
fmt.Print("Enter a number:")
n, err := fmt.Scanf("%f\n", &input)
fmt.Printf("err = %v\n", err)
if err != nil {
fmt.Printf("%v is not a float - exiting with error\n", input, err)
return
}
fmt.Printf("n is %v:", n)
}
This is the output:
C:\Go\src\play\exercise>go run exercise2.go
Enter a number to take its square root: 1
err = <nil>
n is 1:
The Go Programming Language Specification
Conversions
Specific rules apply to (non-constant) conversions between numeric
types.
Conversions between numeric types
For the conversion of non-constant numeric values, the following rules
apply:
When converting between integer types, if the value is a signed
integer, it is sign extended to implicit infinite precision;
otherwise it is zero extended. It is then truncated to fit in the
result type's size. For example, if v := uint16(0x10F0), then
uint32(int8(v)) == 0xFFFFFFF0. The conversion always yields a valid
value; there is no indication of overflow.
When converting a floating-point number to an integer, the fraction
is discarded (truncation towards zero).
When converting an integer or floating-point number to a
floating-point type, or a complex number to another complex type,
the result value is rounded to the precision specified by the
destination type. For instance, the value of a variable x of type
float32 may be stored using additional precision beyond that of an
IEEE-754 32-bit number, but float32(x) represents the result of
rounding x's value to 32-bit precision. Similarly, x + 0.1 may use
more than 32 bits of precision, but float32(x + 0.1) does not.
In all non-constant conversions involving floating-point or complex
values, if the result type cannot represent the value the conversion
succeeds but the result value is implementation-dependent.
In Go, you can convert from an integer type to a floating-point type. Therefore, there is no reason not to be forgiving.
Robustness principle
In computing, the robustness principle is a general design guideline
for software:
Be conservative in what you do, be liberal in what you accept from others
(often reworded as "Be conservative in what you send, be
liberal in what you accept").
The principle is also known as Postel's law, after Internet pioneer
Jon Postel, who wrote in an early specification of the Transmission
Control Protocol that:1
TCP implementations should follow a general principle of robustness: be conservative in > what you do, be liberal in what you
accept from others.
In other words, code that sends commands or data to other machines (or
to other programs on the same machine) should conform completely to
the specifications, but code that receives input should accept
non-conformant input as long as the meaning is clear.
Scanf scans text read from standard input, storing successive space-separated values into successive arguments as determined by the format. It returns the number of items successfully scanned.
Scanf returns the number of things it read. In this case n == 1 because... you entered one token followed by a newline. Presumably, you want the value of input, not n.

How to be definite about the number of whitespace fmt.Fscanf consumes?

I am trying to implement a PPM decoder in Go. PPM is an image format that consists of a plaintext header and then some binary image data. The header looks like this (from the spec):
Each PPM image consists of the following:
A "magic number" for identifying the file type. A ppm image's magic number is the two characters "P6".
Whitespace (blanks, TABs, CRs, LFs).
A width, formatted as ASCII characters in decimal.
Whitespace.
A height, again in ASCII decimal.
Whitespace.
The maximum color value (Maxval), again in ASCII decimal. Must be less than 65536 and more than zero.
A single whitespace character (usually a newline).
I try to decode this header with the fmt.Fscanf function. The following call to
fmt.Fscanf parses the header (not addressing the caveat explained below):
var magic string
var width, height, maxVal uint
fmt.Fscanf(input,"%2s %d %d %d",&magic,&width,&height,&maxVal)
The documentation of fmt states:
Note: Fscan etc. can read one character (rune) past the input they
return, which means that a loop calling a scan routine may skip some
of the input. This is usually a problem only when there is no space
between input values. If the reader provided to Fscan implements
ReadRune, that method will be used to read characters. If the reader
also implements UnreadRune, that method will be used to save the
character and successive calls will not lose data. To attach ReadRune
and UnreadRune methods to a reader without that capability, use
bufio.NewReader.
As the very next character after the final whitespace is already the beginning of the image data, I have to be certain about how many whitespace fmt.Fscanf did consume after reading MaxVal. My code must work on whatever reader the was provided by the caller and parts of it must not read past the end of the header, therefore wrapping stuff into a buffered reader is not an option; the buffered reader might read more from the input than I actually want to read.
Some testing suggests that parsing a dummy character at the end solves the issues:
var magic string
var width, height, maxVal uint
var dummy byte
fmt.Fscanf(input,"%2s %d %d %d%c",&magic,&width,&height,&maxVal,&dummy)
Is that guaranteed to work according to the specification?
No, I would not consider that safe. While it works now, the documentation states that the function reserves the right to read past the value by one character unless you have an UnreadRune() method.
By wrapping your reader in a bufio.Reader, you can ensure the reader has an UnreadRune() method. You will then need to read the final whitespace yourself.
buf := bufio.NewReader(input)
fmt.Fscanf(buf,"%2s %d %d %d",&magic,&width,&height,&maxVal)
buf.ReadRune() // remove next rune (the whitespace) from the buffer.
Edit:
As we discussed in the chat, you can assume the dummy char method works and then write a test so you know when it stops working. The test can be something like:
func TestFmtBehavior(t *testing.T) {
// use multireader to prevent r from implementing io.RuneScanner
r := io.MultiReader(bytes.NewReader([]byte("data ")))
n, err := fmt.Fscanf(r, "%s%c", new(string), new(byte))
if n != 2 || err != nil {
t.Error("failed scan", n, err)
}
// the dummy char read 1 extra char past "data".
// one byte should still remain
if n, err := r.Read(make([]byte, 5)); n != 1 {
t.Error("assertion failed", n, err)
}
}

Resources