How can I encode my string as ASCII byte array?
If you're looking for a conversion, just do byteArray := []byte(myString)
The language spec details conversions between strings and certain slice types ([]byte for bytes, []rune for Unicode code points).
You may not need to do anything. If you only need to read bytes of a string, you can do that directly:
c := s[3]
cthom06's answer gives you a byte slice you can manipulate:
b := []byte(s)
b[3] = c
Then you can create a new string from the modified byte slice if you like:
s = string(b)
But you mentioned ASCII. If your string is ASCII to begin with, then you are done. If it contains something else, you have more to deal with and might want to post another question with more details about your data.
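Whether a string is "ASCII to begin with" can be checked before converting. The following is only a minimal sketch: isASCII is a hypothetical helper name (not a standard library function), and it treats any byte above unicode.MaxASCII (0x7F) as non-ASCII.

```go
package main

import (
	"fmt"
	"unicode"
)

// isASCII reports whether every byte of s is in the 7-bit ASCII range.
// Hypothetical helper for illustration; not part of the standard library.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] > unicode.MaxASCII {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"hello", "Rhön"} {
		// []byte(s) copies the string's bytes; for ASCII input there is
		// exactly one byte per character.
		fmt.Printf("%q ascii=%v bytes=%v\n", s, isASCII(s), []byte(s))
	}
}
```

If isASCII returns false, the byte slice still holds the string's UTF-8 bytes, so what to do next depends on the data.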
I have a function that generates a random string from a string of valid characters. I'm occasionally getting weird results when it selects a £
I've reproduced it to the following minimal example:
func foo() string {
validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~#:!£$%^&*"
var result strings.Builder
for i := 0; i < len(validChars); i++ {
currChar := validChars[i]
result.WriteString(string(currChar))
}
return result.String()
}
I would expect this to return
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~#:!£$%^&*
But it doesn't, it produces
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~#:!Â£$%^&*
                                                                  ^
                                                  where did you come from?
If I take the £ sign out of the original validChars string, that weird Â goes away.
func foo() string {
validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~#:!$%^&*"
var result strings.Builder
for i := 0; i < len(validChars); i++ {
currChar := validChars[i]
result.WriteString(string(currChar))
}
return result.String()
}
This produces
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~#:!$%^&*
A string in Go is an immutable sequence of bytes, not a sequence of characters. Your mental model of a string is probably that it consists of a slice of characters - or, as we call it in Go: a slice of rune.
For many runes in your validChars string this is fine, as they are ASCII characters and can therefore be represented in a single byte in UTF-8. However, the £ rune is represented as 2 bytes.
Now if we consider a string £, it consists of 1 rune but 2 bytes. As I've mentioned, a string is really just a sequence of bytes. If we index it like you are effectively doing in your sample, we will only get the first of the two bytes that represent £. When you convert that single byte back to a string, it is treated as a code point of its own, which gives you an unexpected rune (the byte 0xC2 becomes Â).
The fix for your problem is to first convert string validChars to a []rune. Then, you can access its individual runes (rather than bytes) by index, and foo will work as expected. You can see it in action in this playground.
Also note that len(validChars) will give you the count of bytes in the string. To get the count of runes, use utf8.RuneCountInString instead.
Finally, here's a blog post from Rob Pike on the subject that you may find interesting.
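The byte-versus-rune difference described above can be sketched as follows. The function names copyByByte and copyByRune are made up for this illustration, and the short input string stands in for the full validChars.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// copyByByte mirrors the loop from the question: it indexes the string
// byte by byte and converts each byte to a string on its own.
func copyByByte(s string) string {
	var result strings.Builder
	for i := 0; i < len(s); i++ {
		result.WriteString(string(s[i]))
	}
	return result.String()
}

// copyByRune converts to []rune first, so each index is a whole code point.
func copyByRune(s string) string {
	runes := []rune(s)
	var result strings.Builder
	for i := 0; i < len(runes); i++ {
		result.WriteRune(runes[i])
	}
	return result.String()
}

func main() {
	validChars := "abc£$"
	fmt.Println(copyByByte(validChars)) // abcÂ£$
	fmt.Println(copyByRune(validChars)) // abc£$
	// len counts bytes, utf8.RuneCountInString counts runes.
	fmt.Println(len(validChars), utf8.RuneCountInString(validChars)) // 6 5
}
```

A for ... range loop over the string would also yield runes rather than bytes, without allocating a []rune.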
I am trying to ensure a string coming from an http request is valid for use in a base64 url param. I've been experimenting with base64.RawURLEncoding as I assumed encoding an invalid string would throw an err, or at least decoding the result of this would fail, however it quite happily encodes/decodes the string regardless of the input.
https://play.golang.org/p/3sHUfl2NSJK
I have created the above playground showing the issue I'm having (albeit an extreme example). Is there another way of ascertaining whether a string consists entirely of valid base64 characters?
To clarify, Base64 is an encoding scheme which allows you to take arbitrary binary data and safely encode it into ASCII characters which can later be decoded into the original binary string.
That means that the "Base64-encode" operation can take literally any input and produce valid, encoded data. However, the "Base64-decode" operation will fail if its input string contains characters outside the set of ASCII characters that the encoding uses (meaning that the given string was not produced by a valid Base64 encoder).
To test if a string contains a valid Base64 encoded sequence, you just need to call base64.Encoding.DecodeString(...) and test if the error is "nil".
For example (Go Playground):
package main

import (
	"encoding/base64"
	"fmt"
)

// IsValidBase64 reports whether s decodes without error using the
// standard (padded) Base64 alphabet.
func IsValidBase64(s string) bool {
	_, err := base64.StdEncoding.DecodeString(s)
	return err == nil
}

func main() {
	ss := []string{"ABBA", "T0sh", "Foo=", "Bogus\x01"}
	for _, s := range ss {
		if IsValidBase64(s) {
			fmt.Printf("OK: valid Base64 %q\n", s)
		} else {
			fmt.Printf("ERR: invalid Base64 %q\n", s)
		}
	}
	// OK: valid Base64 "ABBA"
	// OK: valid Base64 "T0sh"
	// OK: valid Base64 "Foo="
	// ERR: invalid Base64 "Bogus\x01"
}
base64 encoding works by interpreting an arbitrary bit stream as a string of 6-bit integers, which are then mapped one-by-one to the chosen base64 alphabet.
Your example string starts with these 8-bit bytes:
11000010 10111010 11000010 10101010 11100010 10000000
Re-arrange them into 6-bit numbers:
110000 101011 101011 000010 101010 101110 001010 000000
And map them to a base64 alphabet (here URL encoding):
w r r C q u K A
Since every 6-bit number can be mapped to a character in the alphabet (there's exactly 64 of them), there are no invalid inputs to base64. This is precisely what base64 is used for: turn arbitrary input into printable ASCII characters.
Decoding, on the other hand, can and will fail if the input contains bytes outside of the base64 alphabet — they can't be mapped back to the 6-bit integer.
I came across a function posted online that used the rune() function in golang, but I am having a hard time looking up what it is. I am going through the tutorial and inexperienced with the docs so it is hard to find what I am looking for.
Specifically, I am trying to see why this fails...
fmt.Println(rune("foo"))
and this does not
fmt.Println([]rune("foo"))
rune is a type in Go. It's just an alias for int32, but it's usually used to represent Unicode code points. rune() isn't a function, it's syntax for type conversion into rune. Conversions in Go always have the syntax T(x), which might make them look like function calls.
The first bit of code fails because conversion of strings to numeric types isn't defined in Go. However, conversion of strings to slices of runes/int32s is defined like this in the language specification:
Converting a value of a string type to a slice of runes type yields a
slice containing the individual Unicode code points of the string.
[golang.org]
So your example prints a slice of runes with the values 102, 111 and 111 (the code points of f, o and o).
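A short sketch of the conversions discussed here. codePoints is a hypothetical wrapper added only so the conversion has a name; the rest uses plain language conversions.

```go
package main

import "fmt"

// codePoints returns the Unicode code points of s.
// Hypothetical wrapper around the built-in string -> []rune conversion.
func codePoints(s string) []rune {
	return []rune(s)
}

func main() {
	fmt.Println(codePoints("foo")) // [102 111 111]

	// One rune, but two bytes in UTF-8.
	fmt.Println(codePoints("£"), []byte("£")) // [163] [194 163]

	// The reverse direction: a rune (or a []rune) converts back to a string.
	r := codePoints("foo")[0]
	fmt.Println(r, string(r)) // 102 f

	// fmt.Println(rune("foo")) // does not compile: a string cannot be converted to a single rune
}
```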
As stated in @Michael's first-rate comment, fmt.Println([]rune("foo")) is a conversion of a string to a slice of runes, []rune. When you convert from string to []rune, each UTF-8-encoded character in that string becomes a rune. See https://stackoverflow.com/a/51611567/12817546. Similarly, in the reverse conversion from []rune to string, each rune becomes a UTF-8-encoded character in the string. An individual rune from the slice can also be converted to a byte, float64 or int, or compared to produce a bool.
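package main

import "fmt"

func main() {
	r := []rune("foo")
	// Convert the first rune to several types and collect the results.
	c := []interface{}{byte(r[0]), float64(r[0]), int(r[0]), r, string(r), r[0] != 0}
	checkType(c)
}

// checkType prints the dynamic type and the value of each element.
func checkType(s []interface{}) {
	for _, v := range s {
		fmt.Printf("%T %v\n", v, v)
	}
}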
This prints "uint8 102" for byte(r[0]), "float64 102" for float64(r[0]), "int 102" for int(r[0]), "[]int32 [102 111 111]" for r, "string foo" for string(r) and "bool true" for r[0] != 0.
Conversion from []rune to string is supported natively by the spec. See the comment in https://stackoverflow.com/a/46021588/12817546. In Go, a string is a sequence of bytes. However, since several bytes together can encode one code point (rune), a string value can also be viewed as a sequence of runes. So it can be converted to a []rune, or vice versa. See https://stackoverflow.com/a/19325804/12817546.
Note that byte (alias of uint8) and rune (alias of int32) are built-in type aliases in Go; since Go 1.18, any (alias of interface{}) is one as well. See https://Go101.org/article/type-system-overview.html. Rune literals are just 32-bit integer values. For example, the rune literal 'a' is actually the number 97. See https://stackoverflow.com/a/19311218/12817546. Quotes edited.
I have a []byte, b, and I want to select a single byte, b[pos], and change it to upper case (and then lower case). The bytes package has a function called ToUpper(). How can I use this for a single byte?
Calling ToUpper on single Byte
OneOfOne gave the most efficient solution (when calling it thousands of times). I use
val = byte(unicode.ToUpper(rune(b[pos])))
to compute the upper-case value of the byte, and then write it back with
b[pos] = val
Checking if byte is Upper
Sometimes, instead of changing the case of a byte, I want to check whether a byte is upper or lower case. All the upper-case Roman-alphabet bytes have lower values than the lower-case bytes.
func (b Board) isUpper(x int) bool {
return b.board[x] < []byte{0x5a}[0]
}
For a single byte/rune, you can use unicode.ToUpper.
b[pos] = byte(unicode.ToUpper(rune(b[pos])))
I want to remind OP that bytes.ToUpper() operates on Unicode code points encoded as UTF-8 in a byte slice, while unicode.ToUpper() operates on a single Unicode code point.
By asking to convert a single byte to upper case, OP is implying that the b byte slice contains something other than UTF-8, perhaps 7-bit ASCII or some 8-bit encoding such as ISO Latin-1. In that case OP needs to write a ToUpper() function for that encoding (ISO Latin-1, for example), or OP must convert the bytes to UTF-8 or Unicode code points before using bytes.ToUpper() or unicode.ToUpper().
Anything less creates a pending bug. Neither of the previously mentioned functions will properly convert all possible ISO Latin-1 encoded characters to upper case.
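As an illustration of what such an encoding-specific function might look like, here is a minimal sketch for ISO Latin-1 (ISO 8859-1). toUpperLatin1 is a hypothetical name, and the sketch assumes the bytes really are Latin-1; it relies on the lower-case ranges a..z and 0xE0..0xFE (except the division sign 0xF7) sitting exactly 0x20 above their upper-case counterparts.

```go
package main

import "fmt"

// toUpperLatin1 upper-cases a single ISO 8859-1 byte.
// Hypothetical helper; assumes the input is Latin-1, not UTF-8.
func toUpperLatin1(c byte) byte {
	switch {
	case 'a' <= c && c <= 'z':
		return c - 0x20
	case 0xE0 <= c && c <= 0xFE && c != 0xF7: // à..þ, skipping ÷
		return c - 0x20
	}
	// ß (0xDF) and ÿ (0xFF) have no single-byte upper-case form in
	// Latin-1, so they are returned unchanged, like everything else.
	return c
}

func main() {
	b := []byte{'n', 0xF6, 'r', 'd'} // "nörd" in Latin-1
	pos := 1
	b[pos] = toUpperLatin1(b[pos])
	fmt.Printf("% x\n", b) // 6e d6 72 64
}
```

Leaving ß and ÿ unchanged is a design choice of this sketch; an application may prefer to report them as unconvertible.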
Use the following code to test if an element of the board is an ASCII uppercase letter:
func (b Board) isUpper(x int) bool {
v := b.board[x]
return 'A' <= v && v <= 'Z'
}
If the application only needs to distinguish between upper and lowercase letters, then there's no need for the lower bound test:
func (b Board) isUpper(x int) bool {
return b.board[x] <= 'Z'
}
The code in this answer improves on the code in the question in a few ways:
The code in the answer returns the correct value for a board element containing 'Z' (run playground example below for demonstration).
'Z' and 0x5a are the same value, but the code is easier to understand with 'Z'.
It's simpler to compare directly with the value 'Z'. No need to create a slice.
playground example
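The difference for 'Z' can be sketched like this. The Board type here is only a stand-in with the single board field that the question's code implies, and questionIsUpper is a made-up name for the question's comparison.

```go
package main

import "fmt"

// Board is a stand-in for the question's type; only the board field is assumed.
type Board struct {
	board []byte
}

// questionIsUpper reproduces the comparison from the question (strictly less than 0x5a).
func (b Board) questionIsUpper(x int) bool {
	return b.board[x] < []byte{0x5a}[0]
}

// isUpper is the version from this answer.
func (b Board) isUpper(x int) bool {
	v := b.board[x]
	return 'A' <= v && v <= 'Z'
}

func main() {
	b := Board{board: []byte("AZaz")}
	for i, c := range b.board {
		fmt.Printf("%c question=%v answer=%v\n", c, b.questionIsUpper(i), b.isUpper(i))
	}
	// A question=true answer=true
	// Z question=false answer=true
	// a question=false answer=false
	// z question=false answer=false
}
```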
Edit: Revamped answer based on new information in the question since time of my original answer.
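You can use bytes.ToUpper; you just need to deal with making the input a slice and making the output a byte. Note that this assumes ASCII content, so that the upper-cased slice has the same length and the same byte positions as the original: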
package main
import "bytes"
func main() {
b, pos := []byte("north"), 1
b[pos] = bytes.ToUpper(b)[pos]
println(string(b) == "nOrth")
}
I'm having some trouble while reading a file which has a fixed column length format. Some columns may contain umlauts.
Umlauts seem to use 2 bytes instead of one. This is not the behaviour I was expecting. Is there any kind of function which returns a substring? Slice does not seem to work in this case.
Here's some sample code:
http://play.golang.org/p/ZJ1axy7UXe
umlautsString := "Rhön"
fmt.Println(len(umlautsString))
fmt.Println(umlautsString[0:4])
Prints:
5
Rhö
In Go, slicing a string counts bytes, not runes. This is why "Rhön"[0:3] gives you Rh and the first byte of ö.
Go represents Unicode code points as runes because UTF-8 may encode a character in more than one byte (up to four bytes) to provide a bigger range of characters.
If you want to slice a string by character with the [] syntax, convert the string to []rune first.
Example (on play):
umlautsString := "Rhön"
runes := []rune(umlautsString)
fmt.Println(string(runes[0:3])) // Rhö
Noteworthy: This golang blog post about string representation in go.
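For fixed-width columns, the conversion can be wrapped in a small helper. This is only a sketch: substr is a hypothetical name, indices are rune positions, and out-of-range indices are clamped rather than causing a panic.

```go
package main

import "fmt"

// substr returns the runes of s in the half-open range [start, end)
// as a string. Hypothetical helper; out-of-range indices are clamped.
func substr(s string, start, end int) string {
	r := []rune(s)
	if start < 0 {
		start = 0
	}
	if end > len(r) {
		end = len(r)
	}
	if start >= end {
		return ""
	}
	return string(r[start:end])
}

func main() {
	umlautsString := "Rhön"
	fmt.Println(substr(umlautsString, 0, 3))  // Rhö
	fmt.Println(substr(umlautsString, 2, 10)) // ön
}
```

Note that a rune is a code point, not necessarily a user-perceived character: if the file uses combining marks (for example o followed by a combining diaeresis), one visible column can still span several runes.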
You can convert string to []rune and work with it:
package main
import "fmt"
func main() {
umlautsString := "Rhön"
fmt.Println(len(umlautsString))
subStrRunes := []rune(umlautsString)
fmt.Println(len(subStrRunes))
fmt.Println(string(subStrRunes[0:4]))
}
http://play.golang.org/p/__WfitzMOJ
Hope that helps!
Another option is the utf8string package:
package main
import "golang.org/x/exp/utf8string"
func main() {
s := utf8string.NewString("🧡💛💚💙💜")
// example 1
n := s.RuneCount()
println(n == 5)
// example 2
t := s.Slice(0, 2)
println(t == "🧡💛")
}
https://pkg.go.dev/golang.org/x/exp/utf8string