Can't explain why "55" is converted to "7" - Go

package main
import (
"fmt"
"strconv"
)
func main() {
v := "55"
if s, err := strconv.Atoi(v); err == nil {
fmt.Println(string(v)) // 55
fmt.Println(s) // 55
fmt.Println(string(s)) // 7
}
}
https://play.golang.org/p/8V1npFiC9iH

s is an integer with the value 55, which is the ASCII (and UTF-8) encoding of the character "7". That's what's printed from the last statement.

When you call s, err := strconv.Atoi("55"), the string is parsed and s holds the integer 55. When you do string(s) afterwards, you're asking for a string that contains the character whose code point is that integer.
That character happens to be '7'. Try v := "65" and you'll get 'A', etc.
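To get the decimal text back instead of the character, a formatting function is needed rather than a conversion. The following is a minimal sketch of the two behaviours side by side; the helper names asCodePoint and asDecimal are made up for illustration and are not part of the original post.

```go
package main

import (
	"fmt"
	"strconv"
)

// asCodePoint treats n as a Unicode code point and returns that character.
// (Hypothetical helper; writing string(rune(n)) makes the intent explicit.)
func asCodePoint(n int) string {
	return string(rune(n))
}

// asDecimal formats n as base-10 text.
// (Hypothetical helper wrapping strconv.Itoa.)
func asDecimal(n int) string {
	return strconv.Itoa(n)
}

func main() {
	n, err := strconv.Atoi("55")
	if err != nil {
		fmt.Println("not a number:", err)
		return
	}
	fmt.Println(asCodePoint(n)) // 7
	fmt.Println(asDecimal(n))   // 55
}
```

fmt.Sprint(n) would also produce "55"; strconv.Itoa is simply the most direct counterpart to strconv.Atoi.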

Related

How to convert strings to lower case in Go?

I am new to Go and working on an assignment where I should write code that returns the word frequencies of a text. The words 'Hello', 'HELLO' and 'hello' should all be counted as 'hello', so I need to convert all strings to lower case.
I know that I should use strings.ToLower(), but I don't know where to put it in the code. Can someone please help me?
package main
import (
"fmt"
"io/ioutil"
"log"
"strings"
"time"
)
const DataFile = "loremipsum.txt"
// Return the word frequencies of the text argument.
func WordCount(text string) map[string]int {
fregs := make(map[string]int)
words := strings.Fields(text)
for _, word := range words {
fregs[word] += 1
}
return fregs
}
// Benchmark how long it takes to count word frequencies in text numRuns times.
//
// Return the total time elapsed.
func benchmark(text string, numRuns int) int64 {
start := time.Now()
for i := 0; i < numRuns; i++ {
WordCount(text)
}
runtimeMillis := time.Since(start).Nanoseconds() / 1e6
return runtimeMillis
}
// Print the results of a benchmark
func printResults(runtimeMillis int64, numRuns int) {
fmt.Printf("amount of runs: %d\n", numRuns)
fmt.Printf("total time: %d ms\n", runtimeMillis)
average := float64(runtimeMillis) / float64(numRuns)
fmt.Printf("average time/run: %.2f ms\n", average)
}
func main() {
// read in DataFile as a string called data
data, err := ioutil.ReadFile(DataFile)
if err != nil {
log.Fatal(err)
}
// Convert []byte to string and print to screen
text := string(data)
fmt.Println(text)
fmt.Printf("%#v",WordCount(string(data)))
numRuns := 100
runtimeMillis := benchmark(string(data), numRuns)
printResults(runtimeMillis, numRuns)
}
You should convert the words to lowercase when you use them as map keys:
for _, word := range words {
fregs[strings.ToLower(word)] += 1
}
I get [a:822 a.:110 but I want all occurrences of a counted together. How do I change the code so that a and a. are treated the same? – hello123
You need to carefully define a word. For example, a string of consecutive letters and numbers converted to lowercase.
func WordCount(s string) map[string]int {
wordFunc := func(r rune) bool {
return !unicode.IsLetter(r) && !unicode.IsNumber(r)
}
counts := make(map[string]int)
for _, word := range strings.FieldsFunc(s, wordFunc) {
counts[strings.ToLower(word)]++
}
return counts
}
To remove all non-word characters, you could use a regular expression:
package main
import (
"bufio"
"fmt"
"log"
"regexp"
"strings"
)
func main() {
str1 := "This is some text! I want to count each word. Is it cool?"
re, err := regexp.Compile(`[^\w]`)
if err != nil {
log.Fatal(err)
}
str1 = re.ReplaceAllString(str1, " ")
scanner := bufio.NewScanner(strings.NewReader(str1))
scanner.Split(bufio.ScanWords)
for scanner.Scan() {
fmt.Println(strings.ToLower(scanner.Text()))
}
}
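One caveat with the pattern above: in Go's regexp syntax, \w matches only ASCII letters, digits and underscore, so non-ASCII letters would be replaced by spaces as well. The following is a sketch of a Unicode-aware variant using the \p{L} (letter) and \p{N} (number) classes; the function name words and the sample text are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonWord matches runs of characters that are neither letters nor numbers
// in any script.
var nonWord = regexp.MustCompile(`[^\p{L}\p{N}]+`)

// words replaces every non-word run with a space, lowercases the result
// and splits it on whitespace. (Hypothetical helper for illustration.)
func words(s string) []string {
	cleaned := nonWord.ReplaceAllString(s, " ")
	return strings.Fields(strings.ToLower(cleaned))
}

func main() {
	fmt.Println(words("Привет, ПРИВЕТ! Hello hello."))
	// [привет привет hello hello]
}
```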
See strings.EqualFold.
Here is an example.
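As a minimal sketch of what strings.EqualFold does: it reports whether two strings are equal under Unicode case folding, without building lowercased copies. The helper name sameWord is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// sameWord reports whether a and b are equal ignoring case.
// (Hypothetical helper; it only wraps strings.EqualFold.)
func sameWord(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	fmt.Println(sameWord("Hello", "HELLO")) // true
	fmt.Println(sameWord("Hello", "help"))  // false
}
```

Note that EqualFold compares two strings; it does not produce a normalized key, so for counting words in a map strings.ToLower is still needed.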

Handling Unicode in string search

Suppose I have a string containing Unicode characters. For example:
s := "foo 日本 foo!"
I'm trying to find the last occurrence of foo in the string:
index := strings.LastIndex(s, "foo")
The expected result here would be 7 but this will return 11 as the index due to the Unicode in the string.
Is there a way to handle this using standard library functions?
You're encountering the difference between runes and bytes in Go. Strings are composed of bytes, not runes. If you haven't learned about this, you should read https://blog.golang.org/strings.
Here's my version of a quick function to calculate the number of runes preceding the last match of a substring in a string. The basic approach is to find the byte index, then iterate through the string's runes, counting them, until that number of bytes has been consumed.
I'm not aware of a standard library method that will do this directly.
package main
import (
"fmt"
"strings"
)
func LastRuneIndex(s, substr string) (int, error) {
byteIndex := strings.LastIndex(s, substr)
if byteIndex < 0 {
return byteIndex, nil
}
reader := strings.NewReader(s)
count := 0
for byteIndex > 0 {
_, bytes, err := reader.ReadRune()
if err != nil {
return 0, err
}
byteIndex = byteIndex - bytes
count += 1
}
return count, nil
}
func main() {
s := "foo 日本 foo!"
count, err := LastRuneIndex(s, "foo")
fmt.Println(count, err)
// outputs:
// 7 <nil>
}
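A shorter variant of the same idea, offered as a sketch rather than a replacement: once the byte index is known, utf8.RuneCountInString can count the runes in the prefix s[:byteIndex]. The function name lastRuneIndex is an assumption for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// lastRuneIndex returns the rune offset of the last occurrence of substr
// in s, or -1 if substr does not occur. (Hypothetical helper.)
func lastRuneIndex(s, substr string) int {
	byteIndex := strings.LastIndex(s, substr)
	if byteIndex < 0 {
		return -1
	}
	// Count the runes in the bytes that precede the match.
	return utf8.RuneCountInString(s[:byteIndex])
}

func main() {
	fmt.Println(lastRuneIndex("foo 日本 foo!", "foo")) // 7
	fmt.Println(lastRuneIndex("foo 日本 foo!", "bar")) // -1
}
```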
This gets pretty close:
package main
import (
"golang.org/x/text/language"
"golang.org/x/text/search"
)
func main() {
m := search.New(language.English)
start, end := m.IndexString("foo 日本 foo!", "foo")
println(start == 0, end == 3)
}
but it's searching forward. I tried this:
m.IndexString("foo 日本 foo!", "foo", search.Backwards)
but I get this result:
panic: TODO: implement
https://pkg.go.dev/golang.org/x/text/search
https://github.com/golang/text/blob/v0.3.6/search/search.go#L222-L223

Illegal base64 data at input byte for seemingly valid png

I'm attempting to decode a data URL that was generated from a javascript canvas' toDataURL function.
The following golang application fails with the error "illegal base64 data at input byte 129":
package main
import (
"encoding/base64"
"fmt"
"net/url"
"strings"
)
func main() {
pngData := "iVBORw0KGgoAAAANSUhEUgAAAF0AAAABCAYAAAC8PaJPAAAABHNCSVQICAgIfAhkiAAAALVJREFUGFdt0MsKQVEYhuG9CeU0VgamihBl6hqMXYoLchduQFuKicyFARPn0/J+9Q2tevrba639H1YcQhhHUTTBACloFZHFw3FJbODj7xdxjSqmqGCHms+3vv8m3nDwWSA+vVcipjFHHlcMcUTBtcuueSHqfgzlVJ7Nn/o99s5Qf3v0sUAHK///JbahmZQ3gy4S96OZc9A90fkdepM6tNTHDOqz6T3NofeTluuqV82oHOrphNEPw3UwfBVmbU4AAAAASUVORK5CYII="
pngData, err := url.PathUnescape(pngData)
if err != nil {
fmt.Printf("Failed to unescape: %v\n", err)
return
}
pngData = strings.Replace(pngData, "+", "", -1)
_, err = base64.URLEncoding.WithPadding(base64.NoPadding).DecodeString(pngData)
if err != nil {
fmt.Printf("Failed to decode: %v\n", err)
}
}
If I pass the value from pngData into a web-based base64 to png converter, it has no problem generating the image. (a horizontal line of white-ish values)
I have tried StdEncoding, RawURLEncoding, and their Raw counterparts. I've also tried with or without padding and I've tried the same pngData string with an additional = and without the trailing =.
Any thoughts on why Golang is refusing to decode this data?
Some of the images I get from the canvas decode just fine. But some, like this one, do not.
Steven Penny's answer shows a way to do this, but I have to ask:
Why do you call url.PathUnescape? The data contain no path escape characters (no %-encoding). The call is harmless but unnecessary.
Why did you use the alternate encoding (URLEncoding)? As we see in the base64 package documentation, the difference between the standard encoding and the alternate encoding is that the alternate encoding uses - and _ in place of + and /. But if we look at the data string, it contains plus signs and slashes, and has no dashes or underscores, so it has clearly been encoded with the standard encoding.
Why did you call for base64.NoPadding? The input data ends with =, which is a padding character.
Why did you call for base64.NoPadding via base64.URLEncoding.WithPadding(base64.NoPadding)? The documentation shows us that this can be spelled base64.RawURLEncoding.
Why did you explicitly ask to strip out + characters (not a good idea) but not / characters?
If we drop all of those (and split up a long input line for posting purposes) we get this:
package main
import (
"encoding/base64"
"fmt"
)
func main() {
data := "iVBORw0KGgoAAAANSUhEUgAAAF0AAAABCAYAAAC8PaJPAAAABH" +
"NCSVQICAgIfAhkiAAAALVJREFUGFdt0MsKQVEYhuG9CeU0Vgam" +
"ihBl6hqMXYoLchduQFuKicyFARPn0/J+9Q2tevrba639H1YcQh" +
"hHUTTBACloFZHFw3FJbODj7xdxjSqmqGCHms+3vv8m3nDwWSA+" +
"vVcipjFHHlcMcUTBtcuueSHqfgzlVJ7Nn/o99s5Qf3v0sUAHK/" +
"//JbahmZQ3gy4S96OZc9A90fkdepM6tNTHDOqz6T3NofeTluuq" +
"V82oHOrphNEPw3UwfBVmbU4AAAAASUVORK5CYII="
b, err := base64.StdEncoding.DecodeString(data)
if err != nil {
fmt.Printf("Failed to decode: %s\n", err)
} else {
fmt.Printf("bytes begin with: %q\n", b[0:4])
}
}
This seems to work fine:
package main
import (
"encoding/base64"
"image"
"image/png"
"os"
"strings"
)
func main() {
s := `iVBORw0KGgoAAAANSUhEUgAAAF0AAAABCAYAAAC8PaJPAAAABHNCSVQICAgIfAhkiAAAALVJ
REFUGFdt0MsKQVEYhuG9CeU0VgamihBl6hqMXYoLchduQFuKicyFARPn0/J+9Q2tevrba639H1YcQhhHU
TTBACloFZHFw3FJbODj7xdxjSqmqGCHms+3vv8m3nDwWSA+vVcipjFHHlcMcUTBtcuueSHqfgzlVJ7Nn/
o99s5Qf3v0sUAHK///JbahmZQ3gy4S96OZc9A90fkdepM6tNTHDOqz6T3NofeTluuqV82oHOrphNEPw3U
wfBVmbU4AAAAASUVORK5CYII=`
d := base64.NewDecoder(base64.StdEncoding, strings.NewReader(s))
p, e := png.Decode(d)
if e != nil {
panic(e)
}
c, e := os.Create("a.png")
if e != nil {
panic(e)
}
png.Encode(c, p.(*image.NRGBA))
}
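The question mentions a data URL, although the sample string is already the bare base64 payload. If the full output of toDataURL were passed in, the "data:image/png;base64," prefix would have to be removed before decoding. A minimal sketch under that assumption follows; the function name decodeDataURL and the tiny text/plain sample are made up for illustration.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// decodeDataURL strips the "data:<mediatype>;base64," header from u and
// decodes the remainder with the standard base64 alphabet.
// (Hypothetical helper; it handles only base64 data URLs.)
func decodeDataURL(u string) ([]byte, error) {
	comma := strings.Index(u, ",")
	if !strings.HasPrefix(u, "data:") || comma < 0 {
		return nil, errors.New("not a data URL")
	}
	if !strings.HasSuffix(u[:comma], ";base64") {
		return nil, errors.New("data URL is not base64-encoded")
	}
	return base64.StdEncoding.DecodeString(u[comma+1:])
}

func main() {
	b, err := decodeDataURL("data:text/plain;base64,aGVsbG8=")
	fmt.Println(string(b), err) // hello <nil>
}
```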

Golang index an array of strings

Hi, I have a string that says this:
"Style: Saison
ABV: 7.7
IBU: 20"
I am trying to split it into an array so that I can get Saison.
Here is how I convert it to an array:
style :=strings.Split(style, "Style:")
When I do
style[0]
it doesn't give me Saison. I also tried style[1] and style[2], and nothing happens. What am I doing wrong?
style is a []string, so it is a list of strings, right?
You could use strings.FieldsFunc:
FieldsFunc splits the string s at each run of Unicode code points c
satisfying f(c) and returns an array of slices of s. If all code
points in s satisfy f(c) or the string is empty, an empty slice is
returned.
FieldsFunc makes no guarantees about the order in which it calls f(c)
and assumes that f always returns the same value for a given c.
package main
import (
"fmt"
"strconv"
"strings"
)
func main() {
str := `Style: Saison Drink
ABV: 7.7
IBU: 20`
f := func(c rune) bool {
return c == ':' || c == '\n'
}
strFields := strings.FieldsFunc(str, f)
fmt.Printf("%q\n", strFields)
styleValue := strings.TrimSpace(strFields[1])
fmt.Println(styleValue)
abvValue, err := strconv.ParseFloat(strings.TrimSpace(strFields[3]), 32)
if err != nil {
fmt.Println("Error parsing float!")
}
fmt.Printf("%.2f\n", abvValue)
ibuValue, err := strconv.ParseInt(strings.TrimSpace(strFields[5]), 10, 32)
if err != nil {
fmt.Println("Error parsing int!")
}
fmt.Printf("%d\n", ibuValue)
}
Output:
["Style" " Saison Drink" "ABV" " 7.7" "IBU" " 20"]
Saison Drink
7.70
20

String to UCS-2

I want to translate my Python program, which converts a Unicode string to a UCS-2 hex string, to Go.
In Python, it's quite simple:
u"Bien joué".encode('utf-16-be').encode('hex')
-> 004200690065006e0020006a006f007500e9
I am a beginner in Go and the simplest way I found is:
package main
import (
"fmt"
"strings"
)
func main() {
str := "Bien joué"
fmt.Printf("str: %s\n", str)
ucs2HexArray := []rune(str)
s := fmt.Sprintf("%U", ucs2HexArray)
a := strings.Replace(s, "U+", "", -1)
b := strings.Replace(a, "[", "", -1)
c := strings.Replace(b, "]", "", -1)
d := strings.Replace(c, " ", "", -1)
fmt.Printf("->: %s", d)
}
str: Bien joué
->: 004200690065006E0020006A006F007500E9
Program exited.
I don't think this is efficient. How can I improve it?
Thank you
Make this conversion a function; then you can easily improve the conversion algorithm in the future. For example,
package main
import (
"fmt"
"strings"
"unicode/utf16"
)
func hexUTF16FromString(s string) string {
hex := fmt.Sprintf("%04x", utf16.Encode([]rune(s)))
return strings.Replace(hex[1:len(hex)-1], " ", "", -1)
}
func main() {
str := "Bien joué"
fmt.Println(str)
hex := hexUTF16FromString(str)
fmt.Println(hex)
}
Output:
Bien joué
004200690065006e0020006a006f007500e9
NOTE:
You say "convert an unicode string to a UCS-2 string" but your Python example uses UTF-16:
u"Bien joué".encode('utf-16-be').encode('hex')
The Unicode Consortium
UTF-16 FAQ
Q: What is the difference between UCS-2 and UTF-16?
A: UCS-2 is obsolete terminology which refers to a Unicode
implementation up to Unicode 1.1, before surrogate code points and
UTF-16 were added to Version 2.0 of the standard. This term should now
be avoided.
UCS-2 does not describe a data format distinct from UTF-16, because
both use exactly the same 16-bit code unit representations. However,
UCS-2 does not interpret surrogate code points, and thus cannot be
used to conformantly represent supplementary characters.
Sometimes in the past an implementation has been labeled "UCS-2" to
indicate that it does not support supplementary characters and doesn't
interpret pairs of surrogate code points as characters. Such an
implementation would not handle processing of character properties,
code point boundaries, collation, etc. for supplementary characters.
For anything other than trivially short input (and possibly even then), I'd use the golang.org/x/text/encoding/unicode package to convert to UTF-16 (as @peterSo and @JimB point out, slightly different from obsolete UCS-2).
The advantage of this package (together with golang.org/x/text/transform) over unicode/utf16 is that you get BOM support and a choice of big or little endian. You can encode or decode short strings or byte slices, and you can also apply the transformer as a filter on an io.Reader or io.Writer, so the data is transformed as it is processed instead of all up front (i.e. for a large stream of data you don't need to have it all in memory at once).
E.g.:
package main
import (
"bytes"
"fmt"
"io"
"io/ioutil"
"log"
"strings"
"golang.org/x/text/encoding/unicode"
"golang.org/x/text/transform"
)
const input = "Bien joué"
func main() {
// Get a `transform.Transformer` for encoding.
e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
t := e.NewEncoder()
// For decoding, allows a Byte Order Mark at the start to
// switch to corresponding Unicode decoding (UTF-8, UTF-16BE, or UTF-16LE)
// otherwise we use `e` (UTF-16BE without BOM):
t2 := unicode.BOMOverride(e.NewDecoder())
_ = t2 // we don't show/use this
// If you have a string:
str := input
outstr, n, err := transform.String(t, str)
if err != nil {
log.Fatal(err)
}
fmt.Printf("string: n=%d, bytes=%02x\n", n, []byte(outstr))
// If you have a []byte:
b := []byte(input)
outbytes, n, err := transform.Bytes(t, b)
if err != nil {
log.Fatal(err)
}
fmt.Printf("bytes: n=%d, bytes=%02x\n", n, outbytes)
// If you have an io.Reader for the input:
ir := strings.NewReader(input)
r := transform.NewReader(ir, t)
// Now just read from r as you normally would and the encoding will
// happen as you read, good for large sources to avoid pre-encoding
// everything. Here we'll just read it all in one go though which negates
// that benefit (normally avoid ioutil.ReadAll).
outbytes, err = ioutil.ReadAll(r)
if err != nil {
log.Fatal(err)
}
fmt.Printf("reader: len=%d, bytes=%02x\n", len(outbytes), outbytes)
// If you have an io.Writer for the output:
var buf bytes.Buffer
w := transform.NewWriter(&buf, t)
_, err = fmt.Fprint(w, input) // or io.Copy from an io.Reader, or whatever
if err != nil {
log.Fatal(err)
}
fmt.Printf("writer: len=%d, bytes=%02x\n", buf.Len(), buf.Bytes())
}
// Whichever of these you need you could of
// course put in a single simple function. E.g.:
// NewUTF16BEWriter returns a new writer that wraps w
// by transforming the bytes written into UTF-16-BE.
func NewUTF16BEWriter(w io.Writer) io.Writer {
e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
return transform.NewWriter(w, e.NewEncoder())
}
// ToUTF16BE converts UTF-8 `b` into UTF-16-BE.
func ToUTF16BE(b []byte) ([]byte, error) {
e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
out, _, err := transform.Bytes(e.NewEncoder(), b)
return out, err
}
Gives:
string: n=10, bytes=004200690065006e0020006a006f007500e9
bytes: n=10, bytes=004200690065006e0020006a006f007500e9
reader: len=18, bytes=004200690065006e0020006a006f007500e9
writer: len=18, bytes=004200690065006e0020006a006f007500e9
The standard library has the built-in utf16.Encode() (https://golang.org/pkg/unicode/utf16/#Encode) function for this purpose.
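As a sketch of how utf16.Encode could be combined with encoding/hex to avoid the fmt and strings.Replace round trip, the code units can be written out as big-endian bytes and hex-encoded directly. The function name hexUTF16BE is an assumption for illustration.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"unicode/utf16"
)

// hexUTF16BE encodes s as UTF-16 (big endian, no BOM) and returns the
// bytes as a lowercase hex string. (Hypothetical helper.)
func hexUTF16BE(s string) string {
	units := utf16.Encode([]rune(s))
	buf := make([]byte, 2*len(units))
	for i, u := range units {
		buf[2*i] = byte(u >> 8) // high byte first: big endian
		buf[2*i+1] = byte(u)
	}
	return hex.EncodeToString(buf)
}

func main() {
	fmt.Println(hexUTF16BE("Bien joué"))
	// 004200690065006e0020006a006f007500e9
}
```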
