Better to compare slices or bytes? - go

I'm just curious on which of these methods is better (or if there's an even better one that I'm missing). I'm trying to determine if the first letter and last letter of a word are the same, and there are two obvious solutions to me.
if word[:1] == word[len(word)-1:]
or
if word[0] == word[len(word)-1]
As I understand it, the first is just pulling slices of the string and doing a string comparison, while the second is pulling the character from either end and comparing as bytes.
I'm curious if there's a performance difference between the two, and if there's any "preferable" way to do this?

In Go, strings are UTF-8 encoded. UTF-8 is a variable-length encoding.
package main
import "fmt"
func main() {
word := "世界世"
fmt.Println(word[:1] == word[len(word)-1:])
fmt.Println(word[0] == word[len(word)-1])
}
Output:
false
false
If you really want to compare a byte, not a character, then be as precise as possible for the compiler. Obviously, compare a byte, not a slice.
BenchmarkSlice-4 200000000 7.55 ns/op
BenchmarkByte-4 2000000000 1.08 ns/op
package main
import "testing"
var word = "word"
func BenchmarkSlice(b *testing.B) {
for i := 0; i < b.N; i++ {
if word[:1] == word[len(word)-1:] {
}
}
}
func BenchmarkByte(b *testing.B) {
for i := 0; i < b.N; i++ {
if word[0] == word[len(word)-1] {
}
}
}

If by letter you mean rune, then use:
func eqRune(s string) bool {
if s == "" {
return false // or true if that makes more sense for the app
}
f, _ := utf8.DecodeRuneInString(s) // 2nd return value is rune size. ignore it.
l, _ := utf8.DecodeLastRuneInString(s) // 2nd return value is rune size. ignore it.
if f != l {
return false
}
if f == unicode.ReplacementChar {
// First and last are invalid UTF-8. Fallback to
// comparing bytes.
return s[0] == s[len(s)-1]
}
return true
}
If you mean byte, then use:
func eqByte(s string) bool {
if s == "" {
return false // or true if that makes more sense for the app
}
return s[0] == s[len(s)-1]
}
Comparing individual bytes is faster than comparing string slices as shown by the benchmark in another answer.
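To make the difference between the two helpers concrete, here is a minimal sketch that runs a byte comparison and a rune comparison side by side. The names sameEndsByte and sameEndsRune are hypothetical stand-ins for eqByte and eqRune, and the rune version deliberately leaves out the invalid-UTF-8 fallback shown above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sameEndsByte (hypothetical name) compares the first and last bytes of s.
func sameEndsByte(s string) bool {
	return s != "" && s[0] == s[len(s)-1]
}

// sameEndsRune (hypothetical name) compares the first and last runes of s.
// Simplification: invalid UTF-8 at both ends decodes to the same
// replacement rune and is therefore reported as equal.
func sameEndsRune(s string) bool {
	if s == "" {
		return false
	}
	first, _ := utf8.DecodeRuneInString(s)
	last, _ := utf8.DecodeLastRuneInString(s)
	return first == last
}

func main() {
	for _, s := range []string{"eve", "go", "世界世"} {
		fmt.Printf("%q byte=%t rune=%t\n", s, sameEndsByte(s), sameEndsRune(s))
	}
}
```

For "世界世" the byte comparison is false while the rune comparison is true, because the first byte of a multi-byte character is never equal to its last byte.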

A string is a sequence of bytes. Your method works if you know the string contains only ASCII characters. Otherwise, you should use a method that handles multibyte characters instead of string indexing. You can convert it to a rune slice to process code points or characters, like this:
r := []rune(s)
return r[0] == r[len(r) - 1]
You can read more about strings, byte slices, runes, and code points in the official Go Blog post on the subject.
To answer your question, there's no significant performance difference between the two index expressions you posted.
Here's a runnable example:
package main
import "fmt"
func EndsMatch(s string) bool {
r := []rune(s)
if len(r) == 0 {
return false // an empty string has no first or last rune
}
return r[0] == r[len(r)-1]
}
func main() {
tests := []struct{
s string
e bool
}{
{"foo", false},
{"eve", true},
{"世界世", true},
}
for _, t := range tests {
r := EndsMatch(t.s)
if r != t.e {
fmt.Printf("EndsMatch(%s) failed: expected %t, got %t\n", t.s, t.e, r)
}
}
}
Prints nothing.

Related

Golang code to check if first word can be formed from second word

I tried the Go code below to check whether the first string can be formed from the second string. Can this code be improved?
package main
import (
"fmt"
"strings"
)
func main() {
words := []string{"hello", "ellhoo"}
result := "NO"
s := words[0]
for i := 0; i < len(words[0]); i++ {
if strings.Contains(words[1], string(s[i])) == false {
result = "NO"
break
} else {
result = "YES"
words[1] = strings.Replace(words[1],string(s[i]),"",1)
}
}
fmt.Println(result)
}
Record the count of each rune in the source string in a map. Then, for each rune in the target string, fail if its count in the map is zero; otherwise decrement the count.
Here's the code:
// canmake reports whether t can be constructed from the runes in s.
func canmake(t, s string) bool {
m := map[rune]int{}
for _, r := range s {
m[r]++
}
for _, r := range t {
if m[r] == 0 {
return false
}
m[r]--
}
return true
}
Here's an example showing how to use it:
func main() {
fmt.Println(canmake("hello", "ellhoo"))
fmt.Println(canmake("hello", "elhoo")) // insufficient number of l
fmt.Println(canmake("hello", "elloo")) // missing h
}

how to invoke a function from the result of another function

package main
import "fmt"
func Reverse(str string) string {
r := ""
for i := len(str) - 1; i >= 0; i-- {
r += string(str[i])
// fmt.Println(r)
}
return r
}
func Generate(str string) string {
str = Reverse(str)
// vowel := ""
for _, rne := range str {
if rne == 'a' {
str += "A"
}
if rne == 'e' {
str += "E"
}
if rne == 'i' {
str += "I"
}
if rne == 'o' {
str += "O"
}
if rne == 'u' {
str += "U"
}
}
return Reverse(str)
}
func main() {
fmt.Println(...("haigolang123"))
}
This program should take the result of the previous function and combine it with the next function.
I'm wondering how to invoke a function on the result of another function.
The expected output is "321gnAlOgIAh".
I don't see why you are trying to reverse the string twice if your input is haigolang123 and the expected output is 321gnAlOgIAh. Let's refactor step by step.
For vowels, if all you need to do is convert lower case to upper case, you can subtract 32 directly from the rune (since 'a' is 97 and 'A' is 65). Use a helper function to factor out the vowel check.
func in(c rune, list []rune) bool {
for _, l := range list {
if c == l {
return true
}
}
return false
}
It can be used as follows:
vowelsLower := []rune{'a', 'e', 'i', 'o', 'u'}
// Some code here
if in(c, vowelsLower) {
result += string(c-32)
}
There are many ways to append strings. Here, however, we are working with individual characters, and it is easier to append them to a byte slice (this assumes ASCII input). Looking at the bigger picture, a []byte can be converted directly to a string when needed.
var result []byte
// Some code here
if in(c, vowelsLower) {
result = append(result, byte(c-32)) // 32 is the offset between 'a' and 'A'
}
While returning,
return string(result)
This is your code with these changes.
Additionally, why iterate twice (once in Generate and again in Reverse)? Try iterating in reverse and switching the vowel case in the same pass. The noticeable difference of this approach is that it uses bytes directly.
Ranging over a string yields runes; indexing a string yields bytes. They can be converted to one another.
Since we were already using bytes in the previous approach, the code looks like this.
Happy coding!!
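The rune/byte distinction mentioned above is easy to check directly. The following is a small sketch (the helper name describe is hypothetical) that ranges over a string containing one two-byte character and also indexes it.

```go
package main

import "fmt"

// describe (hypothetical name) reports how many bytes and how many
// runes s contains.
func describe(s string) (byteLen, runeLen int) {
	byteLen = len(s) // len counts bytes, not characters
	for range s {    // range advances one rune per iteration
		runeLen++
	}
	return byteLen, runeLen
}

func main() {
	s := "héllo" // 'é' occupies two bytes in UTF-8
	for i, r := range s {
		fmt.Printf("byte offset %d: rune %c (%T)\n", i, r, r)
	}
	// Indexing yields a single byte, here the first byte of 'é'.
	fmt.Printf("s[1] = %#x (%T)\n", s[1], s[1])
	fmt.Println(describe(s))
}
```

Because of this, converting a rune to a byte as in the snippets above is only safe when the input is known to be ASCII.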
In Go, write
package main
import "fmt"
func toUpper(r rune) rune {
switch r {
case 'a', 'e', 'i', 'o', 'u':
r &= 0b1101_1111
}
return r
}
func Generate(s string) string {
g := []rune(s)
for i, j := 0, len(g)-1; i <= j; i, j = i+1, j-1 {
g[i], g[j] = toUpper(g[j]), toUpper(g[i])
}
return string(g)
}
func main() {
s := "haigolang123"
fmt.Printf("%q\n", s)
g := Generate(s)
fmt.Printf("%q\n", g)
}
https://go.dev/play/p/pGRas6qsi8O
"haigolang123"
"321gnAlOgIAh"
Go is designed for efficient solutions.
In Go, strings are immutable. Concatenating strings a and b creates a new string of length len(a) + len(b) and copies both a and b into it, which can get expensive in a loop.
Testing characters for all the vowels, even after you have matched one, is unnecessary.
Refactor your functional decomposition of Generate to include reversing a string while using a toUpper function for vowels.
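As an alternative to the in-place swap, the points about concatenation cost and redundant vowel tests can also be illustrated with strings.Builder from the standard library. This is only a sketch; the function name reverseUpperVowels is hypothetical, and it assumes that only the ASCII vowels a, e, i, o, u need upper-casing.

```go
package main

import (
	"fmt"
	"strings"
)

// reverseUpperVowels (hypothetical name) returns s reversed, with the
// lowercase ASCII vowels converted to upper case.
func reverseUpperVowels(s string) string {
	runes := []rune(s)
	var b strings.Builder
	b.Grow(len(s)) // reserve the output size once instead of growing per append
	for i := len(runes) - 1; i >= 0; i-- {
		r := runes[i]
		switch r { // a single switch stops at the first matching vowel
		case 'a', 'e', 'i', 'o', 'u':
			r -= 'a' - 'A'
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(reverseUpperVowels("haigolang123")) // 321gnAlOgIAh
}
```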

How one can do case insensitive sorting using sort.Strings() in Golang?

Is there any way to pass a custom function to sort.Strings() to do a case-insensitive sort on a list of strings?
data := []string{"A", "b", "D", "c"}
The output should be: A, b, c, D
The Python equivalent of this requirement is:
li = sorted(data, key=lambda s: s.lower())
Do we have something like that in golang?
The translation of the Python code to Go is:
sort.Slice(data, func(i, j int) bool { return strings.ToLower(data[i]) < strings.ToLower(data[j]) })
Run it on the Go Playground.
This approach, like the Python code in the question, can allocate two strings for each comparison. The allocations are probably OK for the example in the question, but can be a problem in other scenarios.
To avoid allocations, compare the strings rune by rune:
func lessLower(sa, sb string) bool {
for {
rb, nb := utf8.DecodeRuneInString(sb)
if nb == 0 {
// The number of runes in sa is greater than or
// equal to the number of runes in sb. It follows
// that sa is not less than sb.
return false
}
ra, na := utf8.DecodeRuneInString(sa)
if na == 0 {
// The number of runes in sa is less than the
// number of runes in sb. It follows that sa
// is less than sb.
return true
}
rb = unicode.ToLower(rb)
ra = unicode.ToLower(ra)
if ra != rb {
return ra < rb
}
// Trim rune from the beginning of each string.
sa = sa[na:]
sb = sb[nb:]
}
}
⋮
sort.Slice(data, func(i, j int) bool { return lessLower(data[i], data[j]) })
Run it on the Go Playground.
Take a look at the collate package if you need to sort by language or culture specific sort orders.
The solution below is more verbose and more performant. The main difference is that in the other answers, using strings.ToLower at each comparison allocates some memory, and the code below takes care of comparing runes without creating any new string.
// lessCaseInsensitive compares s, t without allocating
func lessCaseInsensitive(s, t string) bool {
for {
if len(t) == 0 {
return false
}
if len(s) == 0 {
return true
}
c, sizec := utf8.DecodeRuneInString(s)
d, sized := utf8.DecodeRuneInString(t)
lowerc := unicode.ToLower(c)
lowerd := unicode.ToLower(d)
if lowerc < lowerd {
return true
}
if lowerc > lowerd {
return false
}
s = s[sizec:]
t = t[sized:]
}
}
sort.Slice(data, func(i, j int) bool { return lessCaseInsensitive(data[i], data[j]) })
You can see in this benchmark for example that avoiding allocs makes the case-insensitive sorting 5x faster.
You need a type that implements sort.Interface.
https://play.golang.org/p/JTm0AjuxCRV
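For readers who cannot follow the link, a type that implements sort.Interface might look like the sketch below. The names byLower and sortedCaseInsensitive are hypothetical, and for brevity Less uses strings.ToLower, so it allocates on each comparison as discussed in the other answers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// byLower (hypothetical name) implements sort.Interface and orders
// strings by their lower-cased form.
type byLower []string

func (s byLower) Len() int      { return len(s) }
func (s byLower) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func (s byLower) Less(i, j int) bool {
	return strings.ToLower(s[i]) < strings.ToLower(s[j])
}

// sortedCaseInsensitive (hypothetical name) returns a sorted copy of in,
// leaving the input slice untouched.
func sortedCaseInsensitive(in []string) []string {
	out := append([]string(nil), in...)
	sort.Sort(byLower(out))
	return out
}

func main() {
	data := []string{"A", "b", "D", "c"}
	fmt.Println(sortedCaseInsensitive(data)) // [A b c D]
}
```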

How can I easily get a substring in Go while guarding against "slice bounds out of range" error?

Using Go, I want to truncate long strings to an arbitrary length (e.g. for logging).
const maxLen = 100
func main() {
myString := "This string might be longer, so we'll keep all except the first 100 bytes."
fmt.Println(myString[:10]) // Prints the first 10 bytes
fmt.Println(myString[:maxLen]) // panic: runtime error: slice bounds out of range
}
For now, I can solve it with an extra variable and if statement, but that seems very long-winded:
const maxLen = 100
func main() {
myString := "This string might be longer, so we'll keep all except the first 100 bytes."
limit := len(myString)
if limit > maxLen {
limit = maxLen
}
fmt.Println(myString[:limit]) // Prints the first 100 bytes, or the whole string if shorter
}
Is there a shorter/cleaner way?
Use a simple function to hide the implementation details. For example,
package main
import "fmt"
func maxString(s string, max int) string {
if len(s) > max {
r := 0
for i := range s {
r++
if r > max {
return s[:i]
}
}
}
return s
}
func main() {
s := "日本語"
fmt.Println(s)
fmt.Println(maxString(s, 2))
}
Output:
日本語
日本
Assuming you want to keep at most maxLen characters, i.e. what your code says, rather than what your string says.
If you don't need the original myString, you can overwrite it like this:
const maxLen = 100
func main() {
myString := "This string might be longer, so we'll keep the first 100 bytes."
if len(myString) >= maxLen {
myString = myString[:maxLen] // slicing is a constant time operation in Go
}
fmt.Println(myString) // Prints the first 100 bytes, or the whole string if shorter
}
This might cut Unicode characters in half, leaving some garbage at the end. If you need to handle multi-byte Unicode, which you probably do, try this:
func main() {
myString := "日本語"
mid := maxLen
for len(myString) >= mid && utf8.ValidString(myString[:mid]) == false {
mid++ // add another byte from myString until we have a whole multi-byte character
}
if len(myString) > mid {
myString = myString[:mid]
}
fmt.Println(myString) // Prints the first 100 bytes, or the whole string if shorter
}
Or, if you can accept removing up to one character from the output, this version is a bit cleaner
func main() {
myString := "日本語"
for len(myString) >= maxLen || utf8.ValidString(myString) == false {
myString = myString[:len(myString)-1] // remove a byte
}
fmt.Println(myString) // Prints the first 100 bytes, or the whole string if shorter
}
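Another way to respect a byte limit without splitting a character is to back up from the limit to the nearest rune boundary with utf8.RuneStart. The sketch below is one possible helper (the name truncateBytes is hypothetical); it assumes the input is valid UTF-8 and never returns more than limit bytes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateBytes (hypothetical name) returns at most limit bytes of s,
// cutting only at a rune boundary so no character is split.
func truncateBytes(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	if len(s) <= limit {
		return s
	}
	// s[limit] is the first byte that would be dropped. If it is a
	// continuation byte, step back until the cut lands on a rune start.
	for limit > 0 && !utf8.RuneStart(s[limit]) {
		limit--
	}
	return s[:limit]
}

func main() {
	fmt.Println(truncateBytes("日本語", 4))    // 日
	fmt.Println(truncateBytes("hello", 100)) // hello
}
```

If splitting a character is acceptable, Go 1.21 and later also provide a built-in min, so myString[:min(len(myString), maxLen)] is a one-line alternative to the if statement in the question.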

Golang: find first character in a String that doesn't repeat

I'm trying to write a function that returns the first character in a string that doesn't repeat. So far I have this:
package main
import (
"fmt"
"strings"
)
func check(s string) string {
ss := strings.Split(s, "")
smap := map[string]int{}
for i := 0; i < len(ss); i++ {
(smap[ss[i]])++
}
for k, v := range smap {
if v == 1 {
return k
}
}
return ""
}
func main() {
fmt.Println(check("nebuchadnezzer"))
}
Unfortunately, in Go there's no guarantee of the order when you iterate over a map, so every time I run the code I get a different value. Any pointers?
Using a map and two loops:
func check(s string) string {
m := make(map[rune]uint, len(s)) //preallocate the map size
for _, r := range s {
m[r]++
}
for _, r := range s {
if m[r] == 1 {
return string(r)
}
}
return ""
}
The benefit of this is that it uses just two loops, versus the multiple loops you get with strings.ContainsRune or strings.IndexRune (each of those functions loops internally).
Efficient (in time and memory) algorithms for grabbing all or the first unique byte http://play.golang.org/p/ZGFepvEXFT:
func FirstUniqueByte(s string) (b byte, ok bool) {
occur := [256]byte{}
order := make([]byte, 0, 256)
for i := 0; i < len(s); i++ {
b = s[i]
switch occur[b] {
case 0:
occur[b] = 1
order = append(order, b)
case 1:
occur[b] = 2
}
}
for _, b = range order {
if occur[b] == 1 {
return b, true
}
}
return 0, false
}
As a bonus, the above function should never generate any garbage. Note that I changed your function signature to be a more idiomatic way to express what you're describing. If you need a func(string) string signature anyway, then the point is moot.
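The same (value, ok) signature can be combined with the rune-counting approach from the earlier answer when the input is not limited to single-byte characters. This is a sketch only; the name firstUniqueRune is hypothetical, and unlike the byte version it allocates a map.

```go
package main

import "fmt"

// firstUniqueRune (hypothetical name) returns the first rune that occurs
// exactly once in s; ok is false when every rune repeats or s is empty.
func firstUniqueRune(s string) (r rune, ok bool) {
	counts := make(map[rune]int)
	for _, c := range s {
		counts[c]++
	}
	// Walk the string again, not the map, so the original order is kept.
	for _, c := range s {
		if counts[c] == 1 {
			return c, true
		}
	}
	return 0, false
}

func main() {
	if r, ok := firstUniqueRune("nebuchadnezzer"); ok {
		fmt.Printf("%c\n", r) // b
	}
}
```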
That can certainly be optimized, but one solution (which isn't using map) would be:
func check(s string) string {
unique := ""
for pos, c := range s {
if strings.ContainsRune(unique, c) {
unique = strings.Replace(unique, string(c), "", -1)
} else if strings.IndexRune(s, c) == pos {
unique = unique + string(c)
}
}
fmt.Println("All unique characters found: ", unique)
if len(unique) > 0 {
_, size := utf8.DecodeRuneInString(unique)
return unique[:size]
}
return ""
}
This is after the question "Find the first un-repeated character in a string"
krait suggested below that the function should:
return a string containing the first full rune, not just the first byte of the utf8 encoding of the first rune.

Resources