Golang: find first character in a String that doesn't repeat

I'm trying to write a function that returns the first character in a string that doesn't repeat. So far I have this:
package main
import (
"fmt"
"strings"
)
func check(s string) string {
ss := strings.Split(s, "")
smap := map[string]int{}
for i := 0; i < len(ss); i++ {
(smap[ss[i]])++
}
for k, v := range smap {
if v == 1 {
return k
}
}
return ""
}
func main() {
fmt.Println(check("nebuchadnezzer"))
}
Unfortunately, in Go there's no guarantee of order when you iterate over a map, so every time I run the code I get a different value. Any pointers?

Using a map and 2 loops:
func check(s string) string {
m := make(map[rune]uint, len(s)) //preallocate the map size
for _, r := range s {
m[r]++
}
for _, r := range s {
if m[r] == 1 {
return string(r)
}
}
return ""
}
The benefit of this is using just 2 loops, versus multiple loops if you use strings.ContainsRune or strings.IndexRune (each of those functions has an inner loop of its own).
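To back that claim up you could benchmark the two approaches yourself. A minimal sketch (hypothetical: it assumes the map-based function above has been renamed checkMap and the ContainsRune/IndexRune-based version from a later answer renamed checkRune, both living in the same package as this _test.go file):
package main

import "testing"

func BenchmarkCheckMap(b *testing.B) {
    for i := 0; i < b.N; i++ {
        checkMap("nebuchadnezzer") // map-based version above
    }
}

func BenchmarkCheckRune(b *testing.B) {
    for i := 0; i < b.N; i++ {
        checkRune("nebuchadnezzer") // strings.ContainsRune/IndexRune version
    }
}
Run it with go test -bench . to compare the two.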

Efficient (in time and memory) algorithms for grabbing all of the unique bytes, or just the first one (http://play.golang.org/p/ZGFepvEXFT):
func FirstUniqueByte(s string) (b byte, ok bool) {
occur := [256]byte{}
order := make([]byte, 0, 256)
for i := 0; i < len(s); i++ {
b = s[i]
switch occur[b] {
case 0:
occur[b] = 1
order = append(order, b)
case 1:
occur[b] = 2
}
}
for _, b = range order {
if occur[b] == 1 {
return b, true
}
}
return 0, false
}
As a bonus, the above function should never generate any garbage. Note that I changed your function signature to be a more idiomatic way to express what you're describing. If you need a func(string) string signature anyway, then the point is moot.
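A small usage sketch (not part of the answer) showing how FirstUniqueByte can be wrapped back into the func(string) string shape from the question:
func check(s string) string {
    if b, ok := FirstUniqueByte(s); ok {
        return string(b)
    }
    return ""
}

func main() {
    fmt.Println(check("nebuchadnezzer")) // "b": the first byte that occurs exactly once
}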

That can certainly be optimized, but one solution (which doesn't use a map) would be:
(playground example)
func check(s string) string {
unique := ""
for pos, c := range s {
if strings.ContainsRune(unique, c) {
unique = strings.Replace(unique, string(c), "", -1)
} else if strings.IndexRune(s, c) == pos {
unique = unique + string(c)
}
}
fmt.Println("All unique characters found: ", unique)
if len(unique) > 0 {
_, size := utf8.DecodeRuneInString(unique)
return unique[:size]
}
return ""
}
This follows the question "Find the first un-repeated character in a string".
krait suggested below that the function should:
return a string containing the first full rune, not just the first byte of the UTF-8 encoding of the first rune.
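A quick check of that behaviour (a hypothetical main reusing the check function above; note that check also prints its diagnostic line):
func main() {
    fmt.Println(check("ééa"))       // returns "a": é repeats, a is the only unique rune
    fmt.Println(check("日本語日本")) // returns "語": the whole 3-byte rune, not just its first byte
}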

Related

Golang code to check if first word can be formed from second word

I tried the Go code below to check if the first string can be formed from the second string. Is there any improvement that can be made to this code?
package main
import (
"fmt"
"strings"
)
func main() {
words := []string{"hello", "ellhoo"}
result := "NO"
s := words[0]
for i := 0; i < len(words[0]); i++ {
if strings.Contains(words[1], string(s[i])) == false {
result = "NO"
break
} else {
result = "YES"
words[1] = strings.Replace(words[1],string(s[i]),"",1)
}
}
fmt.Println(result)
}
Record the count of each rune in the source string in a map. For each rune in the target string, fail if count in map is zero. Decrement count.
Here's the code:
// canmake reports whether t can be constructed from the runes in s.
func canmake(t, s string) bool {
m := map[rune]int{}
for _, r := range s {
m[r]++
}
for _, r := range t {
if m[r] == 0 {
return false
}
m[r]--
}
return true
}
Here's an example showing how to use it:
func main() {
fmt.Println(canmake("hello", "ellhoo"))
fmt.Println(canmake("hello", "elhoo")) // insufficent number of l
fmt.Println(canmake("hello", "elloo")) // mising h
}

How to read large file by blocks with n length

I want to read a large text file (nearly 3 GB) and split it into blocks of n characters. I was trying to read the file and split it using runes, but that takes a lot of memory.
func SplitSubN(s string, n int) []string {
sub := ""
subs := []string{}
runes := bytes.Runes([]byte(s))
l := len(runes)
for i, r := range runes {
sub = sub + string(r)
if (i+1)%n == 0 {
subs = append(subs, sub)
sub = ""
} else if (i + 1) == l {
subs = append(subs, sub)
}
}
return subs
}
I suppose it can be done in a smarter way, like phased reading of blocks of a certain length from the file, but I don't know how to do it correctly.
Scan for rune start bytes and split based on that. This eliminates all allocations within the function except for the allocation of the result slice.
func SplitSubN(s string, n int) []string {
if len(s) == 0 {
return nil
}
m := 0
i := 0
j := 1
var result []string
for ; j < len(s); j++ {
if utf8.RuneStart(s[j]) {
if (m+1)%n == 0 {
result = append(result, s[i:j])
i = j
}
m++
}
}
if j > i {
result = append(result, s[i:j])
}
return result
}
The API specified in the question requires that the application allocate memory when converting the []byte read from the file to a string. This allocation can be avoided by changing the function to work on bytes:
func SplitSubN(s []byte, n int) [][]byte {
if len(s) == 0 {
return nil
}
m := 0
i := 0
j := 1
var result [][]byte
for ; j < len(s); j++ {
if utf8.RuneStart(s[j]) {
if (m+1)%n == 0 {
result = append(result, s[i:j])
i = j
}
m++
}
}
if j > i {
result = append(result, s[i:j])
}
return result
}
Both of these functions require that the application slurp the entire file into memory. I assume that's OK because the function in the question does as well. If you only need to process one chunk at a time, then the above code can be adapted to scan as the file is read incrementally.
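As a rough illustration of that incremental approach (not from the answer above; streamChunks and the "testfile" path are made up for the example), here is a sketch that streams the file with bufio.Reader and emits chunks of n runes without slurping the whole file:
package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
)

// streamChunks reads the file rune by rune and calls emit for every n runes.
func streamChunks(path string, n int, emit func(string)) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()

    r := bufio.NewReader(f)
    buf := make([]rune, 0, n)
    for {
        ch, _, err := r.ReadRune()
        if err == io.EOF {
            break
        }
        if err != nil {
            return err
        }
        buf = append(buf, ch)
        if len(buf) == n {
            emit(string(buf))
            buf = buf[:0]
        }
    }
    if len(buf) > 0 { // flush the final, possibly shorter, chunk
        emit(string(buf))
    }
    return nil
}

func main() {
    if err := streamChunks("testfile", 64, func(chunk string) {
        fmt.Println(chunk)
    }); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
}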
Actually, the most interesting part is not parsing the chunks themselves but rather handling characters that straddle chunk boundaries.
For example, if you read from a file in chunks of N bytes, the last multi-byte character may be read only partially (the remainder will be read in the next iteration).
Here is a solution that reads a text file in chunks of a given size and handles the overlapping characters asynchronously:
package main
import (
"fmt"
"io"
"log"
"os"
"unicode/utf8"
)
func main() {
data, err := ReadInChunks("testfile", 1024*16)
completed := false
for !completed {
select {
case next := <-data:
if next == nil {
completed = true
break
}
fmt.Print(string(next))
case e := <-err:
if e != nil {
log.Fatalf("error: %s", e)
}
}
}
}
func ReadInChunks(path string, readChunkSize int) (data chan []rune, err chan error) {
var readChannel = make(chan []rune)
var errorChannel = make(chan error)
onDone := func() {
close(readChannel)
close(errorChannel)
}
onError := func(err error) {
errorChannel <- err
onDone()
}
go func() {
if _, err := os.Stat(path); os.IsNotExist(err) {
onError(fmt.Errorf("file [%s] does not exist", path))
return
}
f, err := os.Open(path)
if err != nil {
onError(err)
return
}
defer f.Close()
readBuf := make([]byte, readChunkSize)
remainder := 0
for {
read, err := f.Read(readBuf[remainder:])
if err == io.EOF {
onDone()
return
}
if err != nil {
onError(err)
}
runes, parsed := runes(readBuf[:remainder+read])
if remainder = readChunkSize - parsed; remainder > 0 {
copy(readBuf[:remainder], readBuf[readChunkSize-remainder:])
}
if len(runes) > 0 {
readChannel <- runes
}
}
}()
return readChannel, errorChannel
}
func runes(nextBuffer []byte) ([]rune, int) {
t := make([]rune, utf8.RuneCount(nextBuffer))
i := 0
var size = len(nextBuffer)
var read = 0
for len(nextBuffer) > 0 {
r, l := utf8.DecodeRune(nextBuffer)
runeLen := utf8.RuneLen(r)
if read+runeLen > size {
break
}
read += runeLen
t[i] = r
i++
nextBuffer = nextBuffer[l:]
}
return t[:i], read
}
It can be greatly simplified if the file is ASCII.
Alternatively, if you need to support Unicode, you can play around with UTF-32 (which has a fixed length) or UTF-16 (if you don't need to handle code points that take more than 2 bytes, you can treat it as fixed-size as well).

Convert int32 to string in Golang

I need to convert an int32 to string in Golang. Is it possible to convert int32 to string in Golang without converting to int or int64 first?
Itoa needs an int. FormatInt needs an int64.
One line answer is fmt.Sprint(i).
Anyway, there are many conversions, even inside standard library functions like fmt.Sprint(i), so you have some options (try The Go Playground):
1- You may write your conversion function (Fastest):
func String(n int32) string {
buf := [11]byte{}
pos := len(buf)
i := int64(n)
signed := i < 0
if signed {
i = -i
}
for {
pos--
buf[pos], i = '0'+byte(i%10), i/10
if i == 0 {
if signed {
pos--
buf[pos] = '-'
}
return string(buf[pos:])
}
}
}
2- You may use fmt.Sprint(i) (Slow)
See inside:
// Sprint formats using the default formats for its operands and returns the resulting string.
// Spaces are added between operands when neither is a string.
func Sprint(a ...interface{}) string {
p := newPrinter()
p.doPrint(a)
s := string(p.buf)
p.free()
return s
}
3- You may use strconv.Itoa(int(i)) (Fast)
See inside:
// Itoa is shorthand for FormatInt(int64(i), 10).
func Itoa(i int) string {
return FormatInt(int64(i), 10)
}
4- You may use strconv.FormatInt(int64(i), 10) (Faster)
See inside:
// FormatInt returns the string representation of i in the given base,
// for 2 <= base <= 36. The result uses the lower-case letters 'a' to 'z'
// for digit values >= 10.
func FormatInt(i int64, base int) string {
_, s := formatBits(nil, uint64(i), base, i < 0, false)
return s
}
Comparison & Benchmark (with 50000000 iterations):
s = String(i) takes: 5.5923198s
s = String2(i) takes: 5.5923199s
s = strconv.FormatInt(int64(i), 10) takes: 5.9133382s
s = strconv.Itoa(int(i)) takes: 5.9763418s
s = fmt.Sprint(i) takes: 13.5697761s
Code:
package main
import (
"fmt"
//"strconv"
"time"
)
func main() {
var s string
i := int32(-2147483648)
t := time.Now()
for j := 0; j < 50000000; j++ {
s = String(i) //5.5923198s
//s = String2(i) //5.5923199s
//s = strconv.FormatInt(int64(i), 10) // 5.9133382s
//s = strconv.Itoa(int(i)) //5.9763418s
//s = fmt.Sprint(i) // 13.5697761s
}
fmt.Println(time.Since(t))
fmt.Println(s)
}
func String(n int32) string {
buf := [11]byte{}
pos := len(buf)
i := int64(n)
signed := i < 0
if signed {
i = -i
}
for {
pos--
buf[pos], i = '0'+byte(i%10), i/10
if i == 0 {
if signed {
pos--
buf[pos] = '-'
}
return string(buf[pos:])
}
}
}
func String2(n int32) string {
buf := [11]byte{}
pos := len(buf)
i, q := int64(n), int64(0)
signed := i < 0
if signed {
i = -i
}
for {
pos--
q = i / 10
buf[pos], i = '0'+byte(i-10*q), q
if i == 0 {
if signed {
pos--
buf[pos] = '-'
}
return string(buf[pos:])
}
}
}
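The hand-rolled timing loop above can also be expressed with the standard testing package. A sketch (assuming the String function above lives in the same package, in a _test.go file):
package main

import (
    "strconv"
    "testing"
)

func BenchmarkString(b *testing.B) {
    i := int32(-2147483648)
    for n := 0; n < b.N; n++ {
        _ = String(i)
    }
}

func BenchmarkFormatInt(b *testing.B) {
    i := int32(-2147483648)
    for n := 0; n < b.N; n++ {
        _ = strconv.FormatInt(int64(i), 10)
    }
}
go test -bench . then reports ns/op for each variant, which is easier to compare than wall-clock loop timings.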
The Sprint function converts a given value to a string.
package main
import (
"fmt"
)
func main() {
var sampleInt int32 = 1
sampleString := fmt.Sprint(sampleInt)
fmt.Printf("%+V %+V\n", sampleInt, sampleString)
}
// %!V(int32=+1) %!V(string=1)
See this example.
Use a conversion and strconv.FormatInt to format int32 values as a string. The conversion has zero cost on most platforms.
s := strconv.FormatInt(int64(n), 10)
If you have many calls like this, consider writing a helper function similar to strconv.Itoa:
func formatInt32(n int32) string {
return strconv.FormatInt(int64(n), 10)
}
All of the low-level integer formatting code in the standard library works with int64 values. Any answer to this question using formatting code in the standard library (fmt package included) requires a conversion to int64 somewhere. The only way to avoid the conversion is to write the formatting function from scratch, but there's little point in doing that.
func FormatInt32(value int32) string {
return fmt.Sprintf("%d", value)
}
Does this work?

check for equality on slices without order

I am trying to find a solution to check for equality of 2 slices. Unfortunately, the answers I have found require the values in the slice to be in the same order. For example, http://play.golang.org/p/yV0q1_u3xR evaluates equality to false.
I want a solution that lets []string{"a","b","c"} == []string{"b","a","c"} evaluate to true.
MORE EXAMPLES
[]string{"a","a","c"} == []string{"c","a","c"} >>> false
[]string{"z","z","x"} == []string{"x","z","z"} >>> true
Here is an alternate solution, though perhaps a bit verbose:
func sameStringSlice(x, y []string) bool {
if len(x) != len(y) {
return false
}
// create a map of string -> int
diff := make(map[string]int, len(x))
for _, _x := range x {
// 0 value for int is 0, so just increment a counter for the string
diff[_x]++
}
for _, _y := range y {
// If the string _y is not in diff bail out early
if _, ok := diff[_y]; !ok {
return false
}
diff[_y] -= 1
if diff[_y] == 0 {
delete(diff, _y)
}
}
return len(diff) == 0
}
Try it on the Go Playground
You can use cmp.Diff together with cmpopts.SortSlices:
less := func(a, b string) bool { return a < b }
equalIgnoreOrder := cmp.Diff(x, y, cmpopts.SortSlices(less)) == ""
Here is a full example that runs on the Go Playground:
package main
import (
"fmt"
"github.com/google/go-cmp/cmp"
"github.com/google/go-cmp/cmp/cmpopts"
)
func main() {
x := []string{"a", "b", "c"}
y := []string{"a", "c", "b"}
less := func(a, b string) bool { return a < b }
equalIgnoreOrder := cmp.Diff(x, y, cmpopts.SortSlices(less)) == ""
fmt.Println(equalIgnoreOrder) // prints "true"
}
The other answers have better time complexity, O(N) versus the O(N log N) in my answer, and my solution will take up more memory if elements in the slices are repeated frequently, but I wanted to add it because I think this is the most straightforward way to do it:
package main
import (
"fmt"
"sort"
"reflect"
)
func array_sorted_equal(a, b []string) bool {
if len(a) != len(b) {return false }
a_copy := make([]string, len(a))
b_copy := make([]string, len(b))
copy(a_copy, a)
copy(b_copy, b)
sort.Strings(a_copy)
sort.Strings(b_copy)
return reflect.DeepEqual(a_copy, b_copy)
}
func main() {
a := []string {"a", "a", "c"}
b := []string {"c", "a", "c"}
c := []string {"z","z","x"}
d := []string {"x","z","z"}
fmt.Println( array_sorted_equal(a, b))
fmt.Println( array_sorted_equal(c, d))
}
Result:
false
true
I would think the easiest way would be to map the elements in each array/slice to their number of occurrences, then compare the maps:
func sameCounts(x, y []string) bool {
xMap := make(map[string]int)
yMap := make(map[string]int)
for _, xElem := range x {
xMap[xElem]++
}
for _, yElem := range y {
yMap[yElem]++
}
for xMapKey, xMapVal := range xMap {
if yMap[xMapKey] != xMapVal {
return false
}
}
return true
}

func main() {
x := []string{"a", "b", "c"}
y := []string{"c", "b", "a"}
fmt.Println(sameCounts(x, y)) // true
}
You'll need to add some additional due diligence, like short-circuiting if your arrays/slices contain elements of different types or are of different lengths; one possible tweak is sketched below.
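For example, the length short-circuit could be folded into the sameCounts helper above (a sketch; the map comparison itself is unchanged):
func sameCounts(x, y []string) bool {
    if len(x) != len(y) { // different lengths can never have matching element counts
        return false
    }
    xMap := make(map[string]int)
    yMap := make(map[string]int)
    for _, xElem := range x {
        xMap[xElem]++
    }
    for _, yElem := range y {
        yMap[yElem]++
    }
    for xMapKey, xMapVal := range xMap {
        if yMap[xMapKey] != xMapVal {
            return false
        }
    }
    return true
}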
Generalizing the code of testify's ElementsMatch, here is a solution to compare any kind of objects (in the example, []map[string]string):
https://play.golang.org/p/xUS2ngrUWUl
Like adrianlzt wrote in his answer, an implementation of assert.ElementsMatch from testify can be used to achieve that. But how about reusing the actual testify module instead of copying that code, when all you need is a bool result of the comparison? The implementation in testify is intended for test code and usually takes a testing.T argument.
It turns out that ElementsMatch can be quite easily used outside of testing code. All it takes is a dummy implementation of an interface with an Errorf method:
type dummyt struct{}
func (t dummyt) Errorf(string, ...interface{}) {}
func elementsMatch(listA, listB interface{}) bool {
return assert.ElementsMatch(dummyt{}, listA, listB)
}
Or test it on The Go Playground, which I've adapted from adrianlzt's example.
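A possible call site for that helper (hypothetical; it assumes a package that imports fmt and github.com/stretchr/testify/assert alongside the code above):
func main() {
    fmt.Println(elementsMatch([]string{"a", "b", "c"}, []string{"c", "a", "b"})) // true
    fmt.Println(elementsMatch([]string{"a", "a"}, []string{"a", "b"}))           // false
}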
Since I haven't got enough reputation to comment, I have to post yet another answer with a bit better code readability:
func AssertSameStringSlice(x, y []string) bool {
if len(x) != len(y) {
return false
}
itemAppearsTimes := make(map[string]int, len(x))
for _, i := range x {
itemAppearsTimes[i]++
}
for _, i := range y {
if _, ok := itemAppearsTimes[i]; !ok {
return false
}
itemAppearsTimes[i]--
if itemAppearsTimes[i] == 0 {
delete(itemAppearsTimes, i)
}
}
if len(itemAppearsTimes) == 0 {
return true
}
return false
}
The logic is the same as in this answer
I know it's been answered, but I would still like to add my answer. Following the code here from stretchr/testify, we can have something like:
func Elementsmatch(listA, listB []string) (string, bool) {
aLen := len(listA)
bLen := len(listB)
if aLen != bLen {
return fmt.Sprintf("Len of the lists don't match , len listA %v, len listB %v", aLen, bLen), false
}
visited := make([]bool, bLen)
for i := 0; i < aLen; i++ {
found := false
element := listA[i]
for j := 0; j < bLen; j++ {
if visited[j] {
continue
}
if element == listB[j] {
visited[j] = true
found = true
break
}
}
if !found {
return fmt.Sprintf("element %s appears more times in %s than in %s", element, listA, listB), false
}
}
return "", true
}
Now let's talk about the performance of this solution compared to the map-based ones. It really depends on the size of the lists you are comparing: if the lists are large (I would say more than about 20 elements), the map approach is better; otherwise this is sufficient.
On the Go Playground it always shows 0s, but run this on a local system and you can see the difference in time taken as the size of the lists grows.
So the solution I propose is to add the map-based comparison from the solution above:
func Elementsmatch(listA, listB []string) (string, bool) {
aLen := len(listA)
bLen := len(listB)
if aLen != bLen {
return fmt.Sprintf("Len of the lists don't match , len listA %v, len listB %v", aLen, bLen), false
}
if aLen > 20 {
return elementsMatchByMap(listA, listB)
} else {
return elementsMatchByLoop(listA, listB)
}
}
func elementsMatchByLoop(listA, listB []string) (string, bool) {
aLen := len(listA)
bLen := len(listB)
visited := make([]bool, bLen)
for i := 0; i < aLen; i++ {
found := false
element := listA[i]
for j := 0; j < bLen; j++ {
if visited[j] {
continue
}
if element == listB[j] {
visited[j] = true
found = true
break
}
}
if !found {
return fmt.Sprintf("element %s appears more times in %s than in %s", element, listA, listB), false
}
}
return "", true
}
func elementsMatchByMap(x, y []string) (string, bool) {
// create a map of string -> int
diff := make(map[string]int, len(x))
for _, _x := range x {
// 0 value for int is 0, so just increment a counter for the string
diff[_x]++
}
for _, _y := range y {
// If the string _y is not in diff bail out early
if _, ok := diff[_y]; !ok {
return fmt.Sprintf(" %v is not present in list b", _y), false
}
diff[_y] -= 1
if diff[_y] == 0 {
delete(diff, _y)
}
}
if len(diff) == 0 {
return "", true
}
return "", false
}
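To measure that crossover on your own machine, a benchmark sketch (hypothetical; it assumes the elementsMatchByLoop and elementsMatchByMap functions above sit in the same package as this _test.go file, and 50 is just an example list size):
package main

import (
    "strconv"
    "testing"
)

// makeLists builds two lists containing the same elements in reversed order.
func makeLists(n int) (a, b []string) {
    for i := 0; i < n; i++ {
        a = append(a, strconv.Itoa(i))
        b = append(b, strconv.Itoa(n-1-i))
    }
    return a, b
}

func BenchmarkElementsMatchByLoop(b *testing.B) {
    x, y := makeLists(50)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        elementsMatchByLoop(x, y)
    }
}

func BenchmarkElementsMatchByMap(b *testing.B) {
    x, y := makeLists(50)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        elementsMatchByMap(x, y)
    }
}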

Go: What is the fastest/cleanest way to remove multiple entries from a slice?

How would you implement the deleteRecords function in the code below:
Example:
type Record struct {
id int
name string
}
type RecordList []*Record
func deleteRecords( l *RecordList, ids []int ) {
// Assume the RecordList can contain several 100 entries.
// and the number of the of the records to be removed is about 10.
// What is the fastest and cleanest ways to remove the records that match
// the id specified in the records list.
}
I did some micro-benchmarking on my machine, trying out most of the approaches given in the replies here, and this code comes out fastest when you've got up to about 40 elements in the ids list:
func deleteRecords(data []*Record, ids []int) []*Record {
w := 0 // write index
loop:
for _, x := range data {
for _, id := range ids {
if id == x.id {
continue loop
}
}
data[w] = x
w++
}
return data[:w]
}
You didn't say whether it's important to preserve the order of records in the list. If you don't need to, then this function is faster than the one above and still fairly clean.
func reorder(data []*Record, ids []int) []*Record {
n := len(data)
i := 0
loop:
for i < n {
r := data[i]
for _, id := range ids {
if id == r.id {
data[i] = data[n-1]
n--
continue loop
}
}
i++
}
return data[0:n]
}
As the number of ids rises, so does the cost of the linear search. At around 50 elements, using a map or doing a binary search to look up the id becomes more efficient, as long as you can avoid rebuilding the map (or resorting the list) every time. At several hundred ids, it becomes more efficient to use a map or a binary search even if you have to rebuild it every time.
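For reference, a sketch of the binary-search variant mentioned above (deleteRecordsSorted is a made-up name and this version was not part of the benchmarks; it reuses the Record type from the question and needs the sort package):
import "sort"

// deleteRecordsSorted sorts a copy of ids once, then uses a binary search per record.
// Like the first function above, it filters in place and preserves record order.
func deleteRecordsSorted(data []*Record, ids []int) []*Record {
    sorted := append([]int(nil), ids...)
    sort.Ints(sorted)
    w := 0
    for _, x := range data {
        i := sort.SearchInts(sorted, x.id)
        if i < len(sorted) && sorted[i] == x.id {
            continue // x.id is in ids, so drop the record
        }
        data[w] = x
        w++
    }
    return data[:w]
}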
If you wish to preserve the original contents of the slice, something like this is more appropriate:
func deletePreserve(data []*Record, ids []int) []*Record {
wdata := make([]*Record, len(data))
w := 0
loop:
for _, x := range data {
for _, id := range ids {
if id == x.id {
continue loop
}
}
wdata[w] = x
w++
}
return wdata[0:w]
}
For a personal project, I did something like this:
func filter(sl []int, fn func(int) bool) []int {
result := make([]int, 0, len(sl))
last := 0
for i, v := range sl {
if fn(v) {
result = append(result, sl[last:i]...)
last = i + 1
}
}
return append(result, sl[last:]...)
}
It doesn't mutate the original, but should be relatively efficient.
It's probably better to just do:
func filter(sl []int, fn func(int) bool) (result []int) {
for _, v := range sl {
if !fn(v) {
result = append(result, v)
}
}
return
}
Simpler and cleaner.
If you want to do it in-place, you probably want something like:
func filter(sl []int, fn func(int) bool) []int {
outi := 0
res := sl
for _, v := range sl {
if !fn(v) {
res[outi] = v
outi++
}
}
return res[0:outi]
}
You can optimize this to use copy to copy ranges of elements, but that's twice
the code and probably not worth it.
So, in this specific case, I'd probably do something like:
func deleteRecords(l []*Record, ids []int) []*Record {
outi := 0
L:
for _, v := range l {
for _, id := range ids {
if v.id == id {
continue L
}
}
l[outi] = v
outi++
}
return l[0:outi]
}
(Note: untested.)
No allocations, nothing fancy, and assuming the rough size of the list of Records and the list of ids you presented, a simple linear search is likely to do as well as fancier things but without any overhead. I realize that my version mutates the slice and returns a new slice, but that's not un-idiomatic in Go, and it avoids forcing the slice at the callsite to be heap allocated.
For the case you described, where len(ids) is approximately 10 and len(*l) is in the several hundreds, this should be relatively fast, since it minimizes memory allocations by updating in place.
package main
import (
"fmt"
"strconv"
)
type Record struct {
id int
name string
}
type RecordList []*Record
func deleteRecords(l *RecordList, ids []int) {
rl := *l
for i := 0; i < len(rl); i++ {
rid := rl[i].id
for j := 0; j < len(ids); j++ {
if rid == ids[j] {
copy(rl[i:len(*l)-1], rl[i+1:])
rl[len(rl)-1] = nil
rl = rl[:len(rl)-1]
i-- // re-check index i, which now holds the record shifted down
break
}
}
}
*l = rl
}
func main() {
l := make(RecordList, 777)
for i := range l {
l[i] = &Record{int(i), "name #" + strconv.Itoa(i)}
}
ids := []int{0, 1, 2, 4, 8, len(l) - 1, len(l)}
fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1])
deleteRecords(&l, ids)
fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1])
}
Output:
[0 1 2 4 8 776 777] 777 777 {0 name #0} {1 name #1} {776 name #776}
[0 1 2 4 8 776 777] 771 777 {3 name #3} {5 name #5} {775 name #775}
Instead of repeatedly searching ids, you could use a map. This code preallocates the full size of the map, and then just moves array elements in place. There are no other allocations.
func deleteRecords(l *RecordList, ids []int) {
m := make(map[int]bool, len(ids))
for _, id := range ids {
m[id] = true
}
s, x := *l, 0
for _, r := range s {
if !m[r.id] {
s[x] = r
x++
}
}
*l = s[0:x]
}
Use the vector package's Delete method as a guide, or just use a Vector instead of a slice.
Here is one option, though I would hope there are cleaner/faster, more functional-looking ones:
func deleteRecords(l *RecordList, ids []int) RecordList {
var newList RecordList
for _, rec := range *l {
toRemove := false
for _, id := range ids {
if rec.id == id {
toRemove = true
break
}
}
if !toRemove {
newList = append(newList, rec)
}
}
return newList
}
With a large enough l and ids it will be more efficient to sort both lists first and then do a single loop over them instead of two nested loops.
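A rough sketch of that sort-then-merge idea (hypothetical and not benchmarked here; it reuses the Record type from the question, reorders data by id, and needs the sort package):
import "sort"

// deleteRecordsMerge sorts a copy of ids and the records by id, then walks both once.
// Note the surviving records end up ordered by id, not in their original order.
func deleteRecordsMerge(data []*Record, ids []int) []*Record {
    sortedIDs := append([]int(nil), ids...)
    sort.Ints(sortedIDs)
    sort.Slice(data, func(i, j int) bool { return data[i].id < data[j].id })
    w, j := 0, 0
    for _, x := range data {
        for j < len(sortedIDs) && sortedIDs[j] < x.id {
            j++
        }
        if j < len(sortedIDs) && sortedIDs[j] == x.id {
            continue // id is in the delete list
        }
        data[w] = x
        w++
    }
    return data[:w]
}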
