Check whether a string slice contains a certain value in Go - set

What is the best way to check whether a certain value is in a string slice? I would use a Set in other languages, but Go doesn't have one.
My best try is this so far:
package main

import "fmt"

func main() {
    list := []string{"a", "b", "x"}
    fmt.Println(isValueInList("b", list))
    fmt.Println(isValueInList("z", list))
}

func isValueInList(value string, list []string) bool {
    for _, v := range list {
        if v == value {
            return true
        }
    }
    return false
}
http://play.golang.org/p/gkwMz5j09n
This solution should be OK for small slices, but what should I do for slices with many elements?

If you have a slice of strings in an arbitrary order, finding if a value exists in the slice requires O(n) time. This applies to all languages.
If you intend to do the search over and over again, you can use other data structures to make lookups faster. However, building these structures requires at least O(n) time, so you will only see a benefit if you do more than one lookup against them.
For example, you could load your strings into a map. Lookups would then take O(1) time. Insertions also take O(1) time, making the initial build O(n):
set := make(map[string]bool)
for _, v := range list {
    set[v] = true
}
fmt.Println(set["b"])
You can also sort your string slice and then do a binary search. Binary searches occur in O(log(n)) time. Building can take O(n*log(n)) time.
sort.Strings(list)
i := sort.SearchStrings(list, "b")
fmt.Println(i < len(list) && list[i] == "b")
Although in theory a map wins as the number of values grows without bound, in practice it is quite possible that searching a sorted slice will be faster. You need to benchmark it yourself.
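If you do want to measure it, a minimal benchmark sketch along these lines (the data size, key shape, and function names here are made-up assumptions) can go in a _test.go file and be run with go test -bench=.:

package main

import (
    "sort"
    "strconv"
    "testing"
)

// buildData creates n keys, a set built from them, and the same keys
// as a sorted slice. The size and key shape are arbitrary.
func buildData(n int) (map[string]bool, []string) {
    set := make(map[string]bool, n)
    list := make([]string, 0, n)
    for i := 0; i < n; i++ {
        s := strconv.Itoa(i)
        set[s] = true
        list = append(list, s)
    }
    sort.Strings(list)
    return set, list
}

func BenchmarkMapLookup(b *testing.B) {
    set, _ := buildData(1000)
    for i := 0; i < b.N; i++ {
        _ = set["500"]
    }
}

func BenchmarkSortedSliceSearch(b *testing.B) {
    _, list := buildData(1000)
    for i := 0; i < b.N; i++ {
        j := sort.SearchStrings(list, "500")
        _ = j < len(list) && list[j] == "500"
    }
}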

To replace sets you should use a map[string]struct{}. This is efficient and considered idiomatic: the "values" take no space at all.
Initialize the set:
set := make(map[string]struct{})
Put an item:
set["item"] = struct{}{}
Check whether an item is present:
_, isPresent := set["item"]
Remove an item:
delete(set, "item")
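Putting those operations together, a minimal runnable sketch (the keys here are just example values):

package main

import "fmt"

func main() {
    // struct{} values occupy zero bytes, so only the keys are stored.
    set := make(map[string]struct{})

    // Add items.
    set["a"] = struct{}{}
    set["b"] = struct{}{}

    // Membership test with the comma-ok form.
    _, isPresent := set["b"]
    fmt.Println(isPresent) // true

    // Remove an item and test again.
    delete(set, "b")
    _, isPresent = set["b"]
    fmt.Println(isPresent) // false
}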

You can use a map, with e.g. a bool as the value:
m := map[string]bool{"a": true, "b": true, "x": true}
if m["a"] { // will be false if "a" is not in the map
    // it was in the map
}
There's also the sort package, so you could sort and binary-search your slices.

Related

Appending []byte slices to a [][]byte slice

I have started learning Go and there is something I don't quite understand; maybe I'm just confused and tired.
Here is my code. There is a result slice of encoded strings (2139614 elements). I need to decode them and use them further on. But after the loop, rips is twice as large and its first half is completely empty, so I make another slice (resultrips) from the range I actually want.
Why does this happen?
It might be easier to decode result immediately and overwrite it in place, but I don't know how to do that.
Maybe there is a completely different way that, as a beginner, I just don't know about yet.
result := []string{}
for i := range input {
    result = append(result, i)
}
sort.Strings(result)

rips := make([][]byte, 2139614)
for _, i := range result {
    c := Decode(i)
    c = c[1:37]
    rips = append(rips, c)
}

// len(result) == 2139614
for i := 2139610; i < 2139700; i++ {
    fmt.Println(i, rips[i])
}

resultrips := rips[2139614:]
for _, i := range resultrips {
    fmt.Println(i)
}
fmt.Println("All write: ", len(resultrips))
And one more question: is this the right way to get a slice of byte slices? (I do it this way to avoid extra work, since I will compare the values as raw bytes; there is no encoding involved.)
rips := make([][]byte, 2139614) // slice of []byte
In the end, I need something like a C++ set, to check whether an element is in my set.
In C++ the code was:
if (resultrips.count > 0) { ... }
When you write:
make([][]byte, 2139614)
This creates a slice with length and capacity equal to 2139614. When you append to a slice, it always appends after the last element, thereby increasing the length. If you want to pre-allocate a large slice so that you can append into it, you want to specify a length of 0:
make([][]byte, 0, 2139614)
This pre-allocates room for 2139614 elements, but with a length of 0, subsequent append calls start at the beginning of the slice; after the first append it has a length of 1, and the capacity never needs to grow.
Length vs capacity is covered in the Tour of Go: https://tour.golang.org/moretypes/13
A quick note based on the text of your question: remember that slices and arrays are not the same thing. Arrays have a compile-time fixed length, and their capacity is synonymous with their length. Slices are backed by arrays but have a length and a capacity that are independent and can change at run time.
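A small sketch of the difference, with made-up sizes just for illustration:

package main

import "fmt"

func main() {
    // Length 3: the slice already holds three zero-value entries,
    // and append places new elements after them.
    a := make([][]byte, 3)
    a = append(a, []byte{1})
    fmt.Println(len(a)) // 4: three nil entries, then the appended one

    // Length 0, capacity 3: append starts at the beginning and
    // reuses the pre-allocated backing array, so no reallocation.
    b := make([][]byte, 0, 3)
    b = append(b, []byte{1})
    fmt.Println(len(b), cap(b)) // 1 3
}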

Count distinct values in array - performance tips

I'm having some issues optimizing a Go map.
I want to generate a frequency table (count distinct occurrences) from an array of strings. My code works fine for small arrays, but as I start working with 100k+ structures, with many distinct values, it just isn't fast enough.
Right now, my approach is to generate an array of the distinct values, then compare each row against them and increase the counter mapped to that string.
counter := make(map[string]int)
for _, distinct := range distinctStrArray {
    for _, row := range StrArray {
        if row == distinct {
            counter[distinct]++
        }
    }
}
I've tried another approach, with the input array sorted beforehand (to minimize the number of writes to the map). This is a bit faster.
count := 0
for _, distinct := range distinctStrArray {
    for _, row := range StrArray {
        if row == distinct {
            count++
        }
    }
    counter[distinct] += count
    count = 0
}
Do you have any suggestions for optimizing a simple count(distinct)-type problem? I'm open to anything.
thanks!
Without more context, I would dump the separate array of distinct values - generating it takes time, and using it necessitates the nested loop. Assuming there's no other purpose to the second array, I'd use something like:
counter := make(map[string]int)
for _, row := range StrArray {
    counter[row]++
}
If you need the list of distinct strings without the counts for some separate purpose, you can easily get it afterward:
distinctStrings := make([]string, len(counter))
i := 0
for k := range counter {
    distinctStrings[i] = k
    i++
}
Iterating the array of distinct strings for every row is O(n) per row, while a map access by key is O(1) on average. That takes the overall work from O(n^2) down to roughly O(n), which should be a significant improvement with larger datasets. But, as with any optimization: test, measure, analyze, optimize.
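For reference, a self-contained sketch that puts the two snippets above together (the input slice is just sample data):

package main

import "fmt"

func main() {
    StrArray := []string{"a", "b", "a", "c", "b", "a"}

    // Single pass: the map does the "distinct" work for us.
    counter := make(map[string]int)
    for _, row := range StrArray {
        counter[row]++
    }
    fmt.Println(counter) // map[a:3 b:2 c:1]

    // Distinct values, if they are still needed on their own.
    distinctStrings := make([]string, 0, len(counter))
    for k := range counter {
        distinctStrings = append(distinctStrings, k)
    }
    fmt.Println(len(distinctStrings)) // 3
}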

Why is iterating over a map so much slower than iterating over a slice in Golang?

I was implementing a sparse matrix using a map in Go, and I noticed that my code started taking much longer to complete after this change. After ruling out other possible causes, it seems the culprit is the iteration over the map itself. Go Playground link (it doesn't work there, for some reason).
package main

import (
    "fmt"
    "math"
    "time"
)

func main() {
    z := 50000000
    a := make(map[int]int, z)
    b := make([]int, z)
    for i := 0; i < z; i++ {
        a[i] = i
        b[i] = i
    }

    t0 := time.Now()
    for key, value := range a {
        if key != value { // never happens
            fmt.Println("a", key, value)
        }
    }
    d0 := time.Now().Sub(t0)

    t1 := time.Now()
    for key, value := range b {
        if key != value { // never happens
            fmt.Println("b", key, value)
        }
    }
    d1 := time.Now().Sub(t1)

    fmt.Println(
        "a:", d0,
        "b:", d1,
        "diff:", math.Max(float64(d0), float64(d1))/math.Min(float64(d0), float64(d1)),
    )
}
Iterating over 50M items returns the following timings:
alix#local:~/Go/src$ go version
go version go1.3.3 linux/amd64
alix#local:~/Go/src$ go run b.go
a: 1.195424429s b: 68.588488ms diff: 17.777154632611037
I wonder, why is iterating over a map almost 20x as slow when compared to a slice?
This comes down to the representation in memory. How familiar are you with the representation of different data structures and the concept of algorithmic complexity? Iterating over an array or slice is simple: the values are contiguous in memory. Iterating over a map, however, requires traversing the key space and doing lookups into the hash-table structure.
Maps can insert keys of any value without allocating a huge sparse array, and lookups over the key space remain efficient, though not as fast as indexing an array. That is why hash tables are sometimes preferred over an array, even though arrays (and slices) have a faster constant-time (O(1)) lookup given an index.
It all comes down to whether you need the features of this or that data structure and whether you're willing to deal with the side-effects or gotchas involved.
It seems reasonable to put my comment as an answer. The underlying structures whose iteration performance you're comparing are a hash table and an array (https://en.wikipedia.org/wiki/Hash_table vs https://en.wikipedia.org/wiki/Array_data_structure). The range abstraction is actually (speculation, I can't find the code) iterating over all the keys, accessing each value, and assigning the pair to k, v :=. If you're not familiar with array access, it is constant time because you just add sizeof(type)*i to the starting pointer to get the item. I don't know what the internals of map are in Go, but I know enough to say that its memory representation, and therefore its access, is nowhere near as efficient.
The spec's statement on the topic isn't much: http://golang.org/ref/spec#For_statements
If I find the time to look up the implementation of range for maps and slices/arrays, I will add some more technical details.

best way to handle a list of integers - find, add, and delete

I need to create a list of integers and be able to quickly add, delete, and find items in that list. While I could create a string containing them and a function to handle the add/delete/locate, it obviously makes more sense if Go can handle this for me. I looked at container/list and it appeared not entirely suitable, but maybe I'm wrong.
To get something working very quickly, I am using an integer array; however, that is far from ideal, and I need to find a better solution. The list will probably hold up to 1,000 values.
Can someone please advise the "best" way to handle this in Go? An example is worth 1,000 words.
There is no 'best' answer to your question, as you don't state what you would like to do or what sort of performance is important to you. The problem with data structures is that every structure performs better or worse depending on the circumstances. Generally I would say that an integer slice would perform reasonably well for 1000 entries and is not so hard to use. Also the solution Nick proposed is appealing, as it offers you O(1) lookup time (average!) for your values instead of O(n) (unsorted) or O(log n) (sorted) search time in an array.
Go offers some operations to implement a []int store as you proposed:
get: x[i]
insert: x[i] = j or x = append(x, j) or use sorted insertion
delete: x = append(x[:i], x[i+1:]...)
search: in case you used sorted insertion, you can use sort.SearchInts, otherwise you need to loop and search linearly.
For more operations on slices see here.
The following example (playground) offers you a []int with O(log n) time for searching and O(n) for insertion. Retrieval and setting by index are, of course, O(1); deleting by index is O(n), since the elements after it have to be shifted.
type Ints []int

// Append inserts v so that ints stays sorted.
func (ints *Ints) Append(v int) {
    i := sort.SearchInts(*ints, v)
    *ints = append((*ints)[:i], append([]int{v}, (*ints)[i:]...)...)
}

// Delete removes the element at index i.
func (ints *Ints) Delete(i int) {
    *ints = append((*ints)[:i], (*ints)[i+1:]...)
}

// Search returns the index of v and whether it is present.
func (ints Ints) Search(v int) (int, bool) {
    i := sort.SearchInts(ints, v)
    return i, i < len(ints) && ints[i] == v
}

data := make(Ints, 0, 1000)
data.Append(100)
index, ok := data.Search(10)
As you can see in the example, Append searches for the place to insert the new value, effectively keeping the contents of the slice sorted in ascending order. This makes it possible to use binary search via sort.SearchInts, reducing the search time from O(n) to O(log n). With that comes the cost of sorting while inserting: finding the slot costs only O(log n) in the worst case, but making room for the new element means shifting the existing ones, so insertion is O(n) overall.
In the interest of keeping it simple I would use a map. Maps are very fast, efficient and built in.
(playground link)
package main

import "fmt"

func main() {
    // Make our collection of integers
    xs := make(map[int]bool)

    // Add some things to the collection
    xs[1] = true
    xs[2] = true
    xs[3] = true

    // Find them
    if xs[2] {
        fmt.Println("Found 2")
    } else {
        fmt.Println("Didn't Find 2")
    }
    if xs[8] {
        fmt.Println("Found 8")
    } else {
        fmt.Println("Didn't Find 8")
    }

    // Delete them
    delete(xs, 2)

    // List them
    for x := range xs {
        fmt.Println("Contents", x)
    }
}
Which produces
Found 2
Didn't Find 8
Contents 3
Contents 1
Possibly the only disadvantage of this solution is that the integers aren't kept in any particular order, which may or may not be important to your application.
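If you do need an ordered listing at some point, you can collect the keys and sort them on demand; a short sketch:

package main

import (
    "fmt"
    "sort"
)

func main() {
    xs := map[int]bool{1: true, 2: true, 3: true}

    // Copy the keys into a slice and sort it before printing.
    keys := make([]int, 0, len(xs))
    for x := range xs {
        keys = append(keys, x)
    }
    sort.Ints(keys)
    for _, x := range keys {
        fmt.Println("Contents", x)
    }
}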
This is really more of an abstract data structure question. The answer depends on your use case. A slice of ints would do fine for the general case (look at append and friends), but if you want finding items to be better than O(n) you'll want it sorted, and insertion into a sorted []int has a worst case of O(n), IIRC.
So the question is which operation you want to optimize: index, add, remove, or search?

how to sort data at the time of adding it, not later?

I am new to algorithms so please forgive me if this sounds basic or stupid.
I want to know this: instead of adding data to some kind of list and then sorting the list, is there a method (data structure + algorithm) that lets me sort the data at the time of adding it, or, to put it another way, that inserts the data in its proper place?
E.g.: if I want to add '3' to {1,5,6}, instead of adding it at the start or end and then sorting the list, I want '3' to go directly after '1'.
thanks
If you use a binary search tree instead of an array, the sorting happens "automatically", because it is already done by the insert method of the nodes. So a binary search tree is always sorted, and it is easy to traverse. The only problem is that when the incoming data is already (more or less) sorted, the tree becomes unbalanced (which is where red-black trees and other variations come into play).
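For illustration, a minimal (unbalanced) binary search tree sketch in Go; the type and function names are made up, and an in-order traversal yields the values in sorted order:

package main

import "fmt"

type node struct {
    value       int
    left, right *node
}

// insert returns the root of the tree after adding v.
func insert(n *node, v int) *node {
    if n == nil {
        return &node{value: v}
    }
    if v < n.value {
        n.left = insert(n.left, v)
    } else {
        n.right = insert(n.right, v)
    }
    return n
}

// inOrder appends the stored values in ascending order.
func inOrder(n *node, out []int) []int {
    if n == nil {
        return out
    }
    out = inOrder(n.left, out)
    out = append(out, n.value)
    return inOrder(n.right, out)
}

func main() {
    var root *node
    for _, v := range []int{1, 5, 6, 3} {
        root = insert(root, v)
    }
    fmt.Println(inOrder(root, nil)) // [1 3 5 6]
}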
You want to maintain a sorted array at all times, so you have to find the correct place in the sequence for every new element you add. This can be done efficiently (in O(log n) time) with a modified binary search algorithm.
There are basically two different methods to insert a value into a list; which one you use depends a bit on what kind of list you are using:
Use binary search to locate where the value should be inserted, and insert the value there.
Loop from the end of the list, moving all higher values one step up, and put the value in the gap before the lowest higher value.
The first method would typically be used if you are using a binary tree or a linked list, where you don't have to move items in the list to do the insert.
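For a plain slice, the second method looks roughly like this (a sketch, not tuned for performance):

package main

import "fmt"

// insertSorted inserts v into the already-sorted slice a by shifting
// the larger elements one position to the right.
func insertSorted(a []int, v int) []int {
    a = append(a, 0) // make room for one more element
    i := len(a) - 2
    for ; i >= 0 && a[i] > v; i-- {
        a[i+1] = a[i]
    }
    a[i+1] = v
    return a
}

func main() {
    a := []int{1, 5, 6}
    fmt.Println(insertSorted(a, 3)) // [1 3 5 6]
}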
Yes, but that's usually slower than adding all the data and sorting it afterwards, because to insert the data as it is added you need to traverse the list every time you add an element.
With binary search you don't need to look at every element, but even then you typically touch more elements in total than you would by sorting once at the end.
So from a performance point of view, sorted insertion is inferior to sorting afterwards.
Here is Go code that sorts the inputs on the fly.
What I am doing here is determining, with a binary search, the position where the input should go, partitioning the already-sorted slice at that point, appending the new element to the first part, and then rejoining the two parts.
Time complexity: O(log n) to determine the position, plus O(n) per input (roughly 3n operations) to create the slices and append.
package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
)

func main() {
    reader := bufio.NewReader(os.Stdin)
    var a []int
    for {
        fmt.Print("Please enter a value:")
        text := readLine(reader)
        key, _ := strconv.ParseInt(text, 10, 0)
        pos := binarySearch(a, 0, len(a)-1, int(key))
        p1 := append([]int{}, a[:pos]...)
        p2 := a[pos:]
        p1 = append(p1, int(key))
        p1 = append(p1, p2...)
        a = p1
        fmt.Println(a)
    }
}

// binarySearch returns the index at which key should be inserted
// to keep a sorted.
func binarySearch(a []int, low int, high int, key int) int {
    var result int
    if high == -1 {
        return 0
    } else if key >= a[high] {
        return high + 1
    } else if key <= a[low] {
        return low
    }
    mid := (high + low) / 2
    if a[mid] == key {
        result = mid
    } else if key < a[mid] {
        return binarySearch(a, low, mid-1, key)
    } else if key > a[mid] {
        return binarySearch(a, mid+1, high, key)
    }
    return result
}

// readLine reads one line from the reader and strips the trailing newline.
func readLine(reader *bufio.Reader) string {
    text, err := reader.ReadString('\n')
    if err != nil {
        fmt.Println(err)
    }
    text = strings.TrimRight(text, "\n")
    return text
}
