How to get the intersection of two slices in Go? - go

Is there an efficient way to get the intersection of two slices in Go?
I want to avoid a solution based on nested for loops.
slice1 := []string{"foo", "bar","hello"}
slice2 := []string{"foo", "bar"}
intersection(slice1, slice2)
=> ["foo", "bar"]
The order of the strings does not matter.

How do I get the intersection between two arrays as a new array?
Simple Intersection: Compare each element in A to each in B (O(n^2))
Hash Intersection: Put them into a hash table (O(n))
Sorted Intersection: Sort A and do an optimized intersection (O(n*log(n)))
All of these are implemented here:
https://github.com/juliangruber/go-intersect
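As a rough illustration of the sorted approach listed above, here is a minimal sketch. It is not the code from the linked repository: the name SortedIntersection is hypothetical, and it assumes that sorting copies of both inputs and walking them with two indices is acceptable. The sort dominates the cost, giving O(n*log(n)).

```go
package main

import (
	"fmt"
	"sort"
)

// SortedIntersection (hypothetical name) returns the distinct values present
// in both a and b. It sorts copies, so the callers' slices are left untouched.
func SortedIntersection(a, b []string) []string {
	as := append([]string(nil), a...)
	bs := append([]string(nil), b...)
	sort.Strings(as)
	sort.Strings(bs)

	var out []string
	i, j := 0, 0
	for i < len(as) && j < len(bs) {
		switch {
		case as[i] < bs[j]:
			i++
		case as[i] > bs[j]:
			j++
		default:
			// Equal values: record once, skipping repeats of the same value.
			if len(out) == 0 || out[len(out)-1] != as[i] {
				out = append(out, as[i])
			}
			i++
			j++
		}
	}
	return out
}

func main() {
	fmt.Println(SortedIntersection(
		[]string{"foo", "bar", "hello"},
		[]string{"foo", "bar"},
	)) // [bar foo]
}
```

The result comes back in sorted order, which is fine here because the question states that order does not matter.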

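Simple, generic, and works with multiple slices (Go 1.18+).
Time complexity: expected linear in the total number of elements, assuming O(1) map operations.
Note that constraints.Ordered comes from golang.org/x/exp/constraints; the built-in comparable constraint would also suffice, since the values are only used as map keys.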
func interSection[T constraints.Ordered](pS ...[]T) []T {
hash := make(map[T]*int) // value, counter
result := make([]T, 0)
for _, slice := range pS {
duplicationHash := make(map[T]bool) // duplication checking for individual slice
for _, value := range slice {
if _, isDup := duplicationHash[value]; !isDup { // is not duplicated in slice
if counter := hash[value]; counter != nil { // is found in hash counter map
if *counter++; *counter >= len(pS) { // is found in every slice
result = append(result, value)
}
} else { // not found in hash counter map
i := 1
hash[value] = &i
}
duplicationHash[value] = true
}
}
}
return result
}
func main() {
slice1 := []string{"foo", "bar", "hello"}
slice2 := []string{"foo", "bar"}
fmt.Println(interSection(slice1, slice2))
// [foo bar]
ints1 := []int{1, 2, 3, 9, 8}
ints2 := []int{10, 4, 2, 4, 8, 9} // have duplicated values
ints3 := []int{2, 4, 8, 1}
fmt.Println(interSection(ints1, ints2, ints3))
// [2 8]
}
playground : https://go.dev/play/p/lE79D0kOznZ

This is an efficient way to intersect two slices, with low time complexity.
Time Complexity : O(m+n)
m = length of first slice.
n = length of second slice.
func intersection(s1, s2 []string) (inter []string) {
hash := make(map[string]bool)
for _, e := range s1 {
hash[e] = true
}
for _, e := range s2 {
// If elements present in the hashmap then append intersection list.
if hash[e] {
inter = append(inter, e)
}
}
//Remove dups from slice.
inter = removeDups(inter)
return
}
//Remove dups from slice.
func removeDups(elements []string)(nodups []string) {
encountered := make(map[string]bool)
for _, element := range elements {
if !encountered[element] {
nodups = append(nodups, element)
encountered[element] = true
}
}
return
}

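If none of your strings contains a space, this simple code may be enough. Be aware that strings.Contains matches substrings, so "foo" is reported as present when the other slice only holds "foobar":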
func filter(src []string) (res []string) {
for _, s := range src {
newStr := strings.Join(res, " ")
if !strings.Contains(newStr, s) {
res = append(res, s)
}
}
return
}
func intersections(section1, section2 []string) (intersection []string) {
str1 := strings.Join(filter(section1), " ")
for _, s := range filter(section2) {
if strings.Contains(str1, s) {
intersection = append(intersection, s)
}
}
return
}

Try it
https://go.dev/play/p/eGGcyIlZD6y
first := []string{"one", "two", "three", "four"}
second := []string{"two", "four"}
result := intersection(first, second) // or intersection(second, first)
func intersection(first, second []string) []string {
out := []string{}
bucket := map[string]bool{}
for _, i := range first {
for _, j := range second {
if i == j && !bucket[i] {
out = append(out, i)
bucket[i] = true
}
}
}
return out
}

https://github.com/viant/toolbox/blob/a46fd679bbc5d07294b1d1b646aeacd44e2c7d50/collections.go#L869-L920
Another O(m+n) time complexity solution that uses a hashmap.
It differs from the other solutions discussed here in two ways:
It takes the target slice as a parameter instead of returning a new slice.
It handles commonly used types such as string and int directly, which is faster than using reflection for every type.

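Yes, there are a few different ways to go about it. Here's an example that can be optimized. Note that it stops at the first point where the two inputs diverge after a match, so it only finds a contiguous run of common elements, and it shifts elements inside the backing array of the longer input.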
package main
import "fmt"
func intersection(a []string, b []string) (inter []string) {
// iterating over the smaller list first can potentially be faster, but not by much; the worst case is the same
low, high := a, b
if len(a) > len(b) {
low = b
high = a
}
done := false
for i, l := range low {
for j, h := range high {
// get future index values
f1 := i + 1
f2 := j + 1
if l == h {
inter = append(inter, h)
if f1 < len(low) && f2 < len(high) {
// if the future values aren't the same then that's the end of the intersection
if low[f1] != high[f2] {
done = true
}
}
// we don't want to iterate over the entire list every time, so removing the parts we already matched makes each pass faster
high = high[:j+copy(high[j:], high[j+1:])]
break
}
}
// nothing in the future so we are done
if done {
break
}
}
return
}
func main() {
slice1 := []string{"foo", "bar", "hello", "bar"}
slice2 := []string{"foo", "bar"}
fmt.Printf("%+v\n", intersection(slice1, slice2))
}
Now the intersection function defined above only operates on slices of strings, like your example. You could in theory write a definition that looks like func intersection(a []interface{}, b []interface{}) (inter []interface{}), but you would be relying on reflection and type assertions to compare elements, which adds latency and makes the code harder to read. It's probably easier to read and maintain a separate function for each type you care about:
func intersectionString(a []string, b []string) (inter []string)
func intersectionInt(a []int, b []int) (inter []int)
func intersectionFloat64(a []float64, b []float64) (inter []float64), etc.
You can then create your own package and reuse once you settle how you want to implement it.
package intersection
func String(a []string, b []string) (inter []string)
func Int(a []int, b []int) (inter []int)
func Float64(a []float64, b []float64) (inter []float64)
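Since Go 1.18, type parameters offer a third option besides interface{} and one function per type. The following is a minimal sketch, not part of the answer above: the name Intersection and its behaviour (distinct results, in the order they first appear in a) are assumptions made for illustration.

```go
package main

import "fmt"

// Intersection (hypothetical name) returns the distinct elements of a that
// also occur in b, in the order they first appear in a.
func Intersection[T comparable](a, b []T) []T {
	inB := make(map[T]struct{}, len(b))
	for _, v := range b {
		inB[v] = struct{}{}
	}
	seen := make(map[T]struct{}) // values already added to the result
	var out []T
	for _, v := range a {
		if _, ok := inB[v]; !ok {
			continue
		}
		if _, dup := seen[v]; dup {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(Intersection([]string{"foo", "bar", "hello", "bar"}, []string{"foo", "bar"})) // [foo bar]
	fmt.Println(Intersection([]int{1, 2, 3, 9, 8}, []int{10, 4, 2, 4, 8, 9}))               // [2 9 8]
}
```

The comparable constraint is enough because elements are only used as map keys, so the same function covers string, int, and float64 slices without reflection.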

Related

Code to generate powerset in Golang gives wrong result

The following Go code for generating a power set produces a wrong result for the input {"A", "B", "C", "D", "E"}. I see [A B C E E] as the last generated set.
package main
import (
"fmt"
)
func main() {
for _, s := range PowerSet([]string{"A", "B", "C", "D", "E"}) {
fmt.Println(s)
}
}
func PowerSet(set []string) [][]string {
var powerSet [][]string
powerSet = append(powerSet, make([]string, 0))
for _, element := range set {
var moreSets [][]string
for _, existingSet := range powerSet {
newSet := append(existingSet, element)
moreSets = append(moreSets, newSet)
}
powerSet = append(powerSet, moreSets...)
}
return powerSet
}
How to fix it? How to write it idiomatically in Go?
The problem with your program is not the algorithm itself but this line:
newSet := append(existingSet, element)
You should not append to one slice and assign the result to a different variable while still relying on the original.
As the documentation states (emphasis mine), "The append built-in function appends elements to the end of a slice. If it has sufficient capacity, the destination is resliced to accommodate the new elements. If it does not, a new underlying array will be allocated.".
So, there might be cases where newSet := append(existingSet, element) will actually modify existingSet itself, which would break your logic.
If you change that to create a new slice and append to that one instead, it works as you expect:
newSet := make([]string, 0)
newSet = append(newSet, existingSet...)
newSet = append(newSet, element)
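To make the aliasing concrete, here is a small self-contained sketch. The helper name appendCopy and the capacity of 4 are chosen purely for illustration; the point is that two appends to the same slice with spare capacity write into the same backing array slot.

```go
package main

import "fmt"

// appendCopy (hypothetical helper) returns a new slice holding s followed by
// e; it never writes into s's backing array.
func appendCopy(s []string, e string) []string {
	out := make([]string, len(s), len(s)+1)
	copy(out, s)
	return append(out, e)
}

func main() {
	base := make([]string, 2, 4) // len 2, cap 4: room to grow in place
	base[0], base[1] = "A", "B"

	x := append(base, "C") // reuses base's backing array
	y := append(base, "D") // overwrites the slot that x[2] refers to
	fmt.Println(x, y)      // [A B D] [A B D]

	p := appendCopy(base, "C")
	q := appendCopy(base, "D")
	fmt.Println(p, q) // [A B C] [A B D]
}
```

Whether a plain append aliases depends on the spare capacity at that moment, which is why the bug in the question only shows up for some subsets.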
Alternatively, you can use a bitmask-based algorithm like this one: https://stackoverflow.com/a/2779467/3805062.
func PowerSet(original []string) [][]string {
powerSetSize := int(math.Pow(2, float64(len(original))))
result := make([][]string, 0, powerSetSize)
var index int
for index < powerSetSize {
var subSet []string
for j, elem := range original {
if index&(1<<uint(j)) > 0 {
subSet = append(subSet, elem)
}
}
result = append(result, subSet)
index++
}
return result
}
Elaborating on @eugenioy's answer.
Look at this thread. Here is a working example : https://play.golang.org/p/dzoTk1kimf
func copy_and_append_string(slice []string, elem string) []string {
// wrong: return append(slice, elem)
return append(append([]string(nil), slice...), elem)
}
func PowerSet(s []string) [][]string {
if s == nil {
return nil
}
r := [][]string{[]string{}}
for _, es := range s {
var u [][]string
for _, er := range r {
u = append(u, copy_and_append_string(er, es))
}
r = append(r, u...)
}
return r
}

Simple mapReduce operation on strings

I have a list of strings
elems := [n]string{...}
I want to perform a simple mapReduce operation, such that I
Map every string to a different string, let's say string -> $string
Reduce all the strings to one string with a separator, e.g. {s1, s2, s3} -> s1#s2#s3
all in all: {s1, s2, s3} -> $s1#$s2#$s3
What's the best way to do this?
I'm looking for efficiency and readability
Bonus points if it's generic enough to work not only on strings
To map a list, you have little choice but to go over each string. If the transformation is time-consuming and you need speed, you can consider splitting the job across goroutines. Finally, you can use the strings.Join function, which takes a separator and normally performs the reduce part efficiently. The size of the dataset can also be a consideration: for larger lists you may want to compare the performance of strings.Join against your own customized algorithm and decide whether multiple goroutines and channels are worth using.
If you don't need to do the 2 things separately, the end result can be achieved simply by using strings.Join():
package main
import (
"fmt"
"strings"
)
func main() {
a := []string{"a", "b", "c"}
p := "$"
fmt.Println(p + strings.Join(a[:], "#"+p))
}
prints $a#$b#$c
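For the "generic enough" part of the question, one possible sketch with Go 1.18+ type parameters is shown here. MapJoin is a hypothetical name, not a standard library function; it maps each element to a string and joins the results in a single pass using strings.Builder.

```go
package main

import (
	"fmt"
	"strings"
)

// MapJoin (hypothetical name) applies f to every element and joins the
// resulting strings with sep.
func MapJoin[T any](elems []T, sep string, f func(T) string) string {
	var b strings.Builder
	for i, e := range elems {
		if i > 0 {
			b.WriteString(sep)
		}
		b.WriteString(f(e))
	}
	return b.String()
}

func main() {
	out := MapJoin([]string{"s1", "s2", "s3"}, "#", func(s string) string {
		return "$" + s
	})
	fmt.Println(out) // $s1#$s2#$s3
}
```

Unlike the prefix trick with strings.Join, this version returns an empty string for an empty input and accepts any element type.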
Go is explicitly NOT a functional programming language.
You map and reduce using a for loop.
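a := []string{"a", "b", "c"}
result := "initvalue"
for i, s := range a {
// strconv.Itoa formats the index as decimal text; string(i) would yield a single rune instead
result += s + strconv.Itoa(i)
}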
If your map functions do not perform any I/O (that is, they only do some computation), making them concurrent may well make things slower, and even if they do perform I/O, you should benchmark. Concurrency does not necessarily make things faster and sometimes adds unnecessary complication. In many cases a simple for loop is sufficient.
If the map functions here are IO bound or are doing some sort of computation heavy calculations that do benefit from going concurrent, solutions can vary. For example NATS can be used to go beyond one machine and distribute the workload.
This is a relatively simple sample. Reduce phase is not multistage and is blocking:
import (
"fmt"
"strings"
"sync"
"testing"
"github.com/stretchr/testify/assert"
)
type elem struct {
index int
value interface{}
}
func feed(elems []interface{}) <-chan elem {
result := make(chan elem)
go func() {
for k, v := range elems {
e := elem{
index: k,
value: v,
}
result <- e
}
close(result)
}()
return result
}
func mapf(
input <-chan elem,
mapFunc func(elem) elem) <-chan elem {
result := make(chan elem)
go func() {
for e := range input {
eres := mapFunc(e)
result <- eres
}
close(result)
}()
return result
}
// is blocking
func reducef(
input <-chan elem,
reduceFunc func([]interface{}) interface{}) interface{} {
buffer := make(map[int]interface{})
l := 0
for v := range input {
buffer[v.index] = v.value
if v.index > l {
l = v.index
}
}
data := make([]interface{}, l+1)
for k, v := range buffer {
data[k] = v
}
return reduceFunc(data)
}
func fanOutIn(
elemFeed <-chan elem,
mapFunc func(elem) elem, mapCount int,
reduceFunc func([]interface{}) interface{}) interface{} {
MR := make(chan elem)
wg := &sync.WaitGroup{}
for i := 0; i < mapCount; i++ {
mapResult := mapf(elemFeed, mapFunc)
wg.Add(1)
go func() {
defer wg.Done()
for v := range mapResult {
MR <- v
}
}()
}
go func() {
wg.Wait()
close(MR)
}()
return reducef(MR, reduceFunc)
}
func Test01(t *testing.T) {
elemFeed := feed([]interface{}{1, 2, 3})
finalResult := fanOutIn(
elemFeed,
func(e elem) elem {
return elem{
index: e.index,
value: fmt.Sprintf("[%v]", e.value),
}
},
3,
func(sl []interface{}) interface{} {
strRes := make([]string, len(sl))
for k, v := range sl {
strRes[k] = v.(string)
}
return strings.Join(strRes, ":")
})
assert.Equal(t, "[1]:[2]:[3]", finalResult)
}
Because it uses interface{} as the element type, it can be generalized to other types.

Short way to apply a function to all elements in a list in golang

Suppose I would like to apply a function to every element in a list, and then put the resulting values in another list so I can immediately use them. In Python, I would do something like this:
list = [1,2,3]
str = ', '.join(multiply(x, 2) for x in list)
In Go, I do something like this:
list := []int{1,2,3}
list2 := []int
for _,x := range list {
list2 := append(list2, multiply(x, 2))
}
str := strings.Join(list2, ", ")
Is it possible to do this in a shorter way?
I would do exactly as you did, with a few tweaks to fix typos:
import (
"fmt"
"strconv"
"strings"
)
func main() {
list := []int{1,2,3}
var list2 []string
for _, x := range list {
list2 = append(list2, strconv.Itoa(x * 2)) // note the = instead of :=
}
str := strings.Join(list2, ", ")
fmt.Println(str)
}
This is an old question, but was the top hit in my Google search, and I found information that I believe will be helpful to the OP and anyone else who arrives here, looking for the same thing.
There is a shorter way, although you have to write the map function yourself.
In Go, a func is a first-class type, so you can write a function that accepts the slice and a function as input, iterates over the slice, and applies the function to each element.
See the Map function near the bottom of this Go by Example page : https://gobyexample.com/collection-functions
I've included it here for reference:
func Map(vs []string, f func(string) string) []string {
vsm := make([]string, len(vs))
for i, v := range vs {
vsm[i] = f(v)
}
return vsm
}
You then call it like so:
fmt.Println(Map(strs, strings.ToUpper))
So, yes: The shorter way you are looking for exists, although it is not built into the language itself.
I've created a small utility package with Map and Filter functions now that generics have been introduced in 1.18 :)
https://pkg.go.dev/github.com/sa-/slicefunk
Example usage
package main
import (
"fmt"
sf "github.com/sa-/slicefunk"
)
func main() {
original := []int{1, 2, 3, 4, 5}
newArray := sf.Map(original, func(item int) int { return item + 1 })
newArray = sf.Map(newArray, func(item int) int { return item * 3 })
newArray = sf.Filter(newArray, func(item int) bool { return item%2 == 0 })
fmt.Println(newArray)
}
With go1.18+ you can write a much cleaner generic Map function:
func Map[T, V any](ts []T, fn func(T) V) []V {
result := make([]V, len(ts))
for i, t := range ts {
result[i] = fn(t)
}
return result
}
Usage, e.g:
input := []int{4, 5, 3}
outputInts := Map(input, func(item int) int { return item + 1 })
outputStrings := Map(input, func(item int) string { return fmt.Sprintf("Item:%d", item) })
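Tying this back to the Python one-liner in the question, the sketch below combines a generic Map with strconv.Itoa and strings.Join. It repeats the Map definition so that it compiles on its own; doubling the value stands in for the multiply(x, 2) call from the question.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Map applies fn to every element of ts and returns the results.
func Map[T, V any](ts []T, fn func(T) V) []V {
	result := make([]V, len(ts))
	for i, t := range ts {
		result[i] = fn(t)
	}
	return result
}

func main() {
	list := []int{1, 2, 3}
	// Convert to strings while mapping, because strings.Join needs []string.
	doubled := Map(list, func(x int) string { return strconv.Itoa(x * 2) })
	fmt.Println(strings.Join(doubled, ", ")) // 2, 4, 6
}
```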
I found a way to define a generic Map function for slices using reflection (no generics required):
func Map(t interface{}, f func(interface{}) interface{} ) []interface{} {
switch reflect.TypeOf(t).Kind() {
case reflect.Slice:
s := reflect.ValueOf(t)
arr := make([]interface{}, s.Len())
for i := 0; i < s.Len(); i++ {
arr[i] = f(s.Index(i).Interface())
}
return arr
}
return nil
}
origin := []int{4,5,3}
newArray := Map(origin, func(item interface{}) interface{} { return item.(int) + 1})
You can use lo's Map in order to quickly apply a function to all elements. For example, in order to multiply by 2 and convert to string, you can use:
l := lo.Map[int, string]([]int{1, 2, 3, 4}, func(x int, _ int) string { return strconv.Itoa(x * 2) })
Then you can convert back to a comma delimited string like so:
strings.Join(l, ",")

How to convert interface{} to []int?

I am programming in Go programming language.
Say there's a variable of type interface{} that contains an array of integers. How do I convert interface{} back to []int?
I have tried
interface_variable.([]int)
The error I got is:
panic: interface conversion: interface is []interface {}, not []int
It's a []interface{} not just one interface{}, you have to loop through it and convert it:
The 2022 answer (Go 1.18+, using generics):
https://go.dev/play/p/yeihkfIZ90U
func ConvertSlice[E any](in []any) (out []E) {
out = make([]E, 0, len(in))
for _, v := range in {
out = append(out, v.(E))
}
return
}
The pre-Go 1.18 answer:
http://play.golang.org/p/R441h4fVMw
func main() {
a := []interface{}{1, 2, 3, 4, 5}
b := make([]int, len(a))
for i := range a {
b[i] = a[i].(int)
}
fmt.Println(a, b)
}
As others have said, you should iterate over the slice and convert the elements one by one.
It is better to use a type switch inside the range loop in order to avoid panics:
a := []interface{}{1, 2, 3, 4, 5}
b := make([]int, len(a))
for i, value := range a {
switch typedValue := value.(type) {
case int:
b[i] = typedValue
default:
fmt.Println("Not an int: ", value)
}
}
fmt.Println(a, b)
http://play.golang.org/p/Kbs3rbu2Rw
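If the caller should be told about a bad element rather than just seeing a printed message, a variant that returns an error is one option. This is only a sketch: ToInts is a hypothetical name, and it uses the two-value form of the type assertion, which never panics.

```go
package main

import "fmt"

// ToInts (hypothetical name) converts a value holding a []interface{} of ints
// into a []int, reporting an error instead of panicking on a mismatch.
func ToInts(v any) ([]int, error) {
	items, ok := v.([]any)
	if !ok {
		return nil, fmt.Errorf("expected []interface{}, got %T", v)
	}
	out := make([]int, len(items))
	for i, item := range items {
		n, ok := item.(int)
		if !ok {
			return nil, fmt.Errorf("element %d is %T, not int", i, item)
		}
		out[i] = n
	}
	return out, nil
}

func main() {
	var v any = []any{1, 2, 3}
	ints, err := ToInts(v)
	fmt.Println(ints, err) // [1 2 3] <nil>

	_, err = ToInts([]any{1, "two"})
	fmt.Println(err) // element 1 is string, not int
}
```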
The function's declared return type is interface{}, but the value it actually holds is a []interface{}, so try this instead:
func main() {
values := returnValue.([]interface{})
for i := range values {
fmt.Println(values[i])
}
}

How to find the difference between two slices of strings

Here is my desired outcome
slice1 := []string{"foo", "bar","hello"}
slice2 := []string{"foo", "bar"}
difference(slice1, slice2)
=> ["hello"]
I am looking for the difference between the two string slices!
Assuming Go maps are ~O(1), here is an ~O(n) difference function that works on unsorted slices.
// difference returns the elements in `a` that aren't in `b`.
func difference(a, b []string) []string {
mb := make(map[string]struct{}, len(b))
for _, x := range b {
mb[x] = struct{}{}
}
var diff []string
for _, x := range a {
if _, found := mb[x]; !found {
diff = append(diff, x)
}
}
return diff
}
Depending on the size of the slices, different solutions might be best.
My answer assumes order doesn't matter.
Using simple loops, only to be used with smaller slices:
package main
import "fmt"
func difference(slice1 []string, slice2 []string) []string {
var diff []string
// Loop two times, first to find slice1 strings not in slice2,
// second loop to find slice2 strings not in slice1
for i := 0; i < 2; i++ {
for _, s1 := range slice1 {
found := false
for _, s2 := range slice2 {
if s1 == s2 {
found = true
break
}
}
// String not found. We add it to return slice
if !found {
diff = append(diff, s1)
}
}
// Swap the slices, only if it was the first loop
if i == 0 {
slice1, slice2 = slice2, slice1
}
}
return diff
}
func main() {
slice1 := []string{"foo", "bar", "hello"}
slice2 := []string{"foo", "world", "bar", "foo"}
fmt.Printf("%+v\n", difference(slice1, slice2))
}
Output:
[hello world]
Playground: http://play.golang.org/p/KHTmJcR4rg
I use a map to solve this problem:
package main
import "fmt"
func main() {
slice1 := []string{"foo", "bar","hello"}
slice2 := []string{"foo", "bar","world"}
diffStr := difference(slice1, slice2)
for _, diffVal := range diffStr {
fmt.Println(diffVal)
}
}
func difference(slice1 []string, slice2 []string) ([]string){
diffStr := []string{}
m :=map [string]int{}
for _, s1Val := range slice1 {
m[s1Val] = 1
}
for _, s2Val := range slice2 {
m[s2Val] = m[s2Val] + 1
}
for mKey, mVal := range m {
if mVal==1 {
diffStr = append(diffStr, mKey)
}
}
return diffStr
}
output:
hello
world
func diff(a, b []string) []string {
temp := map[string]int{}
for _, s := range a {
temp[s]++
}
for _, s := range b {
temp[s]--
}
var result []string
for s, v := range temp {
if v != 0 {
result = append(result, s)
}
}
return result
}
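If you want to handle duplicated strings, the count v in the map can do that: v > 0 means the string occurs more often in a, and v < 0 means it occurs more often in b. You can therefore pick a.Remove(b) (v > 0) or b.Remove(a) (v < 0) instead of the symmetric difference.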
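A one-directional, duplicate-aware variant of the counting idea could look like the following sketch. Subtract is a hypothetical name; it assumes "difference" means removing one occurrence from a for each matching element of b, and it keeps the order of a instead of relying on map iteration order.

```go
package main

import "fmt"

// Subtract (hypothetical name) returns the elements of a that remain after
// cancelling one occurrence for each matching element of b.
func Subtract[T comparable](a, b []T) []T {
	counts := make(map[T]int, len(b))
	for _, v := range b {
		counts[v]++
	}
	var out []T
	for _, v := range a {
		if counts[v] > 0 {
			counts[v]-- // cancel one occurrence contributed by b
			continue
		}
		out = append(out, v)
	}
	return out
}

func main() {
	a := []string{"foo", "foo", "bar", "hello"}
	b := []string{"foo", "bar", "world"}
	fmt.Println(Subtract(a, b)) // [foo hello]
	fmt.Println(Subtract(b, a)) // [world]
}
```

Appending the two results gives a symmetric difference that respects duplicates.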
func unique(slice []string) []string {
encountered := map[string]int{}
diff := []string{}
for _, v := range slice {
encountered[v] = encountered[v]+1
}
for _, v := range slice {
if encountered[v] == 1 {
diff = append(diff, v)
}
}
return diff
}
func main() {
slice1 := []string{"hello", "michael", "dorner"}
slice2 := []string{"hello", "michael"}
slice3 := []string{}
fmt.Println(unique(append(slice1, slice2...))) // [dorner]
fmt.Println(unique(append(slice2, slice3...))) // [hello michael]
}
As mentioned by ANisus, different approaches suit different input sizes. This solution runs in linear time, O(n), but it assumes that "equality" includes index position: two entries match only if they hold the same value at the same index.
Thus, in the OP's examples of:
slice1 := []string{"foo", "bar","hello"}
slice2 := []string{"foo", "bar"}
The entries foo and bar are equal not just due to value, but also due to their index in the slice.
Given these conditions, you can do something like:
package main
import "fmt"
func difference(s1, s2 []string) string {
var (
lenMin int
longest []string
out string
)
// Determine the shortest length and the longest slice
if len(s1) < len(s2) {
lenMin = len(s1)
longest = s2
} else {
lenMin = len(s2)
longest = s1
}
// compare common indices
for i := 0; i < lenMin; i++ {
if s1[i] != s2[i] {
out += fmt.Sprintf("=>\t%s\t%s\n", s1[i], s2[i])
}
}
// add indices not in common
for _, v := range longest[lenMin:] {
out += fmt.Sprintf("=>\t%s\n", v)
}
return out
}
func main() {
slice1 := []string{"foo", "bar", "hello"}
slice2 := []string{"foo", "bar"}
fmt.Print(difference(slice1, slice2))
}
Produces:
=> hello
If you change the slices to be:
func main() {
slice1 := []string{"foo", "baz", "hello"}
slice2 := []string{"foo", "bar"}
fmt.Print(difference(slice1, slice2))
}
It will produce:
=> baz bar
=> hello
Most of the other solutions here will fail to return the correct answer in case the slices contain duplicated elements.
This solution is O(n) time and O(n) space if the slices are already sorted, and O(n*log(n)) time O(n) space if they are not, but has the nice property of actually being correct. 🤣
func diff(a, b []string) []string {
a = sortIfNeeded(a)
b = sortIfNeeded(b)
var d []string
i, j := 0, 0
for i < len(a) && j < len(b) {
c := strings.Compare(a[i], b[j])
if c == 0 {
i++
j++
} else if c < 0 {
d = append(d, a[i])
i++
} else {
d = append(d, b[j])
j++
}
}
d = append(d, a[i:]...)
d = append(d, b[j:]...)
return d
}
func sortIfNeeded(a []string) []string {
if sort.StringsAreSorted(a) {
return a
}
s := append(a[:0:0], a...)
sort.Strings(s)
return s
}
If you know for sure that the slices are already sorted, you can remove the calls to sortIfNeeded (the reason for the defensive slice copy in sortIfNeeded is because sorting is done in-place, so we would be modifying the slices that are passed to diff).
See https://play.golang.org/p/lH-5L0aL1qr for tests showing correctness in face of duplicated entries.
Here is an example using generics. Note that it only returns the elements of the first slice that are not present in the second slice:
type HandleDiff[T comparable] func(item1 T, item2 T) bool
func HandleDiffDefault[T comparable](val1 T, val2 T) bool {
return val1 == val2
}
func Diff[T comparable](items1 []T, items2 []T, callback HandleDiff[T]) []T {
acc := []T{}
for _, item1 := range items1 {
find := false
for _, item2 := range items2 {
if callback(item1, item2) {
find = true
break
}
}
if !find {
acc = append(acc, item1)
}
}
return acc
}
usage
diff := Diff(items1, items2, HandleDiffDefault[string])
Why not keep it simple and use labels?
// returns items unique to slice1
func difference(slice1, slice2 []string) []string {
var diff []string
outer:
for _, v1 := range slice1 {
for _, v2 := range slice2 {
if v1 == v2 {
continue outer
}
}
diff = append(diff, v1)
}
return diff
}
https://go.dev/play/p/H46zSpfocHp
I would add a small change to the solution by @peterwilliams97, so that we can ignore the order of the input.
func difference(a, b []string) []string {
// reorder the input,
// so that we can check the longer slice over the shorter one
longer, shorter := a, b
if len(b) > len(a) {
longer, shorter = b, a
}
mb := make(map[string]struct{}, len(shorter))
for _, x := range shorter {
mb[x] = struct{}{}
}
var diff []string
for _, x := range longer {
if _, found := mb[x]; !found {
diff = append(diff, x)
}
}
return diff
}
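The code below gives the difference between two strings, rune by rune, regardless of the order of the arguments. Space complexity is O(n) and time complexity is O(n).
// difference returns the runes of the longer string that do not occur in the shorter one.
// The order of the returned runes is unspecified, because Go map iteration order is random.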
func difference(a, b string) string {
longest, shortest := longestString(&a, &b)
var builder strings.Builder
var mem = make(map[rune]bool)
for _, s := range longest {
mem[s] = true
}
for _, s := range shortest {
if _, ok := mem[s]; ok {
mem[s] = false
}
}
for k, v := range mem {
if v {
builder.WriteRune(k)
}
}
return builder.String()
}
func longestString(a *string, b *string) ([]rune, []rune) {
if len(*a) > len(*b) {
return []rune(*a), []rune(*b)
}
return []rune(*b), []rune(*a)
}

Resources