Golang's equivalent of itertools.chain?

In Golang, how can I iterate over three slices with a single for loop, without creating a new slice containing copies of the elements in all three slices? Is there something like Python's itertools.chain?

A simple solution using generics
package main
import "fmt"
func Chain[T any](f func(e T), slices ...[]T) {
for _, slice := range slices {
for _, e := range slice {
f(e)
}
}
}
func main() {
slice1 := []int{1, 2, 3}
slice2 := []int{10, 20, 30}
slice3 := []int{100, 200, 300}
Chain(func(e int) {
fmt.Println(e)
}, slice1, slice2, slice3)
}
Output:
1
2
3
10
20
30
100
200
300
The function Chain takes a function parameter, which will be executed for each element in each slice consecutively. This function will serve as your loop body code.
The solution can be extended to retain other looping features such as break and index numbers:
func Chain[T any](f func(i int, e T) bool, slices ...[]T) {
var i int
for _, slice := range slices {
for _, e := range slice {
if !f(i, e) {
return
}
i++
}
}
}
...
Chain(func(i int, e int) bool {
fmt.Println(i, "-", e)
return (e <= 20)
}, slice1, slice2, slice3)
...
The loop will "break" when f returns false.
Output:
0 - 1
1 - 2
2 - 3
3 - 10
4 - 20
5 - 30
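If you would rather keep the loop body in an ordinary for loop (so that break and continue work as usual), one possible alternative is a pull-style iterator built from a closure. This is only a sketch: the name NewChain and its next function are assumptions made for illustration, not part of any library. It needs Go 1.18 or later for generics and does not copy the elements into a new slice.

```go
package main

import "fmt"

// NewChain is a hypothetical helper. It returns a function that yields the
// next element of the chained slices and true, or the zero value and false
// once every slice is exhausted.
func NewChain[T any](slices ...[]T) (next func() (T, bool)) {
	outer, inner := 0, 0
	return func() (T, bool) {
		// Skip over slices that are empty or fully consumed.
		for outer < len(slices) && inner >= len(slices[outer]) {
			outer++
			inner = 0
		}
		if outer == len(slices) {
			var zero T
			return zero, false
		}
		v := slices[outer][inner]
		inner++
		return v, true
	}
}

func main() {
	next := NewChain([]int{1, 2, 3}, []int{10, 20, 30}, []int{100, 200, 300})
	for v, ok := next(); ok; v, ok = next() {
		if v > 20 {
			break // a plain break works and no goroutine is involved
		}
		fmt.Println(v)
	}
}
```

This prints 1, 2, 3, 10 and 20, one value per line.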

You could use channels to build your own chain() method.
package main
import "fmt"
func main() {
slice1 := []int{2, 3, 5, 7, 11, 13}
slice2 := []int{21321, 12313, 213}
slice3 := []int{8987, 988, 675676, 6587686}
for val := range chain(slice1, slice2, slice3) {
fmt.Println(val)
}
}
func chain[T any](slices ...[]T) <-chan T {
channel := make(chan T)
go func() {
for _, slice := range slices {
for _, val := range slice {
channel <- val
}
}
close(channel)
}()
return channel
}
Here the link to Go Playground.
The idea is to accept a variadic parameter, slices, which can hold an arbitrary number of slices. The function creates a channel of the element type and returns it so that the caller can receive values from it. Before returning, it starts a goroutine that sends every value over the channel. To be able to use range on the caller's side, the channel must be closed with close() once all slices have been looped over.
This uses generics, which are available as of Go 1.18.
Edit
As Zombo correctly pointed out, you will leak a goroutine if you break out of the range loop above: not all values are received from the channel, so the sending goroutine stays blocked on its next send.
To get break-like behavior without leaking a goroutine, you could pass a predicate function; when it returns true, the goroutine closes the channel and exits, which prevents the leak.
Similarly, you could pass callback functions for other purposes (e.g. filtering), depending on your use case.
package main
import "fmt"
func main() {
slice1 := []int{2, 3, 5, 7, 11, 13}
slice2 := []int{21321, 12313, 213}
slice3 := []int{8987, 988, 675676, 6587686}
for val := range chainUntil(func(item int) bool {
return item > 6
}, slice1, slice2, slice3) {
fmt.Println(val)
}
}
func chainUntil[T any](predicate func(item T) bool, slices ...[]T) <-chan T {
channel := make(chan T)
go func() {
for _, slice := range slices {
for _, val := range slice {
if predicate(val) {
close(channel)
return
}
channel <- val
}
}
close(channel)
}()
return channel
}
Expected output:
2
3
5
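Another way to allow a real break in the consuming loop, without leaking the goroutine, is to give the producer a second channel to watch. The sketch below assumes a hypothetical chainDone function and a done channel that the caller closes when it stops reading; neither name comes from the answer above.

```go
package main

import "fmt"

// chainDone is a hypothetical variant of chain. The producer goroutine also
// watches done and returns as soon as the consumer closes it.
func chainDone[T any](done <-chan struct{}, slices ...[]T) <-chan T {
	out := make(chan T)
	go func() {
		defer close(out)
		for _, slice := range slices {
			for _, val := range slice {
				select {
				case out <- val: // consumer took the value
				case <-done: // consumer gave up, stop producing
					return
				}
			}
		}
	}()
	return out
}

func main() {
	done := make(chan struct{})
	defer close(done) // releases the producer if the loop exits early

	for val := range chainDone(done, []int{2, 3, 5, 7}, []int{11, 13}) {
		if val > 6 {
			break
		}
		fmt.Println(val)
	}
}
```

The loop prints 2, 3 and 5. When main returns, the deferred close(done) lets the producer goroutine exit instead of blocking forever on its send.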

Related

Simple mapReduce operation on strings

I have a list of strings
elems := [n]string{...}
I want to perform a simple mapReduce operation, such that I
Map every string to a different string, let's say string -> $string
Reduce all the strings to one string with a separator, e.g. {s1, s2, s3} -> s1#s2#s3
all in all: {s1, s2, s3} -> $s1#$s2#$s3
What's the best way to do this?
I'm looking for efficiency and readability
Bonus points if it's generic enough to work not only on strings
To map a list, you don't have much choice other than to go over each string. If the transformation is time-consuming and you need speed, you can consider splitting the job across goroutines. Finally, you can use strings.Join, which takes a separator argument and normally performs the reduce part efficiently. The size of the dataset also matters: for larger lists, you may want to benchmark strings.Join against your own customized algorithm and see whether multiple goroutines and channels are worth using.
If you don't need to do the 2 things separately, the end result can be achieved simply by using strings.Join():
package main
import (
"fmt"
"strings"
)
func main() {
a := []string{"a", "b", "c"}
p := "$"
fmt.Println(p + strings.Join(a[:], "#"+p))
}
prints $a#$b#$c
playground
Go is explicitly NOT a functional programming language.
You map and reduce using a for loop.
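a := []string{"a", "b", "c"}
result := "initvalue"
for n, i := range a {
// n is the index and i is the element. strconv.Itoa (package strconv) formats
// the index as decimal text; string(n) would yield the character with code point n.
result += i + strconv.Itoa(n)
}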
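For the "generic enough to work not only on strings" part of the question, the for-loop approach can be wrapped in two small generic helpers. This is a minimal sketch assuming Go 1.18 or later; Map and Reduce are names chosen here for illustration, not standard library functions.

```go
package main

import (
	"fmt"
	"strings"
)

// Map applies f to every element and returns the results in a new slice.
func Map[T, U any](in []T, f func(T) U) []U {
	out := make([]U, 0, len(in))
	for _, v := range in {
		out = append(out, f(v))
	}
	return out
}

// Reduce folds the slice into a single value, starting from init.
func Reduce[T, A any](in []T, init A, f func(A, T) A) A {
	acc := init
	for _, v := range in {
		acc = f(acc, v)
	}
	return acc
}

func main() {
	elems := []string{"s1", "s2", "s3"}
	mapped := Map(elems, func(s string) string { return "$" + s })
	fmt.Println(strings.Join(mapped, "#")) // $s1#$s2#$s3

	total := Reduce([]int{1, 2, 3}, 0, func(acc, n int) int { return acc + n })
	fmt.Println(total) // 6
}
```

For strings specifically, strings.Join remains the simpler choice for the reduce step, since it allocates the result once instead of growing it on every concatenation.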
If your map functions perform no I/O (that is, they only do some computation), making them concurrent will most likely make things slower, and even if they do perform I/O, you should benchmark. Concurrency does not necessarily make things faster and sometimes adds unnecessary complication. In many cases a simple for loop is sufficient.
If the map functions are I/O bound, or do heavy computation that does benefit from running concurrently, solutions can vary. For example, NATS can be used to go beyond one machine and distribute the workload.
This is a relatively simple sample. The reduce phase is not multistage and is blocking:
import (
"fmt"
"strings"
"sync"
"testing"
"github.com/stretchr/testify/assert"
)
type elem struct {
index int
value interface{}
}
func feed(elems []interface{}) <-chan elem {
result := make(chan elem)
go func() {
for k, v := range elems {
e := elem{
index: k,
value: v,
}
result <- e
}
close(result)
}()
return result
}
func mapf(
input <-chan elem,
mapFunc func(elem) elem) <-chan elem {
result := make(chan elem)
go func() {
for e := range input {
eres := mapFunc(e)
result <- eres
}
close(result)
}()
return result
}
// is blocking
func reducef(
input <-chan elem,
reduceFunc func([]interface{}) interface{}) interface{} {
buffer := make(map[int]interface{})
l := 0
for v := range input {
buffer[v.index] = v.value
if v.index > l {
l = v.index
}
}
data := make([]interface{}, l+1)
for k, v := range buffer {
data[k] = v
}
return reduceFunc(data)
}
func fanOutIn(
elemFeed <-chan elem,
mapFunc func(elem) elem, mapCount int,
reduceFunc func([]interface{}) interface{}) interface{} {
MR := make(chan elem)
wg := &sync.WaitGroup{}
for i := 0; i < mapCount; i++ {
mapResult := mapf(elemFeed, mapFunc)
wg.Add(1)
go func() {
defer wg.Done()
for v := range mapResult {
MR <- v
}
}()
}
go func() {
wg.Wait()
close(MR)
}()
return reducef(MR, reduceFunc)
}
func Test01(t *testing.T) {
elemFeed := feed([]interface{}{1, 2, 3})
finalResult := fanOutIn(
elemFeed,
func(e elem) elem {
return elem{
index: e.index,
value: fmt.Sprintf("[%v]", e.value),
}
},
3,
func(sl []interface{}) interface{} {
strRes := make([]string, len(sl))
for k, v := range sl {
strRes[k] = v.(string)
}
return strings.Join(strRes, ":")
})
assert.Equal(t, "[1]:[2]:[3]", finalResult)
}
And since it uses interface{} as the element type, it can get generalized.
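With Go 1.18 generics, the same fan-out idea can be written without interface{} and type assertions. The following is a sketch under assumptions rather than a drop-in replacement for the code above: ParallelMap is a hypothetical name, and instead of tagging each element with its index it has the workers write straight into the result slot for that index.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// ParallelMap is a hypothetical helper. It applies f to every element using
// the given number of worker goroutines and returns the results in input order.
func ParallelMap[T, U any](in []T, workers int, f func(T) U) []U {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise the feed loop would block
	}
	out := make([]U, len(in))
	indexes := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range indexes {
				out[i] = f(in[i]) // each index is written by exactly one goroutine
			}
		}()
	}
	for i := range in {
		indexes <- i
	}
	close(indexes)
	wg.Wait()
	return out
}

func main() {
	mapped := ParallelMap([]int{1, 2, 3}, 3, func(n int) string {
		return fmt.Sprintf("[%d]", n)
	})
	fmt.Println(strings.Join(mapped, ":")) // [1]:[2]:[3]
}
```

The reduce step here is an ordinary strings.Join after all workers have finished, so it is blocking in the same sense as reducef above.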

How to get intersection of two slice in golang?

Is there any efficient way to get intersection of two slices in Go?
I want to avoid nested for loop like solution
slice1 := []string{"foo", "bar","hello"}
slice2 := []string{"foo", "bar"}
intersection(slice1, slice2)
=> ["foo", "bar"]
order of string does not matter
How do I get the intersection between two arrays as a new array?
Simple Intersection: Compare each element in A to each in B (O(n^2))
Hash Intersection: Put them into a hash table (O(n))
Sorted Intersection: Sort A and do an optimized intersection (O(n*log(n)))
All of which are implemented here
https://github.com/juliangruber/go-intersect
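To illustrate the sorted variant from the list above, here is a rough sketch that sorts copies of both slices and then walks them with two indexes. It is an assumption about how such an intersection can be written, not the code from the linked repository, and the name sortedIntersection is made up for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedIntersection sorts copies of both inputs and advances two indexes in
// step. Each common value is reported once, in sorted order.
func sortedIntersection(a, b []string) []string {
	as := append([]string(nil), a...) // copy so the caller's slices keep their order
	bs := append([]string(nil), b...)
	sort.Strings(as)
	sort.Strings(bs)

	var out []string
	i, j := 0, 0
	for i < len(as) && j < len(bs) {
		switch {
		case as[i] < bs[j]:
			i++
		case as[i] > bs[j]:
			j++
		default:
			// Equal values: record once, then move both indexes forward.
			if len(out) == 0 || out[len(out)-1] != as[i] {
				out = append(out, as[i])
			}
			i++
			j++
		}
	}
	return out
}

func main() {
	fmt.Println(sortedIntersection(
		[]string{"foo", "bar", "hello"},
		[]string{"foo", "bar"},
	)) // [bar foo]
}
```

The sorting dominates the cost, so this runs in O(n*log(n)) time, with no hash table needed.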
Simple, generic, and works with multiple slices (Go 1.18).
Time complexity: roughly linear in the total number of elements, since map operations take expected constant time.
func interSection[T comparable](pS ...[]T) []T {
hash := make(map[T]*int) // value, counter
result := make([]T, 0)
for _, slice := range pS {
duplicationHash := make(map[T]bool) // duplication checking for individual slice
for _, value := range slice {
if _, isDup := duplicationHash[value]; !isDup { // is not duplicated in slice
if counter := hash[value]; counter != nil { // is found in hash counter map
if *counter++; *counter >= len(pS) { // is found in every slice
result = append(result, value)
}
} else { // not found in hash counter map
i := 1
hash[value] = &i
}
duplicationHash[value] = true
}
}
}
return result
}
func main() {
slice1 := []string{"foo", "bar", "hello"}
slice2 := []string{"foo", "bar"}
fmt.Println(interSection(slice1, slice2))
// [foo bar]
ints1 := []int{1, 2, 3, 9, 8}
ints2 := []int{10, 4, 2, 4, 8, 9} // have duplicated values
ints3 := []int{2, 4, 8, 1}
fmt.Println(interSection(ints1, ints2, ints3))
// [2 8]
}
playground : https://go.dev/play/p/lE79D0kOznZ
This is an efficient way to intersect two slices.
Time complexity: O(m+n), where
m = length of the first slice.
n = length of the second slice.
func intersection(s1, s2 []string) (inter []string) {
hash := make(map[string]bool)
for _, e := range s1 {
hash[e] = true
}
for _, e := range s2 {
// If elements present in the hashmap then append intersection list.
if hash[e] {
inter = append(inter, e)
}
}
//Remove dups from slice.
inter = removeDups(inter)
return
}
//Remove dups from slice.
func removeDups(elements []string)(nodups []string) {
encountered := make(map[string]bool)
for _, element := range elements {
if !encountered[element] {
nodups = append(nodups, element)
encountered[element] = true
}
}
return
}
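If none of your strings contains a blank, this simple code may be enough. Note that strings.Contains matches substrings, so "foo" is also reported as present when the other slice only contains "foobar":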
func filter(src []string) (res []string) {
for _, s := range src {
newStr := strings.Join(res, " ")
if !strings.Contains(newStr, s) {
res = append(res, s)
}
}
return
}
func intersections(section1, section2 []string) (intersection []string) {
str1 := strings.Join(filter(section1), " ")
for _, s := range filter(section2) {
if strings.Contains(str1, s) {
intersection = append(intersection, s)
}
}
return
}
Try it
https://go.dev/play/p/eGGcyIlZD6y
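One thing to watch for with the strings.Join plus strings.Contains test used above is that it matches substrings, not whole elements. The short sketch below shows the difference; containsJoined and containsExact are hypothetical helper names introduced only for this comparison.

```go
package main

import (
	"fmt"
	"strings"
)

// containsJoined mirrors the strings.Join + strings.Contains check.
func containsJoined(list []string, s string) bool {
	return strings.Contains(strings.Join(list, " "), s)
}

// containsExact compares whole elements instead.
func containsExact(list []string, s string) bool {
	for _, e := range list {
		if e == s {
			return true
		}
	}
	return false
}

func main() {
	list := []string{"foobar", "hello"}
	fmt.Println(containsJoined(list, "foo")) // true: "foo" is a substring of "foobar"
	fmt.Println(containsExact(list, "foo"))  // false: no element equals "foo"
}
```

If elements can be prefixes or substrings of one another, a map-based lookup such as the hash solutions above avoids these false positives.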
first := []string{"one", "two", "three", "four"}
second := []string{"two", "four"}
result := intersection(first, second) // or intersection(second, first)
func intersection(first, second []string) []string {
out := []string{}
bucket := map[string]bool{}
for _, i := range first {
for _, j := range second {
if i == j && !bucket[i] {
out = append(out, i)
bucket[i] = true
}
}
}
return out
}
https://github.com/viant/toolbox/blob/a46fd679bbc5d07294b1d1b646aeacd44e2c7d50/collections.go#L869-L920
Another O(m+n) time complexity solution that uses a hash map.
It differs from the other solutions discussed here in two ways:
The target slice is passed as a parameter instead of a new slice being returned.
Commonly used types such as string and int take a faster path instead of using reflection for everything.
Yes, there are a few different ways to go about it. Here's an example that can be optimized.
package main
import "fmt"
func intersection(a []string, b []string) (inter []string) {
// iterating over the smaller list first can potentially be faster, but not by much; the worst case is the same
low, high := a, b
if len(a) > len(b) {
low = b
high = a
}
done := false
for i, l := range low {
for j, h := range high {
// get future index values
f1 := i + 1
f2 := j + 1
if l == h {
inter = append(inter, h)
if f1 < len(low) && f2 < len(high) {
// if the future values aren't the same then that's the end of the intersection
if low[f1] != high[f2] {
done = true
}
}
// we don't want to iterate over the entire list every time, so removing the parts we already looped over makes each pass faster
// (note: copy shifts elements in place, so this modifies the caller's slice)
high = high[:j+copy(high[j:], high[j+1:])]
break
}
}
// nothing in the future so we are done
if done {
break
}
}
return
}
func main() {
slice1 := []string{"foo", "bar", "hello", "bar"}
slice2 := []string{"foo", "bar"}
fmt.Printf("%+v\n", intersection(slice1, slice2))
}
Now, the intersection function defined above only operates on slices of strings, like your example. You can in theory create a definition that looks like func intersection(a []interface{}, b []interface{}) (inter []interface{}); however, you would be relying on reflection and type assertions in order to compare, which adds latency and makes your code harder to read. It's probably easier to maintain and read if you write a separate function for each type you care about:
func intersectionString(a []string, b []string) (inter []string),
func intersectionInt(a []int, b []int) (inter []int),
func intersectionFloat64(a []float64, b []float64) (inter []float64), etc.
You can then create your own package and reuse it once you settle on how you want to implement it.
package intersection
func String(a []string, b []string) (inter []string)
func Int(a []int, b []int) (inter []int)
func Float64(a []float64, b []float64) (inter []float64)
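Since Go 1.18, type parameters offer a third option besides reflection and one function per type. The sketch below is only an illustration of that idea: Intersect is a hypothetical name, and it uses a hash map rather than the nested loops shown earlier in this answer.

```go
package main

import "fmt"

// Intersect is a hypothetical generic replacement for the per-type functions.
// It returns the distinct elements of b that also occur in a, in b's order.
func Intersect[T comparable](a, b []T) []T {
	seen := make(map[T]bool, len(a)) // value -> already emitted?
	for _, v := range a {
		seen[v] = false
	}
	var out []T
	for _, v := range b {
		if emitted, ok := seen[v]; ok && !emitted {
			out = append(out, v)
			seen[v] = true
		}
	}
	return out
}

func main() {
	fmt.Println(Intersect([]string{"foo", "bar", "hello"}, []string{"foo", "bar"})) // [foo bar]
	fmt.Println(Intersect([]int{1, 2, 3, 9, 8}, []int{10, 4, 2, 4, 8, 9}))          // [2 8 9]
	fmt.Println(Intersect([]float64{1.5, 2.5}, []float64{2.5, 2.5}))                // [2.5]
}
```

The comparable constraint is what allows T to be used as a map key, so the same function covers string, int and float64 without any reflection.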

How to collect values from N goroutines executed in a specific order?

Below is a struct of type Stuff. It has three ints. A Number, its Double and its Power. Let's pretend that calculating the double and power of a given list of ints is an expensive computation.
type Stuff struct {
Number int
Double int
Power int
}
func main() {
nums := []int{2, 3, 4} // given numbers
stuff := []Stuff{} // struct of stuff with transformed ints
double := make(chan int)
power := make(chan int)
for _, i := range nums {
go doubleNumber(i, double)
go powerNumber(i, power)
}
// How do I get the values back in the right order?
fmt.Println(stuff)
}
func doubleNumber(i int, c chan int) {
c <- i + i
}
func powerNumber(i int, c chan int) {
c <- i * i
}
The result of fmt.Println(stuff) should be the same as if stuff was initialized like:
stuff := []Stuff{
{Number: 2, Double: 4, Power: 4},
{Number: 3, Double: 6, Power: 9},
{Number: 4, Double: 8, Power: 16},
}
I know I can use <- double and <- power to collect values from the channels, but I don't know what double / powers belong to what numbers.
Goroutines run concurrently, independently, so without explicit synchronization you can't predict execution and completion order. So as it is, you can't pair returned numbers with the input numbers.
You can either return more data (e.g. the input number and the output, wrapped in a struct for example), or pass pointers to the worker functions (launched as new goroutines), e.g. *Stuff and have the goroutines fill the calculated data in the Stuff itself.
Returning more data
I will use a channel type chan Pair where Pair is:
type Pair struct{ Number, Result int }
So calculation will look like this:
func doubleNumber(i int, c chan Pair) { c <- Pair{i, i + i} }
func powerNumber(i int, c chan Pair) { c <- Pair{i, i * i} }
And I will use a map[int]*Stuff because collectable data comes from multiple channels (double and power), and I want to find the appropriate Stuff easily and fast (pointer is required so I can also modify it "in the map").
So the main function:
nums := []int{2, 3, 4} // given numbers
stuffs := map[int]*Stuff{}
double := make(chan Pair)
power := make(chan Pair)
for _, i := range nums {
go doubleNumber(i, double)
go powerNumber(i, power)
}
// How do I get the values back in the right order?
for i := 0; i < len(nums)*2; i++ {
getStuff := func(number int) *Stuff {
s := stuffs[number]
if s == nil {
s = &Stuff{Number: number}
stuffs[number] = s
}
return s
}
select {
case p := <-double:
getStuff(p.Number).Double = p.Result
case p := <-power:
getStuff(p.Number).Power = p.Result
}
}
for _, v := range nums {
fmt.Printf("%+v\n", stuffs[v])
}
Output (try it on the Go Playground):
&{Number:2 Double:4 Power:4}
&{Number:3 Double:6 Power:9}
&{Number:4 Double:8 Power:16}
Using pointers
Since now we're passing *Stuff values, we can "pre-fill" the input number in the Stuff itself.
But care must be taken, you can only read/write values with proper synchronization. Easiest is to wait for all "worker" goroutines to finish their jobs.
var wg = &sync.WaitGroup{}
func main() {
nums := []int{2, 3, 4} // given numbers
stuffs := make([]Stuff, len(nums))
for i, n := range nums {
stuffs[i].Number = n
wg.Add(2)
go doubleNumber(&stuffs[i])
go powerNumber(&stuffs[i])
}
wg.Wait()
fmt.Printf("%+v", stuffs)
}
func doubleNumber(s *Stuff) {
defer wg.Done()
s.Double = s.Number + s.Number
}
func powerNumber(s *Stuff) {
defer wg.Done()
s.Power = s.Number * s.Number
}
Output (try it on the Go Playground):
[{Number:2 Double:4 Power:4} {Number:3 Double:6 Power:9} {Number:4 Double:8 Power:16}]
Writing different slice elements concurrently
Also note that since you can write different array or slice elements concurrently (for details see "Can I concurrently write different slice elements"), you can write the results directly into a slice without channels. See "Refactor code to use a single channel in an idiomatic way" for how this can be done.
Personally, I would use a chan Stuff to pass the results back on, then spin up goroutines computing a full Stuff and pass it back. If you need the various part of a single Stuff computed concurrently, you can spawn goroutines from each goroutine, using dedicated channels. Once you've collected all the results, you can then (optionally) sort the slice with the accumulated values.
Example of what I mean below (you could, in principle, use a sync.WaitGroup to coordinate things, but if the input count is known, you don't strictly speaking need it).
type Stuff struct {
number int64
double int64
square int64
}
// Compute a Stuff with individual computations in-line, send it out
func computeStuff(n int64, out chan<- Stuff) {
rv := Stuff{number: n}
rv.double = n * 2
rv.square = n * n
out <- rv
}
// Compute a Stuff with individual computations concurrent
func computeStuffConcurrent(n int64, out chan<- Stuff) {
rv := Stuff{number: n}
dc := make(chan int64)
sc := make(chan int64)
defer close(dc)
defer close(sc)
go double(n, dc)
go square(n, sc)
rv.double = <-dc
rv.square = <-sc
out <- rv
}
func double(n int64, result chan<- int64) {
result <- n * 2
}
func square(n int64, result chan<- int64) {
result <- n * n
}
func main() {
inputs := []int64{1, 2, 3}
results := []Stuff{}
resultChannel := make(chan Stuff)
for _, input := range inputs {
go computeStuff(input, resultChannel)
// Or the concurrent version, if the extra performance is needed
}
for c := 0; c < len(inputs); c++ {
results = append(results, <-resultChannel)
}
// We now have all results, sort them if you need them sorted
}
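If you would rather avoid the final sort, one option is to send the input position along with each result and store the result at that position. The following is a sketch under assumptions: the indexed wrapper, compute and collect are hypothetical names, and Stuff uses the exported field names from the question.

```go
package main

import "fmt"

type Stuff struct {
	Number, Double, Power int
}

// indexed is a hypothetical wrapper that remembers the input position.
type indexed struct {
	pos   int
	stuff Stuff
}

// compute builds one Stuff and sends it together with its position.
func compute(pos, n int, out chan<- indexed) {
	out <- indexed{pos: pos, stuff: Stuff{Number: n, Double: n + n, Power: n * n}}
}

// collect starts one goroutine per input and places each result at the
// position of its input, so arrival order does not matter.
func collect(nums []int) []Stuff {
	out := make(chan indexed)
	for pos, n := range nums {
		go compute(pos, n, out)
	}
	results := make([]Stuff, len(nums))
	for range nums {
		r := <-out
		results[r.pos] = r.stuff
	}
	return results
}

func main() {
	fmt.Printf("%+v\n", collect([]int{2, 3, 4}))
	// [{Number:2 Double:4 Power:4} {Number:3 Double:6 Power:9} {Number:4 Double:8 Power:16}]
}
```

Because exactly len(nums) values are received, no sync.WaitGroup is needed here either.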

slice of channels and concurrent function execution

How to create slice of channels and run function double(i) concurrently inside slice iteration:
package main
import (
"fmt"
"time"
)
func double(i int) int {
result := 2 * i
fmt.Println(result)
time.Sleep(500 * time.Millisecond)
return result
}
func notParallel(arr []int) (outArr []int) {
for _, i := range arr {
outArr = append(outArr, double(i))
}
return
}
// How can I do the same as notParallel, but in parallel?
// For each element of the array, double should be evaluated concurrently,
// without waiting for the previous element to finish.
func parallel(arr []int) (outArr []int) {
var chans []chan int
for i := 0; i < len(arr); i++ {
chans[i] = make(chan int) // i = 0 : panic: runtime error: index out of range
}
for counter, number := range arr {
go func() {
chans[counter] <- double(number)
}()
}
return
}
func main() {
arr := []int{7, 8, 9}
fmt.Printf("%d\n", notParallel(arr))
fmt.Printf("%d\n", parallel(arr))
}
playground
Since double(i) sleeps for 500 ms, notParallel(arr []int) takes about 1500 ms for the 3 elements of arr []int, whereas parallel(arr []int) should take about 500 ms.
My implementation fails with the error...
panic: runtime error: index out of range
... on line ...
chans[i] = make(chan int) // i = 0
In this case, you don't need to use chan.
package main
import (
"fmt"
"sync"
"time"
)
func double(i int) int {
result := 2 * i
fmt.Println(result)
time.Sleep(500 * time.Millisecond)
return result
}
func notParallel(arr []int) (outArr []int) {
for _, i := range arr {
outArr = append(outArr, double(i))
}
return
}
// Does the same as notParallel, but in parallel:
// for each element of the array, double is evaluated concurrently,
// without waiting for the previous element to finish.
func parallel(arr []int) (outArr []int) {
outArr = make([]int, len(arr))
var wg sync.WaitGroup
for counter, number := range arr {
wg.Add(1)
go func(counter int, number int) {
outArr[counter] = double(number)
wg.Done()
}(counter, number)
}
wg.Wait()
return
}
func main() {
arr := []int{7, 8, 9}
fmt.Printf("%d\n", notParallel(arr))
fmt.Printf("%d\n", parallel(arr))
}
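parallel must wait for all goroutines to finish before it returns, which is what the sync.WaitGroup is for.
The panic in your code comes from var chans []chan int: it declares a nil slice of length 0, so chans[i] is out of range until you allocate it with make([]chan int, len(arr)). I also noticed that your goroutines refer to counter and number from the enclosing function scope instead of receiving them as arguments; before Go 1.22 those loop variables are shared across iterations, so the goroutines may all see the last values.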
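If you do want the slice-of-channels shape from the question, it can be made to work along these lines. This is a sketch, not the only way to do it: parallelChans is a hypothetical name, and double is reduced to the arithmetic so the example stays short.

```go
package main

import "fmt"

func double(i int) int { return 2 * i }

// parallelChans is a sketch of the slice-of-channels approach.
func parallelChans(arr []int) []int {
	chans := make([]chan int, len(arr)) // allocate the slice before indexing into it
	for i, n := range arr {
		chans[i] = make(chan int, 1) // buffered, so the sender never blocks
		// Pass the channel and the number as arguments so that every
		// goroutine works on its own copies.
		go func(c chan<- int, n int) {
			c <- double(n)
		}(chans[i], n)
	}
	out := make([]int, len(arr))
	for i, c := range chans {
		out[i] = <-c // receiving in index order keeps the input order
	}
	return out
}

func main() {
	fmt.Println(parallelChans([]int{7, 8, 9})) // [14 16 18]
}
```

Receiving from every channel also acts as the synchronization point, so no WaitGroup is required in this variant.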

How to convert interface{} to []int?

I am programming in Go programming language.
Say there's a variable of type interface{} that contains an array of integers. How do I convert interface{} back to []int?
I have tried
interface_variable.([]int)
The error I got is:
panic: interface conversion: interface is []interface {}, not []int
It's a []interface{} not just one interface{}, you have to loop through it and convert it:
the 2022 answer
https://go.dev/play/p/yeihkfIZ90U
func ConvertSlice[E any](in []any) (out []E) {
out = make([]E, 0, len(in))
for _, v := range in {
out = append(out, v.(E))
}
return
}
the pre-go1.18 answer
http://play.golang.org/p/R441h4fVMw
func main() {
a := []interface{}{1, 2, 3, 4, 5}
b := make([]int, len(a))
for i := range a {
b[i] = a[i].(int)
}
fmt.Println(a, b)
}
As others have said, you should iterate the slice and convert the objects one by one.
It is better to use a type switch inside the range loop in order to avoid panics:
a := []interface{}{1, 2, 3, 4, 5}
b := make([]int, len(a))
for i, value := range a {
switch typedValue := value.(type) {
case int:
b[i] = typedValue
default:
fmt.Println("Not an int: ", value)
}
}
fmt.Println(a, b)
http://play.golang.org/p/Kbs3rbu2Rw
The function's declared return type is interface{}, but the value it actually returns is a []interface{}, so try this instead:
func main() {
values := returnValue.([]interface{})
for i := range values {
fmt.Println(values[i])
}
}
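Putting the two steps together (first assert the []interface{}, then each element), a conversion that reports an error instead of panicking could look like the sketch below. toInts is a hypothetical helper name, and the example assumes the elements are plain int values as in the question.

```go
package main

import "fmt"

// toInts is a hypothetical helper. It converts an interface{} holding a
// []interface{} of ints into []int and returns an error instead of panicking.
func toInts(v interface{}) ([]int, error) {
	items, ok := v.([]interface{}) // comma-ok form: no panic on mismatch
	if !ok {
		return nil, fmt.Errorf("expected []interface{}, got %T", v)
	}
	out := make([]int, 0, len(items))
	for i, item := range items {
		n, ok := item.(int)
		if !ok {
			return nil, fmt.Errorf("element %d: expected int, got %T", i, item)
		}
		out = append(out, n)
	}
	return out, nil
}

func main() {
	var v interface{} = []interface{}{1, 2, 3}
	fmt.Println(toInts(v)) // [1 2 3] <nil>

	v = []interface{}{1, "two"}
	fmt.Println(toInts(v)) // [] element 1: expected int, got string
}
```

The two-value form of the type assertion is what makes this safe: a failed assertion sets ok to false rather than triggering the "interface conversion" panic shown in the question.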
