Go: What is the fastest/cleanest way to remove multiple entries from a slice? - go

How would you implement the deleteRecords function in the code below:
Example:
type Record struct {
id int
name string
}
type RecordList []*Record
func deleteRecords( l *RecordList, ids []int ) {
// Assume the RecordList can contain several 100 entries.
// and the number of the of the records to be removed is about 10.
// What is the fastest and cleanest ways to remove the records that match
// the id specified in the records list.
}

I did some micro-benchmarking on my machine, trying out most of the approaches given in the replies here, and this code comes out fastest when you've got up to about 40 elements in the ids list:
func deleteRecords(data []*Record, ids []int) []*Record {
w := 0 // write index
loop:
for _, x := range data {
for _, id := range ids {
if id == x.id {
continue loop
}
}
data[w] = x
w++
}
return data[:w]
}
You didn't say whether it's important to preserve the order of records in the list. If you don't then this function is faster than the above and still fairly clean.
func reorder(data []*Record, ids []int) []*Record {
n := len(data)
i := 0
loop:
for i < n {
r := data[i]
for _, id := range ids {
if id == r.id {
data[i] = data[n-1]
n--
continue loop
}
}
i++
}
return data[0:n]
}
As the number of ids rises, so does the cost of the linear search. At around 50 elements, using a map or doing a binary search to look up the id becomes more efficient, as long as you can avoid rebuilding the map (or resorting the list) every time. At several hundred ids, it becomes more efficient to use a map or a binary search even if you have to rebuild it every time.
If you wish to preserve original contents of the slice, something like this is more appropriate:
func deletePreserve(data []*Record, ids []int) []*Record {
wdata := make([]*Record, len(data))
w := 0
loop:
for _, x := range data {
for _, id := range ids {
if id == x.id {
continue loop
}
}
wdata[w] = x
w++
}
return wdata[0:w]
}

For a personal project, I did something like this:
func filter(sl []int, fn func(int) bool) []int {
result := make([]int, 0, len(sl))
last := 0
for i, v := range sl {
if fn(v) {
result = append(result, sl[last:i]...)
last = i + 1
}
}
return append(result, sl[last:]...)
}
It doesn't mutate the original, but should be relatively efficient.
It's probably better to just do:
func filter(sl []int, fn func(int) bool) (result []int) {
for _, v := range sl {
if !fn(v) {
result = append(result, v)
}
}
return
}
Simpler and cleaner.
If you want to do it in-place, you probably want something like:
func filter(sl []int, fn func(int) bool) []int {
outi := 0
res := sl
for _, v := range sl {
if !fn(v) {
res[outi] = v
outi++
}
}
return res[0:outi]
}
You can optimize this to use copy to copy ranges of elements, but that's twice
the code and probably not worth it.
So, in this specific case, I'd probably do something like:
func deleteRecords(l []*Record, ids []int) []*Record {
outi := 0
L:
for _, v := range l {
for _, id := range ids {
if v.id == id {
continue L
}
}
l[outi] = v
outi++
}
return l[0:outi]
}
(Note: untested.)
No allocations, nothing fancy, and assuming the rough size of the list of Records and the list of ids you presented, a simple linear search is likely to do as well as fancier things but without any overhead. I realize that my version mutates the slice and returns a new slice, but that's not un-idiomatic in Go, and it avoids forcing the slice at the callsite to be heap allocated.

For the case you described, where len(ids) is approximately 10 and len(*l) is in the several hundreds, this should be relatively fast, since it minimizes memory allocations by updating in place.
package main
import (
"fmt"
"strconv"
)
type Record struct {
id int
name string
}
type RecordList []*Record
func deleteRecords(l *RecordList, ids []int) {
rl := *l
for i := 0; i < len(rl); i++ {
rid := rl[i].id
for j := 0; j < len(ids); j++ {
if rid == ids[j] {
copy(rl[i:len(*l)-1], rl[i+1:])
rl[len(rl)-1] = nil
rl = rl[:len(rl)-1]
break
}
}
}
*l = rl
}
func main() {
l := make(RecordList, 777)
for i := range l {
l[i] = &Record{int(i), "name #" + strconv.Itoa(i)}
}
ids := []int{0, 1, 2, 4, 8, len(l) - 1, len(l)}
fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1])
deleteRecords(&l, ids)
fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1])
}
Output:
[0 1 2 4 8 776 777] 777 777 {0 name #0} {1 name #1} {776 name #776}
[0 1 2 4 8 776 777] 772 777 {1 name #1} {3 name #3} {775 name #775}

Instead of repeatedly searching ids, you could use a map. This code preallocates the full size of the map, and then just moves array elements in place. There are no other allocations.
func deleteRecords(l *RecordList, ids []int) {
m := make(map[int]bool, len(ids))
for _, id := range ids {
m[id] = true
}
s, x := *l, 0
for _, r := range s {
if !m[r.id] {
s[x] = r
x++
}
}
*l = s[0:x]
}

Use the vector package's Delete method as a guide, or just use a Vector instead of a slice.

Here is one option but I would hope there are cleaner/faster more functional looking ones:
func deleteRecords( l *RecordList, ids []int ) *RecordList {
var newList RecordList
for _, rec := range l {
toRemove := false
for _, id := range ids {
if rec.id == id {
toRemove = true
}
if !toRemove {
newList = append(newList, rec)
}
}
return newList
}

With large enough l and ids it will be more effective to Sort() both lists first and then do a single loop over them instead of two nested loops

Related

Struct value doesn't change with for loop range

I have a for-loop that iterates over my slice of User. But:
-When i use for with range this is my result
for _, u := range users {
val := calcMem(u.sessionid)
// total += val
u.setMem(val)
}
Result:
[{user1 dp-tcp#64 2 0} {user2 dp-tcp#62 0} {user3 dp-tcp#83 4 0}]
-When i use a simple for loop:
for i := 0; i < len(users); i++ {
val := calcMem(users[i].sessionid)
// total += val
users[i].setMem(val)
}
Result:
[{user1 dp-tcp#64 2 5287.092000000001} {user2 dp-tcp#62 3589.383999999999} {user3 dp-tcp#83 4 3956.012}]
Where i doing something wrong ?
-setMem function:
func (u *user) setMem(value float64) {
u.memusage = value
}
If you iteratw with way
for _, u := range users {
}
you get a copy of the slice value. So if you change it you change this copy.
If you wish you may take value by index and modify it:
for i := range users {
users[i] = ....
}

How to group by then merge slices with duplicated values in Go

Excuse me, this is my first Stackoverflow question, so, any tips/advice on what I can do to improve it would be wonderful, in addition to some help.
The problem:
I have a slice that I am trying to group into smaller slices by certain criteria. I then need to merge the newly created slices with each other if they contain any of the same values in the slice. (Essentially, appending slices together that have "overlapping" values).
Some additional notes about the problem:
The number of items in the original slice will likely be between 1-50, in most cases, with outliers rarely exceeding 100.
Once gropued, the size of the 'inside' slices will be between 1-10 values.
Performance is a factor, as this operation will be run as part of a webservice where a single request will perform this operation 20+ times, and there can be many (hundreds - thousands) of requests per minute at peak times. However, clarity of code is also important.
My implementation is using ints, the final implementation would have more complex structs, though I was considering making a map and then use the implementation shown below based upon the keys. Is this a good idea?
I have broken the problem down into a few steps:
Create a 2D slice with groupings of values, based up criteria (the initial grouping phase)
Attempt to merge slices in place if they include a duplicated value.
I am running into two problems:
First, I think my implementation might not scale super well, as it tends to have some nested loops (however, these loops will be iterating on small slices, so that might be ok)
Second, my implementation is requiring an extra step at the end to remove duplicated values, ideally we should remove it.
Input: [ 100, 150, 300, 350, 600, 700 ]
Expected Output: [[100 150 300 350] [600 700]]
This is with the 'selection criteria' of grouping values that are within 150 units of at least one other value in the slice.
And the code (Go Playground link) :
package main
import (
"fmt"
"sort"
)
func filter(vs []int, f func(int) bool) []int {
vsf := make([]int, 0)
for _, v := range vs {
if f(v) {
vsf = append(vsf, v)
}
}
return vsf
}
func unique(intSlice []int) []int {
keys := make(map[int]bool)
list := []int{}
for _, entry := range intSlice {
if _, value := keys[entry]; !value {
keys[entry] = true
list = append(list, entry)
}
}
return list
}
func contains(intSlice []int, searchInt int) bool {
for _, value := range intSlice {
if value == searchInt {
return true
}
}
return false
}
func compare(a, b []int) bool {
if len(a) != len(b) {
return false
}
if (a == nil) != (b == nil) {
return false
}
b = b[:len(a)]
for i, v := range a {
if v != b[i] {
return false
}
}
return true
}
func main() {
fmt.Println("phase 1 - initial grouping")
s := []int{100, 150, 300, 350, 600, 700}
g := make([][]int, 0)
// phase 1
for _, v := range s {
t := filter(s, func(i int) bool { return i - v >= -150 && i - v <= 150 })
for _, v1 := range t {
t1 := filter(s, func(i int) bool { return i - v1 >= -150 && i - v1 <= 150})
t = unique(append(t, t1...))
sort.Ints(t)
}
g = append(g, t)
fmt.Println(g)
}
// phase 2
fmt.Println("phase 2 - merge in place")
for i, tf := range g {
for _, death := range tf {
if i < len(g) - 1 && contains(g[i+1], death) {
g[i+1] = unique(append(g[i], g[i+1]...))
g = g[i+1:]
} else if i == len(g) - 1 {
fmt.Println(g[i], g[i-1])
// do some cleanup to make sure the last two items of the array don't include duplicates
if compare(g[i-1], g[i]) {
g = g[:i]
}
}
}
fmt.Println(i, g)
}
}
Not sure what you are actually asking, and the problem isn't fully defined.
So here's a version that is more efficient
If input is not sorted and output order matters, then this is a bad solution.
Here it is (on Play)
package main
import (
"fmt"
)
// Input: [ 100, 150, 300, 350, 600, 700 ] Expected Output: [[100 150 300 350] [600 700]]
func main() {
input := []int{100, 150, 300, 350, 600, 700}
fmt.Println("Input:", input)
fmt.Println("Output:", groupWithin150(input))
}
func groupWithin150(ints []int) [][]int {
var ret [][]int
// Your example input was sorted, if the inputs aren't actually sorted, then uncomment this
// sort.Ints(ints)
var group []int
for idx, i := range ints {
if idx > 0 && i-150 > group[len(group)-1] {
ret = append(ret, group)
group = make([]int, 0)
}
group = append(group, i)
}
if len(group) > 0 {
ret = append(ret, group)
}
return ret
}

How to get intersection of two slice in golang?

Is there any efficient way to get intersection of two slices in Go?
I want to avoid nested for loop like solution
slice1 := []string{"foo", "bar","hello"}
slice2 := []string{"foo", "bar"}
intersection(slice1, slice2)
=> ["foo", "bar"]
order of string does not matter
How do I get the intersection between two arrays as a new array?
Simple Intersection: Compare each element in A to each in B (O(n^2))
Hash Intersection: Put them into a hash table (O(n))
Sorted Intersection: Sort A and do an optimized intersection (O(n*log(n)))
All of which are implemented here
https://github.com/juliangruber/go-intersect
simple, generic and mutiple slices ! (Go 1.18)
Time Complexity : may be linear
func interSection[T constraints.Ordered](pS ...[]T) []T {
hash := make(map[T]*int) // value, counter
result := make([]T, 0)
for _, slice := range pS {
duplicationHash := make(map[T]bool) // duplication checking for individual slice
for _, value := range slice {
if _, isDup := duplicationHash[value]; !isDup { // is not duplicated in slice
if counter := hash[value]; counter != nil { // is found in hash counter map
if *counter++; *counter >= len(pS) { // is found in every slice
result = append(result, value)
}
} else { // not found in hash counter map
i := 1
hash[value] = &i
}
duplicationHash[value] = true
}
}
}
return result
}
func main() {
slice1 := []string{"foo", "bar", "hello"}
slice2 := []string{"foo", "bar"}
fmt.Println(interSection(slice1, slice2))
// [foo bar]
ints1 := []int{1, 2, 3, 9, 8}
ints2 := []int{10, 4, 2, 4, 8, 9} // have duplicated values
ints3 := []int{2, 4, 8, 1}
fmt.Println(interSection(ints1, ints2, ints3))
// [2 8]
}
playground : https://go.dev/play/p/lE79D0kOznZ
It's a best method for intersection two slice. Time complexity is too low.
Time Complexity : O(m+n)
m = length of first slice.
n = length of second slice.
func intersection(s1, s2 []string) (inter []string) {
hash := make(map[string]bool)
for _, e := range s1 {
hash[e] = true
}
for _, e := range s2 {
// If elements present in the hashmap then append intersection list.
if hash[e] {
inter = append(inter, e)
}
}
//Remove dups from slice.
inter = removeDups(inter)
return
}
//Remove dups from slice.
func removeDups(elements []string)(nodups []string) {
encountered := make(map[string]bool)
for _, element := range elements {
if !encountered[element] {
nodups = append(nodups, element)
encountered[element] = true
}
}
return
}
if there exists no blank in your []string, maybe you need this simple code:
func filter(src []string) (res []string) {
for _, s := range src {
newStr := strings.Join(res, " ")
if !strings.Contains(newStr, s) {
res = append(res, s)
}
}
return
}
func intersections(section1, section2 []string) (intersection []string) {
str1 := strings.Join(filter(section1), " ")
for _, s := range filter(section2) {
if strings.Contains(str1, s) {
intersection = append(intersection, s)
}
}
return
}
Try it
https://go.dev/play/p/eGGcyIlZD6y
first := []string{"one", "two", "three", "four"}
second := []string{"two", "four"}
result := intersection(first, second) // or intersection(second, first)
func intersection(first, second []string) []string {
out := []string{}
bucket := map[string]bool{}
for _, i := range first {
for _, j := range second {
if i == j && !bucket[i] {
out = append(out, i)
bucket[i] = true
}
}
}
return out
}
https://github.com/viant/toolbox/blob/a46fd679bbc5d07294b1d1b646aeacd44e2c7d50/collections.go#L869-L920
Another O(m+n) Time Complexity solution that uses a hashmap.
It has two differences compared to the other solutions discussed here.
Passing the target slice as a parameter instead of new slice returned
Faster to use for commonly used types like string/int instead of reflection for all
Yes there are a few different ways to go about it.. Here's an example that can be optimized.
package main
import "fmt"
func intersection(a []string, b []string) (inter []string) {
// interacting on the smallest list first can potentailly be faster...but not by much, worse case is the same
low, high := a, b
if len(a) > len(b) {
low = b
high = a
}
done := false
for i, l := range low {
for j, h := range high {
// get future index values
f1 := i + 1
f2 := j + 1
if l == h {
inter = append(inter, h)
if f1 < len(low) && f2 < len(high) {
// if the future values aren't the same then that's the end of the intersection
if low[f1] != high[f2] {
done = true
}
}
// we don't want to interate on the entire list everytime, so remove the parts we already looped on will make it faster each pass
high = high[:j+copy(high[j:], high[j+1:])]
break
}
}
// nothing in the future so we are done
if done {
break
}
}
return
}
func main() {
slice1 := []string{"foo", "bar", "hello", "bar"}
slice2 := []string{"foo", "bar"}
fmt.Printf("%+v\n", intersection(slice1, slice2))
}
Now the intersection method defined above will only operate on slices of strings, like your example.. You can in theory create a definition that looks like this func intersection(a []interface, b []interface) (inter []interface), however you would be relying on reflection and type casting so that you can compare, which will add latency and make your code harder to read. It's probably easier to maintain and read to write a separate function for each type you care about.
func intersectionString(a []string, b []string) (inter []string),
func intersectionInt(a []int, b []int) (inter []int),
func intersectionFloat64(a []Float64, b []Float64) (inter []Float64), ..ect
You can then create your own package and reuse once you settle how you want to implement it.
package intersection
func String(a []string, b []string) (inter []string)
func Int(a []int, b []int) (inter []int)
func Float64(a []Float64, b []Float64) (inter []Float64)

How to check the uniqueness inside a for-loop?

Is there a way to check slices/maps for the presence of a value?
I would like to add a value to a slice only if it does not exist in the slice.
This works, but it seems verbose. Is there a better way to do this?
orgSlice := []int{1, 2, 3}
newSlice := []int{}
newInt := 2
newSlice = append(newSlice, newInt)
for _, v := range orgSlice {
if v != newInt {
newSlice = append(newSlice, v)
}
}
newSlice == [2 1 3]
Your approach would take linear time for each insertion. A better way would be to use a map[int]struct{}. Alternatively, you could also use a map[int]bool or something similar, but the empty struct{} has the advantage that it doesn't occupy any additional space. Therefore map[int]struct{} is a popular choice for a set of integers.
Example:
set := make(map[int]struct{})
set[1] = struct{}{}
set[2] = struct{}{}
set[1] = struct{}{}
// ...
for key := range(set) {
fmt.Println(key)
}
// each value will be printed only once, in no particular order
// you can use the ,ok idiom to check for existing keys
if _, ok := set[1]; ok {
fmt.Println("element found")
} else {
fmt.Println("element not found")
}
Most efficient is likely to be iterating over the slice and appending if you don't find it.
func AppendIfMissing(slice []int, i int) []int {
for _, ele := range slice {
if ele == i {
return slice
}
}
return append(slice, i)
}
It's simple and obvious and will be fast for small lists.
Further, it will always be faster than your current map-based solution. The map-based solution iterates over the whole slice no matter what; this solution returns immediately when it finds that the new value is already present. Both solutions compare elements as they iterate. (Each map assignment statement certainly does at least one map key comparison internally.) A map would only be useful if you could maintain it across many insertions. If you rebuild it on every insertion, then all advantage is lost.
If you truly need to efficiently handle large lists, consider maintaining the lists in sorted order. (I suspect the order doesn't matter to you because your first solution appended at the beginning of the list and your latest solution appends at the end.) If you always keep the lists sorted then you you can use the sort.Search function to do efficient binary insertions.
Another option:
package main
import "golang.org/x/tools/container/intsets"
func main() {
var (
a intsets.Sparse
b bool
)
b = a.Insert(9)
println(b) // true
b = a.Insert(9)
println(b) // false
}
https://pkg.go.dev/golang.org/x/tools/container/intsets
This option if the number of missing numbers is unknown
AppendIfMissing := func(sl []int, n ...int) []int {
cache := make(map[int]int)
for _, elem := range sl {
cache[elem] = elem
}
for _, elem := range n {
if _, ok := cache[elem]; !ok {
sl = append(sl, elem)
}
}
return sl
}
distincting a array of a struct :
func distinctObjects(objs []ObjectType) (distinctedObjs [] ObjectType){
var output []ObjectType
for i:= range objs{
if output==nil || len(output)==0{
output=append(output,objs[i])
} else {
founded:=false
for j:= range output{
if output[j].fieldname1==objs[i].fieldname1 && output[j].fieldname2==objs[i].fieldname2 &&......... {
founded=true
}
}
if !founded{
output=append(output,objs[i])
}
}
}
return output
}
where the struct here is something like :
type ObjectType struct {
fieldname1 string
fieldname2 string
.........
}
the object will distinct by checked fields here :
if output[j].fieldname1==objs[i].fieldname1 && output[j].fieldname2==objs[i].fieldname2 &&......... {

Is there a foreach loop in Go?

Is there a foreach construct in the Go language?
Can I iterate over a slice or array using a for?
From For statements with range clause:
A "for" statement with a "range" clause iterates through all entries
of an array, slice, string or map, or values received on a channel.
For each entry it assigns iteration values to corresponding iteration
variables and then executes the block.
As an example:
for index, element := range someSlice {
// index is the index where we are
// element is the element from someSlice for where we are
}
If you don't care about the index, you can use _:
for _, element := range someSlice {
// element is the element from someSlice for where we are
}
The underscore, _, is the blank identifier, an anonymous placeholder.
Go has a foreach-like syntax. It supports arrays/slices, maps and channels.
Iterate over an array or a slice:
// index and value
for i, v := range slice {}
// index only
for i := range slice {}
// value only
for _, v := range slice {}
Iterate over a map:
// key and value
for key, value := range theMap {}
// key only
for key := range theMap {}
// value only
for _, value := range theMap {}
Iterate over a channel:
for v := range theChan {}
Iterating over a channel is equivalent to receiving from a channel until it is closed:
for {
v, ok := <-theChan
if !ok {
break
}
}
Following is the example code for how to use foreach in Go:
package main
import (
"fmt"
)
func main() {
arrayOne := [3]string{"Apple", "Mango", "Banana"}
for index,element := range arrayOne{
fmt.Println(index)
fmt.Println(element)
}
}
This is a running example https://play.golang.org/p/LXptmH4X_0
Yes, range:
The range form of the for loop iterates over a slice or map.
When ranging over a slice, two values are returned for each iteration. The first is the index, and the second is a copy of the element at that index.
Example:
package main
import "fmt"
var pow = []int{1, 2, 4, 8, 16, 32, 64, 128}
func main() {
for i, v := range pow {
fmt.Printf("2**%d = %d\n", i, v)
}
for i := range pow {
pow[i] = 1 << uint(i) // == 2**i
}
for _, value := range pow {
fmt.Printf("%d\n", value)
}
}
You can skip the index or value by assigning to _.
If you only want the index, drop the , value entirely.
The following example shows how to use the range operator in a for loop to implement a foreach loop.
func PrintXml (out io.Writer, value interface{}) error {
var data []byte
var err error
for _, action := range []func() {
func () { data, err = xml.MarshalIndent(value, "", " ") },
func () { _, err = out.Write([]byte(xml.Header)) },
func () { _, err = out.Write(data) },
func () { _, err = out.Write([]byte("\n")) }} {
action();
if err != nil {
return err
}
}
return nil;
}
The example iterates over an array of functions to unify the error handling for the functions. A complete example is at Google´s playground.
PS: it shows also that hanging braces are a bad idea for the readability of code. Hint: the for condition ends just before the action() call. Obvious, isn't it?
You can in fact use range without referencing its return values by using for range against your type:
arr := make([]uint8, 5)
i,j := 0,0
for range arr {
fmt.Println("Array Loop", i)
i++
}
for range "bytes" {
fmt.Println("String Loop", j)
j++
}
https://play.golang.org/p/XHrHLbJMEd
This may be obvious, but you can inline the array like so:
package main
import (
"fmt"
)
func main() {
for _, element := range [3]string{"a", "b", "c"} {
fmt.Print(element)
}
}
outputs:
abc
https://play.golang.org/p/gkKgF3y5nmt
I'm seeing a lot of examples using range. Just a heads up that range creates a copy of whatever you're iterating over. If you make changes to the contents in a foreach range you will not be changing the values in the original container, in that case you'll need a traditional for loop with an index you increment and deference indexed reference. E.g.:
for i := 0; i < len(arr); i++ {
element := &arr[i]
element.Val = newVal
}
I have just implemented this library: https://github.com/jose78/go-collection.
This is an example of how to use the Foreach loop:
package main
import (
"fmt"
col "github.com/jose78/go-collection/collections"
)
type user struct {
name string
age int
id int
}
func main() {
newList := col.ListType{user{"Alvaro", 6, 1}, user{"Sofia", 3, 2}}
newList = append(newList, user{"Mon", 0, 3})
newList.Foreach(simpleLoop)
if err := newList.Foreach(simpleLoopWithError); err != nil{
fmt.Printf("This error >>> %v <<< was produced", err )
}
}
var simpleLoop col.FnForeachList = func(mapper interface{}, index int) {
fmt.Printf("%d.- item:%v\n", index, mapper)
}
var simpleLoopWithError col.FnForeachList = func(mapper interface{}, index int) {
if index > 1{
panic(fmt.Sprintf("Error produced with index == %d\n", index))
}
fmt.Printf("%d.- item:%v\n", index, mapper)
}
The result of this execution should be:
0.- item:{Alvaro 6 1}
1.- item:{Sofia 3 2}
2.- item:{Mon 0 3}
0.- item:{Alvaro 6 1}
1.- item:{Sofia 3 2}
Recovered in f Error produced with index == 2
ERROR: Error produced with index == 2
This error >>> Error produced with index == 2
<<< was produced
Try this code in playGrounD.

Resources