Checking if map has duplicate values in Go

Checking if map has duplicate values in Go - go

Consider the following map
mymap := make(map[string]string)
mymap["a"] = "one"
mymap["b"] = "two"
mymap["c"] = "one"
How to determine if the values are unique?
One strategy is to iterate through the map, create a slice of the values. Then iterate through the slice to find duplicates. Is there a better way?

If you just need true/false of whether there are dupes, without needing to know which values are dupes or how many dupes there are, the most efficient structure to use to track existing values is a map with empty struct values.
See here (pasted below for convenience):
package main
import (
"fmt"
)
func hasDupes(m map[string]string) bool {
x := make(map[string]struct{})
for _, v := range m {
if _, has := x[v]; has {
return true
}
x[v] = struct{}{}
}
return false
}
func main() {
mapWithDupes := make(map[string]string)
mapWithDupes["a"] = "one"
mapWithDupes["b"] = "two"
mapWithDupes["c"] = "one"
fmt.Println(hasDupes(mapWithDupes)) // prints true
mapWithoutDupes := make(map[string]string)
mapWithoutDupes["a"] = "one"
mapWithoutDupes["b"] = "two"
mapWithoutDupes["c"] = "three"
fmt.Println(hasDupes(mapWithoutDupes)) // prints false
}

func main() {
m := map[int]int{
1: 100,
2: 200,
3: 100,
4: 400,
6: 200,
7: 700,
}
mNew := make(map[int]int)
for k, v := range m {
if val, has := mNew[v]; !has {
mNew[v] = k
} else {
fmt.Println(k, m[k], ",", val, m[val])
}
}
swap the map key & value with a new map
Second map wont insert duplicate key, so you can find the values

Related

how to delete Duplicate elements between slices on golang

Example:
a_array := {"1","2","3","4,"}
b_array := {"3","4"}
Desired result:
"1","2"
With the assumption, a_array elements definitely has b_array elements.

If you need to strictly compare one slice against the other you may do something along the lines of
func diff(a []string, b []string) []string {
// Turn b into a map
var m map[string]bool
m = make(map[string]bool, len(b))
for _, s := range b {
m[s] = false
}
// Append values from the longest slice that don't exist in the map
var diff []string
for _, s := range a {
if _, ok := m[s]; !ok {
diff = append(diff, s)
continue
}
m[s] = true
}
// Sort the resulting slice
sort.Strings(diff)
return diff
}
Go Playground
Alternatively if you want to get all values from both slices that are not present in both of them you can do
func diff(a []string, b []string) []string {
var shortest, longest *[]string
if len(a) < len(b) {
shortest = &a
longest = &b
} else {
shortest = &b
longest = &a
}
// Turn the shortest slice into a map
var m map[string]bool
m = make(map[string]bool, len(*shortest))
for _, s := range *shortest {
m[s] = false
}
// Append values from the longest slice that don't exist in the map
var diff []string
for _, s := range *longest {
if _, ok := m[s]; !ok {
diff = append(diff, s)
continue
}
m[s] = true
}
// Append values from map that were not in the longest slice
for s, ok := range m {
if ok {
continue
}
diff = append(diff, s)
}
// Sort the resulting slice
sort.Strings(diff)
return diff
}
Then
fmt.Println(diff(a_array, b_array))
will give you
[1 2]
Go playground

Code to generate powerset in Golang gives wrong result

Next code in Golang to generate powerset produces wrong result on input {"A", "B", "C", "D", "E"}. I see [A B C E E] as the last generated set.
package main
import (
"fmt"
)
func main() {
for _, s := range PowerSet([]string{"A", "B", "C", "D", "E"}) {
fmt.Println(s)
}
}
func PowerSet(set []string) [][]string {
var powerSet [][]string
powerSet = append(powerSet, make([]string, 0))
for _, element := range set {
var moreSets [][]string
for _, existingSet := range powerSet {
newSet := append(existingSet, element)
moreSets = append(moreSets, newSet)
}
powerSet = append(powerSet, moreSets...)
}
return powerSet
}
How to fix it? How to write it idiomatically in Go?

The problem with your program is not the algorithm itself but this line:
newSet := append(existingSet, element)
You should not append and assign to a different variable.
As the documentation states (emphasis mine), "The append built-in function appends elements to the end of a slice. If it has sufficient capacity, the destination is resliced to accommodate the new elements. If it does not, a new underlying array will be allocated.".
So, there might be cases where newSet := append(existingSet, element) will actually modify existingSet itself, which would break your logic.
If you change that to instead create a new array and append to that one, it works as you expect it.
newSet := make([]string, 0)
newSet = append(newSet, existingSet...)
newSet = append(newSet, element)

For instance, you can use algorithm like this one: https://stackoverflow.com/a/2779467/3805062.
func PowerSet(original []string) [][]string {
powerSetSize := int(math.Pow(2, float64(len(original))))
result := make([][]string, 0, powerSetSize)
var index int
for index < powerSetSize {
var subSet []string
for j, elem := range original {
if index& (1 << uint(j)) > 0 {
subSet = append(subSet, elem)
}
}
result = append(result, subSet)
index++
}
return result
}

Elaborating on #eugenioy's answer.
Look at this thread. Here is a working example : https://play.golang.org/p/dzoTk1kimf
func copy_and_append_string(slice []string, elem string) []string {
// wrong: return append(slice, elem)
return append(append([]string(nil), slice...), elem)
}
func PowerSet(s []string) [][]string {
if s == nil {
return nil
}
r := [][]string{[]string{}}
for _, es := range s {
var u [][]string
for _, er := range r {
u = append(u, copy_and_append_string(er, es))
}
r = append(r, u...)
}
return r
}

How to get intersection of two slice in golang?

Is there any efficient way to get intersection of two slices in Go?
I want to avoid nested for loop like solution
slice1 := []string{"foo", "bar","hello"}
slice2 := []string{"foo", "bar"}
intersection(slice1, slice2)
=> ["foo", "bar"]
order of string does not matter

How do I get the intersection between two arrays as a new array?
Simple Intersection: Compare each element in A to each in B (O(n^2))
Hash Intersection: Put them into a hash table (O(n))
Sorted Intersection: Sort A and do an optimized intersection (O(n*log(n)))
All of which are implemented here
https://github.com/juliangruber/go-intersect

simple, generic and mutiple slices ! (Go 1.18)
Time Complexity : may be linear
func interSection[T constraints.Ordered](pS ...[]T) []T {
hash := make(map[T]*int) // value, counter
result := make([]T, 0)
for _, slice := range pS {
duplicationHash := make(map[T]bool) // duplication checking for individual slice
for _, value := range slice {
if _, isDup := duplicationHash[value]; !isDup { // is not duplicated in slice
if counter := hash[value]; counter != nil { // is found in hash counter map
if *counter++; *counter >= len(pS) { // is found in every slice
result = append(result, value)
}
} else { // not found in hash counter map
i := 1
hash[value] = &i
}
duplicationHash[value] = true
}
}
}
return result
}
func main() {
slice1 := []string{"foo", "bar", "hello"}
slice2 := []string{"foo", "bar"}
fmt.Println(interSection(slice1, slice2))
// [foo bar]
ints1 := []int{1, 2, 3, 9, 8}
ints2 := []int{10, 4, 2, 4, 8, 9} // have duplicated values
ints3 := []int{2, 4, 8, 1}
fmt.Println(interSection(ints1, ints2, ints3))
// [2 8]
}
playground : https://go.dev/play/p/lE79D0kOznZ

It's a best method for intersection two slice. Time complexity is too low.
Time Complexity : O(m+n)
m = length of first slice.
n = length of second slice.
func intersection(s1, s2 []string) (inter []string) {
hash := make(map[string]bool)
for _, e := range s1 {
hash[e] = true
}
for _, e := range s2 {
// If elements present in the hashmap then append intersection list.
if hash[e] {
inter = append(inter, e)
}
}
//Remove dups from slice.
inter = removeDups(inter)
return
}
//Remove dups from slice.
func removeDups(elements []string)(nodups []string) {
encountered := make(map[string]bool)
for _, element := range elements {
if !encountered[element] {
nodups = append(nodups, element)
encountered[element] = true
}
}
return
}

if there exists no blank in your []string, maybe you need this simple code:
func filter(src []string) (res []string) {
for _, s := range src {
newStr := strings.Join(res, " ")
if !strings.Contains(newStr, s) {
res = append(res, s)
}
}
return
}
func intersections(section1, section2 []string) (intersection []string) {
str1 := strings.Join(filter(section1), " ")
for _, s := range filter(section2) {
if strings.Contains(str1, s) {
intersection = append(intersection, s)
}
}
return
}

Try it
https://go.dev/play/p/eGGcyIlZD6y
first := []string{"one", "two", "three", "four"}
second := []string{"two", "four"}
result := intersection(first, second) // or intersection(second, first)
func intersection(first, second []string) []string {
out := []string{}
bucket := map[string]bool{}
for _, i := range first {
for _, j := range second {
if i == j && !bucket[i] {
out = append(out, i)
bucket[i] = true
}
}
}
return out
}

https://github.com/viant/toolbox/blob/a46fd679bbc5d07294b1d1b646aeacd44e2c7d50/collections.go#L869-L920
Another O(m+n) Time Complexity solution that uses a hashmap.
It has two differences compared to the other solutions discussed here.
Passing the target slice as a parameter instead of new slice returned
Faster to use for commonly used types like string/int instead of reflection for all

Yes there are a few different ways to go about it.. Here's an example that can be optimized.
package main
import "fmt"
func intersection(a []string, b []string) (inter []string) {
// interacting on the smallest list first can potentailly be faster...but not by much, worse case is the same
low, high := a, b
if len(a) > len(b) {
low = b
high = a
}
done := false
for i, l := range low {
for j, h := range high {
// get future index values
f1 := i + 1
f2 := j + 1
if l == h {
inter = append(inter, h)
if f1 < len(low) && f2 < len(high) {
// if the future values aren't the same then that's the end of the intersection
if low[f1] != high[f2] {
done = true
}
}
// we don't want to interate on the entire list everytime, so remove the parts we already looped on will make it faster each pass
high = high[:j+copy(high[j:], high[j+1:])]
break
}
}
// nothing in the future so we are done
if done {
break
}
}
return
}
func main() {
slice1 := []string{"foo", "bar", "hello", "bar"}
slice2 := []string{"foo", "bar"}
fmt.Printf("%+v\n", intersection(slice1, slice2))
}
Now the intersection method defined above will only operate on slices of strings, like your example.. You can in theory create a definition that looks like this func intersection(a []interface, b []interface) (inter []interface), however you would be relying on reflection and type casting so that you can compare, which will add latency and make your code harder to read. It's probably easier to maintain and read to write a separate function for each type you care about.
func intersectionString(a []string, b []string) (inter []string),
func intersectionInt(a []int, b []int) (inter []int),
func intersectionFloat64(a []Float64, b []Float64) (inter []Float64), ..ect
You can then create your own package and reuse once you settle how you want to implement it.
package intersection
func String(a []string, b []string) (inter []string)
func Int(a []int, b []int) (inter []int)
func Float64(a []Float64, b []Float64) (inter []Float64)

Is there a foreach loop in Go?

Is there a foreach construct in the Go language?
Can I iterate over a slice or array using a for?

From For statements with range clause:
A "for" statement with a "range" clause iterates through all entries
of an array, slice, string or map, or values received on a channel.
For each entry it assigns iteration values to corresponding iteration
variables and then executes the block.
As an example:
for index, element := range someSlice {
// index is the index where we are
// element is the element from someSlice for where we are
}
If you don't care about the index, you can use _:
for _, element := range someSlice {
// element is the element from someSlice for where we are
}
The underscore, _, is the blank identifier, an anonymous placeholder.

Go has a foreach-like syntax. It supports arrays/slices, maps and channels.
Iterate over an array or a slice:
// index and value
for i, v := range slice {}
// index only
for i := range slice {}
// value only
for _, v := range slice {}
Iterate over a map:
// key and value
for key, value := range theMap {}
// key only
for key := range theMap {}
// value only
for _, value := range theMap {}
Iterate over a channel:
for v := range theChan {}
Iterating over a channel is equivalent to receiving from a channel until it is closed:
for {
v, ok := <-theChan
if !ok {
break
}
}

Following is the example code for how to use foreach in Go:
package main
import (
"fmt"
)
func main() {
arrayOne := [3]string{"Apple", "Mango", "Banana"}
for index,element := range arrayOne{
fmt.Println(index)
fmt.Println(element)
}
}
This is a running example https://play.golang.org/p/LXptmH4X_0

Yes, range:
The range form of the for loop iterates over a slice or map.
When ranging over a slice, two values are returned for each iteration. The first is the index, and the second is a copy of the element at that index.
Example:
package main
import "fmt"
var pow = []int{1, 2, 4, 8, 16, 32, 64, 128}
func main() {
for i, v := range pow {
fmt.Printf("2**%d = %d\n", i, v)
}
for i := range pow {
pow[i] = 1 << uint(i) // == 2**i
}
for _, value := range pow {
fmt.Printf("%d\n", value)
}
}
You can skip the index or value by assigning to _.
If you only want the index, drop the , value entirely.

The following example shows how to use the range operator in a for loop to implement a foreach loop.
func PrintXml (out io.Writer, value interface{}) error {
var data []byte
var err error
for _, action := range []func() {
func () { data, err = xml.MarshalIndent(value, "", " ") },
func () { _, err = out.Write([]byte(xml.Header)) },
func () { _, err = out.Write(data) },
func () { _, err = out.Write([]byte("\n")) }} {
action();
if err != nil {
return err
}
}
return nil;
}
The example iterates over an array of functions to unify the error handling for the functions. A complete example is at Google´s playground.
PS: it shows also that hanging braces are a bad idea for the readability of code. Hint: the for condition ends just before the action() call. Obvious, isn't it?

You can in fact use range without referencing its return values by using for range against your type:
arr := make([]uint8, 5)
i,j := 0,0
for range arr {
fmt.Println("Array Loop", i)
i++
}
for range "bytes" {
fmt.Println("String Loop", j)
j++
}
https://play.golang.org/p/XHrHLbJMEd

This may be obvious, but you can inline the array like so:
package main
import (
"fmt"
)
func main() {
for _, element := range [3]string{"a", "b", "c"} {
fmt.Print(element)
}
}
outputs:
abc
https://play.golang.org/p/gkKgF3y5nmt

I'm seeing a lot of examples using range. Just a heads up that range creates a copy of whatever you're iterating over. If you make changes to the contents in a foreach range you will not be changing the values in the original container, in that case you'll need a traditional for loop with an index you increment and deference indexed reference. E.g.:
for i := 0; i < len(arr); i++ {
element := &arr[i]
element.Val = newVal
}

I have just implemented this library: https://github.com/jose78/go-collection.
This is an example of how to use the Foreach loop:
package main
import (
"fmt"
col "github.com/jose78/go-collection/collections"
)
type user struct {
name string
age int
id int
}
func main() {
newList := col.ListType{user{"Alvaro", 6, 1}, user{"Sofia", 3, 2}}
newList = append(newList, user{"Mon", 0, 3})
newList.Foreach(simpleLoop)
if err := newList.Foreach(simpleLoopWithError); err != nil{
fmt.Printf("This error >>> %v <<< was produced", err )
}
}
var simpleLoop col.FnForeachList = func(mapper interface{}, index int) {
fmt.Printf("%d.- item:%v\n", index, mapper)
}
var simpleLoopWithError col.FnForeachList = func(mapper interface{}, index int) {
if index > 1{
panic(fmt.Sprintf("Error produced with index == %d\n", index))
}
fmt.Printf("%d.- item:%v\n", index, mapper)
}
The result of this execution should be:
0.- item:{Alvaro 6 1}
1.- item:{Sofia 3 2}
2.- item:{Mon 0 3}
0.- item:{Alvaro 6 1}
1.- item:{Sofia 3 2}
Recovered in f Error produced with index == 2
ERROR: Error produced with index == 2
This error >>> Error produced with index == 2
<<< was produced
Try this code in playGrounD.

Go: What is the fastest/cleanest way to remove multiple entries from a slice?

How would you implement the deleteRecords function in the code below:
Example:
type Record struct {
id int
name string
}
type RecordList []*Record
func deleteRecords( l *RecordList, ids []int ) {
// Assume the RecordList can contain several 100 entries.
// and the number of the of the records to be removed is about 10.
// What is the fastest and cleanest ways to remove the records that match
// the id specified in the records list.
}

I did some micro-benchmarking on my machine, trying out most of the approaches given in the replies here, and this code comes out fastest when you've got up to about 40 elements in the ids list:
func deleteRecords(data []*Record, ids []int) []*Record {
w := 0 // write index
loop:
for _, x := range data {
for _, id := range ids {
if id == x.id {
continue loop
}
}
data[w] = x
w++
}
return data[:w]
}
You didn't say whether it's important to preserve the order of records in the list. If you don't then this function is faster than the above and still fairly clean.
func reorder(data []*Record, ids []int) []*Record {
n := len(data)
i := 0
loop:
for i < n {
r := data[i]
for _, id := range ids {
if id == r.id {
data[i] = data[n-1]
n--
continue loop
}
}
i++
}
return data[0:n]
}
As the number of ids rises, so does the cost of the linear search. At around 50 elements, using a map or doing a binary search to look up the id becomes more efficient, as long as you can avoid rebuilding the map (or resorting the list) every time. At several hundred ids, it becomes more efficient to use a map or a binary search even if you have to rebuild it every time.
If you wish to preserve original contents of the slice, something like this is more appropriate:
func deletePreserve(data []*Record, ids []int) []*Record {
wdata := make([]*Record, len(data))
w := 0
loop:
for _, x := range data {
for _, id := range ids {
if id == x.id {
continue loop
}
}
wdata[w] = x
w++
}
return wdata[0:w]
}

For a personal project, I did something like this:
func filter(sl []int, fn func(int) bool) []int {
result := make([]int, 0, len(sl))
last := 0
for i, v := range sl {
if fn(v) {
result = append(result, sl[last:i]...)
last = i + 1
}
}
return append(result, sl[last:]...)
}
It doesn't mutate the original, but should be relatively efficient.
It's probably better to just do:
func filter(sl []int, fn func(int) bool) (result []int) {
for _, v := range sl {
if !fn(v) {
result = append(result, v)
}
}
return
}
Simpler and cleaner.
If you want to do it in-place, you probably want something like:
func filter(sl []int, fn func(int) bool) []int {
outi := 0
res := sl
for _, v := range sl {
if !fn(v) {
res[outi] = v
outi++
}
}
return res[0:outi]
}
You can optimize this to use copy to copy ranges of elements, but that's twice
the code and probably not worth it.
So, in this specific case, I'd probably do something like:
func deleteRecords(l []*Record, ids []int) []*Record {
outi := 0
L:
for _, v := range l {
for _, id := range ids {
if v.id == id {
continue L
}
}
l[outi] = v
outi++
}
return l[0:outi]
}
(Note: untested.)
No allocations, nothing fancy, and assuming the rough size of the list of Records and the list of ids you presented, a simple linear search is likely to do as well as fancier things but without any overhead. I realize that my version mutates the slice and returns a new slice, but that's not un-idiomatic in Go, and it avoids forcing the slice at the callsite to be heap allocated.

For the case you described, where len(ids) is approximately 10 and len(*l) is in the several hundreds, this should be relatively fast, since it minimizes memory allocations by updating in place.
package main
import (
"fmt"
"strconv"
)
type Record struct {
id int
name string
}
type RecordList []*Record
func deleteRecords(l *RecordList, ids []int) {
rl := *l
for i := 0; i < len(rl); i++ {
rid := rl[i].id
for j := 0; j < len(ids); j++ {
if rid == ids[j] {
copy(rl[i:len(*l)-1], rl[i+1:])
rl[len(rl)-1] = nil
rl = rl[:len(rl)-1]
break
}
}
}
*l = rl
}
func main() {
l := make(RecordList, 777)
for i := range l {
l[i] = &Record{int(i), "name #" + strconv.Itoa(i)}
}
ids := []int{0, 1, 2, 4, 8, len(l) - 1, len(l)}
fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1])
deleteRecords(&l, ids)
fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1])
}
Output:
[0 1 2 4 8 776 777] 777 777 {0 name #0} {1 name #1} {776 name #776}
[0 1 2 4 8 776 777] 772 777 {1 name #1} {3 name #3} {775 name #775}

Instead of repeatedly searching ids, you could use a map. This code preallocates the full size of the map, and then just moves array elements in place. There are no other allocations.
func deleteRecords(l *RecordList, ids []int) {
m := make(map[int]bool, len(ids))
for _, id := range ids {
m[id] = true
}
s, x := *l, 0
for _, r := range s {
if !m[r.id] {
s[x] = r
x++
}
}
*l = s[0:x]
}

Use the vector package's Delete method as a guide, or just use a Vector instead of a slice.

Here is one option but I would hope there are cleaner/faster more functional looking ones:
func deleteRecords( l *RecordList, ids []int ) *RecordList {
var newList RecordList
for _, rec := range l {
toRemove := false
for _, id := range ids {
if rec.id == id {
toRemove = true
}
if !toRemove {
newList = append(newList, rec)
}
}
return newList
}

With large enough l and ids it will be more effective to Sort() both lists first and then do a single loop over them instead of two nested loops

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Checking if map has duplicate values in Go - go

Related

how to delete Duplicate elements between slices on golang

Code to generate powerset in Golang gives wrong result

How to get intersection of two slice in golang?

Is there a foreach loop in Go?

Go: What is the fastest/cleanest way to remove multiple entries from a slice?

Categories

Resources