I took the full priority queue implementation from the Go docs (the container/heap example). I'm interested in removing several elements at once, something like heap.Remove(&queue, index1, index2, ...).
Right now it can be done in the straightforward way:
for _, event := range events {
    heap.Remove(&queue, event.getIndex())
}
But this approach has overhead, because every call to heap.Remove reorganizes the tree. It seems more efficient to first remove all the unnecessary elements and only then reorganize the tree once.
How can this be implemented?
Since the underlying data structure of your heap is a slice, you can remove the elements directly from the slice and re-initialize the heap afterwards.
Starting from your example:
for _, event := range events {
    i := event.GetIndex()
    queue[i], queue[len(queue)-1] = queue[len(queue)-1], queue[i]
    queue = queue[:len(queue)-1]
}
heap.Init(&queue)
And a working example: https://play.golang.org/p/-KMEilCm3t9
func main() {
    h := IntHeap{1, 5, 2, 9, 8, 3, 7}
    toRemove := 8
    for i := 0; i < len(h); i++ {
        n := h[i]
        if n == toRemove {
            h[i], h[len(h)-1] = h[len(h)-1], h[i]
            h = h[:len(h)-1]
            i--
        }
    }
    heap.Init(&h)
    fmt.Println(h)
}
In order to answer this question, we first need to understand what a heap is. A heap is a data structure that lets us quickly find the largest or smallest value, depending on whether it is a max heap or a min heap. To do this, the computer maintains a tree; in a max heap, every parent is greater than or equal to its children.
Of course, a computer's memory isn't laid out as a tree; it is laid out linearly. In fact, Go stores heaps in slices, which means you can iterate through the slice and remove elements like you normally would, for example:
for i := 0; i < len(heap); i++ {
    if heap[i] == to_remove {
        heap = append(heap[:i], heap[i+1:]...) // delete element i, shifting the tail left
        i--
    }
}
I have the following function that generates all subsets of a given array.
The idea is simple - I start with a results array that contains an empty set (slice) and for each element in the input array nums go over all previously generated sets, add the current element of nums to them and add the resulting new sets back to the results array. Nothing particularly interesting.
func subsets(nums []int) [][]int {
    result := [][]int{{}}
    for _, n := range nums {
        newSets := [][]int{}
        for _, set := range result {
            newSets = append(newSets, append(set, n))
        }
        result = append(result, newSets...)
    }
    return result
}
The problem is that using append(newSets, append(set, n)) corrupts the result slice, of which set is a member. I modified the function a bit with some debug code (see below) and also found a workaround (the commented code) which doesn't cause the same behavior.
I very much suspect that this is caused by something that's passed by reference instead of being copied (I am appending the elements of newSets to result). The problem is that I can't find it. :( I never change the result within a loop that iterates over it. I also work with new instances of newSets for each loop. So I'm not sure what's causing it. Please advise. :)
func subsets(nums []int) [][]int {
    result := [][]int{{}}
    for _, n := range nums {
        newSets := [][]int{}
        var before, after []int
        for _, set := range result {
            lastResultIdx := len(result) - 1
            if lastResultIdx > 0 {
                before = make([]int, len(result[lastResultIdx]))
                copy(before, result[lastResultIdx])
            }
            //ns := []int{}
            //for _, v := range set {
            //    ns = append(ns, v)
            //}
            //ns = append(ns, n)
            //newSets = append(newSets, ns)
            newSets = append(newSets, append(set, n))
            if lastResultIdx > 0 {
                after = result[lastResultIdx]
                if before[len(before)-1] != after[len(after)-1] {
                    fmt.Println(n, "before", before, "after", after)
                }
            }
        }
        result = append(result, newSets...)
    }
    return result
}

func main() {
    subsets([]int{0, 1, 2, 3, 4})
}
The problem is here:
append(newSets, append(set, n))
The problem is not that it is a nested append. The problem is that you're assuming append(set, n) will return a new slice. That is not always the case. A slice is a view onto an array, and when you add new elements to the slice, if the addition did not cause the backing array to be reallocated, the slice returned is the same slice you passed in, with its len field incremented. So as you go through your result slice, you're modifying the elements that are already there and, at the same time, adding them again as if they were different results.
To solve it, when you take an element of result, create a new slice, copy the element's values into it, append the new value, and then add the new slice to the results.
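For example, the inner loop can build each new set from a fresh copy; this is just a sketch of that fix (it matches the commented-out workaround in your debug version):

for _, set := range result {
    ns := make([]int, len(set), len(set)+1) // fresh backing array with room for one more element
    copy(ns, set)
    newSets = append(newSets, append(ns, n))
}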
The problem is simple enough: append takes a slice argument—[]T for some type T—plus of course the element(s) to append, and returns a []T result. But []T, if non-nil, consists of two parts: a slice header that points to some backing array and carries a current length and capacity, plus the backing array. When append does its job, it has a choice:
modify the backing array in place, and return a new slice header that re-uses the existing backing array, or
create a new backing array, copy the original values to the new backing array, and return a new slice header that uses the new backing array.
Whenever append copies the backing array, your code works. Whenever it re-uses the backing array, your code may or may not work, depending on whether some other slice header is using the same backing array.
Suppose your backing array has length 5 for instance, and one of the existing slice headers reads "length 1, capacity 5" with element 0 of the backing array holding zero. That is, the existing slice header h contains [0]. Now you call append(h, 1). The append operation re-uses the backing array and puts 1 in the second element and returns a new slice header h1 that contains [0, 1]. Now you take h again, append 2, and make a two-element slice h2 holding [0, 2]. But this re-uses the same backing array that h1 re-used so now h1 also holds [0, 2].
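Here is a small, self-contained sketch of that scenario (the names h, h1 and h2 follow the description above):

package main

import "fmt"

func main() {
    backing := make([]int, 1, 5) // len 1, cap 5; element 0 holds zero
    h := backing                 // h is [0]
    h1 := append(h, 1)           // re-uses the backing array: h1 is [0 1]
    h2 := append(h, 2)           // re-uses it again: h2 is [0 2], and h1 now also reads [0 2]
    fmt.Println(h1, h2)          // prints: [0 2] [0 2]
}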
To solve the problem without modifying your algorithm much, you need either:
a variant of append that always copies, or
a variant of append that appends one int to a slice of ints and always copies.
The latter is simpler:
func setPlusInt(set []int, n int) []int {
    return append(append([]int(nil), set...), n)
}
which lets you replace one line of your existing code.
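In your function, that one line then becomes:

newSets = append(newSets, setPlusInt(set, n))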
(I made one other trivial change here and added enough to provide a working example in the Go Playground.)
(An alternate solution is to set up each of your own slice headers to offer no extra capacity, so that append must always copy. I have not illustrated this method.)
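For completeness, here is a sketch of that alternative (my illustration, not code from the answer): inside the inner loop, a three-index slice expression caps each set's capacity at its length, so append has no spare room and must copy:

set = set[:len(set):len(set)] // capacity == length, so the next append must reallocate
newSets = append(newSets, append(set, n))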
I have just started learning Go and there is something I don't quite understand; maybe I'm just confused and tired.
Here is my code. There is an array result (of encoded strings, 2139614 elements). I need to decode them and use them further. But after the loop runs, rips is twice as large and its first half is completely empty, so I take a slice of just the range I want.
Why does this happen?
It would probably be easier to decode the results in place and overwrite them, but I don't know how to do that.
Maybe there is a completely different way that, as a beginner, I just don't know yet.
result := []string{}
for i, _ := range input {
    result = append(result, i)
}
sort.Strings(result)

rips := make([][]byte, 2139614)
for _, i := range result {
    c := Decode(i)
    c = c[1:37]
    rips = append(rips, c)
}

// len(result) == 2139614
for i := 2139610; i < 2139700; i++ {
    fmt.Println(i, rips[i])
}

resultrips := rips[2139614:]
for _, i := range resultrips {
    fmt.Println(i)
}
fmt.Println("All write: ", len(resultrips))
And one more question: am I doing this right if I need an array of byte arrays? (I do it this way to avoid extra work; I will compare the values as raw bytes, since there is no character encoding involved.)
rips := make([][]byte, 2139614) //array []byte
In the end, I need something like a set in C++, to check whether an element is in my set.
In C++ the code was:
if (resultrips.count > 0) { ... }
When you write:
make([][]byte, 2139614)
This creates a slice with length and capacity equal to 2139614. When you append to a slice, it always appends after the last element, thereby increasing the length. If you want to pre-allocate a large slice so that you can append into it, you want to specify a length of 0:
make([][]byte, 0, 2139614)
This pre-allocates room for 2139614 elements but starts with a length of 0, so subsequent append calls start at the beginning of the slice; after the first append the length is 1, and no reallocation is needed until the capacity is exhausted.
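Applied to the loop in your question (Decode and the [1:37] slicing come from your code), the fix would look roughly like this:

rips := make([][]byte, 0, len(result)) // length 0, capacity pre-allocated
for _, s := range result {
    c := Decode(s)
    rips = append(rips, c[1:37]) // appends now start at index 0, not at 2139614
}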
Length vs capacity is covered in the Tour of Go: https://tour.golang.org/moretypes/13
A quick note based on the text of your question - remember that slices and arrays are not the same thing. Arrays have a compile-time fixed length and their capacity is synonymous with their length. Slices are backed by arrays but have runtime dynamic independent length and capacity.
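For instance (a small illustration, not from the answer above):

var arr [3]int         // array: the length 3 is part of the type and fixed at compile time
s := make([]int, 0, 3) // slice: length 0 now, capacity 3, backed by an underlying array
s = append(s, 1)       // a slice can grow at runtime; an array cannot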
I'm having some issues optimizing a go map.
I want to generate a frequency table (count distinct occurrences) in an array of strings. My code holds up nicely for small arrays, but once I start working with 100k+ structures, with many distinct values, it just isn't performant enough.
Right now my approach is to generate an array of the distinct values, then compare each row against them and increment the counter mapped to that string.
counter := make(map[string]int)
for _, distinct := range distinctStrArray {
    for _, row := range StrArray {
        if row == distinct {
            counter[distinct]++
        }
    }
}
I've tried another approach, with the input array sorted beforehand (to minimize the number of changes to the map). This is a bit faster.
count := 0
for _, distinct := range distinctStrArray {
    for _, row := range StrArray {
        if row == distinct {
            count++
        }
    }
    counter[distinct] += count
    count = 0
}
Do you have any suggestions for optimizing a simple count-distinct type problem? I'm open to anything.
Thanks!
Without more context, I would dump the separate array of distinct values - generating it takes time, and using it necessitates the nested loop. Assuming there's no other purpose to the second array, I'd use something like:
counter := make(map[string]int)
for _, row := range StrArray {
    counter[row]++
}
If you need the list of distinct strings without the counts for some separate purpose, you can easily get it afterward:
distinctStrings := make([]string, len(counter))
i := 0
for k := range counter {
    distinctStrings[i] = k
    i++
}
Iterating the array of distinct strings is O(n) per lookup, while map access by key is amortized O(1). That takes your overall approach from O(n^2) to roughly O(n), which should be a significant improvement with larger datasets. But, as with any optimization: test, measure, analyze, optimize.
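If you want to measure it, here is a rough benchmark sketch (synthetic data, hypothetical package name; drop it into a _test.go file and run go test -bench=.):

package count

import (
    "strconv"
    "testing"
)

// BenchmarkCountWithMap counts 100k strings drawn from 1000 distinct values
// using the single-pass map approach; the data is purely illustrative.
func BenchmarkCountWithMap(b *testing.B) {
    strArray := make([]string, 100000)
    for i := range strArray {
        strArray[i] = strconv.Itoa(i % 1000)
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        counter := make(map[string]int, 1000)
        for _, row := range strArray {
            counter[row]++
        }
    }
}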
What is the best way to check whether a certain value is in a string slice? I would use a Set in other languages, but Go doesn't have one.
My best try is this so far:
package main

import "fmt"

func main() {
    list := []string{"a", "b", "x"}
    fmt.Println(isValueInList("b", list))
    fmt.Println(isValueInList("z", list))
}

func isValueInList(value string, list []string) bool {
    for _, v := range list {
        if v == value {
            return true
        }
    }
    return false
}
http://play.golang.org/p/gkwMz5j09n
This solution should be ok for small slices, but what to do for slices with many elements?
If you have a slice of strings in an arbitrary order, finding if a value exists in the slice requires O(n) time. This applies to all languages.
If you intend to do the search over and over again, you can use other data structures to make lookups faster. However, building these structures requires at least O(n) time, so you will only get a benefit if you do lookups using the data structure more than once.
For example, you could load your strings into a map. Then lookups would take O(1) time. Insertions also take O(1) time making the initial build take O(n) time:
set := make(map[string]bool)
for _, v := range list {
    set[v] = true
}

fmt.Println(set["b"])
You can also sort your string slice and then do a binary search. Binary searches occur in O(log(n)) time. Building can take O(n*log(n)) time.
sort.Strings(list)
i := sort.SearchStrings(list, "b")
fmt.Println(i < len(list) && list[i] == "b")
Although in theory, given a large enough number of values, a map is faster, in practice searching a sorted slice may well be faster. You need to benchmark it yourself.
To replace sets you should use a map[string]struct{}. This is efficient and considered idiomatic; the "values" take no space at all.
Initialize the set:
set := make(map[string]struct{})
Put an item:
set["item"] = struct{}{}
Check whether an item is present:
_, isPresent := set["item"]
Remove an item:
delete(set, "item")
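Putting those pieces together, a minimal runnable sketch:

package main

import "fmt"

func main() {
    set := make(map[string]struct{})

    set["item"] = struct{}{}    // put
    _, isPresent := set["item"] // check
    fmt.Println(isPresent)      // true

    delete(set, "item")         // remove
    _, isPresent = set["item"]
    fmt.Println(isPresent)      // false
}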
You can use a map, and have the value e.g. a bool
m := map[string]bool{"a": true, "b": true, "x": true}
if m["a"] { // will be false if "a" is not in the map
    // it was in the map
}
There's also the sort package, so you could sort and binary search your slices
I am new to algorithms so please forgive me if this sounds basic or stupid.
I want to know this: instead of adding data to some kind of list and then sorting the list, is there a method (data structure + algorithm) that lets me sort the data as it is added, or, to put it another way, that inserts each element in its proper place?
E.g. if I want to add 3 to {1, 5, 6}, instead of adding it at the start or end and then sorting the list, I want 3 to go right after 1.
thanks
If you use a binary search tree instead of an array, the sorting happens "automatically", because it is already done by the insert method of the nodes. So a binary search tree is always sorted and is easy to traverse in order. The only problem is that when you insert data that is already (more or less) sorted, the tree becomes unbalanced (which is where red-black trees and other variations come into play).
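A minimal sketch of such a (plain, unbalanced) binary search tree in Go, just to illustrate the idea:

package main

import "fmt"

type node struct {
    value       int
    left, right *node
}

// insert places v in its sorted position and returns the (possibly new) root.
func insert(root *node, v int) *node {
    if root == nil {
        return &node{value: v}
    }
    if v < root.value {
        root.left = insert(root.left, v)
    } else {
        root.right = insert(root.right, v)
    }
    return root
}

// inorder walks the tree left to right, visiting the values in sorted order.
func inorder(root *node, visit func(int)) {
    if root == nil {
        return
    }
    inorder(root.left, visit)
    visit(root.value)
    inorder(root.right, visit)
}

func main() {
    var root *node
    for _, v := range []int{1, 5, 6, 3} {
        root = insert(root, v)
    }
    inorder(root, func(v int) { fmt.Print(v, " ") }) // prints: 1 3 5 6
}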
You want to maintain a sorted array at all times, so you need to find the correct place in the sequence for every new element you add. Finding that position can be done efficiently (O(log n)) with a modified binary search; the insert itself still costs O(n), because the elements after it have to shift.
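Using the standard library, a sketch of that insert could look like this (sort.SearchInts does the binary search for the insertion point):

package main

import (
    "fmt"
    "sort"
)

// insertSorted returns a with x inserted at its sorted position.
func insertSorted(a []int, x int) []int {
    i := sort.SearchInts(a, x) // O(log n): index where x belongs
    a = append(a, 0)           // grow by one element
    copy(a[i+1:], a[i:])       // shift the tail one slot to the right
    a[i] = x
    return a
}

func main() {
    fmt.Println(insertSorted([]int{1, 5, 6}, 3)) // [1 3 5 6]
}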
There are basically two different methods to insert a value into a list; which one you use depends a bit on what kind of list you have:
Use binary search to locate where the value should be inserted, and insert the value there.
Loop from the end of the list, moving all higher values one step up, and put the value in the gap before the lowest higher value.
The first method would typically be used if you are using a binary tree or a linked list, where you don't have to move items in the list to do the insert.
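For a plain slice, here is a sketch of the second method (my illustration, not the answer's code): walk backwards from the end, shifting larger values up one slot, then drop the new value into the gap.

// insertFromEnd returns a with x inserted at its sorted position.
func insertFromEnd(a []int, x int) []int {
    a = append(a, x) // make room at the end
    i := len(a) - 1
    for i > 0 && a[i-1] > x {
        a[i] = a[i-1] // shift the larger value up
        i--
    }
    a[i] = x
    return a
}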
Yes, but that is usually slower than adding all the data and sorting it afterwards, because to insert the data as it is added you need to traverse the list every time you add an element.
With binary search you don't need to look at every element, but even then you usually end up touching more elements overall than if you sort once at the end.
So from a performance point of view, sorted insert is generally inferior to sorting afterwards.
Here is some Go code that sorts inputs on the fly.
What I am doing here is determining, by binary search, the position where the input will fit, then partitioning the already-sorted array at that point, appending the new element to the first part, and rejoining the two parts.
Time complexity per input: roughly log N (to determine the position) + 3N (to create the slices and append).
package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
)

func main() {
    reader := bufio.NewReader(os.Stdin)
    var a []int
    for {
        fmt.Print("Please enter a value:")
        text := readLine(reader)
        key, _ := strconv.ParseInt(text, 10, 0)

        // Find the insertion position, then rebuild the slice around it.
        pos := binarySearch(a, 0, len(a)-1, int(key))
        p1 := append([]int{}, a[:pos]...) // copy of everything before pos
        p2 := a[pos:]                     // everything from pos onwards
        p1 = append(p1, int(key))
        p1 = append(p1, p2...)
        a = p1

        fmt.Println(a)
    }
}

// binarySearch returns the index at which key should be inserted into the
// sorted slice a (searching within a[low..high]) to keep it sorted.
func binarySearch(a []int, low int, high int, key int) int {
    var result int
    if high == -1 {
        return 0
    } else if key >= a[high] {
        return high + 1
    } else if key <= a[low] {
        return low
    }
    mid := (high + low) / 2
    if a[mid] == key {
        result = mid
    } else if key < a[mid] {
        return binarySearch(a, low, mid-1, key)
    } else if key > a[mid] {
        return binarySearch(a, mid+1, high, key)
    }
    return result
}

func readLine(reader *bufio.Reader) string {
    text, err := reader.ReadString('\n')
    if err != nil {
        fmt.Println(err)
    }
    text = strings.TrimRight(text, "\r\n") // strip the trailing newline (and a CR on Windows)
    return text
}