How to find ip in range in very large struct - go

I have a struct like below, with about 100k entires.
I would like to loop over it and check if a ip address is in range.
My current code:
type Users struct {
Id string
Descr string
IpStart string
IpEnd string
}
var users []*Users
func LookUpIP(IpAddress string) (string, string) {
iptocheck := net.ParseIP(IpAddress)
for _, elem := range users {
if bytes.Compare(iptocheck, elem.IpStart) >= 0 && bytes.Compare(iptocheck, elem.IpEnd) <= 0 {
fmt.Printf("%v is between %v and %v\n", IpAddress, elem.IpStart, elem.IpEnd)
return elem.Id, elem.Descr
}
}
return "0", "null"
}
The above works fine with about 40k entires but over that it gets slow. Is there any faster way to find out if a ip address is in range inside my struct?
Update: Now only parsing IP once and storing it as number in struct

There are two simple steps I see.
Do the parsing once and store the IP address as a single number.
Order the ranges by start of the range and use binary search.

As a completion of #Grzegorz Żur suggestion to use binary search for reducing the search time, here is a binary search implementation in go.
But first what is binary search? A binary search divides a range of values into halves, and continues to narrow down the field of search until the unknown value is found. It is the classic example of a "divide and conquer" algorithm.
The algorithm returns the index of some element that equals the given value (if there are multiple such elements, it returns some arbitrary one). It is also possible, when the element is not found, to return the "insertion point" for it (the index that the value would have if it were inserted into the array).
The recursive method
func binarySearch(a []float64, value float64, low int, high int) int {
if high < low {
return -1
}
mid := (low + high) / 2 // calculate the mean of two values
if a[mid] > value {
return binarySearch(a, value, low, mid-1)
} else if a[mid] < value {
return binarySearch(a, value, mid+1, high)
}
return mid
}
The iterative method
func binarySearch(a []float64, value float64) int {
low := 0
high := len(a) - 1
for low <= high {
mid := (low + high) / 2
if a[mid] > value {
high = mid - 1
} else if a[mid] < value {
low = mid + 1
} else {
return mid
}
}
return -1
}
But if you take a look into the sort package you can observe that there is an already implemented sorting algorithm based on binary search. So probably this would be the best option.

Related

Golang maps/hashmaps, optimizing for iteration speed

When migrating a production NodeJS application to Golang I've noticed that iteration of GO's native Map is actually slower than Node.
I've come up with an alternative solution that sacrifices removal/insertion speed with iteration speed instead, by exposing an array that can be iterated over and storing key=>index pairs inside a separate map.
While this solution works, and has a significant performance increase, I was wondering if there is a better solution to this that I could look into.
The setup I have is that its very rare something is removed from the hashmaps, only additions and replacements are common for which this implementation 'works', albeit feels like a workaround more than an actual solution.
The maps are always indexed by an integer, holding arbitrary data.
FastMap: 500000 Iterations - 0.153000ms
Native Map: 500000 Iterations - 4.988000ms
/*
Unordered hash map optimized for iteration speed.
Stores values in an array and holds key=>index mappings inside a separate hashmap
*/
type FastMapEntry[K comparable, T any] struct {
Key K
Value T
}
type FastMap[K comparable, T any] struct {
m map[K]int // Stores key => array index mappings
entries []FastMapEntry[K, T] // Array holding entries and their keys
len int // Total map size
}
func MakeFastMap[K comparable, T any]() *FastMap[K, T] {
return &FastMap[K, T]{
m: make(map[K]int),
entries: make([]FastMapEntry[K, T], 0),
}
}
func (m *FastMap[K, T]) Set(key K, value T) {
index, exists := m.m[key]
if exists {
// Replace if key already exists
m.entries[index] = FastMapEntry[K, T]{
Key: key,
Value: value,
}
} else {
// Store the key=>index pair in the map and add value to entries. Increase total len by one
m.m[key] = m.len
m.entries = append(m.entries, FastMapEntry[K, T]{
Key: key,
Value: value,
})
m.len++
}
}
func (m *FastMap[K, T]) Has(key K) bool {
_, exists := m.m[key]
return exists
}
func (m *FastMap[K, T]) Get(key K) (value T, found bool) {
index, exists := m.m[key]
if exists {
found = true
value = m.entries[index].Value
}
return
}
func (m *FastMap[K, T]) Remove(key K) bool {
index, exists := m.m[key]
if exists {
// Remove value from entries
m.entries = append(m.entries[:index], m.entries[index+1:]...)
// Remove key=>index mapping
delete(m.m, key)
m.len--
for i := index; i < m.len; i++ {
// Move all index mappings up, starting from current index
m.m[m.entries[i].Key] = i
}
}
return exists
}
func (m *FastMap[K, T]) Entries() []FastMapEntry[K, T] {
return m.entries
}
func (m *FastMap[K, T]) Len() int {
return m.len
}
The test code that was ran is:
// s.Variations is a native map holding ~500k records
start := time.Now()
iterations := 0
for _, variation := range s.Variations {
if variation.Id > 0 {
}
iterations++
}
log.Printf("Native Map: %d Iterations - %fms\n", iterations, float64(time.Since(start).Microseconds())/1000)
// Copy data into FastMap
fm := helpers.MakeFastMap[state.VariationId, models.ItemVariation]()
for key, variation := range s.Variations {
fm.Set(key, variation)
}
start = time.Now()
iterations = 0
for _, variation := range fm.Entries() {
if variation.Value.Id > 0 {
}
iterations++
}
log.Printf("FastMap: %d Iterations - %fms\n", iterations, float64(time.Since(start).Microseconds())/1000)
I think this kind of comparison and benchmarking is a little off-topic. Go implementation of map is quite different from your implementation, basically because it needs to cover a wider area of entries, the structs used in compile time are actually kind of heavy (not so much though, they basically store some information about the types you use in your map and so on), and the implementation approach is different! Go implementation of map is basically a hashmap (yours is not obviously, or it is, but the actual hashing implementation is delegated to the m map you hold internally).
One of the other factors makes you get this result is, if you take a look at this:
for _, variation := range fm.Entries() {
if variation.Value.Id > 0 {
}
iterations++
}
Basically, you're iterating over a slice, which is much easier and faster to iterate rather than a map, you have a view to an array, which holds elements of the same types next to each other, makes sense, right?
What you should do to make a better comparison would be something like this:
for _, y := range fastMap.m {
_ = fastMap.Entries()[y].Value + 1 // some simple calculation
}
If you're really looking for performance, a well written hash function and a fixed size array would be your best choice.

Golang Binary search

I'm practicing an interview algorithm, now coding it in Go. The purpose is to practice basic interview algorithms, and my skills in Go. I'm trying to perform a Binary search of an array of numbers.
package main
import "fmt"
func main() {
searchField := []int{2, 5, 8, 12, 16, 23, 38, 56, 72, 91}
searchNumber := 23
fmt.Println("Running Program")
fmt.Println("Searching list of numbers: ", searchField)
fmt.Println("Searching for number: ", searchNumber)
numFound := false
//searchCount not working. Belongs in second returned field
result, _ := binarySearch2(searchField, len(searchField), searchNumber, numFound)
fmt.Println("Found! Your number is found in position: ", result)
//fmt.Println("Your search required ", searchCount, " cycles with the Binary method.")
}
func binarySearch2(a []int, field int, search int, numFound bool) (result int, searchCount int) {
//searchCount removed for now.
searchCount, i := 0, 0
for !numFound {
searchCount++
mid := i + (field-i)/2
if search == a[mid] {
numFound = true
result = mid
return result, searchCount
} else if search > a[mid] {
field++
//i = mid + 1 causes a stack overflow
return binarySearch2(a, field, search, numFound)
}
field = mid
return binarySearch2(a, field, search, numFound)
}
return result, searchCount
}
The main problems I'm coming across are:
1) When the number is higher in the list than my mid search, am I truly continuing a binary search, or has it turned to a sequential? How can I fix that? The other option I've placed has been commented out because it causes a stack overflow.
2) I wanted to add a step count to see how many steps it takes to finish the search. Something to use with other search methods as well. If I print out the search count as is, it always reads one. Is that because I need to return it (and therefore call for it in the header) in the method?
I understand Go has methods that streamline this process. I'm trying to increase my knowledge and coding skills. I appreciate your input.
You're not doing a binary search properly. First off, your for loop is useless, since each branch in the conditional tree has a return statement in it, so it can never run more than one iteration. It looks like you started to code it iteratively, then swapped to a recursive setup, but only kinda halfway converted it.
The idea of a binary search is that you have a high and low index and search the midway point between them. You're not doing that, you're just incrementing the field variable and trying again (which will cause you to search each index twice until you find the item or segfault by running past the end of the list). In Go, though, you don't need to keep track of the high and low indexes, as you can simply subslice the search field as appropriate.
Here's a more elegant recursive version:
func binarySearch(a []int, search int) (result int, searchCount int) {
mid := len(a) / 2
switch {
case len(a) == 0:
result = -1 // not found
case a[mid] > search:
result, searchCount = binarySearch(a[:mid], search)
case a[mid] < search:
result, searchCount = binarySearch(a[mid+1:], search)
if result >= 0 { // if anything but the -1 "not found" result
result += mid + 1
}
default: // a[mid] == search
result = mid // found
}
searchCount++
return
}
https://play.golang.org/p/UyZ3-14VGB9
func BinarySearch(a []int, x int) int {
r := -1 // not found
start := 0
end := len(a) - 1
for start <= end {
mid := (start + end) / 2
if a[mid] == x {
r = mid // found
break
} else if a[mid] < x {
start = mid + 1
} else if a[mid] > x {
end = mid - 1
}
}
return r
}
Off-topic, but might help others looking for a simple binary search who could land here.
There's a generic binary search module on github since the standard library doesn't offer this common functionality: https://github.com/bbp-brieuc/binarysearch
func BinarySearch(array []int, target int) int {
startIndex := 0
endIndex := len(array) - 1
midIndex := len(array) / 2
for startIndex <= endIndex {
value := array[midIndex]
if value == target {
return midIndex
}
if value > target {
endIndex = midIndex - 1
midIndex = (startIndex + endIndex) / 2
continue
}
startIndex = midIndex + 1
midIndex = (startIndex + endIndex) / 2
}
return -1
}
generic type version ! (go 1.18)
Time Complexity : log2(n)+1
package main
import "golang.org/x/exp/constraints"
func BinarySearch[T constraints.Ordered](a []T, x T) int {
start, mid, end := 0, 0, len(a)-1
for start <= end {
mid = (start + end) >> 1
switch {
case a[mid] > x:
end = mid - 1
case a[mid] < x:
start = mid + 1
default:
return mid
}
}
return -1
}
full version with iteration counter at playground.
func search(nums []int, target, lo, hi int) int {
if(lo > hi) {
return -1
}
mid := lo + (hi -lo) /2
if(nums[mid]< target){
return search2(nums, target,mid+1, hi)
}
if (nums[mid]> target){
return search2(nums, target,lo, mid -1)
}
return mid
}
https://www.youtube.com/watch?v=kNkeJ3ZtgJA

Find the most frequent character in text

I need to implement a package with interface with methods that take text file and performs analysis on it - counts the total amount of characters and finds the most frequent symbol and word. To find the most frequent character I loop through each rune in the text, convert it to string and append it as a key to map. The value is an incremented counter which counts how often this character occurs in the given text. Now I'm stuck a little with the following problem -- I can't figure out how to get the key with the highest value in my map. Here's the code:
package textscanner
import (
"fmt"
"log"
"io/ioutil"
"unicode/utf8"
"strconv"
)
// Initializing my scanner
type Scanner interface {
countChar(text string) int
frequentSym(text string) // Return value is not yet implemented
Scan()
Run()
}
/* method counting characters */
func countChar(sc Scanner, text string) int { ... }
func frequentSym(sc Scanner, text string) {
// Make a map with string key and integer value
symbols := make(map[string] int)
// Iterate through each char in text
for _, sym := range text {
// Convert rune to string
char := strconv.QuoteRune(sym)
// Set this string as a key in map and assign a counter value
count := symbols[char]
if count == symbols[char] {
// increment the value
symbols[char] = count + 1
} else {
symbols[char] = 1
}
}
}
So, basically I need to find a pair with the highest int value and return a string key that corresponds to it, that is the most frequent character in text
Just iterate over the map:
maxK := ""
maxV := 0
for k, v := range symbols {
if v > maxV {
maxV = v
maxK = k
}
}
// maxK is the key with the maximum value.
Expanding on #Ainar-G answer, if there is a possibility that your map could contain multiple keys that occur the same number of times, then #Ainar-G code could return different results every time because Go maps are inherently unordered; in other words, the first key in your map to have a value higher then all previous values becomes the highest key, but you don't always know whether that value will occur first in the map. See this as an example.
In order for the code to be deterministic, you will need to address the case where two keys have the same value. A simple implementation would be to do a string comparison if the value is the same.
maxK := ""
maxV := 0
for k, v := range symbols {
if v > maxV || (v == maxV && k < maxK) {
maxV = v
maxK = k
}
}

Golang: Find two number index where the sum of these two numbers equals to target number

The problem is: find the index of two numbers that nums[index1] + nums[index2] == target. Here is my attempt in golang (index starts from 1):
package main
import (
"fmt"
)
var nums = []int{0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 25182, 25184, 25186, 25188, 25190, 25192, 25194, 25196} // The number list is too long, I put the whole numbers in a gist: https://gist.github.com/nickleeh/8eedb39e008da8b47864
var target int = 16021
func twoSum(nums []int, target int) (int, int) {
if len(nums) <= 1 {
return 0, 0
}
hdict := make(map[int]int)
for i := 1; i < len(nums); i++ {
if val, ok := hdict[nums[i+1]]; ok {
return val, i + 1
} else {
hdict[target-nums[i+1]] = i + 1
}
}
return 0, 0
}
func main() {
fmt.Println(twoSum(nums, target))
}
The nums list is too long, I put it into a gist:
https://gist.github.com/nickleeh/8eedb39e008da8b47864
This code works fine, but I find the return 0,0 part is ugly, and it runs ten times slower than the Julia translation. I would like to know is there any part that is written terrible and affect the performance?
Edit:
Julia's translation:
function two_sum(nums, target)
if length(nums) <= 1
return false
end
hdict = Dict()
for i in 1:length(nums)
if haskey(hdict, nums[i])
return [hdict[nums[i]], i]
else
hdict[target - nums[i]] = i
end
end
end
In my opinion if no elements found adding up to target, best would be to return values which are invalid indices, e.g. -1. Although returning 0, 0 would be enough as a valid index pair can't be 2 equal indices, this is more convenient (because if you forget to check the return values and you attempt to use the invalid indices, you will immediately get a run-time panic, alerting you not to forget checking the validity of the return values). As so, in my solutions I will get rid of that i + 1 shifts as it makes no sense.
Benchmarking of different solutions can be found at the end of the answer.
If sorting allowed:
If the slice is big and not changing, and you have to call this twoSum() function many times, the most efficient solution would be to sort the numbers simply using sort.Ints() in advance:
sort.Ints(nums)
And then you don't have to build a map, you can use binary search implemented in sort.SearchInts():
func twoSumSorted(nums []int, target int) (int, int) {
for i, v := range nums {
v2 := target - v
if j := sort.SearchInts(nums, v2); v2 == nums[j] {
return i, j
}
}
return -1, -1
}
Note: Note that after sorting, the indices returned will be indices of values in the sorted slice. This may differ from indices in the original (unsorted) slice (which may or may not be a problem). If you do need indices from the original order (original, unsorted slice), you may store sorted and unsorted index mapping so you can get what the original index is. For details see this question:
Get the indices of the array after sorting in golang
If sorting is not allowed:
Here is your solution getting rid of that i + 1 shifts as it makes no sense. Slice and array indices are zero based in all languages. Also utilizing for ... range:
func twoSum(nums []int, target int) (int, int) {
if len(nums) <= 1 {
return -1, -1
}
m := make(map[int]int)
for i, v := range nums {
if j, ok := m[v]; ok {
return j, i
}
m[target-v] = i
}
return -1, -1
}
If the nums slice is big and the solution is not found fast (meaning the i index grows big) that means a lot of elements will be added to the map. Maps start with small capacity, and they are internally grown if additional space is required to host many elements (key-value pairs). An internal growing requires rehashing and rebuilding with the already added elements. This is "very" expensive.
It does not seem significant but it really is. Since you know the max elements that will end up in the map (worst case is len(nums)), you can create a map with a big-enough capacity to hold all elements for the worst case. The gain will be that no internal growing and rehashing will be required. You can provide the initial capacity as the second argument to make() when creating the map. This speeds up twoSum2() big time if nums is big:
func twoSum2(nums []int, target int) (int, int) {
if len(nums) <= 1 {
return -1, -1
}
m := make(map[int]int, len(nums))
for i, v := range nums {
if j, ok := m[v]; ok {
return j, i
}
m[target-v] = i
}
return -1, -1
}
Benchmarking
Here's a little benchmarking code to test execution speed of the 3 solutions with the input nums and target you provided. Note that in order to test twoSumSorted(), you first have to sort the nums slice.
Save this into a file named xx_test.go and run it with go test -bench .:
package main
import (
"sort"
"testing"
)
func BenchmarkTwoSum(b *testing.B) {
for i := 0; i < b.N; i++ {
twoSum(nums, target)
}
}
func BenchmarkTwoSum2(b *testing.B) {
for i := 0; i < b.N; i++ {
twoSum2(nums, target)
}
}
func BenchmarkTwoSumSorted(b *testing.B) {
sort.Ints(nums)
b.ResetTimer()
for i := 0; i < b.N; i++ {
twoSumSorted(nums, target)
}
}
Output:
BenchmarkTwoSum-4 1000 1405542 ns/op
BenchmarkTwoSum2-4 2000 722661 ns/op
BenchmarkTwoSumSorted-4 10000000 133 ns/op
As you can see, making a map with big enough capacity speeds up: it runs twice as fast.
And as mentioned, if nums can be sorted in advance, that is ~10,000 times faster!
If nums is always sorted, you can do a binary search to see if the complement to whichever number you're on is also in the slice.
func binary(haystack []int, needle, startsAt int) int {
pivot := len(haystack) / 2
switch {
case haystack[pivot] == needle:
return pivot + startsAt
case len(haystack) <= 1:
return -1
case needle > haystack[pivot]:
return binary(haystack[pivot+1:], needle, startsAt+pivot+1)
case needle < haystack[pivot]:
return binary(haystack[:pivot], needle, startsAt)
}
return -1 // code can never fall off here, but the compiler complains
// if you don't have any returns out of conditionals.
}
func twoSum(nums []int, target int) (int, int) {
for i, num := range nums {
adjusted := target - num
if j := binary(nums, adjusted, 0); j != -1 {
return i, j
}
}
return 0, 0
}
playground example
Or you can use sort.SearchInts which implements binary searching.
func twoSum(nums []int, target int) (int, int) {
for i, num := range nums {
adjusted := target - num
if j := sort.SearchInts(nums, adjusted); nums[j] == adjusted {
// sort.SearchInts returns the index where the searched number
// would be if it was there. If it's not, then nums[j] != adjusted.
return i, j
}
}
return 0, 0
}

sort.SearchInts works strangely or I'm missing something

Consider the following example:
package main
import (
"fmt"
"sort"
)
func main() {
var n int
var a sort.IntSlice
a = append(a, 23)
a = append(a, 3)
a = append(a, 10)
sort.Sort(a)
fmt.Println(a)
n = sort.SearchInts(a, 1)
fmt.Println(n)
n = sort.SearchInts(a, 3)
fmt.Println(n)
}
http://play.golang.org/p/wo4r43Zghv
And the result is:
[3 10 23]
0
0
How am I supposed to know if the number is present in the slice or not, when both the first element and the nonexistent element return 0 as index?
Update
Note that the index can also be greater than the length of the slice so the proper way to find if an element is present in the slice is:
num := 1
n = sort.SearchInts(a, num)
if n < len(a) && a[n] == num {
// found
}
It may seem a functionally strange feature but it's documented :
For instance, given a slice data sorted in ascending order, the call
Search(len(data), func(i int) bool { return data[i] >= 23 }) returns
the smallest index i such that data[i] >= 23.
The obvious solution is also documented :
If the caller wants to
find whether 23 is in the slice, it must test data[i] == 23
separately.
Check whether the number you were looking for actually lives at the returned index. The binary search routines in sort are supposed to find the index where a value can be inserted if it's not already present.
E.g. a search for 100 in the slice in your example will return 3 because the value must be appended to the slice.
the call Search(len(data), func(i int) bool { return data[i] >= 23 }) returns the smallest index i such that data[i] >= 23. If the caller wants to find whether 23 is in the slice, it must test data[i] == 23 separately.
SearchInts searches for x in a sorted slice of ints and returns the index as specified by Search. SearchInts calls Search with a function:
func(i int) bool { return a[i] >= x }
Quoting from the Search doc:
Search uses binary search to find and return the smallest index i in [0, n) at which f(i) is true, assuming that on the range [0, n),
f(i) == true implies f(i+1) == true. That is, Search requires that f
is false for some (possibly empty) prefix of the input range [0, n)
and then true for the (possibly empty) remainder; Search returns the
first true index. If there is no such index, Search returns n. Search
calls f(i) only for i in the range [0, n).
So basically you'll get back an index where the number you searched for should be inserted so the slice remains sorted.
Just check if at the returned index the number in the slice is identical to the one you searched for.
This snippet is from the documentation:
x := 23
i := sort.Search(len(data), func(i int) bool { return data[i] >= x })
if i < len(data) && data[i] == x {
// x is present at data[i]
} else {
// x is not present in data,
// but i is the index where it would be inserted.
}
http://golang.org/pkg/sort/#Search
So you have to check i < len(data) && data[i] == x like you suggest.

Resources