Find the most frequent character in text - go

I need to implement a package with an interface whose methods take a text file and analyze it: count the total number of characters and find the most frequent symbol and word. To find the most frequent character, I loop through each rune in the text, convert it to a string, and use it as a key in a map. The value is a counter that records how often the character occurs in the given text. Now I'm a little stuck on the following problem: I can't figure out how to get the key with the highest value in my map. Here's the code:
package textscanner
import (
"fmt"
"log"
"io/ioutil"
"unicode/utf8"
"strconv"
)
// Initializing my scanner
type Scanner interface {
countChar(text string) int
frequentSym(text string) // Return value is not yet implemented
Scan()
Run()
}
/* method counting characters */
func countChar(sc Scanner, text string) int { ... }
func frequentSym(sc Scanner, text string) {
// Make a map with string key and integer value
symbols := make(map[string] int)
// Iterate through each char in text
for _, sym := range text {
// Convert rune to string
char := strconv.QuoteRune(sym)
// Use this string as a map key and increment its counter
// (a missing key reads as 0, so no existence check is needed)
symbols[char]++
}
}
So, basically, I need to find the pair with the highest int value and return the string key that corresponds to it, which is the most frequent character in the text.

Just iterate over the map:
maxK := ""
maxV := 0
for k, v := range symbols {
if v > maxV {
maxV = v
maxK = k
}
}
// maxK is the key with the maximum value.

Expanding on @Ainar-G's answer: if several keys in your map can occur the same number of times, @Ainar-G's code could return different results from run to run, because Go maps are inherently unordered. Among the keys tied for the highest value, whichever one the loop happens to visit first wins, and you cannot know in advance which one that will be.
In order for the code to be deterministic, you will need to address the case where two keys have the same value. A simple implementation would be to do a string comparison if the value is the same.
maxK := ""
maxV := 0
for k, v := range symbols {
if v > maxV || (v == maxV && k < maxK) {
maxV = v
maxK = k
}
}
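Putting the counting loop and the tie-breaking scan together, here is a minimal sketch of a complete function. The name mostFrequentRune is made up for this example, and it keys the map by rune rather than by the quoted string used in the question, which is a simplification on my part and not what the original code does.

```go
package main

import "fmt"

// mostFrequentRune (hypothetical helper) returns the rune that occurs most
// often in text together with its count. Ties are broken in favour of the
// smaller code point, so the result does not depend on map iteration order.
// For empty text it returns (0, 0).
func mostFrequentRune(text string) (rune, int) {
	counts := make(map[rune]int)
	for _, r := range text {
		counts[r]++ // a missing key reads as 0
	}

	var maxR rune
	maxN := 0
	for r, n := range counts {
		if n > maxN || (n == maxN && r < maxR) {
			maxR, maxN = r, n
		}
	}
	return maxR, maxN
}

func main() {
	r, n := mostFrequentRune("hello world")
	fmt.Printf("%q occurs %d times\n", r, n) // 'l' occurs 3 times
}
```

With the tie-break in place, "abab" always reports 'a' instead of flipping between 'a' and 'b'.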

Related

how to manipulate very long string to avoid out of memory with golang

To improve my skills, I am trying to solve this HackerRank challenge:
There is a string, s, of lowercase English letters that is repeated infinitely many times. Given an integer, n, find and print the number of letter a's in the first n letters of the infinite string.
1 <= len(s) <= 100 and 1 <= n <= 10^12
Very naively, I thought this code would be fine:
fs := strings.Repeat(s, int(n)) // full string
ss := fs[:n] // sub string
fmt.Println(strings.Count(ss, "a"))
Obviously this blew up the memory and I got an "out of memory" error.
I have never faced this kind of issue, and I'm clueless about how to handle it.
How can I manipulate a very long string without running out of memory?
You don't have to actually count by running through the whole string; that is the naive approach.
Some basic arithmetic gets you the answer without running out of memory. I hope the comments help.
var answer int64
// 1st figure out how many a's are present in s.
aCount := int64(strings.Count(s, "a"))
// How many times will s repeat in its entirety if it had to be of length n
repeats := n / int64(len(s))
remainder := n % int64(len(s))
// If n/len(s) is not perfectly divisible, it means there has to be a remainder, check if that's the case.
// If s is of length 5 and the value of n = 22, then the first 2 characters of s would repeat an extra time.
if remainder > 0 {
aCountInRemainder := strings.Count(s[:remainder], "a")
answer = aCount*repeats + int64(aCountInRemainder)
} else {
answer = aCount * repeats
}
return answer
There might be other methods but this is what came to my mind.
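For reference, the same arithmetic wrapped in a complete program might look like the sketch below. The function name countA and the sample inputs are assumptions for illustration; the code assumes s is non-empty and n is not negative, as the challenge constraints state.

```go
package main

import (
	"fmt"
	"strings"
)

// countA (hypothetical name) returns how many times "a" appears in the first
// n letters of s repeated forever. It assumes len(s) > 0 and n >= 0.
func countA(s string, n int64) int64 {
	perCopy := int64(strings.Count(s, "a")) // a's in one full copy of s
	length := int64(len(s))
	full := n / length // complete copies that fit in n letters
	rest := n % length // letters of the last, partial copy
	return perCopy*full + int64(strings.Count(s[:rest], "a"))
}

func main() {
	fmt.Println(countA("aba", 10))          // 7
	fmt.Println(countA("a", 1000000000000)) // 1000000000000
}
```

No branch on the remainder is needed here, because s[:0] is the empty string and contributes zero.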
As you found out, if you actually generate the string you will end up having that huge memory block in RAM.
One common way to represent a "big sequence of incoming bytes" is to implement it as an io.Reader (which you can view as a stream of bytes), and have your code run a r.Read(buff) loop.
Given the specifics of the exercise you mention (a fixed string repeated n times), the number of occurrences of a specific letter can also be computed straight from the number of occurrences of that letter in s, plus something more (I'll let you figure out what multiplications and counting should be done).
How can you implement a Reader that repeats the string without allocating it 10^12 times?
Note that, when the .Read() method is called, the caller has already allocated its buffer. You don't need to repeat your string in memory, you just need to fill the buffer with the correct values, for example by copying your data byte by byte into the buffer.
Here is one way to do it:
type RepeatReader struct {
str string
count int
}
func (r *RepeatReader) Read(p []byte) (int, error) {
if r.count == 0 {
return 0, io.EOF
}
// at each iteration, pos will hold the number of bytes copied so far
var pos = 0
for r.count > 0 && pos < len(p) {
// to copy slices over, you can use the built-in 'copy' method
// at each iteration, you need to write bytes *after* the ones you have already copied,
// hence the "p[pos:]"
n := copy(p[pos:], r.str)
// update the amount of copied bytes
pos += n
// bad computation for this first example :
// I decrement one complete count, even if str was only partially copied
r.count--
}
return pos, nil
}
https://go.dev/play/p/QyFQ-3NzUDV
To have a complete, correct implementation, you also need to keep track of the offset to start from the next time .Read() is called:
type RepeatReader struct {
str string
count int
offset int
}
func (r *RepeatReader) Read(p []byte) (int, error) {
if r.count == 0 {
return 0, io.EOF
}
var pos = 0
for r.count > 0 && pos < len(p) {
// when copying over to p, you should start at r.offset :
n := copy(p[pos:], r.str[r.offset:])
pos += n
// update r.offset :
r.offset += n
// if one full copy of str has been issued, decrement 'count' and reset 'offset' to 0
if r.offset == len(r.str) {
r.count--
r.offset = 0
}
}
return pos, nil
}
https://go.dev/play/p/YapRuioQcOz
You can now count the a's while iterating through this Reader.

Sort 2D array of structs Golang

I want to create a consistent ordering for a 2D slice of structs. I am creating the 2D slice from a map, so the order is different every time.
My structs look like
// Hit contains the data for a hit.
type Hit struct {
Key string `json:"key"`
Data []Field `json:"data"`
}
// Hits stores a list of hits.
type Hits [][]Hit
I want to provide a consistent order for the contents of my Hits type.
I have tried:
func (c Hits) Len() int { return len(c) }
func (c Hits) Swap(i, j int) { c[i], c[j] = c[j], c[i] }
func (c Hits) Less(i, j int) bool { return strings.Compare(c[i][0].Key, c[j][0].Key) == -1 }
But the results still seem to come back in random order.
I was thinking of possibly hashing each item in the slice, but thought there might be an easier option.
The order of iteration over a map is indeterminate. Go does not specify it, and the runtime deliberately randomizes it: ranging over the same map twice, or over two maps built with exactly the same insertions, can produce different orders.
Assuming that your map is a map[string]Hit, to iterate it over in a determinate order, I would enumerate the set of keys in the map, sort that, and use that sorted set to enumerate the map.
Something like this:
package main
import (
"fmt"
"sort"
)
type Hit struct {
Key string `json:"key"`
Data []Field `json:"data"`
}
type Field struct {
Value string `json:"value"`
}
func main() {
var mapOfHits = getSomeHits()
var sortedHits = sortHits(mapOfHits)
for _, h := range sortedHits {
fmt.Println(h.Key)
}
}
func getSomeHits() map[string]Hit {
return make(map[string]Hit, 0)
}
func sortHits(m map[string]Hit) []Hit {
keys := make([]string, 0, len(m))
sorted := make([]Hit, 0, len(m))
for k := range m {
keys = append(keys, k)
}
sort.Strings(keys)
for _, k := range keys {
sorted = append(sorted, m[k])
}
return sorted
}
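If the data really has to stay in the 2D Hits shape from the question, one possible sketch is to sort each inner slice first and then sort the groups by the key of their first element. The helper name sortHits is made up here, Hit is reduced to its Key field to keep the example short, and placing empty groups first is an arbitrary choice.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is reduced to the field used for ordering in this sketch.
type Hit struct {
	Key string
}

// Hits stores a list of groups of hits.
type Hits [][]Hit

// sortHits (hypothetical helper) orders every inner slice by Key, then orders
// the groups by the Key of their first element. Empty groups sort first.
func sortHits(hs Hits) {
	for _, group := range hs {
		sort.Slice(group, func(i, j int) bool { return group[i].Key < group[j].Key })
	}
	sort.SliceStable(hs, func(i, j int) bool {
		if len(hs[i]) == 0 || len(hs[j]) == 0 {
			return len(hs[i]) < len(hs[j])
		}
		return hs[i][0].Key < hs[j][0].Key
	})
}

func main() {
	hs := Hits{
		{{Key: "d"}, {Key: "c"}},
		{{Key: "b"}, {Key: "a"}},
	}
	sortHits(hs)
	fmt.Println(hs) // [[{a} {b}] [{c} {d}]]
}
```

Sorting the inner slices first matters: if their order still comes from a map, c[i][0] changes between runs and so does the outer order. Two groups that share the same first key would still keep their incoming relative order, so a further tie-break would be needed in that case.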

++ operator for map key loop in go

I am following the Go tutorial here https://tour.golang.org/moretypes/23 and have modified the exercise a little bit to try to dig deeper.
package main
import (
"fmt"
"strings"
)
func WordCount(s string) map[string]int {
m := make(map[string]int)
x := strings.Fields(s)
for _, e := range x {
m[e]++
}
return m
}
func main() {
phrase := "The quick brown fox"
fmt.Println(WordCount(phrase), "length:", len(WordCount(phrase)))
}
What doesn't make sense to me is how the ++ operator works in this context when adding new elements to the map.
Definition of ++ operator: Increment operator. It increases the integer value by one.
In this context, is the ++ operator increasing the integer value of the LENGTH of the map and then adding the element e at the new map length?
Reading a key that is not in a map[string]int yields the zero value for int, which is 0. So, when you iterate through x and call m[e]++, the expanded version would be
m[e] = m[e] + 1
In other words, the first time a field is seen:
m[e] = 0 + 1
Of course, if a field repeats, it will already be in the map (with some value > 0).
When you check the length of the map after the loop, it gives the number of unique fields in the string.
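A small experiment makes this visible. The sketch below is only an illustration; the helper name tally and the sample words are made up.

```go
package main

import "fmt"

// tally (hypothetical helper) counts how often each word occurs, relying on
// the zero value of int for keys that are not in the map yet.
func tally(words []string) map[string]int {
	m := make(map[string]int)
	for _, w := range words {
		m[w]++ // a missing key reads as 0, so the first hit stores 1
	}
	return m
}

func main() {
	m := make(map[string]int)
	fmt.Println(m["fox"], len(m)) // 0 0: reading a missing key does not insert it

	m["fox"]++
	fmt.Println(m["fox"], len(m)) // 1 1: the assignment created the key

	m["fox"]++
	fmt.Println(m["fox"], len(m)) // 2 1: existing key, length unchanged

	_, ok := m["dog"]
	fmt.Println(ok) // false: the comma-ok form tells "absent" from "zero"

	fmt.Println(tally([]string{"the", "fox", "the"})) // map[fox:1 the:2]
}
```

So ++ never touches the length directly; the length grows only because the assignment half of m[e]++ stores a key that was not there before.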

How to find ip in range in very large struct

I have a struct like the one below, with about 100k entries.
I would like to loop over it and check whether an IP address is in range.
My current code:
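type Users struct {
Id string
Descr string
// net.IP (a []byte) rather than string, so that bytes.Compare below type-checks;
// assumes both bounds were parsed with net.ParseIP
IpStart net.IP
IpEnd net.IP
}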
var users []*Users
func LookUpIP(IpAddress string) (string, string) {
iptocheck := net.ParseIP(IpAddress)
for _, elem := range users {
if bytes.Compare(iptocheck, elem.IpStart) >= 0 && bytes.Compare(iptocheck, elem.IpEnd) <= 0 {
fmt.Printf("%v is between %v and %v\n", IpAddress, elem.IpStart, elem.IpEnd)
return elem.Id, elem.Descr
}
}
return "0", "null"
}
The above works fine with about 40k entries, but beyond that it gets slow. Is there a faster way to find out whether an IP address is in a range inside my struct?
Update: Now only parsing IP once and storing it as number in struct
There are two simple steps I see:
1. Do the parsing once and store the IP address as a single number.
2. Order the ranges by the start of the range and use binary search.
To complete @Grzegorz Żur's suggestion to use binary search to reduce the search time, here is a binary search implementation in Go.
But first what is binary search? A binary search divides a range of values into halves, and continues to narrow down the field of search until the unknown value is found. It is the classic example of a "divide and conquer" algorithm.
The algorithm returns the index of some element that equals the given value (if there are multiple such elements, it returns some arbitrary one). It is also possible, when the element is not found, to return the "insertion point" for it (the index that the value would have if it were inserted into the array).
The recursive method
func binarySearch(a []float64, value float64, low int, high int) int {
if high < low {
return -1
}
mid := (low + high) / 2 // calculate the mean of two values
if a[mid] > value {
return binarySearch(a, value, low, mid-1)
} else if a[mid] < value {
return binarySearch(a, value, mid+1, high)
}
return mid
}
The iterative method
func binarySearch(a []float64, value float64) int {
low := 0
high := len(a) - 1
for low <= high {
mid := (low + high) / 2
if a[mid] > value {
high = mid - 1
} else if a[mid] < value {
low = mid + 1
} else {
return mid
}
}
return -1
}
But if you look at the sort package, you will see that binary search is already implemented there (sort.Search, plus helpers such as sort.SearchFloat64s for sorted slices), so using it is probably the best option.
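Applied to the IP question, a lookup built on sort.Search could look like the sketch below. The ipRange type and the helpers toUint32 and lookup are made up for illustration; the code assumes IPv4 addresses only and ranges that are sorted by start and do not overlap.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
	"sort"
)

// ipRange is a hypothetical, pre-parsed form of one entry.
type ipRange struct {
	Start, End uint32 // inclusive bounds, IPv4 as big-endian integers
	ID         string
}

// toUint32 converts a dotted IPv4 string to a number; ok is false otherwise.
func toUint32(s string) (uint32, bool) {
	ip := net.ParseIP(s).To4()
	if ip == nil {
		return 0, false
	}
	return binary.BigEndian.Uint32(ip), true
}

// lookup expects ranges to be sorted by Start and non-overlapping.
func lookup(ranges []ipRange, addr string) (string, bool) {
	ip, ok := toUint32(addr)
	if !ok {
		return "", false
	}
	// Index of the first range starting after ip; the candidate is the one before it.
	i := sort.Search(len(ranges), func(i int) bool { return ranges[i].Start > ip })
	if i == 0 || ranges[i-1].End < ip {
		return "", false
	}
	return ranges[i-1].ID, true
}

func main() {
	mk := func(start, end, id string) ipRange {
		s, _ := toUint32(start)
		e, _ := toUint32(end)
		return ipRange{Start: s, End: e, ID: id}
	}
	ranges := []ipRange{
		mk("192.168.0.0", "192.168.0.255", "b"),
		mk("10.0.0.0", "10.0.0.255", "a"),
	}
	// Sort once up front; every lookup afterwards is O(log n).
	sort.Slice(ranges, func(i, j int) bool { return ranges[i].Start < ranges[j].Start })

	fmt.Println(lookup(ranges, "10.0.0.42")) // a true
	_, found := lookup(ranges, "172.16.0.1")
	fmt.Println(found) // false
}
```

With 100k entries this needs about 17 comparisons per lookup instead of a scan over the whole slice.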

Inverted Return from strings.Replace() Golang

I have a large dataset where I needed to do some string manipulation (I know strings are immutable). The Replace() function in the strings package does exactly what I need, except I need it to search in reverse.
Say I have this string: AA-BB-CC-DD-EE
Run this script:
package main
import (
"fmt"
"strings"
)
func main() {
fmt.Println(strings.Replace("AA-BB-CC-DD-EE", "-", "", 1))
}
It outputs: AABB-CC-DD-EE
What I need is: AA-BBCCDDEE, where the first instance of the search key is kept and the rest are removed.
Splitting the string, inserting the dash, and joining it back together works. But, I'm thinking there is a more performant way to achieve this.
String slices!
in := "AA-BB-CC-DD-EE"
afterDash := strings.Index(in, "-") + 1
fmt.Println(in[:afterDash] + strings.Replace(in[afterDash:], "-", "", -1))
(might require some tweaking to get the behavior you want in the case that the input has no dashes).
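One way to make the edge cases explicit is to wrap the slicing in a small function. The sketch below is only an illustration: the name keepFirst is made up, and it treats "no separator found" as "return the input unchanged". It also handles separators longer than one byte.

```go
package main

import (
	"fmt"
	"strings"
)

// keepFirst (hypothetical helper) removes every occurrence of sep from s
// except the first one. If sep does not occur, s is returned unchanged.
func keepFirst(s, sep string) string {
	i := strings.Index(s, sep)
	if i < 0 {
		return s // no separator: nothing to remove
	}
	after := i + len(sep) // first byte past the separator we keep
	return s[:after] + strings.ReplaceAll(s[after:], sep, "")
}

func main() {
	fmt.Println(keepFirst("AA-BB-CC-DD-EE", "-")) // AA-BBCCDDEE
	fmt.Println(keepFirst("AABBCC", "-"))         // AABBCC
	fmt.Println(keepFirst("a::b::c", "::"))       // a::bc
}
```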
This can be another solution
package main
import (
"strings"
"fmt"
)
func Reverse(s string) string {
n := len(s)
runes := make([]rune, n)
for _, rune := range s {
n--
runes[n] = rune
}
return string(runes[n:])
}
func main() {
S := "AA-BB-CC-DD-EE"
S = Reverse(strings.Replace(Reverse(S), "-", "", strings.Count(S, "-")-1))
fmt.Println(S)
}
Another solution:
package main
import (
"fmt"
"strings"
)
func main() {
S := strings.Replace("AA-BB-CC-DD-EE", "-", "*", 1)
S = strings.Replace(S, "-", "", -1)
fmt.Println(strings.Replace(S, "*", "-", 1))
}
I think you want to use strings.Map rather than rigging things with compositions of functions. It's basically meant for this scenario: character replacement with more complex requirements than Replace and cousins can handle. The definition:
Map returns a copy of the string s with all its characters modified according to the mapping function. If mapping returns a negative value, the character is dropped from the string with no replacement.
Your mapping function can be built with a fairly simple closure:
func makeReplaceFn(toReplace rune, skipCount int) func(rune) rune {
count := 0
return func(r rune) rune {
if r == toReplace && count < skipCount {
count++
} else if r == toReplace && count >= skipCount {
return -1
}
return r
}
}
From there, it's a very straightforward program:
strings.Map(makeReplaceFn('-', 1), "AA-BB-CC-DD-EE")
Run on the Playground, this produces the desired output:
AA-BBCCDDEE
Program exited.
I'm not sure whether this is faster or slower than other solutions without benchmarking, because on one hand it has to call a function for each rune in the string, while on the other hand it doesn't have to convert (and thus copy) between a []byte/[]rune and string between each function call (though the subslicing answer by hobbs is probably overall the best).
In addition, the method can be easily adapted to other scenarios (e.g. retaining every other dash), with the caveat that strings.Map can only do rune to rune mapping, and not rune to string mapping like strings.Replace does.
This was a fun question to answer. While the solutions offered work neatly, splitting and replacing, to say nothing of calling Replace three times, doesn't seem likely to be performant.
The answer? Don't reinvent the wheel: the Go standard library has already almost solved this problem with Replace(), so let's tweak it. I stumbled a bit over how the API of our new function should work, and finally settled on keeping the signature of strings.Replace with one minimal change:
func ReplaceAfter(s,old,new string,skip int) string
The variable skip replaces n to clarify what it does since the caller will specify how many instances of old to skip replacing. skip==0 is defined as replacing every instance and skip==-1 is defined as replacing no instances.
From here there were really only a few bits of the function that needed changing.
func ReplaceAfter(s, old, new string, skip int) string {
if old == new || skip == -1 { // changed
return s // avoid allocation
}
// Compute number of replacements.
m := strings.Count(s, old)
if m == 0 || m < skip { // changed
return s // avoid allocation
} // changed (removed else if)
// Apply replacements to buffer.
n := m - skip // changed, n means the same thing but is calculated
t := make([]byte, len(s)+n*(len(new)-len(old))) // longer buffer
w := 0
start := 0
for i := 0; i < m; i++ {
j := start
if len(old) == 0 {
if i > 0 {
_, wid := utf8.DecodeRuneInString(s[start:])
j += wid
}
} else {
j += strings.Index(s[start:], old)
}
if i >= skip { // changed, replace
w += copy(t[w:], s[start:j])
w += copy(t[w:], new)
} else { // changed, skip ahead
w += copy(t[w:], s[start:j+len(old)])
}
start = j + len(old)
}
w += copy(t[w:], s[start:])
return string(t[0:w])
}
Here's a playground link with a working demo. If you're interested, I also copied and adapted the relevant Test functions from go/src/strings/, to make sure that the function as written behaved itself predictably.

Resources