I am new to Go and am working on an assignment where I should write code that returns the word frequencies of a text. The words 'Hello', 'HELLO' and 'hello' should all be counted as 'hello', so I need to convert all strings to lower case.
I know that I should use strings.ToLower(), but I don't know where to include it in the code. Can someone please help me?
package main
import (
"fmt"
"io/ioutil"
"log"
"strings"
"time"
)
const DataFile = "loremipsum.txt"
// Return the word frequencies of the text argument.
func WordCount(text string) map[string]int {
fregs := make(map[string]int)
words := strings.Fields(text)
for _, word := range words {
fregs[word] += 1
}
return fregs
}
// Benchmark how long it takes to count word frequencies in text numRuns times.
//
// Return the total time elapsed.
func benchmark(text string, numRuns int) int64 {
start := time.Now()
for i := 0; i < numRuns; i++ {
WordCount(text)
}
runtimeMillis := time.Since(start).Nanoseconds() / 1e6
return runtimeMillis
}
// Print the results of a benchmark
func printResults(runtimeMillis int64, numRuns int) {
fmt.Printf("amount of runs: %d\n", numRuns)
fmt.Printf("total time: %d ms\n", runtimeMillis)
average := float64(runtimeMillis) / float64(numRuns)
fmt.Printf("average time/run: %.2f ms\n", average)
}
func main() {
// read in DataFile as a string called data
data, err := ioutil.ReadFile(DataFile)
if err != nil {
log.Fatal(err)
}
// Convert []byte to string and print to screen
text := string(data)
fmt.Println(text)
fmt.Printf("%#v",WordCount(string(data)))
numRuns := 100
runtimeMillis := benchmark(string(data), numRuns)
printResults(runtimeMillis, numRuns)
}
You should convert the words to lowercase when you use them as map keys:
for _, word := range words {
fregs[strings.ToLower(word)] += 1
}
I get a:822 and a.:110, but I want every a counted together. How do I change the code so that a and a. are treated as the same word? – hello123
You need to carefully define a word. For example, a string of consecutive letters and numbers converted to lowercase.
func WordCount(s string) map[string]int {
wordFunc := func(r rune) bool {
return !unicode.IsLetter(r) && !unicode.IsNumber(r)
}
counts := make(map[string]int)
for _, word := range strings.FieldsFunc(s, wordFunc) {
counts[strings.ToLower(word)]++
}
return counts
}
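To check the "a" versus "a." case from the comment, here is a minimal runnable sketch of the same idea. The sample sentence is made up for illustration and is not the asker's loremipsum.txt; the separator function is named isSeparator here only for readability.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// WordCount counts words, where a word is a run of letters and digits,
// compared case-insensitively.
func WordCount(s string) map[string]int {
	// isSeparator reports true for every rune that is not part of a word.
	isSeparator := func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	}
	counts := make(map[string]int)
	for _, word := range strings.FieldsFunc(s, isSeparator) {
		counts[strings.ToLower(word)]++
	}
	return counts
}

func main() {
	// Hypothetical sample text with mixed case and trailing punctuation.
	counts := WordCount("A cat saw a dog. Then a. HELLO, hello!")
	fmt.Println(counts["a"], counts["hello"], counts["a."]) // 3 2 0
}
```

Because punctuation is treated as a separator, "a." never becomes a key of its own.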
To remove all non-word characters, you could use a regular expression:
package main
import (
"bufio"
"fmt"
"log"
"regexp"
"strings"
)
func main() {
str1 := "This is some text! I want to count each word. Is it cool?"
re, err := regexp.Compile(`[^\w]`)
if err != nil {
log.Fatal(err)
}
str1 = re.ReplaceAllString(str1, " ")
scanner := bufio.NewScanner(strings.NewReader(str1))
scanner.Split(bufio.ScanWords)
for scanner.Scan() {
fmt.Println(strings.ToLower(scanner.Text()))
}
}
See strings.EqualFold, which reports whether two strings are equal under Unicode case folding.
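A minimal sketch of strings.EqualFold in use. The helper name sameWord is hypothetical. Note that EqualFold only compares two strings; it does not produce a canonical key, so for counting words in a map strings.ToLower is still the simpler tool.

```go
package main

import (
	"fmt"
	"strings"
)

// sameWord is a hypothetical helper: it reports whether a and b are
// equal when case is ignored.
func sameWord(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	fmt.Println(sameWord("Hello", "HELLO")) // true
	fmt.Println(sameWord("hello", "help"))  // false
}
```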
I'm trying to parse this string goats=1\r\nalligators=false\r\ntext=works.
contents := "goats=1\r\nalligators=false\r\ntext=works"
compile, err := regexp.Compile("([^#\\s=]+)=([a-zA-Z0-9.]+)")
if err != nil {
return
}
matchString := compile.FindAllStringSubmatch(contents, -1)
My output looks like [[goats=1 goats 1] [alligators=false alligators false] [text=works text works]]
What am I doing wrong in my expression that causes goats=1 to be included too? I only want [[goats 1]...]
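The expression itself is fine: FindAllStringSubmatch puts the whole match at index 0 of each result and the capture groups after it. A minimal sketch, assuming a hypothetical helper named parsePairs, that keeps only the groups:

```go
package main

import (
	"fmt"
	"regexp"
)

// pairRe is the pattern from the question, written as a raw string.
var pairRe = regexp.MustCompile(`([^#\s=]+)=([a-zA-Z0-9.]+)`)

// parsePairs is a hypothetical helper that returns only the captured
// key and value of each match.
func parsePairs(contents string) [][]string {
	var pairs [][]string
	for _, m := range pairRe.FindAllStringSubmatch(contents, -1) {
		// m[0] is the whole match, e.g. "goats=1"; m[1:] are the groups.
		pairs = append(pairs, m[1:])
	}
	return pairs
}

func main() {
	fmt.Println(parsePairs("goats=1\r\nalligators=false\r\ntext=works"))
	// [[goats 1] [alligators false] [text works]]
}
```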
For another approach, you can use the strings package instead:
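package main
import (
	"fmt"
	"strings"
)
// parse splits s into lines and each line into a key=value pair.
func parse(s string) map[string]string {
	m := make(map[string]string)
	for _, kv := range strings.Split(s, "\r\n") {
		// SplitN with a limit of 2 keeps any further "=" in the value.
		a := strings.SplitN(kv, "=", 2)
		if len(a) != 2 {
			continue // skip lines without "=" instead of panicking on a[1]
		}
		m[a[0]] = a[1]
	}
	return m
}
func main() {
	m := parse("goats=1\r\nalligators=false\r\ntext=works")
	fmt.Println(m) // map[alligators:false goats:1 text:works]
}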
https://golang.org/pkg/strings
Suppose I have a string containing Unicode characters. For example:
s := "foo 日本 foo!"
I'm trying to find the last occurrence of foo in the string:
index := strings.LastIndex(s, "foo")
The expected result here would be 7 but this will return 11 as the index due to the Unicode in the string.
Is there a way to handle this using standard library functions?
You're encountering the difference between runes and bytes in Go. Strings are composed of bytes, not runes, and strings.LastIndex returns a byte offset. If you haven't learned about this, you should read https://blog.golang.org/strings.
Here's my version of a quick function to calculate the number of runes preceding the last match of a substring in a string. The basic approach is to find the byte index, then iterate through the string's runes, counting them until that number of bytes has been consumed.
I'm not aware of a standard library method that will do this directly.
package main
import (
"fmt"
"strings"
)
func LastRuneIndex(s, substr string) (int, error) {
byteIndex := strings.LastIndex(s, substr)
if byteIndex < 0 {
return byteIndex, nil
}
reader := strings.NewReader(s)
count := 0
for byteIndex > 0 {
_, bytes, err := reader.ReadRune()
if err != nil {
return 0, err
}
byteIndex = byteIndex - bytes
count += 1
}
return count, nil
}
func main() {
s := "foo 日本 foo!"
count, err := LastRuneIndex(s, "foo")
fmt.Println(count, err)
// outputs:
// 7 <nil>
}
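A shorter variant of the same idea, offered as a sketch rather than a definitive answer: take the byte index from strings.LastIndex and count the runes in the prefix with utf8.RuneCountInString. The function name lastRuneIndex is hypothetical, and the sketch assumes both strings are valid UTF-8 so that the byte index falls on a rune boundary.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// lastRuneIndex (hypothetical name) returns the rune offset of the last
// occurrence of substr in s, or -1 if substr is not present.
func lastRuneIndex(s, substr string) int {
	byteIndex := strings.LastIndex(s, substr)
	if byteIndex < 0 {
		return -1
	}
	// Count the runes in the bytes that precede the match.
	return utf8.RuneCountInString(s[:byteIndex])
}

func main() {
	fmt.Println(lastRuneIndex("foo 日本 foo!", "foo")) // 7
	fmt.Println(lastRuneIndex("foo 日本 foo!", "bar")) // -1
}
```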
This gets pretty close:
package main
import (
"golang.org/x/text/language"
"golang.org/x/text/search"
)
func main() {
m := search.New(language.English)
start, end := m.IndexString("foo 日本 foo!", "foo")
println(start == 0, end == 3)
}
but it's searching forward. I tried this:
m.IndexString("foo 日本 foo!", "foo", search.Backwards)
but I get this result:
panic: TODO: implement
https://pkg.go.dev/golang.org/x/text/search
https://github.com/golang/text/blob/v0.3.6/search/search.go#L222-L223
I'm reading Donovan's "The Go Programming Language" book and trying to implement an exercise which prints duplicate lines from several files and the files in which they occur:
package main
import (
"fmt"
"io/ioutil"
"os"
"strings"
mapset "github.com/deckarep/golang-set"
)
func main() {
counts := make(map[string]int)
occurrences := make(map[string]mapset.Set)
for _, filename := range os.Args[1:] {
data, err := ioutil.ReadFile(filename)
if err != nil {
fmt.Fprintf(os.Stderr, "dup3: %v\n", err)
continue
}
for _, line := range strings.Split(string(data), "\n") {
counts[line]++
occurrences[line].Add(filename)
}
}
for line, n := range counts {
if n > 1 {
fmt.Printf("%d\t%s\t%s\n", n, line, strings.Join(occurrences[line], ", "))
}
}
}
To accomplish the exercise, I've used the https://godoc.org/github.com/deckarep/golang-set package. However, I'm not sure how to print out the elements of the set joined by a ", ". With this code, I get the following errors:
./hello.go:23:30: first argument to append must be slice; have interface { Add(interface {}) bool; Cardinality() int; CartesianProduct(mapset.Set) mapset.Set; Clear(); Clone() mapset.Set; Contains(...interface {}) bool; Difference(mapset.Set) mapset.Set; Each(func(interface {}) bool); Equal(mapset.Set) bool; Intersect(mapset.Set) mapset.Set; IsProperSubset(mapset.Set) bool; IsProperSuperset(mapset.Set) bool; IsSubset(mapset.Set) bool; IsSuperset(mapset.Set) bool; Iter() <-chan interface {}; Iterator() *mapset.Iterator; Pop() interface {}; PowerSet() mapset.Set; Remove(interface {}); String() string; SymmetricDifference(mapset.Set) mapset.Set; ToSlice() []interface {}; Union(mapset.Set) mapset.Set }
./hello.go:28:64: cannot use occurrences[line] (type mapset.Set) as type []string in argument to strings.Join
I wasn't able to easily find out how to convert the Set to a slice though. Any idea how I might accomplish this?
The XY problem is asking about your attempted solution rather than your actual problem.
Exercise 1.4 in The Go Programming Language by Alan A. A. Donovan and Brian W. Kernighan is designed to be solved with Go maps.
For example,
// Modify dup3 to print the names of all files in which each duplicated line occurs.
package main
import (
"fmt"
"io/ioutil"
"os"
"strings"
)
func main() {
// counts = [line][file]count
counts := make(map[string]map[string]int)
for _, filename := range os.Args[1:] {
data, err := ioutil.ReadFile(filename)
if err != nil {
fmt.Fprintf(os.Stderr, "Exercise 1.4: %v\n", err)
continue
}
for _, line := range strings.Split(string(data), "\n") {
files := counts[line]
if files == nil {
files = make(map[string]int)
counts[line] = files
}
files[filename]++
}
}
for line, files := range counts {
n := 0
for _, count := range files {
n += count
}
if n > 1 {
fmt.Printf("%d\t%s\n", n, line)
for name := range files {
fmt.Printf("%s\n", name)
}
}
}
}
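If the file names should appear on one line joined by ", ", as the question asked, the keys of the inner map can be collected into a slice first. A minimal sketch using only the standard library; the helper name joinKeys and the file names are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// joinKeys is a hypothetical helper: it returns the keys of set,
// sorted and joined by sep.
func joinKeys(set map[string]int, sep string) string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	// Map iteration order is unspecified, so sort for stable output.
	sort.Strings(keys)
	return strings.Join(keys, sep)
}

func main() {
	// Illustrative per-file counts for one duplicated line.
	files := map[string]int{"b.txt": 2, "a.txt": 1}
	fmt.Println(joinKeys(files, ", ")) // a.txt, b.txt
}
```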
I'm sure there is a better way to do this, and I understand it's simple, but I am new to Go, so bear with me. I am trying to set the fields of a struct (playersObject) from two functions (setCalculations and Calculations). More specifically, I am passing the values of two slices (playerData and playerData2) from main to those functions, performing calculations in those functions, and want to return the values so that they can be set within the struct.
package main
import (
"fmt"
"os"
"log"
"strings"
"bufio"
"strconv"
)
type playersObject struct {
firstname, lastname string
batting_average, slugging_percentage, OBP, teamaverage float64
}
func strToFloat(playerData []string, playerData2 []float64) []float64 {
for _, i := range playerData[2:] {
j, err := strconv.ParseFloat(i, 64)
if err != nil {
panic(err)
}
playerData2 = append(playerData2, j)
}
return playerData2
}
func (player *playersObject) setCalculations (playerData []string, playerData2 []float64) {
player.firstname = playerData[1]
player.lastname = playerData[0]
player.batting_average = (playerData2[2] + playerData2[3] + playerData2[4] + playerData2[5]) / (playerData2[1])
player.slugging_percentage = ((playerData2[2]) + (playerData2[3]*2) + (playerData2[4]*3) + (playerData2[5]*4) )/(playerData2[1])
player.OBP = (( playerData2[2] + playerData2[3] + playerData2[4] + playerData2[5] +playerData2[6] +playerData2[7])/ (playerData2[0]))
}
func (player *playersObject) Calculations () (string, string, float64, float64, float64, ) {
return player.firstname, player.lastname, player.batting_average, player.slugging_percentage, player.OBP
}
func main() {
reader := bufio.NewReader(os.Stdin)
fmt.Print("Enter file name: ")
fileName, err := reader.ReadString('\n')
if err != nil {
log.Fatalf("failed opening file: %s", err)
}
fileName = strings.TrimSuffix(fileName, "\n")
//fmt.Printf("%q\n", fileName)
file, err := os.Open(fileName)
scanner := bufio.NewScanner(file)
scanner.Split(bufio.ScanLines)
var fileOfPlayers []string
for scanner.Scan() {
fileOfPlayers = append(fileOfPlayers, scanner.Text())
}
file.Close()
// var total_Average_sum float64 = 0
var countofplayers float64 = 0
//var total_average float64 = 0
for _, player := range fileOfPlayers {
countofplayers ++
playerData := strings.Split(player, " ")
var playerData2 = []float64{}
playerData2 = strToFloat(playerData, playerData2)
player := playersObject{}
player.setCalculations(playerData, playerData2)
calcs := player.Calculations()
fmt.Println(firstname, lastname, batting_average, slugging_percentage, OBP)
}
}
I receive the errors multiple-value player.Calculations() in single-value context and undefined: firstname, lastname, batting_average, slugging_percentage, OBP
I know this is very incorrect, but again I am new to Go and OOP. If this can be done in any simpler way I am open to it and appreciate all help and tips. Thank you
Here, the error is thrown because Calculations() returns multiple values but you are trying to assign it to a single variable.
You need to change the player.Calculations() method invocation from
calcs := player.Calculations()
to
firstname, lastname, batting_average, slugging_percentage, OBP := player.Calculations()
Having said that, I would recommend reading more about Go. The code should be rewritten to follow Go best practices.
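A small self-contained sketch of the rule behind the error: a method that returns several values needs one variable per value on the left-hand side. The type player, its fields, and the sample values are hypothetical and much smaller than the asker's playersObject.

```go
package main

import "fmt"

// player is a hypothetical, reduced version of the struct in the question.
type player struct {
	firstName, lastName string
	battingAverage      float64
}

// stats returns three values, so the caller must receive three values.
func (p *player) stats() (string, string, float64) {
	return p.firstName, p.lastName, p.battingAverage
}

func main() {
	p := player{firstName: "Ada", lastName: "Lovelace", battingAverage: 0.25}

	// One variable per returned value.
	first, last, avg := p.stats()
	fmt.Printf("%s %s %.3f\n", first, last, avg) // Ada Lovelace 0.250

	// Inside the same package the fields can also be read directly.
	fmt.Printf("%s %.3f\n", p.lastName, p.battingAverage) // Lovelace 0.250
}
```

Reading the fields directly, or returning the struct itself, is often simpler than a getter with five return values.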
I'm trying to write a Go script that takes in as many lines of comma-separated coordinates as the user wishes, splits each line and converts its coordinates to float64, stores each line as a slice, and then appends each slice to a slice of slices for later use.
Example inputs are:
1.1,2.2,3.3
3.14,0,5.16
Example outputs are:
[[1.1 2.2 3.3],[3.14 0 5.16]]
The equivalent in Python is
def get_input():
    print("Please enter comma separated coordinates:")
    lines = []
    while True:
        line = input()
        if line:
            line = [float(x) for x in line.replace(" ", "").split(",")]
            lines.append(line)
        else:
            break
    return lines
But what I wrote in Go seems way too long (pasted below), and I'm creating a lot of variables without the ability to change variable type as in Python. Since I literally just started writing Golang to learn it, I fear my script is long as I'm trying to convert Python thinking into Go. Therefore, I would like to ask for some advice as to how to write this script shorter and more concise in Go style? Thank you.
package main
import (
"fmt"
"os"
"bufio"
"strings"
"strconv"
)
func main() {
inputs := get_input()
fmt.Println(inputs)
}
func get_input() [][]float64 {
fmt.Println("Please enter comma separated coordinates: ")
var inputs [][]float64
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
if len(scanner.Text()) > 0 {
raw_input := strings.Replace(scanner.Text(), " ", "", -1)
input := strings.Split(raw_input, ",")
converted_input := str2float(input)
inputs = append(inputs, converted_input)
} else {
break
}
}
return inputs
}
func str2float(records []string) []float64 {
var float_slice []float64
for _, v := range records {
if s, err := strconv.ParseFloat(v, 64); err == nil {
float_slice = append(float_slice, s)
}
}
return float_slice
}
Using only string functions:
package main
import (
"bufio"
"fmt"
"os"
"strconv"
"strings"
)
func main() {
scanner := bufio.NewScanner(os.Stdin)
var result [][]float64
var txt string
for scanner.Scan() {
txt = scanner.Text()
if len(txt) > 0 {
values := strings.Split(txt, ",")
var row []float64
for _, v := range values {
fl, err := strconv.ParseFloat(strings.Trim(v, " "), 64)
if err != nil {
panic(fmt.Sprintf("Incorrect value for float64 '%v'", v))
}
row = append(row, fl)
}
result = append(result, row)
}
}
fmt.Printf("Result: %v\n", result)
}
Run:
$ printf "1.1,2.2,3.3
3.14,0,5.16
2,45,76.0, 45 , 69" | go run experiment2.go
Result: [[1.1 2.2 3.3] [3.14 0 5.16] [2 45 76 45 69]]
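The per-line conversion can also be pulled into a function that returns an error instead of panicking, which keeps the reading loop short. This is a sketch under that assumption; the helper name parseCoords is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCoords is a hypothetical helper that converts one comma-separated
// line into a slice of float64, ignoring spaces around each value.
func parseCoords(line string) ([]float64, error) {
	fields := strings.Split(line, ",")
	row := make([]float64, 0, len(fields))
	for _, f := range fields {
		v, err := strconv.ParseFloat(strings.TrimSpace(f), 64)
		if err != nil {
			return nil, fmt.Errorf("invalid coordinate %q: %w", f, err)
		}
		row = append(row, v)
	}
	return row, nil
}

func main() {
	row, err := parseCoords("2,45,76.0, 45 , 69")
	fmt.Println(row, err) // [2 45 76 45 69] <nil>

	_, err = parseCoords("1.1,abc")
	fmt.Println(err != nil) // true
}
```

The caller decides what to do with a bad line: skip it, report it, or stop reading.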
With given input, you can concatenate them to make a JSON string and then unmarshal (deserialize) that:
package main
import (
	"encoding/json"
	"fmt"
	"strings"
)
func main() {
	var lines []string
	for {
		var line string
		// Scanln reads one space-delimited token, so a line must not contain spaces.
		fmt.Scanln(&line)
		if line == "" {
			break
		}
		lines = append(lines, "["+line+"]")
	}
	all := "[" + strings.Join(lines, ",") + "]"
	inputs := [][]float64{}
	if err := json.Unmarshal([]byte(all), &inputs); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(inputs)
}