Using regular expressions in Go to identify a common pattern

I'm trying to parse this string goats=1\r\nalligators=false\r\ntext=works.
contents := "goats=1\r\nalligators=false\r\ntext=works"
compile, err := regexp.Compile("([^#\\s=]+)=([a-zA-Z0-9.]+)")
if err != nil {
return
}
matchString := compile.FindAllStringSubmatch(contents, -1)
my Output looks like [[goats=1 goats 1] [alligators=false alligators false] [text=works text works]]
What am I doing wrong in my expression that causes goats=1 to be included as well? I only want [[goats 1]...]
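Nothing is wrong with the expression itself: FindAllStringSubmatch always puts the full match at index 0 of each result, followed by the capture groups. A minimal sketch that drops that first element (keyValues is a hypothetical helper name, not part of the regexp package):

```go
package main

import (
	"fmt"
	"regexp"
)

// keyValues returns only the capture groups of each match. Element 0 of
// every submatch slice is the full match, so it is sliced off here.
func keyValues(contents string) [][]string {
	re := regexp.MustCompile(`([^#\s=]+)=([a-zA-Z0-9.]+)`)
	var pairs [][]string
	for _, m := range re.FindAllStringSubmatch(contents, -1) {
		pairs = append(pairs, m[1:])
	}
	return pairs
}

func main() {
	pairs := keyValues("goats=1\r\nalligators=false\r\ntext=works")
	fmt.Println(pairs) // [[goats 1] [alligators false] [text works]]
}
```

Using regexp.MustCompile is a design choice for a pattern that is a compile-time constant; keep regexp.Compile if the pattern comes from user input.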

For another approach, you can use the strings package instead:
package main
import (
"fmt"
"strings"
)
func parse(s string) map[string]string {
m := make(map[string]string)
for _, kv := range strings.Split(s, "\r\n") {
a := strings.SplitN(kv, "=", 2)
if len(a) != 2 {
continue // skip lines that are not key=value pairs
}
m[a[0]] = a[1]
}
return m
}
func main() {
m := parse("goats=1\r\nalligators=false\r\ntext=works")
fmt.Println(m) // map[alligators:false goats:1 text:works]
}
https://golang.org/pkg/strings

Related

How to parse a JSON string returned from scanner.Text() [duplicate]

Objects like the below can be parsed quite easily using the encoding/json package.
[
{"something":"foo"},
{"something-else":"bar"}
]
The trouble I am facing is when the server returns multiple objects like this:
{"something":"foo"}
{"something-else":"bar"}
This can't be parsed using the code below.
correct_format := strings.Replace(string(resp_body), "}{", "},{", -1)
json_output := "[" + correct_format + "]"
I am trying to parse Common Crawl data (see example).
How can I do this?
Assuming your input is really a series of valid JSON documents, use a json.Decoder to decode them:
package main
import (
"encoding/json"
"fmt"
"io"
"log"
"strings"
)
var input = `
{"foo": "bar"}
{"foo": "baz"}
`
type Doc struct {
Foo string
}
func main() {
dec := json.NewDecoder(strings.NewReader(input))
for {
var doc Doc
err := dec.Decode(&doc)
if err == io.EOF {
// all done
break
}
if err != nil {
log.Fatal(err)
}
fmt.Printf("%+v\n", doc)
}
}
Playground: https://play.golang.org/p/ANx8MoMC0yq
If your input really is what you've shown in the question, that's not JSON and you have to write your own parser.
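When the keys differ from one document to the next, as with "something" and "something-else" in the question, the same decoder loop can fill a generic map instead of a fixed struct. A minimal sketch, assuming each document is a JSON object; decodeAll is a hypothetical helper name:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// decodeAll reads a stream of concatenated JSON objects and returns
// each one as a generic map. It stops at the first malformed document.
func decodeAll(input string) ([]map[string]interface{}, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var docs []map[string]interface{}
	for {
		var doc map[string]interface{}
		err := dec.Decode(&doc)
		if err == io.EOF {
			return docs, nil // input exhausted
		}
		if err != nil {
			return nil, err
		}
		docs = append(docs, doc)
	}
}

func main() {
	docs, err := decodeAll("{\"something\":\"foo\"}\n{\"something-else\":\"bar\"}")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, d := range docs {
		fmt.Println(d)
	}
}
```

The decoder does not need a separator between documents, so no string replacement is required before parsing.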
Seems like each line is its own json object.
You may get away with the following code which will structure this output into correct json:
package main
import (
"fmt"
"strings"
)
func main() {
base := `{"trolo":"lolo"}
{"trolo2":"lolo2"}`
delimited := strings.Replace(base, "\n", ",", -1)
final := "[" + delimited + "]"
fmt.Println(final)
}
You should now be able to use the encoding/json package on final.
Another option is to parse the input line by line and add each parsed object to a collection (i.e. a slice). Go provides a line scanner for this:
yourCollection := []yourObject{}
scanner := bufio.NewScanner(YOUR_SOURCE)
for scanner.Scan() {
obj, err := PARSE_JSON_INTO_yourObject(scanner.Text())
if err != nil {
// something
}
yourCollection = append(yourCollection, obj)
}
if err := scanner.Err(); err != nil {
fmt.Fprintln(os.Stderr, "reading standard input:", err)
}
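The placeholders above can be filled in roughly as follows. This is only a sketch: the record type and its trolo field are hypothetical stand-ins for your own object, and the input is read from a string rather than a real source.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// record is a hypothetical target type; replace it with your own struct.
type record struct {
	Trolo string `json:"trolo"`
}

// parseLines unmarshals one JSON object per non-empty line.
func parseLines(input string) ([]record, error) {
	var records []record
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // tolerate blank lines between objects
		}
		var rec record
		if err := json.Unmarshal([]byte(line), &rec); err != nil {
			return nil, fmt.Errorf("parsing %q: %w", line, err)
		}
		records = append(records, rec)
	}
	return records, scanner.Err()
}

func main() {
	recs, err := parseLines("{\"trolo\":\"lolo\"}\n{\"trolo\":\"lolo2\"}\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", recs)
}
```

Note that bufio.Scanner rejects lines longer than its default 64 KiB token limit; for very long lines, raise the limit with scanner.Buffer or use a json.Decoder instead.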
You can read the NDJSON file line by line, parse each line, and then apply your logic to the result. In the sample below, I use a slice of JSON strings instead of reading from a file.
package main
import (
"encoding/json"
"fmt"
)
type NestedObject struct {
D string
E string
}
type OuterObject struct {
A string
B string
C []NestedObject
}
func main() {
myJsonString := []string{`{"A":"1","B":"2","C":[{"D":"100","E":"10"}]}`, `{"A":"11","B":"21","C":[{"D":"1001","E":"101"}]}`}
for index, each := range myJsonString {
fmt.Printf("Index value [%d] is [%v]\n", index, each)
var obj OuterObject
if err := json.Unmarshal([]byte(each), &obj); err != nil {
fmt.Println("unmarshal error:", err)
continue
}
fmt.Printf("a: %v, b: %v, c: %v", obj.A, obj.B, obj.C)
fmt.Println()
}
}
Output:
Index value [0] is [{"A":"1","B":"2","C":[{"D":"100","E":"10"}]}]
a: 1, b: 2, c: [{100 10}]
Index value [1] is [{"A":"11","B":"21","C":[{"D":"1001","E":"101"}]}]
a: 11, b: 21, c: [{1001 101}]

How to make a slice from a mapset.Set?

I'm reading Donovan's "The Go Programming Language" book and trying to implement an exercise which prints duplicate lines from several files and the files in which they occur:
package main
import (
"fmt"
"io/ioutil"
"os"
"strings"
mapset "github.com/deckarep/golang-set"
)
func main() {
counts := make(map[string]int)
occurrences := make(map[string]mapset.Set)
for _, filename := range os.Args[1:] {
data, err := ioutil.ReadFile(filename)
if err != nil {
fmt.Fprintf(os.Stderr, "dup3: %v\n", err)
continue
}
for _, line := range strings.Split(string(data), "\n") {
counts[line]++
occurrences[line].Add(filename)
}
}
for line, n := range counts {
if n > 1 {
fmt.Printf("%d\t%s\t%s\n", n, line, strings.Join(occurrences[line], ", "))
}
}
}
To accomplish the exercise, I've used the https://godoc.org/github.com/deckarep/golang-set package. However, I'm not sure how to print out the elements of the set joined by a ", ". With this code, I get a
./hello.go:23:30: first argument to append must be slice; have interface { Add(interface {}) bool; Cardinality() int; CartesianProduct(mapset.Set) mapset.Set; Clear(); Clone() mapset.Set; Contains(...interface {}) bool; Difference(mapset.Set) mapset.Set; Each(func(interface {}) bool); Equal(mapset.Set) bool; Intersect(mapset.Set) mapset.Set; IsProperSubset(mapset.Set) bool; IsProperSuperset(mapset.Set) bool; IsSubset(mapset.Set) bool; IsSuperset(mapset.Set) bool; Iter() <-chan interface {}; Iterator() *mapset.Iterator; Pop() interface {}; PowerSet() mapset.Set; Remove(interface {}); String() string; SymmetricDifference(mapset.Set) mapset.Set; ToSlice() []interface {}; Union(mapset.Set) mapset.Set }
./hello.go:28:64: cannot use occurrences[line] (type mapset.Set) as type []string in argument to strings.Join
I wasn't able to easily find out how to convert the Set to a slice though. Any idea how I might accomplish this?
This looks like an XY problem: you are asking about your attempted solution rather than your actual problem.
Exercise 1.4 in The Go Programming Language by Alan A. A. Donovan and Brian W. Kernighan is designed to be solved with Go maps.
For example,
// Modify dup3 to print the names of all files in which each duplicated line occurs.
package main
import (
"fmt"
"io/ioutil"
"os"
"strings"
)
func main() {
// counts = [line][file]count
counts := make(map[string]map[string]int)
for _, filename := range os.Args[1:] {
data, err := ioutil.ReadFile(filename)
if err != nil {
fmt.Fprintf(os.Stderr, "Exercise 1.4: %v\n", err)
continue
}
for _, line := range strings.Split(string(data), "\n") {
files := counts[line]
if files == nil {
files = make(map[string]int)
counts[line] = files
}
files[filename]++
}
}
for line, files := range counts {
n := 0
for _, count := range files {
n += count
}
if n > 1 {
fmt.Printf("%d\t%s\n", n, line)
for name := range files {
fmt.Printf("%s\n", name)
}
}
}
}
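To answer the literal question as well: the compiler error quoted in the question shows that the library's ToSlice() returns []interface {}, so each element would still need a type assertion to string before strings.Join could accept it. A sketch that avoids the dependency by using a plain map as the set; joinSet is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// joinSet copies the keys of a map-based set into a slice, sorts them
// for stable output, and joins them with ", ".
func joinSet(set map[string]struct{}) string {
	names := make([]string, 0, len(set))
	for name := range set {
		names = append(names, name)
	}
	sort.Strings(names)
	return strings.Join(names, ", ")
}

func main() {
	occurrences := make(map[string]map[string]struct{})
	add := func(line, file string) {
		if occurrences[line] == nil {
			// The inner set must be created before the first insert.
			occurrences[line] = make(map[string]struct{})
		}
		occurrences[line][file] = struct{}{}
	}
	add("hello", "b.txt")
	add("hello", "a.txt")
	add("hello", "a.txt")
	fmt.Println(joinSet(occurrences["hello"])) // a.txt, b.txt
}
```

Sorting is optional, but map iteration order is random in Go, so without it the output order changes between runs.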

Convert slice of string input from console to slice of numbers

I'm trying to write a Go script that takes in as many lines of comma-separated coordinates as the user wishes, split and convert the string of coordinates to float64, store each line as a slice, and then append each slice in a slice of slices for later usage.
Example inputs are:
1.1,2.2,3.3
3.14,0,5.16
Example outputs are:
[[1.1 2.2 3.3],[3.14 0 5.16]]
The equivalent in Python is
def get_input():
    print("Please enter comma separated coordinates:")
    lines = []
    while True:
        line = input()
        if line:
            line = [float(x) for x in line.replace(" ", "").split(",")]
            lines.append(line)
        else:
            break
    return lines
But what I wrote in Go (pasted below) seems far too long, and I'm creating many variables because, unlike in Python, a variable's type can't change. Since I have only just started learning Go, I suspect the script is long because I'm translating Python thinking into Go. How can I make it shorter and more idiomatic? Thank you.
package main
import (
"fmt"
"os"
"bufio"
"strings"
"strconv"
)
func main() {
inputs := get_input()
fmt.Println(inputs)
}
func get_input() [][]float64 {
fmt.Println("Please enter comma separated coordinates: ")
var inputs [][]float64
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
if len(scanner.Text()) > 0 {
raw_input := strings.Replace(scanner.Text(), " ", "", -1)
input := strings.Split(raw_input, ",")
converted_input := str2float(input)
inputs = append(inputs, converted_input)
} else {
break
}
}
return inputs
}
func str2float(records []string) []float64 {
var float_slice []float64
for _, v := range records {
if s, err := strconv.ParseFloat(v, 64); err == nil {
float_slice = append(float_slice, s)
}
}
return float_slice
}
Using only string functions:
package main
import (
"bufio"
"fmt"
"os"
"strconv"
"strings"
)
func main() {
scanner := bufio.NewScanner(os.Stdin)
var result [][]float64
var txt string
for scanner.Scan() {
txt = scanner.Text()
if len(txt) > 0 {
values := strings.Split(txt, ",")
var row []float64
for _, v := range values {
fl, err := strconv.ParseFloat(strings.Trim(v, " "), 64)
if err != nil {
panic(fmt.Sprintf("Incorrect value for float64 '%v'", v))
}
row = append(row, fl)
}
result = append(result, row)
}
}
fmt.Printf("Result: %v\n", result)
}
Run:
$ printf "1.1,2.2,3.3
3.14,0,5.16
2,45,76.0, 45 , 69" | go run experiment2.go
Result: [[1.1 2.2 3.3] [3.14 0 5.16] [2 45 76 45 69]]
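If panicking on bad input is too harsh, the per-line conversion can be pulled into a function that returns an error. A minimal sketch; parseRow is a hypothetical helper name, and the sample line is hard-coded instead of read from stdin:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRow splits a comma-separated line and converts every field to
// float64, ignoring surrounding whitespace. It fails on the first field
// that is not a valid number.
func parseRow(line string) ([]float64, error) {
	fields := strings.Split(line, ",")
	row := make([]float64, 0, len(fields))
	for _, f := range fields {
		v, err := strconv.ParseFloat(strings.TrimSpace(f), 64)
		if err != nil {
			return nil, fmt.Errorf("invalid coordinate %q: %w", f, err)
		}
		row = append(row, v)
	}
	return row, nil
}

func main() {
	row, err := parseRow("2,45,76.0, 45 , 69")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(row) // [2 45 76 45 69]
}
```

The caller can then decide whether to skip the line, ask again, or abort.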
Given that input, you can concatenate the lines into a JSON string and then unmarshal (deserialize) it:
package main
import (
"encoding/json"
"fmt"
"strings"
)
func main() {
var lines []string
for {
var line string
fmt.Scanln(&line)
if line == "" {
break
}
lines = append(lines, "["+line+"]")
}
all := "[" + strings.Join(lines, ",") + "]"
inputs := [][]float64{}
if err := json.Unmarshal([]byte(all), &inputs); err != nil {
fmt.Println(err)
return
}
fmt.Println(inputs)
}

How to convert string from interface to []string in golang?

I'm parsing a JSON object which contains an array of strings :
var ii interface{}
js := "{\"aString\": [\"aaa_111\", \"bbb_222\"], \"whatever\":\"ccc\"}"
err := json.Unmarshal([]byte(js), &ii)
if err != nil {
log.Fatal(err)
}
data := ii.(map[string]interface{})
fmt.Println(data["aString"]) // outputs: ["aaa_111" "bbb_222"]
I tried to convert data["aString"] to []string to be able to loop over it, but it fails :
test := []string(data["aString"]).([]string)
fmt.Println(test) // panic -> interface conversion:
// interface is string, not []string
How can I convert data["aString"] ?
edit:
I didn't express myself properly. If I print data, I have such map :
map[aString:["BBB-222","AAA-111"] whatever:ccc]
I want to loop over aString (to manipulate each array entry). But I can't find out how, because aString is of type interface {}:
for i, v := range aString { // <-- fails
// ...
fmt.Println(i, v)
}
That's why I want to convert aString. I don't want to convert a string which looks like an array to an array.
I recommend you move away from this implementation in general. Your json may vary but you can easily use objects and avoid all this type unsafe nonsense.
Anyway, that conversion doesn't work because the types inside the slice are not string, they're also interface{}. You have to iterate the collection then do a type assertion on each item like so:
aInterface := data["aString"].([]interface{})
aString := make([]string, len(aInterface))
for i, v := range aInterface {
aString[i] = v.(string)
}
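Both assertions above panic if the JSON does not have the expected shape. A sketch of the same conversion using the two-value ("comma ok") form of the type assertion, so that unexpected data becomes an error; toStrings is a hypothetical helper name:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toStrings converts a decoded JSON array (an []interface{} hidden
// behind interface{}) into []string, reporting an error instead of
// panicking when the value or one of its elements has another type.
func toStrings(v interface{}) ([]string, error) {
	items, ok := v.([]interface{})
	if !ok {
		return nil, fmt.Errorf("expected a JSON array, got %T", v)
	}
	out := make([]string, len(items))
	for i, item := range items {
		s, ok := item.(string)
		if !ok {
			return nil, fmt.Errorf("element %d is %T, not string", i, item)
		}
		out[i] = s
	}
	return out, nil
}

func main() {
	var data map[string]interface{}
	js := `{"aString": ["aaa_111", "bbb_222"], "whatever":"ccc"}`
	if err := json.Unmarshal([]byte(js), &data); err != nil {
		fmt.Println(err)
		return
	}
	items, err := toStrings(data["aString"])
	if err != nil {
		fmt.Println(err)
		return
	}
	for i, s := range items {
		fmt.Println(i, s)
	}
}
```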
Is this what you need?
package main
import (
"fmt"
"encoding/json"
)
func main() {
js := "{\"aString\": [\"aaa_111\", \"bbb_222\"], \"whatever\":\"ccc\"}"
a := make(map[string]interface{})
json.Unmarshal([]byte(js), &a)
for _, v := range a["aString"].([]interface{}) {
str := v.(string)
fmt.Println(str)
}
}
For another approach, you can use a struct instead:
package main
import (
"encoding/json"
"fmt"
)
func main() {
s := []byte(`{"aString": ["aaa_111", "bbb_222"], "whatever":"ccc"}`)
var t struct {
Astring []string
Whatever string
}
json.Unmarshal(s, &t)
fmt.Printf("%+v\n", t) // {Astring:[aaa_111 bbb_222] Whatever:ccc}
}
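The struct above works because encoding/json falls back to a case-insensitive match between JSON keys and field names. A sketch that makes the mapping explicit with struct tags and checks the error; payload and parsePayload are hypothetical names:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload mirrors the JSON object from the question. The tags spell out
// which key feeds which field instead of relying on name matching.
type payload struct {
	AString  []string `json:"aString"`
	Whatever string   `json:"whatever"`
}

// parsePayload unmarshals the JSON text into a payload value.
func parsePayload(js string) (payload, error) {
	var p payload
	err := json.Unmarshal([]byte(js), &p)
	return p, err
}

func main() {
	p, err := parsePayload(`{"aString": ["aaa_111", "bbb_222"], "whatever":"ccc"}`)
	if err != nil {
		fmt.Println(err)
		return
	}
	for i, s := range p.AString {
		fmt.Println(i, s)
	}
}
```

With a typed field, the loop from the question's edit works directly and no type assertion is needed.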

Go - How to convert binary string as text to binary bytes?

I have a text dump file with strings like this one:
x\x9cK\xb42\xb5\xaa.\xb6\xb2\xb0R\xcaK-\x09J\xccKOU
I need to convert them to []byte.
Can someone please suggest how this can be done in Go?
The Python equivalent is decode('string_escape').
Here is one way of doing it. Note this isn't a complete decoder for the Python string_escape format, but it may be sufficient for the example you've given.
package main
import (
"fmt"
"log"
"regexp"
"strconv"
)
func main() {
b := []byte(`x\x9cK\xb42\xb5\xaa.\xb6\xb2\xb0R\xcaK-\x09J\xccKOU`)
re := regexp.MustCompile(`\\x([0-9a-fA-F]{2})`)
r := re.ReplaceAllFunc(b, func(in []byte) []byte {
i, err := strconv.ParseInt(string(in[2:]), 16, 64)
if err != nil {
log.Fatalf("Failed to convert hex: %s", err)
}
return []byte{byte(i)}
})
fmt.Println(r)
fmt.Println(string(r))
}
I did have the idea of using the json decoder, but unfortunately it doesn't understand the \xYY syntax.
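The standard library's strconv.Unquote does understand \xYY, because that escape is part of Go's own string literal syntax. A sketch that wraps the dump text in double quotes and unquotes it; decodeEscaped is a hypothetical helper name, and the approach assumes the input contains no unescaped double quotes or escapes that Go does not accept (such as \'), in which case Unquote returns an error:

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeEscaped treats s as the body of a double-quoted Go string
// literal and returns the bytes it denotes. \xYY escapes become single
// raw bytes.
func decodeEscaped(s string) ([]byte, error) {
	unquoted, err := strconv.Unquote(`"` + s + `"`)
	if err != nil {
		return nil, err
	}
	return []byte(unquoted), nil
}

func main() {
	out, err := decodeEscaped(`x\x9cK\xb42\xb5\xaa.\xb6\xb2\xb0R\xcaK-\x09J\xccKOU`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("% x\n", out)
}
```

This is shorter than a regular expression, but it accepts Go escapes rather than Python's, so it is not a full string_escape decoder either.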
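Here's how you might approach writing a small parser (useful if you need to support other escape sequences in the future):
package main

import (
	"encoding/hex"
	"fmt"
)

// decode converts \xYY escapes to raw bytes. Any other escaped
// character, including a second backslash, is copied through literally.
func decode(bs string) ([]byte, error) {
	in := []byte(bs)
	res := make([]byte, 0, len(in))
	esc := false
	for i := 0; i < len(in); i++ {
		switch {
		case esc:
			esc = false
			if in[i] != 'x' {
				res = append(res, in[i])
				continue
			}
			// A \x escape needs two hex digits after the 'x'.
			if i+3 > len(in) {
				return nil, fmt.Errorf("truncated \\x escape at offset %d", i-1)
			}
			b, err := hex.DecodeString(string(in[i+1 : i+3]))
			if err != nil {
				return nil, err
			}
			res = append(res, b...)
			i += 2
		case in[i] == '\\':
			esc = true
		default:
			res = append(res, in[i])
		}
	}
	return res, nil
}

func main() {
	out, err := decode(`x\x9cK\xb42\xb5\xaa.\xb6\xb2\xb0R\xcaK-\x09J\xccKOU`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%q\n", out)
}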
