Apache Beam Go SDK: how to convert PCollection<string> to PCollection<KV<string, string>>? - go

I'm using the Apache Beam Go SDK and having a hard time getting a PCollection in the correct format for grouping/combining by key.
I have multiple records per key in a PCollection of strings that look like this:
Bob, cat
Bob, dog
Carla, cat
Carla, bunny
Doug, horse
I want to use GroupByKey and CombinePerKey so I can aggregate each person's pets like this:
Bob, [cat, dog]
Carla, [cat, bunny]
Doug, [horse]
How do I convert a PCollection<string> to PCollection<KV<string, string>>?
They mention something similar here, but the code to aggregate the string values is not included.
I can use a ParDo to get the string key and string value as shown below, but I can't figure out how to convert to the KV<string, string> or CoGBK<string, string> format required as input to GroupByKey.
pcolOut := beam.ParDo(s, func(line string) (string, string) {
cleanString := strings.TrimSpace(line)
openingChar := ","
iStart := strings.Index(cleanString, openingChar)
key := cleanString[0:iStart]
value := cleanString[iStart+1:]
// How to convert to PCollection<KV<string, string>> before returning?
return key, value
}, pcolIn)
groupedKV := beam.GroupByKey(s, pcolOut)
It fails with the following error. Any suggestions?
panic: inserting ParDo in scope root
creating new DoFn in scope root
binding fn main.main.func2
binding params [{Value string} {Value string}] to input CoGBK<string,string>
values of CoGBK<string,string> cannot bind to {Value string}

To map to KVs in the Java SDK, apply MapElements: use into() to set the KV types and, in the via() logic, create a new KV.of(myKey, myValue). For example, to get a KV<String, String>, use something like this:
PCollection<KV<String, String>> kvPairs = linkpages.apply(MapElements.into(
TypeDescriptors.kvs(
TypeDescriptors.strings(),
TypeDescriptors.strings()))
.via(
linkpage -> KV.of(dataFile, linkpage)));

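The ParDo that produces the key and value looks fine. The problem may be the parameter types of the ParDo that comes after GroupByKey: the grouped values arrive as an iterator, func(*string) bool, not as a plain string.
Try this code: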
pcolIn := beam.CreateList(s, []string{"Bob, cat",
"Bob, dog",
"Carla, cat",
"Carla, bunny",
"Doug, horse",
})
pcolOut := beam.ParDo(s, func(line string) (string, string) {
cleanString := strings.TrimSpace(line)
openingChar := ","
iStart := strings.Index(cleanString, openingChar)
key := cleanString[0:iStart]
value := cleanString[iStart+1:]
// Returning two values from the DoFn already yields a PCollection<KV<string, string>>.
return key, value
}, pcolIn)
groupedKV := beam.GroupByKey(s, pcolOut)
beam.ParDo0(s, func(key string, iter func(*string) bool) {
vals := []string{}
val := ""
for iter(&val) {
vals = append(vals, strings.TrimSpace(val))
}
fmt.Println(key, vals)
}, groupedKV)
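The parsing and grouping logic can be checked outside of Beam with a small plain-Go sketch. This is only an illustration under assumptions: parseLine and groupPets are hypothetical helper names, and the map stands in for what GroupByKey does inside the pipeline. Unlike the snippet above, it also reports lines without a comma instead of slicing with an index of -1.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLine splits "Bob, cat" into ("Bob", "cat"). The boolean is false
// when the line contains no comma, so callers can skip malformed input.
func parseLine(line string) (key, value string, ok bool) {
	key, value, ok = strings.Cut(strings.TrimSpace(line), ",")
	if !ok {
		return "", "", false
	}
	return strings.TrimSpace(key), strings.TrimSpace(value), true
}

// groupPets collects the values for each key, keeping input order within a key.
// It plays the role of GroupByKey in this stand-alone sketch.
func groupPets(lines []string) map[string][]string {
	grouped := make(map[string][]string)
	for _, line := range lines {
		if k, v, ok := parseLine(line); ok {
			grouped[k] = append(grouped[k], v)
		}
	}
	return grouped
}

func main() {
	lines := []string{"Bob, cat", "Bob, dog", "Carla, cat", "Carla, bunny", "Doug, horse"}
	grouped := groupPets(lines)
	fmt.Println(grouped["Bob"], grouped["Carla"], grouped["Doug"])
}
```

strings.Cut requires Go 1.18 or later; on older versions strings.Index with an explicit check for -1 does the same job.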

Related

Using "dynamic" key to extract value from map [duplicate]

I come from a JavaScript background and have just started with Go. I am still learning the terminology, and I am asking a new question because I cannot find the answer I need (probably because I don't know the right terms to search for).
I created a custom type and a slice of that type, and I want to write a function that retrieves the values of a specific key from every element and returns them as a slice (the brands in this example).
type Car struct {
brand string
units int
}
....
var cars []Car
var singleCar Car
//So i have a loop here and inside the for-loop, i create many single cars
singleCar = Car {
brand: "Mercedes",
units: 20,
}
//and i append the singleCar into cars
cars = append(cars, singleCar)
Now what I want to do is to create a function that I can retrieve all the brands, and I tried doing the following. I intend to have key as a dynamic value, so I can search by specific key, e.g. brand, model, capacity etc.
func getUniqueByKey(v []Car, key string) []string {
var combined []string
for i := range v {
combined = append(combined, v[i][key])
//this line returns error -
//invalid operation: cannot index v[i] (map index expression of type Car)compilerNonIndexableOperand
}
return combined
//This is supposed to return ["Mercedes", "Honda", "Ferrari"]
}
The above function is supposed to work when I call getUniqueByKey(cars, "brand"), where brand is the key in this example. But I do not know the correct syntax, so it returns an error.
It looks like you're trying to read a struct field with an index expression, which doesn't work in Go: a struct cannot be indexed by field name the way a map can. Without reflection, you'd need to write a function for each field. Here's an example with the brands:
func getUniqueBrands(v []Car) []string {
var combined []string
tempMap := make(map[string]bool)
for _, c := range v {
if _, p := tempMap[c.brand]; !p {
tempMap[c.brand] = true
combined = append(combined, c.brand)
}
}
return combined
}
Also, note the for loop being used to get the value of Car here. Go's range can be used to iterate over just indices or both indices and values. The index is discarded by assigning to _.
I would recommend re-using this code with an added switch-case block to get the result you want. If you need to return multiple types, use interface{} and type assertion.
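The switch-case suggestion could look roughly like the sketch below. It keeps the question's Car type and getUniqueByKey name; returning an error for an unknown key and formatting units as a string are assumptions made here so that one return type fits every field.

```go
package main

import "fmt"

type Car struct {
	brand string
	units int
}

// getUniqueByKey returns the distinct values of the named field in
// first-seen order. Non-string fields are formatted as strings.
func getUniqueByKey(cars []Car, key string) ([]string, error) {
	seen := make(map[string]bool)
	var out []string
	for _, c := range cars {
		var v string
		switch key {
		case "brand":
			v = c.brand
		case "units":
			v = fmt.Sprint(c.units)
		default:
			return nil, fmt.Errorf("unknown key %q", key)
		}
		if !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	return out, nil
}

func main() {
	cars := []Car{{"Mercedes", 20}, {"Honda", 5}, {"Mercedes", 7}}
	brands, err := getUniqueByKey(cars, "brand")
	fmt.Println(brands, err) // prints: [Mercedes Honda] <nil>
}
```

Each new field needs one more case, which is more typing than reflection but keeps the lookup checked at compile time.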
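Maybe you could marshal your struct into JSON data and then convert it to a map. Note that encoding/json only sees exported fields, so this works for the struct below but would not pick up the lowercase brand and units fields of Car unless they were exported. Example code: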
package main
import (
"encoding/json"
"fmt"
)
type RandomStruct struct {
FieldA string
FieldB int
FieldC string
RandomFieldD bool
RandomFieldE interface{}
}
func main() {
fieldName := "FieldC"
randomStruct := RandomStruct{
FieldA: "a",
FieldB: 5,
FieldC: "c",
RandomFieldD: false,
RandomFieldE: map[string]string{"innerFieldA": "??"},
}
randomStructs := make([]RandomStruct, 0)
randomStructs = append(randomStructs, randomStruct, randomStruct, randomStruct)
res := FetchRandomFieldAndConcat(randomStructs, fieldName)
fmt.Println(res)
}
func FetchRandomFieldAndConcat(randomStructs []RandomStruct, fieldName string) []interface{} {
res := make([]interface{}, 0)
for _, randomStruct := range randomStructs {
jsonData, _ := json.Marshal(randomStruct)
jsonMap := make(map[string]interface{})
err := json.Unmarshal(jsonData, &jsonMap)
if err != nil {
fmt.Println(err)
// panic(err)
}
value, exists := jsonMap[fieldName]
if exists {
res = append(res, value)
}
}
return res
}

Append to golang slice passed as empty interface

How to append to empty interface (that has been verified to be a *[]struct)?
func main() {
var mySlice []myStruct // myStruct can be any struct (dynamic)
decode(&mySlice, "...")
}
func decode(dest interface{}, src string) {
// assume dest has been verified to be *[]struct
var modelType reflect.Type = getStructType(dest)
rows, fields := getRows(src)
for _, row := range rows {
// create new struct of type modelType and assign all fields
model := reflect.New(modelType)
for _, field := range fields {
fieldValue := getRowValue(row, field)
model.Elem().FieldByName(field).Set(fieldValue)
}
castedModelRow := model.Elem().Interface()
// append model to dest; how to do this?
// dest = append(dest, castedModelRow)
}
}
Things I've tried:
This simply panics: reflect: call of reflect.Append on ptr Value (as we pass &mySlice instead of mySlice)
dest = reflect.Append(reflect.ValueOf(dest), reflect.ValueOf(castedModelRow))
This works but doesn't set the value back to dest... in main func, len(mySlice) remains 0 after decode function is called.
func decode(dest interface{}, src string) {
...
result := reflect.MakeSlice(reflect.SliceOf(modelType), rowCount, rowCount)
for _, row := range rows {
...
result = reflect.Append(result, reflect.ValueOf(castedModelRow))
}
dest = reflect.ValueOf(result)
}
Here's how to fix the second decode function shown in the question. The statement
dest = reflect.ValueOf(result)
modifies local variable dest, not the caller's value. Use the following statement to modify the caller's slice:
reflect.ValueOf(dest).Elem().Set(result)
The code in the question appends decoded elements after the elements created in reflect.MakeSlice. The resulting slice has len(rows) zero values followed by len(rows) decoded values. Fix by changing
result = reflect.Append(result, reflect.ValueOf(castedModelRow))
to:
result.Index(i).Set(model)
Here's the updated version of the second decode function in the question:
func decode(dest interface{}, src string) {
var modelType reflect.Type = getStructType(dest)
rows, fields := getRows(src)
result := reflect.MakeSlice(reflect.SliceOf(modelType), len(rows), len(rows))
for i, row := range rows {
model := reflect.New(modelType).Elem()
for _, field := range fields {
fieldValue := getRowValue(row, field)
model.FieldByName(field).Set(fieldValue)
}
result.Index(i).Set(model)
}
reflect.ValueOf(dest).Elem().Set(result)
}
Run it on the Playground.
You were very close with your original solution. You had to de-reference the pointer before calling the append operation. This solution would be helpful if your dest already had some existing elements and you don't want to lose them by creating a newSlice.
tempDest := reflect.ValueOf(dest).Elem()
tempDest = reflect.Append(tempDest, reflect.ValueOf(model.Interface()))
As @I Love Reflection pointed out, you finally need to set the new slice back through the pointer.
reflect.ValueOf(dest).Elem().Set(tempDest)
Overall Decode:
var modelType reflect.Type = getStructType(dest)
rows, fields := getRows(src)
tempDest := reflect.ValueOf(dest).Elem()
for _, row := range rows {
model := reflect.New(modelType).Elem()
for _, field := range fields {
fieldValue := getRowValue(row, field)
model.FieldByName(field).Set(fieldValue)
}
tempDest = reflect.Append(tempDest, reflect.ValueOf(model.Interface()))
}
reflect.ValueOf(dest).Elem().Set(tempDest)
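For reference, here is a self-contained sketch of the same append-through-pointer technique that can be run as is. The helpers from the question (getStructType, getRows, getRowValue) are not shown there, so this version replaces them with assumptions: rows are plain maps from field name to value, and appendRows and person are hypothetical names.

```go
package main

import (
	"fmt"
	"reflect"
)

type person struct {
	Name string
	Age  int
}

// appendRows appends one new element per row to the slice dest points to.
// dest must be a non-nil pointer to a slice of structs.
func appendRows(dest interface{}, rows []map[string]interface{}) error {
	ptr := reflect.ValueOf(dest)
	if ptr.Kind() != reflect.Ptr || ptr.IsNil() || ptr.Elem().Kind() != reflect.Slice {
		return fmt.Errorf("dest must be a non-nil pointer to a slice, got %T", dest)
	}
	slice := ptr.Elem()
	elemType := slice.Type().Elem()
	if elemType.Kind() != reflect.Struct {
		return fmt.Errorf("slice elements must be structs, got %s", elemType)
	}
	for _, row := range rows {
		elem := reflect.New(elemType).Elem() // addressable zero value
		for name, v := range row {
			f := elem.FieldByName(name)
			if !f.IsValid() || !f.CanSet() {
				return fmt.Errorf("cannot set field %q", name)
			}
			val := reflect.ValueOf(v)
			if !val.IsValid() || !val.Type().AssignableTo(f.Type()) {
				return fmt.Errorf("value for %q is not assignable to %s", name, f.Type())
			}
			f.Set(val)
		}
		slice = reflect.Append(slice, elem)
	}
	// Write the grown slice back through the pointer so the caller sees it.
	ptr.Elem().Set(slice)
	return nil
}

func main() {
	people := []person{{Name: "Ann", Age: 41}}
	rows := []map[string]interface{}{{"Name": "Bob", "Age": 30}}
	if err := appendRows(&people, rows); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(people) // prints: [{Ann 41} {Bob 30}]
}
```

Existing elements are kept because the append starts from the slice the caller passed in rather than from a fresh reflect.MakeSlice.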

Appending to struct slice in Go

I have two structs, like so:
// init a struct for a single item
type Cluster struct {
Name string
Path string
}
// init a grouping struct
type Clusters struct {
Cluster []Cluster
}
What I want to do is append new items to the Clusters struct. So I wrote a method, like so:
func (c *Clusters) AddItem(item Cluster) []Cluster {
c.Cluster = append(c.Cluster, item)
return c.Cluster
}
The way my app works, I loop through some directories, then append the name of the final directory and its path. I have a function that is called:
func getClusters(searchDir string) Clusters {
fileList := make([]string, 0)
//clusterName := make([]string, 0)
//pathName := make([]string, 0)
e := filepath.Walk(searchDir, func(path string, f os.FileInfo, err error) error {
fileList = append(fileList, path)
return err
})
if e != nil {
log.Fatal("Error building cluster list: ", e)
}
for _, file := range fileList {
splitFile := strings.Split(file, "/")
// get the filename
fileName := splitFile[len(splitFile)-1]
if fileName == "cluster.jsonnet" {
entry := Cluster{Name: splitFile[len(splitFile)-2], Path: strings.Join(splitFile[:len(splitFile)-1], "/")}
c.AddItem(entry)
}
}
Cluster := []Cluster{}
c := Clusters{Cluster}
return c
}
The problem here is that I don't know the correct way to do this.
Currently, I'm getting:
cmd/directories.go:41:4: undefined: c
So I tried moving this:
Cluster := []Cluster{}
c := Clusters{Cluster}
Above the for loop - range. The error I get is:
cmd/directories.go:43:20: Cluster is not a type
What am I doing wrong here?
The error comes from the loop, where you call AddItem on c before c has been declared inside getClusters. Declare the Clusters value before the for loop and then call c.AddItem, as shown below:
func getClusters(searchDir string) Clusters {
fileList := make([]string, 0)
fileList = append(fileList, "f1", "f2", "f3")
ClusterData := []Cluster{}
c := Clusters{Cluster: ClusterData} // change the struct name passed to Clusters struct
for _, file := range fileList {
entry := Cluster{Name: "name" + file, Path: "path" + file}
c.AddItem(entry)
}
return c
}
You gave the variable the same name as the Cluster type. Inside the function the variable shadows the type, which is why you get the error
cmd/directories.go:43:20: Cluster is not a type
Check out the working code on the Go Playground.
In Go, a composite literal is defined as:
Composite literals construct values for structs, arrays, slices, and maps and create a new value each time they are evaluated. They consist of the type of the literal followed by a brace-bound list of elements. Each element may optionally be preceded by a corresponding key.
Also have a look at the struct literals part of the composite literals documentation for more detail.
You need to define c before entering the loop in which you use it.
The Cluster is not a type error is due to using the same Cluster name as the type and the variable, try using a different variable name.
clusterArr := []Cluster{}
c := Clusters{clusterArr}
for _, file := range fileList {
....
}
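Putting both fixes together, a runnable sketch might look like this. It is an illustration under assumptions: collectClusters is a hypothetical stand-in for getClusters that takes the file list as a parameter instead of walking the disk, and it uses the path package (forward slashes only) in place of the manual strings.Split.

```go
package main

import (
	"fmt"
	"path"
)

type Cluster struct {
	Name string
	Path string
}

type Clusters struct {
	Cluster []Cluster
}

// AddItem appends item to the group. The pointer receiver lets the method
// modify the caller's Clusters value.
func (c *Clusters) AddItem(item Cluster) []Cluster {
	c.Cluster = append(c.Cluster, item)
	return c.Cluster
}

// collectClusters keeps every directory that contains a cluster.jsonnet file.
func collectClusters(files []string) Clusters {
	// Declared before the loop, and not named Cluster, so it neither is
	// undefined inside the loop nor shadows the Cluster type.
	var c Clusters
	for _, file := range files {
		if path.Base(file) != "cluster.jsonnet" {
			continue
		}
		dir := path.Dir(file)
		c.AddItem(Cluster{Name: path.Base(dir), Path: dir})
	}
	return c
}

func main() {
	c := collectClusters([]string{
		"envs/prod/cluster.jsonnet",
		"envs/prod/README.md",
		"envs/dev/cluster.jsonnet",
	})
	fmt.Println(c.Cluster) // prints: [{prod envs/prod} {dev envs/dev}]
}
```

The zero value of Clusters is usable directly because append works on a nil slice, so no explicit []Cluster{} is needed.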

how to access nested Json key values in Golang

I'm new to programming.
After unmarshaling the JSON I have the data shown below, which contains nested key values. I can access the flat key values, but how do I access the nested ones?
Here is the data after unmarshaling:
tables:[map[name:basic__snatpool_members] map[name:net__snatpool_members] map[name:optimizations__hosts] map[columnNames:[name] name:pool__hosts rows:[map[row:[ry.hj.com]]]] traffic_group:/Common/traffic-group-1
Flat key values I am able to access by using the following code
p.TrafficGroup = m["traffic_group"].(string)
here is the complete function
func dataToIapp(name string, d *schema.ResourceData) bigip.Iapp {
var p bigip.Iapp
var obj interface{}
jsonblob := []byte(d.Get("jsonfile").(string))
err := json.Unmarshal(jsonblob, &obj)
if err != nil {
fmt.Println("error", err)
}
m := obj.(map[string]interface{}) // Important: to access property
p.Name = m["name"].(string)
p.Partition = m["partition"].(string)
p.InheritedDevicegroup = m["inherited_devicegroup"].(string)
return p
}
Note: This may not work with your JSON structure. I inferred what it would be based on your question but without the actual structure, I cannot guarantee this to work without modification.
If you want to access them in a map, you need to assert that the interface pulled from the first map is actually a map. So you would need to do this:
tmp := m["tables"]
// Nested JSON objects decode to map[string]interface{}, never map[string]string
tables, ok := tmp.(map[string]interface{})
if !ok {
//error handling here
}
p.Name = tables["name"].(string)
But instead of accessing the unmarshaled JSON as a map[string]interface{}, why don't you create structs that match your JSON output?
type JSONRoot struct {
Name string `json:"name"`
Partition string `json:"partition"`
InheritedDevicegroup string `json:"inherited_devicegroup"`
Tables map[string]string `json:"tables"` //Ideally, this would be a map of structs
}
Then in your code:
func dataToIapp(name string, d *schema.ResourceData) bigip.Iapp {
var p bigip.Iapp
var obj JSONRoot
jsonblob := []byte(d.Get("jsonfile").(string))
err := json.Unmarshal(jsonblob, &obj)
if err != nil {
fmt.Println("error", err)
}
p.Name = obj.Name
p.Partition = obj.Partition
p.InheritedDevicegroup = obj.InheritedDevicegroup
p.Name = obj.Tables["name"]
return p
}
JSON objects are unmarshaled into map[string]interface{}, JSON arrays into []interface{}, same applies for nested objects/arrays.
So for example if a key/index maps to a nested object you need to type assert the value to map[string]interface{} and if the key/index maps to an array of objects you first need to assert the value to []interface{} and then each element to map[string]interface{}.
e.g. (for brevity this code is not guarding against panic)
tables := obj.(map[string]interface{})["tables"]
table1 := tables.([]interface{})[0]
name := table1.(map[string]interface{})["name"]
namestr := name.(string)
However, if it's the case that the json you are parsing is not dynamic but instead has a specific structure you should define a struct type that mirrors that structure and unmarshal the json into that.
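A struct-based version might look like the sketch below. The JSON document is an assumption reconstructed from the printed map in the question (the original file is not shown), and iappDoc, table, row and parseIapp are hypothetical names. Note that tables is modeled as a slice of structs, since the printed data shows a list of maps.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// row mirrors one entry of a table's "rows" array.
type row struct {
	Row []string `json:"row"`
}

// table mirrors one entry of the top-level "tables" array.
type table struct {
	Name        string   `json:"name"`
	ColumnNames []string `json:"columnNames"`
	Rows        []row    `json:"rows"`
}

// iappDoc mirrors the part of the document used in this sketch.
type iappDoc struct {
	TrafficGroup string  `json:"traffic_group"`
	Tables       []table `json:"tables"`
}

// parseIapp decodes the JSON document into typed structs.
func parseIapp(data []byte) (iappDoc, error) {
	var doc iappDoc
	err := json.Unmarshal(data, &doc)
	return doc, err
}

func main() {
	data := []byte(`{
		"traffic_group": "/Common/traffic-group-1",
		"tables": [
			{"name": "basic__snatpool_members"},
			{"name": "pool__hosts", "columnNames": ["name"], "rows": [{"row": ["ry.hj.com"]}]}
		]
	}`)
	doc, err := parseIapp(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range doc.Tables {
		fmt.Println(t.Name, t.Rows)
	}
}
```

With typed structs the nested values are reached with ordinary field access, for example doc.Tables[1].Rows[0].Row[0], and no type assertions are needed.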
All you have to do is repeatedly access the nested values with a type switch or type assertion:
switch val := m["tables"].(type) {
case string:
fmt.Println("tables is a string")
case float64:
// JSON numbers are unmarshaled to float64, not int
fmt.Println("tables is a number")
// This is your case, since JSON arrays are unmarshaled to []interface{} and objects to map[string]interface{}
case []interface{}:
fmt.Println("tables is a slice of interface{}")
for _, tb := range val {
if t, ok := tb.(map[string]interface{}); ok {
// Now it's accessible
fmt.Println(t["name"])
}
}
default:
fmt.Println("unknown type")
}
You might want to handle errors better than this.
To read more, check out my writing from a while ago https://medium.com/code-zen/dynamically-creating-instances-from-key-value-pair-map-and-json-in-go-feef83ab9db2.
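For the dynamic approach, a version that checks every assertion instead of panicking could be sketched as follows. The JSON literal is an assumption reconstructed from the printed map in the question, and tableNames is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tableNames returns the "name" of every object in the top-level "tables"
// array, skipping entries that are not objects or have no string name.
func tableNames(data []byte) ([]string, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal(data, &obj); err != nil {
		return nil, err
	}
	tables, ok := obj["tables"].([]interface{})
	if !ok {
		return nil, fmt.Errorf("tables is %T, not an array", obj["tables"])
	}
	var names []string
	for _, t := range tables {
		tbl, ok := t.(map[string]interface{})
		if !ok {
			continue
		}
		if name, ok := tbl["name"].(string); ok {
			names = append(names, name)
		}
	}
	return names, nil
}

func main() {
	data := []byte(`{"tables":[{"name":"net__snatpool_members"},{"name":"pool__hosts","rows":[{"row":["ry.hj.com"]}]}]}`)
	names, err := tableNames(data)
	fmt.Println(names, err) // prints: [net__snatpool_members pool__hosts] <nil>
}
```

The two-value form of the assertion (v, ok := x.(T)) is what keeps a document with an unexpected shape from causing a panic.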

Correct use of XML annotations, fields and structs in custom UnmarshalXML function

Consider the following struct:
type MyStruct struct {
Name string
Meta map[string]interface{}
}
Which has the following UnmarshalXML function:
func (m *MyStruct) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
var v struct {
XMLName xml.Name //`xml:"myStruct"`
Name string `xml:"name"`
Meta struct {
Inner []byte `xml:",innerxml"`
} `xml:"meta"`
}
err := d.DecodeElement(&v, &start)
if err != nil {
return err
}
m.Name = v.Name
myMap := make(map[string]interface{})
// ... do the mxj magic here ... -
temp := v.Meta.Inner
prefix := "<meta>"
postfix := "</meta>"
str := prefix + string(temp) + postfix
//fmt.Println(str)
myMxjMap, err := mxj.NewMapXml([]byte(str))
myMap = myMxjMap
// fill myMap
//m.Meta = myMap
m.Meta = myMap["meta"].(map[string]interface{})
return nil
}
My problem with this code is these lines:
prefix := "<meta>"
postfix := "</meta>"
str := prefix + string(temp) + postfix
myMxjMap, err := mxj.NewMapXml([]byte(str))
myMap = myMxjMap
//m.Meta = myMap
m.Meta = myMap["meta"].(map[string]interface{})
My question is how to make correct use of the XML annotations (,innerxml etc.), fields, and structs so that I don't have to manually prepend and append the <meta></meta> tags afterwards to get the whole Meta field as a single map.
The full code example is here: http://play.golang.org/p/Q4_tryubO6
The xml package doesn't provide a way to unmarshal XML into map[string]interface{} because there is no single way to do it, and in some cases it is not possible. A map doesn't preserve the order of elements (which is important in XML) and doesn't allow elements with duplicate keys.
The mxj package that you used in your example has its own rules for unmarshaling arbitrary XML into a Go map. If your requirements do not conflict with these rules, you can use the mxj package to do all the parsing and not use the xml package at all:
// I am skipping error handling here
m, _ := mxj.NewMapXml([]byte(s))
mm := m["myStruct"].(map[string]interface{})
myStruct.Name = mm["name"].(string)
myStruct.Meta = mm["meta"].(map[string]interface{})
Full example: http://play.golang.org/p/AcPUAS0QMj
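If staying with the standard library is preferred, one way to avoid re-wrapping the inner XML is to give the Meta field its own type with an UnmarshalXML method that walks the tokens. The sketch below is not mxj and does not follow mxj's conventions; it is a simplified illustration that ignores attributes, lets repeated sibling names overwrite each other, and stores leaf text as trimmed strings. Meta, decodeElement and parseMyStruct are hypothetical names.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Meta holds arbitrary nested elements keyed by element name.
type Meta map[string]interface{}

// UnmarshalXML is called by encoding/xml when it reaches the <meta> element.
func (m *Meta) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	v, err := decodeElement(d)
	if err != nil {
		return err
	}
	if children, ok := v.(map[string]interface{}); ok {
		*m = children
	} else {
		*m = Meta{} // <meta> contained only text or nothing
	}
	return nil
}

// decodeElement consumes tokens up to the end of the current element and
// returns a map of its child elements or, if it has none, its trimmed text.
func decodeElement(d *xml.Decoder) (interface{}, error) {
	children := map[string]interface{}{}
	var text strings.Builder
	for {
		tok, err := d.Token()
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			child, err := decodeElement(d)
			if err != nil {
				return nil, err
			}
			children[t.Name.Local] = child
		case xml.CharData:
			text.Write(t)
		case xml.EndElement:
			if len(children) > 0 {
				return children, nil
			}
			return strings.TrimSpace(text.String()), nil
		}
	}
}

type MyStruct struct {
	XMLName xml.Name `xml:"myStruct"`
	Name    string   `xml:"name"`
	Meta    Meta     `xml:"meta"`
}

// parseMyStruct decodes a <myStruct> document.
func parseMyStruct(data string) (MyStruct, error) {
	var s MyStruct
	err := xml.Unmarshal([]byte(data), &s)
	return s, err
}

func main() {
	s, err := parseMyStruct(`<myStruct><name>n1</name><meta><a>1</a><b><c>x</c></b></meta></myStruct>`)
	fmt.Println(s.Name, s.Meta, err)
}
```

Because the method is attached to the field's type, the outer struct needs no custom UnmarshalXML at all, and the ,innerxml field with its manual prefix and postfix is no longer required.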
