I'm a Python developer, but I'm supposed to build a Dataflow pipeline using Go.
I couldn't find as many examples for Apache Beam using Go as for Python or Java.
I have the code below, which uses a struct with a user name and age. The task is to increment the age and then filter on it. I found the way to increment the age, but I'm stuck on the filtering part.
package main
import (
"context"
"flag"
"fmt"
"github.com/apache/beam/sdks/v2/go/pkg/beam"
"github.com/apache/beam/sdks/v2/go/pkg/beam/log"
"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
)
func init() {
beam.RegisterFunction(incrementAge)
}
type user struct {
Name string
Age int
}
func printRow(ctx context.Context, list user) {
fmt.Println(list)
}
func incrementAge(list user) user {
list.Age++
return list
}
func main() {
flag.Parse()
beam.Init()
ctx := context.Background()
p := beam.NewPipeline()
s := p.Root()
var userList = []user{
{"Bob", 40},
{"Adam", 50},
{"John", 35},
{"Ben", 8},
}
initial := beam.CreateList(s, userList)
pc := beam.ParDo(s, incrementAge, initial)
pc1 := beam.ParDo(s, func(row user, emit func(user)) {
emit(row)
}, pc)
beam.ParDo0(s, printRow, pc1)
if err := beamx.Run(ctx, p); err != nil {
log.Exitf(ctx, "Failed to execute job: %v", err)
}
}
I tried creating a function like the one below, but this returns a bool and not a user object. I know I'm missing something simple but I'm unable to figure it out.
func filterAge(list user) user {
return list.Age > 40
}
In Python I could write the function like below.
beam.Filter(lambda line: line["Age"] >= 40)
You need to add an emitter to the function so it can emit the user:
func filterAge(list user, emit func(user)) {
if list.Age > 40 {
emit(list)
}
}
As written in your current code, return list.Age > 40 first evaluates the comparison list.Age > 40 to a boolean, and it is that boolean that gets returned rather than the user.
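To wire filterAge into the pipeline above, register it and use it in place of the pass-through ParDo, something like:
func init() {
    beam.RegisterFunction(incrementAge)
    beam.RegisterFunction(filterAge)
}

// ... inside main, replacing the pass-through ParDo:
pc := beam.ParDo(s, incrementAge, initial)
adults := beam.ParDo(s, filterAge, pc)
beam.ParDo0(s, printRow, adults)
The Go SDK also ships a filter transform package (github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/filter); filter.Include(s, pc, func(u user) bool { return u.Age > 40 }) should give you the boolean-predicate style you had in Python, if you prefer that.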
I'm trying to use the Go Beam SDK to create a pipeline that processes Pub/Sub messages.
github.com/apache/beam/sdks/v2/go/pkg/beam
I understand that the pubsubio connector is making external calls and works only on the Dataflow runner.
What if I want to test my pipeline locally? How would you do that?
I need to understand what is preventing me from writing my own Pub/Sub unbounded source. (I may not understand how Beam works under the hood, e.g. how does it serialize user-defined code to send it to the runner?)
I tried to do something like this:
package pubsubio
import (
"context"
"fmt"
cloud_pubsub "cloud.google.com/go/pubsub"
"github.com/apache/beam/sdks/v2/go/pkg/beam"
"github.com/apache/beam/sdks/v2/go/pkg/beam/log"
"github.com/apache/beam/sdks/v2/go/pkg/beam/register"
"github.com/apache/beam/sdks/v2/go/pkg/beam/util/pubsubx"
)
func init() {
register.DoFn3x1[context.Context, string, func(*cloud_pubsub.Message), error](&readFn{})
register.Emitter1[*cloud_pubsub.Message]()
}
type ReadConfig struct {
ProjectID string
TopicName string
SubscriptionName string
}
func Read(
scope beam.Scope,
cfg ReadConfig,
) beam.PCollection {
scope = scope.Scope("pubsubio.Read")
col := beam.Create(scope, cfg.SubscriptionName)
return beam.ParDo(scope, newReadFn(cfg.ProjectID, cfg.TopicName), col)
}
type readFn struct {
pubsubFn
TopicName string
}
func newReadFn(projectID, topicName string) *readFn {
return &readFn{
pubsubFn: pubsubFn{
ProjectID: projectID,
},
TopicName: topicName,
}
}
func (fn *readFn) ProcessElement(
ctx context.Context,
subscriptionName string,
emit func(message *cloud_pubsub.Message),
) error {
log.Info(ctx, "[pubsubio.ProcessElement] Reading from pubsub")
_, err := pubsubx.EnsureTopic(ctx, fn.client, fn.TopicName)
if err != nil {
return fmt.Errorf("cannot get topic: %w", err)
}
sub, err := pubsubx.EnsureSubscription(ctx, fn.client, fn.TopicName, subscriptionName)
if err != nil {
return fmt.Errorf("cannot get subscription: %w", err)
}
return sub.Receive(ctx, func(ctx context.Context, message *cloud_pubsub.Message) {
emit(message)
log.Debugf(ctx, "[pubsubio.ProcessElement] Emit msg: %s", message.ID)
message.Ack()
})
}
So basically I created a Read fn that never returns, but the rest of my pipeline is never triggered (I must be missing something).
Is there a way to list all functions that use/return a specific type?
For example, I'm interested in using the following function:
func ListenAndServe(addr string, handler Handler) error
How can I find all the functions (across all Go packages) that can return a Handler?
I'd write an analysis tool using the x/tools/go/analysis framework. Here's a rough sketch that you can run on any module (it uses go/packages underneath so it fully supports modules):
package main

import (
"bytes"
"fmt"
"go/ast"
"go/format"
"go/token"
"golang.org/x/tools/go/analysis"
"golang.org/x/tools/go/analysis/singlechecker"
)
var RtAnalysis = &analysis.Analyzer{
Name: "rtanalysis",
Doc: "finds functions by return type",
Run: run,
}
func main() {
singlechecker.Main(RtAnalysis)
}
func run(pass *analysis.Pass) (interface{}, error) {
for _, file := range pass.Files {
ast.Inspect(file, func(n ast.Node) bool {
if funcTy, ok := n.(*ast.FuncType); ok {
if funcTy.Results != nil {
for _, fl := range funcTy.Results.List {
if tv, ok := pass.TypesInfo.Types[fl.Type]; ok {
if tv.Type.String() == "net/http.Handler" {
ns := nodeString(funcTy, pass.Fset)
fmt.Printf("%s has return of type net/http.Handler\n", ns)
}
}
}
}
}
return true
})
}
return nil, nil
}
// nodeString formats a syntax tree in the style of gofmt.
func nodeString(n ast.Node, fset *token.FileSet) string {
var buf bytes.Buffer
format.Node(&buf, fset, n)
return buf.String()
}
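Since the sketch is built on singlechecker, you build the analyzer binary and run it from the root of the module you want to scan with a package pattern such as ./..., like any vet-style checker. Note that, as written, the check only matches results whose declared type is exactly net/http.Handler; to also catch functions returning concrete types that merely implement the interface (such as *http.ServeMux), you would compare with types.Implements instead of the string equality.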
I'm just playing with the AWS SDK for Go. When listing resources of different types I tend to end up with a lot of very similar functions, like the two in the example below.
Is there a way to rewrite them as one generic function that will return a specific type depending on what is passed in as a parameter?
Something like:
func generic(session, funcToCall, t, input) (interface{}, error) {}
Currently I have to do this (the functionality is the same, just the types change):
func getVolumes(s *session.Session) ([]*ec2.Volume, error) {
client := ec2.New(s)
t := []*ec2.Volume{}
input := ec2.DescribeVolumesInput{}
for {
result, err := client.DescribeVolumes(&input)
if err != nil {
return nil, err
}
t = append(t, result.Volumes...)
if result.NextToken != nil {
input.NextToken = result.NextToken
} else {
break
}
}
return t, nil
}
func getVpcs(s *session.Session) ([]*ec2.Vpc, error) {
client := ec2.New(s)
t := []*ec2.Vpc{}
input := ec2.DescribeVpcsInput{}
for {
result, err := client.DescribeVpcs(&input)
if err != nil {
return nil, err
}
t = append(t, result.Vpcs...)
if result.NextToken != nil {
input.NextToken = result.NextToken
} else {
break
}
}
return t, nil
}
Because you only deal with functions, it is possible to use the reflect package to generate the functions at runtime.
Using the object type (Volume, Vpc) it is possible to derive all the subsequent information needed to provide a fully generic implementation that is really DRY, at the cost of being more complex and slower.
It is untested; you are welcome to help test and fix it, but something like this should put you on the right track:
https://play.golang.org/p/mGjtYVG2OZS
The registry idea comes from this answer: https://stackoverflow.com/a/23031445/4466350
For reference, the documentation of the reflect package is at https://golang.org/pkg/reflect/
package main
import (
"errors"
"fmt"
"reflect"
)
func main() {
fmt.Printf("%T\n", getter(Volume{}))
fmt.Printf("%T\n", getter(Vpc{}))
}
type DescribeVolumesInput struct{}
type DescribeVpcsInput struct{}
type Volume struct{}
type Vpc struct{}
type Session struct{}
type Client struct{}
func New(s *Session) Client { return Client{} }
var typeRegistry = make(map[string]reflect.Type)
func init() {
some := []interface{}{DescribeVolumesInput{}, DescribeVpcsInput{}}
for _, v := range some {
typeRegistry[reflect.TypeOf(v).Name()] = reflect.TypeOf(v)
}
}
var errV = errors.New("")
var errType = reflect.ValueOf(&errV).Elem().Type()
var zeroErr = reflect.Zero(reflect.TypeOf((*error)(nil)).Elem())
var nilErr = []reflect.Value{zeroErr}
func getter(of interface{}) interface{} {
outType := reflect.SliceOf(reflect.PtrTo(reflect.TypeOf(of)))
fnType := reflect.FuncOf([]reflect.Type{reflect.TypeOf(new(Session))}, []reflect.Type{outType, errType}, false)
fnBody := func(input []reflect.Value) []reflect.Value {
client := reflect.ValueOf(New).Call(input)[0]
t := reflect.MakeSlice(outType, 0, 0)
name := fmt.Sprintf("Describe%ssInput", reflect.TypeOf(of).Name())
descInput := reflect.New(typeRegistry[name]).Elem()
mName := fmt.Sprintf("Describe%ss", reflect.TypeOf(of).Name())
meth := client.MethodByName(mName)
if !meth.IsValid() {
return []reflect.Value{
t,
reflect.ValueOf(fmt.Errorf("no such method %q", mName)),
}
}
for {
out := meth.Call([]reflect.Value{descInput.Addr()})
if len(out) > 0 {
errOut := out[len(out)-1]
if errOut.Type().Implements(errType) && errOut.IsNil() == false {
return []reflect.Value{t, errOut}
}
}
result := out[0].Elem()
fName := fmt.Sprintf("%ss", reflect.TypeOf(of).Name())
if x := result.FieldByName(fName); x.IsValid() {
t = reflect.AppendSlice(t, x)
} else {
return []reflect.Value{
t,
reflect.ValueOf(fmt.Errorf("field not found %q", fName)),
}
}
if x := result.FieldByName("NextToken"); x.IsValid() && !x.IsNil() {
descInput.FieldByName("NextToken").Set(x)
} else {
break
}
}
return []reflect.Value{t, zeroErr}
}
fn := reflect.MakeFunc(fnType, fnBody)
return fn.Interface()
}
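If you are on Go 1.18 or newer, there is also a reflection-free option: generics let you factor out the pagination loop itself, at the price of a small adapter per resource type. A rough sketch, assuming the same aws-sdk-go v1 imports (session, ec2) as in the question; the paginate helper and its parameter names are mine:
// paginate runs the usual NextToken loop; the call, items, next and setNext
// callbacks adapt one specific Describe* API to the generic loop.
func paginate[In, Out, Item any](
    input *In,
    call func(*In) (*Out, error),
    items func(*Out) []Item,
    next func(*Out) *string,
    setNext func(*In, *string),
) ([]Item, error) {
    var all []Item
    for {
        out, err := call(input)
        if err != nil {
            return nil, err
        }
        all = append(all, items(out)...)
        token := next(out)
        if token == nil {
            break
        }
        setNext(input, token)
    }
    return all, nil
}

// getVolumes then shrinks to a thin adapter around the generic loop.
func getVolumes(s *session.Session) ([]*ec2.Volume, error) {
    client := ec2.New(s)
    return paginate(
        &ec2.DescribeVolumesInput{},
        client.DescribeVolumes,
        func(o *ec2.DescribeVolumesOutput) []*ec2.Volume { return o.Volumes },
        func(o *ec2.DescribeVolumesOutput) *string { return o.NextToken },
        func(i *ec2.DescribeVolumesInput, t *string) { i.NextToken = t },
    )
}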
Proxying a third-party API is quite simple to implement in Go; this is how it got implemented in the endly e2e test runner's AWS proxy.
I would say the AWS API is a perfect candidate for proxying, as long as the performance price of reflection is not an issue.
Some other third-party APIs, like Kubernetes, are much more challenging, but still quite easy to proxy in Go with a combination of reflection and code generation.
I'm trying to set-up an AWS-lambda using aws-sdk-go that is triggered whenever a new user is added to a certain dynamodb table.
Everything is working just fine, but I can't find a way to unmarshal a map[string]DynamoDBAttributeValue like:
{
"name": {
"S" : "John"
},
"residence_address": {
"M": {
"address": {
"S": "some place"
}
}
}
}
To a given struct, for instance a User struct. There is an example of unmarshaling a map[string]*dynamodb.AttributeValue into a given interface, but I can't find a way to do the same thing with map[string]DynamoDBAttributeValue, even though these types seem to serve the same purpose.
map[string]DynamoDBAttributeValue is returned by an events.DynamoDBEvent from the package github.com/aws/aws-lambda-go/events. This is my code:
package handler
import (
"context"
"github.com/aws/aws-lambda-go/events"
"github.com/aws/aws-sdk-go/service/dynamodb/dynamodbattribute"
"github.com/aws/aws-sdk-go/service/dynamodb"
)
func HandleDynamoDBRequest(ctx context.Context, e events.DynamoDBEvent) {
for _, record := range e.Records {
if record.EventName == "INSERT" {
// User Struct
var dynamoUser model.DynamoDBUser
// Of course this can't be done for incompatible types
_ = dynamodbattribute.UnmarshalMap(record.Change.NewImage, &dynamoUser)
}
}
}
Of course, I could marshal record.Change.NewImage to JSON and unmarshal it back into some intermediate structure, but then I would have to initialize the dynamoUser attributes from it by hand.
Or I could even write a function that parses map[string]DynamoDBAttributeValue to map[string]*dynamodb.AttributeValue like:
func getAttributeValueMapFromDynamoDBStreamRecord(e events.DynamoDBStreamRecord) map[string]*dynamodb.AttributeValue {
image := e.NewImage
m := make(map[string]*dynamodb.AttributeValue)
for k, v := range image {
if v.DataType() == events.DataTypeString {
s := v.String()
m[k] = &dynamodb.AttributeValue{
S : &s,
}
}
if v.DataType() == events.DataTypeBoolean {
b := v.Boolean()
m[k] = &dynamodb.AttributeValue{
BOOL : &b,
}
}
// . . .
if v.DataType() == events.DataTypeMap {
// ?
}
}
return m
}
And then simply use dynamodbattribute.UnmarshalMap, but handling events.DataTypeMap that way would be quite a tricky process.
Is there a way through which I can unmarshal a DynamoDB record coming from a events.DynamoDBEvent into a struct with a similar method shown for map[string]*dynamodb.AttributeValue?
I tried the function you provided and ran into some problems with events.DataTypeList, so I ended up writing the following function, which does the trick:
// UnmarshalStreamImage converts events.DynamoDBAttributeValue to struct
func UnmarshalStreamImage(attribute map[string]events.DynamoDBAttributeValue, out interface{}) error {
dbAttrMap := make(map[string]*dynamodb.AttributeValue)
for k, v := range attribute {
var dbAttr dynamodb.AttributeValue
bytes, marshalErr := v.MarshalJSON()
if marshalErr != nil {
return marshalErr
}
if err := json.Unmarshal(bytes, &dbAttr); err != nil {
return err
}
dbAttrMap[k] = &dbAttr
}
return dynamodbattribute.UnmarshalMap(dbAttrMap, out)
}
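For completeness, usage inside a handler like the one in the question could look like this (the handler now returns an error so unmarshal failures surface; dynamodbattribute honours dynamodbav/json struct tags on your model.DynamoDBUser):
func HandleDynamoDBRequest(ctx context.Context, e events.DynamoDBEvent) error {
    for _, record := range e.Records {
        if record.EventName == "INSERT" {
            var dynamoUser model.DynamoDBUser
            if err := UnmarshalStreamImage(record.Change.NewImage, &dynamoUser); err != nil {
                return err
            }
            // work with dynamoUser ...
        }
    }
    return nil
}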
I was frustrated that the type of NewImage from the record wasn't map[string]*dynamodb.AttributeValue so I could use the dynamodbattribute package.
The JSON representation of events.DynamoDBAttributeValue seems to be the same as the JSON representation of dynamodb.AttributeValue.
So I tried creating my own DynamoDBEvent type and changed the types of OldImage and NewImage so that they would be unmarshalled into map[string]*dynamodb.AttributeValue instead of map[string]events.DynamoDBAttributeValue.
It is a little bit ugly but it works for me.
package main
import (
"github.com/aws/aws-lambda-go/events"
"github.com/aws/aws-lambda-go/lambda"
"github.com/aws/aws-sdk-go/service/dynamodb"
"github.com/aws/aws-sdk-go/service/dynamodb/dynamodbattribute"
"fmt"
)
func main() {
lambda.Start(lambdaHandler)
}
// changed type of event from: events.DynamoDBEvent to DynamoDBEvent (see below)
func lambdaHandler(event DynamoDBEvent) error {
for _, record := range event.Records {
change := record.Change
newImage := change.NewImage // now of type: map[string]*dynamodb.AttributeValue
var item IdOnly
err := dynamodbattribute.UnmarshalMap(newImage, &item)
if err != nil {
return err
}
fmt.Println(item.Id)
}
return nil
}
type IdOnly struct {
Id string `json:"id"`
}
type DynamoDBEvent struct {
Records []DynamoDBEventRecord `json:"Records"`
}
type DynamoDBEventRecord struct {
AWSRegion string `json:"awsRegion"`
Change DynamoDBStreamRecord `json:"dynamodb"`
EventID string `json:"eventID"`
EventName string `json:"eventName"`
EventSource string `json:"eventSource"`
EventVersion string `json:"eventVersion"`
EventSourceArn string `json:"eventSourceARN"`
UserIdentity *events.DynamoDBUserIdentity `json:"userIdentity,omitempty"`
}
type DynamoDBStreamRecord struct {
ApproximateCreationDateTime events.SecondsEpochTime `json:"ApproximateCreationDateTime,omitempty"`
// changed to map[string]*dynamodb.AttributeValue
Keys map[string]*dynamodb.AttributeValue `json:"Keys,omitempty"`
// changed to map[string]*dynamodb.AttributeValue
NewImage map[string]*dynamodb.AttributeValue `json:"NewImage,omitempty"`
// changed to map[string]*dynamodb.AttributeValue
OldImage map[string]*dynamodb.AttributeValue `json:"OldImage,omitempty"`
SequenceNumber string `json:"SequenceNumber"`
SizeBytes int64 `json:"SizeBytes"`
StreamViewType string `json:"StreamViewType"`
}
I ran into the same problem, and the solution is to perform a simple conversion of types. This is possible because, in the end, the type received in Lambda events (events.DynamoDBAttributeValue) and the type used by the AWS SDK V2 for DynamoDB (types.AttributeValue) represent the same data. The conversion code is shown below.
package aws_lambda
import (
"github.com/aws/aws-lambda-go/events"
"github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue"
"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)
func UnmarshalDynamoEventsMap(
record map[string]events.DynamoDBAttributeValue, out interface{}) error {
asTypesMap := DynamoDbEventsMapToTypesMap(record)
err := attributevalue.UnmarshalMap(asTypesMap, out)
if err != nil {
return err
}
return nil
}
func DynamoDbEventsMapToTypesMap(
record map[string]events.DynamoDBAttributeValue) map[string]types.AttributeValue {
resultMap := make(map[string]types.AttributeValue)
for key, rec := range record {
resultMap[key] = DynamoDbEventsToTypes(rec)
}
return resultMap
}
// DynamoDbEventsToTypes relates the dynamo event received by AWS Lambda with the data type that is
// used in the Amazon SDK V2 to deal with DynamoDB data.
// This function is necessary because Amazon does not provide any kind of solution to make this
// relationship between types of data.
func DynamoDbEventsToTypes(record events.DynamoDBAttributeValue) types.AttributeValue {
var val types.AttributeValue
switch record.DataType() {
case events.DataTypeBinary:
val = &types.AttributeValueMemberB{
Value: record.Binary(),
}
case events.DataTypeBinarySet:
val = &types.AttributeValueMemberBS{
Value: record.BinarySet(),
}
case events.DataTypeBoolean:
val = &types.AttributeValueMemberBOOL{
Value: record.Boolean(),
}
case events.DataTypeList:
var items []types.AttributeValue
for _, value := range record.List() {
items = append(items, DynamoDbEventsToTypes(value))
}
val = &types.AttributeValueMemberL{
Value: items,
}
case events.DataTypeMap:
items := make(map[string]types.AttributeValue)
for k, v := range record.Map() {
items[k] = DynamoDbEventsToTypes(v)
}
val = &types.AttributeValueMemberM{
Value: items,
}
case events.DataTypeNull:
val = nil
case events.DataTypeNumber:
val = &types.AttributeValueMemberN{
Value: record.Number(),
}
case events.DataTypeNumberSet:
val = &types.AttributeValueMemberNS{
Value: record.NumberSet(),
}
case events.DataTypeString:
val = &types.AttributeValueMemberS{
Value: record.String(),
}
case events.DataTypeStringSet:
val = &types.AttributeValueMemberSS{
Value: record.StringSet(),
}
}
return val
}
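Usage is then one call per record, for example (User here is a placeholder for your own struct with dynamodbav tags):
var u User
if err := UnmarshalDynamoEventsMap(record.Change.NewImage, &u); err != nil {
    return err
}
This keeps everything on the aws-sdk-go-v2 attributevalue path rather than mixing in the v1 dynamodbattribute package.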
There is a package that allows conversion from events.DynamoDBAttributeValue to the DynamoDB SDK's AttributeValue type:
https://pkg.go.dev/github.com/aereal/go-dynamodb-attribute-conversions/v2
From there one can unmarshal the AttributeValue map into a struct:
func Unmarshal(attribute map[string]events.DynamoDBAttributeValue, out interface{}) error {
av := ddbconversions.AttributeValueMapFrom(attribute)
return attributevalue.UnmarshalMap(av, out)
}
In the os package there is FindProcess(), which you can pass a process ID to get an *os.Process. You can then call Kill on that process, but is there a way to find a process based on its name? (On Windows.)
For example, I would like to be able to do something like this:
p, perr := os.FindProcessByName("Itunes")
if perr != nil {
fmt.Println(perr)
}
p.Kill()
I only need this to work on Windows.
It's not pretty, but you can use the w32 bindings (github.com/AllenDang/w32):
package main
import (
"fmt"
"github.com/AllenDang/w32"
"unsafe"
)
func GetProcessName(id uint32) string {
snapshot := w32.CreateToolhelp32Snapshot(w32.TH32CS_SNAPMODULE, id)
if snapshot == w32.ERROR_INVALID_HANDLE {
return "<UNKNOWN>"
}
defer w32.CloseHandle(snapshot)
var me w32.MODULEENTRY32
me.Size = uint32(unsafe.Sizeof(me))
if w32.Module32First(snapshot, &me) {
return w32.UTF16PtrToString(&me.SzModule[0])
}
return "<UNKNOWN>"
}
func ListProcesses() []uint32 {
sz := uint32(1000)
procs := make([]uint32, sz)
var bytesReturned uint32
if w32.EnumProcesses(procs, sz, &bytesReturned) {
return procs[:int(bytesReturned)/4]
}
return []uint32{}
}
func FindProcessByName(name string) (uint32, error) {
for _, pid := range ListProcesses() {
if GetProcessName(pid) == name {
return pid, nil
}
}
return 0, fmt.Errorf("unknown process")
}
func main() {
fmt.Println(FindProcessByName("chrome.exe"))
}
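To actually kill the process once you have the PID (what the question ultimately wants), the standard library is enough, for example inside main with "os" added to the imports. Note that GetProcessName returns the module file name, so the name to match will look like "iTunes.exe" rather than "Itunes":
pid, err := FindProcessByName("iTunes.exe") // module name, including the extension
if err != nil {
    fmt.Println(err)
    return
}
proc, err := os.FindProcess(int(pid)) // on Windows this opens a handle to the PID
if err != nil {
    fmt.Println(err)
    return
}
if err := proc.Kill(); err != nil { // Kill uses TerminateProcess on Windows
    fmt.Println(err)
}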