Parallelism Problem on Cloud Dataflow using Go SDK - go

I have Apache Beam code implementation on Go SDK as described below. The pipeline has 3 steps. One is textio.Read, other one is CountLines and the last step is ProcessLines. ProcessLines step takes around 10 seconds time. I just added a Sleep function for the sake of brevity.
I am calling the pipeline with 20 workers. When I run the pipeline, my expectation was 20 workers would run in parallel and textio.Read read 20 lines from the file and ProcessLines would do 20 parallel executions in 10 seconds. However, the pipeline did not work like that. It's currently working in a way that textio.Read reads one line from the file, pushes the data to the next step and waits until ProcessLines step completes its 10 seconds work. There is no parallelism and there is only one line string from the file throughout the pipeline. Could you please clarify me what I'm doing wrong for parallelism? How should I update the code to achieve parallelism as described above?
package main
import (
"context"
"flag"
"time"
"github.com/apache/beam/sdks/go/pkg/beam"
"github.com/apache/beam/sdks/go/pkg/beam/io/textio"
"github.com/apache/beam/sdks/go/pkg/beam/log"
"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)
// metrics to be monitored
var (
input = flag.String("input", "", "Input file (required).")
numberOfLines = beam.NewCounter("extract", "numberOfLines")
lineLen = beam.NewDistribution("extract", "lineLenDistro")
)
func countLines(ctx context.Context, line string) string {
lineLen.Update(ctx, int64(len(line)))
numberOfLines.Inc(ctx, 1)
return line
}
func processLines(ctx context.Context, line string) {
time.Sleep(10 * time.Second)
}
func CountLines(s beam.Scope, lines beam.PCollection) beam.PCollection
{
s = s.Scope("Count Lines")
return beam.ParDo(s, countLines, lines)
}
func ProcessLines(s beam.Scope, lines beam.PCollection) {
s = s.Scope("Process Lines")
beam.ParDo0(s, processLines, lines)
}
func main() {
// If beamx or Go flags are used, flags must be parsed first.
flag.Parse()
// beam.Init() is an initialization hook that must be called on startup. On
// distributed runners, it is used to intercept control.
beam.Init()
// Input validation is done as usual. Note that it must be after Init().
if *input == "" {
log.Fatal(context.Background(), "No input file provided")
}
p := beam.NewPipeline()
s := p.Root()
l := textio.Read(s, *input)
lines := CountLines(s, l)
ProcessLines(s, lines)
// Concept #1: The beamx.Run convenience wrapper allows a number of
// pre-defined runners to be used via the --runner flag.
if err := beamx.Run(context.Background(), p); err != nil {
log.Fatalf(context.Background(), "Failed to execute job: %v", err.Error())
}
}
EDIT:
After I got the answer about the problem might be caused by fusion, I changed the related part of the code but it did not work again.
Now the first and second step is working in parallel, however the third step ProcessLines is not working in parallel. I only made the following changes. Can anyone tell me what the problem is?
func AddRandomKey(s beam.Scope, col beam.PCollection) beam.PCollection {
return beam.ParDo(s, addRandomKeyFn, col)
}
func addRandomKeyFn(elm beam.T) (int, beam.T) {
return rand.Int(), elm
}
func countLines(ctx context.Context, _ int, lines func(*string) bool, emit func(string)) {
var line string
for lines(&line) {
lineLen.Update(ctx, int64(len(line)))
numberOfLines.Inc(ctx, 1)
emit(line)
}
}
func processLines(ctx context.Context, _ int, lines func(*string) bool) {
var line string
for lines(&line) {
time.Sleep(10 * time.Second)
numberOfLinesProcess.Inc(ctx, 1)
}
}
func CountLines(s beam.Scope, lines beam.PCollection) beam.PCollection {
s = s.Scope("Count Lines")
keyed := AddRandomKey(s, lines)
grouped := beam.GroupByKey(s, keyed)
return beam.ParDo(s, countLines, grouped)
}
func ProcessLines(s beam.Scope, lines beam.PCollection) {
s = s.Scope("Process Lines")
keyed := AddRandomKey(s, lines)
grouped := beam.GroupByKey(s, keyed)
beam.ParDo0(s, processLines, grouped)
}

Many advanced runners of MapReduce-type pipelines fuse stages that can be run in memory together. Apache Beam and Dataflow are no exception.
What's happening here is that the three steps of your pipeline are fused, and happening in the same machine. Furthermore, the Go SDK does not currently support splitting the Read transform, unfortunately.
To achieve parallelism in the third transform, you can break the fusion between Read and ProcessLines. You can do that adding a random key to your lines, and a GroupByKey transform.
In Python, it would be:
(p | beam.ReadFromText(...)
| CountLines()
| beam.Map(lambda x: (random.randint(0, 1000), x))
| beam.GroupByKey()
| beam.FlatMap(lambda k, v: v) # Discard the key, and return the values
| ProcessLines())
This would allow you to parallelize ProcessLines.

Related

read csv into [][]byte

I experienced some weird behavior when I was reading a csv file into 2 dimensional byte slice. The first 42 rows are fine and after that it seems like extra line ending are put into the data which messes up things:
first row in the first 42 times:
row 0: 504921600000000000,truck_0,South,Trish,H-2,v2.3,1500,150,12,52.31854,4.72037,124,0,221,0,25
first row after I appended 43 rows:
row 0: 504921600000000000,truck_49,South,Andy,F-150,v2.0,2000,200,15,38.9349,179.94282,289,0,269,0
row 1: 25
minimal code to reproduce the problem:
package main
import (
"bufio"
"log"
"os"
)
type fileDataSource struct {
scanner *bufio.Scanner
}
type batch struct {
rows [][]byte
}
func (b *batch) Len() uint {
return uint(len(b.rows))
}
func (b *batch) Append(row []byte) {
b.rows = append(b.rows, row)
for index, row := range b.rows {
log.Printf("row %d: %s\n", index, string(row))
}
if len(b.rows) > 43 {
log.Fatalf("asdf")
}
}
type factory struct{}
func (f *factory) New() *batch {
return &batch{rows: make([][]byte, 0)}
}
func main() {
file, _ := os.Open("/tmp/data1.csv")
scanner := bufio.NewScanner(bufio.NewReaderSize(file, 4<<20))
b := batch{}
for scanner.Scan() {
b.Append(scanner.Bytes())
}
}
csv I used:
504921600000000000,truck_0,South,Trish,H-2,v2.3,1500,150,12,52.31854,4.72037,124,0,221,0,25
504921600000000000,truck_1,South,Albert,F-150,v1.5,2000,200,15,72.45258,68.83761,255,0,181,0,25
504921600000000000,truck_2,North,Derek,F-150,v1.5,2000,200,15,24.5208,28.09377,428,0,304,0,25
504921600000000000,truck_3,East,Albert,F-150,v2.0,2000,200,15,18.11037,98.65573,387,0,192,0,25
504921600000000000,truck_4,West,Andy,G-2000,v1.5,5000,300,19,81.93919,56.12266,236,0,335,0,25
504921600000000000,truck_5,East,Seth,F-150,v2.0,2000,200,15,5.00552,114.50557,89,0,187,0,25
504921600000000000,truck_6,East,Trish,G-2000,v1.0,5000,300,19,41.59689,57.90174,395,0,150,0,25
504921600000000000,truck_8,South,Seth,G-2000,v1.0,5000,300,19,21.89157,44.58919,411,0,232,0,25
504921600000000000,truck_9,South,Andy,H-2,v2.3,1500,150,12,15.67271,112.4023,402,0,75,0,25
504921600000000000,truck_10,North,Albert,F-150,v2.3,2000,200,15,35.05682,36.20513,359,0,68,0,25
504921600000000000,truck_7,East,Andy,H-2,v2.0,1500,150,12,7.74826,14.96075,105,0,323,0,25
504921600000000000,truck_11,South,Derek,F-150,v1.0,2000,200,15,87.9924,134.71544,293,0,133,0,25
504921600000000000,truck_14,North,Albert,H-2,v1.0,1500,150,12,66.68217,105.76965,222,0,252,0,25
504921600000000000,truck_18,West,Trish,F-150,v2.0,2000,200,15,67.15164,153.56165,252,0,240,0,25
504921600000000000,truck_20,North,Rodney,G-2000,v2.0,5000,300,19,38.88807,65.86698,104,0,44,0,25
504921600000000000,truck_21,East,Derek,G-2000,v2.0,5000,300,19,81.87812,167.8083,345,0,327,0,25
504921600000000000,truck_22,West,Albert,G-2000,v1.5,5000,300,19,39.9433,16.0241,449,0,42,0,25
504921600000000000,truck_23,South,Andy,F-150,v2.0,2000,200,15,73.28358,98.05159,198,0,276,0,25
504921600000000000,truck_24,West,Rodney,G-2000,v2.3,5000,300,19,22.19262,0.27462,223,0,318,0,25
504921600000000000,truck_25,North,Trish,F-150,v2.0,2000,200,15,17.26704,16.91226,461,0,183,0,25
504921600000000000,truck_26,South,Seth,F-150,v1.5,2000,200,15,45.65327,144.60354,58,0,182,0,25
504921600000000000,truck_12,East,Trish,G-2000,v1.0,5000,300,19,36.03928,113.87118,39,0,294,0,25
504921600000000000,truck_13,West,Derek,H-2,v1.0,1500,150,12,14.07479,110.77267,152,0,69,0,25
504921600000000000,truck_27,West,Seth,G-2000,v1.5,5000,300,19,79.55971,97.86182,252,0,345,0,25
504921600000000000,truck_28,West,Rodney,G-2000,v1.5,5000,300,19,60.33457,4.62029,74,0,199,0,25
504921600000000000,truck_16,South,Albert,G-2000,v1.5,5000,300,19,51.16438,121.32451,455,0,290,0,25
504921600000000000,truck_19,West,Derek,G-2000,v1.5,5000,300,19,19.69355,139.493,451,0,300,0,25
504921600000000000,truck_31,North,Albert,G-2000,v1.0,5000,300,19,0.75251,116.83474,455,0,49,0,25
504921600000000000,truck_32,West,Seth,F-150,v2.0,2000,200,15,4.07566,164.43909,297,0,277,0,25
504921600000000000,truck_33,West,Rodney,G-2000,v1.5,5000,300,19,89.19448,10.47499,407,0,169,0,25
504921600000000000,truck_34,West,Rodney,G-2000,v2.0,5000,300,19,73.7383,10.79582,488,0,170,0,25
504921600000000000,truck_35,West,Seth,G-2000,v2.3,5000,300,19,60.02428,2.51011,480,0,307,0,25
504921600000000000,truck_36,North,Andy,G-2000,v1.0,5000,300,19,87.52877,45.07308,161,0,128,0,25
504921600000000000,truck_38,West,Andy,H-2,v2.3,,150,12,63.54604,119.82031,282,0,325,0,25
504921600000000000,truck_39,East,Derek,G-2000,v1.5,5000,300,19,33.83548,3.90996,294,0,123,0,25
504921600000000000,truck_40,West,Albert,H-2,v2.0,1500,150,12,32.32773,118.43138,276,0,316,0,25
504921600000000000,truck_41,East,Rodney,F-150,v1.0,2000,200,15,68.85572,173.23123,478,0,207,0,25
504921600000000000,truck_42,West,Trish,F-150,v2.0,2000,200,15,38.45195,171.2884,113,0,180,0,25
504921600000000000,truck_43,East,Derek,H-2,v2.0,1500,150,12,52.90189,49.76966,295,0,195,0,25
504921600000000000,truck_44,South,Seth,H-2,v1.0,1500,150,12,32.33297,3.89306,396,0,320,0,25
504921600000000000,truck_30,East,Andy,G-2000,v1.5,5000,300,19,29.62198,83.73482,291,0,267,0,25
504921600000000000,truck_46,West,Seth,H-2,v2.3,1500,150,12,26.07966,118.49629,321,,267,0,25
504921600000000000,truck_37,South,Andy,G-2000,v2.0,5000,300,19,57.90077,77.20136,77,0,179,0,25
504921600000000000,truck_49,South,Andy,F-150,v2.0,2000,200,15,38.9349,179.94282,289,0,269,0,25
504921600000000000,truck_53,West,Seth,G-2000,v2.3,5000,300,19,25.02,157.45082,272,0,5,0,25
504921600000000000,truck_54,North,Andy,H-2,v2.0,1500,150,12,87.62736,106.0376,360,0,66,0,25
504921600000000000,truck_55,East,Albert,G-2000,v1.0,5000,300,19,78.56605,71.16225,295,0,150,0,25
504921600000000000,truck_56,North,Derek,F-150,v2.0,2000,200,15,23.51619,123.22682,71,0,209,0,25
504921600000000000,truck_57,South,Rodney,F-150,v2.3,2000,200,15,26.07996,159.92716,454,0,22,0,25
504921600000000000,truck_58,South,Derek,F-150,v2.0,2000,200,15,84.79333,79.23813,175,0,246,0,25
504921600000000000,truck_59,East,Andy,H-2,v2.0,1500,150,12,8.7621,82.48318,82,0,55,0,25
504921600000000000,truck_45,East,Trish,G-2000,v1.0,5000,300,19,17.48624,100.78121,306,0,193,0,25
504921600000000000,truck_47,South,Derek,G-2000,v1.5,5000,300,19,41.62173,110.80422,111,0,78,0,25
504921600000000000,truck_48,East,Trish,G-2000,v1.5,5000,300,19,63.90773,141.50555,53,0,,0,25
504921600000000000,truck_50,East,Andy,H-2,v2.3,1500,150,12,45.44111,172.39833,219,0,88,0,25
504921600000000000,truck_51,East,Rodney,F-150,v2.3,2000,200,15,89.03645,91.57675,457,0,337,0,25
504921600000000000,truck_52,West,Derek,G-2000,v1.0,5000,300,19,89.0133,97.8037,23,0,168,0,25
504921600000000000,truck_61,East,Albert,G-2000,v2.3,5000,300,19,75.91676,167.78366,462,0,60,0,25
504921600000000000,truck_62,East,Derek,H-2,v1.5,1500,150,12,54.61668,103.21398,231,0,143,0,25
504921600000000000,truck_63,South,Rodney,H-2,v2.0,1500,150,12,37.13702,149.25546,46,0,118,0,25
504921600000000000,truck_64,South,Albert,G-2000,v2.0,5000,300,19,45.04214,10.73002,447,0,253,0,25
504921600000000000,truck_60,South,Derek,H-2,v1.5,1500,150,12,57.99184,33.45994,310,0,93,0,25
504921600000000000,truck_67,South,Seth,H-2,v1.0,1500,150,12,4.62985,155.01707,308,0,22,0,25
504921600000000000,truck_68,West,Rodney,F-150,v1.5,2000,200,15,16.90741,123.03863,303,0,43,0,25
504921600000000000,truck_69,East,Derek,H-2,v2.3,1500,150,12,79.88424,120.79121,407,0,138,0,25
504921600000000000,truck_70,North,Albert,H-2,v2.0,1500,150,12,77.87592,164.70924,270,0,21,0,25
504921600000000000,truck_71,West,Seth,G-2000,v2.3,5000,300,19,72.75635,78.0365,391,0,32,0,25
504921600000000000,truck_73,North,Seth,F-150,v1.5,2000,200,15,37.67468,91.09732,489,0,103,0,25
504921600000000000,truck_74,North,Trish,H-2,v1.0,1500,150,12,41.4456,158.13897,206,0,79,0,25
504921600000000000,truck_75,South,Andy,F-150,v1.5,2000,200,15,4.11709,175.65994,378,0,176,0,25
504921600000000000,truck_66,South,Seth,G-2000,v2.0,5000,300,19,42.24286,151.8978,227,0,67,0,25
504921600000000000,truck_72,South,Andy,G-2000,v2.3,5000,300,19,82.46228,2.44504,487,0,39,0,25
504921600000000000,truck_76,South,Rodney,F-150,v2.3,2000,200,15,71.62798,121.89842,283,0,164,0,25
504921600000000000,truck_78,South,Seth,F-150,v2.0,2000,200,15,13.96218,39.04615,433,0,326,0,25
504921600000000000,truck_79,South,Andy,G-2000,v2.0,5000,300,19,56.54137,,46,0,127,0,25
504921600000000000,truck_81,West,Rodney,G-2000,v2.3,5000,300,19,59.42624,115.59744,68,0,296,0,25
504921600000000000,truck_83,South,Albert,F-150,v2.0,2000,200,15,49.20261,115.98262,449,0,132,0,25
504921600000000000,truck_84,West,Derek,H-2,v1.0,1500,150,12,70.16476,59.05399,301,0,134,0,25
504921600000000000,truck_85,West,Derek,G-2000,v1.0,5000,300,19,11.75251,142.86513,358,0,339,0,25
504921600000000000,truck_86,West,Rodney,G-2000,v1.0,5000,300,19,30.92821,127.53274,367,0,162,0,25
504921600000000000,truck_87,West,Rodney,H-2,v2.0,1500,150,12,32.86913,155.7666,122,0,337,0,25
504921600000000000,truck_88,West,Andy,G-2000,v1.5,5000,300,19,60.03367,9.5707,204,0,333,0,25
504921600000000000,truck_80,East,Andy,G-2000,v2.3,5000,300,,46.13937,137.42962,295,0,290,0,25
504921600000000000,truck_91,East,Derek,F-150,v2.0,2000,200,15,7.13401,52.78885,100,0,147,0,25
504921600000000000,truck_93,North,Derek,G-2000,v2.0,5000,300,19,11.46065,20.57173,242,0,148,0,25
504921600000000000,truck_94,North,Derek,F-150,v1.0,2000,200,15,59.53287,26.98247,427,0,341,0,25
504921600000000000,truck_95,East,Albert,G-2000,v2.0,5000,300,19,37.31513,134.40078,383,0,121,0,25
504921600000000000,truck_96,East,Albert,G-2000,v1.5,5000,300,19,15.78803,146.68255,348,0,189,0,25
504921600000000000,truck_97,South,Seth,F-150,v1.0,2000,200,15,14.08559,18.49763,369,0,34,0,25
504921600000000000,truck_98,South,Albert,G-2000,v1.5,5000,300,19,15.1474,71.85194,89,0,238,0,25
504921600000000000,truck_77,East,Trish,F-150,v2.0,2000,200,15,80.5734,17.68311,389,0,218,0,25
504921600000000000,truck_82,West,Derek,H-2,v2.0,1500,150,12,57.00976,90.13642,102,0,296,0,25
504921600000000000,truck_92,North,Derek,H-2,v1.0,1500,150,12,54.40335,153.5809,123,0,150,0,25
504921600000000000,truck_99,West,Trish,G-2000,v1.5,5000,300,19,62.73061,26.1884,309,0,202,0,25
504921610000000000,truck_1,South,Albert,F-150,v1.5,2000,200,15,72.45157,68.83919,259,0,180,2,27.5
504921610000000000,truck_2,North,Derek,F-150,v1.5,2000,200,15,24.5195,28.09369,434,6,302,0,22.1
504921610000000000,truck_3,East,Albert,F-150,v2.0,2000,200,15,18.107,98.66002,390,,190,0,21.2
504921610000000000,truck_4,West,Andy,G-2000,v1.5,5000,300,19,81.9438,56.12717,244,8,334,2,27.6
504921610000000000,truck_5,East,Seth,F-150,v2.0,2000,200,15,5.00695,114.50676,92,7,183,2,28.5
504921610000000000,truck_6,East,Trish,G-2000,v1.0,5000,300,19,41.59389,57.90166,403,0,149,0,22.7
504921610000000000,truck_7,East,Andy,H-2,v2.0,1500,150,12,7.74392,14.95756,,0,320,0,28.2
504921610000000000,truck_12,East,Trish,G-2000,v1.0,5000,300,19,36.03979,113.8752,34,0,293,1,26.3
504921610000000000,truck_13,West,Derek,H-2,v1.0,1500,150,12,14.07315,110.77235,150,0,72,,21.9
504921610000000000,truck_14,North,Albert,H-2,v1.0,1500,150,12,,105.76727,218,5,253,1,21.9
504921610000000000,truck_15,South,Albert,H-2,v1.5,1500,150,12,6.78254,166.86685,5,0,110,0,26.3
504921610000000000,truck_16,South,Albert,G-2000,v1.5,5000,300,19,51.16405,121.32556,445,0,294,3,29.9
504921610000000000,truck_17,West,Derek,H-2,v1.5,1500,150,12,8.12913,56.57343,9,0,6,4,29
504921610000000000,truck_18,West,Trish,F-150,v2.0,2000,200,15,67.15167,153.56094,260,1,239,1,23.3
504921610000000000,truck_19,West,Derek,G-2000,v1.5,5000,300,19,19.69456,139.49545,448,4,298,0,29.9
504921610000000000,truck_20,North,Rodney,G-2000,v2.0,5000,300,19,38.88968,65.86504,103,0,41,1,23.6
504921610000000000,truck_21,East,Derek,G-2000,v2.0,5000,300,19,81.88232,167.81287,345,0,326,0,20.8
504921610000000000,truck_0,South,Trish,H-2,v2.3,1500,150,12,52.32335,4.71786,128,9,225,0,25.8
504921610000000000,truck_22,West,Albert,G-2000,v1.5,5000,300,19,39.94345,16.02353,440,1,45,0,27.8
504921610000000000,truck_8,South,Seth,G-2000,v1.0,5000,300,19,21.89464,44.58628,402,0,234,0,20.3
504921610000000000,truck_23,South,Andy,F-150,v2.0,2000,200,15,73.28131,98.05635,201,7,277,0,25.3
504921610000000000,truck_24,West,Rodney,G-2000,v2.3,5000,300,19,22.19506,0.27702,217,0,321,2,29.5
504921610000000000,truck_9,South,Andy,H-2,v2.3,1500,150,12,,112.40429,402,9,75,4,29.5
504921610000000000,truck_26,South,Seth,F-150,v1.5,2000,200,15,45.65798,144.60844,59,1,183,0,21.7
504921610000000000,truck_27,West,Seth,G-2000,v1.5,5000,300,19,79.55699,97.86561,255,7,348,2,20.2
504921610000000000,truck_25,North,Trish,F-150,v2.0,2000,200,15,17.26506,16.91691,453,8,186,0,24.3
504921610000000000,truck_28,West,Rodney,G-2000,v1.5,5000,300,19,60.33272,4.61578,84,3,198,0,23.1
504921610000000000,truck_29,East,Rodney,G-2000,v2.0,5000,300,19,80.30331,146.54254,340,5,118,0,25.6
504921610000000000,truck_30,East,Andy,G-2000,v1.5,5000,300,19,29.62434,83.73246,300,0,270,4,22.3
504921610000000000,truck_33,West,Rodney,G-2000,v1.5,5000,300,19,89.19593,10.47733,403,8,170,0,29.6
504921610000000000,truck_36,North,Andy,G-2000,v1.0,5000,300,19,87.53087,45.07276,163,0,132,1,27.6
I expected the rows [][]byte contain the csv data row by row
As already suggested, you really should look to use encoding/csv.
That said, the reason for your issue is explained in the godoc above the Bytes() function:
// Bytes returns the most recent token generated by a call to Scan.
// The underlying array may point to data that will be overwritten
// by a subsequent call to Scan. It does no allocation.
func (s *Scanner) Bytes() []byte {
return s.token
}
So the returned byte slice may be modified by subsequent calls to Scan(). To avoid this, you'd need to make a copy of the byte slice, e.g.
for scanner.Scan() {
row := scanner.Bytes()
bs := make([]byte, len(row))
copy(bs, row)
b.Append(bs)
}
You need to create a copy of the data returned by Bytes.
https://pkg.go.dev/bufio#go1.19.3#Scanner.Bytes
Bytes returns the most recent token generated by a call to Scan. The underlying array may point to data that will be overwritten by a subsequent call to Scan. It does no allocation.
for scanner.Scan() {
row := make([]byte, len(scanner.Bytes()))
copy(row, scanner.Bytes())
b.Append(row)
}
https://go.dev/play/p/Lqot-wOXiwh

Is there a fast way to uniquely identify a function caller in Go?

As a background, I have a logging facility that wants to output filename and line number, as well as manage some other unique-per-caller information. In c++ this is relatively straightforward using static variables, but I'm having trouble coming up with an equivalent in Go.
I've come across runtime.Caller(), which will get me the PC of the caller (and thus uniquely identify it), but clocking in at ~500ns it's not a cost I want to pay on every invocation of every log statement!
As a really basic example of the behaviour I have
package main
import (
"fmt"
"runtime"
)
type Info struct {
file string
line int
// otherData
}
var infoMap = map[uintptr]Info{}
func Log(str string, args ...interface{}) {
pc, file, line, _ := runtime.Caller(1)
info, found := infoMap[pc]
if !found {
info = Info{file, line}
infoMap[pc] = info
// otherData would get generated here too
}
fmt.Println(info.file, info.line)
}
// client code
func foo() {
// do some work
Log("Here's a message with <> some <> args", "foo", "bar")
// do more work
Log("The cow jumped over the moon")
}
func main() {
foo()
foo()
}
This outputs
/tmp/foo/main.go 33
/tmp/foo/main.go 35
/tmp/foo/main.go 33
/tmp/foo/main.go 35
Now, runtime.Caller(1) is being evaluated on every call here. Ideally, it is evaluated once per statement.
Something like
func Log(str string, args ...interface{}) {
uniqueId = doSomethingFasterThanCaller()
info, found := infoMap[uniqueId]
if !found {
_, file, line, _ := runtime.Caller(1)
info = Info{file, line}
infoMap[pc] = info
// otherData would get generated here too
}
fmt.Println(info.file, info.line)
}
Or even something done from the caller that allows it to be uniquely identified without hardcoding ids.
func Log(uniqueId int, str string, args ...interface{}) {
info, found := infoMap[uniqueId]
if !found {
_, file, line, _ := runtime.Caller(1)
info = Info{file, line}
infoMap[uniqueId] = info
// otherData would get generated here too
}
fmt.Println(info.file, info.line)
}
func Foo() {
Log( uniqueIdGetter(), "message" )
Log( uniqueIdGetter(), "another message" )
// If I could get the offset from the beginning of the function
// to this line, something like this could work.
//
//Log((int)(reflect.ValueOf(Foo).Pointer()) + CODE_OFFSET, "message")
}
Is this something that can be done natively, or am I stuck with the cost of runtime.Caller (which does a bunch of extra work above-and-beyond just getting the pc, which is really all I need)?
Use a package-level variable to store a unique facility value for each function:
var fooFacility = &Facility{UniqeID: id, UsefulData: "the data"}
func foo() {
// do some work
fooFacility.Log( "Here's a message with <> some <> args", arg1, arg2 )
// do more work
fooFacility.Log( "The cow jumped over the moon" )
}
where Facility is something like:
type Facility struct {
UniqeID string
UsefulData string
once sync.Once
file string
line string
}
func (f *Facility) Log(fmt string, args ...interface{}) {
f.Once.Do(func() {
var ok bool
_, f.file, f.line, ok = runtime.Caller(2)
if !ok {
f.file = "???"
f.line = 0
}
})
// use file and line here
}
Use sync.Once to grab the file and line once.

Does go provide variable sanitization?

I am a beginner in Golang.
I have a problem with variable type assigning from user input.
When the user enters data like "2012BV352" I need to be able to ignore the BV and pass 2012352 to my next function.
There has a package name gopkg.in/validator.v2 in doc
But what it returns is whether or not the variable is safe or not.
I need to cut off the unusual things.
Any idea on how to achieve this?
You could write your own sanitizing methods and if it becomes something you'll be using more often, I'd package it out and add other methods to cover more use cases.
I provide two different ways to achieve the same result. One is commented out.
I haven't run any benchmarks so i couldn't tell you for certain which is more performant, but you could write your own tests if you wanted to figure it out. It would also expose another important aspect of Go and in my opinion one of it's more powerful tools... testing.
package main
import (
"fmt"
"log"
"regexp"
"strconv"
"strings"
)
// using a regex here which simply targets all digits and ignores everything else. I make it a global var and use MustCompile because the
// regex doesn't need to be created every time.
var extractInts = regexp.MustCompile(`\d+`)
func SanitizeStringToInt(input string) (int, error) {
m := extractInts.FindAllString(input, -1)
s := strings.Join(m, "")
return strconv.Atoi(s)
}
/*
// if you didn't want to use regex you could use a for loop
func SanitizeStringToInt(input string) (int, error) {
var s string
for _, r := range input {
if !unicode.IsLetter(r) {
s += string(r)
}
}
return strconv.Atoi(s)
}
*/
func main() {
a := "2012BV352"
n, err := SanitizeStringToInt(a)
if err != nil {
log.Fatal(err)
}
fmt.Println(n)
}

Compare Go stdout to file contents with testing package [duplicate]

I have a simple function I want to test:
func (t *Thing) print(min_verbosity int, message string) {
if t.verbosity >= minv {
fmt.Print(message)
}
}
But how can I test what the function actually sends to standard output? Test::Output does what I want in Perl. I know I could write all my own boilerplate to do the same in Go (as described here):
orig = os.Stdout
r,w,_ = os.Pipe()
thing.print("Some message")
var buf bytes.Buffer
io.Copy(&buf, r)
w.Close()
os.Stdout = orig
if(buf.String() != "Some message") {
t.Error("Failure!")
}
But that's a lot of extra work for every single test. I'm hoping there's a more standard way, or perhaps an abstraction library to handle this.
One thing to also remember, there's nothing stopping you from writing functions to avoid the boilerplate.
For example I have a command line app that uses log and I wrote this function:
func captureOutput(f func()) string {
var buf bytes.Buffer
log.SetOutput(&buf)
f()
log.SetOutput(os.Stderr)
return buf.String()
}
Then used it like this:
output := captureOutput(func() {
client.RemoveCertificate("www.example.com")
})
assert.Equal(t, "removed certificate www.example.com\n", output)
Using this assert library: http://godoc.org/github.com/stretchr/testify/assert.
You can do one of three things. The first is to use Examples.
The package also runs and verifies example code. Example functions may include a concluding line comment that begins with "Output:" and is compared with the standard output of the function when the tests are run. (The comparison ignores leading and trailing space.) These are examples of an example:
func ExampleHello() {
fmt.Println("hello")
// Output: hello
}
The second (and more appropriate, IMO) is to use fake functions for your IO. In your code you do:
var myPrint = fmt.Print
func (t *Thing) print(min_verbosity int, message string) {
if t.verbosity >= minv {
myPrint(message) // N.B.
}
}
And in your tests:
func init() {
myPrint = fakePrint // fakePrint records everything it's supposed to print.
}
func Test...
The third is to use fmt.Fprintf with an io.Writer that is os.Stdout in production code, but bytes.Buffer in tests.
You could consider adding a return statement to your function to return the string that is actually printed out.
func (t *Thing) print(min_verbosity int, message string) string {
if t.verbosity >= minv {
fmt.Print(message)
return message
}
return ""
}
Now, your test could just check the returned string against an expected string (rather than the print out). Maybe a bit more in-line with Test Driven Development (TDD).
And, in your production code, nothing would need to change, since you don't have to assign the return value of a function if you don't need it.

Golang - Why does this function change the input argument?

I'm learning Golang and I come from PHP background. I have a bit of trouble understanding some of the core functionalities at times.
Specifically, right now I'm building a Hearths game and I've created a CardStack type, that has some convenient methods one might use in a card stack (read: player's hand, discard pile...) such as DrawCards(...), AppendCards(...)...
The problem I have is that the function func (c* CardStack) DrawCards(cards []deck.Card) ([]deck.Card, error) {...} changes the argument cards []deck.Card and I cannot figure out why or how to avoid this.
This is my CardStack:
type CardStack struct {
cards []deck.Card
}
This is my DrawCards method:
func (c *CardStack) DrawCards(cards []deck.Card) ([]deck.Card, error) {
return c.getCardsSlice(cards, true)
}
// Returns cards that are missing
func (c *CardStack) getCardsSlice(cards []deck.Card, rm bool) ([]deck.Card, error) {
var err error
var returnc = []deck.Card{}
for _, card := range cards {
fmt.Println("BEFORE c.findCard(cards): ")
deck.PrintCards(cards) // In my example this will print out {Kc, 8d}, which is what I expect it to be
_, err = c.findCard(card, rm) // AFTER THIS LINE THE cards VAR IS CHANGED
fmt.Println("AFTER c.findCard(cards): ")
deck.PrintCards(cards) // In my example this will print out {8d, 8d}, which is not at all what I expected
if err != nil {
return returnc, err
}
}
return returnc, nil
}
// Expects string like "Ts" or "2h" (1. face 2. suit)
func (c *CardStack) findCard(cc deck.Card, rm bool) (deck.Card, error) {
for i, card := range c.GetCards() {
if cc == card {
return c.cardByIndex(i, rm)
}
}
return deck.Card{}, fmt.Errorf("Card not found")
}
func (c *CardStack) cardByIndex(n int, rm bool) (deck.Card, error) {
if n > len(c.GetCards()) {
return deck.Card{}, fmt.Errorf("Index out of bounds")
}
card := c.GetCards()[n]
if rm {
c.SetCards(append(c.GetCards()[:n], c.GetCards()[n+1:]...))
}
return card, nil
}
To explain a bit more - specifically the findCard(...) method that gets called in getCardsSlice messes with the original value (I've added comments to indicate where it happens).
If it's of any help, this is part of my main() method that I use for debugging:
// ...
ss, _ := cards.SubStack(1, 3) // ss now holds {Kc, 8d}
ss.Print() // Prints {Kc, 8d}
cards.Print() // Prints {5c, Kc, 8d} (assigned somewhere up in the code)
cards.DrawCards(ss) // Draws {Kc, 8d} from {5c, Kc, 8d}
cards.Print() // Prints {5c} - as expected
ss.Print() // Prints {8d, 8d} - ???
What am I doing wrong and how should I go about doing this.
Any kind of help is appreciated.
Edit:
The whole CardStack file: http://pastebin.com/LmhryfGc
Edit2:
I was going to put it on github sooner or later (was hoping after the code looks semi-ok), here it is - https://github.com/d1am0nd/hearths-go/tree/cardstack/redo
In your example, the value of cards in DrawCards is a sub-slice of the CardsStack.cards slice, which is referencing values in the same backing array.
When you call findCard and remove a card from the CardStack.cards slice, you are manipulating the same array that the cards argument is using.
When you want a copy of a slice, you need to allocate a new slice and copy each element. To do this in your example, you could:
ssCopy := make([]deck.Card, len(ss))
copy(ssCopy, ss)
cards.DrawCards(ssCopy)

Resources