I have written the following sample program using sync.RWMutex.
package main
import (
"fmt"
"sync"
"time"
)
// SessionData : capture session id and cc-request-number
type SessionData struct {
id string
reqNo string
}
// SessionCache : cache for the SessionData
type SessionCache struct {
sess map[SessionData]bool
sync.RWMutex
}
// InitSessionCache : Init for SessionCache
func InitSessionCache() SessionCache {
return SessionCache{sess: make(map[SessionData]bool)}
}
// Read : read value from session cache
func (s *SessionCache) Read(sd SessionData) bool {
s.RLock()
defer s.RUnlock()
_, found := s.sess[sd]
return found
}
func (s *SessionCache) Write(sd SessionData) {
s.Lock()
defer s.Unlock()
fmt.Println("Entry not found for ", sd.id, sd.reqNo, "Creating the entry now")
s.sess[sd] = true
}
func (s *SessionCache) chkDuplicate(sessionID string, Reqno string) bool {
sd := SessionData{
id: sessionID,
reqNo: Reqno,
}
found := s.Read(sd)
if !found {
s.Write(sd)
return found
}
return found
}
func main() {
mySessionData := InitSessionCache()
for i := 0; i < 10; i++ {
go mySessionData.chkDuplicate("session1", "1")
go mySessionData.chkDuplicate("session1", "1")
go mySessionData.chkDuplicate("session1", "2")
go mySessionData.chkDuplicate("session1", "2")
go mySessionData.chkDuplicate("session1", "4")
go mySessionData.chkDuplicate("session1", "2")
}
time.Sleep(300)
fmt.Println(mySessionData)
}
When I run this program in the playground (https://play.golang.org/p/g93UtVxZ2dl),
I see that it works correctly: the write happens only 3 times, once for each unique value.
Entry not found for session1 1 Creating the entry now
Entry not found for session1 2 Creating the entry now
Entry not found for session1 4 Creating the entry now
{map[{session1 1}:true {session1 2}:true {session1 4}:true] {{0 0} 0 0 0 0}}
However, when I run the same program on my Windows 10 machine (in VS Code), I see the following output.
Entry not found for session1 1 Creating the entry now
Entry not found for session1 2 Creating the entry now
Entry not found for session1 2 Creating the entry now
Entry not found for session1 2 Creating the entry now
Entry not found for session1 4 Creating the entry now
{map[{session1 1}:true {session1 2}:true {session1 4}:true] {{0 0} 0 0 0 0}}
Am I doing something wrong?
Why does this behave differently on my machine and in the playground?
There is no synchronisation between the calls to Read and Write. All your goroutines run concurrently. Imagine they all run up to this line and then yield to another goroutine:
found := s.Read(sd)
Every Read returns false because no goroutine has moved past this point yet. They then all move on to the next line believing that found == false, so they all perform s.Write(sd).
You need to perform the Read and Write without unlocking. Maybe something like:
func (s *SessionCache) TryWrite(sd SessionData) error {
s.Lock()
defer s.Unlock()
// The lookup and the insert happen under the same write lock.
if _, found := s.sess[sd]; found {
return fmt.Errorf("Entry already exists")
}
s.sess[sd] = true
return nil
}
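To see the check-and-insert idea end to end, here is a minimal sketch. The names NewSessionCache, CheckAndInsert and countInserts are made up for this illustration and are not part of the original program. It uses a plain sync.Mutex, because every call may end up writing, and a sync.WaitGroup in place of the sleep.

```go
package main

import (
	"fmt"
	"sync"
)

// SessionData mirrors the struct from the question.
type SessionData struct {
	id    string
	reqNo string
}

// SessionCache guards the map with a plain Mutex, since every call may write.
type SessionCache struct {
	mu   sync.Mutex
	sess map[SessionData]bool
}

// NewSessionCache is a hypothetical constructor; it returns a pointer so the
// lock is never copied.
func NewSessionCache() *SessionCache {
	return &SessionCache{sess: make(map[SessionData]bool)}
}

// CheckAndInsert reports whether sd was already present and inserts it if not.
// The lookup and the insert happen under one lock acquisition.
func (s *SessionCache) CheckAndInsert(sd SessionData) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.sess[sd] {
		return true
	}
	s.sess[sd] = true
	return false
}

// countInserts starts n goroutines per key and returns how many calls
// actually created an entry.
func countInserts(n int) int {
	cache := NewSessionCache()
	keys := []string{"1", "2", "4"}
	var wg sync.WaitGroup
	var mu sync.Mutex
	created := 0
	for i := 0; i < n; i++ {
		for _, k := range keys {
			wg.Add(1)
			go func(reqNo string) {
				defer wg.Done()
				if !cache.CheckAndInsert(SessionData{id: "session1", reqNo: reqNo}) {
					mu.Lock()
					created++
					mu.Unlock()
				}
			}(k)
		}
	}
	wg.Wait() // wait for every goroutine instead of sleeping
	return created
}

func main() {
	fmt.Println("entries created:", countInserts(10))
}
```

Each key is created exactly once regardless of scheduling. Note also that time.Sleep(300) in the question sleeps for 300 nanoseconds, which may not be long enough for all goroutines to finish; waiting on a WaitGroup avoids guessing a duration.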
I have an Apache Beam pipeline implemented with the Go SDK, shown below. The pipeline has 3 steps: textio.Read, CountLines, and ProcessLines. The ProcessLines step takes around 10 seconds; I replaced the real work with a Sleep call for the sake of brevity.
I am calling the pipeline with 20 workers. I expected the 20 workers to run in parallel: textio.Read would read 20 lines from the file and ProcessLines would perform 20 parallel executions in 10 seconds. However, the pipeline does not work like that. Currently textio.Read reads one line from the file, pushes it to the next step, and waits until ProcessLines completes its 10 seconds of work. There is no parallelism, and only one line from the file is in the pipeline at a time. Could you please clarify what I am doing wrong? How should I update the code to achieve the parallelism described above?
package main
import (
"context"
"flag"
"time"
"github.com/apache/beam/sdks/go/pkg/beam"
"github.com/apache/beam/sdks/go/pkg/beam/io/textio"
"github.com/apache/beam/sdks/go/pkg/beam/log"
"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)
// metrics to be monitored
var (
input = flag.String("input", "", "Input file (required).")
numberOfLines = beam.NewCounter("extract", "numberOfLines")
lineLen = beam.NewDistribution("extract", "lineLenDistro")
)
func countLines(ctx context.Context, line string) string {
lineLen.Update(ctx, int64(len(line)))
numberOfLines.Inc(ctx, 1)
return line
}
func processLines(ctx context.Context, line string) {
time.Sleep(10 * time.Second)
}
func CountLines(s beam.Scope, lines beam.PCollection) beam.PCollection {
s = s.Scope("Count Lines")
return beam.ParDo(s, countLines, lines)
}
func ProcessLines(s beam.Scope, lines beam.PCollection) {
s = s.Scope("Process Lines")
beam.ParDo0(s, processLines, lines)
}
func main() {
// If beamx or Go flags are used, flags must be parsed first.
flag.Parse()
// beam.Init() is an initialization hook that must be called on startup. On
// distributed runners, it is used to intercept control.
beam.Init()
// Input validation is done as usual. Note that it must be after Init().
if *input == "" {
log.Fatal(context.Background(), "No input file provided")
}
p := beam.NewPipeline()
s := p.Root()
l := textio.Read(s, *input)
lines := CountLines(s, l)
ProcessLines(s, lines)
// Concept #1: The beamx.Run convenience wrapper allows a number of
// pre-defined runners to be used via the --runner flag.
if err := beamx.Run(context.Background(), p); err != nil {
log.Fatalf(context.Background(), "Failed to execute job: %v", err.Error())
}
}
EDIT:
After I got the answer that the problem might be caused by fusion, I changed the related part of the code, but it still did not work.
Now the first and second steps run in parallel, but the third step, ProcessLines, still does not. I only made the following changes. Can anyone tell me what the problem is?
func AddRandomKey(s beam.Scope, col beam.PCollection) beam.PCollection {
return beam.ParDo(s, addRandomKeyFn, col)
}
func addRandomKeyFn(elm beam.T) (int, beam.T) {
return rand.Int(), elm
}
func countLines(ctx context.Context, _ int, lines func(*string) bool, emit func(string)) {
var line string
for lines(&line) {
lineLen.Update(ctx, int64(len(line)))
numberOfLines.Inc(ctx, 1)
emit(line)
}
}
func processLines(ctx context.Context, _ int, lines func(*string) bool) {
var line string
for lines(&line) {
time.Sleep(10 * time.Second)
numberOfLinesProcess.Inc(ctx, 1)
}
}
func CountLines(s beam.Scope, lines beam.PCollection) beam.PCollection {
s = s.Scope("Count Lines")
keyed := AddRandomKey(s, lines)
grouped := beam.GroupByKey(s, keyed)
return beam.ParDo(s, countLines, grouped)
}
func ProcessLines(s beam.Scope, lines beam.PCollection) {
s = s.Scope("Process Lines")
keyed := AddRandomKey(s, lines)
grouped := beam.GroupByKey(s, keyed)
beam.ParDo0(s, processLines, grouped)
}
Many advanced runners of MapReduce-type pipelines fuse stages that can be run in memory together. Apache Beam and Dataflow are no exception.
What's happening here is that the three steps of your pipeline are fused and run on the same machine. Furthermore, the Go SDK does not currently support splitting the Read transform, unfortunately.
To achieve parallelism in the third transform, you can break the fusion between Read and ProcessLines. You can do that by adding a random key to your lines, followed by a GroupByKey transform.
In Python, it would be:
(p | beam.ReadFromText(...)
| CountLines()
| beam.Map(lambda x: (random.randint(0, 1000), x))
| beam.GroupByKey()
| beam.FlatMap(lambda kv: kv[1]) # Discard the key, and return the values
| ProcessLines())
This would allow you to parallelize ProcessLines.
I have a map that I want to shard manually. The simplified code is:
const (
dictShardNum = 16
dictShardSize = 1 << 28
)
type shard struct {
mu sync.Mutex
m map[int64]uint32
}
type dict struct {
shards []shard
}
func newDict() *dict {
shards := make([]shard, 0, dictShardNum)
for i := 0; i < dictShardNum; i++ {
shards = append(shards, shard{ m: make(map[int64]uint32) })
}
return &dict{ shards }
}
func (d *dict) insert(n int64) uint32 {
shardNum := int(n % dictShardNum)
shard := d.shards[shardNum]
shard.mu.Lock()
defer shard.mu.Unlock()
tempID, ok := shard.m[n]
if !ok {
tempID = uint32(len(shard.m) + shardNum*dictShardSize)
shard.m[n] = tempID // fatal error: concurrent map writes
}
return tempID
}
When running this, I get fatal error: concurrent map writes at that line, even though I lock the mutex. I am not sure what is wrong with my code.
Package sync
import "sync"
type Mutex
A Mutex is a mutual exclusion lock. The zero value for a Mutex is an
unlocked mutex.
A Mutex must not be copied after first use.
Your code doesn't compile!
Playground: https://play.golang.org/p/6AwS0vOZfeP
25:18: undefined: n
30:24: undefined: n
33:11: undefined: n
If I change v int64 to n int64:
A Mutex must not be copied after first use.
$ go vet mutex.go
./mutex.go:26:11: assignment copies lock value to shard: command-line-arguments.shard contains sync.Mutex
$
Playground: https://play.golang.org/p/jExE-m11ny5
package main
import (
"sync"
)
const (
dictShardNum = 16
dictShardSize = 1 << 28
)
type shard struct {
mu sync.Mutex
m map[int64]uint32
}
type dict struct {
shards []shard
}
/// a newDict function
func (d *dict) insert(n int64) uint32 {
shardNum := int(n % dictShardNum)
shard := d.shards[shardNum]
shard.mu.Lock()
defer shard.mu.Unlock()
tempID, ok := shard.m[n]
if !ok {
tempID = uint32(len(shard.m) + shardNum*dictShardSize)
shard.m[n] = tempID // fatal error: concurrent map writes
}
return tempID
}
func main() {}
Command vet
Vet examines Go source code and reports suspicious constructs
Copying locks
Flag: -copylocks
Locks that are erroneously passed by value.
I think the answer is related to copying mutex values.
The dict should be:
type dict struct {
shards []*shard
}
If all shards are accessed via pointers, the mutex is never copied and the problem goes away. newDict must then append &shard{...} rather than shard{...}.
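For completeness, a minimal sketch of the pointer-based version, exercised from several goroutines. The fill helper is hypothetical and exists only to drive the example; the code also assumes n is non-negative, since a negative n would produce a negative shard index.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	dictShardNum  = 16
	dictShardSize = 1 << 28
)

type shard struct {
	mu sync.Mutex
	m  map[int64]uint32
}

type dict struct {
	shards []*shard // pointers, so each mutex is shared rather than copied
}

func newDict() *dict {
	shards := make([]*shard, 0, dictShardNum)
	for i := 0; i < dictShardNum; i++ {
		shards = append(shards, &shard{m: make(map[int64]uint32)})
	}
	return &dict{shards: shards}
}

func (d *dict) insert(n int64) uint32 {
	// Assumption: n >= 0, otherwise the index below would be negative.
	shardNum := int(n % dictShardNum)
	sh := d.shards[shardNum] // copies only the pointer, not the struct
	sh.mu.Lock()
	defer sh.mu.Unlock()
	tempID, ok := sh.m[n]
	if !ok {
		tempID = uint32(len(sh.m) + shardNum*dictShardSize)
		sh.m[n] = tempID
	}
	return tempID
}

// fill is a hypothetical driver: every worker inserts keys 0..total-1, and
// the function returns the number of distinct keys stored afterwards.
func fill(workers, total int) int {
	d := newDict()
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := 0; n < total; n++ {
				d.insert(int64(n))
			}
		}()
	}
	wg.Wait()
	count := 0
	for _, sh := range d.shards {
		count += len(sh.m)
	}
	return count
}

func main() {
	fmt.Println(fill(8, 1000))
}
```

Keeping shards as []shard and writing sh := &d.shards[shardNum] inside insert would also avoid the copy; which form to use is mostly a matter of taste.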
In Golang, is it possible to change a pointer parameter's value to something else?
For example,
func main() {
i := 1
test(&i)
}
func test(ptr interface{}) {
v := reflect.ValueOf(ptr)
fmt.Println(v.CanSet()) // false
v.SetInt(2) // panic
}
https://play.golang.org/p/3OwGYrb-W-
Is it possible to have test() change i to point to another value 2?
Not sure if this is what you were looking for,
but yes you can change a pointer's value to something else.
The code below will print 2 and 3:
package main
import (
"fmt"
)
func main() {
i := 1
testAsAny(&i)
fmt.Println(i)
testAsInt(&i)
fmt.Println(i)
}
func testAsAny(ptr interface{}) {
*ptr.(*int) = 2
}
func testAsInt(i *int) {
*i = 3
}
Here's how to set the value using the reflect package. The key point is to set the pointer's element, not the pointer itself.
func test(ptr interface{}) {
v := reflect.ValueOf(ptr).Elem()
v.SetInt(2)
}
Note that the reflect package is not needed for this specific example as shown in another answer.
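If the reflect route is needed (for example, when the concrete type is not known in advance), it may help to validate the argument before setting it. The setInt helper below is a hypothetical name used only for this sketch. It shows that the pointer value itself is not settable, while its element is.

```go
package main

import (
	"fmt"
	"reflect"
)

// setInt is a hypothetical helper: it sets *ptr to n when ptr is a non-nil
// pointer to a signed integer type, and returns an error otherwise.
func setInt(ptr interface{}, n int64) error {
	v := reflect.ValueOf(ptr)
	if v.Kind() != reflect.Ptr || v.IsNil() {
		return fmt.Errorf("expected a non-nil pointer, got %T", ptr)
	}
	e := v.Elem() // the value the pointer refers to; this one is addressable
	switch e.Kind() {
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		e.SetInt(n)
		return nil
	}
	return fmt.Errorf("cannot set a %s as an integer", e.Kind())
}

func main() {
	i := 1
	fmt.Println(reflect.ValueOf(&i).CanSet())        // the pointer value itself
	fmt.Println(reflect.ValueOf(&i).Elem().CanSet()) // the int it points to
	if err := setInt(&i, 2); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(i)
	fmt.Println(setInt(i, 3) != nil) // a non-pointer argument is rejected
}
```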
I'm trying to achieve parallel processing and communication over channels in Go.
What I am trying to do is process specific data in parallel and get the results in order; I introduced the type Chunk for that purpose (see below).
I create a new channel for each chunk being processed and keep the channels in a slice, expecting the results to be in order when I iterate over them afterwards.
A simplified version of my program is (https://play.golang.org/p/RVtDGgUVCV):
package main
import (
"fmt"
)
type Chunk struct {
from int
to int
}
func main() {
chunks := []Chunk{
Chunk{
from: 0,
to: 2,
},
Chunk{
from: 2,
to: 4,
},
}
outChannels := [](<-chan struct {
string
error
}){}
for _, chunk := range chunks {
outChannels = append(outChannels, processChunk(&chunk))
}
for _, outChannel := range outChannels {
for out := range outChannel {
if out.error != nil {
fmt.Printf("[ERROR] %s", out.error)
return
}
fmt.Printf("[STDOUT] %s", out.string)
}
}
}
func processChunk(c *Chunk) <-chan struct {
string
error
} {
outChannel := make(chan struct {
string
error
})
go func() {
outChannel <- struct {
string
error
}{fmt.Sprintf("from: %d to: %d\n", c.from, c.to), nil}
close(outChannel)
}()
return outChannel
}
The output I see is:
[STDOUT] from: 2 to: 4
[STDOUT] from: 2 to: 4
What I'd however expect to see would be:
[STDOUT] from: 0 to: 2
[STDOUT] from: 2 to: 4
What am I doing wrong here? I don't see it.
The trouble is in the very first for loop of main. When you use a for range loop, the loop variable (chunk here) is created once and is assigned a copy of each slice element per iteration. (This is the behaviour before Go 1.22; from Go 1.22 on, each iteration gets its own variable.)
When you call processChunk(&chunk), you are passing the address of this loop variable, and the value of this variable changes with each iteration. Thus processChunk ends up working on the last item in chunks, since that is what &chunk points to after the for loop finishes.
To fix this, use slice indexing:
for i := 0; i < len(chunks); i++ {
// pass chunk objects by indexing chunks
outChannels = append(outChannels, processChunk(&chunks[i]))
}
Fixed code: https://play.golang.org/p/A1_DtkncY_
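Another common way around the shared loop variable is to copy it inside the loop body. The sketch below is a reduced version of the program; describe and collect are hypothetical stand-ins for processChunk and the body of main, and the channel carries a plain string to keep things short.

```go
package main

import "fmt"

type Chunk struct {
	from int
	to   int
}

// describe stands in for processChunk: it formats the chunk in a goroutine
// and delivers the result on its own channel.
func describe(c *Chunk) <-chan string {
	out := make(chan string)
	go func() {
		out <- fmt.Sprintf("from: %d to: %d", c.from, c.to)
		close(out)
	}()
	return out
}

// collect starts one goroutine per chunk and reads the results in slice order.
func collect(chunks []Chunk) []string {
	var outs []<-chan string
	for _, chunk := range chunks {
		chunk := chunk // per-iteration copy; redundant from Go 1.22 on, harmless otherwise
		outs = append(outs, describe(&chunk))
	}
	var results []string
	for _, out := range outs {
		for s := range out {
			results = append(results, s)
		}
	}
	return results
}

func main() {
	for _, s := range collect([]Chunk{{from: 0, to: 2}, {from: 2, to: 4}}) {
		fmt.Println(s)
	}
}
```

Unlike indexing with &chunks[i], the copy means each goroutine works on its own Chunk value, so later changes to the slice are not visible to it.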
I'm trying to make use of the flag package. My whole issue is that I need to specify groups/multiple values for the same parameter.
For example I need to parse a command as below:
go run mycli.go -action first -point 10 -action second -point 2 -action 3rd -point something
I need to retrieve each group of action/point param. Is it possible?
package main
import (
"flag"
"fmt"
"strconv"
)
// Define a type named "intslice" as a slice of ints
type intslice []int
// Now, for our new type, implement the two methods of
// the flag.Value interface...
// The first method is String() string
func (i *intslice) String() string {
return fmt.Sprintf("%d", *i)
}
// The second method is Set(value string) error
func (i *intslice) Set(value string) error {
fmt.Printf("%s\n", value)
tmp, err := strconv.Atoi(value)
if err != nil {
*i = append(*i, -1)
} else {
*i = append(*i, tmp)
}
return nil
}
var myints intslice
func main() {
flag.Var(&myints, "i", "List of integers")
flag.Parse()
}
Ref: http://lawlessguy.wordpress.com/2013/07/23/filling-a-slice-using-command-line-flags-in-go-golang/
The flag package won't help you. Closest you'll get is the os package:
[jadekler#Jeans-MacBook-Pro:~/go/src]$ go run temp.go asdasd lkjasd -boom bam -hello world -boom kablam
[/var/folders/15/r6j3mdp97p5247bkkj94p4v00000gn/T/go-build548488797/command-line-arguments/_obj/exe/temp asdasd lkjasd -boom bam -hello world -boom kablam]
So, the first runtime flag key would be os.Args[1], the value would be os.Args[2], the next key would be os.Args[3], and so on.
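A rough sketch of that idea, walking the arguments as key/value pairs and starting a new group at each -action. The step type and parseSteps function are hypothetical names for this illustration, and the sketch assumes every flag is followed by exactly one value.

```go
package main

import (
	"fmt"
	"os"
)

// step is a hypothetical type pairing one -action with the -point after it.
type step struct {
	action string
	point  string
}

// parseSteps walks args as "-key value" pairs. Each -action starts a new
// group, and a -point is attached to the most recent group.
func parseSteps(args []string) ([]step, error) {
	var steps []step
	for i := 0; i < len(args); i += 2 {
		if i+1 >= len(args) {
			return nil, fmt.Errorf("flag %q has no value", args[i])
		}
		key, val := args[i], args[i+1]
		switch key {
		case "-action":
			steps = append(steps, step{action: val})
		case "-point":
			if len(steps) == 0 {
				return nil, fmt.Errorf("-point %q appears before any -action", val)
			}
			steps[len(steps)-1].point = val
		default:
			return nil, fmt.Errorf("unknown flag %q", key)
		}
	}
	return steps, nil
}

func main() {
	// os.Args[0] is the program name, so parsing starts at os.Args[1].
	steps, err := parseSteps(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, s := range steps {
		fmt.Printf("action=%s point=%s\n", s.action, s.point)
	}
}
```

Run with the command line from the question, this would yield three groups: first/10, second/2 and 3rd/something.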