Start cronjob at specific epoch time in golang - go

I am using the github.com/robfig/cron library. I want to run a cron job every second, starting from a specific epoch time with millisecond precision. By default the job fires at the 000 millisecond mark of each second; I need it to fire at the same millisecond offset as the start time.
For example if I take the following:
c := cron.New()
c.AddFunc("@every 1s", func() {
	// Do Something
})
c.Start()
And run it at the epoch timestamp 1657713890300, then I want the function to run at:
1657713891300
1657713892300
1657713893300.
Currently, the cron job runs at:
1657713891000
1657713892000
1657713893000.
Is this possible?

When you use @every 1s the library creates a ConstantDelaySchedule, which "rounds so that the next activation time will be on the second".
If that is not what you want then you can create your own scheduler (playground):
package main
import (
	"fmt"
	"time"
	"github.com/robfig/cron/v3"
)
func main() {
	time.Sleep(300 * time.Millisecond) // So we don't start cron too near the second boundary
	c := cron.New()
	c.Schedule(CustomConstantDelaySchedule{time.Second}, cron.FuncJob(func() {
		fmt.Println(time.Now().UnixNano())
	}))
	c.Start()
	time.Sleep(time.Second * 5)
}
// CustomConstantDelaySchedule is a copy of the library's ConstantDelaySchedule with the rounding removed
type CustomConstantDelaySchedule struct {
	Delay time.Duration
}
// Next returns the next time this should be run.
func (schedule CustomConstantDelaySchedule) Next(t time.Time) time.Time {
	return t.Add(schedule.Delay)
}
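To make the rounding concrete, here is a small sketch that uses only the standard library. It assumes the library's schedule subtracts the sub-second part of the current time from the delay, which is my reading of the quoted comment; roundedNextMs and plainNextMs are hypothetical helper names and are not part of robfig/cron.

```go
package main

import (
	"fmt"
	"time"
)

// roundedNextMs mimics a schedule that drops the sub-second part of "now",
// so the next activation lands exactly on a second boundary.
// (Assumption: this mirrors the rounding described for ConstantDelaySchedule.)
func roundedNextMs(nowMs, delayMs int64) int64 {
	now := time.UnixMilli(nowMs)
	delay := time.Duration(delayMs) * time.Millisecond
	return now.Add(delay - time.Duration(now.Nanosecond())).UnixMilli()
}

// plainNextMs adds the delay without any rounding, as CustomConstantDelaySchedule does.
func plainNextMs(nowMs, delayMs int64) int64 {
	return time.UnixMilli(nowMs).Add(time.Duration(delayMs) * time.Millisecond).UnixMilli()
}

func main() {
	const start = 1657713890300 // timestamp from the question
	fmt.Println(roundedNextMs(start, 1000)) // prints 1657713891000
	fmt.Println(plainNextMs(start, 1000))   // prints 1657713891300
}
```

The two results match the observed and the desired timestamps from the question.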
Follow up: The above uses the time.Time passed to Next, which is time.Now(), so the activation time will slowly drift later.
Addressing this is possible (see below - playground) but doing so introduces some potential issues (the CustomConstantDelaySchedule must not be reused, and if the jobs take too long to run you will still end up with discrepancies). I'd suggest that you consider moving away from the cron package and just use a time.Ticker.
package main
import (
	"fmt"
	"time"
	"github.com/robfig/cron/v3"
)
func main() {
	time.Sleep(300 * time.Millisecond) // So we don't start cron too near the second boundary
	c := cron.New()
	// Pass a pointer so that Next can update lastTarget between calls
	c.Schedule(&CustomConstantDelaySchedule{Delay: time.Second}, cron.FuncJob(func() {
		fmt.Println(time.Now().UnixNano())
	}))
	c.Start()
	time.Sleep(time.Second * 5)
}
// CustomConstantDelaySchedule is a copy of the library's ConstantDelaySchedule with the rounding removed
// Note that because this stores the last target time it cannot be reused!
type CustomConstantDelaySchedule struct {
	Delay      time.Duration
	lastTarget time.Time
}
// Next returns the next time this should be run.
// The pointer receiver is required: with a value receiver the update to lastTarget would be lost.
func (schedule *CustomConstantDelaySchedule) Next(t time.Time) time.Time {
	if schedule.lastTarget.IsZero() {
		schedule.lastTarget = t.Add(schedule.Delay)
	} else {
		schedule.lastTarget = schedule.lastTarget.Add(schedule.Delay)
	}
	return schedule.lastTarget
}
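The time.Ticker alternative suggested above might look like the following sketch. tickTimes and runEvery are hypothetical names chosen for illustration. As far as I know, a ticker fires relative to the moment it was created rather than on the second boundary, so the millisecond offset of the start time is kept.

```go
package main

import (
	"fmt"
	"time"
)

// tickTimes lists the n epoch-millisecond timestamps at which a job that
// starts at startMs and repeats every periodMs is expected to run.
func tickTimes(startMs, periodMs int64, n int) []int64 {
	out := make([]int64, 0, n)
	for i := 1; i <= n; i++ {
		out = append(out, startMs+int64(i)*periodMs)
	}
	return out
}

// runEvery calls job n times, once per period, measured from the moment
// the ticker is created (not from the second boundary).
func runEvery(period time.Duration, n int, job func(time.Time)) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for i := 0; i < n; i++ {
		job(<-ticker.C)
	}
}

func main() {
	// Expected schedule for the start timestamp used in the question.
	fmt.Println(tickTimes(1657713890300, 1000, 3))

	// Print the millisecond timestamp of three real ticks.
	runEvery(time.Second, 3, func(t time.Time) {
		fmt.Println(t.UnixMilli())
	})
}
```

If a job takes longer than the period, the ticker drops ticks instead of queueing them, so long-running jobs still need separate handling.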

Related

Go goroutine test failing Expected number of calls

I'm new to Go here. I am trying to test the function call inside my Go routine but it fails with the error message
Expected number of calls (8) does not match the actual number of calls (0).
My test code goes like:
package executor
import (
	"testing"
	"sync"
	"github.com/stretchr/testify/mock"
)
type MockExecutor struct {
	mock.Mock
	wg sync.WaitGroup
}
func (m *MockExecutor) Execute() {
	defer m.wg.Done()
}
func TestScheduleWorksAsExpected(t *testing.T) {
	scheduler := GetScheduler()
	executor := &MockExecutor{}
	scheduler.AddExecutor(executor)
	// Mock expectations
	executor.On("Execute").Return()
	// Function Call
	executor.wg.Add(8)
	scheduler.Schedule(2, 1, 4)
	executor.wg.Wait()
	executor.AssertNumberOfCalls(t, "Execute", 8)
}
and my application code is:
package executor
import (
	"sync"
	"time"
)
type Scheduler interface {
	Schedule(repeatRuns uint16, coolDown uint8, parallelRuns uint64)
	AddExecutor(executor Executor)
}
type RepeatScheduler struct {
	executor  Executor
	waitGroup sync.WaitGroup
}
func GetScheduler() Scheduler {
	return &RepeatScheduler{}
}
func (r *RepeatScheduler) singleRun() {
	defer r.waitGroup.Done()
	r.executor.Execute()
}
func (r *RepeatScheduler) AddExecutor(executor Executor) {
	r.executor = executor
}
func (r *RepeatScheduler) repeatRuns(parallelRuns uint64) {
	for count := 0; count < int(parallelRuns); count += 1 {
		r.waitGroup.Add(1)
		go r.singleRun()
	}
	r.waitGroup.Wait()
}
func (r *RepeatScheduler) Schedule(repeatRuns uint16, coolDown uint8, parallelRuns uint64) {
	for repeats := 0; repeats < int(repeatRuns); repeats += 1 {
		r.repeatRuns(parallelRuns)
		time.Sleep(time.Duration(coolDown))
	}
}
Could you point out what I could be doing wrong here? I'm using Go 1.16.3. When I debug my code, I can see the Execute() function being called, but testify is not able to register the function call.
You need to call Called() so that mock.Mock records the fact that Execute() has been called. As you are not worried about arguments or return values the following should resolve your issue:
func (m *MockExecutor) Execute() {
	defer m.wg.Done()
	m.Called()
}
However, the way your test is currently written it may not accomplish what you want. This is because:
You are calling executor.wg.Wait() (which waits until the function has been called the expected number of times) before calling executor.AssertNumberOfCalls, so your test will never complete if Execute() is called fewer than the expected number of times (wg.Wait() will block forever).
After m.Called() has been called the expected number of times there is a race condition (if the executor is still running, there is a race between executor.AssertNumberOfCalls and the next m.Called()).
If wg.Done() does get called an extra time you will get a panic (which I guess you could consider a fail!) but I'd probably simplify the test a bit:
scheduler.Schedule(2, 1, 4)
// Wait long enough that all executions are guaranteed to have completed
// (should be quick as Schedule waits for its goroutines to end)
time.Sleep(time.Millisecond)
executor.AssertNumberOfCalls(t, "Execute", 8)
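The same idea can be sketched without testify, using only the standard library: the mock has to record each call itself, and the count is checked only after the scheduling function has returned. countingExecutor and runParallel are hypothetical stand-ins for the MockExecutor and RepeatScheduler above and are not part of any library.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Executor is the interface the scheduler depends on.
type Executor interface {
	Execute()
}

// countingExecutor is a hand-written mock that records every call,
// which is the role m.Called() plays for a testify mock.
type countingExecutor struct {
	calls int64
}

func (c *countingExecutor) Execute() {
	atomic.AddInt64(&c.calls, 1)
}

// runParallel runs e.Execute parallelRuns times concurrently, repeats times,
// and returns only after every goroutine has finished.
func runParallel(e Executor, repeats, parallelRuns int) {
	for r := 0; r < repeats; r++ {
		var wg sync.WaitGroup
		for i := 0; i < parallelRuns; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				e.Execute()
			}()
		}
		wg.Wait()
	}
}

func main() {
	e := &countingExecutor{}
	runParallel(e, 2, 4)
	// Safe to read: runParallel has already waited for all goroutines.
	fmt.Println(atomic.LoadInt64(&e.calls)) // prints 8
}
```

Because the wait lives inside runParallel, the test needs neither its own WaitGroup nor a sleep before checking the count.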

Parallelism Problem on Cloud Dataflow using Go SDK

I have an Apache Beam pipeline implemented with the Go SDK, as shown below. The pipeline has 3 steps: textio.Read, CountLines, and ProcessLines. The ProcessLines step takes around 10 seconds; I replaced the real work with a Sleep call for the sake of brevity.
I am running the pipeline with 20 workers. My expectation was that the 20 workers would run in parallel: textio.Read would read 20 lines from the file and ProcessLines would perform 20 parallel executions in 10 seconds. However, the pipeline did not work like that. Currently textio.Read reads one line from the file, pushes it to the next step, and waits until ProcessLines completes its 10 seconds of work. There is no parallelism, and only one line from the file is in the pipeline at a time. Could you please clarify what I'm doing wrong? How should I update the code to achieve the parallelism described above?
package main
import (
	"context"
	"flag"
	"time"
	"github.com/apache/beam/sdks/go/pkg/beam"
	"github.com/apache/beam/sdks/go/pkg/beam/io/textio"
	"github.com/apache/beam/sdks/go/pkg/beam/log"
	"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)
// metrics to be monitored
var (
	input         = flag.String("input", "", "Input file (required).")
	numberOfLines = beam.NewCounter("extract", "numberOfLines")
	lineLen       = beam.NewDistribution("extract", "lineLenDistro")
)
func countLines(ctx context.Context, line string) string {
	lineLen.Update(ctx, int64(len(line)))
	numberOfLines.Inc(ctx, 1)
	return line
}
func processLines(ctx context.Context, line string) {
	time.Sleep(10 * time.Second)
}
func CountLines(s beam.Scope, lines beam.PCollection) beam.PCollection {
	s = s.Scope("Count Lines")
	return beam.ParDo(s, countLines, lines)
}
func ProcessLines(s beam.Scope, lines beam.PCollection) {
	s = s.Scope("Process Lines")
	beam.ParDo0(s, processLines, lines)
}
func main() {
	// If beamx or Go flags are used, flags must be parsed first.
	flag.Parse()
	// beam.Init() is an initialization hook that must be called on startup. On
	// distributed runners, it is used to intercept control.
	beam.Init()
	// Input validation is done as usual. Note that it must be after Init().
	if *input == "" {
		log.Fatal(context.Background(), "No input file provided")
	}
	p := beam.NewPipeline()
	s := p.Root()
	l := textio.Read(s, *input)
	lines := CountLines(s, l)
	ProcessLines(s, lines)
	// Concept #1: The beamx.Run convenience wrapper allows a number of
	// pre-defined runners to be used via the --runner flag.
	if err := beamx.Run(context.Background(), p); err != nil {
		log.Fatalf(context.Background(), "Failed to execute job: %v", err.Error())
	}
}
EDIT:
After I got the answer that the problem might be caused by fusion, I changed the related part of the code, but it still did not work as expected.
Now the first and second steps run in parallel, but the third step, ProcessLines, does not. I only made the following changes. Can anyone tell me what the problem is?
func AddRandomKey(s beam.Scope, col beam.PCollection) beam.PCollection {
	return beam.ParDo(s, addRandomKeyFn, col)
}
func addRandomKeyFn(elm beam.T) (int, beam.T) {
	return rand.Int(), elm
}
func countLines(ctx context.Context, _ int, lines func(*string) bool, emit func(string)) {
	var line string
	for lines(&line) {
		lineLen.Update(ctx, int64(len(line)))
		numberOfLines.Inc(ctx, 1)
		emit(line)
	}
}
func processLines(ctx context.Context, _ int, lines func(*string) bool) {
	var line string
	for lines(&line) {
		time.Sleep(10 * time.Second)
		numberOfLinesProcess.Inc(ctx, 1)
	}
}
func CountLines(s beam.Scope, lines beam.PCollection) beam.PCollection {
	s = s.Scope("Count Lines")
	keyed := AddRandomKey(s, lines)
	grouped := beam.GroupByKey(s, keyed)
	return beam.ParDo(s, countLines, grouped)
}
func ProcessLines(s beam.Scope, lines beam.PCollection) {
	s = s.Scope("Process Lines")
	keyed := AddRandomKey(s, lines)
	grouped := beam.GroupByKey(s, keyed)
	beam.ParDo0(s, processLines, grouped)
}
Many advanced runners of MapReduce-type pipelines fuse stages that can be run in memory together. Apache Beam and Dataflow are no exception.
What's happening here is that the three steps of your pipeline are fused and run on the same machine. Furthermore, the Go SDK unfortunately does not currently support splitting the Read transform.
To achieve parallelism in the third transform, you can break the fusion between Read and ProcessLines. You can do that by adding a random key to your lines, followed by a GroupByKey transform.
In Python, it would be:
(p | beam.io.ReadFromText(...)
   | CountLines()
   | beam.Map(lambda x: (random.randint(0, 1000), x))
   | beam.GroupByKey()
   | beam.FlatMap(lambda kv: kv[1])  # Discard the key, and return the values
   | ProcessLines())
This would allow you to parallelize ProcessLines.
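As a rough, Beam-free illustration of why keying and grouping helps: once elements are spread over several keys, each group can be handed to a different worker. The sketch below only mimics that idea with goroutines. shardByKey and processShards are hypothetical helpers, and nothing here reflects how Dataflow actually schedules work.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// shardByKey assigns each line a random key in [0, shards) and groups the
// lines by that key, loosely mimicking "add random key" + GroupByKey.
func shardByKey(lines []string, shards int) map[int][]string {
	groups := make(map[int][]string)
	for _, line := range lines {
		k := rand.Intn(shards)
		groups[k] = append(groups[k], line)
	}
	return groups
}

// processShards handles every group in its own goroutine and returns the
// total number of lines processed.
func processShards(groups map[int][]string, process func(string)) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for _, group := range groups {
		wg.Add(1)
		go func(group []string) {
			defer wg.Done()
			for _, line := range group {
				process(line)
			}
			mu.Lock()
			total += len(group)
			mu.Unlock()
		}(group)
	}
	wg.Wait()
	return total
}

func main() {
	lines := []string{"a", "b", "c", "d", "e", "f"}
	groups := shardByKey(lines, 3)
	n := processShards(groups, func(string) {})
	fmt.Println(len(groups) <= 3, n) // prints "true 6"
}
```

In this sketch the key range bounds how many groups can exist, so it also bounds the achievable parallelism.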

Why is the "infinite" for loop not processed?

I need to wait until x.Addr is updated, but it seems the for loop is not run. I suspect this is due to the Go scheduler, and I'm wondering why it works this way and whether there is any way I can fix it (without channels).
package main
import "fmt"
import "time"
type T struct {
	Addr *string
}
func main() {
	x := &T{}
	go update(x)
	for x.Addr == nil {
		if x.Addr != nil {
			break
		}
	}
	fmt.Println("Hello, playground")
}
func update(x *T) {
	time.Sleep(2 * time.Second)
	y := ""
	x.Addr = &y
}
There are two (three) problems with your code.
First, you are right that there is no point in the loop at which you give control to the scheduler, and as such it can't execute the update goroutine. To fix this you can set GOMAXPROCS to something bigger than one so that multiple goroutines can run in parallel.
(However, as it is this won't help, because you pass x by value to the update function, which means that the main goroutine will never see the update on x. To fix this problem you have to pass x by pointer. Now obsolete as OP fixed the code.)
Finally, note that you have a data race on Addr as you are not using atomic loads and stores.
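One way to remove the data race without channels is to make Addr an atomic pointer. The sketch below assumes Go 1.19 or later for atomic.Pointer and changes the field type accordingly. waitForAddr is a hypothetical helper, and runtime.Gosched is called inside the loop so the spinning goroutine yields to others.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// T holds Addr as an atomic pointer so loads and stores are race-free.
// (Assumption: changing the field type is acceptable for the caller.)
type T struct {
	Addr atomic.Pointer[string]
}

// update stores a new address after the given delay.
func update(x *T, delay time.Duration) {
	time.Sleep(delay)
	y := ""
	x.Addr.Store(&y)
}

// waitForAddr spins until Addr is set, yielding on every iteration.
func waitForAddr(x *T) *string {
	for {
		if p := x.Addr.Load(); p != nil {
			return p
		}
		runtime.Gosched()
	}
}

func main() {
	x := &T{}
	go update(x, 100*time.Millisecond)
	waitForAddr(x)
	fmt.Println("Hello, playground")
}
```

Spinning like this still burns CPU while waiting, which is the usual argument for a channel or sync.Cond when the no-channels constraint can be relaxed.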

Golang Revel Job spec every 1st monday on every month

I'm using golang revel and I need a job to run on the first Monday of every month. A Quartz cron spec for that would look like this:
0 0 0 ? 1/1 MON#1
But robfig/cron doesn't accept a spec like that, and hence neither does revel/jobs.
Does anyone know how I can solve that [using revel jobs]?
To me, the easiest solution would be something like this:
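func (e SomeStruct) Run() {
	t := time.Now().Local()
	dayNum := t.Day() // Day returns a single int (the day of the month)
	if dayNum <= 7 {
		fmt.Println("Hello, playground")
	}
}
func init() {
	revel.OnAppStart(func() {
		// Spec fields: seconds minutes hours day-of-month month day-of-week
		jobs.Schedule("0 0 0 * * 1", SomeStruct{})
	})
}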
Here you simply run the job EVERY Monday, but in the job itself check whether it's the FIRST Monday before you actually do anything. There may be a better way (I'm not very familiar with Revel), but from glancing through how their jobs work this should work, and it's not like it will be a performance issue.
To check for the first Monday in the month:
package main
import (
	"fmt"
	"time"
)
func IsFirstMonday() bool {
	t := time.Now().Local()
	if d := t.Day(); 1 <= d && d <= 7 {
		if wd := t.Weekday(); wd == time.Monday {
			return true
		}
	}
	return false
}
func main() {
	fmt.Println(IsFirstMonday())
}
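For testing it can help to pass the date in instead of reading the clock inside the function. A sketch of that variant follows. isFirstMondayOn is a hypothetical name, and the sample dates are taken from August 2022.

```go
package main

import (
	"fmt"
	"time"
)

// isFirstMondayOn reports whether the given calendar date (in UTC) is the
// first Monday of its month: a Monday that falls on day 1 through 7.
func isFirstMondayOn(year, month, day int) bool {
	t := time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
	return t.Weekday() == time.Monday && t.Day() <= 7
}

func main() {
	fmt.Println(isFirstMondayOn(2022, 8, 1)) // Monday 1 August 2022: true
	fmt.Println(isFirstMondayOn(2022, 8, 8)) // second Monday of the month: false
	fmt.Println(isFirstMondayOn(2022, 8, 2)) // a Tuesday: false
}
```

A job's Run method could call this with the fields of time.Now() and return early when it yields false.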

How to identify the stack size of goroutine?

I know a goroutine can perform a few blocking actions; I wonder whether a goroutine can call a user-defined blocking function like a regular function. A user-defined blocking function has a few steps, like step1, step2.
In other words, I would like to find out whether we can have nested blocking calls in a goroutine.
UPDATE:
The original intention was to find the stack size used by a goroutine, especially with nested blocking calls. Sorry for the confusion. Thanks to the answer and comments, I created the following program that starts 100,000 goroutines; it took 782MB of virtual memory and 416MB of resident memory on my Ubuntu desktop. It evens out to be 78KB of memory for each goroutine stack. Is this a correct statement?
package main
import (
	"fmt"
	"time"
)
func f(a int) {
	x := f1(a)
	f2(x)
}
func f1(a int) int {
	r := step("1a", a)
	r = step("1b", r)
	return 1000 * a
}
func f2(a int) {
	r := step("2a", a)
	r = step("2b", r)
}
func step(a string, b int) int {
	fmt.Printf("%s %d\n", a, b)
	time.Sleep(1000 * time.Second)
	return 10 * b
}
func main() {
	for i := 0; i < 100000; i++ {
		go f(i)
	}
	//go f(20)
	time.Sleep(1000 * time.Second)
}
I believe you're right, though I'm unsure of the relationship between "virtual" and "resident" memory; it's possible there's some overlap.
Some things to consider: it appears you're running 100,000 goroutines, not 10,000.
The stack itself might contain things like the strings used for the printfs, method parameters, etc.
As of Go 1.2 the default stack size (per goroutine) is 8KB, which may explain some of it.
As of Go 1.3 it also uses an exponentially increasing stack size, but I doubt that's the problem you're running into.
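To sanity-check such numbers, the runtime can report how much memory is currently used for stacks. The following sketch parks n goroutines and divides the growth in runtime.MemStats.StackInuse by n. stackPerGoroutine is a hypothetical helper, and the result varies with the Go version and platform. As plain arithmetic, 782MB spread over 100,000 goroutines is roughly 8KB each, whereas 78KB would correspond to 10,000 goroutines.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackPerGoroutine starts n goroutines that block on a channel and returns
// the average growth of in-use stack memory per goroutine, in bytes.
func stackPerGoroutine(n int) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	stop := make(chan struct{})
	var started sync.WaitGroup
	started.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			started.Done()
			<-stop // stay parked so the stack remains allocated
		}()
	}
	started.Wait()

	runtime.ReadMemStats(&after)
	close(stop)

	if after.StackInuse <= before.StackInuse {
		return 0 // no growth observed (for example, stacks were reused)
	}
	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	fmt.Printf("about %d bytes of stack per goroutine\n", stackPerGoroutine(10000))
}
```

This measures only stack memory, so it will not match the virtual or resident figures reported by the operating system, which also include the heap and runtime overhead.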
Short answer: yes.
A goroutine is a "lightweight thread", which means it can do stuff independently from other code in your program. It's almost as if you started a new program, but you can communicate with your other code using the constructs Go provides (channels, locks, etc.).
P.S. Once the main function ends, all goroutines are killed (that's why you need the time.Sleep() in the example).
Here's a quick example (it won't run in the Go playground because of its constraints):
package main
import (
	"fmt"
	"time"
)
func saySomething(a, b func()) {
	a()
	b()
}
func foo() {
	fmt.Println("foo")
}
func bar() {
	fmt.Println("bar")
}
func talkForAWhile() {
	for {
		saySomething(foo, bar)
	}
}
func main() {
	go talkForAWhile()
	time.Sleep(1 * time.Second)
}
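A variant that waits for the goroutine explicitly instead of sleeping in main could look like this. It is only a sketch: step, inner, outer, and the 10 ms pauses are made up for illustration, and time.Sleep stands in for any blocking call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// step blocks briefly, standing in for any blocking operation.
func step(name string, log *[]string) {
	time.Sleep(10 * time.Millisecond)
	*log = append(*log, name)
}

// inner and outer nest blocking calls, just as ordinary functions would.
func inner(log *[]string) {
	step("inner-1", log)
	step("inner-2", log)
}

func outer(log *[]string) {
	step("outer-1", log)
	inner(log)
}

// runNested runs outer in a goroutine and waits for it with a WaitGroup.
func runNested() []string {
	var (
		wg  sync.WaitGroup
		log []string
	)
	wg.Add(1)
	go func() {
		defer wg.Done()
		outer(&log)
	}()
	wg.Wait() // after Wait returns, reading log is safe
	return log
}

func main() {
	fmt.Println(runNested()) // prints [outer-1 inner-1 inner-2]
}
```

The nested calls block only the goroutine that makes them, and main resumes as soon as that goroutine signals completion.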
