I'm implementing a Cadence Workflow that needs to call functions with context.Context parameters. How do I go about getting a context.Context from the workflow.Context? Is it just a matter of ctx.(*context.Context)?
It is not context.Context.
You should never write workflow code that uses context.Context at all. For determinism, every call that needs a context.Context should be made inside an activity or a local activity.
In other words, Workflow code should only contain logic to orchestrate/manage other workflow entities like activities/childWF/Signal/etc.
workflow.Context is a special data structure that the worker uses to pass workflow runtime information, such as the workflowID and runID, into workflow code during execution. It is called Context only because that resembles the usual Go style. Other than that, it is not directly related to context.Context.
In the Java client there is no workflow.Context; the worker passes this data through a ThreadLocal instead.
If you really want to pass through some KV data from external to workflow code, you can use context propagation: https://github.com/uber-common/cadence-samples/tree/master/cmd/samples/recipes/ctxpropagation
Is there a good way to orchestrate lambda functionality that changes based on a queue message? I was thinking about taking a similar approach described in the strategy pattern.
The lambda function is polling an SQS queue. The queue message would contain some context that is passed into a lambda telling it what workflow needs to be executed. Based on this message, the lambda would execute some corresponding script.
The idea behind this is that I can write code for different ad hoc jobs and use the same queue + lambda function for these jobs but have it delegate the work. This way, I can track unsuccessful jobs in a dead letter queue. Are there any red flags here or potential pitfalls I should be aware of when you hear this? Any advice would be appreciated. TIA!
EDIT: For some additional context, the different workflows triggered by this lambda will vary in the compute resources they need. An example is ingesting a large dataset from an API call and doing some custom schematization on the contents before making an API call.
This is indeed possible, but there's a variety of approaches you may take. These depend on what type of workflow/processing you require.
As you highlight, Lambda could be used for this. It's worth noting that Lambda functions do not work well for computationally-intensive tasks.
If you were looking to perform a workflow with some complexity, you should consider AWS Step Functions. Suppose you had three "tasks" to choose from, you could define a Step Function for each, then use Lambda to (1.) receive the message & work out which task is required, then (2.) start an execution for the desired Step Function.
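The "work out which task is required, then start it" step can be sketched as a small dispatcher, which is also where the strategy pattern from the question fits. This is only a sketch under assumptions: the message shape (`job` and `payload` fields), the job names, and the handler bodies are all hypothetical, and in a real setup each handler would start the matching Step Function execution instead of returning a string.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// jobMessage is a hypothetical shape for the queue message body.
type jobMessage struct {
	Job     string          `json:"job"`
	Payload json.RawMessage `json:"payload"`
}

// handler is the "strategy": one function per kind of job.
type handler func(payload json.RawMessage) (string, error)

// handlers maps a job name to its strategy. The names are illustrative only.
var handlers = map[string]handler{
	"ingest": func(p json.RawMessage) (string, error) {
		return "started ingest with " + string(p), nil
	},
	"report": func(p json.RawMessage) (string, error) {
		return "started report with " + string(p), nil
	},
}

var errUnknownJob = errors.New("unknown job")

// dispatch decodes the message body and delegates to the matching handler.
// Returning an error leaves the retry / dead-letter decision to the queue.
func dispatch(body string) (string, error) {
	var msg jobMessage
	if err := json.Unmarshal([]byte(body), &msg); err != nil {
		return "", fmt.Errorf("decode message: %w", err)
	}
	h, ok := handlers[msg.Job]
	if !ok {
		return "", fmt.Errorf("%w: %q", errUnknownJob, msg.Job)
	}
	return h(msg.Payload)
}

func main() {
	out, err := dispatch(`{"job":"ingest","payload":{"source":"api"}}`)
	fmt.Println(out, err)
}
```

Adding a new ad hoc job then means registering one more entry in the map; the queue and the Lambda entry point stay the same.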
FYI, you don't need to make your Lambda function poll the SQS queue. Instead, you can set up SQS to trigger Lambda automatically when a new message is added to the queue. See AWS Docs - Configuring a queue to trigger an AWS Lambda function.
If you edit your question with more info on what you're looking to do (processing-wise) with each message, people will be able to better help with your use-case.
Best of luck! :)
Suppose I have an API, POST /order, which invokes a PlaceOrder lambda and expects a response from it. The PlaceOrder lambda does some work, then invokes another lambda, ProcessPayment, and expects a response. ProcessPayment in turn invokes a CreateInvoice lambda, again expecting a response. The whole architecture is a request-response cycle. I would like to achieve this without one lambda invoking another, as that is considered an anti-pattern. My question is: what is the best design pattern to achieve this behavior within 29 seconds with an event-driven architecture?
What AWS suggests: As per the official documentation, they suggest using SQS. But I have some thoughts about using SQS.
My thoughts:
With an event-source architecture, I can orchestrate these lambdas with SQS, SNS, or other event sources, but in that case the flow would not be synchronous, and so I would not get a response from the API.
My other solution:
Using Step Functions: I can orchestrate this workflow with a step function, and I think it is the more elegant solution for this synchronous calling case. But I would like to achieve this via event sources.
How can I design this scenario with best practices using an event-based architecture?
In an Event-Driven Architecture, the communication between producers and consumers is asynchronous by design, that's the way the architecture scales.
You can get nearly synchronous communication between two services in an EDA by creating dedicated queues/channels between them and making sure they are scaled to a level where the latency is acceptable (close to synchronous values).
This adds some complexity: the services that need responses have to wait in a hot loop to get them as soon as possible, and if messages are lost you need retry policies, etc.
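A minimal in-process sketch of that request/reply arrangement follows. Go channels stand in for the dedicated request and reply queues, and the correlation ID used to match a reply to its request is an assumption of this sketch (the answer above does not prescribe one); all type and function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// message carries a correlation ID so a reply can be matched to its request.
type message struct {
	CorrelationID string
	Body          string
}

var errTimeout = errors.New("no reply before deadline")

// paymentService stands in for a consumer that reads a dedicated request
// queue and writes to a dedicated reply queue. Both queues are channels here.
func paymentService(requests <-chan message, replies chan<- message) {
	for req := range requests {
		replies <- message{CorrelationID: req.CorrelationID, Body: "paid:" + req.Body}
	}
}

// call publishes a request and waits for the matching reply or a timeout.
func call(requests chan<- message, replies <-chan message, id, body string, timeout time.Duration) (string, error) {
	requests <- message{CorrelationID: id, Body: body}
	deadline := time.After(timeout)
	for {
		select {
		case rep := <-replies:
			if rep.CorrelationID == id {
				return rep.Body, nil
			}
			// A reply for another request: a real system would route it, not drop it.
		case <-deadline:
			return "", errTimeout // this is where a retry policy would kick in
		}
	}
}

// demo wires the pieces together and returns the reply body.
func demo() (string, error) {
	requests := make(chan message, 1)
	replies := make(chan message, 1)
	go paymentService(requests, replies)
	defer close(requests)
	return call(requests, replies, "order-42", "order-42", time.Second)
}

func main() {
	fmt.Println(demo())
}
```

The waiting loop and the timeout branch are the extra complexity mentioned above: the caller has to block for the reply and decide what to do when none arrives.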
I think you need to focus more on the mechanics of your program and a bit less on design patterns. You need to use the design patterns that fit your use-case, the other way around will not work. In the end, you build a program to fulfill a certain task or set of tasks, so that should be your end goal.
You’re stating that you have a process order Lambda, a create invoice Lambda and a process payment Lambda. I’d say the most interesting question is what you need to get done before you return a response to the user. Maybe you can process the order, respond to the user that it is done, and handle the invoicing and payments at a later moment. Typically that would mean you put a message in an SQS queue or on an SNS topic.
It could be that you need your payment to be processed before you respond to the user, because they need to be informed about the status of the payment. You could then combine both actions in a single Lambda, because there is no way to split the two tasks from one another. Keep in mind that there is often another option: you process the order first, put a message in a queue for the payment processing (as it typically is a process that involves a third party), and the front end polls for an update on the payment status. This way you can return a response quickly and still give an update on the payment as soon as possible.
The create invoice process is typically something you would never want to invoke synchronously during order confirmation. What if your invoicing application (internal or external) is down? Theoretically you could still process orders as long as you create the invoice at some later moment in time. If you couple everything together, you make order confirmation dependent on your invoice creation process, which I would regard as an unnecessary dependency.
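The "respond quickly, process the payment from a queue, let the front end poll" option could look roughly like this. It is a sketch under assumptions: a buffered channel stands in for the SQS queue, an in-memory map stands in for the order table, and the status strings are made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// store keeps the payment status per order; a real system would use a database.
type store struct {
	mu     sync.Mutex
	status map[string]string
}

func (s *store) set(orderID, st string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.status[orderID] = st
}

// get is what a status-polling endpoint would call.
func (s *store) get(orderID string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.status[orderID]
}

// placeOrder records the order, queues the payment, and returns immediately.
func placeOrder(s *store, queue chan<- string, orderID string) string {
	const pending = "PAYMENT_PENDING"
	s.set(orderID, pending)
	queue <- orderID // stand-in for sending a message to the payment queue
	return pending
}

// paymentWorker consumes queued orders and records the outcome.
func paymentWorker(s *store, queue <-chan string, done chan<- struct{}) {
	for orderID := range queue {
		s.set(orderID, "PAID") // the third-party payment call would go here
	}
	close(done)
}

// demo returns the status given to the user at order time and the status a
// later poll would see once the worker has handled the message.
func demo() (atAccept, afterWorker string) {
	s := &store{status: map[string]string{}}
	queue := make(chan string, 10)
	done := make(chan struct{})
	atAccept = placeOrder(s, queue, "order-1")
	go paymentWorker(s, queue, done)
	close(queue)
	<-done
	return atAccept, s.get("order-1")
}

func main() {
	fmt.Println(demo())
}
```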
I would really advise against step functions for this use case. They can be used for long-running processes that need to keep state and ‘wake up’ at specific moments, but for this specific flow I would say they do not help and are unnecessarily complex. If you have 3 things you need to do that you cannot separate from one another, just run them in the same Lambda.
I'm using gin-gonic as my HTTP handler. I want to prerender some graphical resources after my users make a POST request. For this, I added a middleware that assigns a function (with a timer inside) to a map[string]func() and calls that function directly after the assignment.
The problem is that when the user makes two subsequent requests, the function is called twice.
Is there any way to clear the function reference and/or its currently running call, like clearInterval or clearTimeout in JavaScript?
Thanks
No; whatever function you've scheduled to run as a goroutine needs to either return or call runtime.Goexit.
If you're looking for a way to build cancellation into your worker, Go provides a primitive to handle that, which is already part of any HTTP request - contexts. Check out these articles from the Go blog:
Concurrency Patterns: Context
Pipelines and cancellation
I suppose your rendering function is calling into a library, so you don't have control over the code where the bulk of the time is spent. If you do have such control, just pass a channel into the goroutine, periodically check if the channel is closed, and just return from the goroutine if that happens.
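As a rough sketch of that cancellation idea, the map from the question can hold a cancel function per key instead of the render function itself. A `context.Context` supplies the "channel that gets closed" via `Done()`. The `renderer` type, the key, and the `time.After` stand-in for the delayed rendering are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// renderer remembers the cancel function of the render running for each key.
type renderer struct {
	mu      sync.Mutex
	cancels map[string]context.CancelFunc
}

// start cancels any render already registered for key, then launches a new one.
// The returned channel receives the outcome of the render started by this call.
func (r *renderer) start(key string, work time.Duration) <-chan string {
	ctx, cancel := context.WithCancel(context.Background())

	r.mu.Lock()
	if prev, ok := r.cancels[key]; ok {
		prev() // roughly what clearTimeout does in JavaScript
	}
	r.cancels[key] = cancel
	r.mu.Unlock()

	result := make(chan string, 1)
	go func() {
		select {
		case <-time.After(work): // stand-in for the timer plus rendering
			result <- "rendered"
		case <-ctx.Done(): // closed when a newer request supersedes this one
			result <- "cancelled"
		}
	}()
	return result
}

// demo issues two requests for the same key; the second supersedes the first.
func demo() (first, second string) {
	r := &renderer{cancels: map[string]context.CancelFunc{}}
	a := r.start("user-1", time.Hour)
	b := r.start("user-1", 10*time.Millisecond)
	return <-a, <-b
}

func main() {
	fmt.Println(demo())
}
```

Note that this only works because the goroutine itself watches `ctx.Done()`; a goroutine that never checks cannot be stopped from outside. A fuller version would also remove finished entries from the map.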
But actually I would recommend a different, and simpler, solution: keep track (in a map) of the file names (or hashes) of the files that are currently being processed, and check that map before launching a second one.
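A minimal sketch of that bookkeeping, assuming a mutex-protected set keyed by file name (the `inFlight` type and its method names are made up for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// inFlight tracks which files are currently being rendered.
type inFlight struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

// tryStart reports whether the caller may start rendering key.
// It returns false when a render for key is already running.
func (f *inFlight) tryStart(key string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if _, busy := f.keys[key]; busy {
		return false
	}
	f.keys[key] = struct{}{}
	return true
}

// finish marks key as no longer being processed.
func (f *inFlight) finish(key string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.keys, key)
}

func main() {
	f := &inFlight{keys: map[string]struct{}{}}
	fmt.Println(f.tryStart("chart.png")) // true: first request starts the render
	fmt.Println(f.tryStart("chart.png")) // false: duplicate request is skipped
	f.finish("chart.png")
	fmt.Println(f.tryStart("chart.png")) // true: allowed again after finishing
}
```

In the middleware, the render goroutine would call `finish` (typically via `defer`) once it is done, so a later request for the same file can run again.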
I have a web application written in Go with multiple modules: one deals with everything database-related, one with reports, one contains all the web services, one handles just business logic and data integrity validation, and there are several others. So these modules cover numerous methods and functions.
Now the requirement is to use sessions in the web services, and we also need to use transactions in some APIs. The first approach that came to mind is to change the signatures of the existing methods to support a session and a transaction (*sql.Tx), which is a painful task but has to be done anyway. What worries me is this: if something else comes up in the future that needs to be passed through all these methods, will I have to go through this cycle and change the method signatures again? That does not seem like a good approach.
Later, I found that context.Context might be a good approach (you can suggest other approaches too!): every method takes a context parameter as its first argument, so I have to change the method signatures only once. If I go with this approach, can anyone tell me how I would set/pass multiple keys (session, sql.Tx) in that context object?
(AFAIK, the context package provides a WithValue function, but can I use it for multiple keys? How would I set a key in a nested function call? Is that even possible?)
Actually, this question has two parts:
Should I consider context.Context for my solution? If not, please shed some light on another approach.
How do I set multiple keys and values in context.Context?
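For your second question, you can group all your key/values in a struct as follows:
// ctxKey is an unexported key type, which avoids collisions with other packages.
type ctxKey struct{}

type vars struct {
	lock *sync.Mutex // a pointer, so the mutex is shared rather than copied
	db   *sql.DB
}
Then you can add this struct to the context:
ctx := context.WithValue(context.Background(), ctxKey{}, vars{lock: mylock, db: mydb})
And you can retrieve it:
ctxVars, ok := r.Context().Value(ctxKey{}).(vars)
if !ok {
	// The original snippet logged an undefined err; create one here instead.
	err := errors.New("vars not found in request context")
	log.Println(err)
	return err
}
db := ctxVars.db
lock := ctxVars.lock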
I hope it helps you.
Finally, I decided to go with the context package after studying the articles in the Go context experience reports. I found Dave Cheney's article especially helpful.
I could write my own context solution, as gorilla does (somewhat!). But since Go already has a solution for this, I will go with the context package.
Right now, I only need the session and the database transaction in each method, to support transactions (if one has begun) and user authentication and authorization.
Having context.Context in every method of the application might be overhead, because I don't need cancellation, deadlines, or timeouts at the moment, but it could be helpful in the future.
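On the "multiple keys" and "nested call" parts of the question: context.WithValue returns a new context that wraps its parent, so calls can be chained, one key per call, and a nested function can add a key by deriving another context and passing it further down. A small sketch, where `session` and `tx` are hypothetical stand-ins for the real session type and *sql.Tx, and the helper names are made up:

```go
package main

import (
	"context"
	"fmt"
)

// Unexported key types keep these values private to the package.
type sessionKey struct{}
type txKey struct{}

// session and tx are hypothetical stand-ins for the real session and *sql.Tx.
type session struct{ UserID string }
type tx struct{ ID int }

func withSession(ctx context.Context, s session) context.Context {
	return context.WithValue(ctx, sessionKey{}, s)
}

func withTx(ctx context.Context, t *tx) context.Context {
	return context.WithValue(ctx, txKey{}, t)
}

func sessionFrom(ctx context.Context) (session, bool) {
	s, ok := ctx.Value(sessionKey{}).(session)
	return s, ok
}

func txFrom(ctx context.Context) (*tx, bool) {
	t, ok := ctx.Value(txKey{}).(*tx)
	return t, ok
}

// saveOrder is the innermost call: it sees both values set by its callers.
func saveOrder(ctx context.Context) string {
	s, _ := sessionFrom(ctx)
	t, ok := txFrom(ctx)
	if !ok {
		return s.UserID + " without tx"
	}
	return fmt.Sprintf("%s in tx %d", s.UserID, t.ID)
}

// handle adds a second key further down the call chain by deriving a new context.
func handle(ctx context.Context) string {
	ctx = withTx(ctx, &tx{ID: 7})
	return saveOrder(ctx)
}

// demo sets the session at the outermost level, as a web handler might.
func demo() string {
	ctx := withSession(context.Background(), session{UserID: "alice"})
	return handle(ctx)
}

func main() {
	fmt.Println(demo()) // prints "alice in tx 7"
}
```

A value added in a nested call is visible only to that call's callees, not to its caller, because the caller still holds the parent context.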
In Go, if we have a type with a method that starts some looped mechanism (polling A and doing B forever) is it best to express this as:
// Run does stuff, you probably want to run this as a goroutine
func (t Type) Run() {
	// Do long-running stuff
}
and document that this probably wants to be launched as a goroutine (and let the caller deal with that)
Or to hide this from the caller:
// Run does stuff concurrently
func (t Type) Run() {
	go DoRunStuff()
}
I'm new to Go and unsure if convention says let the caller prefix with 'go' or do it for them when the code is designed to run async.
My current view is that we should document and give the caller a choice. My thinking is that in Go the concurrency isn't actually part of the exposed interface, but a property of using it. Is this right?
I shared your opinion on this until I started writing an adapter for a web service that I want to make concurrent. I have a goroutine that must be started to parse the results that the web calls return on a channel. There is absolutely no case in which this API would work without running it as a goroutine.
I then began to look at packages like net/http. There is mandatory concurrency within that package. It is documented at the interface level that it can be used concurrently, but the default implementations start goroutines automatically.
Because Go's standard library commonly fires off goroutines within its own packages, I think that if your package or API warrants it, you can handle them on your own.
My current view is that we should document and give the caller a choice.
I tend to agree with you.
Since Go makes it so easy to run code concurrently, you should try to avoid concurrency in your API (which forces clients to use it concurrently). Instead, create a synchronous API, and then clients have the option to run it synchronously or concurrently.
This was discussed in a talk a couple years ago: Twelve Go Best Practices
Slide 26, in particular, shows code more like your first example.
I would view the net/http package as an exception because in this case, the concurrency is almost mandatory. If the package didn't use concurrency internally, the client code would almost certainly have to. For example, http.Client doesn't (to my knowledge) start any goroutines. It is only the server that does so.
In most cases, it's going to be one line of the code for the caller either way:
go Run() or StartGoroutine()
The synchronous API is no harder to use concurrently and gives the caller more options.
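To illustrate, here is a sketch of a blocking Run that leaves the choice to the caller. The `Poller` type is hypothetical, and taking a `context.Context` for shutdown is an assumption of this sketch rather than something the question's code does.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Poller is a hypothetical type with a blocking Run method.
type Poller struct {
	Interval time.Duration
	ticks    int
}

// Run polls until ctx is cancelled. It blocks; the caller decides whether
// to prefix the call with "go".
func (p *Poller) Run(ctx context.Context) error {
	t := time.NewTicker(p.Interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-t.C:
			p.ticks++ // stand-in for "poll A, do B"
		}
	}
}

// demo shows the concurrent use: the caller adds the goroutine, keeps a way
// to stop it, and still receives Run's return value.
func demo() error {
	p := &Poller{Interval: time.Millisecond}
	ctx, cancel := context.WithCancel(context.Background())
	errc := make(chan error, 1)
	go func() { errc <- p.Run(ctx) }()
	time.Sleep(5 * time.Millisecond)
	cancel()
	return <-errc
}

func main() {
	fmt.Println(demo())
}
```

A caller that wants the synchronous behaviour simply calls `p.Run(ctx)` directly, which would not be possible if Run started the goroutine itself.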
There is no 'right' answer because circumstances differ.
Obviously there are cases where an API might contain utilities, simple algorithms, data collections etc that would look odd if packaged up as goroutines.
Conversely, there are cases where it is natural to expect 'under-the-hood' concurrency, such as a rich IO library (http server being the obvious example).
For a more extreme case, consider you were to produce a library of plug-n-play concurrent services. Such an API consists of modules each having a well-described interface via channels. Clearly, in this case it would inevitably involve goroutines starting as part of the API.
One clue might well be the presence or absence of channels in the function parameters. But I would expect clear documentation of what to expect either way.