How is the orchestration function suspended in Azure Durable Functions - async-await

I believe I conceptually understand what's going on in Azure Durable Functions. You have to start an Orchestration Function in which you can await Activities. When an activity is completed, the Orchestration Function starts from the top, but since the activity now has a result, the result is used instead of invoking it again.
This means the Orchestration Function 'goes to sleep'. I've been looking into the source of both Azure Durable Functions and the Durable Task Framework on GitHub, but I can't quite find the actual lines of code that handle the callback of the awaited tasks.
Can anyone point me in the right direction?
Thanks!

I'm no expert on the topic (I've been looking into how durable tasks work myself lately), but essentially, the 'sleep' is achieved by scheduling tasks. All durable tasks (orchestrators and activities) are triggered by queue messages, and orchestrations are replayed each time.
Coming to the code, most of this is done in the Durable Task Framework itself. For your specific query:
Orchestrator execution runs here, returning any pending orchestration actions.
The next steps (activities, timers, etc.) are scheduled here. The implementation of CompleteTaskOrchestrationWorkItemSync is in the providers, and for Durable Functions, it's Azure Storage Queues.
There is no real sleep; once the orchestration action is completed, another message with the response triggers the orchestration function, causing it to replay.
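To make the replay behaviour concrete, here is a minimal orchestrator sketch following the standard Durable Functions C# programming model (the function and activity names are made up for illustration):

    using System.Threading.Tasks;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Azure.WebJobs.Extensions.DurableTask;

    public static class HelloOrchestration
    {
        [FunctionName("HelloOrchestrator")]
        public static async Task<string> RunOrchestrator(
            [OrchestrationTrigger] IDurableOrchestrationContext context)
        {
            // First execution: this schedules the activity via a queue message.
            // The awaited task is not complete, so the method yields here, the
            // host releases the thread, and a TaskScheduled event is recorded.
            string result = await context.CallActivityAsync<string>("SayHello", "world");

            // When the activity's completion message arrives, the orchestrator
            // replays from the top. This time the await completes immediately
            // from the recorded history, so the activity is not invoked again.
            return result;
        }

        [FunctionName("SayHello")]
        public static string SayHello([ActivityTrigger] string name) => $"Hello {name}!";
    }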

Should we store Events in a database? (Event Driven Design)

We have several services that publish and subscribe to Domain Events. What we usually do is log events whenever we publish and whenever we process events. We basically use this to apply the choreography pattern.
We are not doing Event Sourcing in these systems, and there's no programmatic use for the events after publishing/processing. That's the main reason we opted not to store them in a durable container, like a database or event store.
Question is, are we missing some fundamental thing by doing this?
Is storing Events a must?
I consider queued messages as system messages, even if they represent some domain event in an event-driven architecture (pub/sub messaging).
There is absolutely no hard-and-fast rule about their storage. If you would like to keep them around you could have your messaging mechanism forward them to some auditing endpoint for storage and then remove them after some time (if necessary).
You are not missing anything fundamental by not storing them.
You're definitely not missing out on anything (but there is a catch), especially if the business has no need for it. An event-sourced system would definitely store all the events generated by the system in a database (or any other event store).
The main use of an event store is to be able to restore the system to its current state after a failure by replaying messages. To make this recovery process faster, we have snapshots.
In your case, since these events are only relevant until the process is completed, it would not make sense to store them, until you have a failure (this is the catch), especially in a distributed-transaction scenario.
What would I suggest?
Don't store the events themselves, but log the relevant details about them, and maybe use an ELK stack or Grafana to store these logs.
Use either the Saga pattern or the Routing Slip pattern in case of a distributed transaction, and log those as well.
In case a failure occurs while processing an event, put that event into an exception queue and handle it. If it's part of a distributed transaction, make sure the events either all share the same TransactionId or carry a CorrelationId, so you can look up the logs and save your system.
For reliably performing your business transactions in a distributed architecture, you somehow need to make sure that your events are published at least once.
So a service that publishes events needs to persist each event within the same transaction as the change that created it.
Considering you are publishing an event via infrastructure services (e.g. a messaging service), you cannot rely on it being available all the time.
Also, your own service instance could go down after persisting your newly created or changed aggregate, but before it had the chance to publish the event via, for instance, a messaging service.
Question is, are we missing some fundamental thing by doing this? Is storing Events a must?
It doesn't matter that you are not doing event sourcing. Unless it is okay from the business perspective to sometimes lose an event forever, you need to temporarily persist your event with your local transaction until it has been published.
You can look into the Transactional Outbox Pattern to achieve reliable event publishing.
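As a minimal sketch of the outbox idea, assuming a SQL database and hypothetical Orders and Outbox tables (the relay process that reads the outbox and publishes to the broker is left out):

    using System;
    using Microsoft.Data.SqlClient;

    public class OrderService
    {
        private readonly string _connectionString;
        public OrderService(string connectionString) => _connectionString = connectionString;

        public void PlaceOrder(Guid orderId, decimal total)
        {
            using var connection = new SqlConnection(_connectionString);
            connection.Open();
            using var tx = connection.BeginTransaction();

            // 1. Persist the business change.
            using (var cmd = new SqlCommand(
                "INSERT INTO Orders (Id, Total) VALUES (@id, @total)", connection, tx))
            {
                cmd.Parameters.AddWithValue("@id", orderId);
                cmd.Parameters.AddWithValue("@total", total);
                cmd.ExecuteNonQuery();
            }

            // 2. Persist the event in the SAME transaction; a separate relay
            //    later reads unpublished rows and pushes them to the broker.
            using (var cmd = new SqlCommand(
                "INSERT INTO Outbox (Id, Type, Payload, Published) VALUES (@id, @type, @payload, 0)",
                connection, tx))
            {
                cmd.Parameters.AddWithValue("@id", Guid.NewGuid());
                cmd.Parameters.AddWithValue("@type", "OrderPlaced");
                cmd.Parameters.AddWithValue("@payload", $"{{\"orderId\":\"{orderId}\"}}");
                cmd.ExecuteNonQuery();
            }

            // Both rows commit or neither does, so the event cannot be lost
            // even if the broker or this process fails right after the commit.
            tx.Commit();
        }
    }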
Note: Logging/tracking your events somehow for monitoring or later analyzing/reporting purpose is a different thing and has another motivation.

Need clarification on microservices

I need some clarifications on microservices.
1) As I understand it, only choreography needs event sourcing, and in choreography we use the publish/subscribe pattern. Also, we use a program like RabbitMQ to ensure communication between publisher and subscribers.
2) Orchestration does not use event sourcing. It uses the observer pattern and communicates directly with observers, so it doesn't need a bus/message broker (like RabbitMQ). And to coordinate the whole process in orchestration, we use the mediator pattern.
Is that correct?
In microservice orchestration, a centralized approach is followed for the execution of decisions and control, with the help of an orchestrator. The orchestrator has to communicate directly with the respective service, wait for the response, and decide based on that response, hence it is tightly coupled. It is more of a synchronous approach, with the business logic predominantly in the orchestrator, which takes ownership of sequencing with respect to the business logic. The orchestration approach typically follows a request/response type pattern, with point-to-point connections between the services.
In microservice choreography, a decentralized approach is followed, whereby there is more liberty, such that every microservice can execute its function independently; they are self-aware and do not require any instruction from a centralized entity. It is more of an asynchronous approach, with the business logic spread across the microservices, whereby every microservice listens to other services' events and makes its own decision whether to perform an action or not. Accordingly, the choreography approach relies on a message broker (publish/subscribe) for communication between the microservices, whereby each service observes the events in the system and acts on them autonomously.
TL;DR: Choreography is the one which doesn't need persistence of the status of the process; orchestration needs to keep the status of the process somewhere.
I think you got this somewhat mixed up with implementation details.
Orchestration is called such because there is a central process manager (sometimes called a saga, wrongly imho) which directs (read: orchestrates) operations across other services. In this pattern, the process manager directs actions to BCs, but needs to keep state on previous operations in order to undo, roll back, or take any corrective or reporting actions deemed necessary. This status can be held in an event stream, a normal-form DB, or even implicitly and in memory (as in a method executing requests one by one and undoing the previous ones on an error), if the outbound requests are done through web requests, for example. Please note that orchestrators may use synchronous, request-response communication (like making web requests). In that case the orchestrator still keeps state; it's just that this state is either implicit (the order of operations) or in memory. State still exists though, and if you want to achieve resiliency (to be able to recover from an exception or any catastrophic failure), you would again need to persist that state on disk so that you could recover.
Choreography is called such because the pieces of business logic doing the operations observe and respond to each other. So, for example, when service A does something, it raises an event which is observed by B to do a follow-up action, and so on and so forth, instead of having a process manager ask A, then ask B, etc. Choreography may or may not need persistence. This really depends on the corrective actions that the different services need to take.
As a practical example, let's say that on a purchase you want to reserve goods, take payment, then manifest a shipment with a courier service, then send an email to the recipient.
The order of the operations matters in both cases (because you want to be able to take corrective actions if possible), so we decide to take the payment after the manifestation with the courier.
With orchestration, we'd have a process manager (the PM), which is called when the user attempts to make a purchase, and the process would:
1. Call the Inventory service to reserve goods
2. Call the Courier Integration service to manifest the shipment with a carrier
3. Call the Payments service to take a payment
4. Send an email to the user that they're receiving their goods.
If the PM notices an error on step 4, the only corrective action is to retry sending the email, and then report. If there was an error during payment, the PM would directly call the Courier Integration service to cancel the shipment, then call Inventory to un-reserve the goods.
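To make the orchestration variant concrete, here is a minimal process-manager sketch with compensation on failure; the service interfaces are hypothetical stand-ins for whatever HTTP/RPC clients you would actually use:

    using System;

    public interface IInventoryService { void Reserve(Guid orderId); void Unreserve(Guid orderId); }
    public interface ICourierService   { void Manifest(Guid orderId); void Cancel(Guid orderId); }
    public interface IPaymentsService  { void TakePayment(Guid orderId); }
    public interface IEmailService     { void SendConfirmation(Guid orderId); }

    public class PurchaseProcessManager
    {
        private readonly IInventoryService _inventory;
        private readonly ICourierService _courier;
        private readonly IPaymentsService _payments;
        private readonly IEmailService _email;

        public PurchaseProcessManager(IInventoryService inventory, ICourierService courier,
            IPaymentsService payments, IEmailService email) =>
            (_inventory, _courier, _payments, _email) = (inventory, courier, payments, email);

        public void Handle(Guid orderId)
        {
            _inventory.Reserve(orderId);        // step 1
            _courier.Manifest(orderId);         // step 2
            try
            {
                _payments.TakePayment(orderId); // step 3
            }
            catch
            {
                // The PM's state (the order of previous operations) tells it
                // which compensating actions to run, in reverse order.
                _courier.Cancel(orderId);
                _inventory.Unreserve(orderId);
                throw;
            }
            _email.SendConfirmation(orderId);   // step 4: on failure, retry and report
        }
    }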
With choreography, what would happen is:
1. An OrderMade event is raised and observed by all services that need its data
2. Inventory handles the OrderMade event and raises OrderReserved
3. CourierIntegration handles the OrderReserved event and raises ShipmentManifested
4. The Payments service handles ShipmentManifested and on success raises PaymentMade
5. The email service handles PaymentMade and sends a notification.
The rollback would be the opposite of the above process. If the Payments service raised an error, Courier Integration would handle it and raise a ShipmentCancelled event, which in turn is handled by Inventory to raise OrderUnreserved, which in turn may be handled by the email service to send a notification.
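As an illustrative sketch of one choreography participant, using EasyNetQ for pub/sub (the event types, subscription id, and connection string are hypothetical; older EasyNetQ versions use bus.Subscribe/bus.Publish directly instead of bus.PubSub):

    using System;
    using EasyNetQ;

    public class OrderReserved { public Guid OrderId { get; set; } }
    public class ShipmentManifested { public Guid OrderId { get; set; } public string Tracking { get; set; } }

    public static class CourierIntegrationService
    {
        public static void Main()
        {
            using var bus = RabbitHutch.CreateBus("host=localhost");

            // No central coordinator: this service reacts to Inventory's event
            // and raises its own event for the next service (Payments) to observe.
            bus.PubSub.Subscribe<OrderReserved>("courier-integration", evt =>
            {
                var tracking = $"TRK-{evt.OrderId:N}"; // call the carrier's API here
                bus.PubSub.Publish(new ShipmentManifested { OrderId = evt.OrderId, Tracking = tracking });
            });

            Console.WriteLine("CourierIntegration listening; press Enter to exit.");
            Console.ReadLine();
        }
    }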

How to Trigger an AWS Lambda Function from External SQS Queue Activity

I'm trying to configure a lambda function to consume from an SQS queue that I've been given read and delete permissions to, that I do not own/have configuration of. Is there a way to use lambda's SQS trigger functionality for a queue that doesn't exist inside my AWS account?
If not, what are some alternatives that don't involve checking the queue on a scheduled event?
If the owner of the SQS queue gives you the necessary permissions (see the setup docs for what those permissions are), you can do this. But, you shouldn't.
Subscribing to someone else's SQS queue is an anti-pattern. This is because a queue represents a backlog of work, and the implicit contract is that everything that goes in eventually comes out. All the queue does is separate the input flow from the output flow (data can flow in faster or slower than it flows out).
This idea of flow, however, means that when something comes out, it's no longer in the queue. (Caveat here: there are work-arounds to this, but they're usually not preferred). A consumer, however, always has the goal of processing everything in the queue. This may be done by multiple threads under the control of one consumer, but the end result is still that everything is processed. If there are multiple consumers, then they by necessity compete with one another, and none of them get to process everything in the queue.
How do we ensure there aren't multiple consumers? Simple: the consumer owns the queue. No other consumer is granted read permissions. It might well be the case that someone other than the consumer controls the filling of the queue (receiving write permissions) - and AWS has the perfect solution for this:
SNS Topics: An SNS topic is a source of data. It is, in effect, a publisher. When someone else wants you to have access to their data, they allow you to become a subscriber to their topic. When a new message is published to the SNS topic, everyone who is subscribed to the topic gets a copy. What happens to that copy is decided by the subscriber: it may be acted upon directly, stored for later action, or acted on indirectly, e.g. by being placed in a queue. This is the Pub-Sub model. It separates the details of one entity (the publisher) creating messages and sending them out to many others, from each recipient's (subscriber's) individual decision about how to consume those messages.
TL;DR: get whoever currently owns the queue to publish to an SNS topic instead, then set up a queue (or whatever you prefer) subscribed to that topic.
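As a sketch of the consumer side, the AWS SDK for .NET has a helper that subscribes a queue to a topic and sets the queue policy in one call (the topic ARN and queue URL below are placeholders, and this assumes the AWSSDK.SimpleNotificationService and AWSSDK.SQS packages):

    using System;
    using System.Threading.Tasks;
    using Amazon.SimpleNotificationService;
    using Amazon.SQS;

    public static class SnsFanOut
    {
        public static async Task Main()
        {
            const string topicArn = "arn:aws:sns:us-east-1:111122223333:orders";
            const string queueUrl = "https://sqs.us-east-1.amazonaws.com/444455556666/my-orders-copy";

            var sns = new AmazonSimpleNotificationServiceClient();
            var sqs = new AmazonSQSClient();

            // Subscribes the queue to the topic and grants the topic permission
            // to send to it; every subscriber gets its own copy of each message.
            string subscriptionArn = await sns.SubscribeQueueAsync(topicArn, sqs, queueUrl);
            Console.WriteLine($"Subscribed: {subscriptionArn}");
        }
    }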

Best way to schedule one-time events in serverless environments

Example use case
Send the user a notification 2 hours after signup.
Options considered
setTimeout(() => { /* send notification */ }, 2*60*60*1000); is not an option in serverless environments since the function terminates after execution (so it has to be stateless).
CloudWatch Events can schedule Lambda invocations using cron expressions, but this was designed for repetitive invocations (there's a limit of 100 rules/region).
I have not seen scheduling options in AWS SNS/SQS or GCP Pub/Sub. Are there alternatives with scheduling?
I want to avoid (if possible) setting up a dedicated message broker (overkill) or stateful/non-serverless instance - is there a serverless way to do this?
I can queue the events in a database and invoke a lambda function every minute to poll the database for events to execute in that minute... is there a more elegant solution?
Use AWS Step Functions; they are like serverless workflows that don't have the 15-minute limit that AWS Lambda does. You can design a workflow in Step Functions that integrates with API Gateway, Lambda, and SNS to send email and text notifications as follows:
Create a REST API via API Gateway that will invoke a Lambda function, passing in, for example, the destination address (email, phone #) of the SNS notification, when it should be sent, and the notification method (e.g. email, text, etc.).
The Lambda function, on invocation, will start the Step Function, passing in the data (Lambda is needed because API Gateway currently can't invoke Step Functions directly).
The Step Function is basically a workflow; you can define states for waiting (like waiting for the specified time to send the notification, e.g. 30 seconds) and states for invoking other Lambda functions that can use SNS to send out email and/or text notifications.
A rudimentary example is provided by AWS w/ their Task Timer example.
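For illustration, the wait-then-notify workflow might be defined like this in the Amazon States Language (the Lambda ARN and input field are placeholders); the Wait state pauses the execution until the timestamp supplied in the input, then the Task state invokes the notification function:

    {
      "Comment": "One-shot notification workflow (illustrative)",
      "StartAt": "WaitUntilSendTime",
      "States": {
        "WaitUntilSendTime": {
          "Type": "Wait",
          "TimestampPath": "$.sendAt",
          "Next": "SendNotification"
        },
        "SendNotification": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:us-east-1:123456789012:function:send-notification",
          "End": true
        }
      }
    }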
Things are coming on GCP for doing this, but not very soon. So today, the solution is to poll a database.
You can do that with Datastore/Firestore with the execution datetime indexed (so you don't have to read all the documents each minute). But be careful of traffic spikes; you could create a hotspot.
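A minimal sketch of that polling query with the Google.Cloud.Firestore client (the project id, collection, and field names are made up for illustration):

    using System;
    using System.Threading.Tasks;
    using Google.Cloud.Firestore;

    public static class DueEventPoller
    {
        // Invoke this once a minute, e.g. from a scheduled Cloud Function.
        public static async Task PollAsync()
        {
            FirestoreDb db = await FirestoreDb.CreateAsync("my-project");

            // With executeAt indexed, each poll reads only the few documents
            // that are due, not the whole collection.
            QuerySnapshot due = await db.Collection("scheduled_events")
                .WhereEqualTo("done", false)
                .WhereLessThanOrEqualTo("executeAt", Timestamp.GetCurrentTimestamp())
                .Limit(100)
                .GetSnapshotAsync();

            foreach (DocumentSnapshot doc in due.Documents)
            {
                Console.WriteLine($"Executing scheduled event {doc.Id}");
                // ... send the notification here ...
                await doc.Reference.UpdateAsync("done", true); // mark as handled
            }
        }
    }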
You can use Cloud Scheduler on Google Cloud Platform. As is stated in the official documentation:
Cloud Scheduler is a fully managed enterprise-grade cron job scheduler. It allows you to schedule virtually any job, including batch, big data jobs, cloud infrastructure operations, and more. You can automate everything, including retries in case of failure to reduce manual toil and intervention. Cloud Scheduler even acts as a single pane of glass, allowing you to manage all your automation tasks from one place.
Here you can check a quickstart for using it with Pub/Sub and Cloud Functions.

Microservice and RabbitMQ

I am new to Microservices and have a question with RabbitMQ / EasyNetQ.
I am sending messages from one microservice to another microservice.
Each microservice is a Web API. I am using CQRS, where my Command Handler would consume messages off the queue and do some business logic. In order to call the handler, it will need to make a request to the API method.
I would like to hit the code for consuming messages without having to explicitly call the API endpoint. Is there an automated way of doing it?
A suggestion could be creating a separate solution, a Console App, that connects to RabbitMQ and starts listening: create a while loop to read messages, then call the Web API endpoint to handle the business logic every time a new message is sent to the queue.
My aim is to create a listener or a startup task where, once messages are in the queue, it will automatically pick them up and continue with the command handler, but I'm not sure how to do the 'automatic' way I described. I was thinking to utilise an Azure WebJob that will continuously be running and act as the consumer.
Looking for a good architectural way of doing it.
Programming language being used is C#
Much Appreciated
The recommended way of hosting a RabbitMQ subscriber is by writing a Windows Service, using something like the Topshelf library, and subscribing to bus events inside that service on its start. We did that in multiple projects with no issues.
If you are using Azure, the best place to host RabbitMQ subscriber is in a "Worker Role".
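A minimal sketch of such a long-running subscriber, assuming EasyNetQ (v6-style API) and the .NET generic host; the message type, subscription id, and connection string are made up for illustration. The same hosted service can run as a Windows Service, an Azure WebJob, or a plain console app:

    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using EasyNetQ;
    using Microsoft.Extensions.DependencyInjection;
    using Microsoft.Extensions.Hosting;

    public class PlaceOrderCommand
    {
        public Guid OrderId { get; set; }
    }

    public class RabbitSubscriberService : BackgroundService
    {
        private IBus _bus;

        protected override Task ExecuteAsync(CancellationToken stoppingToken)
        {
            _bus = RabbitHutch.CreateBus("host=localhost");

            // The subscription invokes the command handler whenever a message
            // arrives; no API endpoint has to be called to trigger consumption.
            _bus.PubSub.Subscribe<PlaceOrderCommand>("order-service",
                cmd => Console.WriteLine($"Handling order {cmd.OrderId}"));

            return Task.CompletedTask;
        }

        public override void Dispose()
        {
            _bus?.Dispose();
            base.Dispose();
        }
    }

    public static class Program
    {
        public static Task Main(string[] args) =>
            Host.CreateDefaultBuilder(args)
                .ConfigureServices(s => s.AddHostedService<RabbitSubscriberService>())
                .Build()
                .RunAsync();
    }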
I am using CQRS, where my Command Handler would consume messages off the queue and do some business logic. In order to call the handler, it will need to make a request to the API method.
Are you sure this is real CQRS? CQRS happens when you handle queries and commands differently in your domain logic. Receiving a message via a class that's called CommandHandler and just reacting to it is not yet CQRS.
My aim is to create a listener or a startup task where, once messages are in the queue, it will automatically pick them up and continue with the command handler, but I'm not sure how to do the 'automatic' way I described. I was thinking to utilise an Azure WebJob that will continuously be running and act as the consumer. Looking for a good architectural way of doing it.
The easier you do that, the better. Don't go searching for complex solutions until you've tried all the simple ones. When I was implementing something similar, I was just running a pool of message-handler scripts using Linux cron. A handler popped a message off the queue, processed it, and terminated. Simple.
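In .NET you could get the same 'pop one message, process, terminate' behaviour with the raw RabbitMQ.Client library, suitable for running from cron or a scheduled task (the queue name is illustrative):

    using System;
    using System.Text;
    using RabbitMQ.Client;

    public static class OneShotHandler
    {
        public static void Main()
        {
            var factory = new ConnectionFactory { HostName = "localhost" };
            using var connection = factory.CreateConnection();
            using var channel = connection.CreateModel();

            // BasicGet pulls a single message, or returns null if the queue is empty.
            BasicGetResult result = channel.BasicGet("commands", autoAck: false);
            if (result == null) return;

            var body = Encoding.UTF8.GetString(result.Body.ToArray());
            Console.WriteLine($"Processing: {body}");

            // Acknowledge only after successful processing, so a crash re-queues the message.
            channel.BasicAck(result.DeliveryTag, multiple: false);
        }
    }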
I think using the CQRS pattern, you will have events as well and corresponding event handlers. As you are using RabbitMQ for asynchronous communication between the command and query sides, any message put on a specific channel on RabbitMQ can be listened for by a callback method.
Receiving messages from the queue is more complex. It works by subscribing a callback function to a queue. Whenever we receive a message, this callback function is called by the Pika library.
