I have a question regarding the dequeue mechanism during discrete event simulation.
Most implementations use some kind of priority queue, which can be used to quickly retrieve the event with the earliest timestamp. What happens when such an event cannot be scheduled because, say, it needs a resource to be able to run?
There may be another event in the queue whose timestamp is greater than the timestamp of the event that is blocked on a resource.
For example, let us assume we are modelling a grocery store with separate checkout lines and a cashier per line. A shopper entering a checkout line is an event. We enqueue this event based on the time the shopper entered the checkout line. However, the order in which our simulation should execute two such events is not necessarily the time order in which they entered the checkout line, because the cashiers might free up in a different order.
In such a scenario how does using a priority queue solely based on timestamp --- and independent of resource availability --- work out?
You need a queue for each cashier, or at least a count of waiting customers if customer identity is not important in your simulation (e.g. I would join a queue of three people with one item each over a queue with one person with a full trolley, so just a queue length may not capture the information needed to incorporate that heuristic).
When a customer joins the queue, the number of queuing customers is incremented or the customer is pushed onto the cashier's queue.
When the cashier is ready to serve, the first customer is popped off the cashier's queue. So the customer service event depends not on the time the customer arrives, but on when the cashier is ready.
These queues or counters are independent of the scheduling mechanism for events: the scheduled events manipulate these queues; they aren't dependent on them for scheduling.
As Pete Kirkham pointed out, it's important to be aware that the lines (queues) that customers wait in are completely separate things from the priority queue that's used to determine event ordering.
In discrete-event simulation an event is a point in time at which the system state changes. When an event occurs you figure out what to do next based on the state. Joining the line of customers is an event, but so is becoming eligible for service. Once a customer becomes eligible for service, the logic of that event has to check whether service is possible or not. If so, schedule a new event for when the service will end. If there are resource constraints, then nothing gets scheduled and that customer is on hold. However, at some point in the future the required resource will become available. That's an event too, and that event's logic should check to see if there are customers on hold due to lack of the resource. If not, there's no need to schedule anything, but if so, you can now schedule the actual service for the customer. You can see that customer delays in the queue will increase with resource constraints.
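The logic described above can be sketched in a few lines. This is a minimal illustration, assuming a single cashier and a fixed service time; the names `simulate`, `waiting` and `cashier_busy` are made up for the example. Note that the priority queue only orders events by time, while the checkout line is ordinary model state that the event logic inspects.

```python
import heapq
from collections import deque


def simulate(arrivals, service_time):
    """Single cashier. `arrivals` is a list of (time, customer) pairs."""
    events = []  # the event list: a heap of (time, seq, kind, customer)
    seq = 0      # tie-breaker so the heap never has to compare payloads
    for t, customer in arrivals:
        heapq.heappush(events, (t, seq, "arrive", customer))
        seq += 1

    waiting = deque()     # the checkout line: model state, not the event list
    cashier_busy = False
    log = []              # (service start time, customer)

    while events:
        now, _, kind, customer = heapq.heappop(events)
        if kind == "arrive":
            waiting.append(customer)   # customer joins the line
        else:                          # "end_service": the resource frees up
            cashier_busy = False
        # After any state change, start a service only if it is possible.
        if not cashier_busy and waiting:
            nxt = waiting.popleft()
            cashier_busy = True
            log.append((now, nxt))
            heapq.heappush(events, (now + service_time, seq, "end_service", nxt))
            seq += 1
    return log
```

A customer who arrives while the cashier is busy schedules nothing; the later end-of-service event is what pulls that customer out of the line.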
For a much fuller explanation of how discrete-event simulations work, please look at this introductory tutorial paper.
I'm working on architecting a micro-service solution where most code will be C#, most likely with Angular for any front end. My question is about message chaining. I am still figuring out which message broker to use: Azure Service Bus, RabbitMQ, etc. There is a concept which I haven't found much about.
How do I handle cases where I want to fire a message once a specific set of messages has fired? An example, not part of my actual solution: I want to notify someone when a customer pays a bill. We send a message "PAIDBILL"
which will fire off microservices which will be processed independently:
FinanceService to Debit the ledger and fire "PaymentPosted"
EmailService: email Customer Saying thank you for paying the bill
"CustomerPaymentEmailSent"
DiscountService: Check if they get a discount for paying on time then send
"CustomerCanGetPaymentDiscount"
If all three messages have fired for the same PAIDBILL message ("PaymentPosted", "CustomerPaymentEmailSent", "CustomerCanGetPaymentDiscount"),
then I want to email the customer that they will get a discount on their next bill. It must be done AFTER all three have triggered, and the order doesn't matter. How do I schedule a new "EmailNextTimeDiscount" message to be sent without having to poll for which messages have fired every minute, hour, or day?
All I can think of is to have a SQL table which marks that each one is complete (by locking the table) and when the last one is filled then send off the message. Would this be a good solution? I find it an anti-pattern for the micro-service & message queue design.
If you're using messages (e.g. Service Bus / RabbitMQ), then I think the solution you have described is the best one. This type of design - where services have knowledge about the other domains in the system - is typically known as choreography.
You'll want to pick a service which will be responsible for this business logic. That service will need to receive all the preceding types of messages so that it can determine when (if) all have been met, which it probably wants to do by recording which of the gates have already passed in a database.
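A rough sketch of such a service, assuming each message carries the id of the bill it relates to. The class name, the `publish` callback and the in-memory dictionaries are illustrative assumptions; in a real system the recorded gates would live in a database and be updated transactionally.

```python
REQUIRED = frozenset({
    "PaymentPosted",
    "CustomerPaymentEmailSent",
    "CustomerCanGetPaymentDiscount",
})


class DiscountEmailAggregator:
    def __init__(self, publish):
        self.publish = publish  # hypothetical callback: publish(message_type, bill_id)
        self.seen = {}          # bill_id -> message types received so far (stand-in for a table)
        self.done = set()       # bills for which the follow-up was already published

    def handle(self, bill_id, message_type):
        # Ignore unrelated messages and redeliveries after completion.
        if message_type not in REQUIRED or bill_id in self.done:
            return
        gates = self.seen.setdefault(bill_id, set())
        gates.add(message_type)  # adding to a set makes duplicates harmless
        if gates == REQUIRED:    # order of arrival does not matter
            self.done.add(bill_id)
            del self.seen[bill_id]
            self.publish("EmailNextTimeDiscount", bill_id)
```

No polling is involved: the check runs each time one of the three messages arrives, and the follow-up is published by whichever message happens to arrive last.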
One alternative you could consider is chaining the business processes instead of doing them in parallel. So...
"PAIDBILL" causes FinanceService to debit the ledger and fire "PaymentPosted"
"PaymentPosted" causes EmailService to email the customer saying thank you for paying the bill and broadcast "CustomerPaymentEmailSent"
"CustomerPaymentEmailSent" causes DiscountService to check if they get a discount for paying on time and then send "CustomerCanGetPaymentDiscount"
The email you want to send is just triggered by "CustomerCanGetPaymentDiscount".
If I'm honest, I would switch around the dependency model you're using at this last stage. So, instead of some component listening for "CustomerCanGetPaymentDiscount" events from DiscountService and sending an email, I think I would instead have the DiscountService tell some other component to send an email. It seems natural to me for something that calculates discounts to know that an email should be sent. It seems less natural for something that sends emails to know about discounts (and everything else that needs emails sent). This is why I don't like architectures where the assumption is that every message should be an event and every action should be triggered by an event: it removes a lot of decisions about where domain logic can live, because the message receiver always has to know about the domain of the message sender, never vice versa.
Suppose we have 3 different services producing events, each of them publishing to its own event store.
Each of these services consumes the events of the other producer services.
This is because each service has to process the other services' events AND create its own projection. Each service runs on multiple instances.
The most straightforward way to do it (for me) was to put "something" in front of each ES that picks up events and publishes (pub/sub) them to the queues of every other service.
This is perfect because every service can subscribe to any topic it likes, while the event publisher does the job, and if a service is unavailable events are still delivered. This seems to me to guarantee high scalability and availability.
My problem is the queue. I can't get an easily scalable queue that guarantees ordering of the messages. It actually guarantees "slightly out of order" with at-least once delivery: to be clear, it's AWS SQS.
So, the ordering problems are:
No order guaranteed across events from the same event stream.
No order guaranteed across events from the same ES.
No order guaranteed across events from different ES (different services).
I thought I could solve the first two problems just by keeping track of the "sequence number" of the events coming from the same ES.
This would be done by tracking the last sequence number of each topic from which we are consuming events.
This should be easy for reacting to events and also building our projection.
Then, when I pop an event from the queue, if eventSequenceNumber > previousAppliedEventSequenceNumber + 1, I re-enqueue it (or make it invisible for a certain time).
But it turns out that this solution destroys performance when events are produced at high rates (I could use a visibility timeout or something similar; the result would be the same).
This is because when I'm expecting event 10 and I ignore event 11 for a moment, I also have to ignore all events (from that ES) with sequence numbers after 11, until event 11 shows up again and is actually processed.
Other difficulties were:
where to keep track of the event's sequence number for building the projection.
how to keep track of the event's sequence number for building the projection so that, when applying it, I have a consistent lastSequenceNumber.
What am I missing?
P.S.: for the third problem, consider the following scenario. We have a UserService and a CartService. The CartService has a projection that keeps track, for each user, of the products in the cart. Each cart's projection must also contain the user's name and other information coming from the UserCreated event published by the UserService. If UserCreated comes after ProductAddedToCart the normal flow requires to throw an exception because the user doesn't exist yet.
What am I missing?
You are missing flow -- consumers pull messages from sources, rather than having sources push the messages to the consumers.
When I wake up, I check my bookmark to find out which of your messages I read last, and then ask you if there have been any since. If there have, I retrieve them from you in order (think "document message"), also writing down the new bookmarks. Then I go back to sleep.
The primary purpose of push notifications is to interrupt the sleep period (thereby reducing latency).
With SQS acting as a queue, the idea is that you read all of the enqueued messages at once. If there are no gaps, then you can order the collection then start processing them and acking them. If there are gaps, you either wait (leaving the messages in the queue) or you go to the event store to fetch copies of the missing messages.
There's no magic -- if the message pipeline is promising "at least once" delivery, then the consumers must take steps to recognize duplicate messages as they arrive.
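One way to picture the consumer side is a small reordering buffer. This is only a sketch, assuming every event from one source carries a gap-free sequence number starting at 1; `OrderedConsumer`, `receive` and `missing` are hypothetical names, and nothing here calls the SQS API.

```python
class OrderedConsumer:
    def __init__(self, apply, last_applied=0):
        self.apply = apply                # callback that updates the projection
        self.last_applied = last_applied  # the "bookmark"
        self.pending = {}                 # seq -> event, held until the gap closes

    def receive(self, batch):
        """`batch` is an iterable of (seq, event), in any order, maybe with duplicates."""
        for seq, event in batch:
            if seq > self.last_applied:   # at or below the bookmark: a duplicate, drop it
                self.pending[seq] = event
        # Apply every event that is now contiguous with the bookmark.
        while self.last_applied + 1 in self.pending:
            self.last_applied += 1
            self.apply(self.pending.pop(self.last_applied))

    def missing(self):
        """Sequence numbers to fetch from the event store if waiting is not acceptable."""
        if not self.pending:
            return []
        return [s for s in range(self.last_applied + 1, max(self.pending))
                if s not in self.pending]
```

Out-of-order events are parked rather than pushed back onto the queue, so one late event does not force every later event to be redelivered. In practice the bookmark would be persisted together with the projection so that both move forward atomically.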
If UserCreated comes after ProductAddedToCart the normal flow requires to throw an exception because the user doesn't exist yet.
Review Race Conditions Don't Exist, by Udi Dahan: "A microsecond difference in timing shouldn’t make a difference to core business behaviors."
The basic issue is assuming we can get messages IN ORDER...
This is a fallacy in distributed computing...
I suggest you design for no message ordering in your system.
As for your issues, try and use UTC time in the message body/header created by the originator and try and work around this data point. Sequence numbers are going to fail unless you have a central deterministic sequence creator (which will be a non-scalable, single point of failure).
Using Sagas/State machine is a path that can help to make sense of (business) events ordering.
We are currently working in a message-driven microservice environment and some of our messages/events are event sourced (using Apache Kafka). Now we are struggling with implementing more complex business requirements, where we have to take multiple events into account to create new events and side effects.
In the current situation we are working with devices that can produce errors and we already process them and have a single topic which contains ERROR_OCCURRED and ERROR_RESOLVED events (so they are in order). We also make sure, that all messages regarding a specific device always go onto the same partition. And both messages share an ID that identifies that specific error incident. We already have a projection that consumes those events and provides an API for our customers, s.t. they can see all occurred errors and their current state.
Now we have to deal with the following requirement:
Reporting Errors
We need a push system that reports errors of devices to our external partners, but only after 15 minutes and if they have not been resolved in that timeframe. Our first approach was to consume all ERROR_RESOLVED events, store the IDs and have another consumer that is handling the ERROR_OCCURRED events in a delayed fashion (e.g. by only consuming the next ERROR_OCCURRED event on the topic if its timestamp is at least 15 minutes old). We would then be able to know if that particular error has already been resolved and does not need to be reported (since they share a common ID with the corresponding ERROR_RESOLVED event). Otherwise we send an HTTP request to our external partner and create an ERROR_REPORTED event on a new topic. Is there any better approach for delayed and conditional message processing?
We also have to take the following special use cases into account:
Service restarts: currently we are planning to keep the list of resolved errors in memory, so if a service restarts, that list has to be created from scratch. We could just replay the ERROR_RESOLVED messages, but that may take some time, and during that time no ERROR_OCCURRED events should be processed, because that may result in reporting errors that were resolved in less than 15 minutes without us being aware of it. Are there any good practices regarding replay vs. "normal" processing?
Scaling: we may increase or decrease the number of instances of our service at any time, so the partition assignment may change during runtime. That should not be a problem if we create a consumer group for each service instance when consuming the ERROR_RESOLVED events, s.t. every instance knows all resolved errors while still only handling the ERROR_OCCURRED events of its assigned partitions (in another consumer group which is shared by all instances). Is there a better approach for handling partition reassignment and internal state?
Thanks in advance!
For side effects, I would record all "side" actions in the event store. In your particular example, when it is time to send a notification, I would issue a SEND_NOTIFICATION command that emits a NOTIFICATION_SENT event. These events would be processed by some worker process that performs the actual HTTP request.
Actually, I would elaborate this even further: since notifications could fail, I would have, say, two events, NOTIFICATION_REQUIRED and NOTIFICATION_SENT, so that we can retry failed notifications.
And finally, your logic would be: "if the error was not resolved within 15 minutes and the notification was not sent, send a notification (or just discard it if it missed its timeframe)".
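That rule can be written as a pure function over the events of one error incident. The sketch below is an assumption-laden illustration: `next_step` and its string return values are invented for the example, and the Kafka consumption, persistence and HTTP call are left out entirely.

```python
from datetime import datetime, timedelta

REPORT_DELAY = timedelta(minutes=15)


def next_step(events, now):
    """Fold the events of one incident, given as (event_type, timestamp) pairs."""
    occurred_at = resolved_at = None
    required = sent = False
    for kind, ts in events:
        if kind == "ERROR_OCCURRED":
            occurred_at = ts
        elif kind == "ERROR_RESOLVED":
            resolved_at = ts
        elif kind == "NOTIFICATION_REQUIRED":
            required = True
        elif kind == "NOTIFICATION_SENT":
            sent = True

    if occurred_at is None or sent:
        return "nothing"   # unknown incident, or already reported
    if required:
        return "send"      # also covers retries after a failed HTTP request
    if resolved_at is not None and resolved_at - occurred_at < REPORT_DELAY:
        return "nothing"   # resolved inside the window: never report
    if now - occurred_at >= REPORT_DELAY:
        return "require"   # emit NOTIFICATION_REQUIRED
    return "wait"


t0 = datetime(2024, 1, 1, 12, 0)
print(next_step([("ERROR_OCCURRED", t0)], t0 + timedelta(minutes=20)))  # prints "require"
```

Because the decision depends only on recorded events, a restarted instance can reach the same conclusion by replaying them, rather than relying on an in-memory list of resolved errors.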
How does an event-sourcing system deal with derived data? All the examples I've read on event-sourcing demonstrate services reacting to fact events. A popular example seems to be:
Bank Account System
Events
Funds deposited
Funds withdrawn
Services
Balance Service
They then show how the Balance service can, at any point, derive a state (I.e. balance) from the events. That makes sense; those events are facts. There's no question that they happened - they are external to the system.
However, how do we deal with data calculated BY the system?
E.g.
Overdrawn service:
A service which is responsible for monitoring the balance and performing some action when it goes below zero.
Does the event-sourcing approach dictate how we should use (or not use) derived data, i.e. the balance? Perhaps one of the following?
1) Use: [Funds Withdrawn event] + [Balance service query]
Listen for the "Funds withdrawn" event and then ask the Balance service for the current balance.
2) Use: [Balance changed event]
Get the balance service to throw a "Balance changed" event containing the current balance. Presumably this isn't a "fact" as it's not external to the system, therefore prone to miscalculation.
3) Use: [Funds withdrawn event] + [Funds deposited event]
We could just skip the Balance service and have each service maintain its own balance directly from the facts. ...though that would result in each service having its own (potentially different) version of the balance.
A service which is responsible for monitoring the balance and performing some action when it goes below zero.
Executive summary: the way this is handled in event sourced systems is not actually all that different from the alternatives.
Stepping back a second - the advantage of having a domain model is to ensure that all proposed changes satisfy the business rules. Borrowing from the CQRS language: we send command messages to a command handler. The handler loads the state of the model, and tries to apply the command. If the command is allowed, the state of the domain model is updated and saved.
After persisting the state of the model, the command handler can query that state to determine if there are outstanding actions to be performed. Udi Dahan describes this in detail in his talk on Reliable messaging.
So the most straightforward way to describe your service is one that updates the model each time the account balance changes, and sets the "account overdrawn" flag if the balance is negative. After the model is saved, we schedule any actions related to that state.
Part of the justification for event sourcing is that the state of the domain model is derivable from the history. Which is to say, when we are trying to determine if the model allows a command, we load the history, and compute from the history the current state, and then use that state to determine whether the command is permitted.
What this means, in practice, is that we can write an AccountOverdrawn event at the same time that we write the AccountDebited event.
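As a concrete illustration, here is a minimal sketch of a debit command handled against an event history. It assumes events are plain `(type, amount)` tuples, that overdrafts are permitted by the business rules, and that AccountOverdrawn is written only when the balance crosses from non-negative to negative; the function names are hypothetical.

```python
def balance(history):
    """Derive the current balance by folding over the recorded events."""
    total = 0
    for kind, amount in history:
        if kind == "AccountCredited":
            total += amount
        elif kind == "AccountDebited":
            total -= amount
        # other event types (e.g. AccountOverdrawn) do not change the balance
    return total


def debit(history, amount):
    """Handle a debit command; return the new events to append to the history."""
    if amount <= 0:
        raise ValueError("debit amount must be positive")
    before = balance(history)        # state computed from history, not stored
    after = before - amount
    new_events = [("AccountDebited", amount)]
    if before >= 0 and after < 0:    # the account has just become overdrawn
        new_events.append(("AccountOverdrawn", after))
    return new_events                # both events are written in the same append
```

The derived fact is recorded by the same component that enforces the rules, at the moment the underlying change is accepted, so downstream services do not need to recompute the balance themselves.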
That AccountDebited event can be subscribed to - Pub/Sub. The typical handling is that the new events get published after they are successfully written to the book of record. An event listener subscribing to the events coming out of the domain model observes the event, and schedules the command to be run.
Digression: typically, we'll want at-least-once execution of these activities. That means keeping track of acknowledgements.
Therefore, the event handler is also a thing with state. It doesn't have any business state in it, and certainly no rules that would allow it to reject events. What it does track is which events it has seen, and which actions need to be scheduled. The rules for loading this event handler (more commonly called a process manager) are just like those of the domain model - load events from the book of record to obtain the current state, then see if the event being handled changes anything.
So it is really subscribing to two events - the AccountDebited event, and whatever event returns from the activity to acknowledge that it has completed.
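A stateful handler of this kind could look roughly like the sketch below. The trigger event type, the acknowledgement event name `OverdraftNoticeAcknowledged` and the `(type, event_id)` tuple shape are all assumptions made for illustration; the acknowledgement is assumed to carry the id of the event it answers.

```python
class ProcessManager:
    TRIGGER = "AccountOverdrawn"          # assumed trigger event type
    ACK = "OverdraftNoticeAcknowledged"   # hypothetical acknowledgement event

    def __init__(self, history=()):
        self.pending = set()    # trigger event ids whose action is not yet acknowledged
        self.completed = set()  # trigger event ids that have been acknowledged
        for event in history:   # rebuild state from the book of record
            self.apply(event)

    def apply(self, event):
        kind, event_id = event
        if kind == self.TRIGGER:
            if event_id not in self.completed:  # a redelivered trigger changes nothing
                self.pending.add(event_id)
        elif kind == self.ACK:
            self.pending.discard(event_id)
            self.completed.add(event_id)

    def actions_to_schedule(self):
        """Everything still unacknowledged; safe to re-dispatch (at-least-once)."""
        return sorted(self.pending)
```

It holds no business rules and cannot reject an event; it only remembers what it has seen and what is still outstanding.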
This same mechanic can be used to update the domain model in response to events from elsewhere.
Example: suppose we get a FundsWithdrawn event from an ATM, and we need to update the account history to match it. So our event handler gets loaded, updates itself, and schedules a RecordATMWithdrawal command to be run. When the command runs, it loads the account, updates the balances, and writes out the AccountDebited and AccountOverdrawn events as before. The event handler sees these events, loads the correct process state based on the metadata, and updates the state of the process.
In CQRS terms, this is all taking place in the "write models"; these processes are all about updating the book of record.
The balance query itself is easy - we already showed that the balance can be derived from the history of the domain model, and that's just how your balance service is expected to do it.
To sum up; at any given time you can load the history of the domain model, to query its state, and you can load up the history of the event processor, to determine what work has yet to be acknowledged.
Event sourcing is an evolving discipline with a bunch of diverse practices, practitioners and charismatic people. You can't expect them to provide you with one consistent modelling technique for all scenarios like the ones you described. Each of those scenarios has its pros and cons, and you specified some of them. It may also vary dramatically from one project to another, because business requirements (evolutionary pressures of the market) will be different.
If you are working on some mission-critical system and you want to have very consistent balance all the time - it's better to use RDBMS and ACID transactions.
If you need maximum speed, are okay with eventually consistent states, and are not very anxious about the precision of your balances (some events may be missing here and there for a bunch of reasons), then you can derive your projections for balances from events asynchronously.
In both scenarios you can use event sourcing, but you don't necessarily have to generate your projections asynchronously. It's okay to generate a projection in the same transaction scope in which you make changes to your write model, if you really need to do that.
Will it make Greg Young happy? I have no idea, but who cares about such things if your balances may one day go out of sync in a mission-critical system ...
I have a site that has 100 available copies of a book.
But at the same moment I have 101 requests that want to reserve this book. How do I handle this situation?
Robert Hanmer's book Patterns for Fault Tolerant Software has a pattern called Queue for resources (46):
Store requests for service that cannot be handled immediately in a queue [...]. Give the queue a finite length to improve the likelihood that the request is still important when it reaches the head of the line.
When the requests are computer generated and must be processed in order, a First In First Out (FIFO) queue should be used. When people are generating the requests, the queue should use a Last in First Out (LIFO, a.k.a. a stack) strategy (as in FRESH WORK BEFORE STALE (55)) to govern insertion and removal. This will help people receive good service. The request that was placed on the queue last will think that they received excellent service, and the person that placed the longest ago request on the queue probably gave up already.
Allocation of resources under the guidance of EQUITABLE RESOURCE ALLOCATION (45) should recognize both the requests that have been queued and those that are fresh and have never been queued.
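A bounded "fresh work first" queue as described in the quoted pattern can be sketched with `collections.deque`. The class name is hypothetical, and evicting the oldest request when the queue is full is an assumption of this sketch; the quoted text only says the queue should have a finite length.

```python
from collections import deque


class FreshWorkQueue:
    """Finite-length LIFO queue: the newest request is served first."""

    def __init__(self, max_length):
        if max_length < 1:
            raise ValueError("max_length must be at least 1")
        self._items = deque(maxlen=max_length)

    def put(self, request):
        """Add a request; return the stale request that was evicted, if any."""
        evicted = None
        if len(self._items) == self._items.maxlen:
            evicted = self._items[0]   # oldest entry, about to be dropped
        self._items.append(request)    # a full deque discards from the left
        return evicted

    def get(self):
        """Return the most recently queued request, or None if the queue is empty."""
        return self._items.pop() if self._items else None
```

For computer-generated requests that must be processed in order, the same structure would use `popleft()` in `get` to obtain FIFO behaviour.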
You should have a waitlist table which holds waitlisted/reserved requests with subscriber number, book id and time requested. Then, when one of the books is returned, trigger a procedure which allots the freed book to the oldest waitlisted request and notifies the subscriber who raised this request.
Instead of a trigger you can also have a scheduled job which runs daily or twice a day and checks whether any of the waitlisted books are available now. If one is available, it allocates the book to the subscriber who has the oldest waitlisted request for that book.
Use a trigger if your system load is not heavy; otherwise use a scheduled job and time it for non-peak hours.
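The waitlist idea can be illustrated in memory as follows. This is only a sketch with made-up names (`Library`, `reserve`, `give_back`); a real implementation would keep these structures in database tables and perform the check-and-decrement inside a transaction so that concurrent requests cannot both take the last copy.

```python
from collections import defaultdict, deque


class Library:
    def __init__(self, copies):
        self.available = dict(copies)       # book_id -> number of free copies
        self.waitlist = defaultdict(deque)  # book_id -> subscribers, oldest request first

    def reserve(self, subscriber, book_id):
        """Return True if a copy was reserved, False if the request was waitlisted."""
        if self.available.get(book_id, 0) > 0:
            self.available[book_id] -= 1
            return True
        self.waitlist[book_id].append(subscriber)
        return False

    def give_back(self, book_id):
        """Return the subscriber who receives the freed copy, or None if nobody waits."""
        if self.waitlist[book_id]:
            return self.waitlist[book_id].popleft()  # oldest waitlisted request wins
        self.available[book_id] = self.available.get(book_id, 0) + 1
        return None
```

With 100 copies and 101 requests, the first 100 calls to `reserve` succeed and the last one is waitlisted until a copy is returned.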