Atomically update database and send message. Outbox pattern or not? - microservices

You have a command/operation that requires you both to save something in a database and to send an event/message to another system. For example, you have an OrderService, and when a new order is created you want to publish an "OrderCreated" event for one or more other systems to react to (either as a direct message or via a message broker).
The easiest (and naive) implementation is to save to the database and, if that succeeds, send the message. But of course this is not bulletproof: the other service or message broker may be down, or your service may crash before sending the message.
One (and common?) solution is to implement the "outbox pattern": instead of publishing messages directly, you save the message to an outbox table in your local database as part of your database transaction (in this example, you write to the outbox table as well as the order table) and have a separate process (polling the database or using change data capture) read the outbox table and publish the messages.
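To make the outbox idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table layout, the function names (create_schema, create_order, relay_once) and the publish callback are all assumptions made for illustration; they are not taken from any particular framework, and a real relay would run as its own process or use change data capture.

```python
import json
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one business table plus the outbox table.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " event_type TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0)"
    )


def create_order(conn, order_id, customer):
    # One local transaction: both rows are committed, or neither is.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, customer) VALUES (?, ?)", (order_id, customer)
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id, "customer": customer})),
        )


def relay_once(conn, publish):
    """Publish pending outbox rows in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, event_type, payload in rows:
        try:
            publish(event_type, json.loads(payload))
        except Exception:
            # Broker unavailable: leave the row pending and retry on the next poll.
            break
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

Note that if the relay crashes after publishing but before marking the row as published, the message is sent again on the next poll. This gives at-least-once delivery, so consumers should be able to tolerate duplicates.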
What is your solution to this dilemma, i.e. "update the database and send the message, or do neither"? Note: I am not talking about sagas (this could be part of a saga, but that is the next level).
I have in the past used different approaches:
"Do nothing", i.e. just try to send the message and hope it gets through. This might be fine in some cases, especially with a stable message broker running on the same machine.
Using DTC (in my case MSDTC). Besides all the problems with DTC, it might not work with your current solution.
Outbox pattern.
Using an orchestrator that retries the process if it has not received a "completed" event.
In my current project this is not handled well, IMO, and I want to change it to be more resilient and self-correcting. Sometimes when a service calls another service and the call fails, the user can retry and it may then work. But some operations might require our support team to fix them (if the failure is even discovered).
At the moment it is not a microservice solution but rather two large (legacy) monoliths communicating and running on the same server. We are moving to a microservice architecture in the near future, and it might then run on multiple machines.

Related

Should we store Events in a database? (Event Driven Design)

We have several services that publish and subscribe to Domain Events. What we usually do is log events whenever we publish them and whenever we process them. We basically use this to apply the choreography pattern.
We are not doing Event Sourcing in these systems, and there's no programmatic use for the events after publishing/processing. That's the main reason we opted not to store them in a durable container, like a database or event store.
Question is, are we missing some fundamental thing by doing this?
Is storing Events a must?
I consider queued messages as system messages, even if they represent some domain event in an event-driven architecture (pub/sub messaging).
There is absolutely no hard-and-fast rule about their storage. If you would like to keep them around you could have your messaging mechanism forward them to some auditing endpoint for storage and then remove them after some time (if necessary).
You are not missing anything fundamental by not storing them.
You're definitely not missing out on anything (but there is a catch), especially if the business doesn't need it. An event-sourced system would definitely store all the events generated by the system in a database (or any other event store).
The main use of an event store is to be able to restore the state of the system to the current state in case of a failure by replaying messages. To make this process of recovery faster we have snapshots.
In your case, since these events are only relevant until the process is completed, it would not make sense to store them until you have a failure (this is the catch), especially in a distributed transaction scenario.
What I would suggest?
Don't store the events themselves, but log the relevant details about them, and maybe use an ELK stack or Grafana to store these logs.
Use either the Saga pattern or the Routing Slip pattern in case of a distributed transaction, and log them as well.
If a failure occurs while processing an event, put that event into an exception queue and handle it. If it is part of a distributed transaction, make sure all the events either have the same TransactionId or carry a CorrelationId, so you can look up the logs and recover your system.
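The suggestions above (log the details, park failures in an exception queue, tie everything together with a CorrelationId) could look roughly like the sketch below. It uses an in-memory queue.Queue as a stand-in for a real exception queue, and the Event class and process_all function are hypothetical names chosen for this example.

```python
import logging
import queue
import uuid
from dataclasses import dataclass, field

log = logging.getLogger("events")


@dataclass
class Event:
    name: str
    payload: dict
    # Events that belong to the same business transaction share this value.
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def process_all(events, handler, exception_queue):
    """Run handler on each event; park failed events instead of losing them."""
    for event in events:
        try:
            handler(event)
            log.info("processed %s correlation_id=%s", event.name, event.correlation_id)
        except Exception:
            # The log line and the parked event carry the same correlation ID,
            # so the failure can be traced across services later.
            log.exception("failed %s correlation_id=%s", event.name, event.correlation_id)
            exception_queue.put(event)
```

Searching the logs for one correlation ID then shows every step of the transaction, including the step that ended up in the exception queue.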
To perform your business transactions reliably in a distributed architecture, you need to make sure that your events are published at least once.
So a service that publishes events needs to persist such an event within the same transaction that causes it to get created.
Considering you are publishing an event via infrastructure services (e.g. a messaging service), you cannot rely on it being available all the time.
Also, your own service instance could go down after persisting your newly created or changed aggregate but before it had the chance to publish the event via, for instance, a messaging service.
Question is, are we missing some fundamental thing by doing this? Is storing Events a must?
It doesn't matter that you are not doing event sourcing. Unless it is okay from the business perspective to sometimes lose an event forever, you need to temporarily persist your event within your local transaction until it has been published.
You can look into the Transactional Outbox Pattern to achieve reliable event publishing.
Note: Logging/tracking your events somehow for monitoring or later analyzing/reporting purpose is a different thing and has another motivation.

Preventing data loss in client authoritative database writes

A project I'm working on requires users to insert themselves into a list on a server. We expect a few hundred users over a weekend and, while it is very unlikely, a collision could happen in which two users submit the list concurrently and one of the submissions is lost. The server has no validation; it simply allows you to get and put data.
I was pointed in the direction of "optimistic locking" but I'm having trouble grasping when exactly the data should be validated and how it prevents this from happening. If one of the clients reads the data, adds itself and then checks again to ensure that the data is the same with the use of an index or timestamp, how does this prevent the other client from doing the same and then one overwriting the other?
I'm trying to understand the flow in the context of two clients getting data and putting data.
The point of optimistic locking is that the decision to accept or reject a write is taken on the server, and is protected against concurrency by a pessimistic transaction or some sort of hardware protection, such as compare-and-swap. So a client requests a write together with some sort of timestamp or version identifier, and the server only accepts the write if the timestamp is still accurate. If it isn't, the client gets some sort of rejection code and will have to try again. If it is, the client is told that its write succeeded.
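As a rough illustration of that flow, the sketch below keeps the list in a single SQLite row with a version column. The check and the write happen in one UPDATE statement on the server side, so two clients holding the same version cannot both succeed. The schema and the function names (create_store, read_list, write_list, add_member) are assumptions for this example only.

```python
import json
import sqlite3


def create_store(conn):
    conn.execute(
        "CREATE TABLE lists (id INTEGER PRIMARY KEY, members TEXT NOT NULL, version INTEGER NOT NULL)"
    )
    with conn:
        conn.execute("INSERT INTO lists (id, members, version) VALUES (1, '[]', 0)")


def read_list(conn):
    """Return the current members together with the version they were read at."""
    members, version = conn.execute(
        "SELECT members, version FROM lists WHERE id = 1"
    ).fetchone()
    return json.loads(members), version


def write_list(conn, members, expected_version):
    """Server-side decision: accept the write only if the version is unchanged."""
    with conn:
        cur = conn.execute(
            "UPDATE lists SET members = ?, version = version + 1"
            " WHERE id = 1 AND version = ?",
            (json.dumps(members), expected_version),
        )
    return cur.rowcount == 1  # False means: rejected, read again and retry


def add_member(conn, name, max_attempts=5):
    """Client-side loop: read, modify, try to write, retry on rejection."""
    for _ in range(max_attempts):
        members, version = read_list(conn)
        if write_list(conn, members + [name], version):
            return True
    return False
```

If two clients both read version 0, the first write bumps the version to 1 and the second write matches no row, so it is rejected. The second client then re-reads the list (which now contains the first user) and adds itself on top of that.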
This is not the only way to handle receiving data from multiple clients. One popular alternative is to use a reliable messaging system: for example, the Java Message Service (JMS) specifies an interface for such systems, for which you can find open source implementations. Clients write into the messaging system and can go away as soon as their message is accepted. The server reads requests from the messaging system and acts on them. If the server or the network goes down it's no big deal: the messages will still be there to be read when they come back (typically they are written to disk and have the same level of protection as database data, although if you look at a reliable message queue implementation you may find that it is not, in fact, built on top of a standard database table).
One example of a writeup of the details of optimistic locking is the HTTP ETag specification, e.g. https://en.wikipedia.org/wiki/HTTP_ETag

How to maintain order of messages being processed in a mule flow from VM to JMS using one-way message exchange pattern?

I am using the MuleSoft ESB with Anypoint Studio for a project. In one of my flows I am using the one-way message exchange pattern to dispatch from VM (persistent file store VM connector) to JMS, both with XA transactions enabled to avoid losing messages.
Consider a scenario where we send a message to the ESB every time a user updates their last name. For example, let's say the user changes the last name to 'A' but quickly changes it to 'B', so the final result is expected to be 'B'.
1) Is it likely that message 'B' gets processed before message 'A' in my case, so that the last name ends up as 'A' instead of 'B'?
2) How do I avoid that apart from using 'request-response' MEP?
3) Is there a way to write unit tests that make sure the order of messages is maintained from VM (one-way, XA enabled) to JMS (one-way, XA enabled)?
4) How do I go about testing that manually?
Thank you in advance. Any pointers/help will be appreciated.
It's not likely, since your system would normally react much more quickly than a user can submit requests. However, it may happen during a load peak.
To really guarantee message order, you need a single bottleneck (a single instance/thread) in your solution to handle all requests. That is, you need to make sure your processing strategy in Mule is synchronous and that you only have a single consumer on the VM queue. If you have an HA setup with multiple Mule servers, messages may still get out of order. In that case, and if the user is initially connected via HTTP, you can get around most of the problem by using a load balancer with a sticky session strategy.
A perhaps more robust and scalable solution is to make sure the user submits its local timestamp, with high resolution, on each request. Then you can discard any "obsolete" updates when storing the information in a database. However, that happens not in the Mule VM/JMS layer but in the database.
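A minimal sketch of the timestamp approach, assuming a hypothetical users table with a changed_at column that holds the client-supplied timestamp (here as an integer, e.g. microseconds): the UPDATE only applies when the incoming timestamp is newer than the stored one, so a late-arriving 'A' cannot overwrite 'B'.

```python
import sqlite3


def create_users(conn):
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " last_name TEXT NOT NULL,"
        " changed_at INTEGER NOT NULL)"
    )


def apply_last_name_update(conn, user_id, last_name, changed_at):
    """Apply the update only if it is newer than what is stored.

    Returns True if the row was changed, False if the update was obsolete
    (or the user does not exist).
    """
    with conn:
        cur = conn.execute(
            "UPDATE users SET last_name = ?, changed_at = ?"
            " WHERE id = ? AND changed_at < ?",
            (last_name, changed_at, user_id, changed_at),
        )
    return cur.rowcount == 1
```

With this in place, the processing order of the two messages no longer matters for the final result, provided the timestamps come from the same client clock.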
For testability: no, I don't think there is a truly satisfying way to be 100% sure that messages won't come out of order under any conditions just by writing integration tests or performing manual tests. You need to verify the message path theoretically to make sure there is no part where one message can overtake another.

How to get a return value from a send.Message and include the returned value as part of second message in MSMQ?

I'm pretty new to MSMQ 4.0 and got stuck on the scenario below:
Service A takes user details and returns a user ID.
Then Service B takes billing details together with the user ID.
Now I have to queue these steps. I'm planning to use a transactional queue.
Could someone please help me with:
1) Getting the ID from the first message and including it in the second message.
2) If at least one step fails, I have to roll back (the transactional queue does that for me), retry about 5 times and, if it still fails, move the message to VerifyAdminQueue for verification by an admin. I don't want to use the dead-letter queue etc.
Thanks in advance.
Services built with MSMQ queues are truly one-way. This means that there is no built-in concept of a response. There are many ways you can implement a request-response communication pattern using MSMQ, but with all of them you will need to construct and send the response back to the caller yourself.
With one-way actions, rollback is very simple, and indeed MSMQ will roll back any failed steps in the transmission of a message. More complex operations such as request-response, however, lack any concept of a transaction in MSMQ, so any rollback across more than one message transmission step will require you to write compensating code.
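To show what "construct and send the response yourself" can mean, here is a language-neutral sketch written in Python, with in-memory queue.Queue objects standing in for the MSMQ queues. It does not use any MSMQ API; the message shapes, the correlation_id field and both function names are assumptions made for illustration.

```python
import queue
import uuid


def service_a(request_queue, response_queue):
    """Handle one 'create user' request and explicitly send a response message."""
    request = request_queue.get_nowait()
    user_id = "user-" + request["details"]["name"]  # placeholder ID generation
    response_queue.put(
        {"correlation_id": request["correlation_id"], "user_id": user_id}
    )


def submit_user_then_billing(details, billing, request_queue, response_queue, billing_queue):
    """Send the user request, wait for the matching response, then queue billing."""
    correlation_id = str(uuid.uuid4())
    request_queue.put({"correlation_id": correlation_id, "details": details})

    # In a real system Service A runs in another process; it is called inline
    # here only to keep the sketch self-contained.
    service_a(request_queue, response_queue)

    response = response_queue.get_nowait()
    if response["correlation_id"] != correlation_id:
        raise RuntimeError("response does not match request")

    # Second message: carries the user ID returned by the first step.
    billing_queue.put({"user_id": response["user_id"], "billing": billing})
    return response["user_id"]
```

Each send here is a separate one-way transmission, so if the billing step fails after the user has been created, undoing the user creation is compensating logic you would have to write yourself.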

Cache values in Java EE

I'm building a simple message delegation application. Messages are sent on both ends via JMS. I'm using an MDB to process incoming messages, transform them and send them to a target queue. Unfortunately, the same message can be sent to the incoming queue more than once, but forwarding duplicates is not allowed.
So what is the best way to accomplish that?
Since there can be multiple MDBs listening on the incoming queue, I need a single cache where I can store the unique message UUIDs of the incoming messages for at least an hour. How should this cache be accessed? Via a singleton/static class (I'm running Java EE 5 and thus don't have the singleton annotation)?
In addition I think all operations must be synchronized, right? Does that harm performance too much?
@Ingo: are you OK with a database solution? You can use a full-fledged DB server or a simple Apache Derby database for this.
If so, you can have a simple table where you store each message's unique ID and check against it for uniqueness. This solution has the following benefits:
Simple code.
No need for a time-bound cache (1 hour). You can check the uniqueness of a message forever.
Persistent record of which messages came in.
No need for expensive synchronization; you can rely on the DB isolation level for consistency.
Centralized solution for your possibly many deployments of the application.
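The table-based check could be sketched as follows. The example is in Python with SQLite rather than Java EE with Derby, purely to keep it short and self-contained; the table name processed_messages and the function forward_once are hypothetical. The primary key on the message UUID does the de-duplication, so no application-level synchronization is needed.

```python
import sqlite3


def create_dedup_table(conn):
    conn.execute(
        "CREATE TABLE processed_messages ("
        " message_uuid TEXT PRIMARY KEY,"
        " received_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )


def forward_once(conn, message_uuid, body, forward):
    """Forward body unless message_uuid was already recorded.

    Returns True if the message was forwarded, False if it was a duplicate.
    """
    try:
        with conn:
            # The PRIMARY KEY constraint rejects a UUID that was seen before.
            conn.execute(
                "INSERT INTO processed_messages (message_uuid) VALUES (?)",
                (message_uuid,),
            )
            # If forwarding raises, the insert is rolled back, so a redelivered
            # message can be tried again later.
            forward(body)
    except sqlite3.IntegrityError:
        return False
    return True
```

One caveat of this sketch: the database commit and the forward call are not covered by a single transaction, so a crash between the two can still produce a duplicate or a lost record. Closing that gap needs a transaction that spans both resources.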

Resources