2pc - XA/Distributed Transaction Coordinator Implementation - microservices

I have read several articles about XA and Distributed Transaction Coordinators (DTC). Many of them only mention that a DBMS must explicitly support XA in order to take part, and they describe how a Distributed Transaction Coordinator works. However, after reading all of this information, I know what a DTC does, but I still don't know how to get started.
I have been looking for a long time but I can't find an out-of-the-box DTC. Do we need to implement a DTC ourselves? Isn't there any existing, usable DTC framework?
Some related StackOverflow posts:
How do two-phase commits prevent last-second failure?
2PC distributed transactions across many microservices?
How to process distributed transaction within postgresql?
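To make the protocol a coordinator drives more concrete, here is a minimal in-memory sketch of two-phase commit. It is an illustration under assumptions, not a usable DTC: `Participant`, `InMemoryParticipant`, and `Coordinator` are hypothetical names, the contract only loosely mirrors the prepare/commit/rollback calls of `javax.transaction.xa.XAResource`, and the durable logging and crash recovery that a real coordinator needs are left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical participant contract (no Xids, flags, or recovery).
interface Participant {
    boolean prepare();   // phase 1: vote; true means "I can commit"
    void commit();       // phase 2: make the prepared work permanent
    void rollback();     // phase 2: discard the work
}

// In-memory stand-in for a database or queue, for illustration only.
class InMemoryParticipant implements Participant {
    private final boolean voteYes;
    String state = "ACTIVE";

    InMemoryParticipant(boolean voteYes) {
        this.voteYes = voteYes;
    }

    @Override public boolean prepare() {
        state = voteYes ? "PREPARED" : "ABORTED";
        return voteYes;
    }
    @Override public void commit() { state = "COMMITTED"; }
    @Override public void rollback() { state = "ROLLED_BACK"; }
}

class Coordinator {
    // Returns true if the global transaction committed on every participant.
    static boolean run(List<? extends Participant> participants) {
        // Phase 1: ask every participant to prepare; a single "no" aborts.
        for (Participant p : participants) {
            if (!p.prepare()) {
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        // Phase 2: everyone voted yes, so tell everyone to commit.
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }
}

class TwoPhaseCommitDemo {
    public static void main(String[] args) {
        List<InMemoryParticipant> both = Arrays.asList(
                new InMemoryParticipant(true), new InMemoryParticipant(true));
        System.out.println(Coordinator.run(both)); // prints true
    }
}
```

The sketch deliberately ignores the hard part, which is what happens when the coordinator or a participant crashes between the two phases.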

Related

How to rehydrate state stored aggregates in Axon Framework

I have state stored aggregates in a PostgreSQL database. I'm testing replay by deleting the state stored tables and the token_entry table and restarting the application. All events will be replayed and aggregates are restored in-memory. However, my state stored tables stay empty. I was thinking that they would also be restored?
I'm using Spring Boot and the latest Axon. The code, at this moment, is as simple as it can get.
Whenever you're using Axon's State-Stored Aggregates, they'll only be stored as-is. Hence, throwing away the stored instances and starting the application will not trigger a replay.
When removing TrackingTokens, or initiating a replay by invoking StreamingEventProcessor#resetTokens (this is the recommended approach, by the way), you're effectively telling the Event Processors of your application to start event streaming from scratch.
This part of Axon Framework supports the so-called Query Side of CQRS. The Aggregate support in Axon Framework is specifically for the Command Side of an application.
Long story short, State-Stored Aggregates don't have replay support. If you want your Aggregates (Command Models) to be replayed, you will have to use Event Sourcing Aggregates instead.
I hope the above clarifies a little about the misconception between Token and Aggregates from your question. And by the way, if you feel Axon's Reference Guide should be adjusted to clarify your situation, you're always free to file an issue.
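To illustrate why only event-sourced aggregates can be rebuilt, here is a small plain-Java sketch. It does not use Axon's API; `MoneyDeposited` and `Account` are hypothetical names chosen for the example. The point is that the aggregate's state is derived purely from its event history, so it can be reconstructed at any time, whereas a state-stored aggregate only exists as the row that was last saved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event type; not an Axon class.
final class MoneyDeposited {
    final int amount;

    MoneyDeposited(int amount) {
        this.amount = amount;
    }
}

// Hypothetical event-sourced aggregate: state changes only by applying events.
class Account {
    private int balance;

    void apply(MoneyDeposited event) {
        balance += event.amount;
    }

    int balance() {
        return balance;
    }

    // Rebuilds the aggregate from scratch by replaying its full history.
    static Account rehydrate(List<MoneyDeposited> history) {
        Account account = new Account();
        for (MoneyDeposited event : history) {
            account.apply(event);
        }
        return account;
    }
}

class RehydrateDemo {
    public static void main(String[] args) {
        // The list stands in for the event store.
        List<MoneyDeposited> eventStore = new ArrayList<>();
        eventStore.add(new MoneyDeposited(50));
        eventStore.add(new MoneyDeposited(25));
        System.out.println(Account.rehydrate(eventStore).balance()); // prints 75
    }
}
```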

Microservice - persisting to RDBMS & queue within a transaction

I have a REST service; all its requests are persisted to its own relational database. So far, so good. But there is also a small piece of business functionality (email notification, SMS alert) that should run on the newly received or updated data. For this process to work on the data in the background, it needs some way to know about the persisted data, and a message queue would solve that. I see three common ways of designing this:
1. The REST service inserts into the database and also publishes to the queue.
The problem here is the distributed transaction: it combines two different resource types, the relational database and the queue, within one transaction. Some tools may support this, some may not.
2. The REST service persists only to its database, as usual. Additionally, it inserts the data into another table, which a scheduled job queries and publishes to the queue (from which the background job starts its work).
The problem I see is the scheduler: it is not reactive, it processes in batches, it is limited by its time slot, and it is neither real-time nor fast.
3. The REST endpoint publishes the data directly to a topic. One consumer persists it to the database, while another processes it in the background.
Something like event sourcing. To my understanding, it is a bit complex to implement as the number of services grows. Also, if the database is down, the persisting service would fail to save the data, yet the background service (say, the emailer) would still send an email, which is functionally wrong. This may lead to inconsistency among the services, including functional inconsistency.
I have also thought of reading the database transaction logs, but that seems more complex and requires extra tools and configuration to make it work. It also seems better suited to data processing systems than to our use case.
What are your thoughts on this? Did I miss anything? How do you manage such scenarios? What should I look out for? Would thinking reactive, say with Vert.x, help?
Apologies if this looks very naive, but I have to ask.
I think the best approach is 2, with a CDC (change data capture) system like Debezium.
See https://microservices.io/patterns/data/transactional-outbox.html
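As a rough illustration of option 2 (the transactional outbox), here is an in-memory sketch. Everything in it is an assumption made for the example: `OutboxStore` is a hypothetical class, the two lists stand in for the business table and the outbox table, and the `synchronized` method stands in for a single local database transaction. In a real system the relay step would be a poller or a CDC tool reading the outbox table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class OutboxStore {
    final List<String> orders = new ArrayList<>(); // stands in for the business table
    final List<String> outbox = new ArrayList<>(); // stands in for the outbox table

    // Stands in for one local transaction: both rows are written, or neither.
    synchronized void saveOrder(String order) {
        if (order == null || order.isEmpty()) {
            throw new IllegalArgumentException("invalid order"); // nothing is written
        }
        orders.add(order);
        outbox.add("OrderCreated:" + order);
    }

    // Relay step: publish pending outbox rows in order. A row is removed only
    // after the publisher returns, so a failed publish leaves it for a retry.
    synchronized int relay(Consumer<String> publisher) {
        int published = 0;
        while (!outbox.isEmpty()) {
            publisher.accept(outbox.get(0)); // may throw; the row then stays put
            outbox.remove(0);
            published++;
        }
        return published;
    }
}
```

Because a row can be published and then fail to be removed, this design gives at-least-once delivery, so the consumer on the queue side has to tolerate duplicates.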
I usually recommend option 3 if you don't need immediate read-after-write consistency. The background job should retry if the database record has not yet been updated by the message it processes.
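The retry mentioned above could look something like the following sketch. `RetryingLookup` and `awaitRecord` are hypothetical names, and the `lookup` supplier stands in for whatever query the background job uses to check whether the record has arrived; the attempt count and delay are arbitrary parameters, not recommended values.

```java
import java.util.Optional;
import java.util.function.Supplier;

class RetryingLookup {
    // Calls `lookup` up to maxAttempts times, pausing delayMillis between
    // attempts, and returns the first non-empty result (or empty if none).
    static <T> Optional<T> awaitRecord(Supplier<Optional<T>> lookup,
                                       int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> found = lookup.get();
            if (found.isPresent()) {
                return found;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```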
Your post exemplifies why queues shouldn't be used for these types of scenarios. They are good for delivering analytical data or logs, but for task orchestration developers have to reinvent the wheel every time.
A much better approach is to use a task orchestration system like Cadence Workflow, which eliminates the issues you described and makes multi-service orchestration much simpler.
See this presentation that explains the Cadence programming model.

Testing transaction capability of applications

Application 1 (A1) sends messages to A2 over MQ. A2 uses XA transactions so that a message dropped on the queue is picked up by A2, processed, and written to the DB, and the whole transaction is committed at once.
I would like to test whether A2 correctly maintains system consistency if the transaction fails mid-way and whether XA has been implemented correctly.
I would like to stop the DB as soon as A2 picks up the message. But I am not sure whether I will have enough time to stop the DB and whether I will know for sure that the message has been picked.
Any other suggestions for testing this?
Thanks,
Yash
I am assuming you are using Java here, otherwise, some of this won't be applicable.
The quick, pragmatic solution is to inject a delay into your process which will give you time to take your transactionally destructive action. The easiest way to do this would be to run the app in a debugger. Place a breakpoint at some suitable location (perhaps after the message has been received and the DB write is complete but not committed) and kill the DB when the debugger pauses the thread. Alternatively, add a test hook to your code whereby the thread will sleep if the MQ message has a header titled something unlikely like 'sleeponmessagereceived'.
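A minimal sketch of such a test hook follows. It is an assumption-laden illustration: `TestHooks` is a hypothetical class, the headers are passed in as a plain `Map` rather than read from a real message (with JMS you would read a message property instead), and treating the header value as the pause length in milliseconds is a choice made for this example.

```java
import java.util.Map;

class TestHooks {
    // Header name taken from the suggestion above; unlikely to occur by accident.
    static final String SLEEP_HEADER = "sleeponmessagereceived";

    // Pauses the calling thread if the header is present and holds a positive
    // number. Returns the requested pause in milliseconds, or 0 if none.
    static long maybeSleep(Map<String, String> headers) {
        String value = headers.get(SLEEP_HEADER);
        if (value == null) {
            return 0;
        }
        long millis;
        try {
            millis = Long.parseLong(value.trim());
        } catch (NumberFormatException e) {
            return 0; // ignore malformed values rather than fail the message
        }
        if (millis <= 0) {
            return 0;
        }
        try {
            Thread.sleep(millis); // window in which the tester can stop the DB
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return millis;
    }
}
```

Calling `TestHooks.maybeSleep(headers)` right after the message is received, and before the DB write is committed, gives a predictable window for killing the database. A hook like this should be disabled or removed in production builds.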
A more complex but more sophisticated technique is to use error injection via some AOP tool. I would definitely look at Byteman. It allows you to inject bytecode at runtime and was originally written to test XA scenarios like yours for the Arjuna transaction manager. You can inject code procedurally, or you can annotate unit testing procedures. One advantage of this approach is that you can direct Byteman to trigger an error condition based on a variety of other conditions, such as the nth invocation, or a method argument being X. Also, depending on how detailed your knowledge of your transaction manager is, you can recreate a wider set of scenarios to produce some trickier XA outcomes, like heuristic exceptions. There are some examples here that demonstrate how to use Byteman scripts to validate MQ XA recovery. This project is intended to help reproduce XA failure and recovery. It is JBoss-specific, but I would imagine you could adapt it to your environment.

When to use persistence with Java Messaging and Queuing Systems

I'm performing a trade study on (Java) Messaging & Queuing systems for an upcoming re-design of a back-end framework for a major web application (on Amazon's EC2 Cloud, x-large instances). I'm currently evaluating ActiveMQ and RabbitMQ.
The plan is to have 5 different queues, with one being a dead-letter queue. The number of messages sent per day will be anywhere between 40K and 400K. As I plan for the message content to be a pointer to an XML file location on a data store, I expect the messages to be about 64 bytes. However, for evaluation purposes, I would also like to consider sending raw XML in the messages, with an average file size of 3KB.
My main questions: When/how many messages should be persisted on a daily basis? Is it reasonable to persist all messages, considering the amounts I specified above? I know that persisting will decrease performance, perhaps by a lot. But, by not persisting, a lot of RAM is being used. What would some of you recommend?
Also, I know that there is a lot of information online regarding ActiveMQ (JMS) vs RabbitMQ (AMQP). I have done a ton of research and testing. It seems like either implementation would fit my needs. Considering the information that I provided above (file sizes and # of messages), can anyone point out a reason(s) to use a particular vendor that I may have missed?
Thanks!
> When/how many messages should be persisted on a daily basis? Is it
> reasonable to persist all messages, considering the amounts I
> specified above?
JMS persistence doesn't replace a database; it should be considered a short-lived buffer between producers and consumers of data. That said, the volume and size of messages you mention won't tax the persistence adapters of any modern JMS system (if configured properly), and the store can buffer messages for extended durations as necessary (just use a reliable message store architecture).
> I know that persisting will decrease performance, perhaps by a lot.
> But, by not persisting, a lot of RAM is being used. What would some of
> you recommend?
In my experience, enabling message persistence isn't a significant performance hit and is almost always done to guarantee message delivery. For most applications, the upstream processes (producers) or downstream processes (consumers) end up being the bottlenecks (especially database I/O), not the JMS persistence stores.
> Also, I know that there is a lot of information online regarding
> ActiveMQ (JMS) vs RabbitMQ (AMQP). I have done a ton of research and
> testing. It seems like either implementation would fit my needs.
> Considering the information that I provided above (file sizes and # of
> messages), can anyone point out a reason(s) to use a particular vendor
> that I may have missed?
I have successfully used ActiveMQ on many projects for both low- and high-volume messaging. I'd recommend using it along with a routing engine like Apache Camel to streamline integration and complex routing patterns.
A messaging system should be used as temporary storage. Applications should be designed to pull messages off the queue as soon as possible: the more messages that accumulate, the lower the performance. If you consume messages promptly, you get better performance as well as lower memory usage. Whether messages are persistent or not, memory will still be used, because messages are kept in memory for better performance and are backed up on disk only if they are persistent.
The decision on message persistence depends on how critical a message is and whether it needs to survive a restart of the messaging provider.
You may want to have a look at IBM WebSphere MQ. It can meet your requirements. It has JMS as well as proprietary APIs for developing applications.
ActiveMQ is a good choice for open-source JMS; more expensive ones I can recommend are TIBCO EMS or maybe Solace.
But JMS is actually built for once-only delivery, and longer persistence is left out of the specification. You could of course use a database, but that's heavyweight and possibly expensive.
What I would recommend (Note: I work for CodeStreet) is our 'ReplayService for JMS'. It lets you store any type of JMS messages (or native WebSphere MQ ones) in a high-performance file-based disk storage. Each message is automatically assigned a nanosecond timestamp and a globalMsgID that you can overwrite on publication. So the XML messages could be recorded by the ReplayServer, and your actual message could just contain the globalMsgID as a reference, and maybe some properties.
Once a receiver receives the globalMsgID, it could then replay that message from the ReplayServer, if needed.
But on the other hand, 400K messages of 3 KB XML each (roughly 1.2 GB per day) should be easily doable for ActiveMQ or others. Also, you should compress your XML messages before sending.
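As a sketch of the compression step, the following uses GZIP from the JDK's `java.util.zip` package. `XmlCompression` is a hypothetical helper name; the resulting byte array would typically be sent as the body of a bytes-style message, and the receiver would call `decompress` on it. How much is saved depends on the actual XML, so measure with real payloads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class XmlCompression {
    // Encodes the XML as UTF-8 and GZIP-compresses it.
    static byte[] compress(String xml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the GZIP trailer has been written.
        return buffer.toByteArray();
    }

    // Reverses compress(): GZIP-decompresses and decodes as UTF-8.
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```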

Ensure durability with messages when using WCF

I am wondering how we can ensure message durability when using WebSphere MQ and WCF. I want my WCF process to pick messages off the queue and, if the application encounters an issue (power outage, etc.), not lose the messages. I would also like to avoid using a transaction if at all possible, because I want to eliminate distributed transactions.
Thanks,
S
Well, there's transactions and there's distributed transactions. The "right" answer is to use the WMQ 1-phase commit here. That doesn't have the complexity of XA transactions but it does give you the ability to roll back a message without losing it. In fact, when using clients you really should be using at least 1-phase commit just to prevent loss of messages.
Short of that, there is always the "browse-with-lock, delete-message-under-cursor" method. I'm pretty sure everything you need to do the browsing, locking, and deleting is exposed under .NET, but perhaps Shashi will comment and confirm.
The WebSphere MQ WCF custom channel has a feature, "Assured Delivery", that guarantees that a service request or reply is actioned and not lost. This is the 1-phase commit (also known as SYNC_POINT) in WMQ.
"Assured Delivery" is a service contract attribute. Here are more details about the feature.
