DB transactionality in API exit - ibm-mq

The linked documentation states that an exit function can operate within the application's unit of work.
Imagine the application starts a UOW:
MQPUT a message to a queue
Insert a record into table T1 of a database
We also have a Put_After exit function that inserts a record into table T2 of the same database.
According to the link above, WebSphere MQ, acting as an XA transaction manager, will treat the insertions into T1 and T2 as a single XA transaction.
My question is: will DB2 treat the two insertions as a single transaction?

Whether the inserts are one transaction or two depends on whether both are contained between the same MQBEGIN and MQCMIT calls. That said, the more you do within an exit, the slower and less reliable MQ becomes. For example, an API exit that merely copies messages to another queue uses only the resources of the QMgr.
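To make the first point concrete, here is a minimal sketch of what "both operations inside the same unit of work" looks like from the application side, using the MQ classes for Java. MQQueueManager.begin() and commit() correspond to MQBEGIN and MQCMIT; the getJDBCConnection(...) call, queue name, and table layout are assumptions for illustration, so treat this as a sketch rather than working code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.XADataSource;

import com.ibm.mq.MQMessage;
import com.ibm.mq.MQPutMessageOptions;
import com.ibm.mq.MQQueue;
import com.ibm.mq.MQQueueManager;
import com.ibm.mq.constants.CMQC;

public class GlobalUowSketch {

    // xaDs would be a DB2 XADataSource configured elsewhere (purely illustrative).
    public static void putAndInsert(MQQueueManager qMgr, XADataSource xaDs) throws Exception {
        // JDBC connection enlisted with the queue manager acting as the XA coordinator
        Connection dbConn = qMgr.getJDBCConnection(xaDs);

        qMgr.begin();                                        // MQBEGIN: start the global unit of work

        MQQueue queue = qMgr.accessQueue("APP.QUEUE", CMQC.MQOO_OUTPUT);
        MQMessage msg = new MQMessage();
        msg.writeString("order payload");
        MQPutMessageOptions pmo = new MQPutMessageOptions();
        pmo.options = CMQC.MQPMO_SYNCPOINT;                  // put under syncpoint
        queue.put(msg, pmo);                                 // MQPUT inside the unit of work

        try (PreparedStatement ps =
                 dbConn.prepareStatement("INSERT INTO T1 (ID, PAYLOAD) VALUES (?, ?)")) {
            ps.setInt(1, 1);
            ps.setString(2, "order payload");
            ps.executeUpdate();                              // DB2 insert inside the same unit of work
        }

        qMgr.commit();                                       // MQCMIT: both operations commit, or neither does
        queue.close();
    }
}
```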
However, an API exit that calls a DB traverses the network, possibly including the latency of DNS lookups; creates a TLS session (assuming the DB credentials are to be kept private); authenticates and signs on to the DB; issues the XA PREPARE statements and the rest of the XA protocol; performs the inserts; and then issues the COMMIT and the XA transaction completion.
Now imagine doing that for every message PUT or GET.
Also, if any of these actions taking place off the QMgr fails, it definitely kills the transaction, possibly kills the app, and depending on FASTPATH and other options, possibly trashes the QMgr.
It would be a much better design to put the before/after images onto a separate MQ queue and have a completely independent program load those into the DB. This keeps the API calls extremely fast because they remain within MQ. You might lose only 50% of your throughput instead of the 80% to 90% you would lose if every PUT or GET entailed two external calls from the API exit and all the network traversal. This technique of offloading messages to a second queue for subsequent upload to the DB is in fact a common approach to posting audit messages.
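A rough sketch of the "completely independent program" half of that design, using plain JMS (2.0) and JDBC; the queue name, connection factory, data source, and table are illustrative assumptions, not anything defined in the question:

```java
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.sql.DataSource;
import java.sql.PreparedStatement;

public class AuditQueueLoader {

    // Drains the side queue and loads the messages into the DB; both dependencies
    // (JMS connection factory and data source) are configured elsewhere.
    public static void drain(ConnectionFactory cf, DataSource ds) throws Exception {
        try (javax.jms.Connection jmsConn = cf.createConnection();   // JMS 2.0: Connection is AutoCloseable
             java.sql.Connection dbConn = ds.getConnection()) {

            Session session = jmsConn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(session.createQueue("AUDIT.QUEUE"));
            jmsConn.start();

            PreparedStatement insert =
                dbConn.prepareStatement("INSERT INTO AUDIT_LOG (PAYLOAD) VALUES (?)");

            // Wait up to 5 seconds per message; stop once the queue is drained.
            TextMessage msg;
            while ((msg = (TextMessage) consumer.receive(5000)) != null) {
                insert.setString(1, msg.getText());
                insert.executeUpdate();
            }
        }
    }
}
```

In practice you would consume and insert under a transaction (local or XA) so a failed insert does not lose a message, but the point stands: the hot path stays inside MQ, and the slow DB work happens here, off to the side.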

Related

How to lock an object during a distributed transaction

I have been reading about microservices and distributed transactions. Most articles talk about two-phase commit or the saga pattern, but they do not go into detail on how an object is locked so that others can't access that data while the transaction has not completed.
Say I have a customer service and an order service, and I initiate a request to lock a customer's funds until the order has been processed. In a distributed system, how is this achieved?
In databases, is it possible to explicitly lock a row and have another request unlock it later, or is this achieved with a "locked" field on the customers table that the first transaction sets to locked and, once the order is complete, sets back to unlocked or clears?
If there are some examples with code samples, that would be great.
Most articles talk about two-phase commit or the saga pattern, but they do not go into detail on how an object is locked so that others can't access that data while the transaction has not completed.
2PC is a blocking protocol. That means that if the transaction manager, which drives the 2PC transaction, goes down, the outcome can't be resolved. The transaction manager is therefore a single point of failure.
If you can guarantee that a failed transaction manager will be restarted, then even though the 2PC protocol is said to be blocking, you have the assurance that the transaction manager will become available again and the resolution won't be blocked forever.
2PC also uses locks. They are required as a fundamental element of the protocol. The transaction manager communicates with participants (resources); here the participant is the database. When 2PC starts running, the prepare call means that the database takes persistent locks on all rows that participated in the transaction. Those locks are released when the transaction manager calls commit.
It's important to understand that before the 2PC starts, the transaction is in-flight (not persistent); it's held in memory. After prepare is called, the transaction state is stored persistently until commit is called (and at that point the protocol may be blocked by an unavailable transaction manager: the lock is persistent and the system waits for the transaction manager to release it).
That's locking from the 2PC perspective. But there are also transaction locks from the database's perspective.
When you update a row in the database, the transaction is in-flight (held in memory). At that point the database needs to ensure that concurrent updates won't corrupt your data. One way is to lock the row and not permit concurrent updates.
But most databases do not lock the row in this case (by default, depending on the isolation level), because they use snapshot isolation (MVCC, https://en.wikipedia.org/wiki/Snapshot_isolation). That effectively means the row is locked only optimistically and the database permits other transactions to update it.
But! The 2PC prepare can't be processed optimistically. When the database replies 'OK' to the prepare request from the transaction manager, the row really is locked.
Plus, you can't manage this locking by hand. If you try to do so, you ruin the 2PC guarantee of consistency.
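Seen from the transaction manager's side, the prepare/commit sequence described above looks roughly like this. This is a simplified sketch against the standard javax.transaction.xa API; recovery, read-only votes, Xid generation, and handling of in-doubt branches are all elided.

```java
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class TwoPhaseCommitSketch {

    // customerDb and orderDb are the XAResources of the two participating databases;
    // the Xids identify each branch of the same global transaction.
    static void commitBothOrNeither(XAResource customerDb, Xid customerXid,
                                    XAResource orderDb, Xid orderXid) throws Exception {
        // The SQL work has already been done between xaRes.start(...) and this point.
        customerDb.end(customerXid, XAResource.TMSUCCESS);
        orderDb.end(orderXid, XAResource.TMSUCCESS);

        // Phase 1: prepare. Each database persists the branch and holds its row locks.
        int customerVote = customerDb.prepare(customerXid);
        int orderVote = orderDb.prepare(orderXid);

        if (customerVote == XAResource.XA_OK && orderVote == XAResource.XA_OK) {
            // Phase 2: commit. Only now are the persistent locks released.
            customerDb.commit(customerXid, false /* not one-phase */);
            orderDb.commit(orderXid, false);
        } else {
            // Any other vote: roll back both branches so neither database keeps the change.
            customerDb.rollback(customerXid);
            orderDb.rollback(orderXid);
        }
    }
}
```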
In your example there is a customer service and an order service, and the 2PC transaction spans both services. The customer service updates its database and the order service updates the database as well; at this point there are still in-flight transactions in the database. Then the request finishes and the transaction manager tells the in-flight transactions to commit. It runs the 2PC: it invokes prepare on the customer service's DB transaction, then on the order service's, and then it calls commit on both.
If you use the saga pattern, the saga spans both services. From the transaction perspective, the customer service creates an in-flight database transaction and commits it immediately. Then the call goes to the order service, where the same happens. When the request finishes, the saga checks that everything ran fine; if a failure happened, a compensate callback is called.
Failure handling is "the trouble" from an ease-of-use perspective. With a saga you need to implement the failure resolution yourself in the compensation callback; with 2PC the failure resolution is handled automatically by the rollback call.
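A minimal sketch of that difference on the saga side, using hypothetical service interfaces (none of these names come from a real framework): each step commits its own local transaction, and the compensate callback is the code you have to write yourself.

```java
public class OrderSagaSketch {

    // Hypothetical service facades; each call commits its own local transaction immediately.
    interface CustomerService {
        void deductFunds(String customerId, long amount, String orderId);
        void refundFunds(String customerId, long amount, String orderId);  // compensation
    }

    interface OrderService {
        void confirmOrder(String orderId);
    }

    static void placeOrder(CustomerService customers, OrderService orders,
                           String customerId, String orderId, long amount) {
        customers.deductFunds(customerId, amount, orderId);   // local transaction 1, already committed
        try {
            orders.confirmOrder(orderId);                      // local transaction 2
        } catch (RuntimeException failure) {
            // No automatic rollback as in 2PC: the committed first step
            // has to be undone by an explicit compensating action.
            customers.refundFunds(customerId, amount, orderId);
            throw failure;
        }
    }
}
```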
A note: I tried to summarize 2PC here: https://developer.jboss.org/wiki/TwoPhaseCommit2PC
I'm not entirely sure the explanation is comprehensible enough, but you can check it out, and you may let me know what's explained poorly there. Thanks.
In the microservices world, transaction boundaries are within a service. The services rely on eventual consistency. So in your example the order service will send a request (synchronous or asynchronous, depending on application semantics and scale requirements) like "Deduct amount x from customer y for order z".
The customer service would perform the action on the customer record in a local transaction and return a response to the caller such as "Order z successfully processed" or "Order z processing failed".
The order service can then trigger the confirmation or failure flow for the order depending on the response received.
Applications typically choose between availability and ACID-style strong consistency. Most microservice-based scenarios demand availability and high scalability rather than strong consistency, which means communication between services is asynchronous and a consistent state is eventually achieved.
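A sketch of that interaction with plain JMS (2.0); the queue names, message format, and reply handling are illustrative. The order service sends the command and returns immediately; the order stays in a pending state until the reply arrives, rather than any lock or distributed transaction being held open.

```java
import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.jms.Queue;

public class OrderToCustomerCommand {

    // Sends "deduct x from customer y for order z" and returns immediately; the order
    // stays PENDING until the reply arrives on the reply queue and is handled below.
    static JMSContext requestFundsDeduction(ConnectionFactory cf,
                                            String customerId, String orderId, long amount) {
        JMSContext ctx = cf.createContext();   // kept open so the reply listener stays active
        Queue commands = ctx.createQueue("CUSTOMER.COMMANDS");
        Queue replies  = ctx.createQueue("ORDER.REPLIES");

        ctx.createProducer()
           .setJMSReplyTo(replies)
           .send(commands, "DEDUCT;" + customerId + ";" + orderId + ";" + amount);

        ctx.createConsumer(replies).setMessageListener(reply -> {
            // React to the eventual outcome, e.g. mark order z confirmed or failed.
        });
        return ctx;   // caller closes the context once the reply has been processed
    }
}
```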

spring batch: process large file

I have 10 large files in production, and we need to read each line from each file, convert the comma-separated values into a value object, send it to a JMS queue, and also insert it into 3 different tables in the database.
Across the 10 files we have 33 million lines. We are using Spring Batch (MultiResourceItemReader) to read each line and a writer to write it to the DB and also send it to JMS. It takes roughly 25 hours to complete everything.
Even though we have 10 systems in production, at present we use only one system to run this job (I am new to Spring Batch and not aware of how Spring supports load balancing).
Since we have only one system, we configured the data source to connect to the DB with a maximum of 25 connections.
To improve performance, we decided to use Spring's multi-threading support and started with 5 threads. We could see the performance improvement: everything completed in 10 hours.
I have the questions below:
1) If I process using 5 threads, we will publish a huge amount of data onto the JMS queue. Will the queue support that volume? Note that we have 10 systems in production to read JMS messages from the queue.
2) Is using 5 threads on 1 production system a good approach, or, instead of Spring Batch inserting the data into the DB directly, should I create a REST service so that Spring Batch calls the REST API to insert the data into the DB and publish to the JMS queue? (Again, if Spring Batch processes the file and uses REST to insert data into the DB, I will read 4 or 5 lines per second and call the REST API; note that we have 10 production systems, that REST can handle a huge number of requests behind a load balancer, and that JMS can handle a huge number of messages.) Or is using threads in the Spring Batch app on 1 production system the better approach?
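For reference on question 2, here is a rough sketch of the multi-threaded-step approach described above, in Spring Batch 4-style Java configuration. The reader and writer beans, the chunk size, and the thread naming are illustrative; only the thread count of 5 comes from the question.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class MultiThreadedStepConfig {

    @Bean
    public Step loadStep(StepBuilderFactory steps,
                         ItemReader<String> lineReader,       // must be thread-safe in a multi-threaded step
                         ItemWriter<String> dbAndJmsWriter) { // writes the 3 tables and publishes to JMS
        SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("batch-");
        executor.setConcurrencyLimit(5);                      // the 5 threads from the question

        return steps.get("loadStep")
                    .<String, String>chunk(1000)              // commit interval is a tuning knob
                    .reader(lineReader)
                    .writer(dbAndJmsWriter)
                    .taskExecutor(executor)
                    .throttleLimit(5)
                    .build();
    }
}
```

One caveat worth noting: FlatFileItemReader and MultiResourceItemReader are not thread-safe on their own, so a multi-threaded step generally needs the reader synchronized, or a partitioned step instead.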
Different JMS providers are going to have different limits, but in general messaging can easily handle millions of rows in a small period of time.
Messaging is going to be faster than inserting directly into the database because a message has very little data to manage (other than JMS properties); without the overhead of a complete RDBMS or NoSQL database or whatever, messaging outperforms them all.
Assuming the individual lines can be processed in any order, sending all the data to the same queue and having n consumers work the back end is a sound solution.
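One way to express that "n consumers on the same queue" idea with Spring's JMS support; the queue name, listener, and concurrency figures here are illustrative, not a recommendation:

```java
import javax.jms.ConnectionFactory;
import javax.jms.MessageListener;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

public class LineConsumers {

    // n concurrent consumers all draining the same queue; each onMessage call would
    // convert one line's value object and write it to the database.
    static DefaultMessageListenerContainer start(ConnectionFactory cf, MessageListener lineHandler) {
        DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(cf);
        container.setDestinationName("LINES.QUEUE");
        container.setMessageListener(lineHandler);
        container.setConcurrentConsumers(5);      // lower bound
        container.setMaxConcurrentConsumers(10);  // scale up under load
        container.afterPropertiesSet();
        container.start();
        return container;
    }
}
```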
Your big bottleneck, however, is getting the data into the database. If the destination table(s) have any keys/indices on them, there is going to be serious contention, because each insert/update/delete needs to maintain the indices; so even though you have n different consumers trying to update the database, they're going to trample on each other as the transactions are completed.
One solution I've seen is disabling all database constraints before you start and re-enabling them at the end; hopefully, if things worked, the data is consistent and usable. Of course, the risk is that there was bad data you didn't catch, and now you need to clean up or reattempt the load.
A better solution might be to transform the files into a single file that can be bulk-loaded into the database using a platform-specific tool. These tools often disable indexes, constraint checking, and anything else that's going to slow things down, often bypassing SQL itself, to get performance.

In a SQL Server Always On configuration, will a transaction log backup to NUL break the Always On configuration?

Imagine we have two nodes participating in SQL Server 2012 AlwaysOn. This is a test instance. During one of the index rebuild operations the log grew really big (250 GB). We are unable to back it up due to space constraints. What if we back up the T-log to NUL (just to shrink it down)? Will that break Always On?
AlwaysOn is a (marketing) umbrella term that covers both Availability Groups (AGs) and Failover Cluster Instances (FCIs). From context, I assume you are asking about AGs?
For both FCIs and AGs, the short answer is the same: performing transaction log backups (regardless of the destination) will not "break" your HA capabilities. However, I would urge you to NEVER EVER back up to NUL:, unless you don't care about the data in your database. Taking a log backup to NUL: (regardless of whether you are using an AG, an FCI, or neither) will break your log backup chain and prevent point-in-time recovery.
If you are using an Availability Group, SQL Server does not use transaction log backups to synchronize between nodes. It uses the transaction log itself, and therefore will not clear the transaction log if there is log data that needs to be synchronized to another node. That is to say: if your AG synchronization is behind, your transaction log will continue to fill/grow until synchronization catches up, regardless of the number of transaction log backups performed.
There are multiple reasons your transaction log might continue to grow, and AG synchronization is just one of those reasons. If SQL Server cannot reuse the transaction log because of unsynchronized transactions in the AG, the log_reuse_wait_desc column in sys.databases will show the value "AVAILABILITY_REPLICA".
Getting back to your root problem: Rebuilding an index made your transaction log get really, really big.
When you perform an ALTER INDEX...REBUILD, SQL Server creates the entire new index (a size-of-data operation), and must be able to roll back the index creation if it errors or is killed prior to completion. Therefore, you may see the log_reuse_wait_desc column in sys.databases showing as "ACTIVE_TRANSACTION" during a very large, long-running index rebuild. The rebuild itself would prevent you from reusing the log, and would cause the log to grow.
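If you want to confirm which of these is currently holding up log reuse, the column mentioned above can be queried directly. A small JDBC sketch (the connection string and database name are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LogReuseWaitCheck {

    public static void main(String[] args) throws Exception {
        // Placeholder connection string; point it at the primary replica.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:sqlserver://myserver;databaseName=master;user=...;password=...");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT name, log_reuse_wait_desc FROM sys.databases WHERE name = ?")) {
            ps.setString(1, "MyAgDatabase");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // AVAILABILITY_REPLICA -> AG sync is behind; ACTIVE_TRANSACTION -> e.g. the index rebuild
                    System.out.println(rs.getString("name") + ": " + rs.getString("log_reuse_wait_desc"));
                }
            }
        }
    }
}
```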

Deadlocks when updating the same database records in multiple connection sessions concurrently

We have implemented a client-server, socket-based application to process shopping cart requests. We receive thousands of shopping cart requests daily.
For this we implemented a multi-threaded architecture to process requests concurrently. We are using an Oracle connection pool for database operations, and we set an optimal value for the connection pool size. Per our business process, we have a main database table and we need to update the same set of rows from multiple threads using multiple connection sessions concurrently. Now we are getting deadlock issues because multiple threads try to update the same rows through multiple connection sessions concurrently, and we also see some primary key violations on the tables. Sometimes the database also gets locked up by the same data being inserted in multiple connection sessions concurrently.
Please suggest a good approach to handle the above problems.
There are a few different general solutions to writing multithreaded code that does not encounter deadlocks. The simplest is to ensure that you always lock resources in the same order.
A deadlock occurs when one session holds a lock on A and wants a lock on B while another session holds a lock on B and wants a lock on A. If you ensure that your code always locks A before B (or B before A), you can be guaranteed that you won't have a deadlock.
As for your comment about primary key violations, are you using something other than an Oracle sequence to generate your primary keys? If so, that is almost certainly the problem. Oracle sequences are explicitly designed to provide unique primary keys in the case where you have multiple sessions doing simultaneous inserts.
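A small JDBC sketch of both points against the shared table (the table and column names are made up): the keys are sorted before any row is locked, so two sessions touching overlapping sets of rows cannot deadlock each other, and new rows take their primary key from a sequence.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OrderedLocking {

    // Locks the cart rows in ascending key order, then updates them in one transaction.
    static void updateCarts(Connection conn, List<Long> cartIds) throws Exception {
        conn.setAutoCommit(false);

        List<Long> ordered = new ArrayList<>(cartIds);
        Collections.sort(ordered);                       // every session locks rows in the same order

        try (PreparedStatement lock = conn.prepareStatement(
                 "SELECT cart_id FROM shopping_cart WHERE cart_id = ? FOR UPDATE");
             PreparedStatement update = conn.prepareStatement(
                 "UPDATE shopping_cart SET status = 'PROCESSED' WHERE cart_id = ?")) {

            for (Long id : ordered) {                    // acquire locks one row at a time, smallest key first
                lock.setLong(1, id);
                lock.executeQuery();
            }
            for (Long id : ordered) {
                update.setLong(1, id);
                update.executeUpdate();
            }
        }
        conn.commit();
        // New rows should get their primary key from an Oracle sequence,
        // e.g. INSERT ... VALUES (cart_seq.NEXTVAL, ...), to avoid the PK violations.
    }
}
```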

Compare and Contrast Change Data Capture and Database Change Notification

Oracle has two seemingly competing technologies: CDC and DCN.
What are the strengths of each?
When would you use one and not the other?
In general, you would use DCN to notify a client application that it needs to clear or update its cache. You would use CDC for ETL processing.
DCN would generally be preferable when you have an OLTP application that needs to be notified immediately about data changes in the database. Since the goal here is to minimize the number of network round-trips and database hits, you'd generally want the application to use DCN for queries whose results are mostly static. If a large fraction of the query's results changes regularly, you may be better off just refreshing the application's cache on a set frequency rather than constantly running queries to get the changed data (a DCN notification does not contain the changed data, just the ROWIDs of the rows that changed). If the application goes down, I believe DCN allows changes to be lost.
CDC would generally be preferable when you have a DSS application that needs to periodically pull over all the data that changed in a number of tables. CDC can guarantee that the subscriber has received every change to the underlying table(s), which can be important if you are trying to replicate changes to a different database. CDC allows the subscriber to pull the changes at its convenience rather than trying to notify the subscriber that there are changes, so you'd definitely want CDC if you wanted the subscriber to process new data every hour or every day rather than in near real time. (Note: DCN also has a guaranteed delivery mode, see comments below. --Mark Harrison)
CDC seems to be much more complex to set up than DCN.
I mean, to set up DCN I wrap a select in a start/end DCN registration block and then write a procedure to be called with a collection of changes. That's it.
CDC requires publishers and subscribers and, anyway, seems like more work.
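For comparison, here is roughly what a DCN registration looks like when done through the Oracle JDBC driver instead of the PL/SQL begin/end registration block described above. The classes are from the oracle.jdbc.dcn package; the watched table and the exact option set are illustrative, so verify the details against your driver version.

```java
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import oracle.jdbc.OracleConnection;
import oracle.jdbc.OracleStatement;
import oracle.jdbc.dcn.DatabaseChangeRegistration;

public class DcnRegistrationSketch {

    static DatabaseChangeRegistration register(OracleConnection conn) throws Exception {
        Properties options = new Properties();
        options.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");   // include ROWIDs in notifications

        DatabaseChangeRegistration dcr = conn.registerDatabaseChangeNotification(options);
        dcr.addListener(event ->
            // Only ROWIDs arrive here, not the changed data; re-query or refresh the cache.
            System.out.println("Change notification: " + event));

        // Any query executed while the registration is attached adds its objects to the registration.
        Statement stmt = conn.createStatement();
        ((OracleStatement) stmt).setDatabaseChangeRegistration(dcr);
        try (ResultSet rs = stmt.executeQuery("SELECT id FROM watched_table")) {
            while (rs.next()) { /* warm the local cache */ }
        }
        stmt.close();
        return dcr;   // unregister later with conn.unregisterDatabaseChangeNotification(dcr)
    }
}
```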
