What's a good way to handle "async" commits? - oracle

I have a WCF service that uses ODP.NET to read data from an Oracle database. The service also writes to the database, but indirectly, as all updates and inserts are achieved through an older layer of business logic that I access via COM+, which I wrap in a TransactionScope. The older layer connects to Oracle via ODBC, not ODP.NET.
The problem I have is that because Oracle participates in a two-phase commit, and because the older business layer is using ODBC and not ODP.NET, the transaction sometimes returns from TransactionScope.Commit() before the data is actually available for reads from the service layer.
I see a similar post on Stack Overflow about a Java user running into the same kind of trouble.
A representative from Oracle posted that there isn't much I can do about this problem:
This may be due to the way the OLETx ITransaction::Commit() method behaves. After phase 1 of the 2PC (i.e. the prepare phase), if all is successful, commit can return even if the resource managers haven't actually committed. After all, a successful "prepare" is a guarantee that the resource managers cannot arbitrarily abort after this point. Thus, even though a resource manager couldn't commit because it didn't receive a "commit" notification from the MSDTC (due to, say, a communication failure), the component's commit request returns successfully. If you select rows from the table(s) immediately, you may sometimes see the actual commit occur in the database after you have already executed your select. Your select will therefore not see the new rows, due to consistent read semantics. There is nothing we can do about this in Oracle, as the "commit success after successful phase 1" optimization is part of the MSDTC's implementation.
So, my question is this:
How should I go about dealing with the possible delay ("async" via the title) problem of figuring out when the second phase of the 2PC actually occurs, so I can be sure that the data I inserted (indirectly) is actually available to be selected after the Commit() call returns?
How do big systems deal with the fact that the data might not be ready for reading immediately?

I assume that the whole transaction has prepared and the Transaction Manager has decided on a commit outcome; therefore, eventually (barring heuristic damage) the Resource Managers will receive their commit message and complete. However, there are no guarantees as to how long that might take: it could be days, since no timeouts apply; having voted "commit" in the prepare phase, a Resource Manager must wait to hear the collective outcome.
Under these conditions, the simplest approach is to take an "understood, we're working on it" approach. The request has been understood, but you don't actually know the outcome yet, and that's what you tell the user. Yes, in all sane circumstances the request will complete, but under some conditions operators could choose to intervene in the transaction manually (and maybe cause heuristic damage in doing so.)
To go one step further, you could start a new transaction and perform some queries to see if the data is there. If you are populating a result screen you will naturally be doing such a query anyway. The question is what to do if the expected results are not there. So again, tell the user "your recent request is being processed, hit refresh to see if it's complete". Or retry automatically (I don't much like auto-retry; I prefer to educate the user that it's effectively an async operation.)
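To make the "query and retry" idea concrete, here is a minimal sketch of bounded polling. The original question is .NET/ODP.NET, but the shape is the same in any language, so this is written as plain JDBC; the orders table, its key column, and the timing constants are assumptions for illustration, not anything from the post.

    // Sketch: bounded polling to confirm a 2PC commit has become visible.
    // Assumes a plain JDBC connection; the "orders" table, its key column,
    // and the timing constants are hypothetical.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public final class CommitVisibilityPoller {

        /**
         * Returns true once the expected row is visible, or false if it
         * still isn't after maxAttempts (the caller then reports "still
         * processing" instead of blocking forever).
         */
        public static boolean waitForRow(Connection conn, long orderId,
                                         int maxAttempts, long initialDelayMillis)
                throws Exception {
            long delay = initialDelayMillis;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT 1 FROM orders WHERE order_id = ?")) {
                    ps.setLong(1, orderId);
                    try (ResultSet rs = ps.executeQuery()) {
                        if (rs.next()) {
                            return true; // the resource manager has committed
                        }
                    }
                }
                Thread.sleep(delay);                // back off before retrying
                delay = Math.min(delay * 2, 2_000); // cap the backoff
            }
            return false; // outcome decided, but not yet visible to readers
        }
    }

If this returns false, fall back to the "your recent request is being processed, hit refresh" message rather than spinning indefinitely.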

Related

CQRS Where to Query for business logic/Internal Processes

I'm currently looking at implementing CQRS driven by events (not yet event sourcing) for a service at work; the reasoning being:
I need aggregate data to support a REST API coming out of this service (which will be used to populate views); however, the aggregated data will not all be used by the application logic/processing (i.e. the parts of the aggregate originating outside this service won't be, while the bits originating within it will be used)
I need to stream events to other systems so that they can react to the data (I will produce to a Kafka topic, so the 'read'/'projection' side of this system will consume the same events as the external systems, from these Kafka topics)
I will be consuming events from internal systems to help populate the aggregate for the views in the first point (i.e. it's data from this service and others)
The reason for not going event sourced currently is that a) we're in a bit of a time crunch, and b) we're still learning about it. Having said that, it is something we are looking to do in the future; for now, we have a static DB on the 'Command' side of the system, which will just store current state.
I'm pretty confident with the concept of using the aggregate data to provide the REST API; however, my confusion comes from when I want to change a resource from within the system (for example via a cron job triggered 5 times a day). Example:
If I have a resource of class x which (given some data) needs a piece of state changed
I need to select the instances of class x which meet the requirements (from one of the DBs). Think select * from {class x} where last_changed_date > 5 days ago;
Then create a command to change the state of these instances of x (in my case, the static command DB would be updated, as well as an event emitted to update the read DB)
The middle bullet point is what is confusing me. If I pull the data out of the read DB and check some information on it, then decide to change a property, I then have to convert the object from the 'Read Object' to the 'Command Object' so that I can persist it and create an event? With my current architecture I could query the command DB no problem to find all the instances of {class x} that match the criteria; however, I don't know a) if this is the right thing to do, and b) how this would work if I was using an event store as a DB. Would I have to query a table with millions of rows to find the most recent bit of state about the objects, just to see if they match?
Lots of what I read online has been very conceptual, so I think when it comes to implementation it maybe seems more difficult than it is? Anyhow, if anyone has any advice it would be hugely appreciated!
TIA :)
CQRS can be interpreted in a "permissive" way: rather than saying "thou shalt not query the command/write side", it says "it's OK to have a query/read side that's separate from the command/write side". Because you have this permission to do such separation, it follows that one can optimize the command/write side for a more write-heavy workload (in practice, there are always some reads in the command/write side: since command validation is typically done against some state, that requires some means of getting the state!).
From this, it's extremely likely that there will be some queries which can be performed efficiently against the command/write side and some that can't be (without deoptimizing the command/write side). From this perspective, it's OK to perform the first kind of query against the command/write side: you get the benefit of strong consistency by doing that, though make sure that you're not interfering with the command/write side's primary raison d'être of taking writes.
Event sourcing is in many ways the maximally optimized persistence model for a command/write side, especially if you have some means of keeping the absolute latest state cached and of ensuring concurrency control: you can then sustain many times more writes than reads. The tradeoff in event sourcing is that nearly all reads become rather more expensive than in an update-in-place model. It's thus generally the case that CQRS doesn't force event sourcing, but event sourcing tends to force CQRS (and in turn, event sourcing can simplify ensuring that a CQRS system is eventually consistent, which can be difficult to ensure with update-in-place).
In an event-sourced system, you would tend to have a read-side which subscribes to the event stream, tracks the mapping of X ID to last-updated, and periodically queries that mapping and issues commands. Alternatively, you can have a scheduler service that lets you say "issue this command at this time, unless canceled or rescheduled before then", plus a read-side which subscribes to updates and, on each update for a given ID, cancels the previously scheduled command and schedules a new one for 5 days out.
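A minimal sketch of the first option, assuming an event handler wired to the Kafka consumer and a cron-style trigger for the sweep; the CommandBus and ChangeXStateCommand types are hypothetical stand-ins, not part of any particular framework.

    // Consume the event stream, keep a small "last changed" projection,
    // and let a periodic sweep turn stale entries into commands.
    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public final class StaleXScheduler {

        private final Map<String, Instant> lastChanged = new ConcurrentHashMap<>();
        private final CommandBus commandBus;

        public StaleXScheduler(CommandBus commandBus) {
            this.commandBus = commandBus;
        }

        /** Called for every event consumed from the stream (e.g. a Kafka topic). */
        public void onXChanged(String xId, Instant changedAt) {
            lastChanged.merge(xId, changedAt,
                    (old, next) -> next.isAfter(old) ? next : old);
        }

        /** Run from the cron trigger: issue a command per stale instance. */
        public void sweep(Instant now) {
            lastChanged.forEach((xId, changed) -> {
                if (Duration.between(changed, now).toDays() >= 5) {
                    commandBus.send(new ChangeXStateCommand(xId));
                }
            });
        }

        // Minimal stand-ins so the sketch is self-contained.
        public interface CommandBus { void send(Object command); }
        public record ChangeXStateCommand(String xId) { }
    }

The point is that the "changed more than 5 days ago" query runs against this small projection, not against the event store's millions of rows.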

Saga Compensating Transaction

Currently working on the initial phase of a microservice architecture for a product. It is evident that many operations require a distributed transaction; that is, there are a couple of operations required across different microservices in order to complete one business process.
For this purpose I found that the Saga pattern is useful. In the ideal case, where everything goes correctly, it works fine; but when something goes wrong or some activity fails partway through, we may have to roll back the operations that already succeeded. For this, a "compensating" transaction or operation is required. The complication is that after an operation has been performed successfully, other transactions may also have been performed on that service, so the DB might be in a different state than when the original operation was performed.
What would be a solution for this? One idea is that the state somehow needs to be preserved so it can be revisited, but something like a stock level may have been changed by another transaction in the meantime, so I feel that a compensating transaction would be a problem.
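One common mitigation (offered here as a hedged sketch, not something from the original thread) is to make each compensating operation a semantic inverse that applies a relative delta, rather than restoring an absolute snapshot of state; concurrent changes made by other transactions are then preserved. Using the stock example, with a hypothetical StockService:

    // Forward step and its compensation as paired delta operations.
    // StockService and its adjust() method are illustrative assumptions.
    public final class ReserveStockStep {

        private final StockService stock;

        public ReserveStockStep(StockService stock) {
            this.stock = stock;
        }

        /** Forward operation: subtract the reserved quantity. */
        public void execute(String sku, int quantity) {
            stock.adjust(sku, -quantity);
        }

        /**
         * Compensation: add the same quantity back. Other transactions may
         * have moved the stock level in the meantime; that's fine, because
         * we undo only our own delta, not the whole state.
         */
        public void compensate(String sku, int quantity) {
            stock.adjust(sku, quantity);
        }

        public interface StockService { void adjust(String sku, int delta); }
    }

Restoring a saved absolute level (say, 100) would silently erase any sales that happened between the forward step and the compensation; adding the reserved quantity back does not.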

Does using a transaction but not actually making any queries have a resource cost?

Ok so one of our team members has suggested that at the beginning of every http request we begin a DB transaction (we are using Entity Framework Core), do the work of the request, and then complete the transaction if the response is 200 Ok, or roll back if it is anything else.
This means we would only commit on successful requests.
That is all well and good when we perform reads and writes to the DB.
However, I am wondering: does this come at a cost if we don't actually make any reads or writes to the DB?
If you use TransactionScope for this then the transaction is only physically opened on the first database access. The cost for an unused scope is extremely low.
If you use normal EF transactions then an empty transaction will hit the database three times:
BEGIN TRAN
COMMIT
Reset connection for connection pooling
Each of these is extremely low cost. You can test the cost of this by simply running this 100000 times in a loop. It might very well be the case that you don't care about this small cost.
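If you want to measure it, here is a rough micro-benchmark of those round trips. The question is about EF Core, but the cost in question is just the BEGIN/COMMIT round trips, so this sketch uses plain JDBC against a placeholder PostgreSQL URL; adjust the SQL to your server's dialect (e.g. BEGIN TRAN on SQL Server).

    // Times 100,000 empty transactions. The raw BEGIN/COMMIT statements are
    // issued explicitly so the driver can't skip the round trips; the
    // connection string and credentials are placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public final class EmptyTxBenchmark {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:postgresql://localhost/test", "user", "secret");
                 Statement st = conn.createStatement()) {
                long start = System.nanoTime();
                for (int i = 0; i < 100_000; i++) {
                    st.execute("BEGIN");  // one round trip
                    st.execute("COMMIT"); // and another
                }
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println("100000 empty transactions: " + ms + " ms");
            }
        }
    }

Note this measures only the BEGIN/COMMIT pair; the connection-pool reset mentioned above is extra.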
I would still advise against this. In my experience web applications require more flexibility than a 1:1 correspondence of web request and transaction. Also, the rule of using the HTTP status code to decide the transaction outcome will turn out to be inflexible.
Also, you must pick an isolation level (and possibly timeout) for each transaction. At the beginning of an HTTP request it is not known what the right values are. Only the action knows.
I had good experiences with using one EF context per HTTP request and then manually using transactions inside of each action. The overhead in terms of LOC is very small. There is no pressing need to centralize this.
Don't blindly put BEGIN...COMMIT around everything. There are cases where this is just wrong.
What if the web page records the presence of the user, or the loading of the particular page? Having a ROLLBACK destroys that information.
What if there are two actions on the page, and they are independent of each other? That is, a ROLLBACK for one is OK, but you want to COMMIT the other?
What if there are no writes on the page? Then there is no need for BEGIN...COMMIT.

Real life scenarios for Transaction propagations

There are various transaction propagations like
REQUIRED - This is the case for DML operations.
SUPPORTS - This is the case for querying the database.
MANDATORY - ?
REQUIRES_NEW - ?
NOT_SUPPORTED - ?
NEVER - ?
NESTED - ?
What are some real-life scenarios for these transaction propagations? Why are they a perfect fit for those situations?
There are various usages and there is no simple answer, but I'll try to be as explanatory as possible.
MANDATORY (expecting a transaction): Typically used when you expect that some "higher-context" layer has started the transaction and you only want to continue in it. For example, if you want one transaction for the whole operation from HTTP request to response, you start the transaction at the JAX-RS resource level, and lower layers (services) require a transaction to run within (otherwise an exception is thrown).
REQUIRES_NEW (always create a new transaction): This creates a new transaction and suspends the current one if any exists. From the above example, this is the level you set on your JAX-RS resource. Or, generally, use it if your flow somehow changes and you want to split your logic into multiple transactions (so your code has multiple logical operations that should be separated).
REQUIRED (continue in a transaction, or create one if needed): Some kind of mix between MANDATORY and REQUIRES_NEW. With MANDATORY you expect that the transaction exists; with this level you hope that it exists, and if not, you create it. Typically used in DAO-like services (in my experience), but it really depends on the logic of your app.
SUPPORTS (caller decides whether or not to run in a transaction): Used if you want to use the same context as the caller (higher context): if your caller was running in a transaction, then you run in one too; if it wasn't, you are also non-transactional. May also be used in DAO-like services if you want the higher context to decide.
NESTED (sub-transaction): I must admit I haven't used this one in real code, but generally you create a real sub-transaction that works as a kind of checkpoint. It runs in the context of the "parent" transaction, but if it fails, it rolls back to that checkpoint (the start of the nested transaction). This may be useful when your application requires that kind of logic, for example when inserting a large number of items into the database, committing the valid ones and keeping track of the invalid ones (you can catch the exception when a nested transaction fails but still commit the whole transaction for the valid ones).
NEVER (expecting there is no transaction): This is for when you want to be sure that no transaction exists at all. If code that runs in a transaction reaches this code, an exception is thrown. So this is the complete opposite of MANDATORY: e.g. when you know that no transaction should be affected by this code, very likely because the transaction should not exist.
NOT_SUPPORTED (continue non-transactionally): This is weaker than NEVER; you want the code to run non-transactionally. If you somehow enter this code from a context where a transaction is running, the transaction is suspended and execution continues non-transactionally.
From my experience, you very often want one business action to be atomic, so you want only one transaction per request/... For example, a simple REST call via HTTP that does some DB operations, all in one request-scoped transaction. So my typical usage is REQUIRES_NEW on the top level (the JAX-RS resource) and MANDATORY on all lower-level services that are injected into this resource (or even lower); see the sketch after the table below.
This may be useful for you. It describes how code behaves with a given propagation (caller -> method):
REQUIRED: NONE->T1, T1->T1
REQUIRES_NEW: NONE->T1, T1->T2
MANDATORY: NONE->Exception, T1->T1
NOT_SUPPORTED: NONE->NONE, T1->NONE
SUPPORTS: NONE->NONE, T1->T1
NEVER: NONE->NONE, T1->Exception
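As an illustration of the REQUIRES_NEW-at-the-top, MANDATORY-below pattern described above: the answer talks about JAX-RS, but since the propagation names match Spring's @Transactional enum, here is a Spring-flavored sketch with made-up class names.

    // Entry point starts its own transaction; the injected service refuses
    // to run outside one. Calls cross bean boundaries, so the proxies apply.
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Propagation;
    import org.springframework.transaction.annotation.Transactional;

    @Service
    public class OrderResource {

        private final OrderService orderService;

        public OrderResource(OrderService orderService) {
            this.orderService = orderService;
        }

        // Top level: always a fresh transaction for the business action.
        @Transactional(propagation = Propagation.REQUIRES_NEW)
        public void handleRequest(long orderId) {
            orderService.updateOrder(orderId);
        }
    }

    @Service
    class OrderService {

        // Lower level: throws if no transaction is already active.
        @Transactional(propagation = Propagation.MANDATORY)
        public void updateOrder(long orderId) {
            // ... DB work joins the caller's transaction here ...
        }
    }

Calling updateOrder() without going through handleRequest() fails fast with an exception instead of silently running outside a transaction.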

Pause the execution of a process for a while in ejb

I think I may not be the first one with this problem.
Sometimes the user submits a bunch of data to the server, and this data is going to be displayed in the response page. In order to give users the illusion that the data submission and processing is fast, we usually do this asynchronously.
Now the problem is that, for some reason, this data needs to go to the database first, and then be fetched back to appear in the response page. If the response page is displayed to the user too fast, the asynchronous submission may not have finished. Currently I call
Thread.sleep();
before I call setResponsePage(), but managing threads natively is not recommended in EJB. Does anyone know of alternatives? Thanks
It's just been discussed in this question: Thread.sleep() in an EJB.
I'd split the logic into two EJBs: one for inserting the user data into DB, and one for fetching it. Your web layer would call one after the other, resulting in two separate transactions, which should be ordered properly by the database (still, it might depend on other factors, like transaction isolation).
EDIT
The problem with sleep() is that you never know how long to wait, so it's almost always a bad idea. I see a case here for Ajax push: your EJB should return immediately with a page to which data will be pushed when processing is complete. I won't advise you further on this topic, as I'm far from an expert in this area.
A still imperfect approach, but better than sleep(), could be to sync on database locks: the first EJB would insert the data and lock some record in its transaction; the second EJB would try to lock the same record and then read the data. This way the second EJB waits only for the minimal time needed.
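A sketch of that lock-based handshake. The assumption is that the writer bean locks a pre-existing per-user marker row for the duration of the transaction in which it inserts the data; the reader then requests the same lock and blocks until the writer commits. The UserMarker and Submission entities are made up for illustration, and javax.* imports are used since the post predates Jakarta EE.

    // Reader side: blocks on the marker row's lock, then reads the data.
    // EJB business methods default to REQUIRED, so this runs in its own
    // transaction.
    import java.util.List;
    import javax.ejb.Stateless;
    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.Id;
    import javax.persistence.LockModeType;
    import javax.persistence.PersistenceContext;

    @Stateless
    public class SubmissionReaderBean {

        @PersistenceContext
        private EntityManager em;

        /** Waits (via the row lock) until the writer's transaction commits. */
        public List<Submission> readWhenReady(long userId) {
            // PESSIMISTIC_WRITE maps to SELECT ... FOR UPDATE on most
            // databases, so this blocks while the writer holds the lock.
            em.find(UserMarker.class, userId, LockModeType.PESSIMISTIC_WRITE);
            return em.createQuery(
                    "SELECT s FROM Submission s WHERE s.userId = :uid",
                    Submission.class)
                .setParameter("uid", userId)
                .getResultList();
        }
    }

    @Entity
    class UserMarker {
        @Id long userId;
    }

    @Entity
    class Submission {
        @Id long id;
        long userId;
    }

Unlike sleep(), the wait ends the moment the inserting transaction commits, at the price of holding a lock and depending on the database's lock-wait timeout.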
