When to session.commit() in a NiFi Processor - apache-nifi

I am implementing a NiFi processor and have a couple of clarifications to make with respect to best practices:
session.getProvenanceReporter().modify(...) - Should we emit the event immediately after every session.transfer()?
session.commit() - The documentation says that after performing operations on flow files, either commit or rollback can be invoked.
Developer guide: https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#process_session
The question is: what do I lose by not invoking these methods explicitly?

1) Yes, typically the provenance event is emitted after transferring the flow file.
2) It depends on whether you are extending AbstractProcessor or AbstractSessionFactoryProcessor. AbstractProcessor will call commit or rollback for you, so you don't need to; AbstractSessionFactoryProcessor requires you to call them appropriately.
If you are extending AbstractSessionFactoryProcessor and never call commit, eventually that session will get garbage collected and rollback will be called, and all the operations performed by that session will be rolled back.
There is also an annotation @SupportsBatching which can be placed on a processor. When this annotation is present, the UI shows a slider on the processor's Scheduling tab that indicates how many milliseconds' worth of framework operations like commit() can be batched together behind the scenes for increased throughput. If latency is more important, then leaving the slider at 0 milliseconds is appropriate, but the key here is that the user gets to decide this when building the flow and configuring the processor.
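To make this concrete, here is a minimal sketch of a processor extending AbstractProcessor (the relationship, attribute, and class names are illustrative, not from the question):

import org.apache.nifi.annotation.behavior.SupportsBatching;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

// Sketch only: relationship and attribute names are illustrative.
@SupportsBatching // exposes the batch-duration slider on the Scheduling tab
public class ExampleProcessor extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .build();

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        flowFile = session.putAttribute(flowFile, "processed", "true"); // modify the flow file
        session.transfer(flowFile, REL_SUCCESS);
        // emit the provenance event right after the transfer
        session.getProvenanceReporter().modify(flowFile, "Set 'processed' attribute");
        // no session.commit() here: AbstractProcessor commits (or rolls back) when onTrigger returns
    }
}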

Related

Saga Compensating Transaction

I am currently working on the initial phase of a microservice architecture for a product. It is evident that many operations require a distributed transaction; that is, a couple of operations across different microservices are required in order to complete one business process.
For this purpose I found that the Saga pattern is useful. In the ideal case, where everything goes correctly, it works fine; but when something goes wrong or some activity fails, we may have to roll back those operations. For this, a "compensating" transaction or operation is required. The catch is that after the original operation completed successfully, other transactions may also have been performed on that service, so the DB might be in a different state than when the operation was actually performed.
What would be the solution for this? One solution I can think of is to somehow preserve state so it can be revisited, but the stock, say, may already have been changed by some other transaction, so I feel the compensating transaction would be a problem.
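To make the concern concrete: a compensating operation is usually written as a semantic inverse (applying the opposite delta) rather than restoring a saved snapshot, precisely because other transactions may have touched the same state in the meantime. A minimal, hypothetical sketch (all names invented):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stock service: compensation applies the inverse delta,
// not a saved snapshot, so changes made by interleaved transactions survive.
public class StockService {

    private final Map<String, Integer> stock = new ConcurrentHashMap<>();

    // Forward operation of the saga step.
    public void reserve(String sku, int quantity) {
        stock.merge(sku, -quantity, Integer::sum);
    }

    // Compensating operation: semantically undo the reservation.
    // Restoring "the old value" would clobber other transactions' work.
    public void cancelReservation(String sku, int quantity) {
        stock.merge(sku, quantity, Integer::sum);
    }
}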

How to avoid two bookings when two users create a booking at the same time and the available booking count is 1

I am a beginner with Spring. In my project we have a method to make a booking. If the available booking count is 1 and two users try to book at the same time, ideally only one booking should be allowed, but my application is allowing two bookings. I made the method synchronized and it now works fine, but synchronization is a JVM-level concept; if I deploy my application in cluster mode there are different servers on different machines (so different JVMs), and synchronization won't work.
Can anyone please tell me the possible solutions to restrict the booking, both from the Java side and from the DB side?
If the application may be deployed in a cluster, the synchronized method will indeed not be enough.
You should rely on a node shared by all server instances: the DB.
You could use DB locking features, especially optimistic locking and pessimistic locking.
With them, you can avoid collisions resulting from concurrent updates to the same row by concurrent clients.
Choose the one that better matches your use case.
Concurrency control in databases, extract:
Optimistic - Delay the checking of whether a transaction meets the isolation and other integrity rules (e.g., serializability and recoverability) until its end, without blocking any of its (read, write) operations ("...and be optimistic about the rules being met..."), and then abort a transaction to prevent the violation, if the desired rules are to be violated upon its commit. An aborted transaction is immediately restarted and re-executed, which incurs an obvious overhead (versus executing it to the end only once). If not too many transactions are aborted, then being optimistic is usually a good strategy.
Pessimistic - Block an operation of a transaction, if it may cause violation of the rules, until the possibility of violation disappears. Blocking operations is typically involved with performance reduction.
JDBC and JPA support both.
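As a minimal sketch of the pessimistic variant in JPA (the Booking entity, its fields, and accessors are all hypothetical): the row stays locked until the transaction ends, so a concurrent booking attempt blocks until commit or rollback.

import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

// Hypothetical booking decrement under a pessimistic row lock.
public class BookingDao {

    public Booking book(EntityManager em, long bookingId) {
        // SELECT ... FOR UPDATE semantics: concurrent callers block here
        Booking booking = em.find(Booking.class, bookingId, LockModeType.PESSIMISTIC_WRITE);
        if (booking.getAvailableCount() < 1) {
            throw new IllegalStateException("No bookings available");
        }
        booking.setAvailableCount(booking.getAvailableCount() - 1);
        return booking; // the change is flushed when the surrounding transaction commits
    }
}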
You should try ETags - an optimistic locking mechanism.
They can be easily implemented with Spring; please check the official documentation.
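For the optimistic variant, the usual underlying mechanism in JPA is a version column: a stale update then fails with an exception instead of silently double-booking. Spring can surface that version as an HTTP ETag (checked against an If-Match header). A sketch with a hypothetical Booking entity:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

// Sketch of optimistic locking via a JPA version column. If two
// transactions update the same row, the loser's commit fails with an
// OptimisticLockException instead of silently double-booking.
@Entity
public class Booking {

    @Id
    private Long id;

    private int availableCount;

    @Version // incremented on every update; stale writes are rejected
    private long version;

    // getters and setters omitted for brevity
}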
Hope this will help.

CQRS + Microservices: Handling event rollback

We are using microservices, CQRS, and an event store (with the Node.js cqrs-domain package); everything works like a charm, and the typical flow goes like:
1. REST -> 2. Service -> 3. Command validation -> 4. Command -> 5. Aggregate -> 6. Event -> 7. Event store (transactional data) -> 8. Returns aggregate with aggregate ID -> 9. Store in microservice local DB (essentially the read DB) -> 10. Publish event to the queue
The problem with the flow above is that the transactional data save (i.e. persistence to the event store) and the write to the microservice's read DB happen in different transaction contexts. If there is any failure at step 9, how should I handle the event, which has already been propagated to the event store, and the aggregate, which has already been updated?
Any suggestions would be highly appreciated.
You retry it later.
The "book of record" is the event store. The downstream views (the "published events", the read models) are derived from the book of record. They are typically behind the book of record in time (eventual consistency) and are not typically synchronized with each other.
So you might have, at some point in time, 105 events written to the book of record, but only 100 published to the queue, and a representation in your service database constructed from only 98.
Updating a view is typically done in one of two ways. You can, of course, start with a brand new representation and replay all of the events into it as part of each update. Alternatively, you track in the metadata of the view how far along in the event history you have already gotten, and use that information to determine where the next read of the event history begins.
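A sketch of the second approach - the view records how far into the event history it has read and resumes from that checkpoint on each update (the interfaces here are assumed for illustration, not from any particular library):

// Assumed minimal interfaces, for illustration only.
interface Event {
    long position(); // position of the event in the store's history
}

interface EventStore {
    Iterable<Event> readFrom(long position); // events at or after 'position'
}

interface ViewRepository {
    long loadCheckpoint();              // position of the last event applied
    void apply(Event e);                // update the read model
    void saveCheckpoint(long position); // advance the checkpoint
}

public class ViewUpdater {

    private final EventStore eventStore;
    private final ViewRepository viewRepo;

    public ViewUpdater(EventStore eventStore, ViewRepository viewRepo) {
        this.eventStore = eventStore;
        this.viewRepo = viewRepo;
    }

    public void update() {
        long lastApplied = viewRepo.loadCheckpoint();          // e.g. 98
        for (Event e : eventStore.readFrom(lastApplied + 1)) { // e.g. 99..105
            viewRepo.apply(e);
            viewRepo.saveCheckpoint(e.position()); // ideally in the same transaction as apply()
        }
    }
}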
Inside your event store, you could track whether read-side replication was successful.
As soon as step 9 succeeds, you can flag the event as 'replicated'.
That way, you could introduce a component that watches for unreplicated events and triggers step 9 again. You could also track whether the replication has failed multiple times.
Updating the read side (step 9) and flagging an event as replicated should happen consistently. You could use a saga pattern here.
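A sketch of that watcher component (all types and method names here are assumed for illustration):

// Assumed minimal interfaces, for illustration only.
interface StoredEvent {
}

interface FlaggingEventStore {
    Iterable<StoredEvent> findUnreplicated(); // events not yet flagged 'replicated'
    void markReplicated(StoredEvent e);
    void recordFailure(StoredEvent e);        // track repeated failures
}

interface ReadSide {
    void apply(StoredEvent e); // step 9: update the microservice's read DB
}

public class ReplicationWatcher {

    private final FlaggingEventStore eventStore;
    private final ReadSide readSide;

    public ReplicationWatcher(FlaggingEventStore eventStore, ReadSide readSide) {
        this.eventStore = eventStore;
        this.readSide = readSide;
    }

    // Run periodically, e.g. from a scheduler.
    public void run() {
        for (StoredEvent e : eventStore.findUnreplicated()) {
            try {
                readSide.apply(e);            // retry step 9
                eventStore.markReplicated(e); // should happen consistently with apply()
            } catch (Exception ex) {
                eventStore.recordFailure(e);
            }
        }
    }
}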
I think I have now understood it to a better extent.
The aggregate would still be created; the answer is that all validations for any type of consistency should happen before my aggregate is constructed. What remains is the failure beyond the purview of the code: a failure while updating the read-side DB of the microservice, which needs to be handled.
So in the ideal case the aggregate would be created; however, the associated event would remain undispatched until all the read dependencies are updated. If they are not, it stays undispatched and can be handled separately.
The event store will still have all the events, and eventual consistency is maintained this way.

Real life scenarios for Transaction propagations

There are various transaction propagations like
REQUIRED - This is the case for DML operations.
SUPPORTS - This is the case for querying database.
MANDATORY - ?
REQUIRES_NEW - ?
NOT_SUPPORTED - ?
NEVER - ?
NESTED - ?
What are some real-life scenarios for these transaction propagations? Why are they a perfect fit for those situations?
There are various usages and there is no simple answer, but I'll try to be as explanatory as I can.
MANDATORY (expecting a transaction): Typically used when you expect that a "higher-context" layer has started the transaction and you only want to continue in it. For example, if you want one transaction for the whole operation from HTTP request to response, you start the transaction at the JAX-RS resource level and the lower layers (services) require a transaction to run within (otherwise an exception is thrown).
REQUIRES_NEW (always create a new transaction): This creates a new transaction and suspends the current one if any exists. From the above example, this is the level you would set on your JAX-RS resource. It is also used generally when your flow changes and you want to split your logic into multiple transactions (so your code has multiple logical operations that should be separated).
REQUIRED (continue in a transaction, or create one if needed): A kind of mix between MANDATORY and REQUIRES_NEW. With MANDATORY you expect the transaction to exist; at this level you hope that it exists, and if it does not, you create it. Typically used in DAO-like services (in my experience), but it really depends on the logic of your app.
SUPPORTS (caller decides whether or not to run in a transaction): Used if you want the same context as the caller (higher context): if your caller was running in a transaction, you run in it too; if it wasn't, you are also non-transactional. May also be used in DAO-like services if you want the higher context to decide.
NESTED (sub-transaction): I must admit I haven't used this one in real code, but generally you create a real sub-transaction that works as a kind of checkpoint. It runs in the context of the "parent" transaction, but if it fails it returns to that checkpoint (the start of the nested transaction). This may be useful when your application requires this kind of logic, for example when inserting a large number of items into the database, committing the valid ones and keeping track of the invalid ones (you can catch the exception when the nested transaction fails but still commit the whole transaction for the valid ones).
NEVER (expecting there is no transaction): This is for when you want to be sure that no transaction exists at all. If code that runs in a transaction reaches this code, an exception is thrown. So this is typically for cases completely opposite to MANDATORY, e.g. when you know that no transaction should be affected by this code - very likely because the transaction should not exist.
NOT_SUPPORTED (continue non-transactionally): This is weaker than NEVER: you want the code to run non-transactionally. If you somehow enter this code from a context where a transaction exists, you suspend the transaction and continue non-transactionally.
From my experience, you very often want one business action to be atomic, and thus only one transaction per request/... For example, a simple REST call via HTTP that does some DB operations, all in one transaction per HTTP request. So my typical usage is REQUIRES_NEW at the top level (the JAX-RS resource) and MANDATORY on all lower-level services that are injected into this resource (or even lower), as in the sketch below.
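A sketch of that arrangement using Spring's @Transactional (class and method names are invented; the same propagation values also exist for Java EE's javax.transaction.Transactional):

import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

// The top-level entry point always opens its own transaction...
public class OrderResource {

    private final OrderService orderService;

    public OrderResource(OrderService orderService) {
        this.orderService = orderService;
    }

    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void placeOrder(String orderId) {
        orderService.reserveStock(orderId);  // joins the resource's transaction
        orderService.recordPayment(orderId); // same transaction: atomic together
    }
}

// ...while lower-level services insist on joining an existing one,
// failing fast if they are ever called outside a transaction.
class OrderService {

    @Transactional(propagation = Propagation.MANDATORY)
    public void reserveStock(String orderId) { /* DB work */ }

    @Transactional(propagation = Propagation.MANDATORY)
    public void recordPayment(String orderId) { /* DB work */ }
}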
This may also be useful for you; it describes how code behaves with a given propagation (caller -> method):
REQUIRED: NONE->T1, T1->T1
REQUIRES_NEW: NONE->T1, T1->T2
MANDATORY: NONE->Exception, T1->T1
NOT_SUPPORTED: NONE->NONE, T1->NONE
SUPPORTS: NONE->NONE, T1->T1
NEVER: NONE->NONE, T1->Exception

What's a good way to handle "async" commits?

I have a WCF service that uses ODP.NET to read data from an Oracle database. The service also writes to the database, but indirectly, as all updates and inserts are achieved through an older layer of business logic that I access via COM+, which I wrap in a TransactionScope. The older layer connects to Oracle via ODBC, not ODP.NET.
The problem I have is that because Oracle uses a two-phase commit, and because the older business layer is using ODBC and not ODP.NET, TransactionScope.Commit() sometimes returns before the data is actually available for reads from the service layer.
I see a similar post about a Java user having trouble like this as well on Stack Overflow.
A representative from Oracle posted that there isn't much I can do about this problem:
This may be due to the way the OLETx ITransaction::Commit() method behaves. After phase 1 of the 2PC (i.e. the prepare phase), if all is successful, commit can return even if the resource managers haven't actually committed. After all, a successful "prepare" is a guarantee that the resource managers cannot arbitrarily abort after this point. Thus even though a resource manager couldn't commit because it didn't receive a "commit" notification from the MSDTC (due to, say, a communication failure), the component's commit request returns successfully. If you select rows from the table(s) immediately, you may sometimes see the actual commit occur in the database after you have already executed your select. Your select will therefore not see the new rows, due to consistent read semantics. There is nothing we can do about this in Oracle, as the "commit success after successful phase 1" optimization is part of the MSDTC's implementation.
So, my question is this:
How should I go about dealing with the possible delay ("async" via the title) - that is, figuring out when the second phase of the 2PC actually occurs - so I can be sure that the data I inserted (indirectly) is actually available to be selected after the Commit() call returns?
How do big systems deal with the fact that the data might not be ready for reading immediately?
I assume that the whole transaction has prepared and a commit outcome has been decided by the transaction manager; therefore eventually (barring heuristic damage) the resource managers will receive their commit message and complete. However, there are no guarantees as to how long that might take - it could be days; no timeouts apply. Having voted "commit" in the prepare phase, a resource manager must wait to hear the collective outcome.
Under these conditions, the simplest approach is to take an "understood, we're thinking" stance: the request has been understood, but you don't actually know the outcome yet, and that's what you tell the user. Yes, in all sane circumstances the request will complete, but under some conditions operators could actually choose to intervene in the transaction manually (and maybe cause heuristic damage in doing so).
To go one step further, you could start a new transaction and perform some queries to see if the data is there. If you are populating a result screen, you will naturally be doing such a query anyway. The question is what to do if the expected results are not there. So again, tell the user "your recent request is being processed, hit refresh to see if it's complete". Or retry automatically (I don't much like auto-retry - I prefer to educate the user that it's effectively an async operation).
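A sketch of that "query in a new transaction and retry" approach, written in Java since the linked post concerns a Java user hitting the same issue (table, column, and class names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.time.Duration;
import java.time.Instant;
import javax.sql.DataSource;

// Sketch: after Commit() returns, poll (in fresh transactions) until the
// row is visible or we give up and report "still processing" to the user.
public class CommitVisibilityChecker {

    private final DataSource dataSource;

    public CommitVisibilityChecker(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public boolean waitUntilVisible(long orderId, Duration timeout) throws Exception {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            try (Connection con = dataSource.getConnection();
                 PreparedStatement ps = con.prepareStatement(
                         "SELECT 1 FROM orders WHERE id = ?")) {
                ps.setLong(1, orderId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        return true; // the 2PC commit has reached the database
                    }
                }
            }
            Thread.sleep(200); // brief back-off before re-checking
        }
        return false; // tell the user: "request received, still processing"
    }
}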
