Reading a ruleset topic/partition from multiple Kafka Streams instances of the same app

I have a Kafka Streams app that does some processing on a main event topic, and I also have a side topic that is used to apply a ruleset to the main event topic.
Until now the app ran as a single instance: when a rule was applied, a static variable was set so that the other processing operator (the main topic consumer) would carry on evaluating rules as expected. This worked because each rule is written to a single partition determined by the rule key literal, e.g. <"MODE", value>, so through the static variable all the other tasks involved were made aware of the change.
When the application is deployed to multiple nodes, however, this approach no longer works: with a single consumer group spanning, say, two app instances, only one instance would set its static variable to the new value, and the other instance would never consume that rule record. (Giving each app instance a different group id would have the unwanted side effect of consuming the main topic twice.)
On the other hand, making the rule topic a global table would mean the main processing operator has to query the global table every time it consumes an event, just to retrieve the latest rules.
Is it possible to attach some sort of listener to that global table, so that when a value arrives on the topic some callback code runs and sets a static variable?
Is there a better or alternative approach to resolve this issue?

Instead of a GlobalKTable, you can fall back to addGlobalStore(), which allows you to execute custom code.
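For example, here is a minimal sketch of that idea, assuming Kafka Streams 2.7+ (for the newer Processor API); the topic name rules-topic, the store name rules-store, and the MODE key handling are placeholders. The processor you pass to addGlobalStore() runs on every app instance for every record of the topic (each instance reads all partitions of a global store's source topic), so it can maintain the store, which it must do, and at the same time act as the "listener" that sets your static variable:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class RulesGlobalStore {

    // volatile so every stream thread in this instance sees a rule change immediately
    public static volatile String currentMode = "DEFAULT";

    public static void attach(StreamsBuilder builder) {
        builder.addGlobalStore(
            Stores.keyValueStoreBuilder(
                      Stores.inMemoryKeyValueStore("rules-store"),
                      Serdes.String(), Serdes.String())
                  .withLoggingDisabled(),  // global stores restore straight from the source topic
            "rules-topic",                 // placeholder name for the rules topic
            Consumed.with(Serdes.String(), Serdes.String()),
            () -> new Processor<String, String, Void, Void>() {
                private KeyValueStore<String, String> store;

                @Override
                public void init(ProcessorContext<Void, Void> context) {
                    store = context.getStateStore("rules-store");
                }

                @Override
                public void process(Record<String, String> record) {
                    // the processor must keep the store in sync with the topic...
                    store.put(record.key(), record.value());
                    // ...and can additionally run callback-style side effects
                    if ("MODE".equals(record.key())) {
                        currentMode = record.value();
                    }
                }
            });
    }
}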

Related

CQRS Where to Query for business logic/Internal Processes

I'm currently looking at implementing CQRS driven by events (not yet event sourcing) for a service at work; the reasoning being:
I need aggregate data to support a REST API coming out of this service (which will be used to populate views); however, the aggregated data will not be used by the application logic/processing (i.e. the parts of the aggregate originating outside this service will not be used by it, while the bits originating within it will be).
I need to stream events to other systems so that they can react to the data (this service will produce to a Kafka topic, so the 'read'/'projection' side of this system will consume the same events as the external systems, from these Kafka topics).
I will be consuming events from internal systems to help populate the aggregate for the views in the first point (i.e. it's data from this service and others).
The reason for not going event-sourced currently is that a) we're in a bit of a time crunch, and b) we are still learning about it. Having said that, it is something we are looking to do in the future; for now, we have a static DB on the 'command' side of the system, which will just store current state.
I'm pretty confident with the concept of using the aggregate data to provide the REST API; my confusion comes in when I want to change a resource from within the system (for example via a cron job triggered 5 times a day). Example:
If I have a resource of class x which (given some data) needs a piece of state changed,
I need to select the instances of class x which meet the requirements (from one of the DBs). Think select * from {class x} where last_changed_date > 5 days ago;
Then create a command to change the state of these instances of x (in my case, the static command DB would be updated, and an event emitted to update the read DB).
The middle bullet point is what is confusing me. If I pull the data out of the read DB and check some information on it, then decide to change a property, I then have to convert the object from the 'read object' to the 'command object' so that I can persist it and create an event? With my current architecture I could query the command DB no problem to find all the instances of {class x} that match the criteria, but I don't know a) whether this is the right thing to do, and b) how this would work if I were using an event store as a DB. Would I have to query a table with millions of rows to find the most recent bit of state of the objects, and then see if they match?
A lot of what I've read online has been very conceptual, so I think when it comes to implementation it maybe seems more difficult than it is. Anyhow, if anyone has any advice it would be hugely appreciated!
TIA :)
CQRS can be interpreted in a "permissive" way: rather than saying "thou shalt not query the command/write side", it says "it's OK to have a query/read side that's separate from the command/write side". Because you have permission to make that separation, it follows that you can optimize the command/write side for a more write-heavy workload (in practice, there are always some reads on the command/write side: since command validation is typically done against some state, there has to be some means of getting that state!). From this, it's extremely likely that some queries can be performed efficiently against the command/write side and some can't be (without deoptimizing it). From this perspective, it's OK to perform the first kind of query against the command/write side: you get the benefit of strong consistency by doing so, though make sure that you're not interfering with the command/write side's primary raison d'être of taking writes.
Event sourcing is in many ways the maximally optimized persistence model for a command/write side, especially if you have some means of keeping the absolute latest state cached and of ensuring concurrency control, because you can then sustain many times more writes than reads. The tradeoff in event sourcing is that nearly all reads become rather more expensive than in an update-in-place model: it's thus generally the case that CQRS doesn't force event sourcing, but event sourcing tends to force CQRS (and in turn, event sourcing can simplify ensuring that a CQRS system is eventually consistent, which can be difficult to ensure with update-in-place).
In an event-sourced system, you would tend to have a read side which subscribes to the event stream, tracks the mapping of X ID to last-updated time, and periodically queries it and issues commands. Alternatively, you can have a scheduler service that lets you say "issue this command at this time, unless canceled or rescheduled before then", together with a read side which subscribes to updates and, on each update for a given ID, cancels the previously scheduled command and schedules a new one for 5 days from now.
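As a rough illustration of the first approach (a sketch only; the event subscription and the sendChangeStateCommand dispatch are hypothetical stand-ins for your own infrastructure):

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Read side: tracks "id of X -> last changed" from the event stream and
// periodically issues commands for entries older than 5 days.
public class StaleXSweeper {

    private final Map<String, Instant> lastChanged = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // called by whatever subscribes to the event stream
    public void onXChanged(String id, Instant when) {
        lastChanged.merge(id, when, (oldT, newT) -> newT.isAfter(oldT) ? newT : oldT);
    }

    public void start() {
        scheduler.scheduleAtFixedRate(this::sweep, 1, 1, TimeUnit.HOURS);
    }

    private void sweep() {
        Instant cutoff = Instant.now().minus(Duration.ofDays(5));
        lastChanged.forEach((id, when) -> {
            if (when.isBefore(cutoff)) {
                sendChangeStateCommand(id); // hypothetical command dispatch
            }
        });
    }

    private void sendChangeStateCommand(String id) {
        // e.g. POST to the command API, or publish a command message
    }
}

Note the read side only issues commands; the command side still validates them against its own state, so a stale projection at worst issues a redundant command.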

Camunda: Receive multiple, different messages at once

I am currently developing a fairly complex workflow with Camunda. The goal of this workflow is to orchestrate the execution of different external business processes, which includes starting, monitoring, and synchronizing these workflows. Everything besides the synchronization works as expected.
Example:
My example has one main workflow which starts multiple sub-workflows. The main workflow has to be aware when all sub-workflows are finished. Every sub-workflow is triggered by a message and sends a message back to the main workflow at the end of its execution. Therefore, all sub-workflows should be synchronized in the main workflow.
Xml can be accessed on this site: https://pastebin.com/2aj4z0zU
Unfortunately, this leads to numerous message correlation exceptions at the choke point in the main workflow (first lane, after the first parallel gateway). I am using the following code to correlate the messages:
this.runtimeService.createMessageCorrelation(messageName)
    .processInstanceId(processInstanceId)
    .setVariables(payload)
    .correlate();
The whole workflow is executable and runs without errors, but only if a single example_workflow is executed at a time. Starting multiple example_workflows quickly one after another randomly produces this type of exception for every message type:
ENGINE-16004 Exception while closing command context: Cannot correlate message 'PROCESS_B_FINISHED': No process definition or execution matches the parameters org.camunda.bpm.engine.MismatchingMessageCorrelationException: Cannot correlate message 'PROCESS_B_FINISHED': No process definition or execution matches the parameters
at org.camunda.bpm.engine.impl.cmd.CorrelateMessageCmd.execute(CorrelateMessageCmd.java:88) ~[camunda-engine-7.14.0.jar!/:7.14.0]
Currently, the correlation exceptions occur when a PostgreSQL database is used. The same workflow runs much better, though not perfectly, with an H2 file-based database. None of the receive tasks are configured asynchronously; only the send tasks are (async before + exclusive).
Questions:
Is this already the best practice to synchronize multiple messages in one workflow?
What could be the reason for the correlation exceptions while using a postgresql database?
Used software:
spring boot application [Version:2.3.4]
camunda [Version:7.14.0]
h2 [Version:1.4.200]
postgresql [Version:42.2.22]
The process model seems to contain sequences where it can run into a deadlock (what if blue is followed directly by green? Or yellow?) or where you have race conditions. If the process has not yet reached a state where it is waiting to receive the message, then the message delivery will fail (as indicated in the error message you shared).
(The reason you are observing the correlation exception more frequently on PostgreSQL is the race condition: with this external database some operations take slightly more time, increasing the chance of the race condition occurring.)
The process engine needs to be able to match a message to a unique receiver. If there are multiple potential receivers for the same message name, and no other correlation criteria creating a unique match are provided, then the delivery will also fail; this is why it does not work when you run multiple process instances. You either need to use unique message names per instance or, better, use a businessKey or a process variable which is unique per instance as an additional correlation criterion, as in the sketch below.
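For example (a sketch; itemId stands in for whatever value is unique per process instance):

// correlate by message name AND business key, so only the one waiting
// instance with this key is a match
runtimeService.createMessageCorrelation("PROCESS_B_FINISHED")
    .processInstanceBusinessKey(itemId)
    .setVariables(payload)
    .correlate();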
Modelling a workflow with this parallel message bottleneck leads to a race condition, as mentioned in rob2universe's post.
To solve this problem, I first had to correlate the messages directly. I did this by adding a unique identifier to every message, which was not a big deal since an item ID was already defined within the payload of every message. Secondly, I had to remove all asynchronous and exclusive markers from every receive task and the connected gateways. And thirdly, I had to reset the job executor properties to their default values; limiting the pool size and the jobs per acquisition did not benefit the workflow execution.
After all these changes, my workflow now runs as expected with no errors. Unfortunately, due to the described bottleneck, optimistic locking exceptions are common, but the workflow engine handles them without further errors.

Queueing mechanism and Elasticsearch 1.4.0

I have a RabbitMQ broker on which I post different messages that will end up as documents in Elasticsearch. There are multiple consumers of the broker, which are actually different threads in a task executor assigned to an AMQP inbound gateway (using Spring Integration and Spring AMQP here).
Consider the following scenario: I have created a doc in ES with the structure
{
  "field1" : "value1",
  "field2" : "value2"
}
Afterwards I send two update requests, both updating the same field, let's say field1. If I send these messages one right after the other (a common use case in production), my consumer threads will fetch the messages in the right order (AMQP guarantees this), but the processing could happen in the wrong order, and the later updated value could be overwritten by the earlier one. I would end up with wrong data.
How can I make sure my data won't get corrupted? Having a single consumer thread is not enough, because if I scale out by adding more machines running my consuming app, I will again have multiple consumers. I might need message ordering, but across multiple machines I would probably have to build some sort of cluster-aware component; I am using SI, so this seems really hard to do, in my opinion.
In pre-1.2 versions of ES we used an external version, like a timestamp, and ES would have thrown a VersionConflictException in my scenario: the first update would have had version 10000, say, and the second 10001; if the second had been processed first, ES would have rejected the request carrying version 10000 as lower than the existing one. But in the latest versions, the ES team has removed this functionality for update operations.
One solution might be to use multiple queues with a single consumer on each queue, and a hash function that always routes updates for the same document to the same queue; see the RabbitMQ tutorials for the various options.
You can scale out by adding more queues (and changing your hash function).
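A minimal sketch of such a routing function (queue naming and count are hypothetical; the point is that the target queue is a pure function of the document id):

// Deterministically route all updates for one document to the same queue,
// so the single consumer on that queue processes them in order.
public final class DocRouter {

    private static final int QUEUE_COUNT = 4; // scaling out means raising this and re-binding

    public static String queueFor(String documentId) {
        int bucket = Math.floorMod(documentId.hashCode(), QUEUE_COUNT);
        return "es.updates." + bucket; // one single-consumer queue per bucket
    }
}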
For resiliency, consider running your consumers in Spring XD. You can have a single instance of each rabbit source (for each queue) and XD will take care of failing it over to another container node if it goes down.
Otherwise you could roll your own by keeping a warm standby: inbound adapters configured with auto-startup="false", plus something that monitors the active instance and uses a <control-bus/> to start the standby if the active one goes down.
EDIT:
In response to the fourth comment below.
As I said above, to scale out, you would have to change the hash function. So adding consumers automatically while running would be tricky.
You don't have to hard-code the queue names in the jar, you can use a property placeholder and fill it from properties, system properties, or an environment variable.
This solution is the simplest but does have these limitations.
You could, however, build a management app that could scale it out - stop the producer, wait for all queues to quiesce, reconfigure the consumers and restart the producer - Spring Integration provides a <control-bus/> to start/stop adapters; you can also do it via JMX.
Alternative solutions are possible but will generally require maintaining some shared state across a cluster (perhaps using zookeeper etc), so are much more complex; and you still have to deal with race conditions (where the second update might arrive at some consumer before the first).
You can use the default mechanism for consistency checks. Basically you want to verify that you have the latest version of whatever you are updating.
For that you need to fetch the _version with the object. In queries you can do this by setting version=true at the top level; the _version will then be returned along with your query results. When doing an update, you simply set the version parameter in the URL to the value you have, and ES will generate a version conflict if it doesn't match.
Nicer is to handle updates using closures. Basically this works as follows: have an update method that fetches the object by id, applies a closure (a parameter of the update function) that encapsulates the modifications you want to make, and then stores the modified object. If you trap the still-possible version conflict, you can simply get the object again and re-apply the closure. We do this, and we added a random sleep before the retry as well; this vastly reduces the chance of multiple updates failing and is a nice design pattern. Keeping the read and the write close together minimizes the chance of a conflict, and retrying with a sleep before it reduces it further; you could add multiple retries to reduce the risk even more.
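A sketch of that retry-with-closure pattern (the EsClient interface is a hypothetical stand-in for whatever client you use; what matters is the fetch/modify/store-with-expected-version loop and the randomized sleep):

import java.util.concurrent.ThreadLocalRandom;
import java.util.function.UnaryOperator;

public class VersionedUpdater {

    // hypothetical minimal client surface: fetch a doc with its _version,
    // and index it back conditioned on that version
    interface EsClient {
        Versioned get(String id);
        void index(String id, String source, long expectedVersion) throws VersionConflictException;
    }

    record Versioned(String source, long version) {}

    static class VersionConflictException extends Exception {}

    // fetch, apply the modification closure, store; on conflict re-fetch and retry
    static void update(EsClient client, String id, UnaryOperator<String> modify, int maxRetries)
            throws VersionConflictException, InterruptedException {
        for (int attempt = 0; ; attempt++) {
            Versioned current = client.get(id);
            try {
                client.index(id, modify.apply(current.source()), current.version());
                return; // stored against the version we read, so no lost update
            } catch (VersionConflictException e) {
                if (attempt >= maxRetries) throw e;
                // random sleep de-synchronizes competing updaters before the retry
                Thread.sleep(ThreadLocalRandom.current().nextLong(50, 250));
            }
        }
    }
}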

why do we use tibco mapper activity?

The TIBCO documentation says:
The Mapper activity adds a new process variable to the process definition. This variable can be a simple datatype, a TIBCO ActiveEnterprise schema, an XML schema, or a complex structure.
So my question is: does the Mapper activity really do only this simple thing? We can also create process variables in the process definition (by right-clicking on the process definition). I looked for this on Google, but nobody clearly explains why to use this activity; I also tried YouTube, where there is only one video, and it does not explain it clearly either. I am looking for an example of how it is used in large organizations, ideally a real-world example. Thanks in advance.
The term "process variable" is a bit overloaded I guess:
The process variables that you define in the Process properties are stateful. You can use (read) their values anywhere in the process and you can change their values during the process using the Assign task (yellow diamond with a black equals sign).
The Mapper activity produces a new output variable of that task which you can only use (read) in activities downstream from it. You cannot change its value after the Mapper activity, just as for any other activity's output.
The mapper activity is mainly useful to perform complex and reusable data mappings in it rather than in the mappers of other activities. For example, you have a process that has to map its input data into a different data structure and then has to both send this via a JMS message and log it to a file. The mapper allows you to perform the mapping only once rather than doing it twice (both in the Send JMS and Write to File activity).
You'll find that in real world projects, the mapper activity is quite often used to perform data mapping independently of other activities, it just gives a nicer structure to the processes. In contrast the Process Variables defined in the Process properties together with the Assign task are used much less frequently.
Here's a very simple example, where you use the mapper activity once to set a process variable (here the filename) and then use it in two different following activities (create CSV File and Write File). Obviously, the mapper activity becomes more interesting if the mapping is not as trivial as here (though even in this simple example, you only have one place to change how the filename is generated rather than two):
[Screenshot: Mapper Activity]
[Screenshot: first use of the filename variable in Create File]
[Screenshot: second use of the filename variable in Write File]
Process Variable/Assign Activity Vs Mapper Activity
The primary purpose of an Assign task is to store a variable at the process level. Any variable in an Assign task can be modified N times in a process. A Mapper, by contrast, is specifically used for introducing a new variable; you cannot change the same Mapper variable multiple times in a project.
Memory is allocated to a process variable when the process instance is created, whereas for a TIBCO Mapper the memory is allocated only when the Mapper activity is executed in the process instance.
A process variable is allocated a single slot of memory which is used to update/modify the schema throughout the process instance execution, i.e. N Assign activities will access the same memory allocated to the variable, whereas N Mappers for the same schema will create N separate allocations.
An Assign activity can also be used to accumulate the output of a TIBCO activity inside a group.

Will MQSC define queue command ever delete or corrupt messages?

In WebSphere MQ 6, I want to script the creation of new queues. However the queues may already exist, and I need the script to be idempotent.
I can create queues using the commands documented here. For example:
DEFINE QREMOTE(%s) RNAME(%s) RQMNAME(%s) XMITQ(%s) DEFPSIST(YES) REPLACE
or
DEFINE QLOCAL(%s) DESCR(%s) DEFPSIST(YES) REPLACE
The REPLACE keyword ensures that creation does not fail if the queue already exists.
I've tested this with an existing, non-empty queue and it seems that no messages were lost. However this is not proof enough. I need to be certain that no messages will ever be lost or corrupted if I run a DEFINE Q... REPLACE command against an existing queue. The existing queue might even be participating in transactions at the time.
Can anyone confirm or deny this behaviour?
A DEFINE command with REPLACE fails if the object is open. Therefore you cannot redefine a queue with pending transactions. The manual states that all messages in the queue are retained during a DEFINE with REPLACE, and this implies no loss of message integrity. You can ALTER a queue with FORCE option to change a queue that is currently open as described here. That too retains messages in the queue without loss of integrity.
The DEFINE command will not affect the messages in a queue. The only effects you might notice are, for example, if you change the queue from FIFO to PRIORITY or vice versa. This only changes the indexing and ordering for new messages in the queue and does not affect existing messages. Similarly, changing attributes of the queue that affect handles only take effect the next time the queue is opened. An example of that is changing BIND(ONOPEN) to BIND(NOTFIXED).
One of the things that I have been recommending for a while for WMQ clusters is to split the queue definition up into build-time and run-time attributes. For example:
DEFINE QLOCAL (APP.FUNCTION.SUBFUNCTION.QA) +
GET(DISABLED) +
PUT(DISABLED) +
NOTRIGGER +
NOREPLACE
ALTER QLOCAL (APP.FUNCTION.SUBFUNCTION.QA) +
DESCR('APP service queue for QA') +
DEFPSIST(NO) +
BOTHRESH(5) +
BOQNAME('APP.FUNCTION.BACKOUT.QA') +
CLUSTER('DIV_QA') +
CLUSNL(' ') +
DEFBIND(NOTFIXED)
In this case the GET, PUT and TRIGGER attributes are considered run-time and are only set when the queue is first defined. This allows you to define a new queue in the cluster and have it be disabled until you are ready to turn on the app. In subsequent runs of the script, these attributes are never changed because the statement uses NOREPLACE. So once you enable GET and PUT on the queue these attributes (and the function of the app) are never disturbed by subsequent script runs.
The ALTER then handles all the attributes that are considered build-time. For example, if you change the description, you want it picked up in the next script run. Because we defined the queue in the previous step (or that step failed because the queue exists), we know the ALTER will work.
Whether any attribute such as the cluster membership is build-time or run-time is up to you to decide. This is just an example born from many cases where administrators inadvertently broke something by re-running the MQSC script.
But to answer your question a bit more on point: the things that break do so because someone reset a run-time attribute such as GET(DISABLED) (which can cause an in-flight transaction to be backed out if the app tries to perform a GET on that queue after gets are disabled), not because the change caused an integrity failure of the queue, a message, or a transaction.
