Selective dequeue of unrelated messages in Oracle Advanced Queueing

This question refers to the dequeueing of messages in Oracle Streams Advanced Queueing.
I need to ensure that the messages which are related to each other are processed sequentially.
For example, assume the queue is seeded with the four messages that have a business-related field called transaction reference (txn_ref) and two of the messages (1,3) belong to the same transaction (000001):
id | txn_ref
---+---------
 1 | 000001
 2 | 000002
 3 | 000001
 4 | 000003
Assume also that I am running 4 threads/processes that wish to dequeue from this queue. The following should occur:
thread 1 dequeues message #1
thread 2 dequeues message #2
thread 3 dequeues message #4 (because message #3 is related to #1 and #1 has not yet completed).
thread 4 blocks waiting for a message
thread 1 commits its work for message #1
thread 4 (or perhaps thread 1) dequeues message #3.
My initial thought was that I could achieve this with a dequeue condition that selects a message only if its ENQ_TIME (enqueue time) is no later than the ENQ_TIME of any other message with the same TXN_REF. But my problem is how to reference the TXN_REF of a message that I have not yet selected, in order to select it, e.g.
// Java API
String condition = "ENQ_TIME = (select min(ENQ_TIME) from AQ_TABLE1 where ??";
dequeueOption.setCondition(condition);
Is it possible to achieve what I want here?

To answer your direct question, this can be achieved using the correlation field (called CORRID in the table), which is designed for this purpose.
So, on the enqueue, you'd use the AQMessageProperties.setCorrelation() method with the TXN_REF value as the parameter. Then, in your condition you would do something like this:
// Java API
String condition = "tab.ENQ_TIME = (select min(self.ENQ_TIME) from AQ_TABLE1 self where tab.CORRID = self.CORRID)";
dequeueOption.setCondition(condition);
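To see what that condition actually selects, here is a plain-Java sketch (no AQ involved; the Msg record is made up, and the values are just the four-message example from the question) of the rule "a message is eligible only if it has the minimum ENQ_TIME among all messages with the same CORRID":

```java
import java.util.*;
import java.util.stream.*;

// Illustrates the semantics of the dequeue condition above: a message is
// dequeuable only if no earlier-enqueued message shares its correlation
// id (CORRID = TXN_REF). Not AQ code, just the selection rule.
public class EligibleMessages {
    record Msg(int id, String corrid, long enqTime) {}

    static List<Integer> eligible(List<Msg> queue) {
        // minimum ENQ_TIME per CORRID
        Map<String, Long> minEnq = new HashMap<>();
        for (Msg m : queue)
            minEnq.merge(m.corrid(), m.enqTime(), Math::min);
        // a message is eligible iff it carries that minimum for its CORRID
        return queue.stream()
                .filter(m -> m.enqTime() == minEnq.get(m.corrid()))
                .map(Msg::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Msg> queue = List.of(
            new Msg(1, "000001", 1),
            new Msg(2, "000002", 2),
            new Msg(3, "000001", 3),
            new Msg(4, "000003", 4));
        System.out.println(eligible(queue)); // prints [1, 2, 4]
    }
}
```

Message #3 stays invisible until #1 is dequeued and committed (removing it from the table), exactly the blocking behavior the question describes.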

A strategy you can try, if possible, is Message Groups. The Oracle documentation describes it briefly, but I found this Toad World article far more useful. Basically, you set up the queue table to treat all messages committed at the same time as one "group". When dequeuing, only one consumer at a time can dequeue from a "group" of messages.

Related

Consumer assignment with multiple topics with Kafka Streams

Apologies if this has already been covered here; I couldn't find anything closely related. I have a Kafka Streams app which reads from multiple topics, persists the records to a DB, and then publishes an event to an output topic. Pretty straightforward; it's stateless in terms of Kafka local stores. (Topology below.)
Topic1 (T1) has 5 partitions, Topic2 (T2) has a single partition. The issue: while consuming from two topics, if I want to go "full speed" with T1 (5 consumers), there is no guarantee that I will have a dedicated consumer for each partition of T1. The consumers will be distributed across the two topics' partitions, and I might end up with unbalanced (and idle) consumers, something like below:
[c1: t1p1, t1p3], [c2: t1p2, t1p5], [c3: t1p4, t2p1], [c4: (idle consumer)], [c5: (idle consumer)]
[c1: t1p1, t1p2], [c2: t1p5], [c3: t1p4, t2p1], [c4: (idle consumer)], [c5: t1p3]
With that said:
Is it a good practice having a topology that reads from multiple topics within the same KafkaStreams instance?
Is there any way to achieve a partition assignment like the following if I want go "full speed" for T1? [c1: t1p1, t2p1], [c2: t1p2], [c3: t1p3], [c4: t1p4], [c5: t1p5]
Which of the topologies below is most optimal to what I want to achieve? Or is it completely unrelated?
Option A (Current topology)
Topologies:
   Sub-topology: 0
    Source: topic1-source (topics: [TOPIC1])
      --> topic1-processor
    Processor: topic1-processor (stores: [])
      --> topic1-sink
      <-- topic1-source
    Sink: topic1-sink (topic: OUTPUT-TOPIC)
      <-- topic1-processor

   Sub-topology: 1
    Source: topic2-source (topics: [TOPIC2])
      --> topic2-processor
    Processor: topic2-processor (stores: [])
      --> topic2-sink
      <-- topic2-source
    Sink: topic2-sink (topic: OUTPUT-TOPIC)
      <-- topic2-processor
Option B:
Topologies:
   Sub-topology: 0
    Source: topic1-source (topics: [TOPIC1])
      --> topic1-processor
    Source: topic2-source (topics: [TOPIC2])
      --> topic2-processor
    Processor: topic1-processor (stores: [])
      --> response-sink
      <-- topic1-source
    Processor: topic2-processor (stores: [])
      --> response-sink
      <-- topic2-source
    Sink: response-sink (topic: OUTPUT-TOPIC)
      <-- topic2-processor, topic1-processor
If I use two streams apps, one per topic, instead of a single streams app with multiple topics, would that work for what I am trying to achieve?
config1.put("application.id", "app1");
KafkaStreams stream1 = new KafkaStreams(config1, topologyTopic1);
stream1.start();
config2.put("application.id", "app2");
KafkaStreams stream2 = new KafkaStreams(config2, topologyTopic2);
stream2.start();
The initial assignments you describe would never happen with Kafka Streams (and also not with any default consumer configuration). If there are 5 partitions and you have 5 consumers, each consumer would get 1 partition assigned (with a plain consumer and a custom PartitionAssignor you could do the assignment differently, but all default implementations ensure proper load balancing).
Is it a good practice having a topology that reads from multiple topics within the same KafkaStreams instance?
There is no issue with that.
Is there any way to achieve a partition assignment like the following if I want go "full speed" for T1? [c1: t1p1, t2p1], [c2: t1p2], [c3: t1p3], [c4: t1p4], [c5: t1p5]
Depending on how you write your topology, this would be the assignment Kafka Streams uses out of the box. Of your two options, Option B would result in this assignment.
Which of the topologies below is most optimal to what I want to achieve? Or is it completely unrelated?
As mentioned above, Option B would result in this assignment. For Option A, you could actually even use a 6th instance and each instance would process exactly one partition (because there are two sub-topologies, you get 6 tasks: 5 for sub-topology 0 and 1 for sub-topology 1; sub-topologies are scaled out independently of each other). For Option B, you only get 5 tasks, because there is only one sub-topology, and thus the maximum partition count over both input topics (that is, 5) determines the number of tasks.
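The task-count arithmetic can be sketched in plain Java (the partition counts 5 and 1 come from the question; the helper is purely illustrative):

```java
import java.util.*;

// Kafka Streams creates one task per input partition, per sub-topology:
// tasks(sub-topology) = max(partition counts of its input topics),
// and the totals are summed across sub-topologies.
public class TaskCount {
    // Each int[] holds the partition counts of one sub-topology's input topics.
    static int tasks(List<int[]> subTopologies) {
        int total = 0;
        for (int[] topicPartitions : subTopologies)
            total += Arrays.stream(topicPartitions).max().orElse(0);
        return total;
    }

    public static void main(String[] args) {
        // Option A: two sub-topologies, one per topic (5 and 1 partitions)
        int optionA = tasks(List.of(new int[] {5}, new int[] {1}));
        // Option B: one sub-topology reading both topics
        int optionB = tasks(List.of(new int[] {5, 1}));
        System.out.println(optionA + " " + optionB); // prints "6 5"
    }
}
```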
If I use two streams apps, one per topic, instead of a single streams app with multiple topics, would that work for what I am trying to achieve?
Yes, it would be basically the same as Option A; however, you get two consumer groups and thus "two applications" instead of one.

KafkaConsumer poll() behavior understanding

Trying to understand (new to Kafka) how the poll event loop in Kafka works.
Use case: 25 records on the topic, max poll size is set to 5.
max.poll.interval.ms = 5000 // 5 seconds
max.poll.records = 5
Sequence of tasks:
1) Poll the records from the topic.
2) Process the records in a for loop.
3) Some processing logic decides whether each record passes or fails.
4) If the logic passes, the record's offset is added to a map.
5) The offsets are then committed using a commitSync call.
6) If a record fails, the loop breaks and whatever succeeded before it is committed. The problem starts after this.
The next poll just keeps moving in batches of 5 even after the error; is that expected?
What we basically expect is that the loop breaks, the offsets up to the last successfully processed message get committed, and the next poll then continues from the failed message.
Example: in the 1st batch of 5 polled messages, offsets 1 and 2 succeed and are committed, then the 3rd fails; yet the poll call keeps moving to the next batches (5-10, 10-15). If there is an error in between, we expect polling to stop at that point: the next poll should start from offset 3 in the first case, or, if the 2nd batch fails at offset 8, from offset 8, not from the start of the next max-poll batch. (If it matters: this is a Spring Boot project and enable.auto.commit is false.)
I have tried finding this in documentation but no help.
I tried tweaking max.poll.interval.ms, but no help.
EDIT: Not accepting the answer because there is no direct solution for a custom consumer. Keeping this for informational purposes.
max.poll.interval.ms is in milliseconds, not seconds, so it should be 5000.
Once the records have been returned by the poll (and offsets not committed), they won't be returned again unless you restart the consumer or perform seek() operations on the consumer to reset the offset to the unprocessed ones.
The Spring for Apache Kafka project provides a SeekToCurrentErrorHandler to perform this task for you.
If you are using the consumer yourself (which it sounds like), you must do the seeks.
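A minimal sketch of the required bookkeeping, in plain Java with no broker (the processing predicate is hypothetical; with a real KafkaConsumer you would call commitSync() and seek() where the comments indicate):

```java
import java.util.function.LongPredicate;

// Simulates the described loop for one polled batch: process records in
// order, stop at the first failure, commit the successes, and remember
// the failed offset to seek to before the next poll.
public class BatchProcessor {
    // Returns the offset to seek to (which equals the offset to commit,
    // since a committed offset means "next offset to read"), or -1 if
    // the whole batch succeeded.
    static long processBatch(long[] offsets, LongPredicate process) {
        for (long offset : offsets) {
            if (!process.test(offset)) {
                // With a real consumer: commitSync the position 'offset'
                // (successes up to here are covered), then
                // consumer.seek(partition, offset) so the next poll
                // retries the failed record instead of moving on.
                return offset;
            }
            // success: commit position advances to offset + 1
        }
        return -1; // full batch processed; commit last offset + 1
    }

    public static void main(String[] args) {
        // Offsets 0..4 polled; the record at offset 2 fails:
        long seekTo = processBatch(new long[] {0, 1, 2, 3, 4}, o -> o != 2);
        System.out.println(seekTo); // prints 2
    }
}
```

Without the seek, the consumer's in-memory position has already advanced past the whole batch, which is exactly why the next poll "keeps moving" in the question.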
You can manually seek to the beginning offset of the poll for all the assigned partitions on failure. I am not sure how to do it with the Spring consumer.
Sample code for seeking offset to beginning for normal consumer.
In the code below I am getting the records list per partition and then getting the offset of the first record to seek to.
// requires: import scala.jdk.CollectionConverters._
def seekBack(records: ConsumerRecords[String, String]): Unit = {
  records.partitions().asScala.foreach { partition =>
    // seek back to the offset of the first record returned for this partition
    val partitionedRecords = records.records(partition)
    val offset = partitionedRecords.get(0).offset()
    consumer.seek(partition, offset)
  }
}
One problem: doing this unconditionally in production is bad, since you only want to seek back on transient errors; otherwise you will end up retrying infinitely.

Overwriting Messages with Same Correlation ID and Sequence Number in Spring

Based on my own experiments (I can't find this documented anywhere), if 2 messages have the same correlation ID and sequence number, the aggregator will only take the 1st message and discard/ignore the other.
Is there a way to make aggregator use the last message received instead?
The aggregation will merge the payload into 1 string.
Simple scenario:
3 messages with same correlation ID and sequence size of 2, ordered by time received
sequence#: 1; payload: abc
sequence#: 1; payload: def
sequence#: 2; payload: ghi
Current output: abcghi
Expected output: defghi
This scenario happens when sequence# 2, which was meant for the 1st message, is missing. And the correlation ID (obtained from the decoded payload) is very limited, hence it will be reused multiple times.
Original message
The raw messages came in this format:
"Sequence Size","Sequence Number","ID","Text"
ID ranges between 0-9
example message: 2,1,8,abc
Sample raw message payload:
2,1,8,abc
2,1,8,def
2,2,8,ghi
The aggregator basically combines the text.
You need to use a custom release strategy (it can have the same logic as the default SequenceSizeReleaseStrategy, but it can't be that class). With the default strategy duplicate sequences are skipped.
However, you will also need a custom message group store to enact your desired behavior; otherwise the output will be abcdefghi.
However, the discarded message will not be sent to the discard channel in that case.
It's generally not a good idea to reuse correlation id; if you must, then use the group-timeout or a reaper to discard the partial group before any possibility of reuse of the correlation id.
BTW, you might find it easier to use a custom CorrelationStrategy and ReleaseStrategy rather than using the default and manipulating the headers.
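Outside of Spring, the desired "last message wins" merge can be sketched as plain Java (the class and method names are made up; in a real solution this logic would live in the custom ReleaseStrategy/group store described above):

```java
import java.util.*;

// Sketch of the desired aggregation: for duplicate sequence numbers keep
// the most recently received payload, then release the group by
// concatenating payloads in sequence order.
public class LastWinsAggregator {
    static String aggregate(int[] seqs, String[] payloads) {
        // TreeMap iterates in sequence order; put() overwrites duplicates,
        // so the last received payload per sequence number wins.
        TreeMap<Integer, String> group = new TreeMap<>();
        for (int i = 0; i < seqs.length; i++)
            group.put(seqs[i], payloads[i]);
        return String.join("", group.values());
    }

    public static void main(String[] args) {
        // The three messages from the question, in arrival order:
        String out = aggregate(new int[] {1, 1, 2},
                               new String[] {"abc", "def", "ghi"});
        System.out.println(out); // prints "defghi"
    }
}
```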

How to keep track of messages exchanged between a server and clients?

My app sends a notification to the PC when a new text message is received on the phone. I am doing that over Bluetooth, if it matters.
(This is relevant to PC side)
What I am struggling with is keeping track of messages for each contact. I am thinking of having a linked list that grows as new contacts come in; each node will represent a contact.
There will be another list, growing vertically, holding the messages for that contact.
Here is a diagram to make it clear:
=========================
| contact 1 | contact 2 | ...
=========================
     ||           ||
 =========    =========
 | msg 0 |    | msg 0 |
 =========    =========
     ||           ||
 =========    =========
 | msg 1 |    | msg 1 |
 =========    =========
     .            .
     .            .
     .            .
This will handle the messages received but how do I keep track of the responses sent? Do I tag the messages as TAG_MSG_SENT, TAG_MSG_RECEIVED etc?
I have not written code for this part as I want to do the design first.
Why does it matter?
Well, when the user clicks on a contact in the list, I want to be able to display the session like this in a new window:
==============================
|          contact 1         |
==============================
| Received 0                 |
|                     Sent 0 |
|                     Sent 1 |
| Received 1                 |
==============================
I am using C/C++ on windows.
A simple approach would be to use the existing file system to store messages, as follows:
Maintain a received file and a sent file for each contact in a specific folder; name them contact-rec-file and contact-sent-file.
Every time you receive or send a message, append it to the corresponding sent or received file:
first write the size of the message in bytes to the end of the file,
then write the content of the message.
Whenever you need to display messages, open the file,
read the size of each message, then read its contents using that size.
Note: using main memory to store messages is pretty inefficient, since memory use grows with every message sent or received.
Optimization: use another file to store the number of messages and their seek positions in the sent or received files, so that you can read that file at load time and then seek directly to the correct position if you want to read only a particular message.
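The length-prefixed record format described above can be sketched in Java (Java is used only to illustrate the format, since the asker is on C/C++; in-memory streams stand in for the per-contact files, and all names are made up):

```java
import java.io.*;
import java.util.*;

// Each message is stored as [4-byte length][bytes], so the file can be
// read back message-by-message without any delimiter escaping.
public class MessageFile {
    static void append(DataOutputStream out, String msg) throws IOException {
        byte[] bytes = msg.getBytes("UTF-8");
        out.writeInt(bytes.length); // size first...
        out.write(bytes);           // ...then the content
    }

    static List<String> readAll(DataInputStream in) throws IOException {
        List<String> msgs = new ArrayList<>();
        while (in.available() > 0) {
            byte[] bytes = new byte[in.readInt()]; // read the size...
            in.readFully(bytes);                   // ...then that many bytes
            msgs.add(new String(bytes, "UTF-8"));
        }
        return msgs;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        append(out, "hello");
        append(out, "how are you?");
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(readAll(in)); // prints [hello, how are you?]
    }
}
```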
It depends on what you want to keep track of. If you just want statistics on the sent and received messages, then two counters for each contact will do. If you just want the messages sent and received by the client, not caring about how they are interleaved, then 2 lists for each client will do. If you also need to know how they are interleaved, then, as you suggested, a single list with an additional flag indicating whether each message was sent or received will work. There are definitely other possibilities; these are just to get you started.
Ok, if order matters, then here are 2 more ways that I can think of off the top of my head:
1) In the linked list, instead of having a flag indicating the status, have 3 next pointers: one for the next message, one for the next sent message, and one for the next received message. The next-message pointer will have the same value as one of the others, but it is what tells you how the messages are interleaved. So now you can easily walk the list of sent messages, received messages, both, or some other combination.
2) Have only 1 linked list/array/table, where each entry includes the contact info and the SENT/RECEIVED flag. This is not good if there is a lot of other contact info you wish to keep, since it would now be replicated; but for simplicity, it is only 1 list instead of a list of lists. To remedy the replication, you could create a separate list with just the contact info and put a reference to it in the messages list. You could also add a contacts_next_message pointer to the message entries, so you can walk that chain and get all of one contact's messages.
And so on, there's lots of ways you can do this.
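As one concrete illustration of the single-list-with-flag idea (Java used only for illustration, since the design is language-independent; all names are made up):

```java
import java.util.*;

// Each contact owns a single chronological list of messages, and each
// message is tagged SENT or RECEIVED, so rendering the session preserves
// the interleaving order without separate sent/received lists.
public class SessionLog {
    enum Direction { SENT, RECEIVED }
    record Message(Direction dir, String text) {}

    // Produces one display line per message, in arrival order.
    static List<String> render(List<Message> session) {
        List<String> lines = new ArrayList<>();
        for (Message m : session)
            lines.add(m.dir() + ": " + m.text());
        return lines;
    }

    public static void main(String[] args) {
        List<Message> contact1 = List.of(
            new Message(Direction.RECEIVED, "Received 0"),
            new Message(Direction.SENT, "Sent 0"),
            new Message(Direction.SENT, "Sent 1"),
            new Message(Direction.RECEIVED, "Received 1"));
        render(contact1).forEach(System.out::println);
    }
}
```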

Check if a Win32 thread is running or in a suspended state

How do I check to see if a Win32 thread is running or in suspended state?
I can't find any Win32 API which gives the state of a thread. So how do I get the thread state?
I think, originally, this information was not provided because any API that provided it would be misleading and useless.
Consider two possible cases. In the first, the current thread has suspended the thread of interest. Code in the current thread knows about the suspended state and should be able to share it, so there's no need for the kernel team to add an API.
In the 2nd case, some other, 3rd thread in the system has suspended the thread of interest (and there's no way to track which thread that was). Now you have a race condition: that other thread could, at any time, unsuspend the thread of interest, and the information gleaned from the API is useless. You have a value indicating the thread is suspended when it is, in fact, not.
Moral of the story: if you want to know that a thread is suspended, suspend it. The return value of SuspendThread is the previous suspend count of the thread. And now you DO know something useful: either the thread WAS and STILL IS suspended, or it WASN'T (but now is). Either way, the thread's state is now deterministically known, so you can in theory make some intelligent choices based on it, such as whether to ResumeThread or keep it suspended.
You can get this information by calling NtQuerySystemInformation() with the value for SystemProcessesAndThreadsInformation (integer value 5).
If you want an example of what you can do with this information take a look at Thread Status Monitor.
WMI's Win32_Thread class has a ThreadState property, where 5 means "Suspended Blocked" and 6 means "Suspended Ready".
You will need the Thread's Id to get the right instance directly (the WMI object's Handle property is the thread id).
EDIT: Given this PowerShell query:
gwmi win32_thread | group ThreadState
gives
Count Name Group
----- ---- -----
    6 2    {, , , ...}
  966 5    {, , , ...}
WMI has a different definition of "Suspended" to Win32.
In Windows 7, you can use QueryUmsThreadInformation. (UMS stands for User mode scheduling).
See here for UmsThreadIsSuspended.
You could get the thread's suspend count with code like this:
DWORD GetThreadSuspendCount(HANDLE hThread) {
    // SuspendThread returns the previous suspend count, or (DWORD)-1 on failure
    DWORD dwSuspendCount = SuspendThread(hThread);
    ResumeThread(hThread);
    return dwSuspendCount;
}
but, as already said - it is not accurate.
Moreover, suspending a thread is evil.
YES: it IS possible to get the thread state and determine if it is suspended.
And NO: You don't need Windows 7 to do that.
I published my working class here on Stackoverflow: How to get thread state (e.g. suspended), memory + CPU usage, start time, priority, etc
This class requires Windows 2000 or higher.
I think the state here refers to whether the thread is in its thread proc doing some processing, or waiting for an event.
This can be taken care of by using a variable that records whether a thread is actually running or waiting for an event to happen.
These scenarios appear when considering thread pools: with some n threads, tasks can be assigned to idle threads based on each thread's running status.
