Oracle AQ doesn't call notification callback procedure

Environment:
Oracle 12.2 on 64-bit Linux
job_queue_processes = 4000
aq_tm_processes = 1
I wrote an asynchronous notification callback PL/SQL procedure for my Advanced Queuing test program and successfully registered it via dbms_aq.register.
The procedure simply dequeues messages, with no additional actions.
I added logging to the callback procedure: it writes the message id of each dequeued message to a log table, along with the error text if the procedure fails.
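For reference, a registration along these lines can be issued from any session. Below is a minimal sketch wrapped in JDBC; the connection details and the queue, subscriber, and callback names are hypothetical stand-ins for your own. The callback procedure itself must have the standard PL/SQL notification signature (context, reginfo, descr, payload, payloadl).

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class RegisterAqCallback {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details and object names, for illustration only.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/orclpdb", "testuser", "secret");
                 CallableStatement cs = con.prepareCall(
                     "BEGIN "
                     + "DBMS_AQ.REGISTER("
                     + "  SYS.AQ$_REG_INFO_LIST("
                     + "    SYS.AQ$_REG_INFO("
                     + "      'TESTUSER.MY_QUEUE:MY_SUBSCRIBER', "  // queue:consumer
                     + "      DBMS_AQ.NAMESPACE_AQ, "               // namespace for AQ notifications
                     + "      'plsql://TESTUSER.MY_CALLBACK', "     // the callback procedure
                     + "      HEXTORAW('FF'))), "                   // user context passed to the callback
                     + "  1); "                                     // number of registrations in the list
                     + "END;")) {
                cs.execute();
            }
        }
    }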
And sometimes (not always) I can see in the queue table that messages remain unprocessed, yet there are no errors and no message ids in the log. Just nothing.
I'm not sure this is caused by high database load; the load is always moderate.
My guess is that the Oracle server process simply loses the callback registration.
However, the registration is always present in dba_subscr_registrations.
I want to know which server process I should trace to find records of the missing registration: the QMNC master process, a QMNC slave, or EMON.
Or is the trouble much deeper than I can imagine?
Any help would be much appreciated.
Andrew.

Related

Atomically update database and send message. Outbox pattern or not?

You have a command/operation where you need to both save something in the database and send an event/message to another system. For example, you have an OrderService, and when a new order is created you want to publish an "OrderCreated" event for another system (or systems) to react to (either as a direct message or via a message broker) and do something.
The easiest (and naive) implementation is to save to the database and, if that succeeds, send the message. But of course this is not bulletproof, because the other service or the message broker may be down, or your service may crash before sending the message.
One (and common?) solution is to implement the "outbox pattern": instead of publishing messages directly, you save the message to an outbox table in your local database as part of your database transaction (in this example, writing to the outbox table as well as the order table), and have a different process (polling the db or using change data capture) read the outbox table and publish the messages.
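For concreteness, here is a minimal sketch of the pattern in plain Java/JDBC; the table and column names are invented, and the broker client is a stand-in interface:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class OutboxSketch {

        // Writer side: the order row and the outbox row commit (or roll back) together.
        static void createOrder(Connection con, String orderId, String payload) throws Exception {
            con.setAutoCommit(false);
            try (PreparedStatement order = con.prepareStatement(
                     "INSERT INTO orders (id, payload) VALUES (?, ?)");
                 PreparedStatement outbox = con.prepareStatement(
                     "INSERT INTO outbox (id, event_type, payload, sent) "
                     + "VALUES (?, 'OrderCreated', ?, 0)")) {
                order.setString(1, orderId);
                order.setString(2, payload);
                order.executeUpdate();
                outbox.setString(1, orderId);
                outbox.setString(2, payload);
                outbox.executeUpdate();
                con.commit();                 // atomic: both rows or neither
            } catch (Exception e) {
                con.rollback();
                throw e;
            }
        }

        // Relay side: poll unsent rows, publish, then mark them as sent.
        // Assumes a single relay instance; with several, add row locking.
        static void relay(Connection con, MessagePublisher publisher) throws Exception {
            con.setAutoCommit(false);
            try (PreparedStatement select = con.prepareStatement(
                     "SELECT id, event_type, payload FROM outbox WHERE sent = 0 ORDER BY id");
                 ResultSet rs = select.executeQuery();
                 PreparedStatement mark = con.prepareStatement(
                     "UPDATE outbox SET sent = 1 WHERE id = ?")) {
                while (rs.next()) {
                    publisher.publish(rs.getString("event_type"), rs.getString("payload"));
                    mark.setString(1, rs.getString("id"));
                    mark.executeUpdate();
                }
                con.commit();
            }
        }

        interface MessagePublisher {          // stand-in for your broker client
            void publish(String eventType, String payload);
        }
    }

Note this gives at-least-once delivery: if the relay crashes between publishing and committing, the same message is published again, so consumers must de-duplicate (e.g. by event id).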
What is your solution to this dilemma, i.e. "update the database and send the message, or do neither"? Note: I am not talking about using sagas (this could be part of a saga, though, but that is the next level).
I have used different approaches in the past:
"Do nothing", i.e. just try to send the message and hope it is sent. This might be fine in some cases, especially with a stable message broker running on the same machine.
Using DTC (in my case MSDTC). Besides all the problems with DTC, it might not work with your current solution.
The outbox pattern (sketched above).
Using an orchestrator which retries the process if it has not received a "completed" event.
In my current project this is not handled well IMO, and I want to change it to be more resilient and self-correcting. Sometimes when a service calls another service and the call fails, the user might retry and it might work. But some failures might require our support team to fix them (if they are even discovered).
At the moment it is not a microservice solution but rather two large (legacy) monoliths communicating on the same server; we are moving to a microservice architecture in the near future and might run on multiple machines.

MQPUT is successful but message not available in remote queue

MQPUT returns a successful response (completion code 00) in an IBM z/OS IMS online service, but the message never gets inserted into the remote queue. The queue connection was successful as well.
The program is written in COBOL with an IMS interface, and the module is invoked through the WebSphere MQ IMS bridge interface.
By default for MQ on z/OS, MQPUTs are done under a local unit of work, i.e. MQPMO-SYNCPOINT is set in the put message options. Hence the code must issue an MQCMIT API call.
Alternatively, update the code to use MQPMO-NO-SYNCPOINT, and then the message will not be held waiting for an MQCMIT.
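For illustration, here is the same choice expressed in the MQ classes for Java (object names invented); in COBOL the equivalent is setting MQPMO-NO-SYNCPOINT in the MQPMO options field, or keeping the default and adding an MQCMIT call:

    import com.ibm.mq.MQMessage;
    import com.ibm.mq.MQPutMessageOptions;
    import com.ibm.mq.MQQueue;
    import com.ibm.mq.MQQueueManager;
    import com.ibm.mq.constants.CMQC;

    public class PutOutsideSyncpoint {
        public static void main(String[] args) throws Exception {
            MQQueueManager qmgr = new MQQueueManager("QM1");             // invented QMgr name
            MQQueue queue = qmgr.accessQueue("REMOTE.QUEUE", CMQC.MQOO_OUTPUT);

            MQMessage msg = new MQMessage();
            msg.writeString("hello");

            MQPutMessageOptions pmo = new MQPutMessageOptions();
            // Option 1: put outside syncpoint -- visible immediately, no commit needed.
            pmo.options = CMQC.MQPMO_NO_SYNCPOINT;
            queue.put(msg, pmo);

            // Option 2 (the z/OS default): put under syncpoint, then commit explicitly.
            // pmo.options = CMQC.MQPMO_SYNCPOINT;
            // queue.put(msg, pmo);
            // qmgr.commit();                                            // the MQCMIT equivalent

            queue.close();
            qmgr.disconnect();
        }
    }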
Thanks for your response. Actually, the program had a rollback on the logical unit of work for when it hit a failure situation, so in this case the program had put the message to MQ but the rollback happened at the end of processing ...

WebSphere MQ: Message keeps toggling between input queue and backout queue

The logic flow is like this:
A message is sent to an input queue.
A ProcessorMDB's onMessage() is invoked. Within this method, several operations/validations are done.
In case of a poison message (a message the application code cannot handle), a RuntimeException is thrown (see the sketch after this list).
This should roll back the transaction; we see evidence of this in the log file.
There is a backout threshold defined, with a backout queue name.
Once the threshold is reached, the message is sent to the backout queue.
But immediately it starts going back and forth between the input queue and the backout queue.
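For context, a minimal sketch of the MDB flow described above (the JNDI name and helper method are invented):

    import javax.ejb.MessageDriven;
    import javax.jms.Message;
    import javax.jms.MessageListener;

    @MessageDriven(mappedName = "jms/inputQueue")
    public class ProcessorMDB implements MessageListener {
        @Override
        public void onMessage(Message message) {
            try {
                process(message);             // several operations/validations
            } catch (Exception e) {
                // Rethrowing as a RuntimeException marks the container-managed
                // transaction for rollback; once the backout threshold is hit,
                // the message should be requeued to the backout queue.
                throw new RuntimeException("Poison message", e);
            }
        }

        private void process(Message message) throws Exception {
            // validations and business logic go here
        }
    }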
We are using the MQMON tool to observe this weird behavior. It continues almost forever, even after the app server (where the MDB is running) is shut down.
We are using WebLogic 10.3.1 and WebSphere MQ 6.0.2.
Any help will be much appreciated; it looks like we are running out of ideas.
This sounds like a syncpoint issue. If the QMgr were to issue a COMMIT when a message is requeued inside of a unit of work it would affect all messages under syncpoint inside of that thread. This would cause serious problems if an application had performed several PUT or GET calls prior to hitting the poison message. Rather than issue a COMMIT outside of the program's control, the QMgr just leaves the message on the backout queue inside the unit of work and waits for the program to issue the COMMIT. This can lead to some unexpected behavior such as what you are seeing where a message lands back on the input queue.
If another message is in the queue behind the "bad" one and it is processed successfully by the same thread, everything works out perfectly. The app issues a COMMIT on the new message and this also affects the poison message on the Backout Queue. However if the thread were to exit uncleanly (without an explicit disconnect or COMMIT) then the transaction is rolled back and the poison message is returned to the input queue.
The usual way of dealing with this is that the next good message (or batch of messages if transactions are batched) in the input queue will force the COMMIT. However, in some cases where the owning thread gets no new work (perhaps it was performing a GET by Correlation ID), there is nothing to push the bad message through. In these cases, it is important to make sure that the application issues a COMMIT before ending. One way to do this is to write the code to perform the GET by CORRELID with a wait interval. If the wait interval expires, the application gets a reason code of 2033 (MQRC_NO_MSG_AVAILABLE) and then issues a COMMIT before closing the thread. If the reply message is legitimately late for whatever reason, the COMMIT will have no effect. But if the message arrived and had been backed out and requeued, the COMMIT will cause it to stay in the Backout Queue.
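A minimal sketch of that pattern in the MQ classes for Java (names invented; the V6 client exposes the same constants on the MQC interface rather than com.ibm.mq.constants.CMQC):

    import com.ibm.mq.MQException;
    import com.ibm.mq.MQGetMessageOptions;
    import com.ibm.mq.MQMessage;
    import com.ibm.mq.MQQueue;
    import com.ibm.mq.MQQueueManager;
    import com.ibm.mq.constants.CMQC;

    public class GetByCorrelIdWithCommit {
        static void getReply(MQQueueManager qmgr, MQQueue replyQueue, byte[] correlId)
                throws MQException {
            MQGetMessageOptions gmo = new MQGetMessageOptions();
            gmo.options = CMQC.MQGMO_SYNCPOINT | CMQC.MQGMO_WAIT;
            gmo.waitInterval = 30000;                        // wait up to 30 seconds
            gmo.matchOptions = CMQC.MQMO_MATCH_CORREL_ID;

            MQMessage msg = new MQMessage();
            msg.correlationId = correlId;
            try {
                replyQueue.get(msg, gmo);
                // ... process the reply ...
                qmgr.commit();
            } catch (MQException e) {
                if (e.reasonCode == CMQC.MQRC_NO_MSG_AVAILABLE) {   // 2033: wait expired
                    // Commit anyway: harmless if the reply is merely late, but it
                    // pins any backed-out poison message onto the backout queue.
                    qmgr.commit();
                } else {
                    throw e;
                }
            }
        }
    }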
One way to see exactly what is going on is to run a trace against the queue in question. You can use the built-in trace function - strmqtrc - which has a few more options in V7 than does the V6 version. However if you want very fine grained control you can use the trace exit in SupportPac MA0W. With MA0W you can see exactly what API calls are made by the program and those made on its behalf.
[EDIT] Updating the response with some info from the PMR:
The following is from the WMQ V7 Infocenter:
MessageConsumers are single threaded below the Session level, and any requeuing of poison messages takes place within the current unit of work. This does not affect the operation of the application; however, when poison messages are requeued under a transacted or Client_acknowledge Session, the requeue action itself will not be committed until the current unit of work is committed by the application code or, if appropriate, the application container code.
Hence, if it is important for the customer to have poison messages committed immediately after they are backed out, it is recommended they either make use of the Application Server Facilities (ConnectionConsumer), which can commit the message immediately, or another mechanism to move poison messages from the queue.
Here is the link to this information in the V6 and V7 Information Centers. Since you are using the V6 client, you will want to refer to the V6 Infocenter. Note that with the V6 client, there is no mention in the Infocenter of ASF being able to commit the poison message immediately, even when using a ConnectionConsumer. The way I read it, this means you will probably need to upgrade to the V7 client to get the behavior you are looking for. It will be interesting to see if the PMR results in a similar recommendation.

How to check which point is cause of problem with MQ?

I use MQ to send/receive messages between my system and another system. Sometimes I find that there is no response message in my response queue, yet the other system has already put a response message into the response queue (confirmed from its log). So how can I check which point is the cause of the problem, i.e. how can I prove the message never arrived at my response queue?
In addition, when a message arrives at my queue it is written to a log file.
You can view this in real-time using the QStats interface. The MO71 SupportPac is a desktop client that you can configure to connect similar to WebSphere MQ Explorer. One of the options it has is queue statistics. Each time you view the queue stats, WMQ resets them to zero. So the procedure is this:
Start MO71 and browse the queues.
Filter on the one queue of interest.
View the queue stats a couple of times.
You will see them reset to zero.
Now run your test.
View the queue stats again.
If the remote program actually put a message, you will see that the queue now shows one or more messages PUT.
If your program successfully executed a GET of the message, you will see GET counts equal to the number of PUT counts.
If GET and PUT are both zero, the remote program never PUT the response message.
There are a few other approaches to this, but this is the easiest. The opposite end of the spectrum is SupportPac MA0W, which will show you every API call against that queue, or by PID, or whatever. It shows all the options, so if a program tries to open the queue with the wrong options (e.g. opening a remote queue for input) it shows that. But MA0W is installed as an exit and requires the QMgr to be bounced, so it's a little invasive.
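If you'd rather script this than click through MO71, the same counters can be read (and, as with MO71, reset) via the Reset Queue Statistics PCF command. A sketch using the MQ PCF classes; the host, port, channel, and queue name are invented:

    import com.ibm.mq.constants.CMQC;
    import com.ibm.mq.constants.CMQCFC;
    import com.ibm.mq.headers.pcf.PCFMessage;
    import com.ibm.mq.headers.pcf.PCFMessageAgent;

    public class QueueStats {
        public static void main(String[] args) throws Exception {
            // Invented connection details; the user needs authority to issue
            // the command, and note that reading the stats also resets them.
            PCFMessageAgent agent = new PCFMessageAgent("mqhost", 1414, "SYSTEM.DEF.SVRCONN");
            try {
                PCFMessage request = new PCFMessage(CMQCFC.MQCMD_RESET_Q_STATS);
                request.addParameter(CMQC.MQCA_Q_NAME, "MY.RESPONSE.QUEUE");
                PCFMessage[] responses = agent.send(request);
                System.out.println("PUTs since last reset: "
                        + responses[0].getIntParameterValue(CMQC.MQIA_MSG_ENQ_COUNT));
                System.out.println("GETs since last reset: "
                        + responses[0].getIntParameterValue(CMQC.MQIA_MSG_DEQ_COUNT));
            } finally {
                agent.disconnect();
            }
        }
    }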

What does it mean to "register system events... with Oracle Internet Directory"?

This is from the Oracle Streams AQ docs.
You can register system events, user events, and notifications on queues with Oracle Internet Directory. System events are database startup, database shutdown, and system error events. User events include user log on and user log off, DDL statements (create, drop, alter), and DML statement triggers. Notifications on queues include OCI notifications, PL/SQL notifications, and e-mail notifications.
Sounds interesting. What does this get me?
I mean, these things look like DDL triggers... So is it a matter of not building the DDL trigger in the database but instead building it in OID and letting OID manage the firing of the trigger?
Having never used it, this is my guess.
Imagine you have a hundred databases and you want to log every time someone logs into each one. You could do it on each individual server, but that would make answering questions like "Which databases did 'Mark' log in to?" difficult.
So instead, you have each database register its "user logon" events with OID (via AQ), and you then have a process receiving these events from OID and logging them. That gives you a single point where you can audit system-wide logins.
You can likely also use it to propagate messages from one AQ to another, and to look up which queues exist within the system that can be subscribed to.
