On message delivery guarantees between upstream/downstream module instances in Spring XD in case of container failure

For example, suppose a stream is defined as Source | Sink. If the container where the Sink is deployed fails, I assume the Spring XD runtime will detect the failure (via ZooKeeper) and redeploy the Sink instance. Will the message middleware buffer the messages sent by the Source while the Sink is down, and guarantee delivery to the Sink after it is redeployed? Is there any chance that data will be lost along the data path? Say the Sink fails after receiving a message but before handing it over to the external sink channel. After redeployment, that message will not be delivered to the Sink again, so it will never appear in the external destination.

Related

DLQ redrive failed events back to DynamoDB streams?

I have a DynamoDB stream triggering a Lambda, and I want to push any failed events to a DLQ.
If the source of a DLQ is an SQS queue, it looks like you can do something called a redrive, where messages in the DLQ are moved back to the source queue.
I am guessing that this isn't possible if the source is a DynamoDB stream?
AWS doesn't currently provide any mechanism to replay failed DynamoDB Streams records from a DLQ. The messages in the DLQ carry the metadata of the event rather than the actual failed records.
If you need to replay failed DynamoDB Streams records, it can be done with a two-step approach (see the sketch after the link below):
Get the shard iterator from the event metadata
Using the shard iterator, fetch the actual failed records from DynamoDB Streams and process them accordingly
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_GetShardIterator.html
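A rough sketch of that two-step replay, assuming the AWS SDK for Java v2 (the streamArn, shardId, and sequence number would be parsed from the DDBStreamBatchInfo block in the DLQ message; all concrete values below are placeholders):

```java
import software.amazon.awssdk.services.dynamodb.model.GetRecordsRequest;
import software.amazon.awssdk.services.dynamodb.model.GetShardIteratorRequest;
import software.amazon.awssdk.services.dynamodb.model.Record;
import software.amazon.awssdk.services.dynamodb.model.ShardIteratorType;
import software.amazon.awssdk.services.dynamodb.streams.DynamoDbStreamsClient;

public class StreamReplay {
    public static void main(String[] args) {
        // Values taken from the DLQ message's event metadata (placeholders here).
        String streamArn = "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/stream/2020-01-01T00:00:00.000";
        String shardId = "shardId-00000000000000000000-00000000";
        String startSequenceNumber = "000000000000000000000";

        try (DynamoDbStreamsClient streams = DynamoDbStreamsClient.create()) {
            // Step 1: get a shard iterator positioned at the failed batch.
            String iterator = streams.getShardIterator(GetShardIteratorRequest.builder()
                    .streamArn(streamArn)
                    .shardId(shardId)
                    .shardIteratorType(ShardIteratorType.AT_SEQUENCE_NUMBER)
                    .sequenceNumber(startSequenceNumber)
                    .build()).shardIterator();

            // Step 2: fetch the actual failed records and reprocess them.
            for (Record record : streams.getRecords(GetRecordsRequest.builder()
                    .shardIterator(iterator)
                    .build()).records()) {
                System.out.println(record.dynamodb()); // keys plus old/new images
            }
        }
    }
}
```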

NiFi from Hadoop to Kafka with exactly-once guarantee

Is it possible for NiFi to read from HDFS (or Hive) and publish data rows to Kafka with an exactly-once delivery guarantee?
Publishing to Kafka from NiFi is an at-least-once guarantee, because a failure could occur after Kafka has already received the message but before NiFi receives the response. That could be due to a network issue, or NiFi might crash and restart at that exact moment.
In any of those cases, the flow file would be put back in the original queue ahead of the PublishKafka processor (i.e. the session was never committed), and so it would be tried again.
Due to the threading model, where different threads may execute the processor, it can't be guaranteed that the thread that originally did the publishing will be the same thread that does the retry, and therefore NiFi can't make use of the "idempotent producer" concept.
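For context, this is roughly what the "idempotent producer" referred to above looks like when you control the producer yourself (a minimal sketch using the standard Kafka Java client; the topic name and bootstrap address are placeholders). With enable.idempotence=true, the broker deduplicates retries from the same producer session, which is exactly the per-producer state NiFi's threading model can't rely on:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Broker-side deduplication of retries from this producer instance.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // required with idempotence

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A retried send of this record will not create a duplicate,
            // as long as the retry comes from this same producer session.
            producer.send(new ProducerRecord<>("my-topic", "key", "value"));
        }
    }
}
```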

Camel JMS Queue Polling and data recovery

Hi I am new to Camel and have a design question related to JMS queues.
I am receiving sets of data that have a reference date. These data sets are sent every 15 minutes by a batch process.
I have to process the received data and forward them to another route.
If a given record cannot be processed, I need to reprocess it, and I have to ensure it is processed before the next data set is processed.
So I was thinking of creating a JMS route to receive these data before processing, then process them, then send them to another queue.
FTP --> Process data rows (A) --> JMS Queue --> Processor (B) --> direct:call
If processor B fails, I want the data to be processed before the next data set is sent by FTP (because the second data set may contain an update of the data in the first data set).
So I was thinking of using a queue to make sure the data are always processed in the order they are received.
But my experience with JMS, without Camel, is that once a message is consumed from the queue it is no longer in the queue.
Is that also the case with Camel?
In that case, do I have to retry processing the data, or put them back in the queue?
This "recovery" part is not clear to me and I'd like to understand the patterns that do support this.
Many thanks for your help
Gilles
This part "once the object is consumed from the queue it is not in the queue anymore." is not fully correct. Actually, when you are subscribing to the queue and getting a message you need to process it and send acknowledge back to the JMS broker. If acknowledge is successful then the message will be removed from the queue. But if acknowledge will be not successful or if your process will die and connection to the broker will break then the message will not be removed from the queue and will be passed to another consumer.
Most JMS libraries default to a mode in which the acknowledgement is sent as soon as the message is received by the consumer, but you always have the option to change this mode and send the acknowledgement manually once your processing has finished successfully.
As for Camel JMS (http://camel.apache.org/jms.html), you can use the endpoint option "acknowledgementModeName", which has several possible values, including:
AUTO_ACKNOWLEDGE (default) - the acknowledgement is sent right after the corresponding "from" in your route
CLIENT_ACKNOWLEDGE - lets the application control when the acknowledgement is sent; if no exception is thrown during exchange processing, the message is acknowledged and removed from the queue (see the sketch below)
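A minimal Camel route sketch along those lines; the queue name and processor body are placeholders, but acknowledgementModeName and concurrentConsumers are real endpoint options of the Camel JMS component:

```java
import org.apache.camel.builder.RouteBuilder;

public class OrderedProcessingRoute extends RouteBuilder {
    @Override
    public void configure() {
        // CLIENT_ACKNOWLEDGE: the message is acknowledged (and removed from
        // the queue) only after the exchange completes without an exception.
        // A single consumer preserves the order in which messages arrived.
        from("jms:queue:incomingData"
                + "?acknowledgementModeName=CLIENT_ACKNOWLEDGE"
                + "&concurrentConsumers=1")
            .process(exchange -> {
                // Processor (B): throwing here prevents the acknowledgement,
                // so the broker redelivers the message.
            })
            .to("direct:call");
    }
}
```

Alternatively, the transacted=true endpoint option achieves similar redelivery semantics through local JMS transactions rather than client acknowledgement.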

IBM MQ activity log issue

We are using IBM MQ 8.0. Activity logs are written for the outgoing messages that we send to the external system, but there is no log entry for the messages coming from the external system to our queue manager.
Is it a problem with the client channel configuration?
Or an MQ logging configuration issue?
IBM describes these "activity logs" as recovery logs in the Knowledge Center page "Making sure that messages are not lost (logging)":
IBM MQ records all significant changes to the persistent data controlled by the queue manager in a recovery log.
This includes creating and deleting objects, persistent message updates, transaction states, changes to object attributes, and channel activities. The log contains the information you need to recover all updates to message queues by:
Keeping records of queue manager changes
Keeping records of queue updates for use by the restart process
Enabling you to restore data after a hardware or software failure
Please note that non-persistent messages are not written to the recovery log.
Based on your question, it is likely that the messages you are sending to the external system are persistent and the messages you are receiving from the external system are non-persistent; this would explain why the latter are not logged to the recovery log files.
Persistence is determined at the time the message is first PUT.
IBM has a good Technote "Message persistence FAQs" about this subject.
Q3. What is the best way to be certain that messages are persistent?
A3. Set MQMD message persistence to persistent (MQPER_PERSISTENT), or nonpersistent (MQPER_NOT_PERSISTENT) and your message will always retain that value.
Note: MQPER_PERSISTENCE_AS_Q_DEF is the default setting for the persistence value in the MQMD. See the persistence values listed below.
...
Additional information
MQPER_PERSISTENCE_AS_Q_DEF can lead to unexpected results. If there is more than one definition in the queue-name resolution path, the default persistence attribute is taken from the first queue definition in the path at the time of the MQPUT or MQPUT1 call. This queue could be an:
alias queue
local queue
local definition of a remote queue
queue-manager alias
transmission queue
cluster queue
The external system will need to make sure that the messages it sends you are set as persistent if you want them to be logged.
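For illustration, a minimal JMS sketch of how a sending application can force persistence explicitly instead of relying on MQPER_PERSISTENCE_AS_Q_DEF (this assumes a JMS 2.0 client such as the IBM MQ classes for JMS; the queue name is a placeholder):

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class PersistentSender {
    public static void send(ConnectionFactory factory) throws Exception {
        try (Connection connection = factory.createConnection()) {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("INBOUND.QUEUE"); // placeholder name
            MessageProducer producer = session.createProducer(queue);
            // Maps to MQPER_PERSISTENT in the MQMD, so the message is
            // written to the queue manager's recovery log.
            producer.setDeliveryMode(DeliveryMode.PERSISTENT);
            TextMessage message = session.createTextMessage("payload");
            producer.send(message);
        }
    }
}
```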

MQRC Resource problem in WebSphere MQ

This is in a cluster environment. The queue manager lost its identity in the cluster and is unable to connect to the other servers. All channels to the repository and the other queue managers were in retrying state.
CPU usage on this server is optimal. This is a UNIX box.
When I checked the logs, this is what I found:
AMQ9532: Program cannot set queue attributes.
EXPLANATION: The attempt to set the attributes of queue 'SYSTEM.CLUSTER.TRANSMIT.QUEUE' on queue manager 'QMGR.SERVER6A' failed with reason code 2102.
ACTION: Ensure that the queue is available and retry the operation.
----- amqrmssa.c : 690 --------------------------------------------------------
AMQ9999: Channel program ended abnormally.
EXPLANATION: Channel program 'Channel.Coord00' ended abnormally.
ACTION: Look at previous error messages for channel program 'Channel.Coord00' in the error files to determine the cause of the failure.
----- amqrccca.c : 883 --------------------------------------------------------
03/06/11 08:24:26 AMQ9544: Messages not put to destination queue.
EXPLANATION: During the processing of channel 'Channel.Server6A' one or more messages could not be put to the destination queue and attempts were made to put them to a dead-letter queue. The location of the queue is 1, where 1 is the local dead-letter queue and 2 is the remote dead-letter queue.
ACTION: Examine the contents of the dead-letter queue. Each message is contained in a structure that describes why the message was put to the queue, and to where it was originally addressed. Also look at previous error messages to see if the attempt to put messages to a dead-letter queue failed. The program identifier (PID) of the processing program was '1372200'.
----- amqrmrca.c : 1318 -------------------------------------------------------
Then I recycled the queue manager and it is now fine.
My question here is: how did the MQ resource problem occur? CPU usage of this server is not more than 15%. Please advise.
There are three different and unrelated problems shown in the log.
AMQ9532: Program cannot set queue attributes.
EXPLANATION: The attempt to set the attributes of queue 'SYSTEM.CLUSTER.TRANSMIT.QUEUE' on queue manager 'QMGR.SERVER6A' failed with reason code 2102.
The 2102 is MQRC_RESOURCE_PROBLEM and presumably the resource issue referred to in the posting. The 2102 can be any kind of scarce resource, including semaphores, user processes, queue handles, etc. Since the QMgr was attempting to set an attribute of the queue, it would have already had a thread instantiated but it would have required additional queue handles. When something like this occurs, use your admin tool (WMQ Explorer, mqmon or one of the many 3rd party tools) to look into the number of open queue handles, open channels, etc. Note that for a resource error, it will be necessary to maintain an open connection to the QMgr or else the tool will be unable to make a new connection when the resource shortage occurs.
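As a rough illustration of that kind of check, here is a hedged sketch of inquiring on queue handle counts programmatically. It assumes the PCF classes shipped with the IBM MQ classes for Java; the host, port, and channel values are placeholders, and the exact constants should be verified against your MQ version:

```java
import com.ibm.mq.constants.CMQC;
import com.ibm.mq.constants.CMQCFC;
import com.ibm.mq.headers.pcf.PCFMessage;
import com.ibm.mq.headers.pcf.PCFMessageAgent;

public class QueueHandleCheck {
    public static void main(String[] args) throws Exception {
        // Keep this connection open ahead of time: once the resource
        // shortage hits, a brand-new connection may itself fail.
        PCFMessageAgent agent = new PCFMessageAgent("mqhost", 1414, "SYSTEM.DEF.SVRCONN");
        try {
            // Equivalent of DISPLAY QSTATUS for the cluster transmit queue.
            PCFMessage request = new PCFMessage(CMQCFC.MQCMD_INQUIRE_Q_STATUS);
            request.addParameter(CMQC.MQCA_Q_NAME, "SYSTEM.CLUSTER.TRANSMIT.QUEUE");
            for (PCFMessage response : agent.send(request)) {
                System.out.println("Open input handles:  "
                        + response.getIntParameterValue(CMQC.MQIA_OPEN_INPUT_COUNT));
                System.out.println("Open output handles: "
                        + response.getIntParameterValue(CMQC.MQIA_OPEN_OUTPUT_COUNT));
            }
        } finally {
            agent.disconnect();
        }
    }
}
```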
AMQ9999: Channel program ended abnormally.
EXPLANATION: Channel program 'Channel.Coord00' ended abnormally.
ACTION: Look at previous error messages for channel program 'C00.US.MP00' in the error files to determine the cause of the failure.
This error actually appears to be two different errors, since it references two different channels. One appears to be an outbound cluster channel and the other a point-to-point channel. Neither channel mentioned in this error is associated with the first or last error message.
03/06/11 08:24:26 AMQ9544: Messages not put to destination queue.
EXPLANATION: During the processing of channel 'Channel.Server6A' one or more messages could not be put to the destination queue and attempts were made to put them to a dead-letter queue. The location of the queue is 1, where 1 is the local dead-letter queue and 2 is the remote dead-letter queue.
ACTION: Examine the contents of the dead-letter queue. Each message is contained in a structure that describes why the message was put to the queue, and to where it was originally addressed. Also look at previous error messages to see if the attempt to put messages to a dead-letter queue failed. The program identifier (PID) of the processing program was '1372200'.
The last error appears to involve an inbound cluster channel. Since the first error was about setting attributes of the cluster transmit queue, it could only have been associated with an outbound channel; therefore the first and last error messages are unrelated. This error message appears to show an inbound message that was destined for a queue that was full, PUT-disabled, or otherwise unable to accept the message. The message was therefore routed to the dead-letter queue.
For the resource error, I would suggest reviewing the performance report appropriate to your platform. Go to the SupportPacs page and look for those SupportPacs named MP* and then look for the one for your platform. The Performance Reports give you specific tuning advice.
You may also want to review the Problem Determination chapter in the System Administration manual for additional advice on how to identify resource issues.
The WebSphere MQ cluster design and operation article in the developerWorks Mission:Messaging series has specific advice about keeping clusters healthy.
Last but not least, the WebSphere MQ MustGather page has sections on troubleshooting for all major platforms and categorized by problem area.
Another tip: if you are receiving reason code 2102 (MQRC_RESOURCE_PROBLEM) after raising MAXMSGL to 100 MB in IBM MQ, try increasing the queue manager's log allocation. In MQ Explorer: Queue manager -> Properties -> Extended, then raise the "Log primary files" and "Log secondary files" values to 20 (equivalently, the LogPrimaryFiles and LogSecondaryFiles attributes in the Log stanza of qm.ini).
