How to use Report action in OSB proxy service to record retry attempts

I want to record the retry attempts of a proxy service in OSB using the Report action.
I have created a JMS transport proxy service that picks up messages from an IN_QUEUE and routes them to a business service, which pushes each message to an OUT_QUEUE and reports the status (success or failure).
If an error occurs during processing, the proxy service should retry 5 times before failing. To achieve this, I configured the routing options with a retry count of 5, and that works well.
All I want now is to record the retry attempts of the proxy service using the Report action. Please suggest how to do this.

Logging the retry attempts of a business service is difficult, since retries are handled outside the scope of the proxy. The closest you can come is to set up an SLA alert to notify you when the bizref fails, but that doesn't trigger on every message, only if it detects errors during the aggregation interval.
Logging the retry attempts of the proxy is a lot easier, especially since it's a JMS proxy. Failed processing will put the message back on the queue (with XA-enabled resources, you may want to enable Same Transaction For Response), and each retry will increment a counter in the JMS transport headers, which the proxy can extract to decide whether to report on it.
Just remember that unless you set QoS to Best Effort on the publishes/reports, the publishes themselves will be rolled back if a failure happens, which is probably not what you want.
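
The decision the proxy makes from that counter can be sketched as follows. This is a minimal Python illustration of the logic only, not OSB configuration. It assumes the counter is the standard JMS property JMSXDeliveryCount (1 on the first delivery); the `headers` dict, `MAX_RETRIES`, and `retry_report` are hypothetical names.

```python
from typing import Optional

MAX_RETRIES = 5  # hypothetical: the retry count used in the question


def retry_report(headers: dict) -> Optional[dict]:
    """Return a report record for redeliveries, or None for the first delivery.

    Assumes the delivery counter is JMSXDeliveryCount, which is 1 on the
    first delivery and grows by one with each redelivery.
    """
    delivery_count = int(headers.get("JMSXDeliveryCount", 1))
    retry_attempt = delivery_count - 1
    if retry_attempt == 0:
        return None  # first delivery, nothing to record
    return {
        "retry_attempt": retry_attempt,
        "final_attempt": retry_attempt >= MAX_RETRIES,
    }
```

In the proxy itself, the equivalent would be a condition on the extracted header value that guards the Report action.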

Related

Ways to wait if server is not available in gRPC from client side

I hope whoever is reading this is doing well.
Here's a scenario that I'm wondering about: a global ClientConn is used for all gRPC requests to a server, and then that server goes down. Is there a way to wait, with some timeout, for the server to come back up, so that gRPC usage in this scenario is more resilient to failures (either a transient failure or the server going down)? I was thinking of looping while the ClientConn state is connecting or transient failure, and returning an error if the timeout expires while the state is still transient failure, since the server might be down.
Would this work if multiple requests that need this ClientConn come in on the client side, so that multiple goroutines run this loop? I would appreciate any alternatives, suggestions, or advice.

When you call grpc.Dial to connect to a server and receive a grpc.ClientConn, it will automatically handle reconnections for you. When you call a method or request a stream, it will fail if it can't connect to the server or if there is an error processing the request.
You could retry a few times if the error indicates that it is due to the network. You can check the gRPC status codes here: https://github.com/grpc/grpc-go/blob/master/codes/codes.go#L31 and extract them from the returned error using status.FromError: https://pkg.go.dev/google.golang.org/grpc/status#FromError
You also have the grpc.WaitForReady option (https://pkg.go.dev/google.golang.org/grpc#WaitForReady) which can be used to block the grpc call until the server is ready if it is in a transient failure. In that case, you don't need to retry, but you should probably add a timeout that cancels the context to have control over how long you stay blocked.
If you want to avoid even trying to call the server, you could use ClientConn.WaitForStateChange (which is experimental) to detect any state change and ClientConn.GetState to determine the connection's current state, so that you know when it is safe to start calling the server again.
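
The "retry on network errors, but bound the total wait" idea can be sketched independently of gRPC. The following is a minimal Python illustration, not grpc-go code: `TransientError` is a hypothetical stand-in for an error whose status code is Unavailable, and `call_with_deadline` and its parameters are assumptions made for this example.

```python
import time


class TransientError(Exception):
    """Stand-in for an RPC error whose status code is Unavailable."""


def call_with_deadline(rpc, timeout_s=5.0, base_delay_s=0.1, max_delay_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call rpc() until it succeeds or the overall timeout expires.

    Only TransientError is retried; any other exception propagates at once.
    """
    deadline = clock() + timeout_s
    delay = base_delay_s
    while True:
        try:
            return rpc()
        except TransientError:
            remaining = deadline - clock()
            if remaining <= 0:
                raise TimeoutError("server still unavailable at deadline")
            # Exponential backoff, never sleeping past the deadline.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

Because the function keeps no shared state, several concurrent callers can each run their own loop against the same connection. With grpc.WaitForReady and a context timeout, the library does the waiting and a loop like this is not needed.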

The transaction was rolled back on failover however commit may have been successful

I have an application using jms that sends data to an ActiveMQ Artemis queue. I got an exception with this message:
The transaction was rolled back on failover however commit may have been successful
This exception is basically telling me that the message may or may not have reached the queue, so I don't know if I need to send the message again. What's the best way to handle an exception like this when:
I cannot send duplicate messages to applications on the other end of the queue.
and
I cannot skip a message.

I can't state it better than the ActiveMQ Artemis documentation:
When sending messages from a client to a server, or indeed from a server to another server, if the target server or connection fails sometime after sending the message, but before the sender receives a response that the send (or commit) was processed successfully then the sender cannot know for sure if the message was sent successfully to the address.
If the target server or connection failed after the send was received and processed but before the response was sent back then the message will have been sent to the address successfully, but if the target server or connection failed before the send was received and finished processing then it will not have been sent to the address successfully. From the senders point of view it's not possible to distinguish these two cases.
When the server recovers this leaves the client in a difficult situation. It knows the target server failed, but it does not know if the last message reached its destination ok. If it decides to resend the last message, then that could result in a duplicate message being sent to the address. If each message was an order or a trade then this could result in the order being fulfilled twice or the trade being double booked. This is clearly not a desirable situation.
Sending the message(s) in a transaction does not help out either. If the server or connection fails while the transaction commit is being processed it is also indeterminate whether the transaction was successfully committed or not!
To solve these issues Apache ActiveMQ Artemis provides automatic duplicate messages detection for messages sent to addresses.
See more details about how to configure and use duplicate detection in the ActiveMQ Artemis documentation.
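
To make the idea concrete, here is a toy Python model of duplicate detection, not Artemis client code. In Artemis the sender attaches the ID as the `_AMQ_DUPL_ID` message property; everything else here (the `Address` class, `send_with_resend`, and the cache size of 1000) is a hypothetical simplification.

```python
import uuid
from collections import OrderedDict


class Address:
    """Toy broker address that ignores messages whose duplicate ID was seen."""

    def __init__(self, id_cache_size=1000):
        self.messages = []
        self._seen = OrderedDict()  # bounded cache of recent duplicate IDs
        self._id_cache_size = id_cache_size

    def send(self, body, dup_id):
        if dup_id in self._seen:
            return False  # duplicate: silently dropped
        self._seen[dup_id] = True
        if len(self._seen) > self._id_cache_size:
            self._seen.popitem(last=False)  # evict the oldest ID
        self.messages.append(body)
        return True


def send_with_resend(address, body, fail_after_first_send=False):
    """Send once; on an indeterminate failure resend with the SAME ID."""
    dup_id = str(uuid.uuid4())  # chosen once per logical message
    address.send(body, dup_id)
    if fail_after_first_send:
        # Simulates "commit may have been successful": the sender cannot
        # tell, so it resends and relies on the broker to drop the copy.
        address.send(body, dup_id)
```

The key point is that the ID belongs to the logical message, not to the send attempt, so a resend after the failover exception is safe as long as the ID is still in the broker's cache.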

Nats.io QueueSubscribe behavior on timeout

I'm evaluating NATS for migrating existing message-based software.
I did not find documentation about message timeouts, exceptions, and overload.
For example:
After a subscriber has been chosen, is it aware of the timeout setting specified by the publisher? Can it notify the publisher that it needs additional time?
If the chosen subscriber detects that a DBMS connection is missing and it cannot complete the work, can it bounce the message?
Will the NATS server then pick another subscriber and re-post the same message?
Ciao
Diego

For your first question: It seems to me that you are trying to publish a request message with a timeout (using the nc.Request). If so, the timeout is managed by the client. Effectively the client publishes the request message and creates a subscription on the reply subject. If the subscription doesn't get any messages within the timeout it will notify you of the timeout condition and unsubscribe from the reply subject.
On your second question: are you using a queue group? A queue group in NATS is a subscription that specifies a queue group name. All subscriptions having the same queue group name are treated specially by the server. The server will select one of the queue group subscriptions to send the message to, rotating between them as messages arrive. However, the responsibility of the server is simply to deliver the message.
To do what you describe, implement your functionality using request/reply with a timeout and a max number of messages equal to 1. If no responses are received after the timeout, your client can resend the request message after some delay or perform some other type of recovery logic. The reply message should be your 'protocol' to know that the message was handled properly. Note that this gets into the design of your messaging architecture. For example, it is possible for the timeout to trigger after the request recipient received the message and handled it but before the client or server was able to publish the response. In that case the request sender wouldn't be able to tell the difference and would eventually republish. This means that requests in this type of interaction need to be idempotent to prevent duplicate side effects.
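
The resend-on-timeout pattern with an idempotent responder can be sketched as follows. This is a minimal Python simulation that does not use a NATS client: the `send(request_id, payload, inbox)` hook, `Responder`, and `request_with_retry` are hypothetical names, and a `queue.Queue` stands in for the reply subscription.

```python
import queue
import uuid


class Responder:
    """Handles each request ID once and replays the stored reply on repeats."""

    def __init__(self):
        self.side_effects = 0
        self._replies = {}

    def handle(self, request_id, payload):
        if request_id not in self._replies:
            self.side_effects += 1  # the real work happens only once
            self._replies[request_id] = "done:" + payload
        return self._replies[request_id]


def request_with_retry(send, payload, timeout_s=0.05, max_attempts=3):
    """Send a request and wait for one reply, resending on timeout.

    `send(request_id, payload, inbox)` is a hypothetical transport hook that
    may or may not put a reply on `inbox`.
    """
    request_id = str(uuid.uuid4())  # reused on every resend
    for _ in range(max_attempts):
        inbox = queue.Queue(maxsize=1)  # expect at most one reply
        send(request_id, payload, inbox)
        try:
            return inbox.get(timeout=timeout_s)
        except queue.Empty:
            continue  # timed out: resend the same request
    raise TimeoutError("no reply after %d attempts" % max_attempts)
```

Reusing the same request ID on every resend is what lets the responder recognise a repeat and return the earlier reply instead of doing the work twice.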

OSB Proxy service retry mechanism

I have created a JMS proxy service which fires on a message and routes the message to another JMS business service which puts the message into an out queue.
If the business service returns an error, I want the service to retry 5 times. For this requirement I have set the retry count in the routing options of the proxy service to 5. But on the third retry attempt, I want the proxy service to call a mail alert destination that sends a mail.
I am stuck at this point. Can anybody please help me solve this?

Setting the retry count to N in the business service makes it retry N times.
If the error still occurs on the Nth try, the business service returns the error to the route node.
Try calling the business service twice, splitting the retries into 3 and 2.
It would be better to use service callouts in two stages:
Make the call in the first service callout with a retry count of 3.
If it fails, call the mail alert destination in the stage-level error handler and resume.
Make the call in the second service callout with a retry count of 2.
If the first service callout succeeds, skip the second service callout.
This may also work:
Try with a retry count of 3 in the routing node.
If it fails, call the mail alert destination and then call the business service with a retry count of 2.
If that also fails, handle the error in the service-level error handler.
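
The control flow of the 3 + 2 split can be sketched as follows. This is a minimal Python illustration, not OSB configuration: `service`, `send_mail_alert`, and both function names are hypothetical, and the sketch counts total attempts, whereas an OSB retry count may mean retries on top of the initial call.

```python
def call_with_retries(service, message, attempts):
    """Call service(message) up to `attempts` times; re-raise the last error."""
    last_error = None
    for _ in range(attempts):
        try:
            return service(message)
        except Exception as err:
            last_error = err
    raise last_error


def route_with_alert(service, send_mail_alert, message):
    """First batch of 3 attempts, then a mail alert, then 2 more attempts."""
    try:
        return call_with_retries(service, message, attempts=3)
    except Exception as err:
        send_mail_alert("3 attempts failed: %s" % err)
    # Second batch; an error here propagates to the caller, which plays
    # the role of the service-level error handler.
    return call_with_retries(service, message, attempts=2)
```

If the first batch succeeds, the function returns immediately, so the alert and the second batch are skipped.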

heroku router timeout/interrupt causing lost responses

I have what appears to be a race condition that causes responses from my Heroku web service to be lost.
The Heroku router delivers the request to the web service, the web service processes the request and returns a response, but in the interim the Heroku router fails the request, either due to a client interrupt or a backend timeout.
The problem is that processing the request changed state on the backend, and the web service expected to send that state change to the client in the body of the response. The response never reaches the client, so the state change is lost forever.
The state change in my case is the delivery and removal of a message from a RabbitMQ message queue. The web service request handler pops the message from the RabbitMQ queue, but the message fails to reach the client and is never heard of again.
I could implement my own client-based message ACK system to mitigate this. However, I suspect that some of you might have a better solution regarding how to deal with ensuring that the responses get to the client. Is there any callback that I can use on my web service to determine if the response was lost? FWIW my web service is a JAX-RS service running embedded Jetty.
Thanks!
