Message flow - end when broker is stopping (IBM Integration Bus)

I have an IIB message flow that runs for a few hours each evening, using a java loop to perform some actions once per minute.
During that time, if the multi-instance broker that the flow is running on fails over, the failover hangs until this message flow ends its processing (potentially hours later).
Is there any kind of hook I can use in Java to say "if the broker is stopping or failing over, then cancel this processing to let it happen"?
Edit
I have now tried the following code as a test, but even when a request is made to stop the execution group/flow, the booleans all remain true:
Boolean egIsRunning = true;
Boolean aIsRunning = true;
Boolean msgFlowIsRunning = true;
while (egIsRunning && aIsRunning && msgFlowIsRunning)
{
    Thread.sleep(1000);
    ExecutionGroupProxy e = ExecutionGroupProxy.getLocalInstance();
    egIsRunning = e.isRunning();
    ApplicationProxy a = e.getApplicationByName("SANDBOX.APP");
    aIsRunning = a.isRunning();
    MessageFlowProxy m = a.getMessageFlowByName("SANDBOX_MSGFLOW");
    msgFlowIsRunning = m.isRunning();
}
So, I don't think the Integration API is going to help here? Or is there some ".isTryingToStop" method that I'm missing?

I recommend looking at the Integration API.
From within your Java Compute Node you can connect with BrokerProxy.getLocalInstance() (don't forget to disconnect(), otherwise you will eventually run out of memory). Then I would try MessageFlowProxy.isRunning() as an exit criterion for your loop.
Should isRunning() not work, there are other options, such as AdministeredObjectListener, for figuring out what is happening to your flow.
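As an illustration of the listener route, here is a minimal sketch, assuming the registerListener() method and the three AdministeredObjectListener callbacks as described in the Integration API (CMP) documentation; the class name FlowStopWatcher is made up, and the exact callback signatures should be verified against the javadoc for your IIB version.

// A sketch only, not tested against a live broker: it assumes the CMP classes in
// com.ibm.broker.config.proxy; verify the listener signatures in your IIB javadoc.
import com.ibm.broker.config.proxy.*;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;

public class FlowStopWatcher {
    public static final AtomicBoolean stopRequested = new AtomicBoolean(false);

    public static void watch(MessageFlowProxy flow) throws Exception {
        flow.registerListener(new AdministeredObjectListener() {
            public void processModify(AdministeredObject obj, List changedAttributes,
                                      List newChildren, List removedChildren) {
                // A stop request should surface here as a change to the flow's
                // run-state attributes; flip the flag so the minute-loop can
                // bail out instead of blocking the failover for hours.
                stopRequested.set(true);
            }

            public void processDelete(AdministeredObject obj) {
                stopRequested.set(true);
            }

            public void processActionResponse(AdministeredObject obj,
                    CompletionCodeType cc, List bipMessages, Properties refProps) {
                // not needed for this use case
            }
        });
    }
}

The minute-loop would then test FlowStopWatcher.stopRequested.get() in its loop condition, instead of (or in addition to) the isRunning() polling from the question's edit.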

Related

Datadog Lambda Extension Logs delayed

I'm trying to send logs from AWS Lambda using the Datadog extension.
It works, but the logs aren't being sent until the lambda is shut down (as opposed to at the end of the invocation), which leads to a ~10 min delay before logs appear in Datadog.
The current environment variables for the lambda are as follows:
DD_API_KEY_SECRET_ARN = secret_arn
DD_CAPTURE_LAMBDA_PAYLOAD = true
DD_ENV = dev
DD_FLUSH_TO_LOG = false
DD_LAMBDA_HANDLER = index.handler
DD_LOG_LEVEL = debug
DD_LOGS_INJECTION = true
DD_SERVERLESS_LOGS_ENABLED = true
DD_SERVICE = MyService
DD_SITE = datadoghq.com
DD_TRACE_ENABLED = true
DD_VERSION = $LATEST
You should take a look at this issue:
https://github.com/DataDog/datadog-lambda-extension/issues/29
Let me quote an answer from it:
Hi @stalar, thanks for reaching out.
This is a known behavior based on the way Lambda Extensions and the
Lambda Logs API work. Once your function finishes running, the
extension is frozen until the next invocation. However, there isn't a
guarantee that we have received logs at that time. Logs may arrive on
the subsequent invocation of the function. Furthermore, if your
function is invoked repeatedly, we will switch to a strategy of
periodically flushing logs to reduce overhead, which may mean that
logs do not immediately appear in Datadog after each and every
invocation.
We are in touch with AWS about possible improvements to resolve this
issue.
Let me know if you have any further questions!

How to set up a ZeroMQ request-reply between a C# and Python application

I'm trying to communicate between a C# (5.0) and a Python (3.9) application via ZeroMQ. For .NET I'm using NetMQ, and for Python, PyZMQ.
I have no trouble letting two applications communicate as long as they are in the same language (C# to C#, Python to Python, Java to Java), but trouble starts when I try to connect between different languages. Java to C# and the reverse work fine as well [edited].
I do not get any errors, but it does not work either.
I first tried the PUB-SUB archetype pattern, but as that didn't work I tried REQ-REP, so some remnants of the PUB-SUB version can still be found in the code.
My Python code looks like this:
def run(monitor: bool):
    loop_counter: int = 0
    context = zmq.Context()
    # socket = context.socket(zmq.PUB)
    # socket.bind("tcp://*:5557")
    socket = context.socket(zmq.REP)
    socket.connect("tcp://localhost:5557")
    if monitor:
        print("Connecting")
    # 0 = Longest version, 1 = shorter version, 2 = shortest version
    length_version: int = 0
    print("Ready and waiting for incoming requests ...")
    while True:
        message = socket.recv()
        if monitor:
            print("Received message:", message)
        if message == "long":
            length_version = 0
        elif message == "middle":
            length_version = 1
        else:
            length_version = 2
        sys_info = get_system_info(length_version)
        """if not length_version == 2:
            length_version = 2
        loop_counter += 1
        if loop_counter == 15:
            length_version = 1
        if loop_counter > 30:
            loop_counter = 0
            length_version = 0"""
        if monitor:
            print(sys_info)
        json_string = json.dumps(sys_info)
        print(json_string)
        socket.send_string(json_string)
My C# code:
static void Main(string[] args)
{
    //using (var requestSocket = new RequestSocket(">tcp://localhost:5557"))
    using (var requestSocket = new RequestSocket("tcp://localhost:5557"))
    {
        while (true)
        {
            Console.WriteLine($"Running the server ...");
            string msg = "short";
            requestSocket.SendFrame(msg);
            var message = requestSocket.ReceiveFrameString();
            Console.WriteLine($"requestSocket : Received '{message}'");
            //Console.ReadLine();
            Thread.Sleep(1_000);
        }
    }
}
Seeing the timing of your problems, it may be a version issue.
I have long run a program fine with communication between Windows/C# using NetMQ 4.0.0.207 (2019-07-01) on one side and Ubuntu/Python with zeromq 4.3.1 and pyzmq 18.1.0 on the other.
I just tried updating to the same NetMQ version but with the newer zeromq 4.3.3 and pyzmq 20.0.0, and there is a problem/bug somewhere; it doesn't run well anymore.
So your code doesn't look bad; it may be a software version issue. Try NetMQ 4.0.0.207 on the C# side and zeromq 4.3.1 with pyzmq 18.1.0 on the Python side.
Q : "How to set up a ZeroMQ request-reply between a c# and python application"
The problem starts with a missed understanding of how the REQ/REP archetype works.
Your code uses the blocking form of the .recv() method, so you leave yourself hanging out of the game, forever and unsalvageably, whenever the REQ/REP two-step gets into trouble (as no due care was taken to prevent this infinite live-lock).
Rather, start using the .poll() method to test for the presence or absence of a message on the local AccessNode side of the queue. This leaves you the capability to decide, state-fully, what to do next, whether a message is already present or not yet, and so to keep the mandatory, API-defined sequence that "zips" successful chains of REQ-side .send()-.recv()-.send()-.recv()-... with REP-side .recv()-.send()-.recv()-.send()-... calls, as the REQ/REP archetype works as a distributed finite-state automaton (dFSA) that may easily deadlock itself when the "remote" side is not compliant with the local side's expectations.
Code that works in a non-blocking, .poll()-based mode avoids falling into these traps, as you can handle each of these unwanted circumstances while still in control of the code-execution paths (something a call to a blocking-mode method, made in blind belief that it will return at some future point in time, if ever, simply cannot offer).
Q.E.D.
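As a minimal sketch of that .poll()-based mode on the REP side (pyzmq; the 100 ms timeout, the bind address and the echo-style reply are arbitrary illustration choices, not taken from the question):

import json
import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://*:5557")        # the REP side usually binds; the REQ side connects

poller = zmq.Poller()
poller.register(socket, zmq.POLLIN)

while True:
    events = dict(poller.poll(timeout=100))  # milliseconds; empty dict if no message
    if socket in events:
        message = socket.recv()              # note: bytes, so compare with b"long"
        reply = json.dumps({"echo": message.decode()})
        socket.send_string(reply)
    else:
        pass  # no request pending: free to do housekeeping, check a stop flag, etc.

Each pass through the loop stays in control: the reply is sent only after a request was actually received, preserving the strict recv/send alternation the REP state machine requires.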
If in doubt, one may use a PUSH/PULL archetype, as the PUB/SUB archetype may run into problems with non-matching subscriptions (topic-list management being another, version-dependent detail).
There ought to be no other problem for any of the language bindings, provided they pass all the documented ZeroMQ API features without creating any "shortcuts". Some cases have been seen where a language-specific binding took "another" direction for PUB/SUB, transforming a plain message into a multi-part message, putting the topic into the first frame and the payload into the next. That is an example of a binding not compatible with the ZeroMQ API, where cross-language / non-matching binding-version system problems are sure to come.
Your port numbers do not match, the python code is 55557 and the c# is 5557
I might be late, but the same thing happened to me. I have a Python subscriber using pyzmq and a C# publisher using NetMQ.
After a few hours, it occurred to me that I needed to give the publisher some time to connect. A simple System.Threading.Thread.Sleep(500); after the Connect/Bind did the trick.
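In code, the workaround is just a settling delay after the socket is set up (a sketch, assuming the NetMQ 4.x socket API; the 500 ms figure is the empirical value from above, covering ZeroMQ's well-known "slow joiner" window):

using NetMQ;
using NetMQ.Sockets;

using (var publisher = new PublisherSocket())
{
    publisher.Bind("tcp://*:5557");
    System.Threading.Thread.Sleep(500);  // let subscribers finish connecting
    publisher.SendMoreFrame("topic").SendFrame("payload");
}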

Socket.ReceiveReady is not fired despite available messages

I've started to explore NetMQ 3.3.0.11 and ran into an issue with the use of Poller.
I am trying to get the poller to poll for about 1 s, then stop and allow something else to be done before it resumes polling for another second, and so on.
I have the following code:
var poller = new Poller(client) { PollTimeout = 10 };
while (true)
{
    for (var poll = 0; poll < 100; poll++)
    {
        poller.PollOnce();
    }
    do_something;
}
The problem I'm facing is that during the polling period the Client.ReceiveReady event is not fired even though a message is ready to be picked up, and an InvalidOperationException stating "Poller is started" is raised.
Any idea what I'm doing wrong?
First, try working with version 3.3.0.12-rc1; it fixes a lot of issues, probably also the one you are suffering from.
Also, regarding do_something: I suggest working with a NetMQTimer instead of PollOnce (use PollTillCanceled instead). You can also use NetMQScheduler for the "do something" work.
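A sketch of that shape, with a NetMQTimer carrying the periodic work (member names follow the 3.3.x-era API that the question and this answer refer to; verify them, including the exact spelling of PollTillCanceled, against your NetMQ release):

var timer = new NetMQTimer(TimeSpan.FromSeconds(1));
timer.Elapsed += (sender, args) =>
{
    DoSomething();  // the periodic work runs on the poller's thread every second
};

client.ReceiveReady += (sender, args) =>
{
    var msg = args.Socket.ReceiveFrameString();  // drain the socket in the handler
};

var poller = new Poller(client);
poller.AddTimer(timer);
poller.PollTillCanceled();  // blocks; cancel the poller from elsewhere to stop

This keeps all socket and timer work on the poller's own loop, so there is no need to interleave PollOnce calls with outside work.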

Delayed Job creating Airbrakes every time it raises an error

def perform
  refund_log = {
    success: refund_retry.success?,
    amount: refund_amount,
    action: "refund"
  }
  if refund_retry.success?
    refund_log[:reference] = refund_retry.transaction.id
    refund_log[:message] = refund_retry.transaction.status
  else
    refund_log[:message] = refund_retry.message
    refund_log[:params] = {}
    refund_retry.errors.each do |error|
      refund_log[:params][error.code] = error.message
    end
    order_transaction.message = refund_log[:params].values.join('|')
    raise "delayed RefundJob has failed"
  end
end
When I raise "delayed RefundJob has failed" in the else branch, it creates an Airbrake error. I want to run the job again if it ends up in the else section.
Is there any way to re-queue the job without raising an exception, and so prevent the Airbrake from being created?
I am using delayed_job version 1.
The cleanest way would be to re-queue, i.e. create a new job and enqueue it, and then exit the method normally.
To elaborate on @Roman's response, you can create a new job with a retry parameter in it and enqueue that.
If you maintain the retry parameter (incrementing it each time you re-enqueue the job), you can track how many retries you have made, and thus avoid an endless retry loop.
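A sketch of that idea; RefundJob, do_refund and MAX_RETRIES are made-up names, and the positional enqueue(payload, priority, run_at) arguments follow the old tobi/delayed_job API that version 1 ships, so verify them against your copy:

# Hypothetical job illustrating the retry counter (needs ActiveSupport for
# 5.minutes.from_now, as in a typical Rails app).
class RefundJob < Struct.new(:order_id, :retry_count)
  MAX_RETRIES = 5

  def perform
    return if do_refund(order_id)  # success: exit normally, no exception, no Airbrake

    if retry_count < MAX_RETRIES
      # Re-enqueue a brand-new job with an incremented counter and return
      # normally, instead of raising.
      Delayed::Job.enqueue(RefundJob.new(order_id, retry_count + 1),
                           0, 5.minutes.from_now)
    else
      raise "RefundJob failed after #{MAX_RETRIES} retries"  # now worth an Airbrake
    end
  end
end

The very first enqueue would pass 0 as retry_count.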
DelayedJob expects a job to raise an error in order to be requeued, by definition.
From there you can either:
Ignore your exception on the Airbrake side (see https://github.com/airbrake/airbrake#filtering), so the job still gets queued again without filling your logs.
Dive into the DelayedJob code, where you can see at https://github.com/tobi/delayed_job/blob/master/lib/delayed/job.rb#L65 that a method named reschedule is available and used by run_with_lock (https://github.com/tobi/delayed_job/blob/master/lib/delayed/job.rb#L99). From there you can call reschedule manually instead of raising your exception.
About the latter solution, I advise adding some mechanism that still files an Airbrake report on the third or a later try; that way you can still detect that something is wrong, without the hassle of having your logs filled by the attempts.

Building a high-performance node.js application with cluster and node-webworker

I'm not a node.js master, so I'd like to have more points of view about this.
I'm creating an HTTP node.js web server that must handle not only lots of concurrent connections but also long-running jobs. By default node.js runs in one process, and if a piece of code takes a long time to execute, any subsequent connection must wait until the code finishes what it's doing for the previous connection.
For example:
var http = require('http');

http.createServer(function (req, res) {
    doSomething(); // This takes a long time to execute
    // Return a response
}).listen(1337, "127.0.0.1");
So I was thinking to run all the long running jobs in separate threads using the node-webworker library:
var http = require('http');
var sys = require('sys');
var Worker = require('webworker');

http.createServer(function (req, res) {
    var w = new Worker('doSomething.js'); // This takes a long time to execute
    // Return a response
}).listen(1337, "127.0.0.1");
And to make the whole thing more performant, I thought of also using cluster to create a new node process for each CPU core.
In this way I expect to balance the client connections across different processes with cluster (say, four node processes if I run it on a quad-core), and then execute the long-running jobs on separate threads with node-webworker.
Is there something wrong with this configuration?
I see that this post is a few months old, but I wanted to add a comment in case someone else comes along.
"By default node.js runs on one process, and if there's a piece of code that takes a long time to execute any subsequent connection must wait until the code ends what it's doing on the previous connection."
^-- This is not entirely true. If doSomething() is required to complete before you send back the response, then yes; but if it isn't, you can make use of the asynchronous functionality available in the core of Node.js and return immediately, while the item processes in the background.
A quick example of what I'm explaining can be seen by adding the following code to your server:
setTimeout(function () {
    console.log("Done with 5 second item");
}, 5000);
If you hit the server a few times, you will get an immediate response on the client side, and eventually see the console fill with the messages seconds after the response was sent.
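Putting that into the question's server shape (a sketch; the setTimeout stands in for the long-running job, and the immediate res.end() reply stands in for whatever response the real handler would send):

var http = require('http');

http.createServer(function (req, res) {
    setTimeout(function () {               // stand-in for the long-running job
        console.log("Done with 5 second item");
    }, 5000);
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Accepted\n');                 // the client gets this right away
}).listen(1337, "127.0.0.1");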
Why don't you just copy and paste your code into a file and run it with JXcore, like
$ jx mt-keep:4 mysourcefile.js
and see how it performs? If you need real multithreading without leaving the safety of single-threading, try JX. It's 100% Node.js 0.12+ compatible. You can spawn the threads and run a whole node.js app inside each of them separately.
You might want to check out Q-Oper8 instead as it should provide a more flexible architecture for this kind of thing. Full info at:
https://github.com/robtweed/Q-Oper8
