NiFi processor won't call the @OnStopped or @OnDisabled methods - apache-nifi

I have a NiFi processor that subscribes to a few tags on an OPC UA server.
I'm struggling to find a way to terminate the subscription. My plan was to just keep it running until I decide to stop the processor.
I tried defining methods annotated with @OnStopped, @OnUnscheduled and @OnDisabled, but they never get called when I stop or disable the processor.
I'm on NiFi 1.7, so I can terminate the processor's thread, but my @OnStopped, @OnUnscheduled and @OnDisabled methods still don't get called.
Does terminating the thread mean that the thread won't return from onTrigger in a fashion that allows the above-mentioned lifecycle methods to be called?
EDIT: As requested, my method with its annotation:
@OnStopped
private void OnStopped() {
    getLogger().info("Subscriptions cleared - stopped");
    miloOpcUAService.clearSubscriptions();
}

Your method has to have public visibility, otherwise the scheduler (which uses reflection) can't find it to invoke it.
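For example, a minimal sketch of the corrected method (only the visibility changes; miloOpcUAService is the field from the question):
import org.apache.nifi.annotation.lifecycle.OnStopped;

@OnStopped
public void onStopped() {
    // Now public, so the framework can discover and invoke it via reflection when the processor is stopped.
    getLogger().info("Subscriptions cleared - stopped");
    miloOpcUAService.clearSubscriptions();
}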

Related

Allow-listing IP addresses using `call.cancel()` from within `EventListener.dnsEnd()` in OkHttp

I am overriding the dnsEnd() method in EventListener:
@Override
public void dnsEnd(Call call, String domainName, List<InetAddress> inetAddressList) {
    inetAddressList.forEach(address -> {
        logger.debug("checking if url ({}) is in allowlist", address.toString());
        if (!allowlist.contains(address)) {
            call.cancel();
        }
    });
}
I know the documentation says not to alter call parameters etc.:
"All event methods must execute fast, without external locking, cannot throw exceptions, attempt to mutate the event parameters, or be re-entrant back into the client. Any IO - writing to files or network should be done asynchronously."
But since I don't care about the call if it is trying to reach an address outside the allowlist, I fail to see the issue with this implementation.
I want to know if anyone has experience with this, and why it may be an issue.
I tested this and it seems to work fine.
This is fine and safe. Probably the strangest consequence is that the canceled event will be triggered by the thread already processing the DNS event.
But cancelling is not the best way to constrain permitted IP addresses to a list. You can instead implement the Dns interface. Your implementation should delegate to Dns.SYSTEM and then filter its results to your allowlist. That way you don't have to worry about races on cancellation.
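A minimal sketch of that approach, assuming the allowlist is a Set<InetAddress> (the AllowlistDns class name is made up for illustration):
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import okhttp3.Dns;

class AllowlistDns implements Dns {
    private final Set<InetAddress> allowlist;

    AllowlistDns(Set<InetAddress> allowlist) {
        this.allowlist = allowlist;
    }

    @Override
    public List<InetAddress> lookup(String hostname) throws UnknownHostException {
        // Delegate to the system resolver, then keep only allowlisted addresses.
        List<InetAddress> filtered = Dns.SYSTEM.lookup(hostname).stream()
                .filter(allowlist::contains)
                .collect(Collectors.toList());
        if (filtered.isEmpty()) {
            throw new UnknownHostException("No allowlisted address for " + hostname);
        }
        return filtered;
    }
}

// Usage:
// OkHttpClient client = new OkHttpClient.Builder().dns(new AllowlistDns(allowlist)).build();
Calls that resolve only to non-allowlisted addresses then fail with an UnknownHostException instead of being cancelled mid-flight.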

Azure ServiceBus TopicClient SendAsync implementation in own wrapper

What is the proper implementation of the SendAsync method of the Azure ServiceBus TopicClient?
In the second implementation, will the BrokeredMessage actually be disposed before the SendAsync happens?
public async Task SendAsync<TMessage>(TMessage message, IDictionary<string, object> properties = null)
{
    using (var bm = MessagingHelper.CreateBrokeredMessage(message, properties))
    {
        await this._topicClient.Value.SendAsync(bm);
    }
}

public Task SendAsync<TMessage>(TMessage message, IDictionary<string, object> properties = null)
{
    using (var bm = MessagingHelper.CreateBrokeredMessage(message, properties))
    {
        return this._topicClient.Value.SendAsync(bm);
    }
}
I would like to get the most out of the async/await pattern.
Answer to your question: the second approach could cause issues with disposed objects; you have to wait for SendAsync to finish executing before you can release its resources.
Detailed explanation.
If you call await, execution of the method is suspended at that point and does not continue until the awaited operation completes. The brokered message is kept alive in a hidden local variable and is not disposed in the meantime.
If you don't call await, execution continues past the using block, and all of the brokered message's resources are freed (since using calls Dispose on the object at the end of the block) before, or while, they are actually consumed. This will definitely lead to exceptions inside SendAsync, whose execution has only just started at that point.
What await does is suspend the current method and wait for the task to complete and produce its result. And that's what you actually need. The purpose of async-await is to allow some task to execute concurrently with something else, while providing the ability to wait for the result of the concurrent operation when it is really necessary and further execution isn't possible without it.
The first approach is good if every method up the call chain is async too. I mean, if the caller of your SendAsync is an async Task method, and the caller of that caller, and so on up to the top-level calling method.
Also, consider the exceptions that could be raised; they are listed here. As you can see, there are so-called transient errors, a kind of error that a retry can often fix. Your code has no such exception handling. An example of the retry pattern can be found here, but the article on exceptions mentioned above may suggest better solutions, and it is a topic for another question. I would also add some logging so you are at least aware of any non-transient exceptions.

Project reactor processors v3.X

We are trying to migrate from 2.X to 3.X.
https://github.com/reactor/reactor-core/issues/375
We have used the EventBus as the event manager in our application (a low-latency FX system) and it works very well for us.
After the change, we decided to give every module its own processor to handle events.
1. Does this usage seem correct from your point of view? Because of the lack of documentation at this stage, and after reviewing everything we could, we don't really know what to do here.
2. We have tried to use Flux in order to perform an action every X interval.
For example: market data arrives 1000 times in a second, but we want to process an update only 4 times a second. After upgrading we are using:
A processor with a buffer that sends batches to another method.
In that method we have a Flux that takes the list and tries to work in parallel in order to complete its task.
We had 2 major problems:
1. Sometimes we receive a null event, which we cannot find being sent anywhere in our system, so I suppose we may be misusing the processor:
//Definition of processor
ReplayProcessor<Event> classAEventProcessor = ReplayProcessor.create();
//Event handler subscribing
public void onMyEventX(Consumer<Event> consumer) {
    Flux<Event> handler = classAEventProcessor.filter(event -> event.getType().equals(EVENT_X));
    handler.subscribe(consumer);
}
In the example above, the event in the handler sometimes comes back null. Once that happens, the stream stops working until we restart the server (because we only create the processor on restart).
2. We have tried to use parallel, but sometimes some of the messages disappeared, so maybe we are misusing the framework:
//On constructor
tickProcessor.buffer(1024, Duration.of(250, ChronoUnit.MILLIS)).subscribe(markets ->
    handleMarkets(markets));

//Handler
Flux.fromIterable(getListToProcess())
    .parallel()
    .runOn(Schedulers.parallel())
    .doOnNext(entryMap -> {
        DoBlockingWork(entryMap);
    })
    .sequential()
    .subscribe();
The intention is that the processor will wake up every 250ms and invoke the handler. The handler will work with a parallel Flux in order to make processing better and faster.
*In case DoBlockingWork takes more than 250ms, I couldn't figure out what the behavior will be.
UPDATE:
We had wrapped the EventBus, and every event was subscribed through that wrapped event manager.
Now we have tried to create an event processor for every module, but it works very slowly. We have used TopicProcessor with a ThreadExecutor and it is still very slow, whereas the EventBus did the same work at high speed.
Does anyone have any idea? BTW, when I tried to use DirectProcessor it seemed to work much better than TopicProcessor.
Reactor 3 is built around the concept that you should avoid blocking as much as you can, so in your second snippet DoBlockingWork doesn't look good.
How are the events generated? Do you maybe have a listener-based asynchronous API to get them? If so, you could try using Flux.create.
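If you do have such an API, a rough sketch of bridging it with Flux.create could look like this (MarketFeed and its register/unregister methods are made-up stand-ins for whatever your listener-based source looks like):
import java.util.function.Consumer;
import reactor.core.publisher.Flux;

public class MarketFeedAdapter {

    // Minimal stand-in for a listener-based API, for illustration only.
    public interface MarketFeed {
        void register(Consumer<String> listener);
        void unregister(Consumer<String> listener);
    }

    public static Flux<String> asFlux(MarketFeed feed) {
        return Flux.create(sink -> {
            Consumer<String> listener = sink::next;
            feed.register(listener);
            // Remove the listener when the subscriber cancels or the Flux terminates.
            sink.onDispose(() -> feed.unregister(listener));
        });
    }
}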
For your use case of "we have 1000 events in 1 second, but only want to process 4", I'd chain a sample operator. For instance, sample(Duration.ofMillis(250)) will divide each second into 4 windows, from which it will only emit the last element.
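A self-contained sketch of that throttling, with the fast source simulated by Flux.interval (the event type and handler here are placeholders, not your real ones):
import java.time.Duration;
import reactor.core.publisher.Flux;

public class SampleDemo {
    public static void main(String[] args) throws InterruptedException {
        // Simulate a fast source: one event roughly every millisecond.
        Flux<Long> fastSource = Flux.interval(Duration.ofMillis(1));

        // sample(250 ms) divides time into 250 ms windows and emits only the last
        // element seen in each window, i.e. roughly 4 updates per second.
        fastSource
                .sample(Duration.ofMillis(250))
                .subscribe(tick -> System.out.println("processed tick " + tick));

        Thread.sleep(2000); // let a few windows elapse before the demo exits
    }
}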
The reference guide is being written, as well as a page where you can find links to external articles and learning material. There's a preview of the WIP reference guide here, and the learning resources page here.

How to put actor to sleep?

I have one actor which executes a forever loop, waiting for data to become available to operate on.
The docs say an actor runs on a very lightweight thread, so I'm not sure whether I can use Thread.sleep() in that actor. My objective is to not have that actor consume too much processing power.
So can I use Thread.sleep() inside the actor?
Don't sleep() inside Actors! That would cause the Thread to be blocked, causing exactly what you're trying to avoid - using up resources.
Instead, if you just handle the message and "do nothing", the actor will not use up any scheduling resources and will be just another plain object on the heap (occupying a bit of memory but nothing else).
I just schedule a "WakeUp" message to be sent at a future time. Akka will deliver that message at the predefined time, so the actor can handle it and continue processing. This avoids using sleep.
// schedule to wake up
getContext().getSystem().scheduler().scheduleOnce(
        FiniteDuration.create(sleepTime.toMillis(), TimeUnit.MILLISECONDS),
        new Runnable() {
            @Override
            public void run() {
                getContext().getSelf().tell(new WakeUpMessage());
            }
        },
        getContext().getSystem().executionContext());
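A related sketch using Akka's built-in timers instead of the raw scheduler (classic Akka with AbstractActorWithTimers; the PollingActor and WakeUp names, the 500 ms delay, and the dataAvailable/process methods are made up for illustration):
import java.util.concurrent.TimeUnit;
import akka.actor.AbstractActorWithTimers;
import scala.concurrent.duration.FiniteDuration;

public class PollingActor extends AbstractActorWithTimers {

    private static final class WakeUp {}

    @Override
    public void preStart() {
        // Instead of blocking with Thread.sleep(), arm a single-shot timer.
        getTimers().startSingleTimer("wake-up", new WakeUp(),
                FiniteDuration.create(500, TimeUnit.MILLISECONDS));
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(WakeUp.class, msg -> {
                    if (dataAvailable()) {
                        process();
                    } else {
                        // No data yet: "sleep" again by re-arming the timer.
                        getTimers().startSingleTimer("wake-up", new WakeUp(),
                                FiniteDuration.create(500, TimeUnit.MILLISECONDS));
                    }
                })
                .build();
    }

    private boolean dataAvailable() { return false; } // placeholder for the real check
    private void process() {}                         // placeholder for the real work
}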

Running Plone subscriber events asynchronously

Using Plone 4, I have successfully created a subscriber event to do extra processing when a custom content type is saved. I accomplished this using the Products.Archetypes.interfaces.IObjectInitializedEvent interface.
configure.zcml
<subscriber
    for="mycustom.product.interfaces.IRepositoryItem
         Products.Archetypes.interfaces.IObjectInitializedEvent"
    handler=".subscribers.notifyCreatedRepositoryItem"
    />
subscribers.py
def notifyCreatedRepositoryItem(repositoryitem, event):
    """
    This gets called on IObjectInitializedEvent - which occurs when a new object is created.
    """
    # my custom processing goes here. Should be asynchronous
However, the extra processing can sometimes take too long, and I was wondering if there is a way to run it in the background i.e. asynchronously.
Is it possible to run subscriber events asynchronously, for example when saving an object?
Not out of the box. You'd need to add async support to your environment.
Take a look at plone.app.async; you'll need a ZEO environment and at least one extra instance. The latter will run async jobs you push into the queue from your site.
You can then define methods to be executed asynchronously and push tasks into the queue to run them.
Example code, push a task into the queue:
from zope.component import getUtility
from plone.app.async.interfaces import IAsyncService

async = getUtility(IAsyncService)
async.queueJob(an_async_task, someobject, arg1_value, arg2_value)
and the task itself:
def an_async_task(someobject, arg1, arg2):
    # do something with someobject
where someobject is a persistent object in your ZODB. The IAsyncService.queueJob takes at least a function and a context object, but you can add as many further arguments as you need to execute your task. The arguments must be pickleable.
The task will then be executed by an async worker instance when it can, outside of the context of the current request.
Just to give more options, you could try collective.taskqueue for that; it's really simple and really powerful (and avoids some of the drawbacks of plone.app.async).
The description on PyPI already has enough to get you up to speed in no time, and you can use Redis for the queue management, which is a big plus.
