NiFi - Thread still runs after stopping processor - apache-nifi

I'm developing a publish subscribe processor with Eclipse Milo for Apache NiFi.
I have a service that handles most of the interaction with Eclipse Milo and the server and a controller that essentially just calls the service's functions.
The subscribing to nodes on the OPCUA server works fine, but I can't think of a good way to terminate the subscription, e.g. when I stop the processor.
The subscription, which "lives" in the service, survives the service getting disabled, as well as the controller being disabled/stopped. That means that the @OnStopped and @OnUnscheduled methods that I defined never get called, likely because the subscription never gets terminated. So I can't use these two methods.
I know that I can terminate threads in NiFi 1.7+, but I don't think that's a good way to handle this and also I'm still using 1.2.
Does anyone have any suggestions?

Update to the latest version; some problems with the way processors finish have been fixed since 1.2.

Related

Microservices: how to track fallen down services?

Problem:
Suppose there are two services A and B. Service A makes an API call to service B.
After a while, service A goes down or becomes unreachable due to network errors.
How can other services detect that an outbound call from service A was lost or never happened? I need another concurrent application that automatically reacts (runs emergency code) if service A's outbound call is lost.
What cutting-edge solutions exist?
My thoughts, for example:
service A registers a call event in some middleware (event info, "running" status, timestamp, etc).
If this call is not completed after N seconds, some "call timeout" event in the middleware automatically starts the emergency code.
If the call is completed at the proper time service A marks the call status as "completed" in the same middleware and the emergency code will not be run.
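The three steps above can be sketched in plain Java as a small watchdog. This is only an illustration of the idea, not an existing middleware API: the class name `CallWatchdog` and its methods are hypothetical, and because it runs in-process it would disappear together with service A, so in practice the same logic would have to live in a separate process or middleware.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical watchdog: registers calls as "running" and fires emergency
// code for every call that is not marked completed within its timeout.
class CallWatchdog implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final Set<String> pending = ConcurrentHashMap.newKeySet();

    // Step 1: record the call as running and arm the timeout.
    void register(String callId, long timeoutMillis, Runnable emergency) {
        pending.add(callId);
        timer.schedule(() -> {
            // Step 2: fire only if the call is still pending when the timeout elapses.
            if (pending.remove(callId)) {
                emergency.run();
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Step 3: mark the call as completed; returns false if it already timed out.
    boolean complete(String callId) {
        return pending.remove(callId);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

Service A would call `register` right before the outbound call and `complete` right after it returns; whichever of `complete` and the timeout removes the id first wins, so the emergency code runs at most once per call.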
P.S. I'm on Java stack.
Thanks!
I recommend looking into patterns such as Retry, Timeout, Circuit Breaker, Fallback and Healthcheck. You can also look into the Bulkhead pattern if concurrent calls and fault isolation are your concern.
There are many resources where these well-known patterns are explained, for instance:
https://www.infoworld.com/article/3310946/how-to-build-resilient-microservices.html
https://blog.codecentric.de/en/2019/06/resilience-design-patterns-retry-fallback-timeout-circuit-breaker/
I don't know which technology stack you are on, but usually some functionality for these concerns is already provided that you can incorporate into your solution. There are libraries that take care of this resilience functionality, and you can, for instance, set them up so that your custom code is executed when events such as failed retries, timeouts or activated circuit breakers occur.
E.g. for the Java stack Hystrix is widely used; for .NET you can look into Polly to make use of retry, timeout, circuit breaker, bulkhead or fallback functionality.
Concerning health checks, you can look into Spring Boot Actuator for Java; .NET Core already provides a health check middleware that offers that functionality more or less out of the box.
But before using any libraries, I suggest first getting familiar with the purpose and concepts of the listed patterns, so that you can choose and integrate those that best fit your use cases and major concerns.
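To make the concepts concrete, here is a minimal sketch of Retry, Timeout and Fallback combined, using only the JDK. It is not how Hystrix or Polly implement these patterns; the class `Resilient` and the method `callWithRetry` are names made up for this illustration.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper combining three patterns: each attempt gets a timeout,
// failed attempts are retried, and a fallback is used when all attempts fail.
class Resilient {
    static <T> T callWithRetry(Supplier<T> remoteCall, int maxAttempts,
                               Duration timeout, Supplier<T> fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CompletableFuture<T> future = CompletableFuture.supplyAsync(remoteCall);
            try {
                // Timeout: wait at most `timeout` for this attempt.
                return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                // Retry: abandon this attempt and loop. Note that cancel() only
                // marks the future as cancelled; it does not interrupt the task.
                future.cancel(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        // Fallback: every attempt failed or timed out.
        return fallback.get();
    }
}
```

A call such as `Resilient.callWithRetry(() -> client.fetch(), 3, Duration.ofSeconds(2), () -> cachedValue)` (where `client` and `cachedValue` are placeholders) would try the remote call up to three times before returning the cached value. A real library adds backoff between attempts, metrics and thread-pool isolation on top of this.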
Update
We have to differentiate between two well-known problems here:
1.) How can service A robustly handle temporary outages of service B (or the network connection between service A and B which comes down to the same problem)?
To address the related problems the above mentioned patterns will help.
2.) How to make sure that the request that should be sent to service B will not get lost if service A itself goes down?
To address this kind of problem there are different options at hand.
2a.) The component that performed the request to service A (which then triggers service B) also applies the resilience patterns mentioned and retries its request until service A successfully answers that it has performed its tasks (which also includes the successful request to service B).
There can also be several instances of each service with some kind of load balancer in front of them, which distributes and directs the requests to an available instance of the specific service (based on regularly performed health checks). Or you can use a service registry (see https://microservices.io/patterns/service-registry.html).
You can of course chain several API calls after another but this can lead to cascading failures. So I would rather go with an asynchronous communication approach as described in the next option.
2b.) Let's consider that it is of utmost importance that some instance of service A will reliably perform the request to service B.
You can use message queues in this case as follows:
Let's say you have a queue where jobs to be performed by service A are collected.
Then you have several instances of service A running (see horizontal scaling) where each instance will consume the same queue.
You will use the message locking feature of the message queue service, which makes sure that as soon as one instance of service A reads a message from the queue, the other instances won't see it. If service A was able to complete its job (i.e. call service B, save some state in service A's persistence and whatever other tasks you need for successful processing), it deletes the message from the queue afterwards, so that no other instance of service A processes the same message.
If service A goes down during processing, the queue service will automatically unlock the message for you, and another instance of service A (or the same instance after it has restarted) will read the message (i.e. the job) from the queue and try to perform all the tasks (call service B, etc.).
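The locking behaviour described above can be illustrated with a small in-memory stand-in. This is a sketch under assumptions, not the API of any real queue service: `LockingQueue` is a hypothetical class, message bodies are assumed to be unique so they can serve as keys, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

// Hypothetical in-memory queue with message locking. A received message is
// hidden from other consumers until it is deleted or its lock expires.
class LockingQueue {
    private final Queue<String> visible = new ArrayDeque<>();
    private final Map<String, Long> lockedUntil = new HashMap<>(); // message -> lock expiry (ms)
    private final long lockMillis;

    LockingQueue(long lockMillis) {
        this.lockMillis = lockMillis;
    }

    synchronized void publish(String message) {
        visible.add(message);
    }

    // Hands out the next message and locks it for lockMillis.
    synchronized Optional<String> receive(long nowMillis) {
        // Unlock messages whose consumer did not delete them in time (e.g. it crashed).
        Iterator<Map.Entry<String, Long>> it = lockedUntil.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                visible.add(entry.getKey());
                it.remove();
            }
        }
        String message = visible.poll();
        if (message == null) {
            return Optional.empty();
        }
        lockedUntil.put(message, nowMillis + lockMillis);
        return Optional.of(message);
    }

    // Called by the consumer once the whole job (call to service B, etc.) succeeded.
    synchronized boolean delete(String message) {
        return lockedUntil.remove(message) != null;
    }
}
```

An instance of service A that crashes between `receive` and `delete` simply never deletes the message, so a later `receive` by any instance gets the same job again once the lock has expired. This also means a job may be processed more than once, so the tasks should be safe to repeat.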
You can combine several queues e.g. also to send a message to service B asynchronously instead of directly performing some kind of API call to it.
The point is that the queue service is a highly available and redundant service which already makes sure that no message gets lost once it is published to a queue.
Of course you could also handle the jobs to be performed in service A's own database, but consider that when service A receives a request there is always a chance that it goes down before it can save the status of the job to its persistent storage for later processing. Queue services already address that problem for you if chosen thoughtfully and used correctly.
For instance, if you consider Kafka as the messaging service, you can look into this Stack Overflow answer, which covers the solution to this problem with that specific technology: https://stackoverflow.com/a/44589842/7730554
There are many ways to solve your problem.
I guess you are talking about two topics: design patterns in microservices and the Circuit Breaker pattern.
https://dzone.com/articles/design-patterns-for-microservices
To solve your problem, I normally put a message queue between the services and use service discovery to detect which services are alive. If a service dies or is overloaded, use the Circuit Breaker pattern.
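For readers unfamiliar with the Circuit Breaker pattern, a minimal sketch follows. It is a simplified illustration, not the implementation of any particular library: the class `CircuitBreaker` is hypothetical, only runtime exceptions count as failures, and the current time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.function.Supplier;

// Hypothetical minimal circuit breaker: after `failureThreshold` consecutive
// failures it "opens" and answers from the fallback without calling the
// service, until `openMillis` have passed and one trial call is allowed.
class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized <T> T call(Supplier<T> remoteCall, Supplier<T> fallback, long nowMillis) {
        if (openedAt >= 0 && nowMillis - openedAt < openMillis) {
            return fallback.get(); // open: fail fast, do not touch the service
        }
        try {
            T result = remoteCall.get(); // closed, or half-open trial call
            consecutiveFailures = 0;
            openedAt = -1;               // success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = nowMillis;    // (re)open the circuit
            }
            return fallback.get();
        }
    }
}
```

The benefit over plain retries is that a dead or overloaded service is left alone while the circuit is open, which gives it time to recover and keeps callers from piling up on it.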

Notifying golongpoll.SubscriptionManager of an event from kafka-go

I was writing a POC on long-polling using go.
I see the general package to be used is https://github.com/jcuga/golongpoll .
But how would I publish an event to the golongpoll.SubscriptionManager from a general context, especially when the long-poll API request may be served by one machine while the Kafka event for that particular consumer group is consumed by another instance in the cluster?
The examples in the documentation do not cover such a scenario at all, even though it seems like a common one. One way I can think of is to put a distributed cache like Redis in between and have all the services poll it for changes, but that sounds a bit dumb to me.

EasyNetQ / RabbitMQ consuming events in Web API

I have created a Web API which allows messages to be sent to the queue. My Web API is designed with CQRS and DDD in mind. I want my message consumer to always be waiting for messages to arrive on the queue. The way it's currently done, messages are only read if I make a request to the API that hits the method.
Is there a way to use a console application, or something else that is always running, to consume messages at any time without having to make a request to the Web API? So more of an automation task?
If so, how do I go about it? I.e. if it's a console app, how would I keep it always running (IIS?), and is there a way to use Dependency Injection, as I need to consume the message and then send it to my repository, which lives in a separate solution?
Or is there a way to make EasyNetQ run at startup?
The best way to handle this situation in your case is to subscribe to bus events over AMQP through the EasyNetQ library. The recommended way of hosting it is to write a Windows service using the Topshelf library and subscribe to bus events inside that service on start.
IIS processes and threads are not reliable for such tasks, as they are designed to be recycled on a regular basis, which may cause instabilities and inconsistencies in your application.
and is there way to use Dependency Injection as I need to consume the message then send to my repository which lives on separate solution.
It is better to create a separate question for this, as it is a separate topic. It also requires further elaboration, as it is not clear what specifically you are struggling with.

Check that MassTransit endpoints are reachable

We use MassTransit with RabbitMQ. Is there a way to check whether endpoints are unavailable before we publish any messages? I want to set up our IoC to use another strategy if the service bus isn't available, and I don't want to get to the point where I catch RabbitMQ.Client.Exceptions.BrokerUnreachableException when publishing messages.
If you're using a container, you could create a decorator that could monitor the outcome of the Publish method call, and if it starts throwing exceptions, you could switch the calls over to an alternative publisher.
Ideally such an implementation would include some type of progressive retry capability so that once the endpoint becomes available the calls resume back to the actual endpoint, as well as triggering some replay of the previously failed messages to the endpoint as well.
I figure you're already dealing with the need to have an alternative storage available, such as a local endpoint or some sort of local storage.
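The decorator idea can be sketched independently of MassTransit. The example below is written in Java purely as an illustration and uses no MassTransit or RabbitMQ API: the `Publisher` interface and the `FailoverPublisher` class are hypothetical, an in-memory queue stands in for the alternative local storage, and the retry is only attempted on the next `publish` call rather than by a background timer.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.LongSupplier;

// Hypothetical abstraction over "publish a message to the bus".
interface Publisher {
    void publish(String message);
}

// Decorator: buffers messages locally while the primary publisher fails,
// retries with a growing delay, and replays the buffer in order on recovery.
class FailoverPublisher implements Publisher {
    private final Publisher primary;
    private final LongSupplier clock;                             // current time in ms
    private final Queue<String> localBuffer = new ArrayDeque<>(); // stand-in for local storage
    private long retryDelayMillis = 0;                            // 0 while the primary is healthy
    private long nextAttemptAt = 0;

    FailoverPublisher(Publisher primary, LongSupplier clock) {
        this.primary = primary;
        this.clock = clock;
    }

    @Override
    public synchronized void publish(String message) {
        localBuffer.add(message);
        if (clock.getAsLong() < nextAttemptAt) {
            return; // still backing off: keep the message local for now
        }
        try {
            while (!localBuffer.isEmpty()) {
                primary.publish(localBuffer.peek()); // replay oldest first
                localBuffer.remove();                // drop only after success
            }
            retryDelayMillis = 0; // primary is reachable again
        } catch (RuntimeException e) {
            // Progressive retry: 1 s, 2 s, 4 s, ... capped at 60 s.
            retryDelayMillis = Math.min(Math.max(1000, retryDelayMillis * 2), 60_000);
            nextAttemptAt = clock.getAsLong() + retryDelayMillis;
        }
    }

    synchronized int buffered() {
        return localBuffer.size();
    }
}
```

Registering such a decorator in the container in place of the real publisher keeps the calling code unaware of the outage, which is the point of the suggestion above. A production version would persist the buffer and flush it from a timer instead of waiting for the next publish.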
Not currently, but you can submit an issue requesting that feature: https://github.com/MassTransit/MassTransit/issues. It's not trivial to implement, but maybe not impossible.
A couple of other options people have used include a remote cluster, or a local instance that forwards to or clusters across all machines included in the bus.

Spring Batch or JMS for long running jobs

I have to run very long-running processes on my web service, and I'm looking for a good way to handle the result. The scenario: a user starts such a long-running process via the UI. He then gets a message that his request was accepted and that he should come back some time later, so there's no need to show him the status of his request or anything like that. I'm just looking for a way to handle the result of the long-running process properly. Since the processes are external programs, my application server is not aware of them, so I have to wait for these programs to terminate. Of course I don't want to use EJBs for this, because they would block for as long as no result is available. Instead I thought of using JMS or Spring Batch. Has anyone had the same problem, or any advice on which solution would be better?
It really depends on what forms of communication your external programs have available. JMS is a very good approach and immediately available in your app server but might not be the best option if your external program is a long running DB query which dumps the result in a text file...
The main advantage of Spring Batch over "just" using JMS as an asynchronous communications channel is its transactional properties, which allow the infrastructure to retry failed jobs, group jobs together and so on. Without knowing more about your specific setup, it is hard to give detailed advice.
Cheers,
I had a similar design requirement: users were sending XML files and I had to generate documents from them. Using JMS in this case is advantageous, since you can always add new instances of these processes, which can consume and execute the jobs in parallel.
You can use a timer task to check status or monitor these processes. Also, you can publish a message to a JMS queue once the processes are completed.
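The timer-task idea can be sketched with the JDK's scheduler, without blocking a thread while the external program runs. `CompletionPoller` and `watch` are hypothetical names for this illustration, and the completion callback is where a message would be published to a JMS queue; no JMS API is used here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical poller: periodically checks whether a long-running external
// job has finished and runs a callback exactly once when it has.
class CompletionPoller implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    void watch(BooleanSupplier isDone, Runnable onDone, long periodMillis) {
        timer.schedule(() -> {
            if (isDone.getAsBoolean()) {
                onDone.run(); // e.g. publish a "finished" message or store the result
            } else {
                watch(isDone, onDone, periodMillis); // not finished yet: check again later
            }
        }, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

For an external program started through `ProcessBuilder`, the check could be `() -> !process.isAlive()`, where `process` is the `java.lang.Process` returned by `start()`. One scheduler thread can watch many processes this way, since each check returns immediately.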
