Why is the NiFi HandleHttpResponse processor's task count so high? - apache-nifi

When I connect the Retry relationship from InvokeHTTP to HandleHttpResponse (or any other processor) in NiFi, the task count becomes very high (about 1,000,000 in the Tasks/Time counter) and responses are slow. What could be the reason?

I can't explain the task count on HandleHttpResponse; that usually only happens when a processor has the @TriggerWhenEmpty annotation, which means it runs all the time even when no flow files are available and just does nothing.
In general, using HandleHttpResponse with InvokeHttp is not going to work. It was made to work with HandleHttpRequest, which accepts a request, creates an entry in the HTTP Context Map, and allows the flow to proceed so that HandleHttpResponse can later respond to the original request.
InvokeHttp is a client making a connection to a server, whereas HandleHttpRequest is a server that needs to send a response to a client using HandleHttpResponse. InvokeHttp does not put anything into the HTTP Context Map, so there is nothing for HandleHttpResponse to do in that case.
You would typically connect the "retry" relationship of InvokeHttp in a self-loop back to InvokeHttp so it can keep retrying.

Related

How to handle http stream responses from within a Substrate offchain worker?

Starting from Substrate's Offchain Worker recipe that leverages the Substrate http module, I'm trying to handle http responses that are delivered as streams (basically interfacing a pub/sub mechanism with a chain through a custom pallet).
Non-stream responses are perfectly handled as-is and reflecting them on-chain with signed transactions is working for me, as advertised in the doc.
However, when the responses are streams (meaning the http requests never complete), I can only see the stream data logs in my terminal when I shut down the Substrate node. Trying to reflect each received chunk as a signed transaction doesn't work either: I still only see my logs on node shutdown, and the transaction is never sent (which makes sense, since the node is down).
Is there an existing pattern for this use case? Is there a way to get the stream observed in background (not in the offchain worker runtime)?
Actually, would it be good practice to keep the worker instance running ad vitam for this http request? (Knowing that in my configuration the http request is sent only once, via a command-queue scheme in the pallet storage that gets cleaned at each block import.)

NiFi getHTTP or invokeHTTP which processor to use?

I have the following two scenarios, and for each one I need a recommendation as to which NiFi processor to use:
I have RESTful web services running outside NiFi. NiFi needs to get/post/delete/update some data by calling a specific RESTful API. Once the RESTful API receives the request from NiFi, it sends the response back to NiFi. Which NiFi processor should I use here?
In the second scenario, I have an application running outside NiFi. This application has its own GUI. A user needs some information, so they want to send a request to NiFi. Is there any processor in NiFi which accepts a request from an application, processes it, and sends a response back?
I have actually read all the questions about GetHTTP and InvokeHTTP.
I initially tried the InvokeHTTP processor, with both GET and POST calls, but I don't see any response from the RESTful API running outside NiFi.
I did not try GetHTTP.
I am using NiFi. NiFi does not have code.
I expect NiFi to be able to call a RESTful API running outside it, and I expect NiFi to accept a request coming from an outside application and process that request.
Yep, NiFi comes bundled with processors that satisfy both of your requirements.
For scenario #1, you can use a combination of GetHTTP/PostHTTP, which, as their names imply, are HTTP clients that make GET and POST calls respectively. However, the community later came up with InvokeHTTP, which offers more features such as support for NiFi Expression Language, support for incoming flowfiles, and so on.
For scenario #2, you can use either ListenHTTP or the combination of HandleHttpRequest/HandleHttpResponse. The latter offers a more robust web-service implementation, while the former is a simple web-hook kind of thing. I haven't worked much with ListenHTTP, so I can't comment more on that.
Having said that, for your second scenario, if your objective is to consume NiFi statistics, you can hit NiFi's REST API directly rather than building a separate NiFi flow with web-service capability.
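For example, here is a minimal sketch of calling NiFi's REST API from plain Java (java.net.http, Java 11+). The host, port, and absence of authentication are assumptions; a secured instance would need a bearer token or client certificate, and system-diagnostics is just one of the statistics endpoints the API exposes.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NifiStats {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Host/port are assumptions for a local, unsecured NiFi instance.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/nifi-api/system-diagnostics"))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // JSON with heap and repository usage, etc.
    }
}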
Useful Links
https://pierrevillard.com/2016/03/13/get-data-from-dropbox-using-apache-nifi/
https://dzone.com/articles/using-websockets-with-apache-nifi
https://ddewaele.github.io/http-communication-with-apache-nifi/

Front-facing REST API with an internal message queue?

I have created a REST API - in a few words, my client hits a particular URL and she gets back a JSON response.
Internally, quite a complicated process starts when the URL is hit, and there are various services involved as a microservice architecture is being used.
I was observing some performance bottlenecks and decided to switch to a message queue system. The idea is that now, once the user hits the URL, a request is published on an internal message queue, waiting to be consumed. A consumer processes it and publishes the result back on a queue, and this happens quite a few times until, finally, the same node servicing the user receives the processed response to be delivered to the user.
An asynchronous "fire-and-forget" pattern is now being used. But my question is: how can the node servicing a particular person remember who it was servicing once the processed result arrives back, without blocking (i.e. so it can handle several requests while waiting for the response)? If it makes any difference, my stack looks a little like this: Tomcat, Spring, Kubernetes and RabbitMQ.
In summary, how can the request node (whose job is to push items onto the queue) maintain an open connection with the client who requested a JSON response (i.e. the client is waiting for the JSON response) and get the processed data back to the correct client?
You have a few different scenarios depending on how much control you have over the client.
If the client's behaviour cannot be changed, you will have to keep the session open until the request has been fully processed. This can be achieved by employing a pool of workers (futures/coroutines, threads or processes) where each worker keeps the session open for a given request.
This method has a few drawbacks and I would keep it as a last resort. Firstly, you will only be able to serve a limited number of concurrent requests, proportional to your pool size. Secondly, as your processing sits behind a queue, your front-end won't be able to estimate how long it will take for a task to complete. This means you will have to deal with long-lasting sessions, which are prone to failure (what if the user gives up?).
If the client's behaviour can be changed, the most common approach is to use a fully asynchronous flow. When the client initiates a request, it is placed in the queue and a Task Identifier is returned. The client can use the given TaskId to poll for status updates. Each time the client requests an update about a task, you simply check whether it has completed and respond accordingly. A common pattern when a task is still in progress is to have the front-end return an estimate of how long to wait before trying again. This allows your server to control how frequently clients poll. If your architecture supports it, you can go the extra mile and provide information about the progress as well.
Example response when a task is in progress:
{
  "status": "in_progress",
  "retry_after_seconds": 30,
  "progress": "30%"
}
A more complex yet elegant solution would be to use HTTP callbacks. In short, when the client makes a request for a new task, it provides a tuple (URL, method) the server can use to signal that processing is done. The client then waits for the server to send the signal to the given URL. You can see a better explanation here. In most cases this solution is overkill, but I think it's worth mentioning.
One option would be to use DeferredResult, provided by Spring, but that means you need to maintain a pool of threads in the request-serving node, and the maximum number of active threads will determine the throughput of your system. For more details on how to implement DeferredResult, see https://www.baeldung.com/spring-deferred-result
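As a rough illustration of that option, here is a minimal sketch assuming Spring Web and Spring AMQP (RabbitMQ). The queue names, timeout, and correlation-id scheme are assumptions, not part of the original setup; the point is that the servlet thread is released immediately and the pending DeferredResult is completed when the reply comes back off the queue.

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.async.DeferredResult;

@RestController
public class WorkController {

    private final RabbitTemplate rabbit;
    // Correlation id -> HTTP response still waiting to be completed.
    private final Map<String, DeferredResult<ResponseEntity<String>>> pending =
            new ConcurrentHashMap<>();

    public WorkController(RabbitTemplate rabbit) {
        this.rabbit = rabbit;
    }

    @PostMapping("/work")
    public DeferredResult<ResponseEntity<String>> submit(@RequestBody String payload) {
        String correlationId = UUID.randomUUID().toString();
        DeferredResult<ResponseEntity<String>> result = new DeferredResult<>(
                30_000L, ResponseEntity.status(504).body("timed out"));
        result.onCompletion(() -> pending.remove(correlationId));
        pending.put(correlationId, result);

        // Publish the request; downstream services must echo the correlation id back.
        rabbit.convertAndSend("work.requests", payload, msg -> {
            msg.getMessageProperties().setCorrelationId(correlationId);
            return msg;
        });
        return result; // the servlet thread is released here, nothing blocks
    }

    @RabbitListener(queues = "work.replies")
    public void onReply(Message reply) {
        DeferredResult<ResponseEntity<String>> result =
                pending.remove(reply.getMessageProperties().getCorrelationId());
        if (result != null) {
            result.setResult(ResponseEntity.ok(new String(reply.getBody())));
        }
    }
}

Note that an in-memory map only works if the reply is routed back to the same node that accepted the request; with several replicas behind a load balancer you would need per-node reply queues or sticky routing.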

How to use the Wait/Notify processors to stop an InvokeHTTP processor?

I want to stop my InvokeHTTP processor when it has a failure (when the InvokeHTTP processor fails, the Notify processor should "notify" the Wait processor about the failure, and that should make the InvokeHTTP processor wait or stop).
I tried to use the Wait/Notify processors for this, but the Notify processor throws an exception like this:
This question is a possible duplicate of this one:
How to use wait\notify Processor?
The solution there is:
You will need to create and start a DistributedMapCacheServer and DistributedMapCacheClient. The client needs to be configured with the port and host that the server is listening on.
Then the Wait and Notify processors use the DistributedMapCacheClient.
Since you were able to start the processors, you likely already have the client set up, but you don't have the server running.

NiFi ListenHTTP GET request?

I am currently using the ListenHTTP processor to accept flow files from a different NiFi instance. This works fine, but for some reason GET requests do not work. Does ListenHTTP only allow POST requests?
This is the error I receive:
HTTP method GET is not supported by this URL
P.S. I am aware of the more advanced HandleHttpRequest processor.
Yes, ListenHTTP only accepts POST and HEAD requests. GET, PUT, and DELETE are not accepted by the processor and will return a 405 HTTP status code. The processor's documentation could be improved to make this explicit.
You are correct that to handle GET requests, you should use the HandleHttpRequest processor.
However, if your use case is transmitting flow files between two NiFi instances, you will get much better behavior and performance by using the Site-to-Site capability. It can be routed over HTTP(S) or raw sockets, and provides security, integrity, load balancing, and many additional features.

Resources