Does Apache Storm have an HTTP API?

I would like to just HTTP POST events into a spout. Do I need to set up a web server myself, or would that be redundant? All of the tutorials that I have seen so far assume that an application will be fetching (or even just generating) the data itself and passing it to emit-spout!.

Storm uses a pull-based model via Spout.nextTuple(). Thus, it might be best to have a buffer in between: a web server takes the HTTP POST requests and writes into that buffer, and a spout pulls the data from the buffer.
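A minimal sketch of that buffer idea, assuming Storm 2.x and, purely for illustration, an embedded JDK HttpServer living in the same worker process as the spout (the class name, port, and queue size are made up; in a real deployment the buffer would more likely be an external system such as Kafka or Redis):

```java
import com.sun.net.httpserver.HttpServer;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HttpBufferSpout extends BaseRichSpout {
    private transient BlockingQueue<String> buffer;
    private transient SpoutOutputCollector collector;
    private transient HttpServer server;

    @Override
    public void open(Map<String, Object> conf, TopologyContext ctx,
                     SpoutOutputCollector collector) {
        this.collector = collector;
        this.buffer = new LinkedBlockingQueue<>(10_000);
        try {
            // Web-server side of the buffer: each HTTP POST body is enqueued.
            server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/events", exchange -> {
                try (InputStream in = exchange.getRequestBody()) {
                    buffer.offer(new String(in.readAllBytes(), StandardCharsets.UTF_8));
                }
                exchange.sendResponseHeaders(204, -1); // no response body
                exchange.close();
            });
            server.start();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void nextTuple() {
        // Spout side: Storm pulls. Use a non-blocking poll so nextTuple()
        // returns quickly when the buffer is empty.
        String event = buffer.poll();
        if (event != null) {
            collector.emit(new Values(event));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("event"));
    }

    @Override
    public void close() {
        if (server != null) server.stop(0);
    }
}
```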

Related

How to handle http stream responses from within a Substrate offchain worker?

Starting from Substrate's Offchain Worker recipe, which leverages the Substrate http module, I'm trying to handle HTTP responses that are delivered as streams (basically interfacing a pub/sub mechanism with a chain through a custom pallet).
Non-stream responses are handled perfectly as-is, and reflecting them on-chain with signed transactions works for me, as advertised in the docs.
However, when the responses are streams (meaning the HTTP requests never complete), I only see the stream data logs in my terminal when I shut down the Substrate node. Trying to reflect each received chunk as a signed transaction doesn't work either: again I see my logs only on node shutdown, and the transaction is never sent (which makes sense, since the node is down).
Is there an existing pattern for this use case? Is there a way to have the stream observed in the background (not in the offchain worker runtime)?
Actually, would it be good practice to keep the worker instance running indefinitely for this HTTP request? (In my configuration the HTTP request is sent only once, via a command-queue scheme in the pallet storage that gets cleaned at each block import.)

Microservice failure scenario

I am working on a microservice architecture. One of my services is exposed to a source system, which uses it to post data. This microservice publishes the data to Redis pub/sub, which is then consumed by a couple of other microservices.
Now, if one of those other microservices is down and unable to process the data from Redis pub/sub, I have to retry with the published data when the microservice comes back up. The source cannot push the data again, and manual intervention is not possible, so I thought of three approaches:
1. Additionally using Redis itself for storing and retrieving the data.
2. Using a database for storage before publishing. I have many source and target microservices that use Redis pub/sub, so with this approach I would have to insert every request into the DB first and then record its response status. That would require a shared database, and the approach adds several more exception-handling cases and doesn't look very efficient to me.
3. Using Kafka in place of Redis pub/sub. Since traffic is low I chose Redis pub/sub, and it is not feasible to change now.
In both of the above cases, I would have to use a scheduler, and there is a window within which I must retry, or subsequent requests will fail.
Is there any other way to handle these cases?
For point 2:
- Store the data in the DB.
- Create a daemon process that processes the data from the table.
- This daemon process can be configured as needed.
- The daemon polls the DB and publishes any pending data, deleting each entry once it has been published.
This wasn't in a microservice architecture, but I have seen this approach work efficiently for communicating with 3rd-party services. A minimal sketch of such a daemon follows.
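Here is a rough sketch of that daemon, assuming a JDBC-accessible "outbox" table and the Jedis client; the connection URL, table and column names, channel, and poll interval are all illustrative:

```java
import redis.clients.jedis.Jedis;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class OutboxDaemon {
    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/app");
             Jedis redis = new Jedis("localhost", 6379)) {
            while (true) {
                // Poll the DB for unpublished rows, publish each, then delete it.
                try (Statement st = db.createStatement();
                     ResultSet rs = st.executeQuery(
                             "SELECT id, payload FROM outbox ORDER BY id LIMIT 100")) {
                    while (rs.next()) {
                        long id = rs.getLong("id");
                        redis.publish("events", rs.getString("payload"));
                        try (PreparedStatement del =
                                     db.prepareStatement("DELETE FROM outbox WHERE id = ?")) {
                            del.setLong(1, id);
                            del.executeUpdate();
                        }
                    }
                }
                Thread.sleep(5_000); // poll interval; make this configurable
            }
        }
    }
}
```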
At the very outset, as you mentioned, we do indeed seem to have only these three possibilities.
This is one of those situations where you want a handshake from the service after pushing and after processing. To accomplish that, a middleware queuing system is the right tool.
Although a bit more complex to set up, you can use Kafka for streaming this. Configuring producers and consumer groups properly will help you do the job smoothly.
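For illustration, a minimal sketch of the consumer side using the standard Kafka client (topic, group id, and bootstrap servers are made up). With auto-commit disabled and offsets committed only after successful processing, a consumer that was down simply resumes from its last committed offset when it comes back up, so no separate retry store is needed:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class RetryingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "target-service");
        // Commit only after successful processing, so unprocessed records
        // are redelivered if this service crashes or is down.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("source-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record.value()); // your business logic
                }
                consumer.commitSync(); // acknowledge the batch only once processed
            }
        }
    }

    private static void process(String payload) { /* ... */ }
}
```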
Using a DB for storage would be overkill in a situation where the data is only to be processed and then persisted.
But alternatively, storing the data in Redis and reading it in a cron/scheduled job would make your job much simpler. Once the job has run successfully, you can remove the data from the cache and thus save Redis memory.
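A minimal sketch of that scheduled job, assuming the Jedis client and a Redis list as the buffer; the key name and schedule are illustrative:

```java
import redis.clients.jedis.Jedis;

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RedisRetryJob {
    public static void main(String[] args) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            try (Jedis redis = new Jedis("localhost", 6379)) {
                String payload;
                // Producers LPUSH payloads onto "pending-events"; RPOP removes
                // each entry as it is processed, reclaiming Redis memory.
                // For stronger at-least-once semantics, consider moving items
                // to a "processing" list (e.g. via LMOVE) instead of popping.
                while ((payload = redis.rpop("pending-events")) != null) {
                    process(payload); // re-publish or hand to the consumer
                }
            }
        }, 0, 30, TimeUnit.SECONDS);
    }

    private static void process(String payload) { /* ... */ }
}
```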
If you can comment further on the architecture and the implementation, I can update my answer accordingly. :)

NiFi getHTTP or invokeHTTP which processor to use?

I have following two scenario and for each one I need recommendation as to which NiFi processor to use:
I have RESTful web services running outside NiFi. NiFi needs to get/post/delete/update some data by calling specific REST APIs. Once the RESTful API receives a request from NiFi, it sends the response back to NiFi. Which NiFi processor should I use here?
In the 2nd scenario, I have an application running outside NiFi. This application has its own GUI. The user needs some information, so he wants to send a request to NiFi. Is there a NiFi processor that accepts a request from an application, processes it, and sends a response back?
I have actually read all the questions about getHTTP and invokeHTTP.
I initially tried the InvokeHTTP processor, with both GET and POST calls, but I don't see any response from the RESTful API running outside NiFi.
I did not try getHTTP.
I am using plain NiFi, without any custom code.
I expect NiFi to be able to call a RESTful API running outside it, and to accept and process requests coming from an outside application.
Yep, NiFi comes bundled with processors that satisfy both of your requirements.
For scenario #1, you can use a combination of GetHTTP/PostHTTP, which, as their names imply, are HTTP clients that make GET and POST calls respectively. However, the community later came up with InvokeHTTP, which offers more features such as support for the NiFi Expression Language, support for incoming flowfiles, etc.
For scenario #2, you can use either ListenHTTP or the combination of HandleHttpRequest/HandleHttpResponse. The latter offers a more robust web-service implementation, while the former is a simple web-hook kind of listener. I haven't worked much with ListenHTTP, so I can't comment more on that.
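For what it's worth, here is a minimal sketch of what the outside application's call to a ListenHTTP endpoint could look like, assuming ListenHTTP is configured with listening port 9999 and its default base path contentListener (both are flow-specific):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NifiListenHttpClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://nifi-host:9999/contentListener"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"user\":\"query\"}"))
                .build();
        // ListenHTTP only acknowledges receipt of the flowfile; if you need a
        // real request/response round trip, use HandleHttpRequest/HandleHttpResponse.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```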
Having said that, for your second scenario, if your objective is to consume NiFi statistics, you can hit NiFi's REST API directly rather than building a separate NiFi flow with web-service capability.
Useful Links
https://pierrevillard.com/2016/03/13/get-data-from-dropbox-using-apache-nifi/
https://dzone.com/articles/using-websockets-with-apache-nifi
https://ddewaele.github.io/http-communication-with-apache-nifi/

RabbitMQ keep messages in queue

I am streaming a tty's stdout and stderr (logs, to be exact) to RabbitMQ. These logs can be viewed on a website: while the content is streamed to RabbitMQ, it is consumed by the web server and forwarded to the client using WebSockets. Logs are persisted immediately after being sent to RabbitMQ.
When the user accesses the website, the persisted logs are rendered and the subsequent parts are streamed using WebSockets. The problem is that there is a race condition: the persisted logs might be missing chunks that occurred between rendering the site and receiving the first chunk via WebSocket.
My idea was to keep all chunks in the queue and send them via the WebSocket after connecting. Additionally, I would add a worker listening for some kind of "finished" event, which would then take everything in the queue and persist it at once.
The problem is that I don't know whether this is possible with RabbitMQ, or how. Any ideas or other solutions?
I don't think it really matters, but my stack is Ruby Sinatra with the Bunny RabbitMQ client.
While I agree with your general idea of picking up where you left off after loading the initial page, what you're trying to do isn't something that should be done with RabbitMQ.
There are a lot of potential problems this would cause, which I've outlined in a blog post previously.
Instead of trying to do this with RMQ, I would do it from the database layer.
As you push things into the database, you have an ID, hopefully a sequential one. If not, add a sequence to the entries.
When you load the page for the user, send the current ID they are at down to the browser.
After the page finishes loading and you're setting up the WebSocket connection, send the user's current spot in the list of messages via the WebSocket. The WebSocket connection can then use that ID to say "give me all the messages after this ID, and start streaming them".
Again, this is not done via RabbitMQ (see my article on why that's a bad idea), but via your database and sequential IDs.
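A minimal sketch of that handshake (in Java for illustration, even though the asker's stack is Ruby/Sinatra; the table, column, and method names are made up):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class LogResume {
    private final Connection db;

    public LogResume(Connection db) { this.db = db; }

    // Called while rendering the page: embed this ID in the HTML so the
    // browser knows where the persisted logs end.
    public long latestId() throws SQLException {
        try (Statement st = db.createStatement();
             ResultSet rs = st.executeQuery("SELECT COALESCE(MAX(id), 0) FROM log_chunks")) {
            rs.next();
            return rs.getLong(1);
        }
    }

    // Called when the WebSocket connects with the ID from the rendered page:
    // replay the gap first, then switch to streaming new chunks live.
    public List<String> chunksAfter(long id) throws SQLException {
        List<String> chunks = new ArrayList<>();
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT content FROM log_chunks WHERE id > ? ORDER BY id")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) chunks.add(rs.getString(1));
            }
        }
        return chunks;
    }
}
```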

Jersey and AsyncResponse vs. Redirects

Currently I am using Jersey 1.0 and am about to switch to 2.0. For REST requests that may last over a second or two, I use the following pattern:
Client calls GET or PUT
Server returns a polling URL to the client
The client polls the URL until it gets a redirect to the completed resource
Pretty standard and straightforward. However, I noticed that Jersey 2.0 has an AsyncResponse capability, but it looks like this involves no changes on the wire. In other words, the client still blocks waiting for the result while the server processes the request asynchronously.
So what good is this? Should I use it instead of my current asynchronous approach for calls over a second? Or is it really just to keep connections freed on the server for calls that take only a few hundred milliseconds?
I want my server to be as scalable as possible, but the approach I use now can be tedious for the client. AsyncResponse seems super simple, but I'm not sure how it would work for something like a Heroku service, where you want very short connection times.
AsyncResponse presumably gives you more scalability within the web app server for standard requests in terms of thread-pool resources, but I don't think it changes anything about the client experience, which will continue to block on a read on its connection. Therefore, if you have already implemented a polling solution on the client side, this won't add much value for you, IMHO.
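To make the server-side mechanics concrete, here is a minimal JAX-RS 2.0 sketch (the resource path, executor choice, and timing are illustrative). The container thread is released immediately, but the HTTP connection stays open, and the client stays blocked, until resume() is called:

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.container.AsyncResponse;
import javax.ws.rs.container.Suspended;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

@Path("/report")
public class ReportResource {
    private static final ExecutorService WORKERS = Executors.newCachedThreadPool();

    @GET
    public void getReport(@Suspended final AsyncResponse asyncResponse) {
        // Returns immediately, freeing the request-processing thread
        // for other incoming requests.
        WORKERS.submit(() -> {
            String result = doExpensiveWork();
            asyncResponse.resume(result); // completes the still-open request
        });
    }

    private String doExpensiveWork() {
        try {
            Thread.sleep(2000); // stand-in for a 1-2 second operation
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "done";
    }
}
```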
