Is there a technique for "streaming" to an HTTP Reply Node in an IIB message flow? - ibm-integration-bus

I'm a newbie and I have an IIB message flow that looks like this:
HTTP Input Node -> File Read Node -> HTTP Reply Node.
(1) HTTP Input Node - flow is started by a user initiating an https request.
(2) File Read Node - message flow gets a file from our ftp server (local file directory, file name, sftp host and credentials, and sftp server directory all temporarily hardcoded just for ease in troubleshooting a solution, but in real life this info would get extracted from the http request).
(3) HTTP Reply Node - reply to user with contents of file retrieved from the ftp server.
The problem is this: The message flow causes a "java/lang/OutOfMemoryError" heap dump when the File Read Node reads large files (I've been testing by running multiple instances of the message flow with a 124 MB file).
Background: This is a legacy ASP/VBScript application that was rewritten in IIB. Not by me. But I've been tasked with fixing the memory error.
More Info: From reading, I understand why the committed memory gets all used up. I've read all I can on large message processing, but because this application's main purpose is to allow users to pick up files from our ftp server via **https**, I seem to be tied to that HTTP Reply Node, and thus the entire file has to be available in the MbMessageAssembly object (and therefore in memory) when the flow hits the "in" terminal of the HTTP Reply Node.
My question is this: Is there a technique for "streaming" to the HTTP Reply Node? Or, is it possible to remove the HTTP Reply Node altogether and do the SFTP and the HTTP Reply in a Java Compute Node (I can't use a Compute Node/ESQL since we aren't licensed for that)? Or, any other ideas?

Good problem statement, and thanks for doing the legwork before posting your question.
The fundamental problem here is that IIB is designed for processing messages, not (huge) files. You can experiment with increasing the heap size for all of:
the HTTP listener (broker-wide or EG-specific, so be sure to adjust the correct one)
the execution group
the JVM
Note that IIB does not store the message tree itself in the JVM heap, even when you are using the JavaCompute node to manipulate it. But the FileRead node uses the JVM, and it clearly needs more Java heap.
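If you do end up raising the execution group's JVM heap, here is a minimal sketch of doing it through the IIB Integration API (CMP) rather than the command line; the node name IIBNODE and execution group name default are placeholders, and the proxy method names should be verified against the CMP javadoc for your IIB version. The equivalent command-line change targets the ComIbmJVMManager object's jvmMaxHeapSize property via mqsichangeproperties.

```java
import com.ibm.broker.config.proxy.BrokerProxy;
import com.ibm.broker.config.proxy.ExecutionGroupProxy;

// Sketch only: raises the JVM max heap of one execution group.
// Broker and EG names are assumptions; the EG must be restarted for the change to apply.
public class RaiseEgHeap {

    public static void main(String[] args) throws Exception {
        BrokerProxy broker = BrokerProxy.getLocalInstance("IIBNODE");          // assumed node name
        try {
            ExecutionGroupProxy eg = broker.getExecutionGroupByName("default"); // assumed EG name
            // 512 MB max heap, expressed in bytes.
            eg.setRuntimeProperty("ComIbmJVMManager/jvmMaxHeapSize", "536870912");
        } finally {
            broker.disconnect();
        }
    }
}
```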

Related

How to handle http stream responses from within a Substrate offchain worker?

Starting from the Substrate's Offchain Worker recipe that leverages the Substrate http module, I'm trying to handle http responses that are delivered as streams (basically interfacing a pubsub mechanism with a chain through a custom pallet).
Non-stream responses are perfectly handled as-is and reflecting them on-chain with signed transactions is working for me, as advertised in the doc.
However, when the responses are streams (meaning the http requests are never completed), I can only see the stream data logs in my terminal when I shut down the Substrate node. Trying to reflect each received chunk as a signed transaction doesn't work either: I can also see my logs only on node shut down, and the transaction is never sent (which makes sense since the node is down).
Is there an existing pattern for this use case? Is there a way to get the stream observed in background (not in the offchain worker runtime)?
Actually, would it be good practice to keep the worker instance running indefinitely for this http request? (Knowing that in my configuration the http request is sent only once, via a command-queue scheme - in the pallet storage - that gets cleaned at each block import.)

Exchange files (up to many GB)

For my project, I have to create a file manager which aims at storing many files (from many locations) and exposing URLs to download them.
In a microservice ecosystem (I am used to using Spring Boot), I wonder what the best way is to exchange such files, i.e. to send them to the file manager.
On the one hand, I have always thought it is better to exchange them asynchronously, so HTTP does not seem like a good choice. But maybe I am wrong.
Is it a good choice to split files into fragments (in order to reduce the number of bytes in each part) and send each of them through something like RabbitMQ or Kafka? Or should I rather transfer entire files to a NAS or through FTP and let the file manager handle them? Or something else, for example storing the bytes in a temporary database (probably not a good choice)...
The problem with fragmentation is that I have to implement logic to keep track of the order of the fragments, which complicates processing the queues or topics.
IMO, never send actual files through a message broker.
First, set up some object storage system, for example S3 (with AWS or locally with Ceph), then send the path to the file as a string with the producer, and have the consumer read that path and download the file.
If you want to collect files off of a NAS or FTP server, then Apache NiFi is one tool that has connectors to systems like that.
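A minimal sketch of that pattern in Java, assuming the RabbitMQ Java client and entirely made-up queue name, URL, and paths: the producer publishes only a link, and the consumer reads the link and streams the download to disk. The message stays tiny regardless of file size, so the broker never carries the payload.

```java
import com.rabbitmq.client.*;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class FileLinkTransfer {

    private static final String QUEUE = "file-links";   // hypothetical queue name

    // Producer: publish only the download URL (or object-store key), never the bytes.
    static void publishLink(Channel channel, String downloadUrl) throws Exception {
        channel.queueDeclare(QUEUE, true, false, false, null);
        channel.basicPublish("", QUEUE, null, downloadUrl.getBytes(StandardCharsets.UTF_8));
    }

    // Consumer: read the URL from the message and stream the file to local disk.
    static void consumeAndDownload(Channel channel, Path targetDir) throws Exception {
        channel.queueDeclare(QUEUE, true, false, false, null);
        Files.createDirectories(targetDir);
        channel.basicConsume(QUEUE, true, (consumerTag, delivery) -> {
            String url = new String(delivery.getBody(), StandardCharsets.UTF_8);
            String fileName = url.substring(url.lastIndexOf('/') + 1);
            try (InputStream in = new URL(url).openStream()) {
                Files.copy(in, targetDir.resolve(fileName), StandardCopyOption.REPLACE_EXISTING);
            }
        }, consumerTag -> { });
    }

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                     // assumption: local broker
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            publishLink(channel, "https://files.example.com/exports/report.csv"); // hypothetical URL
            consumeAndDownload(channel, Paths.get("/tmp/downloads"));
            Thread.sleep(2000);                           // crude wait so the demo consumer can run
        }
    }
}
```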
Based on my professional experience working with distributed systems (JMS-based), to transfer huge content between participants:
a fragment approach should be used for the request-reply model, plus control signals (has next, fragment counter)
a delta approach for updates.
To avoid corrupted data, a hash of the content can also be transmitted and checked in both scenarios.
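A rough, transport-agnostic sketch of that fragment-plus-hash idea in Java (the Fragment shape, the 64 KB size, and the method names are all illustrative):

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Each fragment carries a counter and a "has next" flag, and the reassembled payload is
// checked against a SHA-256 hash computed by the sender and sent as a control signal.
public class FragmentedTransfer {

    record Fragment(int index, boolean hasNext, byte[] payload) { }

    static final int FRAGMENT_SIZE = 64 * 1024;

    // Sender side: split the content into ordered fragments.
    static List<Fragment> split(byte[] content) {
        List<Fragment> fragments = new ArrayList<>();
        for (int offset = 0, index = 0; offset < content.length; offset += FRAGMENT_SIZE, index++) {
            int end = Math.min(offset + FRAGMENT_SIZE, content.length);
            byte[] payload = java.util.Arrays.copyOfRange(content, offset, end);
            fragments.add(new Fragment(index, end < content.length, payload));
        }
        return fragments;
    }

    // Receiver side: reassemble in counter order and verify the hash sent out of band.
    static byte[] reassemble(List<Fragment> fragments, byte[] expectedSha256) throws Exception {
        fragments.sort(java.util.Comparator.comparingInt(Fragment::index));
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        for (Fragment f : fragments) {
            out.write(f.payload());
        }
        byte[] assembled = out.toByteArray();
        byte[] actual = MessageDigest.getInstance("SHA-256").digest(assembled);
        if (!java.util.Arrays.equals(actual, expectedSha256)) {
            throw new IllegalStateException("Hash mismatch: fragments missing or corrupted");
        }
        return assembled;
    }
}
```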
But as mentioned in this e-mail thread, a better approach is to use FTP for this kind of scenario:
RabbitMQ should actually not be used for big file transfers, or only with great care and by fragmenting the files into smaller separate messages.
When running a single broker instance, you'd still be safe, but in a clustered setup, very big messages will break the cluster.
Clustered nodes are connected via one TCP connection, which must also transport an (Erlang) heartbeat. If your big message takes more time to transfer between nodes than the heartbeat timeout (anywhere between ~20-45 seconds if I'm correct), the cluster will break and your message is lost.
The preferred architecture for file transfer over AMQP is to just send a message with a link to a downloadable resource and let the file transfer be handled by a specialized protocol like FTP :-)
Hope it helps.

How to handle global resources in Spring State Machine?

I am thinking of using Spring State Machine for a TCP client. The protocol itself is given and based on proprietary TCP messages with message id and length field. The client sets up a TCP connection to the server, sends a message and always waits for the response before sending the next message. In each state, only certain responses are allowed. Multiple clients must run in parallel.
Now I have the following questions related to Spring State machine.
1) During the initial transition from disconnected to connected the client sets up a connection via java.net.Socket. How can I make this socket (or the DataOutputStream and BufferedReader objects obtained from the socket) available to the actions of the other transitions?
In this sense, the socket would be some kind of global resource of the state machine. The only way I have seen so far would be to put it in the message headers. But this does not look very natural.
2) Which runtime environment do I need for Spring State Machine?
Is a JVM enough or do I need Tomcat?
Is it thread-safe?
Thanks, Wolfgang
There's nothing wrong with using event headers, but those are not really global resources, since a header exists only for the duration of processing a single event. I'd try adding the needed objects to the machine's extended state, which is then available to all actions.
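A minimal sketch of that extended-state approach (the state/event types, the "socket" variable key, and the host/port are all made up):

```java
import java.net.Socket;

import org.springframework.statemachine.StateContext;
import org.springframework.statemachine.action.Action;

// The connect action stores the socket in extended state; later actions look it up.
public class ConnectAction implements Action<String, String> {

    @Override
    public void execute(StateContext<String, String> context) {
        try {
            Socket socket = new Socket("server.example.com", 9000); // assumed host/port
            // Put the socket into extended state so every other action can reach it.
            context.getExtendedState().getVariables().put("socket", socket);
        } catch (Exception e) {
            context.getStateMachine().setStateMachineError(e);
        }
    }
}

class SendMessageAction implements Action<String, String> {

    @Override
    public void execute(StateContext<String, String> context) {
        // Retrieve the globally shared socket from extended state.
        Socket socket = context.getExtendedState().get("socket", Socket.class);
        // ... write the proprietary message to socket.getOutputStream() ...
    }
}
```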
You just need a JVM. By default, machine execution is synchronous, so there should not be any threading issues. The docs have notes on replacing the underlying executor with an asynchronous one (this is usually done if multiple concurrent regions are used).

How can I transfer bytes in chunks to clients?

SignalR loses many messages when I transfer chunks of bytes from client to client via the server (or client to server, or server to client).
I read the file into a stream and sent it over a hub or persistent connection to the other client. This runs very fast, but messages are always dropped or lost.
How can I transfer large files (in chunks or not) from client to client without losing messages?
As @dfowler points out, it's not the right technology for the job. What I would recommend doing is sending a message that there is a file to be downloaded that includes the link, and then you can download that file using standard GET requests against either static files or some web service written with ASP.NET WebAPI.
SignalR isn't for file transfer, it's for sending messages.
Why isn't it the right technology? If a client needs to send some data to a SignalR hub, it should be able to do so over the SignalR connection without requiring anything extra.
In fact, it works fine when sending a byte array, at least for me; however, I encountered similar problems when transferring chunks.
Perhaps you can do some tests to check whether the order in which you send the chunks is the same as the order in which they are received.
UPDATE
I did a test myself, and in my case the order was indeed the problem. I modified the hub method receiving the chunks to accept an order parameter, which I then use to reconstruct the byte array at the end, and it works fine. Having said this, I now understand that this wouldn't work well for large file transfers.
In my case I don't need to transfer very large amounts of data; I just wanted to give my UI an indication of progress, which requires the data to be sent in chunks.
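A small, framework-agnostic sketch of that order-parameter reassembly (class and method names are made up; this is not the SignalR API):

```java
import java.io.ByteArrayOutputStream;
import java.util.TreeMap;

// Buffer chunks by their index so out-of-order delivery doesn't matter,
// then rebuild the byte array once every chunk has arrived.
public class ChunkAssembler {

    private final TreeMap<Integer, byte[]> chunks = new TreeMap<>();
    private final int expectedChunkCount;

    public ChunkAssembler(int expectedChunkCount) {
        this.expectedChunkCount = expectedChunkCount;
    }

    // Called by the hub method for every received chunk, in whatever order it arrives.
    public synchronized void accept(int order, byte[] data) {
        chunks.put(order, data);
    }

    public synchronized boolean isComplete() {
        return chunks.size() == expectedChunkCount;
    }

    // Concatenate in key (index) order to reconstruct the original payload.
    public synchronized byte[] reconstruct() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        chunks.values().forEach(out::writeBytes);
        return out.toByteArray();
    }
}
```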

How to check which point is cause of problem with MQ?

I use MQ to send/receive messages between my system and another system. Sometimes I find there is no response message in the response queue, even though the other system has already put a response message into it (according to their log). How can I work out at which point the problem occurs, and how can I prove that the message never arrived in my response queue?
In addition, when a message arrives on my queue it is written to a log file.
You can view this in real time using the QStats interface. The MO71 SupportPac is a desktop client that you can configure to connect, much like WebSphere MQ Explorer. One of the options it has is queue statistics. Each time you view the queue stats, WMQ resets them to zero. So the procedure is this:
Start MO71 and browse the queues.
Filter on the one queue of interest.
View the queue stats a couple of times.
You will see them reset to zero.
Now run your test.
View the queue stats again.
If the remote program actually put a message, you will see that the queue now shows one or more messages PUT.
If your program successfully executed a GET of the message, you will see GET counts equal to the number of PUT counts.
If GET and PUT are both zero, the remote program never PUT the response message.
There are a few other approaches to this, but this is the easiest. The opposite end of the spectrum is SupportPac MA0W, which will show you every API call against that queue, or by PID, or whatever. It shows all the options, so if a program tries to open the queue with the wrong options (i.e. opens a remote queue for input) it shows that. But MA0W is installed as an exit and requires the QMgr to be bounced, so it's a little invasive.
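If you'd rather script the QStats check than click through MO71, here is a rough sketch using the PCF classes from IBM MQ classes for Java; the queue manager and queue names are placeholders, and it is a starting point rather than a drop-in tool. MQCMD_RESET_Q_STATS returns (and resets) the enqueue/dequeue counters, so you can see whether the remote side ever PUT and whether your application ever GOT.

```java
import com.ibm.mq.headers.pcf.PCFMessage;
import com.ibm.mq.headers.pcf.PCFMessageAgent;
import com.ibm.mq.constants.CMQC;
import com.ibm.mq.constants.CMQCFC;

public class QueueStatsCheck {

    public static void main(String[] args) throws Exception {
        PCFMessageAgent agent = new PCFMessageAgent("QMGR1"); // bindings-mode connection (assumed)
        try {
            // Reset Queue Statistics: reports and zeroes the PUT/GET counters for the queue.
            PCFMessage request = new PCFMessage(CMQCFC.MQCMD_RESET_Q_STATS);
            request.addParameter(CMQC.MQCA_Q_NAME, "MY.RESPONSE.QUEUE");

            for (PCFMessage response : agent.send(request)) {
                int puts = response.getIntParameterValue(CMQC.MQIA_MSG_ENQ_COUNT);
                int gets = response.getIntParameterValue(CMQC.MQIA_MSG_DEQ_COUNT);
                System.out.printf("PUT count: %d, GET count: %d%n", puts, gets);
                // puts == 0            -> the remote program never put the response
                // puts > 0, gets == 0  -> the response arrived but your app never read it
            }
        } finally {
            agent.disconnect();
        }
    }
}
```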
