How can I transfer bytes in chunks to clients?

SignalR drops many messages when I transfer chunks of bytes from one client to another via the server (or from client to server, or from server to client).
I read the file into a stream and send it over a hub or persistent connection to the other client. This runs very fast, but messages are always dropped or lost.
How can I transfer large files (in chunks or not) from client to client without losing messages?

As @dfowler points out, it's not the right technology for the job. What I would recommend instead is sending a message that a file is available for download, including a link, and then downloading that file with standard GET requests against either static files or a web service written with ASP.NET Web API.

SignalR isn't for file transfer; it's for sending messages.

Why isn't it the right technology? If a client needs to send some data to a SignalR hub, it should be able to do so over the SignalR connection without requiring anything extra.
In fact, sending a single byte array works fine, at least for me; however, I ran into similar problems when transferring chunks.
Perhaps you can run some tests to check whether the order in which you send the chunks matches the order in which they are received.
UPDATE
I ran a test myself, and in my case the order was indeed the problem. I modified the hub method receiving the chunks to accept an order parameter, which I then use to reconstruct the byte array at the end, and it works fine. Having said this, I now understand that this approach wouldn't work well for large file transfers.
In my case I don't need to transfer very large amounts of data; I just wanted to give my UI an indication of progress, which requires the data to be sent in chunks.
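To illustrate, here is a minimal TypeScript sketch of that ordering approach using the @microsoft/signalr client. The hub endpoint and the UploadChunk/ReceiveChunk method names are hypothetical, and base64-over-JSON is just one way to carry the bytes:

```typescript
import * as signalR from "@microsoft/signalr";

// Hypothetical hub endpoint; the hub method names below are assumptions too.
const connection = new signalR.HubConnectionBuilder()
  .withUrl("/fileHub")
  .build();

// Sender: tag every chunk with its index so the receiver can restore order.
async function sendInChunks(data: Uint8Array, chunkSize = 16 * 1024): Promise<void> {
  const total = Math.ceil(data.length / chunkSize);
  for (let i = 0; i < total; i++) {
    const chunk = data.subarray(i * chunkSize, (i + 1) * chunkSize);
    const base64 = btoa(Array.from(chunk, b => String.fromCharCode(b)).join(""));
    await connection.invoke("UploadChunk", i, total, base64);
  }
}

// Receiver: buffer chunks by index and reassemble once all have arrived.
const received = new Map<number, Uint8Array>();
connection.on("ReceiveChunk", (index: number, total: number, base64: string) => {
  received.set(index, Uint8Array.from(atob(base64), c => c.charCodeAt(0)));
  if (received.size === total) {
    const ordered = [...received.keys()].sort((a, b) => a - b).map(k => received.get(k)!);
    const file = new Blob(ordered); // bytes are back in their original order
    received.clear();
    console.log(`reassembled ${file.size} bytes`);
  }
});

// Usage: await connection.start(); then sendInChunks(yourBytes);
```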

Related

Preventing data loss in client authoritative database writes

A project I'm working on requires users to insert themselves into a list on a server. We expect a few hundred users over a weekend, and while it is very unlikely, a collision could happen in which two users submit the list concurrently and one of them is lost. The server has no validation; it simply allows you to get and put data.
I was pointed in the direction of "optimistic locking", but I'm having trouble grasping when exactly the data should be validated and how it prevents this from happening. If one of the clients reads the data, adds itself, and then checks again (using an index or timestamp) that the data is unchanged, how does this prevent the other client from doing the same, with one overwriting the other?
I'm trying to understand the flow in the context of two clients getting data and putting data.
The point of optimistic locking is that the decision to accept or reject a write is taken on the server, and is protected against concurrency by a pessimistic transaction or some sort of hardware protection such as compare-and-swap. So a client requests a write together with some sort of timestamp or version identifier, and the server only accepts the write if the timestamp is still accurate. If it isn't, the client gets some sort of rejection code and has to try again. If it is, the client is told that its write succeeded.
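As a minimal sketch of that server-side decision, in TypeScript, where an in-memory list and version counter stand in for a real database row (all names are illustrative):

```typescript
// In-memory record guarded by a version counter (stands in for a DB row).
const list = { users: [] as string[], version: 0 };

// The server accepts a write only if the client's version is still current.
// The check-and-update itself must be atomic (a transaction or a compare-
// and-swap); Node's single-threaded event loop gives us that for free here.
function tryAppend(user: string, clientVersion: number): { ok: boolean; version: number } {
  if (clientVersion !== list.version) {
    return { ok: false, version: list.version }; // stale: re-read and retry
  }
  list.users.push(user);
  list.version++;
  return { ok: true, version: list.version };
}

// Two clients both read version 0, then both try to write:
const v = list.version;
console.log(tryAppend("alice", v));          // { ok: true,  version: 1 }
console.log(tryAppend("bob", v));            // { ok: false, ... } -- rejected, not silently lost
console.log(tryAppend("bob", list.version)); // retry against the fresh version -> ok
```

This is what answers the question in the post: the second client's write is not overwritten, it is rejected, and the retry happens against the list that already contains the first client.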
This is not the only way to handle receiving data from multiple clients. One popular alternative is to use a reliable messaging system; for example, the Java Message Service specifies an interface for such systems, for which you can find open-source implementations. Clients write into the messaging system and can go away as soon as their message is accepted. The server reads requests from the messaging system and acts on them. If the server or the network goes down it's no big deal: the messages will still be there to be read when they come back (typically they are written to disk and have the same level of protection as database data, although a given reliable message queue implementation may not, in fact, be built on top of a standard database table).
One example of a write-up of the details of optimistic locking is the HTTP ETag specification, e.g. https://en.wikipedia.org/wiki/HTTP_ETag
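Translated to HTTP, the same flow looks like the sketch below, assuming the server returns an ETag on reads, honours If-Match on writes, and answers 412 Precondition Failed when the tag is stale (standard ETag semantics); the /list endpoint is hypothetical:

```typescript
// Read-modify-write with ETag: retry until the write is accepted.
async function addUser(user: string): Promise<void> {
  for (;;) {
    const res = await fetch("/list");          // hypothetical endpoint
    const etag = res.headers.get("ETag")!;     // version of what we just read
    const users: string[] = await res.json();

    const put = await fetch("/list", {
      method: "PUT",
      headers: { "Content-Type": "application/json", "If-Match": etag },
      body: JSON.stringify([...users, user]),
    });

    if (put.ok) return;                        // ETag still matched: accepted
    if (put.status !== 412) throw new Error(`PUT failed: ${put.status}`);
    // 412 Precondition Failed: someone wrote in between; re-read and retry.
  }
}
```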

Exchange files (up to many GB)

For my project, I have to create a file manager that stores many files (from many locations) and exposes URLs to download them.
In a micro-service ecosystem (I am used to using Spring Boot), I wonder what the best way is to exchange such files, i.e. to send files to the file manager.
On the one hand, I have always thought it is better to exchange them asynchronously, so HTTP does not seem a good choice. But maybe I am wrong.
Is it a good choice to split files into fragments (in order to reduce the number of bytes per part) and send each fragment through something like RabbitMQ or Kafka? Or should I rather transfer entire files to a NAS or through FTP and let the file manager handle them? Or something else, like storing the bytes in a temporary database (maybe not a good choice)?
The problem with fragmentation is that I would have to implement logic to keep track of the order of the fragments, which complicates processing the queues or topics.
IMO, never send actual files through a message broker.
First, set up an object storage system, for example S3 (on AWS, or locally with Ceph). Then send the path to the file as a string with the producer, and have the consumer read that path and download the file.
If you want to collect files off of NAS or FTP, then Apache NiFi is one tool that has connectors to systems like that.
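A minimal sketch of that claim-check pattern with RabbitMQ via amqplib; uploadToStorage and downloadFromStorage are placeholder stand-ins for your object-storage client (e.g. the S3 SDK), and the queue name is made up:

```typescript
import { connect } from "amqplib";

// Placeholders for your object-storage client (replace with real S3/Ceph calls).
async function uploadToStorage(localPath: string): Promise<string> {
  return `files/${localPath.split("/").pop()}`; // pretend object key
}
async function downloadFromStorage(key: string): Promise<Buffer> {
  return Buffer.alloc(0); // placeholder
}

const QUEUE = "file-events"; // hypothetical queue name

// Producer: upload the file, then publish only its key.
async function produce(localPath: string): Promise<void> {
  const key = await uploadToStorage(localPath);
  const conn = await connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue(QUEUE, { durable: true });
  ch.sendToQueue(QUEUE, Buffer.from(JSON.stringify({ key })), { persistent: true });
  await ch.close();
  await conn.close();
}

// Consumer: read the key from the message and fetch the bytes out of band,
// so the file itself never crosses the broker.
async function consume(): Promise<void> {
  const conn = await connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue(QUEUE, { durable: true });
  await ch.consume(QUEUE, async (msg) => {
    if (!msg) return;
    const { key } = JSON.parse(msg.content.toString());
    const bytes = await downloadFromStorage(key);
    console.log(`fetched ${bytes.length} bytes for ${key}`);
    ch.ack(msg);
  });
}
```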
Based on my professional experience working with distributed systems (JMS-based), to transfer huge content between participants:
a fragment approach should be used for the request-reply model, plus control signals (a has-next flag and a fragment counter);
a delta approach for updates.
To avoid corrupt data, a hash of the content can also be transmitted and checked in both scenarios.
But as mentioned in this e-mail thread, a better approach is to use FTP for this kind of scenario:
RabbitMQ should actually not be used for big file transfers, or only with great care, fragmenting the files into smaller separate messages.
When running a single broker instance you'd still be safe, but in a clustered setup very big messages will break the cluster.
Clustered nodes are connected via one TCP connection, which must also transport the (Erlang) heartbeat. If your big message takes more time to transfer between nodes than the heartbeat timeout (anywhere between ~20-45 seconds, if I'm correct), the cluster will break and your message is lost.
The preferred architecture for file transfer over AMQP is to just send a message with a link to a downloadable resource and let the file transfer be handled by a specialized protocol like FTP :-)
Hope it helps.

Measuring websocket transfer rate

I am writing a frontend application that needs to know the current transfer rate to the server. With HTTP this is easy enough thanks to the Performance API; you can access many measurements of an HTTP call just by using
performance.getEntriesByName(url);
However, it seems WebSockets are not covered by this API, so I have been trying to find a way to do the same without it.
It seems MessageEvent has a timestamp that indicates when the event was created. However, there isn't much documentation about when that MessageEvent is created: at the reception of the first byte (which I hope), or once the whole message has been downloaded (which is probably the case)? Does anyone know more about how WebSocket messages are managed by browsers?
More generally, how do you measure your WebSocket transfer rate from the frontend without server-side help?
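In practice, onmessage fires only once a complete message has arrived, since the WebSocket API delivers whole messages, so the best you can do from the frontend alone is an average rate over the messages received. A sketch, with a hypothetical endpoint URL:

```typescript
// Approximate the incoming transfer rate by counting bytes per message and
// dividing by elapsed wall-clock time. This measures whole delivered
// messages, not first-byte arrival.
const ws = new WebSocket("wss://example.com/socket"); // hypothetical URL
ws.binaryType = "arraybuffer";

let bytes = 0;
let start = 0;

ws.onmessage = (event: MessageEvent) => {
  if (start === 0) start = performance.now();
  bytes += event.data instanceof ArrayBuffer
    ? event.data.byteLength
    : new Blob([event.data]).size; // text frames: count UTF-8 encoded size
  const seconds = (performance.now() - start) / 1000;
  if (seconds > 0) console.log(`~${(bytes / seconds / 1024).toFixed(1)} KiB/s`);
};
```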

socket io - Emit an event every X seconds or just emit it after a POST event?

I'm using socket io, and I was wondering what was better.
Emitting an event every X seconds so clients always stay in sync with the database, or emitting the event only after e.g. a POST request, which is more efficient.
I believe updating every X seconds would be easier, and maybe it scales better, but I don't know if that's the correct way.
EDIT-1: To give more context: the application is for an accounting team. They basically want their Excel sheets converted to an app. They have a lot of data, so I don't know if emitting an event every X seconds is a good idea.
Thanks.
There is no "correct" way; it depends entirely upon the needs of your client and the capabilities of your server. If the client needs to be kept instantly up to date, then send data from the server to the client whenever the server has new data. If the client only needs to be updated once in a while, then only send it data once in a while.
It is always more efficient to only send data to the client when the data has actually changed and when the client actually cares that something has changed. So, it would be foolish to send a client update every few seconds if the data isn't actually changing that often. If you have a means of knowing when the data changes on the server, then use that event to know when to send data to the client and even then, don't send it more often than the client actually cares to know.
It is always more efficient to have the server do no more work than is actually required by the client. Things like caching and keeping track of what each client was last sent can sometimes save lots of work for the server too.
Any further advice on this matter would need to know a lot more about the needs of your application and how this particular data fits into that and how often the data in question actually changes.
A summary on this topic:
Send data to the client no more often than it needs it
Sending data to the client that has not changed since the last time you sent it is inefficient for the server and consumes bandwidth.
Only you can decide how often your client needs updates (it depends upon your application)
Only you can test the impact on scalability of sending data to every client every time the data changes.
Server-side caching and keeping track of what client already has what data can help you avoid sending data to a client that it already has.
Server-side scalability probably has a lot to do with how many simultaneous clients are connected and how frequently there is changed data to send them.
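A minimal sketch of the emit-after-change approach with Express and socket.io; the route, event name, and data shape are illustrative:

```typescript
import express from "express";
import { createServer } from "http";
import { Server } from "socket.io";

const app = express();
app.use(express.json());
const httpServer = createServer(app);
const io = new Server(httpServer);

const rows: unknown[] = []; // stands in for the accounting data (assumption)

// Emit only when the data actually changes (here, after a successful POST),
// rather than polling the database every X seconds.
app.post("/rows", (req, res) => {
  rows.push(req.body);
  io.emit("rows:changed", req.body); // push just the delta to connected clients
  res.sendStatus(201);
});

httpServer.listen(3000);
```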

Sending files over MSMQ

In a retail scenario, each store reports its daily transactions to the backend system at the end of the day. Today a file consisting of the daily transactions and some other meta information is transferred from the stores to the backend using FTP. I'm currently investigating replacing FTP with something else, and MSMQ has been suggested as an alternative transport mechanism. So my question is: do we need to write a custom Windows service that sticks the daily transactions file into a message object and sends it on its way, or is there any out-of-the-box mechanism in MSMQ to handle this?
Also, since the files we want to transfer can reach 5-6 MB for large stores, should we rule out MSMQ? In that case, are there any other technologies we should investigate?
Cheers!
NServiceBus provides a nice abstraction over MSMQ for situations like this. You get the reliable messaging aspects of MSMQ, along with a very nice programming model for defining your messages.
MSMQ is limited to a 4MB message size, however, and there are two ways you could deal with this in NServiceBus:
NServiceBus has a concept called the Data Bus, which takes the large attachments in your messages and transmits them reliably using another method. This is handled by the infrastructure and as far as your message handlers are concerned, the data is just there.
You could break up the payload into smaller atomic messages and send them as normal messages. The NServiceBus infrastructure would ensure that they all arrive at their destination and are processed. I would recommend this method unless it's absolutely critical that the entire huge data dump is processed as one atomic transaction.
One other thing to note is that the fact that you do nightly dumps is probably a limitation of a previous system. With NServiceBus it may be possible to change the system so that these bits of information are sent in a more immediate fashion, which will result in much more up-to-date data all the time, which may be a big win for the business.
You can look at the IBM Sterling Managed File Transfer and WebSphere MQ Managed File Transfer products.
Consider WebSphere MQ MFT if you require both messaging and file-transfer capabilities. On the other hand, if your requirement is just file transfer, then you can look at Sterling MFT.
Sending files over a messaging transport is not trivial. If you put the entire file into a single message you can have the atomicity you need but tuning the messaging provider for wide variance in message sizes can be challenging. If all the files are of about the same size, one per message is about the simplest solution.
On the other hand, you can split the files into multiple messages but then you've got to reassemble them, in the right order, include a protocol to detect and resend missing segments, integrity-check the file received against the file sent, etc. You also probably want to check that the files on either end did not change during the transmission.
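A minimal sketch of such a reassembly protocol, including the hash check suggested earlier in this thread; the message shape (fileId, index, total, sha256 of the whole file) is an assumption:

```typescript
import { createHash } from "crypto";

// One chunk of a file, as carried by the messaging system (assumed shape).
interface ChunkMsg {
  fileId: string;
  index: number;   // 0-based position of this segment
  total: number;   // how many segments make up the file
  data: Buffer;
  sha256: string;  // hash of the complete file, computed by the sender
}

const inflight = new Map<string, Map<number, Buffer>>();

// Returns the reassembled file once all segments have arrived, else null.
function onChunk(msg: ChunkMsg): Buffer | null {
  const parts = inflight.get(msg.fileId) ?? new Map<number, Buffer>();
  parts.set(msg.index, msg.data);
  inflight.set(msg.fileId, parts);

  if (parts.size < msg.total) return null; // still waiting on segments

  // Detect missing segments before assembling (in a real protocol this
  // would trigger a resend request rather than an error).
  for (let i = 0; i < msg.total; i++) {
    if (!parts.has(i)) throw new Error(`missing segment ${i} of ${msg.fileId}`);
  }
  const file = Buffer.concat(Array.from({ length: msg.total }, (_, i) => parts.get(i)!));
  inflight.delete(msg.fileId);

  // Integrity check: the received file must match the sender's hash.
  const digest = createHash("sha256").update(file).digest("hex");
  if (digest !== msg.sha256) throw new Error(`hash mismatch for ${msg.fileId}`);
  return file;
}
```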
With any of these systems you also need the system to be smart enough to manage the disposition of sending and receiving files under normal and exception conditions, log the transfers, etc.
So when considering whether to move to messaging the two best options are either to move natively to messaging and give up files altogether, or to use an enterprise managed file transfer solution that runs atop the messaging provider that you choose. None of the off-the-shelf MFT products will cost as much in the long run as developing it yourself if you wish to do it right with robust exception handling and reporting.
If the stores are on separate networks and communicate over the internet, then MSMQ is not really an option on its own. NServiceBus provides the concept of a gateway, which allows you to transport MSMQ messages asynchronously over HTTP or HTTPS.
