InputStream (or byte[] array) to WebFlux streaming upload (REST API POST) - spring

Trying to post very large files to a REST endpoint, I breached Netty's 264K data buffer limit. Can you control the size of the chunks sent (i.e. 'buffer' it), both to reduce each message size and to reduce the memory footprint needed for the upload? Eventually we'll need to support > 1 GB files.
Here is my current code:
public void upload(final byte[] fileDataBytes) {
    webClient.post().uri(uri)
            .contentType(MediaType.APPLICATION_OCTET_STREAM)
            .accept(APPLICATION_JSON)
            .bodyValue(fileDataBytes)
            .exchange()
            .block();
}
I've read that I need to write a producer and consumer, but I can't see any examples for POSTs (only GETs) and none seem that applicable.
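One way to stay under that limit is to stream the request body as a Flux of DataBuffers instead of a single byte[], so only one small chunk is in memory at a time. A minimal sketch, assuming the data comes from a file on disk (the 16 KB chunk size and the retrieve() call are illustrative, not from the original post):
import java.nio.file.Path;
import org.springframework.core.io.buffer.DataBuffer;
import org.springframework.core.io.buffer.DataBufferUtils;
import org.springframework.core.io.buffer.DefaultDataBufferFactory;
import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.BodyInserters;
import reactor.core.publisher.Flux;

public void upload(final Path file) {
    // Read the file lazily as a stream of 16 KB buffers instead of one big byte[].
    Flux<DataBuffer> body = DataBufferUtils.read(file, new DefaultDataBufferFactory(), 16 * 1024);
    webClient.post().uri(uri)
            .contentType(MediaType.APPLICATION_OCTET_STREAM)
            .accept(MediaType.APPLICATION_JSON)
            .body(BodyInserters.fromDataBuffers(body))
            .retrieve()
            .toBodilessEntity()
            .block();
}
With a streamed body the request goes out with chunked transfer encoding, so neither the client nor Netty ever has to hold the whole payload at once.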

Related

How can I send a streamed response using OkHttp's mockwebserver?

The typical flow when returning the contents of a file from a server back to the client is to:
1.) Obtain an InputStream to the file
2.) Write chunks of the stream to the open socket
3.) Close the input stream
When using OkHttp's MockWebServer, the MockResponse only accepts an Okio Buffer. This means we must read the entire input stream contents into the buffer before sending it, which will probably result in an OutOfMemoryError if the file is too large. Is there a way to accomplish the logic flow I outlined above without using a duplex response, or should I use another library? Here's how I'm currently sending the file in Kotlin:
val inputStream = FileInputStream(file)
val source = inputStream.source()
val buf = Buffer()
buf.writeAll(source.buffer()) // reads the ENTIRE file into memory
source.close()
val response = HTTP_200 // a preconfigured 200 MockResponse
response.setHeader("Content-Type", "video/mp4")
response.setBody(buf)
return response
// Dispatch the response, etc...
This is a design limitation of MockWebServer, which guarantees that there are no IOExceptions on the serving side. If you have a response that's bigger than you can keep in memory, MockWebServer is the wrong tool for the job.

S3 connection Pool TimeOut while Streaming

I am having trouble streaming a file back to the client. I fetch the file from the S3 bucket fine, and the S3 connection pool is configured at 1024 connections. After 1024 requests I start seeing "connect timeouts" to S3. I have a hunch there is a leak, since I am not closing the S3 stream.
The thing is, since I have to serve up large PDF files, I can't load the data into memory as a byte array. Is there a way to stream the data back to the client without causing this leak?
I tried loading the file into a byte[], closing the S3 stream with try-with-resources, and streaming back to the client using a ByteArrayInputStream. That works fine, but I don't want to load the whole file into memory since the files could be very large.
// Controller Code
@GetMapping(value = "/pdf/{fileName}")
public ResponseEntity<InputStreamResource> getStatementPDF(
        @PathVariable("fileName") @AlphanumericIdConstraint String fileName) {
    ResponseEntity<InputStreamResource> response = service.getPDF(fileName);
    log.info("Retrieved PDF file for fileName={}", fileName);
    return response;
}

// Service Layer Code
public ResponseEntity<InputStreamResource> getPDF(String fileName) {
    S3Object s3Object = s3Client.getObject("bucket", fileName);
    InputStream stream = s3Object.getObjectContent();
    InputStream decryptedFile = decryptor.decrypt(stream);
    HttpHeaders httpHeaders = new HttpHeaders();
    httpHeaders.setContentType(MediaType.TEXT_PLAIN);
    httpHeaders.setContentDisposition(createContentDisposition(fileName));
    httpHeaders.set(CONTENT_ID_HEADER, fileName);
    return new ResponseEntity<>(new InputStreamResource(decryptedFile), httpHeaders, HttpStatus.OK);
}
Error:
Caused by: com.amazonaws.http.exception.HttpRequestTimeoutException: Request did not complete before the request timeout configuration.
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1250)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
I believe your impression is wrong. In your case especially: you use the S3 client to open the stream, do some operations on it, and leave it open. If you think about it, Spring doesn't have any way of closing it, or even knowing about it.
Spring does provide a Spring Cloud AWS integration, a higher-level API for AWS services, but even that doesn't manage your IO for you 100% of the time.
And the moral is: close your streams. :)
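One common way to both stream and close reliably is StreamingResponseBody, which lets the stream be closed with try-with-resources once the bytes are written. A sketch under the question's assumptions (s3Client, decryptor, and createContentDisposition are the names from the original code):
import java.io.InputStream;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

@GetMapping(value = "/pdf/{fileName}")
public ResponseEntity<StreamingResponseBody> getStatementPDF(@PathVariable("fileName") String fileName) {
    S3Object s3Object = s3Client.getObject("bucket", fileName);
    StreamingResponseBody body = outputStream -> {
        // try-with-resources returns the S3 connection to the pool even if
        // the client disconnects mid-download.
        try (InputStream stream = s3Object.getObjectContent();
             InputStream decrypted = decryptor.decrypt(stream)) {
            decrypted.transferTo(outputStream); // copies in small chunks, never the whole file
        }
    };
    HttpHeaders httpHeaders = new HttpHeaders();
    httpHeaders.setContentDisposition(createContentDisposition(fileName));
    return new ResponseEntity<>(body, httpHeaders, HttpStatus.OK);
}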

Stream response from HTTP client with Spring/Project reactor

How to stream response from reactive HTTP client to the controller without having the whole response body in the application memory at any time?
Practically all examples of the Project Reactor client return Mono<T>. As far as I understand, reactive streams are about streaming, not loading it all and then sending the response.
Is it possible to return some kind of Flux<Byte>, to make it possible to transfer big files from some external service to the application client without using a huge amount of RAM to store the intermediate result?
It should happen naturally when you simply return a Flux<WHATEVER>: each WHATEVER is flushed to the network as soon as possible. In that case the response uses chunked HTTP encoding, and the bytes from each chunk are discarded once they've been flushed to the network.
Another possibility is to upgrade the HTTP response to SSE (Server-Sent Events), which can be achieved in WebFlux by annotating the controller method with something like @GetMapping(path = "/stream-flux", produces = MediaType.TEXT_EVENT_STREAM_VALUE) (the produces part is the important one).
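For example (a sketch, not from the original answer), a controller can relay a large upstream response without buffering it by returning the DataBuffer flux straight from a WebClient call; the upstream URL below is made up:
@GetMapping(value = "/file", produces = MediaType.APPLICATION_OCTET_STREAM_VALUE)
public Flux<DataBuffer> proxyFile() {
    // Each buffer is written to the client and released as soon as it arrives;
    // the full body never sits in memory.
    return webClient.get()
            .uri("https://upstream.example/big-file")
            .retrieve()
            .bodyToFlux(DataBuffer.class);
}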
I don't think you need to create an event stream in your scenario, because an event stream is more for emitting events in real time. I think you'd do better like this:
@GetMapping(value = "bytes")
public Flux<Byte> getBytes() {
    return byteService.getBytes();
}
and it will be sent as a stream.
If you still want it as an event stream:
@GetMapping(value = "bytes", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<List<Byte>> getBytes() {
    return byteService.getBytes();
}

gRPC + Image Upload

I want to create a simple gRPC endpoint to which the user can upload his/her picture. The protocol buffer declaration is the following:
message UploadImageRequest {
AuthToken auth = 1;
// An enum with either JPG or PNG
FileType image_format = 2;
// Image file as bytes
bytes image = 3;
}
Is this approach of uploading pictures (and receiving them) still OK, regardless of the warning in the gRPC documentation?
And if not, is the better (standard) approach to upload pictures using a standard form and store the image file's location instead?
For large binary transfers, the standard approach is chunking. Chunking can serve two purposes:
1.) reduce the maximum amount of memory required to process each message
2.) provide a boundary for recovering partial uploads
For your use case, purpose 2 probably isn't very necessary.
In gRPC, a client-streaming call allows fairly natural chunking, since it has flow control and pipelining and makes it easy to maintain context in the client and server code. If you care about recovery of partial uploads, then bidirectional streaming works well, since the server can respond with acknowledgements of progress that the client can use to resume.
Chunking using individual RPCs is also possible, but has more complications. When load balancing, the backend may be required to coordinate with other backends for each chunk. If you upload the chunks serially, the latency of the network can slow the upload, as you spend most of the time waiting to receive responses from the server. You then either have to upload in parallel (but how many in parallel?) or increase the chunk size. But increasing the chunk size increases the memory required to process each chunk and coarsens the granularity for recovering failed uploads. Parallel upload also requires the server to handle out-of-order chunks.
The solution provided in the question will not work for large files; it only works for smaller image sizes.
The better, standard approach is to use chunking. gRPC supports streaming built in, so it is fairly easy to send a file in chunks:
syntax = "proto3";

message UploadImageRequest {
  bytes image = 1;
}

// rpc definitions must live inside a service; the service name is illustrative.
service ImageService {
  rpc UploadImage(stream UploadImageRequest) returns (Ack);
}
In the above way we can use streaming for chunking. For the chunking itself, every language provides its own way to split a file by chunk size.
Things to take care of:
You need to handle the chunking logic yourself; streaming just makes the sending natural.
If you want to send metadata as well, there are three approaches.
1: Use the structure below:
message UploadImageRequest {
  AuthToken auth = 1;
  FileType image_format = 2;
  bytes image = 3;
}

service ImageService {
  rpc UploadImage(stream UploadImageRequest) returns (Ack);
}
Here image still carries the chunks; send AuthToken and FileType with the first chunk only, and just don't send that metadata in any other request.
2: You can also use oneof, which is much easier:
message UploadImageRequest {
  oneof test_oneof {
    Metadata meta = 2;
    bytes image = 1;
  }
}

message Metadata {
  AuthToken auth = 1;
  FileType image_format = 2;
}

service ImageService {
  rpc UploadImage(stream UploadImageRequest) returns (Ack);
}
3: Just use the structure below; send the metadata in the first chunk and data in all the other chunks. You need to handle that split in code.
syntax = "proto3";

message UploadImageRequest {
  bytes message = 1;
}

service ImageService {
  rpc UploadImage(stream UploadImageRequest) returns (Ack);
}
Lastly, for auth you can use headers instead of sending the token in a message.
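To make the chunking concrete, here is a sketch of the client side in Java; the stub, message, and variable names assume code generated from the first chunked proto above (the one whose only field is bytes image = 1), and the 64 KB chunk size is arbitrary:
import com.google.protobuf.ByteString;
import io.grpc.stub.StreamObserver;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

StreamObserver<UploadImageRequest> upload =
        asyncStub.uploadImage(new StreamObserver<Ack>() {
            @Override public void onNext(Ack ack) { /* server acknowledged the upload */ }
            @Override public void onError(Throwable t) { t.printStackTrace(); }
            @Override public void onCompleted() { /* server closed the stream */ }
        });

byte[] chunk = new byte[64 * 1024];
try (InputStream in = Files.newInputStream(imagePath)) {
    int read;
    while ((read = in.read(chunk)) != -1) {
        // One message per chunk, so only 64 KB is in memory at a time.
        upload.onNext(UploadImageRequest.newBuilder()
                .setImage(ByteString.copyFrom(chunk, 0, read))
                .build());
    }
    upload.onCompleted();
} catch (IOException e) {
    upload.onError(e);
}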

File upload progress bar using RestTemplate.postForLocation

I have a Java desktop client application that uploads files to a REST service.
All calls to the REST service are handled using the Spring RestTemplate class.
I'm looking to implement a progress bar and cancel functionality as the files being uploaded can be quite big.
I've been looking for a way to implement this on the web but have had no luck.
I tried implementing my own ResourceHttpMessageConverter and substituting the writeInternal() method but this method seems to be called during some sort of buffered operation prior to actually posting the request (so the stream is read all in one go before sending takes place).
I've even tried overriding the CommonsClientHttpRequestFactory.createRequest() method and implementing my own RequestEntity class with a special writeRequest() method but the same issue occurs (stream is all read before actually sending the post).
Am I looking in the wrong place? Has anyone done something similar?
A lot of what I've read on the web about implementing progress bars talks about starting the upload off and then using separate AJAX requests to poll the web server for progress, which seems like an odd way to go about it.
Any help or tips greatly appreciated.
This is an old question but it is still relevant.
I tried implementing my own ResourceHttpMessageConverter and substituting the writeInternal() method but this method seems to be called during some sort of buffered operation prior to actually posting the request (so the stream is read all in one go before sending takes place).
You were on the right track. Additionally, you need to disable request body buffering on the RestTemplate's HttpRequestFactory, something like this:
HttpComponentsClientHttpRequestFactory clientHttpRequestFactory = new HttpComponentsClientHttpRequestFactory();
clientHttpRequestFactory.setBufferRequestBody(false);
RestTemplate restTemplate = new RestTemplate(clientHttpRequestFactory);
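Combined with the unbuffered request factory, a small wrapper stream can drive the progress bar while RestTemplate streams the body. A sketch; ProgressInputStream and updateBar are made-up names for illustration:
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.LongConsumer;
import org.springframework.core.io.InputStreamResource;

class ProgressInputStream extends FilterInputStream {
    private final long total;
    private final LongConsumer onPercent;
    private long sent;

    ProgressInputStream(InputStream in, long total, LongConsumer onPercent) {
        super(in);
        this.total = total;
        this.onPercent = onPercent;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) {
            sent += n;
            onPercent.accept(100 * sent / total); // report percent complete
        }
        return n;
    }
}

// Usage: each chunk RestTemplate reads from the wrapper updates the bar.
restTemplate.postForLocation(uri, new InputStreamResource(
        new ProgressInputStream(new FileInputStream(file), file.length(), this::updateBar)));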
Here's a working example for tracking file upload progress with RestTemplate.
There was not much detail about what this app is or how it works, so this response is vague, but I believe you can do something like this to track your upload progress.
If this really is a Java client app (i.e. not HTML/JavaScript but a Java program) and you really are having it upload the file as a stream, then you should be able to track progress by counting the bytes added to the stream buffer and comparing that to the total byte count of the file.
When you get the file, get its size:
long totalFileBytes = file.length(); // note: File.getTotalSpace() returns the disk size, not the file size
Wherever you are transmitting as a stream, you are presumably adding bytes to an output buffer of some kind:
byte[] bytesFromSomeFileReader = [whatEverYouAreUsingToReadTheFile];
ByteArrayOutputStream byteStreamToServer = new ByteArrayOutputStream();
long bytesTransmitted = 0;
for (byte fileByte : bytesFromSomeFileReader) {
    byteStreamToServer.write(fileByte);
    bytesTransmitted++;
    // Update the progress bar roughly every kilobyte sent.
    if (bytesTransmitted % 1000 == 0) {
        someMethodToUpdateProgressBar();
    }
}
