S3 connection Pool TimeOut while Streaming - spring

I am having trouble streaming a file back to the client. Fetching the file from the S3 bucket works fine. The S3 connection pool is configured with a maximum of 1024 connections, and after 1024 requests I start seeing "connect timeouts" to S3. I have a hunch there is a memory leak since I am not closing the S3 stream.
The thing is, since I have to serve large PDF files, I can't load the data into memory as a byte array. Is there a way to stream the data back to the client without causing this memory leak?
I tried loading the file into a byte[], closing the S3 stream with try-with-resources, and streaming back to the client from a ByteArrayInputStream. That works fine, but I don't want to hold the whole file in memory since the files can be very large.
// Controller Code
@GetMapping(value = "/pdf")
public ResponseEntity getStatementPDF(@PathVariable("fileName") @AlphanumericIdConstraint String fileName) {
    ResponseEntity response = service.getPDF(fileName);
    log.info("Retrieved PDF file for fileName={}", fileName);
    return response;
}
// Service Layer Code
public ResponseEntity getPDF(String fileName) {
    S3Object s3Object = s3Client.getObject("bucket", fileName);
    InputStream stream = s3Object.getObjectContent();
    InputStream decryptedFile = decryptor.decrypt(stream);
    HttpHeaders httpHeaders = new HttpHeaders();
    httpHeaders.setContentType(MediaType.TEXT_PLAIN);
    httpHeaders.setContentDisposition(createContentDisposition(fileName));
    httpHeaders.set(CONTENT_ID_HEADER, fileName);
    return new ResponseEntity<>(new InputStreamResource(decryptedFile), httpHeaders, HttpStatus.OK);
}
Error:
Caused by: com.amazonaws.http.exception.HttpRequestTimeoutException: Request did not complete before the request timeout configuration.
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1250)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)

I believe your impression is wrong. In your case you use the S3 client to open the stream, perform some operations on it, and then leave it open. If you think about it, Spring has no way of closing it, or even of knowing about it.
Spring also provides Spring Cloud AWS, a higher-level API for AWS services, but even that doesn't seem to manage your I/O for you 100% of the time.
And the moral is: close your streams. :)
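To make the point concrete, here is a minimal sketch of a copy loop that streams in fixed-size chunks and closes the source in every case, including when the client disconnects mid-download. It uses only java.io so it can run on its own; StreamRelay and relay are hypothetical names, not part of Spring or the AWS SDK. In a Spring MVC controller, one place such a loop could run is the writeTo(OutputStream) callback of a StreamingResponseBody, with the decrypted S3 content as the source. Treat that wiring as an assumption to check against your own setup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Copies a source stream to a sink in fixed-size chunks and always closes the source. */
final class StreamRelay {

    private static final int CHUNK_SIZE = 8192;

    private StreamRelay() {}

    /**
     * Streams everything from source to sink without buffering the whole payload.
     * The source is closed when the copy finishes or fails. The sink is left open,
     * on the assumption that its owner (for example a servlet container) closes it.
     */
    static long relay(InputStream source, OutputStream sink) throws IOException {
        long total = 0;
        try (InputStream in = source) {
            byte[] chunk = new byte[CHUNK_SIZE];
            int read;
            while ((read = in.read(chunk)) != -1) {
                sink.write(chunk, 0, read);
                total += read;
            }
            sink.flush();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the S3 object content and the HTTP response body.
        InputStream source = new ByteArrayInputStream(new byte[20_000]);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println("bytes relayed: " + relay(source, sink));
    }
}
```

Only one chunk is held in memory at a time, and the try-with-resources block guarantees that close() reaches the source stream even when writing to the client throws.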

Related

How do I get rid of okhttp3 "A connection to ... was leaked" warning when returning a minio stream as spring ResponseEntity?

I am reading a file with minio and have a REST controller that returns the inputstream given by minio as an InputStreamResource. Here is my code:
@GetMapping("/download")
fun download(): ResponseEntity<InputStreamResource> {
    // read file from minio
    // getObjectResponse is an InputStream
    ...
    val getObjectResponse = minioClient.getObject(getObjectArgs)
    return ResponseEntity.ok().body(InputStreamResource(getObjectResponse))
}
According to this question, wrapping an InputStream in an InputStreamResource is correct, and Spring is supposed to close the underlying InputStream after the response is delivered. Yet I still get the infamous
okhttp3.OkHttpClient: A connection to ... was leaked. Did you forget to close a response body?
What are my options here? I would rather not need to copy and buffer the minio content into memory as these files tend to be very large.
As long as you close any of Response, ResponseBody, or a stream such as ResponseBody.byteStream(), the OkHttp response will be cleaned up.
You are reliant on the caller who accepts your InputStreamResource to close it correctly.
I'd suggest wrapping/decorating the input stream you pass into InputStreamResource and logging when and where close() is called.
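A decorator along those lines could look like the sketch below. CloseLoggingInputStream is a hypothetical name, and java.util.logging is used only so the example has no dependencies; swap in whatever logger the application already uses.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Wraps a stream and records whether, and from where, close() was called. */
final class CloseLoggingInputStream extends FilterInputStream {

    private static final Logger LOG =
            Logger.getLogger(CloseLoggingInputStream.class.getName());

    private final String label;
    private volatile boolean closed;

    CloseLoggingInputStream(InputStream delegate, String label) {
        super(delegate);
        this.label = label;
    }

    /** True once close() has been called at least once. */
    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() throws IOException {
        closed = true;
        // The Throwable is never thrown; it only carries the caller's stack trace.
        LOG.log(Level.INFO, "close() called on " + label, new Throwable("close() call site"));
        super.close();
    }
}
```

Passing the wrapped stream to InputStreamResource instead of the raw one shows in the log which requests close the stream, and a request with no log entry is a candidate for the leak.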
Using Yuri Schimke's suggestion to improve logging, plus some debugging, I found out that Spring closes the InputStream only when an HTTP 200 is returned and the InputStream is actually read and delivered. In my case caching sometimes kicked in (Spring magic with ETags), and Spring returned an HTTP 304 without consuming or closing the InputStream.
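One possible way to avoid leaking in the 304 case is to defer opening the remote stream until something actually reads from it, so a response that is never written never opens a connection. The sketch below is an assumption about how that could be done, not something taken from the answers above; LazyInputStream is a hypothetical name, and the opener would wrap the minioClient.getObject(...) call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;

/** Opens the underlying stream on first read, so an unread stream holds no resources. */
final class LazyInputStream extends InputStream {

    private final Callable<InputStream> opener;
    private InputStream delegate;
    private boolean closed;

    LazyInputStream(Callable<InputStream> opener) {
        this.opener = opener;
    }

    /** Returns the underlying stream, opening it on first use. */
    private InputStream delegate() throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
        if (delegate == null) {
            try {
                delegate = opener.call();
            } catch (IOException e) {
                throw e;
            } catch (Exception e) {
                throw new IOException("could not open underlying stream", e);
            }
        }
        return delegate;
    }

    @Override
    public int read() throws IOException {
        return delegate().read();
    }

    @Override
    public int read(byte[] buffer, int offset, int length) throws IOException {
        return delegate().read(buffer, offset, length);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (delegate != null) {
            delegate.close();
        }
    }
}
```

The trade-off is that a failure to open the object now surfaces while the body is being written rather than before the response is built, so it can no longer be mapped to an error status as easily.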

How can I send a streamed response using OkHttp's mockwebserver?

The typical flow when returning the contents of a file from a server back to the client is to:
1.) Obtain an inputstream to the file
2.) Write chunks of the stream to the open socket
3.) Close the input stream
When using OkHttp's MockWebServer, the MockResponse only accepts an Okio Buffer. This means the entire input stream must be read into the buffer before it is sent, which will probably result in an OutOfMemoryError if the file is too large. Is there a way to accomplish the flow I outlined above without using a duplex response, or should I use another library? Here's how I'm currently sending the file in Kotlin:
val inputStream = FileInputStream(file)
val source = inputStream.source()
val buf = Buffer()
buf.writeAll(source.buffer())
source.close()
val response = HTTP_200
response.setHeader("Content-Type", "video/mp4")
response.setBody(buf)
return response
// Dispatch the response, etc...
This is a design limitation of MockWebServer, which guarantees that there are no IOExceptions on the serving side. If you have a response that's bigger than you can keep in memory, MockWebServer is the wrong tool for the job.
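If a different tool is acceptable, the three-step flow from the question can be sketched with the HTTP server that ships with the JDK (com.sun.net.httpserver). This is a swapped-in alternative to MockWebServer, not one of its features; StreamingFileServer, the /file path, and the loopback binding are assumptions made for the example, and transferTo requires Java 9 or later.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;

/** Serves one file over HTTP without ever holding the whole file in memory. */
final class StreamingFileServer {

    private StreamingFileServer() {}

    /** Starts a server on a free loopback port; GET /file streams the given file. */
    static HttpServer start(Path file, String contentType) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/file", exchange -> {
            // 1.) Obtain an input stream to the file.
            try (InputStream in = Files.newInputStream(file)) {
                exchange.getResponseHeaders().set("Content-Type", contentType);
                // A response length of 0 selects chunked transfer encoding.
                exchange.sendResponseHeaders(200, 0);
                // 2.) Write chunks of the stream to the open socket.
                try (OutputStream out = exchange.getResponseBody()) {
                    in.transferTo(out);
                }
            } // 3.) The input stream is closed by try-with-resources.
        });
        server.start();
        return server;
    }
}
```

A test can read the bound port from server.getAddress().getPort(), point the client at http://127.0.0.1:PORT/file, and call server.stop(0) when it is done.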

Stream response from HTTP client with Spring/Project reactor

How to stream response from reactive HTTP client to the controller without having the whole response body in the application memory at any time?
Practically all examples of project reactor client return Mono<T>. As far as I understand reactive streams are about streaming, not loading it all and then sending the response.
Is it possible to return something like a Flux<Byte>, so that big files can be transferred from an external service to the application's client without using a huge amount of RAM to store the intermediate result?
It should happen naturally if you simply return a Flux<WHATEVER>: each WHATEVER is flushed to the network as soon as possible. In that case the response uses chunked HTTP encoding, and the bytes of each chunk are discarded once they have been flushed to the network.
Another possibility is to upgrade the HTTP response to SSE (Server-Sent Events), which can be achieved in WebFlux by annotating the controller method with something like @GetMapping(path = "/stream-flux", produces = MediaType.TEXT_EVENT_STREAM_VALUE) (the produces part is the important one).
I don't think you need to create an event stream in your scenario, because event streams are mostly used to emit events in real time. I think you'd better do it like this:
@GetMapping(value = "bytes")
public Flux<Byte> getBytes() {
    return byteService.getBytes();
}
and you can send it as a stream.
If you still want it as an event stream:
@GetMapping(value = "bytes", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<List<Byte>> getBytes() {
    return byteService.getBytes();
}

How to test if large objects are being chunked?

I have a web API controller with a POST method as follows.
public class MyController : ApiController
{
    // POST: api/Scoring
    public HttpResponseMessage Post([FromBody]ReallyLargeJSONObject request)
    {
        // some processing of this large json object
        return Request.CreateResponse(HttpStatusCode.OK, someResponseObject);
    }
    ....
}
This is consumed by an HttpClient as follows:
HttpClient httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
httpClient.BaseAddress = new Uri("http://localhost");
ReallyLargeJSONObject request = new ReallyLargeJSONObject();
var task = httpClient.PostAsJsonAsync("api/my", request);
I have read in a few places that in .NET 4.5 the HttpClient class streams the data rather than buffering it. That's great, because this way my server will not get overloaded with large packets. However, I would like to test this. To do so, I made the ReallyLargeJSONObject instance sent from the client about 20 MB, and I also tried even larger payloads (~1 GB). Fiddler shows only one request going to the server. My questions:
Should I see multiple requests going to the server in Fiddler?
If I set a breakpoint in the MyController.Post method, should it be hit multiple times while the data is being streamed?
You should not be seeing multiple requests nor the Post method being hit multiple times as it would be happening at a lower level/method call.
To actually see the chunks broken up and being sent over the wire you can use something like Wireshark to monitor network activity. With this you'll be able to see how long it's taking, how many packets are being used, how big each packet is, etc.
Reference https://www.wireshark.org
Reading on streams: Can you explain the concept of streams?
Reading on packets: https://en.wikipedia.org/wiki/Packet_segmentation

File upload progress bar using RestTemplate.postForLocation

I have a Java desktop client application that uploads files to a REST service.
All calls to the REST service are handled using the Spring RestTemplate class.
I'm looking to implement a progress bar and cancel functionality as the files being uploaded can be quite big.
I've been looking for a way to implement this on the web but have had no luck.
I tried implementing my own ResourceHttpMessageConverter and substituting the writeInternal() method but this method seems to be called during some sort of buffered operation prior to actually posting the request (so the stream is read all in one go before sending takes place).
I've even tried overriding the CommonsClientHttpRequestFactory.createRequest() method and implementing my own RequestEntity class with a special writeRequest() method but the same issue occurs (stream is all read before actually sending the post).
Am I looking in the wrong place? Has anyone done something similar?
A lot of the stuff I've read on the web about implementing progress bars talks about starting the upload and then using separate AJAX requests to poll the web server for progress, which seems like an odd way to go about it.
Any help or tips greatly appreciated.
This is an old question but it is still relevant.
I tried implementing my own ResourceHttpMessageConverter and substituting the writeInternal() method but this method seems to be called during some sort of buffered operation prior to actually posting the request (so the stream is read all in one go before sending takes place).
You were on the right track. Additionally, you also needed to disable request body buffering on the RestTemplate's HttpRequestFactory, something like this:
HttpComponentsClientHttpRequestFactory clientHttpRequestFactory = new HttpComponentsClientHttpRequestFactory();
clientHttpRequestFactory.setBufferRequestBody(false);
RestTemplate restTemplate = new RestTemplate(clientHttpRequestFactory);
Here's a working example for tracking file upload progress with RestTemplate.
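As a rough illustration of the idea, the sketch below wraps the file's input stream and reports a running byte count each time the stream is read. With request buffering disabled, the body is read as it is sent, so the count approximates upload progress. ProgressInputStream and its listener are hypothetical names, and how the wrapped stream is handed to RestTemplate (for example through a Resource that returns it) is left as an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.LongConsumer;

/** Reports the running byte count to a listener as the wrapped stream is consumed. */
final class ProgressInputStream extends FilterInputStream {

    private final LongConsumer onProgress;
    private long bytesRead;

    ProgressInputStream(InputStream delegate, LongConsumer onProgress) {
        super(delegate);
        this.onProgress = onProgress;
    }

    @Override
    public int read() throws IOException {
        int value = super.read();
        if (value != -1) {
            report(1);
        }
        return value;
    }

    @Override
    public int read(byte[] buffer, int offset, int length) throws IOException {
        int count = super.read(buffer, offset, length);
        if (count > 0) {
            report(count);
        }
        return count;
    }

    private void report(long count) {
        bytesRead += count;
        onProgress.accept(bytesRead);
    }

    public static void main(String[] args) throws IOException {
        long totalBytes = 10_000; // stand-in for the size of the file being uploaded
        InputStream body = new ProgressInputStream(
                new ByteArrayInputStream(new byte[(int) totalBytes]),
                sent -> System.out.println(sent * 100 / totalBytes + "% sent"));
        byte[] chunk = new byte[4096];
        while (body.read(chunk) != -1) {
            // A real client would write each chunk to the request body here.
        }
    }
}
```

In a desktop client the listener would typically hand the update to the UI thread rather than print it.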
There was not much detail about what this app is or how it works, so this response is vague, but I believe you can do something like this to track your upload progress.
If this really is a Java client app (i.e. a Java program, not HTML/JavaScript) and it really uploads the file as a stream, then you should be able to track upload progress by counting the bytes written to the stream and comparing that count with the file's total size in bytes.
When you get the file, get its size.
long totalBytes = file.length(); // size of the file in bytes
Wherever you are transmitting as a stream, you are presumably adding bytes to an output buffer of some kind:
// readFileBytes is a placeholder for whatever you are using to read the file.
byte[] bytesFromSomeFileReader = readFileBytes(file);
ByteArrayOutputStream byteStreamToServer = new ByteArrayOutputStream();
long bytesTransmitted = 0;
for (byte fileByte : bytesFromSomeFileReader) {
    byteStreamToServer.write(fileByte);
    bytesTransmitted++;
    // Update your progress bar once per kilobyte sent.
    if (bytesTransmitted % 1000 == 0) {
        someMethodToUpdateProgressBar();
    }
}

Resources