Tomcat Performance with Spring Boot API for File Upload

I have a Spring Boot API, and one of the endpoints allows users to upload videos. My controller basically takes the file as a MultipartFile and stores it in a temp folder accessible to Tomcat. Once I have it stored on disk, I then push the video to an S3 bucket.
To me this seems less than optimal: if I wanted to have 100 or 1000 users uploading at once, it seems really non-performant to write the files to disk first.
As a little background, I'm storing it on disk with the intention that if there is an issue pushing to S3 I can retry.
The code below might show what I'm doing better than the above:
public Video addVideo(@RequestParam("title") String title,
                      @RequestParam("description") String description,
                      @RequestParam(value = "file", required = true) MultipartFile file) {
    this.amazonS3ClientService.uploadFileToS3Bucket(file, title, description);
}
Method for storing Video file:
String fileNameWithExtension = awsS3FileName + "." + FilenameUtils.getExtension(multipartFile.getOriginalFilename());

// creating the file on the server (temporarily)
File file = new File(tomcatTempDir + fileNameWithExtension);
FileOutputStream fos = new FileOutputStream(file);
fos.write(multipartFile.getBytes());
fos.close();

PutObjectRequest putObjectRequest = new PutObjectRequest(this.awsS3Bucket,
        awsS3BucketFolder + uniqueId + "/" + fileNameWithExtension, file);
if (enablePublicReadAccess) {
    putObjectRequest.withCannedAcl(CannedAccessControlList.PublicRead);
}
// Upload the file as a new object with ContentType and title specified
amazonS3.putObject(putObjectRequest);

// removing the file created on the server
file.delete();
So my question is: is there a better way in Tomcat to:
A) Take in a file via a controller
B) Push it to S3

There is no other way to do it with multipart. The problem with multipart is that, to properly segment the parts from the request, they sometimes need to be skipped or read repeatably. That is impossible purely in memory without making memory usage explode. Therefore, Commons FileUpload caches them on disk after a certain threshold is reached.
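For reference, in Spring Boot the upload limits and the spill-to-disk threshold are configurable; this is a sketch for Boot's standard servlet multipart support rather than Commons FileUpload, assuming Spring Boot 2.1+ property names and units:

# application.properties (Spring Boot 2.x multipart settings)
spring.servlet.multipart.max-file-size=2GB
spring.servlet.multipart.max-request-size=2GB
# parts larger than this threshold are written to the temp location instead of being held in memory
spring.servlet.multipart.file-size-threshold=1MB
spring.servlet.multipart.location=/tmp/uploads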
Multipart requests are the worst way to do this. I highly recommend using either PUT or POST with content type application/octet-stream. You can take the bare request input stream and pass it to HttpClient to stream to your backend server. I did this five years ago and it works for gigabytes. I posted the solution on the Apache HttpClient mailing list.
There is one possibility for how this could work, under specific conditions:
All parts are in the correct physical order you want to read them in
Your write to the backend is fast enough to sustain the read from the front
Consume the root part and then move on to the next physical one, processing the request body lazily. JAX-WS RI (Metro) has very nice handling of multipart requests for XOP/MTOM. Learn from that, because you won't be able to make it any better.
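As a sketch of the application/octet-stream approach recommended above, applied to the S3 case from the question (the endpoint path, bucket name, key and wiring are placeholders, not the original code): the controller hands the raw servlet input stream straight to the AWS SDK, so nothing is written to the local disk.

import java.io.IOException;
import java.io.InputStream;

import javax.servlet.http.HttpServletRequest;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;

@RestController
public class RawVideoUploadController {

    private final AmazonS3 amazonS3;

    public RawVideoUploadController(AmazonS3 amazonS3) {
        this.amazonS3 = amazonS3;
    }

    // Client sends the raw bytes: POST /videos?title=... with Content-Type: application/octet-stream
    @PostMapping(path = "/videos", consumes = "application/octet-stream")
    public String upload(@RequestParam("title") String title,
                         @RequestHeader("Content-Length") long contentLength,
                         HttpServletRequest request) throws IOException {
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(contentLength); // known length lets the SDK stream instead of buffering

        try (InputStream body = request.getInputStream()) {
            // the request body goes to S3 as it arrives; no temp file on disk
            amazonS3.putObject("my-bucket", "videos/" + title, body, metadata);
        }
        return "uploaded";
    }
}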

Perhaps you can try to stream the input stream from your MultipartFile directly to S3.
Consider the following uploadFileToS3Bucket method:
public PutObjectResult uploadFileToS3Bucket(InputStream input, long size, String title, String description) {
    // Indicate the length of the content so the AWS SDK does not have to compute it
    // See: https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/PutObjectRequest.html#PutObjectRequest-java.lang.String-java.lang.String-java.io.InputStream-com.amazonaws.services.s3.model.ObjectMetadata-
    ObjectMetadata objectMetadata = new ObjectMetadata();
    objectMetadata.setContentLength(size); // rely on the size Spring reports; you could probably also use input.available()

    // compute the object name as appropriate
    String key = "...";
    PutObjectRequest putObjectRequest = new PutObjectRequest(
            this.awsS3Bucket, key, input, objectMetadata
    );
    // The rest of your code
    if (enablePublicReadAccess) {
        putObjectRequest.withCannedAcl(CannedAccessControlList.PublicRead);
    }
    // Upload the stream as a new object
    return this.amazonS3.putObject(putObjectRequest);
}
Of course, you need to provide the service with the input stream obtained from the client request associated with the MultipartFile object:
public Video addVideo(
        @RequestParam("title") String title,
        @RequestParam("description") String description,
        @RequestParam(value = "file", required = true) MultipartFile file) throws IOException {
    try (InputStream input = file.getInputStream()) {
        this.amazonS3ClientService.uploadFileToS3Bucket(input, file.getSize(), title, description);
    }
}
You can probably also play with the getBytes method of MultipartFile and create a ByteArrayInputStream to perform the operation.
In addVideo:
byte[] bytes = file.getBytes();
In uploadFileToS3Bucket:
ObjectMetadata objectMetadata = new ObjectMetadata();
objectMetadata.setContentLength(bytes.length);
PutObjectRequest putObjectRequest = new PutObjectRequest(
this.awsS3Bucket, key, new ByteArrayInputStream(bytes), objectMetadata
);
I would prefer the first solution, but try to determine which option offers you the best performance.

Related

POST a single large file in .net core

I have a .NET Core 2.1 API application that will download a file from a remote location based on the file name. Here is the code:
static public class FileDownloadAsync
{
    static public async Task DownloadFile(string filename)
    {
        // File name is 1GB.zip for testing
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        using (HttpClient client = new HttpClient())
        {
            string url = @"http://speedtest.tele2.net/" + filename;
            using (HttpResponseMessage response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
            using (Stream readFrom = await response.Content.ReadAsStreamAsync())
            {
                string tempFile = $"D:\\Test\\{filename}";
                using (Stream writeTo = File.Open(tempFile, FileMode.Create))
                {
                    await readFrom.CopyToAsync(writeTo);
                }
            }
            stopwatch.Stop();
            Debug.Print(stopwatch.Elapsed.ToString());
        }
    }
}
This is working great; it will pull a 1 GB file down in about 50 seconds, well within the required download time. I have hard-coded a test file name and storage location in this code for testing; these values will ultimately come from a config file when moved into production. Here is the API endpoint that calls this function:
[HttpGet("{fileName}")]
public async Task<string> GetFile(string fileName)
{
await FileDownloadAsync.DownloadFile(fileName);
return "Done";
}
So getting the file from a remote location down to the local server is not a problem. I need some help/guidance on re-posting this file to another API. Once the file is downloaded, there is some work done on the file to prepare it for upload (the files are all MP4 files), and once that work is done, I need to post it to another API for more proprietary processing. Here is the API end point data I have:
POST: /batch/requests
Allocates resources to start a new batch transcription. Use this method to request[work] on the input audio data. Upon an accepted request, the response provides information about the associated request ID and processing status.

Headers:
Authorization: Authorization token
Accept: application/json
Content-Type: Indicates the audio format. The value must be one of:
audio/x-wav;codec=pcm;bit=16;rate=8000;channels=1
audio/x-wav;codec=pcm;bit=16;rate=16000;channels=1
audio/x-raw;codec=pcm;bit=16;rate=8000;channels=1
audio/x-raw;codec=pcm;bit=16;rate=16000;channels=1
video/mp4
Content-Length (optional): The size of the input voice file. Not required if a chunked transfer is used.

Query string parameters (required):
profileId: one of the supported profiles (see GET profiles)
customerId: the id of the customer. A string of minimum 1 and up to 250 alphanumeric, dot (.) and dash (-) characters.
So I will set the Content-Type to video/mp4 for processing. Note that the input size is not required if a chunked transfer is used.
Right now, I am more concerned with just posting (streaming) the file in a non-chunked format while we wait for more information on what they consider "chunking" a file.
So I am looking for help on streaming the file from disk to the endpoint. Everything I am running across for a .NET Core API is about creating an API that receives the file from a POST, e.g. from a Razor page or an Angular page; I already have that. I just need some help on "re-posting" to another API.
Thanks
Using HttpClient, you open a stream to the file, create a stream content, set the necessary headers, and post to the endpoint:
Stream file = File.Open(filepath, FileMode.Open);
var content = new StreamContent(file);
content.Headers.ContentType = new MediaTypeHeaderValue("video/mp4");

client.DefaultRequestHeaders.Add("Authorization", "token here");
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

using (HttpResponseMessage response = await client.PostAsync(url, content))
{
    //...
}

How to mock a multipart file upload when using Spring and Apache File Upload

The project I'm working on needs to support large file uploads and to know the time taken during their upload.
To handle the large files I'm using the streaming API of Apache Commons FileUpload; this also allows me to measure the time taken for the complete stream to be saved.
The problem I'm having is that I can't seem to use MockMvc in an integration test of this controller. I know that the controller works, as I've successfully uploaded files using Postman.
Simplified Controller Code:
#PostMapping("/upload")
public String handleUpload(HttpServletRequest request) throws Exception {
ServletFileUpload upload = new ServletFileUpload();
FileItemIterator iterStream = upload.getItemIterator(request);
while (iterStream.hasNext()) {
FileItemStream item = iterStream.next();
String name = item.getFieldName();
InputStream stream = item.openStream();
if (!item.isFormField()) {
// Process the InputStream
} else {
String formFieldValue = Streams.asString(stream);
}
}
}
Simplified Test Code:
private fun uploadFile(tfr: TestFileContainer) {
    val mockFile = MockMultipartFile("file", tfr.getData()) // getData() returns a ByteArray
    val receiveFileRequest = MockMvcRequestBuilders.multipart("/upload")
        .file(mockFile)
        .contentType(MediaType.MULTIPART_FORM_DATA)
    val result = mockMvc.perform(receiveFileRequest)
        .andExpect(status().isCreated)
        .andExpect(header().exists(LOCATION))
        .andReturn()
}
This is the error I'm currently getting
org.apache.tomcat.util.http.fileupload.FileUploadException: the
request was rejected because no multipart boundary was found
Can anyone help?
The MockMultipartFile approach won't work, because Spring does its work behind the scenes and simply passes the mock file around; the request is never serialized into a real multipart body, so there is no boundary for Commons FileUpload to find.
I ended up using RestTemplate instead, as it actually constructs real requests.
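A minimal sketch of that approach in Java, using TestRestTemplate against a running server so a real multipart body (with a boundary) is produced; the endpoint, file name and assertion are illustrative:

import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.core.io.ByteArrayResource;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
class UploadIntegrationTest {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    void uploadsFileThroughRealMultipartRequest() {
        // the part must carry a filename so the server treats it as a file item
        // rather than a plain form field
        ByteArrayResource filePart = new ByteArrayResource("some test bytes".getBytes()) {
            @Override
            public String getFilename() {
                return "test.bin";
            }
        };

        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
        body.add("file", filePart);

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.MULTIPART_FORM_DATA);

        ResponseEntity<String> response = restTemplate.postForEntity(
                "/upload", new HttpEntity<>(body, headers), String.class);

        assertThat(response.getStatusCode().is2xxSuccessful()).isTrue();
    }
}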

stream was reset: PROTOCOL_ERROR

I've written a web API for sending and receiving data between a server and an app I'm working on.
It works fine for everything, but I've now found it won't let me send large strings. The strings are base64 strings which represent images, typically around 100 KB in size.
The method I'm attempting is a multipart post, breaking the base64 string into chunks that can be sent successfully.
When I use this method, I get an error:
stream was reset: PROTOCOL_ERROR
Upon checking the database it seems the first string chunk sends successfully, but nothing more after that.
Can anyone shed some light on what's causing this to happen?
Relevant code is here:
First is the process for breaking the image into manageable chunks and posting it:
HTTPRequest req = new HTTPRequest();
IEnumerable<string> imgStrSplit = Split(img1byteArrayStr, 1000);
foreach (string s in imgStrSplit)
{
    response = await req.SubmitImage("Test", s, "1");
}
And below is the SubmitImage() method of the HTTPRequest class:
public async Task<int> SubmitImage(string name, string imageString, string imgNum)
{
    using (System.Net.Http.HttpClient client = new System.Net.Http.HttpClient(new NativeMessageHandler()))
    {
        client.DefaultRequestHeaders.Add("Accept", "application/json");
        int successResponse;
        string address = $"https://myURL/SubmitImage?name=" + name + "&imageStr=" + imageString + "&imgNum=" + imgNum + "&curl=AYZYBAYZE143";
        HttpResponseMessage response = await client.GetAsync(address);
        successResponse = JsonConvert.DeserializeObject<int>(response.Content.ReadAsStringAsync().Result);
        return successResponse;
    }
}
Thanks.
Found a solution via this discussion.
The problem was that I was using the ModernHttpClient NuGet package, as seen in this line of the SubmitImage method:
using (System.Net.Http.HttpClient client = new System.Net.Http.HttpClient(new NativeMessageHandler()))
The NativeMessageHandler() is said to make connections work much faster, but apparently in some cases results in the PROTOCOL_ERROR I had encountered. By removing this part, I fixed the problem.

How to cache a InputStreamResource In RestController?

I have a servlet that returns an image as an InputStreamResource. There are approximately 50 static images that are returned based on some GET query parameters.
To avoid having to look up each of those images every time one is requested (which is very often), I'd like to cache the image responses.
@RestController
public class MyRestController {

    // code is just an example; there may be any number of parameters
    @RequestMapping("/{code}")
    @Cacheable("code.cache")
    public ResponseEntity<InputStreamResource> getCodeLogo(@PathVariable("code") String code) throws IOException {
        FileSystemResource file = new FileSystemResource("d:/images/" + code + ".jpg");
        return ResponseEntity.ok()
                .contentType(MediaType.IMAGE_JPEG)
                .lastModified(file.lastModified())
                .contentLength(file.contentLength())
                .body(new InputStreamResource(file.getInputStream()));
    }
}
When using the @Cacheable annotation (no matter whether directly on the request-mapping method or refactored into an external service), I'm getting the following exception:
java.lang.IllegalStateException: InputStream has already been read - do not use InputStreamResource if a stream needs to be read multiple times
org.springframework.core.io.InputStreamResource.getInputStream(InputStreamResource.java:96)
org.springframework.http.converter.ResourceHttpMessageConverter.writeInternal(ResourceHttpMessageConverter.java:100)
org.springframework.http.converter.ResourceHttpMessageConverter.writeInternal(ResourceHttpMessageConverter.java:47)
org.springframework.http.converter.AbstractHttpMessageConverter.write(AbstractHttpMessageConverter.java:195)
org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.writeWithMessageConverters(AbstractMessageConverterMethodProcessor.java:238)
org.springframework.web.servlet.mvc.method.annotation.HttpEntityMethodProcessor.handleReturnValue(HttpEntityMethodProcessor.java:183)
org.springframework.web.method.support.HandlerMethodReturnValueHandlerComposite.handleReturnValue(HandlerMethodReturnValueHandlerComposite.java:81)
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:126)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:832)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:743)
Question: how can I then cache the ResponseEntity of type InputStreamResource at all?
The cache manager will add the ResponseEntity, with the InputStreamResource inside it, to the cache. The first time it will be OK, but when the cached ResponseEntity tries to read the InputStreamResource a second time, you'll get the exception, because the stream cannot be read more than once.
Solution: don't cache the InputStreamResource itself, but cache the content of the stream.
@RestController
public class MyRestController {

    @RequestMapping("/{code}")
    @Cacheable("code.cache")
    public ResponseEntity<byte[]> getCodeLogo(@PathVariable("code") String code) throws IOException {
        FileSystemResource file = new FileSystemResource("d:/images/" + code + ".jpg");
        byte[] content = new byte[(int) file.contentLength()];
        IOUtils.read(file.getInputStream(), content);
        return ResponseEntity.ok()
                .contentType(MediaType.IMAGE_JPEG)
                .lastModified(file.lastModified())
                .contentLength(file.contentLength())
                .body(content);
    }
}
I've used IOUtils.read() from org.apache.commons.io to copy the bytes from the stream into the array, but you can do it any way you prefer.
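One more note: @Cacheable only has an effect if caching is enabled somewhere in the application. A minimal sketch, assuming the simple in-memory ConcurrentMap cache manager is enough for ~50 small images (the cache name matches the one used above):

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        // simple in-memory map cache with no eviction; fine for a small, static set of images
        return new ConcurrentMapCacheManager("code.cache");
    }
}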
You can't cache Streams. Once they are read, they are gone.
The error message is pretty clear about that:
InputStream has already been read -
do not use InputStreamResource if a stream needs to be read multiple times
By your code and comments, it seems to me that you have a big images folder with JPG logos (which might be added, deleted or modified), and you want to have a daily cache of the ones you're being asked for, so you don't have to constantly reload them from disk.
If that's the case, your best option is to read the file's content into a byte array and cache/return that instead.

Storing a (possibly large) file between requests in Spring

I have a controller method that, depending on the parameters introduced by the user, downloads a certain PDF file and shows a view with its pages converted to PNG.
So the way I approached it works like this:
First, I map a method to receive the POST data sent by the user, then generate the URL of the actual PDF converter and pass it to the model:
@RequestMapping(method = RequestMethod.POST)
public String formPost(Model model, HttpServletRequest request) {
    // Gather parameters and generate the PDF url
    Long idPdf = Long.parseLong(request.getParameter("idPdf"));
    // feed the JSP the url of the to-be-generated image
    model.addAttribute("image", "getImage?idPdf=" + idPdf);
    return "pdfView"; // view name is illustrative
}
Then, in the getImage method, I download the PDF and generate a PNG out of it:
#RequestMapping("/getImage")
public HttpEntity<byte[]> getPdfToImage(#RequestParam Long idPdf) {
String url = "myPDFrepository?idPDF=" + idPdf;
URL urlUrl = new URL(url);
URLConnection urlConnection;
urlConnection = urlUrl.openConnection();
InputStream is = urlConnection.getInputStream();
return PDFtoPNGConverter.convert(is);
}
My JSP just has an img tag that refers to this url:
<img src="${image}" />
So far this works perfectly. But now I need to allow viewing multi-page PDFs, converted to PNGs, with each page on a separate view. So I would add a page parameter, feed my model the image URL including that page parameter, and in my getImage method convert only that page.
But the way it is implemented, I would be downloading the PDF again for each page, plus an additional time for the view, so it can find out whether this specific PDF has more pages and then show the "prev" and "next" buttons.
What would be a good way to preserve the same file across these requests, so I download it just once? I thought about using temp files, but then managing their deletion might be a problem. So maybe storing the PDF in the session would be a good solution? I don't even know if this is good practice or not.
I am using Spring MVC by the way.
I think the simplest way would be to use the Spring cache abstraction. Have a look at the tutorial; you will need to change your code a little and move the logic that loads the PDF into a separate class.
It will look something like this:
interface PDFRepository {
    byte[] getImage(long id) throws IOException;
}

@Repository
public class PDFRepositoryImpl implements PDFRepository {

    @Override
    @Cacheable("pdfImages") // cache name is illustrative
    public byte[] getImage(long id) throws IOException {
        String url = "myPDFrepository?idPDF=" + id;
        URL urlUrl = new URL(url);
        URLConnection urlConnection = urlUrl.openConnection();
        InputStream is = urlConnection.getInputStream();
        return PDFtoPNGConverter.convert(is); // assumed here to return the PNG bytes
    }
}
You will get pluggable cache implementation support and good cache expiration management.
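A minimal sketch of how the controller could then delegate to the cached repository (the wiring and header handling are assumptions, not part of the original answer):

import java.io.IOException;

import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ImageController {

    private final PDFRepository pdfRepository;

    public ImageController(PDFRepository pdfRepository) {
        this.pdfRepository = pdfRepository;
    }

    @RequestMapping("/getImage")
    public HttpEntity<byte[]> getPdfToImage(@RequestParam Long idPdf) throws IOException {
        // the repository call is cached, so the PDF is fetched and converted at most once per id
        byte[] png = pdfRepository.getImage(idPdf);

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.IMAGE_PNG);
        return new HttpEntity<>(png, headers);
    }
}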
