Client-side checking before resource/URL mapping - Jersey

I have created an API which is used to upload files to AWS S3.
I want to restrict the file size to 10MB.
The following is my API.
@POST
@Path("/upload")
@Produces("application/json")
public Response kafkaBulkProducer(InputStream a_fileInputStream) throws IOException {
    // UPLOADING LOGIC
}
As far as I understand, when a request is made to my API the data/InputStream is loaded onto my server.
This consumes resources (connections etc.).
Is there any way I can identify the file size before the URL is mapped or resource matching is done, so that if the file size is greater than 10MB I will not allow it to reach my server?
I think I can work with a pre-filter. But my biggest concern and question is: when the API is called, will the stream data be stored on my server first?
Will a pre-matching filter help, so that the data is not stored on my server if its size is greater than 10MB?
Basically, I don't want to store the data on my server, then check the size, and then upload to S3.
I want a solution where I can check the file size before it is loaded onto the server, and then upload it to S3.
I will be using this API with curl.
How can I do this?
I hope my question is clear.
Thank you
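
For reference, a minimal sketch of the pre-matching filter idea mentioned above, assuming Jersey 2.x / JAX-RS 2.0 (the 10MB limit and the 413 response are illustrative choices):

import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.container.PreMatching;
import javax.ws.rs.core.Response;
import javax.ws.rs.ext.Provider;

// Hypothetical filter: runs before resource/URL matching and rejects
// requests whose declared Content-Length exceeds 10MB.
@Provider
@PreMatching
public class MaxUploadSizeFilter implements ContainerRequestFilter {

    private static final long MAX_BYTES = 10L * 1024 * 1024; // 10MB

    @Override
    public void filter(ContainerRequestContext requestContext) {
        String contentLength = requestContext.getHeaderString("Content-Length");
        if (contentLength != null && Long.parseLong(contentLength) > MAX_BYTES) {
            // Abort before the entity stream is ever read by a resource method.
            requestContext.abortWith(
                    Response.status(Response.Status.REQUEST_ENTITY_TOO_LARGE)
                            .entity("File exceeds the 10MB limit")
                            .build());
        }
    }
}

Note that this only checks the Content-Length header the client declares; a client that lies or uses chunked transfer encoding can still send more data, so the actual number of bytes read should still be enforced server-side.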

Related

How to deal with 6 megabytes limit in AWS lambda services?

I have a three-tier app running in AWS. The middleware was written in Python Flask and hosted on a Linux machine.
However, they asked me to move to AWS Lambda. There is a 6 MB limit on the data returned by a function. As I'm dealing with GeoJSON, sometimes it's necessary to return up to 15 MB.
Although AWS Lambda is stateless, I could provide some way to return the data partitioned, but that is problematic: I think it would be necessary to generate the whole map again and again until all the data has been delivered.
Is there a better way to deal with this? I'm programming in Python.
I'd handle this by sending the data to S3, and issuing a redirect or JSON response that points to the URL on S3 (with a temporary, expiring URL if the data should be secure). If the data's long-lived, you can just leave it there; if not, you could use S3's lifecycle rules to have the files automatically delete after 24 hours or so.
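As an illustration of the expiring-URL idea, a rough sketch with the AWS SDK for Java (rather than the asker's Python stack; the bucket and key names are placeholders):

import java.net.URL;
import java.util.Date;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class PresignedDownload {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Expire the link after one hour; adjust to taste.
        Date expiration = new Date(System.currentTimeMillis() + 60 * 60 * 1000);

        // "my-bucket" and "results/map.geojson" are placeholder names.
        URL url = s3.generatePresignedUrl("my-bucket", "results/map.geojson", expiration);

        // Return this URL (or a 302 redirect to it) instead of the 15 MB payload.
        System.out.println(url);
    }
}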
If you have control of the client too that receives those data, you can send a compressed result that is then uncompressed client side. So you'll be able to send that 15MB response too, which can become really small when compressed.
Or you can send a fragment of the whole response with a token or something indicating to the client that the response is not complete. Then the client will make another request with that token to get the next fragment, and so on until there are no more fragments. At that point the client can join all fragments to get the full response.
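A minimal sketch of the compression idea above (shown in Java; GeoJSON tends to compress very well because of its repetitive structure):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipPayload {
    // Compress a (Geo)JSON string before returning it; the client
    // decompresses it on its side. Names here are illustrative.
    static byte[] gzip(String json) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("{\"type\":\"FeatureCollection\",\"features\":[]}");
        System.out.println("Compressed size: " + compressed.length + " bytes");
    }
}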
Speaking of the 6MB limit, I hope that at some point we will have the ability to set the max payload size, since 6MB is fine for most cases, but not all cases.
You can use a presigned S3 URL to upload; that way you are not bound by the payload size limit.
Send an HTTP GET request to API Gateway, then the Lambda function generates a presigned S3 URL and returns it.
The client then uploads the content directly to S3 using the presigned URL.
https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html
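
A hedged sketch of generating such an upload URL with the AWS SDK for Java (bucket, key, and expiry are placeholders; in practice this would run inside the Lambda handler and the URL would be returned to the caller):

import java.net.URL;
import java.util.Date;

import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

public class PresignedUpload {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Placeholder bucket/key; the URL is valid for 15 minutes here.
        GeneratePresignedUrlRequest request =
                new GeneratePresignedUrlRequest("my-bucket", "uploads/data.bin")
                        .withMethod(HttpMethod.PUT)
                        .withExpiration(new Date(System.currentTimeMillis() + 15 * 60 * 1000));

        URL uploadUrl = s3.generatePresignedUrl(request);

        // The client PUTs the file body directly to this URL,
        // bypassing the API Gateway/Lambda payload limits.
        System.out.println(uploadUrl);
    }
}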

Transfer file among microservices

I have a chain of microservices (Spring Boot/Cloud).
The UI allows the user to download a file from file storage, but the response travels back through all the microservices. I don't want each microservice to download the file and upload it to the next one on the way back (I don't want to hold it in memory, as that will cause an OutOfMemoryError).
Is it possible to return some stream?
Thanks
I would pass back a file reference only (like a URL) and only retrieve the actual file when you need it.
So if the Client UI requires an actual file from MicroService 1 I would pass the reference back to MicroService 1 and let that service get the file content and send it to the client.
If the client can resolve a URL/reference itself you could even do with just returning that to the client and then letting the client retrieve the file.
Either way you want to minimize the moving/loading of the file and basically do this at the last possible moment.
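And if the last service in the chain does need to serve the bytes itself, a rough sketch with Spring's StreamingResponseBody keeps the file out of memory by piping it straight to the response (StorageClient and its open() method are hypothetical placeholders for whatever backs your file storage):

import java.io.InputStream;

import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

@RestController
public class FileDownloadController {

    // Hypothetical wrapper around the underlying file storage.
    private final StorageClient storageClient;

    public FileDownloadController(StorageClient storageClient) {
        this.storageClient = storageClient;
    }

    @GetMapping("/files/{id}")
    public ResponseEntity<StreamingResponseBody> download(@PathVariable String id) {
        StreamingResponseBody body = outputStream -> {
            // Copy the storage stream straight to the HTTP response;
            // only a small buffer is ever held in memory.
            try (InputStream in = storageClient.open(id)) {
                in.transferTo(outputStream);
            }
        };
        return ResponseEntity.ok()
                .contentType(MediaType.APPLICATION_OCTET_STREAM)
                .body(body);
    }

    // Hypothetical interface for the file storage client.
    interface StorageClient {
        InputStream open(String id);
    }
}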

Is it possible to use Amazon S3 Multipart Upload API in AJAX

I'm reading the official doc for the Amazon S3 Multipart Upload REST API, and I'm wondering whether it's possible to use this API via AJAX.
The reason I'm asking is that I'm trying to upload a very large file (>5GB) from my browser to an S3 bucket. I know there's an S3 upload API for JavaScript and a way of leveraging AJAX to upload a file to S3, but none of these address the large file issue.
The only reason I can think of for the multipart upload API not being supported via AJAX is that the browser is not able to split the local file, but I want to make sure that's really the case.
Has anyone here ever used the multipart upload API via AJAX, or, if that's impossible, how do people usually deal with large file uploads from the browser?
Lots of thanks in advance!
I don't think it's necessary for you to use the Rest API for this. The s3.upload() method used in the javascript example you linked does support multipart uploads for large files according to the following AWS blog post: Announcing the Amazon S3 Managed Uploader in the AWS SDK for JavaScript. A browser example is included, although it uses bucket.upload rather than s3.upload. It also includes examples of tracking progress, configuring concurrency and part size and handling failures.
It does say with respect to browser uploads that "In order to support large file uploads in the browser, you must ensure that your CORS configuration exposes the ETag header; otherwise, your multipart uploads will not succeed. See the guide for more information on how to expose this header."
Possibly the CORS configuration may also need to allow more methods than listed in the 'Configuring CORS' section of the example you linked.
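For reference, the kind of bucket CORS configuration that guide is talking about might look roughly like this (the origin and methods here are illustrative; the important part is exposing the ETag header):

<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>https://example.com</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
    <!-- Without this, the browser cannot read the ETag of each uploaded part,
         and the multipart upload cannot be completed. -->
    <ExposeHeader>ETag</ExposeHeader>
  </CORSRule>
</CORSConfiguration>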

Send entire Blob data to webhook endpoint

I am storing all the XML files in Azure Blob Storage. The file data is huge. The Blob Created event payload only contains information about the file properties and not the actual data.
Can you please suggest a way to automatically push the entire blob data to a webhook endpoint each time a blob is created?
Is it ideal to send the entire data to a webhook, given that my data is very large?

Is permanent storage of video metadata against the YouTube Data API ToS?

The YouTube Data API ToS says:
Your API Client may employ session-based caching solely of YouTube API results, but You must use commercially reasonable efforts to cause Your API Client to update cached results upon any changes in video metadata. For example, if a video is removed from the YouTube service or made "private" by the video uploader, cached results shall be removed from Your cache. For the avoidance of doubt, Your API Client shall not be designed to cache YouTube audiovisual content.
The YouTube Data API overview also says:
Your application can cache API resources and their ETags. Then, when your application requests a stored resource again, it specifies the ETag associated with that resource. If the resource has changed, the API returns the modified resource and the ETag associated with that version of the resource. If the resource has not changed, the API returns an HTTP 304 response (Not Modified), which indicates that the resource has not changed. Your application can reduce latency and bandwidth usage by serving cached resources in this manner.
This does not mean that any time I wish to request a resource I must go back to the YouTube Data API, correct?
The only data from the API I'm interested in storing in my database is:
ETag
Video id
Duration
Title
Is it okay for me to store these four items in my database (given I update the information reasonably regularly)?
I'm not interested in storing any portion of the actual video whatsoever. Just the metadata.
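A rough sketch of the conditional-request pattern the overview describes, using plain HttpURLConnection (the URL, video id, API key and stored ETag are placeholders):

import java.net.HttpURLConnection;
import java.net.URL;

public class ConditionalFetch {
    public static void main(String[] args) throws Exception {
        // Placeholder request; a real call needs your API key and video id.
        URL url = new URL("https://www.googleapis.com/youtube/v3/videos"
                + "?part=contentDetails,snippet&id=VIDEO_ID&key=API_KEY");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // ETag previously stored alongside the video id, duration and title.
        conn.setRequestProperty("If-None-Match", "\"stored-etag-value\"");

        if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            // 304: the cached metadata is still current, nothing to update.
            System.out.println("Metadata unchanged; keep cached row.");
        } else {
            // 200: read the new resource and refresh the stored ETag and fields.
            System.out.println("Metadata changed; update database.");
        }
    }
}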
