I have a Spring Boot REST API deployed on AWS Elastic Beanstalk and I am trying to upload pictures through it.
This is what I did: upload a zip file through a file input from the browser, receive the zip file on the server, go through all the files, and upload each one to AWS S3.
It works fine, but I ran into a problem: when I try to upload lots of pictures, I get an HTTP error (504 Gateway Timeout). I found out this is because the server takes too long to respond, and I am trying to figure out how to set a higher timeout for the requests (I haven't found it yet).
But in the meantime I am asking myself whether that is even the best solution.
Wouldn't it be better to end the request right after receiving the zip file, upload the files to S3, and after that notify the user that the uploads are done? Is there even a way to do that? Is there a good practice for this kind of operation that takes a long time to process?
I know how to do the processing asynchronously, but I would really like to know how to notify the user once it completes.
Wouldn't it be better to end the request right after receiving the zip file, upload the files to S3, and after that notify the user that the uploads are done?
Yes, asynchronous processing of the uploaded images in the zip file would be better.
Is there even a way to do that? Is there a good practice for this kind of operation that takes a long time to process?
Yes, there is a better way. To keep everything within EB, you could look at an Elastic Beanstalk worker environment. A worker environment is ideal for processing your images.
In this solution, your web environment would store the uploaded images in S3 and submit their names, along with other identifying information, to an SQS queue. The queue is the entry point for the worker environment.
Your workers would process the images from the queue independently of the web environment. In the meantime, the web environment would check for the results and notify your users once the images have been processed.
EB also supports linking different environments, so you could establish a link between the web and worker environments for easier integration.
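As a rough illustration of that hand-off, the web tier could push one message per stored image onto the queue. The sketch below is in Python with boto3 purely for brevity (the question itself uses Spring Boot, and the AWS SDK for Java exposes equivalent S3 and SQS calls); the bucket name, queue URL and message fields are made up.

import json
import boto3  # assumes AWS credentials are available in the environment

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/image-jobs"  # hypothetical worker queue

def hand_off_image(image_bytes, key, upload_id):
    # Store the raw image in S3 so the worker environment can fetch it later.
    s3.put_object(Bucket="my-upload-bucket", Key=key, Body=image_bytes)
    # Tell the worker environment, via its SQS queue, what to process.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"bucket": "my-upload-bucket", "key": key, "upload_id": upload_id}),
    )

The worker environment's SQS daemon then delivers each message to your worker application as an HTTP POST, and the web environment can poll a status record to notify the user once processing is done.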
Related
I have a bare API endpoint in Laravel, served with Nginx and Apache. The problem is that the API takes a long time when I send files with the request, while it responds quickly when I send blank data.
Case 1: I called the API with a blank request, and the response time was only 228 ms.
Case 2: I called the API with a 5 MB file in the request, and the file transfer took so long that the response time climbed to 15.58 s.
So how can I reduce the transfer start time in Apache or Nginx? Is there any server configuration or anything else that I have missed?
When I searched on Google it said to keep all versions up to date and use PHP-FPM, but when I configured PHP-FPM and the HTTP/2 protocol on my server I noticed that it took even more time than above. All server versions are already up to date.
This has more to do with the fact that one request has nothing to process, so the response is prompt, whereas the other request requires actual processing, so the response takes as long as the server needs to handle the content of your request.
Depending on the size of the file and your server configuration, you might hit a limit, which will result in a timeout response.
One solution to the issue you're encountering is to chunk your file upload. There are a few packages available so that you don't have to write that functionality yourself; an example is the Pionl Laravel Chunk Upload package.
An alternative solution would be to offload the file processing to a queue.
Update
When I searched on Google about chunking, it is said not to be the best solution for small files like 5-10 MB; it's better suited to big files like 50-100 MB. So is there any server-side chunking configuration or anything else, or can I use this library to chunk small files?
According to the library's documentation it is a web library. What should I use if my API is called from Android and iOS apps?
True, chunking might not be the best solution for smaller files, but it is worth knowing about. My recommendation would be to use some client-side logic to determine whether sending the file in chunks is required (a rough sketch of that decision follows below). On the server, use a queue to process the file upload in the background, so that the request does not block on the processing and a response can be sent back to the client (the iOS/Android app) in a timely manner.
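Purely as an illustration of that client-side decision, sketched in Python with the requests library (the real callers would be the iOS/Android apps, and the endpoint, field names and thresholds are all invented):

import os
import requests  # illustrative client only

CHUNK_THRESHOLD = 10 * 1024 * 1024   # only chunk files bigger than ~10 MB
CHUNK_SIZE = 2 * 1024 * 1024         # 2 MB per request
UPLOAD_URL = "https://api.example.com/upload"  # made-up endpoint

def upload(path):
    size = os.path.getsize(path)
    if size <= CHUNK_THRESHOLD:
        # Small file: one ordinary multipart request is fine.
        with open(path, "rb") as f:
            return requests.post(UPLOAD_URL, files={"file": f})
    # Large file: send it piece by piece so no single request runs long.
    with open(path, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            requests.post(UPLOAD_URL,
                          files={"chunk": ("part-%d" % index, chunk)},
                          data={"index": index, "total_size": size})
            index += 1

Whichever branch is taken, the server should still push the actual processing onto a queue so the response goes back quickly.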
Context
I have a web dyno that receives a video file and applies a computer-vision algorithm to return an analysis. The algorithm takes about 10 seconds to run. My current method is to process it on the web dyno. The whole thing is pretty fast; the user doesn't have to wait more than a minute.
What's not working
But of course, tying up the web dyno is a bad idea, and some users have gotten timeouts... So I tried implementing Redis to pass the job to a worker dyno.
@application.route('/video', methods=['POST'])
@cross_origin()
def video():
    video_file = request.files['videoData']
    # Enqueue the analysis; passing the file object itself is what fails,
    # because the job arguments have to be serialized for the worker.
    job = q.enqueue_call(
        func=run_analysis, args=(video_file,), result_ttl=5000
    )
    return json.dumps(job.id)
But this gives me an error: TypeError: cannot serialize '_io.BufferedRandom' object, and I understand why.
In my dev environment, I can save the video to the filesystem and pass only the file path, but this doesn't work in production because the web dyno's filesystem is ephemeral and the worker won't see the file.
So I'm looking for the fastest way to get the video file across. Speed is of the essence here, as the user is waiting for their video to be processed.
What I've tried
I've tried S3 (uploading directly from the client and downloading in the worker), but it made the whole process much slower. First of all, it takes longer to upload to S3 than to my Heroku endpoint. Second, I then have to download it to the worker, which takes a while as well. I don't really need to keep the file, so it's a very inefficient workaround.
Heroku dynos are completely isolated containers, which is why they cannot share a filesystem the way you want. If you host the app elsewhere, such as DigitalOcean or Amazon, you will be able to access the files Flask stored from the other workers instantly (or almost instantly; just remember to make a copy of the temp file, as Flask or the WSGI server may delete it after the response is sent).
Another option is to find the fastest way of "transporting" the video data (not necessarily a file) to a worker. You can do it using:
queue - put the whole file onto a queue. Not recommended, but still OK if the video files are really small.
in-memory database - save the file to an in-memory database such as Redis, which has plenty of mechanisms for quickly moving data between servers or processes (con: expensive on Heroku). See the sketch at the end of this answer.
database - save the file to a general-purpose database such as PostgreSQL, which does the same job as Redis but handles bigger data more cheaply, though a bit more slowly.
WebSockets or even a Unix socket - you can have one worker that Flask connects to, sends the file to, and then returns the HTTP response. That "listener" actually starts the task. It can either save the video to a file and pass the path to the next worker (but then it must be on the same dyno as the other workers) or hand the data over directly using args, fork, threading, subprocessing, etc.
You can start with Flask-SocketIO.
But remember that you need to configure a server-to-server connection between the web app and a worker that listens in a loop or a separate thread, not JavaScript in the browser - although that is potentially an option too: start the task and upload the file directly to the worker.
P.S. There are no 30-second timeouts on Heroku for WebSockets :)
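A minimal sketch of the in-memory-database option, assuming the same Flask application and RQ setup as in the question; the key format and expiry are arbitrary choices:

import json
import uuid

import redis
from flask import Flask, request
from rq import Queue

application = Flask(__name__)
r = redis.Redis()              # the same Redis instance the worker dynos already use
q = Queue(connection=r)

def run_analysis_from_redis(key):
    # Executed on the worker dyno: pull the bytes back out and analyse them.
    data = r.get(key)
    # ... run the computer-vision analysis on `data` here ...
    r.delete(key)              # the file is not needed afterwards

@application.route('/video', methods=['POST'])
def video():
    video_file = request.files['videoData']
    key = "video:%s" % uuid.uuid4()
    # Store the raw bytes in Redis (with an expiry) instead of passing the
    # unpicklable file object to the job.
    r.set(key, video_file.read(), ex=3600)
    # The job payload is now just a short string, which serializes fine.
    job = q.enqueue(run_analysis_from_redis, key, result_ttl=5000)
    return json.dumps(job.id)

Because only a short string crosses the queue, the pickling error goes away, and the bytes travel through Redis, which both dynos can already reach.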
Suppose I have uploaded 5 files and, after some time, a network bandwidth issue causes an error.
In that case, have all 5 file uploads failed? In fact, I want to understand Paperclip's internal process for multiple image uploads.
Is it sequential, or are all files sent in one single stream?
Could you please explain, if anyone has an idea about it? Thanks!
The transport mechanism for file uploads to the web server is the HTTP multipart request. Paperclip is not involved until the server finishes processing that request.
Paperclip is not the transport mechanism; it is a gem that (in short) handles file data and storage while providing helpers to be used on the back end of your Rails application.
When uploading one or several files in the same HTTP request, if the HTTP request fails, the web server halts the transaction, and that happens before any interaction with your Rails controllers.
An alternative would be to upload the multiple files in separate requests from the front end of your application, but that is a separate issue and I recommend you do some research if you would like to go down this path.
I am writing an endpoint using Sinatra where I will be receiving raw PDFs from the client and need to process them for internal use. The PDF processing takes a while, and I do not necessarily want the client to wait until the processing is finished and risk a timeout (504). Instead, I would like to invoke another method that handles the PDF processing while I respond to the client with an appropriate code. What is the best way to implement that using Sinatra?
So there's a few parts to this, so let me break down the various steps that are going to happen:
The client uploads a PDF file: depending on the size of the PDF and the speed of their connection, this could take a while. While you're waiting for the upload, your web process is busy receiving the data and is unable to process requests from any other clients.
You then need to process the uploaded file, store it somewhere, and possibly manipulate it somehow. If you do all of that as part of the request, then that is yet more time you're tied up dealing with this one request and unable to serve other clients.
The typical way to solve the latter of those problems, manipulating or processing an uploaded asset, is to use a background job queue such as Sidekiq (http://sidekiq.org). You store the required data somewhere, keep enough information to know what to work on (e.g., the database ID of a model that has stored the relevant information, a filename, etc.), and then pass all of that required information into a background job. You then have separate worker processes that pick up that work and complete it, but they aren't part of your web process so they aren't blocking other clients from receiving information.
This still leaves us with the problem of handling large uploads; fortunately, that has a solution too. Take advantage of all the web capacity Amazon has and have clients upload the file directly to S3. When the upload is complete they can post just the filename to you, and you can then queue that up for your worker from the previous step and have it all happen in the background. This blog post has a good explanation of how to wire it together using Paperclip: http://blog.littleblimp.com/post/53942611764/direct-uploads-to-s3-with-rails-paperclip-and
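The answer above is about Sinatra and Sidekiq; purely to illustrate the shape of the pattern in the same language as the other snippets on this page, here is a hypothetical Python/RQ sketch in which the client has already uploaded the PDF straight to S3 and only posts the object key (the bucket, route and helper names are invented):

import json

import boto3
from flask import Flask, request
from redis import Redis
from rq import Queue

app = Flask(__name__)
q = Queue(connection=Redis())

def process_pdf(s3_key):
    # Runs in a separate worker process, never in the web process.
    body = boto3.client("s3").get_object(Bucket="my-pdf-bucket", Key=s3_key)["Body"].read()
    # ... the slow PDF processing happens here ...

@app.route("/pdfs", methods=["POST"])
def create_pdf_job():
    s3_key = request.get_json()["key"]   # e.g. the key the client just uploaded to S3
    job = q.enqueue(process_pdf, s3_key)
    # Respond immediately; the client can poll a status endpoint using this id.
    return json.dumps({"job_id": job.get_id()}), 202

The web process returns a 202 right away, and the slow PDF work happens in the worker process, mirroring the Sidekiq setup described above.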
Let me start by saying I understand that Heroku's dynos are temporary and unreliable. I only need them to persist for at most 5 minutes, and from what I've read that generally won't be an issue.
I am making a tool that gathers files from websites and zips them up for download. My tool does everything and creates the zip; I'm just stuck at the last part: providing the user with a way to download the file. I've tried direct links to the file location and HTTP GET requests, and Heroku didn't like either. I really don't want to have to set up AWS just to host a file that only needs to persist for a couple of minutes. Is there another way to download files stored in /tmp?
As far as I know, you have absolutely no guarantee that a request goes to the same dyno as the previous request.
The best way to do this would probably be either to host the file somewhere else, like S3, or to send it immediately in the same request (a sketch of that option follows below).
If you're generating the file in a background worker, then it most definitely won't work. Every process runs on a separate dyno.
See How Heroku Works for more information on their backend.
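For the "send it immediately in the same request" option, here is a hedged sketch assuming a Flask app (the question doesn't say which framework is in use, and build_zip is an invented stand-in for the tool's own archiving step):

import os
import tempfile
import zipfile

from flask import Flask, after_this_request, send_file

app = Flask(__name__)

def build_zip(zip_path):
    # Stand-in for the tool's real logic that gathers files and archives them.
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("placeholder.txt", "gathered content goes here")

@app.route("/download")
def download():
    # Build the zip in this dyno's ephemeral /tmp within the same request...
    zip_path = os.path.join(tempfile.gettempdir(), "gathered-files.zip")
    build_zip(zip_path)

    @after_this_request
    def cleanup(response):
        os.remove(zip_path)  # /tmp is ephemeral anyway, but tidy up
        return response

    # ...and stream it back before the request ends, so the file never needs
    # to be visible from any other dyno.
    return send_file(zip_path, as_attachment=True, download_name="files.zip")

Because the zip is created and returned by the same dyno within one request, the dyno isolation described above never comes into play.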