How does Paperclip process multiple file uploads at a time? - ruby

Suppose I upload 5 files and, after some time, the upload fails because of a network bandwidth issue.
In that case, have all 5 uploads failed? I would like to understand Paperclip's internal process for
multiple image uploads.
Does it handle the files in sequential order, or all of them in one single stream?
Could someone explain this to me? Thanks!

The transport mechanism for file uploads to the web server is the HTTP multipart request. Paperclip is not involved until the server has finished receiving and processing that request.
Paperclip is not the transport mechanism; it is a gem that (in short) handles file data and storage while providing helpers for the back end of your Rails application.
When you upload one or several files in the same HTTP request and that request fails, the web server halts the transaction, and that happens before any interaction with your Rails controllers. So if all 5 files were sent in a single request, none of them will have reached Paperclip.
An alternative would be to send each file upload as a separate request from the front end of your application, but that is a separate issue and I recommend doing some research if you would like to go down that path.
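For reference, here is a minimal sketch of how a Rails controller might accept several files from one multipart request and hand each to a Paperclip-backed model once the request has been fully received. The Photo model, the image attachment and the photos parameter are hypothetical names chosen for illustration, not anything Paperclip prescribes:

    # app/models/photo.rb -- hypothetical model using Paperclip
    class Photo < ActiveRecord::Base
      has_attached_file :image
      validates_attachment_content_type :image, content_type: /\Aimage\/.*\z/
    end

    # app/controllers/photos_controller.rb
    class PhotosController < ApplicationController
      def create
        # params[:photos] only exists once the web server has received the
        # entire multipart request; Paperclip then handles each file in turn.
        uploaded = Array(params[:photos]).map do |file|
          Photo.create(image: file)
        end

        if uploaded.all?(&:persisted?)
          redirect_to photos_path, notice: "#{uploaded.size} files uploaded"
        else
          render :new
        end
      end
    end

Each Photo.create call runs sequentially inside that one controller action, which is why a failure at the transport level means none of them ever run.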

Related

File transfer takes too much time

I have an empty API endpoint written in Laravel, running behind Nginx and Apache. The problem is that the API takes a long time when I call it with files, but responds quickly when I call it with blank data.
Case 1: I call the API with a blank request, and the response time is only 228 ms.
Case 2: I call the API with a 5 MB file, and the file transfer takes so long that the response time grows to 15.58 s.
How can I reduce the transfer time on the Apache or Nginx server? Is there some server configuration, or anything else, that I have missed?
When I searched on Google, the advice was to keep all versions up to date and to use PHP-FPM, but when I configured PHP-FPM and the HTTP/2 protocol on my server it took even more time than above. All server software is already on the current version.
This has more to do with the fact that one request has nothing to process, so the response is prompt, whereas the other request requires actual processing, so the response takes as long as the server needs to handle the content of your request.
Depending on the size of the file and your server configuration, you might hit a limit which will result in a timeout response.
A solution to the issue you're encountering is to chunk your file upload. There are a few packages available so that you don't have to write that functionality yourself, an example of such a package is the Pionl Laravel Chunk Upload.
An alternative solution would be to offload the file processing to a Queue.
Update
When I searched on Google about chunking, it does not seem to be the best
solution for small files of 5-10 MB; it is better suited to big files of
50-100 MB. So is there any server-side chunking configuration or anything
else, or can I use this library to chunk small files?
According to the library's documentation it is a web library. What should I
use if my API is called from Android and iOS apps?
True, chunking might not be the best solution for smaller files, but it is worthwhile knowing about. My recommendation would be to use some client-side logic to determine whether sending the file in chunks is required. On the server, use a Queue to process the file upload in the background, so the request can finish without waiting on the processing and a response can be sent back to the client (iOS/Android app) in a timely manner.
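To make the chunking idea concrete, below is a rough sketch of the server-side half of the pattern that chunk-upload packages implement. It is written in Ruby/Sinatra for illustration rather than Laravel, and the upload_id, index and total parameters are assumed to be supplied by the client; a real implementation would also store chunks individually and join them at the end so out-of-order delivery is handled.

    require 'sinatra'
    require 'fileutils'
    require 'json'

    # Illustrative chunked-upload receiver. The client splits the file and sends
    # each piece with an upload id, the chunk index, and the total chunk count.
    post '/upload/chunk' do
      upload_id = params[:upload_id]              # hypothetical client-supplied id
      index     = params[:index].to_i
      total     = params[:total].to_i
      partial   = File.join('tmp', 'chunks', upload_id)
      FileUtils.mkdir_p(File.dirname(partial))

      # Append this chunk's bytes to the partial file (assumes in-order delivery).
      File.open(partial, 'ab') { |f| f.write(params[:chunk][:tempfile].read) }

      if index == total - 1
        FileUtils.mkdir_p('uploads')
        final = File.join('uploads', upload_id)
        FileUtils.mv(partial, final)
        { status: 'complete', path: final }.to_json
      else
        { status: 'partial', received: index + 1 }.to_json
      end
    end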

How to handle a long-running processing request

I have a Spring Boot REST API deployed on AWS Elastic Beanstalk and I am trying to upload pictures through it.
This is what I do: upload a zip file through a file input in the browser, receive the zip file on the server, go through all the files, and upload each one to AWS S3.
It works fine, but I ran into a problem: when I try to upload lots of pictures, I get an HTTP error (504 Gateway Timeout). I found out this is because the server takes too long to respond, and I am trying to figure out how to set a higher timeout for the requests (I haven't found it yet).
But in the meantime I am asking myself if that is the best solution.
Wouldn't it be better to end the request directly after receiving the zip file, make the uploads to S3 and after that notify the user that the uploads are done ? Is there even a way to do that ? Is there a good practice for this ? (operation that takes lots of time to process).
I know how to do the process asynchronously but I would really like to know how to notify the user after it completes.
Wouldn't it be better to end the request directly after receiving the zip file, make the uploads to S3 and after that notify the user that the uploads are done ?
Yes, asynchronous processing of the uploaded images in the zip file would be better.
Is there even a way to do that ? Is there a good practice for this ? (operation that takes lots of time to process).
Yes, there is a better way. To keep everything within EB, you could look at an Elastic Beanstalk worker environment; the worker environment is ideal for processing your images.
In this solution, your web-based environment would store the uploaded images in S3 and submit their names, along with other identifying information, to an SQS queue. The queue is the entry point for the worker environment.
Your workers would process the images from the queue independently of the web environment. In the meantime, the web environment would check for the results and notify your users once the images have been processed.
EB also supports linking different environments, so you could establish a link between the web and worker environments for easier integration.
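As a rough sketch of the web-tier half of that flow (written in Ruby with the AWS SDK for illustration, even though the question is about Spring Boot; the bucket name, queue URL and enqueue_image helper are placeholders), the web environment stores each image in S3 and pushes a message describing it onto the SQS queue that feeds the worker environment:

    require 'aws-sdk-s3'
    require 'aws-sdk-sqs'
    require 'json'

    # Placeholder names -- substitute your own bucket and worker queue URL.
    BUCKET    = 'my-upload-bucket'
    QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/image-worker-queue'

    s3  = Aws::S3::Client.new(region: 'us-east-1')
    sqs = Aws::SQS::Client.new(region: 'us-east-1')

    def enqueue_image(s3, sqs, filename, io)
      key = "uploads/#{filename}"

      # 1. The web environment stores the raw image in S3.
      s3.put_object(bucket: BUCKET, key: key, body: io)

      # 2. It then tells the worker environment what to process. In an EB worker
      #    environment, the SQS daemon POSTs each message body to your worker app.
      sqs.send_message(
        queue_url: QUEUE_URL,
        message_body: { bucket: BUCKET, key: key }.to_json
      )
    end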

'Fire & Forget' call from Sinatra

I am writing an endpoint using Sinatra that will receive raw PDFs from clients and needs to process each PDF for internal use. The PDF processing takes a while, and I do not necessarily want the client to wait until the processing is finished and risk a timeout (504). Instead, I would like to invoke another method that handles the PDF processing while I respond to the client with an appropriate status code. What is the best way to implement that with Sinatra?
So there's a few parts to this, so let me break down the various steps that are going to happen:
Client uploads a PDF file: depending on the size of the PDF and the speed of their connection, this could take a while. While you're waiting for the upload your web process is busy receiving the data and is unable to process any other requests for any other clients.
You then need to process the uploaded file, store it somewhere, possibly manipulate it somehow. If you do all that as part of the request process then there is yet more time you're tied up dealing with this one request and unable to serve other clients.
The typical way to solve the latter of those problems, manipulating or processing an uploaded asset, is to use a background job queue such as Sidekiq (http://sidekiq.org). You store the required data somewhere, keep enough information to know what to work on (e.g., the database ID of a model that has stored the relevant information, a filename, etc.), and then pass all of that required information into a background job. You then have separate worker processes that pick up that work and complete it, but they aren't part of your web process so they aren't blocking other clients from receiving information.
This still leaves us with the problem of handling large uploads; fortunately, that has a solution too. Take advantage of all the web capacity Amazon has and have the clients upload the file directly to S3. When the upload is complete they can post just the filename to you, and you can then queue that up for your worker from the previous step and have it all happen in the background. This blog post has a good explanation of how to wire it together using Paperclip: http://blog.littleblimp.com/post/53942611764/direct-uploads-to-s3-with-rails-paperclip-and
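For the background-job part, here is a minimal sketch of the hand-off in Sinatra with Sidekiq. The ProcessPdfJob class and the PdfPipeline processing step are hypothetical stand-ins for whatever your internal PDF handling looks like: the endpoint saves the raw bytes, enqueues the job, and returns a 202 immediately.

    require 'sinatra'
    require 'sidekiq'
    require 'securerandom'

    # Runs in a separate Sidekiq worker process, so the web process is free to
    # serve other clients while the PDF is being handled.
    class ProcessPdfJob
      include Sidekiq::Worker

      def perform(path)
        PdfPipeline.run(path)   # hypothetical internal processing step
      end
    end

    post '/pdfs' do
      # Persist the raw bytes quickly, hand off the slow work, respond at once.
      path = File.join('tmp', "#{SecureRandom.uuid}.pdf")
      File.binwrite(path, request.body.read)
      ProcessPdfJob.perform_async(path)
      status 202   # Accepted: processing continues in the background
    end

In production you would typically store the file somewhere both processes can reach (such as S3, as described above) rather than the local filesystem.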

How Would I Serve Thousands of Files per Request

I am working on an application where the user has the potential to download thousands of files in one request into a zip file. Obviously, this will not be practical for our server. What would be the best way to go about serving up thousands of files to users?
Right now, what I have been working on is just having the jQuery fileDownload library make a request for 100 files, then in the success handler calling fileDownload again for another 100 files, offset by 100. The problem with this is that the fileDownload library (or the server) waits about 20 seconds until the fileDownload fail callback is called.
The other problem with this method is that it isn't practical for the client to receive hundreds of pop-up windows asking whether they want to download 100 files.
We also won't be able to send back thousands of files in the response because our server doesn't and won't have that much memory.
This is purely opinion based on my experience, but here are two options I have seen in use:
Option 1:
Batch process the files, compress them, then advise the user of the download location. This should be limited in the number and size of files, though, as it can exhaust server resources. I don't recommend this if you have a large number of users.
Option 2 (Best):
Batch process the files into a compressed archive, then either let users FTP into the location to obtain the files or, if your users have their own FTP location, have the archive transferred over to it. I can tell you this is definitely the most effective approach and is used by a number of corporations I have been involved with.
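As a sketch of the batch-and-compress step (in Ruby with the rubyzip gem; the output directory and the notification or FTP step are placeholders you would adapt to your setup), a background task can stream the requested files into a single archive and then record where the user, or an FTP transfer, can pick it up:

    require 'zip'   # rubyzip gem

    # Compress a user's requested files into one archive and return its location,
    # which you can then email, link, or push to an FTP drop.
    def build_archive(file_paths, output_dir: '/var/exports')
      archive_path = File.join(output_dir, "export-#{Time.now.to_i}.zip")

      Zip::File.open(archive_path, Zip::File::CREATE) do |zip|
        file_paths.each do |path|
          # Entries are read from disk as the archive is written, so the whole
          # set of files never has to sit in memory at once.
          zip.add(File.basename(path), path)
        end
      end

      archive_path
    end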

Heroku - letting users download files from tmp

Let me start by saying I understand that heroku's dynos are temporary and unreliable. I only need them to persist for at most 5 minutes, and from what I've read that generally won't be an issue.
I am making a tool that gathers files from websites and zips them up for download. My tool does everything and creates the zip - I'm just stuck on the last part: providing the user with a way to download the file. I've tried direct links to the file location and HTTP GET requests, and Heroku didn't like either. I really don't want to have to set up AWS just to host a file that only needs to persist for a couple of minutes. Is there another way to download files stored in /tmp?
As far as I know, you have absolutely no guarantee that a request goes to the same dyno as the previous request.
The best way to do this would probably be to either host the file somewhere else, like S3, or to send it immediately in the same request.
If you're generating the file in a background worker, then it most definitely won't work. Every process runs on a separate dyno.
See How Heroku Works for more information on their backend.
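If you go with sending the file back in the same request, so the zip never has to outlive the dyno and process that built it, the sketch below shows the idea with Sinatra's send_file. The build_zip_in_tmp helper is a hypothetical stand-in for whatever your tool already does to gather and zip the files:

    require 'sinatra'

    get '/download' do
      # Build the zip inside this request so it is guaranteed to exist on the same
      # dyno that serves it; /tmp is only dependable within a single request/process.
      zip_path = build_zip_in_tmp(params[:url])   # hypothetical: your existing gather-and-zip step

      send_file zip_path,
                type: 'application/zip',
                filename: 'download.zip',
                disposition: 'attachment'
    end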
