How best to queue file uploads to S3 in a multi server environment? - laravel

In short, my API will accept file uploads. I (ultimately) store them in S3, but to save uploading them to S3 on the same request, I queue the upload process and do it in the background.
I was originally storing the file on the server, and in my job, I was queueing the file path, and then grabbing the contents with that file path on the server, and then sending to S3.
I develop/stage on a single server. My production environment will sit behind a load balancer, with 2-3 servers. I realised that my jobs will fail 2/3 of the time as the file that I am linking to in my job may be on a different server and not on the server running the job.
I realised I could just base64_encode the file contents, and just store that in Redis (as opposed to just storing the path of the file). Using the following:
$contents = base64_encode(file_get_contents($file));
UploadFileToCloud::dispatch($filePath, $contents, $lead)->onQueue('s3-uploads');
I have quite a large Redis store, so I am confident I can do this for lots of small files (most likely in my case), but some files can be quite large.
I am starting to have concerns that I may run into issues using this method, most likely issues to do with my Redis store running out of memory.
I have thought about a shared drive between all my instances and revert back to my original method of storing the file path, but unsure.
Another issue I have is if a file upload fails, if it's a big file, can the failed_jobs table handle the amount of data (for example) of a base64 encoded 20mb pdf.
Is base64 encoding the file contents and queuing that the best method? Or, can anyone recommend an alternative means to queue uploading a file upload in a multi server environment?

Related

Laravel Lumen directly Download and Extract ZIP file to Google Cloud Storage

My goal is to download a large zip file (15 GB) and extract it to Google Cloud using Laravel Storage (https://laravel.com/docs/8.x/filesystem) and https://github.com/spatie/laravel-google-cloud-storage.
My "wish" is to sort of stream the file to Cloud Storage, so I do not need to store the file locally on my server (because it is running in multiple instances, and I want to have the disk size as small as possible).
Currently, there does not seem to be a way to do this without having to save the zip file on the server. Which is not ideal in my situation.
Another idea is to use a Google Cloud Function (eg with Python) to download, extract and store the file. However, it seems like Google Cloud Functions are limited to a max timeout of 9 mins (540 seconds). I don't think that will be enough time to download and extract 15GB...
Any ideas on how to approach this?
You should be able to use streams for uploading big files. Here’s the example code to achieve it:
$disk = Storage::disk('gcs');
$disk->put($destFile, fopen($sourceZipFile, 'r+'));

Heroku - Redis memory > 'maxmemory' with a 20MB file in hobby:dev where it should be 25MB

So I am trying to upload a file with Celery that uses Redis on my Heroku website. I am trying to upload a .exe type file with the size of 20MB. Heroku is saying in they're hobby: dev section that the max memory that could be uploaded is 25MB. But I, who is trying to upload a file in Celery(turning it from bytes to base64, decoding it and sending it to the function) is getting kombu.exceptions.OperationalError: OOM command not allowed when used memory > 'maxmemory'. error. Keep in mind when I try to upload for e.g a 5MB file it works fine. But 20MB doesn't. I am using Python with the Flask framework
There are two ways to store files in DB (Redis is just an in-memory DB). You can either store a blob in the DB (for small files, say a few KBs), or you can store the file in memory and store a pointer to the file in DB.
So for your case, store the file on disk and place only the file pointer in the DB.
The catch here is that Heroku has a Ephemeral file system that gets erased every 24 hours, or whenever you deploy new version of the app.
So you'll have to do something like this:
Write a small function to store the file on the local disk (this is temporary storage) and return the path to the file
Add a task to Celery with the file path i.e. the parameter to the Celery task will be the "file-path" not a serialized blob of 20MB data.
The Celery worker process picks the task you just enqueued when it gets free and executes it.
If you need to access the file later, and since the local heroku disk only has temporary, you'll have to place the file in some permanent storage like AWS S3.
(The reason we go through all these hoops and not place the file directly in S3 is because access to local disk is fast while S3 disks might be in some other server farm at some other location and it takes time to save the file there. And your web process might appear slow/stuck if you try to write the file to S3 in your main process.)

Spring batch job start processing file not fully uploaded to the SFTP server

I have a spring-batch job scanning the SFTP server at a given interval. When it finds a new file, it starts the processing.
It works fine for most cases, but there is one case when it doesn't work:
User starts uploading a new file to the SFTP server
Batch job checks the server and finds a new file
It start processing it
But since the file is still being uploaded, during the processing it encounters unexpected end of input block, and the error occurs.
How can I check that file was fully uploaded to the SFTP server before batch job processing starts?
Locking files while uploading / Upload to temporary file name
You may have an automated system monitoring a remote folder and you want to prevent it from accidentally picking a file that has not finished uploading yet. As majority of SFTP and FTP servers (WebDAV being an exception) do not support file locking, you need to prevent the automated system from picking the file otherwise.
Common workarounds are:
Upload “done” file once an upload of data files finishes and have
the automated system wait for the “done” file before processing the
data files. This is easy solution, but won’t work in multi-user
environment.
Upload data files to temporary (“upload”) folder and move them atomically to target folder once the upload finishes.
Upload data files to distinct temporary name, e.g. with .filepart extension, and rename them atomically once the upload finishes. Have the automated system ignore the .filepart files.
Got from here
We had similar problem, Our solution was, we configured spring-batch cron trigger to trigger the job every 10min(though we could configure for 5min, as file transfer was taking less than 3min), then we read/process all the files created prior to 10 minutes. We assume the FTP operation completes within 3 minutes. This gave us some additional flexibility such as when spring-batch app was down etc.
For example if the batch job triggered at 10:20AM we read all the files that were created before 10:10AM, like-wise job that runs at 10:30, reads all the files created before 10:20.
Note: Once Read you need to either delete or move to history folder for duplicate reads.

Golang file and folder replication / mirroring across multiple servers

Consider this scenario. In a load-balanced environment, I have 3 separate instances of a CMS running on 3 different physical servers. These 3 separate running instances of the application is sharing the same database.
On each server, the CMS has a /media folder where all media subfolders and files reside. My question is how I'd implement/code a file replication service/functionality in Golang, so when a subfolder or file is added/changed/deleted on one of the servers, it'll get copied/replicated/deleted on all other servers?
What packages would I need to look in to, or perhaps you have a small code snippet to help me get started? That would be awesome.
Edit:
This question has been marked as "duplicate", but it is not. It is however an alternative to setting up a shared network file system. I'm thinking that keeping a copy of the same file on all servers, synchronizing and keeping them updated might be better than sharing them.
You probably shouldn't do this. Use a distributed file system, object storage (ala S3 or GCS) or a syncing program like btsync or syncthing.
If you still want to do this yourself, it will be challenging. You are basically building a distributed database and they are difficult to get right.
At first blush you could checkout something like etcd or raft, but unfortunately etcd doesn't work well with large files.
You could, on upload, also copy the file to every other server using ssh. But then what happens when a server goes down? Or what happens when two people update the same file at the same time?
Maybe you could design it such that every file gets a unique id (perhaps based on the hash of its contents so you can safely dedupe) and those files can never be updated or deleted, only added. That would solve the simultaneous update problem, but you'd still have the downtime problem.
One approach would be for each server to maintain an append-only version log when a file is added:
VERSION | FILE HASH
1 | abcd123
2 | efgh456
3 | ijkl789
With that you can pull every file from a server and a single number would be sufficient to know when a file is added. (For example if you think Server A is on version 5, and you get informed it is now on version 7, you know you need to sync 2 files)
You could do this with a database table:
ID | LOCAL_SERVER_ID | REMOTE_SERVER_ID | VERSION | FILE HASH
Which you could periodically poll and do your syncing via ssh or http between machines. If a server was down you could just retry until it works.
Or if you didn't want to have a centralized database for this you could use a library like memberlist. The local meta data for each node could be its version.
Either way there will be some amount of delay between a file was uploaded to a single server, and when it's available on all of them. Handling that well is hard, which is why you probably shouldn't do this.

WP 7 Isolated Storage

In my WP 7 App, i have to store the images and XML file of two types,
1: first type of files are not updated frequently on server so i want to store them Permanently on local storage so that when ever app starts it can access these files from local storage , and when these files are updated on server , also update local storage files.I want these files not to be deleted on application termination.
2: Second type of files are those that i want to save in isolated storage temporarily e.g. app requested a XML file from server , i stored it locally and next time if app requests same file instead of getting it from server get it from local storage , and Delete these files when the application terminates..
How can i do this ?
Thanks
1) Isolated Storage is designed to be used to store data that should remain permanent (until the user uninstalls the app). There's example code of how to write and save a file on MSDN. Therefore, any file you save (temp or not), will be stored until the user uninstalls the app or your app deletes the file.
2) For temporary data, you can use the PhoneApplicationState property. This will automatically delete the files after your app closes. However, there's a size limit (I belive PhoneApplicationService.State has a limit of 4mb).
Alternatively, if the XML file is too big, you can write it to the Isolated Storage. Then, you can handle your page's Closing event and delete the file from Isolated Storage there using the DeleteFile method.

Resources